* [PATCH v2 0/7] Enable Drivers for Intel MIC X100 Coprocessors.
@ 2013-08-08  3:04 Sudeep Dutt
From: Sudeep Dutt @ 2013-08-08  3:04 UTC
  To: Greg Kroah-Hartman, Arnd Bergmann, Rusty Russell,
	Michael S. Tsirkin, Rob Landley, linux-kernel, virtualization,
	linux-doc
  Cc: Harshavardhan R Kharche, Peter P Waskiewicz Jr,
	Yaozu (Eddie) Dong, Sudeep Dutt, Ashutosh Dixit, AsiasHeasias,
	Caz Yokoyama, Dasaratharaman Chandramouli

ChangeLog:
=========

v1 => v2:
a) License wording cleanup, sysfs ABI documentation, patch 1 refactoring
   into 3 smaller patches and function renames, as per feedback from
   Greg Kroah-Hartman.
b) Use VRINGH infrastructure for accessing virtio rings from the host
   in patch 5, as per feedback from Michael S. Tsirkin.

v1: Initial post @ https://lkml.org/lkml/2013/7/24/810

Description:
============

An Intel MIC X100 device is a PCIe form factor add-in coprocessor
card based on the Intel Many Integrated Core (MIC) architecture
that runs a Linux OS. It is a PCIe endpoint in a platform and therefore
implements the three required standard address spaces i.e. configuration,
memory and I/O. The host OS loads a device driver as is typical for
PCIe devices. The card itself runs a bootstrap after reset that
transfers control to the card OS downloaded from the host driver.
The card OS as shipped by Intel is a Linux kernel with modifications
for the X100 devices.

Since it is a PCIe card, it does not have the ability to host hardware
devices for networking, storage and console. We provide these devices
on X100 coprocessors, thus giving applications an environment equivalent
to that of a self-bootable system. A key benefit of our solution is that
it leverages the standard virtio framework for network, disk and console
devices, though in our case the virtio framework is used across a PCIe bus.

Here is a block diagram of the various components described above. The
virtio backends are situated on the host rather than the card because the
host has better single-threaded performance than MIC and because the host
can initiate DMAs to/from the card using the MIC DMA engine.

                              |
       +----------+           |             +----------+
       | Card OS  |           |             | Host OS  |
       +----------+           |             +----------+
                              |
+-------+ +--------+ +------+ | +---------+  +--------+ +--------+
| Virtio| |Virtio  | |Virtio| | |Virtio   |  |Virtio  | |Virtio  |
| Net   | |Console | |Block | | |Net      |  |Console | |Block   |
| Driver| |Driver  | |Driver| | |backend  |  |backend | |backend |
+-------+ +--------+ +------+ | +---------+  +--------+ +--------+
    |         |         |     |      |            |         |
    |         |         |     |Ring 3|            |         |
    |         |         |     |------|------------|---------|-------
    +-------------------+     |Ring 0+--------------------------+
              |               |      | Virtio over PCIe IOCTLs  |
              |               |      +--------------------------+
      +--------------+        |                   |
      |Intel MIC     |        |            +---------------+
      |Card Driver   |        |            |Intel MIC      |
      +--------------+        |            |Host Driver    |
              |               |            +---------------+
              |               |                   |
     +-------------------------------------------------------------+
     |                                                             |
     |                    PCIe Bus                                 |
     +-------------------------------------------------------------+

This patch series is partitioned as follows:

Patch 1: This patch introduces the "Intel MIC Host Driver" in the block
diagram which does the following:
a) Initializes the Intel MIC X100 PCIe devices.
b) Provides sysfs entries for family and stepping information.
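
As an illustration of how user space might consume the sysfs entries from
patch 1, here is a minimal sketch in C. The helper name mic_read_attr() is
hypothetical and not part of this series; the path in the comment follows
the /sys/class/mic/mic(x)/family layout documented in the ABI file.

```c
#include <stdio.h>
#include <string.h>

/*
 * Read a one-line sysfs attribute, e.g. /sys/class/mic/mic0/family,
 * into buf and strip the trailing newline.
 * Returns 0 on success, -1 if the file cannot be opened or read.
 * mic_read_attr() is a hypothetical helper, not part of the series.
 */
int mic_read_attr(const char *path, char *buf, size_t len)
{
	FILE *f;

	if (len == 0)
		return -1;
	f = fopen(path, "r");
	if (!f)
		return -1;
	if (!fgets(buf, (int)len, f)) {
		fclose(f);
		return -1;
	}
	fclose(f);
	/* sysfs attributes end with a newline; drop it */
	buf[strcspn(buf, "\n")] = '\0';
	return 0;
}
```

A caller would pass a buffer of a few dozen bytes and compare the result
against the expected family string, for example "x100".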

Patch 2: This patch enables the following features in the
"Intel MIC Host Driver" in the block diagram:
a) MSIx, MSI and legacy interrupt support.
b) System Memory Page Table (SMPT) support. SMPT enables system memory
   access from the card. On X100 devices the host can program 32 SMPT
   registers, each of which maps 16GB of system memory address space.
   Together the registers give X100 devices access to a cumulative
   512GB of system memory address space at any point in time.
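
The SMPT arithmetic above (32 registers x 16GB = 512GB) can be sketched as
follows. This is only an illustration: the macro and function names are
hypothetical, and the linear mapping from window offset to register index
is an assumption rather than something stated in this series.

```c
#include <stdint.h>

#define SMPT_NUM_REG     32                      /* SMPT registers on X100 */
#define SMPT_PAGE_SHIFT  34                      /* 16GB = 2^34 bytes */
#define SMPT_PAGE_SIZE   (1ULL << SMPT_PAGE_SHIFT)
#define SMPT_WINDOW_SIZE (SMPT_NUM_REG * SMPT_PAGE_SIZE) /* 512GB total */

/* Which SMPT register covers this offset into the 512GB window
 * (assumes a linear layout, one 16GB slot per register). */
unsigned int smpt_index(uint64_t window_off)
{
	return (unsigned int)(window_off >> SMPT_PAGE_SHIFT);
}

/* Offset of the access within the 16GB slot of its register. */
uint64_t smpt_offset(uint64_t window_off)
{
	return window_off & (SMPT_PAGE_SIZE - 1);
}

/* 16GB-aligned system address a register would need to hold
 * in order to reach the given host physical address. */
uint64_t smpt_base(uint64_t host_pa)
{
	return host_pa & ~(SMPT_PAGE_SIZE - 1);
}
```

Under these assumptions, two host buffers that lie in the same 16GB-aligned
region can share one register, while buffers in different regions each
consume a register of their own.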

Patch 3: This patch enables the following features in the
"Intel MIC Host Driver" in the block diagram:
a) Boots and shuts down the card via sysfs entries.
b) Allocates and maps a device page for communication with the
   card driver and updates the device page address via scratchpad
   registers.
c) Provides sysfs entries for shutdown status, kernel command line,
   ramdisk and log buffer information.

Patch 4: This patch introduces the "Intel MIC Card Driver" in the block
diagram which does the following:
a) Initializes the Intel MIC X100 platform device and driver.
b) Sets up support to handle shutdown requests from the host.
c) Maps the device page after obtaining the device page address
   from the scratchpad registers updated by the host.
d) Informs the host upon a card crash by registering a panic notifier.
e) Informs the host upon a poweroff/halt event.
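
The device page handshake described in patches 3 and 4 can be pictured with
the sketch below. The scratchpad registers are 32 bits wide (see read_spad
and write_spad in patch 1), so a 64-bit device page address needs two of
them. The register indices, the register count and the struct standing in
for MMIO are all hypothetical, chosen only for illustration.

```c
#include <stdint.h>

#define SPAD_COUNT  16  /* hypothetical number of scratchpad registers */
#define SPAD_DP_LO  0   /* hypothetical index: low 32 bits of address */
#define SPAD_DP_HI  1   /* hypothetical index: high 32 bits of address */

/* Plain memory standing in for the MMIO scratchpad registers. */
struct spad_bank {
	uint32_t reg[SPAD_COUNT];
};

/* Host side: publish the device page address for the card driver. */
void host_publish_dp(struct spad_bank *s, uint64_t dp_addr)
{
	s->reg[SPAD_DP_LO] = (uint32_t)dp_addr;
	s->reg[SPAD_DP_HI] = (uint32_t)(dp_addr >> 32);
}

/* Card side: reassemble the address before mapping the device page. */
uint64_t card_read_dp(const struct spad_bank *s)
{
	return ((uint64_t)s->reg[SPAD_DP_HI] << 32) | s->reg[SPAD_DP_LO];
}
```

In the real drivers the two sides would go through the hardware specific
scratchpad accessors rather than a shared struct.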

Patch 5: This patch introduces the host "Virtio over PCIe" interface for
Intel MIC. It allows creating user space backends on the host and instantiating
virtio devices for them on the Intel MIC card. It uses the existing VRINGH
infrastructure in the kernel to access virtio rings from the host. A character
device per MIC is exposed with IOCTL, mmap and poll callbacks. This allows the
user space backend to:
(a) add/remove a virtio device via a device page.
(b) map (R/O) virtio rings and device page to user space.
(c) poll for availability of data.
(d) copy a descriptor or entire descriptor chain to/from the card.
(e) modify virtio configuration.
(f) handle virtio device reset.
The buffers are copied over using CPU copies for this initial patch;
host-initiated MIC DMA support is planned for future patches.
The avail and desc virtio rings are in host memory and the used ring
is in card memory, so that ring updates cross PCIe as writes rather
than reads, which is better for performance.
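
For the "poll for availability of data" step listed for patch 5, a user
space backend could wait on the per-MIC character device with the standard
poll() call. The sketch below is generic POSIX code; the helper name
mic_wait_readable() is hypothetical, and the assumption that the device
signals POLLIN when data is available is not confirmed by this cover
letter.

```c
#include <poll.h>

/*
 * Wait until fd becomes readable.
 * Returns 1 if readable, 0 on timeout, -1 on error or on an
 * unexpected event (e.g. POLLERR/POLLHUP without POLLIN).
 * Assumes the device reports POLLIN when data is available.
 */
int mic_wait_readable(int fd, int timeout_ms)
{
	struct pollfd pfd = { .fd = fd, .events = POLLIN };
	int rc = poll(&pfd, 1, timeout_ms);

	if (rc < 0)
		return -1;
	if (rc == 0)
		return 0;
	return (pfd.revents & POLLIN) ? 1 : -1;
}
```

A backend would typically call this in a loop and, once it returns 1, use
the copy IOCTLs to move descriptors to or from the card.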

Patch 6: This patch introduces the card "Virtio over PCIe" interface for
Intel MIC. It allows virtio drivers on the card to communicate with their
user space backends on the host via a device page. Ring 3 apps on the host
can add, remove and configure virtio devices. A thin MIC-specific
virtio_config_ops is implemented, borrowing heavily from the previous
similar implementations in lguest and s390:
drivers/lguest/lguest_device.c
drivers/s390/kvm/kvm_virtio.c

Patch 7: This patch introduces a sample user space daemon which
implements the virtio device backends on the host. The daemon
creates/removes/configures virtio device backends by communicating with
the Intel MIC Host Driver. The virtio devices currently supported are
virtio net, virtio console and virtio block. Virtio net supports TSO/GSO.
The daemon also monitors card shutdown status and takes appropriate actions
like killing the virtio backends and resetting the card upon card shutdown
and crashes.

The patches have been compiled/validated against v3.10.

Ashutosh Dixit (2):
  Intel MIC Host Driver Changes for Virtio Devices.
  Intel MIC Card Driver Changes for Virtio Devices.

Caz Yokoyama (1):
  Sample Implementation of Intel MIC User Space Daemon.

Dasaratharaman Chandramouli (1):
  Intel MIC Host Driver Interrupt/SMPT support for X100 family.

Sudeep Dutt (3):
  Intel MIC Host Driver for X100 family.
  Intel MIC Host Driver, card OS state management.
  Intel MIC Card Driver for X100 family.

 Documentation/ABI/testing/sysfs-class-mic.txt |  118 ++
 Documentation/mic/mic_overview.txt            |   48 +
 Documentation/mic/mpssd/.gitignore            |    1 +
 Documentation/mic/mpssd/Makefile              |   19 +
 Documentation/mic/mpssd/micctrl               |  152 +++
 Documentation/mic/mpssd/mpss                  |  245 ++++
 Documentation/mic/mpssd/mpssd.c               | 1689 +++++++++++++++++++++++++
 Documentation/mic/mpssd/mpssd.h               |  100 ++
 Documentation/mic/mpssd/sysfs.c               |  103 ++
 drivers/misc/Kconfig                          |    1 +
 drivers/misc/Makefile                         |    1 +
 drivers/misc/mic/Kconfig                      |   39 +
 drivers/misc/mic/Makefile                     |    6 +
 drivers/misc/mic/card/Makefile                |   11 +
 drivers/misc/mic/card/mic_common.h            |   36 +
 drivers/misc/mic/card/mic_debugfs.c           |  132 ++
 drivers/misc/mic/card/mic_debugfs.h           |   35 +
 drivers/misc/mic/card/mic_device.c            |  306 +++++
 drivers/misc/mic/card/mic_device.h            |  127 ++
 drivers/misc/mic/card/mic_virtio.c            |  647 ++++++++++
 drivers/misc/mic/card/mic_virtio.h            |   74 ++
 drivers/misc/mic/card/mic_x100.c              |  256 ++++
 drivers/misc/mic/card/mic_x100.h              |   48 +
 drivers/misc/mic/common/mic_device.h          |   51 +
 drivers/misc/mic/host/Makefile                |   13 +
 drivers/misc/mic/host/mic_boot.c              |  185 +++
 drivers/misc/mic/host/mic_common.h            |   32 +
 drivers/misc/mic/host/mic_debugfs.c           |  494 ++++++++
 drivers/misc/mic/host/mic_debugfs.h           |   29 +
 drivers/misc/mic/host/mic_device.h            |  291 +++++
 drivers/misc/mic/host/mic_fops.c              |  227 ++++
 drivers/misc/mic/host/mic_fops.h              |   32 +
 drivers/misc/mic/host/mic_main.c              | 1110 ++++++++++++++++
 drivers/misc/mic/host/mic_smpt.c              |  436 +++++++
 drivers/misc/mic/host/mic_smpt.h              |   98 ++
 drivers/misc/mic/host/mic_sysfs.c             |  314 +++++
 drivers/misc/mic/host/mic_virtio.c            |  710 +++++++++++
 drivers/misc/mic/host/mic_virtio.h            |  139 ++
 drivers/misc/mic/host/mic_x100.c              |  661 ++++++++++
 drivers/misc/mic/host/mic_x100.h              |   99 ++
 include/uapi/linux/Kbuild                     |    2 +
 include/uapi/linux/mic_common.h               |  236 ++++
 include/uapi/linux/mic_ioctl.h                |   76 ++
 43 files changed, 9429 insertions(+)
 create mode 100644 Documentation/ABI/testing/sysfs-class-mic.txt
 create mode 100644 Documentation/mic/mic_overview.txt
 create mode 100644 Documentation/mic/mpssd/.gitignore
 create mode 100644 Documentation/mic/mpssd/Makefile
 create mode 100755 Documentation/mic/mpssd/micctrl
 create mode 100755 Documentation/mic/mpssd/mpss
 create mode 100644 Documentation/mic/mpssd/mpssd.c
 create mode 100644 Documentation/mic/mpssd/mpssd.h
 create mode 100644 Documentation/mic/mpssd/sysfs.c
 create mode 100644 drivers/misc/mic/Kconfig
 create mode 100644 drivers/misc/mic/Makefile
 create mode 100644 drivers/misc/mic/card/Makefile
 create mode 100644 drivers/misc/mic/card/mic_common.h
 create mode 100644 drivers/misc/mic/card/mic_debugfs.c
 create mode 100644 drivers/misc/mic/card/mic_debugfs.h
 create mode 100644 drivers/misc/mic/card/mic_device.c
 create mode 100644 drivers/misc/mic/card/mic_device.h
 create mode 100644 drivers/misc/mic/card/mic_virtio.c
 create mode 100644 drivers/misc/mic/card/mic_virtio.h
 create mode 100644 drivers/misc/mic/card/mic_x100.c
 create mode 100644 drivers/misc/mic/card/mic_x100.h
 create mode 100644 drivers/misc/mic/common/mic_device.h
 create mode 100644 drivers/misc/mic/host/Makefile
 create mode 100644 drivers/misc/mic/host/mic_boot.c
 create mode 100644 drivers/misc/mic/host/mic_common.h
 create mode 100644 drivers/misc/mic/host/mic_debugfs.c
 create mode 100644 drivers/misc/mic/host/mic_debugfs.h
 create mode 100644 drivers/misc/mic/host/mic_device.h
 create mode 100644 drivers/misc/mic/host/mic_fops.c
 create mode 100644 drivers/misc/mic/host/mic_fops.h
 create mode 100644 drivers/misc/mic/host/mic_main.c
 create mode 100644 drivers/misc/mic/host/mic_smpt.c
 create mode 100644 drivers/misc/mic/host/mic_smpt.h
 create mode 100644 drivers/misc/mic/host/mic_sysfs.c
 create mode 100644 drivers/misc/mic/host/mic_virtio.c
 create mode 100644 drivers/misc/mic/host/mic_virtio.h
 create mode 100644 drivers/misc/mic/host/mic_x100.c
 create mode 100644 drivers/misc/mic/host/mic_x100.h
 create mode 100644 include/uapi/linux/mic_common.h
 create mode 100644 include/uapi/linux/mic_ioctl.h

-- 
1.8.2.1


* [PATCH v2 1/7] Intel MIC Host Driver for X100 family.
From: Sudeep Dutt @ 2013-08-08  3:04 UTC
  To: Greg Kroah-Hartman, Arnd Bergmann, Rusty Russell,
	Michael S. Tsirkin, Rob Landley, linux-kernel, virtualization,
	linux-doc
  Cc: Harshavardhan R Kharche, Peter P Waskiewicz Jr,
	Yaozu (Eddie) Dong, Sudeep Dutt, Ashutosh Dixit, AsiasHeasias,
	Caz Yokoyama, Dasaratharaman Chandramouli

This patch enables the following:
a) Initializes the Intel MIC X100 PCIe devices.
b) Provides sysfs entries for family and stepping information.

Co-author: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Signed-off-by: Caz Yokoyama <Caz.Yokoyama@intel.com>
Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Signed-off-by: Harshavardhan R Kharche <harshavardhan.r.kharche@intel.com>
Signed-off-by: Nikhil Rao <nikhil.rao@intel.com>
Signed-off-by: Sudeep Dutt <sudeep.dutt@intel.com>
Acked-by: Yaozu (Eddie) Dong <eddie.dong@intel.com>
Reviewed-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
---
 Documentation/ABI/testing/sysfs-class-mic.txt |  34 +++
 drivers/misc/Kconfig                          |   1 +
 drivers/misc/Makefile                         |   1 +
 drivers/misc/mic/Kconfig                      |  19 ++
 drivers/misc/mic/Makefile                     |   5 +
 drivers/misc/mic/common/mic_device.h          |  37 +++
 drivers/misc/mic/host/Makefile                |   8 +
 drivers/misc/mic/host/mic_common.h            |  30 +++
 drivers/misc/mic/host/mic_device.h            | 114 +++++++++
 drivers/misc/mic/host/mic_main.c              | 326 ++++++++++++++++++++++++++
 drivers/misc/mic/host/mic_sysfs.c             |  94 ++++++++
 drivers/misc/mic/host/mic_x100.c              |  86 +++++++
 drivers/misc/mic/host/mic_x100.h              |  47 ++++
 13 files changed, 802 insertions(+)
 create mode 100644 Documentation/ABI/testing/sysfs-class-mic.txt
 create mode 100644 drivers/misc/mic/Kconfig
 create mode 100644 drivers/misc/mic/Makefile
 create mode 100644 drivers/misc/mic/common/mic_device.h
 create mode 100644 drivers/misc/mic/host/Makefile
 create mode 100644 drivers/misc/mic/host/mic_common.h
 create mode 100644 drivers/misc/mic/host/mic_device.h
 create mode 100644 drivers/misc/mic/host/mic_main.c
 create mode 100644 drivers/misc/mic/host/mic_sysfs.c
 create mode 100644 drivers/misc/mic/host/mic_x100.c
 create mode 100644 drivers/misc/mic/host/mic_x100.h

diff --git a/Documentation/ABI/testing/sysfs-class-mic.txt b/Documentation/ABI/testing/sysfs-class-mic.txt
new file mode 100644
index 0000000..36cdb70
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-class-mic.txt
@@ -0,0 +1,34 @@
+What:		/sys/class/mic/
+Date:		August 2013
+KernelVersion:	3.10
+Contact:	Sudeep Dutt <sudeep.dutt@intel.com>
+Description:
+		The mic class directory belongs to Intel MIC devices and
+		provides information per MIC device. An Intel MIC device is a
+		PCIe form factor add-in Coprocessor card based on the Intel Many
+		Integrated Core (MIC) architecture that runs a Linux OS.
+
+What:		/sys/class/mic/mic(x)
+Date:		August 2013
+KernelVersion:	3.10
+Contact:	Sudeep Dutt <sudeep.dutt@intel.com>
+Description:
+		The directories /sys/class/mic/mic0, /sys/class/mic/mic1, and
+		so on represent MIC devices (0, 1, etc.). Each directory has
+		information specific to that MIC device.
+
+What:		/sys/class/mic/mic(x)/family
+Date:		August 2013
+KernelVersion:	3.10
+Contact:	Sudeep Dutt <sudeep.dutt@intel.com>
+Description:
+		Provides information about the Coprocessor family for an Intel
+		MIC device. For example - "x100"
+
+What:		/sys/class/mic/mic(x)/stepping
+Date:		August 2013
+KernelVersion:	3.10
+Contact:	Sudeep Dutt <sudeep.dutt@intel.com>
+Description:
+		Provides information about the silicon stepping for an Intel
+		MIC device. For example - "A0" or "B0"
diff --git a/drivers/misc/Kconfig b/drivers/misc/Kconfig
index c002d86..09fcca9 100644
--- a/drivers/misc/Kconfig
+++ b/drivers/misc/Kconfig
@@ -536,4 +536,5 @@ source "drivers/misc/carma/Kconfig"
 source "drivers/misc/altera-stapl/Kconfig"
 source "drivers/misc/mei/Kconfig"
 source "drivers/misc/vmw_vmci/Kconfig"
+source "drivers/misc/mic/Kconfig"
 endmenu
diff --git a/drivers/misc/Makefile b/drivers/misc/Makefile
index c235d5b..0b7ea3e 100644
--- a/drivers/misc/Makefile
+++ b/drivers/misc/Makefile
@@ -53,3 +53,4 @@ obj-$(CONFIG_INTEL_MEI)		+= mei/
 obj-$(CONFIG_VMWARE_VMCI)	+= vmw_vmci/
 obj-$(CONFIG_LATTICE_ECP3_CONFIG)	+= lattice-ecp3-config.o
 obj-$(CONFIG_SRAM)		+= sram.o
+obj-y				+= mic/
diff --git a/drivers/misc/mic/Kconfig b/drivers/misc/mic/Kconfig
new file mode 100644
index 0000000..aaefd0c
--- /dev/null
+++ b/drivers/misc/mic/Kconfig
@@ -0,0 +1,19 @@
+comment "Intel MIC Host Driver"
+
+config INTEL_MIC_HOST
+	tristate "Intel MIC Host Driver"
+	depends on 64BIT && PCI
+	default n
+	help
+	  This enables Host Driver support for the Intel Many Integrated
+	  Core (MIC) family of PCIe form factor coprocessor devices that
+	  run a 64 bit Linux OS. The driver manages card OS state and
+	  enables communication between host and card. Intel MIC X100
+	  devices are currently supported.
+
+	  If you are building a host kernel with an Intel MIC device then
+	  say M (recommended) or Y, else say N. If unsure say N.
+
+	  More information about the Intel MIC family as well as the Linux
+	  OS and tools for MIC to use with this driver are available from
+	  <http://software.intel.com/en-us/mic-developer>.
diff --git a/drivers/misc/mic/Makefile b/drivers/misc/mic/Makefile
new file mode 100644
index 0000000..8e72421
--- /dev/null
+++ b/drivers/misc/mic/Makefile
@@ -0,0 +1,5 @@
+#
+# Makefile - Intel MIC Linux driver.
+# Copyright(c) 2013, Intel Corporation.
+#
+obj-$(CONFIG_INTEL_MIC_HOST) += host/
diff --git a/drivers/misc/mic/common/mic_device.h b/drivers/misc/mic/common/mic_device.h
new file mode 100644
index 0000000..f02262e
--- /dev/null
+++ b/drivers/misc/mic/common/mic_device.h
@@ -0,0 +1,37 @@
+/*
+ * Intel MIC Platform Software Stack (MPSS)
+ *
+ * Copyright(c) 2013 Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * General Public License for more details.
+ *
+ * The full GNU General Public License is included in this distribution in
+ * the file called "COPYING".
+ *
+ * Intel MIC driver.
+ *
+ */
+#ifndef __MIC_COMMON_DEVICE_H_
+#define __MIC_COMMON_DEVICE_H_
+
+/**
+ * struct mic_mw - MIC memory window
+ *
+ * @pa: Base physical address.
+ * @va: Base ioremap'd virtual address.
+ * @len: Size of the memory window.
+ */
+struct mic_mw {
+	phys_addr_t pa;
+	void __iomem *va;
+	resource_size_t len;
+};
+
+#endif
diff --git a/drivers/misc/mic/host/Makefile b/drivers/misc/mic/host/Makefile
new file mode 100644
index 0000000..93b9d25
--- /dev/null
+++ b/drivers/misc/mic/host/Makefile
@@ -0,0 +1,8 @@
+#
+# Makefile - Intel MIC Linux driver.
+# Copyright(c) 2013, Intel Corporation.
+#
+obj-$(CONFIG_INTEL_MIC_HOST) += mic_host.o
+mic_host-objs := mic_main.o
+mic_host-objs += mic_x100.o
+mic_host-objs += mic_sysfs.o
diff --git a/drivers/misc/mic/host/mic_common.h b/drivers/misc/mic/host/mic_common.h
new file mode 100644
index 0000000..55b0337
--- /dev/null
+++ b/drivers/misc/mic/host/mic_common.h
@@ -0,0 +1,30 @@
+/*
+ * Intel MIC Platform Software Stack (MPSS)
+ *
+ * Copyright(c) 2013 Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * General Public License for more details.
+ *
+ * The full GNU General Public License is included in this distribution in
+ * the file called "COPYING".
+ *
+ * Intel MIC Host driver.
+ *
+ */
+#ifndef _MIC_HOST_COMMON_H_
+#define _MIC_HOST_COMMON_H_
+
+#include <linux/cdev.h>
+
+#include "../common/mic_device.h"
+#include "mic_device.h"
+#include "mic_x100.h"
+
+#endif
diff --git a/drivers/misc/mic/host/mic_device.h b/drivers/misc/mic/host/mic_device.h
new file mode 100644
index 0000000..207f0ba
--- /dev/null
+++ b/drivers/misc/mic/host/mic_device.h
@@ -0,0 +1,114 @@
+/*
+ * Intel MIC Platform Software Stack (MPSS)
+ *
+ * Copyright(c) 2013 Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * General Public License for more details.
+ *
+ * The full GNU General Public License is included in this distribution in
+ * the file called "COPYING".
+ *
+ * Intel MIC Host driver.
+ *
+ */
+#ifndef _MIC_DEVICE_H_
+#define _MIC_DEVICE_H_
+
+#define MIC_MAX_NUM_DEVS 256
+
+/**
+ * enum mic_hw_family - The hardware family to which a device belongs.
+ */
+enum mic_hw_family {
+	MIC_FAMILY_X100 = 0,
+	MIC_FAMILY_UNKNOWN
+};
+
+/**
+ * enum mic_stepping - MIC stepping ids.
+ */
+enum mic_stepping {
+	MIC_A0_STEP = 0x0,
+	MIC_B0_STEP = 0x10,
+	MIC_B1_STEP = 0x11,
+	MIC_C0_STEP = 0x20,
+};
+
+/**
+ * struct mic_device -  MIC device information for each card.
+ *
+ * @name: Unique name for this MIC device.
+ * @mmio: MMIO bar information.
+ * @pdev: The PCI device structure.
+ * @family: The MIC family to which this device belongs.
+ * @ops: MIC HW specific operations.
+ * @id: The unique device id for this MIC device.
+ * @stepping: Stepping ID.
+ * @attr_group: Sysfs attribute group.
+ * @sdev: Device for sysfs entries.
+ * @aper: Aperture bar information.
+ */
+struct mic_device {
+	char name[20];
+	struct mic_mw mmio;
+	struct pci_dev *pdev;
+	enum mic_hw_family family;
+	struct mic_hw_ops *ops;
+	int id;
+	enum mic_stepping stepping;
+	struct attribute_group attr_group;
+	struct device *sdev;
+	struct mic_mw aper;
+};
+
+/**
+ * struct mic_hw_ops - MIC HW specific operations.
+ * @aper_bar: Aperture bar resource number.
+ * @mmio_bar: MMIO bar resource number.
+ * @init: Initialize the MIC HW information.
+ * @read_spad: Read from scratch pad register.
+ * @write_spad: Write to scratch pad register.
+ */
+struct mic_hw_ops {
+	u8 aper_bar;
+	u8 mmio_bar;
+	void (*init)(struct mic_device *mdev);
+	u32 (*read_spad)(struct mic_device *mdev, unsigned int idx);
+	void (*write_spad)(struct mic_device *mdev, u32 idx, u32 val);
+};
+
+/**
+ * mic_mmio_read - read from an MMIO register.
+ * @mw: MMIO register base virtual address.
+ * @offset: register offset.
+ *
+ * RETURNS: register value.
+ */
+static inline u32 mic_mmio_read(struct mic_mw *mw, u32 offset)
+{
+	return ioread32(mw->va + offset);
+}
+
+/**
+ * mic_mmio_write - write to an MMIO register.
+ * @mw: MMIO register base virtual address.
+ * @val: the data value to put into the register
+ * @offset: register offset.
+ *
+ * RETURNS: none.
+ */
+static inline void
+mic_mmio_write(struct mic_mw *mw, u32 val, u32 offset)
+{
+	iowrite32(val, mw->va + offset);
+}
+
+void mic_sysfs_init(struct mic_device *mdev);
+#endif
diff --git a/drivers/misc/mic/host/mic_main.c b/drivers/misc/mic/host/mic_main.c
new file mode 100644
index 0000000..e53a74e
--- /dev/null
+++ b/drivers/misc/mic/host/mic_main.c
@@ -0,0 +1,326 @@
+/*
+ * Intel MIC Platform Software Stack (MPSS)
+ *
+ * Copyright(c) 2013 Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * General Public License for more details.
+ *
+ * The full GNU General Public License is included in this distribution in
+ * the file called "COPYING".
+ *
+ * Intel MIC Host driver.
+ *
+ * Global TODO's across the driver to be added after initial base
+ * patches are accepted upstream:
+ * 1) Enable DMA support.
+ * 2) Enable per vring interrupt support.
+ */
+#include <linux/module.h>
+#include <linux/fs.h>
+#include <linux/pci.h>
+
+#include "mic_common.h"
+
+static const char mic_driver_name[] = "mic";
+
+static DEFINE_PCI_DEVICE_TABLE(mic_pci_tbl) = {
+	{PCI_DEVICE(PCI_VENDOR_ID_INTEL, MIC_X100_PCI_DEVICE_2250)},
+	{PCI_DEVICE(PCI_VENDOR_ID_INTEL, MIC_X100_PCI_DEVICE_2251)},
+	{PCI_DEVICE(PCI_VENDOR_ID_INTEL, MIC_X100_PCI_DEVICE_2252)},
+	{PCI_DEVICE(PCI_VENDOR_ID_INTEL, MIC_X100_PCI_DEVICE_2253)},
+	{PCI_DEVICE(PCI_VENDOR_ID_INTEL, MIC_X100_PCI_DEVICE_2254)},
+	{PCI_DEVICE(PCI_VENDOR_ID_INTEL, MIC_X100_PCI_DEVICE_2255)},
+	{PCI_DEVICE(PCI_VENDOR_ID_INTEL, MIC_X100_PCI_DEVICE_2256)},
+	{PCI_DEVICE(PCI_VENDOR_ID_INTEL, MIC_X100_PCI_DEVICE_2257)},
+	{PCI_DEVICE(PCI_VENDOR_ID_INTEL, MIC_X100_PCI_DEVICE_2258)},
+	{PCI_DEVICE(PCI_VENDOR_ID_INTEL, MIC_X100_PCI_DEVICE_2259)},
+	{PCI_DEVICE(PCI_VENDOR_ID_INTEL, MIC_X100_PCI_DEVICE_225a)},
+	{PCI_DEVICE(PCI_VENDOR_ID_INTEL, MIC_X100_PCI_DEVICE_225b)},
+	{PCI_DEVICE(PCI_VENDOR_ID_INTEL, MIC_X100_PCI_DEVICE_225c)},
+	{PCI_DEVICE(PCI_VENDOR_ID_INTEL, MIC_X100_PCI_DEVICE_225d)},
+	{PCI_DEVICE(PCI_VENDOR_ID_INTEL, MIC_X100_PCI_DEVICE_225e)},
+
+	/* required last entry */
+	{ 0, }
+};
+
+MODULE_DEVICE_TABLE(pci, mic_pci_tbl);
+
+/**
+ * struct mic_info -  Global information about all MIC devices.
+ *
+ * @next_id: Next available MIC device id.
+ * @mic_class: Class of MIC devices for sysfs accessibility.
+ * @mdev_id: Base device node number.
+ */
+struct mic_info {
+	int next_id;
+	struct class *mic_class;
+	dev_t mdev_id;
+};
+
+/* g_mic - Global information about all MIC devices. */
+static struct mic_info g_mic;
+
+/**
+ * mic_ops_init: Initialize HW specific operation tables.
+ *
+ * @mdev: pointer to mic_device instance
+ *
+ * returns none.
+ */
+static void mic_ops_init(struct mic_device *mdev)
+{
+	switch (mdev->family) {
+	case MIC_FAMILY_X100:
+		mdev->ops = &mic_x100_ops;
+		break;
+	default:
+		break;
+	}
+}
+
+/**
+ * mic_get_family - Determine hardware family to which this MIC belongs.
+ *
+ * @mdev: pointer to mic_device instance
+ *
+ * returns family.
+ */
+static enum mic_hw_family mic_get_family(struct mic_device *mdev)
+{
+	int dev_id = mdev->pdev->device;
+	enum mic_hw_family family;
+
+	switch (dev_id) {
+	case MIC_X100_PCI_DEVICE_2250:
+	case MIC_X100_PCI_DEVICE_2251:
+	case MIC_X100_PCI_DEVICE_2252:
+	case MIC_X100_PCI_DEVICE_2253:
+	case MIC_X100_PCI_DEVICE_2254:
+	case MIC_X100_PCI_DEVICE_2255:
+	case MIC_X100_PCI_DEVICE_2256:
+	case MIC_X100_PCI_DEVICE_2257:
+	case MIC_X100_PCI_DEVICE_2258:
+	case MIC_X100_PCI_DEVICE_2259:
+	case MIC_X100_PCI_DEVICE_225a:
+	case MIC_X100_PCI_DEVICE_225b:
+	case MIC_X100_PCI_DEVICE_225c:
+	case MIC_X100_PCI_DEVICE_225d:
+	case MIC_X100_PCI_DEVICE_225e:
+		family = MIC_FAMILY_X100;
+		break;
+	default:
+		family = MIC_FAMILY_UNKNOWN;
+		break;
+	}
+	return family;
+}
+
+/**
+ * mic_device_init - Initializes the MIC device structure
+ *
+ * @mdev: pointer to mic_device instance
+ * @pdev: The pci device structure
+ *
+ * returns none.
+ */
+static void
+mic_device_init(struct mic_device *mdev, struct pci_dev *pdev)
+{
+	mdev->pdev = pdev;
+	mdev->family = mic_get_family(mdev);
+	mic_ops_init(mdev);
+	mic_sysfs_init(mdev);
+}
+
+/**
+ * mic_probe - Device Initialization Routine
+ *
+ * @pdev: PCI device structure
+ * @ent: entry in mic_pci_tbl
+ *
+ * returns 0 on success, < 0 on failure.
+ */
+static int mic_probe(struct pci_dev *pdev,
+		const struct pci_device_id *ent)
+{
+	int rc;
+	struct mic_device *mdev;
+	char name[20];
+
+	rc = g_mic.next_id++;
+
+	snprintf(name, sizeof(name), "mic%d", rc);
+	mdev = kzalloc(sizeof(*mdev), GFP_KERNEL);
+	if (!mdev) {
+		rc = -ENOMEM;
+		dev_err(&pdev->dev, "dev kmalloc failed rc %d\n", rc);
+		goto dec_num_dev;
+	}
+	strncpy(mdev->name, name, sizeof(name));
+	mdev->id = rc;
+
+	mic_device_init(mdev, pdev);
+
+	rc = pci_enable_device(pdev);
+	if (rc) {
+		dev_err(&pdev->dev, "failed to enable pci device.\n");
+		goto free_device;
+	}
+
+	pci_set_master(pdev);
+
+	rc = pci_request_regions(pdev, mic_driver_name);
+	if (rc) {
+		dev_err(&pdev->dev, "failed to get pci regions.\n");
+		goto disable_device;
+	}
+
+	rc = pci_set_dma_mask(pdev, DMA_BIT_MASK(64));
+	if (rc) {
+		dev_err(&pdev->dev, "Cannot set DMA mask\n");
+		goto release_regions;
+	}
+
+	mdev->mmio.pa = pci_resource_start(pdev, mdev->ops->mmio_bar);
+	mdev->mmio.len = pci_resource_len(pdev, mdev->ops->mmio_bar);
+	mdev->mmio.va = pci_ioremap_bar(pdev, mdev->ops->mmio_bar);
+	if (!mdev->mmio.va) {
+		dev_err(&pdev->dev, "Cannot remap MMIO BAR\n");
+		rc = -EIO;
+		goto release_regions;
+	}
+
+	mdev->aper.pa = pci_resource_start(pdev, mdev->ops->aper_bar);
+	mdev->aper.len = pci_resource_len(pdev, mdev->ops->aper_bar);
+	mdev->aper.va = ioremap_wc(mdev->aper.pa, mdev->aper.len);
+	if (!mdev->aper.va) {
+		dev_err(&pdev->dev, "Cannot remap Aperture BAR\n");
+		rc = -EIO;
+		goto unmap_mmio;
+	}
+
+	mdev->ops->init(mdev);
+
+	pci_set_drvdata(pdev, mdev);
+
+	mdev->sdev = device_create(g_mic.mic_class, &pdev->dev,
+		MKDEV(MAJOR(g_mic.mdev_id), mdev->id), NULL, "%s", mdev->name);
+	if (IS_ERR(mdev->sdev)) {
+		rc = PTR_ERR(mdev->sdev);
+		dev_err(&pdev->dev, "device_create failed rc %d\n", rc);
+		goto unmap_aper;
+	}
+
+	rc = sysfs_create_group(&mdev->sdev->kobj, &mdev->attr_group);
+	if (rc) {
+		dev_err(&pdev->dev, "sysfs_create_group failed rc %d\n", rc);
+		goto destroy_device;
+	}
+	dev_info(&pdev->dev, "Probe successful for %s\n", mdev->name);
+	return 0;
+destroy_device:
+	device_destroy(g_mic.mic_class, MKDEV(MAJOR(g_mic.mdev_id), mdev->id));
+unmap_aper:
+	iounmap(mdev->aper.va);
+unmap_mmio:
+	iounmap(mdev->mmio.va);
+release_regions:
+	pci_release_regions(pdev);
+disable_device:
+	pci_disable_device(pdev);
+free_device:
+	kfree(mdev);
+dec_num_dev:
+	g_mic.next_id--;
+	dev_err(&pdev->dev, "Probe failed rc %d\n", rc);
+	return rc;
+}
+
+/**
+ * mic_remove - Device Removal Routine
+ * mic_remove is called by the PCI subsystem to alert the driver
+ * that it should release a PCI device.
+ *
+ * @pdev: PCI device structure
+ */
+static void mic_remove(struct pci_dev *pdev)
+{
+	struct mic_device *mdev;
+	int id;
+
+	mdev = pci_get_drvdata(pdev);
+	if (!mdev)
+		return;
+
+	id = mdev->id;
+
+	sysfs_remove_group(&mdev->sdev->kobj, &mdev->attr_group);
+	device_destroy(g_mic.mic_class, MKDEV(MAJOR(g_mic.mdev_id), mdev->id));
+	iounmap(mdev->mmio.va);
+	iounmap(mdev->aper.va);
+	pci_release_regions(pdev);
+	pci_disable_device(pdev);
+	kfree(mdev);
+	dev_dbg(&pdev->dev, "Removed mic%d\n", id);
+}
+static struct pci_driver mic_driver = {
+	.name = mic_driver_name,
+	.id_table = mic_pci_tbl,
+	.probe = mic_probe,
+	.remove = mic_remove
+};
+
+static int __init mic_init(void)
+{
+	int ret;
+
+	ret = alloc_chrdev_region(&g_mic.mdev_id, 0,
+		MIC_MAX_NUM_DEVS, mic_driver_name);
+	if (ret) {
+		pr_err("alloc_chrdev_region failed ret %d\n", ret);
+		goto error;
+	}
+
+	g_mic.mic_class = class_create(THIS_MODULE, mic_driver_name);
+	if (IS_ERR(g_mic.mic_class)) {
+		ret = PTR_ERR(g_mic.mic_class);
+		pr_err("class_create failed ret %d\n", ret);
+		goto cleanup_chrdev;
+	}
+
+	ret = pci_register_driver(&mic_driver);
+	if (ret) {
+		pr_err("pci_register_driver failed ret %d\n", ret);
+		goto class_destroy;
+	}
+	return ret;
+class_destroy:
+	class_destroy(g_mic.mic_class);
+cleanup_chrdev:
+	unregister_chrdev_region(g_mic.mdev_id, MIC_MAX_NUM_DEVS);
+error:
+	return ret;
+}
+
+static void __exit mic_exit(void)
+{
+	pci_unregister_driver(&mic_driver);
+	class_destroy(g_mic.mic_class);
+	unregister_chrdev_region(g_mic.mdev_id, MIC_MAX_NUM_DEVS);
+}
+
+module_init(mic_init);
+module_exit(mic_exit);
+
+MODULE_AUTHOR("Intel Corporation");
+MODULE_DESCRIPTION("Intel(R) MIC X100 Host driver");
+MODULE_LICENSE("GPL");
diff --git a/drivers/misc/mic/host/mic_sysfs.c b/drivers/misc/mic/host/mic_sysfs.c
new file mode 100644
index 0000000..fe0605d
--- /dev/null
+++ b/drivers/misc/mic/host/mic_sysfs.c
@@ -0,0 +1,94 @@
+/*
+ * Intel MIC Platform Software Stack (MPSS)
+ *
+ * Copyright(c) 2013 Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * General Public License for more details.
+ *
+ * The full GNU General Public License is included in this distribution in
+ * the file called "COPYING".
+ *
+ * Intel MIC Host driver.
+ *
+ */
+#include <linux/module.h>
+#include <linux/pci.h>
+
+#include "mic_common.h"
+
+static ssize_t
+mic_show_family(struct device *dev, struct device_attribute *attr, char *buf)
+{
+	static const char x100[] = "x100";
+	static const char unknown[] = "Unknown";
+	const char *card = NULL;
+	struct mic_device *mdev = dev_get_drvdata(dev->parent);
+
+	if (!mdev)
+		return -EINVAL;
+
+	switch (mdev->family) {
+	case MIC_FAMILY_X100:
+		card = x100;
+		break;
+	default:
+		card = unknown;
+		break;
+	}
+	return snprintf(buf, PAGE_SIZE, "%s\n", card);
+}
+static DEVICE_ATTR(family, S_IRUGO, mic_show_family, NULL);
+
+static ssize_t
+mic_show_stepping(struct device *dev, struct device_attribute *attr, char *buf)
+{
+	struct mic_device *mdev = dev_get_drvdata(dev->parent);
+	char *string = "??";
+
+	if (!mdev)
+		return -EINVAL;
+
+	switch (mdev->family) {
+	case MIC_FAMILY_X100:
+		switch (mdev->stepping) {
+		case MIC_A0_STEP:
+			string = "A0";
+			break;
+		case MIC_B0_STEP:
+			string = "B0";
+			break;
+		case MIC_B1_STEP:
+			string = "B1";
+			break;
+		case MIC_C0_STEP:
+			string = "C0";
+			break;
+		default:
+			break;
+		}
+		break;
+	default:
+		break;
+	}
+	return snprintf(buf, PAGE_SIZE, "%s\n", string);
+}
+static DEVICE_ATTR(stepping, S_IRUGO, mic_show_stepping, NULL);
+
+static struct attribute *default_attrs[] = {
+	&dev_attr_family.attr,
+	&dev_attr_stepping.attr,
+
+	NULL
+};
+
+void mic_sysfs_init(struct mic_device *mdev)
+{
+	mdev->attr_group.attrs = default_attrs;
+}
diff --git a/drivers/misc/mic/host/mic_x100.c b/drivers/misc/mic/host/mic_x100.c
new file mode 100644
index 0000000..a6c1f11
--- /dev/null
+++ b/drivers/misc/mic/host/mic_x100.c
@@ -0,0 +1,86 @@
+/*
+ * Intel MIC Platform Software Stack (MPSS)
+ *
+ * Copyright(c) 2013 Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * General Public License for more details.
+ *
+ * The full GNU General Public License is included in this distribution in
+ * the file called "COPYING".
+ *
+ * Intel MIC Host driver.
+ *
+ */
+#include <linux/fs.h>
+#include <linux/pci.h>
+
+#include "mic_common.h"
+
+/*
+ * mic_x100_hw_init - Initialize hardware information.
+ *
+ * @mdev: pointer to mic_device instance
+ *
+ * RETURNS: none.
+ */
+static void mic_x100_hw_init(struct mic_device *mdev)
+{
+	mdev->stepping = mdev->pdev->revision;
+}
+
+/**
+ * mic_x100_write_spad - write to the scratchpad register
+ * @mdev: pointer to mic_device instance
+ * @idx: index to the scratchpad register, 0 based
+ * @val: the data value to put into the register
+ *
+ * This function allows writing of a 32bit value to the indexed scratchpad
+ * register.
+ *
+ * RETURNS: none.
+ */
+static void
+mic_x100_write_spad(struct mic_device *mdev, unsigned int idx, u32 val)
+{
+	dev_dbg(&mdev->pdev->dev, "Writing 0x%x to scratch pad index %d\n",
+		val, idx);
+	mic_mmio_write(&mdev->mmio, val,
+		MIC_X100_SBOX_BASE_ADDRESS +
+		MIC_X100_SBOX_SPAD0 + idx * 4);
+}
+
+/**
+ * mic_x100_read_spad - read from the scratchpad register
+ * @mdev: pointer to mic_device instance
+ * @idx: index to scratchpad register, 0 based
+ *
+ * This function allows reading of the 32bit scratchpad register.
+ *
+ * RETURNS: The 32-bit value read from the scratchpad register.
+ */
+static u32
+mic_x100_read_spad(struct mic_device *mdev, unsigned int idx)
+{
+	u32 val = mic_mmio_read(&mdev->mmio,
+		MIC_X100_SBOX_BASE_ADDRESS +
+		MIC_X100_SBOX_SPAD0 + idx * 4);
+
+	dev_dbg(&mdev->pdev->dev,
+		"Reading 0x%x from scratch pad index %d\n", val, idx);
+	return val;
+}
+
+struct mic_hw_ops mic_x100_ops = {
+	.aper_bar = MIC_X100_APER_BAR,
+	.mmio_bar = MIC_X100_MMIO_BAR,
+	.init = mic_x100_hw_init,
+	.read_spad = mic_x100_read_spad,
+	.write_spad = mic_x100_write_spad,
+};
diff --git a/drivers/misc/mic/host/mic_x100.h b/drivers/misc/mic/host/mic_x100.h
new file mode 100644
index 0000000..1f4e630
--- /dev/null
+++ b/drivers/misc/mic/host/mic_x100.h
@@ -0,0 +1,47 @@
+/*
+ * Intel MIC Platform Software Stack (MPSS)
+ *
+ * Copyright(c) 2013 Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * General Public License for more details.
+ *
+ * The full GNU General Public License is included in this distribution in
+ * the file called "COPYING".
+ *
+ * Intel MIC Host driver.
+ *
+ */
+#ifndef _MIC_X100_HW_H_
+#define _MIC_X100_HW_H_
+
+#define MIC_X100_PCI_DEVICE_2250 0x2250
+#define MIC_X100_PCI_DEVICE_2251 0x2251
+#define MIC_X100_PCI_DEVICE_2252 0x2252
+#define MIC_X100_PCI_DEVICE_2253 0x2253
+#define MIC_X100_PCI_DEVICE_2254 0x2254
+#define MIC_X100_PCI_DEVICE_2255 0x2255
+#define MIC_X100_PCI_DEVICE_2256 0x2256
+#define MIC_X100_PCI_DEVICE_2257 0x2257
+#define MIC_X100_PCI_DEVICE_2258 0x2258
+#define MIC_X100_PCI_DEVICE_2259 0x2259
+#define MIC_X100_PCI_DEVICE_225a 0x225a
+#define MIC_X100_PCI_DEVICE_225b 0x225b
+#define MIC_X100_PCI_DEVICE_225c 0x225c
+#define MIC_X100_PCI_DEVICE_225d 0x225d
+#define MIC_X100_PCI_DEVICE_225e 0x225e
+
+#define MIC_X100_APER_BAR 0
+#define MIC_X100_MMIO_BAR 4
+
+#define MIC_X100_SBOX_BASE_ADDRESS 0x00010000
+#define MIC_X100_SBOX_SPAD0 0x0000AB20
+extern struct mic_hw_ops mic_x100_ops;
+
+#endif
-- 
1.8.2.1


* [PATCH v2 2/7] Intel MIC Host Driver Interrupt/SMPT support for X100 family.
  2013-08-08  3:04 [PATCH v2 0/7] Enable Drivers for Intel MIC X100 Coprocessors Sudeep Dutt
  2013-08-08  3:04 ` [PATCH v2 1/7] Intel MIC Host Driver for X100 family Sudeep Dutt
@ 2013-08-08  3:04 ` Sudeep Dutt
  2013-08-08  3:04 ` [PATCH v2 3/7] Intel MIC Host Driver, card OS state management Sudeep Dutt
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 18+ messages in thread
From: Sudeep Dutt @ 2013-08-08  3:04 UTC (permalink / raw)
  To: Greg Kroah-Hartman, Arnd Bergmann, Rusty Russell,
	Michael S. Tsirkin, Rob Landley, linux-kernel, virtualization,
	linux-doc
  Cc: Harshavardhan R Kharche, Peter P Waskiewicz Jr,
	Yaozu (Eddie) Dong, Sudeep Dutt, Ashutosh Dixit, AsiasHeasias,
	Caz Yokoyama, Dasaratharaman Chandramouli

From: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>

This patch enables the following features:
a) MSIx, MSI and legacy interrupt support.
b) System Memory Page Table (SMPT) support. SMPT enables system memory
   access from the card. On X100 devices the host can program 32 SMPT
   registers, each mapping 16GB of system memory address space. Together
   the registers give X100 devices access to a cumulative 512GB of
   system memory address space at any point in time.
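
The SMPT arithmetic above can be sketched in C as follows. This is only an
illustration: the macro and function names are hypothetical and not part of
the driver, and the assumption that the upper bits of a window offset select
the register is ours, derived from the 32 x 16GB figures.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names; the values come from the description above. */
#define SMPT_NUM_REGS   32u	/* SMPT registers on X100 */
#define SMPT_PAGE_SHIFT 34u	/* 16GB = 2^34 bytes per register */
#define SMPT_PAGE_SIZE  (UINT64_C(1) << SMPT_PAGE_SHIFT)
#define SMPT_TOTAL_SIZE (SMPT_NUM_REGS * SMPT_PAGE_SIZE)

/* 32 registers x 16GB each = 512GB (2^39 bytes). */
static_assert(SMPT_TOTAL_SIZE == (UINT64_C(512) << 30),
	      "SMPT window must span 512GB");

/* Assumed mapping: index of the 16GB slot that a window offset falls in. */
static inline unsigned int smpt_slot(uint64_t win_off)
{
	return (unsigned int)(win_off >> SMPT_PAGE_SHIFT);
}

/* Assumed mapping: byte offset within that 16GB slot. */
static inline uint64_t smpt_slot_offset(uint64_t win_off)
{
	return win_off & (SMPT_PAGE_SIZE - 1);
}
```

Under these assumptions an offset of 16GB + 5 falls in slot 1 at byte
offset 5, and the last byte of the 512GB window falls in slot 31.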

Co-author: Sudeep Dutt <sudeep.dutt@intel.com>
Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Signed-off-by: Caz Yokoyama <Caz.Yokoyama@intel.com>
Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Signed-off-by: Nikhil Rao <nikhil.rao@intel.com>
Signed-off-by: Harshavardhan R Kharche <harshavardhan.r.kharche@intel.com>
Signed-off-by: Sudeep Dutt <sudeep.dutt@intel.com>
Acked-by: Yaozu (Eddie) Dong <eddie.dong@intel.com>
Reviewed-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
---
 drivers/misc/mic/host/Makefile     |   1 +
 drivers/misc/mic/host/mic_common.h |   1 +
 drivers/misc/mic/host/mic_device.h | 119 +++++++
 drivers/misc/mic/host/mic_main.c   | 640 ++++++++++++++++++++++++++++++++++++-
 drivers/misc/mic/host/mic_smpt.c   | 436 +++++++++++++++++++++++++
 drivers/misc/mic/host/mic_smpt.h   |  98 ++++++
 drivers/misc/mic/host/mic_x100.c   | 248 ++++++++++++++
 drivers/misc/mic/host/mic_x100.h   |  40 +++
 8 files changed, 1581 insertions(+), 2 deletions(-)
 create mode 100644 drivers/misc/mic/host/mic_smpt.c
 create mode 100644 drivers/misc/mic/host/mic_smpt.h

diff --git a/drivers/misc/mic/host/Makefile b/drivers/misc/mic/host/Makefile
index 93b9d25..3702d2a 100644
--- a/drivers/misc/mic/host/Makefile
+++ b/drivers/misc/mic/host/Makefile
@@ -6,3 +6,4 @@ obj-$(CONFIG_INTEL_MIC_HOST) += mic_host.o
 mic_host-objs := mic_main.o
 mic_host-objs += mic_x100.o
 mic_host-objs += mic_sysfs.o
+mic_host-objs += mic_smpt.o
diff --git a/drivers/misc/mic/host/mic_common.h b/drivers/misc/mic/host/mic_common.h
index 55b0337..2a0624e 100644
--- a/drivers/misc/mic/host/mic_common.h
+++ b/drivers/misc/mic/host/mic_common.h
@@ -26,5 +26,6 @@
 #include "../common/mic_device.h"
 #include "mic_device.h"
 #include "mic_x100.h"
+#include "mic_smpt.h"
 
 #endif
diff --git a/drivers/misc/mic/host/mic_device.h b/drivers/misc/mic/host/mic_device.h
index 207f0ba..d191831 100644
--- a/drivers/misc/mic/host/mic_device.h
+++ b/drivers/misc/mic/host/mic_device.h
@@ -22,6 +22,11 @@
 #define _MIC_DEVICE_H_
 
 #define MIC_MAX_NUM_DEVS 256
+#define MIC_NUM_INTR_TYPES 3
+
+/* The minimum number of msix vectors required
+ * for normal operation */
+#define MIC_MIN_MSIX 5
 
 /**
  * enum mic_hw_family - The hardware family to which a device belongs.
@@ -42,6 +47,74 @@ enum mic_stepping {
 };
 
 /**
+ * enum mic_intr_type - The type of source that will generate
+ * the interrupt. The number of types needs to be in sync with
+ * MIC_NUM_INTR_TYPES.
+ *
+ * MIC_INTR_DB: The source is a doorbell
+ * MIC_INTR_DMA: The source is a DMA channel
+ * MIC_INTR_ERR: The source is an error interrupt e.g. SBOX ERR
+ */
+enum mic_intr_type {
+	MIC_INTR_DB = 0,
+	MIC_INTR_DMA,
+	MIC_INTR_ERR,
+};
+
+/**
+ * struct mic_intr_info - Contains h/w specific interrupt sources info
+ *
+ * @intr_start_idx: Contains the starting indexes of the
+ * interrupt types.
+ * @intr_len: Contains the length of the interrupt types.
+ */
+struct mic_intr_info {
+	u16 intr_start_idx[MIC_NUM_INTR_TYPES];
+	u16 intr_len[MIC_NUM_INTR_TYPES];
+};
+
+/**
+ * struct mic_irq_info - OS specific irq information
+ *
+ * @next_avail_src: next available doorbell that can be assigned.
+ * @msix_entries: msix entries allocated while setting up MSI-x
+ * @mic_msi_map: The MSI/MSI-x mapping information.
+ * @num_vectors: The number of MSI/MSI-x vectors that have been allocated.
+ * @cb_id: Running count of the number of callbacks registered.
+ * @mic_intr_lock: spinlock to protect the interrupt callback list.
+ * @cb_list: Array of callback lists one for each source.
+ */
+struct mic_irq_info {
+	int next_avail_src;
+	struct msix_entry *msix_entries;
+	u32 *mic_msi_map;
+	u16 num_vectors;
+	u32 cb_id;
+	spinlock_t mic_intr_lock;
+	struct list_head *cb_list;
+};
+
+/**
+ * struct mic_intr_cb - Interrupt callback structure.
+ *
+ * @func: The callback function
+ * @data: Private data of the requester.
+ * @cb_id: The callback id. Identifies this callback.
+ * @list: list head pointing to the next callback structure.
+ */
+struct mic_intr_cb {
+	irqreturn_t (*func) (int irq, void *data);
+	void *data;
+	u32 cb_id;
+	struct list_head list;
+};
+
+/**
+ * struct mic_irq - opaque pointer used as cookie
+ */
+struct mic_irq;
+
+/**
  * struct mic_device -  MIC device information for each card.
  *
  * @name: Unique name for this MIC device.
@@ -54,6 +127,12 @@ enum mic_stepping {
  * @attr_group: Sysfs attribute group.
  * @sdev: Device for sysfs entries.
  * @aper: Aperture bar information.
+ * @mic_mutex: Mutex for synchronizing access to mic_device.
+ * @intr_ops: HW specific interrupt operations.
+ * @smpt_ops: Hardware specific SMPT operations.
+ * @smpt: MIC SMPT information.
+ * @intr_info: H/W specific interrupt information.
+ * @irq_info: The OS specific irq information
  */
 struct mic_device {
 	char name[20];
@@ -66,6 +145,32 @@ struct mic_device {
 	struct attribute_group attr_group;
 	struct device *sdev;
 	struct mic_mw aper;
+	struct mutex mic_mutex;
+	struct mic_hw_intr_ops *intr_ops;
+	struct mic_smpt_ops *smpt_ops;
+	struct mic_smpt_info *smpt;
+	struct mic_intr_info *intr_info;
+	struct mic_irq_info irq_info;
+};
+
+/**
+ * struct mic_hw_intr_ops: MIC HW specific interrupt operations
+ * @intr_init: Initialize H/W specific interrupt information.
+ * @enable_interrupts: Enable interrupts from the hardware.
+ * @disable_interrupts: Disable interrupts from the hardware.
+ * @program_msi_to_src_map: Update MSI mapping registers with
+ * irq information.
+ * @read_msi_to_src_map: Read MSI mapping registers containing
+ * irq information.
+ */
+struct mic_hw_intr_ops {
+	void (*intr_init)(struct mic_device *mdev);
+	void (*enable_interrupts)(struct mic_device *mdev);
+	void (*disable_interrupts)(struct mic_device *mdev);
+	void (*program_msi_to_src_map) (struct mic_device *mdev,
+			int idx, int intr_src, bool set);
+	u32 (*read_msi_to_src_map) (struct mic_device *mdev,
+			int idx);
 };
 
 /**
@@ -75,6 +180,9 @@ struct mic_device {
  * @init: Initialize the MIC HW information.
  * @read_spad: Read from scratch pad register.
  * @write_spad: Write to scratch pad register.
+ * @send_intr: Send an interrupt for a particular doorbell on the card.
+ * @ack_interrupt: Hardware specific operations to ack the h/w on
+ * receipt of an interrupt.
  */
 struct mic_hw_ops {
 	u8 aper_bar;
@@ -82,6 +190,8 @@ struct mic_hw_ops {
 	void (*init)(struct mic_device *mdev);
 	u32 (*read_spad)(struct mic_device *mdev, unsigned int idx);
 	void (*write_spad)(struct mic_device *mdev, u32 idx, u32 val);
+	void (*send_intr)(struct mic_device *mdev, int doorbell);
+	u32 (*ack_interrupt)(struct mic_device *mdev);
 };
 
 /**
@@ -111,4 +221,13 @@ mic_mmio_write(struct mic_mw *mw, u32 val, u32 offset)
 }
 
 void mic_sysfs_init(struct mic_device *mdev);
+int mic_next_db(struct mic_device *mdev);
+struct mic_irq *mic_request_irq(struct mic_device *mdev,
+	irqreturn_t (*func)(int irq, void *data),
+	const char *name, void *data, int intr_src,
+	enum mic_intr_type type);
+
+void mic_free_irq(struct mic_device *mdev,
+		struct mic_irq *cookie, void *data);
+void mic_intr_restore(struct mic_device *mdev);
 #endif
diff --git a/drivers/misc/mic/host/mic_main.c b/drivers/misc/mic/host/mic_main.c
index e53a74e..505b249 100644
--- a/drivers/misc/mic/host/mic_main.c
+++ b/drivers/misc/mic/host/mic_main.c
@@ -25,6 +25,7 @@
 #include <linux/module.h>
 #include <linux/fs.h>
 #include <linux/pci.h>
+#include <linux/interrupt.h>
 
 #include "mic_common.h"
 
@@ -69,6 +70,621 @@ struct mic_info {
 /* g_mic - Global information about all MIC devices. */
 static struct mic_info g_mic;
 
+/*
+ * mic_invoke_callback - Invoke callback functions registered for
+ * the corresponding source id.
+ *
+ * @mdev: pointer to the mic_device instance
+ * @idx: The interrupt source id.
+ *
+ * Returns none.
+ */
+static inline void mic_invoke_callback(struct mic_device *mdev, int idx)
+{
+	struct mic_intr_cb *intr_cb;
+
+	spin_lock(&mdev->irq_info.mic_intr_lock);
+	list_for_each_entry(intr_cb, &mdev->irq_info.cb_list[idx], list)
+		if (intr_cb->func)
+			intr_cb->func(mdev->pdev->irq, intr_cb->data);
+	spin_unlock(&mdev->irq_info.mic_intr_lock);
+}
+
+/**
+ * mic_interrupt - Generic interrupt handler for
+ * MSI and INTx based interrupts.
+ */
+static irqreturn_t mic_interrupt(int irq, void *dev)
+{
+	struct mic_device *mdev = dev;
+	struct mic_intr_info *info = mdev->intr_info;
+	u32 mask;
+	int i;
+
+	mask = mdev->ops->ack_interrupt(mdev);
+	if (!mask)
+		return IRQ_NONE;
+
+	for (i = info->intr_start_idx[MIC_INTR_DB];
+			i < info->intr_len[MIC_INTR_DB]; i++)
+		if (mask & BIT(i))
+			mic_invoke_callback(mdev, i);
+
+	return IRQ_HANDLED;
+}
+
+/* Retrieve the next doorbell interrupt source. */
+int mic_next_db(struct mic_device *mdev)
+{
+	int next_db;
+
+	next_db = mdev->irq_info.next_avail_src %
+		mdev->intr_info->intr_len[MIC_INTR_DB];
+	mdev->irq_info.next_avail_src++;
+	return next_db;
+}
+
+/* Return the interrupt offset from the index. Index is 0 based. */
+static u16 mic_map_src_to_offset(struct mic_device *mdev,
+		int intr_src, enum mic_intr_type type) {
+
+	if (type >= MIC_NUM_INTR_TYPES)
+		return MIC_NUM_OFFSETS;
+
+	if (intr_src >= mdev->intr_info->intr_len[type])
+		return MIC_NUM_OFFSETS;
+
+	return mdev->intr_info->intr_start_idx[type] + intr_src;
+}
+
+/* Return next available msix_entry. */
+static struct msix_entry *mic_get_available_vector(struct mic_device *mdev)
+{
+	int i;
+	struct mic_irq_info *info = &mdev->irq_info;
+
+	for (i = 0; i < info->num_vectors; i++) {
+		if (!info->mic_msi_map[i])
+			return &info->msix_entries[i];
+	}
+	return NULL;
+}
+
+/**
+ * mic_intr_restore - Restore h/w specific interrupt
+ * registers after a card reset. mic_mutex needs to be
+ * held before calling this function.
+ *
+ */
+void mic_intr_restore(struct mic_device *mdev)
+{
+	int entry, offset;
+
+	if (!pci_dev_msi_enabled(mdev->pdev))
+		return;
+
+	WARN_ON(!mutex_is_locked(&mdev->mic_mutex));
+	for (entry = 0; entry < mdev->irq_info.num_vectors; entry++) {
+		for (offset = 0; offset < MIC_NUM_OFFSETS; offset++) {
+			if (mdev->irq_info.mic_msi_map[entry] & BIT(offset))
+				mdev->intr_ops->program_msi_to_src_map(mdev,
+					entry, offset, true);
+		}
+	}
+}
+
+/**
+ * mic_register_intr_callback - Register a callback handler for the
+ * given source id.
+ *
+ * @mdev: pointer to the mic_device instance
+ * @idx: The source id to be registered.
+ * @func: The function to be called when the source id receives
+ * the interrupt.
+ * @data: Private data of the requester.
+ * Return the callback structure that was registered or an
+ * appropriate error on failure.
+ */
+static struct mic_intr_cb *mic_register_intr_callback(struct mic_device *mdev,
+			u8 idx, irqreturn_t (*func) (int irq, void *dev),
+			void *data)
+{
+	struct mic_intr_cb *intr_cb;
+	unsigned long flags;
+	intr_cb = kmalloc(sizeof(struct mic_intr_cb), GFP_KERNEL);
+
+	if (!intr_cb)
+		return ERR_PTR(-ENOMEM);
+
+	intr_cb->func = func;
+	intr_cb->data = data;
+	intr_cb->cb_id = mdev->irq_info.cb_id++;
+
+	spin_lock_irqsave(&mdev->irq_info.mic_intr_lock, flags);
+	list_add_tail(&intr_cb->list, &mdev->irq_info.cb_list[idx]);
+	spin_unlock_irqrestore(&mdev->irq_info.mic_intr_lock, flags);
+
+	return intr_cb;
+}
+
+/**
+ * mic_unregister_intr_callback - Unregister the callback handler
+ * identified by its callback id.
+ *
+ * @mdev: pointer to the mic_device instance
+ * @idx: The callback structure id to be unregistered.
+ * Return the source id that was unregistered or MIC_NUM_OFFSETS if no
+ * such callback handler was found.
+ */
+static u8 mic_unregister_intr_callback(struct mic_device *mdev, u32 idx)
+{
+	struct list_head *pos, *tmp;
+	struct mic_intr_cb *intr_cb;
+	unsigned long flags;
+	int i;
+
+	for (i = 0;  i < MIC_NUM_OFFSETS; i++) {
+		spin_lock_irqsave(&mdev->irq_info.mic_intr_lock, flags);
+		list_for_each_safe(pos, tmp, &mdev->irq_info.cb_list[i]) {
+			intr_cb = list_entry(pos, struct mic_intr_cb, list);
+			if (intr_cb->cb_id == idx) {
+				list_del(pos);
+				kfree(intr_cb);
+				spin_unlock_irqrestore(
+					&mdev->irq_info.mic_intr_lock, flags);
+				return i;
+			}
+		}
+		spin_unlock_irqrestore(&mdev->irq_info.mic_intr_lock, flags);
+	}
+
+	return MIC_NUM_OFFSETS;
+}
+
+/**
+ * mic_alloc_msi_map - Allocate mapping information for MSI
+ * and MSI-x interrupts.
+ *
+ * @mdev: pointer to mic_device instance
+ *
+ * Returns 0 on success or an appropriate error code on failure.
+ */
+static int mic_alloc_msi_map(struct mic_device *mdev)
+{
+	mdev->irq_info.mic_msi_map = kzalloc((sizeof(u32) *
+		mdev->irq_info.num_vectors), GFP_KERNEL);
+
+	if (!mdev->irq_info.mic_msi_map)
+		return -ENOMEM;
+	return 0;
+}
+
+/**
+ * mic_free_msi_map - Free mapping information for MSI
+ * and MSI-x interrupts.
+ *
+ * @mdev: pointer to mic_device instance
+ */
+static void mic_free_msi_map(struct mic_device *mdev)
+{
+	kfree(mdev->irq_info.mic_msi_map);
+}
+
+#define COOKIE_ID_SHIFT 16
+#define GET_ENTRY(cookie) ((cookie) & 0xFFFF)
+#define GET_OFFSET(cookie) ((cookie) >> COOKIE_ID_SHIFT)
+#define MK_COOKIE(x, y) ((x) | (y) << COOKIE_ID_SHIFT)
+
+/**
+ * mic_request_irq - request an irq. mic_mutex needs
+ * to be held before calling this function.
+ *
+ * @mdev: pointer to mic_device instance
+ * @func: The callback function that handles the interrupt.
+ * The function needs to call ack_interrupts
+ * (mdev->ops->ack_interrupt(mdev)) when handling the interrupts.
+ * @name: The ASCII name of the caller requesting the irq.
+ * @data: private data that is passed back when calling the
+ * function handler.
+ * @intr_src: The source id of the requester. It is the doorbell id
+ * for doorbell interrupts and the DMA channel id for DMA interrupts.
+ * @type: The type of interrupt. Values defined in enum mic_intr_type.
+ *
+ * returns: A cookie that is opaque to the caller. It is passed
+ * back when calling mic_free_irq. An appropriate error code
+ * is returned on failure. The caller needs to use IS_ERR(return_val)
+ * to check for failure and PTR_ERR(return_val) to obtain the
+ * error code.
+ *
+ */
+struct mic_irq *mic_request_irq(struct mic_device *mdev,
+	irqreturn_t (*func)(int irq, void *dev),
+	const char *name, void *data, int intr_src,
+	enum mic_intr_type type) {
+
+	u16 offset;
+	int rc = 0;
+	struct msix_entry *msix = NULL;
+	unsigned long cookie = 0;
+	u16 entry;
+	struct mic_intr_cb *intr_cb;
+
+	if (!mdev) {
+		rc = -EINVAL;
+		goto err;
+	}
+
+	WARN_ON(!mutex_is_locked(&mdev->mic_mutex));
+	offset = mic_map_src_to_offset(mdev, intr_src, type);
+	if (offset >= MIC_NUM_OFFSETS) {
+		dev_err(&mdev->pdev->dev,
+				"Error mapping index %d to a valid source id.\n",
+				intr_src);
+		rc = -EINVAL;
+		goto err;
+	}
+
+	if (mdev->irq_info.num_vectors > 1) {
+		msix = mic_get_available_vector(mdev);
+		if (!msix) {
+			dev_err(&mdev->pdev->dev,
+			"No MSIx vectors available for use.\n");
+			rc = -ENOSPC;
+			goto err;
+		}
+
+		rc = request_irq(msix->vector, func, 0, name, data);
+		if (rc) {
+			dev_dbg(&mdev->pdev->dev,
+				"request irq failed rc = %d\n", rc);
+			goto err;
+		}
+
+		entry = msix->entry;
+		mdev->irq_info.mic_msi_map[entry] |= BIT(offset);
+		mdev->intr_ops->program_msi_to_src_map(mdev,
+				entry, offset, true);
+		cookie = MK_COOKIE(entry, offset);
+		dev_dbg(&mdev->pdev->dev, "irq: %d assigned for src: %d\n",
+			msix->vector, intr_src);
+	} else {
+		intr_cb = mic_register_intr_callback(mdev,
+				offset, func, data);
+		if (IS_ERR(intr_cb)) {
+			dev_err(&mdev->pdev->dev,
+			"No available callback entries for use\n");
+			rc = PTR_ERR(intr_cb);
+			goto err;
+		}
+
+		entry = 0;
+		if (pci_dev_msi_enabled(mdev->pdev)) {
+			mdev->irq_info.mic_msi_map[entry] |= BIT(offset);
+			mdev->intr_ops->program_msi_to_src_map(mdev,
+				entry, offset, true);
+		}
+		cookie = MK_COOKIE(entry, intr_cb->cb_id);
+		dev_dbg(&mdev->pdev->dev, "callback %d registered for src: %d\n",
+			intr_cb->cb_id, intr_src);
+	}
+
+	return (struct mic_irq *)cookie;
+err:
+	return ERR_PTR(rc);
+}
+
+/**
+ * mic_free_irq - free irq. mic_mutex needs to be held
+ * before calling this function.
+ *
+ * @mdev: pointer to mic_device instance
+ * @cookie: cookie obtained during a successful call to mic_request_irq
+ * @data: private data specified by the calling function during the
+ * mic_request_irq
+ *
+ * returns: none.
+ */
+void mic_free_irq(struct mic_device *mdev,
+		struct mic_irq *cookie, void *data)
+{
+	u32 offset;
+	u32 entry;
+	u8 src_id;
+	unsigned int irq;
+
+	if (!mdev)
+		return;
+
+	WARN_ON(!mutex_is_locked(&mdev->mic_mutex));
+
+	entry = GET_ENTRY((unsigned long)cookie);
+	offset = GET_OFFSET((unsigned long)cookie);
+	if (mdev->irq_info.num_vectors > 1) {
+		if (entry >= mdev->irq_info.num_vectors) {
+			dev_warn(&mdev->pdev->dev,
+				"entry %d should be < num_irq %d\n",
+				entry, mdev->irq_info.num_vectors);
+			return;
+		}
+		irq = mdev->irq_info.msix_entries[entry].vector;
+		free_irq(irq, data);
+		mdev->irq_info.mic_msi_map[entry] &= ~(BIT(offset));
+		mdev->intr_ops->program_msi_to_src_map(mdev,
+			entry, offset, false);
+
+		dev_dbg(&mdev->pdev->dev, "irq: %d freed\n", irq);
+	} else {
+		irq = mdev->pdev->irq;
+		src_id = mic_unregister_intr_callback(mdev, offset);
+		if (src_id >= MIC_NUM_OFFSETS) {
+			dev_warn(&mdev->pdev->dev, "Error unregistering callback\n");
+			return;
+		}
+		if (pci_dev_msi_enabled(mdev->pdev)) {
+			mdev->irq_info.mic_msi_map[entry] &= ~(BIT(src_id));
+			mdev->intr_ops->program_msi_to_src_map(mdev,
+				entry, src_id, false);
+		}
+		dev_dbg(&mdev->pdev->dev, "callback %d unregistered for src: %d\n",
+			offset, src_id);
+	}
+}
+
+/**
+ * mic_setup_msix - Initializes MSIx interrupts.
+ *
+ * @mdev: pointer to mic_device instance
+ *
+ *
+ * RETURNS: An appropriate -ERRNO error value on error, or zero for success.
+ */
+static int mic_setup_msix(struct mic_device *mdev)
+{
+	struct pci_dev *pdev = mdev->pdev;
+	int rc, i;
+
+	mdev->irq_info.msix_entries = kmalloc(sizeof(struct msix_entry) *
+			MIC_MIN_MSIX, GFP_KERNEL);
+	if (!mdev->irq_info.msix_entries) {
+		rc = -ENOMEM;
+		goto err_nomem1;
+	}
+
+	for (i = 0; i < MIC_MIN_MSIX; i++)
+		mdev->irq_info.msix_entries[i].entry = i;
+
+	rc = pci_enable_msix(pdev, mdev->irq_info.msix_entries,
+		MIC_MIN_MSIX);
+	if (rc) {
+		dev_dbg(&pdev->dev, "Error enabling MSIx. rc = %d\n", rc);
+		goto err_enable_msix;
+	}
+
+	mdev->irq_info.num_vectors = MIC_MIN_MSIX;
+	rc = mic_alloc_msi_map(mdev);
+	if (rc)
+		goto err_nomem2;
+
+	dev_dbg(&mdev->pdev->dev,
+		"%d MSIx irqs setup\n", mdev->irq_info.num_vectors);
+	return 0;
+
+err_nomem2:
+	pci_disable_msix(pdev);
+err_enable_msix:
+	kfree(mdev->irq_info.msix_entries);
+err_nomem1:
+	mdev->irq_info.num_vectors = 0;
+	return rc;
+}
+
+/**
+ * mic_setup_callbacks - Initialize data structures needed
+ * to handle callbacks.
+ *
+ * @mdev: pointer to mic_device instance
+ */
+static int mic_setup_callbacks(struct mic_device *mdev)
+{
+	int i;
+
+	mdev->irq_info.cb_list = kmalloc(sizeof(struct list_head) *
+		MIC_NUM_OFFSETS, GFP_KERNEL);
+	if (!mdev->irq_info.cb_list)
+		return -ENOMEM;
+
+	for (i = 0; i < MIC_NUM_OFFSETS; i++)
+		INIT_LIST_HEAD(&mdev->irq_info.cb_list[i]);
+
+	spin_lock_init(&mdev->irq_info.mic_intr_lock);
+	return 0;
+}
+
+/**
+ * mic_release_callbacks - Uninitialize data structures needed
+ * to handle callbacks.
+ *
+ * @mdev: pointer to mic_device instance
+ */
+static void mic_release_callbacks(struct mic_device *mdev)
+{
+	unsigned long flags;
+	struct list_head *pos, *tmp;
+	struct mic_intr_cb *intr_cb;
+	int i;
+
+	for (i = 0; i < MIC_NUM_OFFSETS; i++) {
+		spin_lock_irqsave(&mdev->irq_info.mic_intr_lock, flags);
+
+		if (!list_empty(&mdev->irq_info.cb_list[i])) {
+			dev_warn(&mdev->pdev->dev,
+			"irq %d may still be in use.\n", mdev->pdev->irq);
+		} else {
+			spin_unlock_irqrestore(&mdev->irq_info.mic_intr_lock,
+				flags);
+			continue;
+		}
+
+		list_for_each_safe(pos, tmp, &mdev->irq_info.cb_list[i]) {
+			intr_cb = list_entry(pos, struct mic_intr_cb, list);
+			list_del(pos);
+			kfree(intr_cb);
+		}
+		spin_unlock_irqrestore(&mdev->irq_info.mic_intr_lock, flags);
+	}
+
+	kfree(mdev->irq_info.cb_list);
+}
+
+/**
+ * mic_setup_msi - Initializes MSI interrupts.
+ *
+ * @mdev: pointer to mic_device instance
+ *
+ * RETURNS: An appropriate -ERRNO error value on error, or zero for success.
+ */
+static int mic_setup_msi(struct mic_device *mdev)
+{
+	struct pci_dev *pdev = mdev->pdev;
+	int rc;
+
+	rc = pci_enable_msi(pdev);
+	if (rc) {
+		dev_dbg(&pdev->dev, "Error enabling MSI. rc = %d\n", rc);
+		return rc;
+	}
+
+	mdev->irq_info.num_vectors = 1;
+	rc = mic_alloc_msi_map(mdev);
+	if (rc)
+		goto err_nomem1;
+
+	rc = mic_setup_callbacks(mdev);
+	if (rc) {
+		dev_err(&pdev->dev, "Error setting up callbacks\n");
+		goto err_nomem2;
+	}
+
+	rc = request_irq(pdev->irq, mic_interrupt, 0, "mic-msi", mdev);
+	if (rc) {
+		dev_err(&pdev->dev, "Error allocating MSI interrupt\n");
+		goto err_irq_req_fail;
+	}
+
+	dev_dbg(&pdev->dev, "%d MSI irqs setup\n", mdev->irq_info.num_vectors);
+	return 0;
+
+err_irq_req_fail:
+	mic_release_callbacks(mdev);
+err_nomem2:
+	mic_free_msi_map(mdev);
+err_nomem1:
+	pci_disable_msi(pdev);
+	mdev->irq_info.num_vectors = 0;
+	return rc;
+}
+
+/**
+ * mic_setup_intx - Initializes legacy interrupts.
+ *
+ * @mdev: pointer to mic_device instance
+ *
+ * RETURNS: An appropriate -ERRNO error value on error, or zero for success.
+ */
+static int mic_setup_intx(struct mic_device *mdev)
+{
+	struct pci_dev *pdev = mdev->pdev;
+	int rc;
+
+	pci_msi_off(pdev);
+
+	/* Enable intx */
+	pci_intx(pdev, 1);
+
+	rc = mic_setup_callbacks(mdev);
+	if (rc) {
+		dev_err(&pdev->dev, "Error setting up callbacks\n");
+		goto err_nomem;
+	}
+
+	rc = request_irq(pdev->irq, mic_interrupt,
+		IRQF_SHARED, "mic-intx", mdev);
+	if (rc)
+		goto err;
+
+	dev_dbg(&pdev->dev, "intx irq setup\n");
+
+	return 0;
+err:
+	mic_release_callbacks(mdev);
+err_nomem:
+	return rc;
+
+}
+
+/**
+ * mic_setup_interrupts - Initializes interrupts.
+ *
+ * @mdev: pointer to mic_device instance
+ *
+ * RETURNS: An appropriate -ERRNO error value on error, or zero for success.
+ */
+static int mic_setup_interrupts(struct mic_device *mdev)
+{
+	int rc;
+
+	rc = mic_setup_msix(mdev);
+	if (!rc)
+		goto done;
+
+	rc = mic_setup_msi(mdev);
+	if (!rc)
+		goto done;
+
+	rc = mic_setup_intx(mdev);
+	if (rc) {
+		dev_err(&mdev->pdev->dev, "no usable interrupts\n");
+		return rc;
+	}
+done:
+	mdev->intr_ops->enable_interrupts(mdev);
+	return 0;
+}
+
+/**
+ * mic_free_interrupts - Frees interrupts setup by mic_setup_interrupts
+ *
+ * @mdev: pointer to mic_device instance
+ *
+ * returns none.
+ */
+static void mic_free_interrupts(struct mic_device *mdev)
+{
+	struct pci_dev *pdev = mdev->pdev;
+	int i;
+
+	mdev->intr_ops->disable_interrupts(mdev);
+	if (mdev->irq_info.num_vectors > 1) {
+		for (i = 0; i < mdev->irq_info.num_vectors; i++) {
+			if (mdev->irq_info.mic_msi_map[i])
+				dev_warn(&pdev->dev, "irq %d may still be in use.\n",
+					mdev->irq_info.msix_entries[i].vector);
+		}
+		mic_free_msi_map(mdev);
+		kfree(mdev->irq_info.msix_entries);
+		pci_disable_msix(pdev);
+	} else {
+		if (pci_dev_msi_enabled(mdev->pdev)) {
+			free_irq(mdev->pdev->irq, mdev);
+			mic_free_msi_map(mdev);
+			pci_disable_msi(pdev);
+		} else {
+			free_irq(mdev->pdev->irq, mdev);
+		}
+		mic_release_callbacks(mdev);
+	}
+}
+
 /**
  * mic_ops_init: Initialize HW specific operation tables.
  *
@@ -81,6 +697,8 @@ static void mic_ops_init(struct mic_device *mdev)
 	switch (mdev->family) {
 	case MIC_FAMILY_X100:
 		mdev->ops = &mic_x100_ops;
+		mdev->intr_ops = &mic_x100_intr_ops;
+		mdev->smpt_ops = &mic_x100_smpt_ops;
 		break;
 	default:
 		break;
@@ -139,6 +757,8 @@ mic_device_init(struct mic_device *mdev, struct pci_dev *pdev)
 	mdev->family = mic_get_family(mdev);
 	mic_ops_init(mdev);
 	mic_sysfs_init(mdev);
+	mutex_init(&mdev->mic_mutex);
+	mdev->irq_info.next_avail_src = 0;
 }
 
 /**
@@ -209,7 +829,17 @@ static int __init mic_probe(struct pci_dev *pdev,
 	}
 
 	mdev->ops->init(mdev);
-
+	mdev->intr_ops->intr_init(mdev);
+	rc = mic_setup_interrupts(mdev);
+	if (rc) {
+		dev_err(&pdev->dev, "mic_setup_interrupts failed %d\n", rc);
+		goto unmap_aper;
+	}
+	rc = mic_smpt_init(mdev);
+	if (rc) {
+		dev_err(&pdev->dev, "smpt_init failed %d\n", rc);
+		goto free_interrupts;
+	}
 	pci_set_drvdata(pdev, mdev);
 
 	mdev->sdev = device_create(g_mic.mic_class, &pdev->dev,
@@ -217,7 +847,7 @@ static int __init mic_probe(struct pci_dev *pdev,
 	if (IS_ERR(mdev->sdev)) {
 		rc = PTR_ERR(mdev->sdev);
 		dev_err(&pdev->dev, "device_create failed rc %d\n", rc);
-		goto unmap_aper;
+		goto smpt_uninit;
 	}
 
 	rc = sysfs_create_group(&mdev->sdev->kobj, &mdev->attr_group);
@@ -229,6 +859,10 @@ static int __init mic_probe(struct pci_dev *pdev,
 	return 0;
 destroy_device:
 	device_destroy(g_mic.mic_class, MKDEV(MAJOR(g_mic.mdev_id), mdev->id));
+smpt_uninit:
+	mic_smpt_uninit(mdev);
+free_interrupts:
+	mic_free_interrupts(mdev);
 unmap_aper:
 	iounmap(mdev->mmio.va);
 unmap_mmio:
@@ -265,6 +899,8 @@ static void mic_remove(struct pci_dev *pdev)
 
 	sysfs_remove_group(&mdev->sdev->kobj, &mdev->attr_group);
 	device_destroy(g_mic.mic_class, MKDEV(MAJOR(g_mic.mdev_id), mdev->id));
+	mic_smpt_uninit(mdev);
+	mic_free_interrupts(mdev);
 	iounmap(mdev->mmio.va);
 	iounmap(mdev->aper.va);
 	pci_release_regions(pdev);
diff --git a/drivers/misc/mic/host/mic_smpt.c b/drivers/misc/mic/host/mic_smpt.c
new file mode 100644
index 0000000..616f85d
--- /dev/null
+++ b/drivers/misc/mic/host/mic_smpt.c
@@ -0,0 +1,436 @@
+/*
+ * Intel MIC Platform Software Stack (MPSS)
+ *
+ * Copyright(c) 2013 Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * General Public License for more details.
+ *
+ * The full GNU General Public License is included in this distribution in
+ * the file called "COPYING".
+ *
+ * Intel MIC Host driver.
+ *
+ */
+#include <linux/fs.h>
+#include <linux/pci.h>
+#include <linux/sched.h>
+
+#include "mic_common.h"
+
+static inline u64 mic_system_page_mask(struct mic_device *mdev)
+{
+	return (1ULL << mdev->smpt->info.page_shift) - 1ULL;
+}
+
+static inline u64 mic_sys_addr_to_smpt(struct mic_device *mdev, dma_addr_t pa)
+{
+	return (pa - mdev->smpt->info.base) >> mdev->smpt->info.page_shift;
+}
+
+static inline u64 mic_smpt_to_pa(struct mic_device *mdev, u64 index)
+{
+	return mdev->smpt->info.base + (index * mdev->smpt->info.page_size);
+}
+
+static inline u64 mic_smpt_offset(struct mic_device *mdev, dma_addr_t pa)
+{
+	return pa & mic_system_page_mask(mdev);
+}
+
+static inline u64 mic_smpt_align_low(struct mic_device *mdev, dma_addr_t pa)
+{
+	return ALIGN(pa - mic_system_page_mask(mdev),
+		mdev->smpt->info.page_size);
+}
+
+static inline u64 mic_smpt_align_high(struct mic_device *mdev, dma_addr_t pa)
+{
+	return ALIGN(pa, mdev->smpt->info.page_size);
+}
+
+/* Total cumulative system memory accessible by MIC across all SMPT entries */
+static inline u64 mic_max_system_memory(struct mic_device *mdev)
+{
+	return mdev->smpt->info.num_reg * mdev->smpt->info.page_size;
+}
+
+/* Maximum system memory address accessible by MIC */
+static inline u64 mic_max_system_addr(struct mic_device *mdev)
+{
+	return mdev->smpt->info.base + mic_max_system_memory(mdev) - 1ULL;
+}
+
+/* Check if the DMA address is a MIC system memory address */
+static inline bool
+mic_is_system_addr(struct mic_device *mdev, dma_addr_t pa)
+{
+	return pa >= mdev->smpt->info.base && pa <= mic_max_system_addr(mdev);
+}
+
+/* Populate an SMPT entry and update the reference counts. */
+static void mic_add_smpt_entry(int spt, s64 *ref, u64 addr,
+		int entries, struct mic_device *mdev)
+{
+	struct mic_smpt_info *smpt_info = mdev->smpt;
+	int i;
+
+	for (i = spt; i < spt + entries; i++,
+		addr += smpt_info->info.page_size) {
+		if (!smpt_info->entry[i].ref_count &&
+			(smpt_info->entry[i].dma_addr != addr)) {
+			mdev->smpt_ops->set(mdev, addr, i);
+			smpt_info->entry[i].dma_addr = addr;
+		}
+		smpt_info->entry[i].ref_count += ref[i - spt];
+	}
+}
+
+/*
+ * Find an available MIC address in MIC SMPT address space
+ * for a given DMA address and size.
+ */
+static dma_addr_t mic_smpt_op(struct mic_device *mdev, u64 dma_addr,
+				int entries, s64 *ref, size_t size)
+{
+	int spt = -1;   /* smpt index */
+	int ee = 0;	/* existing entries */
+	int fe = 0;	/* free entries */
+	int i;
+	unsigned long flags;
+	dma_addr_t mic_addr = 0;
+	dma_addr_t addr = dma_addr;
+	struct mic_smpt_info *smpt_info = mdev->smpt;
+
+	spin_lock_irqsave(&smpt_info->smpt_lock, flags);
+
+	/* find existing entries */
+	for (i = 0; i < smpt_info->info.num_reg; i++) {
+		if (smpt_info->entry[i].dma_addr == addr) {
+			ee++;
+			addr += smpt_info->info.page_size;
+		} else if (ee) /* cannot find contiguous entries */
+			goto not_found;
+
+		if (ee == entries)
+			goto found;
+	}
+
+	/* find free entry */
+	for (i = 0; i < smpt_info->info.num_reg; i++) {
+		fe = (smpt_info->entry[i].ref_count == 0) ? fe + 1 : 0;
+		if (fe == entries)
+			goto found;
+	}
+
+not_found:
+	spin_unlock_irqrestore(&smpt_info->smpt_lock, flags);
+	return mic_addr;
+
+found:
+	spt = i - entries + 1;
+	mic_addr = mic_smpt_to_pa(mdev, spt);
+	mic_add_smpt_entry(spt, ref, dma_addr, entries, mdev);
+	smpt_info->map_count++;
+	smpt_info->ref_count += (s64)size;
+	spin_unlock_irqrestore(&smpt_info->smpt_lock, flags);
+	return mic_addr;
+}
+
+/*
+ * Returns the number of smpt entries needed for dma_addr to dma_addr + size.
+ * Also returns the reference count array for each of those entries
+ * and the starting smpt address.
+ */
+static int mic_get_smpt_ref_count(s64 *ref, dma_addr_t dma_addr, size_t size,
+				u64 *smpt_start, struct mic_device *mdev)
+{
+	u64 start = dma_addr;
+	u64 end = dma_addr + size;
+	int i = 0;
+
+	while (start < end) {
+		ref[i++] = min(mic_smpt_align_high(mdev, start + 1),
+			end) - start;
+		start = mic_smpt_align_high(mdev, start + 1);
+	}
+
+	if (smpt_start)
+		*smpt_start = mic_smpt_align_low(mdev, dma_addr);
+
+	return i;
+}
+
+/*
+ * mic_to_dma_addr - Converts a MIC address to a DMA address.
+ *
+ * @mdev: pointer to mic_device instance.
+ * @mic_addr: MIC address.
+ *
+ * returns a DMA address.
+ */
+static dma_addr_t
+mic_to_dma_addr(struct mic_device *mdev, dma_addr_t mic_addr)
+{
+	struct mic_smpt_info *smpt_info = mdev->smpt;
+	int spt;
+	dma_addr_t dma_addr;
+
+	if (!mic_is_system_addr(mdev, mic_addr)) {
+		WARN_ON(1);
+		return -EINVAL;
+	}
+	spt = mic_sys_addr_to_smpt(mdev, mic_addr);
+	dma_addr = smpt_info->entry[spt].dma_addr +
+		mic_smpt_offset(mdev, mic_addr);
+	return dma_addr;
+}
+
+/**
+ * mic_map - Maps a DMA address to a MIC physical address.
+ *
+ * @mdev: pointer to mic_device instance.
+ * @dma_addr: DMA address.
+ * @size: Size of the region to be mapped.
+ *
+ * This API converts the DMA address provided to a DMA address understood
+ * by MIC. The caller should check for errors by calling mic_map_error(..).
+ *
+ * returns DMA address as required by MIC.
+ */
+dma_addr_t mic_map(struct mic_device *mdev, dma_addr_t dma_addr, size_t size)
+{
+	dma_addr_t mic_addr = 0;
+	int entries;
+	s64 *ref;
+	u64 smpt_start;
+
+	if (!size || size > mic_max_system_memory(mdev))
+		return mic_addr;
+
+	ref = kmalloc(mdev->smpt->info.num_reg * sizeof(s64), GFP_KERNEL);
+	if (!ref)
+		return mic_addr;
+
+	/*
+	 * Get number of smpt entries to be mapped, ref count array
+	 * and the starting smpt address to start the search for
+	 * free or existing smpt entries.
+	 */
+	entries = mic_get_smpt_ref_count(ref,
+		dma_addr, size, &smpt_start, mdev);
+
+	/* Set the smpt table appropriately and get 16G aligned mic address */
+	mic_addr = mic_smpt_op(mdev, smpt_start, entries, ref, size);
+
+	kfree(ref);
+
+	/*
+	 * If mic_addr is zero then it is an error case
+	 * since mic_addr can never be zero.
+	 * else generate mic_addr by adding the 16G offset in dma_addr
+	 */
+	if (!mic_addr && MIC_FAMILY_X100 == mdev->family) {
+		WARN_ON(1);
+		return mic_addr;
+	} else {
+		return mic_addr + (dma_addr & mic_system_page_mask(mdev));
+	}
+}
+
+/**
+ * mic_unmap - Unmaps a MIC physical address.
+ *
+ * @mdev: pointer to mic_device instance.
+ * @mic_addr: MIC physical address.
+ * @size: Size of the region to be unmapped.
+ *
+ * This API unmaps the mappings created by mic_map(..).
+ *
+ * returns None.
+ */
+void mic_unmap(struct mic_device *mdev, dma_addr_t mic_addr, size_t size)
+{
+	struct mic_smpt_info *smpt_info = mdev->smpt;
+	s64 *ref;
+	int num_smpt;
+	int spt;
+	int i;
+	unsigned long flags;
+
+	if (!size)
+		return;
+
+	if (!mic_is_system_addr(mdev, mic_addr)) {
+		WARN_ON(1);
+		return;
+	}
+
+	spt = mic_sys_addr_to_smpt(mdev, mic_addr);
+	ref = kmalloc(mdev->smpt->info.num_reg * sizeof(s64), GFP_KERNEL);
+	if (!ref)
+		return;
+
+	/* Get number of smpt entries to be unmapped, ref count array */
+	num_smpt = mic_get_smpt_ref_count(ref, mic_addr, size, NULL, mdev);
+
+	spin_lock_irqsave(&smpt_info->smpt_lock, flags);
+	smpt_info->unmap_count++;
+	smpt_info->ref_count -= (s64)size;
+
+	for (i = spt; i < spt + num_smpt; i++) {
+		smpt_info->entry[i].ref_count -= ref[i - spt];
+		WARN_ON(smpt_info->entry[i].ref_count < 0);
+	}
+	spin_unlock_irqrestore(&smpt_info->smpt_lock, flags);
+	kfree(ref);
+}
+
+/**
+ * mic_map_single - Maps a virtual address to a MIC physical address.
+ *
+ * @mdev: pointer to mic_device instance.
+ * @va: Kernel direct mapped virtual address.
+ * @size: Size of the region to be mapped.
+ *
+ * This API calls pci_map_single(..) for the direct mapped virtual address
+ * and then converts the DMA address provided to a DMA address understood
+ * by MIC. The caller should check for errors by calling mic_map_error(..).
+ *
+ * returns DMA address as required by MIC.
+ */
+dma_addr_t mic_map_single(struct mic_device *mdev, void *va, size_t size)
+{
+	dma_addr_t mic_addr = 0;
+	dma_addr_t dma_addr =
+		pci_map_single(mdev->pdev, va, size, PCI_DMA_BIDIRECTIONAL);
+
+	if (!pci_dma_mapping_error(mdev->pdev, dma_addr)) {
+		mic_addr = mic_map(mdev, dma_addr, size);
+		if (!mic_addr) {
+			dev_err(&mdev->pdev->dev,
+				"mic_map failed dma_addr 0x%llx size 0x%lx\n",
+				dma_addr, size);
+			pci_unmap_single(mdev->pdev, dma_addr,
+				size, PCI_DMA_BIDIRECTIONAL);
+		}
+	}
+	return mic_addr;
+}
+
+/**
+ * mic_unmap_single - Unmaps a MIC physical address.
+ *
+ * @mdev: pointer to mic_device instance.
+ * @mic_addr: MIC physical address.
+ * @size: Size of the region to be unmapped.
+ *
+ * This API unmaps the mappings created by mic_map_single(..).
+ *
+ * returns None.
+ */
+void
+mic_unmap_single(struct mic_device *mdev, dma_addr_t mic_addr, size_t size)
+{
+	dma_addr_t dma_addr = mic_to_dma_addr(mdev, mic_addr);
+	mic_unmap(mdev, mic_addr, size);
+	pci_unmap_single(mdev->pdev, dma_addr, size, PCI_DMA_BIDIRECTIONAL);
+}
+
+/**
+ * mic_smpt_init - Initialize MIC System Memory Page Tables.
+ *
+ * @mdev: pointer to mic_device instance.
+ *
+ * returns 0 for success and -errno for error.
+ */
+int mic_smpt_init(struct mic_device *mdev)
+{
+	int i, err = 0;
+	dma_addr_t dma_addr;
+	struct mic_smpt_info *smpt_info;
+
+	mdev->smpt = kmalloc(sizeof(*mdev->smpt), GFP_KERNEL);
+	if (!mdev->smpt)
+		return -ENOMEM;
+
+	smpt_info = mdev->smpt;
+	mdev->smpt_ops->init(mdev);
+	smpt_info->entry = kmalloc(sizeof(struct mic_smpt)
+			* smpt_info->info.num_reg, GFP_KERNEL);
+	if (!smpt_info->entry) {
+		err = -ENOMEM;
+		goto free_smpt;
+	}
+	spin_lock_init(&smpt_info->smpt_lock);
+	for (i = 0; i < smpt_info->info.num_reg; i++) {
+		dma_addr = i * smpt_info->info.page_size;
+		smpt_info->entry[i].dma_addr = dma_addr;
+		smpt_info->entry[i].ref_count = 0;
+		mdev->smpt_ops->set(mdev, dma_addr, i);
+	}
+	smpt_info->ref_count = 0;
+	smpt_info->map_count = 0;
+	smpt_info->unmap_count = 0;
+	return 0;
+free_smpt:
+	kfree(smpt_info);
+	return err;
+}
+
+/**
+ * mic_smpt_uninit - UnInitialize MIC System Memory Page Tables.
+ *
+ * @mdev: pointer to mic_device instance.
+ *
+ * returns None.
+ */
+void mic_smpt_uninit(struct mic_device *mdev)
+{
+	struct mic_smpt_info *smpt_info = mdev->smpt;
+	int i;
+
+	dev_dbg(&mdev->pdev->dev,
+		"nodeid %d SMPT ref count %lld map %lld unmap %lld\n",
+		mdev->id, smpt_info->ref_count,
+		smpt_info->map_count, smpt_info->unmap_count);
+
+	for (i = 0; i < smpt_info->info.num_reg; i++) {
+		dev_dbg(&mdev->pdev->dev,
+			"SMPT entry[%d] dma_addr = 0x%llx ref_count = %lld\n",
+			i, smpt_info->entry[i].dma_addr,
+			smpt_info->entry[i].ref_count);
+		WARN_ON(smpt_info->entry[i].ref_count);
+	}
+	kfree(smpt_info->entry);
+	kfree(smpt_info);
+}
+
+/**
+ * mic_smpt_restore - Restore MIC System Memory Page Tables.
+ *
+ * @mdev: pointer to mic_device instance.
+ *
+ * Restore the SMPT registers to values previously stored in the
+ * SW data structures. Some MIC steppings lose register state
+ * across resets and this API should be called for performing
+ * a restore operation if required.
+ *
+ * returns None.
+ */
+void mic_smpt_restore(struct mic_device *mdev)
+{
+	int i;
+	dma_addr_t dma_addr;
+
+	for (i = 0; i < mdev->smpt->info.num_reg; i++) {
+		dma_addr = mdev->smpt->entry[i].dma_addr;
+		mdev->smpt_ops->set(mdev, dma_addr, i);
+	}
+}
diff --git a/drivers/misc/mic/host/mic_smpt.h b/drivers/misc/mic/host/mic_smpt.h
new file mode 100644
index 0000000..51970ab
--- /dev/null
+++ b/drivers/misc/mic/host/mic_smpt.h
@@ -0,0 +1,98 @@
+/*
+ * Intel MIC Platform Software Stack (MPSS)
+ *
+ * Copyright(c) 2013 Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * General Public License for more details.
+ *
+ * The full GNU General Public License is included in this distribution in
+ * the file called "COPYING".
+ *
+ * Intel MIC Host driver.
+ *
+ */
+#ifndef MIC_SMPT_H
+#define MIC_SMPT_H
+/**
+ * struct mic_smpt_ops - MIC HW specific SMPT operations.
+ * @init: Initialize hardware specific SMPT information in mic_smpt_hw_info.
+ * @set: Set the value for a particular SMPT entry.
+ */
+struct mic_smpt_ops {
+	void (*init)(struct mic_device *mdev);
+	void (*set)(struct mic_device *mdev, dma_addr_t dma_addr, u8 index);
+};
+
+/**
+ * struct mic_smpt - MIC SMPT entry information.
+ * @dma_addr: Base DMA address for this SMPT entry.
+ * @ref_count: Number of active mappings for this SMPT entry in bytes.
+ */
+struct mic_smpt {
+	dma_addr_t dma_addr;
+	s64 ref_count;
+};
+
+/**
+ * struct mic_smpt_hw_info - MIC SMPT hardware specific information.
+ * @num_reg: Number of SMPT registers.
+ * @page_shift: System memory page shift.
+ * @page_size: System memory page size.
+ * @base: System address base.
+ */
+struct mic_smpt_hw_info {
+	u8 num_reg;
+	u8 page_shift;
+	u64 page_size;
+	u64 base;
+};
+
+/**
+ * struct mic_smpt_info - MIC SMPT information.
+ * @entry: Array of SMPT entries.
+ * @smpt_lock: Spin lock protecting access to SMPT data structures.
+ * @info: Hardware specific SMPT information.
+ * @ref_count: Number of active SMPT mappings (for debug).
+ * @map_count: Number of SMPT mappings created (for debug).
+ * @unmap_count: Number of SMPT mappings destroyed (for debug).
+ */
+struct mic_smpt_info {
+	struct mic_smpt *entry;
+	spinlock_t smpt_lock;
+	struct mic_smpt_hw_info info;
+	s64 ref_count;
+	s64 map_count;
+	s64 unmap_count;
+};
+
+dma_addr_t mic_map_single(struct mic_device *mdev, void *va, size_t size);
+void mic_unmap_single(struct mic_device *mdev,
+	dma_addr_t mic_addr, size_t size);
+dma_addr_t mic_map(struct mic_device *mdev,
+	dma_addr_t dma_addr, size_t size);
+void mic_unmap(struct mic_device *mdev, dma_addr_t mic_addr, size_t size);
+
+/**
+ * mic_map_error - Check a MIC address for errors.
+ *
+ * @mic_addr: MIC address returned by the mic_map..(..) APIs.
+ *
+ * returns true if there was an error during the mic_map..(..) APIs.
+ */
+static inline bool mic_map_error(dma_addr_t mic_addr)
+{
+	return !mic_addr;
+}
+
+int mic_smpt_init(struct mic_device *mdev);
+void mic_smpt_uninit(struct mic_device *mdev);
+void mic_smpt_restore(struct mic_device *mdev);
+
+#endif
diff --git a/drivers/misc/mic/host/mic_x100.c b/drivers/misc/mic/host/mic_x100.c
index a6c1f11..055bd00 100644
--- a/drivers/misc/mic/host/mic_x100.c
+++ b/drivers/misc/mic/host/mic_x100.c
@@ -77,10 +77,258 @@ mic_x100_read_spad(struct mic_device *mdev, unsigned int idx)
 	return val;
 }
 
+/**
+ * mic_x100_enable_interrupts - Enable interrupts.
+ * @mdev: pointer to mic_device instance
+ */
+static void mic_x100_enable_interrupts(struct mic_device *mdev)
+{
+	u32 reg;
+	struct mic_mw *mw = &mdev->mmio;
+	u32 sice0 = MIC_X100_SBOX_BASE_ADDRESS + MIC_X100_SBOX_SICE0;
+	u32 siac0 = MIC_X100_SBOX_BASE_ADDRESS + MIC_X100_SBOX_SIAC0;
+
+	reg = mic_mmio_read(mw, sice0);
+	reg |= MIC_X100_SBOX_DBR_BITS(0xf) | MIC_X100_SBOX_DMA_BITS(0xff);
+	mic_mmio_write(mw, reg, sice0);
+
+	/* Enable auto-clear when enabling interrupts.
+	 * Applicable only for MSI-x interrupts. Legacy
+	 * and MSI mode cannot have auto-clear enabled */
+	if (mdev->irq_info.num_vectors > 1) {
+		reg = mic_mmio_read(mw, siac0);
+		reg |= MIC_X100_SBOX_DBR_BITS(0xf) |
+			MIC_X100_SBOX_DMA_BITS(0xff);
+		mic_mmio_write(mw, reg, siac0);
+	}
+}
+
+/**
+ * mic_x100_disable_interrupts - Disable interrupts.
+ * @mdev: pointer to mic_device instance
+ */
+static void mic_x100_disable_interrupts(struct mic_device *mdev)
+{
+	u32 reg;
+	struct mic_mw *mw = &mdev->mmio;
+	u32 sice0 = MIC_X100_SBOX_BASE_ADDRESS + MIC_X100_SBOX_SICE0;
+	u32 siac0 = MIC_X100_SBOX_BASE_ADDRESS + MIC_X100_SBOX_SIAC0;
+	u32 sicc0 = MIC_X100_SBOX_BASE_ADDRESS + MIC_X100_SBOX_SICC0;
+
+	reg = mic_mmio_read(mw, sice0);
+	mic_mmio_write(mw, reg, sicc0);
+
+	if (mdev->irq_info.num_vectors > 1) {
+		reg = mic_mmio_read(mw, siac0);
+		reg &= ~(MIC_X100_SBOX_DBR_BITS(0xf) |
+			MIC_X100_SBOX_DMA_BITS(0xff));
+		mic_mmio_write(mw, reg, siac0);
+	}
+}
+
+/**
+ * mic_x100_send_sbox_intr - Send an MIC_X100_SBOX interrupt to MIC.
+ * @mdev: pointer to mic_device instance
+ */
+static void mic_x100_send_sbox_intr(struct mic_device *mdev,
+			int doorbell)
+{
+	struct mic_mw *mw = &mdev->mmio;
+	u64 apic_icr_offset = MIC_X100_SBOX_APICICR0 + doorbell * 8;
+	u32 apicicr_low = mic_mmio_read(mw,
+			MIC_X100_SBOX_BASE_ADDRESS + apic_icr_offset);
+
+	/* for MIC we need to make sure we "hit" the send_icr bit (13) */
+	apicicr_low = (apicicr_low | (1 << 13));
+
+	/* Ensure all previous stores are ordered. */
+	wmb();
+	/* MIC card only triggers when we write the lower part of the
+	 * address (upper bits)
+	 */
+	mic_mmio_write(mw, apicicr_low,
+		MIC_X100_SBOX_BASE_ADDRESS + apic_icr_offset);
+}
+
+/**
+ * mic_x100_send_rdmasr_intr - Send an RDMASR interrupt to MIC.
+ * @mdev: pointer to mic_device instance
+ */
+static void mic_x100_send_rdmasr_intr(struct mic_device *mdev,
+			int doorbell)
+{
+	int rdmasr_offset = MIC_X100_SBOX_RDMASR0 + (doorbell << 2);
+	/* Ensure all previous stores are ordered. */
+	wmb();
+	mic_mmio_write(&mdev->mmio, 0,
+		MIC_X100_SBOX_BASE_ADDRESS + rdmasr_offset);
+}
+
+/**
+ * mic_x100_send_intr - Send interrupt to MIC.
+ * @mdev: pointer to mic_device instance
+ * @doorbell: doorbell number.
+ */
+static void mic_x100_send_intr(struct mic_device *mdev, int doorbell)
+{
+	int rdmasr_db;
+	if (doorbell < MIC_X100_NUM_SBOX_IRQ) {
+		mic_x100_send_sbox_intr(mdev, doorbell);
+	} else {
+		rdmasr_db = doorbell - MIC_X100_NUM_SBOX_IRQ +
+			MIC_X100_RDMASR_IRQ_BASE;
+		mic_x100_send_rdmasr_intr(mdev, rdmasr_db);
+	}
+}
+
+/**
+ * mic_x100_ack_interrupt - Device specific interrupt handling.
+ * @mdev: pointer to mic_device instance
+ *
+ * Returns: bitmask of doorbell events triggered.
+ */
+static u32 mic_x100_ack_interrupt(struct mic_device *mdev)
+{
+	u32 reg = 0;
+	struct mic_mw *mw = &mdev->mmio;
+	u32 sicr0 = MIC_X100_SBOX_BASE_ADDRESS + MIC_X100_SBOX_SICR0;
+
+	/* Clear pending bit array. */
+	if (MIC_A0_STEP == mdev->stepping)
+		mic_mmio_write(mw, 1, MIC_X100_SBOX_BASE_ADDRESS +
+			MIC_X100_SBOX_MSIXPBACR);
+
+	if (mdev->irq_info.num_vectors <= 1) {
+		reg = mic_mmio_read(mw, sicr0);
+
+		if (unlikely(!reg))
+			goto done;
+
+		mic_mmio_write(mw, reg, sicr0);
+	}
+
+	if (mdev->stepping >= MIC_B0_STEP)
+		mdev->intr_ops->enable_interrupts(mdev);
+done:
+	return reg;
+}
+
+/**
+ * mic_x100_hw_intr_init - Initialize h/w specific interrupt
+ * information.
+ * @mdev: pointer to mic_device instance
+ */
+static void mic_x100_hw_intr_init(struct mic_device *mdev)
+{
+	mdev->intr_info = (struct mic_intr_info *) mic_x100_intr_init;
+}
+
+/**
+ * mic_x100_read_msi_to_src_map - read from the MSI mapping registers
+ * @mdev: pointer to mic_device instance
+ * @idx: index to the mapping register, 0 based
+ *
+ * This function allows reading of the 32-bit MSI mapping register.
+ *
+ * RETURNS: The value in the register.
+ */
+static u32
+mic_x100_read_msi_to_src_map(struct mic_device *mdev, int idx)
+{
+	return mic_mmio_read(&mdev->mmio,
+		MIC_X100_SBOX_BASE_ADDRESS +
+		MIC_X100_SBOX_MXAR0 + idx * 4);
+}
+
+/**
+ * mic_x100_program_msi_to_src_map - program the MSI mapping registers
+ * @mdev: pointer to mic_device instance
+ * @idx: index to the mapping register, 0 based
+ * @offset: The bit offset in the register that needs to be updated.
+ * @set: boolean specifying if the bit in the specified offset needs
+ * to be set or cleared.
+ *
+ * RETURNS: None.
+ */
+static void
+mic_x100_program_msi_to_src_map(struct mic_device *mdev,
+			int idx, int offset, bool set)
+{
+	unsigned long reg;
+	struct mic_mw *mw = &mdev->mmio;
+	u32 mxar = MIC_X100_SBOX_BASE_ADDRESS +
+		MIC_X100_SBOX_MXAR0 + idx * 4;
+
+	reg = mic_mmio_read(mw, mxar);
+	if (set)
+		__set_bit(offset, &reg);
+	else
+		__clear_bit(offset, &reg);
+	mic_mmio_write(mw, reg, mxar);
+}
+
+/**
+ * mic_x100_smpt_set - Update an SMPT entry with a DMA address.
+ * @mdev: pointer to mic_device instance
+ *
+ * RETURNS: none.
+ */
+static void
+mic_x100_smpt_set(struct mic_device *mdev, dma_addr_t dma_addr, u8 index)
+{
+#define SNOOP_ON	(0 << 0)
+#define SNOOP_OFF	(1 << 0)
+/*
+ * Sbox Smpt Reg Bits:
+ * Bits	31:2	Host address
+ * Bits	1	RSVD
+ * Bits	0	No snoop
+ */
+#define BUILD_SMPT(NO_SNOOP, HOST_ADDR)  \
+	(u32)(((((HOST_ADDR) << 2) & (~0x03)) | ((NO_SNOOP) & (0x01))))
+
+	uint32_t smpt_reg_val = BUILD_SMPT(SNOOP_ON,
+			dma_addr >> mdev->smpt->info.page_shift);
+	mic_mmio_write(&mdev->mmio, smpt_reg_val,
+		MIC_X100_SBOX_BASE_ADDRESS +
+		MIC_X100_SBOX_SMPT00 + (4 * index));
+}
+
+/**
+ * mic_x100_smpt_hw_init - Initialize SMPT X100 specific fields.
+ * @mdev: pointer to mic_device instance
+ *
+ * RETURNS: none.
+ */
+static void mic_x100_smpt_hw_init(struct mic_device *mdev)
+{
+	struct mic_smpt_hw_info *info = &mdev->smpt->info;
+
+	info->num_reg = 32;
+	info->page_shift = 34;
+	info->page_size = (1ULL << info->page_shift);
+	info->base = 0x8000000000ULL;
+}
+
+struct mic_smpt_ops mic_x100_smpt_ops = {
+	.init = mic_x100_smpt_hw_init,
+	.set = mic_x100_smpt_set,
+};
+
 struct mic_hw_ops mic_x100_ops = {
 	.aper_bar = MIC_X100_APER_BAR,
 	.mmio_bar = MIC_X100_MMIO_BAR,
 	.init = mic_x100_hw_init,
 	.read_spad = mic_x100_read_spad,
 	.write_spad = mic_x100_write_spad,
+	.send_intr = mic_x100_send_intr,
+	.ack_interrupt = mic_x100_ack_interrupt,
+};
+
+struct mic_hw_intr_ops mic_x100_intr_ops = {
+	.intr_init = mic_x100_hw_intr_init,
+	.enable_interrupts = mic_x100_enable_interrupts,
+	.disable_interrupts = mic_x100_disable_interrupts,
+	.program_msi_to_src_map = mic_x100_program_msi_to_src_map,
+	.read_msi_to_src_map = mic_x100_read_msi_to_src_map,
 };
diff --git a/drivers/misc/mic/host/mic_x100.h b/drivers/misc/mic/host/mic_x100.h
index 1f4e630..fd98b2b 100644
--- a/drivers/misc/mic/host/mic_x100.h
+++ b/drivers/misc/mic/host/mic_x100.h
@@ -42,6 +42,46 @@
 
 #define MIC_X100_SBOX_BASE_ADDRESS 0x00010000
 #define MIC_X100_SBOX_SPAD0 0x0000AB20
+#define MIC_X100_SBOX_SICR0_DBR(x) ((x) & 0xf)
+#define MIC_X100_SBOX_SICR0_DMA(x) (((x) >> 8) & 0xff)
+#define MIC_X100_SBOX_SICE0_DBR(x) ((x) & 0xf)
+#define MIC_X100_SBOX_DBR_BITS(x) ((x) & 0xf)
+#define MIC_X100_SBOX_SICE0_DMA(x) (((x) >> 8) & 0xff)
+#define MIC_X100_SBOX_DMA_BITS(x) (((x) & 0xff) << 8)
+
+#define MIC_X100_SBOX_APICICR0 0x0000A9D0
+#define MIC_X100_SBOX_SICR0 0x00009004
+#define MIC_X100_SBOX_SICE0 0x0000900C
+#define MIC_X100_SBOX_SICC0 0x00009010
+#define MIC_X100_SBOX_SIAC0 0x00009014
+#define MIC_X100_SBOX_MSIXPBACR 0x00009084
+#define MIC_X100_SBOX_MXAR0 0x00009044
+#define MIC_X100_SBOX_SMPT00 0x00003100
+#define MIC_X100_SBOX_RDMASR0 0x0000B180
+
+#define MIC_X100_DOORBELL_IDX_START 0
+#define MIC_X100_NUM_DOORBELL 4
+#define MIC_X100_DMA_IDX_START 8
+#define MIC_X100_NUM_DMA 8
+#define MIC_X100_ERR_IDX_START 30
+#define MIC_X100_NUM_ERR 1
+
+#define MIC_X100_NUM_SBOX_IRQ 8
+#define MIC_X100_NUM_RDMASR_IRQ 8
+#define MIC_X100_RDMASR_IRQ_BASE 17
+#define MIC_NUM_OFFSETS 32
+
+static const u16 mic_x100_intr_init[] = {
+		MIC_X100_DOORBELL_IDX_START,
+		MIC_X100_DMA_IDX_START,
+		MIC_X100_ERR_IDX_START,
+		MIC_X100_NUM_DOORBELL,
+		MIC_X100_NUM_DMA,
+		MIC_X100_NUM_ERR,
+};
+
 extern struct mic_hw_ops mic_x100_ops;
+extern struct mic_smpt_ops mic_x100_smpt_ops;
+extern struct mic_hw_intr_ops mic_x100_intr_ops;
 
 #endif
-- 
1.8.2.1
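
Editor's note on the SMPT arithmetic in mic_smpt.c above: the index,
offset and per-entry byte-count calculations can be tried out in user
space. What follows is only a minimal sketch under stated assumptions,
not the driver code. The constants are copied from
mic_x100_smpt_hw_init() in this patch; every sketch_* name is
hypothetical and exists only for illustration. The split loop is meant
to mirror mic_get_smpt_ref_count().

```c
#include <assert.h>
#include <stdint.h>

/* Values copied from mic_x100_smpt_hw_init() in the patch above. */
#define SMPT_NUM_REG	32
#define SMPT_PAGE_SHIFT	34
#define SMPT_PAGE_SIZE	(1ULL << SMPT_PAGE_SHIFT)	/* 16 GiB */
#define SMPT_BASE	0x8000000000ULL

/* Hypothetical stand-in for mic_sys_addr_to_smpt(): MIC address -> index. */
static uint64_t sketch_mic_addr_to_index(uint64_t mic_addr)
{
	return (mic_addr - SMPT_BASE) >> SMPT_PAGE_SHIFT;
}

/* Hypothetical stand-in for mic_smpt_to_pa(): index -> MIC page start. */
static uint64_t sketch_index_to_mic_addr(uint64_t index)
{
	return SMPT_BASE + index * SMPT_PAGE_SIZE;
}

/* Offset of an address inside its 16 GiB SMPT page. */
static uint64_t sketch_page_offset(uint64_t addr)
{
	return addr & (SMPT_PAGE_SIZE - 1);
}

/* Round up to the next page boundary, like ALIGN(addr, page_size). */
static uint64_t sketch_align_high(uint64_t addr)
{
	return (addr + SMPT_PAGE_SIZE - 1) & ~(SMPT_PAGE_SIZE - 1);
}

/*
 * Split [dma_addr, dma_addr + size) into the number of bytes that fall
 * into each consecutive SMPT page. ref must have room for SMPT_NUM_REG
 * entries. Returns the number of pages touched.
 */
static int sketch_split_range(int64_t *ref, uint64_t dma_addr, uint64_t size)
{
	uint64_t start = dma_addr;
	uint64_t end = dma_addr + size;
	int i = 0;

	while (start < end) {
		uint64_t next = sketch_align_high(start + 1);
		uint64_t stop = next < end ? next : end;

		assert(i < SMPT_NUM_REG);
		ref[i++] = (int64_t)(stop - start);
		start = next;
	}
	return i;
}
```

With these values, an 8 KiB buffer that starts 4 KiB below a 16 GiB
boundary touches two SMPT entries and contributes 4 KiB to the
reference count of each, which is why mic_map() and mic_unmap() carry a
per-entry ref array rather than a single count.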


* [PATCH v2 3/7] Intel MIC Host Driver, card OS state management.
  2013-08-08  3:04 [PATCH v2 0/7] Enable Drivers for Intel MIC X100 Coprocessors Sudeep Dutt
  2013-08-08  3:04 ` [PATCH v2 1/7] Intel MIC Host Driver for X100 family Sudeep Dutt
  2013-08-08  3:04 ` [PATCH v2 2/7] Intel MIC Host Driver Interrupt/SMPT support " Sudeep Dutt
@ 2013-08-08  3:04 ` Sudeep Dutt
  2013-08-08  3:04 ` [PATCH v2 4/7] Intel MIC Card Driver for X100 family Sudeep Dutt
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 18+ messages in thread
From: Sudeep Dutt @ 2013-08-08  3:04 UTC (permalink / raw)
  To: Greg Kroah-Hartman, Arnd Bergmann, Rusty Russell,
	Michael S. Tsirkin, Rob Landley, linux-kernel, virtualization,
	linux-doc
  Cc: Harshavardhan R Kharche, Peter P Waskiewicz Jr,
	Yaozu (Eddie) Dong, Sudeep Dutt, Ashutosh Dixit, AsiasHeasias,
	Caz Yokoyama, Dasaratharaman Chandramouli

This patch enables the following features:
a) Boots and shuts down the card via sysfs entries.
b) Allocates and maps a device page for communication with the
   card driver and updates the device page address via scratchpad
   registers.
c) Provides sysfs entries for shutdown status, kernel command line,
   ramdisk and log buffer information.

Co-author: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Signed-off-by: Caz Yokoyama <Caz.Yokoyama@intel.com>
Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Signed-off-by: Harshavardhan R Kharche <harshavardhan.r.kharche@intel.com>
Signed-off-by: Nikhil Rao <nikhil.rao@intel.com>
Signed-off-by: Sudeep Dutt <sudeep.dutt@intel.com>
Acked-by: Yaozu (Eddie) Dong <eddie.dong@intel.com>
Reviewed-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
---
 Documentation/ABI/testing/sysfs-class-mic.txt |  84 ++++++
 drivers/misc/mic/common/mic_device.h          |   7 +
 drivers/misc/mic/host/Makefile                |   2 +
 drivers/misc/mic/host/mic_boot.c              | 183 +++++++++++++
 drivers/misc/mic/host/mic_common.h            |   1 +
 drivers/misc/mic/host/mic_debugfs.c           | 354 ++++++++++++++++++++++++++
 drivers/misc/mic/host/mic_debugfs.h           |  29 +++
 drivers/misc/mic/host/mic_device.h            |  56 ++++
 drivers/misc/mic/host/mic_main.c              | 130 +++++++++-
 drivers/misc/mic/host/mic_sysfs.c             | 220 ++++++++++++++++
 drivers/misc/mic/host/mic_x100.c              | 327 ++++++++++++++++++++++++
 drivers/misc/mic/host/mic_x100.h              |  12 +
 include/uapi/linux/Kbuild                     |   1 +
 include/uapi/linux/mic_common.h               |  74 ++++++
 14 files changed, 1476 insertions(+), 4 deletions(-)
 create mode 100644 drivers/misc/mic/host/mic_boot.c
 create mode 100644 drivers/misc/mic/host/mic_debugfs.c
 create mode 100644 drivers/misc/mic/host/mic_debugfs.h
 create mode 100644 include/uapi/linux/mic_common.h

diff --git a/Documentation/ABI/testing/sysfs-class-mic.txt b/Documentation/ABI/testing/sysfs-class-mic.txt
index 36cdb70..8eefd1f 100644
--- a/Documentation/ABI/testing/sysfs-class-mic.txt
+++ b/Documentation/ABI/testing/sysfs-class-mic.txt
@@ -32,3 +32,87 @@ Contact:	Sudeep Dutt <sudeep.dutt@intel.com>
 Description:
 		Provides information about the silicon stepping for an Intel
 		MIC device. For example - "A0" or "B0"
+
+What:		/sys/class/mic/mic(x)/state
+Date:		August 2013
+KernelVersion:	3.10
+Contact:	Sudeep Dutt <sudeep.dutt@intel.com>
+Description:
+		When read, this entry provides the current state of an Intel
+		MIC device in the context of the card OS. Possible values that
+		will be read are:
+		"offline" - The MIC device is ready to boot the card OS.
+		"online" - The MIC device has initiated booting a card OS.
+		"shutting_down" - The card OS is shutting down.
+		"reset_failed" - The MIC device has failed to reset.
+
+		When written, this sysfs entry triggers different state change
+		operations depending upon the current state of the card OS.
+		Acceptable values are:
+		"boot:linux:<fw_path>:<ramdisk_path>" - Boot the card OS image
+			at /lib/firmware/<fw_path> with an optional ramdisk
+			image at /lib/firmware/<ramdisk_path>
+		"boot:elf:<fw_path>" - Boot an ELF image at
+			/lib/firmware/<fw_path>
+		"reset" - Initiates device reset.
+		"shutdown" - Initiates card OS shutdown.
+
+What:		/sys/class/mic/mic(x)/shutdown_status
+Date:		August 2013
+KernelVersion:	3.10
+Contact:	Sudeep Dutt <sudeep.dutt@intel.com>
+Description:
+		An Intel MIC device runs a Linux OS during its operation. This
+		OS can shut down for various reasons. When read, this entry
+		provides the reason why the card OS was shut down.
+		Possible values are:
+		"nop" - Shutdown status is not applicable, when the card OS is
+			"online"
+		"crashed" - Shutdown because of a HW or SW crash.
+		"halted" - Shutdown because of a halt command.
+		"poweroff" - Shutdown because of a poweroff command.
+		"restart" - Shutdown because of a restart command.
+
+What:		/sys/class/mic/mic(x)/cmdline
+Date:		August 2013
+KernelVersion:	3.10
+Contact:	Sudeep Dutt <sudeep.dutt@intel.com>
+Description:
+		An Intel MIC device runs a Linux OS during its operation. Before
+		booting this card OS, it is possible to pass kernel command line
+		options to configure various features in it, similar to
+		self-bootable machines. When read, this entry provides
+		information about the current kernel command line options set to
+		boot the card OS. This entry can be written to change the
+		existing kernel command line options. Typically, the user would
+		want to read the current command line options, append new ones
+		or modify existing ones and then write the whole kernel command
+		line back to this entry.
+
+What:		/sys/class/mic/mic(x)/log_buf_addr
+Date:		August 2013
+KernelVersion:	3.10
+Contact:	Sudeep Dutt <sudeep.dutt@intel.com>
+Description:
+		An Intel MIC device runs a Linux OS during its operation. For
+		debugging purposes and early kernel boot messages, the user can
+		access the card OS log buffer via debugfs. When read, this entry
+		provides the kernel virtual address of the buffer where the card
+		OS log buffer can be read. This entry is written by the host
+		configuration daemon to set the log buffer address. The correct
+		log buffer address to be written can be found in the System.map
+		file of the card OS.
+
+What:		/sys/class/mic/mic(x)/log_buf_len
+Date:		August 2013
+KernelVersion:	3.10
+Contact:	Sudeep Dutt <sudeep.dutt@intel.com>
+Description:
+		An Intel MIC device runs a Linux OS during its operation. For
+		debugging purposes and early kernel boot messages, the user can
+		access the card OS log buffer via debugfs. When read, this entry
+		provides the kernel virtual address where the card OS log buffer
+		length can be read. This entry is written by the host
+		configuration daemon to set the log buffer length address. The
+		correct log buffer length address to write can be found in the
+		System.map file of the card OS.
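
Not part of the patch: a minimal user-space sketch of how a tool might
format the strings accepted by the "state" entry documented above and
interpret what it reads back. The helper names (parse_state,
build_boot_linux, build_boot_elf) are hypothetical and do not exist in
the driver. The ABI text does not say how the optional ramdisk is
omitted, so the sketch only covers the case where both paths are given.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* States documented for /sys/class/mic/mic(x)/state. */
enum mic_state_id {
	STATE_OFFLINE,
	STATE_ONLINE,
	STATE_SHUTTING_DOWN,
	STATE_RESET_FAILED,
	STATE_UNKNOWN
};

/* Map text read from "state" to an enum; a trailing newline is tolerated. */
static enum mic_state_id parse_state(const char *s)
{
	static const char *const names[] = {
		"offline", "online", "shutting_down", "reset_failed"
	};
	size_t len, i;

	assert(s != NULL);
	len = strcspn(s, "\n");
	for (i = 0; i < sizeof(names) / sizeof(names[0]); i++)
		if (strlen(names[i]) == len && strncmp(s, names[i], len) == 0)
			return (enum mic_state_id)i;
	return STATE_UNKNOWN;
}

/* Format "boot:linux:<fw_path>:<ramdisk_path>"; -1 if buf is too small. */
static int build_boot_linux(char *buf, size_t len,
			    const char *fw, const char *ramdisk)
{
	int n;

	assert(buf != NULL && fw != NULL && ramdisk != NULL);
	n = snprintf(buf, len, "boot:linux:%s:%s", fw, ramdisk);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Format "boot:elf:<fw_path>"; -1 if buf is too small. */
static int build_boot_elf(char *buf, size_t len, const char *fw)
{
	int n;

	assert(buf != NULL && fw != NULL);
	n = snprintf(buf, len, "boot:elf:%s", fw);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

A caller would write the formatted string to the "state" entry and
read the entry back to observe the resulting state. Paths are
relative to the firmware directory, as the ABI text describes.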
diff --git a/drivers/misc/mic/common/mic_device.h b/drivers/misc/mic/common/mic_device.h
index f02262e..6440e9d 100644
--- a/drivers/misc/mic/common/mic_device.h
+++ b/drivers/misc/mic/common/mic_device.h
@@ -34,4 +34,11 @@ struct mic_mw {
 	resource_size_t len;
 };
 
+/*
+ * Scratch pad register offsets used by the host to communicate
+ * device page DMA address to the card.
+ */
+#define MIC_DPLO_SPAD 14
+#define MIC_DPHI_SPAD 15
+
 #endif
diff --git a/drivers/misc/mic/host/Makefile b/drivers/misc/mic/host/Makefile
index 3702d2a..98bf565 100644
--- a/drivers/misc/mic/host/Makefile
+++ b/drivers/misc/mic/host/Makefile
@@ -7,3 +7,5 @@ mic_host-objs := mic_main.o
 mic_host-objs += mic_x100.o
 mic_host-objs += mic_sysfs.o
 mic_host-objs += mic_smpt.o
+mic_host-objs += mic_boot.o
+mic_host-objs += mic_debugfs.o
diff --git a/drivers/misc/mic/host/mic_boot.c b/drivers/misc/mic/host/mic_boot.c
new file mode 100644
index 0000000..fcfb86a
--- /dev/null
+++ b/drivers/misc/mic/host/mic_boot.c
@@ -0,0 +1,183 @@
+/*
+ * Intel MIC Platform Software Stack (MPSS)
+ *
+ * Copyright(c) 2013 Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * General Public License for more details.
+ *
+ * The full GNU General Public License is included in this distribution in
+ * the file called "COPYING".
+ *
+ * Intel MIC Host driver.
+ *
+ */
+#include <linux/fs.h>
+#include <linux/pci.h>
+#include <linux/sched.h>
+#include <linux/firmware.h>
+#include <linux/delay.h>
+
+#include "mic_common.h"
+
+/**
+ * mic_reset - Reset the MIC device.
+ * @mdev: pointer to mic_device instance
+ */
+static void mic_reset(struct mic_device *mdev)
+{
+	int i;
+
+#define MIC_RESET_TO (45)
+
+	mdev->ops->reset_fw_ready(mdev);
+	mdev->ops->reset(mdev);
+
+	for (i = 0; i < MIC_RESET_TO; i++) {
+		if (mdev->ops->is_fw_ready(mdev))
+			return;
+		/*
+		 * Resets typically take tens of seconds to complete.
+		 * Since an MMIO read is required to check if the
+		 * firmware is ready or not, a 1 second delay works nicely.
+		 */
+		msleep(1000);
+	}
+	mic_set_state(mdev, MIC_RESET_FAILED);
+}
+
+/* Initialize the MIC bootparams */
+void mic_bootparam_init(struct mic_device *mdev)
+{
+	struct mic_bootparam *bootparam = mdev->dp;
+
+	bootparam->magic = MIC_MAGIC;
+	bootparam->c2h_shutdown_db = mdev->shutdown_db;
+	bootparam->h2c_shutdown_db = -1;
+	bootparam->h2c_config_db = -1;
+	bootparam->shutdown_status = 0;
+	bootparam->shutdown_card = 0;
+}
+
+/**
+ * mic_start - Start the MIC.
+ * @mdev: pointer to mic_device instance
+ * @buf: buffer containing boot string including firmware/ramdisk path.
+ *
+ * This function prepares an MIC for boot and initiates boot.
+ * RETURNS: An appropriate -ERRNO error value on error, or zero for success.
+ */
+int mic_start(struct mic_device *mdev, const char *buf)
+{
+	int rc;
+	mutex_lock(&mdev->mic_mutex);
+retry:
+	if (MIC_OFFLINE != mdev->state) {
+		rc = -EINVAL;
+		goto unlock_ret;
+	}
+	if (!mdev->ops->is_fw_ready(mdev)) {
+		mic_reset(mdev);
+		/*
+		 * The state will either be MIC_OFFLINE if the reset succeeded
+		 * or MIC_RESET_FAILED if the firmware reset failed.
+		 */
+		goto retry;
+	}
+	rc = mdev->ops->load_mic_fw(mdev, buf);
+	if (rc)
+		goto unlock_ret;
+	mic_smpt_restore(mdev);
+	mic_intr_restore(mdev);
+	mdev->intr_ops->enable_interrupts(mdev);
+	mdev->ops->write_spad(mdev, MIC_DPLO_SPAD, mdev->dp_dma_addr);
+	mdev->ops->write_spad(mdev, MIC_DPHI_SPAD, mdev->dp_dma_addr >> 32);
+	mdev->ops->send_firmware_intr(mdev);
+	mic_set_state(mdev, MIC_ONLINE);
+unlock_ret:
+	mutex_unlock(&mdev->mic_mutex);
+	return rc;
+}
+
+/**
+ * mic_stop - Prepare the MIC for reset and trigger reset.
+ * @mdev: pointer to mic_device instance
+ * @force: force a MIC to reset even if it is already offline.
+ *
+ * RETURNS: None.
+ */
+void mic_stop(struct mic_device *mdev, bool force)
+{
+	mutex_lock(&mdev->mic_mutex);
+	if (MIC_OFFLINE != mdev->state || force) {
+		mic_bootparam_init(mdev);
+		mic_reset(mdev);
+		if (MIC_RESET_FAILED == mdev->state)
+			goto unlock;
+		mic_set_shutdown_status(mdev, MIC_NOP);
+		mic_set_state(mdev, MIC_OFFLINE);
+	}
+unlock:
+	mutex_unlock(&mdev->mic_mutex);
+}
+
+/**
+ * mic_shutdown - Initiate MIC shutdown.
+ * @mdev: pointer to mic_device instance
+ *
+ * RETURNS: None.
+ */
+void mic_shutdown(struct mic_device *mdev)
+{
+	struct mic_bootparam *bootparam = mdev->dp;
+	s8 db = bootparam->h2c_shutdown_db;
+
+	mutex_lock(&mdev->mic_mutex);
+	if (MIC_ONLINE == mdev->state && db != -1) {
+		bootparam->shutdown_card = 1;
+		mdev->ops->send_intr(mdev, db);
+		mic_set_state(mdev, MIC_SHUTTING_DOWN);
+	}
+	mutex_unlock(&mdev->mic_mutex);
+}
+
+/**
+ * mic_shutdown_work - Handle shutdown interrupt from MIC.
+ * @work: The work structure.
+ *
+ * This work is scheduled whenever the host has received a shutdown
+ * interrupt from the MIC.
+ */
+void mic_shutdown_work(struct work_struct *work)
+{
+	struct mic_device *mdev = container_of(work, struct mic_device,
+			shutdown_work);
+	struct mic_bootparam *bootparam = mdev->dp;
+
+	mutex_lock(&mdev->mic_mutex);
+	mic_set_shutdown_status(mdev, bootparam->shutdown_status);
+	bootparam->shutdown_status = 0;
+	if (MIC_SHUTTING_DOWN != mdev->state)
+		mic_set_state(mdev, MIC_SHUTTING_DOWN);
+	mutex_unlock(&mdev->mic_mutex);
+}
+
+/**
+ * mic_reset_trigger_work - Trigger MIC reset.
+ * @work: The work structure.
+ *
+ * This work is scheduled whenever the host wants to reset the MIC.
+ */
+void mic_reset_trigger_work(struct work_struct *work)
+{
+	struct mic_device *mdev = container_of(work, struct mic_device,
+			reset_trigger_work);
+
+	mic_stop(mdev, false);
+}
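
Not part of the patch: a standalone sketch of the handoff that
mic_start() performs when it writes dp_dma_addr into MIC_DPLO_SPAD and
MIC_DPHI_SPAD. It models the scratchpad registers as a plain array;
write_spad() here is a stand-in for the mdev->ops->write_spad hook, and
the card-side read_dp_addr() is an assumption about how the two halves
would be recombined, since the card driver is not in this patch.

```c
#include <assert.h>
#include <stdint.h>

/* Same indices as MIC_DPLO_SPAD / MIC_DPHI_SPAD in the patch. */
#define DPLO_SPAD 14
#define DPHI_SPAD 15

/* Hypothetical model of the 32-bit scratchpad registers. */
static uint32_t spad[16];

static void write_spad(unsigned int idx, uint32_t val)
{
	assert(idx < 16);
	spad[idx] = val;
}

/* Host side: publish a 64-bit DMA address as two 32-bit halves. */
static void publish_dp_addr(uint64_t dma_addr)
{
	write_spad(DPLO_SPAD, (uint32_t)dma_addr);
	/* Shift on a 64-bit type so the shift count is always valid. */
	write_spad(DPHI_SPAD, (uint32_t)(dma_addr >> 32));
}

/* Card side (assumed): reassemble the address from both registers. */
static uint64_t read_dp_addr(void)
{
	return ((uint64_t)spad[DPHI_SPAD] << 32) | spad[DPLO_SPAD];
}
```

The sketch shifts a uint64_t on purpose. In C, shifting a 32-bit value
by 32 is undefined, which matters for "dp_dma_addr >> 32" on
configurations where dma_addr_t is 32 bits wide. The kernel's
lower_32_bits() and upper_32_bits() helpers are one way to avoid that.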
diff --git a/drivers/misc/mic/host/mic_common.h b/drivers/misc/mic/host/mic_common.h
index 2a0624e..73dc47e 100644
--- a/drivers/misc/mic/host/mic_common.h
+++ b/drivers/misc/mic/host/mic_common.h
@@ -22,6 +22,7 @@
 #define _MIC_HOST_COMMON_H_
 
 #include <linux/cdev.h>
+#include <linux/mic_common.h>
 
 #include "../common/mic_device.h"
 #include "mic_device.h"
diff --git a/drivers/misc/mic/host/mic_debugfs.c b/drivers/misc/mic/host/mic_debugfs.c
new file mode 100644
index 0000000..74f0713
--- /dev/null
+++ b/drivers/misc/mic/host/mic_debugfs.c
@@ -0,0 +1,354 @@
+/*
+ * Intel MIC Platform Software Stack (MPSS)
+ *
+ * Copyright(c) 2013 Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * General Public License for more details.
+ *
+ * The full GNU General Public License is included in this distribution in
+ * the file called "COPYING".
+ *
+ * Intel MIC Host driver.
+ *
+ */
+#include <linux/fs.h>
+#include <linux/pci.h>
+#include <linux/sched.h>
+#include <linux/debugfs.h>
+#include <linux/module.h>
+#include <linux/seq_file.h>
+
+#include "mic_common.h"
+#include "mic_debugfs.h"
+
+/* Debugfs parent dir */
+static struct dentry *mic_dbg;
+
+/**
+ * mic_log_buf_show - Display MIC kernel log buffer.
+ *
+ * log_buf addr/len is read from System.map by user space
+ * and populated in sysfs entries.
+ */
+static int mic_log_buf_show(struct seq_file *s, void *unused)
+{
+	void __iomem *log_buf_va;
+	int __iomem *log_buf_len_va;
+	struct mic_device *mdev = s->private;
+	void *kva;
+	int size;
+	unsigned long aper_offset;
+
+	if (!mdev || !mdev->log_buf_addr || !mdev->log_buf_len)
+		goto done;
+	/*
+	 * Card kernel will never be relocated and any kernel text/data mapping
+	 * can be translated to phys address by subtracting __START_KERNEL_map.
+	 */
+	aper_offset = (unsigned long)mdev->log_buf_len - __START_KERNEL_map;
+	log_buf_len_va = mdev->aper.va + aper_offset;
+	aper_offset = (unsigned long)mdev->log_buf_addr - __START_KERNEL_map;
+	log_buf_va = mdev->aper.va + aper_offset;
+	size = ioread32(log_buf_len_va);
+
+	kva = kmalloc(size, GFP_KERNEL);
+	if (!kva)
+		goto done;
+	mutex_lock(&mdev->mic_mutex);
+	memcpy_fromio(kva, log_buf_va, size);
+	switch (mdev->state) {
+	case MIC_ONLINE:
+		/* Fall through */
+	case MIC_SHUTTING_DOWN:
+		seq_write(s, kva, size);
+		break;
+	default:
+		break;
+	}
+	mutex_unlock(&mdev->mic_mutex);
+	kfree(kva);
+done:
+	return 0;
+}
+
+static int mic_log_buf_open(struct inode *inode, struct file *file)
+{
+	return single_open(file, mic_log_buf_show, inode->i_private);
+}
+
+static int mic_log_buf_release(struct inode *inode, struct file *file)
+{
+	return single_release(inode, file);
+}
+
+static const struct file_operations log_buf_ops = {
+	.owner   = THIS_MODULE,
+	.open    = mic_log_buf_open,
+	.read    = seq_read,
+	.llseek  = seq_lseek,
+	.release = mic_log_buf_release
+};
+
+static int mic_smpt_show(struct seq_file *s, void *pos)
+{
+	int i;
+	struct mic_device *mdev = s->private;
+	unsigned long flags;
+
+	seq_printf(s, "MIC %-2d |%-10s| %-14s %-10s\n",
+		mdev->id, "SMPT entry", "SW DMA addr", "RefCount");
+	seq_puts(s, "====================================================\n");
+
+	if (mdev->smpt) {
+		struct mic_smpt_info *smpt_info = mdev->smpt;
+		spin_lock_irqsave(&smpt_info->smpt_lock, flags);
+		for (i = 0; i < smpt_info->info.num_reg; i++) {
+			seq_printf(s, "%9s|%-10d| %-#14llx %-10lld\n",
+				" ",  i, smpt_info->entry[i].dma_addr,
+				smpt_info->entry[i].ref_count);
+		}
+		spin_unlock_irqrestore(&smpt_info->smpt_lock, flags);
+	}
+	seq_puts(s, "====================================================\n");
+	return 0;
+}
+
+static int mic_smpt_debug_open(struct inode *inode, struct file *file)
+{
+	return single_open(file, mic_smpt_show, inode->i_private);
+}
+
+static int mic_smpt_debug_release(struct inode *inode, struct file *file)
+{
+	return single_release(inode, file);
+}
+
+static const struct file_operations smpt_file_ops = {
+	.owner   = THIS_MODULE,
+	.open    = mic_smpt_debug_open,
+	.read    = seq_read,
+	.llseek  = seq_lseek,
+	.release = mic_smpt_debug_release
+};
+
+static int mic_soft_reset_show(struct seq_file *s, void *pos)
+{
+	struct mic_device *mdev = s->private;
+
+	mic_stop(mdev, true);
+	return 0;
+}
+
+static int mic_soft_reset_debug_open(struct inode *inode, struct file *file)
+{
+	return single_open(file, mic_soft_reset_show, inode->i_private);
+}
+
+static int mic_soft_reset_debug_release(struct inode *inode, struct file *file)
+{
+	return single_release(inode, file);
+}
+
+static const struct file_operations soft_reset_ops = {
+	.owner   = THIS_MODULE,
+	.open    = mic_soft_reset_debug_open,
+	.read    = seq_read,
+	.llseek  = seq_lseek,
+	.release = mic_soft_reset_debug_release
+};
+
+static int mic_post_code_show(struct seq_file *s, void *pos)
+{
+	struct mic_device *mdev = s->private;
+	u32 reg = mdev->ops->get_postcode(mdev);
+
+	seq_printf(s, "%c%c", reg & 0xff, (reg >> 8) & 0xff);
+	return 0;
+}
+
+static int mic_post_code_debug_open(struct inode *inode, struct file *file)
+{
+	return single_open(file, mic_post_code_show, inode->i_private);
+}
+
+static int mic_post_code_debug_release(struct inode *inode, struct file *file)
+{
+	return single_release(inode, file);
+}
+
+static const struct file_operations post_code_ops = {
+	.owner   = THIS_MODULE,
+	.open    = mic_post_code_debug_open,
+	.read    = seq_read,
+	.llseek  = seq_lseek,
+	.release = mic_post_code_debug_release
+};
+
+static int mic_dp_show(struct seq_file *s, void *pos)
+{
+	struct mic_device *mdev = s->private;
+	struct mic_bootparam *bootparam = mdev->dp;
+
+	seq_printf(s, "Bootparam: magic 0x%x\n",
+		bootparam->magic);
+	seq_printf(s, "Bootparam: h2c_shutdown_db %d\n",
+		bootparam->h2c_shutdown_db);
+	seq_printf(s, "Bootparam: h2c_config_db %d\n",
+		bootparam->h2c_config_db);
+	seq_printf(s, "Bootparam: c2h_shutdown_db %d\n",
+		bootparam->c2h_shutdown_db);
+	seq_printf(s, "Bootparam: shutdown_status %d\n",
+		bootparam->shutdown_status);
+	seq_printf(s, "Bootparam: shutdown_card %d\n",
+		bootparam->shutdown_card);
+
+	return 0;
+}
+
+static int mic_dp_debug_open(struct inode *inode, struct file *file)
+{
+	return single_open(file, mic_dp_show, inode->i_private);
+}
+
+static int mic_dp_debug_release(struct inode *inode, struct file *file)
+{
+	return single_release(inode, file);
+}
+
+static const struct file_operations dp_ops = {
+	.owner   = THIS_MODULE,
+	.open    = mic_dp_debug_open,
+	.read    = seq_read,
+	.llseek  = seq_lseek,
+	.release = mic_dp_debug_release
+};
+
+static int mic_msi_irq_info_show(struct seq_file *s, void *pos)
+{
+	struct mic_device *mdev  = s->private;
+	int reg;
+	int i, j;
+	u16 entry;
+	u16 vector;
+
+	if (pci_dev_msi_enabled(mdev->pdev)) {
+		for (i = 0; i < mdev->irq_info.num_vectors; i++) {
+			if (mdev->pdev->msix_enabled) {
+				entry = mdev->irq_info.msix_entries[i].entry;
+				vector = mdev->irq_info.msix_entries[i].vector;
+			} else {
+				entry = 0;
+				vector = mdev->pdev->irq;
+			}
+
+			reg = mdev->intr_ops->read_msi_to_src_map(mdev, entry);
+
+			seq_printf(s, "%s %-10d %s %-10d MXAR[%d]: %08X\n",
+				"IRQ:", vector, "Entry:", entry, i, reg);
+
+			seq_printf(s, "%-10s", "offset:");
+			for (j = (MIC_NUM_OFFSETS - 1); j >= 0; j--)
+				seq_printf(s, "%4d ", j);
+			seq_puts(s, "\n");
+
+
+			seq_printf(s, "%-10s", "count:");
+			for (j = (MIC_NUM_OFFSETS - 1); j >= 0; j--)
+				seq_printf(s, "%4d ",
+				(mdev->irq_info.mic_msi_map[i] & BIT(j)) ?
+					1 : 0);
+			seq_puts(s, "\n\n");
+		}
+	} else {
+		seq_puts(s, "MSI/MSIx interrupts not enabled\n");
+	}
+
+	return 0;
+
+}
+
+static int mic_msi_irq_info_debug_open(struct inode *inode, struct file *file)
+{
+	return single_open(file, mic_msi_irq_info_show, inode->i_private);
+}
+
+static int
+mic_msi_irq_info_debug_release(struct inode *inode, struct file *file)
+{
+	return single_release(inode, file);
+}
+
+static const struct file_operations msi_irq_info_ops = {
+	.owner   = THIS_MODULE,
+	.open    = mic_msi_irq_info_debug_open,
+	.read    = seq_read,
+	.llseek  = seq_lseek,
+	.release = mic_msi_irq_info_debug_release
+};
+
+/**
+ * mic_create_debug_dir - Initialize MIC debugfs entries.
+ */
+void __init mic_create_debug_dir(struct mic_device *mdev)
+{
+	if (!mic_dbg)
+		return;
+
+	mdev->dbg_dir = debugfs_create_dir(mdev->name, mic_dbg);
+	if (!mdev->dbg_dir)
+		return;
+
+	debugfs_create_file("log_buf", 0444, mdev->dbg_dir,
+		mdev, &log_buf_ops);
+
+	debugfs_create_file("smpt", 0444, mdev->dbg_dir,
+		mdev, &smpt_file_ops);
+
+	debugfs_create_file("soft_reset", 0444, mdev->dbg_dir,
+		mdev, &soft_reset_ops);
+
+	debugfs_create_file("post_code", 0444, mdev->dbg_dir,
+		mdev, &post_code_ops);
+
+	debugfs_create_file("dp", 0444, mdev->dbg_dir,
+		mdev, &dp_ops);
+
+	debugfs_create_file("msi_irq_info", 0444, mdev->dbg_dir,
+		mdev, &msi_irq_info_ops);
+}
+
+/**
+ * mic_delete_debug_dir - Uninitialize MIC debugfs entries.
+ */
+void mic_delete_debug_dir(struct mic_device *mdev)
+{
+	if (!mdev->dbg_dir)
+		return;
+
+	debugfs_remove_recursive(mdev->dbg_dir);
+}
+
+/**
+ * mic_init_debugfs - Initialize global debugfs entry.
+ */
+void __init mic_init_debugfs(void)
+{
+	mic_dbg = debugfs_create_dir(KBUILD_MODNAME, NULL);
+	if (!mic_dbg)
+		pr_err("can't create debugfs dir\n");
+}
+
+/**
+ * mic_exit_debugfs - Uninitialize global debugfs entry
+ */
+void mic_exit_debugfs(void)
+{
+	debugfs_remove(mic_dbg);
+}
diff --git a/drivers/misc/mic/host/mic_debugfs.h b/drivers/misc/mic/host/mic_debugfs.h
new file mode 100644
index 0000000..86db36e
--- /dev/null
+++ b/drivers/misc/mic/host/mic_debugfs.h
@@ -0,0 +1,29 @@
+/*
+ * Intel MIC Platform Software Stack (MPSS)
+ *
+ * Copyright(c) 2013 Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * General Public License for more details.
+ *
+ * The full GNU General Public License is included in this distribution in
+ * the file called "COPYING".
+ *
+ * Intel MIC Host driver.
+ *
+ */
+#ifndef _MIC_DEBUGFS_H_
+#define _MIC_DEBUGFS_H_
+
+void __init mic_create_debug_dir(struct mic_device *dev);
+void mic_delete_debug_dir(struct mic_device *dev);
+void __init mic_init_debugfs(void);
+void mic_exit_debugfs(void);
+
+#endif
diff --git a/drivers/misc/mic/host/mic_device.h b/drivers/misc/mic/host/mic_device.h
index d191831..79312aa 100644
--- a/drivers/misc/mic/host/mic_device.h
+++ b/drivers/misc/mic/host/mic_device.h
@@ -133,6 +133,23 @@ struct mic_irq;
  * @smpt: MIC SMPT information.
  * @intr_info: H/W specific interrupt information.
  * @irq_info: The OS specific irq information
+ * @dbg_dir: debugfs directory of this MIC device.
+ * @cmdline: Kernel command line.
+ * @firmware: Firmware file name.
+ * @ramdisk: Ramdisk file name.
+ * @bootaddr: MIC boot address.
+ * @reset_trigger_work: Work for triggering reset requests.
+ * @shutdown_work: Work for handling shutdown interrupts.
+ * @state: MIC state.
+ * @shutdown_status: MIC status reported by card for shutdown/crashes.
+ * @state_sysfs: Sysfs dirent for notifying ring 3 about MIC state changes.
+ * @log_buf_addr: Log buffer address for MIC.
+ * @log_buf_len: Log buffer length address for MIC.
+ * @dp: virtio device page
+ * @dp_dma_addr: virtio device page DMA address.
+ * @cdev: Character device for MIC.
+ * @shutdown_db: shutdown doorbell.
+ * @shutdown_cookie: shutdown cookie.
  */
 struct mic_device {
 	char name[20];
@@ -151,6 +168,23 @@ struct mic_device {
 	struct mic_smpt_info *smpt;
 	struct mic_intr_info *intr_info;
 	struct mic_irq_info irq_info;
+	struct dentry *dbg_dir;
+	char *cmdline;
+	char *firmware;
+	char *ramdisk;
+	u32 bootaddr;
+	struct work_struct reset_trigger_work;
+	struct work_struct shutdown_work;
+	u8 state;
+	u8 shutdown_status;
+	struct sysfs_dirent *state_sysfs;
+	void *log_buf_addr;
+	int *log_buf_len;
+	void *dp;
+	dma_addr_t dp_dma_addr;
+	struct cdev cdev;
+	int shutdown_db;
+	struct mic_irq *shutdown_cookie;
 };
 
 /**
@@ -183,6 +217,13 @@ struct mic_hw_intr_ops {
  * @send_intr: Send an interrupt for a particular doorbell on the card.
  * @ack_interrupt: Hardware specific operations to ack the h/w on
  * receipt of an interrupt.
+ * @reset: Reset the remote processor.
+ * @reset_fw_ready: Reset firmware ready field.
+ * @is_fw_ready: Check if firmware is ready for OS download.
+ * @send_firmware_intr: Send an interrupt to the card firmware.
+ * @load_mic_fw: Load firmware segments required to boot the card
+ * into card memory. This includes the kernel, command line, ramdisk etc.
+ * @get_postcode: Get post code status from firmware.
  */
 struct mic_hw_ops {
 	u8 aper_bar;
@@ -192,6 +233,12 @@ struct mic_hw_ops {
 	void (*write_spad)(struct mic_device *mdev, u32 idx, u32 val);
 	void (*send_intr)(struct mic_device *mdev, int doorbell);
 	u32 (*ack_interrupt)(struct mic_device *mdev);
+	void (*reset)(struct mic_device *mdev);
+	void (*reset_fw_ready)(struct mic_device *mdev);
+	bool (*is_fw_ready)(struct mic_device *mdev);
+	void (*send_firmware_intr)(struct mic_device *mdev);
+	int (*load_mic_fw)(struct mic_device *mdev, const char *buf);
+	u32 (*get_postcode)(struct mic_device *mdev);
 };
 
 /**
@@ -230,4 +277,13 @@ struct mic_irq *mic_request_irq(struct mic_device *mdev,
 void mic_free_irq(struct mic_device *mdev,
 		struct mic_irq *cookie, void *data);
 void mic_intr_restore(struct mic_device *mdev);
+int mic_start(struct mic_device *mdev, const char *buf);
+void mic_stop(struct mic_device *mdev, bool force);
+void mic_shutdown(struct mic_device *mdev);
+void mic_reset_delayed_work(struct work_struct *work);
+void mic_reset_trigger_work(struct work_struct *work);
+void mic_shutdown_work(struct work_struct *work);
+void mic_bootparam_init(struct mic_device *mdev);
+void mic_set_state(struct mic_device *mdev, u8 state);
+void mic_set_shutdown_status(struct mic_device *mdev, u8 status);
 #endif
diff --git a/drivers/misc/mic/host/mic_main.c b/drivers/misc/mic/host/mic_main.c
index 505b249..ede682f 100644
--- a/drivers/misc/mic/host/mic_main.c
+++ b/drivers/misc/mic/host/mic_main.c
@@ -26,8 +26,11 @@
 #include <linux/fs.h>
 #include <linux/pci.h>
 #include <linux/interrupt.h>
+#include <linux/firmware.h>
+#include <linux/completion.h>
 
 #include "mic_common.h"
+#include "mic_debugfs.h"
 
 static const char mic_driver_name[] = "mic";
 
@@ -70,6 +73,60 @@ struct mic_info {
 /* g_mic - Global information about all MIC devices. */
 static struct mic_info g_mic;
 
+/* Initialize the device page */
+static int mic_dp_init(struct mic_device *mdev)
+{
+	mdev->dp = kzalloc(MIC_DP_SIZE, GFP_KERNEL);
+	if (!mdev->dp) {
+		dev_err(&mdev->pdev->dev, "%s %d err %d\n",
+			__func__, __LINE__, -ENOMEM);
+		return -ENOMEM;
+	}
+
+	mdev->dp_dma_addr = mic_map_single(mdev,
+		mdev->dp, MIC_DP_SIZE);
+	if (mic_map_error(mdev->dp_dma_addr)) {
+		kfree(mdev->dp);
+		dev_err(&mdev->pdev->dev, "%s %d err %d\n",
+			__func__, __LINE__, -ENOMEM);
+		return -ENOMEM;
+	}
+	mdev->ops->write_spad(mdev, MIC_DPLO_SPAD, mdev->dp_dma_addr);
+	mdev->ops->write_spad(mdev, MIC_DPHI_SPAD, mdev->dp_dma_addr >> 32);
+	return 0;
+}
+
+/* Uninitialize the device page */
+static void mic_dp_uninit(struct mic_device *mdev)
+{
+	mic_unmap_single(mdev, mdev->dp_dma_addr, MIC_DP_SIZE);
+	kfree(mdev->dp);
+}
+
+/**
+ * mic_shutdown_db - Shutdown doorbell interrupt handler.
+ */
+static irqreturn_t mic_shutdown_db(int irq, void *data)
+{
+	struct mic_device *mdev = data;
+	struct mic_bootparam *bootparam = mdev->dp;
+
+	mdev->ops->ack_interrupt(mdev);
+
+	switch (bootparam->shutdown_status) {
+	case MIC_HALTED:
+	case MIC_POWER_OFF:
+	case MIC_RESTART:
+		/* Fall through */
+	case MIC_CRASHED:
+		schedule_work(&mdev->shutdown_work);
+		break;
+	default:
+		break;
+	}
+	return IRQ_HANDLED;
+}
+
 /*
  * mic_invoke_callback - Invoke callback functions registered for
  * the corresponding source id.
@@ -759,6 +816,24 @@ mic_device_init(struct mic_device *mdev, struct pci_dev *pdev)
 	mic_sysfs_init(mdev);
 	mutex_init(&mdev->mic_mutex);
 	mdev->irq_info.next_avail_src = 0;
+	INIT_WORK(&mdev->reset_trigger_work, mic_reset_trigger_work);
+	INIT_WORK(&mdev->shutdown_work, mic_shutdown_work);
+}
+
+/**
+ * mic_device_uninit - Frees resources allocated during mic_device_init(..)
+ *
+ * @mdev: pointer to mic_device instance
+ *
+ * returns none
+ */
+static void mic_device_uninit(struct mic_device *mdev)
+{
+	/* The cmdline sysfs entry might have allocated cmdline */
+	kfree(mdev->cmdline);
+	kfree(mdev->firmware);
+	flush_work(&mdev->reset_trigger_work);
+	flush_work(&mdev->shutdown_work);
 }
 
 /**
@@ -793,7 +868,7 @@ static int __init mic_probe(struct pci_dev *pdev,
 	rc = pci_enable_device(pdev);
 	if (rc) {
 		dev_err(&pdev->dev, "failed to enable pci device.\n");
-		goto free_device;
+		goto uninit_device;
 	}
 
 	pci_set_master(pdev);
@@ -840,6 +915,7 @@ static int __init mic_probe(struct pci_dev *pdev,
 		dev_err(&pdev->dev, "smpt_init failed %d\n", rc);
 		goto free_interrupts;
 	}
+
 	pci_set_drvdata(pdev, mdev);
 
 	mdev->sdev = device_create(g_mic.mic_class, &pdev->dev,
@@ -855,8 +931,41 @@ static int __init mic_probe(struct pci_dev *pdev,
 		dev_err(&pdev->dev, "sysfs_create_group failed rc %d\n", rc);
 		goto destroy_device;
 	}
+	mdev->state_sysfs = sysfs_get_dirent(mdev->sdev->kobj.sd,
+		NULL, "state");
+	if (!mdev->state_sysfs) {
+		rc = -ENODEV;
+		dev_err(&pdev->dev, "sysfs_get_dirent failed rc %d\n", rc);
+		goto destroy_group;
+	}
+
+	rc = mic_dp_init(mdev);
+	if (rc) {
+		dev_err(&pdev->dev, "mic_dp_init failed rc %d\n", rc);
+		goto sysfs_put;
+	}
+	mutex_lock(&mdev->mic_mutex);
+
+	mdev->shutdown_db = mic_next_db(mdev);
+	mdev->shutdown_cookie = mic_request_irq(mdev, mic_shutdown_db,
+		"shutdown-interrupt", mdev, mdev->shutdown_db, MIC_INTR_DB);
+	if (IS_ERR(mdev->shutdown_cookie)) {
+		rc = PTR_ERR(mdev->shutdown_cookie);
+		mutex_unlock(&mdev->mic_mutex);
+		goto dp_uninit;
+	}
+	mutex_unlock(&mdev->mic_mutex);
+	mic_bootparam_init(mdev);
+
+	mic_create_debug_dir(mdev);
 	dev_info(&pdev->dev, "Probe successful for %s\n", mdev->name);
 	return 0;
+dp_uninit:
+	mic_dp_uninit(mdev);
+sysfs_put:
+	sysfs_put(mdev->state_sysfs);
+destroy_group:
+	sysfs_remove_group(&mdev->sdev->kobj, &mdev->attr_group);
 destroy_device:
 	device_destroy(g_mic.mic_class, MKDEV(MAJOR(g_mic.mdev_id), mdev->id));
 smpt_uninit:
@@ -871,7 +980,8 @@ release_regions:
 	pci_release_regions(pdev);
 disable_device:
 	pci_disable_device(pdev);
-free_device:
+uninit_device:
+	mic_device_uninit(mdev);
 	kfree(mdev);
 dec_num_dev:
 	g_mic.next_id--;
@@ -897,12 +1007,21 @@ static void mic_remove(struct pci_dev *pdev)
 
 	id = mdev->id;
 
+	mic_stop(mdev, false);
+	mic_delete_debug_dir(mdev);
+	mutex_lock(&mdev->mic_mutex);
+	mic_free_irq(mdev, mdev->shutdown_cookie, mdev);
+	mutex_unlock(&mdev->mic_mutex);
+	flush_work(&mdev->shutdown_work);
+	mic_dp_uninit(mdev);
+	sysfs_put(mdev->state_sysfs);
 	sysfs_remove_group(&mdev->sdev->kobj, &mdev->attr_group);
 	device_destroy(g_mic.mic_class, MKDEV(MAJOR(g_mic.mdev_id), mdev->id));
 	mic_smpt_uninit(mdev);
 	mic_free_interrupts(mdev);
 	iounmap(mdev->mmio.va);
 	iounmap(mdev->aper.va);
+	mic_device_uninit(mdev);
 	pci_release_regions(pdev);
 	pci_disable_device(pdev);
 	kfree(mdev);
@@ -933,13 +1052,15 @@ static int __init mic_init(void)
 		goto cleanup_chrdev;
 	}
 
+	mic_init_debugfs();
 	ret = pci_register_driver(&mic_driver);
 	if (ret) {
 		pr_err("pci_register_driver failed ret %d\n", ret);
-		goto class_destroy;
+		goto cleanup_debugfs;
 	}
 	return ret;
-class_destroy:
+cleanup_debugfs:
+	mic_exit_debugfs();
 	class_destroy(g_mic.mic_class);
 cleanup_chrdev:
 	unregister_chrdev_region(g_mic.mdev_id, MIC_MAX_NUM_DEVS);
@@ -950,6 +1071,7 @@ error:
 static void __exit mic_exit(void)
 {
 	pci_unregister_driver(&mic_driver);
+	mic_exit_debugfs();
 	class_destroy(g_mic.mic_class);
 	unregister_chrdev_region(g_mic.mdev_id, MIC_MAX_NUM_DEVS);
 }
diff --git a/drivers/misc/mic/host/mic_sysfs.c b/drivers/misc/mic/host/mic_sysfs.c
index fe0605d..b9f5b1c 100644
--- a/drivers/misc/mic/host/mic_sysfs.c
+++ b/drivers/misc/mic/host/mic_sysfs.c
@@ -23,6 +23,48 @@
 
 #include "mic_common.h"
 
+/*
+ * A state-to-string lookup table, for exposing a human readable state
+ * via sysfs. Always keep in sync with enum mic_states
+ */
+static const char * const mic_state_string[] = {
+	[MIC_OFFLINE] = "offline",
+	[MIC_ONLINE] = "online",
+	[MIC_SHUTTING_DOWN] = "shutting_down",
+	[MIC_RESET_FAILED] = "reset_failed",
+};
+
+/*
+ * A shutdown-status-to-string lookup table, for exposing a human
+ * readable state via sysfs. Always keep in sync with enum mic_shutdown_status
+ */
+static const char * const mic_shutdown_status_string[] = {
+	[MIC_NOP] = "nop",
+	[MIC_CRASHED] = "crashed",
+	[MIC_HALTED] = "halted",
+	[MIC_POWER_OFF] = "poweroff",
+	[MIC_RESTART] = "restart",
+};
+
+void mic_set_shutdown_status(struct mic_device *mdev, u8 shutdown_status)
+{
+	WARN_ON(!mutex_is_locked(&mdev->mic_mutex));
+	dev_info(&mdev->pdev->dev, "Shutdown Status %s -> %s\n",
+		mic_shutdown_status_string[mdev->shutdown_status],
+		mic_shutdown_status_string[shutdown_status]);
+	mdev->shutdown_status = shutdown_status;
+}
+
+void mic_set_state(struct mic_device *mdev, u8 state)
+{
+	WARN_ON(!mutex_is_locked(&mdev->mic_mutex));
+	dev_info(&mdev->pdev->dev, "State %s -> %s\n",
+		mic_state_string[mdev->state],
+		mic_state_string[state]);
+	mdev->state = state;
+	sysfs_notify_dirent(mdev->state_sysfs);
+}
+
 static ssize_t
 mic_show_family(struct device *dev, struct device_attribute *attr, char *buf)
 {
@@ -81,9 +123,187 @@ mic_show_stepping(struct device *dev, struct device_attribute *attr, char *buf)
 }
 static DEVICE_ATTR(stepping, S_IRUGO, mic_show_stepping, NULL);
 
+static ssize_t
+mic_show_state(struct device *dev, struct device_attribute *attr, char *buf)
+{
+	struct mic_device *mdev = dev_get_drvdata(dev->parent);
+
+	if (!mdev || mdev->state >= MIC_LAST)
+		return -EINVAL;
+
+	return snprintf(buf, PAGE_SIZE, "%s",
+		mic_state_string[mdev->state]);
+}
+
+static ssize_t
+mic_store_state(struct device *dev, struct device_attribute *attr,
+	const char *buf, size_t count)
+{
+	int rc = 0;
+	struct mic_device *mdev = dev_get_drvdata(dev->parent);
+	if (!mdev)
+		return -EINVAL;
+	if (!strncmp(buf, "boot", strlen("boot"))) {
+		rc = mic_start(mdev, buf);
+		if (rc) {
+			dev_err(&mdev->pdev->dev,
+				"mic_start failed rc %d\n", rc);
+			count = rc;
+		}
+		goto done;
+	}
+
+	if (sysfs_streq(buf, "reset")) {
+		schedule_work(&mdev->reset_trigger_work);
+		goto done;
+	}
+
+	if (sysfs_streq(buf, "shutdown")) {
+		mic_shutdown(mdev);
+		goto done;
+	}
+
+	count = -EINVAL;
+done:
+	return count;
+}
+static DEVICE_ATTR(state, S_IRUGO|S_IWUSR, mic_show_state, mic_store_state);
+
+static ssize_t mic_show_shutdown_status(struct device *dev,
+	struct device_attribute *attr, char *buf)
+{
+	struct mic_device *mdev = dev_get_drvdata(dev->parent);
+
+	if (!mdev || mdev->shutdown_status >= MIC_STATUS_LAST)
+		return -EINVAL;
+
+	return snprintf(buf, PAGE_SIZE, "%s\n",
+		mic_shutdown_status_string[mdev->shutdown_status]);
+}
+static DEVICE_ATTR(shutdown_status, S_IRUGO,
+	mic_show_shutdown_status, NULL);
+
+static ssize_t
+mic_show_cmdline(struct device *dev, struct device_attribute *attr, char *buf)
+{
+	struct mic_device *mdev = dev_get_drvdata(dev->parent);
+	char *cmdline;
+
+	if (!mdev)
+		return -EINVAL;
+
+	cmdline = mdev->cmdline;
+
+	if (cmdline)
+		return snprintf(buf, PAGE_SIZE, "%s\n", cmdline);
+	return 0;
+}
+
+static ssize_t
+mic_store_cmdline(struct device *dev, struct device_attribute *attr,
+	const char *buf, size_t count)
+{
+	struct mic_device *mdev = dev_get_drvdata(dev->parent);
+
+	if (!mdev)
+		return -EINVAL;
+
+	kfree(mdev->cmdline);
+	mdev->cmdline = NULL;
+	if (!count)
+		return 0;
+
+	mdev->cmdline = kmalloc(count + 1, GFP_KERNEL);
+	if (!mdev->cmdline)
+		return -ENOMEM;
+	strncpy(mdev->cmdline, buf, count);
+	mdev->cmdline[count] = '\0';
+	if (mdev->cmdline[count - 1] == '\n')
+		mdev->cmdline[count - 1] = '\0';
+
+	return count;
+}
+static DEVICE_ATTR(cmdline, S_IRUGO | S_IWUSR,
+	mic_show_cmdline, mic_store_cmdline);
+
+static ssize_t
+mic_show_log_buf_addr(struct device *dev, struct device_attribute *attr,
+	char *buf)
+{
+	struct mic_device *mdev = dev_get_drvdata(dev->parent);
+
+	if (!mdev)
+		return -EINVAL;
+
+	return snprintf(buf, PAGE_SIZE, "%p\n", mdev->log_buf_addr);
+}
+
+static ssize_t
+mic_store_log_buf_addr(struct device *dev, struct device_attribute *attr,
+	const char *buf, size_t count)
+{
+	struct mic_device *mdev = dev_get_drvdata(dev->parent);
+	int ret;
+	unsigned long addr;
+
+	if (!mdev)
+		return -EINVAL;
+
+	ret = kstrtoul(buf, 16, &addr);
+	if (ret)
+		goto exit;
+
+	mdev->log_buf_addr = (void *)addr;
+	ret = count;
+exit:
+	return ret;
+}
+static DEVICE_ATTR(log_buf_addr, S_IRUGO | S_IWUSR,
+	mic_show_log_buf_addr, mic_store_log_buf_addr);
+
+static ssize_t
+mic_show_log_buf_len(struct device *dev, struct device_attribute *attr,
+	char *buf)
+{
+	struct mic_device *mdev = dev_get_drvdata(dev->parent);
+
+	if (!mdev)
+		return -EINVAL;
+
+	return snprintf(buf, PAGE_SIZE, "%p\n", mdev->log_buf_len);
+}
+
+static ssize_t
+mic_store_log_buf_len(struct device *dev, struct device_attribute *attr,
+	const char *buf, size_t count)
+{
+	struct mic_device *mdev = dev_get_drvdata(dev->parent);
+	int ret;
+	unsigned long addr;
+
+	if (!mdev)
+		return -EINVAL;
+
+	ret = kstrtoul(buf, 16, &addr);
+	if (ret)
+		goto exit;
+
+	mdev->log_buf_len = (int *)addr;
+	ret = count;
+exit:
+	return ret;
+}
+static DEVICE_ATTR(log_buf_len, S_IRUGO | S_IWUSR,
+	mic_show_log_buf_len, mic_store_log_buf_len);
+
 static struct attribute *default_attrs[] = {
 	&dev_attr_family.attr,
 	&dev_attr_stepping.attr,
+	&dev_attr_state.attr,
+	&dev_attr_shutdown_status.attr,
+	&dev_attr_cmdline.attr,
+	&dev_attr_log_buf_addr.attr,
+	&dev_attr_log_buf_len.attr,
 
 	NULL
 };
diff --git a/drivers/misc/mic/host/mic_x100.c b/drivers/misc/mic/host/mic_x100.c
index 055bd00..d9e9322 100644
--- a/drivers/misc/mic/host/mic_x100.c
+++ b/drivers/misc/mic/host/mic_x100.c
@@ -20,6 +20,9 @@
  */
 #include <linux/fs.h>
 #include <linux/pci.h>
+#include <linux/sched.h>
+#include <linux/firmware.h>
+#include <linux/delay.h>
 
 #include "mic_common.h"
 
@@ -267,6 +270,324 @@ mic_x100_program_msi_to_src_map(struct mic_device *mdev,
 	mic_mmio_write(mw, reg, mxar);
 }
 
+/*
+ * mic_x100_reset_fw_ready - Reset Firmware ready status field.
+ * @mdev: pointer to mic_device instance
+ */
+static void mic_x100_reset_fw_ready(struct mic_device *mdev)
+{
+	mdev->ops->write_spad(mdev, MIC_X100_DOWNLOAD_INFO, 0);
+}
+
+/*
+ * mic_x100_is_fw_ready - Check if firmware is ready.
+ * @mdev: pointer to mic_device instance
+ */
+static bool mic_x100_is_fw_ready(struct mic_device *mdev)
+{
+	u32 scratch2 = mdev->ops->read_spad(mdev, MIC_X100_DOWNLOAD_INFO);
+	return MIC_X100_SPAD2_DOWNLOAD_STATUS(scratch2) ? true : false;
+}
+
+/**
+ * mic_x100_get_apic_id - Get bootstrap APIC ID.
+ * @mdev: pointer to mic_device instance
+ */
+static u32 mic_x100_get_apic_id(struct mic_device *mdev)
+{
+	u32 scratch2 = 0;
+
+	scratch2 = mdev->ops->read_spad(mdev, MIC_X100_DOWNLOAD_INFO);
+	return MIC_X100_SPAD2_APIC_ID(scratch2);
+}
+
+/**
+ * mic_x100_send_firmware_intr - Send an interrupt to the firmware on MIC.
+ * @mdev: pointer to mic_device instance
+ */
+static void mic_x100_send_firmware_intr(struct mic_device *mdev)
+{
+	u32 apicicr_low;
+	u64 apic_icr_offset = MIC_X100_SBOX_APICICR7;
+	int vector = MIC_X100_BSP_INTERRUPT_VECTOR;
+	struct mic_mw *mw = &mdev->mmio;
+
+	/*
+	 * For MIC we need to make sure we "hit"
+	 * the send_icr bit (13).
+	 */
+	apicicr_low = (vector | (1 << 13));
+
+	mic_mmio_write(mw, mic_x100_get_apic_id(mdev),
+		MIC_X100_SBOX_BASE_ADDRESS + apic_icr_offset + 4);
+
+	/* Ensure all previous stores are ordered. */
+	wmb();
+	/*
+	 * The MIC card triggers the interrupt only when the lower
+	 * 32 bits of the ICR are written, so write them last.
+	 */
+	mic_mmio_write(mw, apicicr_low,
+		MIC_X100_SBOX_BASE_ADDRESS + apic_icr_offset);
+}
+
+/**
+ * mic_x100_hw_reset - Reset the MIC device.
+ * @mdev: pointer to mic_device instance
+ */
+static void mic_x100_hw_reset(struct mic_device *mdev)
+{
+	u32 reset_reg;
+	u32 rgcr = MIC_X100_SBOX_BASE_ADDRESS + MIC_X100_SBOX_RGCR;
+	struct mic_mw *mw = &mdev->mmio;
+
+	/* Ensure all previous loads and stores are ordered */
+	mb();
+	/* Trigger reset */
+	reset_reg = mic_mmio_read(mw, rgcr);
+	reset_reg |= 0x1;
+	mic_mmio_write(mw, reset_reg, rgcr);
+	/*
+	 * It seems we really want to delay at least 1 second
+	 * after touching reset to prevent a lot of problems.
+	 */
+	msleep(1000);
+}
+
+/**
+ * mic_x100_load_command_line - Load command line to MIC.
+ * @mdev: pointer to mic_device instance
+ * @fw: the firmware image
+ *
+ * RETURNS: An appropriate -ERRNO error value on error, or zero for success.
+ */
+static int
+mic_x100_load_command_line(struct mic_device *mdev, const struct firmware *fw)
+{
+	u32 len = 0;
+	u32 boot_mem;
+	char *buf;
+	void __iomem *cmd_line_va = mdev->aper.va + mdev->bootaddr + fw->size;
+#define CMDLINE_SIZE 2048
+
+	boot_mem = mdev->aper.len >> 20;
+	buf = kzalloc(CMDLINE_SIZE, GFP_KERNEL);
+	if (!buf) {
+		dev_err(&mdev->pdev->dev,
+			"%s %d allocation failed\n", __func__, __LINE__);
+		return -ENOMEM;
+	}
+	len += snprintf(buf, CMDLINE_SIZE - len,
+		" mem=%dM crashkernel=1M@80M", boot_mem);
+	if (mdev->cmdline)
+		snprintf(buf + len, CMDLINE_SIZE - len,
+				" %s", mdev->cmdline);
+	memcpy_toio(cmd_line_va, buf, strlen(buf) + 1);
+	kfree(buf);
+	return 0;
+}
+
+/**
+ * mic_x100_load_ramdisk - Load ramdisk to MIC.
+ * @mdev: pointer to mic_device instance
+ *
+ * RETURNS: An appropriate -ERRNO error value on error, or zero for success.
+ */
+static int
+mic_x100_load_ramdisk(struct mic_device *mdev)
+{
+	const struct firmware *fw;
+	int rc;
+	struct boot_params __iomem *bp = mdev->aper.va + mdev->bootaddr;
+
+	rc = request_firmware(&fw,
+			mdev->ramdisk, &mdev->pdev->dev);
+	if (rc < 0) {
+		dev_err(&mdev->pdev->dev,
+			"ramdisk request_firmware failed: %d %s\n",
+			rc, mdev->ramdisk);
+		goto error;
+	}
+	/*
+	 * Typically the bootaddr for card OS is 64M
+	 * so copy over the ramdisk @ 128M.
+	 */
+	memcpy_toio(mdev->aper.va + (mdev->bootaddr << 1),
+		fw->data, fw->size);
+	iowrite32(cpu_to_le32(mdev->bootaddr << 1), &bp->hdr.ramdisk_image);
+	iowrite32(cpu_to_le32(fw->size), &bp->hdr.ramdisk_size);
+	release_firmware(fw);
+error:
+	return rc;
+}
+
+/**
+ * mic_x100_get_boot_addr - Get MIC boot address.
+ * @mdev: pointer to mic_device instance
+ *
+ * This function is called during firmware load to determine
+ * the address at which the OS should be downloaded in card
+ * memory i.e. GDDR.
+ * RETURNS: An appropriate -ERRNO error value on error, or zero for success.
+ */
+static int
+mic_x100_get_boot_addr(struct mic_device *mdev)
+{
+	u32 scratch2, boot_addr;
+	int rc = 0;
+
+	scratch2 = mdev->ops->read_spad(mdev, MIC_X100_DOWNLOAD_INFO);
+	boot_addr = MIC_X100_SPAD2_DOWNLOAD_ADDR(scratch2);
+	dev_dbg(&mdev->pdev->dev, "%s %d boot_addr 0x%x\n",
+		__func__, __LINE__, boot_addr);
+	if (boot_addr > (1U << 31)) {
+		dev_err(&mdev->pdev->dev,
+			"incorrect bootaddr 0x%x\n",
+			boot_addr);
+		rc = -EINVAL;
+		goto error;
+	}
+	mdev->bootaddr = boot_addr;
+error:
+	return rc;
+}
+
+/* Either a Linux OS or an ELF for flash updates is currently supported */
+enum mic_mode {
+	MIC_LINUX = 0,
+	MIC_ELF,
+};
+
+static const char * const mic_boot_str[] = {
+	[MIC_LINUX] = "boot:linux:",
+	[MIC_ELF] = "boot:elf:",
+};
+
+/*
+ * mic_parse_fw_path - Parse firmware/ramdisk path.
+ * @mdev: pointer to mic_device instance
+ * @buf: buffer containing boot string.
+ *
+ * RETURNS: An appropriate -ERRNO error value on error, or mode on success.
+ */
+static int mic_parse_fw_path(struct mic_device *mdev, const char *buf)
+{
+	enum mic_mode mode;
+	char *firmware, *ramdisk = NULL;
+	const char *default_mm_image = "mic/RASMM.elf";
+	int len;
+
+	if (!strncmp(buf, mic_boot_str[MIC_LINUX],
+		strlen(mic_boot_str[MIC_LINUX]))) {
+		mode = MIC_LINUX;
+		len = strlen(mic_boot_str[MIC_LINUX]);
+	} else if (!strncmp(buf, mic_boot_str[MIC_ELF],
+		strlen(mic_boot_str[MIC_ELF]))) {
+		mode = MIC_ELF;
+		len = strlen(mic_boot_str[MIC_ELF]);
+	} else {
+		dev_err(&mdev->pdev->dev,
+			"incorrect boot string %s\n", buf);
+		return -EINVAL;
+	}
+	buf += len;
+	len = strlen(buf);
+	if (len <= 1 && mode == MIC_ELF) {
+		buf = default_mm_image;
+		len = strlen(default_mm_image);
+	}
+	firmware = kmalloc(len + 1, GFP_KERNEL);
+	if (!firmware)
+		return -ENOMEM;
+	memcpy(firmware, buf, len);
+	if (len && '\n' == firmware[len - 1])
+		firmware[len - 1] = '\0';
+	else
+		firmware[len] = '\0';
+	if (MIC_LINUX == mode) {
+		/*
+		 * if booting linux, the ramdisk image will likely follow.
+		 * The format is "boot:linux:<fw_path>:<ramdisk_path>"
+		 */
+		ramdisk = strchr(firmware, ':');
+		if (ramdisk)
+			*ramdisk++ = '\0';
+	}
+	kfree(mdev->firmware);
+	mdev->firmware = firmware;
+	mdev->ramdisk = ramdisk;
+	return mode;
+}
+
+/**
+ * mic_x100_load_firmware - Load firmware to MIC.
+ * @mdev: pointer to mic_device instance
+ * @buf: buffer containing boot string including firmware/ramdisk path.
+ *
+ * RETURNS: An appropriate -ERRNO error value on error, or zero for success.
+ */
+static int
+mic_x100_load_firmware(struct mic_device *mdev, const char *buf)
+{
+	int rc, mode;
+	const struct firmware *fw;
+
+	rc = mic_x100_get_boot_addr(mdev);
+	if (rc)
+		goto error;
+	mode = mic_parse_fw_path(mdev, buf);
+	if (mode < 0) {
+		rc = mode;
+		goto error;
+	}
+	/* load OS */
+	rc = request_firmware(&fw, mdev->firmware, &mdev->pdev->dev);
+	if (rc < 0) {
+		dev_err(&mdev->pdev->dev,
+			"firmware request_firmware failed: %d %s\n",
+			rc, mdev->firmware);
+		goto error;
+	}
+	if (mdev->bootaddr > mdev->aper.len - fw->size) {
+		rc = -EINVAL;
+		dev_err(&mdev->pdev->dev, "%s %d rc %d bootaddr 0x%x\n",
+			__func__, __LINE__, rc, mdev->bootaddr);
+		release_firmware(fw);
+		goto error;
+	}
+	memcpy_toio(mdev->aper.va + mdev->bootaddr, fw->data, fw->size);
+	mdev->ops->write_spad(mdev, MIC_X100_FW_SIZE, fw->size);
+	if (MIC_LINUX == mode) /* load command line */
+		rc = mic_x100_load_command_line(mdev, fw);
+	release_firmware(fw);
+	if (rc) {
+		dev_err(&mdev->pdev->dev, "%s %d rc %d\n",
+			__func__, __LINE__, rc);
+		goto error;
+	}
+	if (MIC_ELF == mode)
+		goto done;
+	/* load ramdisk */
+	if (mdev->ramdisk)
+		rc = mic_x100_load_ramdisk(mdev);
+error:
+	dev_dbg(&mdev->pdev->dev, "%s %d rc %d\n",
+			__func__, __LINE__, rc);
+done:
+	return rc;
+}
+
+/**
+ * mic_x100_get_postcode - Get postcode status from firmware.
+ * @mdev: pointer to mic_device instance
+ *
+ * RETURNS: postcode.
+ */
+static u32 mic_x100_get_postcode(struct mic_device *mdev)
+{
+	return mic_mmio_read(&mdev->mmio, MIC_X100_POSTCODE);
+}
+
 /**
  * mic_x100_smpt_set - Update an SMPT entry with a DMA address.
  * @mdev: pointer to mic_device instance
@@ -323,6 +644,12 @@ struct mic_hw_ops mic_x100_ops = {
 	.write_spad = mic_x100_write_spad,
 	.send_intr = mic_x100_send_intr,
 	.ack_interrupt = mic_x100_ack_interrupt,
+	.reset = mic_x100_hw_reset,
+	.reset_fw_ready = mic_x100_reset_fw_ready,
+	.is_fw_ready = mic_x100_is_fw_ready,
+	.send_firmware_intr = mic_x100_send_firmware_intr,
+	.load_mic_fw = mic_x100_load_firmware,
+	.get_postcode = mic_x100_get_postcode,
 };
 
 struct mic_hw_intr_ops mic_x100_intr_ops = {
diff --git a/drivers/misc/mic/host/mic_x100.h b/drivers/misc/mic/host/mic_x100.h
index fd98b2b..227f73b 100644
--- a/drivers/misc/mic/host/mic_x100.h
+++ b/drivers/misc/mic/host/mic_x100.h
@@ -70,6 +70,15 @@
 #define MIC_X100_NUM_RDMASR_IRQ 8
 #define MIC_X100_RDMASR_IRQ_BASE 17
 #define MIC_NUM_OFFSETS 32
+#define MIC_X100_SPAD2_DOWNLOAD_STATUS(x) ((x) & 0x1)
+#define MIC_X100_SPAD2_APIC_ID(x)	(((x) >> 1) & 0x1ff)
+#define MIC_X100_SPAD2_DOWNLOAD_ADDR(x) ((x) & 0xfffff000)
+#define MIC_X100_SBOX_APICICR7 0x0000AA08
+#define MIC_X100_SBOX_RGCR 0x00004010
+#define MIC_X100_SBOX_SDBIC0 0x0000CC90
+#define MIC_X100_DOWNLOAD_INFO 2
+#define MIC_X100_FW_SIZE 5
+#define MIC_X100_POSTCODE 0x242c
 
 static const u16 mic_x100_intr_init[] = {
 		MIC_X100_DOORBELL_IDX_START,
@@ -80,6 +89,9 @@ static const u16 mic_x100_intr_init[] = {
 		MIC_X100_NUM_ERR,
 };
 
+/* Host->Card(bootstrap) Interrupt Vector */
+#define MIC_X100_BSP_INTERRUPT_VECTOR 229
+
 extern struct mic_hw_ops mic_x100_ops;
 extern struct mic_smpt_ops mic_x100_smpt_ops;
 extern struct mic_hw_intr_ops mic_x100_intr_ops;
diff --git a/include/uapi/linux/Kbuild b/include/uapi/linux/Kbuild
index bdc6e87..8f985dd 100644
--- a/include/uapi/linux/Kbuild
+++ b/include/uapi/linux/Kbuild
@@ -239,6 +239,7 @@ header-y += media.h
 header-y += mei.h
 header-y += mempolicy.h
 header-y += meye.h
+header-y += mic_common.h
 header-y += mii.h
 header-y += minix_fs.h
 header-y += mman.h
diff --git a/include/uapi/linux/mic_common.h b/include/uapi/linux/mic_common.h
new file mode 100644
index 0000000..a9091e5
--- /dev/null
+++ b/include/uapi/linux/mic_common.h
@@ -0,0 +1,74 @@
+/*
+ * Intel MIC Platform Software Stack (MPSS)
+ *
+ * Copyright(c) 2013 Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * General Public License for more details.
+ *
+ * The full GNU General Public License is included in this distribution in
+ * the file called "COPYING".
+ *
+ * Intel MIC driver.
+ *
+ */
+#ifndef __MIC_COMMON_H_
+#define __MIC_COMMON_H_
+
+#include <linux/types.h>
+
+/**
+ * struct mic_bootparam: Virtio device independent information in device page
+ *
+ * @magic: A magic value used by the card to ensure it can see the host
+ * @c2h_shutdown_db: Card to Host shutdown doorbell set by host
+ * @h2c_shutdown_db: Host to Card shutdown doorbell set by card
+ * @h2c_config_db: Host to Card Virtio config doorbell set by card
+ * @shutdown_status: Card shutdown status set by card
+ * @shutdown_card: Set to 1 by the host when a card shutdown is initiated
+ */
+struct mic_bootparam {
+	__u32 magic;
+	__s8 c2h_shutdown_db;
+	__s8 h2c_shutdown_db;
+	__s8 h2c_config_db;
+	__u8 shutdown_status;
+	__u8 shutdown_card;
+} __attribute__ ((aligned(8)));
+
+/* Device page size */
+#define MIC_DP_SIZE 4096
+
+#define MIC_MAGIC 0xc0ffee00
+
+/**
+ * enum mic_states - MIC states.
+ */
+enum mic_states {
+	MIC_OFFLINE = 0,
+	MIC_ONLINE,
+	MIC_SHUTTING_DOWN,
+	MIC_RESET_FAILED,
+	MIC_LAST
+};
+
+/**
+ * enum mic_status - MIC status reported by card after
+ * a host or card initiated shutdown or a card crash.
+ */
+enum mic_status {
+	MIC_NOP = 0,
+	MIC_CRASHED,
+	MIC_HALTED,
+	MIC_POWER_OFF,
+	MIC_RESTART,
+	MIC_STATUS_LAST
+};
+
+#endif
-- 
1.8.2.1
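
Reviewer note on the "state" sysfs boot strings handled by patch 3/7: the
following is a minimal userspace sketch, not the driver code, of how a string
of the form "boot:linux:<fw_path>:<ramdisk_path>" or "boot:elf:<fw_path>" can
be split into a mode, a firmware path and an optional ramdisk path. The names
parse_boot_string, struct boot_req, enum boot_mode and the fixed 256-byte
buffers are hypothetical and chosen only for illustration; the prefixes and
the default image name "mic/RASMM.elf" are taken from the patch. Unlike the
patch, the sketch strips the trailing newline before checking for an empty
path.

```c
#include <string.h>

enum boot_mode { BOOT_LINUX = 0, BOOT_ELF = 1 };

struct boot_req {
	enum boot_mode mode;
	char firmware[256];
	char ramdisk[256];	/* empty string when no ramdisk was given */
};

/* Returns 0 on success, -1 on a malformed or oversized boot string. */
static int parse_boot_string(const char *buf, struct boot_req *req)
{
	static const char linux_pfx[] = "boot:linux:";
	static const char elf_pfx[] = "boot:elf:";
	const char *path;
	size_t len;

	if (!strncmp(buf, linux_pfx, strlen(linux_pfx))) {
		req->mode = BOOT_LINUX;
		path = buf + strlen(linux_pfx);
	} else if (!strncmp(buf, elf_pfx, strlen(elf_pfx))) {
		req->mode = BOOT_ELF;
		path = buf + strlen(elf_pfx);
	} else {
		return -1;
	}

	/* Ignore a single trailing newline, as written by "echo". */
	len = strlen(path);
	if (len && path[len - 1] == '\n')
		len--;

	/* An empty ELF path falls back to the default image. */
	if (!len && req->mode == BOOT_ELF) {
		path = "mic/RASMM.elf";
		len = strlen(path);
	}
	if (len >= sizeof(req->firmware))
		return -1;
	memcpy(req->firmware, path, len);
	req->firmware[len] = '\0';
	req->ramdisk[0] = '\0';

	/* For Linux boots an optional ":<ramdisk_path>" may follow. */
	if (req->mode == BOOT_LINUX) {
		char *colon = strchr(req->firmware, ':');

		if (colon) {
			*colon++ = '\0';
			/* Fits: the tail is shorter than firmware[]. */
			strcpy(req->ramdisk, colon);
		}
	}
	return 0;
}
```

Checking the length before indexing path[len - 1] is the design point here:
it keeps an empty path from reading one byte before the buffer.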

* [PATCH v2 4/7] Intel MIC Card Driver for X100 family.
From: Sudeep Dutt @ 2013-08-08  3:04 UTC (permalink / raw)
  To: Greg Kroah-Hartman, Arnd Bergmann, Rusty Russell,
	Michael S. Tsirkin, Rob Landley, linux-kernel, virtualization,
	linux-doc
  Cc: Harshavardhan R Kharche, Peter P Waskiewicz Jr,
	Yaozu (Eddie) Dong, Sudeep Dutt, Ashutosh Dixit, Asias He,
	Caz Yokoyama, Dasaratharaman Chandramouli

This patch does the following:
a) Initializes the Intel MIC X100 platform device and driver.
b) Sets up support to handle shutdown requests from the host.
c) Maps the device page after obtaining the device page address
from the scratchpad registers updated by the host.
d) Informs the host upon a card crash by registering a panic notifier.
e) Informs the host upon a poweroff/halt event.

Co-author: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Signed-off-by: Caz Yokoyama <Caz.Yokoyama@intel.com>
Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Signed-off-by: Nikhil Rao <nikhil.rao@intel.com>
Signed-off-by: Harshavardhan R Kharche <harshavardhan.r.kharche@intel.com>
Signed-off-by: Sudeep Dutt <sudeep.dutt@intel.com>
Acked-by: Yaozu (Eddie) Dong <eddie.dong@intel.com>
Reviewed-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
---
 drivers/misc/mic/Kconfig            |  18 +++
 drivers/misc/mic/Makefile           |   1 +
 drivers/misc/mic/card/Makefile      |  10 ++
 drivers/misc/mic/card/mic_common.h  |  36 +++++
 drivers/misc/mic/card/mic_debugfs.c | 132 ++++++++++++++++
 drivers/misc/mic/card/mic_debugfs.h |  35 +++++
 drivers/misc/mic/card/mic_device.c  | 299 ++++++++++++++++++++++++++++++++++++
 drivers/misc/mic/card/mic_device.h  | 127 +++++++++++++++
 drivers/misc/mic/card/mic_x100.c    | 256 ++++++++++++++++++++++++++++++
 drivers/misc/mic/card/mic_x100.h    |  48 ++++++
 10 files changed, 962 insertions(+)
 create mode 100644 drivers/misc/mic/card/Makefile
 create mode 100644 drivers/misc/mic/card/mic_common.h
 create mode 100644 drivers/misc/mic/card/mic_debugfs.c
 create mode 100644 drivers/misc/mic/card/mic_debugfs.h
 create mode 100644 drivers/misc/mic/card/mic_device.c
 create mode 100644 drivers/misc/mic/card/mic_device.h
 create mode 100644 drivers/misc/mic/card/mic_x100.c
 create mode 100644 drivers/misc/mic/card/mic_x100.h
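
Note (not part of the commit message): items b) and c) above rest on two
small pieces of arithmetic, sketched below in plain userspace C under stated
assumptions. The helper names dp_addr_from_spads, dp_magic_ok and next_db are
hypothetical; only the MIC_MAGIC value, the lo/hi scratchpad combination and
the "doorbell with minimum usage count" rule come from the patch. The real
driver reads the scratchpads and the device page through MMIO accessors,
which the sketch replaces with plain integer arguments.

```c
#include <stdint.h>

#define MIC_MAGIC 0xc0ffee00u

/*
 * Combine the two 32-bit scratchpad values written by the host into
 * the 64-bit DMA address of the device page.
 */
static uint64_t dp_addr_from_spads(uint32_t lo, uint32_t hi)
{
	return (uint64_t)lo | ((uint64_t)hi << 32);
}

/* The card accepts the device page only if the magic value matches. */
static int dp_magic_ok(uint32_t magic)
{
	return magic == MIC_MAGIC;
}

/*
 * Pick the doorbell index with the lowest usage count. Ties resolve
 * to the lowest index because only a strictly smaller count wins.
 */
static int next_db(const unsigned int *usage, int num)
{
	int i, index = 0;

	for (i = 1; i < num; i++)
		if (usage[i] < usage[index])
			index = i;
	return index;
}
```

Widening hi to 64 bits before the shift matters: shifting a 32-bit value by
32 is undefined behaviour in C.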

diff --git a/drivers/misc/mic/Kconfig b/drivers/misc/mic/Kconfig
index aaefd0c..279a2e6 100644
--- a/drivers/misc/mic/Kconfig
+++ b/drivers/misc/mic/Kconfig
@@ -17,3 +17,21 @@ config INTEL_MIC_HOST
 	  More information about the Intel MIC family as well as the Linux
 	  OS and tools for MIC to use with this driver are available from
 	  <http://software.intel.com/en-us/mic-developer>.
+
+comment "Intel MIC Card Driver"
+
+config INTEL_MIC_CARD
+	tristate "Intel MIC Card Driver"
+	depends on 64BIT
+	default n
+	help
+	  This enables card driver support for the Intel Many Integrated
+	  Core (MIC) device family. The card driver communicates shutdown/
+	  crash events to the host and allows registration/configuration of
+	  virtio devices. Intel MIC X100 devices are currently supported.
+
+	  If you are building a card kernel for an Intel MIC device then
+	  say M (recommended) or Y, else say N. If unsure say N.
+
+	  For more information see
+	  <http://software.intel.com/en-us/mic-developer>.
diff --git a/drivers/misc/mic/Makefile b/drivers/misc/mic/Makefile
index 8e72421..05b34d6 100644
--- a/drivers/misc/mic/Makefile
+++ b/drivers/misc/mic/Makefile
@@ -3,3 +3,4 @@
 # Copyright(c) 2013, Intel Corporation.
 #
 obj-$(CONFIG_INTEL_MIC_HOST) += host/
+obj-$(CONFIG_INTEL_MIC_CARD) += card/
diff --git a/drivers/misc/mic/card/Makefile b/drivers/misc/mic/card/Makefile
new file mode 100644
index 0000000..6e9675e
--- /dev/null
+++ b/drivers/misc/mic/card/Makefile
@@ -0,0 +1,10 @@
+#
+# Makefile - Intel MIC Linux driver.
+# Copyright(c) 2013, Intel Corporation.
+#
+ccflags-y += -DINTEL_MIC_CARD
+
+obj-$(CONFIG_INTEL_MIC_CARD) += mic_card.o
+mic_card-y += mic_x100.o
+mic_card-y += mic_device.o
+mic_card-y += mic_debugfs.o
diff --git a/drivers/misc/mic/card/mic_common.h b/drivers/misc/mic/card/mic_common.h
new file mode 100644
index 0000000..daceb9b
--- /dev/null
+++ b/drivers/misc/mic/card/mic_common.h
@@ -0,0 +1,36 @@
+/*
+ * Intel MIC Platform Software Stack (MPSS)
+ *
+ * Copyright(c) 2013 Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * General Public License for more details.
+ *
+ * The full GNU General Public License is included in this distribution in
+ * the file called "COPYING".
+ *
+ * Disclaimer: The codes contained in these modules may be specific to
+ * the Intel Software Development Platform codenamed: Knights Ferry, and
+ * the Intel product codenamed: Knights Corner, and are not backward
+ * compatible with other Intel products. Additionally, Intel will NOT
+ * support the codes or instruction set in future products.
+ *
+ * Intel MIC Card driver.
+ *
+ */
+#ifndef _MIC_CARD_COMMON_H_
+#define _MIC_CARD_COMMON_H_
+
+#include <linux/mic_common.h>
+
+#include "../common/mic_device.h"
+#include "mic_device.h"
+#include "mic_x100.h"
+
+#endif
diff --git a/drivers/misc/mic/card/mic_debugfs.c b/drivers/misc/mic/card/mic_debugfs.c
new file mode 100644
index 0000000..007a227
--- /dev/null
+++ b/drivers/misc/mic/card/mic_debugfs.c
@@ -0,0 +1,132 @@
+/*
+ * Intel MIC Platform Software Stack (MPSS)
+ *
+ * Copyright(c) 2013 Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * General Public License for more details.
+ *
+ * The full GNU General Public License is included in this distribution in
+ * the file called "COPYING".
+ *
+ * Disclaimer: The codes contained in these modules may be specific to
+ * the Intel Software Development Platform codenamed: Knights Ferry, and
+ * the Intel product codenamed: Knights Corner, and are not backward
+ * compatible with other Intel products. Additionally, Intel will NOT
+ * support the codes or instruction set in future products.
+ *
+ * Intel MIC Card driver.
+ *
+ */
+#include <linux/fs.h>
+#include <linux/pci.h>
+#include <linux/sched.h>
+#include <linux/debugfs.h>
+#include <linux/module.h>
+#include <linux/delay.h>
+#include <linux/seq_file.h>
+
+#include "mic_common.h"
+#include "mic_debugfs.h"
+
+/* Debugfs parent dir */
+static struct dentry *mic_dbg;
+
+/**
+ * mic_intr_test - Send interrupts to host.
+ */
+static int mic_intr_test(struct seq_file *s, void *unused)
+{
+	struct mic_driver *mdrv = s->private;
+	struct mic_device *mdev = &mdrv->mdev;
+
+	mic_send_intr(mdev, 0);
+	msleep(1000);
+	mic_send_intr(mdev, 1);
+	msleep(1000);
+	mic_send_intr(mdev, 2);
+	msleep(1000);
+	mic_send_intr(mdev, 3);
+	msleep(1000);
+
+	return 0;
+}
+
+static int mic_intr_test_open(struct inode *inode, struct file *file)
+{
+	return single_open(file, mic_intr_test, inode->i_private);
+}
+
+static int mic_intr_test_release(struct inode *inode, struct file *file)
+{
+	return single_release(inode, file);
+}
+
+static const struct file_operations intr_test_ops = {
+	.owner   = THIS_MODULE,
+	.open    = mic_intr_test_open,
+	.read    = seq_read,
+	.llseek  = seq_lseek,
+	.release = mic_intr_test_release
+};
+
+/**
+ * mic_create_card_debug_dir - Initialize MIC debugfs entries.
+ */
+void __init mic_create_card_debug_dir(struct mic_driver *mdrv)
+{
+	struct dentry *d;
+
+	if (!mic_dbg)
+		return;
+
+	mdrv->dbg_dir = debugfs_create_dir(mdrv->name, mic_dbg);
+	if (!mdrv->dbg_dir) {
+		dev_err(mdrv->dev, "Can't create dbg_dir %s\n", mdrv->name);
+		return;
+	}
+
+	d = debugfs_create_file("intr_test", 0444, mdrv->dbg_dir,
+		mdrv, &intr_test_ops);
+
+	if (!d) {
+		dev_err(mdrv->dev,
+			"Can't create dbg intr_test %s\n", mdrv->name);
+		return;
+	}
+}
+
+/**
+ * mic_delete_card_debug_dir - Uninitialize MIC debugfs entries.
+ */
+void mic_delete_card_debug_dir(struct mic_driver *mdrv)
+{
+	if (!mdrv->dbg_dir)
+		return;
+
+	debugfs_remove_recursive(mdrv->dbg_dir);
+}
+
+/**
+ * mic_init_card_debugfs - Initialize global debugfs entry.
+ */
+void __init mic_init_card_debugfs(void)
+{
+	mic_dbg = debugfs_create_dir(KBUILD_MODNAME, NULL);
+	if (!mic_dbg)
+		pr_err("can't create debugfs dir\n");
+}
+
+/**
+ * mic_exit_card_debugfs - Uninitialize global debugfs entry
+ */
+void mic_exit_card_debugfs(void)
+{
+	debugfs_remove(mic_dbg);
+}
diff --git a/drivers/misc/mic/card/mic_debugfs.h b/drivers/misc/mic/card/mic_debugfs.h
new file mode 100644
index 0000000..5bdf8c5
--- /dev/null
+++ b/drivers/misc/mic/card/mic_debugfs.h
@@ -0,0 +1,35 @@
+/*
+ * Intel MIC Platform Software Stack (MPSS)
+ *
+ * Copyright(c) 2013 Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * General Public License for more details.
+ *
+ * The full GNU General Public License is included in this distribution in
+ * the file called "COPYING".
+ *
+ * Disclaimer: The codes contained in these modules may be specific to
+ * the Intel Software Development Platform codenamed: Knights Ferry, and
+ * the Intel product codenamed: Knights Corner, and are not backward
+ * compatible with other Intel products. Additionally, Intel will NOT
+ * support the codes or instruction set in future products.
+ *
+ * Intel MIC Card driver.
+ *
+ */
+#ifndef _MIC_CARD_DEBUGFS_H_
+#define _MIC_CARD_DEBUGFS_H_
+
+void __init mic_create_card_debug_dir(struct mic_driver *mdrv);
+void mic_delete_card_debug_dir(struct mic_driver *mdrv);
+void __init mic_init_card_debugfs(void);
+void mic_exit_card_debugfs(void);
+
+#endif
diff --git a/drivers/misc/mic/card/mic_device.c b/drivers/misc/mic/card/mic_device.c
new file mode 100644
index 0000000..b186445
--- /dev/null
+++ b/drivers/misc/mic/card/mic_device.c
@@ -0,0 +1,299 @@
+/*
+ * Intel MIC Platform Software Stack (MPSS)
+ *
+ * Copyright(c) 2013 Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * General Public License for more details.
+ *
+ * The full GNU General Public License is included in this distribution in
+ * the file called "COPYING".
+ *
+ * Disclaimer: The codes contained in these modules may be specific to
+ * the Intel Software Development Platform codenamed: Knights Ferry, and
+ * the Intel product codenamed: Knights Corner, and are not backward
+ * compatible with other Intel products. Additionally, Intel will NOT
+ * support the codes or instruction set in future products.
+ *
+ * Intel MIC Card driver.
+ *
+ */
+#include <linux/module.h>
+#include <linux/pci.h>
+#include <linux/interrupt.h>
+#include <linux/reboot.h>
+
+#include "mic_common.h"
+#include "mic_debugfs.h"
+
+static struct mic_driver *g_drv;
+static struct mic_irq *shutdown_cookie;
+
+static void mic_notify_host(u8 state)
+{
+	struct mic_driver *mdrv = g_drv;
+	struct mic_bootparam __iomem *bootparam = mdrv->dp;
+
+	iowrite8(state, &bootparam->shutdown_status);
+	dev_info(mdrv->dev, "%s %d system_state %d\n",
+		__func__, __LINE__, state);
+	mic_send_intr(&mdrv->mdev, ioread8(&bootparam->c2h_shutdown_db));
+}
+
+static int mic_panic_event(struct notifier_block *this, unsigned long event,
+		void *ptr)
+{
+	struct mic_driver *mdrv = g_drv;
+	struct mic_bootparam __iomem *bootparam = mdrv->dp;
+
+	iowrite8(-1, &bootparam->h2c_config_db);
+	iowrite8(-1, &bootparam->h2c_shutdown_db);
+	mic_notify_host(MIC_CRASHED);
+	return NOTIFY_DONE;
+}
+
+static struct notifier_block mic_panic = {
+	.notifier_call  = mic_panic_event,
+};
+
+static irqreturn_t mic_shutdown_isr(int irq, void *data)
+{
+	struct mic_driver *mdrv = g_drv;
+	struct mic_bootparam __iomem *bootparam = mdrv->dp;
+
+	mic_ack_interrupt(&g_drv->mdev);
+	if (ioread8(&bootparam->shutdown_card))
+		orderly_poweroff(true);
+	return IRQ_HANDLED;
+}
+
+static int mic_shutdown_init(void)
+{
+	int rc = 0;
+	struct mic_driver *mdrv = g_drv;
+	struct mic_bootparam __iomem *bootparam = mdrv->dp;
+	int shutdown_db;
+
+	shutdown_db = mic_next_card_db();
+	shutdown_cookie = mic_request_card_irq(mic_shutdown_isr,
+			"Shutdown", mdrv, shutdown_db);
+	if (IS_ERR(shutdown_cookie))
+		rc = PTR_ERR(shutdown_cookie);
+	else
+		iowrite8(shutdown_db, &bootparam->h2c_shutdown_db);
+	return rc;
+}
+
+static void mic_shutdown_uninit(void)
+{
+	struct mic_driver *mdrv = g_drv;
+	struct mic_bootparam __iomem *bootparam = mdrv->dp;
+
+	iowrite8(-1, &bootparam->h2c_shutdown_db);
+	mic_free_card_irq(shutdown_cookie, mdrv);
+}
+
+static int __init mic_dp_init(void)
+{
+	struct mic_driver *mdrv = g_drv;
+	struct mic_device *mdev = &mdrv->mdev;
+	struct mic_bootparam __iomem *bootparam;
+	u64 lo, hi, dp_dma_addr;
+	u32 magic;
+
+	lo = mic_read_spad(&mdrv->mdev, MIC_DPLO_SPAD);
+	hi = mic_read_spad(&mdrv->mdev, MIC_DPHI_SPAD);
+
+	dp_dma_addr = lo | (hi << 32);
+	mdrv->dp = mic_card_map(mdev, dp_dma_addr, MIC_DP_SIZE);
+	if (!mdrv->dp) {
+		dev_err(mdrv->dev, "Cannot remap Aperture BAR\n");
+		return -ENOMEM;
+	}
+	bootparam = mdrv->dp;
+	magic = ioread32(&bootparam->magic);
+	if (MIC_MAGIC != magic) {
+		dev_err(mdrv->dev, "bootparam magic mismatch 0x%x\n", magic);
+		return -EIO;
+	}
+	dev_info(mdrv->dev, "bootparam magic success 0x%x\n", magic);
+	return 0;
+}
+
+/* Uninitialize the device page */
+static void mic_dp_uninit(void)
+{
+	mic_card_unmap(&g_drv->mdev, g_drv->dp);
+}
+
+/**
+ * mic_request_card_irq - request an irq.
+ *
+ * @func: The callback function that handles the interrupt.
+ * @name: The ASCII name of the caller requesting the irq.
+ * @data: private data that is returned back when calling the
+ * function handler.
+ * @index: The doorbell index of the requester.
+ *
+ * returns: The cookie that is opaque to the caller. Passed
+ * back when calling mic_free_card_irq. An appropriate error code
+ * is returned on failure. Caller needs to use IS_ERR(return_val)
+ * to check for failure and PTR_ERR(return_val) to obtain the
+ * error code.
+ *
+ */
+struct mic_irq *mic_request_card_irq(irqreturn_t (*func)(int irq, void *data),
+	const char *name, void *data, int index)
+{
+	int rc = 0;
+	unsigned long cookie;
+	struct mic_driver *mdrv = g_drv;
+
+	rc = request_irq(mic_db_to_irq(mdrv, index), func,
+		0, name, data);
+	if (rc) {
+		dev_err(mdrv->dev, "request_irq failed rc = %d\n", rc);
+		goto err;
+	}
+	mdrv->irq_info.irq_usage_count[index]++;
+	cookie = index;
+	return (struct mic_irq *)cookie;
+err:
+	return ERR_PTR(rc);
+
+}
+
+/**
+ * mic_free_card_irq - free irq.
+ *
+ * @cookie: cookie obtained during a successful call to mic_request_card_irq
+ * @data: private data specified by the calling function during the
+ * mic_request_card_irq
+ *
+ * returns: none.
+ */
+void mic_free_card_irq(struct mic_irq *cookie, void *data)
+{
+	int index;
+	struct mic_driver *mdrv = g_drv;
+
+	index = (unsigned long)cookie & 0xFFFFU;
+	free_irq(mic_db_to_irq(mdrv, index), data);
+	mdrv->irq_info.irq_usage_count[index]--;
+}
+
+/**
+ * mic_next_card_db - Get the doorbell with minimum usage count.
+ *
+ * Returns the irq index.
+ */
+int mic_next_card_db(void)
+{
+	int i;
+	int index = 0;
+	struct mic_driver *mdrv = g_drv;
+
+	for (i = 0; i < mdrv->intr_info.num_intr; i++) {
+		if (mdrv->irq_info.irq_usage_count[i] <
+			mdrv->irq_info.irq_usage_count[index])
+			index = i;
+	}
+
+	return index;
+}
+
+/**
+ * mic_init_irq - Initialize irq information.
+ *
+ * Returns 0 on success. Appropriate error code on failure.
+ */
+static int mic_init_irq(void)
+{
+	struct mic_driver *mdrv = g_drv;
+
+	mdrv->irq_info.irq_usage_count = kzalloc((sizeof(u32) *
+			mdrv->intr_info.num_intr),
+			GFP_KERNEL);
+	if (!mdrv->irq_info.irq_usage_count)
+		return -ENOMEM;
+	return 0;
+}
+
+/**
+ * mic_uninit_irq - Uninitialize irq information.
+ *
+ * Returns: none.
+ */
+static void mic_uninit_irq(void)
+{
+	struct mic_driver *mdrv = g_drv;
+
+	kfree(mdrv->irq_info.irq_usage_count);
+}
+
+/*
+ * mic_driver_init - MIC driver initialization tasks.
+ *
+ * Returns 0 on success. Appropriate error code on failure.
+ */
+int __init mic_driver_init(struct mic_driver *mdrv)
+{
+	int rc;
+
+	g_drv = mdrv;
+	/*
+	 * Unloading the card module is not supported. The MIC card module
+	 * handles fundamental operations like host/card initiated shutdowns
+	 * and informing the host about card crashes and cannot be unloaded.
+	 */
+	if (!try_module_get(mdrv->dev->driver->owner)) {
+		rc = -ENODEV;
+		goto done;
+	}
+	rc = mic_dp_init();
+	if (rc)
+		goto put;
+	rc = mic_init_irq();
+	if (rc)
+		goto dp_uninit;
+	rc = mic_shutdown_init();
+	if (rc)
+		goto irq_uninit;
+	mic_create_card_debug_dir(mdrv);
+	atomic_notifier_chain_register(&panic_notifier_list, &mic_panic);
+done:
+	return rc;
+irq_uninit:
+	mic_uninit_irq();
+dp_uninit:
+	mic_dp_uninit();
+put:
+	module_put(mdrv->dev->driver->owner);
+	return rc;
+}
+
+/*
+ * mic_driver_uninit - MIC driver uninitialization tasks.
+ *
+ * Returns None
+ */
+void mic_driver_uninit(struct mic_driver *mdrv)
+{
+	mic_delete_card_debug_dir(mdrv);
+	/*
+	 * Inform the host about the shutdown status i.e. poweroff/restart etc.
+	 * The module cannot be unloaded so the only code path to call
+	 * mic_driver_uninit(..) is the shutdown callback.
+	 */
+	mic_notify_host(system_state);
+	mic_shutdown_uninit();
+	mic_uninit_irq();
+	mic_dp_uninit();
+	module_put(mdrv->dev->driver->owner);
+}
diff --git a/drivers/misc/mic/card/mic_device.h b/drivers/misc/mic/card/mic_device.h
new file mode 100644
index 0000000..c50d9d6
--- /dev/null
+++ b/drivers/misc/mic/card/mic_device.h
@@ -0,0 +1,127 @@
+/*
+ * Intel MIC Platform Software Stack (MPSS)
+ *
+ * Copyright(c) 2013 Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * General Public License for more details.
+ *
+ * The full GNU General Public License is included in this distribution in
+ * the file called "COPYING".
+ *
+ * Disclaimer: The codes contained in these modules may be specific to
+ * the Intel Software Development Platform codenamed: Knights Ferry, and
+ * the Intel product codenamed: Knights Corner, and are not backward
+ * compatible with other Intel products. Additionally, Intel will NOT
+ * support the codes or instruction set in future products.
+ *
+ * Intel MIC Card driver.
+ *
+ */
+#ifndef _MIC_CARD_DEVICE_H_
+#define _MIC_CARD_DEVICE_H_
+
+/**
+ * struct mic_intr_info - Contains h/w specific interrupt sources info
+ *
+ * @num_intr: The number of irqs available
+ */
+struct mic_intr_info {
+	u32 num_intr;
+};
+
+/**
+ * struct mic_irq_info - OS specific irq information
+ *
+ * @irq_usage_count: usage count array tracking the number of sources
+ * assigned for each irq.
+ */
+struct mic_irq_info {
+	int *irq_usage_count;
+};
+
+/**
+ * struct mic_device -  MIC device information.
+ *
+ * @mmio: MMIO bar information.
+ */
+struct mic_device {
+	struct mic_mw mmio;
+};
+
+/**
+ * struct mic_driver - MIC card driver information.
+ *
+ * @name: Name for MIC driver.
+ * @dbg_dir: debugfs directory of this MIC device.
+ * @dev: The device backing this MIC.
+ * @dp: The pointer to the virtio device page.
+ * @mdev: MIC device information for the host.
+ * @hotplug_work: Hot plug work for adding/removing virtio devices.
+ * @irq_info: The OS specific irq information
+ * @intr_info: H/W specific interrupt information.
+ */
+struct mic_driver {
+	char name[20];
+	struct dentry *dbg_dir;
+	struct device *dev;
+	void __iomem *dp;
+	struct mic_device mdev;
+	struct work_struct hotplug_work;
+	struct mic_irq_info irq_info;
+	struct mic_intr_info intr_info;
+};
+
+/**
+ * struct mic_irq - opaque pointer used as cookie
+ */
+struct mic_irq;
+
+/**
+ * mic_mmio_read - read from an MMIO register.
+ * @mw: MMIO register base virtual address.
+ * @offset: register offset.
+ *
+ * RETURNS: register value.
+ */
+static inline u32 mic_mmio_read(struct mic_mw *mw, u32 offset)
+{
+	return ioread32(mw->va + offset);
+}
+
+/**
+ * mic_mmio_write - write to an MMIO register.
+ * @mw: MMIO register base virtual address.
+ * @val: the data value to put into the register
+ * @offset: register offset.
+ *
+ * RETURNS: none.
+ */
+static inline void
+mic_mmio_write(struct mic_mw *mw, u32 val, u32 offset)
+{
+	iowrite32(val, mw->va + offset);
+}
+
+int mic_driver_init(struct mic_driver *mdrv);
+void mic_driver_uninit(struct mic_driver *mdrv);
+int mic_next_card_db(void);
+struct mic_irq *mic_request_card_irq(irqreturn_t (*func)(int irq, void *data),
+	const char *name, void *data, int intr_src);
+void mic_free_card_irq(struct mic_irq *cookie, void *data);
+u32 mic_read_spad(struct mic_device *mdev, unsigned int idx);
+void mic_send_intr(struct mic_device *mdev, int doorbell);
+int mic_db_to_irq(struct mic_driver *mdrv, int db);
+u32 mic_ack_interrupt(struct mic_device *mdev);
+void mic_hw_intr_init(struct mic_driver *mdrv);
+void __iomem *
+mic_card_map(struct mic_device *mdev, dma_addr_t addr, size_t size);
+void mic_card_unmap(struct mic_device *mdev, void __iomem *addr);
+
+#endif
diff --git a/drivers/misc/mic/card/mic_x100.c b/drivers/misc/mic/card/mic_x100.c
new file mode 100644
index 0000000..c6249fe
--- /dev/null
+++ b/drivers/misc/mic/card/mic_x100.c
@@ -0,0 +1,256 @@
+/*
+ * Intel MIC Platform Software Stack (MPSS)
+ *
+ * Copyright(c) 2013 Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * General Public License for more details.
+ *
+ * The full GNU General Public License is included in this distribution in
+ * the file called "COPYING".
+ *
+ * Disclaimer: The codes contained in these modules may be specific to
+ * the Intel Software Development Platform codenamed: Knights Ferry, and
+ * the Intel product codenamed: Knights Corner, and are not backward
+ * compatible with other Intel products. Additionally, Intel will NOT
+ * support the codes or instruction set in future products.
+ *
+ * Intel MIC Card driver.
+ *
+ */
+#include <linux/module.h>
+#include <linux/pci.h>
+#include <linux/platform_device.h>
+
+#include "mic_common.h"
+#include "mic_debugfs.h"
+
+static const char mic_driver_name[] = "mic";
+
+static struct mic_driver g_drv;
+
+/**
+ * mic_read_spad - read from the scratchpad register
+ * @mdev: pointer to mic_device instance
+ * @idx: index to scratchpad register, 0 based
+ *
+ * This function allows reading of the 32-bit scratchpad register.
+ *
+ * RETURNS: The value read from the scratchpad register.
+ */
+u32 mic_read_spad(struct mic_device *mdev, unsigned int idx)
+{
+	return mic_mmio_read(&mdev->mmio,
+		MIC_X100_SBOX_BASE_ADDRESS +
+		MIC_X100_SBOX_SPAD0 + idx * 4);
+}
+
+/**
+ * mic_send_intr - Send interrupt to Host.
+ * @mdev: pointer to mic_device instance
+ * @doorbell: Doorbell number.
+ */
+void mic_send_intr(struct mic_device *mdev, int doorbell)
+{
+	struct mic_mw *mw = &mdev->mmio;
+
+	if (doorbell > MIC_X100_MAX_DOORBELL_IDX)
+		return;
+	/* Ensure all stores have been completed before sending an interrupt */
+	wmb();
+	mic_mmio_write(mw, MIC_X100_SBOX_SDBIC0_DBREQ_BIT,
+		MIC_X100_SBOX_BASE_ADDRESS +
+		(MIC_X100_SBOX_SDBIC0 + (4 * doorbell)));
+}
+
+/**
+ * mic_ack_interrupt - Device specific interrupt handling.
+ * @mdev: pointer to mic_device instance
+ *
+ * Returns: bitmask of doorbell events triggered.
+ */
+u32 mic_ack_interrupt(struct mic_device *mdev)
+{
+	return 0;
+}
+
+static inline int mic_get_sbox_irq(int db)
+{
+	return MIC_X100_IRQ_BASE + db;
+}
+
+static inline int mic_get_rdmasr_irq(int index)
+{
+	return MIC_X100_RDMASR_IRQ_BASE + index;
+}
+
+/**
+ * mic_hw_intr_init - Initialize h/w specific interrupt
+ * information.
+ * @mdrv: pointer to mic_driver
+ */
+void mic_hw_intr_init(struct mic_driver *mdrv)
+{
+	mdrv->intr_info.num_intr = MIC_X100_NUM_SBOX_IRQ +
+				MIC_X100_NUM_RDMASR_IRQ;
+}
+
+/**
+ * mic_db_to_irq - Retrieve irq number corresponding to a doorbell.
+ * @mdrv: pointer to mic_driver
+ * @db: The doorbell for which the irq is needed. The doorbell
+ * may correspond to an sbox doorbell or an rdmasr index.
+ *
+ * Returns the irq corresponding to the doorbell.
+ */
+int mic_db_to_irq(struct mic_driver *mdrv, int db)
+{
+	int rdmasr_index;
+	if (db < MIC_X100_NUM_SBOX_IRQ) {
+		return mic_get_sbox_irq(db);
+	} else {
+		rdmasr_index = db - MIC_X100_NUM_SBOX_IRQ +
+			MIC_X100_RDMASR_IRQ_BASE;
+		return mic_get_rdmasr_irq(rdmasr_index);
+	}
+}
+
+/*
+ * mic_card_map - Allocate virtual address for a remote memory region.
+ * @mdev: pointer to mic_device instance.
+ * @addr: Remote DMA address.
+ * @size: Size of the region.
+ *
+ * Returns: Virtual address backing the remote memory region.
+ */
+void __iomem *
+mic_card_map(struct mic_device *mdev, dma_addr_t addr, size_t size)
+{
+	return ioremap(addr, size);
+}
+
+/*
+ * mic_card_unmap - Unmap the virtual address for a remote memory region.
+ * @mdev: pointer to mic_device instance.
+ * @addr: Virtual address for remote memory region.
+ *
+ * Returns: None.
+ */
+void mic_card_unmap(struct mic_device *mdev, void __iomem *addr)
+{
+	iounmap(addr);
+}
+
+static int __init mic_probe(struct platform_device *pdev)
+{
+	struct mic_driver *mdrv = &g_drv;
+	struct mic_device *mdev = &mdrv->mdev;
+	int rc = 0;
+
+	mdrv->dev = &pdev->dev;
+	snprintf(mdrv->name, sizeof(mdrv->name), "%s", mic_driver_name);
+
+	mdev->mmio.pa = MIC_X100_MMIO_BASE;
+	mdev->mmio.len = MIC_X100_MMIO_LEN;
+	mdev->mmio.va = ioremap(MIC_X100_MMIO_BASE, MIC_X100_MMIO_LEN);
+	if (!mdev->mmio.va) {
+		dev_err(&pdev->dev, "Cannot remap MMIO BAR\n");
+		rc = -EIO;
+		goto done;
+	}
+	mic_hw_intr_init(mdrv);
+	rc = mic_driver_init(mdrv);
+	if (rc) {
+		dev_err(&pdev->dev, "mic_driver_init failed rc %d\n", rc);
+		goto iounmap;
+	}
+	dev_info(&pdev->dev, "Probe successful for %s\n", mic_driver_name);
+done:
+	return rc;
+iounmap:
+	iounmap(mdev->mmio.va);
+	return rc;
+}
+
+static int mic_remove(struct platform_device *pdev)
+{
+	struct mic_driver *mdrv = &g_drv;
+	struct mic_device *mdev = &mdrv->mdev;
+
+	mic_driver_uninit(mdrv);
+	iounmap(mdev->mmio.va);
+	return 0;
+}
+
+static void mic_platform_shutdown(struct platform_device *pdev)
+{
+	mic_remove(pdev);
+}
+
+static struct platform_device mic_platform_dev = {
+	.name = mic_driver_name,
+	.id   = 0,
+	.num_resources = 0,
+};
+
+static struct platform_driver mic_platform_driver = {
+	.probe = mic_probe,
+	.remove = mic_remove,
+	.shutdown = mic_platform_shutdown,
+	.driver         = {
+		.name   = mic_driver_name,
+		.owner	= THIS_MODULE,
+	},
+};
+
+static int __init mic_init(void)
+{
+	int ret;
+	struct cpuinfo_x86 *c = &cpu_data(0);
+
+	if (!(c->x86 == 11 && c->x86_model == 1)) {
+		ret = -ENODEV;
+		pr_err("%s not running on X100 ret %d\n", __func__, ret);
+		goto done;
+	}
+
+	mic_init_card_debugfs();
+	ret = platform_device_register(&mic_platform_dev);
+	if (ret) {
+		pr_err("platform_device_register ret %d\n", ret);
+		goto cleanup_debugfs;
+	}
+	ret = platform_driver_register(&mic_platform_driver);
+	if (ret) {
+		pr_err("platform_driver_register ret %d\n", ret);
+		goto device_unregister;
+	}
+	return ret;
+
+device_unregister:
+	platform_device_unregister(&mic_platform_dev);
+cleanup_debugfs:
+	mic_exit_card_debugfs();
+done:
+	return ret;
+}
+
+static void __exit mic_exit(void)
+{
+	platform_driver_unregister(&mic_platform_driver);
+	platform_device_unregister(&mic_platform_dev);
+	mic_exit_card_debugfs();
+}
+
+module_init(mic_init);
+module_exit(mic_exit);
+
+MODULE_AUTHOR("Intel Corporation");
+MODULE_DESCRIPTION("Intel(R) MIC X100 Card driver");
+MODULE_LICENSE("GPL");
diff --git a/drivers/misc/mic/card/mic_x100.h b/drivers/misc/mic/card/mic_x100.h
new file mode 100644
index 0000000..d66ea55
--- /dev/null
+++ b/drivers/misc/mic/card/mic_x100.h
@@ -0,0 +1,48 @@
+/*
+ * Intel MIC Platform Software Stack (MPSS)
+ *
+ * Copyright(c) 2013 Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * General Public License for more details.
+ *
+ * The full GNU General Public License is included in this distribution in
+ * the file called "COPYING".
+ *
+ * Disclaimer: The codes contained in these modules may be specific to
+ * the Intel Software Development Platform codenamed: Knights Ferry, and
+ * the Intel product codenamed: Knights Corner, and are not backward
+ * compatible with other Intel products. Additionally, Intel will NOT
+ * support the codes or instruction set in future products.
+ *
+ * Intel MIC Card driver.
+ *
+ */
+#ifndef _MIC_X100_CARD_H_
+#define _MIC_X100_CARD_H_
+
+#define MIC_X100_MMIO_BASE 0x08007C0000ULL
+#define MIC_X100_MMIO_LEN 0x00020000ULL
+#define MIC_X100_SBOX_BASE_ADDRESS 0x00010000ULL
+
+#define MIC_X100_SBOX_SPAD0 0x0000AB20
+#define MIC_X100_SBOX_SDBIC0 0x0000CC90
+#define MIC_X100_SBOX_SDBIC0_DBREQ_BIT 0x80000000
+#define MIC_X100_SBOX_RDMASR0	0x0000B180
+
+#define MIC_X100_MAX_DOORBELL_IDX 8
+
+#define MIC_X100_NUM_SBOX_IRQ 8
+#define MIC_X100_NUM_RDMASR_IRQ 8
+#define MIC_X100_SBOX_IRQ_BASE 0
+#define MIC_X100_RDMASR_IRQ_BASE 17
+
+#define MIC_X100_IRQ_BASE 26
+
+#endif
-- 
1.8.2.1
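
The doorbell selection in mic_next_card_db() and the usage counting in
mic_request_card_irq()/mic_free_card_irq() above amount to a least-used
allocator. A minimal user-space model of that bookkeeping is sketched
below, assuming 16 doorbells (MIC_X100_NUM_SBOX_IRQ +
MIC_X100_NUM_RDMASR_IRQ). The model_* names are hypothetical and are not
part of the patch, and the real functions also call request_irq() and
free_irq(), which this sketch leaves out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for mdrv->intr_info.num_intr (8 sbox + 8 rdmasr). */
#define MODEL_NUM_INTR 16

/* Hypothetical stand-in for mdrv->irq_info.irq_usage_count. */
static unsigned int model_usage[MODEL_NUM_INTR];

/* Same scan as mic_next_card_db(): first index with the lowest count. */
static int model_next_db(void)
{
	int i, index = 0;

	for (i = 0; i < MODEL_NUM_INTR; i++)
		if (model_usage[i] < model_usage[index])
			index = i;
	return index;
}

/* Models a successful mic_request_card_irq(): bump the usage count. */
static int model_request(int index)
{
	if (index < 0 || index >= MODEL_NUM_INTR)
		return -1;
	model_usage[index]++;
	return index;	/* the patch hands the index back as the cookie */
}

/* Models mic_free_card_irq(): drop the usage count again. */
static void model_free(int cookie)
{
	assert(cookie >= 0 && cookie < MODEL_NUM_INTR);
	/* Freeing a doorbell that was never requested is a caller bug. */
	assert(model_usage[cookie] > 0);
	model_usage[cookie]--;
}
```

Because ties go to the lowest index, the first request on an idle card
gets doorbell 0, the next gets doorbell 1, and a freed doorbell becomes
eligible for selection again.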


* [PATCH v2 5/7] Intel MIC Host Driver Changes for Virtio Devices.
  2013-08-08  3:04 [PATCH v2 0/7] Enable Drivers for Intel MIC X100 Coprocessors Sudeep Dutt
                   ` (3 preceding siblings ...)
  2013-08-08  3:04 ` [PATCH v2 4/7] Intel MIC Card Driver for X100 family Sudeep Dutt
@ 2013-08-08  3:04 ` Sudeep Dutt
  2013-08-08  3:04 ` [PATCH v2 6/7] Intel MIC Card " Sudeep Dutt
  2013-08-08  3:04 ` [PATCH v2 7/7] Sample Implementation of Intel MIC User Space Daemon Sudeep Dutt
  6 siblings, 0 replies; 18+ messages in thread
From: Sudeep Dutt @ 2013-08-08  3:04 UTC (permalink / raw)
  To: Greg Kroah-Hartman, Arnd Bergmann, Rusty Russell,
	Michael S. Tsirkin, Rob Landley, linux-kernel, virtualization,
	linux-doc
  Cc: Harshavardhan R Kharche, Peter P Waskiewicz Jr,
	Yaozu (Eddie) Dong, Sudeep Dutt, Ashutosh Dixit, AsiasHeasias,
	Caz Yokoyama, Dasaratharaman Chandramouli

From: Ashutosh Dixit <ashutosh.dixit@intel.com>

This patch introduces the host "Virtio over PCIe" interface for
Intel MIC. It allows creating user space backends on the host and instantiating
virtio devices for them on the Intel MIC card. It uses the existing VRINGH
infrastructure in the kernel to access virtio rings from the host. A character
device per MIC is exposed with IOCTL, mmap and poll callbacks. This allows the
user space backend to:
(a) add/remove a virtio device via a device page.
(b) map (R/O) virtio rings and device page to user space.
(c) poll for availability of data.
(d) copy a descriptor or entire descriptor chain to/from the card.
(e) modify virtio configuration.
(f) handle virtio device reset.
The buffers are copied using CPU copies in this initial patch;
host-initiated MIC DMA support is planned for future patches.
The avail and desc virtio rings are in host memory and the used ring
is in card memory, so that ring updates from either side are writes
across PCIe rather than reads, which performs better.
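
As an illustration of item (b), the mmap offsets described in the
mic_fops.c comment in this patch place the device page at offset 0 and
pack the vrings after it. A minimal user-space model of that lookup
follows. MODEL_DP_SIZE and model_query_offset() are hypothetical names,
and the 0x1000 device page size is an assumption taken from the
first-vring offset quoted in that comment.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed device page size: matches the 0x1000 first-vring offset. */
#define MODEL_DP_SIZE 0x1000UL

/*
 * Given the byte offset passed to mmap(), return the region index:
 * 0 for the device page, 1..num_vq for the vrings, or -1 if the offset
 * does not land exactly on the start of a region. On success *size
 * receives the length of that region.
 */
static int model_query_offset(unsigned long offset,
			      const unsigned long *vring_len, int num_vq,
			      unsigned long *size)
{
	unsigned long start = MODEL_DP_SIZE;
	int i;

	assert(vring_len != NULL && size != NULL);

	if (offset == 0) {
		*size = MODEL_DP_SIZE;
		return 0;
	}
	for (i = 0; i < num_vq; i++) {
		if (offset == start) {
			*size = vring_len[i];
			return i + 1;
		}
		/* The next vring begins where this one ends. */
		start += vring_len[i];
	}
	return -1;
}
```

An offset that falls in the middle of a region is rejected, so a backend
would start each mapping exactly on a region boundary.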

Co-author: Sudeep Dutt <sudeep.dutt@intel.com>
Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Signed-off-by: Caz Yokoyama <Caz.Yokoyama@intel.com>
Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Signed-off-by: Nikhil Rao <nikhil.rao@intel.com>
Signed-off-by: Harshavardhan R Kharche <harshavardhan.r.kharche@intel.com>
Signed-off-by: Sudeep Dutt <sudeep.dutt@intel.com>
Acked-by: Yaozu (Eddie) Dong <eddie.dong@intel.com>
Reviewed-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
---
 drivers/misc/mic/Kconfig             |   1 +
 drivers/misc/mic/common/mic_device.h |   7 +
 drivers/misc/mic/host/Makefile       |   2 +
 drivers/misc/mic/host/mic_boot.c     |   2 +
 drivers/misc/mic/host/mic_debugfs.c  | 140 +++++++
 drivers/misc/mic/host/mic_device.h   |   2 +
 drivers/misc/mic/host/mic_fops.c     | 227 +++++++++++
 drivers/misc/mic/host/mic_fops.h     |  32 ++
 drivers/misc/mic/host/mic_main.c     |  26 ++
 drivers/misc/mic/host/mic_virtio.c   | 710 +++++++++++++++++++++++++++++++++++
 drivers/misc/mic/host/mic_virtio.h   | 139 +++++++
 include/uapi/linux/Kbuild            |   1 +
 include/uapi/linux/mic_common.h      | 164 +++++++-
 include/uapi/linux/mic_ioctl.h       |  76 ++++
 14 files changed, 1528 insertions(+), 1 deletion(-)
 create mode 100644 drivers/misc/mic/host/mic_fops.c
 create mode 100644 drivers/misc/mic/host/mic_fops.h
 create mode 100644 drivers/misc/mic/host/mic_virtio.c
 create mode 100644 drivers/misc/mic/host/mic_virtio.h
 create mode 100644 include/uapi/linux/mic_ioctl.h

diff --git a/drivers/misc/mic/Kconfig b/drivers/misc/mic/Kconfig
index 279a2e6..01f1a4a 100644
--- a/drivers/misc/mic/Kconfig
+++ b/drivers/misc/mic/Kconfig
@@ -3,6 +3,7 @@ comment "Intel MIC Host Driver"
 config INTEL_MIC_HOST
 	tristate "Intel MIC Host Driver"
 	depends on 64BIT && PCI
+	select VHOST_RING
 	default N
 	help
 	  This enables Host Driver support for the Intel Many Integrated
diff --git a/drivers/misc/mic/common/mic_device.h b/drivers/misc/mic/common/mic_device.h
index 6440e9d..01eb74f 100644
--- a/drivers/misc/mic/common/mic_device.h
+++ b/drivers/misc/mic/common/mic_device.h
@@ -41,4 +41,11 @@ struct mic_mw {
 #define MIC_DPLO_SPAD 14
 #define MIC_DPHI_SPAD 15
 
+/*
+ * These values are supposed to be in the config_change field of the
+ * device page when the host sends a config change interrupt to the card.
+ */
+#define MIC_VIRTIO_PARAM_DEV_REMOVE 0x1
+#define MIC_VIRTIO_PARAM_CONFIG_CHANGED 0x2
+
 #endif
diff --git a/drivers/misc/mic/host/Makefile b/drivers/misc/mic/host/Makefile
index 98bf565..e73ac74 100644
--- a/drivers/misc/mic/host/Makefile
+++ b/drivers/misc/mic/host/Makefile
@@ -9,3 +9,5 @@ mic_host-objs += mic_sysfs.o
 mic_host-objs += mic_smpt.o
 mic_host-objs += mic_boot.o
 mic_host-objs += mic_debugfs.o
+mic_host-objs += mic_fops.o
+mic_host-objs += mic_virtio.o
diff --git a/drivers/misc/mic/host/mic_boot.c b/drivers/misc/mic/host/mic_boot.c
index fcfb86a..695a767 100644
--- a/drivers/misc/mic/host/mic_boot.c
+++ b/drivers/misc/mic/host/mic_boot.c
@@ -25,6 +25,7 @@
 #include <linux/delay.h>
 
 #include "mic_common.h"
+#include "mic_virtio.h"
 
 /**
  * mic_reset - Reset the MIC device.
@@ -116,6 +117,7 @@ void mic_stop(struct mic_device *mdev, bool force)
 {
 	mutex_lock(&mdev->mic_mutex);
 	if (MIC_OFFLINE != mdev->state || force) {
+		mic_virtio_reset_devices(mdev);
 		mic_bootparam_init(mdev);
 		mic_reset(mdev);
 		if (MIC_RESET_FAILED == mdev->state)
diff --git a/drivers/misc/mic/host/mic_debugfs.c b/drivers/misc/mic/host/mic_debugfs.c
index 74f0713..f64fd4b 100644
--- a/drivers/misc/mic/host/mic_debugfs.c
+++ b/drivers/misc/mic/host/mic_debugfs.c
@@ -27,6 +27,7 @@
 
 #include "mic_common.h"
 #include "mic_debugfs.h"
+#include "mic_virtio.h"
 
 /* Debugfs parent dir */
 static struct dentry *mic_dbg;
@@ -194,7 +195,13 @@ static const struct file_operations post_code_ops = {
 static int mic_dp_show(struct seq_file *s, void *pos)
 {
 	struct mic_device *mdev = s->private;
+	struct mic_device_desc *d;
+	struct mic_device_ctrl *dc;
+	struct mic_vqconfig *vqconfig;
+	__u32 *features;
+	__u8 *config;
 	struct mic_bootparam *bootparam = mdev->dp;
+	int i, j;
 
 	seq_printf(s, "Bootparam: magic 0x%x\n",
 		bootparam->magic);
@@ -209,6 +216,53 @@ static int mic_dp_show(struct seq_file *s, void *pos)
 	seq_printf(s, "Bootparam: shutdown_card %d\n",
 		bootparam->shutdown_card);
 
+	for (i = sizeof(*bootparam); i < MIC_DP_SIZE;
+	     i += mic_total_desc_size(d)) {
+		d = mdev->dp + i;
+		dc = (void *)d + mic_aligned_desc_size(d);
+
+		/* end of list */
+		if (d->type == 0)
+			break;
+
+		if (d->type == -1)
+			continue;
+
+		seq_printf(s, "Type %d ", d->type);
+		seq_printf(s, "Num VQ %d ", d->num_vq);
+		seq_printf(s, "Feature Len %d\n", d->feature_len);
+		seq_printf(s, "Config Len %d ", d->config_len);
+		seq_printf(s, "Shutdown Status %d\n", d->status);
+
+		for (j = 0; j < d->num_vq; j++) {
+			vqconfig = mic_vq_config(d) + j;
+			seq_printf(s, "vqconfig[%d]: ", j);
+			seq_printf(s, "address 0x%llx ", vqconfig->address);
+			seq_printf(s, "num %d ", vqconfig->num);
+			seq_printf(s, "used address 0x%llx\n",
+				vqconfig->used_address);
+		}
+
+		features = (__u32 *) mic_vq_features(d);
+		seq_printf(s, "Features: Host 0x%x ", features[0]);
+		seq_printf(s, "Guest 0x%x\n", features[1]);
+
+		config = mic_vq_configspace(d);
+		for (j = 0; j < d->config_len; j++)
+			seq_printf(s, "config[%d]=%d\n", j, config[j]);
+
+		seq_puts(s, "Device control:\n");
+		seq_printf(s, "Config Change %d ", dc->config_change);
+		seq_printf(s, "Vdev reset %d\n", dc->vdev_reset);
+		seq_printf(s, "Guest Ack %d ", dc->guest_ack);
+		seq_printf(s, "Host ack %d\n", dc->host_ack);
+		seq_printf(s, "Used address updated %d ",
+			dc->used_address_updated);
+		seq_printf(s, "Vdev 0x%llx\n", dc->vdev);
+		seq_printf(s, "c2h doorbell %d ", dc->c2h_vdev_db);
+		seq_printf(s, "h2c doorbell %d\n", dc->h2c_vdev_db);
+	}
+
 	return 0;
 }
 
@@ -230,6 +284,89 @@ static const struct file_operations dp_ops = {
 	.release = mic_dp_debug_release
 };
 
+static int mic_vdev_info_show(struct seq_file *s, void *unused)
+{
+	struct mic_device *mdev = s->private;
+	struct list_head *pos, *tmp;
+	struct mic_vdev *mvdev;
+	int i, j;
+
+	mutex_lock(&mdev->mic_mutex);
+	list_for_each_safe(pos, tmp, &mdev->vdev_list) {
+		mvdev = list_entry(pos, struct mic_vdev, list);
+		seq_printf(s, "VDEV type %d state %s in %ld out %ld\n",
+			mvdev->virtio_id,
+			mic_vdevup(mvdev) ? "UP" : "DOWN",
+			mvdev->in_bytes,
+			mvdev->out_bytes);
+		for (i = 0; i < MIC_MAX_VRINGS; i++) {
+			struct vring_desc *desc;
+			struct vring_avail *avail;
+			struct vring_used *used;
+			struct mic_vringh *mvr = &mvdev->mvr[i];
+			struct vringh *vrh = &mvr->vrh;
+			int num = vrh->vring.num;
+			if (!num)
+				continue;
+			desc = vrh->vring.desc;
+			seq_printf(s, "vring i %d avail_idx %d",
+				i, mvr->vring.info->avail_idx & (num - 1));
+			seq_printf(s, " vring i %d avail_idx %d\n",
+				i, mvr->vring.info->avail_idx);
+			seq_printf(s, "vrh i %d weak_barriers %d",
+				i, vrh->weak_barriers);
+			seq_printf(s, " last_avail_idx %d last_used_idx %d",
+				vrh->last_avail_idx, vrh->last_used_idx);
+			seq_printf(s, " completed %d\n", vrh->completed);
+			for (j = 0; j < num; j++) {
+				seq_printf(s, "desc[%d] addr 0x%llx len %d",
+					j, desc->addr, desc->len);
+				seq_printf(s, " flags 0x%x next %d\n",
+					desc->flags,
+					desc->next);
+				desc++;
+			}
+			avail = vrh->vring.avail;
+			seq_printf(s, "avail flags 0x%x idx %d\n",
+				avail->flags, avail->idx & (num - 1));
+			seq_printf(s, "avail flags 0x%x idx %d\n",
+				avail->flags, avail->idx);
+			for (j = 0; j < num; j++)
+				seq_printf(s, "avail ring[%d] %d\n",
+					j, avail->ring[j]);
+			used = vrh->vring.used;
+			seq_printf(s, "used flags 0x%x idx %d\n",
+				used->flags, used->idx & (num - 1));
+			seq_printf(s, "used flags 0x%x idx %d\n",
+				used->flags, used->idx);
+			for (j = 0; j < num; j++)
+				seq_printf(s, "used ring[%d] id %d len %d\n",
+					j, used->ring[j].id, used->ring[j].len);
+		}
+	}
+	mutex_unlock(&mdev->mic_mutex);
+
+	return 0;
+}
+
+static int mic_vdev_info_debug_open(struct inode *inode, struct file *file)
+{
+	return single_open(file, mic_vdev_info_show, inode->i_private);
+}
+
+static int mic_vdev_info_debug_release(struct inode *inode, struct file *file)
+{
+	return single_release(inode, file);
+}
+
+static const struct file_operations vdev_info_ops = {
+	.owner   = THIS_MODULE,
+	.open    = mic_vdev_info_debug_open,
+	.read    = seq_read,
+	.llseek  = seq_lseek,
+	.release = mic_vdev_info_debug_release
+};
+
 static int mic_msi_irq_info_show(struct seq_file *s, void *pos)
 {
 	struct mic_device *mdev  = s->private;
@@ -320,6 +457,9 @@ void __init mic_create_debug_dir(struct mic_device *mdev)
 	debugfs_create_file("dp", 0444, mdev->dbg_dir,
 		mdev, &dp_ops);
 
+	debugfs_create_file("vdev_info", 0444, mdev->dbg_dir,
+		mdev, &vdev_info_ops);
+
 	debugfs_create_file("msi_irq_info", 0444, mdev->dbg_dir,
 		mdev, &msi_irq_info_ops);
 }
diff --git a/drivers/misc/mic/host/mic_device.h b/drivers/misc/mic/host/mic_device.h
index 79312aa..2f80494 100644
--- a/drivers/misc/mic/host/mic_device.h
+++ b/drivers/misc/mic/host/mic_device.h
@@ -150,6 +150,7 @@ struct mic_irq;
  * @cdev: Character device for MIC.
  * @shutdown_db: shutdown doorbell.
  * @shutdown_cookie: shutdown cookie.
+ * @vdev_list: list of virtio devices.
  */
 struct mic_device {
 	char name[20];
@@ -185,6 +186,7 @@ struct mic_device {
 	struct cdev cdev;
 	int shutdown_db;
 	struct mic_irq *shutdown_cookie;
+	struct list_head vdev_list;
 };
 
 /**
diff --git a/drivers/misc/mic/host/mic_fops.c b/drivers/misc/mic/host/mic_fops.c
new file mode 100644
index 0000000..107a083
--- /dev/null
+++ b/drivers/misc/mic/host/mic_fops.c
@@ -0,0 +1,227 @@
+/*
+ * Intel MIC Platform Software Stack (MPSS)
+ *
+ * Copyright(c) 2013 Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * General Public License for more details.
+ *
+ * The full GNU General Public License is included in this distribution in
+ * the file called "COPYING".
+ *
+ * Intel MIC Host driver.
+ *
+ */
+#include <linux/module.h>
+#include <linux/fs.h>
+#include <linux/pci.h>
+#include <linux/interrupt.h>
+#include <linux/firmware.h>
+#include <linux/completion.h>
+#include <linux/poll.h>
+#include <linux/virtio_ids.h>
+#include <linux/mic_ioctl.h>
+
+#include "mic_common.h"
+#include "mic_fops.h"
+#include "mic_virtio.h"
+
+int mic_open(struct inode *inode, struct file *f)
+{
+	struct mic_vdev *mvdev;
+	struct mic_device *mdev = container_of(inode->i_cdev,
+		struct mic_device, cdev);
+
+	mvdev = kzalloc(sizeof(*mvdev), GFP_KERNEL);
+	if (!mvdev)
+		return -ENOMEM;
+
+	init_waitqueue_head(&mvdev->waitq);
+	INIT_LIST_HEAD(&mvdev->list);
+	mvdev->mdev = mdev;
+	mvdev->virtio_id = -1;
+
+	f->private_data = mvdev;
+	return 0;
+}
+
+int mic_release(struct inode *inode, struct file *f)
+{
+	struct mic_vdev *mvdev = (struct mic_vdev *)f->private_data;
+
+	if (-1 != mvdev->virtio_id)
+		mic_virtio_del_device(mvdev);
+	f->private_data = NULL;
+	kfree(mvdev);
+	return 0;
+}
+
+long mic_ioctl(struct file *f, unsigned int cmd, unsigned long arg)
+{
+	struct mic_vdev *mvdev = (struct mic_vdev *)f->private_data;
+	void __user *argp = (void __user *)arg;
+	int ret;
+
+	switch (cmd) {
+	case MIC_VIRTIO_ADD_DEVICE:
+	{
+		ret = mic_virtio_add_device(mvdev, argp);
+		if (ret < 0) {
+			dev_err(mic_dev(mvdev),
+				"%s %d errno ret %d\n",
+				__func__, __LINE__, ret);
+			return ret;
+		}
+		break;
+	}
+	case MIC_VIRTIO_COPY_DESC:
+	{
+		struct mic_copy_desc copy;
+
+		ret = mic_vdev_inited(mvdev);
+		if (ret)
+			return ret;
+
+		if (copy_from_user(&copy, argp, sizeof(copy)))
+			return -EFAULT;
+
+		dev_dbg(mic_dev(mvdev),
+			"%s %d === iovcnt 0x%x vr_idx 0x%x update_used %d\n",
+			__func__, __LINE__, copy.iovcnt, copy.vr_idx,
+			copy.update_used);
+
+		ret = mic_virtio_copy_desc(mvdev, &copy);
+		if (ret < 0) {
+			dev_err(mic_dev(mvdev),
+				"%s %d errno ret %d\n",
+				__func__, __LINE__, ret);
+			return ret;
+		}
+		if (copy_to_user(
+			&((struct mic_copy_desc __user *)argp)->out_len,
+			&copy.out_len, sizeof(copy.out_len))) {
+			dev_err(mic_dev(mvdev), "%s %d errno ret %d\n",
+				__func__, __LINE__, -EFAULT);
+			return -EFAULT;
+		}
+		break;
+	}
+	case MIC_VIRTIO_CONFIG_CHANGE:
+	{
+		ret = mic_vdev_inited(mvdev);
+		if (ret)
+			return ret;
+
+		ret = mic_virtio_config_change(mvdev, argp);
+		if (ret < 0) {
+			dev_err(mic_dev(mvdev),
+				"%s %d errno ret %d\n",
+				__func__, __LINE__, ret);
+			return ret;
+		}
+		break;
+	}
+	default:
+		return -ENOIOCTLCMD;
+	}
+	return 0;
+}
+
+/*
+ * We return POLLIN | POLLOUT from poll when new buffers are enqueued, and
+ * not when previously enqueued buffers may be available. This means that
+ * in the card->host (TX) path, when userspace is unblocked by poll it
+ * must drain all available descriptors or it can stall.
+ */
+unsigned int mic_poll(struct file *f, poll_table *wait)
+{
+	struct mic_vdev *mvdev = (struct mic_vdev *)f->private_data;
+	int mask = 0;
+
+	poll_wait(f, &mvdev->waitq, wait);
+
+	if (mic_vdev_inited(mvdev))
+		mask = POLLERR;
+	else if (mvdev->poll_wake) {
+		mvdev->poll_wake = 0;
+		mask = POLLIN | POLLOUT;
+	}
+
+	return mask;
+}
+
+static inline int
+mic_query_offset(struct mic_vdev *mvdev, unsigned long offset,
+	unsigned long *size, unsigned long *pa)
+{
+	struct mic_device *mdev = mvdev->mdev;
+	unsigned long start = MIC_DP_SIZE;
+	int i;
+
+	/*
+	 * MMAP interface is as follows:
+	 * offset				region
+	 * 0x0					virtio device_page
+	 * 0x1000				first vring
+	 * 0x1000 + size of 1st vring		second vring
+	 * ....
+	 */
+	if (!offset) {
+		*pa = virt_to_phys(mdev->dp);
+		*size = MIC_DP_SIZE;
+		return 0;
+	}
+
+	for (i = 0; i < mvdev->dd->num_vq; i++) {
+		struct mic_vringh *mvr = &mvdev->mvr[i];
+		if (offset == start) {
+			*pa = virt_to_phys(mvr->vring.va);
+			*size = mvr->vring.len;
+			return 0;
+		}
+		start += mvr->vring.len;
+	}
+	return -1;
+}
+
+/*
+ * Maps the device page and virtio rings to user space for read-only access.
+ */
+int
+mic_mmap(struct file *f, struct vm_area_struct *vma)
+{
+	struct mic_vdev *mvdev = (struct mic_vdev *)f->private_data;
+	unsigned long offset = vma->vm_pgoff << PAGE_SHIFT;
+	unsigned long pa, size = vma->vm_end - vma->vm_start, size_rem = size;
+	int i, err;
+
+	err = mic_vdev_inited(mvdev);
+	if (err)
+		return err;
+
+	if (vma->vm_flags & VM_WRITE)
+		return -EACCES;
+
+	while (size_rem) {
+		i = mic_query_offset(mvdev, offset, &size, &pa);
+		if (i < 0)
+			return -EINVAL;
+		err = remap_pfn_range(vma, vma->vm_start + offset,
+			pa >> PAGE_SHIFT, size, vma->vm_page_prot);
+		if (err)
+			return err;
+		dev_dbg(mic_dev(mvdev),
+			"%s %d type %d size 0x%lx off 0x%lx pa 0x%lx vma 0x%lx\n",
+			__func__, __LINE__, mvdev->virtio_id, size, offset,
+			pa, vma->vm_start + offset);
+		size_rem -= size;
+		offset += size;
+	}
+	return 0;
+}
diff --git a/drivers/misc/mic/host/mic_fops.h b/drivers/misc/mic/host/mic_fops.h
new file mode 100644
index 0000000..dc3893d
--- /dev/null
+++ b/drivers/misc/mic/host/mic_fops.h
@@ -0,0 +1,32 @@
+/*
+ * Intel MIC Platform Software Stack (MPSS)
+ *
+ * Copyright(c) 2013 Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * General Public License for more details.
+ *
+ * The full GNU General Public License is included in this distribution in
+ * the file called "COPYING".
+ *
+ * Intel MIC Host driver.
+ *
+ */
+#ifndef _MIC_FOPS_H_
+#define _MIC_FOPS_H_
+
+int mic_open(struct inode *inode, struct file *filp);
+int mic_release(struct inode *inode, struct file *filp);
+ssize_t mic_read(struct file *filp, char __user *buf,
+			size_t count, loff_t *pos);
+long mic_ioctl(struct file *filp, unsigned int cmd, unsigned long arg);
+int mic_mmap(struct file *f, struct vm_area_struct *vma);
+unsigned int mic_poll(struct file *f, poll_table *wait);
+
+#endif
diff --git a/drivers/misc/mic/host/mic_main.c b/drivers/misc/mic/host/mic_main.c
index ede682f..d9407a9 100644
--- a/drivers/misc/mic/host/mic_main.c
+++ b/drivers/misc/mic/host/mic_main.c
@@ -28,9 +28,12 @@
 #include <linux/interrupt.h>
 #include <linux/firmware.h>
 #include <linux/completion.h>
+#include <linux/poll.h>
 
 #include "mic_common.h"
 #include "mic_debugfs.h"
+#include "mic_fops.h"
+#include "mic_virtio.h"
 
 static const char mic_driver_name[] = "mic";
 
@@ -73,6 +76,15 @@ struct mic_info {
 /* g_mic - Global information about all MIC devices. */
 static struct mic_info g_mic;
 
+static const struct file_operations mic_fops = {
+	.open = mic_open,
+	.release = mic_release,
+	.unlocked_ioctl = mic_ioctl,
+	.poll = mic_poll,
+	.mmap = mic_mmap,
+	.owner = THIS_MODULE,
+};
+
 /* Initialize the device page */
 static int mic_dp_init(struct mic_device *mdev)
 {
@@ -818,6 +830,7 @@ mic_device_init(struct mic_device *mdev, struct pci_dev *pdev)
 	mdev->irq_info.next_avail_src = 0;
 	INIT_WORK(&mdev->reset_trigger_work, mic_reset_trigger_work);
 	INIT_WORK(&mdev->shutdown_work, mic_shutdown_work);
+	INIT_LIST_HEAD(&mdev->vdev_list);
 }
 
 /**
@@ -958,8 +971,20 @@ static int __init mic_probe(struct pci_dev *pdev,
 	mic_bootparam_init(mdev);
 
 	mic_create_debug_dir(mdev);
+	cdev_init(&mdev->cdev, &mic_fops);
+	mdev->cdev.owner = THIS_MODULE;
+	rc = cdev_add(&mdev->cdev, MKDEV(MAJOR(g_mic.mdev_id), mdev->id), 1);
+	if (rc) {
+		dev_err(&pdev->dev, "cdev_add err id %d rc %d\n", mdev->id, rc);
+		goto cleanup_debug_dir;
+	}
 	dev_info(&pdev->dev, "Probe successful for %s\n", mdev->name);
 	return 0;
+cleanup_debug_dir:
+	mic_delete_debug_dir(mdev);
+	mutex_lock(&mdev->mic_mutex);
+	mic_free_irq(mdev, mdev->shutdown_cookie, mdev);
+	mutex_unlock(&mdev->mic_mutex);
 dp_uninit:
 	mic_dp_uninit(mdev);
 sysfs_put:
@@ -1008,6 +1033,7 @@ static void mic_remove(struct pci_dev *pdev)
 	id = mdev->id;
 
 	mic_stop(mdev, false);
+	cdev_del(&mdev->cdev);
 	mic_delete_debug_dir(mdev);
 	mutex_lock(&mdev->mic_mutex);
 	mic_free_irq(mdev, mdev->shutdown_cookie, mdev);
diff --git a/drivers/misc/mic/host/mic_virtio.c b/drivers/misc/mic/host/mic_virtio.c
new file mode 100644
index 0000000..731a81d
--- /dev/null
+++ b/drivers/misc/mic/host/mic_virtio.c
@@ -0,0 +1,710 @@
+/*
+ * Intel MIC Platform Software Stack (MPSS)
+ *
+ * Copyright(c) 2013 Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * General Public License for more details.
+ *
+ * The full GNU General Public License is included in this distribution in
+ * the file called "COPYING".
+ *
+ * Intel MIC Host driver.
+ *
+ */
+#include <linux/module.h>
+#include <linux/fs.h>
+#include <linux/pci.h>
+#include <linux/interrupt.h>
+#include <linux/firmware.h>
+#include <linux/completion.h>
+#include <linux/poll.h>
+#include <linux/sched.h>
+#include <uapi/linux/virtio_ids.h>
+#include <uapi/linux/virtio_net.h>
+
+#include "mic_common.h"
+#include "mic_virtio.h"
+
+/*
+ * Initiates the copies across the PCIe bus from card memory to
+ * a user space buffer.
+ */
+static int mic_virtio_copy_to_user(struct mic_vdev *mvdev,
+		void __user *ubuf, size_t len, u64 addr)
+{
+	int err;
+	void __iomem *dbuf = mvdev->mdev->aper.va + addr;
+	/*
+	 * We are copying from IO below and should ideally use something
+	 * like copy_to_user_fromio(..) if it existed.
+	 */
+	if (copy_to_user(ubuf, dbuf, len)) {
+		err = -EFAULT;
+		dev_err(mic_dev(mvdev), "%s %d err %d\n",
+			__func__, __LINE__, err);
+		goto err;
+	}
+	mvdev->in_bytes += len;
+	err = 0;
+err:
+	return err;
+}
+
+/*
+ * Issue a read across the bus to ensure that all previous PCIe posted writes
+ * from host to card have completed. The host maps card with write combining
+ * attributes and an sfence instruction is apparently not sufficient
+ * to flush the writes across PCIe.
+ * http://lkml.indiana.edu/hypermail/linux/kernel/0208.2/0049.html
+ * https://lkml.org/lkml/2006/2/25/146
+ */
+static inline u8 mic_flush_writes(struct mic_vdev *mvdev)
+{
+	return ioread8(mvdev->mdev->aper.va);
+}
+
+/*
+ * Initiates copies across the PCIe bus from a user space
+ * buffer to card memory.
+ */
+static int mic_virtio_copy_from_user(struct mic_vdev *mvdev,
+		void __user *ubuf, size_t len, u64 addr)
+{
+	int err;
+	void __iomem *dbuf = mvdev->mdev->aper.va + addr;
+	/*
+	 * We are copying to IO below and should ideally use something
+	 * like copy_from_user_toio(..) if it existed.
+	 */
+	if (copy_from_user(dbuf, ubuf, len)) {
+		err = -EFAULT;
+		dev_err(mic_dev(mvdev), "%s %d err %d\n",
+			__func__, __LINE__, err);
+		goto err;
+	}
+	mvdev->out_bytes += len;
+	mic_flush_writes(mvdev);
+	err = 0;
+err:
+	return err;
+}
+
+#define MIC_VRINGH_READ true
+
+/* Determine the total number of bytes consumed in a VRINGH KIOV */
+static inline u32 mic_vringh_iov_consumed(struct vringh_kiov *iov)
+{
+	int i;
+	u32 total = iov->consumed;
+
+	for (i = 0; i < iov->i; i++)
+		total += iov->iov[i].iov_len;
+	return total;
+}
+
+/*
+ * Traverse the VRINGH KIOV and issue the APIs to trigger the copies.
+ * This API is heavily based on the vringh_iov_xfer(..) implementation
+ * in vringh.c. The reason we cannot reuse vringh_iov_pull_kern(..)
+ * and vringh_iov_push_kern(..) directly is because there is no
+ * way to override the VRINGH xfer(..) routines as of v3.10.
+ */
+static int mic_vringh_copy(struct mic_vdev *mvdev, struct vringh_kiov *iov,
+	void __user *ubuf, size_t len, bool read, size_t *out_len)
+{
+	int ret = 0;
+	size_t partlen, tot_len = 0;
+
+	while (len && iov->i < iov->used) {
+		partlen = min(iov->iov[iov->i].iov_len, len);
+		if (read)
+			ret = mic_virtio_copy_to_user(mvdev,
+				ubuf, partlen,
+				(u64)iov->iov[iov->i].iov_base);
+		else
+			ret = mic_virtio_copy_from_user(mvdev,
+				ubuf, partlen,
+				(u64)iov->iov[iov->i].iov_base);
+		if (ret) {
+			dev_err(mic_dev(mvdev), "%s %d err %d\n",
+				__func__, __LINE__, ret);
+			break;
+		}
+		len -= partlen;
+		ubuf += partlen;
+		tot_len += partlen;
+		iov->consumed += partlen;
+		iov->iov[iov->i].iov_len -= partlen;
+		iov->iov[iov->i].iov_base += partlen;
+		if (!iov->iov[iov->i].iov_len) {
+			/* Fix up old iov element then increment. */
+			iov->iov[iov->i].iov_len = iov->consumed;
+			iov->iov[iov->i].iov_base -= iov->consumed;
+
+			iov->consumed = 0;
+			iov->i++;
+		}
+	}
+	*out_len = tot_len;
+	return ret;
+}
+
+/*
+ * Use the standard VRINGH infrastructure in the kernel to fetch new
+ * descriptors, initiate the copies and update the used ring.
+ */
+static int _mic_virtio_copy(struct mic_vdev *mvdev,
+	struct mic_copy_desc *copy)
+{
+	int ret = 0, iovcnt = copy->iovcnt;
+	struct iovec iov;
+	struct iovec __user *u_iov = copy->iov;
+	void __user *ubuf;
+	struct mic_vringh *mvr = &mvdev->mvr[copy->vr_idx];
+	struct vringh_kiov *riov = &mvr->riov;
+	struct vringh_kiov *wiov = &mvr->wiov;
+	struct vringh *vrh = &mvr->vrh;
+	u16 *head = &mvr->head;
+	struct mic_vring *vr = &mvr->vring;
+	size_t len = 0, out_len;
+
+	copy->out_len = 0;
+	/* Fetch a new IOVEC if all previous elements have been processed */
+	if (riov->i == riov->used && wiov->i == wiov->used) {
+		ret = vringh_getdesc_kern(vrh, riov, wiov,
+				head, GFP_KERNEL);
+		/* Check if there are available descriptors */
+		if (ret <= 0)
+			return 0;
+	}
+	while (iovcnt) {
+		if (!len) {
+			/* Copy over a new iovec from user space. */
+			ret = copy_from_user(&iov, u_iov, sizeof(*u_iov));
+			if (ret) {
+				ret = -EFAULT;
+				dev_err(mic_dev(mvdev), "%s %d err %d\n",
+					__func__, __LINE__, ret);
+				break;
+			}
+			len = iov.iov_len;
+			ubuf = iov.iov_base;
+		}
+		/* Issue all the read descriptors first */
+		ret = mic_vringh_copy(mvdev, riov, ubuf, len,
+			MIC_VRINGH_READ, &out_len);
+		if (ret) {
+			dev_err(mic_dev(mvdev), "%s %d err %d\n",
+					__func__, __LINE__, ret);
+			break;
+		}
+		len -= out_len;
+		ubuf += out_len;
+		copy->out_len += out_len;
+		/* Issue the write descriptors next */
+		ret = mic_vringh_copy(mvdev, wiov, ubuf, len,
+			!MIC_VRINGH_READ, &out_len);
+		if (ret) {
+			dev_err(mic_dev(mvdev), "%s %d err %d\n",
+					__func__, __LINE__, ret);
+			break;
+		}
+		len -= out_len;
+		ubuf += out_len;
+		copy->out_len += out_len;
+		if (!len) {
+			/* One user space iovec is now completed */
+			iovcnt--;
+			u_iov++;
+		}
+		/* Exit loop if all elements in KIOVs have been processed. */
+		if (riov->i == riov->used && wiov->i == wiov->used)
+			break;
+	}
+	/*
+	 * Update the used ring if a descriptor was available and some data was
+	 * copied in/out and the user asked for a used ring update.
+	 */
+	if (*head != USHRT_MAX && copy->out_len &&
+		copy->update_used) {
+		s8 db = mvdev->dc->h2c_vdev_db;
+		u32 total = 0;
+
+		/* Determine the total data consumed */
+		total += mic_vringh_iov_consumed(riov);
+		total += mic_vringh_iov_consumed(wiov);
+		vringh_complete_kern(vrh, *head, total);
+		*head = USHRT_MAX;
+		/*
+		 * We need a cookie in vringh to enable use of vringh_notify(..)
+		 * since there is no way to get to MIC specific data structures
+		 * AFAICS using container_of(..) without at least knowing the
+		 * vring index in the notify API.
+		 */
+		if (db != -1 && vringh_need_notify_kern(vrh) > 0)
+			mvdev->mdev->ops->send_intr(mvdev->mdev, db);
+		vringh_kiov_cleanup(riov);
+		vringh_kiov_cleanup(wiov);
+		/* Update avail idx for user space */
+		vr->info->avail_idx = vrh->last_avail_idx;
+	}
+	return ret;
+}
+
+static inline int mic_verify_copy_args(struct mic_vdev *mvdev,
+		struct mic_copy_desc *copy)
+{
+	if (copy->vr_idx >= mvdev->dd->num_vq) {
+		dev_err(mic_dev(mvdev), "%s %d err %d\n",
+			__func__, __LINE__, -EINVAL);
+		return -EINVAL;
+	}
+	return 0;
+}
+
+/* Copy a specified number of virtio descriptors in a chain */
+int mic_virtio_copy_desc(struct mic_vdev *mvdev,
+		struct mic_copy_desc *copy)
+{
+	int err;
+	struct mic_vringh *mvr = &mvdev->mvr[copy->vr_idx];
+
+	err = mic_verify_copy_args(mvdev, copy);
+	if (err)
+		return err;
+
+	mutex_lock(&mvr->vr_mutex);
+	if (!mic_vdevup(mvdev)) {
+		err = -ENODEV;
+		dev_err(mic_dev(mvdev), "%s %d err %d\n",
+			__func__, __LINE__, err);
+		goto err;
+	}
+	err = _mic_virtio_copy(mvdev, copy);
+	if (err) {
+		dev_err(mic_dev(mvdev), "%s %d err %d\n",
+			__func__, __LINE__, err);
+	}
+err:
+	mutex_unlock(&mvr->vr_mutex);
+	return err;
+}
+
+static void mic_virtio_init_post(struct mic_vdev *mvdev)
+{
+	struct mic_vqconfig *vqconfig = mic_vq_config(mvdev->dd);
+	int i;
+
+	for (i = 0; i < mvdev->dd->num_vq; i++) {
+		if (!le64_to_cpu(vqconfig[i].used_address)) {
+			dev_warn(mic_dev(mvdev), "used_address zero??\n");
+			continue;
+		}
+		mvdev->mvr[i].vrh.vring.used =
+			mvdev->mdev->aper.va +
+			le64_to_cpu(vqconfig[i].used_address);
+	}
+
+	mvdev->dc->used_address_updated = 0;
+
+	dev_info(mic_dev(mvdev), "%s: device type %d LINKUP\n",
+		__func__, mvdev->virtio_id);
+}
+
+static inline void mic_virtio_device_reset(struct mic_vdev *mvdev)
+{
+	int i;
+
+	dev_info(mic_dev(mvdev), "%s: status %d device type %d RESET\n",
+		__func__, mvdev->dd->status, mvdev->virtio_id);
+
+	for (i = 0; i < mvdev->dd->num_vq; i++)
+		/*
+		 * Avoid lockdep false positive. The + 1 is for the mic
+		 * mutex which is held in the reset devices code path.
+		 */
+		mutex_lock_nested(&mvdev->mvr[i].vr_mutex, i + 1);
+
+	/* 0 status means "reset" */
+	mvdev->dd->status = 0;
+	mvdev->dc->vdev_reset = 0;
+	mvdev->dc->host_ack = 1;
+
+	for (i = 0; i < mvdev->dd->num_vq; i++)
+		mvdev->mvr[i].vring.info->avail_idx = 0;
+
+	for (i = 0; i < mvdev->dd->num_vq; i++)
+		mutex_unlock(&mvdev->mvr[i].vr_mutex);
+}
+
+void mic_virtio_reset_devices(struct mic_device *mdev)
+{
+	struct list_head *pos, *tmp;
+	struct mic_vdev *mvdev;
+
+	dev_info(&mdev->pdev->dev, "%s\n",  __func__);
+
+	WARN_ON(!mutex_is_locked(&mdev->mic_mutex));
+	list_for_each_safe(pos, tmp, &mdev->vdev_list) {
+		mvdev = list_entry(pos, struct mic_vdev, list);
+		mic_virtio_device_reset(mvdev);
+		mvdev->poll_wake = 1;
+		wake_up(&mvdev->waitq);
+	}
+}
+
+void mic_bh_handler(struct work_struct *work)
+{
+	struct mic_vdev *mvdev = container_of(work, struct mic_vdev,
+			virtio_bh_work);
+
+	if (mvdev->dc->used_address_updated)
+		mic_virtio_init_post(mvdev);
+
+	if (mvdev->dc->vdev_reset)
+		mic_virtio_device_reset(mvdev);
+
+	mvdev->poll_wake = 1;
+	wake_up(&mvdev->waitq);
+}
+
+static irqreturn_t mic_virtio_intr_handler(int irq, void *data)
+{
+
+	struct mic_vdev *mvdev = data;
+	struct mic_device *mdev = mvdev->mdev;
+
+	mdev->ops->ack_interrupt(mdev);
+	schedule_work(&mvdev->virtio_bh_work);
+	return IRQ_HANDLED;
+}
+
+int mic_virtio_config_change(struct mic_vdev *mvdev,
+			void __user *argp)
+{
+	DECLARE_WAIT_QUEUE_HEAD_ONSTACK(wake);
+	int ret = 0, retry = 100, i;
+	struct mic_bootparam *bootparam = mvdev->mdev->dp;
+	s8 db = bootparam->h2c_config_db;
+
+	mutex_lock(&mvdev->mdev->mic_mutex);
+	for (i = 0; i < mvdev->dd->num_vq; i++)
+		mutex_lock_nested(&mvdev->mvr[i].vr_mutex, i + 1);
+
+	if (db == -1 || mvdev->dd->type == -1) {
+		ret = -EIO;
+		goto exit;
+	}
+
+	if (copy_from_user(mic_vq_configspace(mvdev->dd),
+				argp, mvdev->dd->config_len)) {
+		dev_err(mic_dev(mvdev), "%s %d err %d\n",
+			__func__, __LINE__, -EFAULT);
+		ret = -EFAULT;
+		goto exit;
+	}
+	mvdev->dc->config_change = MIC_VIRTIO_PARAM_CONFIG_CHANGED;
+	mvdev->mdev->ops->send_intr(mvdev->mdev, db);
+
+	for (i = retry; i--;) {
+		ret = wait_event_timeout(wake,
+			mvdev->dc->guest_ack, msecs_to_jiffies(100));
+		if (ret)
+			break;
+	}
+
+	dev_info(mic_dev(mvdev),
+		"%s %d retry: %d\n", __func__, __LINE__, retry);
+	mvdev->dc->config_change = 0;
+	mvdev->dc->guest_ack = 0;
+exit:
+	for (i = 0; i < mvdev->dd->num_vq; i++)
+		mutex_unlock(&mvdev->mvr[i].vr_mutex);
+	mutex_unlock(&mvdev->mdev->mic_mutex);
+	return ret;
+}
+
+static int mic_copy_dp_entry(struct mic_vdev *mvdev,
+					void __user *argp,
+					__u8 *type,
+					struct mic_device_desc **devpage)
+{
+	struct mic_device *mdev = mvdev->mdev;
+	struct mic_device_desc dd, *dd_config, *devp;
+	struct mic_vqconfig *vqconfig;
+	int ret = 0, i;
+	bool slot_found = false;
+
+	if (copy_from_user(&dd, argp, sizeof(dd))) {
+		dev_err(mic_dev(mvdev), "%s %d err %d\n",
+			__func__, __LINE__, -EFAULT);
+		return -EFAULT;
+	}
+
+	if (mic_aligned_desc_size(&dd) > MIC_MAX_DESC_BLK_SIZE
+		|| dd.num_vq > MIC_MAX_VRINGS) {
+		dev_err(mic_dev(mvdev), "%s %d err %d\n",
+			__func__, __LINE__, -EINVAL);
+		return -EINVAL;
+	}
+
+	dd_config = kmalloc(mic_desc_size(&dd), GFP_KERNEL);
+	if (dd_config == NULL) {
+		dev_err(mic_dev(mvdev), "%s %d err %d\n",
+			__func__, __LINE__, -ENOMEM);
+		return -ENOMEM;
+	}
+	if (copy_from_user(dd_config, argp, mic_desc_size(&dd))) {
+		ret = -EFAULT;
+		dev_err(mic_dev(mvdev), "%s %d err %d\n",
+			__func__, __LINE__, ret);
+		goto exit;
+	}
+
+	vqconfig = mic_vq_config(dd_config);
+	for (i = 0; i < dd.num_vq; i++) {
+		if (le16_to_cpu(vqconfig[i].num) > MIC_MAX_VRING_ENTRIES) {
+			ret = -EINVAL;
+			dev_err(mic_dev(mvdev), "%s %d err %d\n",
+				__func__, __LINE__, ret);
+			goto exit;
+		}
+	}
+
+	/* Find the first free device page entry */
+	for (i = mic_aligned_size(struct mic_bootparam);
+		i < MIC_DP_SIZE - mic_total_desc_size(dd_config);
+		i += mic_total_desc_size(devp)) {
+		devp = mdev->dp + i;
+		if (devp->type == 0 || devp->type == -1) {
+			slot_found = true;
+			break;
+		}
+	}
+	if (!slot_found) {
+		ret = -EINVAL;
+		dev_err(mic_dev(mvdev), "%s %d err %d\n",
+			__func__, __LINE__, ret);
+		goto exit;
+	}
+
+	/* Save off the type before doing the memcpy. Type will be set in the
+	 * end after completing all initialization for the new device */
+	*type = dd_config->type;
+	dd_config->type = 0;
+	memcpy(devp, dd_config, mic_desc_size(dd_config));
+
+	*devpage = devp;
+exit:
+	kfree(dd_config);
+	return ret;
+}
+
+static void mic_init_device_ctrl(struct mic_vdev *mvdev,
+				struct mic_device_desc *devpage)
+{
+	struct mic_device_ctrl *dc;
+
+	dc = mvdev->dc = (void *)devpage + mic_aligned_desc_size(devpage);
+
+	dc->config_change = 0;
+	dc->guest_ack = 0;
+	dc->vdev_reset = 0;
+	dc->host_ack = 0;
+	dc->used_address_updated = 0;
+	dc->c2h_vdev_db = -1;
+	dc->h2c_vdev_db = -1;
+}
+
+int mic_virtio_add_device(struct mic_vdev *mvdev,
+			void __user *argp)
+{
+	struct mic_device *mdev = mvdev->mdev;
+	struct mic_device_desc *dd;
+	struct mic_vqconfig *vqconfig;
+	int vr_size, i, j, ret;
+	u8 type;
+	s8 db;
+	char irqname[10];
+	struct mic_bootparam *bootparam = mdev->dp;
+	u16 num;
+
+	mutex_lock(&mdev->mic_mutex);
+
+	ret = mic_copy_dp_entry(mvdev, argp, &type, &dd);
+	if (ret) {
+		mutex_unlock(&mdev->mic_mutex);
+		return ret;
+	}
+
+	mic_init_device_ctrl(mvdev, dd);
+
+	mvdev->dd = dd;
+	mvdev->virtio_id = type;
+	vqconfig = mic_vq_config(dd);
+	INIT_WORK(&mvdev->virtio_bh_work, mic_bh_handler);
+
+	for (i = 0; i < dd->num_vq; i++) {
+		struct mic_vringh *mvr = &mvdev->mvr[i];
+		struct mic_vring *vr = &mvdev->mvr[i].vring;
+		num = le16_to_cpu(vqconfig[i].num);
+		mutex_init(&mvr->vr_mutex);
+		vr_size = PAGE_ALIGN(vring_size(num, MIC_VIRTIO_RING_ALIGN) +
+			sizeof(struct _mic_vring_info));
+		vr->va = (void *)
+			__get_free_pages(GFP_KERNEL | __GFP_ZERO,
+			get_order(vr_size));
+		if (!vr->va) {
+			ret = -ENOMEM;
+			dev_err(mic_dev(mvdev), "%s %d err %d\n",
+				__func__, __LINE__, ret);
+			goto err;
+		}
+		vr->len = vr_size;
+		vr->info = vr->va + vring_size(num, MIC_VIRTIO_RING_ALIGN);
+		vr->info->magic = MIC_MAGIC + mvdev->virtio_id + i;
+		vqconfig[i].address = mic_map_single(mdev,
+			vr->va, vr_size);
+		if (mic_map_error(vqconfig[i].address)) {
+			free_pages((unsigned long)vr->va,
+				get_order(vr_size));
+			ret = -ENOMEM;
+			dev_err(mic_dev(mvdev), "%s %d err %d\n",
+				__func__, __LINE__, ret);
+			goto err;
+		}
+		vqconfig[i].address = cpu_to_le64(vqconfig[i].address);
+
+		vring_init(&vr->vr, num,
+			vr->va, MIC_VIRTIO_RING_ALIGN);
+		ret = vringh_init_kern(&mvr->vrh,
+			*(u32 *)mic_vq_features(mvdev->dd), num, false,
+			vr->vr.desc, vr->vr.avail, vr->vr.used);
+		if (ret) {
+			dev_err(mic_dev(mvdev), "%s %d err %d\n",
+				__func__, __LINE__, ret);
+			goto err;
+		}
+		vringh_kiov_init(&mvr->riov, NULL, 0);
+		vringh_kiov_init(&mvr->wiov, NULL, 0);
+		mvr->head = USHRT_MAX;
+		dev_dbg(&mdev->pdev->dev,
+			"%s %d index %d va %p info %p vr_size 0x%x\n",
+			__func__, __LINE__, i, vr->va, vr->info, vr_size);
+	}
+
+	snprintf(irqname, sizeof(irqname),
+		"mic%dvirtio%d", mdev->id, mvdev->virtio_id);
+	mvdev->virtio_db = mic_next_db(mdev);
+	mvdev->virtio_cookie = mic_request_irq(mdev, mic_virtio_intr_handler,
+			irqname, mvdev, mvdev->virtio_db, MIC_INTR_DB);
+	if (IS_ERR(mvdev->virtio_cookie)) {
+		ret = PTR_ERR(mvdev->virtio_cookie);
+		dev_dbg(&mdev->pdev->dev, "request irq failed\n");
+		goto err;
+	}
+
+	mvdev->dc->c2h_vdev_db = mvdev->virtio_db;
+
+	list_add_tail(&mvdev->list, &mdev->vdev_list);
+	/*
+	 * Order the type update with previous stores. This write barrier
+	 * is paired with the corresponding read barrier before the uncached
+	 * system memory read of the type, on the card while scanning the
+	 * device page.
+	 */
+	smp_wmb();
+	dd->type = type;
+
+	dev_info(&mdev->pdev->dev, "Added virtio device id %d\n", dd->type);
+
+	db = bootparam->h2c_config_db;
+	if (db != -1)
+		mdev->ops->send_intr(mdev, db);
+	mutex_unlock(&mdev->mic_mutex);
+	return 0;
+err:
+	vqconfig = mic_vq_config(dd);
+	for (j = 0; j < i; j++) {
+		struct mic_vringh *mvr = &mvdev->mvr[j];
+		mic_unmap_single(mdev, le64_to_cpu(vqconfig[j].address),
+				mvr->vring.len);
+		free_pages((unsigned long)mvr->vring.va,
+			get_order(mvr->vring.len));
+	}
+	mutex_unlock(&mdev->mic_mutex);
+	return ret;
+}
+
+void mic_virtio_del_device(struct mic_vdev *mvdev)
+{
+	struct list_head *pos, *tmp;
+	struct mic_vdev *tmp_mvdev;
+	struct mic_device *mdev = mvdev->mdev;
+	DECLARE_WAIT_QUEUE_HEAD_ONSTACK(wake);
+	int i, ret, retry = 100;
+	struct mic_vqconfig *vqconfig;
+	struct mic_bootparam *bootparam = mdev->dp;
+	s8 db;
+
+	mutex_lock(&mdev->mic_mutex);
+	db = bootparam->h2c_config_db;
+	if (db == -1)
+		goto skip_hot_remove;
+	dev_info(&mdev->pdev->dev,
+		"Requesting hot remove id %d\n", mvdev->virtio_id);
+	mvdev->dc->config_change = MIC_VIRTIO_PARAM_DEV_REMOVE;
+	mdev->ops->send_intr(mdev, db);
+	for (i = retry; i--;) {
+		ret = wait_event_timeout(wake,
+			mvdev->dc->guest_ack, msecs_to_jiffies(100));
+		if (ret)
+			break;
+	}
+	dev_info(&mdev->pdev->dev,
+		"Device id %d config_change %d guest_ack %d\n",
+		mvdev->virtio_id, mvdev->dc->config_change,
+		mvdev->dc->guest_ack);
+	mvdev->dc->config_change = 0;
+	mvdev->dc->guest_ack = 0;
+skip_hot_remove:
+	mic_free_irq(mdev, mvdev->virtio_cookie, mvdev);
+	flush_work(&mvdev->virtio_bh_work);
+	vqconfig = mic_vq_config(mvdev->dd);
+	for (i = 0; i < mvdev->dd->num_vq; i++) {
+		struct mic_vringh *mvr = &mvdev->mvr[i];
+		vringh_kiov_cleanup(&mvr->riov);
+		vringh_kiov_cleanup(&mvr->wiov);
+		mic_unmap_single(mdev, le64_to_cpu(vqconfig[i].address),
+				mvr->vring.len);
+		free_pages((unsigned long)mvr->vring.va,
+			get_order(mvr->vring.len));
+	}
+
+	list_for_each_safe(pos, tmp, &mdev->vdev_list) {
+		tmp_mvdev = list_entry(pos, struct mic_vdev, list);
+		if (tmp_mvdev == mvdev) {
+			list_del(pos);
+			dev_info(&mdev->pdev->dev,
+				"Removing virtio device id %d\n",
+				mvdev->virtio_id);
+			break;
+		}
+	}
+	/*
+	 * Order the type update with previous stores. This write barrier
+	 * is paired with the corresponding read barrier before the uncached
+	 * system memory read of the type, on the card while scanning the
+	 * device page.
+	 */
+	smp_wmb();
+	mvdev->dd->type = -1;
+	mutex_unlock(&mdev->mic_mutex);
+}
diff --git a/drivers/misc/mic/host/mic_virtio.h b/drivers/misc/mic/host/mic_virtio.h
new file mode 100644
index 0000000..39741ff
--- /dev/null
+++ b/drivers/misc/mic/host/mic_virtio.h
@@ -0,0 +1,139 @@
+/*
+ * Intel MIC Platform Software Stack (MPSS)
+ *
+ * Copyright(c) 2013 Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * General Public License for more details.
+ *
+ * The full GNU General Public License is included in this distribution in
+ * the file called "COPYING".
+ *
+ * Intel MIC Host driver.
+ *
+ */
+#ifndef MIC_VIRTIO_H
+#define MIC_VIRTIO_H
+
+#include <linux/types.h>
+#include <linux/virtio_ring.h>
+#include <linux/virtio_config.h>
+
+#include <linux/mic_ioctl.h>
+
+/*
+ * Note on endianness.
+ * 1. Host can be either BE or LE
+ * 2. Guest/card is LE. Host uses le_to_cpu to access desc/avail
+ *    rings and ioreadXX/iowriteXX to access used ring.
+ * 3. Device page exposed by host to guest contains LE values. Guest
+ *    accesses these using ioreadXX/iowriteXX etc. This way in general we
+ *    obey the virtio spec according to which guest works with native
+ *    endianness and host is aware of guest endianness and does all
+ *    required endianness conversion.
+ * 4. Data provided from user space to guest (in ADD_DEVICE and
+ *    CONFIG_CHANGE ioctl's) is not interpreted by the driver and should be
+ *    in guest endianness.
+ */
+
+/**
+ * struct mic_vringh - Virtio ring host information.
+ *
+ * @vring: The MIC vring used for setting up user space mappings.
+ * @vrh: The host VRINGH used for accessing the card vrings.
+ * @riov: The VRINGH read kernel IOV.
+ * @wiov: The VRINGH write kernel IOV.
+ * @head: The VRINGH head index address passed to vringh_getdesc_kern(..).
+ * @vr_mutex: Mutex for synchronizing access to the VRING.
+ */
+struct mic_vringh {
+	struct mic_vring vring;
+	struct vringh vrh;
+	struct vringh_kiov riov;
+	struct vringh_kiov wiov;
+	u16 head;
+	struct mutex vr_mutex;
+};
+
+/**
+ * struct mic_vdev - Host information for a card Virtio device.
+ *
+ * @virtio_id - Virtio device id.
+ * @waitq - Waitqueue to allow ring3 apps to poll.
+ * @mdev - Back pointer to host MIC device.
+ * @poll_wake - Used for waking up threads blocked in poll.
+ * @out_bytes - Debug stats for number of bytes copied from host to card.
+ * @in_bytes - Debug stats for number of bytes copied from card to host.
+ * @mvr - Store per VRING data structures.
+ * @virtio_bh_work - Work struct used to schedule virtio bottom half handling.
+ * @dd - Virtio device descriptor.
+ * @dc - Virtio device control fields.
+ * @list - List of Virtio devices.
+ * @virtio_db - The doorbell used by the card to interrupt the host.
+ * @virtio_cookie - The cookie returned while requesting interrupts.
+ */
+struct mic_vdev {
+	int virtio_id;
+	wait_queue_head_t waitq;
+	struct mic_device *mdev;
+	int poll_wake;
+	unsigned long out_bytes;
+	unsigned long in_bytes;
+	struct mic_vringh mvr[MIC_MAX_VRINGS];
+	struct work_struct virtio_bh_work;
+	struct mic_device_desc *dd;
+	struct mic_device_ctrl *dc;
+	struct list_head list;
+	int virtio_db;
+	struct mic_irq *virtio_cookie;
+};
+
+void mic_virtio_uninit(struct mic_device *mdev);
+int mic_virtio_add_device(struct mic_vdev *mvdev,
+			void __user *argp);
+void mic_virtio_del_device(struct mic_vdev *mvdev);
+int mic_virtio_config_change(struct mic_vdev *mvdev,
+			void __user *argp);
+int mic_virtio_copy_desc(struct mic_vdev *mvdev,
+	struct mic_copy_desc *request);
+void mic_virtio_reset_devices(struct mic_device *mdev);
+void mic_bh_handler(struct work_struct *work);
+
+/* Helper API to obtain the MIC PCIe device */
+static inline struct device *mic_dev(struct mic_vdev *mvdev)
+{
+	return &mvdev->mdev->pdev->dev;
+}
+
+/* Helper API to check if a virtio device is initialized */
+static inline int mic_vdev_inited(struct mic_vdev *mvdev)
+{
+	/* Device has not been created yet */
+	if (!mvdev->dd || !mvdev->dd->type) {
+		dev_err(mic_dev(mvdev), "%s %d err %d\n",
+			__func__, __LINE__, -EINVAL);
+		return -EINVAL;
+	}
+
+	/* Device has been removed/deleted */
+	if (mvdev->dd->type == -1) {
+		dev_err(mic_dev(mvdev), "%s %d err %d\n",
+			__func__, __LINE__, -ENODEV);
+		return -ENODEV;
+	}
+
+	return 0;
+}
+
+/* Helper API to check if a virtio device is running */
+static inline bool mic_vdevup(struct mic_vdev *mvdev)
+{
+	return !!mvdev->dd->status;
+}
+#endif
diff --git a/include/uapi/linux/Kbuild b/include/uapi/linux/Kbuild
index 8f985dd..1579aab 100644
--- a/include/uapi/linux/Kbuild
+++ b/include/uapi/linux/Kbuild
@@ -240,6 +240,7 @@ header-y += mei.h
 header-y += mempolicy.h
 header-y += meye.h
 header-y += mic_common.h
+header-y += mic_ioctl.h
 header-y += mii.h
 header-y += minix_fs.h
 header-y += mman.h
diff --git a/include/uapi/linux/mic_common.h b/include/uapi/linux/mic_common.h
index a9091e5..1bde039 100644
--- a/include/uapi/linux/mic_common.h
+++ b/include/uapi/linux/mic_common.h
@@ -21,7 +21,60 @@
 #ifndef __MIC_COMMON_H_
 #define __MIC_COMMON_H_
 
-#include <linux/types.h>
+#include <linux/virtio_ring.h>
+
+#ifndef __KERNEL__
+#define ALIGN(a, x)	(((a) + (x) - 1) & ~((x) - 1))
+#define __aligned(x)	__attribute__ ((aligned(x)))
+#endif
+
+#define mic_aligned_size(x) ALIGN(sizeof(x), 8)
+
+/**
+ * struct mic_device_desc: Virtio device information shared between the
+ * virtio driver and userspace backend
+ *
+ * @type: Device type: console/network/disk etc.  Type 0/-1 terminates.
+ * @num_vq: Number of virtqueues.
+ * @feature_len: Number of bytes of feature bits.  Multiply by 2: one for
+ * host features and one for guest acknowledgements.
+ * @config_len: Number of bytes of the config array after virtqueues.
+ * @status: A status byte, written by the Guest.
+ * @config: Start of the following variable length config.
+ */
+struct mic_device_desc {
+	__s8 type;
+	__u8 num_vq;
+	__u8 feature_len;
+	__u8 config_len;
+	__u8 status;
+	__u64 config[0];
+} __aligned(8);
+
+/**
+ * struct mic_device_ctrl: Per virtio device information in the device page
+ * used internally by the host and card side drivers.
+ *
+ * @vdev: Used for storing MIC vdev information by the guest.
+ * @config_change: Set to 1 by host when a config change is requested.
+ * @vdev_reset: Set to 1 by guest to indicate virtio device has been reset.
+ * @guest_ack: Set to 1 by guest to ack a command.
+ * @host_ack: Set to 1 by host to ack a command.
+ * @used_address_updated: Set to 1 by guest when the used address should be
+ * updated.
+ * @c2h_vdev_db: The doorbell number to be used by guest. Set by host.
+ * @h2c_vdev_db: The doorbell number to be used by host. Set by guest.
+ */
+struct mic_device_ctrl {
+	__u64 vdev;
+	__u8 config_change;
+	__u8 vdev_reset;
+	__u8 guest_ack;
+	__u8 host_ack;
+	__u8 used_address_updated;
+	__s8 c2h_vdev_db;
+	__s8 h2c_vdev_db;
+} __aligned(8);
 
 /**
  * struct mic_bootparam: Virtio device independent information in device page
@@ -42,6 +95,115 @@ struct mic_bootparam {
 	__u8 shutdown_card;
 } __aligned(8);
 
+/**
+ * struct mic_device_page: High level representation of the device page
+ *
+ * @bootparam: The bootparam structure is used for sharing information and
+ * status updates between MIC host and card drivers.
+ * @desc: Array of MIC virtio device descriptors.
+ */
+struct mic_device_page {
+	struct mic_bootparam bootparam;
+	struct mic_device_desc desc[0];
+};
+/**
+ * struct mic_vqconfig: This is how we expect the device configuration field
+ * for a virtqueue to be laid out in config space.
+ *
+ * @address: Guest/MIC physical address of the virtio ring
+ * (avail and desc rings)
+ * @used_address: Guest/MIC physical address of the used ring
+ * @num: The number of entries in the virtio_ring
+ */
+struct mic_vqconfig {
+	__u64 address;
+	__u64 used_address;
+	__u16 num;
+} __aligned(8);
+
+/* The alignment to use between consumer and producer parts of vring.
+ * This is pagesize for historical reasons. */
+#define MIC_VIRTIO_RING_ALIGN		4096
+
+#define MIC_MAX_VRINGS			4
+#define MIC_VRING_ENTRIES		128
+
+/*
+ * Max vring entries (power of 2) to ensure desc and avail rings
+ * fit in a single page
+ */
+#define MIC_MAX_VRING_ENTRIES		128
+
+/**
+ * Max size of the desc block in bytes: includes:
+ *	- struct mic_device_desc
+ *	- struct mic_vqconfig (num_vq of these)
+ *	- host and guest features
+ *	- virtio device config space
+ */
+#define MIC_MAX_DESC_BLK_SIZE		256
+
+/**
+ * struct _mic_vring_info - Host vring info exposed to userspace backend
+ * for the avail index and magic for the card.
+ *
+ * @avail_idx: host avail idx
+ * @magic: A magic debug cookie.
+ */
+struct _mic_vring_info {
+	__u16 avail_idx;
+	int magic;
+};
+
+/**
+ * struct mic_vring - Vring information.
+ *
+ * @vr: The virtio ring.
+ * @info: Host vring information exposed to the userspace backend for the
+ * avail index and magic for the card.
+ * @va: The va for the buffer allocated for vr and info.
+ * @len: The length of the buffer required for allocating vr and info.
+ */
+struct mic_vring {
+	struct vring vr;
+	struct _mic_vring_info *info;
+	void *va;
+	int len;
+};
+
+#define mic_aligned_desc_size(d) ALIGN(mic_desc_size(d), 8)
+
+#ifndef INTEL_MIC_CARD
+static inline unsigned mic_desc_size(const struct mic_device_desc *desc)
+{
+	return mic_aligned_size(*desc)
+		+ desc->num_vq * mic_aligned_size(struct mic_vqconfig)
+		+ desc->feature_len * 2
+		+ desc->config_len;
+}
+
+static inline struct mic_vqconfig *
+mic_vq_config(const struct mic_device_desc *desc)
+{
+	return (struct mic_vqconfig *)(desc + 1);
+}
+
+static inline __u8 *mic_vq_features(const struct mic_device_desc *desc)
+{
+	return (__u8 *)(mic_vq_config(desc) + desc->num_vq);
+}
+
+static inline __u8 *mic_vq_configspace(const struct mic_device_desc *desc)
+{
+	return mic_vq_features(desc) + desc->feature_len * 2;
+}
+static inline unsigned mic_total_desc_size(struct mic_device_desc *desc)
+{
+	return mic_aligned_desc_size(desc) +
+		mic_aligned_size(struct mic_device_ctrl);
+}
+#endif
+
 /* Device page size */
 #define MIC_DP_SIZE 4096
 
diff --git a/include/uapi/linux/mic_ioctl.h b/include/uapi/linux/mic_ioctl.h
new file mode 100644
index 0000000..b9ed19f
--- /dev/null
+++ b/include/uapi/linux/mic_ioctl.h
@@ -0,0 +1,76 @@
+/*
+ * Intel MIC Platform Software Stack (MPSS)
+ *
+ * Copyright(c) 2013 Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * General Public License for more details.
+ *
+ * The full GNU General Public License is included in this distribution in
+ * the file called "COPYING".
+ *
+ * Intel MIC Host driver.
+ *
+ */
+#ifndef _MIC_IOCTL_H_
+#define _MIC_IOCTL_H_
+
+#include <linux/mic_common.h>
+
+/*
+ * struct mic_copy_desc - MIC virtio descriptor copy.
+ *
+ * @iov: An array of IOVEC structures containing user space buffers.
+ * @iovcnt: Number of IOVEC structures in iov.
+ * @vr_idx: The vring index.
+ * @update_used: A non zero value results in used index being updated.
+ * @out_len: The aggregate of the total length written to or read from
+ *	the virtio device.
+ */
+struct mic_copy_desc {
+#ifdef __KERNEL__
+	struct iovec __user *iov;
+#else
+	struct iovec *iov;
+#endif
+	int iovcnt;
+	__u8 vr_idx;
+	__u8 update_used;
+	__u32 out_len;
+};
+
+/*
+ * Add a new virtio device
+ * The (struct mic_device_desc *) pointer points to a device page entry
+ *	for the virtio device consisting of:
+ *	- struct mic_device_desc
+ *	- struct mic_vqconfig (num_vq of these)
+ *	- host and guest features
+ *	- virtio device config space
+ * The total size referenced by the pointer should equal the size returned
+ * by mic_desc_size() in mic_common.h
+ */
+#define MIC_VIRTIO_ADD_DEVICE _IOWR('s', 1, struct mic_device_desc *)
+
+/*
+ * Copy the number of entries in the iovec and update the used index
+ * if requested by the user.
+ */
+#define MIC_VIRTIO_COPY_DESC	_IOWR('s', 2, struct mic_copy_desc *)
+
+/*
+ * Notify virtio device of a config change
+ * The (__u8 *) pointer points to config space values for the device
+ * as they should be written into the device page. The total size
+ * referenced by the pointer should equal the config_len field of struct
+ * mic_device_desc.
+ */
+#define MIC_VIRTIO_CONFIG_CHANGE _IOWR('s', 5, __u8 *)
+
+#endif
-- 
1.8.2.1
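
The descriptor block described in mic_common.h above (header, num_vq
vqconfig entries, two feature bitmaps, config space, then the control
structure) can be illustrated with a small user space sketch. This is not
part of the patch: the sketch_* names are hypothetical stand-ins, the
structures are copied from the patch with fixed-width types, and the sizes
assume a GCC-compatible compiler that honours the aligned(8) attribute.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Round a up to the next multiple of x (x must be a power of two). */
#define SKETCH_ALIGN(a, x) (((a) + (x) - 1) & ~((size_t)(x) - 1))

/* Hypothetical stand-ins mirroring the uapi structures in the patch. */
struct sketch_device_desc {
	int8_t type;
	uint8_t num_vq;
	uint8_t feature_len;
	uint8_t config_len;
	uint8_t status;
	uint64_t config[];
} __attribute__((aligned(8)));

struct sketch_vqconfig {
	uint64_t address;
	uint64_t used_address;
	uint16_t num;
} __attribute__((aligned(8)));

struct sketch_device_ctrl {
	uint64_t vdev;
	uint8_t config_change;
	uint8_t vdev_reset;
	uint8_t guest_ack;
	uint8_t host_ack;
	uint8_t used_address_updated;
	int8_t c2h_vdev_db;
	int8_t h2c_vdev_db;
} __attribute__((aligned(8)));

/* Byte offsets of each part of one descriptor block. */
struct sketch_layout {
	size_t vqconfig_off;	/* num_vq vqconfig entries start here */
	size_t features_off;	/* host features, then guest acks */
	size_t config_off;	/* device specific config space */
	size_t ctrl_off;	/* control structure, 8 byte aligned */
	size_t total;		/* distance to the next descriptor */
};

/* Same arithmetic as mic_desc_size()/mic_total_desc_size() in the patch. */
struct sketch_layout sketch_desc_layout(uint8_t num_vq, uint8_t feature_len,
					uint8_t config_len)
{
	struct sketch_layout l;

	l.vqconfig_off = SKETCH_ALIGN(sizeof(struct sketch_device_desc), 8);
	l.features_off = l.vqconfig_off +
		num_vq * SKETCH_ALIGN(sizeof(struct sketch_vqconfig), 8);
	l.config_off = l.features_off + (size_t)feature_len * 2;
	l.ctrl_off = SKETCH_ALIGN(l.config_off + config_len, 8);
	l.total = l.ctrl_off +
		SKETCH_ALIGN(sizeof(struct sketch_device_ctrl), 8);
	return l;
}
```

With illustrative values num_vq = 2, feature_len = 4 and config_len = 8,
sketch_desc_layout() places the vqconfig array at offset 8, the feature
bitmaps at 56, the config space at 64 and the control structure at 72, so
the next descriptor starts 88 bytes after this one.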

* [PATCH v2 6/7] Intel MIC Card Driver Changes for Virtio Devices.
  2013-08-08  3:04 [PATCH v2 0/7] Enable Drivers for Intel MIC X100 Coprocessors Sudeep Dutt
                   ` (4 preceding siblings ...)
  2013-08-08  3:04 ` [PATCH v2 5/7] Intel MIC Host Driver Changes for Virtio Devices Sudeep Dutt
@ 2013-08-08  3:04 ` Sudeep Dutt
  2013-08-08  3:04 ` [PATCH v2 7/7] Sample Implementation of Intel MIC User Space Daemon Sudeep Dutt
  6 siblings, 0 replies; 18+ messages in thread
From: Sudeep Dutt @ 2013-08-08  3:04 UTC (permalink / raw)
  To: Greg Kroah-Hartman, Arnd Bergmann, Rusty Russell,
	Michael S. Tsirkin, Rob Landley, linux-kernel, virtualization,
	linux-doc
  Cc: Harshavardhan R Kharche, Peter P Waskiewicz Jr,
	Yaozu (Eddie) Dong, Sudeep Dutt, Ashutosh Dixit, AsiasHeasias,
	Caz Yokoyama, Dasaratharaman Chandramouli

From: Ashutosh Dixit <ashutosh.dixit@intel.com>

This patch introduces the card "Virtio over PCIe" interface for
Intel MIC. It allows virtio drivers on the card to communicate with their
user space backends on the host via a device page. Ring 3 apps on the host
can add, remove and configure virtio devices. A thin MIC-specific
virtio_config_ops is implemented, borrowing heavily from the earlier
implementations in lguest and s390:
drivers/lguest/lguest_device.c
drivers/s390/kvm/kvm_virtio.c

Co-author: Sudeep Dutt <sudeep.dutt@intel.com>
Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Signed-off-by: Caz Yokoyama <Caz.Yokoyama@intel.com>
Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Signed-off-by: Nikhil Rao <nikhil.rao@intel.com>
Signed-off-by: Harshavardhan R Kharche <harshavardhan.r.kharche@intel.com>
Signed-off-by: Sudeep Dutt <sudeep.dutt@intel.com>
Acked-by: Yaozu (Eddie) Dong <eddie.dong@intel.com>
Reviewed-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
---
 drivers/misc/mic/Kconfig           |   1 +
 drivers/misc/mic/card/Makefile     |   1 +
 drivers/misc/mic/card/mic_device.c |   7 +
 drivers/misc/mic/card/mic_virtio.c | 647 +++++++++++++++++++++++++++++++++++++
 drivers/misc/mic/card/mic_virtio.h |  74 +++++
 5 files changed, 730 insertions(+)
 create mode 100644 drivers/misc/mic/card/mic_virtio.c
 create mode 100644 drivers/misc/mic/card/mic_virtio.h
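
Feature negotiation through the device page uses a bitmap of
2 * feature_len bytes: the first half holds the bits offered by the host
and the second half the bits acknowledged by the card. The following user
space sketch shows that encoding on a plain byte array. It is only an
illustration under stated assumptions: the sketch_* names are hypothetical,
and ordinary memory accesses replace the ioread8()/iowrite8() accessors a
driver would need for __iomem pointers.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * features[] holds 2 * feature_len bytes: host-offered bits first,
 * guest-acknowledged bits second. At most 32 bits are considered.
 */
uint32_t sketch_get_features(const uint8_t *features, unsigned feature_len)
{
	unsigned i, bits;
	uint32_t out = 0;

	bits = (feature_len < sizeof(out) ? feature_len : sizeof(out)) * 8;
	for (i = 0; i < bits; i++)
		if (features[i / 8] & (1u << (i % 8)))
			out |= UINT32_C(1) << i;
	return out;
}

/* Clear the acknowledgement half, then set one bit per accepted feature. */
void sketch_ack_features(uint8_t *features, unsigned feature_len,
			 uint32_t accepted)
{
	uint8_t *ack = features + feature_len;
	unsigned i, bits;

	memset(ack, 0, feature_len);
	bits = (feature_len < sizeof(accepted) ?
		feature_len : sizeof(accepted)) * 8;
	for (i = 0; i < bits; i++)
		if (accepted & (UINT32_C(1) << i))
			ack[i / 8] |= (uint8_t)(1u << (i % 8));
}
```

Writing the acknowledgements into a separate half leaves the host-offered
bits untouched, so the host can compare what it offered with what the card
accepted.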

diff --git a/drivers/misc/mic/Kconfig b/drivers/misc/mic/Kconfig
index 01f1a4a..d453768 100644
--- a/drivers/misc/mic/Kconfig
+++ b/drivers/misc/mic/Kconfig
@@ -24,6 +24,7 @@ comment "Intel MIC Card Driver"
 config INTEL_MIC_CARD
 	tristate "Intel MIC Card Driver"
 	depends on 64BIT
+	select VIRTIO
 	default N
 	help
 	  This enables card driver support for the Intel Many Integrated
diff --git a/drivers/misc/mic/card/Makefile b/drivers/misc/mic/card/Makefile
index 6e9675e..69d58be 100644
--- a/drivers/misc/mic/card/Makefile
+++ b/drivers/misc/mic/card/Makefile
@@ -8,3 +8,4 @@ obj-$(CONFIG_INTEL_MIC_CARD) += mic_card.o
 mic_card-y += mic_x100.o
 mic_card-y += mic_device.o
 mic_card-y += mic_debugfs.o
+mic_card-y += mic_virtio.o
diff --git a/drivers/misc/mic/card/mic_device.c b/drivers/misc/mic/card/mic_device.c
index b186445..b49a7fc 100644
--- a/drivers/misc/mic/card/mic_device.c
+++ b/drivers/misc/mic/card/mic_device.c
@@ -31,6 +31,7 @@
 
 #include "mic_common.h"
 #include "mic_debugfs.h"
+#include "mic_virtio.h"
 
 static struct mic_driver *g_drv;
 static struct mic_irq *shutdown_cookie;
@@ -265,10 +266,15 @@ int __init mic_driver_init(struct mic_driver *mdrv)
 	rc = mic_shutdown_init();
 	if (rc)
 		goto irq_uninit;
+	rc = mic_devices_init(mdrv);
+	if (rc)
+		goto shutdown_uninit;
 	mic_create_card_debug_dir(mdrv);
 	atomic_notifier_chain_register(&panic_notifier_list, &mic_panic);
 done:
 	return rc;
+shutdown_uninit:
+	mic_shutdown_uninit();
 irq_uninit:
 	mic_uninit_irq();
 dp_uninit:
@@ -286,6 +292,7 @@ put:
 void mic_driver_uninit(struct mic_driver *mdrv)
 {
 	mic_delete_card_debug_dir(mdrv);
+	mic_devices_uninit(mdrv);
 	/*
 	 * Inform the host about the shutdown status i.e. poweroff/restart etc.
 	 * The module cannot be unloaded so the only code path to call
diff --git a/drivers/misc/mic/card/mic_virtio.c b/drivers/misc/mic/card/mic_virtio.c
new file mode 100644
index 0000000..ebb0477
--- /dev/null
+++ b/drivers/misc/mic/card/mic_virtio.c
@@ -0,0 +1,647 @@
+/*
+ * Intel MIC Platform Software Stack (MPSS)
+ *
+ * Copyright(c) 2013 Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * General Public License for more details.
+ *
+ * The full GNU General Public License is included in this distribution in
+ * the file called "COPYING".
+ *
+ * Disclaimer: The codes contained in these modules may be specific to
+ * the Intel Software Development Platform codenamed: Knights Ferry, and
+ * the Intel product codenamed: Knights Corner, and are not backward
+ * compatible with other Intel products. Additionally, Intel will NOT
+ * support the codes or instruction set in future products.
+ *
+ * Adapted from:
+ *
+ * virtio for kvm on s390
+ *
+ * Copyright IBM Corp. 2008
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License (version 2 only)
+ * as published by the Free Software Foundation.
+ *
+ *    Author(s): Christian Borntraeger <borntraeger@de.ibm.com>
+ *
+ * Intel MIC Card driver.
+ *
+ */
+#include <linux/module.h>
+#include <linux/fs.h>
+#include <linux/kernel_stat.h>
+#include <linux/init.h>
+#include <linux/bootmem.h>
+#include <linux/err.h>
+#include <linux/virtio.h>
+#include <linux/virtio_config.h>
+#include <linux/slab.h>
+#include <linux/virtio_console.h>
+#include <linux/interrupt.h>
+#include <linux/virtio_ring.h>
+#include <linux/pfn.h>
+#include <linux/delay.h>
+#include <linux/sched.h>
+#include <linux/wait.h>
+#include <linux/io.h>
+#include "mic_common.h"
+#include "mic_virtio.h"
+
+#define VIRTIO_SUBCODE_64 0x0D00
+
+#define MIC_MAX_VRINGS                4
+
+struct mic_vdev {
+	struct virtio_device vdev;
+	struct mic_device_desc __iomem *desc;
+	struct mic_device_ctrl __iomem *dc;
+	struct mic_device *mdev;
+	void __iomem *vr[MIC_MAX_VRINGS];
+	int used_size[MIC_MAX_VRINGS];
+	struct completion reset_done;
+	struct mic_irq *virtio_cookie;
+	int c2h_vdev_db;
+};
+
+static struct mic_irq *virtio_config_cookie;
+#define to_micvdev(vd) container_of(vd, struct mic_vdev, vdev)
+
+/* Helper API to obtain the parent of the virtio device */
+static inline struct device *mic_dev(struct mic_vdev *mvdev)
+{
+	return mvdev->vdev.dev.parent;
+}
+
+/* This gets the device's feature bits. */
+static u32 mic_get_features(struct virtio_device *vdev)
+{
+	unsigned int i, bits;
+	u32 features = 0;
+	struct mic_device_desc __iomem *desc = to_micvdev(vdev)->desc;
+	u8 __iomem *in_features = mic_vq_features(desc);
+	int feature_len = ioread8(&desc->feature_len);
+
+	bits = min_t(unsigned, feature_len,
+		sizeof(vdev->features)) * 8;
+	for (i = 0; i < bits; i++)
+		if (ioread8(&in_features[i / 8]) & (BIT(i % 8)))
+			features |= BIT(i);
+
+	return features;
+}
+
+static void mic_finalize_features(struct virtio_device *vdev)
+{
+	unsigned int i, bits;
+	struct mic_device_desc __iomem *desc = to_micvdev(vdev)->desc;
+	u8 feature_len = ioread8(&desc->feature_len);
+	/* Second half of bitmap is features we accept. */
+	u8 __iomem *out_features =
+		mic_vq_features(desc) + feature_len;
+
+	/* Give virtio_ring a chance to accept features. */
+	vring_transport_features(vdev);
+
+	memset_io(out_features, 0, feature_len);
+	bits = min_t(unsigned, feature_len,
+		sizeof(vdev->features)) * 8;
+	for (i = 0; i < bits; i++) {
+		if (test_bit(i, vdev->features))
+			iowrite8(ioread8(&out_features[i / 8]) | (1 << (i % 8)),
+				&out_features[i / 8]);
+	}
+}
+
+/*
+ * Reading and writing elements in config space
+ */
+static void mic_get(struct virtio_device *vdev, unsigned int offset,
+		   void *buf, unsigned len)
+{
+	struct mic_device_desc __iomem *desc = to_micvdev(vdev)->desc;
+
+	if (offset + len > ioread8(&desc->config_len))
+		return;
+	memcpy_fromio(buf, mic_vq_configspace(desc) + offset, len);
+}
+
+static void mic_set(struct virtio_device *vdev, unsigned int offset,
+		   const void *buf, unsigned len)
+{
+	struct mic_device_desc __iomem *desc = to_micvdev(vdev)->desc;
+
+	if (offset + len > ioread8(&desc->config_len))
+		return;
+	memcpy_toio(mic_vq_configspace(desc) + offset, buf, len);
+}
+
+/*
+ * The operations to get and set the status word just access the status
+ * field of the device descriptor. set_status also interrupts the host
+ * to tell about status changes.
+ */
+static u8 mic_get_status(struct virtio_device *vdev)
+{
+	return ioread8(&to_micvdev(vdev)->desc->status);
+}
+
+static void mic_set_status(struct virtio_device *vdev, u8 status)
+{
+	struct mic_vdev *mvdev = to_micvdev(vdev);
+	if (!status)
+		return;
+	iowrite8(status, &mvdev->desc->status);
+	mic_send_intr(mvdev->mdev, mvdev->c2h_vdev_db);
+}
+
+/* Inform host on a virtio device reset and wait for ack from host */
+static void mic_reset_inform_host(struct virtio_device *vdev)
+{
+	struct mic_vdev *mvdev = to_micvdev(vdev);
+	struct mic_device_ctrl __iomem *dc = mvdev->dc;
+	int retry = 100, i;
+
+	iowrite8(0, &dc->host_ack);
+	iowrite8(1, &dc->vdev_reset);
+	mic_send_intr(mvdev->mdev, mvdev->c2h_vdev_db);
+
+	/* Wait till host completes all card accesses and acks the reset */
+	for (i = retry; i--;) {
+		if (ioread8(&dc->host_ack))
+			break;
+		msleep(100);
+	}
+
+	dev_info(mic_dev(mvdev), "%s: retry: %d\n", __func__, i);
+
+	/* Reset status to 0 in case we timed out */
+	iowrite8(0, &mvdev->desc->status);
+}
+
+static void mic_reset(struct virtio_device *vdev)
+{
+	struct mic_vdev *mvdev = to_micvdev(vdev);
+
+	dev_info(mic_dev(mvdev), "%s: virtio id %d\n",
+		__func__, vdev->id.device);
+
+	mic_reset_inform_host(vdev);
+	complete_all(&mvdev->reset_done);
+}
+
+/*
+ * The virtio_ring code calls this API when it wants to notify the Host.
+ */
+static void mic_notify(struct virtqueue *vq)
+{
+	struct mic_vdev *mvdev = vq->priv;
+
+	mic_send_intr(mvdev->mdev, mvdev->c2h_vdev_db);
+}
+
+static void mic_del_vq(struct virtqueue *vq, int n)
+{
+	struct mic_vdev *mvdev = to_micvdev(vq->vdev);
+	struct vring *vr = (struct vring *) (vq + 1);
+
+	free_pages((unsigned long) vr->used,
+		get_order(mvdev->used_size[n]));
+	vring_del_virtqueue(vq);
+	mic_card_unmap(mvdev->mdev, mvdev->vr[n]);
+	mvdev->vr[n] = NULL;
+}
+
+static void mic_del_vqs(struct virtio_device *vdev)
+{
+	struct mic_vdev *mvdev = to_micvdev(vdev);
+	struct virtqueue *vq, *n;
+	int idx = 0;
+
+	dev_info(mic_dev(mvdev), "%s\n", __func__);
+
+	list_for_each_entry_safe(vq, n, &vdev->vqs, list)
+		mic_del_vq(vq, idx++);
+}
+
+/*
+ * This routine will assign vrings allocated in host/io memory. Code in
+ * virtio_ring.c however continues to access this io memory as if it were local
+ * memory without io accessors.
+ */
+static struct virtqueue *mic_find_vq(struct virtio_device *vdev,
+				     unsigned index,
+				     void (*callback)(struct virtqueue *vq),
+				     const char *name)
+{
+	struct mic_vdev *mvdev = to_micvdev(vdev);
+	struct mic_vqconfig __iomem *vqconfig;
+	struct mic_vqconfig config;
+	struct virtqueue *vq;
+	void __iomem *va;
+	struct _mic_vring_info __iomem *info;
+	void *used;
+	int vr_size, _vr_size, err, magic;
+	struct vring *vr;
+	u8 type = ioread8(&mvdev->desc->type);
+
+	if (index >= ioread8(&mvdev->desc->num_vq))
+		return ERR_PTR(-ENOENT);
+
+	if (!name)
+		return ERR_PTR(-ENOENT);
+
+	/* First assign the vrings allocated in host memory */
+	vqconfig = mic_vq_config(mvdev->desc) + index;
+	memcpy_fromio(&config, vqconfig, sizeof(config));
+	_vr_size = vring_size(config.num, MIC_VIRTIO_RING_ALIGN);
+	vr_size = PAGE_ALIGN(_vr_size + sizeof(struct _mic_vring_info));
+	va = mic_card_map(mvdev->mdev, config.address, vr_size);
+	if (!va)
+		return ERR_PTR(-ENOMEM);
+	mvdev->vr[index] = va;
+	memset_io(va, 0x0, _vr_size);
+	vq = vring_new_virtqueue(index,
+				config.num, MIC_VIRTIO_RING_ALIGN, vdev,
+				false,
+				va, mic_notify, callback, name);
+	if (!vq) {
+		err = -ENOMEM;
+		goto unmap;
+	}
+	info = va + _vr_size;
+	magic = ioread32(&info->magic);
+	dev_info(mic_dev(mvdev),
+		"%s: magic 0x%x type 0x%x index 0x%x expected 0x%x\n",
+		__func__, magic, type, index, MIC_MAGIC + type + index);
+
+	if (WARN(magic != MIC_MAGIC + type + index, "magic mismatch")) {
+		err = -EIO;
+		goto unmap;
+	}
+
+	/* Allocate and reassign used ring now */
+	mvdev->used_size[index] = PAGE_ALIGN(sizeof(__u16) * 3 +
+			sizeof(struct vring_used_elem) * config.num);
+	used = (void *) __get_free_pages(GFP_KERNEL | __GFP_ZERO,
+				get_order(mvdev->used_size[index]));
+	if (!used) {
+		err = -ENOMEM;
+		dev_err(mic_dev(mvdev), "%s %d err %d\n",
+			__func__, __LINE__, err);
+		goto del_vq;
+	}
+	iowrite64(virt_to_phys(used), &vqconfig->used_address);
+
+	/*
+	 * To reassign the used ring here we are directly accessing
+	 * struct vring_virtqueue which is a private data structure
+	 * in virtio_ring.c. At the minimum, a BUILD_BUG_ON() in
+	 * vring_new_virtqueue() would ensure that
+	 *  (&vq->vring == (struct vring *) (&vq->vq + 1));
+	 */
+	vr = (struct vring *) (vq + 1);
+	vr->used = used;
+
+	vq->priv = mvdev;
+	return vq;
+del_vq:
+	vring_del_virtqueue(vq);
+unmap:
+	mic_card_unmap(mvdev->mdev, mvdev->vr[index]);
+	return ERR_PTR(err);
+}
+
+static int mic_find_vqs(struct virtio_device *vdev, unsigned nvqs,
+			struct virtqueue *vqs[],
+			vq_callback_t *callbacks[],
+			const char *names[])
+{
+	struct mic_vdev *mvdev = to_micvdev(vdev);
+	struct mic_device_ctrl __iomem *dc = mvdev->dc;
+	int i, err, retry = 100;
+
+	/* We must have this many virtqueues. */
+	if (nvqs > ioread8(&mvdev->desc->num_vq))
+		return -ENOENT;
+
+	for (i = 0; i < nvqs; ++i) {
+		dev_info(mic_dev(mvdev), "%s: %d: %s\n",
+			__func__, i, names[i]);
+		vqs[i] = mic_find_vq(vdev, i, callbacks[i], names[i]);
+		if (IS_ERR(vqs[i])) {
+			err = PTR_ERR(vqs[i]);
+			goto error;
+		}
+	}
+
+	iowrite8(1, &dc->used_address_updated);
+
+	/* Send an interrupt to the host to inform it that used rings have
+	 * been re-assigned */
+	mic_send_intr(mvdev->mdev, mvdev->c2h_vdev_db);
+	for (i = retry; i--;) {
+		if (!ioread8(&dc->used_address_updated))
+			break;
+		msleep(100);
+	}
+
+	dev_info(mic_dev(mvdev), "%s: retry: %d\n", __func__, i);
+	if (i < 0) {
+		err = -ENODEV;
+		goto error;
+	}
+
+	return 0;
+error:
+	mic_del_vqs(vdev);
+	return err;
+}
+
+/*
+ * The config ops structure as defined by virtio config
+ */
+static struct virtio_config_ops mic_vq_config_ops = {
+	.get_features = mic_get_features,
+	.finalize_features = mic_finalize_features,
+	.get = mic_get,
+	.set = mic_set,
+	.get_status = mic_get_status,
+	.set_status = mic_set_status,
+	.reset = mic_reset,
+	.find_vqs = mic_find_vqs,
+	.del_vqs = mic_del_vqs,
+};
+
+static irqreturn_t
+mic_virtio_intr_handler(int irq, void *data)
+{
+	struct mic_vdev *mvdev = data;
+	struct virtqueue *vq;
+
+	mic_ack_interrupt(mvdev->mdev);
+	list_for_each_entry(vq, &mvdev->vdev.vqs, list)
+		vring_interrupt(0, vq);
+
+	return IRQ_HANDLED;
+}
+
+static void mic_virtio_release_dev(struct device *_d)
+{
+	/*
+	 * No need for a release method similar to virtio PCI.
+	 * Provide an empty one to avoid getting a warning from core.
+	 */
+}
+
+/*
+ * Adds a new device and registers it with virtio. The
+ * appropriate drivers are loaded by the device model.
+ */
+static int mic_add_device(struct mic_device_desc __iomem *d,
+	unsigned int offset, struct mic_driver *mdrv)
+{
+	struct mic_vdev *mvdev;
+	int ret;
+	int virtio_db;
+	u8 type = ioread8(&d->type);
+
+	mvdev = kzalloc(sizeof(*mvdev), GFP_KERNEL);
+	if (!mvdev) {
+		dev_err(mdrv->dev, "Cannot allocate mic dev %u type %u\n",
+			offset, type);
+		return -ENOMEM;
+	}
+
+	mvdev->mdev = &mdrv->mdev;
+	mvdev->vdev.dev.parent = mdrv->dev;
+	mvdev->vdev.dev.release = mic_virtio_release_dev;
+	mvdev->vdev.id.device = type;
+	mvdev->vdev.config = &mic_vq_config_ops;
+	mvdev->desc = d;
+	mvdev->dc = (void __iomem *)d + mic_aligned_desc_size(d);
+	init_completion(&mvdev->reset_done);
+
+	virtio_db = mic_next_card_db();
+	mvdev->virtio_cookie = mic_request_card_irq(mic_virtio_intr_handler,
+			"virtio intr", mvdev, virtio_db);
+	if (IS_ERR(mvdev->virtio_cookie)) {
+		ret = PTR_ERR(mvdev->virtio_cookie);
+		goto kfree;
+	}
+	iowrite8((u8)virtio_db, &mvdev->dc->h2c_vdev_db);
+	mvdev->c2h_vdev_db = ioread8(&mvdev->dc->c2h_vdev_db);
+
+	ret = register_virtio_device(&mvdev->vdev);
+	if (ret) {
+		dev_err(mic_dev(mvdev),
+			"Failed to register mic device %u type %u\n",
+			offset, type);
+		goto free_irq;
+	}
+	iowrite64((u64)mvdev, &mvdev->dc->vdev);
+	dev_info(mic_dev(mvdev), "%s: registered mic device %u type %u mvdev %p\n",
+		__func__, offset, type, mvdev);
+
+	return 0;
+
+free_irq:
+	mic_free_card_irq(mvdev->virtio_cookie, mvdev);
+kfree:
+	kfree(mvdev);
+	return ret;
+}
+
+/*
+ * match for a mic device with a specific desc pointer
+ */
+static int mic_match_desc(struct device *dev, void *data)
+{
+	struct virtio_device *vdev = dev_to_virtio(dev);
+	struct mic_vdev *mvdev = to_micvdev(vdev);
+
+	return mvdev->desc == (void __iomem *)data;
+}
+
+static void mic_handle_config_change(struct mic_device_desc __iomem *d,
+	unsigned int offset, struct mic_driver *mdrv)
+{
+	struct mic_device_ctrl __iomem *dc
+		= (void __iomem *)d + mic_aligned_desc_size(d);
+	struct mic_vdev *mvdev = (struct mic_vdev *)ioread64(&dc->vdev);
+	struct virtio_driver *drv;
+
+	if (ioread8(&dc->config_change) != MIC_VIRTIO_PARAM_CONFIG_CHANGED)
+		return;
+
+	dev_info(mdrv->dev, "%s %d\n", __func__, __LINE__);
+	drv = container_of(mvdev->vdev.dev.driver,
+				struct virtio_driver, driver);
+	if (drv->config_changed)
+		drv->config_changed(&mvdev->vdev);
+	iowrite8(1, &dc->guest_ack);
+}
+
+/*
+ * removes a virtio device if a hot remove event has been
+ * requested by the host.
+ */
+static int mic_remove_device(struct mic_device_desc __iomem *d,
+	unsigned int offset, struct mic_driver *mdrv)
+{
+	struct mic_device_ctrl __iomem *dc
+		= (void __iomem *)d + mic_aligned_desc_size(d);
+	struct mic_vdev *mvdev = (struct mic_vdev *)ioread64(&dc->vdev);
+	u8 status;
+	int ret = -1;
+
+	if (ioread8(&dc->config_change) == MIC_VIRTIO_PARAM_DEV_REMOVE) {
+		dev_info(mdrv->dev,
+			"%s %d config_change %d type %d mvdev %p\n",
+			__func__, __LINE__,
+			ioread8(&dc->config_change), ioread8(&d->type), mvdev);
+
+		status = ioread8(&d->status);
+		INIT_COMPLETION(mvdev->reset_done);
+		unregister_virtio_device(&mvdev->vdev);
+		mic_free_card_irq(mvdev->virtio_cookie, mvdev);
+		if (status & VIRTIO_CONFIG_S_DRIVER_OK)
+			wait_for_completion(&mvdev->reset_done);
+		kfree(mvdev);
+		iowrite8(1, &dc->guest_ack);
+		dev_info(mdrv->dev, "%s %d guest_ack %d\n",
+			__func__, __LINE__, ioread8(&dc->guest_ack));
+		ret = 0;
+	}
+
+	return ret;
+}
+
+#define REMOVE_DEVICES true
+
+static void mic_scan_devices(struct mic_driver *mdrv, bool remove)
+{
+	s8 type;
+	unsigned int i;
+	struct mic_device_desc __iomem *d;
+	struct mic_device_ctrl __iomem *dc;
+	struct device *dev;
+	int ret;
+
+	for (i = mic_aligned_size(struct mic_bootparam);
+		i < MIC_DP_SIZE; i += mic_total_desc_size(d)) {
+		d = mdrv->dp + i;
+		dc = (void __iomem *)d + mic_aligned_desc_size(d);
+		/*
+		 * This read barrier is paired with the corresponding write
+		 * barrier on the host which is inserted before adding or
+		 * removing a virtio device descriptor, by updating the type.
+		 */
+		smp_rmb();
+		type = ioread8(&d->type);
+
+		/* end of list */
+		if (type == 0)
+			break;
+
+		if (type == -1)
+			continue;
+
+		/* device already exists */
+		dev = device_find_child(mdrv->dev, d, mic_match_desc);
+		if (dev) {
+			if (remove)
+				iowrite8(MIC_VIRTIO_PARAM_DEV_REMOVE,
+					&dc->config_change);
+			put_device(dev);
+			mic_handle_config_change(d, i, mdrv);
+			ret = mic_remove_device(d, i, mdrv);
+			if (!ret && !remove)
+				iowrite8(-1, &d->type);
+			if (remove) {
+				iowrite8(0, &dc->config_change);
+				iowrite8(0, &dc->guest_ack);
+			}
+			continue;
+		}
+
+		/* new device */
+		dev_info(mdrv->dev, "%s %d Adding new virtio device %p\n",
+				__func__, __LINE__, d);
+		if (!remove)
+			mic_add_device(d, i, mdrv);
+	}
+}
+
+/*
+ * mic_hotplug_devices tries to find changes in the device page.
+ */
+static void mic_hotplug_devices(struct work_struct *work)
+{
+	struct mic_driver *mdrv = container_of(work,
+		struct mic_driver, hotplug_work);
+
+	mic_scan_devices(mdrv, !REMOVE_DEVICES);
+}
+
+/*
+ * Interrupt handler for hot plug/config changes etc.
+ */
+static irqreturn_t
+mic_extint_handler(int irq, void *data)
+{
+	struct mic_driver *mdrv = (struct mic_driver *)data;
+
+	dev_dbg(mdrv->dev, "%s %d hotplug work\n",
+		__func__, __LINE__);
+	mic_ack_interrupt(&mdrv->mdev);
+	schedule_work(&mdrv->hotplug_work);
+	return IRQ_HANDLED;
+}
+
+/*
+ * Init function for virtio
+ */
+int mic_devices_init(struct mic_driver *mdrv)
+{
+	int rc;
+	struct mic_bootparam __iomem *bootparam;
+	int config_db;
+
+	INIT_WORK(&mdrv->hotplug_work, mic_hotplug_devices);
+	mic_scan_devices(mdrv, !REMOVE_DEVICES);
+
+	config_db = mic_next_card_db();
+	virtio_config_cookie = mic_request_card_irq(mic_extint_handler,
+			"virtio_config_intr", mdrv, config_db);
+	if (IS_ERR(virtio_config_cookie)) {
+		rc = PTR_ERR(virtio_config_cookie);
+		goto exit;
+	}
+
+	bootparam = mdrv->dp;
+	iowrite8(config_db, &bootparam->h2c_config_db);
+	return 0;
+exit:
+	return rc;
+}
+
+/*
+ * Uninit function for virtio
+ */
+void mic_devices_uninit(struct mic_driver *mdrv)
+{
+	struct mic_bootparam __iomem *bootparam = mdrv->dp;
+	iowrite8(-1, &bootparam->h2c_config_db);
+	mic_free_card_irq(virtio_config_cookie, mdrv);
+	flush_work(&mdrv->hotplug_work);
+	mic_scan_devices(mdrv, REMOVE_DEVICES);
+}
diff --git a/drivers/misc/mic/card/mic_virtio.h b/drivers/misc/mic/card/mic_virtio.h
new file mode 100644
index 0000000..91a98ff
--- /dev/null
+++ b/drivers/misc/mic/card/mic_virtio.h
@@ -0,0 +1,74 @@
+/*
+ * Intel MIC Platform Software Stack (MPSS)
+ *
+ * Copyright(c) 2013 Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * General Public License for more details.
+ *
+ * The full GNU General Public License is included in this distribution in
+ * the file called "COPYING".
+ *
+ * Disclaimer: The codes contained in these modules may be specific to
+ * the Intel Software Development Platform codenamed: Knights Ferry, and
+ * the Intel product codenamed: Knights Corner, and are not backward
+ * compatible with other Intel products. Additionally, Intel will NOT
+ * support the codes or instruction set in future products.
+ *
+ * Intel MIC Card driver.
+ *
+ */
+#ifndef __MIC_CARD_VIRTIO_H
+#define __MIC_CARD_VIRTIO_H
+
+/*
+ * 64 bit I/O access
+ */
+#ifndef ioread64
+#define ioread64 readq
+#endif
+#ifndef iowrite64
+#define iowrite64 writeq
+#endif
+
+static inline unsigned mic_desc_size(struct mic_device_desc __iomem *desc)
+{
+	return mic_aligned_size(*desc)
+		+ ioread8(&desc->num_vq) * mic_aligned_size(struct mic_vqconfig)
+		+ ioread8(&desc->feature_len) * 2
+		+ ioread8(&desc->config_len);
+}
+
+static inline struct mic_vqconfig __iomem *
+mic_vq_config(struct mic_device_desc __iomem *desc)
+{
+	return (struct mic_vqconfig __iomem *)(desc + 1);
+}
+
+static inline __u8 __iomem *
+mic_vq_features(struct mic_device_desc __iomem *desc)
+{
+	return (__u8 __iomem *)(mic_vq_config(desc) + ioread8(&desc->num_vq));
+}
+
+static inline __u8 __iomem *
+mic_vq_configspace(struct mic_device_desc __iomem *desc)
+{
+	return mic_vq_features(desc) + ioread8(&desc->feature_len) * 2;
+}
+static inline unsigned mic_total_desc_size(struct mic_device_desc __iomem *desc)
+{
+	return mic_aligned_desc_size(desc) +
+		mic_aligned_size(struct mic_device_ctrl);
+}
+
+int mic_devices_init(struct mic_driver *mdrv);
+void mic_devices_uninit(struct mic_driver *mdrv);
+
+#endif
-- 
1.8.2.1
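
The card driver in this patch synchronizes with the host by raising a
doorbell and then polling a byte in the device page (host_ack after a
reset, used_address_updated after the used rings are reassigned). A
minimal user space sketch of that polling pattern follows. It is an
illustration, not the driver code: the sketch_* names are hypothetical, the
delay callback stands in for msleep(100), and the fake host below only
exists to make the example self-contained.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Poll *flag until it equals want, calling delay() between reads.
 * Returns the number of attempts left (>= 0) on success, or -1 if all
 * attempts were used up, so a timeout is unambiguous to the caller.
 */
int sketch_wait_flag(const volatile uint8_t *flag, uint8_t want,
		     int attempts, void (*delay)(void))
{
	while (attempts-- > 0) {
		if (*flag == want)
			return attempts;
		delay();
	}
	return -1;
}

/* Hypothetical stand-in for the host: clears the flag on the third delay. */
static volatile uint8_t sketch_flag;
static int sketch_delays;

static void sketch_fake_host_delay(void)
{
	if (++sketch_delays == 3)
		sketch_flag = 0;
}
```

Returning a distinct value on timeout avoids having to infer the outcome
from a loop counter after the loop has ended. With sketch_flag set to 1 and
100 attempts, sketch_wait_flag(&sketch_flag, 0, 100,
sketch_fake_host_delay) sees the flag cleared on its fourth read and
returns 96.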

* [PATCH v2 7/7] Sample Implementation of Intel MIC User Space Daemon.
  2013-08-08  3:04 [PATCH v2 0/7] Enable Drivers for Intel MIC X100 Coprocessors Sudeep Dutt
                   ` (5 preceding siblings ...)
  2013-08-08  3:04 ` [PATCH v2 6/7] Intel MIC Card " Sudeep Dutt
@ 2013-08-08  3:04 ` Sudeep Dutt
  2013-08-08  6:40     ` Michael S. Tsirkin
  6 siblings, 1 reply; 18+ messages in thread
From: Sudeep Dutt @ 2013-08-08  3:04 UTC (permalink / raw)
  To: Greg Kroah-Hartman, Arnd Bergmann, Rusty Russell,
	Michael S. Tsirkin, Rob Landley, linux-kernel, virtualization,
	linux-doc
  Cc: Harshavardhan R Kharche, Peter P Waskiewicz Jr,
	Yaozu (Eddie) Dong, Sudeep Dutt, Ashutosh Dixit, AsiasHeasias,
	Caz Yokoyama, Dasaratharaman Chandramouli

From: Caz Yokoyama <Caz.Yokoyama@intel.com>

This patch introduces a sample user space daemon which
implements the virtio device backends on the host. The daemon
creates, removes and configures virtio device backends by communicating
with the Intel MIC Host Driver. The virtio devices currently supported are
virtio net, virtio console and virtio block. Virtio net supports TSO/GSO.
The daemon also monitors card shutdown status and, when a card shuts down
or crashes, kills the virtio backends and resets the card.

Co-author: Ashutosh Dixit <ashutosh.dixit@intel.com>
Co-author: Sudeep Dutt <sudeep.dutt@intel.com>
Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Signed-off-by: Caz Yokoyama <Caz.Yokoyama@intel.com>
Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Signed-off-by: Nikhil Rao <nikhil.rao@intel.com>
Signed-off-by: Harshavardhan R Kharche <harshavardhan.r.kharche@intel.com>
Signed-off-by: Sudeep Dutt <sudeep.dutt@intel.com>
Acked-by: Yaozu (Eddie) Dong <eddie.dong@intel.com>
---
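Note on the mapping used by init_vr() in mpssd.c: the daemon mmap()s the device page at offset 0 (MIC_DEVICE_PAGE_END bytes), followed by one page-aligned slot per virtqueue, each holding the vring and then the _mic_vring_info block. The sketch below reproduces that offset arithmetic in isolation. The 4 KiB page size and all helper names are assumptions for illustration, and sketch_vring_size() restates the legacy vring_size() formula from <linux/virtio_ring.h> with the 16-byte descriptor and 8-byte used element sizes written out.

```c
#include <assert.h>
#include <stddef.h>

#define DEVICE_PAGE_END		0x1000u	/* same value as MIC_DEVICE_PAGE_END */
#define SKETCH_PAGE_SIZE	4096u	/* assumption: 4 KiB pages */

static size_t page_align(size_t n)
{
	return (n + SKETCH_PAGE_SIZE - 1) & ~(size_t)(SKETCH_PAGE_SIZE - 1);
}

/* Legacy vring size: descriptors + avail ring, padded to align, + used ring. */
static size_t sketch_vring_size(size_t num, size_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return ((16 * num + 2 * (3 + num) + align - 1) & ~(align - 1))
		+ 2 * 3 + 8 * num;
}

/* One slot: the vring followed by the info block, rounded up to a page. */
static size_t ring_slot_size(size_t num, size_t align, size_t info_size)
{
	return page_align(sketch_vring_size(num, align) + info_size);
}

/* Offset of ring idx from the start of the mapping (vr0->va, vr1->va). */
static size_t ring_offset(unsigned int idx, size_t slot)
{
	return DEVICE_PAGE_END + idx * slot;
}

/* Total length passed to mmap() for num_vq rings. */
static size_t map_len(unsigned int num_vq, size_t slot)
{
	return DEVICE_PAGE_END + num_vq * slot;
}
```

As a usage example with placeholder values (128 ring entries, 4096-byte ring alignment, an 8-byte info block; none of these are confirmed by this patch), each slot is 8192 bytes, the second ring starts at offset 0x3000, and a two-queue device maps 0x5000 bytes.
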
 Documentation/mic/mic_overview.txt |   48 +
 Documentation/mic/mpssd/.gitignore |    1 +
 Documentation/mic/mpssd/Makefile   |   19 +
 Documentation/mic/mpssd/micctrl    |  152 ++++
 Documentation/mic/mpssd/mpss       |  245 ++++++
 Documentation/mic/mpssd/mpssd.c    | 1689 ++++++++++++++++++++++++++++++++++++
 Documentation/mic/mpssd/mpssd.h    |  100 +++
 Documentation/mic/mpssd/sysfs.c    |  103 +++
 8 files changed, 2357 insertions(+)
 create mode 100644 Documentation/mic/mic_overview.txt
 create mode 100644 Documentation/mic/mpssd/.gitignore
 create mode 100644 Documentation/mic/mpssd/Makefile
 create mode 100755 Documentation/mic/mpssd/micctrl
 create mode 100755 Documentation/mic/mpssd/mpss
 create mode 100644 Documentation/mic/mpssd/mpssd.c
 create mode 100644 Documentation/mic/mpssd/mpssd.h
 create mode 100644 Documentation/mic/mpssd/sysfs.c

diff --git a/Documentation/mic/mic_overview.txt b/Documentation/mic/mic_overview.txt
new file mode 100644
index 0000000..8b1a916
--- /dev/null
+++ b/Documentation/mic/mic_overview.txt
@@ -0,0 +1,48 @@
+An Intel MIC X100 device is a PCIe form factor add-in coprocessor
+card based on the Intel Many Integrated Core (MIC) architecture
+that runs a Linux OS. It is a PCIe endpoint in a platform and therefore
+implements the three required standard address spaces i.e. configuration,
+memory and I/O. The host OS loads a device driver as is typical for
+PCIe devices. The card itself runs a bootstrap after reset that
+transfers control to the card OS downloaded from the host driver.
+The card OS as shipped by Intel is a Linux kernel with modifications
+for the X100 devices.
+
+Since it is a PCIe card, it does not have the ability to host hardware
+devices for networking, storage and console. We provide these devices
+on X100 coprocessors thus enabling a self-bootable equivalent environment
+for applications. A key benefit of our solution is that it leverages
+the standard virtio framework for network, disk and console devices,
+though in our case the virtio framework is used across a PCIe bus.
+
+Here is a block diagram of the various components described above. The
+virtio backends are situated on the host rather than the card given better
+single-threaded performance for the host compared to MIC and the ability of
+the host to initiate DMAs to/from the card using the MIC DMA engine.
+
+                              |
+       +----------+           |             +----------+
+       | Card OS  |           |             | Host OS  |
+       +----------+           |             +----------+
+                              |
++-------+ +--------+ +------+ | +---------+  +--------+ +--------+
+| Virtio| |Virtio  | |Virtio| | |Virtio   |  |Virtio  | |Virtio  |
+| Net   | |Console | |Block | | |Net      |  |Console | |Block   |
+| Driver| |Driver  | |Driver| | |backend  |  |backend | |backend |
++-------+ +--------+ +------+ | +---------+  +--------+ +--------+
+    |         |         |     |      |            |         |
+    |         |         |     |Ring 3|            |         |
+    |         |         |     |------|------------|---------|-------
+    +-------------------+     |Ring 0+--------------------------+
+              |               |      | Virtio over PCIe IOCTLs  |
+              |               |      +--------------------------+
+      +--------------+        |                   |
+      |Intel MIC     |        |            +---------------+
+      |Card Driver   |        |            |Intel MIC      |
+      +--------------+        |            |Host Driver    |
+              |               |            +---------------+
+              |               |                   |
+     +-------------------------------------------------------------+
+     |                                                             |
+     |                    PCIe Bus                                 |
+     +-------------------------------------------------------------+
diff --git a/Documentation/mic/mpssd/.gitignore b/Documentation/mic/mpssd/.gitignore
new file mode 100644
index 0000000..8b7c72f
--- /dev/null
+++ b/Documentation/mic/mpssd/.gitignore
@@ -0,0 +1 @@
+mpssd
diff --git a/Documentation/mic/mpssd/Makefile b/Documentation/mic/mpssd/Makefile
new file mode 100644
index 0000000..eb860a7
--- /dev/null
+++ b/Documentation/mic/mpssd/Makefile
@@ -0,0 +1,19 @@
+#
+# Makefile - Intel MIC User Space Tools.
+# Copyright(c) 2013, Intel Corporation.
+#
+ifdef DEBUG
+CFLAGS += $(USERWARNFLAGS) -I. -g -Wall -DDEBUG=$(DEBUG)
+else
+CFLAGS += $(USERWARNFLAGS) -I. -g -Wall
+endif
+
+mpssd: mpssd.o sysfs.o
+	$(CC) $(CFLAGS) -o $@ $^ -lpthread
+
+install:
+	install mpssd /usr/sbin/mpssd
+	install micctrl /usr/sbin/micctrl
+
+clean:
+	rm -f mpssd *.o
diff --git a/Documentation/mic/mpssd/micctrl b/Documentation/mic/mpssd/micctrl
new file mode 100755
index 0000000..e0cfa53
--- /dev/null
+++ b/Documentation/mic/mpssd/micctrl
@@ -0,0 +1,152 @@
+#!/bin/bash
+# Intel MIC Platform Software Stack (MPSS)
+#
+# Copyright(c) 2013 Intel Corporation.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License, version 2, as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it will be useful, but
+# WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+# General Public License for more details.
+#
+# The full GNU General Public License is included in this distribution in
+# the file called "COPYING".
+#
+# Intel MIC User Space Tools.
+#
+# micctrl - Controls MIC boot/start/stop.
+#
+# chkconfig: 2345 95 05
+# description: start MPSS stack processing.
+#
+### BEGIN INIT INFO
+# Provides: micctrl
+### END INIT INFO
+
+# Source function library.
+. /etc/init.d/functions
+
+sysfs="/sys/class/mic"
+
+status()
+{
+	if [ "`echo $1 | head -c3`" == "mic" ]; then
+		f=$sysfs/$1
+		echo -e $1 state: "`cat $f/state`" shutdown_status: "`cat $f/shutdown_status`"
+		return 0
+	fi
+
+	if [ -d "$sysfs" ]; then
+		for f in $sysfs/*
+		do
+			echo -e ""`basename $f`" state: "`cat $f/state`" shutdown_status: "`cat $f/shutdown_status`""
+		done
+	fi
+
+	return 0
+}
+
+reset()
+{
+	if [ "`echo $1 | head -c3`" == "mic" ]; then
+		f=$sysfs/$1
+		echo reset > $f/state
+		return 0
+	fi
+
+	if [ -d "$sysfs" ]; then
+		for f in $sysfs/*
+		do
+			echo reset > $f/state
+		done
+	fi
+
+	return 0
+}
+
+boot()
+{
+	if [ "`echo $1 | head -c3`" == "mic" ]; then
+		f=$sysfs/$1
+		echo "boot:linux:mic/uos.img:mic/$1.image" > $f/state
+		return 0
+	fi
+
+	if [ -d "$sysfs" ]; then
+		for f in $sysfs/*
+		do
+			echo "boot:linux:mic/uos.img:mic/`basename $f`.image" > $f/state
+		done
+	fi
+
+	return 0
+}
+
+shutdown()
+{
+	if [ "`echo $1 | head -c3`" == "mic" ]; then
+		f=$sysfs/$1
+		echo shutdown > $f/state
+		return 0
+	fi
+
+	if [ -d "$sysfs" ]; then
+		for f in $sysfs/*
+		do
+			echo shutdown > $f/state
+		done
+	fi
+
+	return 0
+}
+
+wait()
+{
+	if [ "`echo $1 | head -c3`" == "mic" ]; then
+		f=$sysfs/$1
+		while [ "`cat $f/state`" != "offline" -a "`cat $f/state`" != "online" ]
+		do
+			sleep 1
+			echo -e "Waiting for $1 to go offline"
+		done
+		return 0
+	fi
+
+	if [ -d "$sysfs" ]; then
+		# Wait for the cards to go offline
+		for f in $sysfs/*
+		do
+			while [ "`cat $f/state`" != "offline" -a "`cat $f/state`" != "online" ]
+			do
+				sleep 1
+				echo -e "Waiting for "`basename $f`" to go offline"
+			done
+		done
+	fi
+}
+
+case $1 in
+	-s)
+		status $2
+		;;
+	-r)
+		reset $2
+		;;
+	-b)
+		boot $2
+		;;
+	-S)
+		shutdown $2
+		;;
+	-w)
+		wait $2
+		;;
+	*)
+		echo $"Usage: $0 {-s (status) |-r (reset) |-b (boot) |-S (shutdown) |-w (wait)}"
+		exit 2
+esac
+
+exit $?
diff --git a/Documentation/mic/mpssd/mpss b/Documentation/mic/mpssd/mpss
new file mode 100755
index 0000000..f0bb3dd
--- /dev/null
+++ b/Documentation/mic/mpssd/mpss
@@ -0,0 +1,245 @@
+#!/bin/bash
+# Intel MIC Platform Software Stack (MPSS)
+#
+# Copyright(c) 2013 Intel Corporation.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License, version 2, as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it will be useful, but
+# WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+# General Public License for more details.
+#
+# The full GNU General Public License is included in this distribution in
+# the file called "COPYING".
+#
+# Intel MIC User Space Tools.
+#
+# mpss	Start mpssd.
+#
+# chkconfig: 2345 95 05
+# description: start MPSS stack processing.
+#
+### BEGIN INIT INFO
+# Provides: mpss
+# Required-Start:
+# Required-Stop:
+# Short-Description: MPSS stack control
+# Description: MPSS stack control
+### END INIT INFO
+
+# Source function library.
+. /etc/init.d/functions
+
+exec=/usr/sbin/mpssd
+sysfs="/sys/class/mic"
+
+start()
+{
+	[ -x $exec ] || exit 5
+
+	echo -e $"Starting MPSS Stack"
+
+	echo -e $"Loading MIC_HOST Module"
+
+	# Ensure the driver is loaded
+	[ -d "$sysfs" ] || modprobe mic_host
+
+	if [ "`ps -e | awk '{print $4}' | grep mpssd | head -n 1`" = "mpssd" ]; then
+		echo -e $"MPSSD already running! "
+		success
+		echo
+		return 0;
+	fi
+
+	# Start the daemon
+	echo -n $"Starting MPSSD"
+	$exec &
+	RETVAL=$?
+	if [ $RETVAL -ne 0 ]; then
+		failure
+	else
+		success
+	fi
+	echo
+
+	sleep 5
+
+	# Boot the cards
+	if [ $RETVAL -eq 0 ]; then
+		for f in $sysfs/*
+		do
+			echo -ne "Booting "`basename $f`" "
+			echo "boot:linux:mic/uos.img:mic/`basename $f`.image" > $f/state
+			RETVAL=$?
+			if [ $RETVAL -ne 0 ]; then
+				failure
+			else
+				success
+			fi
+			echo
+		done
+	fi
+
+	# Wait till ping works
+	if [ $RETVAL -eq 0 ]; then
+		for f in $sysfs/*
+		do
+			count=100
+			ipaddr=`cat $f/cmdline`
+			ipaddr=${ipaddr#*address,}
+			ipaddr=`echo $ipaddr | cut -d, -f1 | cut -d\; -f1`
+
+			while [ $count -ge 0 ]
+			do
+				echo -e "Pinging "`basename $f`" "
+				ping -c 1 $ipaddr &> /dev/null
+				RETVAL=$?
+				if [ $RETVAL -eq 0 ]; then
+					success
+					break
+				fi
+				sleep 1
+				count=`expr $count - 1`
+			done
+			if [ $RETVAL -ne 0 ]; then
+				failure
+			else
+				success
+			fi
+			echo
+		done
+	fi
+	return $RETVAL
+}
+
+stop()
+{
+	echo -e $"Shutting down MPSS Stack: "
+
+	# Bail out if module is unloaded
+	if [ ! -d "$sysfs" ]; then
+		echo -n $"Module unloaded "
+		killall -9 mpssd 2>/dev/null
+		success
+		echo
+		return 0
+	fi
+
+	# Shut down the cards
+	for f in $sysfs/*
+	do
+		echo -e "Shutting down `basename $f` "
+		echo "shutdown" > $f/state 2>/dev/null
+	done
+
+	# Wait for the cards to go offline
+	for f in $sysfs/*
+	do
+		while [ "`cat $f/state`" != "offline" ]
+		do
+			sleep 1
+			echo -e "Waiting for "`basename $f`" to go offline"
+		done
+	done
+
+	# Display the status of the cards
+	for f in $sysfs/*
+	do
+		echo -e ""`basename $f`" state: "`cat $f/state`""
+	done
+
+	sleep 5
+
+	# Kill MPSSD now
+	echo -n $"Killing MPSSD"
+	killall -9 mpssd 2>/dev/null
+	RETVAL=$?
+	if [ $RETVAL -ne 0 ]; then
+		failure
+	else
+		success
+	fi
+	echo
+	return $RETVAL
+}
+
+restart()
+{
+	stop
+	sleep 5
+	start
+}
+
+status()
+{
+	if [ -d "$sysfs" ]; then
+		for f in $sysfs/*
+		do
+			echo -e ""`basename $f`" state: "`cat $f/state`""
+		done
+	fi
+
+	if [ "`ps -e | awk '{print $4}' | grep mpssd | head -n 1`" = "mpssd" ]; then
+		echo "mpssd is running"
+	else
+		echo "mpssd is stopped"
+	fi
+	return 0
+}
+
+unload()
+{
+	if [ ! -d "$sysfs" ]; then
+		echo -n $"No MIC_HOST Module: "
+		killall -9 mpssd 2>/dev/null
+		success
+		echo
+		return
+	fi
+
+	stop
+	RETVAL=$?
+
+	sleep 5
+	echo -n $"Removing MIC_HOST Module: "
+
+	if [ $RETVAL = 0 ]; then
+		sleep 1
+		modprobe -r mic_host
+		RETVAL=$?
+	fi
+
+	if [ $RETVAL -ne 0 ]; then
+		failure
+	else
+		success
+	fi
+	echo
+	return $RETVAL
+}
+
+case $1 in
+	start)
+		start
+		;;
+	stop)
+		stop
+		;;
+	restart)
+		restart
+		;;
+	status)
+		status
+		;;
+	unload)
+		unload
+		;;
+	*)
+		echo $"Usage: $0 {start|stop|restart|status|unload}"
+		exit 2
+esac
+
+exit $?
diff --git a/Documentation/mic/mpssd/mpssd.c b/Documentation/mic/mpssd/mpssd.c
new file mode 100644
index 0000000..3bc34cb
--- /dev/null
+++ b/Documentation/mic/mpssd/mpssd.c
@@ -0,0 +1,1689 @@
+/*
+ * Intel MIC Platform Software Stack (MPSS)
+ *
+ * Copyright(c) 2013 Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * General Public License for more details.
+ *
+ * The full GNU General Public License is included in this distribution in
+ * the file called "COPYING".
+ *
+ * Intel MIC User Space Tools.
+ */
+
+#define _GNU_SOURCE
+
+#include <stdlib.h>
+#include <fcntl.h>
+#include <getopt.h>
+#include <assert.h>
+#include <unistd.h>
+#include <stdbool.h>
+#include <signal.h>
+#include <poll.h>
+#include <features.h>
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <sys/mman.h>
+#include <sys/socket.h>
+#include <linux/virtio_ring.h>
+#include <linux/virtio_net.h>
+#include <linux/virtio_console.h>
+#include <linux/virtio_blk.h>
+#include <linux/version.h>
+#include "mpssd.h"
+#include <linux/mic_ioctl.h>
+#include <linux/mic_common.h>
+
+static void init_mic(struct mic_info *mic);
+
+static FILE *logfp;
+static struct mic_info mic_list;
+
+#define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]))
+
+#define min_t(type, x, y) ({				\
+		type __min1 = (x);                      \
+		type __min2 = (y);                      \
+		__min1 < __min2 ? __min1 : __min2; })
+
+/* align addr on a size boundary - adjust address up/down if needed */
+#define _ALIGN_UP(addr, size)    (((addr)+((size)-1))&(~((size)-1)))
+#define _ALIGN_DOWN(addr, size)  ((addr)&(~((size)-1)))
+
+/* align addr on a size boundary - adjust address up if needed */
+#define _ALIGN(addr, size)     _ALIGN_UP(addr, size)
+
+/* to align the pointer to the (next) page boundary */
+#define PAGE_ALIGN(addr)        _ALIGN(addr, PAGE_SIZE)
+
+#define ACCESS_ONCE(x) (*(volatile typeof(x) *)&(x))
+
+/* Insert REP NOP (PAUSE) in busy-wait loops. */
+static inline void cpu_relax(void)
+{
+	asm volatile("rep; nop" : : : "memory");
+}
+
+#define GSO_ENABLED		1
+#define MAX_GSO_SIZE		(64 * 1024)
+#define ETH_H_LEN		14
+#define MAX_NET_PKT_SIZE	(_ALIGN_UP(MAX_GSO_SIZE + ETH_H_LEN, 64))
+#define MIC_DEVICE_PAGE_END	0x1000
+
+#ifndef VIRTIO_NET_HDR_F_DATA_VALID
+#define VIRTIO_NET_HDR_F_DATA_VALID	2	/* Csum is valid */
+#endif
+
+static struct {
+	struct mic_device_desc dd;
+	struct mic_vqconfig vqconfig[2];
+	__u32 host_features, guest_acknowledgements;
+	struct virtio_console_config cons_config;
+} virtcons_dev_page = {
+	.dd = {
+		.type = VIRTIO_ID_CONSOLE,
+		.num_vq = ARRAY_SIZE(virtcons_dev_page.vqconfig),
+		.feature_len = sizeof(virtcons_dev_page.host_features),
+		.config_len = sizeof(virtcons_dev_page.cons_config),
+	},
+	.vqconfig[0] = {
+		.num = htole16(MIC_VRING_ENTRIES),
+	},
+	.vqconfig[1] = {
+		.num = htole16(MIC_VRING_ENTRIES),
+	},
+};
+
+static struct {
+	struct mic_device_desc dd;
+	struct mic_vqconfig vqconfig[2];
+	__u32 host_features, guest_acknowledgements;
+	struct virtio_net_config net_config;
+} virtnet_dev_page = {
+	.dd = {
+		.type = VIRTIO_ID_NET,
+		.num_vq = ARRAY_SIZE(virtnet_dev_page.vqconfig),
+		.feature_len = sizeof(virtnet_dev_page.host_features),
+		.config_len = sizeof(virtnet_dev_page.net_config),
+	},
+	.vqconfig[0] = {
+		.num = htole16(MIC_VRING_ENTRIES),
+	},
+	.vqconfig[1] = {
+		.num = htole16(MIC_VRING_ENTRIES),
+	},
+#if GSO_ENABLED
+		.host_features = htole32(
+		1 << VIRTIO_NET_F_CSUM |
+		1 << VIRTIO_NET_F_GSO |
+		1 << VIRTIO_NET_F_GUEST_TSO4 |
+		1 << VIRTIO_NET_F_GUEST_TSO6 |
+		1 << VIRTIO_NET_F_GUEST_ECN |
+		1 << VIRTIO_NET_F_GUEST_UFO),
+#else
+		.host_features = 0,
+#endif
+};
+
+static const char *mic_config_dir = "/etc/sysconfig/mic";
+static const char *virtblk_backend = "VIRTBLK_BACKEND";
+static struct {
+	struct mic_device_desc dd;
+	struct mic_vqconfig vqconfig[1];
+	__u32 host_features, guest_acknowledgements;
+	struct virtio_blk_config blk_config;
+} virtblk_dev_page = {
+	.dd = {
+		.type = VIRTIO_ID_BLOCK,
+		.num_vq = ARRAY_SIZE(virtblk_dev_page.vqconfig),
+		.feature_len = sizeof(virtblk_dev_page.host_features),
+		.config_len = sizeof(virtblk_dev_page.blk_config),
+	},
+	.vqconfig[0] = {
+		.num = htole16(MIC_VRING_ENTRIES),
+	},
+	.host_features =
+		htole32(1<<VIRTIO_BLK_F_SEG_MAX),
+	.blk_config = {
+		.seg_max = htole32(MIC_VRING_ENTRIES - 2),
+		.capacity = htole64(0),
+	 }
+};
+
+static char *myname;
+
+static int
+tap_configure(struct mic_info *mic, char *dev)
+{
+	pid_t pid;
+	char *ifargv[7];
+	char ipaddr[sizeof("172.31.255.254/24")];
+	int ret = 0;
+
+	pid = fork();
+	if (pid == 0) {
+		ifargv[0] = "ip";
+		ifargv[1] = "link";
+		ifargv[2] = "set";
+		ifargv[3] = dev;
+		ifargv[4] = "up";
+		ifargv[5] = NULL;
+		mpsslog("Configuring %s\n", dev);
+		ret = execvp("ip", ifargv);
+		if (ret < 0) {
+			mpsslog("%s execvp failed errno %s\n",
+				mic->name, strerror(errno));
+			return ret;
+		}
+	}
+	if (pid < 0) {
+		mpsslog("%s fork failed errno %s\n",
+			mic->name, strerror(errno));
+		return pid;
+	}
+
+	ret = waitpid(pid, NULL, 0);
+	if (ret < 0) {
+		mpsslog("%s waitpid failed errno %s\n",
+			mic->name, strerror(errno));
+		return ret;
+	}
+
+	snprintf(ipaddr, sizeof(ipaddr), "172.31.%d.254/24", mic->id);
+
+	pid = fork();
+	if (pid == 0) {
+		ifargv[0] = "ip";
+		ifargv[1] = "addr";
+		ifargv[2] = "add";
+		ifargv[3] = ipaddr;
+		ifargv[4] = "dev";
+		ifargv[5] = dev;
+		ifargv[6] = NULL;
+		mpsslog("Configuring %s ipaddr %s\n", dev, ipaddr);
+		ret = execvp("ip", ifargv);
+		if (ret < 0) {
+			mpsslog("%s execvp failed errno %s\n",
+				mic->name, strerror(errno));
+			return ret;
+		}
+	}
+	if (pid < 0) {
+		mpsslog("%s fork failed errno %s\n",
+			mic->name, strerror(errno));
+		return pid;
+	}
+
+	ret = waitpid(pid, NULL, 0);
+	if (ret < 0) {
+		mpsslog("%s waitpid failed errno %s\n",
+			mic->name, strerror(errno));
+		return ret;
+	}
+	mpsslog("MIC name %s %s %d DONE!\n",
+		mic->name, __func__, __LINE__);
+	return 0;
+}
+
+static int tun_alloc(struct mic_info *mic, char *dev)
+{
+	struct ifreq ifr;
+	int fd, err;
+#if GSO_ENABLED
+	unsigned offload;
+#endif
+	fd = open("/dev/net/tun", O_RDWR);
+	if (fd < 0) {
+		mpsslog("Could not open /dev/net/tun %s\n", strerror(errno));
+		goto done;
+	}
+
+	memset(&ifr, 0, sizeof(ifr));
+
+	ifr.ifr_flags = IFF_TAP | IFF_NO_PI | IFF_VNET_HDR;
+	if (*dev)
+		strncpy(ifr.ifr_name, dev, IFNAMSIZ);
+
+	err = ioctl(fd, TUNSETIFF, (void *) &ifr);
+	if (err < 0) {
+		mpsslog("%s %s %d TUNSETIFF failed %s\n",
+			mic->name, __func__, __LINE__, strerror(errno));
+		close(fd);
+		return err;
+	}
+#if GSO_ENABLED
+	offload = TUN_F_CSUM | TUN_F_TSO4 | TUN_F_TSO6 |
+		TUN_F_TSO_ECN | TUN_F_UFO;
+
+	err = ioctl(fd, TUNSETOFFLOAD, offload);
+	if (err < 0) {
+		mpsslog("%s %s %d TUNSETOFFLOAD failed %s\n",
+			mic->name, __func__, __LINE__, strerror(errno));
+		close(fd);
+		return err;
+	}
+#endif
+	strcpy(dev, ifr.ifr_name);
+	mpsslog("Created TAP %s\n", dev);
+done:
+	return fd;
+}
+
+#define NET_FD_VIRTIO_NET 0
+#define NET_FD_TUN 1
+#define MAX_NET_FD 2
+
+static void * *
+get_dp(struct mic_info *mic, int type)
+{
+	switch (type) {
+	case VIRTIO_ID_CONSOLE:
+		return &mic->mic_console.console_dp;
+	case VIRTIO_ID_NET:
+		return &mic->mic_net.net_dp;
+	case VIRTIO_ID_BLOCK:
+		return &mic->mic_virtblk.block_dp;
+	}
+	mpsslog("%s %s %d not found\n", mic->name, __func__, type);
+	assert(0);
+	return NULL;
+}
+
+static struct mic_device_desc *get_device_desc(struct mic_info *mic, int type)
+{
+	struct mic_device_desc *d;
+	int i;
+	void *dp = *get_dp(mic, type);
+
+	for (i = mic_aligned_size(struct mic_bootparam); i < PAGE_SIZE;
+		i += mic_total_desc_size(d)) {
+		d = dp + i;
+
+		/* End of list */
+		if (d->type == 0)
+			break;
+
+		if (d->type == -1)
+			continue;
+
+		mpsslog("%s %s d-> type %d d %p\n",
+			mic->name, __func__, d->type, d);
+
+		if (d->type == (__u8)type)
+			return d;
+	}
+	mpsslog("%s %s %d not found\n", mic->name, __func__, type);
+	assert(0);
+	return NULL;
+}
+
+/* See comments in vhost.c for explanation of next_desc() */
+static unsigned next_desc(struct vring_desc *desc)
+{
+	unsigned int next;
+
+	if (!(le16toh(desc->flags) & VRING_DESC_F_NEXT))
+		return -1U;
+	next = le16toh(desc->next);
+	return next;
+}
+
+/* Sum the lengths of all iovec entries */
+static ssize_t
+sum_iovec_len(struct mic_copy_desc *copy)
+{
+	ssize_t sum = 0;
+	int i;
+
+	for (i = 0; i < copy->iovcnt; i++)
+		sum += copy->iov[i].iov_len;
+	return sum;
+}
+
+static inline void verify_out_len(struct mic_info *mic,
+	struct mic_copy_desc *copy)
+{
+	if (copy->out_len != sum_iovec_len(copy)) {
+		mpsslog("%s %s %d BUG copy->out_len 0x%x len 0x%zx\n",
+				mic->name, __func__, __LINE__,
+				copy->out_len, sum_iovec_len(copy));
+		assert(copy->out_len == sum_iovec_len(copy));
+	}
+}
+
+/* Display an iovec */
+static void
+disp_iovec(struct mic_info *mic, struct mic_copy_desc *copy,
+	const char *s, int line)
+{
+	int i;
+
+	for (i = 0; i < copy->iovcnt; i++)
+		mpsslog("%s %s %d copy->iov[%d] addr %p len 0x%lx\n",
+			mic->name, s, line, i,
+			copy->iov[i].iov_base, copy->iov[i].iov_len);
+}
+
+static inline __u16 read_avail_idx(struct mic_vring *vr)
+{
+	return ACCESS_ONCE(vr->info->avail_idx);
+}
+
+static inline void txrx_prepare(int type, bool tx, struct mic_vring *vr,
+				struct mic_copy_desc *copy, ssize_t len)
+{
+	copy->vr_idx = tx ? 0 : 1;
+	copy->update_used = true;
+	if (type == VIRTIO_ID_NET)
+		copy->iov[1].iov_len = len - sizeof(struct virtio_net_hdr);
+	else
+		copy->iov[0].iov_len = len;
+}
+
+/* Central API which triggers the copies */
+static int
+mic_virtio_copy(struct mic_info *mic, int fd,
+	struct mic_vring *vr, struct mic_copy_desc *copy)
+{
+	int ret;
+
+	ret = ioctl(fd, MIC_VIRTIO_COPY_DESC, copy);
+	if (ret) {
+		mpsslog("%s %s %d errno %s ret %d\n",
+			mic->name, __func__, __LINE__,
+			strerror(errno), ret);
+	}
+	return ret;
+}
+
+/*
+ * This initialization routine requires at least one
+ * vring i.e. vr0. vr1 is optional.
+ */
+static void *
+init_vr(struct mic_info *mic, int fd, int type,
+	struct mic_vring *vr0, struct mic_vring *vr1, int num_vq)
+{
+	int vr_size;
+	char *va;
+
+	vr_size = PAGE_ALIGN(vring_size(MIC_VRING_ENTRIES,
+		MIC_VIRTIO_RING_ALIGN) + sizeof(struct _mic_vring_info));
+	va = mmap(NULL, MIC_DEVICE_PAGE_END + vr_size * num_vq,
+		PROT_READ, MAP_SHARED, fd, 0);
+	if (MAP_FAILED == va) {
+		mpsslog("%s %s %d mmap failed errno %s\n",
+			mic->name, __func__, __LINE__,
+			strerror(errno));
+		goto done;
+	}
+	*get_dp(mic, type) = (void *)va;
+	vr0->va = (struct mic_vring *)&va[MIC_DEVICE_PAGE_END];
+	vr0->info = vr0->va +
+		vring_size(MIC_VRING_ENTRIES, MIC_VIRTIO_RING_ALIGN);
+	vring_init(&vr0->vr,
+		MIC_VRING_ENTRIES, vr0->va, MIC_VIRTIO_RING_ALIGN);
+	mpsslog("%s %s vr0 %p vr0->info %p vr_size 0x%x vring 0x%x ",
+		__func__, mic->name, vr0->va, vr0->info, vr_size,
+		vring_size(MIC_VRING_ENTRIES, MIC_VIRTIO_RING_ALIGN));
+	mpsslog("magic 0x%x expected 0x%x\n",
+		vr0->info->magic, MIC_MAGIC + type + 0);
+	assert(vr0->info->magic == MIC_MAGIC + type + 0);
+	if (vr1) {
+		vr1->va = (struct mic_vring *)
+			&va[MIC_DEVICE_PAGE_END + vr_size];
+		vr1->info = vr1->va + vring_size(MIC_VRING_ENTRIES,
+			MIC_VIRTIO_RING_ALIGN);
+		vring_init(&vr1->vr,
+			MIC_VRING_ENTRIES, vr1->va, MIC_VIRTIO_RING_ALIGN);
+		mpsslog("%s %s vr1 %p vr1->info %p vr_size 0x%x vring 0x%x ",
+			__func__, mic->name, vr1->va, vr1->info, vr_size,
+			vring_size(MIC_VRING_ENTRIES, MIC_VIRTIO_RING_ALIGN));
+		mpsslog("magic 0x%x expected 0x%x\n",
+			vr1->info->magic, MIC_MAGIC + type + 1);
+		assert(vr1->info->magic == MIC_MAGIC + type + 1);
+	}
+done:
+	return va;
+}
+
+static void
+uninit_vr(struct mic_info *mic, int num_vq)
+{
+	int vr_size, ret;
+
+	vr_size = PAGE_ALIGN(vring_size(MIC_VRING_ENTRIES,
+		MIC_VIRTIO_RING_ALIGN) + sizeof(struct _mic_vring_info));
+	ret = munmap(mic->mic_virtblk.block_dp,
+		MIC_DEVICE_PAGE_END + vr_size * num_vq);
+	if (ret < 0)
+		mpsslog("%s munmap errno %d\n", mic->name, errno);
+}
+
+static void
+wait_for_card_driver(struct mic_info *mic, int fd, int type)
+{
+	struct pollfd pollfd;
+	int err;
+	struct mic_device_desc *desc = get_device_desc(mic, type);
+
+	pollfd.fd = fd;
+	mpsslog("%s %s Waiting .... desc-> type %d status 0x%x\n",
+		mic->name, __func__, type, desc->status);
+	while (1) {
+		pollfd.events = POLLIN;
+		pollfd.revents = 0;
+		err = poll(&pollfd, 1, -1);
+		if (err < 0) {
+			mpsslog("%s %s poll failed %s\n",
+				mic->name, __func__, strerror(errno));
+			continue;
+		}
+
+		if (pollfd.revents) {
+			mpsslog("%s %s Waiting... desc-> type %d status 0x%x\n",
+				mic->name, __func__, type, desc->status);
+			if (desc->status & VIRTIO_CONFIG_S_DRIVER_OK) {
+				mpsslog("%s %s poll.revents %d\n",
+					mic->name, __func__, pollfd.revents);
+				mpsslog("%s %s desc-> type %d status 0x%x\n",
+					mic->name, __func__, type,
+					desc->status);
+				break;
+			}
+		}
+	}
+}
+
+/* Spin till we have some descriptors */
+static void
+wait_for_descriptors(struct mic_info *mic, struct mic_vring *vr)
+{
+	__u16 avail_idx = read_avail_idx(vr);
+
+	while (avail_idx == le16toh(ACCESS_ONCE(vr->vr.avail->idx))) {
+#ifdef DEBUG
+		mpsslog("%s %s waiting for desc avail %d info_avail %d\n",
+			mic->name, __func__,
+			le16toh(vr->vr.avail->idx), vr->info->avail_idx);
+#endif
+		cpu_relax();
+	}
+}
+
+static void *
+virtio_net(void *arg)
+{
+	static __u8 vnet_hdr[2][sizeof(struct virtio_net_hdr)];
+	static __u8 vnet_buf[2][MAX_NET_PKT_SIZE] __aligned(64);
+	struct iovec vnet_iov[2][2] = {
+		{ { .iov_base = vnet_hdr[0], .iov_len = sizeof(vnet_hdr[0]) },
+		  { .iov_base = vnet_buf[0], .iov_len = sizeof(vnet_buf[0]) } },
+		{ { .iov_base = vnet_hdr[1], .iov_len = sizeof(vnet_hdr[1]) },
+		  { .iov_base = vnet_buf[1], .iov_len = sizeof(vnet_buf[1]) } },
+	};
+	struct iovec *iov0 = vnet_iov[0], *iov1 = vnet_iov[1];
+	struct mic_info *mic = (struct mic_info *)arg;
+	char if_name[IFNAMSIZ];
+	struct pollfd net_poll[MAX_NET_FD];
+	struct mic_vring tx_vr, rx_vr;
+	struct mic_copy_desc copy;
+	struct mic_device_desc *desc;
+	int err;
+
+	snprintf(if_name, IFNAMSIZ, "mic%d", mic->id);
+	mic->mic_net.tap_fd = tun_alloc(mic, if_name);
+	if (mic->mic_net.tap_fd < 0)
+		goto done;
+
+	if (tap_configure(mic, if_name))
+		goto done;
+	mpsslog("MIC name %s id %d\n", mic->name, mic->id);
+
+	net_poll[NET_FD_VIRTIO_NET].fd = mic->mic_net.virtio_net_fd;
+	net_poll[NET_FD_VIRTIO_NET].events = POLLIN;
+	net_poll[NET_FD_TUN].fd = mic->mic_net.tap_fd;
+	net_poll[NET_FD_TUN].events = POLLIN;
+
+	if (MAP_FAILED == init_vr(mic, mic->mic_net.virtio_net_fd,
+		VIRTIO_ID_NET, &tx_vr, &rx_vr,
+		virtnet_dev_page.dd.num_vq)) {
+		mpsslog("%s init_vr failed %s\n",
+			mic->name, strerror(errno));
+		goto done;
+	}
+
+	copy.iovcnt = 2;
+	desc = get_device_desc(mic, VIRTIO_ID_NET);
+
+	while (1) {
+		ssize_t len;
+
+		net_poll[NET_FD_VIRTIO_NET].revents = 0;
+		net_poll[NET_FD_TUN].revents = 0;
+
+		/* Start polling for data from tap and virtio net */
+		err = poll(net_poll, 2, -1);
+		if (err < 0) {
+			mpsslog("%s poll failed %s\n",
+				__func__, strerror(errno));
+			continue;
+		}
+		if (!(desc->status & VIRTIO_CONFIG_S_DRIVER_OK))
+			wait_for_card_driver(mic, mic->mic_net.virtio_net_fd,
+					VIRTIO_ID_NET);
+		/*
+		 * Check if there is data to be read from TUN and write to
+		 * virtio net fd if there is.
+		 */
+		if (net_poll[NET_FD_TUN].revents & POLLIN) {
+			copy.iov = iov0;
+			len = readv(net_poll[NET_FD_TUN].fd,
+				copy.iov, copy.iovcnt);
+			if (len > 0) {
+				struct virtio_net_hdr *hdr
+					= (struct virtio_net_hdr *) vnet_hdr[0];
+
+				/* Disable checksums on the card since we are on
+				   a reliable PCIe link */
+				hdr->flags |= VIRTIO_NET_HDR_F_DATA_VALID;
+#ifdef DEBUG
+				mpsslog("%s %s %d hdr->flags 0x%x ", mic->name,
+					__func__, __LINE__, hdr->flags);
+				mpsslog("copy.out_len %d hdr->gso_type 0x%x\n",
+					copy.out_len, hdr->gso_type);
+#endif
+#ifdef DEBUG
+				disp_iovec(mic, &copy, __func__, __LINE__);
+				mpsslog("%s %s %d read from tap 0x%lx\n",
+					mic->name, __func__, __LINE__,
+					len);
+#endif
+				wait_for_descriptors(mic, &tx_vr);
+				txrx_prepare(VIRTIO_ID_NET, 1, &tx_vr, &copy,
+					len);
+
+				err = mic_virtio_copy(mic,
+					mic->mic_net.virtio_net_fd, &tx_vr,
+					&copy);
+				if (err < 0) {
+					mpsslog("%s %s %d mic_virtio_copy %s\n",
+						mic->name, __func__, __LINE__,
+						strerror(errno));
+				}
+				if (!err)
+					verify_out_len(mic, &copy);
+#ifdef DEBUG
+				disp_iovec(mic, &copy, __func__, __LINE__);
+				mpsslog("%s %s %d wrote to net 0x%lx\n",
+					mic->name, __func__, __LINE__,
+					sum_iovec_len(&copy));
+#endif
+				/* Reinitialize IOV for next run */
+				iov0[1].iov_len = MAX_NET_PKT_SIZE;
+			} else if (len < 0) {
+				disp_iovec(mic, &copy, __func__, __LINE__);
+				mpsslog("%s %s %d read failed %s ", mic->name,
+					__func__, __LINE__, strerror(errno));
+				mpsslog("cnt %d sum %zd\n",
+					copy.iovcnt, sum_iovec_len(&copy));
+			}
+		}
+
+		/*
+		 * Check if there is data to be read from virtio net and
+		 * write to TUN if there is.
+		 */
+		if (net_poll[NET_FD_VIRTIO_NET].revents & POLLIN) {
+			while (rx_vr.info->avail_idx !=
+				le16toh(rx_vr.vr.avail->idx)) {
+				copy.iov = iov1;
+				txrx_prepare(VIRTIO_ID_NET, 0, &rx_vr, &copy,
+					MAX_NET_PKT_SIZE
+					+ sizeof(struct virtio_net_hdr));
+
+				err = mic_virtio_copy(mic,
+					mic->mic_net.virtio_net_fd, &rx_vr,
+					&copy);
+				if (!err) {
+#ifdef DEBUG
+					struct virtio_net_hdr *hdr
+						= (struct virtio_net_hdr *)
+							vnet_hdr[1];
+
+					mpsslog("%s %s %d hdr->flags 0x%x, ",
+						mic->name, __func__, __LINE__,
+						hdr->flags);
+					mpsslog("out_len %d gso_type 0x%x\n",
+						copy.out_len,
+						hdr->gso_type);
+#endif
+					/* Set the correct output iov_len */
+					iov1[1].iov_len = copy.out_len -
+						sizeof(struct virtio_net_hdr);
+					verify_out_len(mic, &copy);
+#ifdef DEBUG
+					disp_iovec(mic, &copy, __func__,
+						__LINE__);
+					mpsslog("%s %s %d ",
+						mic->name, __func__, __LINE__);
+					mpsslog("read from net 0x%lx\n",
+						sum_iovec_len(&copy));
+#endif
+					len = writev(net_poll[NET_FD_TUN].fd,
+						copy.iov, copy.iovcnt);
+					if (len != sum_iovec_len(&copy)) {
+						mpsslog("Tun write failed %s ",
+							strerror(errno));
+						mpsslog("len 0x%zx ", len);
+						mpsslog("read_len 0x%zx\n",
+							sum_iovec_len(&copy));
+					} else {
+#ifdef DEBUG
+						disp_iovec(mic, &copy, __func__,
+							__LINE__);
+						mpsslog("%s %s %d ",
+							mic->name, __func__,
+							__LINE__);
+						mpsslog("wrote to tap 0x%lx\n",
+							len);
+#endif
+					}
+				} else {
+					mpsslog("%s %s %d mic_virtio_copy %s\n",
+						mic->name, __func__, __LINE__,
+						strerror(errno));
+					break;
+				}
+			}
+		}
+		if (net_poll[NET_FD_VIRTIO_NET].revents & POLLERR) {
+			mpsslog("%s: %s: POLLERR\n", __func__, mic->name);
+			sleep(1);
+		}
+	}
+done:
+	pthread_exit(NULL);
+}
+
+/* virtio_console */
+#define VIRTIO_CONSOLE_FD 0
+#define MONITOR_FD (VIRTIO_CONSOLE_FD + 1)
+#define MAX_CONSOLE_FD (MONITOR_FD + 1)  /* must be the last one + 1 */
+#define MAX_BUFFER_SIZE PAGE_SIZE
+
+static void *
+virtio_console(void *arg)
+{
+	static __u8 vcons_buf[2][PAGE_SIZE];
+	struct iovec vcons_iov[2] = {
+		{ .iov_base = vcons_buf[0], .iov_len = sizeof(vcons_buf[0]) },
+		{ .iov_base = vcons_buf[1], .iov_len = sizeof(vcons_buf[1]) },
+	};
+	struct iovec *iov0 = &vcons_iov[0], *iov1 = &vcons_iov[1];
+	struct mic_info *mic = (struct mic_info *)arg;
+	int err;
+	struct pollfd console_poll[MAX_CONSOLE_FD];
+	int pty_fd;
+	char *pts_name;
+	ssize_t len;
+	struct mic_vring tx_vr, rx_vr;
+	struct mic_copy_desc copy;
+	struct mic_device_desc *desc;
+
+	pty_fd = posix_openpt(O_RDWR);
+	if (pty_fd < 0) {
+		mpsslog("can't open a pseudoterminal master device: %s\n",
+			strerror(errno));
+		goto _return;
+	}
+	pts_name = ptsname(pty_fd);
+	if (pts_name == NULL) {
+		mpsslog("can't get pts name\n");
+		goto _close_pty;
+	}
+	printf("%s console message goes to %s\n", mic->name, pts_name);
+	mpsslog("%s console message goes to %s\n", mic->name, pts_name);
+	err = grantpt(pty_fd);
+	if (err < 0) {
+		mpsslog("can't grant access: %s %s\n",
+				pts_name, strerror(errno));
+		goto _close_pty;
+	}
+	err = unlockpt(pty_fd);
+	if (err < 0) {
+		mpsslog("can't unlock a pseudoterminal: %s %s\n",
+				pts_name, strerror(errno));
+		goto _close_pty;
+	}
+	console_poll[MONITOR_FD].fd = pty_fd;
+	console_poll[MONITOR_FD].events = POLLIN;
+
+	console_poll[VIRTIO_CONSOLE_FD].fd = mic->mic_console.virtio_console_fd;
+	console_poll[VIRTIO_CONSOLE_FD].events = POLLIN;
+
+	if (MAP_FAILED == init_vr(mic, mic->mic_console.virtio_console_fd,
+		VIRTIO_ID_CONSOLE, &tx_vr, &rx_vr,
+		virtcons_dev_page.dd.num_vq)) {
+		mpsslog("%s init_vr failed %s\n",
+			mic->name, strerror(errno));
+		goto _close_pty;
+	}
+
+	copy.iovcnt = 1;
+	desc = get_device_desc(mic, VIRTIO_ID_CONSOLE);
+
+	for (;;) {
+		console_poll[MONITOR_FD].revents = 0;
+		console_poll[VIRTIO_CONSOLE_FD].revents = 0;
+		err = poll(console_poll, MAX_CONSOLE_FD, -1);
+		if (err < 0) {
+			mpsslog("%s %d: poll failed: %s\n", __func__, __LINE__,
+				strerror(errno));
+			continue;
+		}
+		if (!(desc->status & VIRTIO_CONFIG_S_DRIVER_OK))
+			wait_for_card_driver(mic,
+				mic->mic_console.virtio_console_fd,
+				VIRTIO_ID_CONSOLE);
+
+		if (console_poll[MONITOR_FD].revents & POLLIN) {
+			copy.iov = iov0;
+			len = readv(pty_fd, copy.iov, copy.iovcnt);
+			if (len > 0) {
+#ifdef DEBUG
+				disp_iovec(mic, &copy, __func__, __LINE__);
+				mpsslog("%s %s %d read from pty 0x%lx\n",
+					mic->name, __func__, __LINE__,
+					len);
+#endif
+				wait_for_descriptors(mic, &tx_vr);
+				txrx_prepare(VIRTIO_ID_CONSOLE, 1, &tx_vr,
+					&copy, len);
+
+				err = mic_virtio_copy(mic,
+					mic->mic_console.virtio_console_fd,
+					&tx_vr, &copy);
+				if (err < 0) {
+					mpsslog("%s %s %d mic_virtio_copy %s\n",
+						mic->name, __func__, __LINE__,
+						strerror(errno));
+				}
+				if (!err)
+					verify_out_len(mic, &copy);
+#ifdef DEBUG
+				disp_iovec(mic, &copy, __func__, __LINE__);
+				mpsslog("%s %s %d wrote to console 0x%lx\n",
+					mic->name, __func__, __LINE__,
+					sum_iovec_len(&copy));
+#endif
+				/* Reinitialize IOV for next run */
+				iov0->iov_len = PAGE_SIZE;
+			} else if (len < 0) {
+				disp_iovec(mic, &copy, __func__, __LINE__);
+				mpsslog("%s %s %d read failed %s ",
+					mic->name, __func__, __LINE__,
+					strerror(errno));
+				mpsslog("cnt %d sum %d\n",
+					copy.iovcnt, sum_iovec_len(&copy));
+			}
+		}
+
+		if (console_poll[VIRTIO_CONSOLE_FD].revents & POLLIN) {
+			while (rx_vr.info->avail_idx !=
+				le16toh(rx_vr.vr.avail->idx)) {
+				copy.iov = iov1;
+				txrx_prepare(VIRTIO_ID_CONSOLE, 0, &rx_vr,
+					&copy, PAGE_SIZE);
+
+				err = mic_virtio_copy(mic,
+					mic->mic_console.virtio_console_fd,
+					&rx_vr, &copy);
+				if (!err) {
+					/* Set the correct output iov_len */
+					iov1->iov_len = copy.out_len;
+					verify_out_len(mic, &copy);
+#ifdef DEBUG
+					disp_iovec(mic, &copy, __func__,
+						__LINE__);
+					mpsslog("%s %s %d ",
+						mic->name, __func__, __LINE__);
+					mpsslog("read from console 0x%lx\n",
+						sum_iovec_len(&copy));
+#endif
+					len = writev(pty_fd,
+						copy.iov, copy.iovcnt);
+					if (len != sum_iovec_len(&copy)) {
+						mpsslog("pty write failed %s ",
+							strerror(errno));
+						mpsslog("len 0x%x ", len);
+						mpsslog("read_len 0x%x\n",
+							sum_iovec_len(&copy));
+					} else {
+#ifdef DEBUG
+						disp_iovec(mic, &copy, __func__,
+							__LINE__);
+						mpsslog("%s %s %d ",
+							mic->name, __func__,
+							__LINE__);
+						mpsslog("wrote to pty 0x%lx\n",
+							len);
+#endif
+					}
+				} else {
+					mpsslog("%s %s %d mic_virtio_copy %s\n",
+						mic->name, __func__, __LINE__,
+						strerror(errno));
+					break;
+				}
+			}
+		}
+		if (console_poll[VIRTIO_CONSOLE_FD].revents & POLLERR) {
+			mpsslog("%s: %s: POLLERR\n", __func__, mic->name);
+			sleep(1);
+		}
+	}
+_close_pty:
+	close(pty_fd);
+_return:
+	pthread_exit(NULL);
+}
+
+static void
+add_virtio_device(struct mic_info *mic, struct mic_device_desc *dd)
+{
+	char path[PATH_MAX];
+	int fd, err;
+
+	snprintf(path, PATH_MAX, "/dev/mic%d", mic->id);
+	fd = open(path, O_RDWR);
+	if (fd < 0) {
+		mpsslog("Could not open %s %s\n", path, strerror(errno));
+		return;
+	}
+
+	err = ioctl(fd, MIC_VIRTIO_ADD_DEVICE, dd);
+	if (err < 0) {
+		mpsslog("Could not add %d %s\n", dd->type, strerror(errno));
+		close(fd);
+		return;
+	}
+	switch (dd->type) {
+	case VIRTIO_ID_NET:
+		mic->mic_net.virtio_net_fd = fd;
+		mpsslog("Added VIRTIO_ID_NET for %s\n", mic->name);
+		break;
+	case VIRTIO_ID_CONSOLE:
+		mic->mic_console.virtio_console_fd = fd;
+		mpsslog("Added VIRTIO_ID_CONSOLE for %s\n", mic->name);
+		break;
+	case VIRTIO_ID_BLOCK:
+		mic->mic_virtblk.virtio_block_fd = fd;
+		mpsslog("Added VIRTIO_ID_BLOCK for %s\n", mic->name);
+		break;
+	}
+}
+
+static bool
+set_backend_file(struct mic_info *mic)
+{
+	FILE *config;
+	char buff[PATH_MAX], *line, *evv, *p;
+
+	snprintf(buff, PATH_MAX, "%s/mpssd%03d.conf", mic_config_dir, mic->id);
+	config = fopen(buff, "r");
+	if (config == NULL)
+		return false;
+	do {  /* look for "virtblk_backend=XXXX" */
+		line = fgets(buff, PATH_MAX, config);
+		if (line == NULL)
+			break;
+		if (*line == '#')
+			continue;
+		p = strchr(line, '\n');
+		if (p)
+			*p = '\0';
+	} while (strncmp(line, virtblk_backend, strlen(virtblk_backend)) != 0);
+	fclose(config);
+	if (line == NULL)
+		return false;
+	evv = strchr(line, '=');
+	if (evv == NULL)
+		return false;
+	mic->mic_virtblk.backend_file = malloc(strlen(evv));
+	if (mic->mic_virtblk.backend_file == NULL) {
+		mpsslog("%s: can't allocate memory\n", mic->name);
+		return false;
+	}
+	strcpy(mic->mic_virtblk.backend_file, evv + 1);
+	return true;
+}
+
+#define SECTOR_SIZE 512
+static bool
+set_backend_size(struct mic_info *mic)
+{
+	mic->mic_virtblk.backend_size = lseek(mic->mic_virtblk.backend, 0,
+		SEEK_END);
+	if (mic->mic_virtblk.backend_size < 0) {
+		mpsslog("%s: can't seek: %s\n",
+			mic->name, mic->mic_virtblk.backend_file);
+		return false;
+	}
+	virtblk_dev_page.blk_config.capacity =
+		mic->mic_virtblk.backend_size / SECTOR_SIZE;
+	if ((mic->mic_virtblk.backend_size % SECTOR_SIZE) != 0)
+		virtblk_dev_page.blk_config.capacity++;
+
+	virtblk_dev_page.blk_config.capacity =
+		htole64(virtblk_dev_page.blk_config.capacity);
+
+	return true;
+}
+
+static bool
+open_backend(struct mic_info *mic)
+{
+	if (!set_backend_file(mic))
+		goto _error_exit;
+	mic->mic_virtblk.backend = open(mic->mic_virtblk.backend_file, O_RDWR);
+	if (mic->mic_virtblk.backend < 0) {
+		mpsslog("%s: can't open: %s\n", mic->name,
+			mic->mic_virtblk.backend_file);
+		goto _error_free;
+	}
+	if (!set_backend_size(mic))
+		goto _error_close;
+	mic->mic_virtblk.backend_addr = mmap(NULL,
+		mic->mic_virtblk.backend_size,
+		PROT_READ|PROT_WRITE, MAP_SHARED,
+		mic->mic_virtblk.backend, 0L);
+	if (mic->mic_virtblk.backend_addr == MAP_FAILED) {
+		mpsslog("%s: can't map: %s %s\n",
+			mic->name, mic->mic_virtblk.backend_file,
+			strerror(errno));
+		goto _error_close;
+	}
+	return true;
+
+ _error_close:
+	close(mic->mic_virtblk.backend);
+ _error_free:
+	free(mic->mic_virtblk.backend_file);
+ _error_exit:
+	return false;
+}
+
+static void
+close_backend(struct mic_info *mic)
+{
+	munmap(mic->mic_virtblk.backend_addr, mic->mic_virtblk.backend_size);
+	close(mic->mic_virtblk.backend);
+	free(mic->mic_virtblk.backend_file);
+}
+
+static bool
+start_virtblk(struct mic_info *mic, struct mic_vring *vring)
+{
+	if (((__u64)&virtblk_dev_page.blk_config % 8) != 0) {
+		mpsslog("%s: blk_config is not 8 byte aligned.\n",
+			mic->name);
+		return false;
+	}
+	add_virtio_device(mic, &virtblk_dev_page.dd);
+	if (MAP_FAILED == init_vr(mic, mic->mic_virtblk.virtio_block_fd,
+		VIRTIO_ID_BLOCK, vring, NULL, virtblk_dev_page.dd.num_vq)) {
+		mpsslog("%s init_vr failed %s\n",
+			mic->name, strerror(errno));
+		return false;
+	}
+	return true;
+}
+
+static void
+stop_virtblk(struct mic_info *mic)
+{
+	uninit_vr(mic, virtblk_dev_page.dd.num_vq);
+	close(mic->mic_virtblk.virtio_block_fd);
+}
+
+static __u8
+header_error_check(struct vring_desc *desc)
+{
+	if (le32toh(desc->len) != sizeof(struct virtio_blk_outhdr)) {
+		mpsslog("%s() %d: length is not sizeof(virtio_blk_outhdr)\n",
+				__func__, __LINE__);
+		return -EIO;
+	}
+	if (!(le16toh(desc->flags) & VRING_DESC_F_NEXT)) {
+		mpsslog("%s() %d: header has no next descriptor\n",
+			__func__, __LINE__);
+		return -EIO;
+	}
+	if (le16toh(desc->flags) & VRING_DESC_F_WRITE) {
+		mpsslog("%s() %d: header is not read-only\n",
+			__func__, __LINE__);
+		return -EIO;
+	}
+	return 0;
+}
+
+static int
+read_header(int fd, struct virtio_blk_outhdr *hdr, __u32 desc_idx)
+{
+	struct iovec iovec;
+	struct mic_copy_desc copy;
+
+	iovec.iov_len = sizeof(*hdr);
+	iovec.iov_base = hdr;
+	copy.iov = &iovec;
+	copy.iovcnt = 1;
+	copy.vr_idx = 0;  /* only one vring on virtio_block */
+	copy.update_used = false;  /* do not update used index */
+	return ioctl(fd, MIC_VIRTIO_COPY_DESC, &copy);
+}
+
+static int
+transfer_blocks(int fd, struct iovec *iovec, __u32 iovcnt)
+{
+	struct mic_copy_desc copy;
+
+	copy.iov = iovec;
+	copy.iovcnt = iovcnt;
+	copy.vr_idx = 0;  /* only one vring on virtio_block */
+	copy.update_used = false;  /* do not update used index */
+	return ioctl(fd, MIC_VIRTIO_COPY_DESC, &copy);
+}
+
+static __u8
+status_error_check(struct vring_desc *desc)
+{
+	if (le32toh(desc->len) != sizeof(__u8)) {
+		mpsslog("%s() %d: length is not sizeof(status)\n",
+			__func__, __LINE__);
+		return -EIO;
+	}
+	return 0;
+}
+
+static int
+write_status(int fd, __u8 *status)
+{
+	struct iovec iovec;
+	struct mic_copy_desc copy;
+
+	iovec.iov_base = status;
+	iovec.iov_len = sizeof(*status);
+	copy.iov = &iovec;
+	copy.iovcnt = 1;
+	copy.vr_idx = 0;  /* only one vring on virtio_block */
+	copy.update_used = true; /* Update used index */
+	return ioctl(fd, MIC_VIRTIO_COPY_DESC, &copy);
+}
+
+static void *
+virtio_block(void *arg)
+{
+	struct mic_info *mic = (struct mic_info *) arg;
+	int ret;
+	struct pollfd block_poll;
+	struct mic_vring vring;
+	__u16 avail_idx;
+	__u32 desc_idx;
+	struct vring_desc *desc;
+	struct iovec *iovec, *piov;
+	__u8 status;
+	__u32 buffer_desc_idx;
+	struct virtio_blk_outhdr hdr;
+	void *fos;
+
+	for (;;) {  /* forever */
+		if (!open_backend(mic)) { /* No virtblk */
+			for (mic->mic_virtblk.signaled = 0;
+				!mic->mic_virtblk.signaled;)
+				sleep(1);
+			continue;
+		}
+
+		/* backend file is specified. */
+		if (!start_virtblk(mic, &vring))
+			goto _close_backend;
+		iovec = malloc(sizeof(*iovec) *
+			le32toh(virtblk_dev_page.blk_config.seg_max));
+		if (!iovec) {
+			mpsslog("%s: can't alloc iovec: %s\n",
+				mic->name, strerror(ENOMEM));
+			goto _stop_virtblk;
+		}
+
+		block_poll.fd = mic->mic_virtblk.virtio_block_fd;
+		block_poll.events = POLLIN;
+		for (mic->mic_virtblk.signaled = 0;
+		     !mic->mic_virtblk.signaled;) {
+			block_poll.revents = 0;
+					/* timeout in 1 sec to see signaled */
+			ret = poll(&block_poll, 1, 1000);
+			if (ret < 0) {
+				mpsslog("%s %d: poll failed: %s\n",
+					__func__, __LINE__,
+					strerror(errno));
+				continue;
+			}
+
+			if (!(block_poll.revents & POLLIN)) {
+#ifdef DEBUG
+				mpsslog("%s %d: block_poll.revents=0x%x\n",
+					__func__, __LINE__, block_poll.revents);
+				sleep(1);
+#endif
+				continue;
+			}
+
+			/* POLLIN */
+			while (vring.info->avail_idx !=
+				le16toh(vring.vr.avail->idx)) {
+				/* read header element */
+				avail_idx =
+					vring.info->avail_idx &
+					(vring.vr.num - 1);
+				desc_idx = le16toh(
+					vring.vr.avail->ring[avail_idx]);
+				desc = &vring.vr.desc[desc_idx];
+#ifdef DEBUG
+				mpsslog("%s() %d: avail_idx=%d ",
+					__func__, __LINE__,
+					vring.info->avail_idx);
+				mpsslog("vring.vr.num=%d desc=%p\n",
+					vring.vr.num, desc);
+#endif
+				status = header_error_check(desc);
+				ret = read_header(
+					mic->mic_virtblk.virtio_block_fd,
+					&hdr, desc_idx);
+				if (ret < 0) {
+					mpsslog("%s() %d %s: ret=%d %s\n",
+						__func__, __LINE__,
+						mic->name, ret,
+						strerror(errno));
+					break;
+				}
+				/* buffer element */
+				piov = iovec;
+				status = 0;
+				fos = mic->mic_virtblk.backend_addr +
+					(hdr.sector * SECTOR_SIZE);
+				buffer_desc_idx = desc_idx =
+					next_desc(desc);
+				for (desc = &vring.vr.desc[buffer_desc_idx];
+				     desc->flags & VRING_DESC_F_NEXT;
+				     desc_idx = next_desc(desc),
+					     desc = &vring.vr.desc[desc_idx]) {
+					piov->iov_len = desc->len;
+					piov->iov_base = fos;
+					piov++;
+					fos += desc->len;
+				}
+				/* Returning NULLs for VIRTIO_BLK_T_GET_ID. */
+				if (hdr.type & ~(VIRTIO_BLK_T_OUT |
+					VIRTIO_BLK_T_GET_ID)) {
+					/*
+					  VIRTIO_BLK_T_IN - does not do
+					  anything. Probably for documenting.
+					  VIRTIO_BLK_T_SCSI_CMD - for
+					  virtio_scsi.
+					  VIRTIO_BLK_T_FLUSH - turned off in
+					  config space.
+					  VIRTIO_BLK_T_BARRIER - defined but not
+					  used anywhere.
+					*/
+					mpsslog("%s() %d: type %x ",
+						__func__, __LINE__,
+						hdr.type);
+					mpsslog("is not supported\n");
+					status = -ENOTSUP;
+
+				} else {
+					ret = transfer_blocks(
+					mic->mic_virtblk.virtio_block_fd,
+						iovec,
+						piov - iovec);
+					if (ret < 0 &&
+						status == 0)
+						status = ret;
+				}
+				/* write status and update used pointer */
+				if (status == 0)
+					status = status_error_check(desc);
+				ret = write_status(
+					mic->mic_virtblk.virtio_block_fd,
+					&status);
+#ifdef DEBUG
+				mpsslog("%s() %d: write status=%d on desc=%p\n",
+					__func__, __LINE__,
+					status, desc);
+#endif
+			}
+		}
+		free(iovec);
+_stop_virtblk:
+		stop_virtblk(mic);
+_close_backend:
+		close_backend(mic);
+	}  /* forever */
+
+	pthread_exit(NULL);
+}
+
+static void
+reset(struct mic_info *mic)
+{
+#define RESET_TIMEOUT 120
+	int i = RESET_TIMEOUT;
+	setsysfs(mic->name, "state", "reset");
+	while (i) {
+		char *state;
+		state = readsysfs(mic->name, "state");
+		if (!state)
+			goto retry;
+		mpsslog("%s: %s %d state %s\n",
+			mic->name, __func__, __LINE__, state);
+		if ((!strcmp(state, "offline"))) {
+			free(state);
+			break;
+		}
+		free(state);
+retry:
+		sleep(1);
+		i--;
+	}
+}
+
+static int
+get_mic_shutdown_status(struct mic_info *mic, char *shutdown_status)
+{
+	if (!strcmp(shutdown_status, "nop"))
+		return MIC_NOP;
+	if (!strcmp(shutdown_status, "crashed"))
+		return MIC_CRASHED;
+	if (!strcmp(shutdown_status, "halted"))
+		return MIC_HALTED;
+	if (!strcmp(shutdown_status, "poweroff"))
+		return MIC_POWER_OFF;
+	if (!strcmp(shutdown_status, "restart"))
+		return MIC_RESTART;
+	mpsslog("%s: BUG invalid status %s\n", mic->name, shutdown_status);
+	/* Invalid state */
+	assert(0);
+};
+
+static int get_mic_state(struct mic_info *mic, char *state)
+{
+	if (!strcmp(state, "offline"))
+		return MIC_OFFLINE;
+	if (!strcmp(state, "online"))
+		return MIC_ONLINE;
+	if (!strcmp(state, "shutting_down"))
+		return MIC_SHUTTING_DOWN;
+	if (!strcmp(state, "reset_failed"))
+		return MIC_RESET_FAILED;
+	mpsslog("%s: BUG invalid state %s\n", mic->name, state);
+	/* Invalid state */
+	assert(0);
+};
+
+static void mic_handle_shutdown(struct mic_info *mic)
+{
+#define SHUTDOWN_TIMEOUT 60
+	int i = SHUTDOWN_TIMEOUT, ret, stat = 0;
+	char *shutdown_status;
+	while (i) {
+		shutdown_status = readsysfs(mic->name, "shutdown_status");
+		if (!shutdown_status)
+			continue;
+		mpsslog("%s: %s %d shutdown_status %s\n",
+			mic->name, __func__, __LINE__, shutdown_status);
+		switch (get_mic_shutdown_status(mic, shutdown_status)) {
+		case MIC_RESTART:
+			mic->restart = 1;	/* fall through */
+		case MIC_HALTED:
+		case MIC_POWER_OFF:
+		case MIC_CRASHED:
+			goto reset;
+		default:
+			break;
+		}
+		free(shutdown_status);
+		sleep(1);
+		i--;
+	}
+reset:
+	ret = kill(mic->pid, SIGTERM);
+	mpsslog("%s: %s %d kill pid %d ret %d\n",
+		mic->name, __func__, __LINE__,
+		mic->pid, ret);
+	if (!ret) {
+		/* Block until the child exits; options takes flags only. */
+		ret = waitpid(mic->pid, &stat, 0);
+		mpsslog("%s: %s %d waitpid ret %d pid %d\n",
+			mic->name, __func__, __LINE__,
+			ret, mic->pid);
+	}
+	if (ret == mic->pid)
+		reset(mic);
+}
+
+static void *
+mic_config(void *arg)
+{
+	struct mic_info *mic = (struct mic_info *)arg;
+	char *state = NULL;
+	char pathname[PATH_MAX];
+	int fd, ret;
+	struct pollfd ufds[1];
+	char value[4096];
+
+	snprintf(pathname, PATH_MAX - 1, "%s/%s/%s",
+		MICSYSFSDIR, mic->name, "state");
+
+	fd = open(pathname, O_RDONLY);
+	if (fd < 0) {
+		mpsslog("%s: opening file %s failed %s\n",
+			mic->name, pathname, strerror(errno));
+		goto error;
+	}
+
+	do {
+		ret = read(fd, value, sizeof(value));
+		if (ret < 0) {
+			mpsslog("%s: Failed to read sysfs entry '%s': %s\n",
+				mic->name, pathname, strerror(errno));
+			goto close_error1;
+		}
+retry:
+		state = readsysfs(mic->name, "state");
+		if (!state)
+			goto retry;
+		mpsslog("%s: %s %d state %s\n",
+			mic->name, __func__, __LINE__, state);
+		switch (get_mic_state(mic, state)) {
+		case MIC_SHUTTING_DOWN:
+			mic_handle_shutdown(mic);
+			goto close_error;
+		default:
+			break;
+		}
+		free(state);
+
+		ufds[0].fd = fd;
+		ufds[0].events = POLLERR | POLLPRI;
+		ret = poll(ufds, 1, -1);
+		if (ret < 0) {
+			mpsslog("%s: poll failed %s\n",
+				mic->name, strerror(errno));
+			goto close_error1;
+		}
+	} while (1);
+close_error:
+	free(state);
+close_error1:
+	close(fd);
+error:
+	init_mic(mic);
+	pthread_exit(NULL);
+}
+
+static void
+set_cmdline(struct mic_info *mic)
+{
+	char buffer[PATH_MAX];
+	int len;
+
+	len = snprintf(buffer, PATH_MAX,
+		"clocksource=tsc highres=off nohz=off ");
+	len += snprintf(buffer + len, PATH_MAX - len,
+		"cpufreq_on;corec6_off;pc3_off;pc6_off ");
+	len += snprintf(buffer + len, PATH_MAX - len,
+		"ifcfg=static;address,172.31.%d.1;netmask,255.255.255.0",
+		mic->id);
+
+	setsysfs(mic->name, "cmdline", buffer);
+	mpsslog("%s: Command line: \"%s\"\n", mic->name, buffer);
+	snprintf(buffer, PATH_MAX, "172.31.%d.1", mic->id);
+	mpsslog("%s: IPADDR: \"%s\"\n", mic->name, buffer);
+}
+
+static void
+set_log_buf_info(struct mic_info *mic)
+{
+	int fd;
+	off_t len;
+	char system_map[] = "/lib/firmware/mic/System.map";
+	char *map, *temp, log_buf[17] = {'\0'};
+
+	fd = open(system_map, O_RDONLY);
+	if (fd < 0) {
+		mpsslog("%s: Opening System.map failed: %d\n",
+			mic->name, errno);
+		return;
+	}
+	len = lseek(fd, 0, SEEK_END);
+	if (len < 0) {
+		mpsslog("%s: Reading System.map size failed: %d\n",
+			mic->name, errno);
+		close(fd);
+		return;
+	}
+	map = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
+	if (map == MAP_FAILED) {
+		mpsslog("%s: mmap of System.map failed: %d\n",
+			mic->name, errno);
+		close(fd);
+		return;
+	}
+	temp = strstr(map, "__log_buf");
+	if (!temp) {
+		mpsslog("%s: __log_buf not found: %d\n", mic->name, errno);
+		munmap(map, len);
+		close(fd);
+		return;
+	}
+	strncpy(log_buf, temp - 19, 16);
+	setsysfs(mic->name, "log_buf_addr", log_buf);
+	mpsslog("%s: log_buf_addr: %s\n", mic->name, log_buf);
+	temp = strstr(map, "log_buf_len");
+	if (!temp) {
+		mpsslog("%s: log_buf_len not found: %d\n", mic->name, errno);
+		munmap(map, len);
+		close(fd);
+		return;
+	}
+	strncpy(log_buf, temp - 19, 16);
+	setsysfs(mic->name, "log_buf_len", log_buf);
+	mpsslog("%s: log_buf_len: %s\n", mic->name, log_buf);
+	munmap(map, len);
+	close(fd);
+}
+
+static void init_mic(struct mic_info *mic);
+
+static void
+change_virtblk_backend(int x, siginfo_t *siginfo, void *p)
+{
+	struct mic_info *mic;
+
+	for (mic = mic_list.next; mic != NULL; mic = mic->next)
+		mic->mic_virtblk.signaled = 1/* true */;
+}
+
+static void
+init_mic(struct mic_info *mic)
+{
+	struct sigaction ignore = {
+		.sa_flags = 0,
+		.sa_handler = SIG_IGN
+	};
+	struct sigaction act = {
+		.sa_flags = SA_SIGINFO,
+		.sa_sigaction = change_virtblk_backend,
+	};
+	char buffer[PATH_MAX];
+	int err;
+
+	/* ignore SIGUSR1 for both processes */
+	sigaction(SIGUSR1, &ignore, NULL);
+
+	mic->pid = fork();
+	switch (mic->pid) {
+	case 0:
+		set_log_buf_info(mic);
+		set_cmdline(mic);
+		add_virtio_device(mic, &virtcons_dev_page.dd);
+		add_virtio_device(mic, &virtnet_dev_page.dd);
+		err = pthread_create(&mic->mic_console.console_thread, NULL,
+			virtio_console, mic);
+		if (err)
+			mpsslog("%s virtcons pthread_create failed %s\n",
+			mic->name, strerror(err));
+		/*
+		 * TODO: Debug why not adding this sleep results in the tap
+		 * interface not coming up during certain runs sporadically.
+		 */
+		usleep(1000);
+		err = pthread_create(&mic->mic_net.net_thread, NULL,
+			virtio_net, mic);
+		if (err)
+			mpsslog("%s virtnet pthread_create failed %s\n",
+			mic->name, strerror(err));
+		err = pthread_create(&mic->mic_virtblk.block_thread, NULL,
+			virtio_block, mic);
+		if (err)
+			mpsslog("%s virtblk pthread_create failed %s\n",
+			mic->name, strerror(err));
+		sigemptyset(&act.sa_mask);
+		err = sigaction(SIGUSR1, &act, NULL);
+		if (err)
+			mpsslog("%s sigaction SIGUSR1 failed %s\n",
+			mic->name, strerror(errno));
+		while (1)
+			sleep(60);
+	case -1:
+		mpsslog("fork failed MIC name %s id %d errno %d\n",
+			mic->name, mic->id, errno);
+		break;
+	default:
+		if (mic->restart) {
+			snprintf(buffer, PATH_MAX,
+				"boot:linux:mic/uos.img:mic/mic%d.image",
+				mic->id);
+			setsysfs(mic->name, "state", buffer);
+			mpsslog("%s restarting mic %d\n",
+				mic->name, mic->restart);
+			mic->restart = 0;
+		}
+		pthread_create(&mic->config_thread, NULL, mic_config, mic);
+	}
+}
+
+static void
+start_daemon(void)
+{
+	struct mic_info *mic;
+
+	for (mic = mic_list.next; mic != NULL; mic = mic->next)
+		init_mic(mic);
+
+	while (1)
+		sleep(60);
+}
+
+static int
+init_mic_list(void)
+{
+	struct mic_info *mic = &mic_list;
+	struct dirent *file;
+	DIR *dp;
+	int cnt = 0;
+
+	dp = opendir(MICSYSFSDIR);
+	if (!dp)
+		return 0;
+
+	while ((file = readdir(dp)) != NULL) {
+		if (!strncmp(file->d_name, "mic", 3)) {
+			mic->next = malloc(sizeof(struct mic_info));
+			if (mic->next) {
+				mic = mic->next;
+				mic->next = NULL;
+				memset(mic, 0, sizeof(struct mic_info));
+				mic->id = atoi(&file->d_name[3]);
+				mic->name = malloc(strlen(file->d_name) + 16);
+				if (mic->name)
+					strcpy(mic->name, file->d_name);
+				mpsslog("MIC name %s id %d\n", mic->name,
+					mic->id);
+				cnt++;
+			}
+		}
+	}
+
+	closedir(dp);
+	return cnt;
+}
+
+void
+mpsslog(char *format, ...)
+{
+	va_list args;
+	char buffer[4096];
+	time_t t;
+	char *ts;
+
+	if (logfp == NULL)
+		return;
+
+	va_start(args, format);
+	vsnprintf(buffer, sizeof(buffer), format, args);
+	va_end(args);
+
+	time(&t);
+	ts = ctime(&t);
+	ts[strlen(ts) - 1] = '\0';
+	fprintf(logfp, "%s: %s", ts, buffer);
+
+	fflush(logfp);
+}
+
+int
+main(int argc, char *argv[])
+{
+	int cnt;
+
+	myname = argv[0];
+
+	logfp = fopen(LOGFILE_NAME, "a+");
+	if (!logfp) {
+		fprintf(stderr, "cannot open logfile '%s'\n", LOGFILE_NAME);
+		exit(1);
+	}
+
+	mpsslog("MIC Daemon start\n");
+
+	cnt = init_mic_list();
+	if (cnt == 0) {
+		mpsslog("MIC module not loaded\n");
+		exit(2);
+	}
+	mpsslog("MIC found %d devices\n", cnt);
+
+	start_daemon();
+
+	exit(0);
+}
diff --git a/Documentation/mic/mpssd/mpssd.h b/Documentation/mic/mpssd/mpssd.h
new file mode 100644
index 0000000..b6dee38
--- /dev/null
+++ b/Documentation/mic/mpssd/mpssd.h
@@ -0,0 +1,100 @@
+/*
+ * Intel MIC Platform Software Stack (MPSS)
+ *
+ * Copyright(c) 2013 Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * General Public License for more details.
+ *
+ * The full GNU General Public License is included in this distribution in
+ * the file called "COPYING".
+ *
+ * Intel MIC User Space Tools.
+ */
+#ifndef _MPSSD_H_
+#define _MPSSD_H_
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <fcntl.h>
+#include <unistd.h>
+#include <dirent.h>
+#include <libgen.h>
+#include <pthread.h>
+#include <stdarg.h>
+#include <time.h>
+#include <errno.h>
+#include <sys/dir.h>
+#include <sys/ioctl.h>
+#include <sys/poll.h>
+#include <sys/types.h>
+#include <sys/socket.h>
+#include <sys/stat.h>
+#include <sys/types.h>
+#include <sys/mman.h>
+#include <sys/utsname.h>
+#include <sys/wait.h>
+#include <netinet/in.h>
+#include <arpa/inet.h>
+#include <netdb.h>
+#include <pthread.h>
+#include <signal.h>
+#include <limits.h>
+#include <syslog.h>
+#include <getopt.h>
+#include <net/if.h>
+#include <linux/if_tun.h>
+#include <linux/if_tun.h>
+#include <linux/virtio_ids.h>
+
+#define MICSYSFSDIR "/sys/class/mic"
+#define LOGFILE_NAME "/var/log/mpssd"
+#define PAGE_SIZE 4096
+
+struct mic_console_info {
+	pthread_t       console_thread;
+	int		virtio_console_fd;
+	void		*console_dp;
+};
+
+struct mic_net_info {
+	pthread_t       net_thread;
+	int		virtio_net_fd;
+	int		tap_fd;
+	void		*net_dp;
+};
+
+struct mic_virtblk_info {
+	pthread_t       block_thread;
+	int		virtio_block_fd;
+	void		*block_dp;
+	volatile sig_atomic_t	signaled;
+	char		*backend_file;
+	int		backend;
+	void		*backend_addr;
+	long		backend_size;
+};
+
+struct mic_info {
+	int		id;
+	char		*name;
+	pthread_t       config_thread;
+	pid_t		pid;
+	struct mic_console_info	mic_console;
+	struct mic_net_info	mic_net;
+	struct mic_virtblk_info	mic_virtblk;
+	int		restart;
+	struct mic_info *next;
+};
+
+void mpsslog(char *format, ...);
+char *readsysfs(char *dir, char *entry);
+int setsysfs(char *dir, char *entry, char *value);
+#endif
diff --git a/Documentation/mic/mpssd/sysfs.c b/Documentation/mic/mpssd/sysfs.c
new file mode 100644
index 0000000..3244dcf
--- /dev/null
+++ b/Documentation/mic/mpssd/sysfs.c
@@ -0,0 +1,103 @@
+/*
+ * Intel MIC Platform Software Stack (MPSS)
+ *
+ * Copyright(c) 2013 Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * General Public License for more details.
+ *
+ * The full GNU General Public License is included in this distribution in
+ * the file called "COPYING".
+ *
+ * Intel MIC User Space Tools.
+ */
+
+#include "mpssd.h"
+
+#define PAGE_SIZE 4096
+
+char *
+readsysfs(char *dir, char *entry)
+{
+	char filename[PATH_MAX];
+	char value[PAGE_SIZE];
+	char *string = NULL;
+	int fd;
+	int len;
+
+	if (dir == NULL)
+		snprintf(filename, PATH_MAX, "%s/%s", MICSYSFSDIR, entry);
+	else
+		snprintf(filename, PATH_MAX,
+			"%s/%s/%s", MICSYSFSDIR, dir, entry);
+
+	fd = open(filename, O_RDONLY);
+	if (fd < 0) {
+		mpsslog("Failed to open sysfs entry '%s': %s\n",
+			filename, strerror(errno));
+		return NULL;
+	}
+
+	len = read(fd, value, sizeof(value) - 1);
+	if (len < 0) {
+		mpsslog("Failed to read sysfs entry '%s': %s\n",
+			filename, strerror(errno));
+		goto readsys_ret;
+	}
+
+	value[len] = '\0';
+
+	string = malloc(strlen(value) + 1);
+	if (string)
+		strcpy(string, value);
+
+readsys_ret:
+	close(fd);
+	return string;
+}
+
+int
+setsysfs(char *dir, char *entry, char *value)
+{
+	char filename[PATH_MAX];
+	char oldvalue[PAGE_SIZE] = { 0 };	/* stays NUL-terminated */
+	int fd;
+
+	if (dir == NULL)
+		snprintf(filename, PATH_MAX, "%s/%s", MICSYSFSDIR, entry);
+	else
+		snprintf(filename, PATH_MAX, "%s/%s/%s",
+			MICSYSFSDIR, dir, entry);
+
+	fd = open(filename, O_RDWR);
+	if (fd < 0) {
+		mpsslog("Failed to open sysfs entry '%s': %s\n",
+			filename, strerror(errno));
+		return errno;
+	}
+
+	if (read(fd, oldvalue, sizeof(oldvalue) - 1) < 0) {
+		mpsslog("Failed to read sysfs entry '%s': %s\n",
+			filename, strerror(errno));
+		close(fd);
+		return errno;
+	}
+
+	if (strcmp(value, oldvalue)) {
+		if (write(fd, value, strlen(value)) < 0) {
+			mpsslog("Failed to write new sysfs entry '%s': %s\n",
+				filename, strerror(errno));
+			close(fd);
+			return errno;
+		}
+	}
+
+	close(fd);
+	return 0;
+}
-- 
1.8.2.1
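
Note on set_cmdline() in mpssd.c above: it builds the card's kernel
command line with successive snprintf() calls into one buffer. The
following is a minimal standalone sketch of bounded appending, not code
from the patch. The helper names append_fmt() and build_cmdline() are
hypothetical and exist only for this illustration; the two format
strings are copied from set_cmdline(). The size passed to vsnprintf()
shrinks by the number of bytes already written, so an append cannot run
past the end of the buffer.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper (not part of the patch): append formatted text at
 * offset *len in buf without ever writing past buf[size - 1].
 * Returns 0 on success, -1 on a formatting error or truncation.
 */
static int append_fmt(char *buf, size_t size, size_t *len,
		      const char *fmt, ...)
{
	va_list args;
	int n;

	if (*len >= size)
		return -1;

	va_start(args, fmt);
	n = vsnprintf(buf + *len, size - *len, fmt, args);
	va_end(args);

	if (n < 0)
		return -1;
	if ((size_t)n >= size - *len) {
		/* Truncated: buf is full but still NUL-terminated. */
		*len = size - 1;
		return -1;
	}
	*len += (size_t)n;
	return 0;
}

/* Build a command line shaped like the one in set_cmdline(). */
static int build_cmdline(char *buf, size_t size, int id)
{
	size_t len = 0;

	if (size == 0)
		return -1;
	buf[0] = '\0';

	if (append_fmt(buf, size, &len,
		       "clocksource=tsc highres=off nohz=off "))
		return -1;
	if (append_fmt(buf, size, &len,
		       "ifcfg=static;address,172.31.%d.1;netmask,255.255.255.0",
		       id))
		return -1;
	return 0;
}
```

With a buffer that is too small, build_cmdline() returns -1 and leaves a
NUL-terminated prefix in the buffer, so the caller can decide whether to
log the truncation or abort before writing the value to sysfs.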


* Re: [PATCH v2 7/7] Sample Implementation of Intel MIC User Space Daemon.
  2013-08-08  3:04 ` [PATCH v2 7/7] Sample Implementation of Intel MIC User Space Daemon Sudeep Dutt
@ 2013-08-08  6:40     ` Michael S. Tsirkin
  0 siblings, 0 replies; 18+ messages in thread
From: Michael S. Tsirkin @ 2013-08-08  6:40 UTC (permalink / raw)
  To: Sudeep Dutt
  Cc: Greg Kroah-Hartman, Arnd Bergmann, Rusty Russell, Rob Landley,
	linux-kernel, virtualization, linux-doc, asias, Nikhil Rao,
	Ashutosh Dixit, Caz Yokoyama, Dasaratharaman Chandramouli,
	Harshavardhan R Kharche, Yaozu (Eddie) Dong,
	Peter P Waskiewicz Jr

On Wed, Aug 07, 2013 at 08:04:13PM -0700, Sudeep Dutt wrote:
> From: Caz Yokoyama <Caz.Yokoyama@intel.com>
> 
> This patch introduces a sample user space daemon which
> implements the virtio device backends on the host. The daemon
> creates/removes/configures virtio device backends by communicating with
> the Intel MIC Host Driver. The virtio devices currently supported are
> virtio net, virtio console and virtio block. Virtio net supports TSO/GSO.
> The daemon also monitors card shutdown status and takes appropriate actions
> like killing the virtio backends and resetting the card upon card shutdown
> and crashes.
> 
> Co-author: Ashutosh Dixit <ashutosh.dixit@intel.com>
> Co-author: Sudeep Dutt <sudeep.dutt@intel.com>
> Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
> Signed-off-by: Caz Yokoyama <Caz.Yokoyama@intel.com>
> Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
> Signed-off-by: Nikhil Rao <nikhil.rao@intel.com>
> Signed-off-by: Harshavardhan R Kharche <harshavardhan.r.kharche@intel.com>
> Signed-off-by: Sudeep Dutt <sudeep.dutt@intel.com>
> Acked-by: Yaozu (Eddie) Dong <eddie.dong@intel.com>
> ---
>  Documentation/mic/mic_overview.txt |   48 +
>  Documentation/mic/mpssd/.gitignore |    1 +
>  Documentation/mic/mpssd/Makefile   |   19 +
>  Documentation/mic/mpssd/micctrl    |  152 ++++
>  Documentation/mic/mpssd/mpss       |  245 ++++++
>  Documentation/mic/mpssd/mpssd.c    | 1689 ++++++++++++++++++++++++++++++++++++
>  Documentation/mic/mpssd/mpssd.h    |  100 +++
>  Documentation/mic/mpssd/sysfs.c    |  103 +++

Is this generally useful or just example code?
If the former, you can put it in tools/ as well.

>  8 files changed, 2357 insertions(+)
>  create mode 100644 Documentation/mic/mic_overview.txt
>  create mode 100644 Documentation/mic/mpssd/.gitignore
>  create mode 100644 Documentation/mic/mpssd/Makefile
>  create mode 100755 Documentation/mic/mpssd/micctrl
>  create mode 100755 Documentation/mic/mpssd/mpss
>  create mode 100644 Documentation/mic/mpssd/mpssd.c
>  create mode 100644 Documentation/mic/mpssd/mpssd.h
>  create mode 100644 Documentation/mic/mpssd/sysfs.c
> 
> diff --git a/Documentation/mic/mic_overview.txt b/Documentation/mic/mic_overview.txt
> new file mode 100644
> index 0000000..8b1a916
> --- /dev/null
> +++ b/Documentation/mic/mic_overview.txt
> @@ -0,0 +1,48 @@
> +An Intel MIC X100 device is a PCIe form factor add-in coprocessor
> +card based on the Intel Many Integrated Core (MIC) architecture
> +that runs a Linux OS. It is a PCIe endpoint in a platform and therefore
> +implements the three required standard address spaces i.e. configuration,
> +memory and I/O. The host OS loads a device driver as is typical for
> +PCIe devices. The card itself runs a bootstrap after reset that
> +transfers control to the card OS downloaded from the host driver.
> +The card OS as shipped by Intel is a Linux kernel with modifications
> +for the X100 devices.
> +
> +Since it is a PCIe card, it does not have the ability to host hardware
> +devices for networking, storage and console. We provide these devices
> +on X100 coprocessors thus enabling a self-bootable equivalent environment
> +for applications. A key benefit of our solution is that it leverages
> +the standard virtio framework for network, disk and console devices,
> +though in our case the virtio framework is used across a PCIe bus.
> +
> +Here is a block diagram of the various components described above. The
> +virtio backends are situated on the host rather than the card because the
> +host has better single-threaded performance than MIC and can initiate
> +DMAs to/from the card using the MIC DMA engine.
> +
> +                              |
> +       +----------+           |             +----------+
> +       | Card OS  |           |             | Host OS  |
> +       +----------+           |             +----------+
> +                              |
> ++-------+ +--------+ +------+ | +---------+  +--------+ +--------+
> +| Virtio| |Virtio  | |Virtio| | |Virtio   |  |Virtio  | |Virtio  |
> +| Net   | |Console | |Block | | |Net      |  |Console | |Block   |
> +| Driver| |Driver  | |Driver| | |backend  |  |backend | |backend |
> ++-------+ +--------+ +------+ | +---------+  +--------+ +--------+
> +    |         |         |     |      |            |         |
> +    |         |         |     |Ring 3|            |         |
> +    |         |         |     |------|------------|---------|-------
> +    +-------------------+     |Ring 0+--------------------------+
> +              |               |      | Virtio over PCIe IOCTLs  |
> +              |               |      +--------------------------+
> +      +--------------+        |                   |
> +      |Intel MIC     |        |            +---------------+
> +      |Card Driver   |        |            |Intel MIC      |
> +      +--------------+        |            |Host Driver    |
> +              |               |            +---------------+
> +              |               |                   |
> +     +-------------------------------------------------------------+
> +     |                                                             |
> +     |                    PCIe Bus                                 |
> +     +-------------------------------------------------------------+
> diff --git a/Documentation/mic/mpssd/.gitignore b/Documentation/mic/mpssd/.gitignore
> new file mode 100644
> index 0000000..8b7c72f
> --- /dev/null
> +++ b/Documentation/mic/mpssd/.gitignore
> @@ -0,0 +1 @@
> +mpssd
> diff --git a/Documentation/mic/mpssd/Makefile b/Documentation/mic/mpssd/Makefile
> new file mode 100644
> index 0000000..eb860a7
> --- /dev/null
> +++ b/Documentation/mic/mpssd/Makefile
> @@ -0,0 +1,19 @@
> +#
> +# Makefile - Intel MIC User Space Tools.
> +# Copyright(c) 2013, Intel Corporation.
> +#
> +ifdef DEBUG
> +CFLAGS += $(USERWARNFLAGS) -I. -g -Wall -DDEBUG=$(DEBUG)
> +else
> +CFLAGS += $(USERWARNFLAGS) -I. -g -Wall
> +endif
> +
> +mpssd: mpssd.o sysfs.o
> +	$(CC) $(CFLAGS) -o $@ $^ -lpthread
> +
> +install:
> +	install mpssd /usr/sbin/mpssd
> +	install micctrl /usr/sbin/micctrl
> +
> +clean:
> +	rm -f mpssd *.o
> diff --git a/Documentation/mic/mpssd/micctrl b/Documentation/mic/mpssd/micctrl
> new file mode 100755
> index 0000000..e0cfa53
> --- /dev/null
> +++ b/Documentation/mic/mpssd/micctrl
> @@ -0,0 +1,152 @@
> +#!/bin/bash
> +# Intel MIC Platform Software Stack (MPSS)
> +#
> +# Copyright(c) 2013 Intel Corporation.
> +#
> +# This program is free software; you can redistribute it and/or modify
> +# it under the terms of the GNU General Public License, version 2, as
> +# published by the Free Software Foundation.
> +#
> +# This program is distributed in the hope that it will be useful, but
> +# WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> +# General Public License for more details.
> +#
> +# The full GNU General Public License is included in this distribution in
> +# the file called "COPYING".
> +#
> +# Intel MIC User Space Tools.
> +#
> +# micctrl - Controls MIC boot/start/stop.
> +#
> +# chkconfig: 2345 95 05
> +# description: start MPSS stack processing.
> +#
> +### BEGIN INIT INFO
> +# Provides: micctrl
> +### END INIT INFO
> +
> +# Source function library.
> +. /etc/init.d/functions
> +
> +sysfs="/sys/class/mic"
> +
> +status()
> +{
> +	if [ "`echo $1 | head -c3`" == "mic" ]; then
> +		f=$sysfs/$1
> +		echo -e $1 state: "`cat $f/state`" shutdown_status: "`cat $f/shutdown_status`"
> +		return 0
> +	fi
> +
> +	if [ -d "$sysfs" ]; then
> +		for f in $sysfs/*
> +		do
> +			echo -e ""`basename $f`" state: "`cat $f/state`" shutdown_status: "`cat $f/shutdown_status`""
> +		done
> +	fi
> +
> +	return 0
> +}
> +
> +reset()
> +{
> +	if [ "`echo $1 | head -c3`" == "mic" ]; then
> +		f=$sysfs/$1
> +		echo reset > $f/state
> +		return 0
> +	fi
> +
> +	if [ -d "$sysfs" ]; then
> +		for f in $sysfs/*
> +		do
> +			echo reset > $f/state
> +		done
> +	fi
> +
> +	return 0
> +}
> +
> +boot()
> +{
> +	if [ "`echo $1 | head -c3`" == "mic" ]; then
> +		f=$sysfs/$1
> +		echo "boot:linux:mic/uos.img:mic/$1.image" > $f/state
> +		return 0
> +	fi
> +
> +	if [ -d "$sysfs" ]; then
> +		for f in $sysfs/*
> +		do
> +			echo "boot:linux:mic/uos.img:mic/`basename $f`.image" > $f/state
> +		done
> +	fi
> +
> +	return 0
> +}
> +
> +shutdown()
> +{
> +	if [ "`echo $1 | head -c3`" == "mic" ]; then
> +		f=$sysfs/$1
> +		echo shutdown > $f/state
> +		return 0
> +	fi
> +
> +	if [ -d "$sysfs" ]; then
> +		for f in $sysfs/*
> +		do
> +			echo shutdown > $f/state
> +		done
> +	fi
> +
> +	return 0
> +}
> +
> +wait()
> +{
> +	if [ "`echo $1 | head -c3`" == "mic" ]; then
> +		f=$sysfs/$1
> +		while [ "`cat $f/state`" != "offline" -a "`cat $f/state`" != "online" ]
> +		do
> +			sleep 1
> +			echo -e "Waiting for $1 to go offline or online"
> +		done
> +		return 0
> +	fi
> +
> +	if [ -d "$sysfs" ]; then
> +		# Wait for the cards to go offline or online
> +		for f in $sysfs/*
> +		do
> +			while [ "`cat $f/state`" != "offline" -a "`cat $f/state`" != "online" ]
> +			do
> +				sleep 1
> +				echo -e "Waiting for "`basename $f`" to go offline or online"
> +			done
> +		done
> +	fi
> +}
> +
> +case $1 in
> +	-s)
> +		status $2
> +		;;
> +	-r)
> +		reset $2
> +		;;
> +	-b)
> +		boot $2
> +		;;
> +	-S)
> +		shutdown $2
> +		;;
> +	-w)
> +		wait $2
> +		;;
> +	*)
> +		echo $"Usage: $0 {-s (status) |-r (reset) |-b (boot) |-S (shutdown) |-w (wait)}"
> +		exit 2
> +esac
> +
> +exit $?
> diff --git a/Documentation/mic/mpssd/mpss b/Documentation/mic/mpssd/mpss
> new file mode 100755
> index 0000000..f0bb3dd
> --- /dev/null
> +++ b/Documentation/mic/mpssd/mpss
> @@ -0,0 +1,245 @@
> +#!/bin/bash
> +# Intel MIC Platform Software Stack (MPSS)
> +#
> +# Copyright(c) 2013 Intel Corporation.
> +#
> +# This program is free software; you can redistribute it and/or modify
> +# it under the terms of the GNU General Public License, version 2, as
> +# published by the Free Software Foundation.
> +#
> +# This program is distributed in the hope that it will be useful, but
> +# WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> +# General Public License for more details.
> +#
> +# The full GNU General Public License is included in this distribution in
> +# the file called "COPYING".
> +#
> +# Intel MIC User Space Tools.
> +#
> +# mpss	Start mpssd.
> +#
> +# chkconfig: 2345 95 05
> +# description: start MPSS stack processing.
> +#
> +### BEGIN INIT INFO
> +# Provides: mpss
> +# Required-Start:
> +# Required-Stop:
> +# Short-Description: MPSS stack control
> +# Description: MPSS stack control
> +### END INIT INFO
> +
> +# Source function library.
> +. /etc/init.d/functions
> +
> +exec=/usr/sbin/mpssd
> +sysfs="/sys/class/mic"
> +
> +start()
> +{
> +	[ -x $exec ] || exit 5
> +
> +	echo -e $"Starting MPSS Stack"
> +
> +	echo -e $"Loading MIC_HOST Module"
> +
> +	# Ensure the driver is loaded
> +	[ -d "$sysfs" ] || modprobe mic_host
> +
> +	if [ "`ps -e | awk '{print $4}' | grep mpssd | head -n 1`" = "mpssd" ]; then
> +		echo -e $"MPSSD already running! "
> +		success
> +		echo
> +		return 0;
> +	fi
> +
> +	# Start the daemon
> +	echo -n $"Starting MPSSD"
> +	$exec &
> +	RETVAL=$?
> +	if [ $RETVAL -ne 0 ]; then
> +		failure
> +	else
> +		success
> +	fi
> +	echo
> +
> +	sleep 5
> +
> +	# Boot the cards
> +	if [ $RETVAL -eq 0 ]; then
> +		for f in $sysfs/*
> +		do
> +			echo -ne "Booting "`basename $f`" "
> +			echo "boot:linux:mic/uos.img:mic/`basename $f`.image" > $f/state
> +			RETVAL=$?
> +			if [ $RETVAL -ne 0 ]; then
> +				failure
> +			else
> +				success
> +			fi
> +			echo
> +		done
> +	fi
> +
> +	# Wait till ping works
> +	if [ $RETVAL -eq 0 ]; then
> +		for f in $sysfs/*
> +		do
> +			count=100
> +			ipaddr=`cat $f/cmdline`
> +			ipaddr=${ipaddr#*address,}
> +			ipaddr=`echo $ipaddr | cut -d, -f1 | cut -d\; -f1`
> +
> +			while [ $count -ge 0 ]
> +			do
> +				echo -e "Pinging "`basename $f`" "
> +				ping -c 1 $ipaddr &> /dev/null
> +				RETVAL=$?
> +				if [ $RETVAL -eq 0 ]; then
> +					success
> +					break
> +				fi
> +				sleep 1
> +				count=`expr $count - 1`
> +			done
> +			if [ $RETVAL -ne 0 ]; then
> +				failure
> +			else
> +				success
> +			fi
> +			echo
> +		done
> +	fi
> +	return $RETVAL
> +}
> +
> +stop()
> +{
> +	echo -e $"Shutting down MPSS Stack: "
> +
> +	# Bail out if module is unloaded
> +	if [ ! -d "$sysfs" ]; then
> +		echo -n $"Module unloaded "
> +		killall -9 mpssd 2>/dev/null
> +		success
> +		echo
> +		return 0
> +	fi
> +
> +	# Shut down the cards
> +	for f in $sysfs/*
> +	do
> +		echo -e "Shutting down `basename $f` "
> +		echo "shutdown" > $f/state 2>/dev/null
> +	done
> +
> +	# Wait for the cards to go offline
> +	for f in $sysfs/*
> +	do
> +		while [ "`cat $f/state`" != "offline" ]
> +		do
> +			sleep 1
> +			echo -e "Waiting for "`basename $f`" to go offline"
> +		done
> +	done
> +
> +	# Display the status of the cards
> +	for f in $sysfs/*
> +	do
> +		echo -e ""`basename $f`" state: "`cat $f/state`""
> +	done
> +
> +	sleep 5
> +
> +	# Kill MPSSD now
> +	echo -n $"Killing MPSSD"
> +	killall -9 mpssd 2>/dev/null
> +	RETVAL=$?
> +	if [ $RETVAL -ne 0 ]; then
> +		failure
> +	else
> +		success
> +	fi
> +	echo
> +	return $RETVAL
> +}
> +
> +restart()
> +{
> +	stop
> +	sleep 5
> +	start
> +}
> +
> +status()
> +{
> +	if [ -d "$sysfs" ]; then
> +		for f in $sysfs/*
> +		do
> +			echo -e ""`basename $f`" state: "`cat $f/state`""
> +		done
> +	fi
> +
> +	if [ "`ps -e | awk '{print $4}' | grep mpssd | head -n 1`" = "mpssd" ]; then
> +		echo "mpssd is running"
> +	else
> +		echo "mpssd is stopped"
> +	fi
> +	return 0
> +}
> +
> +unload()
> +{
> +	if [ ! -d "$sysfs" ]; then
> +		echo -n $"No MIC_HOST Module: "
> +		killall -9 mpssd 2>/dev/null
> +		success
> +		echo
> +		return
> +	fi
> +
> +	stop
> +	RETVAL=$?
> +
> +	sleep 5
> +	echo -n $"Removing MIC_HOST Module: "
> +
> +	if [ $RETVAL = 0 ]; then
> +		sleep 1
> +		modprobe -r mic_host
> +		RETVAL=$?
> +	fi
> +
> +	if [ $RETVAL -ne 0 ]; then
> +		failure
> +	else
> +		success
> +	fi
> +	echo
> +	return $RETVAL
> +}
> +
> +case $1 in
> +	start)
> +		start
> +		;;
> +	stop)
> +		stop
> +		;;
> +	restart)
> +		restart
> +		;;
> +	status)
> +		status
> +		;;
> +	unload)
> +		unload
> +		;;
> +	*)
> +		echo $"Usage: $0 {start|stop|restart|status|unload}"
> +		exit 2
> +esac
> +
> +exit $?
> diff --git a/Documentation/mic/mpssd/mpssd.c b/Documentation/mic/mpssd/mpssd.c
> new file mode 100644
> index 0000000..3bc34cb
> --- /dev/null
> +++ b/Documentation/mic/mpssd/mpssd.c
> @@ -0,0 +1,1689 @@
> +/*
> + * Intel MIC Platform Software Stack (MPSS)
> + *
> + * Copyright(c) 2013 Intel Corporation.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License, version 2, as
> + * published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful, but
> + * WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> + * General Public License for more details.
> + *
> + * The full GNU General Public License is included in this distribution in
> + * the file called "COPYING".
> + *
> + * Intel MIC User Space Tools.
> + */
> +
> +#define _GNU_SOURCE
> +
> +#include <stdlib.h>
> +#include <fcntl.h>
> +#include <getopt.h>
> +#include <assert.h>
> +#include <unistd.h>
> +#include <stdbool.h>
> +#include <signal.h>
> +#include <poll.h>
> +#include <features.h>
> +#include <sys/types.h>
> +#include <sys/stat.h>
> +#include <sys/mman.h>
> +#include <sys/socket.h>
> +#include <linux/virtio_ring.h>
> +#include <linux/virtio_net.h>
> +#include <linux/virtio_console.h>
> +#include <linux/virtio_blk.h>
> +#include <linux/version.h>
> +#include "mpssd.h"
> +#include <linux/mic_ioctl.h>
> +#include <linux/mic_common.h>
> +
> +static void init_mic(struct mic_info *mic);
> +
> +static FILE *logfp;
> +static struct mic_info mic_list;
> +
> +#define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]))
> +
> +#define min_t(type, x, y) ({				\
> +		type __min1 = (x);                      \
> +		type __min2 = (y);                      \
> +		__min1 < __min2 ? __min1 : __min2; })
> +
> +/* align addr on a size boundary - adjust address up/down if needed */
> +#define _ALIGN_UP(addr, size)    (((addr)+((size)-1))&(~((size)-1)))
> +#define _ALIGN_DOWN(addr, size)  ((addr)&(~((size)-1)))
> +
> +/* align addr on a size boundary - adjust address up if needed */
> +#define _ALIGN(addr, size)     _ALIGN_UP(addr, size)
> +
> +/* to align the pointer to the (next) page boundary */
> +#define PAGE_ALIGN(addr)        _ALIGN(addr, PAGE_SIZE)
> +
> +#define ACCESS_ONCE(x) (*(volatile typeof(x) *)&(x))
> +
> +/* Insert REP NOP (PAUSE) in busy-wait loops. */
> +static inline void cpu_relax(void)
> +{
> +	asm volatile("rep; nop" : : : "memory");
> +}
> +
> +#define GSO_ENABLED		1
> +#define MAX_GSO_SIZE		(64 * 1024)
> +#define ETH_H_LEN		14
> +#define MAX_NET_PKT_SIZE	(_ALIGN_UP(MAX_GSO_SIZE + ETH_H_LEN, 64))
> +#define MIC_DEVICE_PAGE_END	0x1000
> +
> +#ifndef VIRTIO_NET_HDR_F_DATA_VALID
> +#define VIRTIO_NET_HDR_F_DATA_VALID	2	/* Csum is valid */
> +#endif
> +
> +static struct {
> +	struct mic_device_desc dd;
> +	struct mic_vqconfig vqconfig[2];
> +	__u32 host_features, guest_acknowledgements;
> +	struct virtio_console_config cons_config;
> +} virtcons_dev_page = {
> +	.dd = {
> +		.type = VIRTIO_ID_CONSOLE,
> +		.num_vq = ARRAY_SIZE(virtcons_dev_page.vqconfig),
> +		.feature_len = sizeof(virtcons_dev_page.host_features),
> +		.config_len = sizeof(virtcons_dev_page.cons_config),
> +	},
> +	.vqconfig[0] = {
> +		.num = htole16(MIC_VRING_ENTRIES),
> +	},
> +	.vqconfig[1] = {
> +		.num = htole16(MIC_VRING_ENTRIES),
> +	},
> +};
> +
> +static struct {
> +	struct mic_device_desc dd;
> +	struct mic_vqconfig vqconfig[2];
> +	__u32 host_features, guest_acknowledgements;
> +	struct virtio_net_config net_config;
> +} virtnet_dev_page = {
> +	.dd = {
> +		.type = VIRTIO_ID_NET,
> +		.num_vq = ARRAY_SIZE(virtnet_dev_page.vqconfig),
> +		.feature_len = sizeof(virtnet_dev_page.host_features),
> +		.config_len = sizeof(virtnet_dev_page.net_config),
> +	},
> +	.vqconfig[0] = {
> +		.num = htole16(MIC_VRING_ENTRIES),
> +	},
> +	.vqconfig[1] = {
> +		.num = htole16(MIC_VRING_ENTRIES),
> +	},
> +#if GSO_ENABLED
> +		.host_features = htole32(
> +		1 << VIRTIO_NET_F_CSUM |
> +		1 << VIRTIO_NET_F_GSO |
> +		1 << VIRTIO_NET_F_GUEST_TSO4 |
> +		1 << VIRTIO_NET_F_GUEST_TSO6 |
> +		1 << VIRTIO_NET_F_GUEST_ECN |
> +		1 << VIRTIO_NET_F_GUEST_UFO),
> +#else
> +		.host_features = 0,
> +#endif
> +};
> +
> +static const char *mic_config_dir = "/etc/sysconfig/mic";
> +static const char *virtblk_backend = "VIRTBLK_BACKEND";
> +static struct {
> +	struct mic_device_desc dd;
> +	struct mic_vqconfig vqconfig[1];
> +	__u32 host_features, guest_acknowledgements;
> +	struct virtio_blk_config blk_config;
> +} virtblk_dev_page = {
> +	.dd = {
> +		.type = VIRTIO_ID_BLOCK,
> +		.num_vq = ARRAY_SIZE(virtblk_dev_page.vqconfig),
> +		.feature_len = sizeof(virtblk_dev_page.host_features),
> +		.config_len = sizeof(virtblk_dev_page.blk_config),
> +	},
> +	.vqconfig[0] = {
> +		.num = htole16(MIC_VRING_ENTRIES),
> +	},
> +	.host_features =
> +		htole32(1<<VIRTIO_BLK_F_SEG_MAX),
> +	.blk_config = {
> +		.seg_max = htole32(MIC_VRING_ENTRIES - 2),
> +		.capacity = htole64(0),
> +	 }
> +};
> +
> +static char *myname;
> +
> +static int
> +tap_configure(struct mic_info *mic, char *dev)
> +{
> +	pid_t pid;
> +	char *ifargv[7];
> +	char ipaddr[sizeof("172.31.255.254/24")];
> +	int ret = 0;
> +
> +	pid = fork();
> +	if (pid == 0) {
> +		ifargv[0] = "ip";
> +		ifargv[1] = "link";
> +		ifargv[2] = "set";
> +		ifargv[3] = dev;
> +		ifargv[4] = "up";
> +		ifargv[5] = NULL;
> +		mpsslog("Configuring %s\n", dev);
> +		ret = execvp("ip", ifargv);
> +		if (ret < 0) {
> +			mpsslog("%s execvp failed errno %s\n",
> +				mic->name, strerror(errno));
> +			return ret;
> +		}
> +	}
> +	if (pid < 0) {
> +		mpsslog("%s fork failed errno %s\n",
> +			mic->name, strerror(errno));
> +		return pid;
> +	}
> +
> +	ret = waitpid(pid, NULL, 0);
> +	if (ret < 0) {
> +		mpsslog("%s waitpid failed errno %s\n",
> +			mic->name, strerror(errno));
> +		return ret;
> +	}
> +
> +	snprintf(ipaddr, sizeof(ipaddr), "172.31.%d.254/24", mic->id);
> +
> +	pid = fork();
> +	if (pid == 0) {
> +		ifargv[0] = "ip";
> +		ifargv[1] = "addr";
> +		ifargv[2] = "add";
> +		ifargv[3] = ipaddr;
> +		ifargv[4] = "dev";
> +		ifargv[5] = dev;
> +		ifargv[6] = NULL;
> +		mpsslog("Configuring %s ipaddr %s\n", dev, ipaddr);
> +		ret = execvp("ip", ifargv);
> +		if (ret < 0) {
> +			mpsslog("%s execvp failed errno %s\n",
> +				mic->name, strerror(errno));
> +			return ret;
> +		}
> +	}
> +	if (pid < 0) {
> +		mpsslog("%s fork failed errno %s\n",
> +			mic->name, strerror(errno));
> +		return pid;
> +	}
> +
> +	ret = waitpid(pid, NULL, 0);
> +	if (ret < 0) {
> +		mpsslog("%s waitpid failed errno %s\n",
> +			mic->name, strerror(errno));
> +		return ret;
> +	}
> +	mpsslog("MIC name %s %s %d DONE!\n",
> +		mic->name, __func__, __LINE__);
> +	return 0;
> +}
> +
> +static int tun_alloc(struct mic_info *mic, char *dev)
> +{
> +	struct ifreq ifr;
> +	int fd, err;
> +#if GSO_ENABLED
> +	unsigned offload;
> +#endif
> +	fd = open("/dev/net/tun", O_RDWR);
> +	if (fd < 0) {
> +		mpsslog("Could not open /dev/net/tun %s\n", strerror(errno));
> +		goto done;
> +	}
> +
> +	memset(&ifr, 0, sizeof(ifr));
> +
> +	ifr.ifr_flags = IFF_TAP | IFF_NO_PI | IFF_VNET_HDR;
> +	if (*dev)
> +		strncpy(ifr.ifr_name, dev, IFNAMSIZ - 1);
> +
> +	err = ioctl(fd, TUNSETIFF, (void *) &ifr);
> +	if (err < 0) {
> +		mpsslog("%s %s %d TUNSETIFF failed %s\n",
> +			mic->name, __func__, __LINE__, strerror(errno));
> +		close(fd);
> +		return err;
> +	}
> +#if GSO_ENABLED
> +	offload = TUN_F_CSUM | TUN_F_TSO4 | TUN_F_TSO6 |
> +		TUN_F_TSO_ECN | TUN_F_UFO;
> +
> +	err = ioctl(fd, TUNSETOFFLOAD, offload);
> +	if (err < 0) {
> +		mpsslog("%s %s %d TUNSETOFFLOAD failed %s\n",
> +			mic->name, __func__, __LINE__, strerror(errno));
> +		close(fd);
> +		return err;
> +	}
> +#endif
> +	strcpy(dev, ifr.ifr_name);
> +	mpsslog("Created TAP %s\n", dev);
> +done:
> +	return fd;
> +}
> +
> +#define NET_FD_VIRTIO_NET 0
> +#define NET_FD_TUN 1
> +#define MAX_NET_FD 2
> +
> +static void * *
> +get_dp(struct mic_info *mic, int type)
> +{
> +	switch (type) {
> +	case VIRTIO_ID_CONSOLE:
> +		return &mic->mic_console.console_dp;
> +	case VIRTIO_ID_NET:
> +		return &mic->mic_net.net_dp;
> +	case VIRTIO_ID_BLOCK:
> +		return &mic->mic_virtblk.block_dp;
> +	}
> +	mpsslog("%s %s %d not found\n", mic->name, __func__, type);
> +	assert(0);
> +	return NULL;
> +}
> +
> +static struct mic_device_desc *get_device_desc(struct mic_info *mic, int type)
> +{
> +	struct mic_device_desc *d;
> +	int i;
> +	void *dp = *get_dp(mic, type);
> +
> +	for (i = mic_aligned_size(struct mic_bootparam); i < PAGE_SIZE;
> +		i += mic_total_desc_size(d)) {
> +		d = dp + i;
> +
> +		/* End of list */
> +		if (d->type == 0)
> +			break;
> +
> +		if (d->type == -1)
> +			continue;
> +
> +		mpsslog("%s %s d-> type %d d %p\n",
> +			mic->name, __func__, d->type, d);
> +
> +		if (d->type == (__u8)type)
> +			return d;
> +	}
> +	mpsslog("%s %s %d not found\n", mic->name, __func__, type);
> +	assert(0);
> +	return NULL;
> +}
> +
> +/* See comments in vhost.c for explanation of next_desc() */
> +static unsigned next_desc(struct vring_desc *desc)
> +{
> +	unsigned int next;
> +
> +	if (!(le16toh(desc->flags) & VRING_DESC_F_NEXT))
> +		return -1U;
> +	next = le16toh(desc->next);
> +	return next;
> +}
> +
> +/* Sum up the lengths of all iovec entries */
> +static ssize_t
> +sum_iovec_len(struct mic_copy_desc *copy)
> +{
> +	ssize_t sum = 0;
> +	int i;
> +
> +	for (i = 0; i < copy->iovcnt; i++)
> +		sum += copy->iov[i].iov_len;
> +	return sum;
> +}
> +
> +static inline void verify_out_len(struct mic_info *mic,
> +	struct mic_copy_desc *copy)
> +{
> +	if (copy->out_len != sum_iovec_len(copy)) {
> +		mpsslog("%s %s %d BUG copy->out_len 0x%x len 0x%zx\n",
> +				mic->name, __func__, __LINE__,
> +				copy->out_len, sum_iovec_len(copy));
> +		assert(copy->out_len == sum_iovec_len(copy));
> +	}
> +}
> +
> +/* Display an iovec */
> +static void
> +disp_iovec(struct mic_info *mic, struct mic_copy_desc *copy,
> +	const char *s, int line)
> +{
> +	int i;
> +
> +	for (i = 0; i < copy->iovcnt; i++)
> +		mpsslog("%s %s %d copy->iov[%d] addr %p len 0x%lx\n",
> +			mic->name, s, line, i,
> +			copy->iov[i].iov_base, copy->iov[i].iov_len);
> +}
> +
> +static inline __u16 read_avail_idx(struct mic_vring *vr)
> +{
> +	return ACCESS_ONCE(vr->info->avail_idx);
> +}
> +
> +static inline void txrx_prepare(int type, bool tx, struct mic_vring *vr,
> +				struct mic_copy_desc *copy, ssize_t len)
> +{
> +	copy->vr_idx = tx ? 0 : 1;
> +	copy->update_used = true;
> +	if (type == VIRTIO_ID_NET)
> +		copy->iov[1].iov_len = len - sizeof(struct virtio_net_hdr);
> +	else
> +		copy->iov[0].iov_len = len;
> +}
> +
> +/* Central API which triggers the copies */
> +static int
> +mic_virtio_copy(struct mic_info *mic, int fd,
> +	struct mic_vring *vr, struct mic_copy_desc *copy)
> +{
> +	int ret;
> +
> +	ret = ioctl(fd, MIC_VIRTIO_COPY_DESC, copy);
> +	if (ret) {
> +		mpsslog("%s %s %d errno %s ret %d\n",
> +			mic->name, __func__, __LINE__,
> +			strerror(errno), ret);
> +	}
> +	return ret;
> +}
> +
> +/*
> + * This initialization routine requires at least one
> + * vring i.e. vr0. vr1 is optional.
> + */
> +static void *
> +init_vr(struct mic_info *mic, int fd, int type,
> +	struct mic_vring *vr0, struct mic_vring *vr1, int num_vq)
> +{
> +	int vr_size;
> +	char *va;
> +
> +	vr_size = PAGE_ALIGN(vring_size(MIC_VRING_ENTRIES,
> +		MIC_VIRTIO_RING_ALIGN) + sizeof(struct _mic_vring_info));
> +	va = mmap(NULL, MIC_DEVICE_PAGE_END + vr_size * num_vq,
> +		PROT_READ, MAP_SHARED, fd, 0);
> +	if (MAP_FAILED == va) {
> +		mpsslog("%s %s %d mmap failed errno %s\n",
> +			mic->name, __func__, __LINE__,
> +			strerror(errno));
> +		goto done;
> +	}
> +	*get_dp(mic, type) = (void *)va;
> +	vr0->va = (struct mic_vring *)&va[MIC_DEVICE_PAGE_END];
> +	vr0->info = vr0->va +
> +		vring_size(MIC_VRING_ENTRIES, MIC_VIRTIO_RING_ALIGN);
> +	vring_init(&vr0->vr,
> +		MIC_VRING_ENTRIES, vr0->va, MIC_VIRTIO_RING_ALIGN);
> +	mpsslog("%s %s vr0 %p vr0->info %p vr_size 0x%x vring 0x%x ",
> +		__func__, mic->name, vr0->va, vr0->info, vr_size,
> +		vring_size(MIC_VRING_ENTRIES, MIC_VIRTIO_RING_ALIGN));
> +	mpsslog("magic 0x%x expected 0x%x\n",
> +		vr0->info->magic, MIC_MAGIC + type + 0);
> +	assert(vr0->info->magic == MIC_MAGIC + type + 0);
> +	if (vr1) {
> +		vr1->va = (struct mic_vring *)
> +			&va[MIC_DEVICE_PAGE_END + vr_size];
> +		vr1->info = vr1->va + vring_size(MIC_VRING_ENTRIES,
> +			MIC_VIRTIO_RING_ALIGN);
> +		vring_init(&vr1->vr,
> +			MIC_VRING_ENTRIES, vr1->va, MIC_VIRTIO_RING_ALIGN);
> +		mpsslog("%s %s vr1 %p vr1->info %p vr_size 0x%x vring 0x%x ",
> +			__func__, mic->name, vr1->va, vr1->info, vr_size,
> +			vring_size(MIC_VRING_ENTRIES, MIC_VIRTIO_RING_ALIGN));
> +		mpsslog("magic 0x%x expected 0x%x\n",
> +			vr1->info->magic, MIC_MAGIC + type + 1);
> +		assert(vr1->info->magic == MIC_MAGIC + type + 1);
> +	}
> +done:
> +	return va;
> +}
> +
> +static void
> +uninit_vr(struct mic_info *mic, int num_vq)
> +{
> +	int vr_size, ret;
> +
> +	vr_size = PAGE_ALIGN(vring_size(MIC_VRING_ENTRIES,
> +		MIC_VIRTIO_RING_ALIGN) + sizeof(struct _mic_vring_info));
> +	ret = munmap(mic->mic_virtblk.block_dp,
> +		MIC_DEVICE_PAGE_END + vr_size * num_vq);
> +	if (ret < 0)
> +		mpsslog("%s munmap errno %d\n", mic->name, errno);
> +}
> +
> +static void
> +wait_for_card_driver(struct mic_info *mic, int fd, int type)
> +{
> +	struct pollfd pollfd;
> +	int err;
> +	struct mic_device_desc *desc = get_device_desc(mic, type);
> +
> +	pollfd.fd = fd;
> +	mpsslog("%s %s Waiting .... desc-> type %d status 0x%x\n",
> +		mic->name, __func__, type, desc->status);
> +	while (1) {
> +		pollfd.events = POLLIN;
> +		pollfd.revents = 0;
> +		err = poll(&pollfd, 1, -1);
> +		if (err < 0) {
> +			mpsslog("%s %s poll failed %s\n",
> +				mic->name, __func__, strerror(errno));
> +			continue;
> +		}
> +
> +		if (pollfd.revents) {
> +			mpsslog("%s %s Waiting... desc-> type %d status 0x%x\n",
> +				mic->name, __func__, type, desc->status);
> +			if (desc->status & VIRTIO_CONFIG_S_DRIVER_OK) {
> +				mpsslog("%s %s poll.revents %d\n",
> +					mic->name, __func__, pollfd.revents);
> +				mpsslog("%s %s desc-> type %d status 0x%x\n",
> +					mic->name, __func__, type,
> +					desc->status);
> +				break;
> +			}
> +		}
> +	}
> +}
> +
> +/* Spin till we have some descriptors */
> +static void
> +wait_for_descriptors(struct mic_info *mic, struct mic_vring *vr)
> +{
> +	__u16 avail_idx = read_avail_idx(vr);
> +
> +	while (avail_idx == le16toh(ACCESS_ONCE(vr->vr.avail->idx))) {
> +#ifdef DEBUG
> +		mpsslog("%s %s waiting for desc avail %d info_avail %d\n",
> +			mic->name, __func__,
> +			le16toh(vr->vr.avail->idx), vr->info->avail_idx);
> +#endif
> +		cpu_relax();
> +	}
> +}
> +
> +static void *
> +virtio_net(void *arg)
> +{
> +	static __u8 vnet_hdr[2][sizeof(struct virtio_net_hdr)];
> +	static __u8 vnet_buf[2][MAX_NET_PKT_SIZE] __aligned(64);
> +	struct iovec vnet_iov[2][2] = {
> +		{ { .iov_base = vnet_hdr[0], .iov_len = sizeof(vnet_hdr[0]) },
> +		  { .iov_base = vnet_buf[0], .iov_len = sizeof(vnet_buf[0]) } },
> +		{ { .iov_base = vnet_hdr[1], .iov_len = sizeof(vnet_hdr[1]) },
> +		  { .iov_base = vnet_buf[1], .iov_len = sizeof(vnet_buf[1]) } },
> +	};
> +	struct iovec *iov0 = vnet_iov[0], *iov1 = vnet_iov[1];
> +	struct mic_info *mic = (struct mic_info *)arg;
> +	char if_name[IFNAMSIZ];
> +	struct pollfd net_poll[MAX_NET_FD];
> +	struct mic_vring tx_vr, rx_vr;
> +	struct mic_copy_desc copy;
> +	struct mic_device_desc *desc;
> +	int err;
> +
> +	snprintf(if_name, IFNAMSIZ, "mic%d", mic->id);
> +	mic->mic_net.tap_fd = tun_alloc(mic, if_name);
> +	if (mic->mic_net.tap_fd < 0)
> +		goto done;
> +
> +	if (tap_configure(mic, if_name))
> +		goto done;
> +	mpsslog("MIC name %s id %d\n", mic->name, mic->id);
> +
> +	net_poll[NET_FD_VIRTIO_NET].fd = mic->mic_net.virtio_net_fd;
> +	net_poll[NET_FD_VIRTIO_NET].events = POLLIN;
> +	net_poll[NET_FD_TUN].fd = mic->mic_net.tap_fd;
> +	net_poll[NET_FD_TUN].events = POLLIN;
> +
> +	if (MAP_FAILED == init_vr(mic, mic->mic_net.virtio_net_fd,
> +		VIRTIO_ID_NET, &tx_vr, &rx_vr,
> +		virtnet_dev_page.dd.num_vq)) {
> +		mpsslog("%s init_vr failed %s\n",
> +			mic->name, strerror(errno));
> +		goto done;
> +	}
> +
> +	copy.iovcnt = 2;
> +	desc = get_device_desc(mic, VIRTIO_ID_NET);
> +
> +	while (1) {
> +		ssize_t len;
> +
> +		net_poll[NET_FD_VIRTIO_NET].revents = 0;
> +		net_poll[NET_FD_TUN].revents = 0;
> +
> +		/* Start polling for data from tap and virtio net */
> +		err = poll(net_poll, 2, -1);
> +		if (err < 0) {
> +			mpsslog("%s poll failed %s\n",
> +				__func__, strerror(errno));
> +			continue;
> +		}
> +		if (!(desc->status & VIRTIO_CONFIG_S_DRIVER_OK))
> +			wait_for_card_driver(mic, mic->mic_net.virtio_net_fd,
> +					VIRTIO_ID_NET);
> +		/*
> +		 * Check if there is data to be read from TUN and write to
> +		 * virtio net fd if there is.
> +		 */
> +		if (net_poll[NET_FD_TUN].revents & POLLIN) {
> +			copy.iov = iov0;
> +			len = readv(net_poll[NET_FD_TUN].fd,
> +				copy.iov, copy.iovcnt);
> +			if (len > 0) {
> +				struct virtio_net_hdr *hdr
> +					= (struct virtio_net_hdr *) vnet_hdr[0];
> +
> +				/* Disable checksums on the card since we are on
> +				   a reliable PCIe link */
> +				hdr->flags |= VIRTIO_NET_HDR_F_DATA_VALID;
> +#ifdef DEBUG
> +				mpsslog("%s %s %d hdr->flags 0x%x ", mic->name,
> +					__func__, __LINE__, hdr->flags);
> +				mpsslog("copy.out_len %d hdr->gso_type 0x%x\n",
> +					copy.out_len, hdr->gso_type);
> +#endif
> +#ifdef DEBUG
> +				disp_iovec(mic, &copy, __func__, __LINE__);
> +				mpsslog("%s %s %d read from tap 0x%lx\n",
> +					mic->name, __func__, __LINE__,
> +					len);
> +#endif
> +				wait_for_descriptors(mic, &tx_vr);
> +				txrx_prepare(VIRTIO_ID_NET, 1, &tx_vr, &copy,
> +					len);
> +
> +				err = mic_virtio_copy(mic,
> +					mic->mic_net.virtio_net_fd, &tx_vr,
> +					&copy);
> +				if (err < 0) {
> +					mpsslog("%s %s %d mic_virtio_copy %s\n",
> +						mic->name, __func__, __LINE__,
> +						strerror(errno));
> +				}
> +				if (!err)
> +					verify_out_len(mic, &copy);
> +#ifdef DEBUG
> +				disp_iovec(mic, &copy, __func__, __LINE__);
> +				mpsslog("%s %s %d wrote to net 0x%lx\n",
> +					mic->name, __func__, __LINE__,
> +					sum_iovec_len(&copy));
> +#endif
> +				/* Reinitialize IOV for next run */
> +				iov0[1].iov_len = MAX_NET_PKT_SIZE;
> +			} else if (len < 0) {
> +				disp_iovec(mic, &copy, __func__, __LINE__);
> +				mpsslog("%s %s %d read failed %s ", mic->name,
> +					__func__, __LINE__, strerror(errno));
> +				mpsslog("cnt %d sum %zd\n",
> +					copy.iovcnt, sum_iovec_len(&copy));
> +			}
> +		}
> +
> +		/*
> +		 * Check if there is data to be read from virtio net and
> +		 * write to TUN if there is.
> +		 */
> +		if (net_poll[NET_FD_VIRTIO_NET].revents & POLLIN) {
> +			while (rx_vr.info->avail_idx !=
> +				le16toh(rx_vr.vr.avail->idx)) {
> +				copy.iov = iov1;
> +				txrx_prepare(VIRTIO_ID_NET, 0, &rx_vr, &copy,
> +					MAX_NET_PKT_SIZE
> +					+ sizeof(struct virtio_net_hdr));
> +
> +				err = mic_virtio_copy(mic,
> +					mic->mic_net.virtio_net_fd, &rx_vr,
> +					&copy);
> +				if (!err) {
> +#ifdef DEBUG
> +					struct virtio_net_hdr *hdr
> +						= (struct virtio_net_hdr *)
> +							vnet_hdr[1];
> +
> +					mpsslog("%s %s %d hdr->flags 0x%x, ",
> +						mic->name, __func__, __LINE__,
> +						hdr->flags);
> +					mpsslog("out_len %d gso_type 0x%x\n",
> +						copy.out_len,
> +						hdr->gso_type);
> +#endif
> +					/* Set the correct output iov_len */
> +					iov1[1].iov_len = copy.out_len -
> +						sizeof(struct virtio_net_hdr);
> +					verify_out_len(mic, &copy);
> +#ifdef DEBUG
> +					disp_iovec(mic, copy, __func__,
> +						__LINE__);
> +					mpsslog("%s %s %d ",
> +						mic->name, __func__, __LINE__);
> +					mpsslog("read from net 0x%lx\n",
> +						sum_iovec_len(copy));
> +#endif
> +					len = writev(net_poll[NET_FD_TUN].fd,
> +						copy.iov, copy.iovcnt);
> +					if (len != sum_iovec_len(&copy)) {
> +						mpsslog("Tun write failed %s ",
> +							strerror(errno));
> +						mpsslog("len 0x%x ", len);
> +						mpsslog("read_len 0x%x\n",
> +							sum_iovec_len(&copy));
> +					} else {
> +#ifdef DEBUG
> +						disp_iovec(mic, &copy, __func__,
> +							__LINE__);
> +						mpsslog("%s %s %d ",
> +							mic->name, __func__,
> +							__LINE__);
> +						mpsslog("wrote to tap 0x%lx\n",
> +							len);
> +#endif
> +					}
> +				} else {
> +					mpsslog("%s %s %d mic_virtio_copy %s\n",
> +						mic->name, __func__, __LINE__,
> +						strerror(errno));
> +					break;
> +				}
> +			}
> +		}
> +		if (net_poll[NET_FD_VIRTIO_NET].revents & POLLERR) {
> +			mpsslog("%s: %s: POLLERR\n", __func__, mic->name);
> +			sleep(1);
> +		}
> +	}
> +done:
> +	pthread_exit(NULL);
> +}
> +
> +/* virtio_console */
> +#define VIRTIO_CONSOLE_FD 0
> +#define MONITOR_FD (VIRTIO_CONSOLE_FD + 1)
> +#define MAX_CONSOLE_FD (MONITOR_FD + 1)  /* must be the last one + 1 */
> +#define MAX_BUFFER_SIZE PAGE_SIZE
> +
> +static void *
> +virtio_console(void *arg)
> +{
> +	static __u8 vcons_buf[2][PAGE_SIZE];
> +	struct iovec vcons_iov[2] = {
> +		{ .iov_base = vcons_buf[0], .iov_len = sizeof(vcons_buf[0]) },
> +		{ .iov_base = vcons_buf[1], .iov_len = sizeof(vcons_buf[1]) },
> +	};
> +	struct iovec *iov0 = &vcons_iov[0], *iov1 = &vcons_iov[1];
> +	struct mic_info *mic = (struct mic_info *)arg;
> +	int err;
> +	struct pollfd console_poll[MAX_CONSOLE_FD];
> +	int pty_fd;
> +	char *pts_name;
> +	ssize_t len;
> +	struct mic_vring tx_vr, rx_vr;
> +	struct mic_copy_desc copy;
> +	struct mic_device_desc *desc;
> +
> +	pty_fd = posix_openpt(O_RDWR);
> +	if (pty_fd < 0) {
> +		mpsslog("can't open a pseudoterminal master device: %s\n",
> +			strerror(errno));
> +		goto _return;
> +	}
> +	pts_name = ptsname(pty_fd);
> +	if (pts_name == NULL) {
> +		mpsslog("can't get pts name\n");
> +		goto _close_pty;
> +	}
> +	printf("%s console message goes to %s\n", mic->name, pts_name);
> +	mpsslog("%s console message goes to %s\n", mic->name, pts_name);
> +	err = grantpt(pty_fd);
> +	if (err < 0) {
> +		mpsslog("can't grant access: %s %s\n",
> +				pts_name, strerror(errno));
> +		goto _close_pty;
> +	}
> +	err = unlockpt(pty_fd);
> +	if (err < 0) {
> +		mpsslog("can't unlock a pseudoterminal: %s %s\n",
> +				pts_name, strerror(errno));
> +		goto _close_pty;
> +	}
> +	console_poll[MONITOR_FD].fd = pty_fd;
> +	console_poll[MONITOR_FD].events = POLLIN;
> +
> +	console_poll[VIRTIO_CONSOLE_FD].fd = mic->mic_console.virtio_console_fd;
> +	console_poll[VIRTIO_CONSOLE_FD].events = POLLIN;
> +
> +	if (MAP_FAILED == init_vr(mic, mic->mic_console.virtio_console_fd,
> +		VIRTIO_ID_CONSOLE, &tx_vr, &rx_vr,
> +		virtcons_dev_page.dd.num_vq)) {
> +		mpsslog("%s init_vr failed %s\n",
> +			mic->name, strerror(errno));
> +		goto _close_pty;
> +	}
> +
> +	copy.iovcnt = 1;
> +	desc = get_device_desc(mic, VIRTIO_ID_CONSOLE);
> +
> +	for (;;) {
> +		console_poll[MONITOR_FD].revents = 0;
> +		console_poll[VIRTIO_CONSOLE_FD].revents = 0;
> +		err = poll(console_poll, MAX_CONSOLE_FD, -1);
> +		if (err < 0) {
> +			mpsslog("%s %d: poll failed: %s\n", __func__, __LINE__,
> +				strerror(errno));
> +			continue;
> +		}
> +		if (!(desc->status & VIRTIO_CONFIG_S_DRIVER_OK))
> +			wait_for_card_driver(mic,
> +				mic->mic_console.virtio_console_fd,
> +				VIRTIO_ID_CONSOLE);
> +
> +		if (console_poll[MONITOR_FD].revents & POLLIN) {
> +			copy.iov = iov0;
> +			len = readv(pty_fd, copy.iov, copy.iovcnt);
> +			if (len > 0) {
> +#ifdef DEBUG
> +				disp_iovec(mic, copy, __func__, __LINE__);
> +				mpsslog("%s %s %d read from tap 0x%lx\n",
> +					mic->name, __func__, __LINE__,
> +					len);
> +#endif
> +				wait_for_descriptors(mic, &tx_vr);
> +				txrx_prepare(VIRTIO_ID_CONSOLE, 1, &tx_vr,
> +					&copy, len);
> +
> +				err = mic_virtio_copy(mic,
> +					mic->mic_console.virtio_console_fd,
> +					&tx_vr, &copy);
> +				if (err < 0) {
> +					mpsslog("%s %s %d mic_virtio_copy %s\n",
> +						mic->name, __func__, __LINE__,
> +						strerror(errno));
> +				}
> +				if (!err)
> +					verify_out_len(mic, &copy);
> +#ifdef DEBUG
> +				disp_iovec(mic, copy, __func__, __LINE__);
> +				mpsslog("%s %s %d wrote to net 0x%lx\n",
> +					mic->name, __func__, __LINE__,
> +					sum_iovec_len(copy));
> +#endif
> +				/* Reinitialize IOV for next run */
> +				iov0->iov_len = PAGE_SIZE;
> +			} else if (len < 0) {
> +				disp_iovec(mic, &copy, __func__, __LINE__);
> +				mpsslog("%s %s %d read failed %s ",
> +					mic->name, __func__, __LINE__,
> +					strerror(errno));
> +				mpsslog("cnt %d sum %d\n",
> +					copy.iovcnt, sum_iovec_len(&copy));
> +			}
> +		}
> +
> +		if (console_poll[VIRTIO_CONSOLE_FD].revents & POLLIN) {
> +			while (rx_vr.info->avail_idx !=
> +				le16toh(rx_vr.vr.avail->idx)) {
> +				copy.iov = iov1;
> +				txrx_prepare(VIRTIO_ID_CONSOLE, 0, &rx_vr,
> +					&copy, PAGE_SIZE);
> +
> +				err = mic_virtio_copy(mic,
> +					mic->mic_console.virtio_console_fd,
> +					&rx_vr, &copy);
> +				if (!err) {
> +					/* Set the correct output iov_len */
> +					iov1->iov_len = copy.out_len;
> +					verify_out_len(mic, &copy);
> +#ifdef DEBUG
> +					disp_iovec(mic, copy, __func__,
> +						__LINE__);
> +					mpsslog("%s %s %d ",
> +						mic->name, __func__, __LINE__);
> +					mpsslog("read from net 0x%lx\n",
> +						sum_iovec_len(copy));
> +#endif
> +					len = writev(pty_fd,
> +						copy.iov, copy.iovcnt);
> +					if (len != sum_iovec_len(&copy)) {
> +						mpsslog("Tun write failed %s ",
> +							strerror(errno));
> +						mpsslog("len 0x%x ", len);
> +						mpsslog("read_len 0x%x\n",
> +							sum_iovec_len(&copy));
> +					} else {
> +#ifdef DEBUG
> +						disp_iovec(mic, copy, __func__,
> +							__LINE__);
> +						mpsslog("%s %s %d ",
> +							mic->name, __func__,
> +							__LINE__);
> +						mpsslog("wrote to tap 0x%lx\n",
> +							len);
> +#endif
> +					}
> +				} else {
> +					mpsslog("%s %s %d mic_virtio_copy %s\n",
> +						mic->name, __func__, __LINE__,
> +						strerror(errno));
> +					break;
> +				}
> +			}
> +		}
> +		if (console_poll[NET_FD_VIRTIO_NET].revents & POLLERR) {
> +			mpsslog("%s: %s: POLLERR\n", __func__, mic->name);
> +			sleep(1);
> +		}
> +	}
> +_close_pty:
> +	close(pty_fd);
> +_return:
> +	pthread_exit(NULL);
> +}
> +
> +static void
> +add_virtio_device(struct mic_info *mic, struct mic_device_desc *dd)
> +{
> +	char path[PATH_MAX];
> +	int fd, err;
> +
> +	snprintf(path, PATH_MAX, "/dev/mic%d", mic->id);
> +	fd = open(path, O_RDWR);
> +	if (fd < 0) {
> +		mpsslog("Could not open %s %s\n", path, strerror(errno));
> +		return;
> +	}
> +
> +	err = ioctl(fd, MIC_VIRTIO_ADD_DEVICE, dd);
> +	if (err < 0) {
> +		mpsslog("Could not add %d %s\n", dd->type, strerror(errno));
> +		close(fd);
> +		return;
> +	}
> +	switch (dd->type) {
> +	case VIRTIO_ID_NET:
> +		mic->mic_net.virtio_net_fd = fd;
> +		mpsslog("Added VIRTIO_ID_NET for %s\n", mic->name);
> +		break;
> +	case VIRTIO_ID_CONSOLE:
> +		mic->mic_console.virtio_console_fd = fd;
> +		mpsslog("Added VIRTIO_ID_CONSOLE for %s\n", mic->name);
> +		break;
> +	case VIRTIO_ID_BLOCK:
> +		mic->mic_virtblk.virtio_block_fd = fd;
> +		mpsslog("Added VIRTIO_ID_BLOCK for %s\n", mic->name);
> +		break;
> +	}
> +}
> +
> +static bool
> +set_backend_file(struct mic_info *mic)
> +{
> +	FILE *config;
> +	char buff[PATH_MAX], *line, *evv, *p;
> +
> +	snprintf(buff, PATH_MAX, "%s/mpssd%03d.conf", mic_config_dir, mic->id);
> +	config = fopen(buff, "r");
> +	if (config == NULL)
> +		return false;
> +	do {  /* look for "virtblk_backend=XXXX" */
> +		line = fgets(buff, PATH_MAX, config);
> +		if (line == NULL)
> +			break;
> +		if (*line == '#')
> +			continue;
> +		p = strchr(line, '\n');
> +		if (p)
> +			*p = '\0';
> +	} while (strncmp(line, virtblk_backend, strlen(virtblk_backend)) != 0);
> +	fclose(config);
> +	if (line == NULL)
> +		return false;
> +	evv = strchr(line, '=');
> +	if (evv == NULL)
> +		return false;
> +	mic->mic_virtblk.backend_file = malloc(strlen(evv));
> +	if (mic->mic_virtblk.backend_file == NULL) {
> +		mpsslog("can't allocate memory\n", mic->name, mic->id);
> +		return false;
> +	}
> +	strcpy(mic->mic_virtblk.backend_file, evv + 1);
> +	return true;
> +}
> +
> +#define SECTOR_SIZE 512
> +static bool
> +set_backend_size(struct mic_info *mic)
> +{
> +	mic->mic_virtblk.backend_size = lseek(mic->mic_virtblk.backend, 0,
> +		SEEK_END);
> +	if (mic->mic_virtblk.backend_size < 0) {
> +		mpsslog("%s: can't seek: %s\n",
> +			mic->name, mic->mic_virtblk.backend_file);
> +		return false;
> +	}
> +	virtblk_dev_page.blk_config.capacity =
> +		mic->mic_virtblk.backend_size / SECTOR_SIZE;
> +	if ((mic->mic_virtblk.backend_size % SECTOR_SIZE) != 0)
> +		virtblk_dev_page.blk_config.capacity++;
> +
> +	virtblk_dev_page.blk_config.capacity =
> +		htole64(virtblk_dev_page.blk_config.capacity);
> +
> +	return true;
> +}
> +
> +static bool
> +open_backend(struct mic_info *mic)
> +{
> +	if (!set_backend_file(mic))
> +		goto _error_exit;
> +	mic->mic_virtblk.backend = open(mic->mic_virtblk.backend_file, O_RDWR);
> +	if (mic->mic_virtblk.backend < 0) {
> +		mpsslog("%s: can't open: %s\n", mic->name,
> +			mic->mic_virtblk.backend_file);
> +		goto _error_free;
> +	}
> +	if (!set_backend_size(mic))
> +		goto _error_close;
> +	mic->mic_virtblk.backend_addr = mmap(NULL,
> +		mic->mic_virtblk.backend_size,
> +		PROT_READ|PROT_WRITE, MAP_SHARED,
> +		mic->mic_virtblk.backend, 0L);
> +	if (mic->mic_virtblk.backend_addr == MAP_FAILED) {
> +		mpsslog("%s: can't map: %s %s\n",
> +			mic->name, mic->mic_virtblk.backend_file,
> +			strerror(errno));
> +		goto _error_close;
> +	}
> +	return true;
> +
> + _error_close:
> +	close(mic->mic_virtblk.backend);
> + _error_free:
> +	free(mic->mic_virtblk.backend_file);
> + _error_exit:
> +	return false;
> +}
> +
> +static void
> +close_backend(struct mic_info *mic)
> +{
> +	munmap(mic->mic_virtblk.backend_addr, mic->mic_virtblk.backend_size);
> +	close(mic->mic_virtblk.backend);
> +	free(mic->mic_virtblk.backend_file);
> +}
> +
> +static bool
> +start_virtblk(struct mic_info *mic, struct mic_vring *vring)
> +{
> +	if (((__u64)&virtblk_dev_page.blk_config % 8) != 0) {
> +		mpsslog("%s: blk_config is not 8 byte aligned.\n",
> +			mic->name);
> +		return false;
> +	}
> +	add_virtio_device(mic, &virtblk_dev_page.dd);
> +	if (MAP_FAILED == init_vr(mic, mic->mic_virtblk.virtio_block_fd,
> +		VIRTIO_ID_BLOCK, vring, NULL, virtblk_dev_page.dd.num_vq)) {
> +		mpsslog("%s init_vr failed %s\n",
> +			mic->name, strerror(errno));
> +		return false;
> +	}
> +	return true;
> +}
> +
> +static void
> +stop_virtblk(struct mic_info *mic)
> +{
> +	uninit_vr(mic, virtblk_dev_page.dd.num_vq);
> +	close(mic->mic_virtblk.virtio_block_fd);
> +}
> +
> +static __u8
> +header_error_check(struct vring_desc *desc)
> +{
> +	if (le32toh(desc->len) != sizeof(struct virtio_blk_outhdr)) {
> +		mpsslog("%s() %d: length is not sizeof(virtio_blk_outhd)\n",
> +				__func__, __LINE__);
> +		return -EIO;
> +	}
> +	if (!(le16toh(desc->flags) & VRING_DESC_F_NEXT)) {
> +		mpsslog("%s() %d: alone\n",
> +			__func__, __LINE__);
> +		return -EIO;
> +	}
> +	if (le16toh(desc->flags) & VRING_DESC_F_WRITE) {
> +		mpsslog("%s() %d: not read\n",
> +			__func__, __LINE__);
> +		return -EIO;
> +	}
> +	return 0;
> +}
> +
> +static int
> +read_header(int fd, struct virtio_blk_outhdr *hdr, __u32 desc_idx)
> +{
> +	struct iovec iovec;
> +	struct mic_copy_desc copy;
> +
> +	iovec.iov_len = sizeof(*hdr);
> +	iovec.iov_base = hdr;
> +	copy.iov = &iovec;
> +	copy.iovcnt = 1;
> +	copy.vr_idx = 0;  /* only one vring on virtio_block */
> +	copy.update_used = false;  /* do not update used index */
> +	return ioctl(fd, MIC_VIRTIO_COPY_DESC, &copy);
> +}
> +
> +static int
> +transfer_blocks(int fd, struct iovec *iovec, __u32 iovcnt)
> +{
> +	struct mic_copy_desc copy;
> +
> +	copy.iov = iovec;
> +	copy.iovcnt = iovcnt;
> +	copy.vr_idx = 0;  /* only one vring on virtio_block */
> +	copy.update_used = false;  /* do not update used index */
> +	return ioctl(fd, MIC_VIRTIO_COPY_DESC, &copy);
> +}
> +
> +static __u8
> +status_error_check(struct vring_desc *desc)
> +{
> +	if (le32toh(desc->len) != sizeof(__u8)) {
> +		mpsslog("%s() %d: length is not sizeof(status)\n",
> +			__func__, __LINE__);
> +		return -EIO;
> +	}
> +	return 0;
> +}
> +
> +static int
> +write_status(int fd, __u8 *status)
> +{
> +	struct iovec iovec;
> +	struct mic_copy_desc copy;
> +
> +	iovec.iov_base = status;
> +	iovec.iov_len = sizeof(*status);
> +	copy.iov = &iovec;
> +	copy.iovcnt = 1;
> +	copy.vr_idx = 0;  /* only one vring on virtio_block */
> +	copy.update_used = true; /* Update used index */
> +	return ioctl(fd, MIC_VIRTIO_COPY_DESC, &copy);
> +}
> +
> +static void *
> +virtio_block(void *arg)
> +{
> +	struct mic_info *mic = (struct mic_info *) arg;
> +	int ret;
> +	struct pollfd block_poll;
> +	struct mic_vring vring;
> +	__u16 avail_idx;
> +	__u32 desc_idx;
> +	struct vring_desc *desc;
> +	struct iovec *iovec, *piov;
> +	__u8 status;
> +	__u32 buffer_desc_idx;
> +	struct virtio_blk_outhdr hdr;
> +	void *fos;
> +
> +	for (;;) {  /* forever */
> +		if (!open_backend(mic)) { /* No virtblk */
> +			for (mic->mic_virtblk.signaled = 0;
> +				!mic->mic_virtblk.signaled;)
> +				sleep(1);
> +			continue;
> +		}
> +
> +		/* backend file is specified. */
> +		if (!start_virtblk(mic, &vring))
> +			goto _close_backend;
> +		iovec = malloc(sizeof(*iovec) *
> +			le32toh(virtblk_dev_page.blk_config.seg_max));
> +		if (!iovec) {
> +			mpsslog("%s: can't alloc iovec: %s\n",
> +				mic->name, strerror(ENOMEM));
> +			goto _stop_virtblk;
> +		}
> +
> +		block_poll.fd = mic->mic_virtblk.virtio_block_fd;
> +		block_poll.events = POLLIN;
> +		for (mic->mic_virtblk.signaled = 0;
> +		     !mic->mic_virtblk.signaled;) {
> +			block_poll.revents = 0;
> +					/* timeout in 1 sec to see signaled */
> +			ret = poll(&block_poll, 1, 1000);
> +			if (ret < 0) {
> +				mpsslog("%s %d: poll failed: %s\n",
> +					__func__, __LINE__,
> +					strerror(errno));
> +				continue;
> +			}
> +
> +			if (!(block_poll.revents & POLLIN)) {
> +#ifdef DEBUG
> +				mpsslog("%s %d: block_poll.revents=0x%x\n",
> +					__func__, __LINE__, block_poll.revents);
> +				sleep(1);
> +#endif
> +				continue;
> +			}
> +
> +			/* POLLIN */
> +			while (vring.info->avail_idx !=
> +				le16toh(vring.vr.avail->idx)) {
> +				/* read header element */
> +				avail_idx =
> +					vring.info->avail_idx &
> +					(vring.vr.num - 1);
> +				desc_idx = le16toh(
> +					vring.vr.avail->ring[avail_idx]);
> +				desc = &vring.vr.desc[desc_idx];
> +#ifdef DEBUG
> +				mpsslog("%s() %d: avail_idx=%d ",
> +					__func__, __LINE__,
> +					vring.info->avail_idx);
> +				mpsslog("vring.vr.num=%d desc=%p\n",
> +					vring.vr.num, desc);
> +#endif
> +				status = header_error_check(desc);
> +				ret = read_header(
> +					mic->mic_virtblk.virtio_block_fd,
> +					&hdr, desc_idx);
> +				if (ret < 0) {
> +					mpsslog("%s() %d %s: ret=%d %s\n",
> +						__func__, __LINE__,
> +						mic->name, ret,
> +						strerror(errno));
> +					break;
> +				}
> +				/* buffer element */
> +				piov = iovec;
> +				status = 0;
> +				fos = mic->mic_virtblk.backend_addr +
> +					(hdr.sector * SECTOR_SIZE);
> +				buffer_desc_idx = desc_idx =
> +					next_desc(desc);
> +				for (desc = &vring.vr.desc[buffer_desc_idx];
> +				     desc->flags & VRING_DESC_F_NEXT;
> +				     desc_idx = next_desc(desc),
> +					     desc = &vring.vr.desc[desc_idx]) {
> +					piov->iov_len = desc->len;
> +					piov->iov_base = fos;
> +					piov++;
> +					fos += desc->len;
> +				}
> +				/* Returning NULLs for VIRTIO_BLK_T_GET_ID. */
> +				if (hdr.type & ~(VIRTIO_BLK_T_OUT |
> +					VIRTIO_BLK_T_GET_ID)) {
> +					/*
> +					  VIRTIO_BLK_T_IN - does not do
> +					  anything. Probably for documenting.
> +					  VIRTIO_BLK_T_SCSI_CMD - for
> +					  virtio_scsi.
> +					  VIRTIO_BLK_T_FLUSH - turned off in
> +					  config space.
> +					  VIRTIO_BLK_T_BARRIER - defined but not
> +					  used in anywhere.
> +					*/
> +					mpsslog("%s() %d: type %x ",
> +						__func__, __LINE__,
> +						hdr.type);
> +					mpsslog("is not supported\n");
> +					status = -ENOTSUP;
> +
> +				} else {
> +					ret = transfer_blocks(
> +					mic->mic_virtblk.virtio_block_fd,
> +						iovec,
> +						piov - iovec);
> +					if (ret < 0 &&
> +						status != 0)
> +						status = ret;
> +				}
> +				/* write status and update used pointer */
> +				if (status != 0)
> +					status = status_error_check(desc);
> +				ret = write_status(
> +					mic->mic_virtblk.virtio_block_fd,
> +					&status);
> +#ifdef DEBUG
> +				mpsslog("%s() %d: write status=%d on desc=%p\n",
> +					__func__, __LINE__,
> +					status, desc);
> +#endif
> +			}
> +		}
> +		free(iovec);
> +_stop_virtblk:
> +		stop_virtblk(mic);
> +_close_backend:
> +		close_backend(mic);
> +	}  /* forever */
> +
> +	pthread_exit(NULL);
> +}
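
A note on the index arithmetic in the loop above:
vring.info->avail_idx & (vring.vr.num - 1) maps the free-running 16-bit
index onto a ring slot. That only matches a modulo when the ring size is a
power of two, which split virtqueues require. A minimal sketch of the
idea, with ring_slot() as a hypothetical helper name that is not in the patch:

```c
#include <stdint.h>

/*
 * Hypothetical helper, not part of the patch: map a free-running 16-bit
 * ring index onto a slot in a ring of num entries.
 * Assumes num is a power of two, so the mask equals idx % num.
 */
static uint16_t ring_slot(uint16_t idx, uint16_t num)
{
	return idx & (uint16_t)(num - 1);
}
```

Because the index is never reset, it wraps at 65536 and the mask keeps it
inside the ring without a division.
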
> +
> +static void
> +reset(struct mic_info *mic)
> +{
> +#define RESET_TIMEOUT 120
> +	int i = RESET_TIMEOUT;
> +	setsysfs(mic->name, "state", "reset");
> +	while (i) {
> +		char *state;
> +		state = readsysfs(mic->name, "state");
> +		if (!state)
> +			goto retry;
> +		mpsslog("%s: %s %d state %s\n",
> +			mic->name, __func__, __LINE__, state);
> +		if ((!strcmp(state, "offline"))) {
> +			free(state);
> +			break;
> +		}
> +		free(state);
> +retry:
> +		sleep(1);
> +		i--;
> +	}
> +}
> +
> +static int
> +get_mic_shutdown_status(struct mic_info *mic, char *shutdown_status)
> +{
> +	if (!strcmp(shutdown_status, "nop"))
> +		return MIC_NOP;
> +	if (!strcmp(shutdown_status, "crashed"))
> +		return MIC_CRASHED;
> +	if (!strcmp(shutdown_status, "halted"))
> +		return MIC_HALTED;
> +	if (!strcmp(shutdown_status, "poweroff"))
> +		return MIC_POWER_OFF;
> +	if (!strcmp(shutdown_status, "restart"))
> +		return MIC_RESTART;
> +	mpsslog("%s: BUG invalid status %s\n", mic->name, shutdown_status);
> +	/* Invalid state */
> +	assert(0);
> +};
> +
> +static int get_mic_state(struct mic_info *mic, char *state)
> +{
> +	if (!strcmp(state, "offline"))
> +		return MIC_OFFLINE;
> +	if (!strcmp(state, "online"))
> +		return MIC_ONLINE;
> +	if (!strcmp(state, "shutting_down"))
> +		return MIC_SHUTTING_DOWN;
> +	if (!strcmp(state, "reset_failed"))
> +		return MIC_RESET_FAILED;
> +	mpsslog("%s: BUG invalid state %s\n", mic->name, state);
> +	/* Invalid state */
> +	assert(0);
> +};
> +
> +static void mic_handle_shutdown(struct mic_info *mic)
> +{
> +#define SHUTDOWN_TIMEOUT 60
> +	int i = SHUTDOWN_TIMEOUT, ret, stat = 0;
> +	char *shutdown_status;
> +	while (i) {
> +		shutdown_status = readsysfs(mic->name, "shutdown_status");
> +		if (!shutdown_status)
> +			continue;
> +		mpsslog("%s: %s %d shutdown_status %s\n",
> +			mic->name, __func__, __LINE__, shutdown_status);
> +		switch (get_mic_shutdown_status(mic, shutdown_status)) {
> +		case MIC_RESTART:
> +			mic->restart = 1;
> +		case MIC_HALTED:
> +		case MIC_POWER_OFF:
> +		case MIC_CRASHED:
> +			goto reset;
> +		default:
> +			break;
> +		}
> +		free(shutdown_status);
> +		sleep(1);
> +		i--;
> +	}
> +reset:
> +	ret = kill(mic->pid, SIGTERM);
> +	mpsslog("%s: %s %d kill pid %d ret %d\n",
> +		mic->name, __func__, __LINE__,
> +		mic->pid, ret);
> +	if (!ret) {
> +		ret = waitpid(mic->pid, &stat,
> +			WIFSIGNALED(stat));
> +		mpsslog("%s: %s %d waitpid ret %d pid %d\n",
> +			mic->name, __func__, __LINE__,
> +			ret, mic->pid);
> +	}
> +	if (ret == mic->pid)
> +		reset(mic);
> +}
> +
> +static void *
> +mic_config(void *arg)
> +{
> +	struct mic_info *mic = (struct mic_info *)arg;
> +	char *state = NULL;
> +	char pathname[PATH_MAX];
> +	int fd, ret;
> +	struct pollfd ufds[1];
> +	char value[4096];
> +
> +	snprintf(pathname, PATH_MAX - 1, "%s/%s/%s",
> +		MICSYSFSDIR, mic->name, "state");
> +
> +	fd = open(pathname, O_RDONLY);
> +	if (fd < 0) {
> +		mpsslog("%s: opening file %s failed %s\n",
> +			mic->name, pathname, strerror(errno));
> +		goto error;
> +	}
> +
> +	do {
> +		ret = read(fd, value, sizeof(value));
> +		if (ret < 0) {
> +			mpsslog("%s: Failed to read sysfs entry '%s': %s\n",
> +				mic->name, pathname, strerror(errno));
> +			goto close_error1;
> +		}
> +retry:
> +		state = readsysfs(mic->name, "state");
> +		if (!state)
> +			goto retry;
> +		mpsslog("%s: %s %d state %s\n",
> +			mic->name, __func__, __LINE__, state);
> +		switch (get_mic_state(mic, state)) {
> +		case MIC_SHUTTING_DOWN:
> +			mic_handle_shutdown(mic);
> +			goto close_error;
> +		default:
> +			break;
> +		}
> +		free(state);
> +
> +		ufds[0].fd = fd;
> +		ufds[0].events = POLLERR | POLLPRI;
> +		ret = poll(ufds, 1, -1);
> +		if (ret < 0) {
> +			mpsslog("%s: poll failed %s\n",
> +				mic->name, strerror(errno));
> +			goto close_error1;
> +		}
> +	} while (1);
> +close_error:
> +	free(state);
> +close_error1:
> +	close(fd);
> +error:
> +	init_mic(mic);
> +	pthread_exit(NULL);
> +}
> +
> +static void
> +set_cmdline(struct mic_info *mic)
> +{
> +	char buffer[PATH_MAX];
> +	int len;
> +
> +	len = snprintf(buffer, PATH_MAX,
> +		"clocksource=tsc highres=off nohz=off ");
> +	len += snprintf(buffer + len, PATH_MAX,
> +		"cpufreq_on;corec6_off;pc3_off;pc6_off ");
> +	len += snprintf(buffer + len, PATH_MAX,
> +		"ifcfg=static;address,172.31.%d.1;netmask,255.255.255.0",
> +		mic->id);
> +
> +	setsysfs(mic->name, "cmdline", buffer);
> +	mpsslog("%s: Command line: \"%s\"\n", mic->name, buffer);
> +	snprintf(buffer, PATH_MAX, "172.31.%d.1", mic->id);
> +	mpsslog("%s: IPADDR: \"%s\"\n", mic->name, buffer);
> +}
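
The second and third snprintf() calls above pass PATH_MAX as the size even
though only PATH_MAX - len bytes remain. The strings are short, so nothing
overflows today, but the bound is not the real one. A possible sketch of a
bounded append, where buf_appendf() is a hypothetical helper and not
something the patch provides:

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper: append formatted text at offset *len of buf,
 * never writing past size bytes. Returns 0 on success, -1 if the
 * output was truncated or formatting failed; *len is only advanced
 * on success.
 */
static int buf_appendf(char *buf, size_t size, size_t *len,
		       const char *fmt, ...)
{
	va_list ap;
	int n;

	if (*len >= size)
		return -1;

	va_start(ap, fmt);
	/* Pass the space that is actually left, not the total size. */
	n = vsnprintf(buf + *len, size - *len, fmt, ap);
	va_end(ap);

	if (n < 0 || (size_t)n >= size - *len)
		return -1;

	*len += (size_t)n;
	return 0;
}
```

With this, set_cmdline() could build the string in three calls and log an
error if any of them returns -1.
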
> +
> +static void
> +set_log_buf_info(struct mic_info *mic)
> +{
> +	int fd;
> +	off_t len;
> +	char system_map[] = "/lib/firmware/mic/System.map";
> +	char *map, *temp, log_buf[17] = {'\0'};
> +
> +	fd = open(system_map, O_RDONLY);
> +	if (fd < 0) {
> +		mpsslog("%s: Opening System.map failed: %d\n",
> +			mic->name, errno);
> +		return;
> +	}
> +	len = lseek(fd, 0, SEEK_END);
> +	if (len < 0) {
> +		mpsslog("%s: Reading System.map size failed: %d\n",
> +			mic->name, errno);
> +		close(fd);
> +		return;
> +	}
> +	map = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
> +	if (map == MAP_FAILED) {
> +		mpsslog("%s: mmap of System.map failed: %d\n",
> +			mic->name, errno);
> +		close(fd);
> +		return;
> +	}
> +	temp = strstr(map, "__log_buf");
> +	if (!temp) {
> +		mpsslog("%s: __log_buf not found: %d\n", mic->name, errno);
> +		munmap(map, len);
> +		close(fd);
> +		return;
> +	}
> +	strncpy(log_buf, temp - 19, 16);
> +	setsysfs(mic->name, "log_buf_addr", log_buf);
> +	mpsslog("%s: log_buf_addr: %s\n", mic->name, log_buf);
> +	temp = strstr(map, "log_buf_len");
> +	if (!temp) {
> +		mpsslog("%s: log_buf_len not found: %d\n", mic->name, errno);
> +		munmap(map, len);
> +		close(fd);
> +		return;
> +	}
> +	strncpy(log_buf, temp - 19, 16);
> +	setsysfs(mic->name, "log_buf_len", log_buf);
> +	mpsslog("%s: log_buf_len: %s\n", mic->name, log_buf);
> +	munmap(map, len);
> +	close(fd);
> +}
> +
> +static void init_mic(struct mic_info *mic);
> +
> +static void
> +change_virtblk_backend(int x, siginfo_t *siginfo, void *p)
> +{
> +	struct mic_info *mic;
> +
> +	for (mic = mic_list.next; mic != NULL; mic = mic->next)
> +		mic->mic_virtblk.signaled = 1/* true */;
> +}
> +
> +static void
> +init_mic(struct mic_info *mic)
> +{
> +	struct sigaction ignore = {
> +		.sa_flags = 0,
> +		.sa_handler = SIG_IGN
> +	};
> +	struct sigaction act = {
> +		.sa_flags = SA_SIGINFO,
> +		.sa_sigaction = change_virtblk_backend,
> +	};
> +	char buffer[PATH_MAX];
> +	int err;
> +
> +		/* ignore SIGUSR1 for both process */
> +	sigaction(SIGUSR1, &ignore, NULL);
> +
> +	mic->pid = fork();
> +	switch (mic->pid) {
> +	case 0:
> +		set_log_buf_info(mic);
> +		set_cmdline(mic);
> +		add_virtio_device(mic, &virtcons_dev_page.dd);
> +		add_virtio_device(mic, &virtnet_dev_page.dd);
> +		err = pthread_create(&mic->mic_console.console_thread, NULL,
> +			virtio_console, mic);
> +		if (err)
> +			mpsslog("%s virtcons pthread_create failed %s\n",
> +			mic->name, strerror(err));
> +		/*
> +		 * TODO: Debug why not adding this sleep results in the tap
> +		 * interface not coming up during certain runs sporadically.
> +		 */

Indeed.

> +		usleep(1000);
> +		err = pthread_create(&mic->mic_net.net_thread, NULL,
> +			virtio_net, mic);
> +		if (err)
> +			mpsslog("%s virtnet pthread_create failed %s\n",
> +			mic->name, strerror(err));
> +		err = pthread_create(&mic->mic_virtblk.block_thread, NULL,
> +			virtio_block, mic);
> +		if (err)
> +			mpsslog("%s virtblk pthread_create failed %s\n",
> +			mic->name, strerror(err));
> +		sigemptyset(&act.sa_mask);
> +		err = sigaction(SIGUSR1, &act, NULL);

I'm confused: who sends SIGUSR1 to this process?


> +		if (err)
> +			mpsslog("%s sigaction SIGUSR1 failed %s\n",
> +			mic->name, strerror(errno));
> +		while (1)
> +			sleep(60);
> +	case -1:
> +		mpsslog("fork failed MIC name %s id %d errno %d\n",
> +			mic->name, mic->id, errno);
> +		break;
> +	default:
> +		if (mic->restart) {
> +			snprintf(buffer, PATH_MAX,
> +				"boot:linux:mic/uos.img:mic/mic%d.image",
> +				mic->id);
> +			setsysfs(mic->name, "state", buffer);
> +			mpsslog("%s restarting mic %d\n",
> +				mic->name, mic->restart);
> +			mic->restart = 0;
> +		}
> +		pthread_create(&mic->config_thread, NULL, mic_config, mic);
> +	}
> +}
> +
> +static void
> +start_daemon(void)
> +{
> +	struct mic_info *mic;
> +
> +	for (mic = mic_list.next; mic != NULL; mic = mic->next)
> +		init_mic(mic);
> +
> +	while (1)
> +		sleep(60);
> +}
> +
> +static int
> +init_mic_list(void)
> +{
> +	struct mic_info *mic = &mic_list;
> +	struct dirent *file;
> +	DIR *dp;
> +	int cnt = 0;
> +
> +	dp = opendir(MICSYSFSDIR);
> +	if (!dp)
> +		return 0;
> +
> +	while ((file = readdir(dp)) != NULL) {
> +		if (!strncmp(file->d_name, "mic", 3)) {
> +			mic->next = malloc(sizeof(struct mic_info));
> +			if (mic->next) {
> +				mic = mic->next;
> +				mic->next = NULL;
> +				memset(mic, 0, sizeof(struct mic_info));
> +				mic->id = atoi(&file->d_name[3]);
> +				mic->name = malloc(strlen(file->d_name) + 16);
> +				if (mic->name)
> +					strcpy(mic->name, file->d_name);
> +				mpsslog("MIC name %s id %d\n", mic->name,
> +					mic->id);
> +				cnt++;
> +			}
> +		}
> +	}
> +
> +	closedir(dp);
> +	return cnt;
> +}
> +
> +void
> +mpsslog(char *format, ...)
> +{
> +	va_list args;
> +	char buffer[4096];
> +	time_t t;
> +	char *ts;
> +
> +	if (logfp == NULL)
> +		return;
> +
> +	va_start(args, format);
> +	vsprintf(buffer, format, args);
> +	va_end(args);
> +
> +	time(&t);
> +	ts = ctime(&t);
> +	ts[strlen(ts) - 1] = '\0';
> +	fprintf(logfp, "%s: %s", ts, buffer);
> +
> +	fflush(logfp);
> +}
> +
> +int
> +main(int argc, char *argv[])
> +{
> +	int cnt;
> +
> +	myname = argv[0];
> +
> +	logfp = fopen(LOGFILE_NAME, "a+");
> +	if (!logfp) {
> +		fprintf(stderr, "cannot open logfile '%s'\n", LOGFILE_NAME);
> +		exit(1);
> +	}
> +
> +	mpsslog("MIC Daemon start\n");
> +
> +	cnt = init_mic_list();
> +	if (cnt == 0) {
> +		mpsslog("MIC module not loaded\n");
> +		exit(2);
> +	}
> +	mpsslog("MIC found %d devices\n", cnt);
> +
> +	start_daemon();
> +
> +	exit(0);
> +}
> diff --git a/Documentation/mic/mpssd/mpssd.h b/Documentation/mic/mpssd/mpssd.h
> new file mode 100644
> index 0000000..b6dee38
> --- /dev/null
> +++ b/Documentation/mic/mpssd/mpssd.h
> @@ -0,0 +1,100 @@
> +/*
> + * Intel MIC Platform Software Stack (MPSS)
> + *
> + * Copyright(c) 2013 Intel Corporation.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License, version 2, as
> + * published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful, but
> + * WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> + * General Public License for more details.
> + *
> + * The full GNU General Public License is included in this distribution in
> + * the file called "COPYING".
> + *
> + * Intel MIC User Space Tools.
> + */
> +#ifndef _MPSSD_H_
> +#define _MPSSD_H_
> +
> +#include <stdio.h>
> +#include <stdlib.h>
> +#include <string.h>
> +#include <fcntl.h>
> +#include <unistd.h>
> +#include <dirent.h>
> +#include <libgen.h>
> +#include <pthread.h>
> +#include <stdarg.h>
> +#include <time.h>
> +#include <errno.h>
> +#include <sys/dir.h>
> +#include <sys/ioctl.h>
> +#include <sys/poll.h>
> +#include <sys/types.h>
> +#include <sys/socket.h>
> +#include <sys/stat.h>
> +#include <sys/types.h>
> +#include <sys/mman.h>
> +#include <sys/utsname.h>
> +#include <sys/wait.h>
> +#include <netinet/in.h>
> +#include <arpa/inet.h>
> +#include <netdb.h>
> +#include <pthread.h>
> +#include <signal.h>
> +#include <limits.h>
> +#include <syslog.h>
> +#include <getopt.h>
> +#include <net/if.h>
> +#include <linux/if_tun.h>
> +#include <linux/if_tun.h>
> +#include <linux/virtio_ids.h>
> +
> +#define MICSYSFSDIR "/sys/class/mic"
> +#define LOGFILE_NAME "/var/log/mpssd"
> +#define PAGE_SIZE 4096
> +
> +struct mic_console_info {
> +	pthread_t       console_thread;
> +	int		virtio_console_fd;
> +	void		*console_dp;
> +};
> +
> +struct mic_net_info {
> +	pthread_t       net_thread;
> +	int		virtio_net_fd;
> +	int		tap_fd;
> +	void		*net_dp;
> +};
> +
> +struct mic_virtblk_info {
> +	pthread_t       block_thread;
> +	int		virtio_block_fd;
> +	void		*block_dp;
> +	volatile sig_atomic_t	signaled;
> +	char		*backend_file;
> +	int		backend;
> +	void		*backend_addr;
> +	long		backend_size;
> +};
> +
> +struct mic_info {
> +	int		id;
> +	char		*name;
> +	pthread_t       config_thread;
> +	pid_t		pid;
> +	struct mic_console_info	mic_console;
> +	struct mic_net_info	mic_net;
> +	struct mic_virtblk_info	mic_virtblk;
> +	int		restart;
> +	struct mic_info *next;
> +};
> +
> +void mpsslog(char *format, ...);
> +char *readsysfs(char *dir, char *entry);
> +int setsysfs(char *dir, char *entry, char *value);
> +#endif
> diff --git a/Documentation/mic/mpssd/sysfs.c b/Documentation/mic/mpssd/sysfs.c
> new file mode 100644
> index 0000000..3244dcf
> --- /dev/null
> +++ b/Documentation/mic/mpssd/sysfs.c
> @@ -0,0 +1,103 @@
> +/*
> + * Intel MIC Platform Software Stack (MPSS)
> + *
> + * Copyright(c) 2013 Intel Corporation.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License, version 2, as
> + * published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful, but
> + * WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> + * General Public License for more details.
> + *
> + * The full GNU General Public License is included in this distribution in
> + * the file called "COPYING".
> + *
> + * Intel MIC User Space Tools.
> + */
> +
> +#include "mpssd.h"
> +
> +#define PAGE_SIZE 4096
> +
> +char *
> +readsysfs(char *dir, char *entry)
> +{
> +	char filename[PATH_MAX];
> +	char value[PAGE_SIZE];
> +	char *string = NULL;
> +	int fd;
> +	int len;
> +
> +	if (dir == NULL)
> +		snprintf(filename, PATH_MAX, "%s/%s", MICSYSFSDIR, entry);
> +	else
> +		snprintf(filename, PATH_MAX,
> +			"%s/%s/%s", MICSYSFSDIR, dir, entry);
> +
> +	fd = open(filename, O_RDONLY);
> +	if (fd < 0) {
> +		mpsslog("Failed to open sysfs entry '%s': %s\n",
> +			filename, strerror(errno));
> +		return NULL;
> +	}
> +
> +	len = read(fd, value, sizeof(value));
> +	if (len < 0) {
> +		mpsslog("Failed to read sysfs entry '%s': %s\n",
> +			filename, strerror(errno));
> +		goto readsys_ret;
> +	}
> +
> +	value[len] = '\0';

Why are you careful to put this \0 here but not in setsysfs below, where
oldvalue is handed to strcmp?

If you do terminate it, I'd also fail on len == sizeof(value): in that case
value[len] is one byte past the end of the buffer.
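
To make the point concrete, here is a rough, untested sketch of a shared read
helper. The name read_sysfs_value and its signature are made up for
illustration and are not part of the patch. It reads at most size - 1 bytes,
so the terminator always fits:

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not in the patch): read at most size - 1 bytes
 * from path into buf and always NUL-terminate the result.
 * Returns the number of bytes read, or -1 with errno set.
 */
static ssize_t read_sysfs_value(const char *path, char *buf, size_t size)
{
	ssize_t len;
	int fd, saved_errno;

	if (size == 0) {
		errno = EINVAL;
		return -1;
	}

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;

	/* Leave room for the terminator so buf[len] stays in bounds. */
	len = read(fd, buf, size - 1);
	saved_errno = errno;	/* close() may clobber errno */
	close(fd);
	if (len < 0) {
		errno = saved_errno;
		return -1;
	}

	buf[len] = '\0';
	return len;
}
```

Both readsysfs and setsysfs could then call something like this, so the
strcmp in setsysfs would always see a terminated string.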

> +
> +	string = malloc(strlen(value) + 1);
> +	if (string)
> +		strcpy(string, value);
> +
> +readsys_ret:
> +	close(fd);
> +	return string;
> +}
> +
> +int
> +setsysfs(char *dir, char *entry, char *value)
> +{
> +	char filename[PATH_MAX];
> +	char oldvalue[PAGE_SIZE];
> +	int fd;
> +
> +	if (dir == NULL)
> +		snprintf(filename, PATH_MAX, "%s/%s", MICSYSFSDIR, entry);
> +	else
> +		snprintf(filename, PATH_MAX, "%s/%s/%s",
> +			MICSYSFSDIR, dir, entry);
> +
> +	fd = open(filename, O_RDWR);
> +	if (fd < 0) {
> +		mpsslog("Failed to open sysfs entry '%s': %s\n",
> +			filename, strerror(errno));
> +		return errno;
> +	}
> +
> +	if (read(fd, oldvalue, sizeof(oldvalue)) < 0) {
> +		mpsslog("Failed to read sysfs entry '%s': %s\n",
> +			filename, strerror(errno));
> +		close(fd);
> +		return errno;
> +	}
> +
> +	if (strcmp(value, oldvalue)) {
> +		if (write(fd, value, strlen(value)) < 0) {
> +			mpsslog("Failed to write new sysfs entry '%s': %s\n",
> +				filename, strerror(errno));
> +			close(fd);
> +			return errno;
> +		}
> +	}
> +
> +	close(fd);
> +	return 0;
> +}
> -- 
> 1.8.2.1

* Re: [PATCH v2 7/7] Sample Implementation of Intel MIC User Space Daemon.
@ 2013-08-08  6:40     ` Michael S. Tsirkin
  0 siblings, 0 replies; 18+ messages in thread
From: Michael S. Tsirkin @ 2013-08-08  6:40 UTC (permalink / raw)
  To: Sudeep Dutt
  Cc: Peter P Waskiewicz Jr, Arnd Bergmann, linux-doc,
	Greg Kroah-Hartman, Yaozu (Eddie) Dong, linux-kernel,
	virtualization, Ashutosh Dixit, Rob Landley,
	Harshavardhan R Kharche, Caz Yokoyama,
	Dasaratharaman Chandramouli

On Wed, Aug 07, 2013 at 08:04:13PM -0700, Sudeep Dutt wrote:
> From: Caz Yokoyama <Caz.Yokoyama@intel.com>
> 
> This patch introduces a sample user space daemon which
> implements the virtio device backends on the host. The daemon
> creates/removes/configures virtio device backends by communicating with
> the Intel MIC Host Driver. The virtio devices currently supported are
> virtio net, virtio console and virtio block. Virtio net supports TSO/GSO.
> The daemon also monitors card shutdown status and takes appropriate actions
> like killing the virtio backends and resetting the card upon card shutdown
> and crashes.
> 
> Co-author: Ashutosh Dixit <ashutosh.dixit@intel.com>
> Co-author: Sudeep Dutt <sudeep.dutt@intel.com>
> Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
> Signed-off-by: Caz Yokoyama <Caz.Yokoyama@intel.com>
> Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
> Signed-off-by: Nikhil Rao <nikhil.rao@intel.com>
> Signed-off-by: Harshavardhan R Kharche <harshavardhan.r.kharche@intel.com>
> Signed-off-by: Sudeep Dutt <sudeep.dutt@intel.com>
> Acked-by: Yaozu (Eddie) Dong <eddie.dong@intel.com>
> ---
>  Documentation/mic/mic_overview.txt |   48 +
>  Documentation/mic/mpssd/.gitignore |    1 +
>  Documentation/mic/mpssd/Makefile   |   19 +
>  Documentation/mic/mpssd/micctrl    |  152 ++++
>  Documentation/mic/mpssd/mpss       |  245 ++++++
>  Documentation/mic/mpssd/mpssd.c    | 1689 ++++++++++++++++++++++++++++++++++++
>  Documentation/mic/mpssd/mpssd.h    |  100 +++
>  Documentation/mic/mpssd/sysfs.c    |  103 +++

Is this generally useful or just example code?
If the former, you can put it in tools/ as well.

>  8 files changed, 2357 insertions(+)
>  create mode 100644 Documentation/mic/mic_overview.txt
>  create mode 100644 Documentation/mic/mpssd/.gitignore
>  create mode 100644 Documentation/mic/mpssd/Makefile
>  create mode 100755 Documentation/mic/mpssd/micctrl
>  create mode 100755 Documentation/mic/mpssd/mpss
>  create mode 100644 Documentation/mic/mpssd/mpssd.c
>  create mode 100644 Documentation/mic/mpssd/mpssd.h
>  create mode 100644 Documentation/mic/mpssd/sysfs.c
> 
> diff --git a/Documentation/mic/mic_overview.txt b/Documentation/mic/mic_overview.txt
> new file mode 100644
> index 0000000..8b1a916
> --- /dev/null
> +++ b/Documentation/mic/mic_overview.txt
> @@ -0,0 +1,48 @@
> +An Intel MIC X100 device is a PCIe form factor add-in coprocessor
> +card based on the Intel Many Integrated Core (MIC) architecture
> +that runs a Linux OS. It is a PCIe endpoint in a platform and therefore
> +implements the three required standard address spaces i.e. configuration,
> +memory and I/O. The host OS loads a device driver as is typical for
> +PCIe devices. The card itself runs a bootstrap after reset that
> +transfers control to the card OS downloaded from the host driver.
> +The card OS as shipped by Intel is a Linux kernel with modifications
> +for the X100 devices.
> +
> +Since it is a PCIe card, it does not have the ability to host hardware
> +devices for networking, storage and console. We provide these devices
> +on X100 coprocessors thus enabling a self-bootable equivalent environment
> +for applications. A key benefit of our solution is that it leverages
> +the standard virtio framework for network, disk and console devices,
> +though in our case the virtio framework is used across a PCIe bus.
> +
> +Here is a block diagram of the various components described above. The
> +virtio backends are situated on the host rather than the card given better
> +single threaded performance for the host compared to MIC and the ability of
> +the host to initiate DMA's to/from the card using the MIC DMA engine.
> +
> +                              |
> +       +----------+           |             +----------+
> +       | Card OS  |           |             | Host OS  |
> +       +----------+           |             +----------+
> +                              |
> ++-------+ +--------+ +------+ | +---------+  +--------+ +--------+
> +| Virtio| |Virtio  | |Virtio| | |Virtio   |  |Virtio  | |Virtio  |
> +| Net   | |Console | |Block | | |Net      |  |Console | |Block   |
> +| Driver| |Driver  | |Driver| | |backend  |  |backend | |backend |
> ++-------+ +--------+ +------+ | +---------+  +--------+ +--------+
> +    |         |         |     |      |            |         |
> +    |         |         |     |Ring 3|            |         |
> +    |         |         |     |------|------------|---------|-------
> +    +-------------------+     |Ring 0+--------------------------+
> +              |               |      | Virtio over PCIe IOCTLs  |
> +              |               |      +--------------------------+
> +      +--------------+        |                   |
> +      |Intel MIC     |        |            +---------------+
> +      |Card Driver   |        |            |Intel MIC      |
> +      +--------------+        |            |Host Driver    |
> +              |               |            +---------------+
> +              |               |                   |
> +     +-------------------------------------------------------------+
> +     |                                                             |
> +     |                    PCIe Bus                                 |
> +     +-------------------------------------------------------------+
> diff --git a/Documentation/mic/mpssd/.gitignore b/Documentation/mic/mpssd/.gitignore
> new file mode 100644
> index 0000000..8b7c72f
> --- /dev/null
> +++ b/Documentation/mic/mpssd/.gitignore
> @@ -0,0 +1 @@
> +mpssd
> diff --git a/Documentation/mic/mpssd/Makefile b/Documentation/mic/mpssd/Makefile
> new file mode 100644
> index 0000000..eb860a7
> --- /dev/null
> +++ b/Documentation/mic/mpssd/Makefile
> @@ -0,0 +1,19 @@
> +#
> +# Makefile - Intel MIC User Space Tools.
> +# Copyright(c) 2013, Intel Corporation.
> +#
> +ifdef DEBUG
> +CFLAGS += $(USERWARNFLAGS) -I. -g -Wall -DDEBUG=$(DEBUG)
> +else
> +CFLAGS += $(USERWARNFLAGS) -I. -g -Wall
> +endif
> +
> +mpssd: mpssd.o sysfs.o
> +	$(CC) $(CFLAGS) -o $@ $^ -lpthread
> +
> +install:
> +	install mpssd /usr/sbin/mpssd
> +	install micctrl /usr/sbin/micctrl
> +
> +clean:
> +	rm -f mpssd *.o
> diff --git a/Documentation/mic/mpssd/micctrl b/Documentation/mic/mpssd/micctrl
> new file mode 100755
> index 0000000..e0cfa53
> --- /dev/null
> +++ b/Documentation/mic/mpssd/micctrl
> @@ -0,0 +1,152 @@
> +#!/bin/bash
> +# Intel MIC Platform Software Stack (MPSS)
> +#
> +# Copyright(c) 2013 Intel Corporation.
> +#
> +# This program is free software; you can redistribute it and/or modify
> +# it under the terms of the GNU General Public License, version 2, as
> +# published by the Free Software Foundation.
> +#
> +# This program is distributed in the hope that it will be useful, but
> +# WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> +# General Public License for more details.
> +#
> +# The full GNU General Public License is included in this distribution in
> +# the file called "COPYING".
> +#
> +# Intel MIC User Space Tools.
> +#
> +# micctrl - Controls MIC boot/start/stop.
> +#
> +# chkconfig: 2345 95 05
> +# description: start MPSS stack processing.
> +#
> +### BEGIN INIT INFO
> +# Provides: micctrl
> +### END INIT INFO
> +
> +# Source function library.
> +. /etc/init.d/functions
> +
> +sysfs="/sys/class/mic"
> +
> +status()
> +{
> +	if [ "`echo $1 | head -c3`" == "mic" ]; then
> +		f=$sysfs/$1
> +		echo -e $1 state: "`cat $f/state`" shutdown_status: "`cat $f/shutdown_status`"
> +		return 0
> +	fi
> +
> +	if [ -d "$sysfs" ]; then
> +		for f in $sysfs/*
> +		do
> +			echo -e ""`basename $f`" state: "`cat $f/state`" shutdown_status: "`cat $f/shutdown_status`""
> +		done
> +	fi
> +
> +	return 0
> +}
> +
> +reset()
> +{
> +	if [ "`echo $1 | head -c3`" == "mic" ]; then
> +		f=$sysfs/$1
> +		echo reset > $f/state
> +		return 0
> +	fi
> +
> +	if [ -d "$sysfs" ]; then
> +		for f in $sysfs/*
> +		do
> +			echo reset > $f/state
> +		done
> +	fi
> +
> +	return 0
> +}
> +
> +boot()
> +{
> +	if [ "`echo $1 | head -c3`" == "mic" ]; then
> +		f=$sysfs/$1
> +		echo "boot:linux:mic/uos.img:mic/$1.image" > $f/state
> +		return 0
> +	fi
> +
> +	if [ -d "$sysfs" ]; then
> +		for f in $sysfs/*
> +		do
> +			echo "boot:linux:mic/uos.img:mic/`basename $f`.image" > $f/state
> +		done
> +	fi
> +
> +	return 0
> +}
> +
> +shutdown()
> +{
> +	if [ "`echo $1 | head -c3`" == "mic" ]; then
> +		f=$sysfs/$1
> +		echo shutdown > $f/state
> +		return 0
> +	fi
> +
> +	if [ -d "$sysfs" ]; then
> +		for f in $sysfs/*
> +		do
> +			echo shutdown > $f/state
> +		done
> +	fi
> +
> +	return 0
> +}
> +
> +wait()
> +{
> +	if [ "`echo $1 | head -c3`" == "mic" ]; then
> +		f=$sysfs/$1
> +		while [ "`cat $f/state`" != "offline" -a "`cat $f/state`" != "online" ]
> +		do
> +			sleep 1
> +			echo -e "Waiting for $1 to go offline"
> +		done
> +		return 0
> +	fi
> +
> +	if [ -d "$sysfs" ]; then
> +		# Wait for the cards to go offline
> +		for f in $sysfs/*
> +		do
> +			while [ "`cat $f/state`" != "offline" -a "`cat $f/state`" != "online" ]
> +			do
> +				sleep 1
> +				echo -e "Waiting for "`basename $f`" to go offline"
> +			done
> +		done
> +	fi
> +}
> +
> +case $1 in
> +	-s)
> +		status $2
> +		;;
> +	-r)
> +		reset $2
> +		;;
> +	-b)
> +		boot $2
> +		;;
> +	-S)
> +		shutdown $2
> +		;;
> +	-w)
> +		wait $2
> +		;;
> +	*)
> +		echo $"Usage: $0 {-s (status) |-r (reset) |-b (boot) |-S (shutdown) |-w (wait)}"
> +		exit 2
> +esac
> +
> +exit $?
> diff --git a/Documentation/mic/mpssd/mpss b/Documentation/mic/mpssd/mpss
> new file mode 100755
> index 0000000..f0bb3dd
> --- /dev/null
> +++ b/Documentation/mic/mpssd/mpss
> @@ -0,0 +1,245 @@
> +#!/bin/bash
> +# Intel MIC Platform Software Stack (MPSS)
> +#
> +# Copyright(c) 2013 Intel Corporation.
> +#
> +# This program is free software; you can redistribute it and/or modify
> +# it under the terms of the GNU General Public License, version 2, as
> +# published by the Free Software Foundation.
> +#
> +# This program is distributed in the hope that it will be useful, but
> +# WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> +# General Public License for more details.
> +#
> +# The full GNU General Public License is included in this distribution in
> +# the file called "COPYING".
> +#
> +# Intel MIC User Space Tools.
> +#
> +# mpss	Start mpssd.
> +#
> +# chkconfig: 2345 95 05
> +# description: start MPSS stack processing.
> +#
> +### BEGIN INIT INFO
> +# Provides: mpss
> +# Required-Start:
> +# Required-Stop:
> +# Short-Description: MPSS stack control
> +# Description: MPSS stack control
> +### END INIT INFO
> +
> +# Source function library.
> +. /etc/init.d/functions
> +
> +exec=/usr/sbin/mpssd
> +sysfs="/sys/class/mic"
> +
> +start()
> +{
> +	[ -x $exec ] || exit 5
> +
> +	echo -e $"Starting MPSS Stack"
> +
> +	echo -e $"Loading MIC_HOST Module"
> +
> +	# Ensure the driver is loaded
> +	[ -d "$sysfs" ] || modprobe mic_host
> +
> +	if [ "`ps -e | awk '{print $4}' | grep mpssd | head -1`" = "mpssd" ]; then
> +		echo -e $"MPSSD already running! "
> +		success
> +		echo
> +		return 0;
> +	fi
> +
> +	# Start the daemon
> +	echo -n $"Starting MPSSD"
> +	$exec &
> +	RETVAL=$?
> +	if [ $RETVAL -ne 0 ]; then
> +		failure
> +	else
> +		success
> +	fi
> +	echo
> +
> +	sleep 5
> +
> +	# Boot the cards
> +	if [ $RETVAL -eq 0 ]; then
> +		for f in $sysfs/*
> +		do
> +			echo -ne "Booting "`basename $f`" "
> +			echo "boot:linux:mic/uos.img:mic/`basename $f`.image" > $f/state
> +			RETVAL=$?
> +			if [ $RETVAL -ne 0 ]; then
> +				failure
> +			else
> +				success
> +			fi
> +			echo
> +		done
> +	fi
> +
> +	# Wait till ping works
> +	if [ $RETVAL -eq 0 ]; then
> +		for f in $sysfs/*
> +		do
> +			count=100
> +			ipaddr=`cat $f/cmdline`
> +			ipaddr=${ipaddr#*address,}
> +			ipaddr=`echo $ipaddr | cut -d, -f1 | cut -d\; -f1`
> +
> +			while [ $count -ge 0 ]
> +			do
> +				echo -e "Pinging "`basename $f`" "
> +				ping -c 1 $ipaddr &> /dev/null
> +				RETVAL=$?
> +				if [ $RETVAL -eq 0 ]; then
> +					success
> +					break
> +				fi
> +				sleep 1
> +				count=`expr $count - 1`
> +			done
> +			if [ $RETVAL -ne 0 ]; then
> +				failure
> +			else
> +				success
> +			fi
> +			echo
> +		done
> +	fi
> +	return $RETVAL
> +}
> +
> +stop()
> +{
> +	echo -e $"Shutting down MPSS Stack: "
> +
> +	# Bail out if module is unloaded
> +	if [ ! -d "$sysfs" ]; then
> +		echo -n $"Module unloaded "
> +		killall -9 mpssd 2>/dev/null
> +		success
> +		echo
> +		return 0
> +	fi
> +
> +	# Shut down the cards
> +	for f in $sysfs/*
> +	do
> +		echo -e "Shutting down `basename $f` "
> +		echo "shutdown" > $f/state 2>/dev/null
> +	done
> +
> +	# Wait for the cards to go offline
> +	for f in $sysfs/*
> +	do
> +		while [ "`cat $f/state`" != "offline" ]
> +		do
> +			sleep 1
> +			echo -e "Waiting for "`basename $f`" to go offline"
> +		done
> +	done
> +
> +	# Display the status of the cards
> +	for f in $sysfs/*
> +	do
> +		echo -e ""`basename $f`" state: "`cat $f/state`""
> +	done
> +
> +	sleep 5
> +
> +	# Kill MPSSD now
> +	echo -n $"Killing MPSSD"
> +	killall -9 mpssd 2>/dev/null
> +	RETVAL=$?
> +	if [ $RETVAL -ne 0 ]; then
> +		failure
> +	else
> +		success
> +	fi
> +	echo
> +	return $RETVAL
> +}
> +
> +restart()
> +{
> +	stop
> +	sleep 5
> +	start
> +}
> +
> +status()
> +{
> +	if [ -d "$sysfs" ]; then
> +		for f in $sysfs/*
> +		do
> +			echo -e ""`basename $f`" state: "`cat $f/state`""
> +		done
> +	fi
> +
> +	if [ "`ps -e | awk '{print $4}' | grep mpssd | head -n 1`" = "mpssd" ]; then
> +		echo "mpssd is running"
> +	else
> +		echo "mpssd is stopped"
> +	fi
> +	return 0
> +}
> +
> +unload()
> +{
> +	if [ ! -d "$sysfs" ]; then
> +		echo -n $"No MIC_HOST Module: "
> +		killall -9 mpssd 2>/dev/null
> +		success
> +		echo
> +		return
> +	fi
> +
> +	stop
> +	RETVAL=$?
> +
> +	sleep 5
> +	echo -n $"Removing MIC_HOST Module: "
> +
> +	if [ $RETVAL = 0 ]; then
> +		sleep 1
> +		modprobe -r mic_host
> +		RETVAL=$?
> +	fi
> +
> +	if [ $RETVAL -ne 0 ]; then
> +		failure
> +	else
> +		success
> +	fi
> +	echo
> +	return $RETVAL
> +}
> +
> +case $1 in
> +	start)
> +		start
> +		;;
> +	stop)
> +		stop
> +		;;
> +	restart)
> +		restart
> +		;;
> +	status)
> +		status
> +		;;
> +	unload)
> +		unload
> +		;;
> +	*)
> +		echo $"Usage: $0 {start|stop|restart|status|unload}"
> +		exit 2
> +esac
> +
> +exit $?
> diff --git a/Documentation/mic/mpssd/mpssd.c b/Documentation/mic/mpssd/mpssd.c
> new file mode 100644
> index 0000000..3bc34cb
> --- /dev/null
> +++ b/Documentation/mic/mpssd/mpssd.c
> @@ -0,0 +1,1689 @@
> +/*
> + * Intel MIC Platform Software Stack (MPSS)
> + *
> + * Copyright(c) 2013 Intel Corporation.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License, version 2, as
> + * published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful, but
> + * WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> + * General Public License for more details.
> + *
> + * The full GNU General Public License is included in this distribution in
> + * the file called "COPYING".
> + *
> + * Intel MIC User Space Tools.
> + */
> +
> +#define _GNU_SOURCE
> +
> +#include <stdlib.h>
> +#include <fcntl.h>
> +#include <getopt.h>
> +#include <assert.h>
> +#include <unistd.h>
> +#include <stdbool.h>
> +#include <signal.h>
> +#include <poll.h>
> +#include <features.h>
> +#include <sys/types.h>
> +#include <sys/stat.h>
> +#include <sys/mman.h>
> +#include <sys/socket.h>
> +#include <linux/virtio_ring.h>
> +#include <linux/virtio_net.h>
> +#include <linux/virtio_console.h>
> +#include <linux/virtio_blk.h>
> +#include <linux/version.h>
> +#include "mpssd.h"
> +#include <linux/mic_ioctl.h>
> +#include <linux/mic_common.h>
> +
> +static void init_mic(struct mic_info *mic);
> +
> +static FILE *logfp;
> +static struct mic_info mic_list;
> +
> +#define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]))
> +
> +#define min_t(type, x, y) ({				\
> +		type __min1 = (x);                      \
> +		type __min2 = (y);                      \
> +		__min1 < __min2 ? __min1 : __min2; })
> +
> +/* align addr on a size boundary - adjust address up/down if needed */
> +#define _ALIGN_UP(addr, size)    (((addr)+((size)-1))&(~((size)-1)))
> +#define _ALIGN_DOWN(addr, size)  ((addr)&(~((size)-1)))
> +
> +/* align addr on a size boundary - adjust address up if needed */
> +#define _ALIGN(addr, size)     _ALIGN_UP(addr, size)
> +
> +/* to align the pointer to the (next) page boundary */
> +#define PAGE_ALIGN(addr)        _ALIGN(addr, PAGE_SIZE)
> +
> +#define ACCESS_ONCE(x) (*(volatile typeof(x) *)&(x))
> +
> +/* Insert REP NOP (PAUSE) in busy-wait loops. */
> +static inline void cpu_relax(void)
> +{
> +	asm volatile("rep; nop" : : : "memory");
> +}
> +
> +#define GSO_ENABLED		1
> +#define MAX_GSO_SIZE		(64 * 1024)
> +#define ETH_H_LEN		14
> +#define MAX_NET_PKT_SIZE	(_ALIGN_UP(MAX_GSO_SIZE + ETH_H_LEN, 64))
> +#define MIC_DEVICE_PAGE_END	0x1000
> +
> +#ifndef VIRTIO_NET_HDR_F_DATA_VALID
> +#define VIRTIO_NET_HDR_F_DATA_VALID	2	/* Csum is valid */
> +#endif
> +
> +static struct {
> +	struct mic_device_desc dd;
> +	struct mic_vqconfig vqconfig[2];
> +	__u32 host_features, guest_acknowledgements;
> +	struct virtio_console_config cons_config;
> +} virtcons_dev_page = {
> +	.dd = {
> +		.type = VIRTIO_ID_CONSOLE,
> +		.num_vq = ARRAY_SIZE(virtcons_dev_page.vqconfig),
> +		.feature_len = sizeof(virtcons_dev_page.host_features),
> +		.config_len = sizeof(virtcons_dev_page.cons_config),
> +	},
> +	.vqconfig[0] = {
> +		.num = htole16(MIC_VRING_ENTRIES),
> +	},
> +	.vqconfig[1] = {
> +		.num = htole16(MIC_VRING_ENTRIES),
> +	},
> +};
> +
> +static struct {
> +	struct mic_device_desc dd;
> +	struct mic_vqconfig vqconfig[2];
> +	__u32 host_features, guest_acknowledgements;
> +	struct virtio_net_config net_config;
> +} virtnet_dev_page = {
> +	.dd = {
> +		.type = VIRTIO_ID_NET,
> +		.num_vq = ARRAY_SIZE(virtnet_dev_page.vqconfig),
> +		.feature_len = sizeof(virtnet_dev_page.host_features),
> +		.config_len = sizeof(virtnet_dev_page.net_config),
> +	},
> +	.vqconfig[0] = {
> +		.num = htole16(MIC_VRING_ENTRIES),
> +	},
> +	.vqconfig[1] = {
> +		.num = htole16(MIC_VRING_ENTRIES),
> +	},
> +#if GSO_ENABLED
> +		.host_features = htole32(
> +		1 << VIRTIO_NET_F_CSUM |
> +		1 << VIRTIO_NET_F_GSO |
> +		1 << VIRTIO_NET_F_GUEST_TSO4 |
> +		1 << VIRTIO_NET_F_GUEST_TSO6 |
> +		1 << VIRTIO_NET_F_GUEST_ECN |
> +		1 << VIRTIO_NET_F_GUEST_UFO),
> +#else
> +		.host_features = 0,
> +#endif
> +};
> +
> +static const char *mic_config_dir = "/etc/sysconfig/mic";
> +static const char *virtblk_backend = "VIRTBLK_BACKEND";
> +static struct {
> +	struct mic_device_desc dd;
> +	struct mic_vqconfig vqconfig[1];
> +	__u32 host_features, guest_acknowledgements;
> +	struct virtio_blk_config blk_config;
> +} virtblk_dev_page = {
> +	.dd = {
> +		.type = VIRTIO_ID_BLOCK,
> +		.num_vq = ARRAY_SIZE(virtblk_dev_page.vqconfig),
> +		.feature_len = sizeof(virtblk_dev_page.host_features),
> +		.config_len = sizeof(virtblk_dev_page.blk_config),
> +	},
> +	.vqconfig[0] = {
> +		.num = htole16(MIC_VRING_ENTRIES),
> +	},
> +	.host_features =
> +		htole32(1<<VIRTIO_BLK_F_SEG_MAX),
> +	.blk_config = {
> +		.seg_max = htole32(MIC_VRING_ENTRIES - 2),
> +		.capacity = htole64(0),
> +	 }
> +};
> +
> +static char *myname;
> +
> +static int
> +tap_configure(struct mic_info *mic, char *dev)
> +{
> +	pid_t pid;
> +	char *ifargv[7];
> +	char ipaddr[IFNAMSIZ];
> +	int ret = 0;
> +
> +	pid = fork();
> +	if (pid == 0) {
> +		ifargv[0] = "ip";
> +		ifargv[1] = "link";
> +		ifargv[2] = "set";
> +		ifargv[3] = dev;
> +		ifargv[4] = "up";
> +		ifargv[5] = NULL;
> +		mpsslog("Configuring %s\n", dev);
> +		ret = execvp("ip", ifargv);
> +		if (ret < 0) {
> +			mpsslog("%s execvp failed errno %s\n",
> +				mic->name, strerror(errno));
> +			return ret;
> +		}
> +	}
> +	if (pid < 0) {
> +		mpsslog("%s fork failed errno %s\n",
> +			mic->name, strerror(errno));
> +		return ret;
> +	}
> +
> +	ret = waitpid(pid, NULL, 0);
> +	if (ret < 0) {
> +		mpsslog("%s waitpid failed errno %s\n",
> +			mic->name, strerror(errno));
> +		return ret;
> +	}
> +
> +	snprintf(ipaddr, IFNAMSIZ, "172.31.%d.254/24", mic->id);
> +
> +	pid = fork();
> +	if (pid == 0) {
> +		ifargv[0] = "ip";
> +		ifargv[1] = "addr";
> +		ifargv[2] = "add";
> +		ifargv[3] = ipaddr;
> +		ifargv[4] = "dev";
> +		ifargv[5] = dev;
> +		ifargv[6] = NULL;
> +		mpsslog("Configuring %s ipaddr %s\n", dev, ipaddr);
> +		ret = execvp("ip", ifargv);
> +		if (ret < 0) {
> +			mpsslog("%s execvp failed errno %s\n",
> +				mic->name, strerror(errno));
> +			return ret;
> +		}
> +	}
> +	if (pid < 0) {
> +		mpsslog("%s fork failed errno %s\n",
> +			mic->name, strerror(errno));
> +		return ret;
> +	}
> +
> +	ret = waitpid(pid, NULL, 0);
> +	if (ret < 0) {
> +		mpsslog("%s waitpid failed errno %s\n",
> +			mic->name, strerror(errno));
> +		return ret;
> +	}
> +	mpsslog("MIC name %s %s %d DONE!\n",
> +		mic->name, __func__, __LINE__);
> +	return 0;
> +}
> +
> +static int tun_alloc(struct mic_info *mic, char *dev)
> +{
> +	struct ifreq ifr;
> +	int fd, err;
> +#if GSO_ENABLED
> +	unsigned offload;
> +#endif
> +	fd = open("/dev/net/tun", O_RDWR);
> +	if (fd < 0) {
> +		mpsslog("Could not open /dev/net/tun %s\n", strerror(errno));
> +		goto done;
> +	}
> +
> +	memset(&ifr, 0, sizeof(ifr));
> +
> +	ifr.ifr_flags = IFF_TAP | IFF_NO_PI | IFF_VNET_HDR;
> +	if (*dev)
> +		strncpy(ifr.ifr_name, dev, IFNAMSIZ);
> +
> +	err = ioctl(fd, TUNSETIFF, (void *) &ifr);
> +	if (err < 0) {
> +		mpsslog("%s %s %d TUNSETIFF failed %s\n",
> +			mic->name, __func__, __LINE__, strerror(errno));
> +		close(fd);
> +		return err;
> +	}
> +#if GSO_ENABLED
> +	offload = TUN_F_CSUM | TUN_F_TSO4 | TUN_F_TSO6 |
> +		TUN_F_TSO_ECN | TUN_F_UFO;
> +
> +	err = ioctl(fd, TUNSETOFFLOAD, offload);
> +	if (err < 0) {
> +		mpsslog("%s %s %d TUNSETOFFLOAD failed %s\n",
> +			mic->name, __func__, __LINE__, strerror(errno));
> +		close(fd);
> +		return err;
> +	}
> +#endif
> +	strcpy(dev, ifr.ifr_name);
> +	mpsslog("Created TAP %s\n", dev);
> +done:
> +	return fd;
> +}
> +
> +#define NET_FD_VIRTIO_NET 0
> +#define NET_FD_TUN 1
> +#define MAX_NET_FD 2
> +
> +static void * *
> +get_dp(struct mic_info *mic, int type)
> +{
> +	switch (type) {
> +	case VIRTIO_ID_CONSOLE:
> +		return &mic->mic_console.console_dp;
> +	case VIRTIO_ID_NET:
> +		return &mic->mic_net.net_dp;
> +	case VIRTIO_ID_BLOCK:
> +		return &mic->mic_virtblk.block_dp;
> +	}
> +	mpsslog("%s %s %d not found\n", mic->name, __func__, type);
> +	assert(0);
> +	return NULL;
> +}
> +
> +static struct mic_device_desc *get_device_desc(struct mic_info *mic, int type)
> +{
> +	struct mic_device_desc *d;
> +	int i;
> +	void *dp = *get_dp(mic, type);
> +
> +	for (i = mic_aligned_size(struct mic_bootparam); i < PAGE_SIZE;
> +		i += mic_total_desc_size(d)) {
> +		d = dp + i;
> +
> +		/* End of list */
> +		if (d->type == 0)
> +			break;
> +
> +		if (d->type == -1)
> +			continue;
> +
> +		mpsslog("%s %s d-> type %d d %p\n",
> +			mic->name, __func__, d->type, d);
> +
> +		if (d->type == (__u8)type)
> +			return d;
> +	}
> +	mpsslog("%s %s %d not found\n", mic->name, __func__, type);
> +	assert(0);
> +	return NULL;
> +}
> +
> +/* See comments in vhost.c for explanation of next_desc() */
> +static unsigned next_desc(struct vring_desc *desc)
> +{
> +	unsigned int next;
> +
> +	if (!(le16toh(desc->flags) & VRING_DESC_F_NEXT))
> +		return -1U;
> +	next = le16toh(desc->next);
> +	return next;
> +}
> +
> +/* Sum up all the IOVEC length */
> +static ssize_t
> +sum_iovec_len(struct mic_copy_desc *copy)
> +{
> +	ssize_t sum = 0;
> +	int i;
> +
> +	for (i = 0; i < copy->iovcnt; i++)
> +		sum += copy->iov[i].iov_len;
> +	return sum;
> +}
> +
> +static inline void verify_out_len(struct mic_info *mic,
> +	struct mic_copy_desc *copy)
> +{
> +	if (copy->out_len != sum_iovec_len(copy)) {
> +		mpsslog("%s %s %d BUG copy->out_len 0x%x len 0x%x\n",
> +				mic->name, __func__, __LINE__,
> +				copy->out_len, sum_iovec_len(copy));
> +		assert(copy->out_len == sum_iovec_len(copy));
> +	}
> +}
> +
> +/* Display an iovec */
> +static void
> +disp_iovec(struct mic_info *mic, struct mic_copy_desc *copy,
> +	const char *s, int line)
> +{
> +	int i;
> +
> +	for (i = 0; i < copy->iovcnt; i++)
> +		mpsslog("%s %s %d copy->iov[%d] addr %p len 0x%lx\n",
> +			mic->name, s, line, i,
> +			copy->iov[i].iov_base, copy->iov[i].iov_len);
> +}
> +
> +static inline __u16 read_avail_idx(struct mic_vring *vr)
> +{
> +	return ACCESS_ONCE(vr->info->avail_idx);
> +}
> +
> +static inline void txrx_prepare(int type, bool tx, struct mic_vring *vr,
> +				struct mic_copy_desc *copy, ssize_t len)
> +{
> +	copy->vr_idx = tx ? 0 : 1;
> +	copy->update_used = true;
> +	if (type == VIRTIO_ID_NET)
> +		copy->iov[1].iov_len = len - sizeof(struct virtio_net_hdr);
> +	else
> +		copy->iov[0].iov_len = len;
> +}
> +
> +/* Central API which triggers the copies */
> +static int
> +mic_virtio_copy(struct mic_info *mic, int fd,
> +	struct mic_vring *vr, struct mic_copy_desc *copy)
> +{
> +	int ret;
> +
> +	ret = ioctl(fd, MIC_VIRTIO_COPY_DESC, copy);
> +	if (ret) {
> +		mpsslog("%s %s %d errno %s ret %d\n",
> +			mic->name, __func__, __LINE__,
> +			strerror(errno), ret);
> +	}
> +	return ret;
> +}
> +
> +/*
> + * This initialization routine requires at least one
> + * vring i.e. vr0. vr1 is optional.
> + */
> +static void *
> +init_vr(struct mic_info *mic, int fd, int type,
> +	struct mic_vring *vr0, struct mic_vring *vr1, int num_vq)
> +{
> +	int vr_size;
> +	char *va;
> +
> +	vr_size = PAGE_ALIGN(vring_size(MIC_VRING_ENTRIES,
> +		MIC_VIRTIO_RING_ALIGN) + sizeof(struct _mic_vring_info));
> +	va = mmap(NULL, MIC_DEVICE_PAGE_END + vr_size * num_vq,
> +		PROT_READ, MAP_SHARED, fd, 0);
> +	if (MAP_FAILED == va) {
> +		mpsslog("%s %s %d mmap failed errno %s\n",
> +			mic->name, __func__, __LINE__,
> +			strerror(errno));
> +		goto done;
> +	}
> +	*get_dp(mic, type) = (void *)va;
> +	vr0->va = (struct mic_vring *)&va[MIC_DEVICE_PAGE_END];
> +	vr0->info = vr0->va +
> +		vring_size(MIC_VRING_ENTRIES, MIC_VIRTIO_RING_ALIGN);
> +	vring_init(&vr0->vr,
> +		MIC_VRING_ENTRIES, vr0->va, MIC_VIRTIO_RING_ALIGN);
> +	mpsslog("%s %s vr0 %p vr0->info %p vr_size 0x%x vring 0x%x ",
> +		__func__, mic->name, vr0->va, vr0->info, vr_size,
> +		vring_size(MIC_VRING_ENTRIES, MIC_VIRTIO_RING_ALIGN));
> +	mpsslog("magic 0x%x expected 0x%x\n",
> +		vr0->info->magic, MIC_MAGIC + type + 0);
> +	assert(vr0->info->magic == MIC_MAGIC + type + 0);
> +	if (vr1) {
> +		vr1->va = (struct mic_vring *)
> +			&va[MIC_DEVICE_PAGE_END + vr_size];
> +		vr1->info = vr1->va + vring_size(MIC_VRING_ENTRIES,
> +			MIC_VIRTIO_RING_ALIGN);
> +		vring_init(&vr1->vr,
> +			MIC_VRING_ENTRIES, vr1->va, MIC_VIRTIO_RING_ALIGN);
> +		mpsslog("%s %s vr1 %p vr1->info %p vr_size 0x%x vring 0x%x ",
> +			__func__, mic->name, vr1->va, vr1->info, vr_size,
> +			vring_size(MIC_VRING_ENTRIES, MIC_VIRTIO_RING_ALIGN));
> +		mpsslog("magic 0x%x expected 0x%x\n",
> +			vr1->info->magic, MIC_MAGIC + type + 1);
> +		assert(vr1->info->magic == MIC_MAGIC + type + 1);
> +	}
> +done:
> +	return va;
> +}
> +
> +static void
> +uninit_vr(struct mic_info *mic, int num_vq)
> +{
> +	int vr_size, ret;
> +
> +	vr_size = PAGE_ALIGN(vring_size(MIC_VRING_ENTRIES,
> +		MIC_VIRTIO_RING_ALIGN) + sizeof(struct _mic_vring_info));
> +	ret = munmap(mic->mic_virtblk.block_dp,
> +		MIC_DEVICE_PAGE_END + vr_size * num_vq);
> +	if (ret < 0)
> +		mpsslog("%s munmap errno %d\n", mic->name, errno);
> +}
> +
> +static void
> +wait_for_card_driver(struct mic_info *mic, int fd, int type)
> +{
> +	struct pollfd pollfd;
> +	int err;
> +	struct mic_device_desc *desc = get_device_desc(mic, type);
> +
> +	pollfd.fd = fd;
> +	mpsslog("%s %s Waiting .... desc-> type %d status 0x%x\n",
> +		mic->name, __func__, type, desc->status);
> +	while (1) {
> +		pollfd.events = POLLIN;
> +		pollfd.revents = 0;
> +		err = poll(&pollfd, 1, -1);
> +		if (err < 0) {
> +			mpsslog("%s %s poll failed %s\n",
> +				mic->name, __func__, strerror(errno));
> +			continue;
> +		}
> +
> +		if (pollfd.revents) {
> +			mpsslog("%s %s Waiting... desc-> type %d status 0x%x\n",
> +				mic->name, __func__, type, desc->status);
> +			if (desc->status & VIRTIO_CONFIG_S_DRIVER_OK) {
> +				mpsslog("%s %s poll.revents %d\n",
> +					mic->name, __func__, pollfd.revents);
> +				mpsslog("%s %s desc-> type %d status 0x%x\n",
> +					mic->name, __func__, type,
> +					desc->status);
> +				break;
> +			}
> +		}
> +	}
> +}
> +
> +/* Spin till we have some descriptors */
> +static void
> +wait_for_descriptors(struct mic_info *mic, struct mic_vring *vr)
> +{
> +	__u16 avail_idx = read_avail_idx(vr);
> +
> +	while (avail_idx == le16toh(ACCESS_ONCE(vr->vr.avail->idx))) {
> +#ifdef DEBUG
> +		mpsslog("%s %s waiting for desc avail %d info_avail %d\n",
> +			mic->name, __func__,
> +			le16toh(vr->vr.avail->idx), vr->info->avail_idx);
> +#endif
> +		cpu_relax();
> +	}
> +}
> +
> +static void *
> +virtio_net(void *arg)
> +{
> +	static __u8 vnet_hdr[2][sizeof(struct virtio_net_hdr)];
> +	static __u8 vnet_buf[2][MAX_NET_PKT_SIZE] __aligned(64);
> +	struct iovec vnet_iov[2][2] = {
> +		{ { .iov_base = vnet_hdr[0], .iov_len = sizeof(vnet_hdr[0]) },
> +		  { .iov_base = vnet_buf[0], .iov_len = sizeof(vnet_buf[0]) } },
> +		{ { .iov_base = vnet_hdr[1], .iov_len = sizeof(vnet_hdr[1]) },
> +		  { .iov_base = vnet_buf[1], .iov_len = sizeof(vnet_buf[1]) } },
> +	};
> +	struct iovec *iov0 = vnet_iov[0], *iov1 = vnet_iov[1];
> +	struct mic_info *mic = (struct mic_info *)arg;
> +	char if_name[IFNAMSIZ];
> +	struct pollfd net_poll[MAX_NET_FD];
> +	struct mic_vring tx_vr, rx_vr;
> +	struct mic_copy_desc copy;
> +	struct mic_device_desc *desc;
> +	int err;
> +
> +	snprintf(if_name, IFNAMSIZ, "mic%d", mic->id);
> +	mic->mic_net.tap_fd = tun_alloc(mic, if_name);
> +	if (mic->mic_net.tap_fd < 0)
> +		goto done;
> +
> +	if (tap_configure(mic, if_name))
> +		goto done;
> +	mpsslog("MIC name %s id %d\n", mic->name, mic->id);
> +
> +	net_poll[NET_FD_VIRTIO_NET].fd = mic->mic_net.virtio_net_fd;
> +	net_poll[NET_FD_VIRTIO_NET].events = POLLIN;
> +	net_poll[NET_FD_TUN].fd = mic->mic_net.tap_fd;
> +	net_poll[NET_FD_TUN].events = POLLIN;
> +
> +	if (MAP_FAILED == init_vr(mic, mic->mic_net.virtio_net_fd,
> +		VIRTIO_ID_NET, &tx_vr, &rx_vr,
> +		virtnet_dev_page.dd.num_vq)) {
> +		mpsslog("%s init_vr failed %s\n",
> +			mic->name, strerror(errno));
> +		goto done;
> +	}
> +
> +	copy.iovcnt = 2;
> +	desc = get_device_desc(mic, VIRTIO_ID_NET);
> +
> +	while (1) {
> +		ssize_t len;
> +
> +		net_poll[NET_FD_VIRTIO_NET].revents = 0;
> +		net_poll[NET_FD_TUN].revents = 0;
> +
> +		/* Start polling for data from tap and virtio net */
> +		err = poll(net_poll, 2, -1);
> +		if (err < 0) {
> +			mpsslog("%s poll failed %s\n",
> +				__func__, strerror(errno));
> +			continue;
> +		}
> +		if (!(desc->status & VIRTIO_CONFIG_S_DRIVER_OK))
> +			wait_for_card_driver(mic, mic->mic_net.virtio_net_fd,
> +					VIRTIO_ID_NET);
> +		/*
> +		 * Check if there is data to be read from TUN and write to
> +		 * virtio net fd if there is.
> +		 */
> +		if (net_poll[NET_FD_TUN].revents & POLLIN) {
> +			copy.iov = iov0;
> +			len = readv(net_poll[NET_FD_TUN].fd,
> +				copy.iov, copy.iovcnt);
> +			if (len > 0) {
> +				struct virtio_net_hdr *hdr
> +					= (struct virtio_net_hdr *) vnet_hdr[0];
> +
> +				/* Disable checksums on the card since we are on
> +				   a reliable PCIe link */
> +				hdr->flags |= VIRTIO_NET_HDR_F_DATA_VALID;
> +#ifdef DEBUG
> +				mpsslog("%s %s %d hdr->flags 0x%x ", mic->name,
> +					__func__, __LINE__, hdr->flags);
> +				mpsslog("copy.out_len %d hdr->gso_type 0x%x\n",
> +					copy.out_len, hdr->gso_type);
> +#endif
> +#ifdef DEBUG
> +				disp_iovec(mic, copy, __func__, __LINE__);
> +				mpsslog("%s %s %d read from tap 0x%lx\n",
> +					mic->name, __func__, __LINE__,
> +					len);
> +#endif
> +				wait_for_descriptors(mic, &tx_vr);
> +				txrx_prepare(VIRTIO_ID_NET, 1, &tx_vr, &copy,
> +					len);
> +
> +				err = mic_virtio_copy(mic,
> +					mic->mic_net.virtio_net_fd, &tx_vr,
> +					&copy);
> +				if (err < 0) {
> +					mpsslog("%s %s %d mic_virtio_copy %s\n",
> +						mic->name, __func__, __LINE__,
> +						strerror(errno));
> +				}
> +				if (!err)
> +					verify_out_len(mic, &copy);
> +#ifdef DEBUG
> +				disp_iovec(mic, copy, __func__, __LINE__);
> +				mpsslog("%s %s %d wrote to net 0x%lx\n",
> +					mic->name, __func__, __LINE__,
> +					sum_iovec_len(&copy));
> +#endif
> +				/* Reinitialize IOV for next run */
> +				iov0[1].iov_len = MAX_NET_PKT_SIZE;
> +			} else if (len < 0) {
> +				disp_iovec(mic, &copy, __func__, __LINE__);
> +				mpsslog("%s %s %d read failed %s ", mic->name,
> +					__func__, __LINE__, strerror(errno));
> +				mpsslog("cnt %d sum %d\n",
> +					copy.iovcnt, sum_iovec_len(&copy));
> +			}
> +		}
> +
> +		/*
> +		 * Check if there is data to be read from virtio net and
> +		 * write to TUN if there is.
> +		 */
> +		if (net_poll[NET_FD_VIRTIO_NET].revents & POLLIN) {
> +			while (rx_vr.info->avail_idx !=
> +				le16toh(rx_vr.vr.avail->idx)) {
> +				copy.iov = iov1;
> +				txrx_prepare(VIRTIO_ID_NET, 0, &rx_vr, &copy,
> +					MAX_NET_PKT_SIZE
> +					+ sizeof(struct virtio_net_hdr));
> +
> +				err = mic_virtio_copy(mic,
> +					mic->mic_net.virtio_net_fd, &rx_vr,
> +					&copy);
> +				if (!err) {
> +#ifdef DEBUG
> +					struct virtio_net_hdr *hdr
> +						= (struct virtio_net_hdr *)
> +							vnet_hdr[1];
> +
> +					mpsslog("%s %s %d hdr->flags 0x%x, ",
> +						mic->name, __func__, __LINE__,
> +						hdr->flags);
> +					mpsslog("out_len %d gso_type 0x%x\n",
> +						copy.out_len,
> +						hdr->gso_type);
> +#endif
> +					/* Set the correct output iov_len */
> +					iov1[1].iov_len = copy.out_len -
> +						sizeof(struct virtio_net_hdr);
> +					verify_out_len(mic, &copy);
> +#ifdef DEBUG
> +					disp_iovec(mic, copy, __func__,
> +						__LINE__);
> +					mpsslog("%s %s %d ",
> +						mic->name, __func__, __LINE__);
> +					mpsslog("read from net 0x%lx\n",
> +						sum_iovec_len(copy));
> +#endif
> +					len = writev(net_poll[NET_FD_TUN].fd,
> +						copy.iov, copy.iovcnt);
> +					if (len != sum_iovec_len(&copy)) {
> +						mpsslog("Tun write failed %s ",
> +							strerror(errno));
> +						mpsslog("len 0x%x ", len);
> +						mpsslog("read_len 0x%x\n",
> +							sum_iovec_len(&copy));
> +					} else {
> +#ifdef DEBUG
> +						disp_iovec(mic, &copy, __func__,
> +							__LINE__);
> +						mpsslog("%s %s %d ",
> +							mic->name, __func__,
> +							__LINE__);
> +						mpsslog("wrote to tap 0x%lx\n",
> +							len);
> +#endif
> +					}
> +				} else {
> +					mpsslog("%s %s %d mic_virtio_copy %s\n",
> +						mic->name, __func__, __LINE__,
> +						strerror(errno));
> +					break;
> +				}
> +			}
> +		}
> +		if (net_poll[NET_FD_VIRTIO_NET].revents & POLLERR) {
> +			mpsslog("%s: %s: POLLERR\n", __func__, mic->name);
> +			sleep(1);
> +		}
> +	}
> +done:
> +	pthread_exit(NULL);
> +}
> +
> +/* virtio_console */
> +#define VIRTIO_CONSOLE_FD 0
> +#define MONITOR_FD (VIRTIO_CONSOLE_FD + 1)
> +#define MAX_CONSOLE_FD (MONITOR_FD + 1)  /* must be the last one + 1 */
> +#define MAX_BUFFER_SIZE PAGE_SIZE
> +
> +static void *
> +virtio_console(void *arg)
> +{
> +	static __u8 vcons_buf[2][PAGE_SIZE];
> +	struct iovec vcons_iov[2] = {
> +		{ .iov_base = vcons_buf[0], .iov_len = sizeof(vcons_buf[0]) },
> +		{ .iov_base = vcons_buf[1], .iov_len = sizeof(vcons_buf[1]) },
> +	};
> +	struct iovec *iov0 = &vcons_iov[0], *iov1 = &vcons_iov[1];
> +	struct mic_info *mic = (struct mic_info *)arg;
> +	int err;
> +	struct pollfd console_poll[MAX_CONSOLE_FD];
> +	int pty_fd;
> +	char *pts_name;
> +	ssize_t len;
> +	struct mic_vring tx_vr, rx_vr;
> +	struct mic_copy_desc copy;
> +	struct mic_device_desc *desc;
> +
> +	pty_fd = posix_openpt(O_RDWR);
> +	if (pty_fd < 0) {
> +		mpsslog("can't open a pseudoterminal master device: %s\n",
> +			strerror(errno));
> +		goto _return;
> +	}
> +	pts_name = ptsname(pty_fd);
> +	if (pts_name == NULL) {
> +		mpsslog("can't get pts name\n");
> +		goto _close_pty;
> +	}
> +	printf("%s console message goes to %s\n", mic->name, pts_name);
> +	mpsslog("%s console message goes to %s\n", mic->name, pts_name);
> +	err = grantpt(pty_fd);
> +	if (err < 0) {
> +		mpsslog("can't grant access: %s %s\n",
> +				pts_name, strerror(errno));
> +		goto _close_pty;
> +	}
> +	err = unlockpt(pty_fd);
> +	if (err < 0) {
> +		mpsslog("can't unlock a pseudoterminal: %s %s\n",
> +				pts_name, strerror(errno));
> +		goto _close_pty;
> +	}
> +	console_poll[MONITOR_FD].fd = pty_fd;
> +	console_poll[MONITOR_FD].events = POLLIN;
> +
> +	console_poll[VIRTIO_CONSOLE_FD].fd = mic->mic_console.virtio_console_fd;
> +	console_poll[VIRTIO_CONSOLE_FD].events = POLLIN;
> +
> +	if (MAP_FAILED == init_vr(mic, mic->mic_console.virtio_console_fd,
> +		VIRTIO_ID_CONSOLE, &tx_vr, &rx_vr,
> +		virtcons_dev_page.dd.num_vq)) {
> +		mpsslog("%s init_vr failed %s\n",
> +			mic->name, strerror(errno));
> +		goto _close_pty;
> +	}
> +
> +	copy.iovcnt = 1;
> +	desc = get_device_desc(mic, VIRTIO_ID_CONSOLE);
> +
> +	for (;;) {
> +		console_poll[MONITOR_FD].revents = 0;
> +		console_poll[VIRTIO_CONSOLE_FD].revents = 0;
> +		err = poll(console_poll, MAX_CONSOLE_FD, -1);
> +		if (err < 0) {
> +			mpsslog("%s %d: poll failed: %s\n", __func__, __LINE__,
> +				strerror(errno));
> +			continue;
> +		}
> +		if (!(desc->status & VIRTIO_CONFIG_S_DRIVER_OK))
> +			wait_for_card_driver(mic,
> +				mic->mic_console.virtio_console_fd,
> +				VIRTIO_ID_CONSOLE);
> +
> +		if (console_poll[MONITOR_FD].revents & POLLIN) {
> +			copy.iov = iov0;
> +			len = readv(pty_fd, copy.iov, copy.iovcnt);
> +			if (len > 0) {
> +#ifdef DEBUG
> +				disp_iovec(mic, copy, __func__, __LINE__);
> +				mpsslog("%s %s %d read from tap 0x%lx\n",
> +					mic->name, __func__, __LINE__,
> +					len);
> +#endif
> +				wait_for_descriptors(mic, &tx_vr);
> +				txrx_prepare(VIRTIO_ID_CONSOLE, 1, &tx_vr,
> +					&copy, len);
> +
> +				err = mic_virtio_copy(mic,
> +					mic->mic_console.virtio_console_fd,
> +					&tx_vr, &copy);
> +				if (err < 0) {
> +					mpsslog("%s %s %d mic_virtio_copy %s\n",
> +						mic->name, __func__, __LINE__,
> +						strerror(errno));
> +				}
> +				if (!err)
> +					verify_out_len(mic, &copy);
> +#ifdef DEBUG
> +				disp_iovec(mic, copy, __func__, __LINE__);
> +				mpsslog("%s %s %d wrote to net 0x%lx\n",
> +					mic->name, __func__, __LINE__,
> +					sum_iovec_len(copy));
> +#endif
> +				/* Reinitialize IOV for next run */
> +				iov0->iov_len = PAGE_SIZE;
> +			} else if (len < 0) {
> +				disp_iovec(mic, &copy, __func__, __LINE__);
> +				mpsslog("%s %s %d read failed %s ",
> +					mic->name, __func__, __LINE__,
> +					strerror(errno));
> +				mpsslog("cnt %d sum %d\n",
> +					copy.iovcnt, sum_iovec_len(&copy));
> +			}
> +		}
> +
> +		if (console_poll[VIRTIO_CONSOLE_FD].revents & POLLIN) {
> +			while (rx_vr.info->avail_idx !=
> +				le16toh(rx_vr.vr.avail->idx)) {
> +				copy.iov = iov1;
> +				txrx_prepare(VIRTIO_ID_CONSOLE, 0, &rx_vr,
> +					&copy, PAGE_SIZE);
> +
> +				err = mic_virtio_copy(mic,
> +					mic->mic_console.virtio_console_fd,
> +					&rx_vr, &copy);
> +				if (!err) {
> +					/* Set the correct output iov_len */
> +					iov1->iov_len = copy.out_len;
> +					verify_out_len(mic, &copy);
> +#ifdef DEBUG
> +					disp_iovec(mic, copy, __func__,
> +						__LINE__);
> +					mpsslog("%s %s %d ",
> +						mic->name, __func__, __LINE__);
> +					mpsslog("read from net 0x%lx\n",
> +						sum_iovec_len(copy));
> +#endif
> +					len = writev(pty_fd,
> +						copy.iov, copy.iovcnt);
> +					if (len != sum_iovec_len(&copy)) {
> +						mpsslog("pty write failed %s ",
> +							strerror(errno));
> +						mpsslog("len 0x%x ", len);
> +						mpsslog("read_len 0x%x\n",
> +							sum_iovec_len(&copy));
> +					} else {
> +#ifdef DEBUG
> +						disp_iovec(mic, copy, __func__,
> +							__LINE__);
> +						mpsslog("%s %s %d ",
> +							mic->name, __func__,
> +							__LINE__);
> +						mpsslog("wrote to tap 0x%lx\n",
> +							len);
> +#endif
> +					}
> +				} else {
> +					mpsslog("%s %s %d mic_virtio_copy %s\n",
> +						mic->name, __func__, __LINE__,
> +						strerror(errno));
> +					break;
> +				}
> +			}
> +		}
> +		if (console_poll[VIRTIO_CONSOLE_FD].revents & POLLERR) {
> +			mpsslog("%s: %s: POLLERR\n", __func__, mic->name);
> +			sleep(1);
> +		}
> +	}
> +_close_pty:
> +	close(pty_fd);
> +_return:
> +	pthread_exit(NULL);
> +}
> +
> +static void
> +add_virtio_device(struct mic_info *mic, struct mic_device_desc *dd)
> +{
> +	char path[PATH_MAX];
> +	int fd, err;
> +
> +	snprintf(path, PATH_MAX, "/dev/mic%d", mic->id);
> +	fd = open(path, O_RDWR);
> +	if (fd < 0) {
> +		mpsslog("Could not open %s %s\n", path, strerror(errno));
> +		return;
> +	}
> +
> +	err = ioctl(fd, MIC_VIRTIO_ADD_DEVICE, dd);
> +	if (err < 0) {
> +		mpsslog("Could not add %d %s\n", dd->type, strerror(errno));
> +		close(fd);
> +		return;
> +	}
> +	switch (dd->type) {
> +	case VIRTIO_ID_NET:
> +		mic->mic_net.virtio_net_fd = fd;
> +		mpsslog("Added VIRTIO_ID_NET for %s\n", mic->name);
> +		break;
> +	case VIRTIO_ID_CONSOLE:
> +		mic->mic_console.virtio_console_fd = fd;
> +		mpsslog("Added VIRTIO_ID_CONSOLE for %s\n", mic->name);
> +		break;
> +	case VIRTIO_ID_BLOCK:
> +		mic->mic_virtblk.virtio_block_fd = fd;
> +		mpsslog("Added VIRTIO_ID_BLOCK for %s\n", mic->name);
> +		break;
> +	}
> +}
> +
> +static bool
> +set_backend_file(struct mic_info *mic)
> +{
> +	FILE *config;
> +	char buff[PATH_MAX], *line, *evv, *p;
> +
> +	snprintf(buff, PATH_MAX, "%s/mpssd%03d.conf", mic_config_dir, mic->id);
> +	config = fopen(buff, "r");
> +	if (config == NULL)
> +		return false;
> +	do {  /* look for "virtblk_backend=XXXX" */
> +		line = fgets(buff, PATH_MAX, config);
> +		if (line == NULL)
> +			break;
> +		if (*line == '#')
> +			continue;
> +		p = strchr(line, '\n');
> +		if (p)
> +			*p = '\0';
> +	} while (strncmp(line, virtblk_backend, strlen(virtblk_backend)) != 0);
> +	fclose(config);
> +	if (line == NULL)
> +		return false;
> +	evv = strchr(line, '=');
> +	if (evv == NULL)
> +		return false;
> +	mic->mic_virtblk.backend_file = malloc(strlen(evv));
> +	if (mic->mic_virtblk.backend_file == NULL) {
> +		mpsslog("%s: can't allocate memory\n", mic->name);
> +		return false;
> +	}
> +	strcpy(mic->mic_virtblk.backend_file, evv + 1);
> +	return true;
> +}
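
set_backend_file() accepts any line that merely starts with the key, and it
sizes the allocation from strlen(evv), which is easy to misread. A minimal
sketch of a stricter variant follows. It assumes a hypothetical helper named
config_value(), which is not part of this patch, and it requires '=' directly
after the key and strips the trailing newline itself:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: if line has the form "<key>=<value>", return a
 * malloc'd copy of <value> (the caller frees it); otherwise return NULL.
 */
static char *config_value(const char *line, const char *key)
{
	size_t klen;
	size_t vlen;
	char *value;

	assert(line && key);
	klen = strlen(key);
	if (strncmp(line, key, klen) != 0 || line[klen] != '=')
		return NULL;
	/* The value runs up to the newline or the end of the string. */
	vlen = strcspn(line + klen + 1, "\n");
	value = malloc(vlen + 1);
	if (!value)
		return NULL;
	memcpy(value, line + klen + 1, vlen);
	value[vlen] = '\0';
	return value;
}
```

With something like this, the caller would only loop over fgets() and stop at
the first non-NULL result.
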
> +
> +#define SECTOR_SIZE 512
> +static bool
> +set_backend_size(struct mic_info *mic)
> +{
> +	mic->mic_virtblk.backend_size = lseek(mic->mic_virtblk.backend, 0,
> +		SEEK_END);
> +	if (mic->mic_virtblk.backend_size < 0) {
> +		mpsslog("%s: can't seek: %s\n",
> +			mic->name, mic->mic_virtblk.backend_file);
> +		return false;
> +	}
> +	virtblk_dev_page.blk_config.capacity =
> +		mic->mic_virtblk.backend_size / SECTOR_SIZE;
> +	if ((mic->mic_virtblk.backend_size % SECTOR_SIZE) != 0)
> +		virtblk_dev_page.blk_config.capacity++;
> +
> +	virtblk_dev_page.blk_config.capacity =
> +		htole64(virtblk_dev_page.blk_config.capacity);
> +
> +	return true;
> +}
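
The capacity computation above rounds the backend size up to a whole number
of 512-byte sectors in two steps. A minimal sketch of the same rounding as one
expression, assuming a hypothetical helper name size_to_sectors() that does
not exist in the patch:

```c
#include <assert.h>
#include <stdint.h>

#define SECTOR_SIZE 512

/*
 * Hypothetical helper: number of sectors needed to hold size bytes. A
 * partial trailing sector counts as a full one. Written as divide plus
 * remainder test so that it cannot overflow for large sizes.
 */
static uint64_t size_to_sectors(uint64_t size)
{
	return size / SECTOR_SIZE + (size % SECTOR_SIZE != 0);
}
```

The result would still need htole64() before it is stored in the config
space, as the patch already does.
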
> +
> +static bool
> +open_backend(struct mic_info *mic)
> +{
> +	if (!set_backend_file(mic))
> +		goto _error_exit;
> +	mic->mic_virtblk.backend = open(mic->mic_virtblk.backend_file, O_RDWR);
> +	if (mic->mic_virtblk.backend < 0) {
> +		mpsslog("%s: can't open: %s\n", mic->name,
> +			mic->mic_virtblk.backend_file);
> +		goto _error_free;
> +	}
> +	if (!set_backend_size(mic))
> +		goto _error_close;
> +	mic->mic_virtblk.backend_addr = mmap(NULL,
> +		mic->mic_virtblk.backend_size,
> +		PROT_READ|PROT_WRITE, MAP_SHARED,
> +		mic->mic_virtblk.backend, 0L);
> +	if (mic->mic_virtblk.backend_addr == MAP_FAILED) {
> +		mpsslog("%s: can't map: %s %s\n",
> +			mic->name, mic->mic_virtblk.backend_file,
> +			strerror(errno));
> +		goto _error_close;
> +	}
> +	return true;
> +
> + _error_close:
> +	close(mic->mic_virtblk.backend);
> + _error_free:
> +	free(mic->mic_virtblk.backend_file);
> + _error_exit:
> +	return false;
> +}
> +
> +static void
> +close_backend(struct mic_info *mic)
> +{
> +	munmap(mic->mic_virtblk.backend_addr, mic->mic_virtblk.backend_size);
> +	close(mic->mic_virtblk.backend);
> +	free(mic->mic_virtblk.backend_file);
> +}
> +
> +static bool
> +start_virtblk(struct mic_info *mic, struct mic_vring *vring)
> +{
> +	if (((__u64)&virtblk_dev_page.blk_config % 8) != 0) {
> +		mpsslog("%s: blk_config is not 8 byte aligned.\n",
> +			mic->name);
> +		return false;
> +	}
> +	add_virtio_device(mic, &virtblk_dev_page.dd);
> +	if (MAP_FAILED == init_vr(mic, mic->mic_virtblk.virtio_block_fd,
> +		VIRTIO_ID_BLOCK, vring, NULL, virtblk_dev_page.dd.num_vq)) {
> +		mpsslog("%s init_vr failed %s\n",
> +			mic->name, strerror(errno));
> +		return false;
> +	}
> +	return true;
> +}
> +
> +static void
> +stop_virtblk(struct mic_info *mic)
> +{
> +	uninit_vr(mic, virtblk_dev_page.dd.num_vq);
> +	close(mic->mic_virtblk.virtio_block_fd);
> +}
> +
> +static __u8
> +header_error_check(struct vring_desc *desc)
> +{
> +	if (le32toh(desc->len) != sizeof(struct virtio_blk_outhdr)) {
> +		mpsslog("%s() %d: length is not sizeof(virtio_blk_outhdr)\n",
> +				__func__, __LINE__);
> +		return -EIO;
> +	}
> +	if (!(le16toh(desc->flags) & VRING_DESC_F_NEXT)) {
> +		mpsslog("%s() %d: alone\n",
> +			__func__, __LINE__);
> +		return -EIO;
> +	}
> +	if (le16toh(desc->flags) & VRING_DESC_F_WRITE) {
> +		mpsslog("%s() %d: not read\n",
> +			__func__, __LINE__);
> +		return -EIO;
> +	}
> +	return 0;
> +}
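
header_error_check() is declared to return __u8 but returns -EIO, so the
caller sees a truncated errno rather than a virtio block status. A sketch of
an explicit mapping follows. The helper name errno_to_blk_status() and the
SKETCH_* names are assumptions made for illustration; the numeric values
mirror VIRTIO_BLK_S_OK, VIRTIO_BLK_S_IOERR and VIRTIO_BLK_S_UNSUPP:

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/*
 * Same values as VIRTIO_BLK_S_* in <linux/virtio_blk.h>, repeated here so
 * that the sketch builds without kernel headers.
 */
#define SKETCH_BLK_S_OK		0
#define SKETCH_BLK_S_IOERR	1
#define SKETCH_BLK_S_UNSUPP	2

/* Map 0 or a negative errno to the one-byte status the driver expects. */
static uint8_t errno_to_blk_status(int err)
{
	assert(err <= 0);
	if (err == 0)
		return SKETCH_BLK_S_OK;
	if (err == -ENOTSUP)
		return SKETCH_BLK_S_UNSUPP;
	return SKETCH_BLK_S_IOERR;
}
```

The check functions could then return int, with the conversion done once,
right before write_status().
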
> +
> +static int
> +read_header(int fd, struct virtio_blk_outhdr *hdr, __u32 desc_idx)
> +{
> +	struct iovec iovec;
> +	struct mic_copy_desc copy;
> +
> +	iovec.iov_len = sizeof(*hdr);
> +	iovec.iov_base = hdr;
> +	copy.iov = &iovec;
> +	copy.iovcnt = 1;
> +	copy.vr_idx = 0;  /* only one vring on virtio_block */
> +	copy.update_used = false;  /* do not update used index */
> +	return ioctl(fd, MIC_VIRTIO_COPY_DESC, &copy);
> +}
> +
> +static int
> +transfer_blocks(int fd, struct iovec *iovec, __u32 iovcnt)
> +{
> +	struct mic_copy_desc copy;
> +
> +	copy.iov = iovec;
> +	copy.iovcnt = iovcnt;
> +	copy.vr_idx = 0;  /* only one vring on virtio_block */
> +	copy.update_used = false;  /* do not update used index */
> +	return ioctl(fd, MIC_VIRTIO_COPY_DESC, &copy);
> +}
> +
> +static __u8
> +status_error_check(struct vring_desc *desc)
> +{
> +	if (le32toh(desc->len) != sizeof(__u8)) {
> +		mpsslog("%s() %d: length is not sizeof(status)\n",
> +			__func__, __LINE__);
> +		return -EIO;
> +	}
> +	return 0;
> +}
> +
> +static int
> +write_status(int fd, __u8 *status)
> +{
> +	struct iovec iovec;
> +	struct mic_copy_desc copy;
> +
> +	iovec.iov_base = status;
> +	iovec.iov_len = sizeof(*status);
> +	copy.iov = &iovec;
> +	copy.iovcnt = 1;
> +	copy.vr_idx = 0;  /* only one vring on virtio_block */
> +	copy.update_used = true; /* Update used index */
> +	return ioctl(fd, MIC_VIRTIO_COPY_DESC, &copy);
> +}
> +
> +static void *
> +virtio_block(void *arg)
> +{
> +	struct mic_info *mic = (struct mic_info *) arg;
> +	int ret;
> +	struct pollfd block_poll;
> +	struct mic_vring vring;
> +	__u16 avail_idx;
> +	__u32 desc_idx;
> +	struct vring_desc *desc;
> +	struct iovec *iovec, *piov;
> +	__u8 status;
> +	__u32 buffer_desc_idx;
> +	struct virtio_blk_outhdr hdr;
> +	void *fos;
> +
> +	for (;;) {  /* forever */
> +		if (!open_backend(mic)) { /* No virtblk */
> +			for (mic->mic_virtblk.signaled = 0;
> +				!mic->mic_virtblk.signaled;)
> +				sleep(1);
> +			continue;
> +		}
> +
> +		/* backend file is specified. */
> +		if (!start_virtblk(mic, &vring))
> +			goto _close_backend;
> +		iovec = malloc(sizeof(*iovec) *
> +			le32toh(virtblk_dev_page.blk_config.seg_max));
> +		if (!iovec) {
> +			mpsslog("%s: can't alloc iovec: %s\n",
> +				mic->name, strerror(ENOMEM));
> +			goto _stop_virtblk;
> +		}
> +
> +		block_poll.fd = mic->mic_virtblk.virtio_block_fd;
> +		block_poll.events = POLLIN;
> +		for (mic->mic_virtblk.signaled = 0;
> +		     !mic->mic_virtblk.signaled;) {
> +			block_poll.revents = 0;
> +			/* time out after 1 sec to recheck signaled */
> +			ret = poll(&block_poll, 1, 1000);
> +			if (ret < 0) {
> +				mpsslog("%s %d: poll failed: %s\n",
> +					__func__, __LINE__,
> +					strerror(errno));
> +				continue;
> +			}
> +
> +			if (!(block_poll.revents & POLLIN)) {
> +#ifdef DEBUG
> +				mpsslog("%s %d: block_poll.revents=0x%x\n",
> +					__func__, __LINE__, block_poll.revents);
> +				sleep(1);
> +#endif
> +				continue;
> +			}
> +
> +			/* POLLIN */
> +			while (vring.info->avail_idx !=
> +				le16toh(vring.vr.avail->idx)) {
> +				/* read header element */
> +				avail_idx =
> +					vring.info->avail_idx &
> +					(vring.vr.num - 1);
> +				desc_idx = le16toh(
> +					vring.vr.avail->ring[avail_idx]);
> +				desc = &vring.vr.desc[desc_idx];
> +#ifdef DEBUG
> +				mpsslog("%s() %d: avail_idx=%d ",
> +					__func__, __LINE__,
> +					vring.info->avail_idx);
> +				mpsslog("vring.vr.num=%d desc=%p\n",
> +					vring.vr.num, desc);
> +#endif
> +				status = header_error_check(desc);
> +				ret = read_header(
> +					mic->mic_virtblk.virtio_block_fd,
> +					&hdr, desc_idx);
> +				if (ret < 0) {
> +					mpsslog("%s() %d %s: ret=%d %s\n",
> +						__func__, __LINE__,
> +						mic->name, ret,
> +						strerror(errno));
> +					break;
> +				}
> +				/* buffer element */
> +				piov = iovec;
> +				status = 0;
> +				fos = mic->mic_virtblk.backend_addr +
> +					(hdr.sector * SECTOR_SIZE);
> +				buffer_desc_idx = desc_idx =
> +					next_desc(desc);
> +				for (desc = &vring.vr.desc[buffer_desc_idx];
> +				     desc->flags & VRING_DESC_F_NEXT;
> +				     desc_idx = next_desc(desc),
> +					     desc = &vring.vr.desc[desc_idx]) {
> +					piov->iov_len = desc->len;
> +					piov->iov_base = fos;
> +					piov++;
> +					fos += desc->len;
> +				}
> +				/* Returning NULLs for VIRTIO_BLK_T_GET_ID. */
> +				if (hdr.type & ~(VIRTIO_BLK_T_OUT |
> +					VIRTIO_BLK_T_GET_ID)) {
> +					/*
> +					  VIRTIO_BLK_T_IN - does not do
> +					  anything. Probably for documenting.
> +					  VIRTIO_BLK_T_SCSI_CMD - for
> +					  virtio_scsi.
> +					  VIRTIO_BLK_T_FLUSH - turned off in
> +					  config space.
> +					  VIRTIO_BLK_T_BARRIER - defined but not
> +					  used anywhere.
> +					*/
> +					mpsslog("%s() %d: type %x ",
> +						__func__, __LINE__,
> +						hdr.type);
> +					mpsslog("is not supported\n");
> +					status = -ENOTSUP;
> +
> +				} else {
> +					ret = transfer_blocks(
> +					mic->mic_virtblk.virtio_block_fd,
> +						iovec,
> +						piov - iovec);
> +					if (ret < 0 &&
> +						status == 0)
> +						status = ret;
> +				}
> +				/* write status and update used pointer */
> +				if (status == 0)
> +					status = status_error_check(desc);
> +				ret = write_status(
> +					mic->mic_virtblk.virtio_block_fd,
> +					&status);
> +#ifdef DEBUG
> +				mpsslog("%s() %d: write status=%d on desc=%p\n",
> +					__func__, __LINE__,
> +					status, desc);
> +#endif
> +			}
> +		}
> +		free(iovec);
> +_stop_virtblk:
> +		stop_virtblk(mic);
> +_close_backend:
> +		close_backend(mic);
> +	}  /* forever */
> +
> +	pthread_exit(NULL);
> +}
> +
> +static void
> +reset(struct mic_info *mic)
> +{
> +#define RESET_TIMEOUT 120
> +	int i = RESET_TIMEOUT;
> +	setsysfs(mic->name, "state", "reset");
> +	while (i) {
> +		char *state;
> +		state = readsysfs(mic->name, "state");
> +		if (!state)
> +			goto retry;
> +		mpsslog("%s: %s %d state %s\n",
> +			mic->name, __func__, __LINE__, state);
> +		if ((!strcmp(state, "offline"))) {
> +			free(state);
> +			break;
> +		}
> +		free(state);
> +retry:
> +		sleep(1);
> +		i--;
> +	}
> +}
> +
> +static int
> +get_mic_shutdown_status(struct mic_info *mic, char *shutdown_status)
> +{
> +	if (!strcmp(shutdown_status, "nop"))
> +		return MIC_NOP;
> +	if (!strcmp(shutdown_status, "crashed"))
> +		return MIC_CRASHED;
> +	if (!strcmp(shutdown_status, "halted"))
> +		return MIC_HALTED;
> +	if (!strcmp(shutdown_status, "poweroff"))
> +		return MIC_POWER_OFF;
> +	if (!strcmp(shutdown_status, "restart"))
> +		return MIC_RESTART;
> +	mpsslog("%s: BUG invalid status %s\n", mic->name, shutdown_status);
> +	/* Invalid state */
> +	assert(0);
> +}
> +
> +static int get_mic_state(struct mic_info *mic, char *state)
> +{
> +	if (!strcmp(state, "offline"))
> +		return MIC_OFFLINE;
> +	if (!strcmp(state, "online"))
> +		return MIC_ONLINE;
> +	if (!strcmp(state, "shutting_down"))
> +		return MIC_SHUTTING_DOWN;
> +	if (!strcmp(state, "reset_failed"))
> +		return MIC_RESET_FAILED;
> +	mpsslog("%s: BUG invalid state %s\n", mic->name, state);
> +	/* Invalid state */
> +	assert(0);
> +}
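
Both lookup functions are strcmp() chains that end in assert(0), so an
unexpected sysfs string aborts the daemon, and with NDEBUG they fall off the
end without returning a value. A table-driven sketch that reports the unknown
case to the caller instead; the SKETCH_* values and lookup_state() are
stand-ins, since the real MIC_* constants live in headers not quoted here:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in values; the real MIC_* constants come from the MIC headers. */
enum sketch_state {
	SKETCH_OFFLINE,
	SKETCH_ONLINE,
	SKETCH_SHUTTING_DOWN,
	SKETCH_RESET_FAILED,
};

struct state_name {
	const char *name;
	int value;
};

static const struct state_name state_names[] = {
	{ "offline", SKETCH_OFFLINE },
	{ "online", SKETCH_ONLINE },
	{ "shutting_down", SKETCH_SHUTTING_DOWN },
	{ "reset_failed", SKETCH_RESET_FAILED },
};

/*
 * Return the value for an exact name match, or -1 so that the caller
 * decides how to handle an unknown string.
 */
static int lookup_state(const char *state)
{
	size_t i;

	assert(state != NULL);
	for (i = 0; i < sizeof(state_names) / sizeof(state_names[0]); i++)
		if (!strcmp(state, state_names[i].name))
			return state_names[i].value;
	return -1;
}
```
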
> +
> +static void mic_handle_shutdown(struct mic_info *mic)
> +{
> +#define SHUTDOWN_TIMEOUT 60
> +	int i = SHUTDOWN_TIMEOUT, ret, stat = 0;
> +	char *shutdown_status;
> +	while (i) {
> +		shutdown_status = readsysfs(mic->name, "shutdown_status");
> +		if (!shutdown_status) {
> +			sleep(1);
> +			i--;
> +			continue;
> +		}
> +		mpsslog("%s: %s %d shutdown_status %s\n",
> +			mic->name, __func__, __LINE__, shutdown_status);
> +		switch (get_mic_shutdown_status(mic, shutdown_status)) {
> +		case MIC_RESTART:
> +			mic->restart = 1;
> +		case MIC_HALTED:
> +		case MIC_POWER_OFF:
> +		case MIC_CRASHED:
> +			goto reset;
> +		default:
> +			break;
> +		}
> +		free(shutdown_status);
> +		sleep(1);
> +		i--;
> +	}
> +reset:
> +	ret = kill(mic->pid, SIGTERM);
> +	mpsslog("%s: %s %d kill pid %d ret %d\n",
> +		mic->name, __func__, __LINE__,
> +		mic->pid, ret);
> +	if (!ret) {
> +		ret = waitpid(mic->pid, &stat, 0);
> +		mpsslog("%s: %s %d waitpid ret %d pid %d\n",
> +			mic->name, __func__, __LINE__,
> +			ret, mic->pid);
> +	}
> +	if (ret == mic->pid)
> +		reset(mic);
> +}
> +
> +static void *
> +mic_config(void *arg)
> +{
> +	struct mic_info *mic = (struct mic_info *)arg;
> +	char *state = NULL;
> +	char pathname[PATH_MAX];
> +	int fd, ret;
> +	struct pollfd ufds[1];
> +	char value[4096];
> +
> +	snprintf(pathname, PATH_MAX - 1, "%s/%s/%s",
> +		MICSYSFSDIR, mic->name, "state");
> +
> +	fd = open(pathname, O_RDONLY);
> +	if (fd < 0) {
> +		mpsslog("%s: opening file %s failed %s\n",
> +			mic->name, pathname, strerror(errno));
> +		goto error;
> +	}
> +
> +	do {
> +		ret = read(fd, value, sizeof(value));
> +		if (ret < 0) {
> +			mpsslog("%s: Failed to read sysfs entry '%s': %s\n",
> +				mic->name, pathname, strerror(errno));
> +			goto close_error1;
> +		}
> +retry:
> +		state = readsysfs(mic->name, "state");
> +		if (!state)
> +			goto retry;
> +		mpsslog("%s: %s %d state %s\n",
> +			mic->name, __func__, __LINE__, state);
> +		switch (get_mic_state(mic, state)) {
> +		case MIC_SHUTTING_DOWN:
> +			mic_handle_shutdown(mic);
> +			goto close_error;
> +		default:
> +			break;
> +		}
> +		free(state);
> +
> +		ufds[0].fd = fd;
> +		ufds[0].events = POLLERR | POLLPRI;
> +		ret = poll(ufds, 1, -1);
> +		if (ret < 0) {
> +			mpsslog("%s: poll failed %s\n",
> +				mic->name, strerror(errno));
> +			goto close_error1;
> +		}
> +	} while (1);
> +close_error:
> +	free(state);
> +close_error1:
> +	close(fd);
> +error:
> +	init_mic(mic);
> +	pthread_exit(NULL);
> +}
> +
> +static void
> +set_cmdline(struct mic_info *mic)
> +{
> +	char buffer[PATH_MAX];
> +	int len;
> +
> +	len = snprintf(buffer, PATH_MAX,
> +		"clocksource=tsc highres=off nohz=off ");
> +	len += snprintf(buffer + len, PATH_MAX - len,
> +		"cpufreq_on;corec6_off;pc3_off;pc6_off ");
> +	len += snprintf(buffer + len, PATH_MAX - len,
> +		"ifcfg=static;address,172.31.%d.1;netmask,255.255.255.0",
> +		mic->id);
> +
> +	setsysfs(mic->name, "cmdline", buffer);
> +	mpsslog("%s: Command line: \"%s\"\n", mic->name, buffer);
> +	snprintf(buffer, PATH_MAX, "172.31.%d.1", mic->id);
> +	mpsslog("%s: IPADDR: \"%s\"\n", mic->name, buffer);
> +}
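
set_cmdline() builds the command line from several snprintf() calls into one
buffer, and each call has to be bounded by the space that is left rather than
by the full buffer size. A sketch of a small helper that keeps track of this;
buf_appendf() is a hypothetical name and is not something the patch provides:

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper: append formatted text at offset *len of buf without
 * writing past buf[size - 1]. Returns 0 on success and -1 if the output
 * was truncated or formatting failed.
 */
static int buf_appendf(char *buf, size_t size, size_t *len,
		       const char *fmt, ...)
{
	va_list args;
	int n;

	assert(buf && len && fmt);
	if (*len >= size)
		return -1;
	va_start(args, fmt);
	n = vsnprintf(buf + *len, size - *len, fmt, args);
	va_end(args);
	if (n < 0) {
		buf[*len] = '\0';	/* drop whatever was partially written */
		return -1;
	}
	if ((size_t)n >= size - *len) {
		*len = size - 1;	/* truncated: buffer is full */
		return -1;
	}
	*len += (size_t)n;
	return 0;
}
```

A caller would start with a length of 0 and could log a truncated command
line instead of silently passing it to sysfs.
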
> +
> +static void
> +set_log_buf_info(struct mic_info *mic)
> +{
> +	int fd;
> +	off_t len;
> +	char system_map[] = "/lib/firmware/mic/System.map";
> +	char *map, *temp, log_buf[17] = {'\0'};
> +
> +	fd = open(system_map, O_RDONLY);
> +	if (fd < 0) {
> +		mpsslog("%s: Opening System.map failed: %d\n",
> +			mic->name, errno);
> +		return;
> +	}
> +	len = lseek(fd, 0, SEEK_END);
> +	if (len < 0) {
> +		mpsslog("%s: Reading System.map size failed: %d\n",
> +			mic->name, errno);
> +		close(fd);
> +		return;
> +	}
> +	map = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
> +	if (map == MAP_FAILED) {
> +		mpsslog("%s: mmap of System.map failed: %d\n",
> +			mic->name, errno);
> +		close(fd);
> +		return;
> +	}
> +	temp = strstr(map, "__log_buf");
> +	if (!temp) {
> +		mpsslog("%s: __log_buf not found: %d\n", mic->name, errno);
> +		munmap(map, len);
> +		close(fd);
> +		return;
> +	}
> +	strncpy(log_buf, temp - 19, 16);
> +	setsysfs(mic->name, "log_buf_addr", log_buf);
> +	mpsslog("%s: log_buf_addr: %s\n", mic->name, log_buf);
> +	temp = strstr(map, "log_buf_len");
> +	if (!temp) {
> +		mpsslog("%s: log_buf_len not found: %d\n", mic->name, errno);
> +		munmap(map, len);
> +		close(fd);
> +		return;
> +	}
> +	strncpy(log_buf, temp - 19, 16);
> +	setsysfs(mic->name, "log_buf_len", log_buf);
> +	mpsslog("%s: log_buf_len: %s\n", mic->name, log_buf);
> +	munmap(map, len);
> +	close(fd);
> +}
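
set_log_buf_info() finds each symbol with strstr() and then steps back a
fixed 19 bytes to reach the address, which depends on the exact column layout
and matches substrings of longer symbol names. A sketch that parses one line
at a time and compares the whole symbol name; find_symbol_addr() is a
hypothetical helper, and it assumes a NUL-terminated copy of the file, which
a plain mmap() of System.map does not guarantee:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: look for a line "<hex address> <type> <symbol>" with
 * an exact symbol match and copy the address text into addr, which must
 * hold at least 17 bytes. Returns 0 on success, -1 if the symbol is absent.
 */
static int find_symbol_addr(const char *map, const char *symbol, char *addr)
{
	const char *line = map;
	char name[128];
	char type;

	assert(map && symbol && addr);
	while (*line) {
		if (sscanf(line, "%16[0-9a-fA-F] %c %127s",
			   addr, &type, name) == 3 &&
		    !strcmp(name, symbol))
			return 0;
		/* Advance to the start of the next line, if there is one. */
		line = strchr(line, '\n');
		if (!line)
			break;
		line++;
	}
	addr[0] = '\0';
	return -1;
}
```
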
> +
> +static void init_mic(struct mic_info *mic);
> +
> +static void
> +change_virtblk_backend(int x, siginfo_t *siginfo, void *p)
> +{
> +	struct mic_info *mic;
> +
> +	for (mic = mic_list.next; mic != NULL; mic = mic->next)
> +		mic->mic_virtblk.signaled = 1/* true */;
> +}
> +
> +static void
> +init_mic(struct mic_info *mic)
> +{
> +	struct sigaction ignore = {
> +		.sa_flags = 0,
> +		.sa_handler = SIG_IGN
> +	};
> +	struct sigaction act = {
> +		.sa_flags = SA_SIGINFO,
> +		.sa_sigaction = change_virtblk_backend,
> +	};
> +	char buffer[PATH_MAX];
> +	int err;
> +
> +	/* ignore SIGUSR1 in both processes */
> +	sigaction(SIGUSR1, &ignore, NULL);
> +
> +	mic->pid = fork();
> +	switch (mic->pid) {
> +	case 0:
> +		set_log_buf_info(mic);
> +		set_cmdline(mic);
> +		add_virtio_device(mic, &virtcons_dev_page.dd);
> +		add_virtio_device(mic, &virtnet_dev_page.dd);
> +		err = pthread_create(&mic->mic_console.console_thread, NULL,
> +			virtio_console, mic);
> +		if (err)
> +			mpsslog("%s virtcons pthread_create failed %s\n",
> +			mic->name, strerror(err));
> +		/*
> +		 * TODO: Debug why not adding this sleep results in the tap
> +		 * interface not coming up during certain runs sporadically.
> +		 */

Indeed.

> +		usleep(1000);
> +		err = pthread_create(&mic->mic_net.net_thread, NULL,
> +			virtio_net, mic);
> +		if (err)
> +			mpsslog("%s virtnet pthread_create failed %s\n",
> +			mic->name, strerror(err));
> +		err = pthread_create(&mic->mic_virtblk.block_thread, NULL,
> +			virtio_block, mic);
> +		if (err)
> +			mpsslog("%s virtblk pthread_create failed %s\n",
> +			mic->name, strerror(err));
> +		sigemptyset(&act.sa_mask);
> +		err = sigaction(SIGUSR1, &act, NULL);

I'm confused here: who sends this SIGUSR1?

> +		if (err)
> +			mpsslog("%s sigaction SIGUSR1 failed %s\n",
> +			mic->name, strerror(errno));
> +		while (1)
> +			sleep(60);
> +	case -1:
> +		mpsslog("fork failed MIC name %s id %d errno %d\n",
> +			mic->name, mic->id, errno);
> +		break;
> +	default:
> +		if (mic->restart) {
> +			snprintf(buffer, PATH_MAX,
> +				"boot:linux:mic/uos.img:mic/mic%d.image",
> +				mic->id);
> +			setsysfs(mic->name, "state", buffer);
> +			mpsslog("%s restarting mic %d\n",
> +				mic->name, mic->restart);
> +			mic->restart = 0;
> +		}
> +		pthread_create(&mic->config_thread, NULL, mic_config, mic);
> +	}
> +}
> +
> +static void
> +start_daemon(void)
> +{
> +	struct mic_info *mic;
> +
> +	for (mic = mic_list.next; mic != NULL; mic = mic->next)
> +		init_mic(mic);
> +
> +	while (1)
> +		sleep(60);
> +}
> +
> +static int
> +init_mic_list(void)
> +{
> +	struct mic_info *mic = &mic_list;
> +	struct dirent *file;
> +	DIR *dp;
> +	int cnt = 0;
> +
> +	dp = opendir(MICSYSFSDIR);
> +	if (!dp)
> +		return 0;
> +
> +	while ((file = readdir(dp)) != NULL) {
> +		if (!strncmp(file->d_name, "mic", 3)) {
> +			mic->next = malloc(sizeof(struct mic_info));
> +			if (mic->next) {
> +				mic = mic->next;
> +				mic->next = NULL;
> +				memset(mic, 0, sizeof(struct mic_info));
> +				mic->id = atoi(&file->d_name[3]);
> +				mic->name = malloc(strlen(file->d_name) + 16);
> +				if (mic->name)
> +					strcpy(mic->name, file->d_name);
> +				mpsslog("MIC name %s id %d\n", mic->name,
> +					mic->id);
> +				cnt++;
> +			}
> +		}
> +	}
> +
> +	closedir(dp);
> +	return cnt;
> +}
> +
> +void
> +mpsslog(char *format, ...)
> +{
> +	va_list args;
> +	char buffer[4096];
> +	time_t t;
> +	char *ts;
> +
> +	if (logfp == NULL)
> +		return;
> +
> +	va_start(args, format);
> +	vsnprintf(buffer, sizeof(buffer), format, args);
> +	va_end(args);
> +
> +	time(&t);
> +	ts = ctime(&t);
> +	ts[strlen(ts) - 1] = '\0';
> +	fprintf(logfp, "%s: %s", ts, buffer);
> +
> +	fflush(logfp);
> +}
> +
> +int
> +main(int argc, char *argv[])
> +{
> +	int cnt;
> +
> +	myname = argv[0];
> +
> +	logfp = fopen(LOGFILE_NAME, "a+");
> +	if (!logfp) {
> +		fprintf(stderr, "cannot open logfile '%s'\n", LOGFILE_NAME);
> +		exit(1);
> +	}
> +
> +	mpsslog("MIC Daemon start\n");
> +
> +	cnt = init_mic_list();
> +	if (cnt == 0) {
> +		mpsslog("MIC module not loaded\n");
> +		exit(2);
> +	}
> +	mpsslog("MIC found %d devices\n", cnt);
> +
> +	start_daemon();
> +
> +	exit(0);
> +}
> diff --git a/Documentation/mic/mpssd/mpssd.h b/Documentation/mic/mpssd/mpssd.h
> new file mode 100644
> index 0000000..b6dee38
> --- /dev/null
> +++ b/Documentation/mic/mpssd/mpssd.h
> @@ -0,0 +1,100 @@
> +/*
> + * Intel MIC Platform Software Stack (MPSS)
> + *
> + * Copyright(c) 2013 Intel Corporation.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License, version 2, as
> + * published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful, but
> + * WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> + * General Public License for more details.
> + *
> + * The full GNU General Public License is included in this distribution in
> + * the file called "COPYING".
> + *
> + * Intel MIC User Space Tools.
> + */
> +#ifndef _MPSSD_H_
> +#define _MPSSD_H_
> +
> +#include <stdio.h>
> +#include <stdlib.h>
> +#include <string.h>
> +#include <fcntl.h>
> +#include <unistd.h>
> +#include <dirent.h>
> +#include <libgen.h>
> +#include <pthread.h>
> +#include <stdarg.h>
> +#include <time.h>
> +#include <errno.h>
> +#include <sys/dir.h>
> +#include <sys/ioctl.h>
> +#include <sys/poll.h>
> +#include <sys/types.h>
> +#include <sys/socket.h>
> +#include <sys/stat.h>
> +#include <sys/types.h>
> +#include <sys/mman.h>
> +#include <sys/utsname.h>
> +#include <sys/wait.h>
> +#include <netinet/in.h>
> +#include <arpa/inet.h>
> +#include <netdb.h>
> +#include <pthread.h>
> +#include <signal.h>
> +#include <limits.h>
> +#include <syslog.h>
> +#include <getopt.h>
> +#include <net/if.h>
> +#include <linux/if_tun.h>
> +#include <linux/if_tun.h>
> +#include <linux/virtio_ids.h>
> +
> +#define MICSYSFSDIR "/sys/class/mic"
> +#define LOGFILE_NAME "/var/log/mpssd"
> +#define PAGE_SIZE 4096
> +
> +struct mic_console_info {
> +	pthread_t       console_thread;
> +	int		virtio_console_fd;
> +	void		*console_dp;
> +};
> +
> +struct mic_net_info {
> +	pthread_t       net_thread;
> +	int		virtio_net_fd;
> +	int		tap_fd;
> +	void		*net_dp;
> +};
> +
> +struct mic_virtblk_info {
> +	pthread_t       block_thread;
> +	int		virtio_block_fd;
> +	void		*block_dp;
> +	volatile sig_atomic_t	signaled;
> +	char		*backend_file;
> +	int		backend;
> +	void		*backend_addr;
> +	long		backend_size;
> +};
> +
> +struct mic_info {
> +	int		id;
> +	char		*name;
> +	pthread_t       config_thread;
> +	pid_t		pid;
> +	struct mic_console_info	mic_console;
> +	struct mic_net_info	mic_net;
> +	struct mic_virtblk_info	mic_virtblk;
> +	int		restart;
> +	struct mic_info *next;
> +};
> +
> +void mpsslog(char *format, ...);
> +char *readsysfs(char *dir, char *entry);
> +int setsysfs(char *dir, char *entry, char *value);
> +#endif
> diff --git a/Documentation/mic/mpssd/sysfs.c b/Documentation/mic/mpssd/sysfs.c
> new file mode 100644
> index 0000000..3244dcf
> --- /dev/null
> +++ b/Documentation/mic/mpssd/sysfs.c
> @@ -0,0 +1,103 @@
> +/*
> + * Intel MIC Platform Software Stack (MPSS)
> + *
> + * Copyright(c) 2013 Intel Corporation.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License, version 2, as
> + * published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful, but
> + * WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> + * General Public License for more details.
> + *
> + * The full GNU General Public License is included in this distribution in
> + * the file called "COPYING".
> + *
> + * Intel MIC User Space Tools.
> + */
> +
> +#include "mpssd.h"
> +
> +#define PAGE_SIZE 4096
> +
> +char *
> +readsysfs(char *dir, char *entry)
> +{
> +	char filename[PATH_MAX];
> +	char value[PAGE_SIZE];
> +	char *string = NULL;
> +	int fd;
> +	int len;
> +
> +	if (dir == NULL)
> +		snprintf(filename, PATH_MAX, "%s/%s", MICSYSFSDIR, entry);
> +	else
> +		snprintf(filename, PATH_MAX,
> +			"%s/%s/%s", MICSYSFSDIR, dir, entry);
> +
> +	fd = open(filename, O_RDONLY);
> +	if (fd < 0) {
> +		mpsslog("Failed to open sysfs entry '%s': %s\n",
> +			filename, strerror(errno));
> +		return NULL;
> +	}
> +
> +	len = read(fd, value, sizeof(value));
> +	if (len < 0) {
> +		mpsslog("Failed to read sysfs entry '%s': %s\n",
> +			filename, strerror(errno));
> +		goto readsys_ret;
> +	}
> +
> +	value[len] = '\0';

Why are you careful to add this \0 here but not in setsysfs() below, where
oldvalue is passed to strcmp() unterminated?

If you keep it, I'd also fail on len == sizeof(value): in that case
value[len] is one byte past the end of the buffer.
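
To illustrate the termination point, here is a minimal sketch (not taken
from the patch) of a read that rejects a value filling the whole buffer
before it writes the terminator. The helper name read_terminated() and the
choice of EOVERFLOW are assumptions made for this example only.

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Hypothetical helper, not part of the posted patch: read from fd into
 * buf (size bytes long) and NUL-terminate the result.
 *
 * Returns the number of bytes read, or -1 with errno set. A read that
 * fills the entire buffer is rejected with EOVERFLOW, because there
 * would be no room left for the terminator and the value may have been
 * truncated.
 */
static ssize_t read_terminated(int fd, char *buf, size_t size)
{
	ssize_t len;

	len = read(fd, buf, size);
	if (len < 0)
		return -1;		/* errno was set by read() */

	if ((size_t)len == size) {
		errno = EOVERFLOW;	/* buf[len] would be out of bounds */
		return -1;
	}

	buf[len] = '\0';
	return len;
}
```

With a check like this, readsysfs() could keep its value[len] = '\0' line
without ever indexing past the array.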

> +
> +	string = malloc(strlen(value) + 1);
> +	if (string)
> +		strcpy(string, value);
> +
> +readsys_ret:
> +	close(fd);
> +	return string;
> +}
> +
> +int
> +setsysfs(char *dir, char *entry, char *value)
> +{
> +	char filename[PATH_MAX];
> +	char oldvalue[PAGE_SIZE];
> +	int fd;
> +
> +	if (dir == NULL)
> +		snprintf(filename, PATH_MAX, "%s/%s", MICSYSFSDIR, entry);
> +	else
> +		snprintf(filename, PATH_MAX, "%s/%s/%s",
> +			MICSYSFSDIR, dir, entry);
> +
> +	fd = open(filename, O_RDWR);
> +	if (fd < 0) {
> +		mpsslog("Failed to open sysfs entry '%s': %s\n",
> +			filename, strerror(errno));
> +		return errno;
> +	}
> +
> +	if (read(fd, oldvalue, sizeof(oldvalue)) < 0) {
> +		mpsslog("Failed to read sysfs entry '%s': %s\n",
> +			filename, strerror(errno));
> +		close(fd);
> +		return errno;
> +	}
> +
> +	if (strcmp(value, oldvalue)) {
> +		if (write(fd, value, strlen(value)) < 0) {
> +			mpsslog("Failed to write new sysfs entry '%s': %s\n",
> +				filename, strerror(errno));
> +			close(fd);
> +			return errno;
> +		}
> +	}
> +
> +	close(fd);
> +	return 0;
> +}
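
The same termination applies to setsysfs() before its strcmp(). A rough
sketch under stated assumptions: write_if_changed() is a hypothetical name,
it takes a full path rather than the dir/entry pair used in the patch, and
the lseek() is only there so the sketch also behaves on a regular file. It
also saves errno before close() can overwrite it.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper, not from the patch: write value to path only if
 * it differs from the current contents.
 *
 * Returns 0 on success or a positive errno value on failure.
 */
static int write_if_changed(const char *path, const char *value)
{
	char old[4096];
	ssize_t len;
	int fd;
	int err = 0;

	fd = open(path, O_RDWR);
	if (fd < 0)
		return errno;

	/* Leave room for the terminator so old[len] is always in bounds. */
	len = read(fd, old, sizeof(old) - 1);
	if (len < 0) {
		err = errno;	/* save before close() can change it */
		goto out;
	}
	old[len] = '\0';

	if (strcmp(value, old) != 0) {
		/* Rewind so the new value replaces the old one from offset 0. */
		if (lseek(fd, 0, SEEK_SET) < 0 ||
		    write(fd, value, strlen(value)) < 0)
			err = errno;
	}

out:
	close(fd);
	return err;
}
```
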
> -- 
> 1.8.2.1


* Re: [PATCH v2 7/7] Sample Implementation of Intel MIC User Space Daemon.
  2013-08-08  6:40     ` Michael S. Tsirkin
  (?)
@ 2013-08-09 16:47     ` Sudeep Dutt
  -1 siblings, 0 replies; 18+ messages in thread
From: Sudeep Dutt @ 2013-08-09 16:47 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Greg Kroah-Hartman, Arnd Bergmann, Rusty Russell, Rob Landley,
	linux-kernel, virtualization, linux-doc, asias, Nikhil Rao,
	Ashutosh Dixit, Caz Yokoyama, Dasaratharaman Chandramouli,
	Harshavardhan R Kharche, Yaozu (Eddie) Dong,
	Peter P Waskiewicz Jr, Sudeep Dutt

On Thu, 2013-08-08 at 09:40 +0300, Michael S. Tsirkin wrote:
> On Wed, Aug 07, 2013 at 08:04:13PM -0700, Sudeep Dutt wrote:
> > From: Caz Yokoyama <Caz.Yokoyama@intel.com>
> > 
> > This patch introduces a sample user space daemon which
> > implements the virtio device backends on the host. The daemon
> > creates/removes/configures virtio device backends by communicating with
> > the Intel MIC Host Driver. The virtio devices currently supported are
> > virtio net, virtio console and virtio block. Virtio net supports TSO/GSO.
> > The daemon also monitors card shutdown status and takes appropriate actions
> > like killing the virtio backends and resetting the card upon card shutdown
> > and crashes.
> > 
> > Co-author: Ashutosh Dixit <ashutosh.dixit@intel.com>
> > Co-author: Sudeep Dutt <sudeep.dutt@intel.com>
> > Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
> > Signed-off-by: Caz Yokoyama <Caz.Yokoyama@intel.com>
> > Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
> > Signed-off-by: Nikhil Rao <nikhil.rao@intel.com>
> > Signed-off-by: Harshavardhan R Kharche <harshavardhan.r.kharche@intel.com>
> > Signed-off-by: Sudeep Dutt <sudeep.dutt@intel.com>
> > Acked-by: Yaozu (Eddie) Dong <eddie.dong@intel.com>
> > ---
> >  Documentation/mic/mic_overview.txt |   48 +
> >  Documentation/mic/mpssd/.gitignore |    1 +
> >  Documentation/mic/mpssd/Makefile   |   19 +
> >  Documentation/mic/mpssd/micctrl    |  152 ++++
> >  Documentation/mic/mpssd/mpss       |  245 ++++++
> >  Documentation/mic/mpssd/mpssd.c    | 1689 ++++++++++++++++++++++++++++++++++++
> >  Documentation/mic/mpssd/mpssd.h    |  100 +++
> >  Documentation/mic/mpssd/sysfs.c    |  103 +++
> 
> Is this generally useful or just example code?
> If the former, you can put it in tools/ as well.
> 

Currently, this is just working sample code specific to configuring MIC
devices. The longer-term plan might be to move it to tools/, but not as
part of this patch series.

> >  8 files changed, 2357 insertions(+)
> >  create mode 100644 Documentation/mic/mic_overview.txt
> >  create mode 100644 Documentation/mic/mpssd/.gitignore
> >  create mode 100644 Documentation/mic/mpssd/Makefile
> >  create mode 100755 Documentation/mic/mpssd/micctrl
> >  create mode 100755 Documentation/mic/mpssd/mpss
> >  create mode 100644 Documentation/mic/mpssd/mpssd.c
> >  create mode 100644 Documentation/mic/mpssd/mpssd.h
> >  create mode 100644 Documentation/mic/mpssd/sysfs.c
> > 
> > diff --git a/Documentation/mic/mic_overview.txt b/Documentation/mic/mic_overview.txt
> > new file mode 100644
> > index 0000000..8b1a916
> > --- /dev/null
> > +++ b/Documentation/mic/mic_overview.txt
> > @@ -0,0 +1,48 @@
> > +An Intel MIC X100 device is a PCIe form factor add-in coprocessor
> > +card based on the Intel Many Integrated Core (MIC) architecture
> > +that runs a Linux OS. It is a PCIe endpoint in a platform and therefore
> > +implements the three required standard address spaces i.e. configuration,
> > +memory and I/O. The host OS loads a device driver as is typical for
> > +PCIe devices. The card itself runs a bootstrap after reset that
> > +transfers control to the card OS downloaded from the host driver.
> > +The card OS as shipped by Intel is a Linux kernel with modifications
> > +for the X100 devices.
> > +
> > +Since it is a PCIe card, it does not have the ability to host hardware
> > +devices for networking, storage and console. We provide these devices
> > +on X100 coprocessors thus enabling a self-bootable equivalent environment
> > +for applications. A key benefit of our solution is that it leverages
> > +the standard virtio framework for network, disk and console devices,
> > +though in our case the virtio framework is used across a PCIe bus.
> > +
> > +Here is a block diagram of the various components described above. The
> > +virtio backends are situated on the host rather than the card given better
> > +single threaded performance for the host compared to MIC and the ability of
> > +the host to initiate DMA's to/from the card using the MIC DMA engine.
> > +
> > +                              |
> > +       +----------+           |             +----------+
> > +       | Card OS  |           |             | Host OS  |
> > +       +----------+           |             +----------+
> > +                              |
> > ++-------+ +--------+ +------+ | +---------+  +--------+ +--------+
> > +| Virtio| |Virtio  | |Virtio| | |Virtio   |  |Virtio  | |Virtio  |
> > +| Net   | |Console | |Block | | |Net      |  |Console | |Block   |
> > +| Driver| |Driver  | |Driver| | |backend  |  |backend | |backend |
> > ++-------+ +--------+ +------+ | +---------+  +--------+ +--------+
> > +    |         |         |     |      |            |         |
> > +    |         |         |     |Ring 3|            |         |
> > +    |         |         |     |------|------------|---------|-------
> > +    +-------------------+     |Ring 0+--------------------------+
> > +              |               |      | Virtio over PCIe IOCTLs  |
> > +              |               |      +--------------------------+
> > +      +--------------+        |                   |
> > +      |Intel MIC     |        |            +---------------+
> > +      |Card Driver   |        |            |Intel MIC      |
> > +      +--------------+        |            |Host Driver    |
> > +              |               |            +---------------+
> > +              |               |                   |
> > +     +-------------------------------------------------------------+
> > +     |                                                             |
> > +     |                    PCIe Bus                                 |
> > +     +-------------------------------------------------------------+
> > diff --git a/Documentation/mic/mpssd/.gitignore b/Documentation/mic/mpssd/.gitignore
> > new file mode 100644
> > index 0000000..8b7c72f
> > --- /dev/null
> > +++ b/Documentation/mic/mpssd/.gitignore
> > @@ -0,0 +1 @@
> > +mpssd
> > diff --git a/Documentation/mic/mpssd/Makefile b/Documentation/mic/mpssd/Makefile
> > new file mode 100644
> > index 0000000..eb860a7
> > --- /dev/null
> > +++ b/Documentation/mic/mpssd/Makefile
> > @@ -0,0 +1,19 @@
> > +#
> > +# Makefile - Intel MIC User Space Tools.
> > +# Copyright(c) 2013, Intel Corporation.
> > +#
> > +ifdef DEBUG
> > +CFLAGS += $(USERWARNFLAGS) -I. -g -Wall -DDEBUG=$(DEBUG)
> > +else
> > +CFLAGS += $(USERWARNFLAGS) -I. -g -Wall
> > +endif
> > +
> > +mpssd: mpssd.o sysfs.o
> > +	$(CC) $(CFLAGS) -o $@ $^ -lpthread
> > +
> > +install:
> > +	install mpssd /usr/sbin/mpssd
> > +	install micctrl /usr/sbin/micctrl
> > +
> > +clean:
> > +	rm -f mpssd *.o
> > diff --git a/Documentation/mic/mpssd/micctrl b/Documentation/mic/mpssd/micctrl
> > new file mode 100755
> > index 0000000..e0cfa53
> > --- /dev/null
> > +++ b/Documentation/mic/mpssd/micctrl
> > @@ -0,0 +1,152 @@
> > +#!/bin/bash
> > +# Intel MIC Platform Software Stack (MPSS)
> > +#
> > +# Copyright(c) 2013 Intel Corporation.
> > +#
> > +# This program is free software; you can redistribute it and/or modify
> > +# it under the terms of the GNU General Public License, version 2, as
> > +# published by the Free Software Foundation.
> > +#
> > +# This program is distributed in the hope that it will be useful, but
> > +# WITHOUT ANY WARRANTY; without even the implied warranty of
> > +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> > +# General Public License for more details.
> > +#
> > +# The full GNU General Public License is included in this distribution in
> > +# the file called "COPYING".
> > +#
> > +# Intel MIC User Space Tools.
> > +#
> > +# micctrl - Controls MIC boot/start/stop.
> > +#
> > +# chkconfig: 2345 95 05
> > +# description: start MPSS stack processing.
> > +#
> > +### BEGIN INIT INFO
> > +# Provides: micctrl
> > +### END INIT INFO
> > +
> > +# Source function library.
> > +. /etc/init.d/functions
> > +
> > +sysfs="/sys/class/mic"
> > +
> > +status()
> > +{
> > +	if [ "`echo $1 | head -c3`" == "mic" ]; then
> > +		f=$sysfs/$1
> > +		echo -e $1 state: "`cat $f/state`" shutdown_status: "`cat $f/shutdown_status`"
> > +		return 0
> > +	fi
> > +
> > +	if [ -d "$sysfs" ]; then
> > +		for f in $sysfs/*
> > +		do
> > +			echo -e ""`basename $f`" state: "`cat $f/state`" shutdown_status: "`cat $f/shutdown_status`""
> > +		done
> > +	fi
> > +
> > +	return 0
> > +}
> > +
> > +reset()
> > +{
> > +	if [ "`echo $1 | head -c3`" == "mic" ]; then
> > +		f=$sysfs/$1
> > +		echo reset > $f/state
> > +		return 0
> > +	fi
> > +
> > +	if [ -d "$sysfs" ]; then
> > +		for f in $sysfs/*
> > +		do
> > +			echo reset > $f/state
> > +		done
> > +	fi
> > +
> > +	return 0
> > +}
> > +
> > +boot()
> > +{
> > +	if [ "`echo $1 | head -c3`" == "mic" ]; then
> > +		f=$sysfs/$1
> > +		echo "boot:linux:mic/uos.img:mic/$1.image" > $f/state
> > +		return 0
> > +	fi
> > +
> > +	if [ -d "$sysfs" ]; then
> > +		for f in $sysfs/*
> > +		do
> > +			echo "boot:linux:mic/uos.img:mic/`basename $f`.image" > $f/state
> > +		done
> > +	fi
> > +
> > +	return 0
> > +}
> > +
> > +shutdown()
> > +{
> > +	if [ "`echo $1 | head -c3`" == "mic" ]; then
> > +		f=$sysfs/$1
> > +		echo shutdown > $f/state
> > +		return 0
> > +	fi
> > +
> > +	if [ -d "$sysfs" ]; then
> > +		for f in $sysfs/*
> > +		do
> > +			echo shutdown > $f/state
> > +		done
> > +	fi
> > +
> > +	return 0
> > +}
> > +
> > +wait()
> > +{
> > +	if [ "`echo $1 | head -c3`" == "mic" ]; then
> > +		f=$sysfs/$1
> > +		while [ "`cat $f/state`" != "offline" -a "`cat $f/state`" != "online" ]
> > +		do
> > +			sleep 1
> > +			echo -e "Waiting for $1 to go offline"
> > +		done
> > +		return 0
> > +	fi
> > +
> > +	if [ -d "$sysfs" ]; then
> > +		# Wait for the cards to go offline
> > +		for f in $sysfs/*
> > +		do
> > +			while [ "`cat $f/state`" != "offline" -a "`cat $f/state`" != "online" ]
> > +			do
> > +				sleep 1
> > +				echo -e "Waiting for "`basename $f`" to go offline"
> > +			done
> > +		done
> > +	fi
> > +}
> > +
> > +case $1 in
> > +	-s)
> > +		status $2
> > +		;;
> > +	-r)
> > +		reset $2
> > +		;;
> > +	-b)
> > +		boot $2
> > +		;;
> > +	-S)
> > +		shutdown $2
> > +		;;
> > +	-w)
> > +		wait $2
> > +		;;
> > +	*)
> > +		echo $"Usage: $0 {-s (status) |-r (reset) |-b (boot) |-S (shutdown) |-w (wait)}"
> > +		exit 2
> > +esac
> > +
> > +exit $?
> > diff --git a/Documentation/mic/mpssd/mpss b/Documentation/mic/mpssd/mpss
> > new file mode 100755
> > index 0000000..f0bb3dd
> > --- /dev/null
> > +++ b/Documentation/mic/mpssd/mpss
> > @@ -0,0 +1,245 @@
> > +#!/bin/bash
> > +# Intel MIC Platform Software Stack (MPSS)
> > +#
> > +# Copyright(c) 2013 Intel Corporation.
> > +#
> > +# This program is free software; you can redistribute it and/or modify
> > +# it under the terms of the GNU General Public License, version 2, as
> > +# published by the Free Software Foundation.
> > +#
> > +# This program is distributed in the hope that it will be useful, but
> > +# WITHOUT ANY WARRANTY; without even the implied warranty of
> > +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> > +# General Public License for more details.
> > +#
> > +# The full GNU General Public License is included in this distribution in
> > +# the file called "COPYING".
> > +#
> > +# Intel MIC User Space Tools.
> > +#
> > +# mpss	Start mpssd.
> > +#
> > +# chkconfig: 2345 95 05
> > +# description: start MPSS stack processing.
> > +#
> > +### BEGIN INIT INFO
> > +# Provides: mpss
> > +# Required-Start:
> > +# Required-Stop:
> > +# Short-Description: MPSS stack control
> > +# Description: MPSS stack control
> > +### END INIT INFO
> > +
> > +# Source function library.
> > +. /etc/init.d/functions
> > +
> > +exec=/usr/sbin/mpssd
> > +sysfs="/sys/class/mic"
> > +
> > +start()
> > +{
> > +	[ -x $exec ] || exit 5
> > +
> > +	echo -e $"Starting MPSS Stack"
> > +
> > +	echo -e $"Loading MIC_HOST Module"
> > +
> > +	# Ensure the driver is loaded
> > +	[ -d "$sysfs" ] || modprobe mic_host
> > +
> > +	if [ "`ps -e | awk '{print $4}' | grep mpssd | head -1`" = "mpssd" ]; then
> > +		echo -e $"MPSSD already running! "
> > +		success
> > +		echo
> > +		return 0;
> > +	fi
> > +
> > +	# Start the daemon
> > +	echo -n $"Starting MPSSD"
> > +	$exec &
> > +	RETVAL=$?
> > +	if [ $RETVAL -ne 0 ]; then
> > +		failure
> > +	else
> > +		success
> > +	fi
> > +	echo
> > +
> > +	sleep 5
> > +
> > +	# Boot the cards
> > +	if [ $RETVAL -eq 0 ]; then
> > +		for f in $sysfs/*
> > +		do
> > +			echo -ne "Booting "`basename $f`" "
> > +			echo "boot:linux:mic/uos.img:mic/`basename $f`.image" > $f/state
> > +			RETVAL=$?
> > +			if [ $RETVAL -ne 0 ]; then
> > +				failure
> > +			else
> > +				success
> > +			fi
> > +			echo
> > +		done
> > +	fi
> > +
> > +	# Wait till ping works
> > +	if [ $RETVAL -eq 0 ]; then
> > +		for f in $sysfs/*
> > +		do
> > +			count=100
> > +			ipaddr=`cat $f/cmdline`
> > +			ipaddr=${ipaddr#*address,}
> > +			ipaddr=`echo $ipaddr | cut -d, -f1 | cut -d\; -f1`
> > +
> > +			while [ $count -ge 0 ]
> > +			do
> > +				echo -e "Pinging "`basename $f`" "
> > +				ping -c 1 $ipaddr &> /dev/null
> > +				RETVAL=$?
> > +				if [ $RETVAL -eq 0 ]; then
> > +					success
> > +					break
> > +				fi
> > +				sleep 1
> > +				count=`expr $count - 1`
> > +			done
> > +			if [ $RETVAL -ne 0 ]; then
> > +				failure
> > +			else
> > +				success
> > +			fi
> > +			echo
> > +		done
> > +	fi
> > +	return $RETVAL
> > +}
> > +
> > +stop()
> > +{
> > +	echo -e $"Shutting down MPSS Stack: "
> > +
> > +	# Bail out if module is unloaded
> > +	if [ ! -d "$sysfs" ]; then
> > +		echo -n $"Module unloaded "
> > +		killall -9 mpssd 2>/dev/null
> > +		success
> > +		echo
> > +		return 0
> > +	fi
> > +
> > +	# Shut down the cards
> > +	for f in $sysfs/*
> > +	do
> > +		echo -e "Shutting down `basename $f` "
> > +		echo "shutdown" > $f/state 2>/dev/null
> > +	done
> > +
> > +	# Wait for the cards to go offline
> > +	for f in $sysfs/*
> > +	do
> > +		while [ "`cat $f/state`" != "offline" ]
> > +		do
> > +			sleep 1
> > +			echo -e "Waiting for "`basename $f`" to go offline"
> > +		done
> > +	done
> > +
> > +	# Display the status of the cards
> > +	for f in $sysfs/*
> > +	do
> > +		echo -e ""`basename $f`" state: "`cat $f/state`""
> > +	done
> > +
> > +	sleep 5
> > +
> > +	# Kill MPSSD now
> > +	echo -n $"Killing MPSSD"
> > +	killall -9 mpssd 2>/dev/null
> > +	RETVAL=$?
> > +	if [ $RETVAL -ne 0 ]; then
> > +		failure
> > +	else
> > +		success
> > +	fi
> > +	echo
> > +	return $RETVAL
> > +}
> > +
> > +restart()
> > +{
> > +	stop
> > +	sleep 5
> > +	start
> > +}
> > +
> > +status()
> > +{
> > +	if [ -d "$sysfs" ]; then
> > +		for f in $sysfs/*
> > +		do
> > +			echo -e ""`basename $f`" state: "`cat $f/state`""
> > +		done
> > +	fi
> > +
> > +	if [ "`ps -e | awk '{print $4}' | grep mpssd | head -n 1`" = "mpssd" ]; then
> > +		echo "mpssd is running"
> > +	else
> > +		echo "mpssd is stopped"
> > +	fi
> > +	return 0
> > +}
> > +
> > +unload()
> > +{
> > +	if [ ! -d "$sysfs" ]; then
> > +		echo -n $"No MIC_HOST Module: "
> > +		killall -9 mpssd 2>/dev/null
> > +		success
> > +		echo
> > +		return
> > +	fi
> > +
> > +	stop
> > +	RETVAL=$?
> > +
> > +	sleep 5
> > +	echo -n $"Removing MIC_HOST Module: "
> > +
> > +	if [ $RETVAL = 0 ]; then
> > +		sleep 1
> > +		modprobe -r mic_host
> > +		RETVAL=$?
> > +	fi
> > +
> > +	if [ $RETVAL -ne 0 ]; then
> > +		failure
> > +	else
> > +		success
> > +	fi
> > +	echo
> > +	return $RETVAL
> > +}
> > +
> > +case $1 in
> > +	start)
> > +		start
> > +		;;
> > +	stop)
> > +		stop
> > +		;;
> > +	restart)
> > +		restart
> > +		;;
> > +	status)
> > +		status
> > +		;;
> > +	unload)
> > +		unload
> > +		;;
> > +	*)
> > +		echo $"Usage: $0 {start|stop|restart|status|unload}"
> > +		exit 2
> > +esac
> > +
> > +exit $?
> > diff --git a/Documentation/mic/mpssd/mpssd.c b/Documentation/mic/mpssd/mpssd.c
> > new file mode 100644
> > index 0000000..3bc34cb
> > --- /dev/null
> > +++ b/Documentation/mic/mpssd/mpssd.c
> > @@ -0,0 +1,1689 @@
> > +/*
> > + * Intel MIC Platform Software Stack (MPSS)
> > + *
> > + * Copyright(c) 2013 Intel Corporation.
> > + *
> > + * This program is free software; you can redistribute it and/or modify
> > + * it under the terms of the GNU General Public License, version 2, as
> > + * published by the Free Software Foundation.
> > + *
> > + * This program is distributed in the hope that it will be useful, but
> > + * WITHOUT ANY WARRANTY; without even the implied warranty of
> > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> > + * General Public License for more details.
> > + *
> > + * The full GNU General Public License is included in this distribution in
> > + * the file called "COPYING".
> > + *
> > + * Intel MIC User Space Tools.
> > + */
> > +
> > +#define _GNU_SOURCE
> > +
> > +#include <stdlib.h>
> > +#include <fcntl.h>
> > +#include <getopt.h>
> > +#include <assert.h>
> > +#include <unistd.h>
> > +#include <stdbool.h>
> > +#include <signal.h>
> > +#include <poll.h>
> > +#include <features.h>
> > +#include <sys/types.h>
> > +#include <sys/stat.h>
> > +#include <sys/mman.h>
> > +#include <sys/socket.h>
> > +#include <linux/virtio_ring.h>
> > +#include <linux/virtio_net.h>
> > +#include <linux/virtio_console.h>
> > +#include <linux/virtio_blk.h>
> > +#include <linux/version.h>
> > +#include "mpssd.h"
> > +#include <linux/mic_ioctl.h>
> > +#include <linux/mic_common.h>
> > +
> > +static void init_mic(struct mic_info *mic);
> > +
> > +static FILE *logfp;
> > +static struct mic_info mic_list;
> > +
> > +#define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]))
> > +
> > +#define min_t(type, x, y) ({				\
> > +		type __min1 = (x);                      \
> > +		type __min2 = (y);                      \
> > +		__min1 < __min2 ? __min1 : __min2; })
> > +
> > +/* align addr on a size boundary - adjust address up/down if needed */
> > +#define _ALIGN_UP(addr, size)    (((addr)+((size)-1))&(~((size)-1)))
> > +#define _ALIGN_DOWN(addr, size)  ((addr)&(~((size)-1)))
> > +
> > +/* align addr on a size boundary - adjust address up if needed */
> > +#define _ALIGN(addr, size)     _ALIGN_UP(addr, size)
> > +
> > +/* to align the pointer to the (next) page boundary */
> > +#define PAGE_ALIGN(addr)        _ALIGN(addr, PAGE_SIZE)
> > +
> > +#define ACCESS_ONCE(x) (*(volatile typeof(x) *)&(x))
> > +
> > +/* Insert REP NOP (PAUSE) in busy-wait loops. */
> > +static inline void cpu_relax(void)
> > +{
> > +	asm volatile("rep; nop" : : : "memory");
> > +}
> > +
> > +#define GSO_ENABLED		1
> > +#define MAX_GSO_SIZE		(64 * 1024)
> > +#define ETH_H_LEN		14
> > +#define MAX_NET_PKT_SIZE	(_ALIGN_UP(MAX_GSO_SIZE + ETH_H_LEN, 64))
> > +#define MIC_DEVICE_PAGE_END	0x1000
> > +
> > +#ifndef VIRTIO_NET_HDR_F_DATA_VALID
> > +#define VIRTIO_NET_HDR_F_DATA_VALID	2	/* Csum is valid */
> > +#endif
> > +
> > +static struct {
> > +	struct mic_device_desc dd;
> > +	struct mic_vqconfig vqconfig[2];
> > +	__u32 host_features, guest_acknowledgements;
> > +	struct virtio_console_config cons_config;
> > +} virtcons_dev_page = {
> > +	.dd = {
> > +		.type = VIRTIO_ID_CONSOLE,
> > +		.num_vq = ARRAY_SIZE(virtcons_dev_page.vqconfig),
> > +		.feature_len = sizeof(virtcons_dev_page.host_features),
> > +		.config_len = sizeof(virtcons_dev_page.cons_config),
> > +	},
> > +	.vqconfig[0] = {
> > +		.num = htole16(MIC_VRING_ENTRIES),
> > +	},
> > +	.vqconfig[1] = {
> > +		.num = htole16(MIC_VRING_ENTRIES),
> > +	},
> > +};
> > +
> > +static struct {
> > +	struct mic_device_desc dd;
> > +	struct mic_vqconfig vqconfig[2];
> > +	__u32 host_features, guest_acknowledgements;
> > +	struct virtio_net_config net_config;
> > +} virtnet_dev_page = {
> > +	.dd = {
> > +		.type = VIRTIO_ID_NET,
> > +		.num_vq = ARRAY_SIZE(virtnet_dev_page.vqconfig),
> > +		.feature_len = sizeof(virtnet_dev_page.host_features),
> > +		.config_len = sizeof(virtnet_dev_page.net_config),
> > +	},
> > +	.vqconfig[0] = {
> > +		.num = htole16(MIC_VRING_ENTRIES),
> > +	},
> > +	.vqconfig[1] = {
> > +		.num = htole16(MIC_VRING_ENTRIES),
> > +	},
> > +#if GSO_ENABLED
> > +		.host_features = htole32(
> > +		1 << VIRTIO_NET_F_CSUM |
> > +		1 << VIRTIO_NET_F_GSO |
> > +		1 << VIRTIO_NET_F_GUEST_TSO4 |
> > +		1 << VIRTIO_NET_F_GUEST_TSO6 |
> > +		1 << VIRTIO_NET_F_GUEST_ECN |
> > +		1 << VIRTIO_NET_F_GUEST_UFO),
> > +#else
> > +		.host_features = 0,
> > +#endif
> > +};
> > +
> > +static const char *mic_config_dir = "/etc/sysconfig/mic";
> > +static const char *virtblk_backend = "VIRTBLK_BACKEND";
> > +static struct {
> > +	struct mic_device_desc dd;
> > +	struct mic_vqconfig vqconfig[1];
> > +	__u32 host_features, guest_acknowledgements;
> > +	struct virtio_blk_config blk_config;
> > +} virtblk_dev_page = {
> > +	.dd = {
> > +		.type = VIRTIO_ID_BLOCK,
> > +		.num_vq = ARRAY_SIZE(virtblk_dev_page.vqconfig),
> > +		.feature_len = sizeof(virtblk_dev_page.host_features),
> > +		.config_len = sizeof(virtblk_dev_page.blk_config),
> > +	},
> > +	.vqconfig[0] = {
> > +		.num = htole16(MIC_VRING_ENTRIES),
> > +	},
> > +	.host_features =
> > +		htole32(1<<VIRTIO_BLK_F_SEG_MAX),
> > +	.blk_config = {
> > +		.seg_max = htole32(MIC_VRING_ENTRIES - 2),
> > +		.capacity = htole64(0),
> > +	 }
> > +};
> > +
> > +static char *myname;
> > +
> > +static int
> > +tap_configure(struct mic_info *mic, char *dev)
> > +{
> > +	pid_t pid;
> > +	char *ifargv[7];
> > +	char ipaddr[IFNAMSIZ];
> > +	int ret = 0;
> > +
> > +	pid = fork();
> > +	if (pid == 0) {
> > +		ifargv[0] = "ip";
> > +		ifargv[1] = "link";
> > +		ifargv[2] = "set";
> > +		ifargv[3] = dev;
> > +		ifargv[4] = "up";
> > +		ifargv[5] = NULL;
> > +		mpsslog("Configuring %s\n", dev);
> > +		ret = execvp("ip", ifargv);
> > +		if (ret < 0) {
> > +			mpsslog("%s execvp failed errno %s\n",
> > +				mic->name, strerror(errno));
> > +			return ret;
> > +		}
> > +	}
> > +	if (pid < 0) {
> > +		mpsslog("%s fork failed errno %s\n",
> > +			mic->name, strerror(errno));
> > +		return ret;
> > +	}
> > +
> > +	ret = waitpid(pid, NULL, 0);
> > +	if (ret < 0) {
> > +		mpsslog("%s waitpid failed errno %s\n",
> > +			mic->name, strerror(errno));
> > +		return ret;
> > +	}
> > +
> > +	snprintf(ipaddr, IFNAMSIZ, "172.31.%d.254/24", mic->id);
> > +
> > +	pid = fork();
> > +	if (pid == 0) {
> > +		ifargv[0] = "ip";
> > +		ifargv[1] = "addr";
> > +		ifargv[2] = "add";
> > +		ifargv[3] = ipaddr;
> > +		ifargv[4] = "dev";
> > +		ifargv[5] = dev;
> > +		ifargv[6] = NULL;
> > +		mpsslog("Configuring %s ipaddr %s\n", dev, ipaddr);
> > +		ret = execvp("ip", ifargv);
> > +		if (ret < 0) {
> > +			mpsslog("%s execvp failed errno %s\n",
> > +				mic->name, strerror(errno));
> > +			return ret;
> > +		}
> > +	}
> > +	if (pid < 0) {
> > +		mpsslog("%s fork failed errno %s\n",
> > +			mic->name, strerror(errno));
> > +		return ret;
> > +	}
> > +
> > +	ret = waitpid(pid, NULL, 0);
> > +	if (ret < 0) {
> > +		mpsslog("%s waitpid failed errno %s\n",
> > +			mic->name, strerror(errno));
> > +		return ret;
> > +	}
> > +	mpsslog("MIC name %s %s %d DONE!\n",
> > +		mic->name, __func__, __LINE__);
> > +	return 0;
> > +}
> > +
> > +static int tun_alloc(struct mic_info *mic, char *dev)
> > +{
> > +	struct ifreq ifr;
> > +	int fd, err;
> > +#if GSO_ENABLED
> > +	unsigned offload;
> > +#endif
> > +	fd = open("/dev/net/tun", O_RDWR);
> > +	if (fd < 0) {
> > +		mpsslog("Could not open /dev/net/tun %s\n", strerror(errno));
> > +		goto done;
> > +	}
> > +
> > +	memset(&ifr, 0, sizeof(ifr));
> > +
> > +	ifr.ifr_flags = IFF_TAP | IFF_NO_PI | IFF_VNET_HDR;
> > +	if (*dev)
> > +		strncpy(ifr.ifr_name, dev, IFNAMSIZ);
> > +
> > +	err = ioctl(fd, TUNSETIFF, (void *) &ifr);
> > +	if (err < 0) {
> > +		mpsslog("%s %s %d TUNSETIFF failed %s\n",
> > +			mic->name, __func__, __LINE__, strerror(errno));
> > +		close(fd);
> > +		return err;
> > +	}
> > +#if GSO_ENABLED
> > +	offload = TUN_F_CSUM | TUN_F_TSO4 | TUN_F_TSO6 |
> > +		TUN_F_TSO_ECN | TUN_F_UFO;
> > +
> > +	err = ioctl(fd, TUNSETOFFLOAD, offload);
> > +	if (err < 0) {
> > +		mpsslog("%s %s %d TUNSETOFFLOAD failed %s\n",
> > +			mic->name, __func__, __LINE__, strerror(errno));
> > +		close(fd);
> > +		return err;
> > +	}
> > +#endif
> > +	strcpy(dev, ifr.ifr_name);
> > +	mpsslog("Created TAP %s\n", dev);
> > +done:
> > +	return fd;
> > +}
> > +
> > +#define NET_FD_VIRTIO_NET 0
> > +#define NET_FD_TUN 1
> > +#define MAX_NET_FD 2
> > +
> > +static void * *
> > +get_dp(struct mic_info *mic, int type)
> > +{
> > +	switch (type) {
> > +	case VIRTIO_ID_CONSOLE:
> > +		return &mic->mic_console.console_dp;
> > +	case VIRTIO_ID_NET:
> > +		return &mic->mic_net.net_dp;
> > +	case VIRTIO_ID_BLOCK:
> > +		return &mic->mic_virtblk.block_dp;
> > +	}
> > +	mpsslog("%s %s %d not found\n", mic->name, __func__, type);
> > +	assert(0);
> > +	return NULL;
> > +}
> > +
> > +static struct mic_device_desc *get_device_desc(struct mic_info *mic, int type)
> > +{
> > +	struct mic_device_desc *d;
> > +	int i;
> > +	void *dp = *get_dp(mic, type);
> > +
> > +	for (i = mic_aligned_size(struct mic_bootparam); i < PAGE_SIZE;
> > +		i += mic_total_desc_size(d)) {
> > +		d = dp + i;
> > +
> > +		/* End of list */
> > +		if (d->type == 0)
> > +			break;
> > +
> > +		if (d->type == -1)
> > +			continue;
> > +
> > +		mpsslog("%s %s d-> type %d d %p\n",
> > +			mic->name, __func__, d->type, d);
> > +
> > +		if (d->type == (__u8)type)
> > +			return d;
> > +	}
> > +	mpsslog("%s %s %d not found\n", mic->name, __func__, type);
> > +	assert(0);
> > +	return NULL;
> > +}
> > +
> > +/* See comments in vhost.c for explanation of next_desc() */
> > +static unsigned next_desc(struct vring_desc *desc)
> > +{
> > +	unsigned int next;
> > +
> > +	if (!(le16toh(desc->flags) & VRING_DESC_F_NEXT))
> > +		return -1U;
> > +	next = le16toh(desc->next);
> > +	return next;
> > +}
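
For readers following next_desc(): it returns -1U when VRING_DESC_F_NEXT is
clear, and callers are expected to stop there. A minimal sketch of a bounded
chain walk is below. The demo_* names and the stand-in descriptor struct are
hypothetical (not part of this patch), and fields are assumed to be in host
byte order already.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_DESC_F_NEXT 1u	/* same bit value as VRING_DESC_F_NEXT */

/* Hypothetical stand-in for struct vring_desc, host byte order. */
struct demo_desc {
	uint64_t addr;
	uint32_t len;
	uint16_t flags;
	uint16_t next;
};

/* Index of the following descriptor, or -1U at the end of the chain. */
static unsigned demo_next_desc(const struct demo_desc *d)
{
	if (!(d->flags & DEMO_DESC_F_NEXT))
		return -1U;
	return d->next;
}

/*
 * Sum the lengths of a chain starting at head. The walk stops on the
 * end marker, on an out-of-range index, or after num hops, so a
 * corrupted ring cannot make it loop forever.
 */
static uint64_t demo_chain_len(const struct demo_desc *table, unsigned num,
			       unsigned head)
{
	uint64_t total = 0;
	unsigned i = head, seen = 0;

	while (i < num && seen++ < num) {
		total += table[i].len;
		i = demo_next_desc(&table[i]);
	}
	return total;
}
```

The hop limit is a design choice of this sketch only; the patch itself trusts
the ring contents.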
> > +
> > +/* Sum up the lengths of all iovec entries */
> > +static ssize_t
> > +sum_iovec_len(struct mic_copy_desc *copy)
> > +{
> > +	ssize_t sum = 0;
> > +	int i;
> > +
> > +	for (i = 0; i < copy->iovcnt; i++)
> > +		sum += copy->iov[i].iov_len;
> > +	return sum;
> > +}
> > +
> > +static inline void verify_out_len(struct mic_info *mic,
> > +	struct mic_copy_desc *copy)
> > +{
> > +	if (copy->out_len != sum_iovec_len(copy)) {
> > +		mpsslog("%s %s %d BUG copy->out_len 0x%x len 0x%zx\n",
> > +				mic->name, __func__, __LINE__,
> > +				copy->out_len, sum_iovec_len(copy));
> > +		assert(copy->out_len == sum_iovec_len(copy));
> > +	}
> > +}
> > +
> > +/* Display an iovec */
> > +static void
> > +disp_iovec(struct mic_info *mic, struct mic_copy_desc *copy,
> > +	const char *s, int line)
> > +{
> > +	int i;
> > +
> > +	for (i = 0; i < copy->iovcnt; i++)
> > +		mpsslog("%s %s %d copy->iov[%d] addr %p len 0x%zx\n",
> > +			mic->name, s, line, i,
> > +			copy->iov[i].iov_base, copy->iov[i].iov_len);
> > +}
> > +
> > +static inline __u16 read_avail_idx(struct mic_vring *vr)
> > +{
> > +	return ACCESS_ONCE(vr->info->avail_idx);
> > +}
> > +
> > +static inline void txrx_prepare(int type, bool tx, struct mic_vring *vr,
> > +				struct mic_copy_desc *copy, ssize_t len)
> > +{
> > +	copy->vr_idx = tx ? 0 : 1;
> > +	copy->update_used = true;
> > +	if (type == VIRTIO_ID_NET)
> > +		copy->iov[1].iov_len = len - sizeof(struct virtio_net_hdr);
> > +	else
> > +		copy->iov[0].iov_len = len;
> > +}
> > +
> > +/* Central API which triggers the copies */
> > +static int
> > +mic_virtio_copy(struct mic_info *mic, int fd,
> > +	struct mic_vring *vr, struct mic_copy_desc *copy)
> > +{
> > +	int ret;
> > +
> > +	ret = ioctl(fd, MIC_VIRTIO_COPY_DESC, copy);
> > +	if (ret) {
> > +		mpsslog("%s %s %d errno %s ret %d\n",
> > +			mic->name, __func__, __LINE__,
> > +			strerror(errno), ret);
> > +	}
> > +	return ret;
> > +}
> > +
> > +/*
> > + * This initialization routine requires at least one
> > + * vring i.e. vr0. vr1 is optional.
> > + */
> > +static void *
> > +init_vr(struct mic_info *mic, int fd, int type,
> > +	struct mic_vring *vr0, struct mic_vring *vr1, int num_vq)
> > +{
> > +	int vr_size;
> > +	char *va;
> > +
> > +	vr_size = PAGE_ALIGN(vring_size(MIC_VRING_ENTRIES,
> > +		MIC_VIRTIO_RING_ALIGN) + sizeof(struct _mic_vring_info));
> > +	va = mmap(NULL, MIC_DEVICE_PAGE_END + vr_size * num_vq,
> > +		PROT_READ, MAP_SHARED, fd, 0);
> > +	if (MAP_FAILED == va) {
> > +		mpsslog("%s %s %d mmap failed errno %s\n",
> > +			mic->name, __func__, __LINE__,
> > +			strerror(errno));
> > +		goto done;
> > +	}
> > +	*get_dp(mic, type) = (void *)va;
> > +	vr0->va = (struct mic_vring *)&va[MIC_DEVICE_PAGE_END];
> > +	vr0->info = vr0->va +
> > +		vring_size(MIC_VRING_ENTRIES, MIC_VIRTIO_RING_ALIGN);
> > +	vring_init(&vr0->vr,
> > +		MIC_VRING_ENTRIES, vr0->va, MIC_VIRTIO_RING_ALIGN);
> > +	mpsslog("%s %s vr0 %p vr0->info %p vr_size 0x%x vring 0x%x ",
> > +		__func__, mic->name, vr0->va, vr0->info, vr_size,
> > +		vring_size(MIC_VRING_ENTRIES, MIC_VIRTIO_RING_ALIGN));
> > +	mpsslog("magic 0x%x expected 0x%x\n",
> > +		vr0->info->magic, MIC_MAGIC + type + 0);
> > +	assert(vr0->info->magic == MIC_MAGIC + type + 0);
> > +	if (vr1) {
> > +		vr1->va = (struct mic_vring *)
> > +			&va[MIC_DEVICE_PAGE_END + vr_size];
> > +		vr1->info = vr1->va + vring_size(MIC_VRING_ENTRIES,
> > +			MIC_VIRTIO_RING_ALIGN);
> > +		vring_init(&vr1->vr,
> > +			MIC_VRING_ENTRIES, vr1->va, MIC_VIRTIO_RING_ALIGN);
> > +		mpsslog("%s %s vr1 %p vr1->info %p vr_size 0x%x vring 0x%x ",
> > +			__func__, mic->name, vr1->va, vr1->info, vr_size,
> > +			vring_size(MIC_VRING_ENTRIES, MIC_VIRTIO_RING_ALIGN));
> > +		mpsslog("magic 0x%x expected 0x%x\n",
> > +			vr1->info->magic, MIC_MAGIC + type + 1);
> > +		assert(vr1->info->magic == MIC_MAGIC + type + 1);
> > +	}
> > +done:
> > +	return va;
> > +}
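
To make the mapping layout in init_vr() easier to follow: each vring occupies
a page-aligned stride of vring_size() plus the trailing info struct, and ring n
starts at MIC_DEVICE_PAGE_END + n * stride. The sketch below reproduces that
arithmetic in isolation. The demo_* helpers are hypothetical, and the numeric
values used with them (128 entries, 4096 alignment, 16-byte info struct) are
assumptions for illustration, not the values from the MIC headers.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_DESC_SIZE 16u	/* sizeof(struct vring_desc) */
#define DEMO_USED_ELEM_SIZE 8u	/* sizeof(struct vring_used_elem) */

/* Round x up to a multiple of align (align must be a power of two). */
static size_t demo_align_up(size_t x, size_t align)
{
	return (x + align - 1) & ~(align - 1);
}

/* Same arithmetic as vring_size() in <linux/virtio_ring.h>. */
static size_t demo_vring_size(size_t num, size_t align)
{
	/* descriptor table followed by the avail ring */
	size_t avail_end = DEMO_DESC_SIZE * num + 2 * (3 + num);

	/* used ring starts on the next align boundary */
	return demo_align_up(avail_end, align) + 2 * 3 +
		DEMO_USED_ELEM_SIZE * num;
}

/* Byte offset of vring n inside the mapping, given the per-ring stride. */
static size_t demo_vring_offset(size_t dev_page_end, size_t stride, unsigned n)
{
	return dev_page_end + stride * n;
}
```

With these assumed numbers a ring needs 5126 bytes, so the stride rounds up to
two 4096-byte pages.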
> > +
> > +static void
> > +uninit_vr(struct mic_info *mic, int num_vq)
> > +{
> > +	int vr_size, ret;
> > +
> > +	vr_size = PAGE_ALIGN(vring_size(MIC_VRING_ENTRIES,
> > +		MIC_VIRTIO_RING_ALIGN) + sizeof(struct _mic_vring_info));
> > +	ret = munmap(mic->mic_virtblk.block_dp,
> > +		MIC_DEVICE_PAGE_END + vr_size * num_vq);
> > +	if (ret < 0)
> > +		mpsslog("%s munmap errno %d\n", mic->name, errno);
> > +}
> > +
> > +static void
> > +wait_for_card_driver(struct mic_info *mic, int fd, int type)
> > +{
> > +	struct pollfd pollfd;
> > +	int err;
> > +	struct mic_device_desc *desc = get_device_desc(mic, type);
> > +
> > +	pollfd.fd = fd;
> > +	mpsslog("%s %s Waiting .... desc-> type %d status 0x%x\n",
> > +		mic->name, __func__, type, desc->status);
> > +	while (1) {
> > +		pollfd.events = POLLIN;
> > +		pollfd.revents = 0;
> > +		err = poll(&pollfd, 1, -1);
> > +		if (err < 0) {
> > +			mpsslog("%s %s poll failed %s\n",
> > +				mic->name, __func__, strerror(errno));
> > +			continue;
> > +		}
> > +
> > +		if (pollfd.revents) {
> > +			mpsslog("%s %s Waiting... desc-> type %d status 0x%x\n",
> > +				mic->name, __func__, type, desc->status);
> > +			if (desc->status & VIRTIO_CONFIG_S_DRIVER_OK) {
> > +				mpsslog("%s %s poll.revents %d\n",
> > +					mic->name, __func__, pollfd.revents);
> > +				mpsslog("%s %s desc-> type %d status 0x%x\n",
> > +					mic->name, __func__, type,
> > +					desc->status);
> > +				break;
> > +			}
> > +		}
> > +	}
> > +}
> > +
> > +/* Spin till we have some descriptors */
> > +static void
> > +wait_for_descriptors(struct mic_info *mic, struct mic_vring *vr)
> > +{
> > +	__u16 avail_idx = read_avail_idx(vr);
> > +
> > +	while (avail_idx == le16toh(ACCESS_ONCE(vr->vr.avail->idx))) {
> > +#ifdef DEBUG
> > +		mpsslog("%s %s waiting for desc avail %d info_avail %d\n",
> > +			mic->name, __func__,
> > +			le16toh(vr->vr.avail->idx), vr->info->avail_idx);
> > +#endif
> > +		cpu_relax();
> > +	}
> > +}
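
A note on the index comparison in wait_for_descriptors(): avail->idx is a
free-running 16-bit counter, so "something is pending" is an inequality test
and the ring slot is the counter masked by the ring size. A small sketch of
that arithmetic follows. The demo_* names are hypothetical, and the ring size
is assumed to be a power of two.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Number of entries the driver has published that the device side has
 * not consumed yet. The subtraction is done modulo 2^16, so it stays
 * correct when avail_idx wraps past 65535.
 */
static uint16_t demo_pending(uint16_t avail_idx, uint16_t last_seen)
{
	return (uint16_t)(avail_idx - last_seen);
}

/* Ring slot for a free-running index; num must be a power of two. */
static unsigned demo_slot(uint16_t idx, unsigned num)
{
	return idx & (num - 1);
}
```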
> > +
> > +static void *
> > +virtio_net(void *arg)
> > +{
> > +	static __u8 vnet_hdr[2][sizeof(struct virtio_net_hdr)];
> > +	static __u8 vnet_buf[2][MAX_NET_PKT_SIZE] __aligned(64);
> > +	struct iovec vnet_iov[2][2] = {
> > +		{ { .iov_base = vnet_hdr[0], .iov_len = sizeof(vnet_hdr[0]) },
> > +		  { .iov_base = vnet_buf[0], .iov_len = sizeof(vnet_buf[0]) } },
> > +		{ { .iov_base = vnet_hdr[1], .iov_len = sizeof(vnet_hdr[1]) },
> > +		  { .iov_base = vnet_buf[1], .iov_len = sizeof(vnet_buf[1]) } },
> > +	};
> > +	struct iovec *iov0 = vnet_iov[0], *iov1 = vnet_iov[1];
> > +	struct mic_info *mic = (struct mic_info *)arg;
> > +	char if_name[IFNAMSIZ];
> > +	struct pollfd net_poll[MAX_NET_FD];
> > +	struct mic_vring tx_vr, rx_vr;
> > +	struct mic_copy_desc copy;
> > +	struct mic_device_desc *desc;
> > +	int err;
> > +
> > +	snprintf(if_name, IFNAMSIZ, "mic%d", mic->id);
> > +	mic->mic_net.tap_fd = tun_alloc(mic, if_name);
> > +	if (mic->mic_net.tap_fd < 0)
> > +		goto done;
> > +
> > +	if (tap_configure(mic, if_name))
> > +		goto done;
> > +	mpsslog("MIC name %s id %d\n", mic->name, mic->id);
> > +
> > +	net_poll[NET_FD_VIRTIO_NET].fd = mic->mic_net.virtio_net_fd;
> > +	net_poll[NET_FD_VIRTIO_NET].events = POLLIN;
> > +	net_poll[NET_FD_TUN].fd = mic->mic_net.tap_fd;
> > +	net_poll[NET_FD_TUN].events = POLLIN;
> > +
> > +	if (MAP_FAILED == init_vr(mic, mic->mic_net.virtio_net_fd,
> > +		VIRTIO_ID_NET, &tx_vr, &rx_vr,
> > +		virtnet_dev_page.dd.num_vq)) {
> > +		mpsslog("%s init_vr failed %s\n",
> > +			mic->name, strerror(errno));
> > +		goto done;
> > +	}
> > +
> > +	copy.iovcnt = 2;
> > +	desc = get_device_desc(mic, VIRTIO_ID_NET);
> > +
> > +	while (1) {
> > +		ssize_t len;
> > +
> > +		net_poll[NET_FD_VIRTIO_NET].revents = 0;
> > +		net_poll[NET_FD_TUN].revents = 0;
> > +
> > +		/* Start polling for data from tap and virtio net */
> > +		err = poll(net_poll, 2, -1);
> > +		if (err < 0) {
> > +			mpsslog("%s poll failed %s\n",
> > +				__func__, strerror(errno));
> > +			continue;
> > +		}
> > +		if (!(desc->status & VIRTIO_CONFIG_S_DRIVER_OK))
> > +			wait_for_card_driver(mic, mic->mic_net.virtio_net_fd,
> > +					VIRTIO_ID_NET);
> > +		/*
> > +		 * Check if there is data to be read from TUN and write to
> > +		 * virtio net fd if there is.
> > +		 */
> > +		if (net_poll[NET_FD_TUN].revents & POLLIN) {
> > +			copy.iov = iov0;
> > +			len = readv(net_poll[NET_FD_TUN].fd,
> > +				copy.iov, copy.iovcnt);
> > +			if (len > 0) {
> > +				struct virtio_net_hdr *hdr
> > +					= (struct virtio_net_hdr *) vnet_hdr[0];
> > +
> > +				/* Disable checksums on the card since we are on
> > +				   a reliable PCIe link */
> > +				hdr->flags |= VIRTIO_NET_HDR_F_DATA_VALID;
> > +#ifdef DEBUG
> > +				mpsslog("%s %s %d hdr->flags 0x%x ", mic->name,
> > +					__func__, __LINE__, hdr->flags);
> > +				mpsslog("copy.out_len %d hdr->gso_type 0x%x\n",
> > +					copy.out_len, hdr->gso_type);
> > +#endif
> > +#ifdef DEBUG
> > +				disp_iovec(mic, &copy, __func__, __LINE__);
> > +				mpsslog("%s %s %d read from tap 0x%lx\n",
> > +					mic->name, __func__, __LINE__,
> > +					len);
> > +#endif
> > +				wait_for_descriptors(mic, &tx_vr);
> > +				txrx_prepare(VIRTIO_ID_NET, 1, &tx_vr, &copy,
> > +					len);
> > +
> > +				err = mic_virtio_copy(mic,
> > +					mic->mic_net.virtio_net_fd, &tx_vr,
> > +					&copy);
> > +				if (err < 0) {
> > +					mpsslog("%s %s %d mic_virtio_copy %s\n",
> > +						mic->name, __func__, __LINE__,
> > +						strerror(errno));
> > +				}
> > +				if (!err)
> > +					verify_out_len(mic, &copy);
> > +#ifdef DEBUG
> > +				disp_iovec(mic, &copy, __func__, __LINE__);
> > +				mpsslog("%s %s %d wrote to net 0x%lx\n",
> > +					mic->name, __func__, __LINE__,
> > +					sum_iovec_len(&copy));
> > +#endif
> > +				/* Reinitialize IOV for next run */
> > +				iov0[1].iov_len = MAX_NET_PKT_SIZE;
> > +			} else if (len < 0) {
> > +				disp_iovec(mic, &copy, __func__, __LINE__);
> > +				mpsslog("%s %s %d read failed %s ", mic->name,
> > +					__func__, __LINE__, strerror(errno));
> > +				mpsslog("cnt %d sum %zd\n",
> > +					copy.iovcnt, sum_iovec_len(&copy));
> > +			}
> > +		}
> > +
> > +		/*
> > +		 * Check if there is data to be read from virtio net and
> > +		 * write to TUN if there is.
> > +		 */
> > +		if (net_poll[NET_FD_VIRTIO_NET].revents & POLLIN) {
> > +			while (rx_vr.info->avail_idx !=
> > +				le16toh(rx_vr.vr.avail->idx)) {
> > +				copy.iov = iov1;
> > +				txrx_prepare(VIRTIO_ID_NET, 0, &rx_vr, &copy,
> > +					MAX_NET_PKT_SIZE
> > +					+ sizeof(struct virtio_net_hdr));
> > +
> > +				err = mic_virtio_copy(mic,
> > +					mic->mic_net.virtio_net_fd, &rx_vr,
> > +					&copy);
> > +				if (!err) {
> > +#ifdef DEBUG
> > +					struct virtio_net_hdr *hdr
> > +						= (struct virtio_net_hdr *)
> > +							vnet_hdr[1];
> > +
> > +					mpsslog("%s %s %d hdr->flags 0x%x, ",
> > +						mic->name, __func__, __LINE__,
> > +						hdr->flags);
> > +					mpsslog("out_len %d gso_type 0x%x\n",
> > +						copy.out_len,
> > +						hdr->gso_type);
> > +#endif
> > +					/* Set the correct output iov_len */
> > +					iov1[1].iov_len = copy.out_len -
> > +						sizeof(struct virtio_net_hdr);
> > +					verify_out_len(mic, &copy);
> > +#ifdef DEBUG
> > +					disp_iovec(mic, &copy, __func__,
> > +						__LINE__);
> > +					mpsslog("%s %s %d ",
> > +						mic->name, __func__, __LINE__);
> > +					mpsslog("read from net 0x%lx\n",
> > +						sum_iovec_len(&copy));
> > +#endif
> > +					len = writev(net_poll[NET_FD_TUN].fd,
> > +						copy.iov, copy.iovcnt);
> > +					if (len != sum_iovec_len(&copy)) {
> > +						mpsslog("Tun write failed %s ",
> > +							strerror(errno));
> > +						mpsslog("len 0x%zx ", len);
> > +						mpsslog("read_len 0x%zx\n",
> > +							sum_iovec_len(&copy));
> > +					} else {
> > +#ifdef DEBUG
> > +						disp_iovec(mic, &copy, __func__,
> > +							__LINE__);
> > +						mpsslog("%s %s %d ",
> > +							mic->name, __func__,
> > +							__LINE__);
> > +						mpsslog("wrote to tap 0x%lx\n",
> > +							len);
> > +#endif
> > +					}
> > +				} else {
> > +					mpsslog("%s %s %d mic_virtio_copy %s\n",
> > +						mic->name, __func__, __LINE__,
> > +						strerror(errno));
> > +					break;
> > +				}
> > +			}
> > +		}
> > +		if (net_poll[NET_FD_VIRTIO_NET].revents & POLLERR) {
> > +			mpsslog("%s: %s: POLLERR\n", __func__, mic->name);
> > +			sleep(1);
> > +		}
> > +	}
> > +done:
> > +	pthread_exit(NULL);
> > +}
> > +
> > +/* virtio_console */
> > +#define VIRTIO_CONSOLE_FD 0
> > +#define MONITOR_FD (VIRTIO_CONSOLE_FD + 1)
> > +#define MAX_CONSOLE_FD (MONITOR_FD + 1)  /* must be the last one + 1 */
> > +#define MAX_BUFFER_SIZE PAGE_SIZE
> > +
> > +static void *
> > +virtio_console(void *arg)
> > +{
> > +	static __u8 vcons_buf[2][PAGE_SIZE];
> > +	struct iovec vcons_iov[2] = {
> > +		{ .iov_base = vcons_buf[0], .iov_len = sizeof(vcons_buf[0]) },
> > +		{ .iov_base = vcons_buf[1], .iov_len = sizeof(vcons_buf[1]) },
> > +	};
> > +	struct iovec *iov0 = &vcons_iov[0], *iov1 = &vcons_iov[1];
> > +	struct mic_info *mic = (struct mic_info *)arg;
> > +	int err;
> > +	struct pollfd console_poll[MAX_CONSOLE_FD];
> > +	int pty_fd;
> > +	char *pts_name;
> > +	ssize_t len;
> > +	struct mic_vring tx_vr, rx_vr;
> > +	struct mic_copy_desc copy;
> > +	struct mic_device_desc *desc;
> > +
> > +	pty_fd = posix_openpt(O_RDWR);
> > +	if (pty_fd < 0) {
> > +		mpsslog("can't open a pseudoterminal master device: %s\n",
> > +			strerror(errno));
> > +		goto _return;
> > +	}
> > +	pts_name = ptsname(pty_fd);
> > +	if (pts_name == NULL) {
> > +		mpsslog("can't get pts name\n");
> > +		goto _close_pty;
> > +	}
> > +	printf("%s console message goes to %s\n", mic->name, pts_name);
> > +	mpsslog("%s console message goes to %s\n", mic->name, pts_name);
> > +	err = grantpt(pty_fd);
> > +	if (err < 0) {
> > +		mpsslog("can't grant access: %s %s\n",
> > +				pts_name, strerror(errno));
> > +		goto _close_pty;
> > +	}
> > +	err = unlockpt(pty_fd);
> > +	if (err < 0) {
> > +		mpsslog("can't unlock a pseudoterminal: %s %s\n",
> > +				pts_name, strerror(errno));
> > +		goto _close_pty;
> > +	}
> > +	console_poll[MONITOR_FD].fd = pty_fd;
> > +	console_poll[MONITOR_FD].events = POLLIN;
> > +
> > +	console_poll[VIRTIO_CONSOLE_FD].fd = mic->mic_console.virtio_console_fd;
> > +	console_poll[VIRTIO_CONSOLE_FD].events = POLLIN;
> > +
> > +	if (MAP_FAILED == init_vr(mic, mic->mic_console.virtio_console_fd,
> > +		VIRTIO_ID_CONSOLE, &tx_vr, &rx_vr,
> > +		virtcons_dev_page.dd.num_vq)) {
> > +		mpsslog("%s init_vr failed %s\n",
> > +			mic->name, strerror(errno));
> > +		goto _close_pty;
> > +	}
> > +
> > +	copy.iovcnt = 1;
> > +	desc = get_device_desc(mic, VIRTIO_ID_CONSOLE);
> > +
> > +	for (;;) {
> > +		console_poll[MONITOR_FD].revents = 0;
> > +		console_poll[VIRTIO_CONSOLE_FD].revents = 0;
> > +		err = poll(console_poll, MAX_CONSOLE_FD, -1);
> > +		if (err < 0) {
> > +			mpsslog("%s %d: poll failed: %s\n", __func__, __LINE__,
> > +				strerror(errno));
> > +			continue;
> > +		}
> > +		if (!(desc->status & VIRTIO_CONFIG_S_DRIVER_OK))
> > +			wait_for_card_driver(mic,
> > +				mic->mic_console.virtio_console_fd,
> > +				VIRTIO_ID_CONSOLE);
> > +
> > +		if (console_poll[MONITOR_FD].revents & POLLIN) {
> > +			copy.iov = iov0;
> > +			len = readv(pty_fd, copy.iov, copy.iovcnt);
> > +			if (len > 0) {
> > +#ifdef DEBUG
> > +				disp_iovec(mic, &copy, __func__, __LINE__);
> > +				mpsslog("%s %s %d read from pty 0x%lx\n",
> > +					mic->name, __func__, __LINE__,
> > +					len);
> > +#endif
> > +				wait_for_descriptors(mic, &tx_vr);
> > +				txrx_prepare(VIRTIO_ID_CONSOLE, 1, &tx_vr,
> > +					&copy, len);
> > +
> > +				err = mic_virtio_copy(mic,
> > +					mic->mic_console.virtio_console_fd,
> > +					&tx_vr, &copy);
> > +				if (err < 0) {
> > +					mpsslog("%s %s %d mic_virtio_copy %s\n",
> > +						mic->name, __func__, __LINE__,
> > +						strerror(errno));
> > +				}
> > +				if (!err)
> > +					verify_out_len(mic, &copy);
> > +#ifdef DEBUG
> > +				disp_iovec(mic, &copy, __func__, __LINE__);
> > +				mpsslog("%s %s %d wrote to console 0x%lx\n",
> > +					mic->name, __func__, __LINE__,
> > +					sum_iovec_len(&copy));
> > +#endif
> > +				/* Reinitialize IOV for next run */
> > +				iov0->iov_len = PAGE_SIZE;
> > +			} else if (len < 0) {
> > +				disp_iovec(mic, &copy, __func__, __LINE__);
> > +				mpsslog("%s %s %d read failed %s ",
> > +					mic->name, __func__, __LINE__,
> > +					strerror(errno));
> > +				mpsslog("cnt %d sum %zd\n",
> > +					copy.iovcnt, sum_iovec_len(&copy));
> > +			}
> > +		}
> > +
> > +		if (console_poll[VIRTIO_CONSOLE_FD].revents & POLLIN) {
> > +			while (rx_vr.info->avail_idx !=
> > +				le16toh(rx_vr.vr.avail->idx)) {
> > +				copy.iov = iov1;
> > +				txrx_prepare(VIRTIO_ID_CONSOLE, 0, &rx_vr,
> > +					&copy, PAGE_SIZE);
> > +
> > +				err = mic_virtio_copy(mic,
> > +					mic->mic_console.virtio_console_fd,
> > +					&rx_vr, &copy);
> > +				if (!err) {
> > +					/* Set the correct output iov_len */
> > +					iov1->iov_len = copy.out_len;
> > +					verify_out_len(mic, &copy);
> > +#ifdef DEBUG
> > +					disp_iovec(mic, &copy, __func__,
> > +						__LINE__);
> > +					mpsslog("%s %s %d ",
> > +						mic->name, __func__, __LINE__);
> > +					mpsslog("read from console 0x%lx\n",
> > +						sum_iovec_len(&copy));
> > +#endif
> > +					len = writev(pty_fd,
> > +						copy.iov, copy.iovcnt);
> > +					if (len != sum_iovec_len(&copy)) {
> > +						mpsslog("pty write failed %s ",
> > +							strerror(errno));
> > +						mpsslog("len 0x%zx ", len);
> > +						mpsslog("read_len 0x%zx\n",
> > +							sum_iovec_len(&copy));
> > +					} else {
> > +#ifdef DEBUG
> > +						disp_iovec(mic, &copy, __func__,
> > +							__LINE__);
> > +						mpsslog("%s %s %d ",
> > +							mic->name, __func__,
> > +							__LINE__);
> > +						mpsslog("wrote to pty 0x%lx\n",
> > +							len);
> > +#endif
> > +					}
> > +				} else {
> > +					mpsslog("%s %s %d mic_virtio_copy %s\n",
> > +						mic->name, __func__, __LINE__,
> > +						strerror(errno));
> > +					break;
> > +				}
> > +			}
> > +		}
> > +		if (console_poll[VIRTIO_CONSOLE_FD].revents & POLLERR) {
> > +			mpsslog("%s: %s: POLLERR\n", __func__, mic->name);
> > +			sleep(1);
> > +		}
> > +	}
> > +_close_pty:
> > +	close(pty_fd);
> > +_return:
> > +	pthread_exit(NULL);
> > +}
> > +
> > +static void
> > +add_virtio_device(struct mic_info *mic, struct mic_device_desc *dd)
> > +{
> > +	char path[PATH_MAX];
> > +	int fd, err;
> > +
> > +	snprintf(path, PATH_MAX, "/dev/mic%d", mic->id);
> > +	fd = open(path, O_RDWR);
> > +	if (fd < 0) {
> > +		mpsslog("Could not open %s %s\n", path, strerror(errno));
> > +		return;
> > +	}
> > +
> > +	err = ioctl(fd, MIC_VIRTIO_ADD_DEVICE, dd);
> > +	if (err < 0) {
> > +		mpsslog("Could not add %d %s\n", dd->type, strerror(errno));
> > +		close(fd);
> > +		return;
> > +	}
> > +	switch (dd->type) {
> > +	case VIRTIO_ID_NET:
> > +		mic->mic_net.virtio_net_fd = fd;
> > +		mpsslog("Added VIRTIO_ID_NET for %s\n", mic->name);
> > +		break;
> > +	case VIRTIO_ID_CONSOLE:
> > +		mic->mic_console.virtio_console_fd = fd;
> > +		mpsslog("Added VIRTIO_ID_CONSOLE for %s\n", mic->name);
> > +		break;
> > +	case VIRTIO_ID_BLOCK:
> > +		mic->mic_virtblk.virtio_block_fd = fd;
> > +		mpsslog("Added VIRTIO_ID_BLOCK for %s\n", mic->name);
> > +		break;
> > +	}
> > +}
> > +
> > +static bool
> > +set_backend_file(struct mic_info *mic)
> > +{
> > +	FILE *config;
> > +	char buff[PATH_MAX], *line, *evv, *p;
> > +
> > +	snprintf(buff, PATH_MAX, "%s/mpssd%03d.conf", mic_config_dir, mic->id);
> > +	config = fopen(buff, "r");
> > +	if (config == NULL)
> > +		return false;
> > +	do {  /* look for "virtblk_backend=XXXX" */
> > +		line = fgets(buff, PATH_MAX, config);
> > +		if (line == NULL)
> > +			break;
> > +		if (*line == '#')
> > +			continue;
> > +		p = strchr(line, '\n');
> > +		if (p)
> > +			*p = '\0';
> > +	} while (strncmp(line, virtblk_backend, strlen(virtblk_backend)) != 0);
> > +	fclose(config);
> > +	if (line == NULL)
> > +		return false;
> > +	evv = strchr(line, '=');
> > +	if (evv == NULL)
> > +		return false;
> > +	mic->mic_virtblk.backend_file = malloc(strlen(evv));
> > +	if (mic->mic_virtblk.backend_file == NULL) {
> > +		mpsslog("%s: can't allocate memory\n", mic->name);
> > +		return false;
> > +	}
> > +	strcpy(mic->mic_virtblk.backend_file, evv + 1);
> > +	return true;
> > +}
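
The config lookup in set_backend_file() is a "key=value" line match that skips
'#' comments. For reference, a self-contained sketch of that parsing step is
below. demo_parse_value() is a hypothetical helper, and it is assumed here
that the key string is "virtblk_backend" (the patch only shows the variable
name, not its contents). Unlike the quoted code, the sketch also requires the
'=' to follow the key directly.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Return a malloc()ed copy of the value if line has the form
 * "<key>=<value>", with any trailing newline stripped. Return NULL for
 * comment lines, non-matching lines, or allocation failure. The caller
 * frees the result.
 */
static char *demo_parse_value(const char *line, const char *key)
{
	size_t klen = strlen(key);
	const char *val;
	size_t vlen;
	char *out;

	if (line[0] == '#')
		return NULL;
	if (strncmp(line, key, klen) != 0 || line[klen] != '=')
		return NULL;

	val = line + klen + 1;
	vlen = strcspn(val, "\n");	/* value length without the newline */
	out = malloc(vlen + 1);
	if (!out)
		return NULL;
	memcpy(out, val, vlen);
	out[vlen] = '\0';
	return out;
}
```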
> > +
> > +#define SECTOR_SIZE 512
> > +static bool
> > +set_backend_size(struct mic_info *mic)
> > +{
> > +	mic->mic_virtblk.backend_size = lseek(mic->mic_virtblk.backend, 0,
> > +		SEEK_END);
> > +	if (mic->mic_virtblk.backend_size < 0) {
> > +		mpsslog("%s: can't seek: %s\n",
> > +			mic->name, mic->mic_virtblk.backend_file);
> > +		return false;
> > +	}
> > +	virtblk_dev_page.blk_config.capacity =
> > +		mic->mic_virtblk.backend_size / SECTOR_SIZE;
> > +	if ((mic->mic_virtblk.backend_size % SECTOR_SIZE) != 0)
> > +		virtblk_dev_page.blk_config.capacity++;
> > +
> > +	virtblk_dev_page.blk_config.capacity =
> > +		htole64(virtblk_dev_page.blk_config.capacity);
> > +
> > +	return true;
> > +}
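
set_backend_size() rounds the backing file size up to whole 512-byte sectors
before storing it little-endian in the config space. The rounding on its own
can be sketched as follows; demo_capacity_sectors() is a hypothetical name,
and the byte-order conversion is left out to keep the example portable.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_SECTOR_SIZE 512u

/* Number of 512-byte sectors needed to hold bytes, rounding up. */
static uint64_t demo_capacity_sectors(uint64_t bytes)
{
	return bytes / DEMO_SECTOR_SIZE +
		(bytes % DEMO_SECTOR_SIZE != 0 ? 1 : 0);
}
```

Written as divide-plus-remainder (as the patch does) rather than
(bytes + 511) / 512, the computation cannot overflow for sizes near the top of
the 64-bit range.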
> > +
> > +static bool
> > +open_backend(struct mic_info *mic)
> > +{
> > +	if (!set_backend_file(mic))
> > +		goto _error_exit;
> > +	mic->mic_virtblk.backend = open(mic->mic_virtblk.backend_file, O_RDWR);
> > +	if (mic->mic_virtblk.backend < 0) {
> > +		mpsslog("%s: can't open: %s\n", mic->name,
> > +			mic->mic_virtblk.backend_file);
> > +		goto _error_free;
> > +	}
> > +	if (!set_backend_size(mic))
> > +		goto _error_close;
> > +	mic->mic_virtblk.backend_addr = mmap(NULL,
> > +		mic->mic_virtblk.backend_size,
> > +		PROT_READ|PROT_WRITE, MAP_SHARED,
> > +		mic->mic_virtblk.backend, 0L);
> > +	if (mic->mic_virtblk.backend_addr == MAP_FAILED) {
> > +		mpsslog("%s: can't map: %s %s\n",
> > +			mic->name, mic->mic_virtblk.backend_file,
> > +			strerror(errno));
> > +		goto _error_close;
> > +	}
> > +	return true;
> > +
> > + _error_close:
> > +	close(mic->mic_virtblk.backend);
> > + _error_free:
> > +	free(mic->mic_virtblk.backend_file);
> > + _error_exit:
> > +	return false;
> > +}
> > +
> > +static void
> > +close_backend(struct mic_info *mic)
> > +{
> > +	munmap(mic->mic_virtblk.backend_addr, mic->mic_virtblk.backend_size);
> > +	close(mic->mic_virtblk.backend);
> > +	free(mic->mic_virtblk.backend_file);
> > +}
> > +
> > +static bool
> > +start_virtblk(struct mic_info *mic, struct mic_vring *vring)
> > +{
> > +	if (((__u64)&virtblk_dev_page.blk_config % 8) != 0) {
> > +		mpsslog("%s: blk_config is not 8 byte aligned.\n",
> > +			mic->name);
> > +		return false;
> > +	}
> > +	add_virtio_device(mic, &virtblk_dev_page.dd);
> > +	if (MAP_FAILED == init_vr(mic, mic->mic_virtblk.virtio_block_fd,
> > +		VIRTIO_ID_BLOCK, vring, NULL, virtblk_dev_page.dd.num_vq)) {
> > +		mpsslog("%s init_vr failed %s\n",
> > +			mic->name, strerror(errno));
> > +		return false;
> > +	}
> > +	return true;
> > +}
> > +
> > +static void
> > +stop_virtblk(struct mic_info *mic)
> > +{
> > +	uninit_vr(mic, virtblk_dev_page.dd.num_vq);
> > +	close(mic->mic_virtblk.virtio_block_fd);
> > +}
> > +
> > +static __u8
> > +header_error_check(struct vring_desc *desc)
> > +{
> > +	if (le32toh(desc->len) != sizeof(struct virtio_blk_outhdr)) {
> > +		mpsslog("%s() %d: length is not sizeof(virtio_blk_outhdr)\n",
> > +				__func__, __LINE__);
> > +		return -EIO;
> > +	}
> > +	if (!(le16toh(desc->flags) & VRING_DESC_F_NEXT)) {
> > +		mpsslog("%s() %d: header has no next descriptor\n",
> > +			__func__, __LINE__);
> > +		return -EIO;
> > +	}
> > +	if (le16toh(desc->flags) & VRING_DESC_F_WRITE) {
> > +		mpsslog("%s() %d: header is not read-only\n",
> > +			__func__, __LINE__);
> > +		return -EIO;
> > +	}
> > +	return 0;
> > +}
> > +
> > +static int
> > +read_header(int fd, struct virtio_blk_outhdr *hdr, __u32 desc_idx)
> > +{
> > +	struct iovec iovec;
> > +	struct mic_copy_desc copy;
> > +
> > +	iovec.iov_len = sizeof(*hdr);
> > +	iovec.iov_base = hdr;
> > +	copy.iov = &iovec;
> > +	copy.iovcnt = 1;
> > +	copy.vr_idx = 0;  /* only one vring on virtio_block */
> > +	copy.update_used = false;  /* do not update used index */
> > +	return ioctl(fd, MIC_VIRTIO_COPY_DESC, &copy);
> > +}
> > +
> > +static int
> > +transfer_blocks(int fd, struct iovec *iovec, __u32 iovcnt)
> > +{
> > +	struct mic_copy_desc copy;
> > +
> > +	copy.iov = iovec;
> > +	copy.iovcnt = iovcnt;
> > +	copy.vr_idx = 0;  /* only one vring on virtio_block */
> > +	copy.update_used = false;  /* do not update used index */
> > +	return ioctl(fd, MIC_VIRTIO_COPY_DESC, &copy);
> > +}
> > +
> > +static __u8
> > +status_error_check(struct vring_desc *desc)
> > +{
> > +	if (le32toh(desc->len) != sizeof(__u8)) {
> > +		mpsslog("%s() %d: length is not sizeof(status)\n",
> > +			__func__, __LINE__);
> > +		return -EIO;
> > +	}
> > +	return 0;
> > +}
> > +
> > +static int
> > +write_status(int fd, __u8 *status)
> > +{
> > +	struct iovec iovec;
> > +	struct mic_copy_desc copy;
> > +
> > +	iovec.iov_base = status;
> > +	iovec.iov_len = sizeof(*status);
> > +	copy.iov = &iovec;
> > +	copy.iovcnt = 1;
> > +	copy.vr_idx = 0;  /* only one vring on virtio_block */
> > +	copy.update_used = true; /* Update used index */
> > +	return ioctl(fd, MIC_VIRTIO_COPY_DESC, &copy);
> > +}
> > +
> > +static void *
> > +virtio_block(void *arg)
> > +{
> > +	struct mic_info *mic = (struct mic_info *) arg;
> > +	int ret;
> > +	struct pollfd block_poll;
> > +	struct mic_vring vring;
> > +	__u16 avail_idx;
> > +	__u32 desc_idx;
> > +	struct vring_desc *desc;
> > +	struct iovec *iovec, *piov;
> > +	__u8 status;
> > +	__u32 buffer_desc_idx;
> > +	struct virtio_blk_outhdr hdr;
> > +	void *fos;
> > +
> > +	for (;;) {  /* forever */
> > +		if (!open_backend(mic)) { /* No virtblk */
> > +			for (mic->mic_virtblk.signaled = 0;
> > +				!mic->mic_virtblk.signaled;)
> > +				sleep(1);
> > +			continue;
> > +		}
> > +
> > +		/* backend file is specified. */
> > +		if (!start_virtblk(mic, &vring))
> > +			goto _close_backend;
> > +		iovec = malloc(sizeof(*iovec) *
> > +			le32toh(virtblk_dev_page.blk_config.seg_max));
> > +		if (!iovec) {
> > +			mpsslog("%s: can't alloc iovec: %s\n",
> > +				mic->name, strerror(ENOMEM));
> > +			goto _stop_virtblk;
> > +		}
> > +
> > +		block_poll.fd = mic->mic_virtblk.virtio_block_fd;
> > +		block_poll.events = POLLIN;
> > +		for (mic->mic_virtblk.signaled = 0;
> > +		     !mic->mic_virtblk.signaled;) {
> > +			block_poll.revents = 0;
> > +			/* time out after 1 sec to recheck signaled */
> > +			ret = poll(&block_poll, 1, 1000);
> > +			if (ret < 0) {
> > +				mpsslog("%s %d: poll failed: %s\n",
> > +					__func__, __LINE__,
> > +					strerror(errno));
> > +				continue;
> > +			}
> > +
> > +			if (!(block_poll.revents & POLLIN)) {
> > +#ifdef DEBUG
> > +				mpsslog("%s %d: block_poll.revents=0x%x\n",
> > +					__func__, __LINE__, block_poll.revents);
> > +				sleep(1);
> > +#endif
> > +				continue;
> > +			}
> > +
> > +			/* POLLIN */
> > +			while (vring.info->avail_idx !=
> > +				le16toh(vring.vr.avail->idx)) {
> > +				/* read header element */
> > +				avail_idx =
> > +					vring.info->avail_idx &
> > +					(vring.vr.num - 1);
> > +				desc_idx = le16toh(
> > +					vring.vr.avail->ring[avail_idx]);
> > +				desc = &vring.vr.desc[desc_idx];
> > +#ifdef DEBUG
> > +				mpsslog("%s() %d: avail_idx=%d ",
> > +					__func__, __LINE__,
> > +					vring.info->avail_idx);
> > +				mpsslog("vring.vr.num=%d desc=%p\n",
> > +					vring.vr.num, desc);
> > +#endif
> > +				status = header_error_check(desc);
> > +				ret = read_header(
> > +					mic->mic_virtblk.virtio_block_fd,
> > +					&hdr, desc_idx);
> > +				if (ret < 0) {
> > +					mpsslog("%s() %d %s: ret=%d %s\n",
> > +						__func__, __LINE__,
> > +						mic->name, ret,
> > +						strerror(errno));
> > +					break;
> > +				}
> > +				/* buffer element */
> > +				piov = iovec;
> > +				status = 0;
> > +				fos = mic->mic_virtblk.backend_addr +
> > +					(hdr.sector * SECTOR_SIZE);
> > +				buffer_desc_idx = desc_idx =
> > +					next_desc(desc);
> > +				for (desc = &vring.vr.desc[buffer_desc_idx];
> > +				     le16toh(desc->flags) & VRING_DESC_F_NEXT;
> > +				     desc_idx = next_desc(desc),
> > +					     desc = &vring.vr.desc[desc_idx]) {
> > +					piov->iov_len = le32toh(desc->len);
> > +					piov->iov_base = fos;
> > +					piov++;
> > +					fos += le32toh(desc->len);
> > +				}
> > +				/* Returning NULLs for VIRTIO_BLK_T_GET_ID. */
> > +				if (hdr.type & ~(VIRTIO_BLK_T_OUT |
> > +					VIRTIO_BLK_T_GET_ID)) {
> > +					/*
> > +					  VIRTIO_BLK_T_IN - does not do
> > +					  anything. Probably for documenting.
> > +					  VIRTIO_BLK_T_SCSI_CMD - for
> > +					  virtio_scsi.
> > +					  VIRTIO_BLK_T_FLUSH - turned off in
> > +					  config space.
> > +					  VIRTIO_BLK_T_BARRIER - defined but not
> > +					  used anywhere.
> > +					*/
> > +					mpsslog("%s() %d: type %x ",
> > +						__func__, __LINE__,
> > +						hdr.type);
> > +					mpsslog("is not supported\n");
> > +					status = -ENOTSUP;
> > +
> > +				} else {
> > +					ret = transfer_blocks(
> > +					mic->mic_virtblk.virtio_block_fd,
> > +						iovec,
> > +						piov - iovec);
> > +					if (ret < 0 &&
> > +						status != 0)
> > +						status = ret;
> > +				}
> > +				/* write status and update used pointer */
> > +				if (status != 0)
> > +					status = status_error_check(desc);
> > +				ret = write_status(
> > +					mic->mic_virtblk.virtio_block_fd,
> > +					&status);
> > +#ifdef DEBUG
> > +				mpsslog("%s() %d: write status=%d on desc=%p\n",
> > +					__func__, __LINE__,
> > +					status, desc);
> > +#endif
> > +			}
> > +		}
> > +		free(iovec);
> > +_stop_virtblk:
> > +		stop_virtblk(mic);
> > +_close_backend:
> > +		close_backend(mic);
> > +	}  /* forever */
> > +
> > +	pthread_exit(NULL);
> > +}
> > +
> > +static void
> > +reset(struct mic_info *mic)
> > +{
> > +#define RESET_TIMEOUT 120
> > +	int i = RESET_TIMEOUT;
> > +	setsysfs(mic->name, "state", "reset");
> > +	while (i) {
> > +		char *state;
> > +		state = readsysfs(mic->name, "state");
> > +		if (!state)
> > +			goto retry;
> > +		mpsslog("%s: %s %d state %s\n",
> > +			mic->name, __func__, __LINE__, state);
> > +		if ((!strcmp(state, "offline"))) {
> > +			free(state);
> > +			break;
> > +		}
> > +		free(state);
> > +retry:
> > +		sleep(1);
> > +		i--;
> > +	}
> > +}
> > +
> > +static int
> > +get_mic_shutdown_status(struct mic_info *mic, char *shutdown_status)
> > +{
> > +	if (!strcmp(shutdown_status, "nop"))
> > +		return MIC_NOP;
> > +	if (!strcmp(shutdown_status, "crashed"))
> > +		return MIC_CRASHED;
> > +	if (!strcmp(shutdown_status, "halted"))
> > +		return MIC_HALTED;
> > +	if (!strcmp(shutdown_status, "poweroff"))
> > +		return MIC_POWER_OFF;
> > +	if (!strcmp(shutdown_status, "restart"))
> > +		return MIC_RESTART;
> > +	mpsslog("%s: BUG invalid status %s\n", mic->name, shutdown_status);
> > +	/* Invalid state */
> > +	assert(0);
> > +	return -1;
> > +}
> > +
> > +static int get_mic_state(struct mic_info *mic, char *state)
> > +{
> > +	if (!strcmp(state, "offline"))
> > +		return MIC_OFFLINE;
> > +	if (!strcmp(state, "online"))
> > +		return MIC_ONLINE;
> > +	if (!strcmp(state, "shutting_down"))
> > +		return MIC_SHUTTING_DOWN;
> > +	if (!strcmp(state, "reset_failed"))
> > +		return MIC_RESET_FAILED;
> > +	mpsslog("%s: BUG invalid state %s\n", mic->name, state);
> > +	/* Invalid state */
> > +	assert(0);
> > +	return -1;
> > +}
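
The two string-to-enum helpers above are strcmp() ladders that assert on
unknown input. One possible alternative, shown only as a sketch, is a lookup
table that reports unknown strings to the caller instead. The demo_* enum and
function are hypothetical, and the enum values do not correspond to the MIC_*
constants in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical state codes, for illustration only. */
enum demo_state {
	DEMO_OFFLINE,
	DEMO_ONLINE,
	DEMO_SHUTTING_DOWN,
	DEMO_RESET_FAILED,
};

static const struct {
	const char *name;
	enum demo_state state;
} demo_states[] = {
	{ "offline", DEMO_OFFLINE },
	{ "online", DEMO_ONLINE },
	{ "shutting_down", DEMO_SHUTTING_DOWN },
	{ "reset_failed", DEMO_RESET_FAILED },
};

/* Map a sysfs state string to its code, or return -1 if unknown. */
static int demo_state_from_string(const char *s)
{
	size_t i;

	for (i = 0; i < sizeof(demo_states) / sizeof(demo_states[0]); i++)
		if (!strcmp(s, demo_states[i].name))
			return demo_states[i].state;
	return -1;
}
```

Returning -1 lets a long-running daemon log the unexpected value and carry on,
where assert(0) would either abort it or, with NDEBUG defined, fall through.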
> > +
> > +static void mic_handle_shutdown(struct mic_info *mic)
> > +{
> > +#define SHUTDOWN_TIMEOUT 60
> > +	int i = SHUTDOWN_TIMEOUT, ret, stat = 0;
> > +	char *shutdown_status;
> > +	while (i) {
> > +		shutdown_status = readsysfs(mic->name, "shutdown_status");
> > +		if (!shutdown_status)
> > +			continue;
> > +		mpsslog("%s: %s %d shutdown_status %s\n",
> > +			mic->name, __func__, __LINE__, shutdown_status);
> > +		switch (get_mic_shutdown_status(mic, shutdown_status)) {
> > +		case MIC_RESTART:
> > +			mic->restart = 1;
> > +		case MIC_HALTED:
> > +		case MIC_POWER_OFF:
> > +		case MIC_CRASHED:
> > +			goto reset;
> > +		default:
> > +			break;
> > +		}
> > +		free(shutdown_status);
> > +		sleep(1);
> > +		i--;
> > +	}
> > +reset:
> > +	ret = kill(mic->pid, SIGTERM);
> > +	mpsslog("%s: %s %d kill pid %d ret %d\n",
> > +		mic->name, __func__, __LINE__,
> > +		mic->pid, ret);
> > +	if (!ret) {
> > +		ret = waitpid(mic->pid, &stat,
> > +			WIFSIGNALED(stat));
> > +		mpsslog("%s: %s %d waitpid ret %d pid %d\n",
> > +			mic->name, __func__, __LINE__,
> > +			ret, mic->pid);
> > +	}
> > +	if (ret == mic->pid)
> > +		reset(mic);
> > +}
> > +
> > +static void *
> > +mic_config(void *arg)
> > +{
> > +	struct mic_info *mic = (struct mic_info *)arg;
> > +	char *state = NULL;
> > +	char pathname[PATH_MAX];
> > +	int fd, ret;
> > +	struct pollfd ufds[1];
> > +	char value[4096];
> > +
> > +	snprintf(pathname, PATH_MAX - 1, "%s/%s/%s",
> > +		MICSYSFSDIR, mic->name, "state");
> > +
> > +	fd = open(pathname, O_RDONLY);
> > +	if (fd < 0) {
> > +		mpsslog("%s: opening file %s failed %s\n",
> > +			mic->name, pathname, strerror(errno));
> > +		goto error;
> > +	}
> > +
> > +	do {
> > +		ret = read(fd, value, sizeof(value));
> > +		if (ret < 0) {
> > +			mpsslog("%s: Failed to read sysfs entry '%s': %s\n",
> > +				mic->name, pathname, strerror(errno));
> > +			goto close_error1;
> > +		}
> > +retry:
> > +		state = readsysfs(mic->name, "state");
> > +		if (!state)
> > +			goto retry;
> > +		mpsslog("%s: %s %d state %s\n",
> > +			mic->name, __func__, __LINE__, state);
> > +		switch (get_mic_state(mic, state)) {
> > +		case MIC_SHUTTING_DOWN:
> > +			mic_handle_shutdown(mic);
> > +			goto close_error;
> > +		default:
> > +			break;
> > +		}
> > +		free(state);
> > +
> > +		ufds[0].fd = fd;
> > +		ufds[0].events = POLLERR | POLLPRI;
> > +		ret = poll(ufds, 1, -1);
> > +		if (ret < 0) {
> > +			mpsslog("%s: poll failed %s\n",
> > +				mic->name, strerror(errno));
> > +			goto close_error1;
> > +		}
> > +	} while (1);
> > +close_error:
> > +	free(state);
> > +close_error1:
> > +	close(fd);
> > +error:
> > +	init_mic(mic);
> > +	pthread_exit(NULL);
> > +}
> > +
> > +static void
> > +set_cmdline(struct mic_info *mic)
> > +{
> > +	char buffer[PATH_MAX];
> > +	int len;
> > +
> > +	len = snprintf(buffer, PATH_MAX,
> > +		"clocksource=tsc highres=off nohz=off ");
> > +	len += snprintf(buffer + len, PATH_MAX,
> > +		"cpufreq_on;corec6_off;pc3_off;pc6_off ");
> > +	len += snprintf(buffer + len, PATH_MAX,
> > +		"ifcfg=static;address,172.31.%d.1;netmask,255.255.255.0",
> > +		mic->id);
> > +
> > +	setsysfs(mic->name, "cmdline", buffer);
> > +	mpsslog("%s: Command line: \"%s\"\n", mic->name, buffer);
> > +	snprintf(buffer, PATH_MAX, "172.31.%d.1", mic->id);
> > +	mpsslog("%s: IPADDR: \"%s\"\n", mic->name, buffer);
> > +}
> > +
> > +static void
> > +set_log_buf_info(struct mic_info *mic)
> > +{
> > +	int fd;
> > +	off_t len;
> > +	char system_map[] = "/lib/firmware/mic/System.map";
> > +	char *map, *temp, log_buf[17] = {'\0'};
> > +
> > +	fd = open(system_map, O_RDONLY);
> > +	if (fd < 0) {
> > +		mpsslog("%s: Opening System.map failed: %d\n",
> > +			mic->name, errno);
> > +		return;
> > +	}
> > +	len = lseek(fd, 0, SEEK_END);
> > +	if (len < 0) {
> > +		mpsslog("%s: Reading System.map size failed: %d\n",
> > +			mic->name, errno);
> > +		close(fd);
> > +		return;
> > +	}
> > +	map = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
> > +	if (map == MAP_FAILED) {
> > +		mpsslog("%s: mmap of System.map failed: %d\n",
> > +			mic->name, errno);
> > +		close(fd);
> > +		return;
> > +	}
> > +	temp = strstr(map, "__log_buf");
> > +	if (!temp) {
> > +		mpsslog("%s: __log_buf not found: %d\n", mic->name, errno);
> > +		munmap(map, len);
> > +		close(fd);
> > +		return;
> > +	}
> > +	strncpy(log_buf, temp - 19, 16);
> > +	setsysfs(mic->name, "log_buf_addr", log_buf);
> > +	mpsslog("%s: log_buf_addr: %s\n", mic->name, log_buf);
> > +	temp = strstr(map, "log_buf_len");
> > +	if (!temp) {
> > +		mpsslog("%s: log_buf_len not found: %d\n", mic->name, errno);
> > +		munmap(map, len);
> > +		close(fd);
> > +		return;
> > +	}
> > +	strncpy(log_buf, temp - 19, 16);
> > +	setsysfs(mic->name, "log_buf_len", log_buf);
> > +	mpsslog("%s: log_buf_len: %s\n", mic->name, log_buf);
> > +	munmap(map, len);
> > +	close(fd);
> > +}
> > +
> > +static void init_mic(struct mic_info *mic);
> > +
> > +static void
> > +change_virtblk_backend(int x, siginfo_t *siginfo, void *p)
> > +{
> > +	struct mic_info *mic;
> > +
> > +	for (mic = mic_list.next; mic != NULL; mic = mic->next)
> > +		mic->mic_virtblk.signaled = 1/* true */;
> > +}
> > +
> > +static void
> > +init_mic(struct mic_info *mic)
> > +{
> > +	struct sigaction ignore = {
> > +		.sa_flags = 0,
> > +		.sa_handler = SIG_IGN
> > +	};
> > +	struct sigaction act = {
> > +		.sa_flags = SA_SIGINFO,
> > +		.sa_sigaction = change_virtblk_backend,
> > +	};
> > +	char buffer[PATH_MAX];
> > +	int err;
> > +
> > +		/* ignore SIGUSR1 for both process */
> > +	sigaction(SIGUSR1, &ignore, NULL);
> > +
> > +	mic->pid = fork();
> > +	switch (mic->pid) {
> > +	case 0:
> > +		set_log_buf_info(mic);
> > +		set_cmdline(mic);
> > +		add_virtio_device(mic, &virtcons_dev_page.dd);
> > +		add_virtio_device(mic, &virtnet_dev_page.dd);
> > +		err = pthread_create(&mic->mic_console.console_thread, NULL,
> > +			virtio_console, mic);
> > +		if (err)
> > +			mpsslog("%s virtcons pthread_create failed %s\n",
> > +			mic->name, strerror(err));
> > +		/*
> > +		 * TODO: Debug why not adding this sleep results in the tap
> > +		 * interface not coming up during certain runs sporadically.
> > +		 */
> 
> Indeed.
> 

Yes, we will look into removing this workaround for the next revision.

> > +		usleep(1000);
> > +		err = pthread_create(&mic->mic_net.net_thread, NULL,
> > +			virtio_net, mic);
> > +		if (err)
> > +			mpsslog("%s virtnet pthread_create failed %s\n",
> > +			mic->name, strerror(err));
> > +		err = pthread_create(&mic->mic_virtblk.block_thread, NULL,
> > +			virtio_block, mic);
> > +		if (err)
> > +			mpsslog("%s virtblk pthread_create failed %s\n",
> > +			mic->name, strerror(err));
> > +		sigemptyset(&act.sa_mask);
> > +		err = sigaction(SIGUSR1, &act, NULL);
> 
> Confused. Who sends this SIGUSR1 here?
> 

Currently, one virtio block device is supported for each MIC card at a
time. Any user (or test) can send a SIGUSR1 to the MIC daemon. The
signal informs the virtio block backend about a change in the
configuration file which specifies the virtio backend file name on the
host. The virtio block backend then re-reads the configuration file and
switches to the new block device. This signalling mechanism may not be
required once multiple virtio block devices are supported by the MIC
daemon. We will document the current signal handling mechanism in the
next revision, and keep it only until it can be removed.

> 
> > +		if (err)
> > +			mpsslog("%s sigaction SIGUSR1 failed %s\n",
> > +			mic->name, strerror(errno));
> > +		while (1)
> > +			sleep(60);
> > +	case -1:
> > +		mpsslog("fork failed MIC name %s id %d errno %d\n",
> > +			mic->name, mic->id, errno);
> > +		break;
> > +	default:
> > +		if (mic->restart) {
> > +			snprintf(buffer, PATH_MAX,
> > +				"boot:linux:mic/uos.img:mic/mic%d.image",
> > +				mic->id);
> > +			setsysfs(mic->name, "state", buffer);
> > +			mpsslog("%s restarting mic %d\n",
> > +				mic->name, mic->restart);
> > +			mic->restart = 0;
> > +		}
> > +		pthread_create(&mic->config_thread, NULL, mic_config, mic);
> > +	}
> > +}
> > +
> > +static void
> > +start_daemon(void)
> > +{
> > +	struct mic_info *mic;
> > +
> > +	for (mic = mic_list.next; mic != NULL; mic = mic->next)
> > +		init_mic(mic);
> > +
> > +	while (1)
> > +		sleep(60);
> > +}
> > +
> > +static int
> > +init_mic_list(void)
> > +{
> > +	struct mic_info *mic = &mic_list;
> > +	struct dirent *file;
> > +	DIR *dp;
> > +	int cnt = 0;
> > +
> > +	dp = opendir(MICSYSFSDIR);
> > +	if (!dp)
> > +		return 0;
> > +
> > +	while ((file = readdir(dp)) != NULL) {
> > +		if (!strncmp(file->d_name, "mic", 3)) {
> > +			mic->next = malloc(sizeof(struct mic_info));
> > +			if (mic->next) {
> > +				mic = mic->next;
> > +				mic->next = NULL;
> > +				memset(mic, 0, sizeof(struct mic_info));
> > +				mic->id = atoi(&file->d_name[3]);
> > +				mic->name = malloc(strlen(file->d_name) + 16);
> > +				if (mic->name)
> > +					strcpy(mic->name, file->d_name);
> > +				mpsslog("MIC name %s id %d\n", mic->name,
> > +					mic->id);
> > +				cnt++;
> > +			}
> > +		}
> > +	}
> > +
> > +	closedir(dp);
> > +	return cnt;
> > +}
> > +
> > +void
> > +mpsslog(char *format, ...)
> > +{
> > +	va_list args;
> > +	char buffer[4096];
> > +	time_t t;
> > +	char *ts;
> > +
> > +	if (logfp == NULL)
> > +		return;
> > +
> > +	va_start(args, format);
> > +	vsprintf(buffer, format, args);
> > +	va_end(args);
> > +
> > +	time(&t);
> > +	ts = ctime(&t);
> > +	ts[strlen(ts) - 1] = '\0';
> > +	fprintf(logfp, "%s: %s", ts, buffer);
> > +
> > +	fflush(logfp);
> > +}
> > +
> > +int
> > +main(int argc, char *argv[])
> > +{
> > +	int cnt;
> > +
> > +	myname = argv[0];
> > +
> > +	logfp = fopen(LOGFILE_NAME, "a+");
> > +	if (!logfp) {
> > +		fprintf(stderr, "cannot open logfile '%s'\n", LOGFILE_NAME);
> > +		exit(1);
> > +	}
> > +
> > +	mpsslog("MIC Daemon start\n");
> > +
> > +	cnt = init_mic_list();
> > +	if (cnt == 0) {
> > +		mpsslog("MIC module not loaded\n");
> > +		exit(2);
> > +	}
> > +	mpsslog("MIC found %d devices\n", cnt);
> > +
> > +	start_daemon();
> > +
> > +	exit(0);
> > +}
> > diff --git a/Documentation/mic/mpssd/mpssd.h b/Documentation/mic/mpssd/mpssd.h
> > new file mode 100644
> > index 0000000..b6dee38
> > --- /dev/null
> > +++ b/Documentation/mic/mpssd/mpssd.h
> > @@ -0,0 +1,100 @@
> > +/*
> > + * Intel MIC Platform Software Stack (MPSS)
> > + *
> > + * Copyright(c) 2013 Intel Corporation.
> > + *
> > + * This program is free software; you can redistribute it and/or modify
> > + * it under the terms of the GNU General Public License, version 2, as
> > + * published by the Free Software Foundation.
> > + *
> > + * This program is distributed in the hope that it will be useful, but
> > + * WITHOUT ANY WARRANTY; without even the implied warranty of
> > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> > + * General Public License for more details.
> > + *
> > + * The full GNU General Public License is included in this distribution in
> > + * the file called "COPYING".
> > + *
> > + * Intel MIC User Space Tools.
> > + */
> > +#ifndef _MPSSD_H_
> > +#define _MPSSD_H_
> > +
> > +#include <stdio.h>
> > +#include <stdlib.h>
> > +#include <string.h>
> > +#include <fcntl.h>
> > +#include <unistd.h>
> > +#include <dirent.h>
> > +#include <libgen.h>
> > +#include <pthread.h>
> > +#include <stdarg.h>
> > +#include <time.h>
> > +#include <errno.h>
> > +#include <sys/dir.h>
> > +#include <sys/ioctl.h>
> > +#include <sys/poll.h>
> > +#include <sys/types.h>
> > +#include <sys/socket.h>
> > +#include <sys/stat.h>
> > +#include <sys/types.h>
> > +#include <sys/mman.h>
> > +#include <sys/utsname.h>
> > +#include <sys/wait.h>
> > +#include <netinet/in.h>
> > +#include <arpa/inet.h>
> > +#include <netdb.h>
> > +#include <pthread.h>
> > +#include <signal.h>
> > +#include <limits.h>
> > +#include <syslog.h>
> > +#include <getopt.h>
> > +#include <net/if.h>
> > +#include <linux/if_tun.h>
> > +#include <linux/if_tun.h>
> > +#include <linux/virtio_ids.h>
> > +
> > +#define MICSYSFSDIR "/sys/class/mic"
> > +#define LOGFILE_NAME "/var/log/mpssd"
> > +#define PAGE_SIZE 4096
> > +
> > +struct mic_console_info {
> > +	pthread_t       console_thread;
> > +	int		virtio_console_fd;
> > +	void		*console_dp;
> > +};
> > +
> > +struct mic_net_info {
> > +	pthread_t       net_thread;
> > +	int		virtio_net_fd;
> > +	int		tap_fd;
> > +	void		*net_dp;
> > +};
> > +
> > +struct mic_virtblk_info {
> > +	pthread_t       block_thread;
> > +	int		virtio_block_fd;
> > +	void		*block_dp;
> > +	volatile sig_atomic_t	signaled;
> > +	char		*backend_file;
> > +	int		backend;
> > +	void		*backend_addr;
> > +	long		backend_size;
> > +};
> > +
> > +struct mic_info {
> > +	int		id;
> > +	char		*name;
> > +	pthread_t       config_thread;
> > +	pid_t		pid;
> > +	struct mic_console_info	mic_console;
> > +	struct mic_net_info	mic_net;
> > +	struct mic_virtblk_info	mic_virtblk;
> > +	int		restart;
> > +	struct mic_info *next;
> > +};
> > +
> > +void mpsslog(char *format, ...);
> > +char *readsysfs(char *dir, char *entry);
> > +int setsysfs(char *dir, char *entry, char *value);
> > +#endif
> > diff --git a/Documentation/mic/mpssd/sysfs.c b/Documentation/mic/mpssd/sysfs.c
> > new file mode 100644
> > index 0000000..3244dcf
> > --- /dev/null
> > +++ b/Documentation/mic/mpssd/sysfs.c
> > @@ -0,0 +1,103 @@
> > +/*
> > + * Intel MIC Platform Software Stack (MPSS)
> > + *
> > + * Copyright(c) 2013 Intel Corporation.
> > + *
> > + * This program is free software; you can redistribute it and/or modify
> > + * it under the terms of the GNU General Public License, version 2, as
> > + * published by the Free Software Foundation.
> > + *
> > + * This program is distributed in the hope that it will be useful, but
> > + * WITHOUT ANY WARRANTY; without even the implied warranty of
> > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> > + * General Public License for more details.
> > + *
> > + * The full GNU General Public License is included in this distribution in
> > + * the file called "COPYING".
> > + *
> > + * Intel MIC User Space Tools.
> > + */
> > +
> > +#include "mpssd.h"
> > +
> > +#define PAGE_SIZE 4096
> > +
> > +char *
> > +readsysfs(char *dir, char *entry)
> > +{
> > +	char filename[PATH_MAX];
> > +	char value[PAGE_SIZE];
> > +	char *string = NULL;
> > +	int fd;
> > +	int len;
> > +
> > +	if (dir == NULL)
> > +		snprintf(filename, PATH_MAX, "%s/%s", MICSYSFSDIR, entry);
> > +	else
> > +		snprintf(filename, PATH_MAX,
> > +			"%s/%s/%s", MICSYSFSDIR, dir, entry);
> > +
> > +	fd = open(filename, O_RDONLY);
> > +	if (fd < 0) {
> > +		mpsslog("Failed to open sysfs entry '%s': %s\n",
> > +			filename, strerror(errno));
> > +		return NULL;
> > +	}
> > +
> > +	len = read(fd, value, sizeof(value));
> > +	if (len < 0) {
> > +		mpsslog("Failed to read sysfs entry '%s': %s\n",
> > +			filename, strerror(errno));
> > +		goto readsys_ret;
> > +	}
> > +
> > +	value[len] = '\0';
> 
> Why are you careful to put this \0 here but not in setsysfs below?
> 
> If you do, I'd fail on len == sizeof value as well, it isn't going to work with
> that.
> 

Sysfs entries generally return a string ending with a newline. Ideally,
we should convert the trailing newline to a NUL terminator uniformly
across the readsysfs/setsysfs APIs in this file. We will make these
changes for the next revision.
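
As a rough sketch of what that uniform handling could look like (an
illustration only, not the planned change): read_attr() below is a
hypothetical helper, not part of the patch. It reads at most size - 1
bytes so the terminator always fits in the buffer, and it drops a single
trailing newline. Whether a value that fills the whole buffer should be
treated as an error instead, as suggested in the review, is left open.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the patch): read a text attribute at
 * `path` into `buf`, always NUL-terminate it, and drop one trailing
 * newline. Returns the resulting string length, or -1 on error.
 */
static ssize_t read_attr(const char *path, char *buf, size_t size)
{
	ssize_t len;
	int fd;

	if (size == 0)
		return -1;

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;

	/* Leave room for the terminator so buf[len] is always in bounds. */
	len = read(fd, buf, size - 1);
	close(fd);
	if (len < 0)
		return -1;

	buf[len] = '\0';
	if (len > 0 && buf[len - 1] == '\n')
		buf[--len] = '\0';
	return len;
}
```

A setsysfs-style caller could then compare the new value against this
normalized string, so that "online" and "online\n" are not treated as
different values.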

Thanks for the review!

Sudeep Dutt

> > +
> > +	string = malloc(strlen(value) + 1);
> > +	if (string)
> > +		strcpy(string, value);
> > +
> > +readsys_ret:
> > +	close(fd);
> > +	return string;
> > +}
> > +
> > +int
> > +setsysfs(char *dir, char *entry, char *value)
> > +{
> > +	char filename[PATH_MAX];
> > +	char oldvalue[PAGE_SIZE];
> > +	int fd;
> > +
> > +	if (dir == NULL)
> > +		snprintf(filename, PATH_MAX, "%s/%s", MICSYSFSDIR, entry);
> > +	else
> > +		snprintf(filename, PATH_MAX, "%s/%s/%s",
> > +			MICSYSFSDIR, dir, entry);
> > +
> > +	fd = open(filename, O_RDWR);
> > +	if (fd < 0) {
> > +		mpsslog("Failed to open sysfs entry '%s': %s\n",
> > +			filename, strerror(errno));
> > +		return errno;
> > +	}
> > +
> > +	if (read(fd, oldvalue, sizeof(oldvalue)) < 0) {
> > +		mpsslog("Failed to read sysfs entry '%s': %s\n",
> > +			filename, strerror(errno));
> > +		close(fd);
> > +		return errno;
> > +	}
> > +
> > +	if (strcmp(value, oldvalue)) {
> > +		if (write(fd, value, strlen(value)) < 0) {
> > +			mpsslog("Failed to write new sysfs entry '%s': %s\n",
> > +				filename, strerror(errno));
> > +			close(fd);
> > +			return errno;
> > +		}
> > +	}
> > +
> > +	close(fd);
> > +	return 0;
> > +}
> > -- 
> > 1.8.2.1




* Re: [PATCH v2 7/7] Sample Implementation of Intel MIC User Space Daemon.
  2013-08-08  6:40     ` Michael S. Tsirkin
@ 2013-08-09 16:47     ` Sudeep Dutt
From: Sudeep Dutt @ 2013-08-09 16:47 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Sudeep Dutt, Peter P Waskiewicz Jr, Arnd Bergmann, linux-doc,
	Greg Kroah-Hartman, Yaozu (Eddie) Dong, linux-kernel,
	virtualization, Ashutosh Dixit, Rob Landley,
	Harshavardhan R Kharche, Caz Yokoyama,
	Dasaratharaman Chandramouli

On Thu, 2013-08-08 at 09:40 +0300, Michael S. Tsirkin wrote:
> On Wed, Aug 07, 2013 at 08:04:13PM -0700, Sudeep Dutt wrote:
> > From: Caz Yokoyama <Caz.Yokoyama@intel.com>
> > 
> > This patch introduces a sample user space daemon which
> > implements the virtio device backends on the host. The daemon
> > creates/removes/configures virtio device backends by communicating with
> > the Intel MIC Host Driver. The virtio devices currently supported are
> > virtio net, virtio console and virtio block. Virtio net supports TSO/GSO.
> > The daemon also monitors card shutdown status and takes appropriate actions
> > like killing the virtio backends and resetting the card upon card shutdown
> > and crashes.
> > 
> > Co-author: Ashutosh Dixit <ashutosh.dixit@intel.com>
> > Co-author: Sudeep Dutt <sudeep.dutt@intel.com>
> > Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
> > Signed-off-by: Caz Yokoyama <Caz.Yokoyama@intel.com>
> > Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
> > Signed-off-by: Nikhil Rao <nikhil.rao@intel.com>
> > Signed-off-by: Harshavardhan R Kharche <harshavardhan.r.kharche@intel.com>
> > Signed-off-by: Sudeep Dutt <sudeep.dutt@intel.com>
> > Acked-by: Yaozu (Eddie) Dong <eddie.dong@intel.com>
> > ---
> >  Documentation/mic/mic_overview.txt |   48 +
> >  Documentation/mic/mpssd/.gitignore |    1 +
> >  Documentation/mic/mpssd/Makefile   |   19 +
> >  Documentation/mic/mpssd/micctrl    |  152 ++++
> >  Documentation/mic/mpssd/mpss       |  245 ++++++
> >  Documentation/mic/mpssd/mpssd.c    | 1689 ++++++++++++++++++++++++++++++++++++
> >  Documentation/mic/mpssd/mpssd.h    |  100 +++
> >  Documentation/mic/mpssd/sysfs.c    |  103 +++
> 
> Is this generally useful or just example code?
> If the former, you can put it in tools/ as well.
> 

Currently, this is just sample working code specific to configuring MIC
devices. The longer term plan might be to move this code to tools but
not with this patch series.

> >  8 files changed, 2357 insertions(+)
> >  create mode 100644 Documentation/mic/mic_overview.txt
> >  create mode 100644 Documentation/mic/mpssd/.gitignore
> >  create mode 100644 Documentation/mic/mpssd/Makefile
> >  create mode 100755 Documentation/mic/mpssd/micctrl
> >  create mode 100755 Documentation/mic/mpssd/mpss
> >  create mode 100644 Documentation/mic/mpssd/mpssd.c
> >  create mode 100644 Documentation/mic/mpssd/mpssd.h
> >  create mode 100644 Documentation/mic/mpssd/sysfs.c
> > 
> > diff --git a/Documentation/mic/mic_overview.txt b/Documentation/mic/mic_overview.txt
> > new file mode 100644
> > index 0000000..8b1a916
> > --- /dev/null
> > +++ b/Documentation/mic/mic_overview.txt
> > @@ -0,0 +1,48 @@
> > +An Intel MIC X100 device is a PCIe form factor add-in coprocessor
> > +card based on the Intel Many Integrated Core (MIC) architecture
> > +that runs a Linux OS. It is a PCIe endpoint in a platform and therefore
> > +implements the three required standard address spaces i.e. configuration,
> > +memory and I/O. The host OS loads a device driver as is typical for
> > +PCIe devices. The card itself runs a bootstrap after reset that
> > +transfers control to the card OS downloaded from the host driver.
> > +The card OS as shipped by Intel is a Linux kernel with modifications
> > +for the X100 devices.
> > +
> > +Since it is a PCIe card, it does not have the ability to host hardware
> > +devices for networking, storage and console. We provide these devices
> > +on X100 coprocessors thus enabling a self-bootable equivalent environment
> > +for applications. A key benefit of our solution is that it leverages
> > +the standard virtio framework for network, disk and console devices,
> > +though in our case the virtio framework is used across a PCIe bus.
> > +
> > +Here is a block diagram of the various components described above. The
> > +virtio backends are situated on the host rather than the card given better
> > +single threaded performance for the host compared to MIC and the ability of
> > +the host to initiate DMA's to/from the card using the MIC DMA engine.
> > +
> > +                              |
> > +       +----------+           |             +----------+
> > +       | Card OS  |           |             | Host OS  |
> > +       +----------+           |             +----------+
> > +                              |
> > ++-------+ +--------+ +------+ | +---------+  +--------+ +--------+
> > +| Virtio| |Virtio  | |Virtio| | |Virtio   |  |Virtio  | |Virtio  |
> > +| Net   | |Console | |Block | | |Net      |  |Console | |Block   |
> > +| Driver| |Driver  | |Driver| | |backend  |  |backend | |backend |
> > ++-------+ +--------+ +------+ | +---------+  +--------+ +--------+
> > +    |         |         |     |      |            |         |
> > +    |         |         |     |Ring 3|            |         |
> > +    |         |         |     |------|------------|---------|-------
> > +    +-------------------+     |Ring 0+--------------------------+
> > +              |               |      | Virtio over PCIe IOCTLs  |
> > +              |               |      +--------------------------+
> > +      +--------------+        |                   |
> > +      |Intel MIC     |        |            +---------------+
> > +      |Card Driver   |        |            |Intel MIC      |
> > +      +--------------+        |            |Host Driver    |
> > +              |               |            +---------------+
> > +              |               |                   |
> > +     +-------------------------------------------------------------+
> > +     |                                                             |
> > +     |                    PCIe Bus                                 |
> > +     +-------------------------------------------------------------+
> > diff --git a/Documentation/mic/mpssd/.gitignore b/Documentation/mic/mpssd/.gitignore
> > new file mode 100644
> > index 0000000..8b7c72f
> > --- /dev/null
> > +++ b/Documentation/mic/mpssd/.gitignore
> > @@ -0,0 +1 @@
> > +mpssd
> > diff --git a/Documentation/mic/mpssd/Makefile b/Documentation/mic/mpssd/Makefile
> > new file mode 100644
> > index 0000000..eb860a7
> > --- /dev/null
> > +++ b/Documentation/mic/mpssd/Makefile
> > @@ -0,0 +1,19 @@
> > +#
> > +# Makefile - Intel MIC User Space Tools.
> > +# Copyright(c) 2013, Intel Corporation.
> > +#
> > +ifdef DEBUG
> > +CFLAGS += $(USERWARNFLAGS) -I. -g -Wall -DDEBUG=$(DEBUG)
> > +else
> > +CFLAGS += $(USERWARNFLAGS) -I. -g -Wall
> > +endif
> > +
> > +mpssd: mpssd.o sysfs.o
> > +	$(CC) $(CFLAGS) -o $@ $^ -lpthread
> > +
> > +install:
> > +	install mpssd /usr/sbin/mpssd
> > +	install micctrl /usr/sbin/micctrl
> > +
> > +clean:
> > +	rm -f mpssd *.o
> > diff --git a/Documentation/mic/mpssd/micctrl b/Documentation/mic/mpssd/micctrl
> > new file mode 100755
> > index 0000000..e0cfa53
> > --- /dev/null
> > +++ b/Documentation/mic/mpssd/micctrl
> > @@ -0,0 +1,152 @@
> > +#!/bin/bash
> > +# Intel MIC Platform Software Stack (MPSS)
> > +#
> > +# Copyright(c) 2013 Intel Corporation.
> > +#
> > +# This program is free software; you can redistribute it and/or modify
> > +# it under the terms of the GNU General Public License, version 2, as
> > +# published by the Free Software Foundation.
> > +#
> > +# This program is distributed in the hope that it will be useful, but
> > +# WITHOUT ANY WARRANTY; without even the implied warranty of
> > +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> > +# General Public License for more details.
> > +#
> > +# The full GNU General Public License is included in this distribution in
> > +# the file called "COPYING".
> > +#
> > +# Intel MIC User Space Tools.
> > +#
> > +# micctrl - Controls MIC boot/start/stop.
> > +#
> > +# chkconfig: 2345 95 05
> > +# description: start MPSS stack processing.
> > +#
> > +### BEGIN INIT INFO
> > +# Provides: micctrl
> > +### END INIT INFO
> > +
> > +# Source function library.
> > +. /etc/init.d/functions
> > +
> > +sysfs="/sys/class/mic"
> > +
> > +status()
> > +{
> > +	if [ "`echo $1 | head -c3`" == "mic" ]; then
> > +		f=$sysfs/$1
> > +		echo -e $1 state: "`cat $f/state`" shutdown_status: "`cat $f/shutdown_status`"
> > +		return 0
> > +	fi
> > +
> > +	if [ -d "$sysfs" ]; then
> > +		for f in $sysfs/*
> > +		do
> > +			echo -e ""`basename $f`" state: "`cat $f/state`" shutdown_status: "`cat $f/shutdown_status`""
> > +		done
> > +	fi
> > +
> > +	return 0
> > +}
> > +
> > +reset()
> > +{
> > +	if [ "`echo $1 | head -c3`" == "mic" ]; then
> > +		f=$sysfs/$1
> > +		echo reset > $f/state
> > +		return 0
> > +	fi
> > +
> > +	if [ -d "$sysfs" ]; then
> > +		for f in $sysfs/*
> > +		do
> > +			echo reset > $f/state
> > +		done
> > +	fi
> > +
> > +	return 0
> > +}
> > +
> > +boot()
> > +{
> > +	if [ "`echo $1 | head -c3`" == "mic" ]; then
> > +		f=$sysfs/$1
> > +		echo "boot:linux:mic/uos.img:mic/$1.image" > $f/state
> > +		return 0
> > +	fi
> > +
> > +	if [ -d "$sysfs" ]; then
> > +		for f in $sysfs/*
> > +		do
> > +			echo "boot:linux:mic/uos.img:mic/`basename $f`.image" > $f/state
> > +		done
> > +	fi
> > +
> > +	return 0
> > +}
> > +
> > +shutdown()
> > +{
> > +	if [ "`echo $1 | head -c3`" == "mic" ]; then
> > +		f=$sysfs/$1
> > +		echo shutdown > $f/state
> > +		return 0
> > +	fi
> > +
> > +	if [ -d "$sysfs" ]; then
> > +		for f in $sysfs/*
> > +		do
> > +			echo shutdown > $f/state
> > +		done
> > +	fi
> > +
> > +	return 0
> > +}
> > +
> > +wait()
> > +{
> > +	if [ "`echo $1 | head -c3`" == "mic" ]; then
> > +		f=$sysfs/$1
> > +		while [ "`cat $f/state`" != "offline" -a "`cat $f/state`" != "online" ]
> > +		do
> > +			sleep 1
> > +			echo -e "Waiting for $1 to go offline"
> > +		done
> > +		return 0
> > +	fi
> > +
> > +	if [ -d "$sysfs" ]; then
> > +		# Wait for the cards to go offline
> > +		for f in $sysfs/*
> > +		do
> > +			while [ "`cat $f/state`" != "offline" -a "`cat $f/state`" != "online" ]
> > +			do
> > +				sleep 1
> > +				echo -e "Waiting for "`basename $f`" to go offline"
> > +			done
> > +		done
> > +	fi
> > +}
> > +
> > +case $1 in
> > +	-s)
> > +		status $2
> > +		;;
> > +	-r)
> > +		reset $2
> > +		;;
> > +	-b)
> > +		boot $2
> > +		;;
> > +	-S)
> > +		shutdown $2
> > +		;;
> > +	-w)
> > +		wait $2
> > +		;;
> > +	*)
> > +		echo $"Usage: $0 {-s (status) |-r (reset) |-b (boot) |-S (shutdown) |-w (wait)}"
> > +		exit 2
> > +esac
> > +
> > +exit $?
> > diff --git a/Documentation/mic/mpssd/mpss b/Documentation/mic/mpssd/mpss
> > new file mode 100755
> > index 0000000..f0bb3dd
> > --- /dev/null
> > +++ b/Documentation/mic/mpssd/mpss
> > @@ -0,0 +1,245 @@
> > +#!/bin/bash
> > +# Intel MIC Platform Software Stack (MPSS)
> > +#
> > +# Copyright(c) 2013 Intel Corporation.
> > +#
> > +# This program is free software; you can redistribute it and/or modify
> > +# it under the terms of the GNU General Public License, version 2, as
> > +# published by the Free Software Foundation.
> > +#
> > +# This program is distributed in the hope that it will be useful, but
> > +# WITHOUT ANY WARRANTY; without even the implied warranty of
> > +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> > +# General Public License for more details.
> > +#
> > +# The full GNU General Public License is included in this distribution in
> > +# the file called "COPYING".
> > +#
> > +# Intel MIC User Space Tools.
> > +#
> > +# mpss	Start mpssd.
> > +#
> > +# chkconfig: 2345 95 05
> > +# description: start MPSS stack processing.
> > +#
> > +### BEGIN INIT INFO
> > +# Provides: mpss
> > +# Required-Start:
> > +# Required-Stop:
> > +# Short-Description: MPSS stack control
> > +# Description: MPSS stack control
> > +### END INIT INFO
> > +
> > +# Source function library.
> > +. /etc/init.d/functions
> > +
> > +exec=/usr/sbin/mpssd
> > +sysfs="/sys/class/mic"
> > +
> > +start()
> > +{
> > +	[ -x $exec ] || exit 5
> > +
> > +	echo -e $"Starting MPSS Stack"
> > +
> > +	echo -e $"Loading MIC_HOST Module"
> > +
> > +	# Ensure the driver is loaded
> > +	[ -d "$sysfs" ] || modprobe mic_host
> > +
> > +	if [ "`ps -e | awk '{print $4}' | grep mpssd | head -1`" = "mpssd" ]; then
> > +		echo -e $"MPSSD already running! "
> > +		success
> > +		echo
> > +		return 0;
> > +	fi
> > +
> > +	# Start the daemon
> > +	echo -n $"Starting MPSSD"
> > +	$exec &
> > +	RETVAL=$?
> > +	if [ $RETVAL -ne 0 ]; then
> > +		failure
> > +	else
> > +		success
> > +	fi
> > +	echo
> > +
> > +	sleep 5
> > +
> > +	# Boot the cards
> > +	if [ $RETVAL -eq 0 ]; then
> > +		for f in $sysfs/*
> > +		do
> > +			echo -ne "Booting "`basename $f`" "
> > +			echo "boot:linux:mic/uos.img:mic/`basename $f`.image" > $f/state
> > +			RETVAL=$?
> > +			if [ $RETVAL -ne 0 ]; then
> > +				failure
> > +			else
> > +				success
> > +			fi
> > +			echo
> > +		done
> > +	fi
> > +
> > +	# Wait till ping works
> > +	if [ $RETVAL -eq 0 ]; then
> > +		for f in $sysfs/*
> > +		do
> > +			count=100
> > +			ipaddr=`cat $f/cmdline`
> > +			ipaddr=${ipaddr#*address,}
> > +			ipaddr=`echo $ipaddr | cut -d, -f1 | cut -d\; -f1`
> > +
> > +			while [ $count -ge 0 ]
> > +			do
> > +				echo -e "Pinging "`basename $f`" "
> > +				ping -c 1 $ipaddr &> /dev/null
> > +				RETVAL=$?
> > +				if [ $RETVAL -eq 0 ]; then
> > +					success
> > +					break
> > +				fi
> > +				sleep 1
> > +				count=`expr $count - 1`
> > +			done
> > +			if [ $RETVAL -ne 0 ]; then
> > +				failure
> > +			else
> > +				success
> > +			fi
> > +			echo
> > +		done
> > +	fi
> > +	return $RETVAL
> > +}
> > +
> > +stop()
> > +{
> > +	echo -e $"Shutting down MPSS Stack: "
> > +
> > +	# Bail out if module is unloaded
> > +	if [ ! -d "$sysfs" ]; then
> > +		echo -n $"Module unloaded "
> > +		killall -9 mpssd 2>/dev/null
> > +		success
> > +		echo
> > +		return 0
> > +	fi
> > +
> > +	# Shut down the cards
> > +	for f in $sysfs/*
> > +	do
> > +		echo -e "Shutting down `basename $f` "
> > +		echo "shutdown" > $f/state 2>/dev/null
> > +	done
> > +
> > +	# Wait for the cards to go offline
> > +	for f in $sysfs/*
> > +	do
> > +		while [ "`cat $f/state`" != "offline" ]
> > +		do
> > +			sleep 1
> > +			echo -e "Waiting for "`basename $f`" to go offline"
> > +		done
> > +	done
> > +
> > +	# Display the status of the cards
> > +	for f in $sysfs/*
> > +	do
> > +		echo -e ""`basename $f`" state: "`cat $f/state`""
> > +	done
> > +
> > +	sleep 5
> > +
> > +	# Kill MPSSD now
> > +	echo -n $"Killing MPSSD"
> > +	killall -9 mpssd 2>/dev/null
> > +	RETVAL=$?
> > +	if [ $RETVAL -ne 0 ]; then
> > +		failure
> > +	else
> > +		success
> > +	fi
> > +	echo
> > +	return $RETVAL
> > +}
> > +
> > +restart()
> > +{
> > +	stop
> > +	sleep 5
> > +	start
> > +}
> > +
> > +status()
> > +{
> > +	if [ -d "$sysfs" ]; then
> > +		for f in $sysfs/*
> > +		do
> > +			echo -e ""`basename $f`" state: "`cat $f/state`""
> > +		done
> > +	fi
> > +
> > +	if [ "`ps -e | awk '{print $4}' | grep mpssd | head -n 1`" = "mpssd" ]; then
> > +		echo "mpssd is running"
> > +	else
> > +		echo "mpssd is stopped"
> > +	fi
> > +	return 0
> > +}
> > +
> > +unload()
> > +{
> > +	if [ ! -d "$sysfs" ]; then
> > +		echo -n $"No MIC_HOST Module: "
> > +		killall -9 mpssd 2>/dev/null
> > +		success
> > +		echo
> > +		return
> > +	fi
> > +
> > +	stop
> > +	RETVAL=$?
> > +
> > +	sleep 5
> > +	echo -n $"Removing MIC_HOST Module: "
> > +
> > +	if [ $RETVAL = 0 ]; then
> > +		sleep 1
> > +		modprobe -r mic_host
> > +		RETVAL=$?
> > +	fi
> > +
> > +	if [ $RETVAL -ne 0 ]; then
> > +		failure
> > +	else
> > +		success
> > +	fi
> > +	echo
> > +	return $RETVAL
> > +}
> > +
> > +case $1 in
> > +	start)
> > +		start
> > +		;;
> > +	stop)
> > +		stop
> > +		;;
> > +	restart)
> > +		restart
> > +		;;
> > +	status)
> > +		status
> > +		;;
> > +	unload)
> > +		unload
> > +		;;
> > +	*)
> > +		echo $"Usage: $0 {start|stop|restart|status|unload}"
> > +		exit 2
> > +esac
> > +
> > +exit $?
> > diff --git a/Documentation/mic/mpssd/mpssd.c b/Documentation/mic/mpssd/mpssd.c
> > new file mode 100644
> > index 0000000..3bc34cb
> > --- /dev/null
> > +++ b/Documentation/mic/mpssd/mpssd.c
> > @@ -0,0 +1,1689 @@
> > +/*
> > + * Intel MIC Platform Software Stack (MPSS)
> > + *
> > + * Copyright(c) 2013 Intel Corporation.
> > + *
> > + * This program is free software; you can redistribute it and/or modify
> > + * it under the terms of the GNU General Public License, version 2, as
> > + * published by the Free Software Foundation.
> > + *
> > + * This program is distributed in the hope that it will be useful, but
> > + * WITHOUT ANY WARRANTY; without even the implied warranty of
> > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> > + * General Public License for more details.
> > + *
> > + * The full GNU General Public License is included in this distribution in
> > + * the file called "COPYING".
> > + *
> > + * Intel MIC User Space Tools.
> > + */
> > +
> > +#define _GNU_SOURCE
> > +
> > +#include <stdlib.h>
> > +#include <fcntl.h>
> > +#include <getopt.h>
> > +#include <assert.h>
> > +#include <unistd.h>
> > +#include <stdbool.h>
> > +#include <signal.h>
> > +#include <poll.h>
> > +#include <features.h>
> > +#include <sys/types.h>
> > +#include <sys/stat.h>
> > +#include <sys/mman.h>
> > +#include <sys/socket.h>
> > +#include <linux/virtio_ring.h>
> > +#include <linux/virtio_net.h>
> > +#include <linux/virtio_console.h>
> > +#include <linux/virtio_blk.h>
> > +#include <linux/version.h>
> > +#include "mpssd.h"
> > +#include <linux/mic_ioctl.h>
> > +#include <linux/mic_common.h>
> > +
> > +static void init_mic(struct mic_info *mic);
> > +
> > +static FILE *logfp;
> > +static struct mic_info mic_list;
> > +
> > +#define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]))
> > +
> > +#define min_t(type, x, y) ({				\
> > +		type __min1 = (x);                      \
> > +		type __min2 = (y);                      \
> > +		__min1 < __min2 ? __min1 : __min2; })
> > +
> > +/* align addr on a size boundary - adjust address up/down if needed */
> > +#define _ALIGN_UP(addr, size)    (((addr)+((size)-1))&(~((size)-1)))
> > +#define _ALIGN_DOWN(addr, size)  ((addr)&(~((size)-1)))
> > +
> > +/* align addr on a size boundary - adjust address up if needed */
> > +#define _ALIGN(addr, size)     _ALIGN_UP(addr, size)
> > +
> > +/* to align the pointer to the (next) page boundary */
> > +#define PAGE_ALIGN(addr)        _ALIGN(addr, PAGE_SIZE)
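
The alignment macros above rely on masking with ~(size - 1), so they only give correct results when size is a power of two. A minimal sketch of the same arithmetic as functions follows; the names align_up and align_down are hypothetical and not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helpers mirroring _ALIGN_DOWN/_ALIGN_UP from the patch.
 * Both assume size is a non-zero power of two.
 */
static inline uint64_t align_down(uint64_t addr, uint64_t size)
{
	/* Clear the low bits so addr falls back to the previous boundary. */
	return addr & ~(size - 1);
}

static inline uint64_t align_up(uint64_t addr, uint64_t size)
{
	/* Bump past the boundary first, then clear the low bits. */
	return (addr + (size - 1)) & ~(size - 1);
}
```

With these, MAX_NET_PKT_SIZE as defined further down works out to align_up(64 * 1024 + 14, 64), i.e. 65600 bytes.
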
> > +
> > +#define ACCESS_ONCE(x) (*(volatile typeof(x) *)&(x))
> > +
> > +/* Insert REP NOP (PAUSE) in busy-wait loops. */
> > +static inline void cpu_relax(void)
> > +{
> > +	asm volatile("rep; nop" : : : "memory");
> > +}
> > +
> > +#define GSO_ENABLED		1
> > +#define MAX_GSO_SIZE		(64 * 1024)
> > +#define ETH_H_LEN		14
> > +#define MAX_NET_PKT_SIZE	(_ALIGN_UP(MAX_GSO_SIZE + ETH_H_LEN, 64))
> > +#define MIC_DEVICE_PAGE_END	0x1000
> > +
> > +#ifndef VIRTIO_NET_HDR_F_DATA_VALID
> > +#define VIRTIO_NET_HDR_F_DATA_VALID	2	/* Csum is valid */
> > +#endif
> > +
> > +static struct {
> > +	struct mic_device_desc dd;
> > +	struct mic_vqconfig vqconfig[2];
> > +	__u32 host_features, guest_acknowledgements;
> > +	struct virtio_console_config cons_config;
> > +} virtcons_dev_page = {
> > +	.dd = {
> > +		.type = VIRTIO_ID_CONSOLE,
> > +		.num_vq = ARRAY_SIZE(virtcons_dev_page.vqconfig),
> > +		.feature_len = sizeof(virtcons_dev_page.host_features),
> > +		.config_len = sizeof(virtcons_dev_page.cons_config),
> > +	},
> > +	.vqconfig[0] = {
> > +		.num = htole16(MIC_VRING_ENTRIES),
> > +	},
> > +	.vqconfig[1] = {
> > +		.num = htole16(MIC_VRING_ENTRIES),
> > +	},
> > +};
> > +
> > +static struct {
> > +	struct mic_device_desc dd;
> > +	struct mic_vqconfig vqconfig[2];
> > +	__u32 host_features, guest_acknowledgements;
> > +	struct virtio_net_config net_config;
> > +} virtnet_dev_page = {
> > +	.dd = {
> > +		.type = VIRTIO_ID_NET,
> > +		.num_vq = ARRAY_SIZE(virtnet_dev_page.vqconfig),
> > +		.feature_len = sizeof(virtnet_dev_page.host_features),
> > +		.config_len = sizeof(virtnet_dev_page.net_config),
> > +	},
> > +	.vqconfig[0] = {
> > +		.num = htole16(MIC_VRING_ENTRIES),
> > +	},
> > +	.vqconfig[1] = {
> > +		.num = htole16(MIC_VRING_ENTRIES),
> > +	},
> > +#if GSO_ENABLED
> > +		.host_features = htole32(
> > +		1 << VIRTIO_NET_F_CSUM |
> > +		1 << VIRTIO_NET_F_GSO |
> > +		1 << VIRTIO_NET_F_GUEST_TSO4 |
> > +		1 << VIRTIO_NET_F_GUEST_TSO6 |
> > +		1 << VIRTIO_NET_F_GUEST_ECN |
> > +		1 << VIRTIO_NET_F_GUEST_UFO),
> > +#else
> > +		.host_features = 0,
> > +#endif
> > +};
> > +
> > +static const char *mic_config_dir = "/etc/sysconfig/mic";
> > +static const char *virtblk_backend = "VIRTBLK_BACKEND";
> > +static struct {
> > +	struct mic_device_desc dd;
> > +	struct mic_vqconfig vqconfig[1];
> > +	__u32 host_features, guest_acknowledgements;
> > +	struct virtio_blk_config blk_config;
> > +} virtblk_dev_page = {
> > +	.dd = {
> > +		.type = VIRTIO_ID_BLOCK,
> > +		.num_vq = ARRAY_SIZE(virtblk_dev_page.vqconfig),
> > +		.feature_len = sizeof(virtblk_dev_page.host_features),
> > +		.config_len = sizeof(virtblk_dev_page.blk_config),
> > +	},
> > +	.vqconfig[0] = {
> > +		.num = htole16(MIC_VRING_ENTRIES),
> > +	},
> > +	.host_features =
> > +		htole32(1<<VIRTIO_BLK_F_SEG_MAX),
> > +	.blk_config = {
> > +		.seg_max = htole32(MIC_VRING_ENTRIES - 2),
> > +		.capacity = htole64(0),
> > +	 }
> > +};
> > +
> > +static char *myname;
> > +
> > +static int
> > +tap_configure(struct mic_info *mic, char *dev)
> > +{
> > +	pid_t pid;
> > +	char *ifargv[7];
> > +	char ipaddr[32];	/* too long for IFNAMSIZ once id >= 10 */
> > +	int ret = 0;
> > +
> > +	pid = fork();
> > +	if (pid == 0) {
> > +		ifargv[0] = "ip";
> > +		ifargv[1] = "link";
> > +		ifargv[2] = "set";
> > +		ifargv[3] = dev;
> > +		ifargv[4] = "up";
> > +		ifargv[5] = NULL;
> > +		mpsslog("Configuring %s\n", dev);
> > +		ret = execvp("ip", ifargv);
> > +		if (ret < 0) {
> > +			mpsslog("%s execvp failed errno %s\n",
> > +				mic->name, strerror(errno));
> > +			return ret;
> > +		}
> > +	}
> > +	if (pid < 0) {
> > +		mpsslog("%s fork failed errno %s\n",
> > +			mic->name, strerror(errno));
> > +		return pid;
> > +	}
> > +
> > +	ret = waitpid(pid, NULL, 0);
> > +	if (ret < 0) {
> > +		mpsslog("%s waitpid failed errno %s\n",
> > +			mic->name, strerror(errno));
> > +		return ret;
> > +	}
> > +
> > +	snprintf(ipaddr, sizeof(ipaddr), "172.31.%d.254/24", mic->id);
> > +
> > +	pid = fork();
> > +	if (pid == 0) {
> > +		ifargv[0] = "ip";
> > +		ifargv[1] = "addr";
> > +		ifargv[2] = "add";
> > +		ifargv[3] = ipaddr;
> > +		ifargv[4] = "dev";
> > +		ifargv[5] = dev;
> > +		ifargv[6] = NULL;
> > +		mpsslog("Configuring %s ipaddr %s\n", dev, ipaddr);
> > +		ret = execvp("ip", ifargv);
> > +		if (ret < 0) {
> > +			mpsslog("%s execvp failed errno %s\n",
> > +				mic->name, strerror(errno));
> > +			return ret;
> > +		}
> > +	}
> > +	if (pid < 0) {
> > +		mpsslog("%s fork failed errno %s\n",
> > +			mic->name, strerror(errno));
> > +		return pid;
> > +	}
> > +
> > +	ret = waitpid(pid, NULL, 0);
> > +	if (ret < 0) {
> > +		mpsslog("%s waitpid failed errno %s\n",
> > +			mic->name, strerror(errno));
> > +		return ret;
> > +	}
> > +	mpsslog("MIC name %s %s %d DONE!\n",
> > +		mic->name, __func__, __LINE__);
> > +	return 0;
> > +}
> > +
> > +static int tun_alloc(struct mic_info *mic, char *dev)
> > +{
> > +	struct ifreq ifr;
> > +	int fd, err;
> > +#if GSO_ENABLED
> > +	unsigned offload;
> > +#endif
> > +	fd = open("/dev/net/tun", O_RDWR);
> > +	if (fd < 0) {
> > +		mpsslog("Could not open /dev/net/tun %s\n", strerror(errno));
> > +		goto done;
> > +	}
> > +
> > +	memset(&ifr, 0, sizeof(ifr));
> > +
> > +	ifr.ifr_flags = IFF_TAP | IFF_NO_PI | IFF_VNET_HDR;
> > +	if (*dev)
> > +		strncpy(ifr.ifr_name, dev, IFNAMSIZ);
> > +
> > +	err = ioctl(fd, TUNSETIFF, (void *) &ifr);
> > +	if (err < 0) {
> > +		mpsslog("%s %s %d TUNSETIFF failed %s\n",
> > +			mic->name, __func__, __LINE__, strerror(errno));
> > +		close(fd);
> > +		return err;
> > +	}
> > +#if GSO_ENABLED
> > +	offload = TUN_F_CSUM | TUN_F_TSO4 | TUN_F_TSO6 |
> > +		TUN_F_TSO_ECN | TUN_F_UFO;
> > +
> > +	err = ioctl(fd, TUNSETOFFLOAD, offload);
> > +	if (err < 0) {
> > +		mpsslog("%s %s %d TUNSETOFFLOAD failed %s\n",
> > +			mic->name, __func__, __LINE__, strerror(errno));
> > +		close(fd);
> > +		return err;
> > +	}
> > +#endif
> > +	strcpy(dev, ifr.ifr_name);
> > +	mpsslog("Created TAP %s\n", dev);
> > +done:
> > +	return fd;
> > +}
> > +
> > +#define NET_FD_VIRTIO_NET 0
> > +#define NET_FD_TUN 1
> > +#define MAX_NET_FD 2
> > +
> > +static void * *
> > +get_dp(struct mic_info *mic, int type)
> > +{
> > +	switch (type) {
> > +	case VIRTIO_ID_CONSOLE:
> > +		return &mic->mic_console.console_dp;
> > +	case VIRTIO_ID_NET:
> > +		return &mic->mic_net.net_dp;
> > +	case VIRTIO_ID_BLOCK:
> > +		return &mic->mic_virtblk.block_dp;
> > +	}
> > +	mpsslog("%s %s %d not found\n", mic->name, __func__, type);
> > +	assert(0);
> > +	return NULL;
> > +}
> > +
> > +static struct mic_device_desc *get_device_desc(struct mic_info *mic, int type)
> > +{
> > +	struct mic_device_desc *d;
> > +	int i;
> > +	void *dp = *get_dp(mic, type);
> > +
> > +	for (i = mic_aligned_size(struct mic_bootparam); i < PAGE_SIZE;
> > +		i += mic_total_desc_size(d)) {
> > +		d = dp + i;
> > +
> > +		/* End of list */
> > +		if (d->type == 0)
> > +			break;
> > +
> > +		if (d->type == -1)
> > +			continue;
> > +
> > +		mpsslog("%s %s d-> type %d d %p\n",
> > +			mic->name, __func__, d->type, d);
> > +
> > +		if (d->type == (__u8)type)
> > +			return d;
> > +	}
> > +	mpsslog("%s %s %d not found\n", mic->name, __func__, type);
> > +	assert(0);
> > +	return NULL;
> > +}
> > +
> > +/* See comments in vhost.c for explanation of next_desc() */
> > +static unsigned next_desc(struct vring_desc *desc)
> > +{
> > +	unsigned int next;
> > +
> > +	if (!(le16toh(desc->flags) & VRING_DESC_F_NEXT))
> > +		return -1U;
> > +	next = le16toh(desc->next);
> > +	return next;
> > +}
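
next_desc() returns -1U when the descriptor does not have the NEXT flag set, which lets a caller walk a chain until it ends. A minimal sketch of such a walk, assuming a simplified host-endian descriptor (struct demo_desc, demo_next_desc and demo_chain_len are hypothetical names, not the real struct vring_desc or anything in the patch):

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_DESC_F_NEXT 1	/* same bit value as VRING_DESC_F_NEXT */

/* Hypothetical, simplified descriptor: only the chaining fields. */
struct demo_desc {
	uint16_t flags;
	uint16_t next;
};

/* Same contract as next_desc(): index of the next entry, or -1U at the end. */
static unsigned demo_next_desc(const struct demo_desc *desc)
{
	if (!(desc->flags & DEMO_DESC_F_NEXT))
		return -1U;
	return desc->next;
}

/*
 * Count the descriptors in the chain starting at head. The walk stops at
 * the end marker, at an out-of-range index, or after num steps, so a
 * corrupted (cyclic) chain cannot loop forever.
 */
static unsigned demo_chain_len(const struct demo_desc *table, unsigned num,
			       unsigned head)
{
	unsigned i = head, len = 0;

	while (i < num && len < num) {
		len++;
		i = demo_next_desc(&table[i]);
	}
	return len;
}
```

Bounding the walk by the table size is a defensive choice for this sketch; the ring contents are written by the other side and should not be trusted blindly.
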
> > +
> > +/* Sum the lengths of all iovec entries */
> > +static ssize_t
> > +sum_iovec_len(struct mic_copy_desc *copy)
> > +{
> > +	ssize_t sum = 0;
> > +	int i;
> > +
> > +	for (i = 0; i < copy->iovcnt; i++)
> > +		sum += copy->iov[i].iov_len;
> > +	return sum;
> > +}
> > +
> > +static inline void verify_out_len(struct mic_info *mic,
> > +	struct mic_copy_desc *copy)
> > +{
> > +	if (copy->out_len != sum_iovec_len(copy)) {
> > +		mpsslog("%s %s %d BUG copy->out_len 0x%x len 0x%zx\n",
> > +				mic->name, __func__, __LINE__,
> > +				copy->out_len, sum_iovec_len(copy));
> > +		assert(copy->out_len == sum_iovec_len(copy));
> > +	}
> > +}
> > +
> > +/* Display an iovec */
> > +static void
> > +disp_iovec(struct mic_info *mic, struct mic_copy_desc *copy,
> > +	const char *s, int line)
> > +{
> > +	int i;
> > +
> > +	for (i = 0; i < copy->iovcnt; i++)
> > +		mpsslog("%s %s %d copy->iov[%d] addr %p len 0x%lx\n",
> > +			mic->name, s, line, i,
> > +			copy->iov[i].iov_base, copy->iov[i].iov_len);
> > +}
> > +
> > +static inline __u16 read_avail_idx(struct mic_vring *vr)
> > +{
> > +	return ACCESS_ONCE(vr->info->avail_idx);
> > +}
> > +
> > +static inline void txrx_prepare(int type, bool tx, struct mic_vring *vr,
> > +				struct mic_copy_desc *copy, ssize_t len)
> > +{
> > +	copy->vr_idx = tx ? 0 : 1;
> > +	copy->update_used = true;
> > +	if (type == VIRTIO_ID_NET)
> > +		copy->iov[1].iov_len = len - sizeof(struct virtio_net_hdr);
> > +	else
> > +		copy->iov[0].iov_len = len;
> > +}
> > +
> > +/* Central API which triggers the copies */
> > +static int
> > +mic_virtio_copy(struct mic_info *mic, int fd,
> > +	struct mic_vring *vr, struct mic_copy_desc *copy)
> > +{
> > +	int ret;
> > +
> > +	ret = ioctl(fd, MIC_VIRTIO_COPY_DESC, copy);
> > +	if (ret) {
> > +		mpsslog("%s %s %d errno %s ret %d\n",
> > +			mic->name, __func__, __LINE__,
> > +			strerror(errno), ret);
> > +	}
> > +	return ret;
> > +}
> > +
> > +/*
> > + * This initialization routine requires at least one
> > + * vring i.e. vr0. vr1 is optional.
> > + */
> > +static void *
> > +init_vr(struct mic_info *mic, int fd, int type,
> > +	struct mic_vring *vr0, struct mic_vring *vr1, int num_vq)
> > +{
> > +	int vr_size;
> > +	char *va;
> > +
> > +	vr_size = PAGE_ALIGN(vring_size(MIC_VRING_ENTRIES,
> > +		MIC_VIRTIO_RING_ALIGN) + sizeof(struct _mic_vring_info));
> > +	va = mmap(NULL, MIC_DEVICE_PAGE_END + vr_size * num_vq,
> > +		PROT_READ, MAP_SHARED, fd, 0);
> > +	if (MAP_FAILED == va) {
> > +		mpsslog("%s %s %d mmap failed errno %s\n",
> > +			mic->name, __func__, __LINE__,
> > +			strerror(errno));
> > +		goto done;
> > +	}
> > +	*get_dp(mic, type) = (void *)va;
> > +	vr0->va = (struct mic_vring *)&va[MIC_DEVICE_PAGE_END];
> > +	vr0->info = vr0->va +
> > +		vring_size(MIC_VRING_ENTRIES, MIC_VIRTIO_RING_ALIGN);
> > +	vring_init(&vr0->vr,
> > +		MIC_VRING_ENTRIES, vr0->va, MIC_VIRTIO_RING_ALIGN);
> > +	mpsslog("%s %s vr0 %p vr0->info %p vr_size 0x%x vring 0x%x ",
> > +		__func__, mic->name, vr0->va, vr0->info, vr_size,
> > +		vring_size(MIC_VRING_ENTRIES, MIC_VIRTIO_RING_ALIGN));
> > +	mpsslog("magic 0x%x expected 0x%x\n",
> > +		vr0->info->magic, MIC_MAGIC + type + 0);
> > +	assert(vr0->info->magic == MIC_MAGIC + type + 0);
> > +	if (vr1) {
> > +		vr1->va = (struct mic_vring *)
> > +			&va[MIC_DEVICE_PAGE_END + vr_size];
> > +		vr1->info = vr1->va + vring_size(MIC_VRING_ENTRIES,
> > +			MIC_VIRTIO_RING_ALIGN);
> > +		vring_init(&vr1->vr,
> > +			MIC_VRING_ENTRIES, vr1->va, MIC_VIRTIO_RING_ALIGN);
> > +		mpsslog("%s %s vr1 %p vr1->info %p vr_size 0x%x vring 0x%x ",
> > +			__func__, mic->name, vr1->va, vr1->info, vr_size,
> > +			vring_size(MIC_VRING_ENTRIES, MIC_VIRTIO_RING_ALIGN));
> > +		mpsslog("magic 0x%x expected 0x%x\n",
> > +			vr1->info->magic, MIC_MAGIC + type + 1);
> > +		assert(vr1->info->magic == MIC_MAGIC + type + 1);
> > +	}
> > +done:
> > +	return va;
> > +}
> > +
> > +static void
> > +uninit_vr(struct mic_info *mic, int num_vq)
> > +{
> > +	int vr_size, ret;
> > +
> > +	vr_size = PAGE_ALIGN(vring_size(MIC_VRING_ENTRIES,
> > +		MIC_VIRTIO_RING_ALIGN) + sizeof(struct _mic_vring_info));
> > +	ret = munmap(mic->mic_virtblk.block_dp,
> > +		MIC_DEVICE_PAGE_END + vr_size * num_vq);
> > +	if (ret < 0)
> > +		mpsslog("%s munmap errno %d\n", mic->name, errno);
> > +}
> > +
> > +static void
> > +wait_for_card_driver(struct mic_info *mic, int fd, int type)
> > +{
> > +	struct pollfd pollfd;
> > +	int err;
> > +	struct mic_device_desc *desc = get_device_desc(mic, type);
> > +
> > +	pollfd.fd = fd;
> > +	mpsslog("%s %s Waiting .... desc-> type %d status 0x%x\n",
> > +		mic->name, __func__, type, desc->status);
> > +	while (1) {
> > +		pollfd.events = POLLIN;
> > +		pollfd.revents = 0;
> > +		err = poll(&pollfd, 1, -1);
> > +		if (err < 0) {
> > +			mpsslog("%s %s poll failed %s\n",
> > +				mic->name, __func__, strerror(errno));
> > +			continue;
> > +		}
> > +
> > +		if (pollfd.revents) {
> > +			mpsslog("%s %s Waiting... desc-> type %d status 0x%x\n",
> > +				mic->name, __func__, type, desc->status);
> > +			if (desc->status & VIRTIO_CONFIG_S_DRIVER_OK) {
> > +				mpsslog("%s %s poll.revents %d\n",
> > +					mic->name, __func__, pollfd.revents);
> > +				mpsslog("%s %s desc-> type %d status 0x%x\n",
> > +					mic->name, __func__, type,
> > +					desc->status);
> > +				break;
> > +			}
> > +		}
> > +	}
> > +}
> > +
> > +/* Spin till we have some descriptors */
> > +static void
> > +wait_for_descriptors(struct mic_info *mic, struct mic_vring *vr)
> > +{
> > +	__u16 avail_idx = read_avail_idx(vr);
> > +
> > +	while (avail_idx == le16toh(ACCESS_ONCE(vr->vr.avail->idx))) {
> > +#ifdef DEBUG
> > +		mpsslog("%s %s waiting for desc avail %d info_avail %d\n",
> > +			mic->name, __func__,
> > +			le16toh(vr->vr.avail->idx), vr->info->avail_idx);
> > +#endif
> > +		cpu_relax();
> > +	}
> > +}
> > +
> > +static void *
> > +virtio_net(void *arg)
> > +{
> > +	static __u8 vnet_hdr[2][sizeof(struct virtio_net_hdr)];
> > +	static __u8 vnet_buf[2][MAX_NET_PKT_SIZE] __aligned(64);
> > +	struct iovec vnet_iov[2][2] = {
> > +		{ { .iov_base = vnet_hdr[0], .iov_len = sizeof(vnet_hdr[0]) },
> > +		  { .iov_base = vnet_buf[0], .iov_len = sizeof(vnet_buf[0]) } },
> > +		{ { .iov_base = vnet_hdr[1], .iov_len = sizeof(vnet_hdr[1]) },
> > +		  { .iov_base = vnet_buf[1], .iov_len = sizeof(vnet_buf[1]) } },
> > +	};
> > +	struct iovec *iov0 = vnet_iov[0], *iov1 = vnet_iov[1];
> > +	struct mic_info *mic = (struct mic_info *)arg;
> > +	char if_name[IFNAMSIZ];
> > +	struct pollfd net_poll[MAX_NET_FD];
> > +	struct mic_vring tx_vr, rx_vr;
> > +	struct mic_copy_desc copy;
> > +	struct mic_device_desc *desc;
> > +	int err;
> > +
> > +	snprintf(if_name, IFNAMSIZ, "mic%d", mic->id);
> > +	mic->mic_net.tap_fd = tun_alloc(mic, if_name);
> > +	if (mic->mic_net.tap_fd < 0)
> > +		goto done;
> > +
> > +	if (tap_configure(mic, if_name))
> > +		goto done;
> > +	mpsslog("MIC name %s id %d\n", mic->name, mic->id);
> > +
> > +	net_poll[NET_FD_VIRTIO_NET].fd = mic->mic_net.virtio_net_fd;
> > +	net_poll[NET_FD_VIRTIO_NET].events = POLLIN;
> > +	net_poll[NET_FD_TUN].fd = mic->mic_net.tap_fd;
> > +	net_poll[NET_FD_TUN].events = POLLIN;
> > +
> > +	if (MAP_FAILED == init_vr(mic, mic->mic_net.virtio_net_fd,
> > +		VIRTIO_ID_NET, &tx_vr, &rx_vr,
> > +		virtnet_dev_page.dd.num_vq)) {
> > +		mpsslog("%s init_vr failed %s\n",
> > +			mic->name, strerror(errno));
> > +		goto done;
> > +	}
> > +
> > +	copy.iovcnt = 2;
> > +	desc = get_device_desc(mic, VIRTIO_ID_NET);
> > +
> > +	while (1) {
> > +		ssize_t len;
> > +
> > +		net_poll[NET_FD_VIRTIO_NET].revents = 0;
> > +		net_poll[NET_FD_TUN].revents = 0;
> > +
> > +		/* Start polling for data from tap and virtio net */
> > +		err = poll(net_poll, 2, -1);
> > +		if (err < 0) {
> > +			mpsslog("%s poll failed %s\n",
> > +				__func__, strerror(errno));
> > +			continue;
> > +		}
> > +		if (!(desc->status & VIRTIO_CONFIG_S_DRIVER_OK))
> > +			wait_for_card_driver(mic, mic->mic_net.virtio_net_fd,
> > +					VIRTIO_ID_NET);
> > +		/*
> > +		 * Check if there is data to be read from TUN and write to
> > +		 * virtio net fd if there is.
> > +		 */
> > +		if (net_poll[NET_FD_TUN].revents & POLLIN) {
> > +			copy.iov = iov0;
> > +			len = readv(net_poll[NET_FD_TUN].fd,
> > +				copy.iov, copy.iovcnt);
> > +			if (len > 0) {
> > +				struct virtio_net_hdr *hdr
> > +					= (struct virtio_net_hdr *) vnet_hdr[0];
> > +
> > +				/* Disable checksums on the card since we are on
> > +				   a reliable PCIe link */
> > +				hdr->flags |= VIRTIO_NET_HDR_F_DATA_VALID;
> > +#ifdef DEBUG
> > +				mpsslog("%s %s %d hdr->flags 0x%x ", mic->name,
> > +					__func__, __LINE__, hdr->flags);
> > +				mpsslog("copy.out_len %d hdr->gso_type 0x%x\n",
> > +					copy.out_len, hdr->gso_type);
> > +#endif
> > +#ifdef DEBUG
> > +				disp_iovec(mic, &copy, __func__, __LINE__);
> > +				mpsslog("%s %s %d read from tap 0x%lx\n",
> > +					mic->name, __func__, __LINE__,
> > +					len);
> > +#endif
> > +				wait_for_descriptors(mic, &tx_vr);
> > +				txrx_prepare(VIRTIO_ID_NET, 1, &tx_vr, &copy,
> > +					len);
> > +
> > +				err = mic_virtio_copy(mic,
> > +					mic->mic_net.virtio_net_fd, &tx_vr,
> > +					&copy);
> > +				if (err < 0) {
> > +					mpsslog("%s %s %d mic_virtio_copy %s\n",
> > +						mic->name, __func__, __LINE__,
> > +						strerror(errno));
> > +				}
> > +				if (!err)
> > +					verify_out_len(mic, &copy);
> > +#ifdef DEBUG
> > +				disp_iovec(mic, &copy, __func__, __LINE__);
> > +				mpsslog("%s %s %d wrote to net 0x%lx\n",
> > +					mic->name, __func__, __LINE__,
> > +					sum_iovec_len(&copy));
> > +#endif
> > +				/* Reinitialize IOV for next run */
> > +				iov0[1].iov_len = MAX_NET_PKT_SIZE;
> > +			} else if (len < 0) {
> > +				disp_iovec(mic, &copy, __func__, __LINE__);
> > +				mpsslog("%s %s %d read failed %s ", mic->name,
> > +					__func__, __LINE__, strerror(errno));
> > +				mpsslog("cnt %d sum %zd\n",
> > +					copy.iovcnt, sum_iovec_len(&copy));
> > +			}
> > +		}
> > +
> > +		/*
> > +		 * Check if there is data to be read from virtio net and
> > +		 * write to TUN if there is.
> > +		 */
> > +		if (net_poll[NET_FD_VIRTIO_NET].revents & POLLIN) {
> > +			while (rx_vr.info->avail_idx !=
> > +				le16toh(rx_vr.vr.avail->idx)) {
> > +				copy.iov = iov1;
> > +				txrx_prepare(VIRTIO_ID_NET, 0, &rx_vr, &copy,
> > +					MAX_NET_PKT_SIZE
> > +					+ sizeof(struct virtio_net_hdr));
> > +
> > +				err = mic_virtio_copy(mic,
> > +					mic->mic_net.virtio_net_fd, &rx_vr,
> > +					&copy);
> > +				if (!err) {
> > +#ifdef DEBUG
> > +					struct virtio_net_hdr *hdr
> > +						= (struct virtio_net_hdr *)
> > +							vnet_hdr[1];
> > +
> > +					mpsslog("%s %s %d hdr->flags 0x%x, ",
> > +						mic->name, __func__, __LINE__,
> > +						hdr->flags);
> > +					mpsslog("out_len %d gso_type 0x%x\n",
> > +						copy.out_len,
> > +						hdr->gso_type);
> > +#endif
> > +					/* Set the correct output iov_len */
> > +					iov1[1].iov_len = copy.out_len -
> > +						sizeof(struct virtio_net_hdr);
> > +					verify_out_len(mic, &copy);
> > +#ifdef DEBUG
> > +					disp_iovec(mic, &copy, __func__,
> > +						__LINE__);
> > +					mpsslog("%s %s %d ",
> > +						mic->name, __func__, __LINE__);
> > +					mpsslog("read from net 0x%lx\n",
> > +						sum_iovec_len(&copy));
> > +#endif
> > +					len = writev(net_poll[NET_FD_TUN].fd,
> > +						copy.iov, copy.iovcnt);
> > +					if (len != sum_iovec_len(&copy)) {
> > +						mpsslog("Tun write failed %s ",
> > +							strerror(errno));
> > +						mpsslog("len 0x%zx ", len);
> > +						mpsslog("read_len 0x%zx\n",
> > +							sum_iovec_len(&copy));
> > +					} else {
> > +#ifdef DEBUG
> > +						disp_iovec(mic, &copy, __func__,
> > +							__LINE__);
> > +						mpsslog("%s %s %d ",
> > +							mic->name, __func__,
> > +							__LINE__);
> > +						mpsslog("wrote to tap 0x%lx\n",
> > +							len);
> > +#endif
> > +					}
> > +				} else {
> > +					mpsslog("%s %s %d mic_virtio_copy %s\n",
> > +						mic->name, __func__, __LINE__,
> > +						strerror(errno));
> > +					break;
> > +				}
> > +			}
> > +		}
> > +		if (net_poll[NET_FD_VIRTIO_NET].revents & POLLERR) {
> > +			mpsslog("%s: %s: POLLERR\n", __func__, mic->name);
> > +			sleep(1);
> > +		}
> > +	}
> > +done:
> > +	pthread_exit(NULL);
> > +}
> > +
> > +/* virtio_console */
> > +#define VIRTIO_CONSOLE_FD 0
> > +#define MONITOR_FD (VIRTIO_CONSOLE_FD + 1)
> > +#define MAX_CONSOLE_FD (MONITOR_FD + 1)  /* must be the last one + 1 */
> > +#define MAX_BUFFER_SIZE PAGE_SIZE
> > +
> > +static void *
> > +virtio_console(void *arg)
> > +{
> > +	static __u8 vcons_buf[2][PAGE_SIZE];
> > +	struct iovec vcons_iov[2] = {
> > +		{ .iov_base = vcons_buf[0], .iov_len = sizeof(vcons_buf[0]) },
> > +		{ .iov_base = vcons_buf[1], .iov_len = sizeof(vcons_buf[1]) },
> > +	};
> > +	struct iovec *iov0 = &vcons_iov[0], *iov1 = &vcons_iov[1];
> > +	struct mic_info *mic = (struct mic_info *)arg;
> > +	int err;
> > +	struct pollfd console_poll[MAX_CONSOLE_FD];
> > +	int pty_fd;
> > +	char *pts_name;
> > +	ssize_t len;
> > +	struct mic_vring tx_vr, rx_vr;
> > +	struct mic_copy_desc copy;
> > +	struct mic_device_desc *desc;
> > +
> > +	pty_fd = posix_openpt(O_RDWR);
> > +	if (pty_fd < 0) {
> > +		mpsslog("can't open a pseudoterminal master device: %s\n",
> > +			strerror(errno));
> > +		goto _return;
> > +	}
> > +	pts_name = ptsname(pty_fd);
> > +	if (pts_name == NULL) {
> > +		mpsslog("can't get pts name\n");
> > +		goto _close_pty;
> > +	}
> > +	printf("%s console message goes to %s\n", mic->name, pts_name);
> > +	mpsslog("%s console message goes to %s\n", mic->name, pts_name);
> > +	err = grantpt(pty_fd);
> > +	if (err < 0) {
> > +		mpsslog("can't grant access: %s %s\n",
> > +				pts_name, strerror(errno));
> > +		goto _close_pty;
> > +	}
> > +	err = unlockpt(pty_fd);
> > +	if (err < 0) {
> > +		mpsslog("can't unlock a pseudoterminal: %s %s\n",
> > +				pts_name, strerror(errno));
> > +		goto _close_pty;
> > +	}
> > +	console_poll[MONITOR_FD].fd = pty_fd;
> > +	console_poll[MONITOR_FD].events = POLLIN;
> > +
> > +	console_poll[VIRTIO_CONSOLE_FD].fd = mic->mic_console.virtio_console_fd;
> > +	console_poll[VIRTIO_CONSOLE_FD].events = POLLIN;
> > +
> > +	if (MAP_FAILED == init_vr(mic, mic->mic_console.virtio_console_fd,
> > +		VIRTIO_ID_CONSOLE, &tx_vr, &rx_vr,
> > +		virtcons_dev_page.dd.num_vq)) {
> > +		mpsslog("%s init_vr failed %s\n",
> > +			mic->name, strerror(errno));
> > +		goto _close_pty;
> > +	}
> > +
> > +	copy.iovcnt = 1;
> > +	desc = get_device_desc(mic, VIRTIO_ID_CONSOLE);
> > +
> > +	for (;;) {
> > +		console_poll[MONITOR_FD].revents = 0;
> > +		console_poll[VIRTIO_CONSOLE_FD].revents = 0;
> > +		err = poll(console_poll, MAX_CONSOLE_FD, -1);
> > +		if (err < 0) {
> > +			mpsslog("%s %d: poll failed: %s\n", __func__, __LINE__,
> > +				strerror(errno));
> > +			continue;
> > +		}
> > +		if (!(desc->status & VIRTIO_CONFIG_S_DRIVER_OK))
> > +			wait_for_card_driver(mic,
> > +				mic->mic_console.virtio_console_fd,
> > +				VIRTIO_ID_CONSOLE);
> > +
> > +		if (console_poll[MONITOR_FD].revents & POLLIN) {
> > +			copy.iov = iov0;
> > +			len = readv(pty_fd, copy.iov, copy.iovcnt);
> > +			if (len > 0) {
> > +#ifdef DEBUG
> > +				disp_iovec(mic, &copy, __func__, __LINE__);
> > +				mpsslog("%s %s %d read from pty 0x%lx\n",
> > +					mic->name, __func__, __LINE__,
> > +					len);
> > +#endif
> > +				wait_for_descriptors(mic, &tx_vr);
> > +				txrx_prepare(VIRTIO_ID_CONSOLE, 1, &tx_vr,
> > +					&copy, len);
> > +
> > +				err = mic_virtio_copy(mic,
> > +					mic->mic_console.virtio_console_fd,
> > +					&tx_vr, &copy);
> > +				if (err < 0) {
> > +					mpsslog("%s %s %d mic_virtio_copy %s\n",
> > +						mic->name, __func__, __LINE__,
> > +						strerror(errno));
> > +				}
> > +				if (!err)
> > +					verify_out_len(mic, &copy);
> > +#ifdef DEBUG
> > +				disp_iovec(mic, &copy, __func__, __LINE__);
> > +				mpsslog("%s %s %d wrote to console 0x%lx\n",
> > +					mic->name, __func__, __LINE__,
> > +					sum_iovec_len(&copy));
> > +#endif
> > +				/* Reinitialize IOV for next run */
> > +				iov0->iov_len = PAGE_SIZE;
> > +			} else if (len < 0) {
> > +				disp_iovec(mic, &copy, __func__, __LINE__);
> > +				mpsslog("%s %s %d read failed %s ",
> > +					mic->name, __func__, __LINE__,
> > +					strerror(errno));
> > +				mpsslog("cnt %d sum %zd\n",
> > +					copy.iovcnt, sum_iovec_len(&copy));
> > +			}
> > +		}
> > +
> > +		if (console_poll[VIRTIO_CONSOLE_FD].revents & POLLIN) {
> > +			while (rx_vr.info->avail_idx !=
> > +				le16toh(rx_vr.vr.avail->idx)) {
> > +				copy.iov = iov1;
> > +				txrx_prepare(VIRTIO_ID_CONSOLE, 0, &rx_vr,
> > +					&copy, PAGE_SIZE);
> > +
> > +				err = mic_virtio_copy(mic,
> > +					mic->mic_console.virtio_console_fd,
> > +					&rx_vr, &copy);
> > +				if (!err) {
> > +					/* Set the correct output iov_len */
> > +					iov1->iov_len = copy.out_len;
> > +					verify_out_len(mic, &copy);
> > +#ifdef DEBUG
> > +					disp_iovec(mic, &copy, __func__,
> > +						__LINE__);
> > +					mpsslog("%s %s %d ",
> > +						mic->name, __func__, __LINE__);
> > +					mpsslog("read from console 0x%lx\n",
> > +						sum_iovec_len(&copy));
> > +#endif
> > +					len = writev(pty_fd,
> > +						copy.iov, copy.iovcnt);
> > +					if (len != sum_iovec_len(&copy)) {
> > +						mpsslog("pty write failed %s ",
> > +							strerror(errno));
> > +						mpsslog("len 0x%zx ", len);
> > +						mpsslog("read_len 0x%zx\n",
> > +							sum_iovec_len(&copy));
> > +					} else {
> > +#ifdef DEBUG
> > +						disp_iovec(mic, &copy, __func__,
> > +							__LINE__);
> > +						mpsslog("%s %s %d ",
> > +							mic->name, __func__,
> > +							__LINE__);
> > +						mpsslog("wrote to pty 0x%lx\n",
> > +							len);
> > +#endif
> > +					}
> > +				} else {
> > +					mpsslog("%s %s %d mic_virtio_copy %s\n",
> > +						mic->name, __func__, __LINE__,
> > +						strerror(errno));
> > +					break;
> > +				}
> > +			}
> > +		}
> > +		if (console_poll[VIRTIO_CONSOLE_FD].revents & POLLERR) {
> > +			mpsslog("%s: %s: POLLERR\n", __func__, mic->name);
> > +			sleep(1);
> > +		}
> > +	}
> > +_close_pty:
> > +	close(pty_fd);
> > +_return:
> > +	pthread_exit(NULL);
> > +}
> > +
> > +static void
> > +add_virtio_device(struct mic_info *mic, struct mic_device_desc *dd)
> > +{
> > +	char path[PATH_MAX];
> > +	int fd, err;
> > +
> > +	snprintf(path, PATH_MAX, "/dev/mic%d", mic->id);
> > +	fd = open(path, O_RDWR);
> > +	if (fd < 0) {
> > +		mpsslog("Could not open %s %s\n", path, strerror(errno));
> > +		return;
> > +	}
> > +
> > +	err = ioctl(fd, MIC_VIRTIO_ADD_DEVICE, dd);
> > +	if (err < 0) {
> > +		mpsslog("Could not add %d %s\n", dd->type, strerror(errno));
> > +		close(fd);
> > +		return;
> > +	}
> > +	switch (dd->type) {
> > +	case VIRTIO_ID_NET:
> > +		mic->mic_net.virtio_net_fd = fd;
> > +		mpsslog("Added VIRTIO_ID_NET for %s\n", mic->name);
> > +		break;
> > +	case VIRTIO_ID_CONSOLE:
> > +		mic->mic_console.virtio_console_fd = fd;
> > +		mpsslog("Added VIRTIO_ID_CONSOLE for %s\n", mic->name);
> > +		break;
> > +	case VIRTIO_ID_BLOCK:
> > +		mic->mic_virtblk.virtio_block_fd = fd;
> > +		mpsslog("Added VIRTIO_ID_BLOCK for %s\n", mic->name);
> > +		break;
> > +	}
> > +}
> > +
> > +static bool
> > +set_backend_file(struct mic_info *mic)
> > +{
> > +	FILE *config;
> > +	char buff[PATH_MAX], *line, *evv, *p;
> > +
> > +	snprintf(buff, PATH_MAX, "%s/mpssd%03d.conf", mic_config_dir, mic->id);
> > +	config = fopen(buff, "r");
> > +	if (config == NULL)
> > +		return false;
> > +	do {  /* look for "virtblk_backend=XXXX" */
> > +		line = fgets(buff, PATH_MAX, config);
> > +		if (line == NULL)
> > +			break;
> > +		if (*line == '#')
> > +			continue;
> > +		p = strchr(line, '\n');
> > +		if (p)
> > +			*p = '\0';
> > +	} while (strncmp(line, virtblk_backend, strlen(virtblk_backend)) != 0);
> > +	fclose(config);
> > +	if (line == NULL)
> > +		return false;
> > +	evv = strchr(line, '=');
> > +	if (evv == NULL)
> > +		return false;
> > +	mic->mic_virtblk.backend_file = malloc(strlen(evv));
> > +	if (mic->mic_virtblk.backend_file == NULL) {
> > +		mpsslog("%s: can't allocate memory\n", mic->name);
> > +		return false;
> > +	}
> > +	strcpy(mic->mic_virtblk.backend_file, evv + 1);
> > +	return true;
> > +}
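
set_backend_file() scans the per-card config file for a line beginning with VIRTBLK_BACKEND and takes everything after the '=' as the backing file path. A minimal sketch of parsing one such line into a caller-supplied buffer is below. It is stricter than the patch in that it requires the '=' to follow the key directly, and demo_parse_backend is a hypothetical name, not part of mpssd.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper: extract the value from a "VIRTBLK_BACKEND=<path>"
 * line. Returns false for comments, other keys, empty values, or values
 * that do not fit in out (including the terminating NUL).
 */
static bool demo_parse_backend(const char *line, char *out, size_t outlen)
{
	static const char key[] = "VIRTBLK_BACKEND=";
	size_t klen = sizeof(key) - 1;
	size_t vlen;

	if (line[0] == '#' || strncmp(line, key, klen) != 0)
		return false;

	/* The value runs up to the newline, or to the end of the string. */
	vlen = strcspn(line + klen, "\n");
	if (vlen == 0 || vlen >= outlen)
		return false;

	memcpy(out, line + klen, vlen);
	out[vlen] = '\0';
	return true;
}
```

Copying into a bounded buffer avoids having to reason about the malloc(strlen(evv)) sizing, which is only correct because the '=' that is skipped leaves room for the terminating NUL.
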
> > +
> > +#define SECTOR_SIZE 512
> > +static bool
> > +set_backend_size(struct mic_info *mic)
> > +{
> > +	mic->mic_virtblk.backend_size = lseek(mic->mic_virtblk.backend, 0,
> > +		SEEK_END);
> > +	if (mic->mic_virtblk.backend_size < 0) {
> > +		mpsslog("%s: can't seek: %s\n",
> > +			mic->name, mic->mic_virtblk.backend_file);
> > +		return false;
> > +	}
> > +	virtblk_dev_page.blk_config.capacity =
> > +		mic->mic_virtblk.backend_size / SECTOR_SIZE;
> > +	if ((mic->mic_virtblk.backend_size % SECTOR_SIZE) != 0)
> > +		virtblk_dev_page.blk_config.capacity++;
> > +
> > +	virtblk_dev_page.blk_config.capacity =
> > +		htole64(virtblk_dev_page.blk_config.capacity);
> > +
> > +	return true;
> > +}
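
set_backend_size() reports the capacity in 512-byte sectors and rounds up when the backing file is not a whole number of sectors. A minimal sketch of that rounding as a single expression follows, using hypothetical names (DEMO_SECTOR_SIZE, demo_bytes_to_sectors) that do not appear in the patch:

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_SECTOR_SIZE 512

/*
 * Hypothetical helper: number of 512-byte sectors needed to hold bytes,
 * rounding a partial trailing sector up to a full one. Assumes bytes is
 * small enough that adding DEMO_SECTOR_SIZE - 1 does not overflow.
 */
static uint64_t demo_bytes_to_sectors(uint64_t bytes)
{
	return (bytes + DEMO_SECTOR_SIZE - 1) / DEMO_SECTOR_SIZE;
}
```

For example, a 513-byte file is advertised as 2 sectors, so the last sector extends past the end of the backing file.
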
> > +
> > +static bool
> > +open_backend(struct mic_info *mic)
> > +{
> > +	if (!set_backend_file(mic))
> > +		goto _error_exit;
> > +	mic->mic_virtblk.backend = open(mic->mic_virtblk.backend_file, O_RDWR);
> > +	if (mic->mic_virtblk.backend < 0) {
> > +		mpsslog("%s: can't open: %s\n", mic->name,
> > +			mic->mic_virtblk.backend_file);
> > +		goto _error_free;
> > +	}
> > +	if (!set_backend_size(mic))
> > +		goto _error_close;
> > +	mic->mic_virtblk.backend_addr = mmap(NULL,
> > +		mic->mic_virtblk.backend_size,
> > +		PROT_READ|PROT_WRITE, MAP_SHARED,
> > +		mic->mic_virtblk.backend, 0L);
> > +	if (mic->mic_virtblk.backend_addr == MAP_FAILED) {
> > +		mpsslog("%s: can't map: %s %s\n",
> > +			mic->name, mic->mic_virtblk.backend_file,
> > +			strerror(errno));
> > +		goto _error_close;
> > +	}
> > +	return true;
> > +
> > + _error_close:
> > +	close(mic->mic_virtblk.backend);
> > + _error_free:
> > +	free(mic->mic_virtblk.backend_file);
> > + _error_exit:
> > +	return false;
> > +}
> > +
> > +static void
> > +close_backend(struct mic_info *mic)
> > +{
> > +	munmap(mic->mic_virtblk.backend_addr, mic->mic_virtblk.backend_size);
> > +	close(mic->mic_virtblk.backend);
> > +	free(mic->mic_virtblk.backend_file);
> > +}
> > +
> > +static bool
> > +start_virtblk(struct mic_info *mic, struct mic_vring *vring)
> > +{
> > +	if (((__u64)&virtblk_dev_page.blk_config % 8) != 0) {
> > +		mpsslog("%s: blk_config is not 8 byte aligned.\n",
> > +			mic->name);
> > +		return false;
> > +	}
> > +	add_virtio_device(mic, &virtblk_dev_page.dd);
> > +	if (MAP_FAILED == init_vr(mic, mic->mic_virtblk.virtio_block_fd,
> > +		VIRTIO_ID_BLOCK, vring, NULL, virtblk_dev_page.dd.num_vq)) {
> > +		mpsslog("%s init_vr failed %s\n",
> > +			mic->name, strerror(errno));
> > +		return false;
> > +	}
> > +	return true;
> > +}
> > +
> > +static void
> > +stop_virtblk(struct mic_info *mic)
> > +{
> > +	uninit_vr(mic, virtblk_dev_page.dd.num_vq);
> > +	close(mic->mic_virtblk.virtio_block_fd);
> > +}
> > +
> > +static __u8
> > +header_error_check(struct vring_desc *desc)
> > +{
> > +	if (le32toh(desc->len) != sizeof(struct virtio_blk_outhdr)) {
> > +		mpsslog("%s() %d: length is not sizeof(virtio_blk_outhd)\n",
> > +				__func__, __LINE__);
> > +		return -EIO;
> > +	}
> > +	if (!(le16toh(desc->flags) & VRING_DESC_F_NEXT)) {
> > +		mpsslog("%s() %d: alone\n",
> > +			__func__, __LINE__);
> > +		return -EIO;
> > +	}
> > +	if (le16toh(desc->flags) & VRING_DESC_F_WRITE) {
> > +		mpsslog("%s() %d: not read\n",
> > +			__func__, __LINE__);
> > +		return -EIO;
> > +	}
> > +	return 0;
> > +}
> > +
> > +static int
> > +read_header(int fd, struct virtio_blk_outhdr *hdr, __u32 desc_idx)
> > +{
> > +	struct iovec iovec;
> > +	struct mic_copy_desc copy;
> > +
> > +	iovec.iov_len = sizeof(*hdr);
> > +	iovec.iov_base = hdr;
> > +	copy.iov = &iovec;
> > +	copy.iovcnt = 1;
> > +	copy.vr_idx = 0;  /* only one vring on virtio_block */
> > +	copy.update_used = false;  /* do not update used index */
> > +	return ioctl(fd, MIC_VIRTIO_COPY_DESC, &copy);
> > +}
> > +
> > +static int
> > +transfer_blocks(int fd, struct iovec *iovec, __u32 iovcnt)
> > +{
> > +	struct mic_copy_desc copy;
> > +
> > +	copy.iov = iovec;
> > +	copy.iovcnt = iovcnt;
> > +	copy.vr_idx = 0;  /* only one vring on virtio_block */
> > +	copy.update_used = false;  /* do not update used index */
> > +	return ioctl(fd, MIC_VIRTIO_COPY_DESC, &copy);
> > +}
> > +
> > +static __u8
> > +status_error_check(struct vring_desc *desc)
> > +{
> > +	if (le32toh(desc->len) != sizeof(__u8)) {
> > +		mpsslog("%s() %d: length is not sizeof(status)\n",
> > +			__func__, __LINE__);
> > +		return -EIO;
> > +	}
> > +	return 0;
> > +}
> > +
> > +static int
> > +write_status(int fd, __u8 *status)
> > +{
> > +	struct iovec iovec;
> > +	struct mic_copy_desc copy;
> > +
> > +	iovec.iov_base = status;
> > +	iovec.iov_len = sizeof(*status);
> > +	copy.iov = &iovec;
> > +	copy.iovcnt = 1;
> > +	copy.vr_idx = 0;  /* only one vring on virtio_block */
> > +	copy.update_used = true; /* Update used index */
> > +	return ioctl(fd, MIC_VIRTIO_COPY_DESC, &copy);
> > +}
> > +
> > +static void *
> > +virtio_block(void *arg)
> > +{
> > +	struct mic_info *mic = (struct mic_info *) arg;
> > +	int ret;
> > +	struct pollfd block_poll;
> > +	struct mic_vring vring;
> > +	__u16 avail_idx;
> > +	__u32 desc_idx;
> > +	struct vring_desc *desc;
> > +	struct iovec *iovec, *piov;
> > +	__u8 status;
> > +	__u32 buffer_desc_idx;
> > +	struct virtio_blk_outhdr hdr;
> > +	void *fos;
> > +
> > +	for (;;) {  /* forever */
> > +		if (!open_backend(mic)) { /* No virtblk */
> > +			for (mic->mic_virtblk.signaled = 0;
> > +				!mic->mic_virtblk.signaled;)
> > +				sleep(1);
> > +			continue;
> > +		}
> > +
> > +		/* backend file is specified. */
> > +		if (!start_virtblk(mic, &vring))
> > +			goto _close_backend;
> > +		iovec = malloc(sizeof(*iovec) *
> > +			le32toh(virtblk_dev_page.blk_config.seg_max));
> > +		if (!iovec) {
> > +			mpsslog("%s: can't alloc iovec: %s\n",
> > +				mic->name, strerror(ENOMEM));
> > +			goto _stop_virtblk;
> > +		}
> > +
> > +		block_poll.fd = mic->mic_virtblk.virtio_block_fd;
> > +		block_poll.events = POLLIN;
> > +		for (mic->mic_virtblk.signaled = 0;
> > +		     !mic->mic_virtblk.signaled;) {
> > +			block_poll.revents = 0;
> > +					/* timeout in 1 sec to see signaled */
> > +			ret = poll(&block_poll, 1, 1000);
> > +			if (ret < 0) {
> > +				mpsslog("%s %d: poll failed: %s\n",
> > +					__func__, __LINE__,
> > +					strerror(errno));
> > +				continue;
> > +			}
> > +
> > +			if (!(block_poll.revents & POLLIN)) {
> > +#ifdef DEBUG
> > +				mpsslog("%s %d: block_poll.revents=0x%x\n",
> > +					__func__, __LINE__, block_poll.revents);
> > +				sleep(1);
> > +#endif
> > +				continue;
> > +			}
> > +
> > +			/* POLLIN */
> > +			while (vring.info->avail_idx !=
> > +				le16toh(vring.vr.avail->idx)) {
> > +				/* read header element */
> > +				avail_idx =
> > +					vring.info->avail_idx &
> > +					(vring.vr.num - 1);
> > +				desc_idx = le16toh(
> > +					vring.vr.avail->ring[avail_idx]);
> > +				desc = &vring.vr.desc[desc_idx];
> > +#ifdef DEBUG
> > +				mpsslog("%s() %d: avail_idx=%d ",
> > +					__func__, __LINE__,
> > +					vring.info->avail_idx);
> > +				mpsslog("vring.vr.num=%d desc=%p\n",
> > +					vring.vr.num, desc);
> > +#endif
> > +				status = header_error_check(desc);
> > +				ret = read_header(
> > +					mic->mic_virtblk.virtio_block_fd,
> > +					&hdr, desc_idx);
> > +				if (ret < 0) {
> > +					mpsslog("%s() %d %s: ret=%d %s\n",
> > +						__func__, __LINE__,
> > +						mic->name, ret,
> > +						strerror(errno));
> > +					break;
> > +				}
> > +				/* buffer element */
> > +				piov = iovec;
> > +				status = 0;
> > +				fos = mic->mic_virtblk.backend_addr +
> > +					(hdr.sector * SECTOR_SIZE);
> > +				buffer_desc_idx = desc_idx =
> > +					next_desc(desc);
> > +				for (desc = &vring.vr.desc[buffer_desc_idx];
> > +				     desc->flags & VRING_DESC_F_NEXT;
> > +				     desc_idx = next_desc(desc),
> > +					     desc = &vring.vr.desc[desc_idx]) {
> > +					piov->iov_len = desc->len;
> > +					piov->iov_base = fos;
> > +					piov++;
> > +					fos += desc->len;
> > +				}
> > +				/* Returning NULLs for VIRTIO_BLK_T_GET_ID. */
> > +				if (hdr.type & ~(VIRTIO_BLK_T_OUT |
> > +					VIRTIO_BLK_T_GET_ID)) {
> > +					/*
> > +					  VIRTIO_BLK_T_IN - does not do
> > +					  anything. Probably for documenting.
> > +					  VIRTIO_BLK_T_SCSI_CMD - for
> > +					  virtio_scsi.
> > +					  VIRTIO_BLK_T_FLUSH - turned off in
> > +					  config space.
> > +					  VIRTIO_BLK_T_BARRIER - defined but not
> > +					  used in anywhere.
> > +					*/
> > +					mpsslog("%s() %d: type %x ",
> > +						__func__, __LINE__,
> > +						hdr.type);
> > +					mpsslog("is not supported\n");
> > +					status = -ENOTSUP;
> > +
> > +				} else {
> > +					ret = transfer_blocks(
> > +					mic->mic_virtblk.virtio_block_fd,
> > +						iovec,
> > +						piov - iovec);
> > +					if (ret < 0 &&
> > +						status != 0)
> > +						status = ret;
> > +				}
> > +				/* write status and update used pointer */
> > +				if (status != 0)
> > +					status = status_error_check(desc);
> > +				ret = write_status(
> > +					mic->mic_virtblk.virtio_block_fd,
> > +					&status);
> > +#ifdef DEBUG
> > +				mpsslog("%s() %d: write status=%d on desc=%p\n",
> > +					__func__, __LINE__,
> > +					status, desc);
> > +#endif
> > +			}
> > +		}
> > +		free(iovec);
> > +_stop_virtblk:
> > +		stop_virtblk(mic);
> > +_close_backend:
> > +		close_backend(mic);
> > +	}  /* forever */
> > +
> > +	pthread_exit(NULL);
> > +}
> > +
> > +static void
> > +reset(struct mic_info *mic)
> > +{
> > +#define RESET_TIMEOUT 120
> > +	int i = RESET_TIMEOUT;
> > +	setsysfs(mic->name, "state", "reset");
> > +	while (i) {
> > +		char *state;
> > +		state = readsysfs(mic->name, "state");
> > +		if (!state)
> > +			goto retry;
> > +		mpsslog("%s: %s %d state %s\n",
> > +			mic->name, __func__, __LINE__, state);
> > +		if ((!strcmp(state, "offline"))) {
> > +			free(state);
> > +			break;
> > +		}
> > +		free(state);
> > +retry:
> > +		sleep(1);
> > +		i--;
> > +	}
> > +}
> > +
> > +static int
> > +get_mic_shutdown_status(struct mic_info *mic, char *shutdown_status)
> > +{
> > +	if (!strcmp(shutdown_status, "nop"))
> > +		return MIC_NOP;
> > +	if (!strcmp(shutdown_status, "crashed"))
> > +		return MIC_CRASHED;
> > +	if (!strcmp(shutdown_status, "halted"))
> > +		return MIC_HALTED;
> > +	if (!strcmp(shutdown_status, "poweroff"))
> > +		return MIC_POWER_OFF;
> > +	if (!strcmp(shutdown_status, "restart"))
> > +		return MIC_RESTART;
> > +	mpsslog("%s: BUG invalid status %s\n", mic->name, shutdown_status);
> > +	/* Invalid state */
> > +	assert(0);
> > +};
> > +
> > +static int get_mic_state(struct mic_info *mic, char *state)
> > +{
> > +	if (!strcmp(state, "offline"))
> > +		return MIC_OFFLINE;
> > +	if (!strcmp(state, "online"))
> > +		return MIC_ONLINE;
> > +	if (!strcmp(state, "shutting_down"))
> > +		return MIC_SHUTTING_DOWN;
> > +	if (!strcmp(state, "reset_failed"))
> > +		return MIC_RESET_FAILED;
> > +	mpsslog("%s: BUG invalid state %s\n", mic->name, state);
> > +	/* Invalid state */
> > +	assert(0);
> > +};
> > +
> > +static void mic_handle_shutdown(struct mic_info *mic)
> > +{
> > +#define SHUTDOWN_TIMEOUT 60
> > +	int i = SHUTDOWN_TIMEOUT, ret, stat = 0;
> > +	char *shutdown_status;
> > +	while (i) {
> > +		shutdown_status = readsysfs(mic->name, "shutdown_status");
> > +		if (!shutdown_status)
> > +			continue;
> > +		mpsslog("%s: %s %d shutdown_status %s\n",
> > +			mic->name, __func__, __LINE__, shutdown_status);
> > +		switch (get_mic_shutdown_status(mic, shutdown_status)) {
> > +		case MIC_RESTART:
> > +			mic->restart = 1;
> > +		case MIC_HALTED:
> > +		case MIC_POWER_OFF:
> > +		case MIC_CRASHED:
> > +			goto reset;
> > +		default:
> > +			break;
> > +		}
> > +		free(shutdown_status);
> > +		sleep(1);
> > +		i--;
> > +	}
> > +reset:
> > +	ret = kill(mic->pid, SIGTERM);
> > +	mpsslog("%s: %s %d kill pid %d ret %d\n",
> > +		mic->name, __func__, __LINE__,
> > +		mic->pid, ret);
> > +	if (!ret) {
> > +		ret = waitpid(mic->pid, &stat,
> > +			WIFSIGNALED(stat));
> > +		mpsslog("%s: %s %d waitpid ret %d pid %d\n",
> > +			mic->name, __func__, __LINE__,
> > +			ret, mic->pid);
> > +	}
> > +	if (ret == mic->pid)
> > +		reset(mic);
> > +}
> > +
> > +static void *
> > +mic_config(void *arg)
> > +{
> > +	struct mic_info *mic = (struct mic_info *)arg;
> > +	char *state = NULL;
> > +	char pathname[PATH_MAX];
> > +	int fd, ret;
> > +	struct pollfd ufds[1];
> > +	char value[4096];
> > +
> > +	snprintf(pathname, PATH_MAX - 1, "%s/%s/%s",
> > +		MICSYSFSDIR, mic->name, "state");
> > +
> > +	fd = open(pathname, O_RDONLY);
> > +	if (fd < 0) {
> > +		mpsslog("%s: opening file %s failed %s\n",
> > +			mic->name, pathname, strerror(errno));
> > +		goto error;
> > +	}
> > +
> > +	do {
> > +		ret = read(fd, value, sizeof(value));
> > +		if (ret < 0) {
> > +			mpsslog("%s: Failed to read sysfs entry '%s': %s\n",
> > +				mic->name, pathname, strerror(errno));
> > +			goto close_error1;
> > +		}
> > +retry:
> > +		state = readsysfs(mic->name, "state");
> > +		if (!state)
> > +			goto retry;
> > +		mpsslog("%s: %s %d state %s\n",
> > +			mic->name, __func__, __LINE__, state);
> > +		switch (get_mic_state(mic, state)) {
> > +		case MIC_SHUTTING_DOWN:
> > +			mic_handle_shutdown(mic);
> > +			goto close_error;
> > +		default:
> > +			break;
> > +		}
> > +		free(state);
> > +
> > +		ufds[0].fd = fd;
> > +		ufds[0].events = POLLERR | POLLPRI;
> > +		ret = poll(ufds, 1, -1);
> > +		if (ret < 0) {
> > +			mpsslog("%s: poll failed %s\n",
> > +				mic->name, strerror(errno));
> > +			goto close_error1;
> > +		}
> > +	} while (1);
> > +close_error:
> > +	free(state);
> > +close_error1:
> > +	close(fd);
> > +error:
> > +	init_mic(mic);
> > +	pthread_exit(NULL);
> > +}
> > +
> > +static void
> > +set_cmdline(struct mic_info *mic)
> > +{
> > +	char buffer[PATH_MAX];
> > +	int len;
> > +
> > +	len = snprintf(buffer, PATH_MAX,
> > +		"clocksource=tsc highres=off nohz=off ");
> > +	len += snprintf(buffer + len, PATH_MAX,
> > +		"cpufreq_on;corec6_off;pc3_off;pc6_off ");
> > +	len += snprintf(buffer + len, PATH_MAX,
> > +		"ifcfg=static;address,172.31.%d.1;netmask,255.255.255.0",
> > +		mic->id);
> > +
> > +	setsysfs(mic->name, "cmdline", buffer);
> > +	mpsslog("%s: Command line: \"%s\"\n", mic->name, buffer);
> > +	snprintf(buffer, PATH_MAX, "172.31.%d.1", mic->id);
> > +	mpsslog("%s: IPADDR: \"%s\"\n", mic->name, buffer);
> > +}
> > +
> > +static void
> > +set_log_buf_info(struct mic_info *mic)
> > +{
> > +	int fd;
> > +	off_t len;
> > +	char system_map[] = "/lib/firmware/mic/System.map";
> > +	char *map, *temp, log_buf[17] = {'\0'};
> > +
> > +	fd = open(system_map, O_RDONLY);
> > +	if (fd < 0) {
> > +		mpsslog("%s: Opening System.map failed: %d\n",
> > +			mic->name, errno);
> > +		return;
> > +	}
> > +	len = lseek(fd, 0, SEEK_END);
> > +	if (len < 0) {
> > +		mpsslog("%s: Reading System.map size failed: %d\n",
> > +			mic->name, errno);
> > +		close(fd);
> > +		return;
> > +	}
> > +	map = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
> > +	if (map == MAP_FAILED) {
> > +		mpsslog("%s: mmap of System.map failed: %d\n",
> > +			mic->name, errno);
> > +		close(fd);
> > +		return;
> > +	}
> > +	temp = strstr(map, "__log_buf");
> > +	if (!temp) {
> > +		mpsslog("%s: __log_buf not found: %d\n", mic->name, errno);
> > +		munmap(map, len);
> > +		close(fd);
> > +		return;
> > +	}
> > +	strncpy(log_buf, temp - 19, 16);
> > +	setsysfs(mic->name, "log_buf_addr", log_buf);
> > +	mpsslog("%s: log_buf_addr: %s\n", mic->name, log_buf);
> > +	temp = strstr(map, "log_buf_len");
> > +	if (!temp) {
> > +		mpsslog("%s: log_buf_len not found: %d\n", mic->name, errno);
> > +		munmap(map, len);
> > +		close(fd);
> > +		return;
> > +	}
> > +	strncpy(log_buf, temp - 19, 16);
> > +	setsysfs(mic->name, "log_buf_len", log_buf);
> > +	mpsslog("%s: log_buf_len: %s\n", mic->name, log_buf);
> > +	munmap(map, len);
> > +	close(fd);
> > +}
> > +
> > +static void init_mic(struct mic_info *mic);
> > +
> > +static void
> > +change_virtblk_backend(int x, siginfo_t *siginfo, void *p)
> > +{
> > +	struct mic_info *mic;
> > +
> > +	for (mic = mic_list.next; mic != NULL; mic = mic->next)
> > +		mic->mic_virtblk.signaled = 1/* true */;
> > +}
> > +
> > +static void
> > +init_mic(struct mic_info *mic)
> > +{
> > +	struct sigaction ignore = {
> > +		.sa_flags = 0,
> > +		.sa_handler = SIG_IGN
> > +	};
> > +	struct sigaction act = {
> > +		.sa_flags = SA_SIGINFO,
> > +		.sa_sigaction = change_virtblk_backend,
> > +	};
> > +	char buffer[PATH_MAX];
> > +	int err;
> > +
> > +		/* ignore SIGUSR1 for both process */
> > +	sigaction(SIGUSR1, &ignore, NULL);
> > +
> > +	mic->pid = fork();
> > +	switch (mic->pid) {
> > +	case 0:
> > +		set_log_buf_info(mic);
> > +		set_cmdline(mic);
> > +		add_virtio_device(mic, &virtcons_dev_page.dd);
> > +		add_virtio_device(mic, &virtnet_dev_page.dd);
> > +		err = pthread_create(&mic->mic_console.console_thread, NULL,
> > +			virtio_console, mic);
> > +		if (err)
> > +			mpsslog("%s virtcons pthread_create failed %s\n",
> > +			mic->name, strerror(err));
> > +		/*
> > +		 * TODO: Debug why not adding this sleep results in the tap
> > +		 * interface not coming up during certain runs sporadically.
> > +		 */
> 
> Indeed.
> 

Yes, we will look into removing this workaround for the next revision.

> > +		usleep(1000);
> > +		err = pthread_create(&mic->mic_net.net_thread, NULL,
> > +			virtio_net, mic);
> > +		if (err)
> > +			mpsslog("%s virtnet pthread_create failed %s\n",
> > +			mic->name, strerror(err));
> > +		err = pthread_create(&mic->mic_virtblk.block_thread, NULL,
> > +			virtio_block, mic);
> > +		if (err)
> > +			mpsslog("%s virtblk pthread_create failed %s\n",
> > +			mic->name, strerror(err));
> > +		sigemptyset(&act.sa_mask);
> > +		err = sigaction(SIGUSR1, &act, NULL);
> 
> Confused. Who sends this SIGUSR1 here?
> 

Currently, each MIC card supports one virtio block device at a time.
Any user (or test) can send SIGUSR1 to the MIC daemon. The signal tells
the virtio block backend that the configuration file, which names the
virtio backend file on the host, has changed. The backend then re-reads
the configuration file and switches to the new block device. This
signalling mechanism may no longer be required once the MIC daemon
supports multiple virtio block devices. Until it can be removed, we
will document the current signal handling mechanism in the next
revision.
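
The signalling scheme described above can be sketched roughly as
follows. This is only an illustration under assumed names:
backend_changed, install_reload_handler() and consume_reload_request()
are hypothetical and not part of the patch. The handler does nothing
except set a sig_atomic_t flag, and the worker loop checks and clears
that flag before re-reading its configuration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical flag, playing the role of mic_virtblk.signaled. */
static volatile sig_atomic_t backend_changed;

/* The handler only records that a signal arrived. */
static void on_sigusr1(int signo)
{
	(void)signo;
	backend_changed = 1;
}

/* Install the SIGUSR1 handler; returns 0 on success, -1 on failure. */
static int install_reload_handler(void)
{
	struct sigaction act;

	memset(&act, 0, sizeof(act));
	act.sa_handler = on_sigusr1;
	sigemptyset(&act.sa_mask);
	return sigaction(SIGUSR1, &act, NULL);
}

/*
 * Called from the worker loop. Returns true if a reload was requested
 * since the last call, and clears the request.
 */
static bool consume_reload_request(void)
{
	if (!backend_changed)
		return false;
	backend_changed = 0;
	return true;
}
```

A worker would call consume_reload_request() on each pass through its
poll loop and, when it returns true, close the old backend and re-read
the configuration file. From a shell, the request could then be raised
with "kill -USR1 <pid of the daemon>".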

> 
> > +		if (err)
> > +			mpsslog("%s sigaction SIGUSR1 failed %s\n",
> > +			mic->name, strerror(errno));
> > +		while (1)
> > +			sleep(60);
> > +	case -1:
> > +		mpsslog("fork failed MIC name %s id %d errno %d\n",
> > +			mic->name, mic->id, errno);
> > +		break;
> > +	default:
> > +		if (mic->restart) {
> > +			snprintf(buffer, PATH_MAX,
> > +				"boot:linux:mic/uos.img:mic/mic%d.image",
> > +				mic->id);
> > +			setsysfs(mic->name, "state", buffer);
> > +			mpsslog("%s restarting mic %d\n",
> > +				mic->name, mic->restart);
> > +			mic->restart = 0;
> > +		}
> > +		pthread_create(&mic->config_thread, NULL, mic_config, mic);
> > +	}
> > +}
> > +
> > +static void
> > +start_daemon(void)
> > +{
> > +	struct mic_info *mic;
> > +
> > +	for (mic = mic_list.next; mic != NULL; mic = mic->next)
> > +		init_mic(mic);
> > +
> > +	while (1)
> > +		sleep(60);
> > +}
> > +
> > +static int
> > +init_mic_list(void)
> > +{
> > +	struct mic_info *mic = &mic_list;
> > +	struct dirent *file;
> > +	DIR *dp;
> > +	int cnt = 0;
> > +
> > +	dp = opendir(MICSYSFSDIR);
> > +	if (!dp)
> > +		return 0;
> > +
> > +	while ((file = readdir(dp)) != NULL) {
> > +		if (!strncmp(file->d_name, "mic", 3)) {
> > +			mic->next = malloc(sizeof(struct mic_info));
> > +			if (mic->next) {
> > +				mic = mic->next;
> > +				mic->next = NULL;
> > +				memset(mic, 0, sizeof(struct mic_info));
> > +				mic->id = atoi(&file->d_name[3]);
> > +				mic->name = malloc(strlen(file->d_name) + 16);
> > +				if (mic->name)
> > +					strcpy(mic->name, file->d_name);
> > +				mpsslog("MIC name %s id %d\n", mic->name,
> > +					mic->id);
> > +				cnt++;
> > +			}
> > +		}
> > +	}
> > +
> > +	closedir(dp);
> > +	return cnt;
> > +}
> > +
> > +void
> > +mpsslog(char *format, ...)
> > +{
> > +	va_list args;
> > +	char buffer[4096];
> > +	time_t t;
> > +	char *ts;
> > +
> > +	if (logfp == NULL)
> > +		return;
> > +
> > +	va_start(args, format);
> > +	vsprintf(buffer, format, args);
> > +	va_end(args);
> > +
> > +	time(&t);
> > +	ts = ctime(&t);
> > +	ts[strlen(ts) - 1] = '\0';
> > +	fprintf(logfp, "%s: %s", ts, buffer);
> > +
> > +	fflush(logfp);
> > +}
> > +
> > +int
> > +main(int argc, char *argv[])
> > +{
> > +	int cnt;
> > +
> > +	myname = argv[0];
> > +
> > +	logfp = fopen(LOGFILE_NAME, "a+");
> > +	if (!logfp) {
> > +		fprintf(stderr, "cannot open logfile '%s'\n", LOGFILE_NAME);
> > +		exit(1);
> > +	}
> > +
> > +	mpsslog("MIC Daemon start\n");
> > +
> > +	cnt = init_mic_list();
> > +	if (cnt == 0) {
> > +		mpsslog("MIC module not loaded\n");
> > +		exit(2);
> > +	}
> > +	mpsslog("MIC found %d devices\n", cnt);
> > +
> > +	start_daemon();
> > +
> > +	exit(0);
> > +}
> > diff --git a/Documentation/mic/mpssd/mpssd.h b/Documentation/mic/mpssd/mpssd.h
> > new file mode 100644
> > index 0000000..b6dee38
> > --- /dev/null
> > +++ b/Documentation/mic/mpssd/mpssd.h
> > @@ -0,0 +1,100 @@
> > +/*
> > + * Intel MIC Platform Software Stack (MPSS)
> > + *
> > + * Copyright(c) 2013 Intel Corporation.
> > + *
> > + * This program is free software; you can redistribute it and/or modify
> > + * it under the terms of the GNU General Public License, version 2, as
> > + * published by the Free Software Foundation.
> > + *
> > + * This program is distributed in the hope that it will be useful, but
> > + * WITHOUT ANY WARRANTY; without even the implied warranty of
> > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> > + * General Public License for more details.
> > + *
> > + * The full GNU General Public License is included in this distribution in
> > + * the file called "COPYING".
> > + *
> > + * Intel MIC User Space Tools.
> > + */
> > +#ifndef _MPSSD_H_
> > +#define _MPSSD_H_
> > +
> > +#include <stdio.h>
> > +#include <stdlib.h>
> > +#include <string.h>
> > +#include <fcntl.h>
> > +#include <unistd.h>
> > +#include <dirent.h>
> > +#include <libgen.h>
> > +#include <pthread.h>
> > +#include <stdarg.h>
> > +#include <time.h>
> > +#include <errno.h>
> > +#include <sys/dir.h>
> > +#include <sys/ioctl.h>
> > +#include <sys/poll.h>
> > +#include <sys/types.h>
> > +#include <sys/socket.h>
> > +#include <sys/stat.h>
> > +#include <sys/types.h>
> > +#include <sys/mman.h>
> > +#include <sys/utsname.h>
> > +#include <sys/wait.h>
> > +#include <netinet/in.h>
> > +#include <arpa/inet.h>
> > +#include <netdb.h>
> > +#include <pthread.h>
> > +#include <signal.h>
> > +#include <limits.h>
> > +#include <syslog.h>
> > +#include <getopt.h>
> > +#include <net/if.h>
> > +#include <linux/if_tun.h>
> > +#include <linux/if_tun.h>
> > +#include <linux/virtio_ids.h>
> > +
> > +#define MICSYSFSDIR "/sys/class/mic"
> > +#define LOGFILE_NAME "/var/log/mpssd"
> > +#define PAGE_SIZE 4096
> > +
> > +struct mic_console_info {
> > +	pthread_t       console_thread;
> > +	int		virtio_console_fd;
> > +	void		*console_dp;
> > +};
> > +
> > +struct mic_net_info {
> > +	pthread_t       net_thread;
> > +	int		virtio_net_fd;
> > +	int		tap_fd;
> > +	void		*net_dp;
> > +};
> > +
> > +struct mic_virtblk_info {
> > +	pthread_t       block_thread;
> > +	int		virtio_block_fd;
> > +	void		*block_dp;
> > +	volatile sig_atomic_t	signaled;
> > +	char		*backend_file;
> > +	int		backend;
> > +	void		*backend_addr;
> > +	long		backend_size;
> > +};
> > +
> > +struct mic_info {
> > +	int		id;
> > +	char		*name;
> > +	pthread_t       config_thread;
> > +	pid_t		pid;
> > +	struct mic_console_info	mic_console;
> > +	struct mic_net_info	mic_net;
> > +	struct mic_virtblk_info	mic_virtblk;
> > +	int		restart;
> > +	struct mic_info *next;
> > +};
> > +
> > +void mpsslog(char *format, ...);
> > +char *readsysfs(char *dir, char *entry);
> > +int setsysfs(char *dir, char *entry, char *value);
> > +#endif
> > diff --git a/Documentation/mic/mpssd/sysfs.c b/Documentation/mic/mpssd/sysfs.c
> > new file mode 100644
> > index 0000000..3244dcf
> > --- /dev/null
> > +++ b/Documentation/mic/mpssd/sysfs.c
> > @@ -0,0 +1,103 @@
> > +/*
> > + * Intel MIC Platform Software Stack (MPSS)
> > + *
> > + * Copyright(c) 2013 Intel Corporation.
> > + *
> > + * This program is free software; you can redistribute it and/or modify
> > + * it under the terms of the GNU General Public License, version 2, as
> > + * published by the Free Software Foundation.
> > + *
> > + * This program is distributed in the hope that it will be useful, but
> > + * WITHOUT ANY WARRANTY; without even the implied warranty of
> > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> > + * General Public License for more details.
> > + *
> > + * The full GNU General Public License is included in this distribution in
> > + * the file called "COPYING".
> > + *
> > + * Intel MIC User Space Tools.
> > + */
> > +
> > +#include "mpssd.h"
> > +
> > +#define PAGE_SIZE 4096
> > +
> > +char *
> > +readsysfs(char *dir, char *entry)
> > +{
> > +	char filename[PATH_MAX];
> > +	char value[PAGE_SIZE];
> > +	char *string = NULL;
> > +	int fd;
> > +	int len;
> > +
> > +	if (dir == NULL)
> > +		snprintf(filename, PATH_MAX, "%s/%s", MICSYSFSDIR, entry);
> > +	else
> > +		snprintf(filename, PATH_MAX,
> > +			"%s/%s/%s", MICSYSFSDIR, dir, entry);
> > +
> > +	fd = open(filename, O_RDONLY);
> > +	if (fd < 0) {
> > +		mpsslog("Failed to open sysfs entry '%s': %s\n",
> > +			filename, strerror(errno));
> > +		return NULL;
> > +	}
> > +
> > +	len = read(fd, value, sizeof(value));
> > +	if (len < 0) {
> > +		mpsslog("Failed to read sysfs entry '%s': %s\n",
> > +			filename, strerror(errno));
> > +		goto readsys_ret;
> > +	}
> > +
> > +	value[len] = '\0';
> 
> Why are you careful to put this \0 here but not in setsysfs below?
> 
> If you do, I'd fail on len == sizeof value as well, it isn't going to work with
> that.
> 

Sysfs entries generally return a string ending with a newline. Ideally
we should convert that newline to a NUL terminator consistently across
the readsysfs/setsysfs APIs in this file. We will make these changes
for the next revision.
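
One possible shape for that change on the read side is sketched below.
It is not the code planned for the next revision; read_attr() and
VALUE_MAX are hypothetical names. The sketch reads at most one byte
less than the buffer size, so the terminator always fits, and strips a
single trailing newline.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define VALUE_MAX 4096	/* assumed bound: one page, as in the patch */

/*
 * Read a small text attribute from 'path' and return it as a malloc'ed,
 * NUL-terminated string with one trailing newline removed.
 * Returns NULL on any error. The caller frees the result.
 */
static char *read_attr(const char *path)
{
	char value[VALUE_MAX];
	char *string;
	ssize_t len;
	int fd;

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return NULL;

	/* Leave room for the terminator so value[len] stays in bounds. */
	len = read(fd, value, sizeof(value) - 1);
	close(fd);
	if (len < 0)
		return NULL;

	value[len] = '\0';
	if (len > 0 && value[len - 1] == '\n')
		value[len - 1] = '\0';

	string = malloc(strlen(value) + 1);
	if (string)
		strcpy(string, value);
	return string;
}
```

Limiting the read to sizeof(value) - 1 covers the len == sizeof(value)
case raised in the review. A caller comparing old and new values could
then use plain strcmp() on both sides without worrying about the
newline.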

Thanks for the review!

Sudeep Dutt

> > +
> > +	string = malloc(strlen(value) + 1);
> > +	if (string)
> > +		strcpy(string, value);
> > +
> > +readsys_ret:
> > +	close(fd);
> > +	return string;
> > +}
> > +
> > +int
> > +setsysfs(char *dir, char *entry, char *value)
> > +{
> > +	char filename[PATH_MAX];
> > +	char oldvalue[PAGE_SIZE];
> > +	int fd;
> > +
> > +	if (dir == NULL)
> > +		snprintf(filename, PATH_MAX, "%s/%s", MICSYSFSDIR, entry);
> > +	else
> > +		snprintf(filename, PATH_MAX, "%s/%s/%s",
> > +			MICSYSFSDIR, dir, entry);
> > +
> > +	fd = open(filename, O_RDWR);
> > +	if (fd < 0) {
> > +		mpsslog("Failed to open sysfs entry '%s': %s\n",
> > +			filename, strerror(errno));
> > +		return errno;
> > +	}
> > +
> > +	if (read(fd, oldvalue, sizeof(oldvalue)) < 0) {
> > +		mpsslog("Failed to read sysfs entry '%s': %s\n",
> > +			filename, strerror(errno));
> > +		close(fd);
> > +		return errno;
> > +	}
> > +
> > +	if (strcmp(value, oldvalue)) {
> > +		if (write(fd, value, strlen(value)) < 0) {
> > +			mpsslog("Failed to write new sysfs entry '%s': %s\n",
> > +				filename, strerror(errno));
> > +			close(fd);
> > +			return errno;
> > +		}
> > +	}
> > +
> > +	close(fd);
> > +	return 0;
> > +}
> > -- 
> > 1.8.2.1

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 1/7] Intel MIC Host Driver for X100 family.
  2013-08-08  3:04 ` [PATCH v2 1/7] Intel MIC Host Driver for X100 family Sudeep Dutt
@ 2013-08-12 22:58   ` Greg Kroah-Hartman
  2013-08-12 22:58   ` Greg Kroah-Hartman
  2013-08-12 23:06     ` Greg Kroah-Hartman
  2 siblings, 0 replies; 18+ messages in thread
From: Greg Kroah-Hartman @ 2013-08-12 22:58 UTC (permalink / raw)
  To: Sudeep Dutt
  Cc: Arnd Bergmann, Rusty Russell, Michael S. Tsirkin, Rob Landley,
	linux-kernel, virtualization, linux-doc, AsiasHeasias,
	Nikhil Rao, Ashutosh Dixit, Caz Yokoyama,
	Dasaratharaman Chandramouli, Harshavardhan R Kharche,
	Yaozu (Eddie) Dong, Peter P Waskiewicz Jr

On Wed, Aug 07, 2013 at 08:04:07PM -0700, Sudeep Dutt wrote:
> diff --git a/drivers/misc/mic/host/mic_common.h b/drivers/misc/mic/host/mic_common.h
> new file mode 100644
> index 0000000..55b0337
> --- /dev/null
> +++ b/drivers/misc/mic/host/mic_common.h
> @@ -0,0 +1,30 @@
> +/*
> + * Intel MIC Platform Software Stack (MPSS)
> + *
> + * Copyright(c) 2013 Intel Corporation.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License, version 2, as
> + * published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful, but
> + * WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> + * General Public License for more details.
> + *
> + * The full GNU General Public License is included in this distribution in
> + * the file called "COPYING".
> + *
> + * Intel MIC Host driver.
> + *
> + */
> +#ifndef _MIC_HOST_COMMON_H_
> +#define _MIC_HOST_COMMON_H_
> +
> +#include <linux/cdev.h>
> +
> +#include "../common/mic_device.h"
> +#include "mic_device.h"
> +#include "mic_x100.h"
> +
> +#endif

Don't create .h files that just wrap other .h files up.  It makes it
hard to unwind stuff and determine exactly what .c files need what .h
files.  Just include the needed .h files properly in the .c code, and
all should be fine.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 1/7] Intel MIC Host Driver for X100 family.
  2013-08-08  3:04 ` [PATCH v2 1/7] Intel MIC Host Driver for X100 family Sudeep Dutt
@ 2013-08-12 23:06     ` Greg Kroah-Hartman
  2013-08-12 22:58   ` Greg Kroah-Hartman
  2013-08-12 23:06     ` Greg Kroah-Hartman
  2 siblings, 0 replies; 18+ messages in thread
From: Greg Kroah-Hartman @ 2013-08-12 23:06 UTC (permalink / raw)
  To: Sudeep Dutt
  Cc: Arnd Bergmann, Rusty Russell, Michael S. Tsirkin, Rob Landley,
	linux-kernel, virtualization, linux-doc, AsiasHeasias,
	Nikhil Rao, Ashutosh Dixit, Caz Yokoyama,
	Dasaratharaman Chandramouli, Harshavardhan R Kharche,
	Yaozu (Eddie) Dong, Peter P Waskiewicz Jr

On Wed, Aug 07, 2013 at 08:04:07PM -0700, Sudeep Dutt wrote:
> +/**
> + * struct mic_device -  MIC device information for each card.
> + *
> + * @name: Unique name for this MIC device.
> + * @mmio: MMIO bar information.
> + * @pdev: The PCI device structure.
> + * @family: The MIC family to which this device belongs.
> + * @ops: MIC HW specific operations.
> + * @id: The unique device id for this MIC device.
> + * @stepping: Stepping ID.
> + * @attr_group: Sysfs attribute group.
> + * @sdev: Device for sysfs entries.
> + * @aper: Aperture bar information.
> + */
> +struct mic_device {
> +	char name[20];

The name can be in the struct device (it should be the same, right?)

> +	struct mic_mw mmio;
> +	struct pci_dev *pdev;

Isn't this just the parent of the device?  Do you really need this?

> +	enum mic_hw_family family;
> +	struct mic_hw_ops *ops;
> +	int id;
> +	enum mic_stepping stepping;
> +	struct attribute_group attr_group;

Shouldn't this be a pointer to a list of groups?

> +	struct device *sdev;

Shouldn't this be embedded inside here, instead of a pointer?
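As an illustration of what embedding buys, here is a minimal userspace sketch. It assumes a stand-in struct device with a single field and a hypothetical struct mic_device_sketch; neither is the kernel definition, and this is not the driver's code. With the device embedded, the driver structure can be recovered from a device pointer with container_of(), and both objects share one allocation and one lifetime.

```c
#include <stddef.h>

/* Userspace stand-in for the kernel's struct device (hypothetical, minimal). */
struct device {
	const char *init_name;
};

/* Same idea as the kernel's container_of(): member pointer -> outer struct. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* The device is embedded, so it shares one allocation and one lifetime
 * with the driver's private state. */
struct mic_device_sketch {
	int id;
	struct device dev;
};

/* Recover the driver structure from a pointer to the embedded device. */
static struct mic_device_sketch *to_mic_device(struct device *dev)
{
	return container_of(dev, struct mic_device_sketch, dev);
}
```

Any callback that is handed only the struct device can then reach the driver state through to_mic_device() without a separate back-pointer.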

> +	struct mic_mw aper;
> +};


> +/**
> + * struct mic_info -  Global information about all MIC devices.
> + *
> + * @next_id: Next available MIC device id.
> + * @mic_class: Class of MIC devices for sysfs accessibility.
> + * @mdev_id: Base device node number.
> + */
> +struct mic_info {
> +	int next_id;

Please use the idr interface, don't roll your own, odds are you got it
wrong, and I don't want to have to debug it :(
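To make the concern concrete, the following is a userspace sketch of what an ID allocator has to provide that a bare counter does not: serialized access, and the ability to hand back one specific id so it can be reused. The struct id_pool type and its helpers are hypothetical names for this illustration only; in the kernel the IDR/IDA code fills this role, and this sketch does not reproduce its API.

```c
#include <pthread.h>

#define MAX_IDS 32

/* Hypothetical userspace ID pool; the kernel's IDR/IDA do this job in-tree. */
struct id_pool {
	pthread_mutex_t lock;
	unsigned int used;	/* bit n set => id n is allocated */
};

#define ID_POOL_INIT { PTHREAD_MUTEX_INITIALIZER, 0 }

/* Return the lowest free id, or -1 when the pool is exhausted. */
static int id_pool_get(struct id_pool *p)
{
	int id = -1;

	pthread_mutex_lock(&p->lock);
	for (int i = 0; i < MAX_IDS; i++) {
		if (!(p->used & (1u << i))) {
			p->used |= 1u << i;
			id = i;
			break;
		}
	}
	pthread_mutex_unlock(&p->lock);
	return id;
}

/* Release exactly the id that was handed out, in any order. */
static void id_pool_put(struct id_pool *p, int id)
{
	pthread_mutex_lock(&p->lock);
	p->used &= ~(1u << id);
	pthread_mutex_unlock(&p->lock);
}
```

A counter that is only incremented and decremented cannot express "id 1 is free again while id 2 is still in use"; the bitmap above can, and the lock keeps two concurrent callers from receiving the same id.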

> +	struct class *mic_class;

Isn't this a global symbol that you have (or static symbol).  There
should never be more than one "class" around for these devices.

> +	dev_t mdev_id;
> +};
> +
> +/* g_mic - Global information about all MIC devices. */
> +static struct mic_info g_mic;

See, one class :)

> +/**
> + * mic_probe - Device Initialization Routine
> + *
> + * @pdev: PCI device structure
> + * @ent: entry in mic_pci_tbl
> + *
> + * returns 0 on success, < 0 on failure.
> + */
> +static int __init mic_probe(struct pci_dev *pdev,
> +		const struct pci_device_id *ent)
> +{
> +	int rc;
> +	struct mic_device *mdev;
> +	char name[20];
> +
> +	rc = g_mic.next_id++;

No locking, please use the idr interface...

> +
> +	snprintf(name, sizeof(name), "mic%d", rc);
> +	mdev = kzalloc(sizeof(*mdev), GFP_KERNEL);
> +	if (!mdev) {
> +		rc = -ENOMEM;
> +		dev_err(&pdev->dev, "dev kmalloc failed rc %d\n", rc);
> +		goto dec_num_dev;
> +	}
> +	strncpy(mdev->name, name, sizeof(name));
> +	mdev->id = rc;
> +
> +	mic_device_init(mdev, pdev);
> +
> +	rc = pci_enable_device(pdev);
> +	if (rc) {
> +		dev_err(&pdev->dev, "failed to enable pci device.\n");
> +		goto free_device;
> +	}
> +
> +	pci_set_master(pdev);
> +
> +	rc = pci_request_regions(pdev, mic_driver_name);
> +	if (rc) {
> +		dev_err(&pdev->dev, "failed to get pci regions.\n");
> +		goto disable_device;
> +	}
> +
> +	rc = pci_set_dma_mask(pdev, DMA_BIT_MASK(64));
> +	if (rc) {
> +		dev_err(&pdev->dev, "Cannot set DMA mask\n");
> +		goto release_regions;
> +	}
> +
> +	mdev->mmio.pa = pci_resource_start(pdev, mdev->ops->mmio_bar);
> +	mdev->mmio.len = pci_resource_len(pdev, mdev->ops->mmio_bar);
> +	mdev->mmio.va = pci_ioremap_bar(pdev, mdev->ops->mmio_bar);
> +	if (!mdev->mmio.va) {
> +		dev_err(&pdev->dev, "Cannot remap MMIO BAR\n");
> +		rc = -EIO;
> +		goto release_regions;
> +	}
> +
> +	mdev->aper.pa = pci_resource_start(pdev, mdev->ops->aper_bar);
> +	mdev->aper.len = pci_resource_len(pdev, mdev->ops->aper_bar);
> +	mdev->aper.va = ioremap_wc(mdev->aper.pa, mdev->aper.len);
> +	if (!mdev->aper.va) {
> +		dev_err(&pdev->dev, "Cannot remap Aperture BAR\n");
> +		rc = -EIO;
> +		goto unmap_mmio;
> +	}
> +
> +	mdev->ops->init(mdev);
> +
> +	pci_set_drvdata(pdev, mdev);
> +
> +	mdev->sdev = device_create(g_mic.mic_class, &pdev->dev,
> +		MKDEV(MAJOR(g_mic.mdev_id), mdev->id), NULL, "%s", mdev->name);
> +	if (IS_ERR(mdev->sdev)) {
> +		rc = PTR_ERR(mdev->sdev);
> +		dev_err(&pdev->dev, "device_create failed rc %d\n", rc);
> +		goto unmap_aper;
> +	}
> +
> +	rc = sysfs_create_group(&mdev->sdev->kobj, &mdev->attr_group);

We now have a function you should use instead,
device_create_with_groups() that solves the race condition you just
caused here by creating and notifying userspace that the device is
present, _before_ creating the sysfs files for it.
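The ordering problem can be modelled in a few lines of userspace C. This is only a sketch under stated assumptions: struct model_dev, announce() and the two create helpers are hypothetical names, not kernel APIs, and announce() merely stands in for userspace reacting to the "device added" notification. The model records how many attributes a listener would find at the moment it is notified.

```c
#include <stddef.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
struct model_dev {
	const char *const *attrs;	/* NULL-terminated attribute names */
	size_t seen_at_announce;	/* what a listener saw when notified */
};

static size_t count_attrs(const char *const *attrs)
{
	size_t n = 0;

	while (attrs && attrs[n])
		n++;
	return n;
}

/* Stands in for userspace reacting to the "device added" event. */
static void announce(struct model_dev *d)
{
	d->seen_at_announce = count_attrs(d->attrs);
}

/* Order used in the quoted patch: announce first, attach attributes after. */
static void create_then_add(struct model_dev *d, const char *const *attrs)
{
	d->attrs = NULL;
	announce(d);
	d->attrs = attrs;
}

/* Order described for device_create_with_groups(): attributes are in
 * place before anyone is told about the device. */
static void create_with_attrs(struct model_dev *d, const char *const *attrs)
{
	d->attrs = attrs;
	announce(d);
}
```

In the first ordering a listener that looks immediately finds no attributes; in the second it finds all of them, which is the property the review asks for.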



> +	if (rc) {
> +		dev_err(&pdev->dev, "sysfs_create_group failed rc %d\n", rc);
> +		goto destroy_device;
> +	}
> +	dev_info(&pdev->dev, "Probe successful for %s\n", mdev->name);

Useless noise, remove it.

> +	return 0;
> +destroy_device:
> +	device_destroy(g_mic.mic_class, MKDEV(MAJOR(g_mic.mdev_id), mdev->id));
> +unmap_aper:
> +	iounmap(mdev->mmio.va);
> +unmap_mmio:
> +	iounmap(mdev->aper.va);
> +release_regions:
> +	pci_release_regions(pdev);
> +disable_device:
> +	pci_disable_device(pdev);
> +free_device:
> +	kfree(mdev);
> +dec_num_dev:
> +	g_mic.next_id--;
> +	dev_err(&pdev->dev, "Probe failed rc %d\n", rc);
> +	return rc;
> +}
> +
> +/**
> + * mic_remove - Device Removal Routine
> + * mic_remove is called by the PCI subsystem to alert the driver
> + * that it should release a PCI device.
> + *
> + * @pdev: PCI device structure
> + */
> +static void mic_remove(struct pci_dev *pdev)
> +{
> +	struct mic_device *mdev;
> +	int id;
> +
> +	mdev = pci_get_drvdata(pdev);
> +	if (!mdev)
> +		return;
> +
> +	id = mdev->id;
> +
> +	sysfs_remove_group(&mdev->sdev->kobj, &mdev->attr_group);

No need to do this as:

> +	device_destroy(g_mic.mic_class, MKDEV(MAJOR(g_mic.mdev_id), mdev->id));

Will do it for you if you make the changes above.

thanks,

greg k-h


* Re: [PATCH v2 1/7] Intel MIC Host Driver for X100 family.
  2013-08-12 23:06     ` Greg Kroah-Hartman
@ 2013-08-14 20:08     ` Sudeep Dutt
  -1 siblings, 0 replies; 18+ messages in thread
From: Sudeep Dutt @ 2013-08-14 20:08 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Arnd Bergmann, Rusty Russell, Michael S. Tsirkin, Rob Landley,
	linux-kernel, virtualization, linux-doc, AsiasHeasias,
	Nikhil Rao, Ashutosh Dixit, Caz Yokoyama,
	Dasaratharaman Chandramouli, Harshavardhan R Kharche,
	Yaozu (Eddie) Dong, Peter P Waskiewicz Jr, Sudeep Dutt

On Mon, 2013-08-12 at 16:06 -0700, Greg Kroah-Hartman wrote:
> On Wed, Aug 07, 2013 at 08:04:07PM -0700, Sudeep Dutt wrote:
> > +/**
> > + * struct mic_device -  MIC device information for each card.
> > + *
> > + * @name: Unique name for this MIC device.
> > + * @mmio: MMIO bar information.
> > + * @pdev: The PCI device structure.
> > + * @family: The MIC family to which this device belongs.
> > + * @ops: MIC HW specific operations.
> > + * @id: The unique device id for this MIC device.
> > + * @stepping: Stepping ID.
> > + * @attr_group: Sysfs attribute group.
> > + * @sdev: Device for sysfs entries.
> > + * @aper: Aperture bar information.
> > + */
> > +struct mic_device {
> > +	char name[20];
> 
> The name can be in the struct device (it should be the same, right?)
> 

The name field is redundant and will be deleted.

> > +	struct mic_mw mmio;
> > +	struct pci_dev *pdev;
> 
> Isn't this just the parent of the device?  Do you really need this?
> 

pdev is redundant as well and will be deleted.

> > +	enum mic_hw_family family;
> > +	struct mic_hw_ops *ops;
> > +	int id;
> > +	enum mic_stepping stepping;
> > +	struct attribute_group attr_group;
> 
> Shouldn't this be a pointer to a list of groups?
> 

We have a single attribute group. However a list of groups is scalable
and also required by device_create_with_groups(..) so we have changed
this to a list of groups.
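For readers unfamiliar with the shape of such a list, here is a userspace sketch of a NULL-terminated array of group pointers holding a single group. The struct definitions are minimal stand-ins that keep only the fields needed here, and the attribute names and the mic_* identifiers are hypothetical; this is not the code from the revised patch.

```c
#include <stddef.h>

/* Minimal stand-ins for the kernel types of the same name (assumption:
 * only the fields needed for this illustration are modelled). */
struct attribute {
	const char *name;
};

struct attribute_group {
	struct attribute **attrs;	/* NULL-terminated */
};

/* Hypothetical attributes for one MIC device. */
static struct attribute family_attr = { "family" };
static struct attribute stepping_attr = { "stepping" };

static struct attribute *mic_default_attrs[] = {
	&family_attr,
	&stepping_attr,
	NULL,
};

static const struct attribute_group mic_attr_group = {
	.attrs = mic_default_attrs,
};

/* One group today; more can be appended before the NULL terminator. */
static const struct attribute_group *mic_attr_groups[] = {
	&mic_attr_group,
	NULL,
};

/* Walk every group and count the attributes it would expose. */
static size_t count_group_attrs(const struct attribute_group **groups)
{
	size_t n = 0;

	for (; *groups; groups++)
		for (struct attribute **a = (*groups)->attrs; *a; a++)
			n++;
	return n;
}
```

Adding a second group later only means adding one more pointer to mic_attr_groups, which is the scalability point made above.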

> > +	struct device *sdev;
> 
> Shouldn't this be embedded inside here, instead of a pointer?
> 

We are using device_create(..) (or device_create_with_groups(..) now)
which returns a pointer to the device.

> > +	struct mic_mw aper;
> > +};
> 
> 
> > +/**
> > + * struct mic_info -  Global information about all MIC devices.
> > + *
> > + * @next_id: Next available MIC device id.
> > + * @mic_class: Class of MIC devices for sysfs accessibility.
> > + * @mdev_id: Base device node number.
> > + */
> > +struct mic_info {
> > +	int next_id;
> 
> Please use the idr interface, don't roll your own, odds are you got it
> wrong, and I don't want to have to debug it :(
> 

Agreed, the IDA interface works nicely here.

> > +	struct class *mic_class;
> 
> Isn't this a global symbol that you have (or static symbol).  There
> should never be more than one "class" around for these devices.
> 

There is only one class for these devices stored in g_mic static symbol,
as you noticed below. We will improve the documentation here but please
let us know if you were suggesting other changes.

> > +	dev_t mdev_id;
> > +};
> > +
> > +/* g_mic - Global information about all MIC devices. */
> > +static struct mic_info g_mic;
> 
> See, one class :)
> 
> > +/**
> > + * mic_probe - Device Initialization Routine
> > + *
> > + * @pdev: PCI device structure
> > + * @ent: entry in mic_pci_tbl
> > + *
> > + * returns 0 on success, < 0 on failure.
> > + */
> > +static int __init mic_probe(struct pci_dev *pdev,
> > +		const struct pci_device_id *ent)
> > +{
> > +	int rc;
> > +	struct mic_device *mdev;
> > +	char name[20];
> > +
> > +	rc = g_mic.next_id++;
> 
> No locking, please use the idr interface...
> 

Agreed, the IDA interface works nicely here.

> > +
> > +	snprintf(name, sizeof(name), "mic%d", rc);
> > +	mdev = kzalloc(sizeof(*mdev), GFP_KERNEL);
> > +	if (!mdev) {
> > +		rc = -ENOMEM;
> > +		dev_err(&pdev->dev, "dev kmalloc failed rc %d\n", rc);
> > +		goto dec_num_dev;
> > +	}
> > +	strncpy(mdev->name, name, sizeof(name));
> > +	mdev->id = rc;
> > +
> > +	mic_device_init(mdev, pdev);
> > +
> > +	rc = pci_enable_device(pdev);
> > +	if (rc) {
> > +		dev_err(&pdev->dev, "failed to enable pci device.\n");
> > +		goto free_device;
> > +	}
> > +
> > +	pci_set_master(pdev);
> > +
> > +	rc = pci_request_regions(pdev, mic_driver_name);
> > +	if (rc) {
> > +		dev_err(&pdev->dev, "failed to get pci regions.\n");
> > +		goto disable_device;
> > +	}
> > +
> > +	rc = pci_set_dma_mask(pdev, DMA_BIT_MASK(64));
> > +	if (rc) {
> > +		dev_err(&pdev->dev, "Cannot set DMA mask\n");
> > +		goto release_regions;
> > +	}
> > +
> > +	mdev->mmio.pa = pci_resource_start(pdev, mdev->ops->mmio_bar);
> > +	mdev->mmio.len = pci_resource_len(pdev, mdev->ops->mmio_bar);
> > +	mdev->mmio.va = pci_ioremap_bar(pdev, mdev->ops->mmio_bar);
> > +	if (!mdev->mmio.va) {
> > +		dev_err(&pdev->dev, "Cannot remap MMIO BAR\n");
> > +		rc = -EIO;
> > +		goto release_regions;
> > +	}
> > +
> > +	mdev->aper.pa = pci_resource_start(pdev, mdev->ops->aper_bar);
> > +	mdev->aper.len = pci_resource_len(pdev, mdev->ops->aper_bar);
> > +	mdev->aper.va = ioremap_wc(mdev->aper.pa, mdev->aper.len);
> > +	if (!mdev->aper.va) {
> > +		dev_err(&pdev->dev, "Cannot remap Aperture BAR\n");
> > +		rc = -EIO;
> > +		goto unmap_mmio;
> > +	}
> > +
> > +	mdev->ops->init(mdev);
> > +
> > +	pci_set_drvdata(pdev, mdev);
> > +
> > +	mdev->sdev = device_create(g_mic.mic_class, &pdev->dev,
> > +		MKDEV(MAJOR(g_mic.mdev_id), mdev->id), NULL, "%s", mdev->name);
> > +	if (IS_ERR(mdev->sdev)) {
> > +		rc = PTR_ERR(mdev->sdev);
> > +		dev_err(&pdev->dev, "device_create failed rc %d\n", rc);
> > +		goto unmap_aper;
> > +	}
> > +
> > +	rc = sysfs_create_group(&mdev->sdev->kobj, &mdev->attr_group);
> 
> We now have a function you should use instead,
> device_create_with_groups() that solves the race condition you just
> caused here by creating and notifying userspace that the device is
> present, _before_ creating the sysfs files for it.
> 

We are using device_create_with_groups(..) now and it works nicely here.

> 
> 
> > +	if (rc) {
> > +		dev_err(&pdev->dev, "sysfs_create_group failed rc %d\n", rc);
> > +		goto destroy_device;
> > +	}
> > +	dev_info(&pdev->dev, "Probe successful for %s\n", mdev->name);
> 
> Useless noise, remove it.
> 

Agreed, will remove.

> > +	return 0;
> > +destroy_device:
> > +	device_destroy(g_mic.mic_class, MKDEV(MAJOR(g_mic.mdev_id), mdev->id));
> > +unmap_aper:
> > +	iounmap(mdev->mmio.va);
> > +unmap_mmio:
> > +	iounmap(mdev->aper.va);
> > +release_regions:
> > +	pci_release_regions(pdev);
> > +disable_device:
> > +	pci_disable_device(pdev);
> > +free_device:
> > +	kfree(mdev);
> > +dec_num_dev:
> > +	g_mic.next_id--;
> > +	dev_err(&pdev->dev, "Probe failed rc %d\n", rc);
> > +	return rc;
> > +}
> > +
> > +/**
> > + * mic_remove - Device Removal Routine
> > + * mic_remove is called by the PCI subsystem to alert the driver
> > + * that it should release a PCI device.
> > + *
> > + * @pdev: PCI device structure
> > + */
> > +static void mic_remove(struct pci_dev *pdev)
> > +{
> > +	struct mic_device *mdev;
> > +	int id;
> > +
> > +	mdev = pci_get_drvdata(pdev);
> > +	if (!mdev)
> > +		return;
> > +
> > +	id = mdev->id;
> > +
> > +	sysfs_remove_group(&mdev->sdev->kobj, &mdev->attr_group);
> 
> No need to do this as:
> 
> > +	device_destroy(g_mic.mic_class, MKDEV(MAJOR(g_mic.mdev_id), mdev->id));
> 
> Will do it for you if you make the changes above.
> 

Indeed, it cleans up correctly.

We will incorporate all your suggestions for patch 1 (including the
header file cleanup in the other thread) and post a rev3 once it has
seen some validation.

Thanks for the review!
Sudeep Dutt

> thanks,
> 
> greg k-h




Thread overview: 18+ messages
2013-08-08  3:04 [PATCH v2 0/7] Enable Drivers for Intel MIC X100 Coprocessors Sudeep Dutt
2013-08-08  3:04 ` [PATCH v2 1/7] Intel MIC Host Driver for X100 family Sudeep Dutt
2013-08-12 22:58   ` Greg Kroah-Hartman
2013-08-12 22:58   ` Greg Kroah-Hartman
2013-08-12 23:06   ` Greg Kroah-Hartman
2013-08-12 23:06     ` Greg Kroah-Hartman
2013-08-14 20:08     ` Sudeep Dutt
2013-08-14 20:08     ` Sudeep Dutt
2013-08-08  3:04 ` [PATCH v2 2/7] Intel MIC Host Driver Interrupt/SMPT support " Sudeep Dutt
2013-08-08  3:04 ` [PATCH v2 3/7] Intel MIC Host Driver, card OS state management Sudeep Dutt
2013-08-08  3:04 ` [PATCH v2 4/7] Intel MIC Card Driver for X100 family Sudeep Dutt
2013-08-08  3:04 ` [PATCH v2 5/7] Intel MIC Host Driver Changes for Virtio Devices Sudeep Dutt
2013-08-08  3:04 ` [PATCH v2 6/7] Intel MIC Card " Sudeep Dutt
2013-08-08  3:04 ` [PATCH v2 7/7] Sample Implementation of Intel MIC User Space Daemon Sudeep Dutt
2013-08-08  6:40   ` Michael S. Tsirkin
2013-08-08  6:40     ` Michael S. Tsirkin
2013-08-09 16:47     ` Sudeep Dutt
2013-08-09 16:47     ` Sudeep Dutt
