All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH RFC v3 00/15] basic vfio-ccw infrastructure
@ 2017-02-17  8:29 ` Dong Jia Shi
  0 siblings, 0 replies; 61+ messages in thread
From: Dong Jia Shi @ 2017-02-17  8:29 UTC (permalink / raw)
  To: kvm, linux-s390, qemu-devel
  Cc: bjsdjshi, renxiaof, cornelia.huck, borntraeger, agraf,
	alex.williamson, pmorel, pasic, wkywang

vfio-ccw: the basic infrastructure
==================================

Introduction
------------

Here we describe the vfio support for I/O subchannel devices for
Linux/s390. Motivation for vfio-ccw is to passthrough subchannels to a
virtual machine, while vfio is the means.

Different than other hardware architectures, s390 has defined a unified
I/O access method, which is so called Channel I/O. It has its own access
patterns:
- Channel programs run asynchronously on a separate (co)processor.
- The channel subsystem will access any memory designated by the caller
  in the channel program directly, i.e. there is no iommu involved.
Thus when we introduce vfio support for these devices, we realize it
with a mediated device (mdev) implementation. The vfio mdev will be
added to an iommu group, so as to make itself able to be managed by the
vfio framework. And we add read/write callbacks for special vfio I/O
regions to pass the channel programs from the mdev to its parent device
(the real I/O subchannel device) to do further address translation and
to perform I/O instructions.

This document does not intend to explain the s390 I/O architecture in
every detail. More information/reference could be found here:
- A good start to know Channel I/O in general:
  https://en.wikipedia.org/wiki/Channel_I/O
- s390 architecture:
  s390 Principles of Operation manual (IBM Form. No. SA22-7832)
- The existing Qemu code which implements a simple emulated channel
  subsystem could also be a good reference. It makes it easier to follow
  the flow.
  qemu/hw/s390x/css.c

For vfio mediated device framework:
- Documentation/vfio-mediated-device.txt

Motivation of vfio-ccw
----------------------

Currently, a guest virtualized via qemu/kvm on s390 only sees
paravirtualized virtio devices via the "Virtio Over Channel I/O
(virtio-ccw)" transport. This makes virtio devices discoverable via
standard operating system algorithms for handling channel devices.

However this is not enough. On s390 for the majority of devices, which
use the standard Channel I/O based mechanism, we also need to provide
the functionality of passing through them to a Qemu virtual machine.
This includes devices that don't have a virtio counterpart (e.g. tape
drives) or that have specific characteristics which guests want to
exploit.

For passing a device to a guest, we want to use the same interface as
everybody else, namely vfio. Thus, we would like to introduce vfio
support for channel devices. And we would like to name this new vfio
device "vfio-ccw".

Access patterns of CCW devices
------------------------------

s390 architecture has implemented a so called channel subsystem, that
provides a unified view of the devices physically attached to the
systems. Though the s390 hardware platform knows about a huge variety of
different peripheral attachments like disk devices (aka. DASDs), tapes,
communication controllers, etc. They can all be accessed by a well
defined access method and they are presenting I/O completion a unified
way: I/O interruptions.

All I/O requires the use of channel command words (CCWs). A CCW is an
instruction to a specialized I/O channel processor. A channel program is
a sequence of CCWs which are executed by the I/O channel subsystem.  To
issue a channel program to the channel subsystem, it is required to
build an operation request block (ORB), which can be used to point out
the format of the CCW and other control information to the system. The
operating system signals the I/O channel subsystem to begin executing
the channel program with a SSCH (start sub-channel) instruction. The
central processor is then free to proceed with non-I/O instructions
until interrupted. The I/O completion result is received by the
interrupt handler in the form of interrupt response block (IRB).

Back to vfio-ccw, in short:
- ORBs and channel programs are built in guest kernel (with guest
  physical addresses).
- ORBs and channel programs are passed to the host kernel.
- Host kernel translates the guest physical addresses to real addresses
  and starts the I/O with issuing a privileged Channel I/O instruction
  (e.g SSCH).
- channel programs run asynchronously on a separate processor.
- I/O completion will be signaled to the host with I/O interruptions.
  And it will be copied as IRB to user space to pass it back to the
  guest.

Physical vfio ccw device and its child mdev
-------------------------------------------

As mentioned above, we realize vfio-ccw with a mdev implementation.

Channel I/O does not have IOMMU hardware support, so the physical
vfio-ccw device does not have an IOMMU level translation or isolation.

Sub-channel I/O instructions are all privileged instructions, When
handling the I/O instruction interception, vfio-ccw has the software
policing and translation how the channel program is programmed before
it gets sent to hardware.

Within this implementation, we have two drivers for two types of
devices:
- The vfio_ccw driver for the physical subchannel device.
  This is an I/O subchannel driver for the real subchannel device.  It
  realizes a group of callbacks and registers to the mdev framework as a
  parent (physical) device. As a consequence, mdev provides vfio_ccw a
  generic interface (sysfs) to create mdev devices. A vfio mdev could be
  created by vfio_ccw then and added to the mediated bus. It is the vfio
  device that added to an IOMMU group and a vfio group.
  vfio_ccw also provides an I/O region to accept channel program
  request from user space and store I/O interrupt result for user
  space to retrieve. To notify user space an I/O completion, it offers
  an interface to setup an eventfd fd for asynchronous signaling.

- The vfio_mdev driver for the mediated vfio ccw device.
  This is provided by the mdev framework. It is a vfio device driver for
  the mdev that created by vfio_ccw.
  It realize a group of vfio device driver callbacks, adds itself to a
  vfio group, and registers itself to the mdev framework as a mdev
  driver.
  It uses a vfio iommu backend that uses the existing map and unmap
  ioctls, but rather than programming them into an IOMMU for a device,
  it simply stores the translations for use by later requests. This
  means that a device programmed in a VM with guest physical addresses
  can have the vfio kernel convert that address to process virtual
  address, pin the page and program the hardware with the host physical
  address in one step.
  For a mdev, the vfio iommu backend will not pin the pages during the
  VFIO_IOMMU_MAP_DMA ioctl. Mdev framework will only maintain a database
  of the iova<->vaddr mappings in this operation. And they export a
  vfio_pin_pages and a vfio_unpin_pages interfaces from the vfio iommu
  backend for the physical devices to pin and unpin pages by demand.

Below is a high Level block diagram.

 +-------------+
 |             |
 | +---------+ | mdev_register_driver() +--------------+
 | |  Mdev   | +<-----------------------+              |
 | |  bus    | |                        | vfio_mdev.ko |
 | | driver  | +----------------------->+              |<-> VFIO user
 | +---------+ |    probe()/remove()    +--------------+    APIs
 |             |
 |  MDEV CORE  |
 |   MODULE    |
 |   mdev.ko   |
 | +---------+ | mdev_register_device() +--------------+
 | |Physical | +<-----------------------+              |
 | | device  | |                        |  vfio_ccw.ko |<-> subchannel
 | |interface| +----------------------->+              |     device
 | +---------+ |       callback         +--------------+
 +-------------+

The process of how these work together.
1. vfio_ccw.ko drives the physical I/O subchannel, and registers the
   physical device (with callbacks) to mdev framework.
   When vfio_ccw probing the subchannel device, it registers device
   pointer and callbacks to the mdev framework. Mdev related file nodes
   under the device node in sysfs would be created for the subchannel
   device, namely 'mdev_create', 'mdev_destroy' and
   'mdev_supported_types'.
2. Create a mediated vfio ccw device.
   Use the 'mdev_create' sysfs file, we need to manually create one (and
   only one for our case) mediated device.
3. vfio_mdev.ko drives the mediated ccw device.
   vfio_mdev is also the vfio device drvier. It will probe the mdev and
   add it to an iommu_group and a vfio_group. Then we could pass through
   the mdev to a guest.

vfio-ccw I/O region
-------------------

An I/O region is used to accept channel program request from user
space and store I/O interrupt result for user space to retrieve. The
defination of the region is:

struct ccw_io_region {
#define ORB_AREA_SIZE 12
	__u8	orb_area[ORB_AREA_SIZE];
#define SCSW_AREA_SIZE 12
	__u8	scsw_area[SCSW_AREA_SIZE];
#define IRB_AREA_SIZE 96
	__u8	irb_area[IRB_AREA_SIZE];
	__u32	ret_code;
} __packed;

While starting an I/O request, orb_area should be filled with the
guest ORB, and scsw_area should be filled with the SCSW of the Virtual
Subchannel.

irb_area stores the I/O result.

ret_code stores a return code for each access of the region.

vfio-ccw patches overview
-------------------------

For now, our patches are rebased on the latest mdev implementation.
vfio-ccw follows what vfio-pci did on the s390 paltform and uses
vfio-iommu-type1 as the vfio iommu backend. It's a good start to launch
the code review for vfio-ccw. Note that the implementation is far from
complete yet; but we'd like to get feedback for the general
architecture.

* CCW translation APIs
- Description:
  These introduce a group of APIs (start with 'cp_') to do CCW
  translation. The CCWs passed in by a user space program are
  organized with their guest physical memory addresses. These APIs
  will copy the CCWs into the kernel space, and assemble a runnable
  kernel channel program by updating the guest physical addresses with
  their corresponding host physical addresses.
- Patches:
  vfio: ccw: introduce channel program interfaces

* vfio_ccw device driver
- Description:
  The following patches utilizes the CCW translation APIs and introduce
  vfio_ccw, which is the driver for the I/O subchannel devices you want
  to pass through.
  vfio_ccw implements the following vfio ioctls:
    VFIO_DEVICE_GET_INFO
    VFIO_DEVICE_GET_IRQ_INFO
    VFIO_DEVICE_GET_REGION_INFO
    VFIO_DEVICE_RESET
    VFIO_DEVICE_SET_IRQS
  This provides an I/O region, so that the user space program can pass a
  channel program to the kernel, to do further CCW translation before
  issuing them to a real device.
  This also provides the SET_IRQ ioctl to setup an event notifier to
  notify the user space program the I/O completion in an asynchronous
  way.
- Patches:
  vfio: ccw: basic implementation for vfio_ccw driver
  vfio: ccw: introduce ccw_io_region
  vfio: ccw: realize VFIO_DEVICE_GET_REGION_INFO ioctl
  vfio: ccw: realize VFIO_DEVICE_RESET ioctl
  vfio: ccw: realize VFIO_DEVICE_G(S)ET_IRQ_INFO ioctls

The user of vfio-ccw is not limited to Qemu, while Qemu is definitely a
good example to get understand how these patches work. Here is a little
bit more detail how an I/O request triggered by the Qemu guest will be
handled (without error handling).

Explanation:
Q1-Q7: Qemu side process.
K1-K5: Kernel side process.

Q1. Get I/O region info during initialization.
Q2. Setup event notifier and handler to handle I/O completion.

... ...

Q3. Intercept a ssch instruction.
Q4. Write the guest channel program and ORB to the I/O region.
    K1. Copy from guest to kernel.
    K2. Translate the guest channel program to a host kernel space
        channel program, which becomes runnable for a real device.
    K3. With the necessary information contained in the orb passed in
        by Qemu, issue the ccwchain to the device.
    K4. Return the ssch CC code.
Q5. Return the CC code to the guest.

... ...

    K5. Interrupt handler gets the I/O result and write the result to
        the I/O region.
    K6. Signal Qemu to retrieve the result.
Q6. Get the signal and event handler reads out the result from the I/O
    region.
Q7. Update the irb for the guest.

Limitations
-----------

The current vfio-ccw implementation focuses on supporting basic commands
needed to implement block device functionality (read/write) of DASD/ECKD
device only. Some commands may need special handling in the future, for
example, anything related to path grouping.

DASD is a kind of storage device. While ECKD is a data recording format.
More information for DASD and ECKD could be found here:
https://en.wikipedia.org/wiki/Direct-access_storage_device
https://en.wikipedia.org/wiki/Count_key_data

Together with the corresponding work in Qemu, we can bring the passed
through DASD/ECKD device online in a guest now and use it as a block
device.

Reference
---------
1. ESA/s390 Principles of Operation manual (IBM Form. No. SA22-7832)
2. ESA/390 Common I/O Device Commands manual (IBM Form. No. SA22-7204)
3. https://en.wikipedia.org/wiki/Channel_I/O
4. Documentation/s390/cds.txt
5. Documentation/vfio.txt
6. Documentation/vfio-mediated-device.txt

Change log
----------

v2 -> v3:
1. Adopt the newest mdev interfaces.
2. Move vfio_ccw.h to uapi.
3. Add a atomic_t avil variable in vfio_ccw_private to denotes the
   available mdev instance number of the subchannel.
4. Patch 11: correct the usage of VFIO_CCW_NUM_IRQS.
5. Remove license statements.
6. Cosmetic rewording of commit message and comments.

v1 -> v2:
1. Rebase the implementation to the mdev framework approach.
2. Introduce a MMIO I/O region to handle I/O requests and results.

Dong Jia Shi (15):
  s390: cio: introduce cio_cancel_halt_clear
  s390: cio: export more interfaces
  vfio: ccw: define device_api strings
  vfio: ccw: basic implementation for vfio_ccw driver
  vfio: ccw: introduce channel program interfaces
  vfio: ccw: register vfio_ccw to the mediated device framework
  vfio: ccw: introduce ccw_io_region
  vfio: ccw: handle ccw command request
  vfio: ccw: realize VFIO_DEVICE_GET_REGION_INFO ioctl
  vfio: ccw: realize VFIO_DEVICE_RESET ioctl
  vfio: ccw: realize VFIO_DEVICE_G(S)ET_IRQ_INFO ioctls
  vfio: ccw: return I/O results asynchronously
  vfio: ccw: introduce a finite state machine
  docs: add documentation for vfio-ccw
  vfio: ccw: introduce support for ccw0

 Documentation/s390/00-INDEX         |   2 +
 Documentation/s390/vfio-ccw.txt     | 303 +++++++++++++
 arch/s390/Kconfig                   |  17 +
 arch/s390/include/asm/isc.h         |   1 +
 arch/s390/include/uapi/asm/Kbuild   |   1 +
 drivers/iommu/Kconfig               |   8 +
 drivers/s390/cio/Makefile           |   3 +
 drivers/s390/cio/cio.c              |  69 +++
 drivers/s390/cio/cio.h              |   1 +
 drivers/s390/cio/device_fsm.c       |  54 +--
 drivers/s390/cio/vfio_ccw_cp.c      | 870 ++++++++++++++++++++++++++++++++++++
 drivers/s390/cio/vfio_ccw_cp.h      |  42 ++
 drivers/s390/cio/vfio_ccw_drv.c     | 308 +++++++++++++
 drivers/s390/cio/vfio_ccw_fsm.c     | 207 +++++++++
 drivers/s390/cio/vfio_ccw_ops.c     | 447 ++++++++++++++++++
 drivers/s390/cio/vfio_ccw_private.h |  96 ++++
 include/uapi/linux/vfio.h           |  18 +
 include/uapi/linux/vfio_ccw.h       |  24 +
 18 files changed, 2425 insertions(+), 46 deletions(-)
 create mode 100644 Documentation/s390/vfio-ccw.txt
 create mode 100644 drivers/s390/cio/vfio_ccw_cp.c
 create mode 100644 drivers/s390/cio/vfio_ccw_cp.h
 create mode 100644 drivers/s390/cio/vfio_ccw_drv.c
 create mode 100644 drivers/s390/cio/vfio_ccw_fsm.c
 create mode 100644 drivers/s390/cio/vfio_ccw_ops.c
 create mode 100644 drivers/s390/cio/vfio_ccw_private.h
 create mode 100644 include/uapi/linux/vfio_ccw.h

-- 
2.8.4

^ permalink raw reply	[flat|nested] 61+ messages in thread

* [Qemu-devel] [PATCH RFC v3 00/15] basic vfio-ccw infrastructure
@ 2017-02-17  8:29 ` Dong Jia Shi
  0 siblings, 0 replies; 61+ messages in thread
From: Dong Jia Shi @ 2017-02-17  8:29 UTC (permalink / raw)
  To: kvm, linux-s390, qemu-devel
  Cc: bjsdjshi, renxiaof, cornelia.huck, borntraeger, agraf,
	alex.williamson, pmorel, pasic, wkywang

vfio-ccw: the basic infrastructure
==================================

Introduction
------------

Here we describe the vfio support for I/O subchannel devices for
Linux/s390. Motivation for vfio-ccw is to passthrough subchannels to a
virtual machine, while vfio is the means.

Different than other hardware architectures, s390 has defined a unified
I/O access method, which is so called Channel I/O. It has its own access
patterns:
- Channel programs run asynchronously on a separate (co)processor.
- The channel subsystem will access any memory designated by the caller
  in the channel program directly, i.e. there is no iommu involved.
Thus when we introduce vfio support for these devices, we realize it
with a mediated device (mdev) implementation. The vfio mdev will be
added to an iommu group, so as to make itself able to be managed by the
vfio framework. And we add read/write callbacks for special vfio I/O
regions to pass the channel programs from the mdev to its parent device
(the real I/O subchannel device) to do further address translation and
to perform I/O instructions.

This document does not intend to explain the s390 I/O architecture in
every detail. More information/reference could be found here:
- A good start to know Channel I/O in general:
  https://en.wikipedia.org/wiki/Channel_I/O
- s390 architecture:
  s390 Principles of Operation manual (IBM Form. No. SA22-7832)
- The existing Qemu code which implements a simple emulated channel
  subsystem could also be a good reference. It makes it easier to follow
  the flow.
  qemu/hw/s390x/css.c

For vfio mediated device framework:
- Documentation/vfio-mediated-device.txt

Motivation of vfio-ccw
----------------------

Currently, a guest virtualized via qemu/kvm on s390 only sees
paravirtualized virtio devices via the "Virtio Over Channel I/O
(virtio-ccw)" transport. This makes virtio devices discoverable via
standard operating system algorithms for handling channel devices.

However this is not enough. On s390 for the majority of devices, which
use the standard Channel I/O based mechanism, we also need to provide
the functionality of passing through them to a Qemu virtual machine.
This includes devices that don't have a virtio counterpart (e.g. tape
drives) or that have specific characteristics which guests want to
exploit.

For passing a device to a guest, we want to use the same interface as
everybody else, namely vfio. Thus, we would like to introduce vfio
support for channel devices. And we would like to name this new vfio
device "vfio-ccw".

Access patterns of CCW devices
------------------------------

s390 architecture has implemented a so called channel subsystem, that
provides a unified view of the devices physically attached to the
systems. Though the s390 hardware platform knows about a huge variety of
different peripheral attachments like disk devices (aka. DASDs), tapes,
communication controllers, etc. They can all be accessed by a well
defined access method and they are presenting I/O completion a unified
way: I/O interruptions.

All I/O requires the use of channel command words (CCWs). A CCW is an
instruction to a specialized I/O channel processor. A channel program is
a sequence of CCWs which are executed by the I/O channel subsystem.  To
issue a channel program to the channel subsystem, it is required to
build an operation request block (ORB), which can be used to point out
the format of the CCW and other control information to the system. The
operating system signals the I/O channel subsystem to begin executing
the channel program with a SSCH (start sub-channel) instruction. The
central processor is then free to proceed with non-I/O instructions
until interrupted. The I/O completion result is received by the
interrupt handler in the form of interrupt response block (IRB).

Back to vfio-ccw, in short:
- ORBs and channel programs are built in guest kernel (with guest
  physical addresses).
- ORBs and channel programs are passed to the host kernel.
- Host kernel translates the guest physical addresses to real addresses
  and starts the I/O with issuing a privileged Channel I/O instruction
  (e.g SSCH).
- channel programs run asynchronously on a separate processor.
- I/O completion will be signaled to the host with I/O interruptions.
  And it will be copied as IRB to user space to pass it back to the
  guest.

Physical vfio ccw device and its child mdev
-------------------------------------------

As mentioned above, we realize vfio-ccw with a mdev implementation.

Channel I/O does not have IOMMU hardware support, so the physical
vfio-ccw device does not have an IOMMU level translation or isolation.

Sub-channel I/O instructions are all privileged instructions, When
handling the I/O instruction interception, vfio-ccw has the software
policing and translation how the channel program is programmed before
it gets sent to hardware.

Within this implementation, we have two drivers for two types of
devices:
- The vfio_ccw driver for the physical subchannel device.
  This is an I/O subchannel driver for the real subchannel device.  It
  realizes a group of callbacks and registers to the mdev framework as a
  parent (physical) device. As a consequence, mdev provides vfio_ccw a
  generic interface (sysfs) to create mdev devices. A vfio mdev could be
  created by vfio_ccw then and added to the mediated bus. It is the vfio
  device that added to an IOMMU group and a vfio group.
  vfio_ccw also provides an I/O region to accept channel program
  request from user space and store I/O interrupt result for user
  space to retrieve. To notify user space an I/O completion, it offers
  an interface to setup an eventfd fd for asynchronous signaling.

- The vfio_mdev driver for the mediated vfio ccw device.
  This is provided by the mdev framework. It is a vfio device driver for
  the mdev that created by vfio_ccw.
  It realize a group of vfio device driver callbacks, adds itself to a
  vfio group, and registers itself to the mdev framework as a mdev
  driver.
  It uses a vfio iommu backend that uses the existing map and unmap
  ioctls, but rather than programming them into an IOMMU for a device,
  it simply stores the translations for use by later requests. This
  means that a device programmed in a VM with guest physical addresses
  can have the vfio kernel convert that address to process virtual
  address, pin the page and program the hardware with the host physical
  address in one step.
  For a mdev, the vfio iommu backend will not pin the pages during the
  VFIO_IOMMU_MAP_DMA ioctl. Mdev framework will only maintain a database
  of the iova<->vaddr mappings in this operation. And they export a
  vfio_pin_pages and a vfio_unpin_pages interfaces from the vfio iommu
  backend for the physical devices to pin and unpin pages by demand.

Below is a high Level block diagram.

 +-------------+
 |             |
 | +---------+ | mdev_register_driver() +--------------+
 | |  Mdev   | +<-----------------------+              |
 | |  bus    | |                        | vfio_mdev.ko |
 | | driver  | +----------------------->+              |<-> VFIO user
 | +---------+ |    probe()/remove()    +--------------+    APIs
 |             |
 |  MDEV CORE  |
 |   MODULE    |
 |   mdev.ko   |
 | +---------+ | mdev_register_device() +--------------+
 | |Physical | +<-----------------------+              |
 | | device  | |                        |  vfio_ccw.ko |<-> subchannel
 | |interface| +----------------------->+              |     device
 | +---------+ |       callback         +--------------+
 +-------------+

The process of how these work together.
1. vfio_ccw.ko drives the physical I/O subchannel, and registers the
   physical device (with callbacks) to mdev framework.
   When vfio_ccw probing the subchannel device, it registers device
   pointer and callbacks to the mdev framework. Mdev related file nodes
   under the device node in sysfs would be created for the subchannel
   device, namely 'mdev_create', 'mdev_destroy' and
   'mdev_supported_types'.
2. Create a mediated vfio ccw device.
   Use the 'mdev_create' sysfs file, we need to manually create one (and
   only one for our case) mediated device.
3. vfio_mdev.ko drives the mediated ccw device.
   vfio_mdev is also the vfio device drvier. It will probe the mdev and
   add it to an iommu_group and a vfio_group. Then we could pass through
   the mdev to a guest.

vfio-ccw I/O region
-------------------

An I/O region is used to accept channel program request from user
space and store I/O interrupt result for user space to retrieve. The
defination of the region is:

struct ccw_io_region {
#define ORB_AREA_SIZE 12
	__u8	orb_area[ORB_AREA_SIZE];
#define SCSW_AREA_SIZE 12
	__u8	scsw_area[SCSW_AREA_SIZE];
#define IRB_AREA_SIZE 96
	__u8	irb_area[IRB_AREA_SIZE];
	__u32	ret_code;
} __packed;

While starting an I/O request, orb_area should be filled with the
guest ORB, and scsw_area should be filled with the SCSW of the Virtual
Subchannel.

irb_area stores the I/O result.

ret_code stores a return code for each access of the region.

vfio-ccw patches overview
-------------------------

For now, our patches are rebased on the latest mdev implementation.
vfio-ccw follows what vfio-pci did on the s390 paltform and uses
vfio-iommu-type1 as the vfio iommu backend. It's a good start to launch
the code review for vfio-ccw. Note that the implementation is far from
complete yet; but we'd like to get feedback for the general
architecture.

* CCW translation APIs
- Description:
  These introduce a group of APIs (start with 'cp_') to do CCW
  translation. The CCWs passed in by a user space program are
  organized with their guest physical memory addresses. These APIs
  will copy the CCWs into the kernel space, and assemble a runnable
  kernel channel program by updating the guest physical addresses with
  their corresponding host physical addresses.
- Patches:
  vfio: ccw: introduce channel program interfaces

* vfio_ccw device driver
- Description:
  The following patches utilizes the CCW translation APIs and introduce
  vfio_ccw, which is the driver for the I/O subchannel devices you want
  to pass through.
  vfio_ccw implements the following vfio ioctls:
    VFIO_DEVICE_GET_INFO
    VFIO_DEVICE_GET_IRQ_INFO
    VFIO_DEVICE_GET_REGION_INFO
    VFIO_DEVICE_RESET
    VFIO_DEVICE_SET_IRQS
  This provides an I/O region, so that the user space program can pass a
  channel program to the kernel, to do further CCW translation before
  issuing them to a real device.
  This also provides the SET_IRQ ioctl to setup an event notifier to
  notify the user space program the I/O completion in an asynchronous
  way.
- Patches:
  vfio: ccw: basic implementation for vfio_ccw driver
  vfio: ccw: introduce ccw_io_region
  vfio: ccw: realize VFIO_DEVICE_GET_REGION_INFO ioctl
  vfio: ccw: realize VFIO_DEVICE_RESET ioctl
  vfio: ccw: realize VFIO_DEVICE_G(S)ET_IRQ_INFO ioctls

The user of vfio-ccw is not limited to Qemu, while Qemu is definitely a
good example to get understand how these patches work. Here is a little
bit more detail how an I/O request triggered by the Qemu guest will be
handled (without error handling).

Explanation:
Q1-Q7: Qemu side process.
K1-K5: Kernel side process.

Q1. Get I/O region info during initialization.
Q2. Setup event notifier and handler to handle I/O completion.

... ...

Q3. Intercept a ssch instruction.
Q4. Write the guest channel program and ORB to the I/O region.
    K1. Copy from guest to kernel.
    K2. Translate the guest channel program to a host kernel space
        channel program, which becomes runnable for a real device.
    K3. With the necessary information contained in the orb passed in
        by Qemu, issue the ccwchain to the device.
    K4. Return the ssch CC code.
Q5. Return the CC code to the guest.

... ...

    K5. Interrupt handler gets the I/O result and write the result to
        the I/O region.
    K6. Signal Qemu to retrieve the result.
Q6. Get the signal and event handler reads out the result from the I/O
    region.
Q7. Update the irb for the guest.

Limitations
-----------

The current vfio-ccw implementation focuses on supporting basic commands
needed to implement block device functionality (read/write) of DASD/ECKD
device only. Some commands may need special handling in the future, for
example, anything related to path grouping.

DASD is a kind of storage device. While ECKD is a data recording format.
More information for DASD and ECKD could be found here:
https://en.wikipedia.org/wiki/Direct-access_storage_device
https://en.wikipedia.org/wiki/Count_key_data

Together with the corresponding work in Qemu, we can bring the passed
through DASD/ECKD device online in a guest now and use it as a block
device.

Reference
---------
1. ESA/s390 Principles of Operation manual (IBM Form. No. SA22-7832)
2. ESA/390 Common I/O Device Commands manual (IBM Form. No. SA22-7204)
3. https://en.wikipedia.org/wiki/Channel_I/O
4. Documentation/s390/cds.txt
5. Documentation/vfio.txt
6. Documentation/vfio-mediated-device.txt

Change log
----------

v2 -> v3:
1. Adopt the newest mdev interfaces.
2. Move vfio_ccw.h to uapi.
3. Add a atomic_t avil variable in vfio_ccw_private to denotes the
   available mdev instance number of the subchannel.
4. Patch 11: correct the usage of VFIO_CCW_NUM_IRQS.
5. Remove license statements.
6. Cosmetic rewording of commit message and comments.

v1 -> v2:
1. Rebase the implementation to the mdev framework approach.
2. Introduce a MMIO I/O region to handle I/O requests and results.

Dong Jia Shi (15):
  s390: cio: introduce cio_cancel_halt_clear
  s390: cio: export more interfaces
  vfio: ccw: define device_api strings
  vfio: ccw: basic implementation for vfio_ccw driver
  vfio: ccw: introduce channel program interfaces
  vfio: ccw: register vfio_ccw to the mediated device framework
  vfio: ccw: introduce ccw_io_region
  vfio: ccw: handle ccw command request
  vfio: ccw: realize VFIO_DEVICE_GET_REGION_INFO ioctl
  vfio: ccw: realize VFIO_DEVICE_RESET ioctl
  vfio: ccw: realize VFIO_DEVICE_G(S)ET_IRQ_INFO ioctls
  vfio: ccw: return I/O results asynchronously
  vfio: ccw: introduce a finite state machine
  docs: add documentation for vfio-ccw
  vfio: ccw: introduce support for ccw0

 Documentation/s390/00-INDEX         |   2 +
 Documentation/s390/vfio-ccw.txt     | 303 +++++++++++++
 arch/s390/Kconfig                   |  17 +
 arch/s390/include/asm/isc.h         |   1 +
 arch/s390/include/uapi/asm/Kbuild   |   1 +
 drivers/iommu/Kconfig               |   8 +
 drivers/s390/cio/Makefile           |   3 +
 drivers/s390/cio/cio.c              |  69 +++
 drivers/s390/cio/cio.h              |   1 +
 drivers/s390/cio/device_fsm.c       |  54 +--
 drivers/s390/cio/vfio_ccw_cp.c      | 870 ++++++++++++++++++++++++++++++++++++
 drivers/s390/cio/vfio_ccw_cp.h      |  42 ++
 drivers/s390/cio/vfio_ccw_drv.c     | 308 +++++++++++++
 drivers/s390/cio/vfio_ccw_fsm.c     | 207 +++++++++
 drivers/s390/cio/vfio_ccw_ops.c     | 447 ++++++++++++++++++
 drivers/s390/cio/vfio_ccw_private.h |  96 ++++
 include/uapi/linux/vfio.h           |  18 +
 include/uapi/linux/vfio_ccw.h       |  24 +
 18 files changed, 2425 insertions(+), 46 deletions(-)
 create mode 100644 Documentation/s390/vfio-ccw.txt
 create mode 100644 drivers/s390/cio/vfio_ccw_cp.c
 create mode 100644 drivers/s390/cio/vfio_ccw_cp.h
 create mode 100644 drivers/s390/cio/vfio_ccw_drv.c
 create mode 100644 drivers/s390/cio/vfio_ccw_fsm.c
 create mode 100644 drivers/s390/cio/vfio_ccw_ops.c
 create mode 100644 drivers/s390/cio/vfio_ccw_private.h
 create mode 100644 include/uapi/linux/vfio_ccw.h

-- 
2.8.4

^ permalink raw reply	[flat|nested] 61+ messages in thread

* [PATCH RFC v3 01/15] s390: cio: introduce cio_cancel_halt_clear
  2017-02-17  8:29 ` [Qemu-devel] " Dong Jia Shi
@ 2017-02-17  8:29   ` Dong Jia Shi
  -1 siblings, 0 replies; 61+ messages in thread
From: Dong Jia Shi @ 2017-02-17  8:29 UTC (permalink / raw)
  To: kvm, linux-s390, qemu-devel
  Cc: bjsdjshi, renxiaof, cornelia.huck, borntraeger, agraf,
	alex.williamson, pmorel, pasic, wkywang

For future code reuse purpose, this decouples the cio code with
the ccw device specific parts from ccw_device_cancel_halt_clear,
and makes a new common I/O interface named cio_cancel_halt_clear.

Signed-off-by: Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com>
Reviewed-by: Pierre Morel <pmorel@linux.vnet.ibm.com>
---
 drivers/s390/cio/cio.c        | 59 +++++++++++++++++++++++++++++++++++++++++++
 drivers/s390/cio/cio.h        |  1 +
 drivers/s390/cio/device_fsm.c | 54 ++++++---------------------------------
 3 files changed, 68 insertions(+), 46 deletions(-)

diff --git a/drivers/s390/cio/cio.c b/drivers/s390/cio/cio.c
index de6fccc..8f86072 100644
--- a/drivers/s390/cio/cio.c
+++ b/drivers/s390/cio/cio.c
@@ -309,6 +309,65 @@ cio_cancel (struct subchannel *sch)
 	}
 }
 
+/**
+ * cio_cancel_halt_clear - Cancel running I/O by performing cancel, halt
+ * and clear ordinally if subchannel is valid.
+ * @sch: subchannel on which to perform the cancel_halt_clear operation
+ * @iretry: the number of the times remained to retry the next operation
+ *
+ * This should be called repeatedly since halt/clear are asynchronous
+ * operations. We do one try with cio_cancel, two tries with cio_halt,
+ * 255 tries with cio_clear. The caller should initialize @iretry with
+ * the value 255 for its first call to this, and keep using the same
+ * @iretry in the subsequent calls until it gets a non -EBUSY return.
+ *
+ * Returns 0 if device now idle, -ENODEV for device not operational,
+ * -EBUSY if an interrupt is expected (either from halt/clear or from a
+ * status pending), and -EIO if out of retries.
+ */
+int cio_cancel_halt_clear(struct subchannel *sch, int *iretry)
+{
+	int ret;
+
+	if (cio_update_schib(sch))
+		return -ENODEV;
+	if (!sch->schib.pmcw.ena)
+		/* Not operational -> done. */
+		return 0;
+	/* Stage 1: cancel io. */
+	if (!(scsw_actl(&sch->schib.scsw) & SCSW_ACTL_HALT_PEND) &&
+	    !(scsw_actl(&sch->schib.scsw) & SCSW_ACTL_CLEAR_PEND)) {
+		if (!scsw_is_tm(&sch->schib.scsw)) {
+			ret = cio_cancel(sch);
+			if (ret != -EINVAL)
+				return ret;
+		}
+		/*
+		 * Cancel io unsuccessful or not applicable (transport mode).
+		 * Continue with asynchronous instructions.
+		 */
+		*iretry = 3;	/* 3 halt retries. */
+	}
+	/* Stage 2: halt io. */
+	if (!(scsw_actl(&sch->schib.scsw) & SCSW_ACTL_CLEAR_PEND)) {
+		if (*iretry) {
+			*iretry -= 1;
+			ret = cio_halt(sch);
+			if (ret != -EBUSY)
+				return (ret == 0) ? -EBUSY : ret;
+		}
+		/* Halt io unsuccessful. */
+		*iretry = 255;	/* 255 clear retries. */
+	}
+	/* Stage 3: clear io. */
+	if (*iretry) {
+		*iretry -= 1;
+		ret = cio_clear(sch);
+		return (ret == 0) ? -EBUSY : ret;
+	}
+	/* Function was unsuccessful */
+	return -EIO;
+}
 
 static void cio_apply_config(struct subchannel *sch, struct schib *schib)
 {
diff --git a/drivers/s390/cio/cio.h b/drivers/s390/cio/cio.h
index f0e57ae..939596d 100644
--- a/drivers/s390/cio/cio.h
+++ b/drivers/s390/cio/cio.h
@@ -123,6 +123,7 @@ extern int cio_enable_subchannel(struct subchannel *, u32);
 extern int cio_disable_subchannel (struct subchannel *);
 extern int cio_cancel (struct subchannel *);
 extern int cio_clear (struct subchannel *);
+extern int cio_cancel_halt_clear(struct subchannel *, int *);
 extern int cio_resume (struct subchannel *);
 extern int cio_halt (struct subchannel *);
 extern int cio_start (struct subchannel *, struct ccw1 *, __u8);
diff --git a/drivers/s390/cio/device_fsm.c b/drivers/s390/cio/device_fsm.c
index 9afb5ce..12016e3 100644
--- a/drivers/s390/cio/device_fsm.c
+++ b/drivers/s390/cio/device_fsm.c
@@ -124,14 +124,6 @@ ccw_device_set_timeout(struct ccw_device *cdev, int expires)
 	add_timer(&cdev->private->timer);
 }
 
-/*
- * Cancel running i/o. This is called repeatedly since halt/clear are
- * asynchronous operations. We do one try with cio_cancel, two tries
- * with cio_halt, 255 tries with cio_clear. If everythings fails panic.
- * Returns 0 if device now idle, -ENODEV for device not operational and
- * -EBUSY if an interrupt is expected (either from halt/clear or from a
- * status pending).
- */
 int
 ccw_device_cancel_halt_clear(struct ccw_device *cdev)
 {
@@ -139,44 +131,14 @@ ccw_device_cancel_halt_clear(struct ccw_device *cdev)
 	int ret;
 
 	sch = to_subchannel(cdev->dev.parent);
-	if (cio_update_schib(sch))
-		return -ENODEV; 
-	if (!sch->schib.pmcw.ena)
-		/* Not operational -> done. */
-		return 0;
-	/* Stage 1: cancel io. */
-	if (!(scsw_actl(&sch->schib.scsw) & SCSW_ACTL_HALT_PEND) &&
-	    !(scsw_actl(&sch->schib.scsw) & SCSW_ACTL_CLEAR_PEND)) {
-		if (!scsw_is_tm(&sch->schib.scsw)) {
-			ret = cio_cancel(sch);
-			if (ret != -EINVAL)
-				return ret;
-		}
-		/* cancel io unsuccessful or not applicable (transport mode).
-		 * Continue with asynchronous instructions. */
-		cdev->private->iretry = 3;	/* 3 halt retries. */
-	}
-	if (!(scsw_actl(&sch->schib.scsw) & SCSW_ACTL_CLEAR_PEND)) {
-		/* Stage 2: halt io. */
-		if (cdev->private->iretry) {
-			cdev->private->iretry--;
-			ret = cio_halt(sch);
-			if (ret != -EBUSY)
-				return (ret == 0) ? -EBUSY : ret;
-		}
-		/* halt io unsuccessful. */
-		cdev->private->iretry = 255;	/* 255 clear retries. */
-	}
-	/* Stage 3: clear io. */
-	if (cdev->private->iretry) {
-		cdev->private->iretry--;
-		ret = cio_clear (sch);
-		return (ret == 0) ? -EBUSY : ret;
-	}
-	/* Function was unsuccessful */
-	CIO_MSG_EVENT(0, "0.%x.%04x: could not stop I/O\n",
-		      cdev->private->dev_id.ssid, cdev->private->dev_id.devno);
-	return -EIO;
+	ret = cio_cancel_halt_clear(sch, &cdev->private->iretry);
+
+	if (ret == -EIO)
+		CIO_MSG_EVENT(0, "0.%x.%04x: could not stop I/O\n",
+			      cdev->private->dev_id.ssid,
+			      cdev->private->dev_id.devno);
+
+	return ret;
 }
 
 void ccw_device_update_sense_data(struct ccw_device *cdev)
-- 
2.8.4

^ permalink raw reply related	[flat|nested] 61+ messages in thread

* [Qemu-devel] [PATCH RFC v3 01/15] s390: cio: introduce cio_cancel_halt_clear
@ 2017-02-17  8:29   ` Dong Jia Shi
  0 siblings, 0 replies; 61+ messages in thread
From: Dong Jia Shi @ 2017-02-17  8:29 UTC (permalink / raw)
  To: kvm, linux-s390, qemu-devel
  Cc: bjsdjshi, renxiaof, cornelia.huck, borntraeger, agraf,
	alex.williamson, pmorel, pasic, wkywang

For future code reuse purpose, this decouples the cio code with
the ccw device specific parts from ccw_device_cancel_halt_clear,
and makes a new common I/O interface named cio_cancel_halt_clear.

Signed-off-by: Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com>
Reviewed-by: Pierre Morel <pmorel@linux.vnet.ibm.com>
---
 drivers/s390/cio/cio.c        | 59 +++++++++++++++++++++++++++++++++++++++++++
 drivers/s390/cio/cio.h        |  1 +
 drivers/s390/cio/device_fsm.c | 54 ++++++---------------------------------
 3 files changed, 68 insertions(+), 46 deletions(-)

diff --git a/drivers/s390/cio/cio.c b/drivers/s390/cio/cio.c
index de6fccc..8f86072 100644
--- a/drivers/s390/cio/cio.c
+++ b/drivers/s390/cio/cio.c
@@ -309,6 +309,65 @@ cio_cancel (struct subchannel *sch)
 	}
 }
 
+/**
+ * cio_cancel_halt_clear - Cancel running I/O by performing cancel, halt
+ * and clear ordinally if subchannel is valid.
+ * @sch: subchannel on which to perform the cancel_halt_clear operation
+ * @iretry: the number of the times remained to retry the next operation
+ *
+ * This should be called repeatedly since halt/clear are asynchronous
+ * operations. We do one try with cio_cancel, two tries with cio_halt,
+ * 255 tries with cio_clear. The caller should initialize @iretry with
+ * the value 255 for its first call to this, and keep using the same
+ * @iretry in the subsequent calls until it gets a non -EBUSY return.
+ *
+ * Returns 0 if device now idle, -ENODEV for device not operational,
+ * -EBUSY if an interrupt is expected (either from halt/clear or from a
+ * status pending), and -EIO if out of retries.
+ */
+int cio_cancel_halt_clear(struct subchannel *sch, int *iretry)
+{
+	int ret;
+
+	if (cio_update_schib(sch))
+		return -ENODEV;
+	if (!sch->schib.pmcw.ena)
+		/* Not operational -> done. */
+		return 0;
+	/* Stage 1: cancel io. */
+	if (!(scsw_actl(&sch->schib.scsw) & SCSW_ACTL_HALT_PEND) &&
+	    !(scsw_actl(&sch->schib.scsw) & SCSW_ACTL_CLEAR_PEND)) {
+		if (!scsw_is_tm(&sch->schib.scsw)) {
+			ret = cio_cancel(sch);
+			if (ret != -EINVAL)
+				return ret;
+		}
+		/*
+		 * Cancel io unsuccessful or not applicable (transport mode).
+		 * Continue with asynchronous instructions.
+		 */
+		*iretry = 3;	/* 3 halt retries. */
+	}
+	/* Stage 2: halt io. */
+	if (!(scsw_actl(&sch->schib.scsw) & SCSW_ACTL_CLEAR_PEND)) {
+		if (*iretry) {
+			*iretry -= 1;
+			ret = cio_halt(sch);
+			if (ret != -EBUSY)
+				return (ret == 0) ? -EBUSY : ret;
+		}
+		/* Halt io unsuccessful. */
+		*iretry = 255;	/* 255 clear retries. */
+	}
+	/* Stage 3: clear io. */
+	if (*iretry) {
+		*iretry -= 1;
+		ret = cio_clear(sch);
+		return (ret == 0) ? -EBUSY : ret;
+	}
+	/* Function was unsuccessful */
+	return -EIO;
+}
 
 static void cio_apply_config(struct subchannel *sch, struct schib *schib)
 {
diff --git a/drivers/s390/cio/cio.h b/drivers/s390/cio/cio.h
index f0e57ae..939596d 100644
--- a/drivers/s390/cio/cio.h
+++ b/drivers/s390/cio/cio.h
@@ -123,6 +123,7 @@ extern int cio_enable_subchannel(struct subchannel *, u32);
 extern int cio_disable_subchannel (struct subchannel *);
 extern int cio_cancel (struct subchannel *);
 extern int cio_clear (struct subchannel *);
+extern int cio_cancel_halt_clear(struct subchannel *, int *);
 extern int cio_resume (struct subchannel *);
 extern int cio_halt (struct subchannel *);
 extern int cio_start (struct subchannel *, struct ccw1 *, __u8);
diff --git a/drivers/s390/cio/device_fsm.c b/drivers/s390/cio/device_fsm.c
index 9afb5ce..12016e3 100644
--- a/drivers/s390/cio/device_fsm.c
+++ b/drivers/s390/cio/device_fsm.c
@@ -124,14 +124,6 @@ ccw_device_set_timeout(struct ccw_device *cdev, int expires)
 	add_timer(&cdev->private->timer);
 }
 
-/*
- * Cancel running i/o. This is called repeatedly since halt/clear are
- * asynchronous operations. We do one try with cio_cancel, two tries
- * with cio_halt, 255 tries with cio_clear. If everythings fails panic.
- * Returns 0 if device now idle, -ENODEV for device not operational and
- * -EBUSY if an interrupt is expected (either from halt/clear or from a
- * status pending).
- */
 int
 ccw_device_cancel_halt_clear(struct ccw_device *cdev)
 {
@@ -139,44 +131,14 @@ ccw_device_cancel_halt_clear(struct ccw_device *cdev)
 	int ret;
 
 	sch = to_subchannel(cdev->dev.parent);
-	if (cio_update_schib(sch))
-		return -ENODEV; 
-	if (!sch->schib.pmcw.ena)
-		/* Not operational -> done. */
-		return 0;
-	/* Stage 1: cancel io. */
-	if (!(scsw_actl(&sch->schib.scsw) & SCSW_ACTL_HALT_PEND) &&
-	    !(scsw_actl(&sch->schib.scsw) & SCSW_ACTL_CLEAR_PEND)) {
-		if (!scsw_is_tm(&sch->schib.scsw)) {
-			ret = cio_cancel(sch);
-			if (ret != -EINVAL)
-				return ret;
-		}
-		/* cancel io unsuccessful or not applicable (transport mode).
-		 * Continue with asynchronous instructions. */
-		cdev->private->iretry = 3;	/* 3 halt retries. */
-	}
-	if (!(scsw_actl(&sch->schib.scsw) & SCSW_ACTL_CLEAR_PEND)) {
-		/* Stage 2: halt io. */
-		if (cdev->private->iretry) {
-			cdev->private->iretry--;
-			ret = cio_halt(sch);
-			if (ret != -EBUSY)
-				return (ret == 0) ? -EBUSY : ret;
-		}
-		/* halt io unsuccessful. */
-		cdev->private->iretry = 255;	/* 255 clear retries. */
-	}
-	/* Stage 3: clear io. */
-	if (cdev->private->iretry) {
-		cdev->private->iretry--;
-		ret = cio_clear (sch);
-		return (ret == 0) ? -EBUSY : ret;
-	}
-	/* Function was unsuccessful */
-	CIO_MSG_EVENT(0, "0.%x.%04x: could not stop I/O\n",
-		      cdev->private->dev_id.ssid, cdev->private->dev_id.devno);
-	return -EIO;
+	ret = cio_cancel_halt_clear(sch, &cdev->private->iretry);
+
+	if (ret == -EIO)
+		CIO_MSG_EVENT(0, "0.%x.%04x: could not stop I/O\n",
+			      cdev->private->dev_id.ssid,
+			      cdev->private->dev_id.devno);
+
+	return ret;
 }
 
 void ccw_device_update_sense_data(struct ccw_device *cdev)
-- 
2.8.4

^ permalink raw reply related	[flat|nested] 61+ messages in thread

* [PATCH RFC v3 02/15] s390: cio: export more interfaces
  2017-02-17  8:29 ` [Qemu-devel] " Dong Jia Shi
@ 2017-02-17  8:29   ` Dong Jia Shi
  -1 siblings, 0 replies; 61+ messages in thread
From: Dong Jia Shi @ 2017-02-17  8:29 UTC (permalink / raw)
  To: kvm, linux-s390, qemu-devel
  Cc: bjsdjshi, renxiaof, cornelia.huck, borntraeger, agraf,
	alex.williamson, pmorel, pasic, wkywang

Export the common I/O interfaces those are needed by an I/O
subchannel driver to actually talk to the subchannel.

Signed-off-by: Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com>
Reviewed-by: Pierre Morel <pmorel@linux.vnet.ibm.com>
---
 drivers/s390/cio/cio.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/drivers/s390/cio/cio.c b/drivers/s390/cio/cio.c
index 8f86072..f725461 100644
--- a/drivers/s390/cio/cio.c
+++ b/drivers/s390/cio/cio.c
@@ -170,12 +170,14 @@ cio_start_key (struct subchannel *sch,	/* subchannel structure */
 		return ccode;
 	}
 }
+EXPORT_SYMBOL_GPL(cio_start_key);
 
 int
 cio_start (struct subchannel *sch, struct ccw1 *cpa, __u8 lpm)
 {
 	return cio_start_key(sch, cpa, lpm, PAGE_DEFAULT_KEY);
 }
+EXPORT_SYMBOL_GPL(cio_start);
 
 /*
  * resume suspended I/O operation
@@ -208,6 +210,7 @@ cio_resume (struct subchannel *sch)
 		return -ENODEV;
 	}
 }
+EXPORT_SYMBOL_GPL(cio_resume);
 
 /*
  * halt I/O operation
@@ -241,6 +244,7 @@ cio_halt(struct subchannel *sch)
 		return -ENODEV;
 	}
 }
+EXPORT_SYMBOL_GPL(cio_halt);
 
 /*
  * Clear I/O operation
@@ -271,6 +275,7 @@ cio_clear(struct subchannel *sch)
 		return -ENODEV;
 	}
 }
+EXPORT_SYMBOL_GPL(cio_clear);
 
 /*
  * Function: cio_cancel
@@ -308,6 +313,7 @@ cio_cancel (struct subchannel *sch)
 		return -ENODEV;
 	}
 }
+EXPORT_SYMBOL_GPL(cio_cancel);
 
 /**
  * cio_cancel_halt_clear - Cancel running I/O by performing cancel, halt
@@ -368,6 +374,7 @@ int cio_cancel_halt_clear(struct subchannel *sch, int *iretry)
 	/* Function was unsuccessful */
 	return -EIO;
 }
+EXPORT_SYMBOL_GPL(cio_cancel_halt_clear);
 
 static void cio_apply_config(struct subchannel *sch, struct schib *schib)
 {
@@ -441,6 +448,7 @@ int cio_commit_config(struct subchannel *sch)
 	}
 	return ret;
 }
+EXPORT_SYMBOL_GPL(cio_commit_config);
 
 /**
  * cio_update_schib - Perform stsch and update schib if subchannel is valid.
@@ -1046,6 +1054,7 @@ int cio_tm_start_key(struct subchannel *sch, struct tcw *tcw, u8 lpm, u8 key)
 		return cio_start_handle_notoper(sch, lpm);
 	}
 }
+EXPORT_SYMBOL_GPL(cio_tm_start_key);
 
 /**
  * cio_tm_intrg - perform interrogate function
@@ -1071,3 +1080,4 @@ int cio_tm_intrg(struct subchannel *sch)
 		return -ENODEV;
 	}
 }
+EXPORT_SYMBOL_GPL(cio_tm_intrg);
-- 
2.8.4

^ permalink raw reply related	[flat|nested] 61+ messages in thread

* [Qemu-devel] [PATCH RFC v3 02/15] s390: cio: export more interfaces
@ 2017-02-17  8:29   ` Dong Jia Shi
  0 siblings, 0 replies; 61+ messages in thread
From: Dong Jia Shi @ 2017-02-17  8:29 UTC (permalink / raw)
  To: kvm, linux-s390, qemu-devel
  Cc: bjsdjshi, renxiaof, cornelia.huck, borntraeger, agraf,
	alex.williamson, pmorel, pasic, wkywang

Export the common I/O interfaces those are needed by an I/O
subchannel driver to actually talk to the subchannel.

Signed-off-by: Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com>
Reviewed-by: Pierre Morel <pmorel@linux.vnet.ibm.com>
---
 drivers/s390/cio/cio.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/drivers/s390/cio/cio.c b/drivers/s390/cio/cio.c
index 8f86072..f725461 100644
--- a/drivers/s390/cio/cio.c
+++ b/drivers/s390/cio/cio.c
@@ -170,12 +170,14 @@ cio_start_key (struct subchannel *sch,	/* subchannel structure */
 		return ccode;
 	}
 }
+EXPORT_SYMBOL_GPL(cio_start_key);
 
 int
 cio_start (struct subchannel *sch, struct ccw1 *cpa, __u8 lpm)
 {
 	return cio_start_key(sch, cpa, lpm, PAGE_DEFAULT_KEY);
 }
+EXPORT_SYMBOL_GPL(cio_start);
 
 /*
  * resume suspended I/O operation
@@ -208,6 +210,7 @@ cio_resume (struct subchannel *sch)
 		return -ENODEV;
 	}
 }
+EXPORT_SYMBOL_GPL(cio_resume);
 
 /*
  * halt I/O operation
@@ -241,6 +244,7 @@ cio_halt(struct subchannel *sch)
 		return -ENODEV;
 	}
 }
+EXPORT_SYMBOL_GPL(cio_halt);
 
 /*
  * Clear I/O operation
@@ -271,6 +275,7 @@ cio_clear(struct subchannel *sch)
 		return -ENODEV;
 	}
 }
+EXPORT_SYMBOL_GPL(cio_clear);
 
 /*
  * Function: cio_cancel
@@ -308,6 +313,7 @@ cio_cancel (struct subchannel *sch)
 		return -ENODEV;
 	}
 }
+EXPORT_SYMBOL_GPL(cio_cancel);
 
 /**
  * cio_cancel_halt_clear - Cancel running I/O by performing cancel, halt
@@ -368,6 +374,7 @@ int cio_cancel_halt_clear(struct subchannel *sch, int *iretry)
 	/* Function was unsuccessful */
 	return -EIO;
 }
+EXPORT_SYMBOL_GPL(cio_cancel_halt_clear);
 
 static void cio_apply_config(struct subchannel *sch, struct schib *schib)
 {
@@ -441,6 +448,7 @@ int cio_commit_config(struct subchannel *sch)
 	}
 	return ret;
 }
+EXPORT_SYMBOL_GPL(cio_commit_config);
 
 /**
  * cio_update_schib - Perform stsch and update schib if subchannel is valid.
@@ -1046,6 +1054,7 @@ int cio_tm_start_key(struct subchannel *sch, struct tcw *tcw, u8 lpm, u8 key)
 		return cio_start_handle_notoper(sch, lpm);
 	}
 }
+EXPORT_SYMBOL_GPL(cio_tm_start_key);
 
 /**
  * cio_tm_intrg - perform interrogate function
@@ -1071,3 +1080,4 @@ int cio_tm_intrg(struct subchannel *sch)
 		return -ENODEV;
 	}
 }
+EXPORT_SYMBOL_GPL(cio_tm_intrg);
-- 
2.8.4

^ permalink raw reply related	[flat|nested] 61+ messages in thread

* [PATCH RFC v3 03/15] vfio: ccw: define device_api strings
  2017-02-17  8:29 ` [Qemu-devel] " Dong Jia Shi
@ 2017-02-17  8:29   ` Dong Jia Shi
  -1 siblings, 0 replies; 61+ messages in thread
From: Dong Jia Shi @ 2017-02-17  8:29 UTC (permalink / raw)
  To: kvm, linux-s390, qemu-devel
  Cc: bjsdjshi, renxiaof, cornelia.huck, borntraeger, agraf,
	alex.williamson, pmorel, pasic, wkywang

Define vfio-ccw device API strings. CCW vendor driver using mediated
device framework should use this string for device_api attribute.

Signed-off-by: Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com>
Reviewed-by: Pierre Morel <pmorel@linux.vnet.ibm.com>
---
 include/uapi/linux/vfio.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index 519eff3..6183789 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -212,6 +212,7 @@ struct vfio_device_info {
 #define VFIO_DEVICE_API_PCI_STRING		"vfio-pci"
 #define VFIO_DEVICE_API_PLATFORM_STRING		"vfio-platform"
 #define VFIO_DEVICE_API_AMBA_STRING		"vfio-amba"
+#define VFIO_DEVICE_API_CCW_STRING		"vfio-ccw"
 
 /**
  * VFIO_DEVICE_GET_REGION_INFO - _IOWR(VFIO_TYPE, VFIO_BASE + 8,
-- 
2.8.4

^ permalink raw reply related	[flat|nested] 61+ messages in thread

* [Qemu-devel] [PATCH RFC v3 03/15] vfio: ccw: define device_api strings
@ 2017-02-17  8:29   ` Dong Jia Shi
  0 siblings, 0 replies; 61+ messages in thread
From: Dong Jia Shi @ 2017-02-17  8:29 UTC (permalink / raw)
  To: kvm, linux-s390, qemu-devel
  Cc: bjsdjshi, renxiaof, cornelia.huck, borntraeger, agraf,
	alex.williamson, pmorel, pasic, wkywang

Define vfio-ccw device API strings. CCW vendor driver using mediated
device framework should use this string for device_api attribute.

Signed-off-by: Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com>
Reviewed-by: Pierre Morel <pmorel@linux.vnet.ibm.com>
---
 include/uapi/linux/vfio.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index 519eff3..6183789 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -212,6 +212,7 @@ struct vfio_device_info {
 #define VFIO_DEVICE_API_PCI_STRING		"vfio-pci"
 #define VFIO_DEVICE_API_PLATFORM_STRING		"vfio-platform"
 #define VFIO_DEVICE_API_AMBA_STRING		"vfio-amba"
+#define VFIO_DEVICE_API_CCW_STRING		"vfio-ccw"
 
 /**
  * VFIO_DEVICE_GET_REGION_INFO - _IOWR(VFIO_TYPE, VFIO_BASE + 8,
-- 
2.8.4

^ permalink raw reply related	[flat|nested] 61+ messages in thread

* [PATCH RFC v3 04/15] vfio: ccw: basic implementation for vfio_ccw driver
  2017-02-17  8:29 ` [Qemu-devel] " Dong Jia Shi
@ 2017-02-17  8:29   ` Dong Jia Shi
  -1 siblings, 0 replies; 61+ messages in thread
From: Dong Jia Shi @ 2017-02-17  8:29 UTC (permalink / raw)
  To: kvm, linux-s390, qemu-devel
  Cc: bjsdjshi, renxiaof, cornelia.huck, borntraeger, agraf,
	alex.williamson, pmorel, pasic, wkywang

To make vfio support subchannel devices, we need a css driver for
the vfio subchannels. This patch adds a basic vfio-ccw subchannel
driver for this purpose.

To enable VFIO for vfio-ccw, enable S390_CCW_IOMMU config option
and configure VFIO as required.

Signed-off-by: Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com>
Acked-by: Pierre Morel <pmorel@linux.vnet.ibm.com>
---
 arch/s390/Kconfig                   |  10 ++
 arch/s390/include/asm/isc.h         |   1 +
 drivers/iommu/Kconfig               |   8 ++
 drivers/s390/cio/Makefile           |   3 +
 drivers/s390/cio/vfio_ccw_drv.c     | 262 ++++++++++++++++++++++++++++++++++++
 drivers/s390/cio/vfio_ccw_private.h |  25 ++++
 6 files changed, 309 insertions(+)
 create mode 100644 drivers/s390/cio/vfio_ccw_drv.c
 create mode 100644 drivers/s390/cio/vfio_ccw_private.h

diff --git a/arch/s390/Kconfig b/arch/s390/Kconfig
index c6722112..b920df8 100644
--- a/arch/s390/Kconfig
+++ b/arch/s390/Kconfig
@@ -670,6 +670,16 @@ config EADM_SCH
 	  To compile this driver as a module, choose M here: the
 	  module will be called eadm_sch.
 
+config VFIO_CCW
+	def_tristate n
+	prompt "Support for VFIO-CCW subchannels"
+	depends on S390_CCW_IOMMU && VFIO
+	help
+	  This driver allows usage of VFIO-CCW subchannels.
+
+	  To compile this driver as a module, choose M here: the
+	  module will be called vfio_ccw.
+
 endmenu
 
 menu "Dump support"
diff --git a/arch/s390/include/asm/isc.h b/arch/s390/include/asm/isc.h
index 68d7d68..8a0b721 100644
--- a/arch/s390/include/asm/isc.h
+++ b/arch/s390/include/asm/isc.h
@@ -16,6 +16,7 @@
 #define CONSOLE_ISC 1			/* console I/O subchannel */
 #define EADM_SCH_ISC 4			/* EADM subchannels */
 #define CHSC_SCH_ISC 7			/* CHSC subchannels */
+#define VFIO_CCW_ISC IO_SCH_ISC		/* VFIO-CCW I/O subchannels */
 /* Adapter interrupts. */
 #define QDIO_AIRQ_ISC IO_SCH_ISC	/* I/O subchannel in qdio mode */
 #define PCI_ISC 2			/* PCI I/O subchannels */
diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index 8ee54d7..a5626d4 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -327,6 +327,14 @@ config S390_IOMMU
 	help
 	  Support for the IOMMU API for s390 PCI devices.
 
+config S390_CCW_IOMMU
+	bool "S390 CCW IOMMU Support"
+	depends on S390 && CCW
+	select IOMMU_API
+	help
+	  Enables bits of IOMMU API required by VFIO. The iommu_ops
+	  is not implemented as it is not necessary for VFIO.
+
 config MTK_IOMMU
 	bool "MTK IOMMU Support"
 	depends on ARM || ARM64
diff --git a/drivers/s390/cio/Makefile b/drivers/s390/cio/Makefile
index 3ab9aed..3d7390e 100644
--- a/drivers/s390/cio/Makefile
+++ b/drivers/s390/cio/Makefile
@@ -17,3 +17,6 @@ obj-$(CONFIG_CCWGROUP) += ccwgroup.o
 
 qdio-objs := qdio_main.o qdio_thinint.o qdio_debug.o qdio_setup.o
 obj-$(CONFIG_QDIO) += qdio.o
+
+vfio_ccw-objs += vfio_ccw_drv.o
+obj-$(CONFIG_VFIO_CCW) += vfio_ccw.o
diff --git a/drivers/s390/cio/vfio_ccw_drv.c b/drivers/s390/cio/vfio_ccw_drv.c
new file mode 100644
index 0000000..b068207
--- /dev/null
+++ b/drivers/s390/cio/vfio_ccw_drv.c
@@ -0,0 +1,262 @@
+/*
+ * VFIO based Physical Subchannel device driver
+ *
+ * Copyright IBM Corp. 2017
+ *
+ * Author(s): Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com>
+ *            Xiao Feng Ren <renxiaof@linux.vnet.ibm.com>
+ */
+
+#include <linux/module.h>
+#include <linux/init.h>
+#include <linux/device.h>
+#include <linux/slab.h>
+
+#include <asm/isc.h>
+
+#include "vfio_ccw_private.h"
+
+/*
+ * Helpers
+ */
+static int vfio_ccw_sch_quiesce(struct subchannel *sch)
+{
+	struct vfio_ccw_private *private = dev_get_drvdata(&sch->dev);
+	DECLARE_COMPLETION_ONSTACK(completion);
+	int iretry, ret = 0;
+
+	spin_lock_irq(sch->lock);
+	if (!sch->schib.pmcw.ena)
+		goto out_unlock;
+	ret = cio_disable_subchannel(sch);
+	if (ret != -EBUSY)
+		goto out_unlock;
+
+	do {
+		iretry = 255;
+
+		ret = cio_cancel_halt_clear(sch, &iretry);
+		while (ret == -EBUSY) {
+			/*
+			 * Flushing all I/O and wait the
+			 * cancel/halt/clear completion.
+			 */
+			private->completion = &completion;
+			spin_unlock_irq(sch->lock);
+
+			wait_for_completion(&completion);
+
+			spin_lock_irq(sch->lock);
+			private->completion = NULL;
+			ret = cio_cancel_halt_clear(sch, &iretry);
+		};
+
+		ret = cio_disable_subchannel(sch);
+	} while (ret == -EBUSY);
+
+out_unlock:
+	spin_unlock_irq(sch->lock);
+	return ret;
+}
+
+/*
+ * Sysfs interfaces
+ */
+static ssize_t chpids_show(struct device *dev,
+			   struct device_attribute *attr,
+			   char *buf)
+{
+	struct subchannel *sch = to_subchannel(dev);
+	struct chsc_ssd_info *ssd = &sch->ssd_info;
+	ssize_t ret = 0;
+	int chp;
+	int mask;
+
+	for (chp = 0; chp < 8; chp++) {
+		mask = 0x80 >> chp;
+		if (ssd->path_mask & mask)
+			ret += sprintf(buf + ret, "%02x ", ssd->chpid[chp].id);
+		else
+			ret += sprintf(buf + ret, "00 ");
+	}
+	ret += sprintf(buf+ret, "\n");
+	return ret;
+}
+
+static ssize_t pimpampom_show(struct device *dev,
+			      struct device_attribute *attr,
+			      char *buf)
+{
+	struct subchannel *sch = to_subchannel(dev);
+	struct pmcw *pmcw = &sch->schib.pmcw;
+
+	return sprintf(buf, "%02x %02x %02x\n",
+		       pmcw->pim, pmcw->pam, pmcw->pom);
+}
+
+static DEVICE_ATTR(chpids, 0444, chpids_show, NULL);
+static DEVICE_ATTR(pimpampom, 0444, pimpampom_show, NULL);
+
+static struct attribute *vfio_subchannel_attrs[] = {
+	&dev_attr_chpids.attr,
+	&dev_attr_pimpampom.attr,
+	NULL,
+};
+
+static struct attribute_group vfio_subchannel_attr_group = {
+	.attrs = vfio_subchannel_attrs,
+};
+
+/*
+ * Css driver callbacks
+ */
+static void vfio_ccw_sch_irq(struct subchannel *sch)
+{
+	struct vfio_ccw_private *private = dev_get_drvdata(&sch->dev);
+
+	inc_irq_stat(IRQIO_CIO);
+
+	if (!private)
+		return;
+
+	if (private->completion)
+		complete(private->completion);
+}
+
+static int vfio_ccw_sch_probe(struct subchannel *sch)
+{
+	struct pmcw *pmcw = &sch->schib.pmcw;
+	struct vfio_ccw_private *private;
+	int ret;
+
+	if (pmcw->qf) {
+		dev_warn(&sch->dev, "vfio: ccw: do not support QDIO: %s\n",
+			 dev_name(&sch->dev));
+		return -ENOTTY;
+	}
+
+	private = kzalloc(sizeof(*private), GFP_KERNEL | GFP_DMA);
+	if (!private)
+		return -ENOMEM;
+	private->sch = sch;
+	dev_set_drvdata(&sch->dev, private);
+
+	spin_lock_irq(sch->lock);
+	sch->isc = VFIO_CCW_ISC;
+	ret = cio_enable_subchannel(sch, (u32)(unsigned long)sch);
+	spin_unlock_irq(sch->lock);
+	if (ret)
+		goto out_free;
+
+	ret = sysfs_create_group(&sch->dev.kobj, &vfio_subchannel_attr_group);
+	if (ret)
+		goto out_disable;
+
+	return 0;
+
+out_disable:
+	cio_disable_subchannel(sch);
+out_free:
+	dev_set_drvdata(&sch->dev, NULL);
+	kfree(private);
+	return ret;
+}
+
+static int vfio_ccw_sch_remove(struct subchannel *sch)
+{
+	struct vfio_ccw_private *private = dev_get_drvdata(&sch->dev);
+
+	vfio_ccw_sch_quiesce(sch);
+
+	sysfs_remove_group(&sch->dev.kobj, &vfio_subchannel_attr_group);
+
+	dev_set_drvdata(&sch->dev, NULL);
+
+	kfree(private);
+
+	return 0;
+}
+
+static void vfio_ccw_sch_shutdown(struct subchannel *sch)
+{
+	vfio_ccw_sch_quiesce(sch);
+}
+
+/**
+ * vfio_ccw_sch_event - process subchannel event
+ * @sch: subchannel
+ * @process: non-zero if function is called in process context
+ *
+ * An unspecified event occurred for this subchannel. Adjust data according
+ * to the current operational state of the subchannel. Return zero when the
+ * event has been handled sufficiently or -EAGAIN when this function should
+ * be called again in process context.
+ */
+static int vfio_ccw_sch_event(struct subchannel *sch, int process)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(sch->lock, flags);
+	if (!device_is_registered(&sch->dev))
+		goto out_unlock;
+
+	if (work_pending(&sch->todo_work))
+		goto out_unlock;
+
+	if (cio_update_schib(sch)) {
+		/* Not operational. */
+		css_sched_sch_todo(sch, SCH_TODO_UNREG);
+
+		/*
+		 * TODO:
+		 * Probably we should send the machine check to the guest.
+		 */
+		goto out_unlock;
+	}
+
+out_unlock:
+	spin_unlock_irqrestore(sch->lock, flags);
+
+	return 0;
+}
+
+static struct css_device_id vfio_ccw_sch_ids[] = {
+	{ .match_flags = 0x1, .type = SUBCHANNEL_TYPE_IO, },
+	{ /* end of list */ },
+};
+MODULE_DEVICE_TABLE(css, vfio_ccw_sch_ids);
+
+static struct css_driver vfio_ccw_sch_driver = {
+	.drv = {
+		.name = "vfio_ccw",
+		.owner = THIS_MODULE,
+	},
+	.subchannel_type = vfio_ccw_sch_ids,
+	.irq = vfio_ccw_sch_irq,
+	.probe = vfio_ccw_sch_probe,
+	.remove = vfio_ccw_sch_remove,
+	.shutdown = vfio_ccw_sch_shutdown,
+	.sch_event = vfio_ccw_sch_event,
+};
+
+static int __init vfio_ccw_sch_init(void)
+{
+	int ret;
+
+	isc_register(VFIO_CCW_ISC);
+	ret = css_driver_register(&vfio_ccw_sch_driver);
+	if (ret)
+		isc_unregister(VFIO_CCW_ISC);
+
+	return ret;
+}
+
+static void __exit vfio_ccw_sch_exit(void)
+{
+	css_driver_unregister(&vfio_ccw_sch_driver);
+	isc_unregister(VFIO_CCW_ISC);
+}
+module_init(vfio_ccw_sch_init);
+module_exit(vfio_ccw_sch_exit);
+
+MODULE_LICENSE("GPL v2");
diff --git a/drivers/s390/cio/vfio_ccw_private.h b/drivers/s390/cio/vfio_ccw_private.h
new file mode 100644
index 0000000..38d69a5
--- /dev/null
+++ b/drivers/s390/cio/vfio_ccw_private.h
@@ -0,0 +1,25 @@
+/*
+ * Private stuff for vfio_ccw driver
+ *
+ * Copyright IBM Corp. 2017
+ *
+ * Author(s): Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com>
+ *            Xiao Feng Ren <renxiaof@linux.vnet.ibm.com>
+ */
+
+#ifndef _VFIO_CCW_PRIVATE_H_
+#define _VFIO_CCW_PRIVATE_H_
+
+#include "css.h"
+
+/**
+ * struct vfio_ccw_private
+ * @sch: pointer to the subchannel
+ * @completion: synchronization helper of the I/O completion
+ */
+struct vfio_ccw_private {
+	struct subchannel	*sch;
+	struct completion	*completion;
+} __aligned(8);
+
+#endif
-- 
2.8.4

^ permalink raw reply related	[flat|nested] 61+ messages in thread

* [Qemu-devel] [PATCH RFC v3 04/15] vfio: ccw: basic implementation for vfio_ccw driver
@ 2017-02-17  8:29   ` Dong Jia Shi
  0 siblings, 0 replies; 61+ messages in thread
From: Dong Jia Shi @ 2017-02-17  8:29 UTC (permalink / raw)
  To: kvm, linux-s390, qemu-devel
  Cc: bjsdjshi, renxiaof, cornelia.huck, borntraeger, agraf,
	alex.williamson, pmorel, pasic, wkywang

To make vfio support subchannel devices, we need a css driver for
the vfio subchannels. This patch adds a basic vfio-ccw subchannel
driver for this purpose.

To enable VFIO for vfio-ccw, enable S390_CCW_IOMMU config option
and configure VFIO as required.

Signed-off-by: Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com>
Acked-by: Pierre Morel <pmorel@linux.vnet.ibm.com>
---
 arch/s390/Kconfig                   |  10 ++
 arch/s390/include/asm/isc.h         |   1 +
 drivers/iommu/Kconfig               |   8 ++
 drivers/s390/cio/Makefile           |   3 +
 drivers/s390/cio/vfio_ccw_drv.c     | 262 ++++++++++++++++++++++++++++++++++++
 drivers/s390/cio/vfio_ccw_private.h |  25 ++++
 6 files changed, 309 insertions(+)
 create mode 100644 drivers/s390/cio/vfio_ccw_drv.c
 create mode 100644 drivers/s390/cio/vfio_ccw_private.h

diff --git a/arch/s390/Kconfig b/arch/s390/Kconfig
index c6722112..b920df8 100644
--- a/arch/s390/Kconfig
+++ b/arch/s390/Kconfig
@@ -670,6 +670,16 @@ config EADM_SCH
 	  To compile this driver as a module, choose M here: the
 	  module will be called eadm_sch.
 
+config VFIO_CCW
+	def_tristate n
+	prompt "Support for VFIO-CCW subchannels"
+	depends on S390_CCW_IOMMU && VFIO
+	help
+	  This driver allows usage of VFIO-CCW subchannels.
+
+	  To compile this driver as a module, choose M here: the
+	  module will be called vfio_ccw.
+
 endmenu
 
 menu "Dump support"
diff --git a/arch/s390/include/asm/isc.h b/arch/s390/include/asm/isc.h
index 68d7d68..8a0b721 100644
--- a/arch/s390/include/asm/isc.h
+++ b/arch/s390/include/asm/isc.h
@@ -16,6 +16,7 @@
 #define CONSOLE_ISC 1			/* console I/O subchannel */
 #define EADM_SCH_ISC 4			/* EADM subchannels */
 #define CHSC_SCH_ISC 7			/* CHSC subchannels */
+#define VFIO_CCW_ISC IO_SCH_ISC		/* VFIO-CCW I/O subchannels */
 /* Adapter interrupts. */
 #define QDIO_AIRQ_ISC IO_SCH_ISC	/* I/O subchannel in qdio mode */
 #define PCI_ISC 2			/* PCI I/O subchannels */
diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index 8ee54d7..a5626d4 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -327,6 +327,14 @@ config S390_IOMMU
 	help
 	  Support for the IOMMU API for s390 PCI devices.
 
+config S390_CCW_IOMMU
+	bool "S390 CCW IOMMU Support"
+	depends on S390 && CCW
+	select IOMMU_API
+	help
+	  Enables bits of IOMMU API required by VFIO. The iommu_ops
+	  is not implemented as it is not necessary for VFIO.
+
 config MTK_IOMMU
 	bool "MTK IOMMU Support"
 	depends on ARM || ARM64
diff --git a/drivers/s390/cio/Makefile b/drivers/s390/cio/Makefile
index 3ab9aed..3d7390e 100644
--- a/drivers/s390/cio/Makefile
+++ b/drivers/s390/cio/Makefile
@@ -17,3 +17,6 @@ obj-$(CONFIG_CCWGROUP) += ccwgroup.o
 
 qdio-objs := qdio_main.o qdio_thinint.o qdio_debug.o qdio_setup.o
 obj-$(CONFIG_QDIO) += qdio.o
+
+vfio_ccw-objs += vfio_ccw_drv.o
+obj-$(CONFIG_VFIO_CCW) += vfio_ccw.o
diff --git a/drivers/s390/cio/vfio_ccw_drv.c b/drivers/s390/cio/vfio_ccw_drv.c
new file mode 100644
index 0000000..b068207
--- /dev/null
+++ b/drivers/s390/cio/vfio_ccw_drv.c
@@ -0,0 +1,262 @@
+/*
+ * VFIO based Physical Subchannel device driver
+ *
+ * Copyright IBM Corp. 2017
+ *
+ * Author(s): Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com>
+ *            Xiao Feng Ren <renxiaof@linux.vnet.ibm.com>
+ */
+
+#include <linux/module.h>
+#include <linux/init.h>
+#include <linux/device.h>
+#include <linux/slab.h>
+
+#include <asm/isc.h>
+
+#include "vfio_ccw_private.h"
+
+/*
+ * Helpers
+ */
+static int vfio_ccw_sch_quiesce(struct subchannel *sch)
+{
+	struct vfio_ccw_private *private = dev_get_drvdata(&sch->dev);
+	DECLARE_COMPLETION_ONSTACK(completion);
+	int iretry, ret = 0;
+
+	spin_lock_irq(sch->lock);
+	if (!sch->schib.pmcw.ena)
+		goto out_unlock;
+	ret = cio_disable_subchannel(sch);
+	if (ret != -EBUSY)
+		goto out_unlock;
+
+	do {
+		iretry = 255;
+
+		ret = cio_cancel_halt_clear(sch, &iretry);
+		while (ret == -EBUSY) {
+			/*
+			 * Flushing all I/O and wait the
+			 * cancel/halt/clear completion.
+			 */
+			private->completion = &completion;
+			spin_unlock_irq(sch->lock);
+
+			wait_for_completion(&completion);
+
+			spin_lock_irq(sch->lock);
+			private->completion = NULL;
+			ret = cio_cancel_halt_clear(sch, &iretry);
+		};
+
+		ret = cio_disable_subchannel(sch);
+	} while (ret == -EBUSY);
+
+out_unlock:
+	spin_unlock_irq(sch->lock);
+	return ret;
+}
+
+/*
+ * Sysfs interfaces
+ */
+static ssize_t chpids_show(struct device *dev,
+			   struct device_attribute *attr,
+			   char *buf)
+{
+	struct subchannel *sch = to_subchannel(dev);
+	struct chsc_ssd_info *ssd = &sch->ssd_info;
+	ssize_t ret = 0;
+	int chp;
+	int mask;
+
+	for (chp = 0; chp < 8; chp++) {
+		mask = 0x80 >> chp;
+		if (ssd->path_mask & mask)
+			ret += sprintf(buf + ret, "%02x ", ssd->chpid[chp].id);
+		else
+			ret += sprintf(buf + ret, "00 ");
+	}
+	ret += sprintf(buf+ret, "\n");
+	return ret;
+}
+
+static ssize_t pimpampom_show(struct device *dev,
+			      struct device_attribute *attr,
+			      char *buf)
+{
+	struct subchannel *sch = to_subchannel(dev);
+	struct pmcw *pmcw = &sch->schib.pmcw;
+
+	return sprintf(buf, "%02x %02x %02x\n",
+		       pmcw->pim, pmcw->pam, pmcw->pom);
+}
+
+static DEVICE_ATTR(chpids, 0444, chpids_show, NULL);
+static DEVICE_ATTR(pimpampom, 0444, pimpampom_show, NULL);
+
+static struct attribute *vfio_subchannel_attrs[] = {
+	&dev_attr_chpids.attr,
+	&dev_attr_pimpampom.attr,
+	NULL,
+};
+
+static struct attribute_group vfio_subchannel_attr_group = {
+	.attrs = vfio_subchannel_attrs,
+};
+
+/*
+ * Css driver callbacks
+ */
+static void vfio_ccw_sch_irq(struct subchannel *sch)
+{
+	struct vfio_ccw_private *private = dev_get_drvdata(&sch->dev);
+
+	inc_irq_stat(IRQIO_CIO);
+
+	if (!private)
+		return;
+
+	if (private->completion)
+		complete(private->completion);
+}
+
+static int vfio_ccw_sch_probe(struct subchannel *sch)
+{
+	struct pmcw *pmcw = &sch->schib.pmcw;
+	struct vfio_ccw_private *private;
+	int ret;
+
+	if (pmcw->qf) {
+		dev_warn(&sch->dev, "vfio: ccw: do not support QDIO: %s\n",
+			 dev_name(&sch->dev));
+		return -ENOTTY;
+	}
+
+	private = kzalloc(sizeof(*private), GFP_KERNEL | GFP_DMA);
+	if (!private)
+		return -ENOMEM;
+	private->sch = sch;
+	dev_set_drvdata(&sch->dev, private);
+
+	spin_lock_irq(sch->lock);
+	sch->isc = VFIO_CCW_ISC;
+	ret = cio_enable_subchannel(sch, (u32)(unsigned long)sch);
+	spin_unlock_irq(sch->lock);
+	if (ret)
+		goto out_free;
+
+	ret = sysfs_create_group(&sch->dev.kobj, &vfio_subchannel_attr_group);
+	if (ret)
+		goto out_disable;
+
+	return 0;
+
+out_disable:
+	cio_disable_subchannel(sch);
+out_free:
+	dev_set_drvdata(&sch->dev, NULL);
+	kfree(private);
+	return ret;
+}
+
+static int vfio_ccw_sch_remove(struct subchannel *sch)
+{
+	struct vfio_ccw_private *private = dev_get_drvdata(&sch->dev);
+
+	vfio_ccw_sch_quiesce(sch);
+
+	sysfs_remove_group(&sch->dev.kobj, &vfio_subchannel_attr_group);
+
+	dev_set_drvdata(&sch->dev, NULL);
+
+	kfree(private);
+
+	return 0;
+}
+
+static void vfio_ccw_sch_shutdown(struct subchannel *sch)
+{
+	vfio_ccw_sch_quiesce(sch);
+}
+
+/**
+ * vfio_ccw_sch_event - process subchannel event
+ * @sch: subchannel
+ * @process: non-zero if function is called in process context
+ *
+ * An unspecified event occurred for this subchannel. Adjust data according
+ * to the current operational state of the subchannel. Return zero when the
+ * event has been handled sufficiently or -EAGAIN when this function should
+ * be called again in process context.
+ */
+static int vfio_ccw_sch_event(struct subchannel *sch, int process)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(sch->lock, flags);
+	if (!device_is_registered(&sch->dev))
+		goto out_unlock;
+
+	if (work_pending(&sch->todo_work))
+		goto out_unlock;
+
+	if (cio_update_schib(sch)) {
+		/* Not operational. */
+		css_sched_sch_todo(sch, SCH_TODO_UNREG);
+
+		/*
+		 * TODO:
+		 * Probably we should send the machine check to the guest.
+		 */
+		goto out_unlock;
+	}
+
+out_unlock:
+	spin_unlock_irqrestore(sch->lock, flags);
+
+	return 0;
+}
+
+static struct css_device_id vfio_ccw_sch_ids[] = {
+	{ .match_flags = 0x1, .type = SUBCHANNEL_TYPE_IO, },
+	{ /* end of list */ },
+};
+MODULE_DEVICE_TABLE(css, vfio_ccw_sch_ids);
+
+static struct css_driver vfio_ccw_sch_driver = {
+	.drv = {
+		.name = "vfio_ccw",
+		.owner = THIS_MODULE,
+	},
+	.subchannel_type = vfio_ccw_sch_ids,
+	.irq = vfio_ccw_sch_irq,
+	.probe = vfio_ccw_sch_probe,
+	.remove = vfio_ccw_sch_remove,
+	.shutdown = vfio_ccw_sch_shutdown,
+	.sch_event = vfio_ccw_sch_event,
+};
+
+static int __init vfio_ccw_sch_init(void)
+{
+	int ret;
+
+	isc_register(VFIO_CCW_ISC);
+	ret = css_driver_register(&vfio_ccw_sch_driver);
+	if (ret)
+		isc_unregister(VFIO_CCW_ISC);
+
+	return ret;
+}
+
+static void __exit vfio_ccw_sch_exit(void)
+{
+	css_driver_unregister(&vfio_ccw_sch_driver);
+	isc_unregister(VFIO_CCW_ISC);
+}
+module_init(vfio_ccw_sch_init);
+module_exit(vfio_ccw_sch_exit);
+
+MODULE_LICENSE("GPL v2");
diff --git a/drivers/s390/cio/vfio_ccw_private.h b/drivers/s390/cio/vfio_ccw_private.h
new file mode 100644
index 0000000..38d69a5
--- /dev/null
+++ b/drivers/s390/cio/vfio_ccw_private.h
@@ -0,0 +1,25 @@
+/*
+ * Private stuff for vfio_ccw driver
+ *
+ * Copyright IBM Corp. 2017
+ *
+ * Author(s): Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com>
+ *            Xiao Feng Ren <renxiaof@linux.vnet.ibm.com>
+ */
+
+#ifndef _VFIO_CCW_PRIVATE_H_
+#define _VFIO_CCW_PRIVATE_H_
+
+#include "css.h"
+
+/**
+ * struct vfio_ccw_private
+ * @sch: pointer to the subchannel
+ * @completion: synchronization helper of the I/O completion
+ */
+struct vfio_ccw_private {
+	struct subchannel	*sch;
+	struct completion	*completion;
+} __aligned(8);
+
+#endif
-- 
2.8.4

^ permalink raw reply related	[flat|nested] 61+ messages in thread

* [PATCH RFC v3 05/15] vfio: ccw: introduce channel program interfaces
  2017-02-17  8:29 ` [Qemu-devel] " Dong Jia Shi
@ 2017-02-17  8:29   ` Dong Jia Shi
  -1 siblings, 0 replies; 61+ messages in thread
From: Dong Jia Shi @ 2017-02-17  8:29 UTC (permalink / raw)
  To: kvm, linux-s390, qemu-devel
  Cc: bjsdjshi, renxiaof, cornelia.huck, borntraeger, agraf,
	alex.williamson, pmorel, pasic, wkywang

Introduce ccwchain structure and helper functions that can be used to
handle a channel program issued from a virtual machine.

The following limitations apply:
1. Supports only prefetch enabled mode.
2. Supports idal(c64) ccw chaining.
3. Supports 4k idaw.
4. Supports ccw1.
5. Supports direct ccw chaining by translating them to idal ccws.

CCW translation requires to leverage the vfio_(un)pin_pages interfaces
to pin/unpin sets of mem pages frequently. Currently we have a lack of
support to do this in an efficient way. So we introduce pfn_array data
structure and helper functions to handle pin/unpin operations here.

Signed-off-by: Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com>
---
 drivers/s390/cio/Makefile      |   2 +-
 drivers/s390/cio/vfio_ccw_cp.c | 816 +++++++++++++++++++++++++++++++++++++++++
 drivers/s390/cio/vfio_ccw_cp.h |  42 +++
 3 files changed, 859 insertions(+), 1 deletion(-)
 create mode 100644 drivers/s390/cio/vfio_ccw_cp.c
 create mode 100644 drivers/s390/cio/vfio_ccw_cp.h

diff --git a/drivers/s390/cio/Makefile b/drivers/s390/cio/Makefile
index 3d7390e..1bec279 100644
--- a/drivers/s390/cio/Makefile
+++ b/drivers/s390/cio/Makefile
@@ -18,5 +18,5 @@ obj-$(CONFIG_CCWGROUP) += ccwgroup.o
 qdio-objs := qdio_main.o qdio_thinint.o qdio_debug.o qdio_setup.o
 obj-$(CONFIG_QDIO) += qdio.o
 
-vfio_ccw-objs += vfio_ccw_drv.o
+vfio_ccw-objs += vfio_ccw_drv.o vfio_ccw_cp.o
 obj-$(CONFIG_VFIO_CCW) += vfio_ccw.o
diff --git a/drivers/s390/cio/vfio_ccw_cp.c b/drivers/s390/cio/vfio_ccw_cp.c
new file mode 100644
index 0000000..16bbb54
--- /dev/null
+++ b/drivers/s390/cio/vfio_ccw_cp.c
@@ -0,0 +1,816 @@
+/*
+ * channel program interfaces
+ *
+ * Copyright IBM Corp. 2017
+ *
+ * Author(s): Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com>
+ *            Xiao Feng Ren <renxiaof@linux.vnet.ibm.com>
+ */
+
+#include <linux/mm.h>
+#include <linux/slab.h>
+#include <linux/iommu.h>
+#include <linux/vfio.h>
+#include <asm/idals.h>
+
+#include "vfio_ccw_cp.h"
+
+/*
+ * Max length for ccw chain.
+ * XXX: Limit to 256, need to check more?
+ */
+#define CCWCHAIN_LEN_MAX	256
+
+struct pfn_array {
+	unsigned long		pa_iova;
+	unsigned long		*pa_iova_pfn;
+	unsigned long		*pa_pfn;
+	int			pa_nr;
+};
+
+struct pfn_array_table {
+	struct pfn_array	*pat_pa;
+	int			pat_nr;
+};
+
+struct ccwchain {
+	struct list_head	next;
+	struct ccw1		*ch_ccw;
+	/* Guest physical address of the current chain. */
+	u64			ch_iova;
+	/* Count of the valid ccws in chain. */
+	int			ch_len;
+	/* Pinned PAGEs for the original data. */
+	struct pfn_array_table	*ch_pat;
+};
+
+/*
+ * pfn_array_pin() - pin user pages in memory
+ * @pa: pfn_array on which to perform the operation
+ * @mdev: the mediated device to perform pin/unpin operations
+ *
+ * Attempt to pin user pages in memory.
+ *
+ * Usage of pfn_array:
+ * @pa->pa_iova     starting guest physical I/O address. Assigned by caller.
+ * @pa->pa_iova_pfn array that stores PFNs of the pages need to pin. Allocated
+ *                  by caller.
+ * @pa->pa_pfn      array that receives PFNs of the pages pinned. Allocated by
+ *                  caller.
+ * @pa->pa_nr       number of pages from @pa->pa_iova to pin. Assigned by
+ *                  caller.
+ *                  number of pages pinned. Assigned by callee.
+ *
+ * Returns:
+ *   Number of pages pinned on success.
+ *   If @pa->pa_nr is 0 or negative, returns 0.
+ *   If no pages were pinned, returns -errno.
+ */
+static int pfn_array_pin(struct pfn_array *pa, struct device *mdev)
+{
+	int i, ret;
+
+	if (pa->pa_nr <= 0) {
+		pa->pa_nr = 0;
+		return 0;
+	}
+
+	pa->pa_iova_pfn[0] = pa->pa_iova >> PAGE_SHIFT;
+	for (i = 1; i < pa->pa_nr; i++)
+		pa->pa_iova_pfn[i] = pa->pa_iova_pfn[i - 1] + 1;
+
+	ret = vfio_pin_pages(mdev, pa->pa_iova_pfn, pa->pa_nr,
+			     IOMMU_READ | IOMMU_WRITE, pa->pa_pfn);
+
+	if (ret > 0 && ret != pa->pa_nr) {
+		vfio_unpin_pages(mdev, pa->pa_iova_pfn, ret);
+		pa->pa_nr = 0;
+		return 0;
+	}
+
+	return ret;
+}
+
+/* Unpin the pages before releasing the memory. */
+static void pfn_array_unpin_free(struct pfn_array *pa, struct device *mdev)
+{
+	vfio_unpin_pages(mdev, pa->pa_iova_pfn, pa->pa_nr);
+	pa->pa_nr = 0;
+	kfree(pa->pa_iova_pfn);
+}
+
+/* Alloc memory for PFNs, then pin pages with them. */
+static int pfn_array_alloc_pin(struct pfn_array *pa, struct device *mdev,
+			       u64 iova, unsigned int len)
+{
+	int ret = 0;
+
+	if (!len || pa->pa_nr)
+		return -EINVAL;
+
+	pa->pa_iova = iova;
+
+	pa->pa_nr = ((iova & ~PAGE_MASK) + len + (PAGE_SIZE - 1)) >> PAGE_SHIFT;
+	if (!pa->pa_nr)
+		return -EINVAL;
+
+	pa->pa_iova_pfn = kcalloc(pa->pa_nr,
+				  sizeof(*pa->pa_iova_pfn) +
+				  sizeof(*pa->pa_pfn),
+				  GFP_KERNEL);
+	if (unlikely(!pa->pa_iova_pfn))
+		return -ENOMEM;
+	pa->pa_pfn = pa->pa_iova_pfn + pa->pa_nr;
+
+	ret = pfn_array_pin(pa, mdev);
+
+	if (ret > 0)
+		return ret;
+	else if (!ret)
+		ret = -EINVAL;
+
+	kfree(pa->pa_iova_pfn);
+
+	return ret;
+}
+
+static int pfn_array_table_init(struct pfn_array_table *pat, int nr)
+{
+	pat->pat_pa = kcalloc(nr, sizeof(*pat->pat_pa), GFP_KERNEL);
+	if (unlikely(ZERO_OR_NULL_PTR(pat->pat_pa))) {
+		pat->pat_nr = 0;
+		return -ENOMEM;
+	}
+
+	pat->pat_nr = nr;
+
+	return 0;
+}
+
+static void pfn_array_table_unpin_free(struct pfn_array_table *pat,
+				       struct device *mdev)
+{
+	int i;
+
+	for (i = 0; i < pat->pat_nr; i++)
+		pfn_array_unpin_free(pat->pat_pa + i, mdev);
+
+	if (pat->pat_nr) {
+		kfree(pat->pat_pa);
+		pat->pat_pa = NULL;
+		pat->pat_nr = 0;
+	}
+}
+
+static bool pfn_array_table_iova_pinned(struct pfn_array_table *pat,
+					unsigned long iova)
+{
+	struct pfn_array *pa = pat->pat_pa;
+	unsigned long iova_pfn = iova >> PAGE_SHIFT;
+	int i, j;
+
+	for (i = 0; i < pat->pat_nr; i++, pa++)
+		for (j = 0; j < pa->pa_nr; j++)
+			if (pa->pa_iova_pfn[i] == iova_pfn)
+				return true;
+
+	return false;
+}
+/* Create the list idal words for a pfn_array_table. */
+static inline void pfn_array_table_idal_create_words(
+	struct pfn_array_table *pat,
+	unsigned long *idaws)
+{
+	struct pfn_array *pa;
+	int i, j, k;
+
+	/*
+	 * Idal words (execept the first one) rely on the memory being 4k
+	 * aligned. If a user virtual address is 4K aligned, then it's
+	 * corresponding kernel physical address will also be 4K aligned. Thus
+	 * there will be no problem here to simply use the phys to create an
+	 * idaw.
+	 */
+	k = 0;
+	for (i = 0; i < pat->pat_nr; i++) {
+		pa = pat->pat_pa + i;
+		for (j = 0; j < pa->pa_nr; j++) {
+			idaws[k] = pa->pa_pfn[j] << PAGE_SHIFT;
+			if (k == 0)
+				idaws[k] += pa->pa_iova & (PAGE_SIZE - 1);
+			k++;
+		}
+	}
+}
+
+
+/*
+ * Within the domain (@mdev), copy @n bytes from a guest physical
+ * address (@iova) to a host physical address (@to).
+ */
+static long copy_from_iova(struct device *mdev,
+			   void *to, u64 iova,
+			   unsigned long n)
+{
+	struct pfn_array pa = {0};
+	u64 from;
+	int i, ret;
+	unsigned long l, m;
+
+	ret = pfn_array_alloc_pin(&pa, mdev, iova, n);
+	if (ret <= 0)
+		return ret;
+
+	l = n;
+	for (i = 0; i < pa.pa_nr; i++) {
+		from = pa.pa_pfn[i] << PAGE_SHIFT;
+		m = PAGE_SIZE;
+		if (i == 0) {
+			from += iova & (PAGE_SIZE - 1);
+			m -= iova & (PAGE_SIZE - 1);
+		}
+
+		m = min(l, m);
+		memcpy(to + (n - l), (void *)from, m);
+
+		l -= m;
+		if (l == 0)
+			break;
+	}
+
+	pfn_array_unpin_free(&pa, mdev);
+
+	return l;
+}
+
+static long copy_ccw_from_iova(struct channel_program *cp,
+			       struct ccw1 *to, u64 iova,
+			       unsigned long len)
+{
+	return copy_from_iova(cp->mdev, to, iova, len * sizeof(struct ccw1));
+}
+
+/*
+ * Helpers to operate ccwchain.
+ */
+#define ccw_is_test(_ccw) (((_ccw)->cmd_code & 0x0F) == 0)
+
+#define ccw_is_noop(_ccw) ((_ccw)->cmd_code == CCW_CMD_NOOP)
+
+#define ccw_is_tic(_ccw) ((_ccw)->cmd_code == CCW_CMD_TIC)
+
+#define ccw_is_idal(_ccw) ((_ccw)->flags & CCW_FLAG_IDA)
+
+
+#define ccw_is_chain(_ccw) ((_ccw)->flags & (CCW_FLAG_CC | CCW_FLAG_DC))
+
+static struct ccwchain *ccwchain_alloc(struct channel_program *cp, int len)
+{
+	struct ccwchain *chain;
+	void *data;
+	size_t size;
+
+	/* Make ccw address aligned to 8. */
+	size = ((sizeof(*chain) + 7L) & -8L) +
+		sizeof(*chain->ch_ccw) * len +
+		sizeof(*chain->ch_pat) * len;
+	chain = kzalloc(size, GFP_DMA | GFP_KERNEL);
+	if (!chain)
+		return NULL;
+
+	data = (u8 *)chain + ((sizeof(*chain) + 7L) & -8L);
+	chain->ch_ccw = (struct ccw1 *)data;
+
+	data = (u8 *)(chain->ch_ccw) + sizeof(*chain->ch_ccw) * len;
+	chain->ch_pat = (struct pfn_array_table *)data;
+
+	chain->ch_len = len;
+
+	list_add_tail(&chain->next, &cp->ccwchain_list);
+
+	return chain;
+}
+
+static void ccwchain_free(struct ccwchain *chain)
+{
+	list_del(&chain->next);
+	kfree(chain);
+}
+
+/* Free resource for a ccw that allocated memory for its cda. */
+static void ccwchain_cda_free(struct ccwchain *chain, int idx)
+{
+	struct ccw1 *ccw = chain->ch_ccw + idx;
+
+	if (!ccw->count)
+		return;
+
+	kfree((void *)(u64)ccw->cda);
+}
+
+/* Unpin the pages then free the memory resources. */
+static void cp_unpin_free(struct channel_program *cp)
+{
+	struct ccwchain *chain, *temp;
+	int i;
+
+	list_for_each_entry_safe(chain, temp, &cp->ccwchain_list, next) {
+		for (i = 0; i < chain->ch_len; i++) {
+			pfn_array_table_unpin_free(chain->ch_pat + i,
+						   cp->mdev);
+			ccwchain_cda_free(chain, i);
+		}
+		ccwchain_free(chain);
+	}
+}
+
+/**
+ * ccwchain_calc_length - calculate the length of the ccw chain.
+ * @iova: guest physical address of the target ccw chain
+ * @cp: channel_program on which to perform the operation
+ *
+ * This is the chain length not considering any TICs.
+ * You need to do a new round for each TIC target.
+ *
+ * Returns: the length of the ccw chain or -errno.
+ */
+static int ccwchain_calc_length(u64 iova, struct channel_program *cp)
+{
+	struct ccw1 *ccw, *p;
+	int cnt;
+
+	/*
+	 * Copy current chain from guest to host kernel.
+	 * Currently the chain length is limited to CCWCHAIN_LEN_MAX (256).
+	 * So copying 2K is enough (safe).
+	 */
+	p = ccw = kcalloc(CCWCHAIN_LEN_MAX, sizeof(*ccw), GFP_KERNEL);
+	if (!ccw)
+		return -ENOMEM;
+
+	cnt = copy_ccw_from_iova(cp, ccw, iova, CCWCHAIN_LEN_MAX);
+	if (cnt) {
+		kfree(ccw);
+		return cnt;
+	}
+
+	cnt = 0;
+	do {
+		cnt++;
+
+		if ((!ccw_is_chain(ccw)) && (!ccw_is_tic(ccw)))
+			break;
+
+		ccw++;
+	} while (cnt < CCWCHAIN_LEN_MAX + 1);
+
+	if (cnt == CCWCHAIN_LEN_MAX + 1)
+		cnt = -EINVAL;
+
+	kfree(p);
+	return cnt;
+}
+
+static int tic_target_chain_exists(struct ccw1 *tic, struct channel_program *cp)
+{
+	struct ccwchain *chain;
+	u32 ccw_head, ccw_tail;
+
+	list_for_each_entry(chain, &cp->ccwchain_list, next) {
+		ccw_head = chain->ch_iova;
+		ccw_tail = ccw_head + (chain->ch_len - 1) * sizeof(struct ccw1);
+
+		if ((ccw_head <= tic->cda) && (tic->cda <= ccw_tail))
+			return 1;
+	}
+
+	return 0;
+}
+
+static int ccwchain_loop_tic(struct ccwchain *chain,
+			     struct channel_program *cp);
+
+static int ccwchain_handle_tic(struct ccw1 *tic, struct channel_program *cp)
+{
+	struct ccwchain *chain;
+	int len, ret;
+
+	/* May transfer to an existing chain. */
+	if (tic_target_chain_exists(tic, cp))
+		return 0;
+
+	/* Get chain length. */
+	len = ccwchain_calc_length(tic->cda, cp);
+	if (len < 0)
+		return len;
+
+	/* Need alloc a new chain for this one. */
+	chain = ccwchain_alloc(cp, len);
+	if (!chain)
+		return -ENOMEM;
+	chain->ch_iova = tic->cda;
+
+	/* Copy the new chain from user. */
+	ret = copy_ccw_from_iova(cp, chain->ch_ccw, tic->cda, len);
+	if (ret) {
+		ccwchain_free(chain);
+		return ret;
+	}
+
+	/* Loop for tics on this new chain. */
+	return ccwchain_loop_tic(chain, cp);
+}
+
+/* Loop for TICs. */
+static int ccwchain_loop_tic(struct ccwchain *chain, struct channel_program *cp)
+{
+	struct ccw1 *tic;
+	int i, ret;
+
+	for (i = 0; i < chain->ch_len; i++) {
+		tic = chain->ch_ccw + i;
+
+		if (!ccw_is_tic(tic))
+			continue;
+
+		ret = ccwchain_handle_tic(tic, cp);
+		if (ret)
+			return ret;
+	}
+
+	return 0;
+}
+
+static int ccwchain_fetch_tic(struct ccwchain *chain,
+			      int idx,
+			      struct channel_program *cp)
+{
+	struct ccw1 *ccw = chain->ch_ccw + idx;
+	struct ccwchain *iter;
+	u32 ccw_head, ccw_tail;
+
+	list_for_each_entry(iter, &cp->ccwchain_list, next) {
+		ccw_head = iter->ch_iova;
+		ccw_tail = ccw_head + (iter->ch_len - 1) * sizeof(struct ccw1);
+
+		if ((ccw_head <= ccw->cda) && (ccw->cda <= ccw_tail)) {
+			ccw->cda = (__u32) (addr_t) (iter->ch_ccw +
+						     (ccw->cda - ccw_head));
+			return 0;
+		}
+	}
+
+	return -EFAULT;
+}
+
+static int ccwchain_fetch_direct(struct ccwchain *chain,
+				 int idx,
+				 struct channel_program *cp)
+{
+	struct ccw1 *ccw;
+	struct pfn_array_table *pat;
+	unsigned long *idaws;
+	int idaw_nr;
+
+	ccw = chain->ch_ccw + idx;
+
+	/*
+	 * Pin data page(s) in memory.
+	 * The number of pages actually is the count of the idaws which will be
+	 * needed when translating a direct ccw to a idal ccw.
+	 */
+	pat = chain->ch_pat + idx;
+	if (pfn_array_table_init(pat, 1))
+		return -ENOMEM;
+	idaw_nr = pfn_array_alloc_pin(pat->pat_pa, cp->mdev,
+				      ccw->cda, ccw->count);
+	if (idaw_nr < 0)
+		return idaw_nr;
+
+	/* Translate this direct ccw to a idal ccw. */
+	idaws = kcalloc(idaw_nr, sizeof(*idaws), GFP_DMA | GFP_KERNEL);
+	if (!idaws) {
+		pfn_array_table_unpin_free(pat, cp->mdev);
+		return -ENOMEM;
+	}
+	ccw->cda = (__u32) virt_to_phys(idaws);
+	ccw->flags |= CCW_FLAG_IDA;
+
+	pfn_array_table_idal_create_words(pat, idaws);
+
+	return 0;
+}
+
+static int ccwchain_fetch_idal(struct ccwchain *chain,
+			       int idx,
+			       struct channel_program *cp)
+{
+	struct ccw1 *ccw;
+	struct pfn_array_table *pat;
+	unsigned long *idaws;
+	u64 idaw_iova;
+	unsigned int idaw_nr, idaw_len;
+	int i, ret;
+
+	ccw = chain->ch_ccw + idx;
+
+	/* Calculate size of idaws. */
+	ret = copy_from_iova(cp->mdev, &idaw_iova, ccw->cda, sizeof(idaw_iova));
+	if (ret)
+		return ret;
+	idaw_nr = idal_nr_words((void *)(idaw_iova), ccw->count);
+	idaw_len = idaw_nr * sizeof(*idaws);
+
+	/* Pin data page(s) in memory. */
+	pat = chain->ch_pat + idx;
+	ret = pfn_array_table_init(pat, idaw_nr);
+	if (ret)
+		return ret;
+
+	/* Translate idal ccw to use new allocated idaws. */
+	idaws = kzalloc(idaw_len, GFP_DMA | GFP_KERNEL);
+	if (!idaws) {
+		ret = -ENOMEM;
+		goto out_unpin;
+	}
+
+	ret = copy_from_iova(cp->mdev, idaws, ccw->cda, idaw_len);
+	if (ret)
+		goto out_free_idaws;
+
+	ccw->cda = virt_to_phys(idaws);
+
+	for (i = 0; i < idaw_nr; i++) {
+		idaw_iova = *(idaws + i);
+		if (IS_ERR_VALUE(idaw_iova)) {
+			ret = -EFAULT;
+			goto out_free_idaws;
+		}
+
+		ret = pfn_array_alloc_pin(pat->pat_pa + i, cp->mdev,
+					  idaw_iova, 1);
+		if (ret < 0)
+			goto out_free_idaws;
+	}
+
+	pfn_array_table_idal_create_words(pat, idaws);
+
+	return 0;
+
+out_free_idaws:
+	kfree(idaws);
+out_unpin:
+	pfn_array_table_unpin_free(pat, cp->mdev);
+	return ret;
+}
+
+/*
+ * Fetch one ccw.
+ * To reduce memory copy, we'll pin the cda page in memory,
+ * and to get rid of the cda 2G limitiaion of ccw1, we'll translate
+ * direct ccws to idal ccws.
+ */
+static int ccwchain_fetch_one(struct ccwchain *chain,
+			      int idx,
+			      struct channel_program *cp)
+{
+	struct ccw1 *ccw = chain->ch_ccw + idx;
+
+	if (ccw_is_test(ccw) || ccw_is_noop(ccw))
+		return 0;
+
+	if (ccw_is_tic(ccw))
+		return ccwchain_fetch_tic(chain, idx, cp);
+
+	if (ccw_is_idal(ccw))
+		return ccwchain_fetch_idal(chain, idx, cp);
+
+	return ccwchain_fetch_direct(chain, idx, cp);
+}
+
+/**
+ * cp_init() - allocate ccwchains for a channel program.
+ * @cp: channel_program on which to perform the operation
+ * @mdev: the mediated device to perform pin/unpin operations
+ * @orb: control block for the channel program from the guest
+ *
+ * This creates one or more ccwchain(s), and copies the raw data of
+ * the target channel program from @orb->cmd.iova to the new ccwchain(s).
+ *
+ * Limitations:
+ * 1. Supports only prefetch enabled mode.
+ * 2. Supports idal(c64) ccw chaining.
+ * 3. Supports 4k idaw.
+ *
+ * Returns:
+ *   %0 on success and a negative error value on failure.
+ */
+int cp_init(struct channel_program *cp, struct device *mdev, union orb *orb)
+{
+	u64 iova = orb->cmd.cpa;
+	struct ccwchain *chain;
+	int len, ret;
+
+	/*
+	 * XXX:
+	 * Only support prefetch enable mode now.
+	 * Only support 64bit addressing idal.
+	 * Only support 4k IDAW.
+	 * Only support ccw1.
+	 */
+	if (!orb->cmd.pfch || !orb->cmd.c64 || orb->cmd.i2k || !orb->cmd.fmt)
+		return -EOPNOTSUPP;
+
+	INIT_LIST_HEAD(&cp->ccwchain_list);
+	memcpy(&cp->orb, orb, sizeof(*orb));
+	cp->mdev = mdev;
+
+	/* Get chain length. */
+	len = ccwchain_calc_length(iova, cp);
+	if (len < 0)
+		return len;
+
+	/* Alloc mem for the head chain. */
+	chain = ccwchain_alloc(cp, len);
+	if (!chain)
+		return -ENOMEM;
+	chain->ch_iova = iova;
+
+	/* Copy the head chain from guest. */
+	ret = copy_ccw_from_iova(cp, chain->ch_ccw, iova, len);
+	if (ret) {
+		ccwchain_free(chain);
+		return ret;
+	}
+
+	/* Now loop for its TICs. */
+	ret = ccwchain_loop_tic(chain, cp);
+	if (ret)
+		cp_unpin_free(cp);
+
+	return ret;
+}
+
+
+/**
+ * cp_free() - free resources for channel program.
+ * @cp: channel_program on which to perform the operation
+ *
+ * This unpins the memory pages and frees the memory space occupied by
+ * @cp, which must have been returned by a previous call to cp_init().
+ * Otherwise, undefined behavior occurs.
+ */
+void cp_free(struct channel_program *cp)
+{
+	cp_unpin_free(cp);
+}
+
+/**
+ * cp_prefetch() - translate a guest physical address channel program to
+ *                 a real-device runnable channel program.
+ * @cp: channel_program on which to perform the operation
+ *
+ * This function translates the guest-physical-address channel program
+ * and stores the result to ccwchain list. @cp must have been
+ * initialized by a previous call with cp_init(). Otherwise, undefined
+ * behavior occurs.
+ *
+ * The S/390 CCW Translation APIS (prefixed by 'cp_') are introduced
+ * as helpers to do ccw chain translation inside the kernel. Basically
+ * they accept a channel program issued by a virtual machine, and
+ * translate the channel program to a real-device runnable channel
+ * program.
+ *
+ * These APIs will copy the ccws into kernel-space buffers, and update
+ * the guest phsical addresses with their corresponding host physical
+ * addresses.  Then channel I/O device drivers could issue the
+ * translated channel program to real devices to perform an I/O
+ * operation.
+ *
+ * These interfaces are designed to support translation only for
+ * channel programs, which are generated and formatted by a
+ * guest. Thus this will make it possible for things like VFIO to
+ * leverage the interfaces to passthrough a channel I/O mediated
+ * device in QEMU.
+ *
+ * We support direct ccw chaining by translating them to idal ccws.
+ *
+ * Returns:
+ *   %0 on success and a negative error value on failure.
+ */
+int cp_prefetch(struct channel_program *cp)
+{
+	struct ccwchain *chain;
+	int len, idx, ret;
+
+	list_for_each_entry(chain, &cp->ccwchain_list, next) {
+		len = chain->ch_len;
+		for (idx = 0; idx < len; idx++) {
+			ret = ccwchain_fetch_one(chain, idx, cp);
+			if (ret)
+				return ret;
+		}
+	}
+
+	return 0;
+}
+
+/**
+ * cp_get_orb() - get the orb of the channel program
+ * @cp: channel_program on which to perform the operation
+ * @intparm: new intparm for the returned orb
+ * @lpm: candidate value of the logical-path mask for the returned orb
+ *
+ * This function returns the address of the updated orb of the channel
+ * program. Channel I/O device drivers could use this orb to issue a
+ * ssch.
+ */
+union orb *cp_get_orb(struct channel_program *cp, u32 intparm, u8 lpm)
+{
+	union orb *orb;
+	struct ccwchain *chain;
+	struct ccw1 *cpa;
+
+	orb = &cp->orb;
+
+	orb->cmd.intparm = intparm;
+	orb->cmd.fmt = 1;
+	orb->cmd.key = PAGE_DEFAULT_KEY >> 4;
+
+	if (orb->cmd.lpm == 0)
+		orb->cmd.lpm = lpm;
+
+	chain = list_first_entry(&cp->ccwchain_list, struct ccwchain, next);
+	cpa = chain->ch_ccw;
+	orb->cmd.cpa = (__u32) __pa(cpa);
+
+	return orb;
+}
+
+/**
+ * cp_update_scsw() - update scsw for a channel program.
+ * @cp: channel_program on which to perform the operation
+ * @scsw: I/O results of the channel program and also the target to be
+ *        updated
+ *
+ * @scsw contains the I/O results of the channel program that pointed
+ * to by @cp. However what @scsw->cpa stores is a host physical
+ * address, which is meaningless for the guest, which is waiting for
+ * the I/O results.
+ *
+ * This function updates @scsw->cpa to its coressponding guest physical
+ * address.
+ */
+void cp_update_scsw(struct channel_program *cp, union scsw *scsw)
+{
+	struct ccwchain *chain;
+	u32 cpa = scsw->cmd.cpa;
+	u32 ccw_head, ccw_tail;
+
+	/*
+	 * LATER:
+	 * For now, only update the cmd.cpa part. We may need to deal with
+	 * other portions of the schib as well, even if we don't return them
+	 * in the ioctl directly. Path status changes etc.
+	 */
+	list_for_each_entry(chain, &cp->ccwchain_list, next) {
+		ccw_head = (u32)(u64)chain->ch_ccw;
+		ccw_tail = (u32)(u64)(chain->ch_ccw + chain->ch_len - 1);
+
+		if ((ccw_head <= cpa) && (cpa <= ccw_tail)) {
+			/*
+			 * (cpa - ccw_head) is the offset value of the host
+			 * physical ccw to its chain head.
+			 * Adding this value to the guest physical ccw chain
+			 * head gets us the guest cpa.
+			 */
+			cpa = chain->ch_iova + (cpa - ccw_head);
+			break;
+		}
+	}
+
+	scsw->cmd.cpa = cpa;
+}
+
+/**
+ * cp_iova_pinned() - check if an iova is pinned for a ccw chain.
+ * @cmd: ccwchain command on which to perform the operation
+ * @iova: the iova to check
+ *
+ * If the @iova is currently pinned for the ccw chain, return true;
+ * else return false.
+ */
+bool cp_iova_pinned(struct channel_program *cp, u64 iova)
+{
+	struct ccwchain *chain;
+	int i;
+
+	list_for_each_entry(chain, &cp->ccwchain_list, next) {
+		for (i = 0; i < chain->ch_len; i++)
+			if (pfn_array_table_iova_pinned(chain->ch_pat + i,
+							iova))
+				return true;
+	}
+
+	return false;
+}
diff --git a/drivers/s390/cio/vfio_ccw_cp.h b/drivers/s390/cio/vfio_ccw_cp.h
new file mode 100644
index 0000000..7a1996b
--- /dev/null
+++ b/drivers/s390/cio/vfio_ccw_cp.h
@@ -0,0 +1,42 @@
+/*
+ * channel program interfaces
+ *
+ * Copyright IBM Corp. 2017
+ *
+ * Author(s): Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com>
+ *            Xiao Feng Ren <renxiaof@linux.vnet.ibm.com>
+ */
+
+#ifndef _VFIO_CCW_CP_H_
+#define _VFIO_CCW_CP_H_
+
+#include <asm/cio.h>
+#include <asm/scsw.h>
+
+#include "orb.h"
+
+/**
+ * struct channel_program - manage information for channel program
+ * @ccwchain_list: list head of ccwchains
+ * @orb: orb for the currently processed ssch request
+ * @mdev: the mediated device to perform page pinning/unpinning
+ *
+ * @ccwchain_list is the head of a ccwchain list, that contents the
+ * translated result of the guest channel program that pointed out by
+ * the iova parameter when calling cp_init.
+ */
+struct channel_program {
+	struct list_head ccwchain_list;
+	union orb orb;
+	struct device *mdev;
+};
+
+extern int cp_init(struct channel_program *cp, struct device *mdev,
+		   union orb *orb);
+extern void cp_free(struct channel_program *cp);
+extern int cp_prefetch(struct channel_program *cp);
+extern union orb *cp_get_orb(struct channel_program *cp, u32 intparm, u8 lpm);
+extern void cp_update_scsw(struct channel_program *cp, union scsw *scsw);
+extern bool cp_iova_pinned(struct channel_program *cp, u64 iova);
+
+#endif
-- 
2.8.4

^ permalink raw reply related	[flat|nested] 61+ messages in thread

* [Qemu-devel] [PATCH RFC v3 05/15] vfio: ccw: introduce channel program interfaces
@ 2017-02-17  8:29   ` Dong Jia Shi
  0 siblings, 0 replies; 61+ messages in thread
From: Dong Jia Shi @ 2017-02-17  8:29 UTC (permalink / raw)
  To: kvm, linux-s390, qemu-devel
  Cc: bjsdjshi, renxiaof, cornelia.huck, borntraeger, agraf,
	alex.williamson, pmorel, pasic, wkywang

Introduce ccwchain structure and helper functions that can be used to
handle a channel program issued from a virtual machine.

The following limitations apply:
1. Supports only prefetch enabled mode.
2. Supports idal(c64) ccw chaining.
3. Supports 4k idaw.
4. Supports ccw1.
5. Supports direct ccw chaining by translating them to idal ccws.

CCW translation requires to leverage the vfio_(un)pin_pages interfaces
to pin/unpin sets of mem pages frequently. Currently we have a lack of
support to do this in an efficient way. So we introduce pfn_array data
structure and helper functions to handle pin/unpin operations here.

Signed-off-by: Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com>
---
 drivers/s390/cio/Makefile      |   2 +-
 drivers/s390/cio/vfio_ccw_cp.c | 816 +++++++++++++++++++++++++++++++++++++++++
 drivers/s390/cio/vfio_ccw_cp.h |  42 +++
 3 files changed, 859 insertions(+), 1 deletion(-)
 create mode 100644 drivers/s390/cio/vfio_ccw_cp.c
 create mode 100644 drivers/s390/cio/vfio_ccw_cp.h

diff --git a/drivers/s390/cio/Makefile b/drivers/s390/cio/Makefile
index 3d7390e..1bec279 100644
--- a/drivers/s390/cio/Makefile
+++ b/drivers/s390/cio/Makefile
@@ -18,5 +18,5 @@ obj-$(CONFIG_CCWGROUP) += ccwgroup.o
 qdio-objs := qdio_main.o qdio_thinint.o qdio_debug.o qdio_setup.o
 obj-$(CONFIG_QDIO) += qdio.o
 
-vfio_ccw-objs += vfio_ccw_drv.o
+vfio_ccw-objs += vfio_ccw_drv.o vfio_ccw_cp.o
 obj-$(CONFIG_VFIO_CCW) += vfio_ccw.o
diff --git a/drivers/s390/cio/vfio_ccw_cp.c b/drivers/s390/cio/vfio_ccw_cp.c
new file mode 100644
index 0000000..16bbb54
--- /dev/null
+++ b/drivers/s390/cio/vfio_ccw_cp.c
@@ -0,0 +1,816 @@
+/*
+ * channel program interfaces
+ *
+ * Copyright IBM Corp. 2017
+ *
+ * Author(s): Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com>
+ *            Xiao Feng Ren <renxiaof@linux.vnet.ibm.com>
+ */
+
+#include <linux/mm.h>
+#include <linux/slab.h>
+#include <linux/iommu.h>
+#include <linux/vfio.h>
+#include <asm/idals.h>
+
+#include "vfio_ccw_cp.h"
+
+/*
+ * Max length for ccw chain.
+ * XXX: Limit to 256, need to check more?
+ */
+#define CCWCHAIN_LEN_MAX	256
+
+struct pfn_array {
+	unsigned long		pa_iova;
+	unsigned long		*pa_iova_pfn;
+	unsigned long		*pa_pfn;
+	int			pa_nr;
+};
+
+struct pfn_array_table {
+	struct pfn_array	*pat_pa;
+	int			pat_nr;
+};
+
+struct ccwchain {
+	struct list_head	next;
+	struct ccw1		*ch_ccw;
+	/* Guest physical address of the current chain. */
+	u64			ch_iova;
+	/* Count of the valid ccws in chain. */
+	int			ch_len;
+	/* Pinned PAGEs for the original data. */
+	struct pfn_array_table	*ch_pat;
+};
+
+/*
+ * pfn_array_pin() - pin user pages in memory
+ * @pa: pfn_array on which to perform the operation
+ * @mdev: the mediated device to perform pin/unpin operations
+ *
+ * Attempt to pin user pages in memory.
+ *
+ * Usage of pfn_array:
+ * @pa->pa_iova     starting guest physical I/O address. Assigned by caller.
+ * @pa->pa_iova_pfn array that stores PFNs of the pages need to pin. Allocated
+ *                  by caller.
+ * @pa->pa_pfn      array that receives PFNs of the pages pinned. Allocated by
+ *                  caller.
+ * @pa->pa_nr       number of pages from @pa->pa_iova to pin. Assigned by
+ *                  caller.
+ *                  number of pages pinned. Assigned by callee.
+ *
+ * Returns:
+ *   Number of pages pinned on success.
+ *   If @pa->pa_nr is 0 or negative, returns 0.
+ *   If no pages were pinned, returns -errno.
+ */
+static int pfn_array_pin(struct pfn_array *pa, struct device *mdev)
+{
+	int i, ret;
+
+	if (pa->pa_nr <= 0) {
+		pa->pa_nr = 0;
+		return 0;
+	}
+
+	pa->pa_iova_pfn[0] = pa->pa_iova >> PAGE_SHIFT;
+	for (i = 1; i < pa->pa_nr; i++)
+		pa->pa_iova_pfn[i] = pa->pa_iova_pfn[i - 1] + 1;
+
+	ret = vfio_pin_pages(mdev, pa->pa_iova_pfn, pa->pa_nr,
+			     IOMMU_READ | IOMMU_WRITE, pa->pa_pfn);
+
+	if (ret > 0 && ret != pa->pa_nr) {
+		vfio_unpin_pages(mdev, pa->pa_iova_pfn, ret);
+		pa->pa_nr = 0;
+		return 0;
+	}
+
+	return ret;
+}
+
+/* Unpin the pages before releasing the memory. */
+static void pfn_array_unpin_free(struct pfn_array *pa, struct device *mdev)
+{
+	vfio_unpin_pages(mdev, pa->pa_iova_pfn, pa->pa_nr);
+	pa->pa_nr = 0;
+	kfree(pa->pa_iova_pfn);
+}
+
+/* Alloc memory for PFNs, then pin pages with them. */
+static int pfn_array_alloc_pin(struct pfn_array *pa, struct device *mdev,
+			       u64 iova, unsigned int len)
+{
+	int ret = 0;
+
+	if (!len || pa->pa_nr)
+		return -EINVAL;
+
+	pa->pa_iova = iova;
+
+	pa->pa_nr = ((iova & ~PAGE_MASK) + len + (PAGE_SIZE - 1)) >> PAGE_SHIFT;
+	if (!pa->pa_nr)
+		return -EINVAL;
+
+	pa->pa_iova_pfn = kcalloc(pa->pa_nr,
+				  sizeof(*pa->pa_iova_pfn) +
+				  sizeof(*pa->pa_pfn),
+				  GFP_KERNEL);
+	if (unlikely(!pa->pa_iova_pfn))
+		return -ENOMEM;
+	pa->pa_pfn = pa->pa_iova_pfn + pa->pa_nr;
+
+	ret = pfn_array_pin(pa, mdev);
+
+	if (ret > 0)
+		return ret;
+	else if (!ret)
+		ret = -EINVAL;
+
+	kfree(pa->pa_iova_pfn);
+
+	return ret;
+}
+
+static int pfn_array_table_init(struct pfn_array_table *pat, int nr)
+{
+	pat->pat_pa = kcalloc(nr, sizeof(*pat->pat_pa), GFP_KERNEL);
+	if (unlikely(ZERO_OR_NULL_PTR(pat->pat_pa))) {
+		pat->pat_nr = 0;
+		return -ENOMEM;
+	}
+
+	pat->pat_nr = nr;
+
+	return 0;
+}
+
+static void pfn_array_table_unpin_free(struct pfn_array_table *pat,
+				       struct device *mdev)
+{
+	int i;
+
+	for (i = 0; i < pat->pat_nr; i++)
+		pfn_array_unpin_free(pat->pat_pa + i, mdev);
+
+	if (pat->pat_nr) {
+		kfree(pat->pat_pa);
+		pat->pat_pa = NULL;
+		pat->pat_nr = 0;
+	}
+}
+
+static bool pfn_array_table_iova_pinned(struct pfn_array_table *pat,
+					unsigned long iova)
+{
+	struct pfn_array *pa = pat->pat_pa;
+	unsigned long iova_pfn = iova >> PAGE_SHIFT;
+	int i, j;
+
+	for (i = 0; i < pat->pat_nr; i++, pa++)
+		for (j = 0; j < pa->pa_nr; j++)
+			if (pa->pa_iova_pfn[i] == iova_pfn)
+				return true;
+
+	return false;
+}
+/* Create the list idal words for a pfn_array_table. */
+static inline void pfn_array_table_idal_create_words(
+	struct pfn_array_table *pat,
+	unsigned long *idaws)
+{
+	struct pfn_array *pa;
+	int i, j, k;
+
+	/*
+	 * Idal words (execept the first one) rely on the memory being 4k
+	 * aligned. If a user virtual address is 4K aligned, then it's
+	 * corresponding kernel physical address will also be 4K aligned. Thus
+	 * there will be no problem here to simply use the phys to create an
+	 * idaw.
+	 */
+	k = 0;
+	for (i = 0; i < pat->pat_nr; i++) {
+		pa = pat->pat_pa + i;
+		for (j = 0; j < pa->pa_nr; j++) {
+			idaws[k] = pa->pa_pfn[j] << PAGE_SHIFT;
+			if (k == 0)
+				idaws[k] += pa->pa_iova & (PAGE_SIZE - 1);
+			k++;
+		}
+	}
+}
+
+
+/*
+ * Within the domain (@mdev), copy @n bytes from a guest physical
+ * address (@iova) to a host physical address (@to).
+ */
+static long copy_from_iova(struct device *mdev,
+			   void *to, u64 iova,
+			   unsigned long n)
+{
+	struct pfn_array pa = {0};
+	u64 from;
+	int i, ret;
+	unsigned long l, m;
+
+	ret = pfn_array_alloc_pin(&pa, mdev, iova, n);
+	if (ret <= 0)
+		return ret;
+
+	l = n;
+	for (i = 0; i < pa.pa_nr; i++) {
+		from = pa.pa_pfn[i] << PAGE_SHIFT;
+		m = PAGE_SIZE;
+		if (i == 0) {
+			from += iova & (PAGE_SIZE - 1);
+			m -= iova & (PAGE_SIZE - 1);
+		}
+
+		m = min(l, m);
+		memcpy(to + (n - l), (void *)from, m);
+
+		l -= m;
+		if (l == 0)
+			break;
+	}
+
+	pfn_array_unpin_free(&pa, mdev);
+
+	return l;
+}
+
+static long copy_ccw_from_iova(struct channel_program *cp,
+			       struct ccw1 *to, u64 iova,
+			       unsigned long len)
+{
+	return copy_from_iova(cp->mdev, to, iova, len * sizeof(struct ccw1));
+}
+
+/*
+ * Helpers to operate ccwchain.
+ */
+#define ccw_is_test(_ccw) (((_ccw)->cmd_code & 0x0F) == 0)
+
+#define ccw_is_noop(_ccw) ((_ccw)->cmd_code == CCW_CMD_NOOP)
+
+#define ccw_is_tic(_ccw) ((_ccw)->cmd_code == CCW_CMD_TIC)
+
+#define ccw_is_idal(_ccw) ((_ccw)->flags & CCW_FLAG_IDA)
+
+
+#define ccw_is_chain(_ccw) ((_ccw)->flags & (CCW_FLAG_CC | CCW_FLAG_DC))
+
+static struct ccwchain *ccwchain_alloc(struct channel_program *cp, int len)
+{
+	struct ccwchain *chain;
+	void *data;
+	size_t size;
+
+	/* Make ccw address aligned to 8. */
+	size = ((sizeof(*chain) + 7L) & -8L) +
+		sizeof(*chain->ch_ccw) * len +
+		sizeof(*chain->ch_pat) * len;
+	chain = kzalloc(size, GFP_DMA | GFP_KERNEL);
+	if (!chain)
+		return NULL;
+
+	data = (u8 *)chain + ((sizeof(*chain) + 7L) & -8L);
+	chain->ch_ccw = (struct ccw1 *)data;
+
+	data = (u8 *)(chain->ch_ccw) + sizeof(*chain->ch_ccw) * len;
+	chain->ch_pat = (struct pfn_array_table *)data;
+
+	chain->ch_len = len;
+
+	list_add_tail(&chain->next, &cp->ccwchain_list);
+
+	return chain;
+}
+
+static void ccwchain_free(struct ccwchain *chain)
+{
+	list_del(&chain->next);
+	kfree(chain);
+}
+
+/* Free resource for a ccw that allocated memory for its cda. */
+static void ccwchain_cda_free(struct ccwchain *chain, int idx)
+{
+	struct ccw1 *ccw = chain->ch_ccw + idx;
+
+	if (!ccw->count)
+		return;
+
+	kfree((void *)(u64)ccw->cda);
+}
+
+/* Unpin the pages then free the memory resources. */
+static void cp_unpin_free(struct channel_program *cp)
+{
+	struct ccwchain *chain, *temp;
+	int i;
+
+	list_for_each_entry_safe(chain, temp, &cp->ccwchain_list, next) {
+		for (i = 0; i < chain->ch_len; i++) {
+			pfn_array_table_unpin_free(chain->ch_pat + i,
+						   cp->mdev);
+			ccwchain_cda_free(chain, i);
+		}
+		ccwchain_free(chain);
+	}
+}
+
+/**
+ * ccwchain_calc_length - calculate the length of the ccw chain.
+ * @iova: guest physical address of the target ccw chain
+ * @cp: channel_program on which to perform the operation
+ *
+ * This is the chain length not considering any TICs.
+ * You need to do a new round for each TIC target.
+ *
+ * Returns: the length of the ccw chain or -errno.
+ */
+static int ccwchain_calc_length(u64 iova, struct channel_program *cp)
+{
+	struct ccw1 *ccw, *p;
+	int cnt;
+
+	/*
+	 * Copy current chain from guest to host kernel.
+	 * Currently the chain length is limited to CCWCHAIN_LEN_MAX (256).
+	 * So copying 2K is enough (safe).
+	 */
+	p = ccw = kcalloc(CCWCHAIN_LEN_MAX, sizeof(*ccw), GFP_KERNEL);
+	if (!ccw)
+		return -ENOMEM;
+
+	cnt = copy_ccw_from_iova(cp, ccw, iova, CCWCHAIN_LEN_MAX);
+	if (cnt) {
+		kfree(ccw);
+		return cnt;
+	}
+
+	cnt = 0;
+	do {
+		cnt++;
+
+		if ((!ccw_is_chain(ccw)) && (!ccw_is_tic(ccw)))
+			break;
+
+		ccw++;
+	} while (cnt < CCWCHAIN_LEN_MAX + 1);
+
+	if (cnt == CCWCHAIN_LEN_MAX + 1)
+		cnt = -EINVAL;
+
+	kfree(p);
+	return cnt;
+}
+
+static int tic_target_chain_exists(struct ccw1 *tic, struct channel_program *cp)
+{
+	struct ccwchain *chain;
+	u32 ccw_head, ccw_tail;
+
+	list_for_each_entry(chain, &cp->ccwchain_list, next) {
+		ccw_head = chain->ch_iova;
+		ccw_tail = ccw_head + (chain->ch_len - 1) * sizeof(struct ccw1);
+
+		if ((ccw_head <= tic->cda) && (tic->cda <= ccw_tail))
+			return 1;
+	}
+
+	return 0;
+}
+
+static int ccwchain_loop_tic(struct ccwchain *chain,
+			     struct channel_program *cp);
+
+static int ccwchain_handle_tic(struct ccw1 *tic, struct channel_program *cp)
+{
+	struct ccwchain *chain;
+	int len, ret;
+
+	/* May transfer to an existing chain. */
+	if (tic_target_chain_exists(tic, cp))
+		return 0;
+
+	/* Get chain length. */
+	len = ccwchain_calc_length(tic->cda, cp);
+	if (len < 0)
+		return len;
+
+	/* Need alloc a new chain for this one. */
+	chain = ccwchain_alloc(cp, len);
+	if (!chain)
+		return -ENOMEM;
+	chain->ch_iova = tic->cda;
+
+	/* Copy the new chain from user. */
+	ret = copy_ccw_from_iova(cp, chain->ch_ccw, tic->cda, len);
+	if (ret) {
+		ccwchain_free(chain);
+		return ret;
+	}
+
+	/* Loop for tics on this new chain. */
+	return ccwchain_loop_tic(chain, cp);
+}
+
+/* Loop for TICs. */
+static int ccwchain_loop_tic(struct ccwchain *chain, struct channel_program *cp)
+{
+	struct ccw1 *tic;
+	int i, ret;
+
+	for (i = 0; i < chain->ch_len; i++) {
+		tic = chain->ch_ccw + i;
+
+		if (!ccw_is_tic(tic))
+			continue;
+
+		ret = ccwchain_handle_tic(tic, cp);
+		if (ret)
+			return ret;
+	}
+
+	return 0;
+}
+
+static int ccwchain_fetch_tic(struct ccwchain *chain,
+			      int idx,
+			      struct channel_program *cp)
+{
+	struct ccw1 *ccw = chain->ch_ccw + idx;
+	struct ccwchain *iter;
+	u32 ccw_head, ccw_tail;
+
+	list_for_each_entry(iter, &cp->ccwchain_list, next) {
+		ccw_head = iter->ch_iova;
+		ccw_tail = ccw_head + (iter->ch_len - 1) * sizeof(struct ccw1);
+
+		if ((ccw_head <= ccw->cda) && (ccw->cda <= ccw_tail)) {
+			ccw->cda = (__u32) (addr_t) (iter->ch_ccw +
+						     (ccw->cda - ccw_head));
+			return 0;
+		}
+	}
+
+	return -EFAULT;
+}
+
+static int ccwchain_fetch_direct(struct ccwchain *chain,
+				 int idx,
+				 struct channel_program *cp)
+{
+	struct ccw1 *ccw;
+	struct pfn_array_table *pat;
+	unsigned long *idaws;
+	int idaw_nr;
+
+	ccw = chain->ch_ccw + idx;
+
+	/*
+	 * Pin data page(s) in memory.
+	 * The number of pages actually is the count of the idaws which will be
+	 * needed when translating a direct ccw to a idal ccw.
+	 */
+	pat = chain->ch_pat + idx;
+	if (pfn_array_table_init(pat, 1))
+		return -ENOMEM;
+	idaw_nr = pfn_array_alloc_pin(pat->pat_pa, cp->mdev,
+				      ccw->cda, ccw->count);
+	if (idaw_nr < 0)
+		return idaw_nr;
+
+	/* Translate this direct ccw to a idal ccw. */
+	idaws = kcalloc(idaw_nr, sizeof(*idaws), GFP_DMA | GFP_KERNEL);
+	if (!idaws) {
+		pfn_array_table_unpin_free(pat, cp->mdev);
+		return -ENOMEM;
+	}
+	ccw->cda = (__u32) virt_to_phys(idaws);
+	ccw->flags |= CCW_FLAG_IDA;
+
+	pfn_array_table_idal_create_words(pat, idaws);
+
+	return 0;
+}
+
+static int ccwchain_fetch_idal(struct ccwchain *chain,
+			       int idx,
+			       struct channel_program *cp)
+{
+	struct ccw1 *ccw;
+	struct pfn_array_table *pat;
+	unsigned long *idaws;
+	u64 idaw_iova;
+	unsigned int idaw_nr, idaw_len;
+	int i, ret;
+
+	ccw = chain->ch_ccw + idx;
+
+	/* Calculate size of idaws. */
+	ret = copy_from_iova(cp->mdev, &idaw_iova, ccw->cda, sizeof(idaw_iova));
+	if (ret)
+		return ret;
+	idaw_nr = idal_nr_words((void *)(idaw_iova), ccw->count);
+	idaw_len = idaw_nr * sizeof(*idaws);
+
+	/* Pin data page(s) in memory. */
+	pat = chain->ch_pat + idx;
+	ret = pfn_array_table_init(pat, idaw_nr);
+	if (ret)
+		return ret;
+
+	/* Translate idal ccw to use new allocated idaws. */
+	idaws = kzalloc(idaw_len, GFP_DMA | GFP_KERNEL);
+	if (!idaws) {
+		ret = -ENOMEM;
+		goto out_unpin;
+	}
+
+	ret = copy_from_iova(cp->mdev, idaws, ccw->cda, idaw_len);
+	if (ret)
+		goto out_free_idaws;
+
+	ccw->cda = virt_to_phys(idaws);
+
+	for (i = 0; i < idaw_nr; i++) {
+		idaw_iova = *(idaws + i);
+		if (IS_ERR_VALUE(idaw_iova)) {
+			ret = -EFAULT;
+			goto out_free_idaws;
+		}
+
+		ret = pfn_array_alloc_pin(pat->pat_pa + i, cp->mdev,
+					  idaw_iova, 1);
+		if (ret < 0)
+			goto out_free_idaws;
+	}
+
+	pfn_array_table_idal_create_words(pat, idaws);
+
+	return 0;
+
+out_free_idaws:
+	kfree(idaws);
+out_unpin:
+	pfn_array_table_unpin_free(pat, cp->mdev);
+	return ret;
+}
+
+/*
+ * Fetch one ccw.
+ * To reduce memory copy, we'll pin the cda page in memory,
+ * and to get rid of the cda 2G limitiaion of ccw1, we'll translate
+ * direct ccws to idal ccws.
+ */
+static int ccwchain_fetch_one(struct ccwchain *chain,
+			      int idx,
+			      struct channel_program *cp)
+{
+	struct ccw1 *ccw = chain->ch_ccw + idx;
+
+	if (ccw_is_test(ccw) || ccw_is_noop(ccw))
+		return 0;
+
+	if (ccw_is_tic(ccw))
+		return ccwchain_fetch_tic(chain, idx, cp);
+
+	if (ccw_is_idal(ccw))
+		return ccwchain_fetch_idal(chain, idx, cp);
+
+	return ccwchain_fetch_direct(chain, idx, cp);
+}
+
+/**
+ * cp_init() - allocate ccwchains for a channel program.
+ * @cp: channel_program on which to perform the operation
+ * @mdev: the mediated device to perform pin/unpin operations
+ * @orb: control block for the channel program from the guest
+ *
+ * This creates one or more ccwchain(s), and copies the raw data of
+ * the target channel program from @orb->cmd.iova to the new ccwchain(s).
+ *
+ * Limitations:
+ * 1. Supports only prefetch enabled mode.
+ * 2. Supports idal(c64) ccw chaining.
+ * 3. Supports 4k idaw.
+ *
+ * Returns:
+ *   %0 on success and a negative error value on failure.
+ */
+int cp_init(struct channel_program *cp, struct device *mdev, union orb *orb)
+{
+	u64 iova = orb->cmd.cpa;
+	struct ccwchain *chain;
+	int len, ret;
+
+	/*
+	 * XXX:
+	 * Only support prefetch enable mode now.
+	 * Only support 64bit addressing idal.
+	 * Only support 4k IDAW.
+	 * Only support ccw1.
+	 */
+	if (!orb->cmd.pfch || !orb->cmd.c64 || orb->cmd.i2k || !orb->cmd.fmt)
+		return -EOPNOTSUPP;
+
+	INIT_LIST_HEAD(&cp->ccwchain_list);
+	memcpy(&cp->orb, orb, sizeof(*orb));
+	cp->mdev = mdev;
+
+	/* Get chain length. */
+	len = ccwchain_calc_length(iova, cp);
+	if (len < 0)
+		return len;
+
+	/* Alloc mem for the head chain. */
+	chain = ccwchain_alloc(cp, len);
+	if (!chain)
+		return -ENOMEM;
+	chain->ch_iova = iova;
+
+	/* Copy the head chain from guest. */
+	ret = copy_ccw_from_iova(cp, chain->ch_ccw, iova, len);
+	if (ret) {
+		ccwchain_free(chain);
+		return ret;
+	}
+
+	/* Now loop for its TICs. */
+	ret = ccwchain_loop_tic(chain, cp);
+	if (ret)
+		cp_unpin_free(cp);
+
+	return ret;
+}
+
+
+/**
+ * cp_free() - free resources for channel program.
+ * @cp: channel_program on which to perform the operation
+ *
+ * This unpins the memory pages and frees the memory space occupied by
+ * @cp, which must have been returned by a previous call to cp_init().
+ * Otherwise, undefined behavior occurs.
+ */
+void cp_free(struct channel_program *cp)
+{
+	cp_unpin_free(cp);
+}
+
+/**
+ * cp_prefetch() - translate a guest physical address channel program to
+ *                 a real-device runnable channel program.
+ * @cp: channel_program on which to perform the operation
+ *
+ * This function translates the guest-physical-address channel program
+ * and stores the result to ccwchain list. @cp must have been
+ * initialized by a previous call with cp_init(). Otherwise, undefined
+ * behavior occurs.
+ *
+ * The S/390 CCW Translation APIS (prefixed by 'cp_') are introduced
+ * as helpers to do ccw chain translation inside the kernel. Basically
+ * they accept a channel program issued by a virtual machine, and
+ * translate the channel program to a real-device runnable channel
+ * program.
+ *
+ * These APIs will copy the ccws into kernel-space buffers, and update
+ * the guest phsical addresses with their corresponding host physical
+ * addresses.  Then channel I/O device drivers could issue the
+ * translated channel program to real devices to perform an I/O
+ * operation.
+ *
+ * These interfaces are designed to support translation only for
+ * channel programs, which are generated and formatted by a
+ * guest. Thus this will make it possible for things like VFIO to
+ * leverage the interfaces to passthrough a channel I/O mediated
+ * device in QEMU.
+ *
+ * We support direct ccw chaining by translating them to idal ccws.
+ *
+ * Returns:
+ *   %0 on success and a negative error value on failure.
+ */
+int cp_prefetch(struct channel_program *cp)
+{
+	struct ccwchain *chain;
+	int len, idx, ret;
+
+	list_for_each_entry(chain, &cp->ccwchain_list, next) {
+		len = chain->ch_len;
+		for (idx = 0; idx < len; idx++) {
+			ret = ccwchain_fetch_one(chain, idx, cp);
+			if (ret)
+				return ret;
+		}
+	}
+
+	return 0;
+}
+
+/**
+ * cp_get_orb() - get the orb of the channel program
+ * @cp: channel_program on which to perform the operation
+ * @intparm: new intparm for the returned orb
+ * @lpm: candidate value of the logical-path mask for the returned orb
+ *
+ * This function returns the address of the updated orb of the channel
+ * program. Channel I/O device drivers could use this orb to issue a
+ * ssch.
+ */
+union orb *cp_get_orb(struct channel_program *cp, u32 intparm, u8 lpm)
+{
+	union orb *orb;
+	struct ccwchain *chain;
+	struct ccw1 *cpa;
+
+	orb = &cp->orb;
+
+	orb->cmd.intparm = intparm;
+	orb->cmd.fmt = 1;
+	orb->cmd.key = PAGE_DEFAULT_KEY >> 4;
+
+	if (orb->cmd.lpm == 0)
+		orb->cmd.lpm = lpm;
+
+	chain = list_first_entry(&cp->ccwchain_list, struct ccwchain, next);
+	cpa = chain->ch_ccw;
+	orb->cmd.cpa = (__u32) __pa(cpa);
+
+	return orb;
+}
+
+/**
+ * cp_update_scsw() - update scsw for a channel program.
+ * @cp: channel_program on which to perform the operation
+ * @scsw: I/O results of the channel program and also the target to be
+ *        updated
+ *
+ * @scsw contains the I/O results of the channel program that pointed
+ * to by @cp. However what @scsw->cpa stores is a host physical
+ * address, which is meaningless for the guest, which is waiting for
+ * the I/O results.
+ *
+ * This function updates @scsw->cpa to its coressponding guest physical
+ * address.
+ */
+void cp_update_scsw(struct channel_program *cp, union scsw *scsw)
+{
+	struct ccwchain *chain;
+	u32 cpa = scsw->cmd.cpa;
+	u32 ccw_head, ccw_tail;
+
+	/*
+	 * LATER:
+	 * For now, only update the cmd.cpa part. We may need to deal with
+	 * other portions of the schib as well, even if we don't return them
+	 * in the ioctl directly. Path status changes etc.
+	 */
+	list_for_each_entry(chain, &cp->ccwchain_list, next) {
+		ccw_head = (u32)(u64)chain->ch_ccw;
+		ccw_tail = (u32)(u64)(chain->ch_ccw + chain->ch_len - 1);
+
+		if ((ccw_head <= cpa) && (cpa <= ccw_tail)) {
+			/*
+			 * (cpa - ccw_head) is the offset value of the host
+			 * physical ccw to its chain head.
+			 * Adding this value to the guest physical ccw chain
+			 * head gets us the guest cpa.
+			 */
+			cpa = chain->ch_iova + (cpa - ccw_head);
+			break;
+		}
+	}
+
+	scsw->cmd.cpa = cpa;
+}
+
+/**
+ * cp_iova_pinned() - check if an iova is pinned for a ccw chain.
+ * @cmd: ccwchain command on which to perform the operation
+ * @iova: the iova to check
+ *
+ * If the @iova is currently pinned for the ccw chain, return true;
+ * else return false.
+ */
+bool cp_iova_pinned(struct channel_program *cp, u64 iova)
+{
+	struct ccwchain *chain;
+	int i;
+
+	list_for_each_entry(chain, &cp->ccwchain_list, next) {
+		for (i = 0; i < chain->ch_len; i++)
+			if (pfn_array_table_iova_pinned(chain->ch_pat + i,
+							iova))
+				return true;
+	}
+
+	return false;
+}
diff --git a/drivers/s390/cio/vfio_ccw_cp.h b/drivers/s390/cio/vfio_ccw_cp.h
new file mode 100644
index 0000000..7a1996b
--- /dev/null
+++ b/drivers/s390/cio/vfio_ccw_cp.h
@@ -0,0 +1,42 @@
+/*
+ * channel program interfaces
+ *
+ * Copyright IBM Corp. 2017
+ *
+ * Author(s): Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com>
+ *            Xiao Feng Ren <renxiaof@linux.vnet.ibm.com>
+ */
+
+#ifndef _VFIO_CCW_CP_H_
+#define _VFIO_CCW_CP_H_
+
+#include <asm/cio.h>
+#include <asm/scsw.h>
+
+#include "orb.h"
+
+/**
+ * struct channel_program - manage information for channel program
+ * @ccwchain_list: list head of ccwchains
+ * @orb: orb for the currently processed ssch request
+ * @mdev: the mediated device to perform page pinning/unpinning
+ *
+ * @ccwchain_list is the head of a ccwchain list, that contents the
+ * translated result of the guest channel program that pointed out by
+ * the iova parameter when calling cp_init.
+ */
+struct channel_program {
+	struct list_head ccwchain_list;
+	union orb orb;
+	struct device *mdev;
+};
+
+extern int cp_init(struct channel_program *cp, struct device *mdev,
+		   union orb *orb);
+extern void cp_free(struct channel_program *cp);
+extern int cp_prefetch(struct channel_program *cp);
+extern union orb *cp_get_orb(struct channel_program *cp, u32 intparm, u8 lpm);
+extern void cp_update_scsw(struct channel_program *cp, union scsw *scsw);
+extern bool cp_iova_pinned(struct channel_program *cp, u64 iova);
+
+#endif
-- 
2.8.4

^ permalink raw reply related	[flat|nested] 61+ messages in thread

* [PATCH RFC v3 06/15] vfio: ccw: register vfio_ccw to the mediated device framework
  2017-02-17  8:29 ` [Qemu-devel] " Dong Jia Shi
@ 2017-02-17  8:29   ` Dong Jia Shi
  -1 siblings, 0 replies; 61+ messages in thread
From: Dong Jia Shi @ 2017-02-17  8:29 UTC (permalink / raw)
  To: kvm, linux-s390, qemu-devel
  Cc: bjsdjshi, renxiaof, cornelia.huck, borntraeger, agraf,
	alex.williamson, pmorel, pasic, wkywang

To make vfio support subchannel devices, we need to leverage the
mediated device framework to create a mediated device for the
subchannel device.

This registers the subchannel device to the mediated device
framework during probe to enable mediated device creation.

Signed-off-by: Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com>
Reviewed-by: Pierre Morel <pmorel@linux.vnet.ibm.com>
---
 arch/s390/Kconfig                   |   2 +-
 drivers/s390/cio/Makefile           |   2 +-
 drivers/s390/cio/vfio_ccw_drv.c     |  12 ++-
 drivers/s390/cio/vfio_ccw_ops.c     | 147 ++++++++++++++++++++++++++++++++++++
 drivers/s390/cio/vfio_ccw_private.h |  11 +++
 5 files changed, 171 insertions(+), 3 deletions(-)
 create mode 100644 drivers/s390/cio/vfio_ccw_ops.c

diff --git a/arch/s390/Kconfig b/arch/s390/Kconfig
index b920df8..32008b8 100644
--- a/arch/s390/Kconfig
+++ b/arch/s390/Kconfig
@@ -673,7 +673,7 @@ config EADM_SCH
 config VFIO_CCW
 	def_tristate n
 	prompt "Support for VFIO-CCW subchannels"
-	depends on S390_CCW_IOMMU && VFIO
+	depends on S390_CCW_IOMMU && VFIO_MDEV
 	help
 	  This driver allows usage of VFIO-CCW subchannels.
 
diff --git a/drivers/s390/cio/Makefile b/drivers/s390/cio/Makefile
index 1bec279..b0586b2 100644
--- a/drivers/s390/cio/Makefile
+++ b/drivers/s390/cio/Makefile
@@ -18,5 +18,5 @@ obj-$(CONFIG_CCWGROUP) += ccwgroup.o
 qdio-objs := qdio_main.o qdio_thinint.o qdio_debug.o qdio_setup.o
 obj-$(CONFIG_QDIO) += qdio.o
 
-vfio_ccw-objs += vfio_ccw_drv.o vfio_ccw_cp.o
+vfio_ccw-objs += vfio_ccw_drv.o vfio_ccw_cp.o vfio_ccw_ops.o
 obj-$(CONFIG_VFIO_CCW) += vfio_ccw.o
diff --git a/drivers/s390/cio/vfio_ccw_drv.c b/drivers/s390/cio/vfio_ccw_drv.c
index b068207..80d20e3 100644
--- a/drivers/s390/cio/vfio_ccw_drv.c
+++ b/drivers/s390/cio/vfio_ccw_drv.c
@@ -19,7 +19,7 @@
 /*
  * Helpers
  */
-static int vfio_ccw_sch_quiesce(struct subchannel *sch)
+int vfio_ccw_sch_quiesce(struct subchannel *sch)
 {
 	struct vfio_ccw_private *private = dev_get_drvdata(&sch->dev);
 	DECLARE_COMPLETION_ONSTACK(completion);
@@ -152,8 +152,16 @@ static int vfio_ccw_sch_probe(struct subchannel *sch)
 	if (ret)
 		goto out_disable;
 
+	ret = vfio_ccw_mdev_reg(sch);
+	if (ret)
+		goto out_rm_group;
+
+	atomic_set(&private->avail, 1);
+
 	return 0;
 
+out_rm_group:
+	sysfs_remove_group(&sch->dev.kobj, &vfio_subchannel_attr_group);
 out_disable:
 	cio_disable_subchannel(sch);
 out_free:
@@ -168,6 +176,8 @@ static int vfio_ccw_sch_remove(struct subchannel *sch)
 
 	vfio_ccw_sch_quiesce(sch);
 
+	vfio_ccw_mdev_unreg(sch);
+
 	sysfs_remove_group(&sch->dev.kobj, &vfio_subchannel_attr_group);
 
 	dev_set_drvdata(&sch->dev, NULL);
diff --git a/drivers/s390/cio/vfio_ccw_ops.c b/drivers/s390/cio/vfio_ccw_ops.c
new file mode 100644
index 0000000..b8a2fed
--- /dev/null
+++ b/drivers/s390/cio/vfio_ccw_ops.c
@@ -0,0 +1,147 @@
+/*
+ * Physical device callbacks for vfio_ccw
+ *
+ * Copyright IBM Corp. 2017
+ *
+ * Author(s): Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com>
+ *            Xiao Feng Ren <renxiaof@linux.vnet.ibm.com>
+ */
+
+#include <linux/vfio.h>
+#include <linux/mdev.h>
+
+#include "vfio_ccw_private.h"
+
+static int vfio_ccw_mdev_notifier(struct notifier_block *nb,
+				  unsigned long action,
+				  void *data)
+{
+	struct vfio_ccw_private *private =
+		container_of(nb, struct vfio_ccw_private, nb);
+
+	if (!private)
+		return NOTIFY_STOP;
+
+	/*
+	 * TODO:
+	 * Vendor drivers MUST unpin pages in response to an
+	 * invalidation.
+	 */
+	if (action == VFIO_IOMMU_NOTIFY_DMA_UNMAP)
+		return NOTIFY_BAD;
+
+	return NOTIFY_DONE;
+}
+
+static ssize_t name_show(struct kobject *kobj, struct device *dev, char *buf)
+{
+	return sprintf(buf, "I/O subchannel (Non-QDIO)\n");
+}
+MDEV_TYPE_ATTR_RO(name);
+
+static ssize_t device_api_show(struct kobject *kobj, struct device *dev,
+			       char *buf)
+{
+	return sprintf(buf, "%s\n", VFIO_DEVICE_API_CCW_STRING);
+}
+MDEV_TYPE_ATTR_RO(device_api);
+
+static ssize_t available_instances_show(struct kobject *kobj,
+					struct device *dev, char *buf)
+{
+	struct vfio_ccw_private *private = dev_get_drvdata(dev);
+
+	return sprintf(buf, "%d\n", atomic_read(&private->avail));
+}
+MDEV_TYPE_ATTR_RO(available_instances);
+
+static struct attribute *mdev_types_attrs[] = {
+	&mdev_type_attr_name.attr,
+	&mdev_type_attr_device_api.attr,
+	&mdev_type_attr_available_instances.attr,
+	NULL,
+};
+
+static struct attribute_group mdev_type_group = {
+	.name  = "io",
+	.attrs = mdev_types_attrs,
+};
+
+struct attribute_group *mdev_type_groups[] = {
+	&mdev_type_group,
+	NULL,
+};
+
+static int vfio_ccw_mdev_create(struct kobject *kobj, struct mdev_device *mdev)
+{
+	struct vfio_ccw_private *private =
+		dev_get_drvdata(mdev_parent_dev(mdev));
+
+	if (atomic_dec_if_positive(&private->avail) < 0)
+		return -EPERM;
+
+	private->mdev = mdev;
+
+	return 0;
+}
+
+static int vfio_ccw_mdev_remove(struct mdev_device *mdev)
+{
+	struct vfio_ccw_private *private;
+	struct subchannel *sch;
+	int ret;
+
+	private = dev_get_drvdata(mdev_parent_dev(mdev));
+	sch = private->sch;
+	ret = vfio_ccw_sch_quiesce(sch);
+	if (ret)
+		return ret;
+	ret = cio_enable_subchannel(sch, (u32)(unsigned long)sch);
+	if (ret)
+		return ret;
+
+	private->mdev = NULL;
+	atomic_inc(&private->avail);
+
+	return 0;
+}
+
+static int vfio_ccw_mdev_open(struct mdev_device *mdev)
+{
+	struct vfio_ccw_private *private =
+		dev_get_drvdata(mdev_parent_dev(mdev));
+	unsigned long events = VFIO_IOMMU_NOTIFY_DMA_UNMAP;
+
+	private->nb.notifier_call = vfio_ccw_mdev_notifier;
+
+	return vfio_register_notifier(mdev_dev(mdev), VFIO_IOMMU_NOTIFY,
+				      &events, &private->nb);
+}
+
+void vfio_ccw_mdev_release(struct mdev_device *mdev)
+{
+	struct vfio_ccw_private *private =
+		dev_get_drvdata(mdev_parent_dev(mdev));
+
+	vfio_unregister_notifier(mdev_dev(mdev), VFIO_IOMMU_NOTIFY,
+				 &private->nb);
+}
+
+static const struct mdev_parent_ops vfio_ccw_mdev_ops = {
+	.owner			= THIS_MODULE,
+	.supported_type_groups  = mdev_type_groups,
+	.create			= vfio_ccw_mdev_create,
+	.remove			= vfio_ccw_mdev_remove,
+	.open			= vfio_ccw_mdev_open,
+	.release		= vfio_ccw_mdev_release,
+};
+
+int vfio_ccw_mdev_reg(struct subchannel *sch)
+{
+	return mdev_register_device(&sch->dev, &vfio_ccw_mdev_ops);
+}
+
+void vfio_ccw_mdev_unreg(struct subchannel *sch)
+{
+	mdev_unregister_device(&sch->dev);
+}
diff --git a/drivers/s390/cio/vfio_ccw_private.h b/drivers/s390/cio/vfio_ccw_private.h
index 38d69a5..5afb3ba 100644
--- a/drivers/s390/cio/vfio_ccw_private.h
+++ b/drivers/s390/cio/vfio_ccw_private.h
@@ -16,10 +16,21 @@
  * struct vfio_ccw_private
  * @sch: pointer to the subchannel
  * @completion: synchronization helper of the I/O completion
+ * @avail: available for creating a mediated device
+ * @mdev: pointer to the mediated device
+ * @nb: notifier for vfio events
  */
 struct vfio_ccw_private {
 	struct subchannel	*sch;
 	struct completion	*completion;
+	atomic_t		avail;
+	struct mdev_device	*mdev;
+	struct notifier_block	nb;
 } __aligned(8);
 
+extern int vfio_ccw_mdev_reg(struct subchannel *sch);
+extern void vfio_ccw_mdev_unreg(struct subchannel *sch);
+
+extern int vfio_ccw_sch_quiesce(struct subchannel *sch);
+
 #endif
-- 
2.8.4

^ permalink raw reply related	[flat|nested] 61+ messages in thread

* [Qemu-devel] [PATCH RFC v3 06/15] vfio: ccw: register vfio_ccw to the mediated device framework
@ 2017-02-17  8:29   ` Dong Jia Shi
  0 siblings, 0 replies; 61+ messages in thread
From: Dong Jia Shi @ 2017-02-17  8:29 UTC (permalink / raw)
  To: kvm, linux-s390, qemu-devel
  Cc: bjsdjshi, renxiaof, cornelia.huck, borntraeger, agraf,
	alex.williamson, pmorel, pasic, wkywang

To make vfio support subchannel devices, we need to leverage the
mediated device framework to create a mediated device for the
subchannel device.

This registers the subchannel device to the mediated device
framework during probe to enable mediated device creation.

Signed-off-by: Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com>
Reviewed-by: Pierre Morel <pmorel@linux.vnet.ibm.com>
---
 arch/s390/Kconfig                   |   2 +-
 drivers/s390/cio/Makefile           |   2 +-
 drivers/s390/cio/vfio_ccw_drv.c     |  12 ++-
 drivers/s390/cio/vfio_ccw_ops.c     | 147 ++++++++++++++++++++++++++++++++++++
 drivers/s390/cio/vfio_ccw_private.h |  11 +++
 5 files changed, 171 insertions(+), 3 deletions(-)
 create mode 100644 drivers/s390/cio/vfio_ccw_ops.c

diff --git a/arch/s390/Kconfig b/arch/s390/Kconfig
index b920df8..32008b8 100644
--- a/arch/s390/Kconfig
+++ b/arch/s390/Kconfig
@@ -673,7 +673,7 @@ config EADM_SCH
 config VFIO_CCW
 	def_tristate n
 	prompt "Support for VFIO-CCW subchannels"
-	depends on S390_CCW_IOMMU && VFIO
+	depends on S390_CCW_IOMMU && VFIO_MDEV
 	help
 	  This driver allows usage of VFIO-CCW subchannels.
 
diff --git a/drivers/s390/cio/Makefile b/drivers/s390/cio/Makefile
index 1bec279..b0586b2 100644
--- a/drivers/s390/cio/Makefile
+++ b/drivers/s390/cio/Makefile
@@ -18,5 +18,5 @@ obj-$(CONFIG_CCWGROUP) += ccwgroup.o
 qdio-objs := qdio_main.o qdio_thinint.o qdio_debug.o qdio_setup.o
 obj-$(CONFIG_QDIO) += qdio.o
 
-vfio_ccw-objs += vfio_ccw_drv.o vfio_ccw_cp.o
+vfio_ccw-objs += vfio_ccw_drv.o vfio_ccw_cp.o vfio_ccw_ops.o
 obj-$(CONFIG_VFIO_CCW) += vfio_ccw.o
diff --git a/drivers/s390/cio/vfio_ccw_drv.c b/drivers/s390/cio/vfio_ccw_drv.c
index b068207..80d20e3 100644
--- a/drivers/s390/cio/vfio_ccw_drv.c
+++ b/drivers/s390/cio/vfio_ccw_drv.c
@@ -19,7 +19,7 @@
 /*
  * Helpers
  */
-static int vfio_ccw_sch_quiesce(struct subchannel *sch)
+int vfio_ccw_sch_quiesce(struct subchannel *sch)
 {
 	struct vfio_ccw_private *private = dev_get_drvdata(&sch->dev);
 	DECLARE_COMPLETION_ONSTACK(completion);
@@ -152,8 +152,16 @@ static int vfio_ccw_sch_probe(struct subchannel *sch)
 	if (ret)
 		goto out_disable;
 
+	ret = vfio_ccw_mdev_reg(sch);
+	if (ret)
+		goto out_rm_group;
+
+	atomic_set(&private->avail, 1);
+
 	return 0;
 
+out_rm_group:
+	sysfs_remove_group(&sch->dev.kobj, &vfio_subchannel_attr_group);
 out_disable:
 	cio_disable_subchannel(sch);
 out_free:
@@ -168,6 +176,8 @@ static int vfio_ccw_sch_remove(struct subchannel *sch)
 
 	vfio_ccw_sch_quiesce(sch);
 
+	vfio_ccw_mdev_unreg(sch);
+
 	sysfs_remove_group(&sch->dev.kobj, &vfio_subchannel_attr_group);
 
 	dev_set_drvdata(&sch->dev, NULL);
diff --git a/drivers/s390/cio/vfio_ccw_ops.c b/drivers/s390/cio/vfio_ccw_ops.c
new file mode 100644
index 0000000..b8a2fed
--- /dev/null
+++ b/drivers/s390/cio/vfio_ccw_ops.c
@@ -0,0 +1,147 @@
+/*
+ * Physical device callbacks for vfio_ccw
+ *
+ * Copyright IBM Corp. 2017
+ *
+ * Author(s): Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com>
+ *            Xiao Feng Ren <renxiaof@linux.vnet.ibm.com>
+ */
+
+#include <linux/vfio.h>
+#include <linux/mdev.h>
+
+#include "vfio_ccw_private.h"
+
+static int vfio_ccw_mdev_notifier(struct notifier_block *nb,
+				  unsigned long action,
+				  void *data)
+{
+	struct vfio_ccw_private *private =
+		container_of(nb, struct vfio_ccw_private, nb);
+
+	if (!private)
+		return NOTIFY_STOP;
+
+	/*
+	 * TODO:
+	 * Vendor drivers MUST unpin pages in response to an
+	 * invalidation.
+	 */
+	if (action == VFIO_IOMMU_NOTIFY_DMA_UNMAP)
+		return NOTIFY_BAD;
+
+	return NOTIFY_DONE;
+}
+
+static ssize_t name_show(struct kobject *kobj, struct device *dev, char *buf)
+{
+	return sprintf(buf, "I/O subchannel (Non-QDIO)\n");
+}
+MDEV_TYPE_ATTR_RO(name);
+
+static ssize_t device_api_show(struct kobject *kobj, struct device *dev,
+			       char *buf)
+{
+	return sprintf(buf, "%s\n", VFIO_DEVICE_API_CCW_STRING);
+}
+MDEV_TYPE_ATTR_RO(device_api);
+
+static ssize_t available_instances_show(struct kobject *kobj,
+					struct device *dev, char *buf)
+{
+	struct vfio_ccw_private *private = dev_get_drvdata(dev);
+
+	return sprintf(buf, "%d\n", atomic_read(&private->avail));
+}
+MDEV_TYPE_ATTR_RO(available_instances);
+
+static struct attribute *mdev_types_attrs[] = {
+	&mdev_type_attr_name.attr,
+	&mdev_type_attr_device_api.attr,
+	&mdev_type_attr_available_instances.attr,
+	NULL,
+};
+
+static struct attribute_group mdev_type_group = {
+	.name  = "io",
+	.attrs = mdev_types_attrs,
+};
+
+struct attribute_group *mdev_type_groups[] = {
+	&mdev_type_group,
+	NULL,
+};
+
+static int vfio_ccw_mdev_create(struct kobject *kobj, struct mdev_device *mdev)
+{
+	struct vfio_ccw_private *private =
+		dev_get_drvdata(mdev_parent_dev(mdev));
+
+	if (atomic_dec_if_positive(&private->avail) < 0)
+		return -EPERM;
+
+	private->mdev = mdev;
+
+	return 0;
+}
+
+static int vfio_ccw_mdev_remove(struct mdev_device *mdev)
+{
+	struct vfio_ccw_private *private;
+	struct subchannel *sch;
+	int ret;
+
+	private = dev_get_drvdata(mdev_parent_dev(mdev));
+	sch = private->sch;
+	ret = vfio_ccw_sch_quiesce(sch);
+	if (ret)
+		return ret;
+	ret = cio_enable_subchannel(sch, (u32)(unsigned long)sch);
+	if (ret)
+		return ret;
+
+	private->mdev = NULL;
+	atomic_inc(&private->avail);
+
+	return 0;
+}
+
+static int vfio_ccw_mdev_open(struct mdev_device *mdev)
+{
+	struct vfio_ccw_private *private =
+		dev_get_drvdata(mdev_parent_dev(mdev));
+	unsigned long events = VFIO_IOMMU_NOTIFY_DMA_UNMAP;
+
+	private->nb.notifier_call = vfio_ccw_mdev_notifier;
+
+	return vfio_register_notifier(mdev_dev(mdev), VFIO_IOMMU_NOTIFY,
+				      &events, &private->nb);
+}
+
+void vfio_ccw_mdev_release(struct mdev_device *mdev)
+{
+	struct vfio_ccw_private *private =
+		dev_get_drvdata(mdev_parent_dev(mdev));
+
+	vfio_unregister_notifier(mdev_dev(mdev), VFIO_IOMMU_NOTIFY,
+				 &private->nb);
+}
+
+static const struct mdev_parent_ops vfio_ccw_mdev_ops = {
+	.owner			= THIS_MODULE,
+	.supported_type_groups  = mdev_type_groups,
+	.create			= vfio_ccw_mdev_create,
+	.remove			= vfio_ccw_mdev_remove,
+	.open			= vfio_ccw_mdev_open,
+	.release		= vfio_ccw_mdev_release,
+};
+
+int vfio_ccw_mdev_reg(struct subchannel *sch)
+{
+	return mdev_register_device(&sch->dev, &vfio_ccw_mdev_ops);
+}
+
+void vfio_ccw_mdev_unreg(struct subchannel *sch)
+{
+	mdev_unregister_device(&sch->dev);
+}
diff --git a/drivers/s390/cio/vfio_ccw_private.h b/drivers/s390/cio/vfio_ccw_private.h
index 38d69a5..5afb3ba 100644
--- a/drivers/s390/cio/vfio_ccw_private.h
+++ b/drivers/s390/cio/vfio_ccw_private.h
@@ -16,10 +16,21 @@
  * struct vfio_ccw_private
  * @sch: pointer to the subchannel
  * @completion: synchronization helper of the I/O completion
+ * @avail: available for creating a mediated device
+ * @mdev: pointer to the mediated device
+ * @nb: notifier for vfio events
  */
 struct vfio_ccw_private {
 	struct subchannel	*sch;
 	struct completion	*completion;
+	atomic_t		avail;
+	struct mdev_device	*mdev;
+	struct notifier_block	nb;
 } __aligned(8);
 
+extern int vfio_ccw_mdev_reg(struct subchannel *sch);
+extern void vfio_ccw_mdev_unreg(struct subchannel *sch);
+
+extern int vfio_ccw_sch_quiesce(struct subchannel *sch);
+
 #endif
-- 
2.8.4

^ permalink raw reply related	[flat|nested] 61+ messages in thread

* [PATCH RFC v3 07/15] vfio: ccw: introduce ccw_io_region
  2017-02-17  8:29 ` [Qemu-devel] " Dong Jia Shi
@ 2017-02-17  8:29   ` Dong Jia Shi
  -1 siblings, 0 replies; 61+ messages in thread
From: Dong Jia Shi @ 2017-02-17  8:29 UTC (permalink / raw)
  To: kvm, linux-s390, qemu-devel
  Cc: bjsdjshi, renxiaof, cornelia.huck, borntraeger, agraf,
	alex.williamson, pmorel, pasic, wkywang

To provide user-space a set of interfaces to:
1. pass in a ccw program to perform an I/O operation.
2. read back I/O results of the completed I/O operations.
We introduce an MMIO region for the vfio-ccw device here.

This region is defined to content:
1. areas to store arguments that an ssch required.
2. areas to store the I/O results.

Using pwrite/pread to the device on this region, a user-space program
could write/read data to/from the vfio-ccw device.

Signed-off-by: Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com>
Reviewed-by: Pierre Morel <pmorel@linux.vnet.ibm.com>
---
 arch/s390/include/uapi/asm/Kbuild   |  1 +
 drivers/s390/cio/vfio_ccw_ops.c     | 47 +++++++++++++++++++++++++++++++++++++
 drivers/s390/cio/vfio_ccw_private.h |  4 ++++
 include/uapi/linux/vfio_ccw.h       | 24 +++++++++++++++++++
 4 files changed, 76 insertions(+)
 create mode 100644 include/uapi/linux/vfio_ccw.h

diff --git a/arch/s390/include/uapi/asm/Kbuild b/arch/s390/include/uapi/asm/Kbuild
index bf736e7..fdb9529 100644
--- a/arch/s390/include/uapi/asm/Kbuild
+++ b/arch/s390/include/uapi/asm/Kbuild
@@ -52,3 +52,4 @@ header-y += unistd.h
 header-y += virtio-ccw.h
 header-y += vtoc.h
 header-y += zcrypt.h
+header-y += vfio_ccw.h
diff --git a/drivers/s390/cio/vfio_ccw_ops.c b/drivers/s390/cio/vfio_ccw_ops.c
index b8a2fed..6c06805 100644
--- a/drivers/s390/cio/vfio_ccw_ops.c
+++ b/drivers/s390/cio/vfio_ccw_ops.c
@@ -127,6 +127,51 @@ void vfio_ccw_mdev_release(struct mdev_device *mdev)
 				 &private->nb);
 }
 
+static ssize_t vfio_ccw_mdev_read(struct mdev_device *mdev,
+				  char __user *buf,
+				  size_t count,
+				  loff_t *ppos)
+{
+	struct vfio_ccw_private *private;
+	struct ccw_io_region *region;
+
+	if (*ppos + count > sizeof(*region))
+		return -EINVAL;
+
+	private = dev_get_drvdata(mdev_parent_dev(mdev));
+	if (!private)
+		return -ENODEV;
+
+	region = &private->io_region;
+	if (copy_to_user(buf, (void *)region + *ppos, count))
+		return -EFAULT;
+
+	return count;
+}
+
+static ssize_t vfio_ccw_mdev_write(struct mdev_device *mdev,
+				   const char __user *buf,
+				   size_t count,
+				   loff_t *ppos)
+{
+	struct vfio_ccw_private *private;
+	struct ccw_io_region *region;
+
+	if (*ppos + count > sizeof(*region))
+		return -EINVAL;
+
+	private = dev_get_drvdata(mdev_parent_dev(mdev));
+	if (!private)
+		return -ENODEV;
+
+	region = &private->io_region;
+	if (copy_from_user((void *)region + *ppos, buf, count))
+		return -EFAULT;
+	region->ret_code = 0;
+
+	return count;
+}
+
 static const struct mdev_parent_ops vfio_ccw_mdev_ops = {
 	.owner			= THIS_MODULE,
 	.supported_type_groups  = mdev_type_groups,
@@ -134,6 +179,8 @@ static const struct mdev_parent_ops vfio_ccw_mdev_ops = {
 	.remove			= vfio_ccw_mdev_remove,
 	.open			= vfio_ccw_mdev_open,
 	.release		= vfio_ccw_mdev_release,
+	.read			= vfio_ccw_mdev_read,
+	.write			= vfio_ccw_mdev_write,
 };
 
 int vfio_ccw_mdev_reg(struct subchannel *sch)
diff --git a/drivers/s390/cio/vfio_ccw_private.h b/drivers/s390/cio/vfio_ccw_private.h
index 5afb3ba..359e96b 100644
--- a/drivers/s390/cio/vfio_ccw_private.h
+++ b/drivers/s390/cio/vfio_ccw_private.h
@@ -10,6 +10,8 @@
 #ifndef _VFIO_CCW_PRIVATE_H_
 #define _VFIO_CCW_PRIVATE_H_
 
+#include <linux/vfio_ccw.h>
+
 #include "css.h"
 
 /**
@@ -19,6 +21,7 @@
  * @avail: available for creating a mediated device
  * @mdev: pointer to the mediated device
  * @nb: notifier for vfio events
+ * @io_region: MMIO region to input/output I/O arguments/results
  */
 struct vfio_ccw_private {
 	struct subchannel	*sch;
@@ -26,6 +29,7 @@ struct vfio_ccw_private {
 	atomic_t		avail;
 	struct mdev_device	*mdev;
 	struct notifier_block	nb;
+	struct ccw_io_region	io_region;
 } __aligned(8);
 
 extern int vfio_ccw_mdev_reg(struct subchannel *sch);
diff --git a/include/uapi/linux/vfio_ccw.h b/include/uapi/linux/vfio_ccw.h
new file mode 100644
index 0000000..34a7f6f
--- /dev/null
+++ b/include/uapi/linux/vfio_ccw.h
@@ -0,0 +1,24 @@
+/*
+ * Interfaces for vfio-ccw
+ *
+ * Copyright IBM Corp. 2017
+ *
+ * Author(s): Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com>
+ */
+
+#ifndef _VFIO_CCW_H_
+#define _VFIO_CCW_H_
+
+#include <linux/types.h>
+
+struct ccw_io_region {
+#define ORB_AREA_SIZE 12
+	__u8	orb_area[ORB_AREA_SIZE];
+#define SCSW_AREA_SIZE 12
+	__u8	scsw_area[SCSW_AREA_SIZE];
+#define IRB_AREA_SIZE 96
+	__u8	irb_area[IRB_AREA_SIZE];
+	__u32	ret_code;
+} __packed;
+
+#endif
-- 
2.8.4

^ permalink raw reply related	[flat|nested] 61+ messages in thread

* [Qemu-devel] [PATCH RFC v3 07/15] vfio: ccw: introduce ccw_io_region
@ 2017-02-17  8:29   ` Dong Jia Shi
  0 siblings, 0 replies; 61+ messages in thread
From: Dong Jia Shi @ 2017-02-17  8:29 UTC (permalink / raw)
  To: kvm, linux-s390, qemu-devel
  Cc: bjsdjshi, renxiaof, cornelia.huck, borntraeger, agraf,
	alex.williamson, pmorel, pasic, wkywang

To provide user-space a set of interfaces to:
1. pass in a ccw program to perform an I/O operation.
2. read back I/O results of the completed I/O operations.
We introduce an MMIO region for the vfio-ccw device here.

This region is defined to content:
1. areas to store arguments that an ssch required.
2. areas to store the I/O results.

Using pwrite/pread to the device on this region, a user-space program
could write/read data to/from the vfio-ccw device.

Signed-off-by: Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com>
Reviewed-by: Pierre Morel <pmorel@linux.vnet.ibm.com>
---
 arch/s390/include/uapi/asm/Kbuild   |  1 +
 drivers/s390/cio/vfio_ccw_ops.c     | 47 +++++++++++++++++++++++++++++++++++++
 drivers/s390/cio/vfio_ccw_private.h |  4 ++++
 include/uapi/linux/vfio_ccw.h       | 24 +++++++++++++++++++
 4 files changed, 76 insertions(+)
 create mode 100644 include/uapi/linux/vfio_ccw.h

diff --git a/arch/s390/include/uapi/asm/Kbuild b/arch/s390/include/uapi/asm/Kbuild
index bf736e7..fdb9529 100644
--- a/arch/s390/include/uapi/asm/Kbuild
+++ b/arch/s390/include/uapi/asm/Kbuild
@@ -52,3 +52,4 @@ header-y += unistd.h
 header-y += virtio-ccw.h
 header-y += vtoc.h
 header-y += zcrypt.h
+header-y += vfio_ccw.h
diff --git a/drivers/s390/cio/vfio_ccw_ops.c b/drivers/s390/cio/vfio_ccw_ops.c
index b8a2fed..6c06805 100644
--- a/drivers/s390/cio/vfio_ccw_ops.c
+++ b/drivers/s390/cio/vfio_ccw_ops.c
@@ -127,6 +127,51 @@ void vfio_ccw_mdev_release(struct mdev_device *mdev)
 				 &private->nb);
 }
 
+static ssize_t vfio_ccw_mdev_read(struct mdev_device *mdev,
+				  char __user *buf,
+				  size_t count,
+				  loff_t *ppos)
+{
+	struct vfio_ccw_private *private;
+	struct ccw_io_region *region;
+
+	if (*ppos + count > sizeof(*region))
+		return -EINVAL;
+
+	private = dev_get_drvdata(mdev_parent_dev(mdev));
+	if (!private)
+		return -ENODEV;
+
+	region = &private->io_region;
+	if (copy_to_user(buf, (void *)region + *ppos, count))
+		return -EFAULT;
+
+	return count;
+}
+
+static ssize_t vfio_ccw_mdev_write(struct mdev_device *mdev,
+				   const char __user *buf,
+				   size_t count,
+				   loff_t *ppos)
+{
+	struct vfio_ccw_private *private;
+	struct ccw_io_region *region;
+
+	if (*ppos + count > sizeof(*region))
+		return -EINVAL;
+
+	private = dev_get_drvdata(mdev_parent_dev(mdev));
+	if (!private)
+		return -ENODEV;
+
+	region = &private->io_region;
+	if (copy_from_user((void *)region + *ppos, buf, count))
+		return -EFAULT;
+	region->ret_code = 0;
+
+	return count;
+}
+
 static const struct mdev_parent_ops vfio_ccw_mdev_ops = {
 	.owner			= THIS_MODULE,
 	.supported_type_groups  = mdev_type_groups,
@@ -134,6 +179,8 @@ static const struct mdev_parent_ops vfio_ccw_mdev_ops = {
 	.remove			= vfio_ccw_mdev_remove,
 	.open			= vfio_ccw_mdev_open,
 	.release		= vfio_ccw_mdev_release,
+	.read			= vfio_ccw_mdev_read,
+	.write			= vfio_ccw_mdev_write,
 };
 
 int vfio_ccw_mdev_reg(struct subchannel *sch)
diff --git a/drivers/s390/cio/vfio_ccw_private.h b/drivers/s390/cio/vfio_ccw_private.h
index 5afb3ba..359e96b 100644
--- a/drivers/s390/cio/vfio_ccw_private.h
+++ b/drivers/s390/cio/vfio_ccw_private.h
@@ -10,6 +10,8 @@
 #ifndef _VFIO_CCW_PRIVATE_H_
 #define _VFIO_CCW_PRIVATE_H_
 
+#include <linux/vfio_ccw.h>
+
 #include "css.h"
 
 /**
@@ -19,6 +21,7 @@
  * @avail: available for creating a mediated device
  * @mdev: pointer to the mediated device
  * @nb: notifier for vfio events
+ * @io_region: MMIO region to input/output I/O arguments/results
  */
 struct vfio_ccw_private {
 	struct subchannel	*sch;
@@ -26,6 +29,7 @@ struct vfio_ccw_private {
 	atomic_t		avail;
 	struct mdev_device	*mdev;
 	struct notifier_block	nb;
+	struct ccw_io_region	io_region;
 } __aligned(8);
 
 extern int vfio_ccw_mdev_reg(struct subchannel *sch);
diff --git a/include/uapi/linux/vfio_ccw.h b/include/uapi/linux/vfio_ccw.h
new file mode 100644
index 0000000..34a7f6f
--- /dev/null
+++ b/include/uapi/linux/vfio_ccw.h
@@ -0,0 +1,24 @@
+/*
+ * Interfaces for vfio-ccw
+ *
+ * Copyright IBM Corp. 2017
+ *
+ * Author(s): Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com>
+ */
+
+#ifndef _VFIO_CCW_H_
+#define _VFIO_CCW_H_
+
+#include <linux/types.h>
+
+struct ccw_io_region {
+#define ORB_AREA_SIZE 12
+	__u8	orb_area[ORB_AREA_SIZE];
+#define SCSW_AREA_SIZE 12
+	__u8	scsw_area[SCSW_AREA_SIZE];
+#define IRB_AREA_SIZE 96
+	__u8	irb_area[IRB_AREA_SIZE];
+	__u32	ret_code;
+} __packed;
+
+#endif
-- 
2.8.4

^ permalink raw reply related	[flat|nested] 61+ messages in thread

* [PATCH RFC v3 08/15] vfio: ccw: handle ccw command request
  2017-02-17  8:29 ` [Qemu-devel] " Dong Jia Shi
@ 2017-02-17  8:29   ` Dong Jia Shi
  -1 siblings, 0 replies; 61+ messages in thread
From: Dong Jia Shi @ 2017-02-17  8:29 UTC (permalink / raw)
  To: kvm, linux-s390, qemu-devel
  Cc: bjsdjshi, renxiaof, cornelia.huck, borntraeger, agraf,
	alex.williamson, pmorel, pasic, wkywang

We implement the basic ccw command handling infrastructure
here:
1. Translate the ccw commands.
2. Issue the translated ccw commands to the device.
3. Once we get the execution result, update the guest SCSW
   with it.

Signed-off-by: Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com>
Acked-by: Pierre Morel <pmorel@linux.vnet.ibm.com>
---
 drivers/s390/cio/vfio_ccw_drv.c     | 114 ++++++++++++++++++++++++++++++++++++
 drivers/s390/cio/vfio_ccw_ops.c     |  24 ++++++--
 drivers/s390/cio/vfio_ccw_private.h |  14 +++++
 3 files changed, 148 insertions(+), 4 deletions(-)

diff --git a/drivers/s390/cio/vfio_ccw_drv.c b/drivers/s390/cio/vfio_ccw_drv.c
index 80d20e3..601fa1a 100644
--- a/drivers/s390/cio/vfio_ccw_drv.c
+++ b/drivers/s390/cio/vfio_ccw_drv.c
@@ -11,9 +11,13 @@
 #include <linux/init.h>
 #include <linux/device.h>
 #include <linux/slab.h>
+#include <linux/uuid.h>
+#include <linux/mdev.h>
 
 #include <asm/isc.h>
 
+#include "ioasm.h"
+#include "css.h"
 #include "vfio_ccw_private.h"
 
 /*
@@ -59,6 +63,109 @@ int vfio_ccw_sch_quiesce(struct subchannel *sch)
 	return ret;
 }
 
+static int doing_io(struct vfio_ccw_private *private, u32 intparm)
+{
+	return (private->intparm == intparm);
+}
+
+static int vfio_ccw_sch_io_helper(struct vfio_ccw_private *private)
+{
+	struct subchannel *sch;
+	union orb *orb;
+	int ccode;
+	__u8 lpm;
+	u32 intparm;
+
+	sch = private->sch;
+
+	orb = cp_get_orb(&private->cp, (u32)(addr_t)sch, sch->lpm);
+
+	/* Issue "Start Subchannel" */
+	ccode = ssch(sch->schid, orb);
+
+	switch (ccode) {
+	case 0:
+		/*
+		 * Initialize device status information
+		 */
+		sch->schib.scsw.cmd.actl |= SCSW_ACTL_START_PEND;
+		break;
+	case 1:		/* Status pending */
+	case 2:		/* Busy */
+		return -EBUSY;
+	case 3:		/* Device/path not operational */
+	{
+		lpm = orb->cmd.lpm;
+		if (lpm != 0)
+			sch->lpm &= ~lpm;
+		else
+			sch->lpm = 0;
+
+		if (cio_update_schib(sch))
+			return -ENODEV;
+
+		return sch->lpm ? -EACCES : -ENODEV;
+	}
+	default:
+		return ccode;
+	}
+
+	intparm = (u32)(addr_t)sch;
+	private->intparm = 0;
+	wait_event(private->wait_q, doing_io(private, intparm));
+
+	if (scsw_is_solicited(&private->irb.scsw))
+		cp_update_scsw(&private->cp, &private->irb.scsw);
+
+	return 0;
+}
+
+/* Deal with the ccw command request from the userspace. */
+int vfio_ccw_sch_cmd_request(struct vfio_ccw_private *private)
+{
+	struct mdev_device *mdev = private->mdev;
+	union orb *orb;
+	union scsw *scsw = &private->scsw;
+	struct irb *irb = &private->irb;
+	struct ccw_io_region *io_region = &private->io_region;
+	int ret;
+
+	memcpy(scsw, io_region->scsw_area, sizeof(*scsw));
+
+	if (scsw->cmd.fctl & SCSW_FCTL_START_FUNC) {
+		orb = (union orb *)io_region->orb_area;
+
+		ret = cp_init(&private->cp, mdev_dev(mdev), orb);
+		if (ret)
+			return ret;
+
+		ret = cp_prefetch(&private->cp);
+		if (ret) {
+			cp_free(&private->cp);
+			return ret;
+		}
+
+		/* Start channel program and wait for I/O interrupt. */
+		ret = vfio_ccw_sch_io_helper(private);
+		if (!ret) {
+			/* Get irb info and copy it to irb_area. */
+			memcpy(io_region->irb_area, irb, sizeof(*irb));
+		}
+
+		cp_free(&private->cp);
+	} else if (scsw->cmd.fctl & SCSW_FCTL_HALT_FUNC) {
+		/* XXX: Handle halt. */
+		ret = -EOPNOTSUPP;
+	} else if (scsw->cmd.fctl & SCSW_FCTL_CLEAR_FUNC) {
+		/* XXX: Handle clear. */
+		ret = -EOPNOTSUPP;
+	} else {
+		ret = -EOPNOTSUPP;
+	}
+
+	return ret;
+}
+
 /*
  * Sysfs interfaces
  */
@@ -113,12 +220,18 @@ static struct attribute_group vfio_subchannel_attr_group = {
 static void vfio_ccw_sch_irq(struct subchannel *sch)
 {
 	struct vfio_ccw_private *private = dev_get_drvdata(&sch->dev);
+	struct irb *irb;
 
 	inc_irq_stat(IRQIO_CIO);
 
 	if (!private)
 		return;
 
+	irb = this_cpu_ptr(&cio_irb);
+	memcpy(&private->irb, irb, sizeof(*irb));
+	private->intparm = (u32)(addr_t)sch;
+	wake_up(&private->wait_q);
+
 	if (private->completion)
 		complete(private->completion);
 }
@@ -156,6 +269,7 @@ static int vfio_ccw_sch_probe(struct subchannel *sch)
 	if (ret)
 		goto out_rm_group;
 
+	init_waitqueue_head(&private->wait_q);
 	atomic_set(&private->avail, 1);
 
 	return 0;
diff --git a/drivers/s390/cio/vfio_ccw_ops.c b/drivers/s390/cio/vfio_ccw_ops.c
index 6c06805..878c882 100644
--- a/drivers/s390/cio/vfio_ccw_ops.c
+++ b/drivers/s390/cio/vfio_ccw_ops.c
@@ -23,12 +23,25 @@ static int vfio_ccw_mdev_notifier(struct notifier_block *nb,
 		return NOTIFY_STOP;
 
 	/*
-	 * TODO:
 	 * Vendor drivers MUST unpin pages in response to an
 	 * invalidation.
 	 */
-	if (action == VFIO_IOMMU_NOTIFY_DMA_UNMAP)
-		return NOTIFY_BAD;
+	if (action == VFIO_IOMMU_NOTIFY_DMA_UNMAP) {
+		struct vfio_iommu_type1_dma_unmap *unmap = data;
+		struct subchannel *sch = private->sch;
+
+		if (!cp_iova_pinned(&private->cp, unmap->iova))
+			return NOTIFY_OK;
+
+		if (vfio_ccw_sch_quiesce(sch))
+			return NOTIFY_BAD;
+
+		if (cio_enable_subchannel(sch, (u32)(unsigned long)sch))
+			return NOTIFY_BAD;
+
+		cp_free(&private->cp);
+		return NOTIFY_OK;
+	}
 
 	return NOTIFY_DONE;
 }
@@ -167,7 +180,10 @@ static ssize_t vfio_ccw_mdev_write(struct mdev_device *mdev,
 	region = &private->io_region;
 	if (copy_from_user((void *)region + *ppos, buf, count))
 		return -EFAULT;
-	region->ret_code = 0;
+
+	region->ret_code = vfio_ccw_sch_cmd_request(private);
+	if (region->ret_code != 0)
+		return region->ret_code;
 
 	return count;
 }
diff --git a/drivers/s390/cio/vfio_ccw_private.h b/drivers/s390/cio/vfio_ccw_private.h
index 359e96b..79e5337 100644
--- a/drivers/s390/cio/vfio_ccw_private.h
+++ b/drivers/s390/cio/vfio_ccw_private.h
@@ -10,9 +10,11 @@
 #ifndef _VFIO_CCW_PRIVATE_H_
 #define _VFIO_CCW_PRIVATE_H_
 
+#include <linux/completion.h>
 #include <linux/vfio_ccw.h>
 
 #include "css.h"
+#include "vfio_ccw_cp.h"
 
 /**
  * struct vfio_ccw_private
@@ -22,6 +24,11 @@
  * @mdev: pointer to the mediated device
  * @nb: notifier for vfio events
  * @io_region: MMIO region to input/output I/O arguments/results
+ * @wait_q: wait for interrupt
+ * @intparm: record current interrupt parameter, used for wait interrupt
+ * @cp: channel program for the current I/O operation
+ * @irb: irb info received from interrupt
+ * @scsw: scsw info
  */
 struct vfio_ccw_private {
 	struct subchannel	*sch;
@@ -30,11 +37,18 @@ struct vfio_ccw_private {
 	struct mdev_device	*mdev;
 	struct notifier_block	nb;
 	struct ccw_io_region	io_region;
+
+	wait_queue_head_t	wait_q;
+	u32			intparm;
+	struct channel_program	cp;
+	struct irb		irb;
+	union scsw		scsw;
 } __aligned(8);
 
 extern int vfio_ccw_mdev_reg(struct subchannel *sch);
 extern void vfio_ccw_mdev_unreg(struct subchannel *sch);
 
 extern int vfio_ccw_sch_quiesce(struct subchannel *sch);
+extern int vfio_ccw_sch_cmd_request(struct vfio_ccw_private *private);
 
 #endif
-- 
2.8.4

^ permalink raw reply related	[flat|nested] 61+ messages in thread

* [Qemu-devel] [PATCH RFC v3 08/15] vfio: ccw: handle ccw command request
@ 2017-02-17  8:29   ` Dong Jia Shi
  0 siblings, 0 replies; 61+ messages in thread
From: Dong Jia Shi @ 2017-02-17  8:29 UTC (permalink / raw)
  To: kvm, linux-s390, qemu-devel
  Cc: bjsdjshi, renxiaof, cornelia.huck, borntraeger, agraf,
	alex.williamson, pmorel, pasic, wkywang

We implement the basic ccw command handling infrastructure
here:
1. Translate the ccw commands.
2. Issue the translated ccw commands to the device.
3. Once we get the execution result, update the guest SCSW
   with it.

Signed-off-by: Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com>
Acked-by: Pierre Morel <pmorel@linux.vnet.ibm.com>
---
 drivers/s390/cio/vfio_ccw_drv.c     | 114 ++++++++++++++++++++++++++++++++++++
 drivers/s390/cio/vfio_ccw_ops.c     |  24 ++++++--
 drivers/s390/cio/vfio_ccw_private.h |  14 +++++
 3 files changed, 148 insertions(+), 4 deletions(-)

diff --git a/drivers/s390/cio/vfio_ccw_drv.c b/drivers/s390/cio/vfio_ccw_drv.c
index 80d20e3..601fa1a 100644
--- a/drivers/s390/cio/vfio_ccw_drv.c
+++ b/drivers/s390/cio/vfio_ccw_drv.c
@@ -11,9 +11,13 @@
 #include <linux/init.h>
 #include <linux/device.h>
 #include <linux/slab.h>
+#include <linux/uuid.h>
+#include <linux/mdev.h>
 
 #include <asm/isc.h>
 
+#include "ioasm.h"
+#include "css.h"
 #include "vfio_ccw_private.h"
 
 /*
@@ -59,6 +63,109 @@ int vfio_ccw_sch_quiesce(struct subchannel *sch)
 	return ret;
 }
 
+static int doing_io(struct vfio_ccw_private *private, u32 intparm)
+{
+	return (private->intparm == intparm);
+}
+
+static int vfio_ccw_sch_io_helper(struct vfio_ccw_private *private)
+{
+	struct subchannel *sch;
+	union orb *orb;
+	int ccode;
+	__u8 lpm;
+	u32 intparm;
+
+	sch = private->sch;
+
+	orb = cp_get_orb(&private->cp, (u32)(addr_t)sch, sch->lpm);
+
+	/* Issue "Start Subchannel" */
+	ccode = ssch(sch->schid, orb);
+
+	switch (ccode) {
+	case 0:
+		/*
+		 * Initialize device status information
+		 */
+		sch->schib.scsw.cmd.actl |= SCSW_ACTL_START_PEND;
+		break;
+	case 1:		/* Status pending */
+	case 2:		/* Busy */
+		return -EBUSY;
+	case 3:		/* Device/path not operational */
+	{
+		lpm = orb->cmd.lpm;
+		if (lpm != 0)
+			sch->lpm &= ~lpm;
+		else
+			sch->lpm = 0;
+
+		if (cio_update_schib(sch))
+			return -ENODEV;
+
+		return sch->lpm ? -EACCES : -ENODEV;
+	}
+	default:
+		return ccode;
+	}
+
+	intparm = (u32)(addr_t)sch;
+	private->intparm = 0;
+	wait_event(private->wait_q, doing_io(private, intparm));
+
+	if (scsw_is_solicited(&private->irb.scsw))
+		cp_update_scsw(&private->cp, &private->irb.scsw);
+
+	return 0;
+}
+
+/* Deal with the ccw command request from the userspace. */
+int vfio_ccw_sch_cmd_request(struct vfio_ccw_private *private)
+{
+	struct mdev_device *mdev = private->mdev;
+	union orb *orb;
+	union scsw *scsw = &private->scsw;
+	struct irb *irb = &private->irb;
+	struct ccw_io_region *io_region = &private->io_region;
+	int ret;
+
+	memcpy(scsw, io_region->scsw_area, sizeof(*scsw));
+
+	if (scsw->cmd.fctl & SCSW_FCTL_START_FUNC) {
+		orb = (union orb *)io_region->orb_area;
+
+		ret = cp_init(&private->cp, mdev_dev(mdev), orb);
+		if (ret)
+			return ret;
+
+		ret = cp_prefetch(&private->cp);
+		if (ret) {
+			cp_free(&private->cp);
+			return ret;
+		}
+
+		/* Start channel program and wait for I/O interrupt. */
+		ret = vfio_ccw_sch_io_helper(private);
+		if (!ret) {
+			/* Get irb info and copy it to irb_area. */
+			memcpy(io_region->irb_area, irb, sizeof(*irb));
+		}
+
+		cp_free(&private->cp);
+	} else if (scsw->cmd.fctl & SCSW_FCTL_HALT_FUNC) {
+		/* XXX: Handle halt. */
+		ret = -EOPNOTSUPP;
+	} else if (scsw->cmd.fctl & SCSW_FCTL_CLEAR_FUNC) {
+		/* XXX: Handle clear. */
+		ret = -EOPNOTSUPP;
+	} else {
+		ret = -EOPNOTSUPP;
+	}
+
+	return ret;
+}
+
 /*
  * Sysfs interfaces
  */
@@ -113,12 +220,18 @@ static struct attribute_group vfio_subchannel_attr_group = {
 static void vfio_ccw_sch_irq(struct subchannel *sch)
 {
 	struct vfio_ccw_private *private = dev_get_drvdata(&sch->dev);
+	struct irb *irb;
 
 	inc_irq_stat(IRQIO_CIO);
 
 	if (!private)
 		return;
 
+	irb = this_cpu_ptr(&cio_irb);
+	memcpy(&private->irb, irb, sizeof(*irb));
+	private->intparm = (u32)(addr_t)sch;
+	wake_up(&private->wait_q);
+
 	if (private->completion)
 		complete(private->completion);
 }
@@ -156,6 +269,7 @@ static int vfio_ccw_sch_probe(struct subchannel *sch)
 	if (ret)
 		goto out_rm_group;
 
+	init_waitqueue_head(&private->wait_q);
 	atomic_set(&private->avail, 1);
 
 	return 0;
diff --git a/drivers/s390/cio/vfio_ccw_ops.c b/drivers/s390/cio/vfio_ccw_ops.c
index 6c06805..878c882 100644
--- a/drivers/s390/cio/vfio_ccw_ops.c
+++ b/drivers/s390/cio/vfio_ccw_ops.c
@@ -23,12 +23,25 @@ static int vfio_ccw_mdev_notifier(struct notifier_block *nb,
 		return NOTIFY_STOP;
 
 	/*
-	 * TODO:
 	 * Vendor drivers MUST unpin pages in response to an
 	 * invalidation.
 	 */
-	if (action == VFIO_IOMMU_NOTIFY_DMA_UNMAP)
-		return NOTIFY_BAD;
+	if (action == VFIO_IOMMU_NOTIFY_DMA_UNMAP) {
+		struct vfio_iommu_type1_dma_unmap *unmap = data;
+		struct subchannel *sch = private->sch;
+
+		if (!cp_iova_pinned(&private->cp, unmap->iova))
+			return NOTIFY_OK;
+
+		if (vfio_ccw_sch_quiesce(sch))
+			return NOTIFY_BAD;
+
+		if (cio_enable_subchannel(sch, (u32)(unsigned long)sch))
+			return NOTIFY_BAD;
+
+		cp_free(&private->cp);
+		return NOTIFY_OK;
+	}
 
 	return NOTIFY_DONE;
 }
@@ -167,7 +180,10 @@ static ssize_t vfio_ccw_mdev_write(struct mdev_device *mdev,
 	region = &private->io_region;
 	if (copy_from_user((void *)region + *ppos, buf, count))
 		return -EFAULT;
-	region->ret_code = 0;
+
+	region->ret_code = vfio_ccw_sch_cmd_request(private);
+	if (region->ret_code != 0)
+		return region->ret_code;
 
 	return count;
 }
diff --git a/drivers/s390/cio/vfio_ccw_private.h b/drivers/s390/cio/vfio_ccw_private.h
index 359e96b..79e5337 100644
--- a/drivers/s390/cio/vfio_ccw_private.h
+++ b/drivers/s390/cio/vfio_ccw_private.h
@@ -10,9 +10,11 @@
 #ifndef _VFIO_CCW_PRIVATE_H_
 #define _VFIO_CCW_PRIVATE_H_
 
+#include <linux/completion.h>
 #include <linux/vfio_ccw.h>
 
 #include "css.h"
+#include "vfio_ccw_cp.h"
 
 /**
  * struct vfio_ccw_private
@@ -22,6 +24,11 @@
  * @mdev: pointer to the mediated device
  * @nb: notifier for vfio events
  * @io_region: MMIO region to input/output I/O arguments/results
+ * @wait_q: wait for interrupt
+ * @intparm: record current interrupt parameter, used for wait interrupt
+ * @cp: channel program for the current I/O operation
+ * @irb: irb info received from interrupt
+ * @scsw: scsw info
  */
 struct vfio_ccw_private {
 	struct subchannel	*sch;
@@ -30,11 +37,18 @@ struct vfio_ccw_private {
 	struct mdev_device	*mdev;
 	struct notifier_block	nb;
 	struct ccw_io_region	io_region;
+
+	wait_queue_head_t	wait_q;
+	u32			intparm;
+	struct channel_program	cp;
+	struct irb		irb;
+	union scsw		scsw;
 } __aligned(8);
 
 extern int vfio_ccw_mdev_reg(struct subchannel *sch);
 extern void vfio_ccw_mdev_unreg(struct subchannel *sch);
 
 extern int vfio_ccw_sch_quiesce(struct subchannel *sch);
+extern int vfio_ccw_sch_cmd_request(struct vfio_ccw_private *private);
 
 #endif
-- 
2.8.4

^ permalink raw reply related	[flat|nested] 61+ messages in thread

* [PATCH RFC v3 09/15] vfio: ccw: realize VFIO_DEVICE_GET_REGION_INFO ioctl
  2017-02-17  8:29 ` [Qemu-devel] " Dong Jia Shi
@ 2017-02-17  8:29   ` Dong Jia Shi
  -1 siblings, 0 replies; 61+ messages in thread
From: Dong Jia Shi @ 2017-02-17  8:29 UTC (permalink / raw)
  To: kvm, linux-s390, qemu-devel
  Cc: bjsdjshi, renxiaof, cornelia.huck, borntraeger, agraf,
	alex.williamson, pmorel, pasic, wkywang

Introduce device information about vfio-ccw: VFIO_DEVICE_FLAGS_CCW.
Realize VFIO_DEVICE_GET_REGION_INFO ioctl for vfio-ccw.

Signed-off-by: Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com>
Reviewed-by: Pierre Morel <pmorel@linux.vnet.ibm.com>
---
 drivers/s390/cio/vfio_ccw_ops.c | 78 +++++++++++++++++++++++++++++++++++++++++
 include/uapi/linux/vfio.h       | 11 ++++++
 2 files changed, 89 insertions(+)

diff --git a/drivers/s390/cio/vfio_ccw_ops.c b/drivers/s390/cio/vfio_ccw_ops.c
index 878c882..f3300dd 100644
--- a/drivers/s390/cio/vfio_ccw_ops.c
+++ b/drivers/s390/cio/vfio_ccw_ops.c
@@ -188,6 +188,83 @@ static ssize_t vfio_ccw_mdev_write(struct mdev_device *mdev,
 	return count;
 }
 
+static int vfio_ccw_mdev_get_device_info(struct vfio_device_info *info)
+{
+	info->flags = VFIO_DEVICE_FLAGS_CCW;
+	info->num_regions = VFIO_CCW_NUM_REGIONS;
+	info->num_irqs = 0;
+
+	return 0;
+}
+
+static int vfio_ccw_mdev_get_region_info(struct vfio_region_info *info,
+					 u16 *cap_type_id,
+					 void **cap_type)
+{
+	switch (info->index) {
+	case VFIO_CCW_CONFIG_REGION_INDEX:
+		info->offset = 0;
+		info->size = sizeof(struct ccw_io_region);
+		info->flags = VFIO_REGION_INFO_FLAG_READ
+			      | VFIO_REGION_INFO_FLAG_WRITE;
+		return 0;
+	default:
+		return -EINVAL;
+	}
+}
+
+static ssize_t vfio_ccw_mdev_ioctl(struct mdev_device *mdev,
+				   unsigned int cmd,
+				   unsigned long arg)
+{
+	int ret = 0;
+	unsigned long minsz;
+
+	switch (cmd) {
+	case VFIO_DEVICE_GET_INFO:
+	{
+		struct vfio_device_info info;
+
+		minsz = offsetofend(struct vfio_device_info, num_irqs);
+
+		if (copy_from_user(&info, (void __user *)arg, minsz))
+			return -EFAULT;
+
+		if (info.argsz < minsz)
+			return -EINVAL;
+
+		ret = vfio_ccw_mdev_get_device_info(&info);
+		if (ret)
+			return ret;
+
+		return copy_to_user((void __user *)arg, &info, minsz);
+	}
+	case VFIO_DEVICE_GET_REGION_INFO:
+	{
+		struct vfio_region_info info;
+		u16 cap_type_id = 0;
+		void *cap_type = NULL;
+
+		minsz = offsetofend(struct vfio_region_info, offset);
+
+		if (copy_from_user(&info, (void __user *)arg, minsz))
+			return -EFAULT;
+
+		if (info.argsz < minsz)
+			return -EINVAL;
+
+		ret = vfio_ccw_mdev_get_region_info(&info, &cap_type_id,
+						    &cap_type);
+		if (ret)
+			return ret;
+
+		return copy_to_user((void __user *)arg, &info, minsz);
+	}
+	default:
+		return -ENOTTY;
+	}
+}
+
 static const struct mdev_parent_ops vfio_ccw_mdev_ops = {
 	.owner			= THIS_MODULE,
 	.supported_type_groups  = mdev_type_groups,
@@ -197,6 +274,7 @@ static const struct mdev_parent_ops vfio_ccw_mdev_ops = {
 	.release		= vfio_ccw_mdev_release,
 	.read			= vfio_ccw_mdev_read,
 	.write			= vfio_ccw_mdev_write,
+	.ioctl			= vfio_ccw_mdev_ioctl,
 };
 
 int vfio_ccw_mdev_reg(struct subchannel *sch)
diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index 6183789..3fd70ff 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -198,6 +198,7 @@ struct vfio_device_info {
 #define VFIO_DEVICE_FLAGS_PCI	(1 << 1)	/* vfio-pci device */
 #define VFIO_DEVICE_FLAGS_PLATFORM (1 << 2)	/* vfio-platform device */
 #define VFIO_DEVICE_FLAGS_AMBA  (1 << 3)	/* vfio-amba device */
+#define VFIO_DEVICE_FLAGS_CCW	(1 << 4)	/* vfio-ccw device */
 	__u32	num_regions;	/* Max region index + 1 */
 	__u32	num_irqs;	/* Max IRQ index + 1 */
 };
@@ -447,6 +448,16 @@ enum {
 	VFIO_PCI_NUM_IRQS
 };
 
+/*
+ * The vfio-ccw bus driver makes use of the following fixed region.
+ * Unimplemented regions return a size of zero.
+ */
+
+enum {
+	VFIO_CCW_CONFIG_REGION_INDEX,
+	VFIO_CCW_NUM_REGIONS
+};
+
 /**
  * VFIO_DEVICE_GET_PCI_HOT_RESET_INFO - _IORW(VFIO_TYPE, VFIO_BASE + 12,
  *					      struct vfio_pci_hot_reset_info)
-- 
2.8.4

^ permalink raw reply related	[flat|nested] 61+ messages in thread

* [Qemu-devel] [PATCH RFC v3 09/15] vfio: ccw: realize VFIO_DEVICE_GET_REGION_INFO ioctl
@ 2017-02-17  8:29   ` Dong Jia Shi
  0 siblings, 0 replies; 61+ messages in thread
From: Dong Jia Shi @ 2017-02-17  8:29 UTC (permalink / raw)
  To: kvm, linux-s390, qemu-devel
  Cc: bjsdjshi, renxiaof, cornelia.huck, borntraeger, agraf,
	alex.williamson, pmorel, pasic, wkywang

Introduce device information about vfio-ccw: VFIO_DEVICE_FLAGS_CCW.
Realize VFIO_DEVICE_GET_REGION_INFO ioctl for vfio-ccw.

Signed-off-by: Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com>
Reviewed-by: Pierre Morel <pmorel@linux.vnet.ibm.com>
---
 drivers/s390/cio/vfio_ccw_ops.c | 78 +++++++++++++++++++++++++++++++++++++++++
 include/uapi/linux/vfio.h       | 11 ++++++
 2 files changed, 89 insertions(+)

diff --git a/drivers/s390/cio/vfio_ccw_ops.c b/drivers/s390/cio/vfio_ccw_ops.c
index 878c882..f3300dd 100644
--- a/drivers/s390/cio/vfio_ccw_ops.c
+++ b/drivers/s390/cio/vfio_ccw_ops.c
@@ -188,6 +188,83 @@ static ssize_t vfio_ccw_mdev_write(struct mdev_device *mdev,
 	return count;
 }
 
+static int vfio_ccw_mdev_get_device_info(struct vfio_device_info *info)
+{
+	info->flags = VFIO_DEVICE_FLAGS_CCW;
+	info->num_regions = VFIO_CCW_NUM_REGIONS;
+	info->num_irqs = 0;
+
+	return 0;
+}
+
+static int vfio_ccw_mdev_get_region_info(struct vfio_region_info *info,
+					 u16 *cap_type_id,
+					 void **cap_type)
+{
+	switch (info->index) {
+	case VFIO_CCW_CONFIG_REGION_INDEX:
+		info->offset = 0;
+		info->size = sizeof(struct ccw_io_region);
+		info->flags = VFIO_REGION_INFO_FLAG_READ
+			      | VFIO_REGION_INFO_FLAG_WRITE;
+		return 0;
+	default:
+		return -EINVAL;
+	}
+}
+
+static ssize_t vfio_ccw_mdev_ioctl(struct mdev_device *mdev,
+				   unsigned int cmd,
+				   unsigned long arg)
+{
+	int ret = 0;
+	unsigned long minsz;
+
+	switch (cmd) {
+	case VFIO_DEVICE_GET_INFO:
+	{
+		struct vfio_device_info info;
+
+		minsz = offsetofend(struct vfio_device_info, num_irqs);
+
+		if (copy_from_user(&info, (void __user *)arg, minsz))
+			return -EFAULT;
+
+		if (info.argsz < minsz)
+			return -EINVAL;
+
+		ret = vfio_ccw_mdev_get_device_info(&info);
+		if (ret)
+			return ret;
+
+		return copy_to_user((void __user *)arg, &info, minsz);
+	}
+	case VFIO_DEVICE_GET_REGION_INFO:
+	{
+		struct vfio_region_info info;
+		u16 cap_type_id = 0;
+		void *cap_type = NULL;
+
+		minsz = offsetofend(struct vfio_region_info, offset);
+
+		if (copy_from_user(&info, (void __user *)arg, minsz))
+			return -EFAULT;
+
+		if (info.argsz < minsz)
+			return -EINVAL;
+
+		ret = vfio_ccw_mdev_get_region_info(&info, &cap_type_id,
+						    &cap_type);
+		if (ret)
+			return ret;
+
+		return copy_to_user((void __user *)arg, &info, minsz);
+	}
+	default:
+		return -ENOTTY;
+	}
+}
+
 static const struct mdev_parent_ops vfio_ccw_mdev_ops = {
 	.owner			= THIS_MODULE,
 	.supported_type_groups  = mdev_type_groups,
@@ -197,6 +274,7 @@ static const struct mdev_parent_ops vfio_ccw_mdev_ops = {
 	.release		= vfio_ccw_mdev_release,
 	.read			= vfio_ccw_mdev_read,
 	.write			= vfio_ccw_mdev_write,
+	.ioctl			= vfio_ccw_mdev_ioctl,
 };
 
 int vfio_ccw_mdev_reg(struct subchannel *sch)
diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index 6183789..3fd70ff 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -198,6 +198,7 @@ struct vfio_device_info {
 #define VFIO_DEVICE_FLAGS_PCI	(1 << 1)	/* vfio-pci device */
 #define VFIO_DEVICE_FLAGS_PLATFORM (1 << 2)	/* vfio-platform device */
 #define VFIO_DEVICE_FLAGS_AMBA  (1 << 3)	/* vfio-amba device */
+#define VFIO_DEVICE_FLAGS_CCW	(1 << 4)	/* vfio-ccw device */
 	__u32	num_regions;	/* Max region index + 1 */
 	__u32	num_irqs;	/* Max IRQ index + 1 */
 };
@@ -447,6 +448,16 @@ enum {
 	VFIO_PCI_NUM_IRQS
 };
 
+/*
+ * The vfio-ccw bus driver makes use of the following fixed region.
+ * Unimplemented regions return a size of zero.
+ */
+
+enum {
+	VFIO_CCW_CONFIG_REGION_INDEX,
+	VFIO_CCW_NUM_REGIONS
+};
+
 /**
  * VFIO_DEVICE_GET_PCI_HOT_RESET_INFO - _IORW(VFIO_TYPE, VFIO_BASE + 12,
  *					      struct vfio_pci_hot_reset_info)
-- 
2.8.4

^ permalink raw reply related	[flat|nested] 61+ messages in thread

* [PATCH RFC v3 10/15] vfio: ccw: realize VFIO_DEVICE_RESET ioctl
  2017-02-17  8:29 ` [Qemu-devel] " Dong Jia Shi
@ 2017-02-17  8:29   ` Dong Jia Shi
  -1 siblings, 0 replies; 61+ messages in thread
From: Dong Jia Shi @ 2017-02-17  8:29 UTC (permalink / raw)
  To: kvm, linux-s390, qemu-devel
  Cc: bjsdjshi, renxiaof, cornelia.huck, borntraeger, agraf,
	alex.williamson, pmorel, pasic, wkywang

Introduce VFIO_DEVICE_RESET ioctl for vfio-ccw to make it possible
to hot-reset the device.

We try to achieve a reset by first disabling the subchannel and
then enabling it again: this should clear all state at the subchannel.

Signed-off-by: Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com>
---
 drivers/s390/cio/vfio_ccw_ops.c | 47 +++++++++++++++++++++++++++++------------
 1 file changed, 33 insertions(+), 14 deletions(-)

diff --git a/drivers/s390/cio/vfio_ccw_ops.c b/drivers/s390/cio/vfio_ccw_ops.c
index f3300dd..125818c 100644
--- a/drivers/s390/cio/vfio_ccw_ops.c
+++ b/drivers/s390/cio/vfio_ccw_ops.c
@@ -12,6 +12,32 @@
 
 #include "vfio_ccw_private.h"
 
+static int vfio_ccw_mdev_reset(struct mdev_device *mdev)
+{
+	struct vfio_ccw_private *private;
+	struct subchannel *sch;
+	int ret;
+
+	private = dev_get_drvdata(mdev_parent_dev(mdev));
+	if (!private)
+		return -ENODEV;
+
+	sch = private->sch;
+	/*
+	 * TODO:
+	 * In the cureent stage, some things like "no I/O running" and "no
+	 * interrupt pending" are clear, but we are not sure what other state
+	 * we need to care about.
+	 * There are still a lot more instructions need to be handled. We
+	 * should come back here later.
+	 */
+	ret = vfio_ccw_sch_quiesce(sch);
+	if (ret)
+		return ret;
+
+	return cio_enable_subchannel(sch, (u32)(unsigned long)sch);
+}
+
 static int vfio_ccw_mdev_notifier(struct notifier_block *nb,
 				  unsigned long action,
 				  void *data)
@@ -28,15 +54,11 @@ static int vfio_ccw_mdev_notifier(struct notifier_block *nb,
 	 */
 	if (action == VFIO_IOMMU_NOTIFY_DMA_UNMAP) {
 		struct vfio_iommu_type1_dma_unmap *unmap = data;
-		struct subchannel *sch = private->sch;
 
 		if (!cp_iova_pinned(&private->cp, unmap->iova))
 			return NOTIFY_OK;
 
-		if (vfio_ccw_sch_quiesce(sch))
-			return NOTIFY_BAD;
-
-		if (cio_enable_subchannel(sch, (u32)(unsigned long)sch))
+		if (vfio_ccw_mdev_reset(private->mdev))
 			return NOTIFY_BAD;
 
 		cp_free(&private->cp);
@@ -100,16 +122,11 @@ static int vfio_ccw_mdev_create(struct kobject *kobj, struct mdev_device *mdev)
 
 static int vfio_ccw_mdev_remove(struct mdev_device *mdev)
 {
-	struct vfio_ccw_private *private;
-	struct subchannel *sch;
+	struct vfio_ccw_private *private =
+		dev_get_drvdata(mdev_parent_dev(mdev));
 	int ret;
 
-	private = dev_get_drvdata(mdev_parent_dev(mdev));
-	sch = private->sch;
-	ret = vfio_ccw_sch_quiesce(sch);
-	if (ret)
-		return ret;
-	ret = cio_enable_subchannel(sch, (u32)(unsigned long)sch);
+	ret = vfio_ccw_mdev_reset(mdev);
 	if (ret)
 		return ret;
 
@@ -190,7 +207,7 @@ static ssize_t vfio_ccw_mdev_write(struct mdev_device *mdev,
 
 static int vfio_ccw_mdev_get_device_info(struct vfio_device_info *info)
 {
-	info->flags = VFIO_DEVICE_FLAGS_CCW;
+	info->flags = VFIO_DEVICE_FLAGS_CCW | VFIO_DEVICE_FLAGS_RESET;
 	info->num_regions = VFIO_CCW_NUM_REGIONS;
 	info->num_irqs = 0;
 
@@ -260,6 +277,8 @@ static ssize_t vfio_ccw_mdev_ioctl(struct mdev_device *mdev,
 
 		return copy_to_user((void __user *)arg, &info, minsz);
 	}
+	case VFIO_DEVICE_RESET:
+		return vfio_ccw_mdev_reset(mdev);
 	default:
 		return -ENOTTY;
 	}
-- 
2.8.4

^ permalink raw reply related	[flat|nested] 61+ messages in thread

* [Qemu-devel] [PATCH RFC v3 10/15] vfio: ccw: realize VFIO_DEVICE_RESET ioctl
@ 2017-02-17  8:29   ` Dong Jia Shi
  0 siblings, 0 replies; 61+ messages in thread
From: Dong Jia Shi @ 2017-02-17  8:29 UTC (permalink / raw)
  To: kvm, linux-s390, qemu-devel
  Cc: bjsdjshi, renxiaof, cornelia.huck, borntraeger, agraf,
	alex.williamson, pmorel, pasic, wkywang

Introduce VFIO_DEVICE_RESET ioctl for vfio-ccw to make it possible
to hot-reset the device.

We try to achieve a reset by first disabling the subchannel and
then enabling it again: this should clear all state at the subchannel.

Signed-off-by: Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com>
---
 drivers/s390/cio/vfio_ccw_ops.c | 47 +++++++++++++++++++++++++++++------------
 1 file changed, 33 insertions(+), 14 deletions(-)

diff --git a/drivers/s390/cio/vfio_ccw_ops.c b/drivers/s390/cio/vfio_ccw_ops.c
index f3300dd..125818c 100644
--- a/drivers/s390/cio/vfio_ccw_ops.c
+++ b/drivers/s390/cio/vfio_ccw_ops.c
@@ -12,6 +12,32 @@
 
 #include "vfio_ccw_private.h"
 
+static int vfio_ccw_mdev_reset(struct mdev_device *mdev)
+{
+	struct vfio_ccw_private *private;
+	struct subchannel *sch;
+	int ret;
+
+	private = dev_get_drvdata(mdev_parent_dev(mdev));
+	if (!private)
+		return -ENODEV;
+
+	sch = private->sch;
+	/*
+	 * TODO:
+	 * In the cureent stage, some things like "no I/O running" and "no
+	 * interrupt pending" are clear, but we are not sure what other state
+	 * we need to care about.
+	 * There are still a lot more instructions need to be handled. We
+	 * should come back here later.
+	 */
+	ret = vfio_ccw_sch_quiesce(sch);
+	if (ret)
+		return ret;
+
+	return cio_enable_subchannel(sch, (u32)(unsigned long)sch);
+}
+
 static int vfio_ccw_mdev_notifier(struct notifier_block *nb,
 				  unsigned long action,
 				  void *data)
@@ -28,15 +54,11 @@ static int vfio_ccw_mdev_notifier(struct notifier_block *nb,
 	 */
 	if (action == VFIO_IOMMU_NOTIFY_DMA_UNMAP) {
 		struct vfio_iommu_type1_dma_unmap *unmap = data;
-		struct subchannel *sch = private->sch;
 
 		if (!cp_iova_pinned(&private->cp, unmap->iova))
 			return NOTIFY_OK;
 
-		if (vfio_ccw_sch_quiesce(sch))
-			return NOTIFY_BAD;
-
-		if (cio_enable_subchannel(sch, (u32)(unsigned long)sch))
+		if (vfio_ccw_mdev_reset(private->mdev))
 			return NOTIFY_BAD;
 
 		cp_free(&private->cp);
@@ -100,16 +122,11 @@ static int vfio_ccw_mdev_create(struct kobject *kobj, struct mdev_device *mdev)
 
 static int vfio_ccw_mdev_remove(struct mdev_device *mdev)
 {
-	struct vfio_ccw_private *private;
-	struct subchannel *sch;
+	struct vfio_ccw_private *private =
+		dev_get_drvdata(mdev_parent_dev(mdev));
 	int ret;
 
-	private = dev_get_drvdata(mdev_parent_dev(mdev));
-	sch = private->sch;
-	ret = vfio_ccw_sch_quiesce(sch);
-	if (ret)
-		return ret;
-	ret = cio_enable_subchannel(sch, (u32)(unsigned long)sch);
+	ret = vfio_ccw_mdev_reset(mdev);
 	if (ret)
 		return ret;
 
@@ -190,7 +207,7 @@ static ssize_t vfio_ccw_mdev_write(struct mdev_device *mdev,
 
 static int vfio_ccw_mdev_get_device_info(struct vfio_device_info *info)
 {
-	info->flags = VFIO_DEVICE_FLAGS_CCW;
+	info->flags = VFIO_DEVICE_FLAGS_CCW | VFIO_DEVICE_FLAGS_RESET;
 	info->num_regions = VFIO_CCW_NUM_REGIONS;
 	info->num_irqs = 0;
 
@@ -260,6 +277,8 @@ static ssize_t vfio_ccw_mdev_ioctl(struct mdev_device *mdev,
 
 		return copy_to_user((void __user *)arg, &info, minsz);
 	}
+	case VFIO_DEVICE_RESET:
+		return vfio_ccw_mdev_reset(mdev);
 	default:
 		return -ENOTTY;
 	}
-- 
2.8.4

^ permalink raw reply related	[flat|nested] 61+ messages in thread

* [PATCH RFC v3 11/15] vfio: ccw: realize VFIO_DEVICE_G(S)ET_IRQ_INFO ioctls
  2017-02-17  8:29 ` [Qemu-devel] " Dong Jia Shi
@ 2017-02-17  8:29   ` Dong Jia Shi
  -1 siblings, 0 replies; 61+ messages in thread
From: Dong Jia Shi @ 2017-02-17  8:29 UTC (permalink / raw)
  To: kvm, linux-s390, qemu-devel
  Cc: bjsdjshi, renxiaof, cornelia.huck, borntraeger, agraf,
	alex.williamson, pmorel, pasic, wkywang

Realize VFIO_DEVICE_GET_IRQ_INFO ioctl to retrieve
VFIO_CCW_IO_IRQ information.

Realize VFIO_DEVICE_SET_IRQS ioctl to set an eventfd fd for
VFIO_CCW_IO_IRQ. Once a write operation to the ccw_io_region
was performed, trigger a signal on this fd.

Signed-off-by: Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com>
Reviewed-by: Pierre Morel <pmorel@linux.vnet.ibm.com>
---
 drivers/s390/cio/vfio_ccw_ops.c     | 123 +++++++++++++++++++++++++++++++++++-
 drivers/s390/cio/vfio_ccw_private.h |   4 ++
 include/uapi/linux/vfio.h           |  10 ++-
 3 files changed, 134 insertions(+), 3 deletions(-)

diff --git a/drivers/s390/cio/vfio_ccw_ops.c b/drivers/s390/cio/vfio_ccw_ops.c
index 125818c..ebc38fb 100644
--- a/drivers/s390/cio/vfio_ccw_ops.c
+++ b/drivers/s390/cio/vfio_ccw_ops.c
@@ -202,6 +202,9 @@ static ssize_t vfio_ccw_mdev_write(struct mdev_device *mdev,
 	if (region->ret_code != 0)
 		return region->ret_code;
 
+	if (private->io_trigger)
+		eventfd_signal(private->io_trigger, 1);
+
 	return count;
 }
 
@@ -209,7 +212,7 @@ static int vfio_ccw_mdev_get_device_info(struct vfio_device_info *info)
 {
 	info->flags = VFIO_DEVICE_FLAGS_CCW | VFIO_DEVICE_FLAGS_RESET;
 	info->num_regions = VFIO_CCW_NUM_REGIONS;
-	info->num_irqs = 0;
+	info->num_irqs = VFIO_CCW_NUM_IRQS;
 
 	return 0;
 }
@@ -230,6 +233,83 @@ static int vfio_ccw_mdev_get_region_info(struct vfio_region_info *info,
 	}
 }
 
+int vfio_ccw_mdev_get_irq_info(struct vfio_irq_info *info)
+{
+	if (info->index != VFIO_CCW_IO_IRQ_INDEX)
+		return -EINVAL;
+
+	info->count = 1;
+	info->flags = VFIO_IRQ_INFO_EVENTFD | VFIO_IRQ_INFO_NORESIZE;
+
+	return 0;
+}
+
+static int vfio_ccw_mdev_set_irqs(struct mdev_device *mdev,
+				  uint32_t flags,
+				  void __user *data)
+{
+	struct vfio_ccw_private *private;
+	struct eventfd_ctx **ctx;
+
+	if (!(flags & VFIO_IRQ_SET_ACTION_TRIGGER))
+		return -EINVAL;
+
+	private = dev_get_drvdata(mdev_parent_dev(mdev));
+	if (!private)
+		return -ENODEV;
+
+	ctx = &private->io_trigger;
+
+	switch (flags & VFIO_IRQ_SET_DATA_TYPE_MASK) {
+	case VFIO_IRQ_SET_DATA_NONE:
+	{
+		if (*ctx)
+			eventfd_signal(*ctx, 1);
+		return 0;
+	}
+	case VFIO_IRQ_SET_DATA_BOOL:
+	{
+		uint8_t trigger;
+
+		if (get_user(trigger, (uint8_t __user *)data))
+			return -EFAULT;
+
+		if (trigger && *ctx)
+			eventfd_signal(*ctx, 1);
+		return 0;
+	}
+	case VFIO_IRQ_SET_DATA_EVENTFD:
+	{
+		int32_t fd;
+
+		if (get_user(fd, (int32_t __user *)data))
+			return -EFAULT;
+
+		if (fd == -1) {
+			if (*ctx)
+				eventfd_ctx_put(*ctx);
+			*ctx = NULL;
+		} else if (fd >= 0) {
+			struct eventfd_ctx *efdctx;
+
+			efdctx = eventfd_ctx_fdget(fd);
+			if (IS_ERR(efdctx))
+				return PTR_ERR(efdctx);
+
+			if (*ctx)
+				eventfd_ctx_put(*ctx);
+
+			*ctx = efdctx;
+		} else
+			return -EINVAL;
+
+		return 0;
+	}
+	default:
+		return -EINVAL;
+	}
+}
+
 static ssize_t vfio_ccw_mdev_ioctl(struct mdev_device *mdev,
 				   unsigned int cmd,
 				   unsigned long arg)
@@ -277,6 +357,47 @@ static ssize_t vfio_ccw_mdev_ioctl(struct mdev_device *mdev,
 
 		return copy_to_user((void __user *)arg, &info, minsz);
 	}
+	case VFIO_DEVICE_GET_IRQ_INFO:
+	{
+		struct vfio_irq_info info;
+
+		minsz = offsetofend(struct vfio_irq_info, count);
+
+		if (copy_from_user(&info, (void __user *)arg, minsz))
+			return -EFAULT;
+
+		if (info.argsz < minsz || info.index >= VFIO_CCW_NUM_IRQS)
+			return -EINVAL;
+
+		ret = vfio_ccw_mdev_get_irq_info(&info);
+		if (ret)
+			return ret;
+
+		if (info.count == -1)
+			return -EINVAL;
+
+		return copy_to_user((void __user *)arg, &info, minsz);
+	}
+	case VFIO_DEVICE_SET_IRQS:
+	{
+		struct vfio_irq_set hdr;
+		size_t data_size;
+		void __user *data;
+
+		minsz = offsetofend(struct vfio_irq_set, count);
+
+		if (copy_from_user(&hdr, (void __user *)arg, minsz))
+			return -EFAULT;
+
+		ret = vfio_set_irqs_validate_and_prepare(&hdr, 1,
+							 VFIO_CCW_NUM_IRQS,
+							 &data_size);
+		if (ret)
+			return ret;
+
+		data = (void __user *)(arg + minsz);
+		return vfio_ccw_mdev_set_irqs(mdev, hdr.flags, data);
+	}
 	case VFIO_DEVICE_RESET:
 		return vfio_ccw_mdev_reset(mdev);
 	default:
diff --git a/drivers/s390/cio/vfio_ccw_private.h b/drivers/s390/cio/vfio_ccw_private.h
index 79e5337..dddab52 100644
--- a/drivers/s390/cio/vfio_ccw_private.h
+++ b/drivers/s390/cio/vfio_ccw_private.h
@@ -11,6 +11,7 @@
 #define _VFIO_CCW_PRIVATE_H_
 
 #include <linux/completion.h>
+#include <linux/eventfd.h>
 #include <linux/vfio_ccw.h>
 
 #include "css.h"
@@ -29,6 +30,7 @@
  * @cp: channel program for the current I/O operation
  * @irb: irb info received from interrupt
  * @scsw: scsw info
+ * @io_trigger: eventfd ctx for signaling userspace I/O results
  */
 struct vfio_ccw_private {
 	struct subchannel	*sch;
@@ -43,6 +45,8 @@ struct vfio_ccw_private {
 	struct channel_program	cp;
 	struct irb		irb;
 	union scsw		scsw;
+
+	struct eventfd_ctx	*io_trigger;
 } __aligned(8);
 
 extern int vfio_ccw_mdev_reg(struct subchannel *sch);
diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index 3fd70ff..ae46105 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -449,8 +449,9 @@ enum {
 };
 
 /*
- * The vfio-ccw bus driver makes use of the following fixed region.
- * Unimplemented regions return a size of zero.
+ * The vfio-ccw bus driver makes use of the following fixed region and
+ * IRQ index mapping. Unimplemented regions return a size of zero.
+ * Unimplemented IRQ types return a count of zero.
  */
 
 enum {
@@ -458,6 +459,11 @@ enum {
 	VFIO_CCW_NUM_REGIONS
 };
 
+enum {
+	VFIO_CCW_IO_IRQ_INDEX,
+	VFIO_CCW_NUM_IRQS
+};
+
 /**
  * VFIO_DEVICE_GET_PCI_HOT_RESET_INFO - _IORW(VFIO_TYPE, VFIO_BASE + 12,
  *					      struct vfio_pci_hot_reset_info)
-- 
2.8.4

^ permalink raw reply related	[flat|nested] 61+ messages in thread

* [Qemu-devel] [PATCH RFC v3 11/15] vfio: ccw: realize VFIO_DEVICE_G(S)ET_IRQ_INFO ioctls
@ 2017-02-17  8:29   ` Dong Jia Shi
  0 siblings, 0 replies; 61+ messages in thread
From: Dong Jia Shi @ 2017-02-17  8:29 UTC (permalink / raw)
  To: kvm, linux-s390, qemu-devel
  Cc: bjsdjshi, renxiaof, cornelia.huck, borntraeger, agraf,
	alex.williamson, pmorel, pasic, wkywang

Realize VFIO_DEVICE_GET_IRQ_INFO ioctl to retrieve
VFIO_CCW_IO_IRQ information.

Realize VFIO_DEVICE_SET_IRQS ioctl to set an eventfd fd for
VFIO_CCW_IO_IRQ. Once a write operation to the ccw_io_region
was performed, trigger a signal on this fd.

Signed-off-by: Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com>
Reviewed-by: Pierre Morel <pmorel@linux.vnet.ibm.com>
---
 drivers/s390/cio/vfio_ccw_ops.c     | 123 +++++++++++++++++++++++++++++++++++-
 drivers/s390/cio/vfio_ccw_private.h |   4 ++
 include/uapi/linux/vfio.h           |  10 ++-
 3 files changed, 134 insertions(+), 3 deletions(-)

diff --git a/drivers/s390/cio/vfio_ccw_ops.c b/drivers/s390/cio/vfio_ccw_ops.c
index 125818c..ebc38fb 100644
--- a/drivers/s390/cio/vfio_ccw_ops.c
+++ b/drivers/s390/cio/vfio_ccw_ops.c
@@ -202,6 +202,9 @@ static ssize_t vfio_ccw_mdev_write(struct mdev_device *mdev,
 	if (region->ret_code != 0)
 		return region->ret_code;
 
+	if (private->io_trigger)
+		eventfd_signal(private->io_trigger, 1);
+
 	return count;
 }
 
@@ -209,7 +212,7 @@ static int vfio_ccw_mdev_get_device_info(struct vfio_device_info *info)
 {
 	info->flags = VFIO_DEVICE_FLAGS_CCW | VFIO_DEVICE_FLAGS_RESET;
 	info->num_regions = VFIO_CCW_NUM_REGIONS;
-	info->num_irqs = 0;
+	info->num_irqs = VFIO_CCW_NUM_IRQS;
 
 	return 0;
 }
@@ -230,6 +233,83 @@ static int vfio_ccw_mdev_get_region_info(struct vfio_region_info *info,
 	}
 }
 
+int vfio_ccw_mdev_get_irq_info(struct vfio_irq_info *info)
+{
+	if (info->index != VFIO_CCW_IO_IRQ_INDEX)
+		return -EINVAL;
+
+	info->count = 1;
+	info->flags = VFIO_IRQ_INFO_EVENTFD | VFIO_IRQ_INFO_NORESIZE;
+
+	return 0;
+}
+
+static int vfio_ccw_mdev_set_irqs(struct mdev_device *mdev,
+				  uint32_t flags,
+				  void __user *data)
+{
+	struct vfio_ccw_private *private;
+	struct eventfd_ctx **ctx;
+
+	if (!(flags & VFIO_IRQ_SET_ACTION_TRIGGER))
+		return -EINVAL;
+
+	private = dev_get_drvdata(mdev_parent_dev(mdev));
+	if (!private)
+		return -ENODEV;
+
+	ctx = &private->io_trigger;
+
+	switch (flags & VFIO_IRQ_SET_DATA_TYPE_MASK) {
+	case VFIO_IRQ_SET_DATA_NONE:
+	{
+		if (*ctx)
+			eventfd_signal(*ctx, 1);
+		return 0;
+	}
+	case VFIO_IRQ_SET_DATA_BOOL:
+	{
+		uint8_t trigger;
+
+		if (get_user(trigger, (uint8_t __user *)data))
+			return -EFAULT;
+
+		if (trigger && *ctx)
+			eventfd_signal(*ctx, 1);
+		return 0;
+	}
+	case VFIO_IRQ_SET_DATA_EVENTFD:
+	{
+		int32_t fd;
+
+		if (get_user(fd, (int32_t __user *)data))
+			return -EFAULT;
+
+		if (fd == -1) {
+			if (*ctx)
+				eventfd_ctx_put(*ctx);
+			*ctx = NULL;
+		} else if (fd >= 0) {
+			struct eventfd_ctx *efdctx;
+
+			efdctx = eventfd_ctx_fdget(fd);
+			if (IS_ERR(efdctx))
+				return PTR_ERR(efdctx);
+
+			if (*ctx)
+				eventfd_ctx_put(*ctx);
+
+			*ctx = efdctx;
+		} else
+			return -EINVAL;
+
+		return 0;
+	}
+	default:
+		return -EINVAL;
+	}
+}
+
 static ssize_t vfio_ccw_mdev_ioctl(struct mdev_device *mdev,
 				   unsigned int cmd,
 				   unsigned long arg)
@@ -277,6 +357,47 @@ static ssize_t vfio_ccw_mdev_ioctl(struct mdev_device *mdev,
 
 		return copy_to_user((void __user *)arg, &info, minsz);
 	}
+	case VFIO_DEVICE_GET_IRQ_INFO:
+	{
+		struct vfio_irq_info info;
+
+		minsz = offsetofend(struct vfio_irq_info, count);
+
+		if (copy_from_user(&info, (void __user *)arg, minsz))
+			return -EFAULT;
+
+		if (info.argsz < minsz || info.index >= VFIO_CCW_NUM_IRQS)
+			return -EINVAL;
+
+		ret = vfio_ccw_mdev_get_irq_info(&info);
+		if (ret)
+			return ret;
+
+		if (info.count == -1)
+			return -EINVAL;
+
+		return copy_to_user((void __user *)arg, &info, minsz);
+	}
+	case VFIO_DEVICE_SET_IRQS:
+	{
+		struct vfio_irq_set hdr;
+		size_t data_size;
+		void __user *data;
+
+		minsz = offsetofend(struct vfio_irq_set, count);
+
+		if (copy_from_user(&hdr, (void __user *)arg, minsz))
+			return -EFAULT;
+
+		ret = vfio_set_irqs_validate_and_prepare(&hdr, 1,
+							 VFIO_CCW_NUM_IRQS,
+							 &data_size);
+		if (ret)
+			return ret;
+
+		data = (void __user *)(arg + minsz);
+		return vfio_ccw_mdev_set_irqs(mdev, hdr.flags, data);
+	}
 	case VFIO_DEVICE_RESET:
 		return vfio_ccw_mdev_reset(mdev);
 	default:
diff --git a/drivers/s390/cio/vfio_ccw_private.h b/drivers/s390/cio/vfio_ccw_private.h
index 79e5337..dddab52 100644
--- a/drivers/s390/cio/vfio_ccw_private.h
+++ b/drivers/s390/cio/vfio_ccw_private.h
@@ -11,6 +11,7 @@
 #define _VFIO_CCW_PRIVATE_H_
 
 #include <linux/completion.h>
+#include <linux/eventfd.h>
 #include <linux/vfio_ccw.h>
 
 #include "css.h"
@@ -29,6 +30,7 @@
  * @cp: channel program for the current I/O operation
  * @irb: irb info received from interrupt
  * @scsw: scsw info
+ * @io_trigger: eventfd ctx for signaling userspace I/O results
  */
 struct vfio_ccw_private {
 	struct subchannel	*sch;
@@ -43,6 +45,8 @@ struct vfio_ccw_private {
 	struct channel_program	cp;
 	struct irb		irb;
 	union scsw		scsw;
+
+	struct eventfd_ctx	*io_trigger;
 } __aligned(8);
 
 extern int vfio_ccw_mdev_reg(struct subchannel *sch);
diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index 3fd70ff..ae46105 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -449,8 +449,9 @@ enum {
 };
 
 /*
- * The vfio-ccw bus driver makes use of the following fixed region.
- * Unimplemented regions return a size of zero.
+ * The vfio-ccw bus driver makes use of the following fixed region and
+ * IRQ index mapping. Unimplemented regions return a size of zero.
+ * Unimplemented IRQ types return a count of zero.
  */
 
 enum {
@@ -458,6 +459,11 @@ enum {
 	VFIO_CCW_NUM_REGIONS
 };
 
+enum {
+	VFIO_CCW_IO_IRQ_INDEX,
+	VFIO_CCW_NUM_IRQS
+};
+
 /**
  * VFIO_DEVICE_GET_PCI_HOT_RESET_INFO - _IORW(VFIO_TYPE, VFIO_BASE + 12,
  *					      struct vfio_pci_hot_reset_info)
-- 
2.8.4

^ permalink raw reply related	[flat|nested] 61+ messages in thread

* [PATCH RFC v3 12/15] vfio: ccw: return I/O results asynchronously
  2017-02-17  8:29 ` [Qemu-devel] " Dong Jia Shi
@ 2017-02-17  8:29   ` Dong Jia Shi
  -1 siblings, 0 replies; 61+ messages in thread
From: Dong Jia Shi @ 2017-02-17  8:29 UTC (permalink / raw)
  To: kvm, linux-s390, qemu-devel
  Cc: bjsdjshi, renxiaof, cornelia.huck, borntraeger, agraf,
	alex.williamson, pmorel, pasic, wkywang

Introduce a singlethreaded workqueue to handle the I/O interrupts.
With the work added to this queue, we store the I/O results to the
io_region of the subchannel, then signal the userspace program to
handle the results.

Signed-off-by: Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com>
---
 drivers/s390/cio/vfio_ccw_drv.c     | 58 ++++++++++++++++++++++---------------
 drivers/s390/cio/vfio_ccw_ops.c     |  3 --
 drivers/s390/cio/vfio_ccw_private.h |  7 ++---
 3 files changed, 37 insertions(+), 31 deletions(-)

diff --git a/drivers/s390/cio/vfio_ccw_drv.c b/drivers/s390/cio/vfio_ccw_drv.c
index 601fa1a..dcf2d70 100644
--- a/drivers/s390/cio/vfio_ccw_drv.c
+++ b/drivers/s390/cio/vfio_ccw_drv.c
@@ -20,6 +20,8 @@
 #include "css.h"
 #include "vfio_ccw_private.h"
 
+struct workqueue_struct *vfio_ccw_work_q;
+
 /*
  * Helpers
  */
@@ -52,6 +54,7 @@ int vfio_ccw_sch_quiesce(struct subchannel *sch)
 
 			spin_lock_irq(sch->lock);
 			private->completion = NULL;
+			flush_workqueue(vfio_ccw_work_q);
 			ret = cio_cancel_halt_clear(sch, &iretry);
 		};
 
@@ -63,18 +66,12 @@ int vfio_ccw_sch_quiesce(struct subchannel *sch)
 	return ret;
 }
 
-static int doing_io(struct vfio_ccw_private *private, u32 intparm)
-{
-	return (private->intparm == intparm);
-}
-
 static int vfio_ccw_sch_io_helper(struct vfio_ccw_private *private)
 {
 	struct subchannel *sch;
 	union orb *orb;
 	int ccode;
 	__u8 lpm;
-	u32 intparm;
 
 	sch = private->sch;
 
@@ -89,7 +86,7 @@ static int vfio_ccw_sch_io_helper(struct vfio_ccw_private *private)
 		 * Initialize device status information
 		 */
 		sch->schib.scsw.cmd.actl |= SCSW_ACTL_START_PEND;
-		break;
+		return 0;
 	case 1:		/* Status pending */
 	case 2:		/* Busy */
 		return -EBUSY;
@@ -109,15 +106,26 @@ static int vfio_ccw_sch_io_helper(struct vfio_ccw_private *private)
 	default:
 		return ccode;
 	}
+}
 
-	intparm = (u32)(addr_t)sch;
-	private->intparm = 0;
-	wait_event(private->wait_q, doing_io(private, intparm));
+static void vfio_ccw_sch_io_todo(struct work_struct *work)
+{
+	struct vfio_ccw_private *private;
+	struct subchannel *sch;
+	struct irb *irb;
 
-	if (scsw_is_solicited(&private->irb.scsw))
-		cp_update_scsw(&private->cp, &private->irb.scsw);
+	private = container_of(work, struct vfio_ccw_private, io_work);
+	irb = &private->irb;
+	sch = private->sch;
 
-	return 0;
+	if (scsw_is_solicited(&irb->scsw)) {
+		cp_update_scsw(&private->cp, &irb->scsw);
+		cp_free(&private->cp);
+	}
+	memcpy(private->io_region.irb_area, irb, sizeof(*irb));
+
+	if (private->io_trigger)
+		eventfd_signal(private->io_trigger, 1);
 }
 
 /* Deal with the ccw command request from the userspace. */
@@ -126,7 +134,6 @@ int vfio_ccw_sch_cmd_request(struct vfio_ccw_private *private)
 	struct mdev_device *mdev = private->mdev;
 	union orb *orb;
 	union scsw *scsw = &private->scsw;
-	struct irb *irb = &private->irb;
 	struct ccw_io_region *io_region = &private->io_region;
 	int ret;
 
@@ -147,12 +154,8 @@ int vfio_ccw_sch_cmd_request(struct vfio_ccw_private *private)
 
 		/* Start channel program and wait for I/O interrupt. */
 		ret = vfio_ccw_sch_io_helper(private);
-		if (!ret) {
-			/* Get irb info and copy it to irb_area. */
-			memcpy(io_region->irb_area, irb, sizeof(*irb));
-		}
-
-		cp_free(&private->cp);
+		if (!ret)
+			cp_free(&private->cp);
 	} else if (scsw->cmd.fctl & SCSW_FCTL_HALT_FUNC) {
 		/* XXX: Handle halt. */
 		ret = -EOPNOTSUPP;
@@ -229,8 +232,8 @@ static void vfio_ccw_sch_irq(struct subchannel *sch)
 
 	irb = this_cpu_ptr(&cio_irb);
 	memcpy(&private->irb, irb, sizeof(*irb));
-	private->intparm = (u32)(addr_t)sch;
-	wake_up(&private->wait_q);
+
+	queue_work(vfio_ccw_work_q, &private->io_work);
 
 	if (private->completion)
 		complete(private->completion);
@@ -269,7 +272,7 @@ static int vfio_ccw_sch_probe(struct subchannel *sch)
 	if (ret)
 		goto out_rm_group;
 
-	init_waitqueue_head(&private->wait_q);
+	INIT_WORK(&private->io_work, vfio_ccw_sch_io_todo);
 	atomic_set(&private->avail, 1);
 
 	return 0;
@@ -367,10 +370,16 @@ static int __init vfio_ccw_sch_init(void)
 {
 	int ret;
 
+	vfio_ccw_work_q = create_singlethread_workqueue("vfio-ccw");
+	if (!vfio_ccw_work_q)
+		return -ENOMEM;
+
 	isc_register(VFIO_CCW_ISC);
 	ret = css_driver_register(&vfio_ccw_sch_driver);
-	if (ret)
+	if (ret) {
 		isc_unregister(VFIO_CCW_ISC);
+		destroy_workqueue(vfio_ccw_work_q);
+	}
 
 	return ret;
 }
@@ -379,6 +388,7 @@ static void __exit vfio_ccw_sch_exit(void)
 {
 	css_driver_unregister(&vfio_ccw_sch_driver);
 	isc_unregister(VFIO_CCW_ISC);
+	destroy_workqueue(vfio_ccw_work_q);
 }
 module_init(vfio_ccw_sch_init);
 module_exit(vfio_ccw_sch_exit);
diff --git a/drivers/s390/cio/vfio_ccw_ops.c b/drivers/s390/cio/vfio_ccw_ops.c
index ebc38fb..8e6754d 100644
--- a/drivers/s390/cio/vfio_ccw_ops.c
+++ b/drivers/s390/cio/vfio_ccw_ops.c
@@ -202,9 +202,6 @@ static ssize_t vfio_ccw_mdev_write(struct mdev_device *mdev,
 	if (region->ret_code != 0)
 		return region->ret_code;
 
-	if (private->io_trigger)
-		eventfd_signal(private->io_trigger, 1);
-
 	return count;
 }
 
diff --git a/drivers/s390/cio/vfio_ccw_private.h b/drivers/s390/cio/vfio_ccw_private.h
index dddab52..2503797 100644
--- a/drivers/s390/cio/vfio_ccw_private.h
+++ b/drivers/s390/cio/vfio_ccw_private.h
@@ -12,6 +12,7 @@
 
 #include <linux/completion.h>
 #include <linux/eventfd.h>
+#include <linux/workqueue.h>
 #include <linux/vfio_ccw.h>
 
 #include "css.h"
@@ -25,12 +26,11 @@
  * @mdev: pointer to the mediated device
  * @nb: notifier for vfio events
  * @io_region: MMIO region to input/output I/O arguments/results
- * @wait_q: wait for interrupt
- * @intparm: record current interrupt parameter, used for wait interrupt
  * @cp: channel program for the current I/O operation
  * @irb: irb info received from interrupt
  * @scsw: scsw info
  * @io_trigger: eventfd ctx for signaling userspace I/O results
+ * @io_work: work for deferral process of I/O handling
  */
 struct vfio_ccw_private {
 	struct subchannel	*sch;
@@ -40,13 +40,12 @@ struct vfio_ccw_private {
 	struct notifier_block	nb;
 	struct ccw_io_region	io_region;
 
-	wait_queue_head_t	wait_q;
-	u32			intparm;
 	struct channel_program	cp;
 	struct irb		irb;
 	union scsw		scsw;
 
 	struct eventfd_ctx	*io_trigger;
+	struct work_struct	io_work;
 } __aligned(8);
 
 extern int vfio_ccw_mdev_reg(struct subchannel *sch);
-- 
2.8.4

^ permalink raw reply related	[flat|nested] 61+ messages in thread

* [Qemu-devel] [PATCH RFC v3 12/15] vfio: ccw: return I/O results asynchronously
@ 2017-02-17  8:29   ` Dong Jia Shi
  0 siblings, 0 replies; 61+ messages in thread
From: Dong Jia Shi @ 2017-02-17  8:29 UTC (permalink / raw)
  To: kvm, linux-s390, qemu-devel
  Cc: bjsdjshi, renxiaof, cornelia.huck, borntraeger, agraf,
	alex.williamson, pmorel, pasic, wkywang

Introduce a singlethreaded workqueue to handle the I/O interrupts.
With the work added to this queue, we store the I/O results to the
io_region of the subchannel, then signal the userspace program to
handle the results.

Signed-off-by: Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com>
---
 drivers/s390/cio/vfio_ccw_drv.c     | 58 ++++++++++++++++++++++---------------
 drivers/s390/cio/vfio_ccw_ops.c     |  3 --
 drivers/s390/cio/vfio_ccw_private.h |  7 ++---
 3 files changed, 37 insertions(+), 31 deletions(-)

diff --git a/drivers/s390/cio/vfio_ccw_drv.c b/drivers/s390/cio/vfio_ccw_drv.c
index 601fa1a..dcf2d70 100644
--- a/drivers/s390/cio/vfio_ccw_drv.c
+++ b/drivers/s390/cio/vfio_ccw_drv.c
@@ -20,6 +20,8 @@
 #include "css.h"
 #include "vfio_ccw_private.h"
 
+struct workqueue_struct *vfio_ccw_work_q;
+
 /*
  * Helpers
  */
@@ -52,6 +54,7 @@ int vfio_ccw_sch_quiesce(struct subchannel *sch)
 
 			spin_lock_irq(sch->lock);
 			private->completion = NULL;
+			flush_workqueue(vfio_ccw_work_q);
 			ret = cio_cancel_halt_clear(sch, &iretry);
 		};
 
@@ -63,18 +66,12 @@ int vfio_ccw_sch_quiesce(struct subchannel *sch)
 	return ret;
 }
 
-static int doing_io(struct vfio_ccw_private *private, u32 intparm)
-{
-	return (private->intparm == intparm);
-}
-
 static int vfio_ccw_sch_io_helper(struct vfio_ccw_private *private)
 {
 	struct subchannel *sch;
 	union orb *orb;
 	int ccode;
 	__u8 lpm;
-	u32 intparm;
 
 	sch = private->sch;
 
@@ -89,7 +86,7 @@ static int vfio_ccw_sch_io_helper(struct vfio_ccw_private *private)
 		 * Initialize device status information
 		 */
 		sch->schib.scsw.cmd.actl |= SCSW_ACTL_START_PEND;
-		break;
+		return 0;
 	case 1:		/* Status pending */
 	case 2:		/* Busy */
 		return -EBUSY;
@@ -109,15 +106,26 @@ static int vfio_ccw_sch_io_helper(struct vfio_ccw_private *private)
 	default:
 		return ccode;
 	}
+}
 
-	intparm = (u32)(addr_t)sch;
-	private->intparm = 0;
-	wait_event(private->wait_q, doing_io(private, intparm));
+static void vfio_ccw_sch_io_todo(struct work_struct *work)
+{
+	struct vfio_ccw_private *private;
+	struct subchannel *sch;
+	struct irb *irb;
 
-	if (scsw_is_solicited(&private->irb.scsw))
-		cp_update_scsw(&private->cp, &private->irb.scsw);
+	private = container_of(work, struct vfio_ccw_private, io_work);
+	irb = &private->irb;
+	sch = private->sch;
 
-	return 0;
+	if (scsw_is_solicited(&irb->scsw)) {
+		cp_update_scsw(&private->cp, &irb->scsw);
+		cp_free(&private->cp);
+	}
+	memcpy(private->io_region.irb_area, irb, sizeof(*irb));
+
+	if (private->io_trigger)
+		eventfd_signal(private->io_trigger, 1);
 }
 
 /* Deal with the ccw command request from the userspace. */
@@ -126,7 +134,6 @@ int vfio_ccw_sch_cmd_request(struct vfio_ccw_private *private)
 	struct mdev_device *mdev = private->mdev;
 	union orb *orb;
 	union scsw *scsw = &private->scsw;
-	struct irb *irb = &private->irb;
 	struct ccw_io_region *io_region = &private->io_region;
 	int ret;
 
@@ -147,12 +154,8 @@ int vfio_ccw_sch_cmd_request(struct vfio_ccw_private *private)
 
 		/* Start channel program and wait for I/O interrupt. */
 		ret = vfio_ccw_sch_io_helper(private);
-		if (!ret) {
-			/* Get irb info and copy it to irb_area. */
-			memcpy(io_region->irb_area, irb, sizeof(*irb));
-		}
-
-		cp_free(&private->cp);
+		if (!ret)
+			cp_free(&private->cp);
 	} else if (scsw->cmd.fctl & SCSW_FCTL_HALT_FUNC) {
 		/* XXX: Handle halt. */
 		ret = -EOPNOTSUPP;
@@ -229,8 +232,8 @@ static void vfio_ccw_sch_irq(struct subchannel *sch)
 
 	irb = this_cpu_ptr(&cio_irb);
 	memcpy(&private->irb, irb, sizeof(*irb));
-	private->intparm = (u32)(addr_t)sch;
-	wake_up(&private->wait_q);
+
+	queue_work(vfio_ccw_work_q, &private->io_work);
 
 	if (private->completion)
 		complete(private->completion);
@@ -269,7 +272,7 @@ static int vfio_ccw_sch_probe(struct subchannel *sch)
 	if (ret)
 		goto out_rm_group;
 
-	init_waitqueue_head(&private->wait_q);
+	INIT_WORK(&private->io_work, vfio_ccw_sch_io_todo);
 	atomic_set(&private->avail, 1);
 
 	return 0;
@@ -367,10 +370,16 @@ static int __init vfio_ccw_sch_init(void)
 {
 	int ret;
 
+	vfio_ccw_work_q = create_singlethread_workqueue("vfio-ccw");
+	if (!vfio_ccw_work_q)
+		return -ENOMEM;
+
 	isc_register(VFIO_CCW_ISC);
 	ret = css_driver_register(&vfio_ccw_sch_driver);
-	if (ret)
+	if (ret) {
 		isc_unregister(VFIO_CCW_ISC);
+		destroy_workqueue(vfio_ccw_work_q);
+	}
 
 	return ret;
 }
@@ -379,6 +388,7 @@ static void __exit vfio_ccw_sch_exit(void)
 {
 	css_driver_unregister(&vfio_ccw_sch_driver);
 	isc_unregister(VFIO_CCW_ISC);
+	destroy_workqueue(vfio_ccw_work_q);
 }
 module_init(vfio_ccw_sch_init);
 module_exit(vfio_ccw_sch_exit);
diff --git a/drivers/s390/cio/vfio_ccw_ops.c b/drivers/s390/cio/vfio_ccw_ops.c
index ebc38fb..8e6754d 100644
--- a/drivers/s390/cio/vfio_ccw_ops.c
+++ b/drivers/s390/cio/vfio_ccw_ops.c
@@ -202,9 +202,6 @@ static ssize_t vfio_ccw_mdev_write(struct mdev_device *mdev,
 	if (region->ret_code != 0)
 		return region->ret_code;
 
-	if (private->io_trigger)
-		eventfd_signal(private->io_trigger, 1);
-
 	return count;
 }
 
diff --git a/drivers/s390/cio/vfio_ccw_private.h b/drivers/s390/cio/vfio_ccw_private.h
index dddab52..2503797 100644
--- a/drivers/s390/cio/vfio_ccw_private.h
+++ b/drivers/s390/cio/vfio_ccw_private.h
@@ -12,6 +12,7 @@
 
 #include <linux/completion.h>
 #include <linux/eventfd.h>
+#include <linux/workqueue.h>
 #include <linux/vfio_ccw.h>
 
 #include "css.h"
@@ -25,12 +26,11 @@
  * @mdev: pointer to the mediated device
  * @nb: notifier for vfio events
  * @io_region: MMIO region to input/output I/O arguments/results
- * @wait_q: wait for interrupt
- * @intparm: record current interrupt parameter, used for wait interrupt
  * @cp: channel program for the current I/O operation
  * @irb: irb info received from interrupt
  * @scsw: scsw info
  * @io_trigger: eventfd ctx for signaling userspace I/O results
+ * @io_work: work for deferral process of I/O handling
  */
 struct vfio_ccw_private {
 	struct subchannel	*sch;
@@ -40,13 +40,12 @@ struct vfio_ccw_private {
 	struct notifier_block	nb;
 	struct ccw_io_region	io_region;
 
-	wait_queue_head_t	wait_q;
-	u32			intparm;
 	struct channel_program	cp;
 	struct irb		irb;
 	union scsw		scsw;
 
 	struct eventfd_ctx	*io_trigger;
+	struct work_struct	io_work;
 } __aligned(8);
 
 extern int vfio_ccw_mdev_reg(struct subchannel *sch);
-- 
2.8.4

^ permalink raw reply related	[flat|nested] 61+ messages in thread

* [PATCH RFC v3 13/15] vfio: ccw: introduce a finite state machine
  2017-02-17  8:29 ` [Qemu-devel] " Dong Jia Shi
@ 2017-02-17  8:29   ` Dong Jia Shi
  -1 siblings, 0 replies; 61+ messages in thread
From: Dong Jia Shi @ 2017-02-17  8:29 UTC (permalink / raw)
  To: kvm, linux-s390, qemu-devel
  Cc: bjsdjshi, renxiaof, cornelia.huck, borntraeger, agraf,
	alex.williamson, pmorel, pasic, wkywang

The current implementation doesn't check if the subchannel is in a
proper device state when handling an event. Let's introduce
a finite state machine to manage the state/event change.

Signed-off-by: Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com>
---
 drivers/s390/cio/Makefile           |   2 +-
 drivers/s390/cio/vfio_ccw_drv.c     | 116 +++-----------------
 drivers/s390/cio/vfio_ccw_fsm.c     | 207 ++++++++++++++++++++++++++++++++++++
 drivers/s390/cio/vfio_ccw_ops.c     |  28 ++++-
 drivers/s390/cio/vfio_ccw_private.h |  41 ++++++-
 5 files changed, 287 insertions(+), 107 deletions(-)
 create mode 100644 drivers/s390/cio/vfio_ccw_fsm.c

diff --git a/drivers/s390/cio/Makefile b/drivers/s390/cio/Makefile
index b0586b2..bdf4752 100644
--- a/drivers/s390/cio/Makefile
+++ b/drivers/s390/cio/Makefile
@@ -18,5 +18,5 @@ obj-$(CONFIG_CCWGROUP) += ccwgroup.o
 qdio-objs := qdio_main.o qdio_thinint.o qdio_debug.o qdio_setup.o
 obj-$(CONFIG_QDIO) += qdio.o
 
-vfio_ccw-objs += vfio_ccw_drv.o vfio_ccw_cp.o vfio_ccw_ops.o
+vfio_ccw-objs += vfio_ccw_drv.o vfio_ccw_cp.o vfio_ccw_ops.o vfio_ccw_fsm.o
 obj-$(CONFIG_VFIO_CCW) += vfio_ccw.o
diff --git a/drivers/s390/cio/vfio_ccw_drv.c b/drivers/s390/cio/vfio_ccw_drv.c
index dcf2d70..0cec76f 100644
--- a/drivers/s390/cio/vfio_ccw_drv.c
+++ b/drivers/s390/cio/vfio_ccw_drv.c
@@ -60,54 +60,12 @@ int vfio_ccw_sch_quiesce(struct subchannel *sch)
 
 		ret = cio_disable_subchannel(sch);
 	} while (ret == -EBUSY);
-
 out_unlock:
+	private->state = VFIO_CCW_STATE_NOT_OPER;
 	spin_unlock_irq(sch->lock);
 	return ret;
 }
 
-static int vfio_ccw_sch_io_helper(struct vfio_ccw_private *private)
-{
-	struct subchannel *sch;
-	union orb *orb;
-	int ccode;
-	__u8 lpm;
-
-	sch = private->sch;
-
-	orb = cp_get_orb(&private->cp, (u32)(addr_t)sch, sch->lpm);
-
-	/* Issue "Start Subchannel" */
-	ccode = ssch(sch->schid, orb);
-
-	switch (ccode) {
-	case 0:
-		/*
-		 * Initialize device status information
-		 */
-		sch->schib.scsw.cmd.actl |= SCSW_ACTL_START_PEND;
-		return 0;
-	case 1:		/* Status pending */
-	case 2:		/* Busy */
-		return -EBUSY;
-	case 3:		/* Device/path not operational */
-	{
-		lpm = orb->cmd.lpm;
-		if (lpm != 0)
-			sch->lpm &= ~lpm;
-		else
-			sch->lpm = 0;
-
-		if (cio_update_schib(sch))
-			return -ENODEV;
-
-		return sch->lpm ? -EACCES : -ENODEV;
-	}
-	default:
-		return ccode;
-	}
-}
-
 static void vfio_ccw_sch_io_todo(struct work_struct *work)
 {
 	struct vfio_ccw_private *private;
@@ -126,47 +84,9 @@ static void vfio_ccw_sch_io_todo(struct work_struct *work)
 
 	if (private->io_trigger)
 		eventfd_signal(private->io_trigger, 1);
-}
-
-/* Deal with the ccw command request from the userspace. */
-int vfio_ccw_sch_cmd_request(struct vfio_ccw_private *private)
-{
-	struct mdev_device *mdev = private->mdev;
-	union orb *orb;
-	union scsw *scsw = &private->scsw;
-	struct ccw_io_region *io_region = &private->io_region;
-	int ret;
-
-	memcpy(scsw, io_region->scsw_area, sizeof(*scsw));
-
-	if (scsw->cmd.fctl & SCSW_FCTL_START_FUNC) {
-		orb = (union orb *)io_region->orb_area;
-
-		ret = cp_init(&private->cp, mdev_dev(mdev), orb);
-		if (ret)
-			return ret;
-
-		ret = cp_prefetch(&private->cp);
-		if (ret) {
-			cp_free(&private->cp);
-			return ret;
-		}
-
-		/* Start channel program and wait for I/O interrupt. */
-		ret = vfio_ccw_sch_io_helper(private);
-		if (!ret)
-			cp_free(&private->cp);
-	} else if (scsw->cmd.fctl & SCSW_FCTL_HALT_FUNC) {
-		/* XXX: Handle halt. */
-		ret = -EOPNOTSUPP;
-	} else if (scsw->cmd.fctl & SCSW_FCTL_CLEAR_FUNC) {
-		/* XXX: Handle clear. */
-		ret = -EOPNOTSUPP;
-	} else {
-		ret = -EOPNOTSUPP;
-	}
 
-	return ret;
+	if (private->mdev)
+		private->state = VFIO_CCW_STATE_IDLE;
 }
 
 /*
@@ -223,20 +143,9 @@ static struct attribute_group vfio_subchannel_attr_group = {
 static void vfio_ccw_sch_irq(struct subchannel *sch)
 {
 	struct vfio_ccw_private *private = dev_get_drvdata(&sch->dev);
-	struct irb *irb;
 
 	inc_irq_stat(IRQIO_CIO);
-
-	if (!private)
-		return;
-
-	irb = this_cpu_ptr(&cio_irb);
-	memcpy(&private->irb, irb, sizeof(*irb));
-
-	queue_work(vfio_ccw_work_q, &private->io_work);
-
-	if (private->completion)
-		complete(private->completion);
+	vfio_ccw_fsm_event(private, VFIO_CCW_EVENT_INTERRUPT);
 }
 
 static int vfio_ccw_sch_probe(struct subchannel *sch)
@@ -258,6 +167,7 @@ static int vfio_ccw_sch_probe(struct subchannel *sch)
 	dev_set_drvdata(&sch->dev, private);
 
 	spin_lock_irq(sch->lock);
+	private->state = VFIO_CCW_STATE_NOT_OPER;
 	sch->isc = VFIO_CCW_ISC;
 	ret = cio_enable_subchannel(sch, (u32)(unsigned long)sch);
 	spin_unlock_irq(sch->lock);
@@ -274,6 +184,7 @@ static int vfio_ccw_sch_probe(struct subchannel *sch)
 
 	INIT_WORK(&private->io_work, vfio_ccw_sch_io_todo);
 	atomic_set(&private->avail, 1);
+	private->state = VFIO_CCW_STATE_STANDBY;
 
 	return 0;
 
@@ -321,6 +232,7 @@ static void vfio_ccw_sch_shutdown(struct subchannel *sch)
  */
 static int vfio_ccw_sch_event(struct subchannel *sch, int process)
 {
+	struct vfio_ccw_private *private = dev_get_drvdata(&sch->dev);
 	unsigned long flags;
 
 	spin_lock_irqsave(sch->lock, flags);
@@ -331,16 +243,16 @@ static int vfio_ccw_sch_event(struct subchannel *sch, int process)
 		goto out_unlock;
 
 	if (cio_update_schib(sch)) {
-		/* Not operational. */
-		css_sched_sch_todo(sch, SCH_TODO_UNREG);
-
-		/*
-		 * TODO:
-		 * Probably we should send the machine check to the guest.
-		 */
+		vfio_ccw_fsm_event(private, VFIO_CCW_EVENT_NOT_OPER);
 		goto out_unlock;
 	}
 
+	private = dev_get_drvdata(&sch->dev);
+	if (private->state == VFIO_CCW_STATE_NOT_OPER) {
+		private->state = private->mdev ? VFIO_CCW_STATE_IDLE :
+				 VFIO_CCW_STATE_STANDBY;
+	}
+
 out_unlock:
 	spin_unlock_irqrestore(sch->lock, flags);
 
diff --git a/drivers/s390/cio/vfio_ccw_fsm.c b/drivers/s390/cio/vfio_ccw_fsm.c
new file mode 100644
index 0000000..55b6cc5
--- /dev/null
+++ b/drivers/s390/cio/vfio_ccw_fsm.c
@@ -0,0 +1,207 @@
+/*
+ * Finite state machine for vfio-ccw device handling
+ *
+ * Copyright IBM Corp. 2017
+ *
+ * Author(s): Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com>
+ */
+
+#include <linux/vfio.h>
+#include <linux/mdev.h>
+
+#include "ioasm.h"
+#include "vfio_ccw_private.h"
+
+static int fsm_io_helper(struct vfio_ccw_private *private)
+{
+	struct subchannel *sch;
+	union orb *orb;
+	int ccode;
+	__u8 lpm;
+	unsigned long flags;
+
+	sch = private->sch;
+
+	spin_lock_irqsave(sch->lock, flags);
+	private->state = VFIO_CCW_STATE_BUSY;
+	spin_unlock_irqrestore(sch->lock, flags);
+
+	orb = cp_get_orb(&private->cp, (u32)(addr_t)sch, sch->lpm);
+
+	/* Issue "Start Subchannel" */
+	ccode = ssch(sch->schid, orb);
+
+	switch (ccode) {
+	case 0:
+		/*
+		 * Initialize device status information
+		 */
+		sch->schib.scsw.cmd.actl |= SCSW_ACTL_START_PEND;
+		return 0;
+	case 1:		/* Status pending */
+	case 2:		/* Busy */
+		return -EBUSY;
+	case 3:		/* Device/path not operational */
+	{
+		lpm = orb->cmd.lpm;
+		if (lpm != 0)
+			sch->lpm &= ~lpm;
+		else
+			sch->lpm = 0;
+
+		if (cio_update_schib(sch))
+			return -ENODEV;
+
+		return sch->lpm ? -EACCES : -ENODEV;
+	}
+	default:
+		return ccode;
+	}
+}
+
+static void fsm_notoper(struct vfio_ccw_private *private,
+			enum vfio_ccw_event event)
+{
+	struct subchannel *sch = private->sch;
+
+	/*
+	 * TODO:
+	 * Probably we should send the machine check to the guest.
+	 */
+	css_sched_sch_todo(sch, SCH_TODO_UNREG);
+	private->state = VFIO_CCW_STATE_NOT_OPER;
+}
+
+/*
+ * No operation action.
+ */
+static void fsm_nop(struct vfio_ccw_private *private,
+		    enum vfio_ccw_event event)
+{
+}
+
+static void fsm_io_error(struct vfio_ccw_private *private,
+			 enum vfio_ccw_event event)
+{
+	pr_err("vfio-ccw: FSM: I/O request from state:%d\n", private->state);
+	private->io_region.ret_code = -EIO;
+}
+
+static void fsm_io_busy(struct vfio_ccw_private *private,
+			enum vfio_ccw_event event)
+{
+	private->io_region.ret_code = -EBUSY;
+}
+
+static void fsm_disabled_irq(struct vfio_ccw_private *private,
+			     enum vfio_ccw_event event)
+{
+	struct subchannel *sch = private->sch;
+
+	/*
+	 * An interrupt in a disabled state means a previous disable was not
+	 * successful - should not happen, but we try to disable again.
+	 */
+	cio_disable_subchannel(sch);
+}
+
+/*
+ * Deal with the ccw command request from the userspace.
+ */
+static void fsm_io_request(struct vfio_ccw_private *private,
+			   enum vfio_ccw_event event)
+{
+	union orb *orb;
+	union scsw *scsw = &private->scsw;
+	struct ccw_io_region *io_region = &private->io_region;
+	struct mdev_device *mdev = private->mdev;
+
+	private->state = VFIO_CCW_STATE_BOXED;
+
+	memcpy(scsw, io_region->scsw_area, sizeof(*scsw));
+
+	if (scsw->cmd.fctl & SCSW_FCTL_START_FUNC) {
+		orb = (union orb *)io_region->orb_area;
+
+		io_region->ret_code = cp_init(&private->cp, mdev_dev(mdev),
+					      orb);
+		if (io_region->ret_code)
+			goto err_out;
+
+		io_region->ret_code = cp_prefetch(&private->cp);
+		if (io_region->ret_code) {
+			cp_free(&private->cp);
+			goto err_out;
+		}
+
+		/* Start channel program and wait for I/O interrupt. */
+		io_region->ret_code = fsm_io_helper(private);
+		if (io_region->ret_code) {
+			cp_free(&private->cp);
+			goto err_out;
+		}
+		return;
+	} else if (scsw->cmd.fctl & SCSW_FCTL_HALT_FUNC) {
+		/* XXX: Handle halt. */
+		io_region->ret_code = -EOPNOTSUPP;
+		goto err_out;
+	} else if (scsw->cmd.fctl & SCSW_FCTL_CLEAR_FUNC) {
+		/* XXX: Handle clear. */
+		io_region->ret_code = -EOPNOTSUPP;
+		goto err_out;
+	}
+
+err_out:
+	private->state = VFIO_CCW_STATE_IDLE;
+}
+
+/*
+ * Got an interrupt for a normal io (state busy).
+ */
+static void fsm_irq(struct vfio_ccw_private *private,
+		    enum vfio_ccw_event event)
+{
+	struct irb *irb;
+
+	if (!private)
+		return;
+
+	irb = this_cpu_ptr(&cio_irb);
+	memcpy(&private->irb, irb, sizeof(*irb));
+
+	queue_work(vfio_ccw_work_q, &private->io_work);
+
+	if (private->completion)
+		complete(private->completion);
+}
+
+/*
+ * Device statemachine
+ */
+fsm_func_t *vfio_ccw_jumptable[NR_VFIO_CCW_STATES][NR_VFIO_CCW_EVENTS] = {
+	[VFIO_CCW_STATE_NOT_OPER] = {
+		[VFIO_CCW_EVENT_NOT_OPER]	= fsm_nop,
+		[VFIO_CCW_EVENT_IO_REQ]		= fsm_io_error,
+		[VFIO_CCW_EVENT_INTERRUPT]	= fsm_disabled_irq,
+	},
+	[VFIO_CCW_STATE_STANDBY] = {
+		[VFIO_CCW_EVENT_NOT_OPER]	= fsm_notoper,
+		[VFIO_CCW_EVENT_IO_REQ]		= fsm_io_error,
+		[VFIO_CCW_EVENT_INTERRUPT]	= fsm_irq,
+	},
+	[VFIO_CCW_STATE_IDLE] = {
+		[VFIO_CCW_EVENT_NOT_OPER]	= fsm_notoper,
+		[VFIO_CCW_EVENT_IO_REQ]		= fsm_io_request,
+		[VFIO_CCW_EVENT_INTERRUPT]	= fsm_irq,
+	},
+	[VFIO_CCW_STATE_BOXED] = {
+		[VFIO_CCW_EVENT_NOT_OPER]	= fsm_notoper,
+		[VFIO_CCW_EVENT_IO_REQ]		= fsm_io_busy,
+		[VFIO_CCW_EVENT_INTERRUPT]	= fsm_irq,
+	},
+	[VFIO_CCW_STATE_BUSY] = {
+		[VFIO_CCW_EVENT_NOT_OPER]	= fsm_notoper,
+		[VFIO_CCW_EVENT_IO_REQ]		= fsm_io_busy,
+		[VFIO_CCW_EVENT_INTERRUPT]	= fsm_irq,
+	},
+};
diff --git a/drivers/s390/cio/vfio_ccw_ops.c b/drivers/s390/cio/vfio_ccw_ops.c
index 8e6754d..c7bed30a 100644
--- a/drivers/s390/cio/vfio_ccw_ops.c
+++ b/drivers/s390/cio/vfio_ccw_ops.c
@@ -35,7 +35,11 @@ static int vfio_ccw_mdev_reset(struct mdev_device *mdev)
 	if (ret)
 		return ret;
 
-	return cio_enable_subchannel(sch, (u32)(unsigned long)sch);
+	ret = cio_enable_subchannel(sch, (u32)(unsigned long)sch);
+	if (!ret)
+		private->state = VFIO_CCW_STATE_IDLE;
+
+	return ret;
 }
 
 static int vfio_ccw_mdev_notifier(struct notifier_block *nb,
@@ -112,10 +116,14 @@ static int vfio_ccw_mdev_create(struct kobject *kobj, struct mdev_device *mdev)
 	struct vfio_ccw_private *private =
 		dev_get_drvdata(mdev_parent_dev(mdev));
 
+	if (private->state == VFIO_CCW_STATE_NOT_OPER)
+		return -ENODEV;
+
 	if (atomic_dec_if_positive(&private->avail) < 0)
 		return -EPERM;
 
 	private->mdev = mdev;
+	private->state = VFIO_CCW_STATE_IDLE;
 
 	return 0;
 }
@@ -126,10 +134,20 @@ static int vfio_ccw_mdev_remove(struct mdev_device *mdev)
 		dev_get_drvdata(mdev_parent_dev(mdev));
 	int ret;
 
+	if (!private)
+		goto out;
+
+	if ((private->state == VFIO_CCW_STATE_NOT_OPER) ||
+	    (private->state == VFIO_CCW_STATE_STANDBY))
+		goto out;
+
 	ret = vfio_ccw_mdev_reset(mdev);
 	if (ret)
 		return ret;
 
+	private->state = VFIO_CCW_STATE_STANDBY;
+
+out:
 	private->mdev = NULL;
 	atomic_inc(&private->avail);
 
@@ -193,14 +211,18 @@ static ssize_t vfio_ccw_mdev_write(struct mdev_device *mdev,
 	private = dev_get_drvdata(mdev_parent_dev(mdev));
 	if (!private)
 		return -ENODEV;
+	if (private->state != VFIO_CCW_STATE_IDLE)
+		return -EACCES;
 
 	region = &private->io_region;
 	if (copy_from_user((void *)region + *ppos, buf, count))
 		return -EFAULT;
 
-	region->ret_code = vfio_ccw_sch_cmd_request(private);
-	if (region->ret_code != 0)
+	vfio_ccw_fsm_event(private, VFIO_CCW_EVENT_IO_REQ);
+	if (region->ret_code != 0) {
+		private->state = VFIO_CCW_STATE_IDLE;
 		return region->ret_code;
+	}
 
 	return count;
 }
diff --git a/drivers/s390/cio/vfio_ccw_private.h b/drivers/s390/cio/vfio_ccw_private.h
index 2503797..fc0f01c 100644
--- a/drivers/s390/cio/vfio_ccw_private.h
+++ b/drivers/s390/cio/vfio_ccw_private.h
@@ -21,6 +21,7 @@
 /**
  * struct vfio_ccw_private
  * @sch: pointer to the subchannel
+ * @state: internal state of the device
  * @completion: synchronization helper of the I/O completion
  * @avail: available for creating a mediated device
  * @mdev: pointer to the mediated device
@@ -34,6 +35,7 @@
  */
 struct vfio_ccw_private {
 	struct subchannel	*sch;
+	int			state;
 	struct completion	*completion;
 	atomic_t		avail;
 	struct mdev_device	*mdev;
@@ -52,6 +54,43 @@ extern int vfio_ccw_mdev_reg(struct subchannel *sch);
 extern void vfio_ccw_mdev_unreg(struct subchannel *sch);
 
 extern int vfio_ccw_sch_quiesce(struct subchannel *sch);
-extern int vfio_ccw_sch_cmd_request(struct vfio_ccw_private *private);
+
+/*
+ * States of the device statemachine.
+ */
+enum vfio_ccw_state {
+	VFIO_CCW_STATE_NOT_OPER,
+	VFIO_CCW_STATE_STANDBY,
+	VFIO_CCW_STATE_IDLE,
+	VFIO_CCW_STATE_BOXED,
+	VFIO_CCW_STATE_BUSY,
+	/* last element! */
+	NR_VFIO_CCW_STATES
+};
+
+/*
+ * Asynchronous events of the device statemachine.
+ */
+enum vfio_ccw_event {
+	VFIO_CCW_EVENT_NOT_OPER,
+	VFIO_CCW_EVENT_IO_REQ,
+	VFIO_CCW_EVENT_INTERRUPT,
+	/* last element! */
+	NR_VFIO_CCW_EVENTS
+};
+
+/*
+ * Action called through jumptable.
+ */
+typedef void (fsm_func_t)(struct vfio_ccw_private *, enum vfio_ccw_event);
+extern fsm_func_t *vfio_ccw_jumptable[NR_VFIO_CCW_STATES][NR_VFIO_CCW_EVENTS];
+
+static inline void vfio_ccw_fsm_event(struct vfio_ccw_private *private,
+				     int event)
+{
+	vfio_ccw_jumptable[private->state][event](private, event);
+}
+
+extern struct workqueue_struct *vfio_ccw_work_q;
 
 #endif
-- 
2.8.4

^ permalink raw reply related	[flat|nested] 61+ messages in thread

* [Qemu-devel] [PATCH RFC v3 13/15] vfio: ccw: introduce a finite state machine
@ 2017-02-17  8:29   ` Dong Jia Shi
  0 siblings, 0 replies; 61+ messages in thread
From: Dong Jia Shi @ 2017-02-17  8:29 UTC (permalink / raw)
  To: kvm, linux-s390, qemu-devel
  Cc: bjsdjshi, renxiaof, cornelia.huck, borntraeger, agraf,
	alex.williamson, pmorel, pasic, wkywang

The current implementation doesn't check if the subchannel is in a
proper device state when handling an event. Let's introduce
a finite state machine to manage the state/event change.

Signed-off-by: Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com>
---
 drivers/s390/cio/Makefile           |   2 +-
 drivers/s390/cio/vfio_ccw_drv.c     | 116 +++-----------------
 drivers/s390/cio/vfio_ccw_fsm.c     | 207 ++++++++++++++++++++++++++++++++++++
 drivers/s390/cio/vfio_ccw_ops.c     |  28 ++++-
 drivers/s390/cio/vfio_ccw_private.h |  41 ++++++-
 5 files changed, 287 insertions(+), 107 deletions(-)
 create mode 100644 drivers/s390/cio/vfio_ccw_fsm.c

diff --git a/drivers/s390/cio/Makefile b/drivers/s390/cio/Makefile
index b0586b2..bdf4752 100644
--- a/drivers/s390/cio/Makefile
+++ b/drivers/s390/cio/Makefile
@@ -18,5 +18,5 @@ obj-$(CONFIG_CCWGROUP) += ccwgroup.o
 qdio-objs := qdio_main.o qdio_thinint.o qdio_debug.o qdio_setup.o
 obj-$(CONFIG_QDIO) += qdio.o
 
-vfio_ccw-objs += vfio_ccw_drv.o vfio_ccw_cp.o vfio_ccw_ops.o
+vfio_ccw-objs += vfio_ccw_drv.o vfio_ccw_cp.o vfio_ccw_ops.o vfio_ccw_fsm.o
 obj-$(CONFIG_VFIO_CCW) += vfio_ccw.o
diff --git a/drivers/s390/cio/vfio_ccw_drv.c b/drivers/s390/cio/vfio_ccw_drv.c
index dcf2d70..0cec76f 100644
--- a/drivers/s390/cio/vfio_ccw_drv.c
+++ b/drivers/s390/cio/vfio_ccw_drv.c
@@ -60,54 +60,12 @@ int vfio_ccw_sch_quiesce(struct subchannel *sch)
 
 		ret = cio_disable_subchannel(sch);
 	} while (ret == -EBUSY);
-
 out_unlock:
+	private->state = VFIO_CCW_STATE_NOT_OPER;
 	spin_unlock_irq(sch->lock);
 	return ret;
 }
 
-static int vfio_ccw_sch_io_helper(struct vfio_ccw_private *private)
-{
-	struct subchannel *sch;
-	union orb *orb;
-	int ccode;
-	__u8 lpm;
-
-	sch = private->sch;
-
-	orb = cp_get_orb(&private->cp, (u32)(addr_t)sch, sch->lpm);
-
-	/* Issue "Start Subchannel" */
-	ccode = ssch(sch->schid, orb);
-
-	switch (ccode) {
-	case 0:
-		/*
-		 * Initialize device status information
-		 */
-		sch->schib.scsw.cmd.actl |= SCSW_ACTL_START_PEND;
-		return 0;
-	case 1:		/* Status pending */
-	case 2:		/* Busy */
-		return -EBUSY;
-	case 3:		/* Device/path not operational */
-	{
-		lpm = orb->cmd.lpm;
-		if (lpm != 0)
-			sch->lpm &= ~lpm;
-		else
-			sch->lpm = 0;
-
-		if (cio_update_schib(sch))
-			return -ENODEV;
-
-		return sch->lpm ? -EACCES : -ENODEV;
-	}
-	default:
-		return ccode;
-	}
-}
-
 static void vfio_ccw_sch_io_todo(struct work_struct *work)
 {
 	struct vfio_ccw_private *private;
@@ -126,47 +84,9 @@ static void vfio_ccw_sch_io_todo(struct work_struct *work)
 
 	if (private->io_trigger)
 		eventfd_signal(private->io_trigger, 1);
-}
-
-/* Deal with the ccw command request from the userspace. */
-int vfio_ccw_sch_cmd_request(struct vfio_ccw_private *private)
-{
-	struct mdev_device *mdev = private->mdev;
-	union orb *orb;
-	union scsw *scsw = &private->scsw;
-	struct ccw_io_region *io_region = &private->io_region;
-	int ret;
-
-	memcpy(scsw, io_region->scsw_area, sizeof(*scsw));
-
-	if (scsw->cmd.fctl & SCSW_FCTL_START_FUNC) {
-		orb = (union orb *)io_region->orb_area;
-
-		ret = cp_init(&private->cp, mdev_dev(mdev), orb);
-		if (ret)
-			return ret;
-
-		ret = cp_prefetch(&private->cp);
-		if (ret) {
-			cp_free(&private->cp);
-			return ret;
-		}
-
-		/* Start channel program and wait for I/O interrupt. */
-		ret = vfio_ccw_sch_io_helper(private);
-		if (!ret)
-			cp_free(&private->cp);
-	} else if (scsw->cmd.fctl & SCSW_FCTL_HALT_FUNC) {
-		/* XXX: Handle halt. */
-		ret = -EOPNOTSUPP;
-	} else if (scsw->cmd.fctl & SCSW_FCTL_CLEAR_FUNC) {
-		/* XXX: Handle clear. */
-		ret = -EOPNOTSUPP;
-	} else {
-		ret = -EOPNOTSUPP;
-	}
 
-	return ret;
+	if (private->mdev)
+		private->state = VFIO_CCW_STATE_IDLE;
 }
 
 /*
@@ -223,20 +143,9 @@ static struct attribute_group vfio_subchannel_attr_group = {
 static void vfio_ccw_sch_irq(struct subchannel *sch)
 {
 	struct vfio_ccw_private *private = dev_get_drvdata(&sch->dev);
-	struct irb *irb;
 
 	inc_irq_stat(IRQIO_CIO);
-
-	if (!private)
-		return;
-
-	irb = this_cpu_ptr(&cio_irb);
-	memcpy(&private->irb, irb, sizeof(*irb));
-
-	queue_work(vfio_ccw_work_q, &private->io_work);
-
-	if (private->completion)
-		complete(private->completion);
+	vfio_ccw_fsm_event(private, VFIO_CCW_EVENT_INTERRUPT);
 }
 
 static int vfio_ccw_sch_probe(struct subchannel *sch)
@@ -258,6 +167,7 @@ static int vfio_ccw_sch_probe(struct subchannel *sch)
 	dev_set_drvdata(&sch->dev, private);
 
 	spin_lock_irq(sch->lock);
+	private->state = VFIO_CCW_STATE_NOT_OPER;
 	sch->isc = VFIO_CCW_ISC;
 	ret = cio_enable_subchannel(sch, (u32)(unsigned long)sch);
 	spin_unlock_irq(sch->lock);
@@ -274,6 +184,7 @@ static int vfio_ccw_sch_probe(struct subchannel *sch)
 
 	INIT_WORK(&private->io_work, vfio_ccw_sch_io_todo);
 	atomic_set(&private->avail, 1);
+	private->state = VFIO_CCW_STATE_STANDBY;
 
 	return 0;
 
@@ -321,6 +232,7 @@ static void vfio_ccw_sch_shutdown(struct subchannel *sch)
  */
 static int vfio_ccw_sch_event(struct subchannel *sch, int process)
 {
+	struct vfio_ccw_private *private = dev_get_drvdata(&sch->dev);
 	unsigned long flags;
 
 	spin_lock_irqsave(sch->lock, flags);
@@ -331,16 +243,16 @@ static int vfio_ccw_sch_event(struct subchannel *sch, int process)
 		goto out_unlock;
 
 	if (cio_update_schib(sch)) {
-		/* Not operational. */
-		css_sched_sch_todo(sch, SCH_TODO_UNREG);
-
-		/*
-		 * TODO:
-		 * Probably we should send the machine check to the guest.
-		 */
+		vfio_ccw_fsm_event(private, VFIO_CCW_EVENT_NOT_OPER);
 		goto out_unlock;
 	}
 
+	private = dev_get_drvdata(&sch->dev);
+	if (private->state == VFIO_CCW_STATE_NOT_OPER) {
+		private->state = private->mdev ? VFIO_CCW_STATE_IDLE :
+				 VFIO_CCW_STATE_STANDBY;
+	}
+
 out_unlock:
 	spin_unlock_irqrestore(sch->lock, flags);
 
diff --git a/drivers/s390/cio/vfio_ccw_fsm.c b/drivers/s390/cio/vfio_ccw_fsm.c
new file mode 100644
index 0000000..55b6cc5
--- /dev/null
+++ b/drivers/s390/cio/vfio_ccw_fsm.c
@@ -0,0 +1,207 @@
+/*
+ * Finite state machine for vfio-ccw device handling
+ *
+ * Copyright IBM Corp. 2017
+ *
+ * Author(s): Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com>
+ */
+
+#include <linux/vfio.h>
+#include <linux/mdev.h>
+
+#include "ioasm.h"
+#include "vfio_ccw_private.h"
+
+static int fsm_io_helper(struct vfio_ccw_private *private)
+{
+	struct subchannel *sch;
+	union orb *orb;
+	int ccode;
+	__u8 lpm;
+	unsigned long flags;
+
+	sch = private->sch;
+
+	spin_lock_irqsave(sch->lock, flags);
+	private->state = VFIO_CCW_STATE_BUSY;
+	spin_unlock_irqrestore(sch->lock, flags);
+
+	orb = cp_get_orb(&private->cp, (u32)(addr_t)sch, sch->lpm);
+
+	/* Issue "Start Subchannel" */
+	ccode = ssch(sch->schid, orb);
+
+	switch (ccode) {
+	case 0:
+		/*
+		 * Initialize device status information
+		 */
+		sch->schib.scsw.cmd.actl |= SCSW_ACTL_START_PEND;
+		return 0;
+	case 1:		/* Status pending */
+	case 2:		/* Busy */
+		return -EBUSY;
+	case 3:		/* Device/path not operational */
+	{
+		lpm = orb->cmd.lpm;
+		if (lpm != 0)
+			sch->lpm &= ~lpm;
+		else
+			sch->lpm = 0;
+
+		if (cio_update_schib(sch))
+			return -ENODEV;
+
+		return sch->lpm ? -EACCES : -ENODEV;
+	}
+	default:
+		return ccode;
+	}
+}
+
+static void fsm_notoper(struct vfio_ccw_private *private,
+			enum vfio_ccw_event event)
+{
+	struct subchannel *sch = private->sch;
+
+	/*
+	 * TODO:
+	 * Probably we should send the machine check to the guest.
+	 */
+	css_sched_sch_todo(sch, SCH_TODO_UNREG);
+	private->state = VFIO_CCW_STATE_NOT_OPER;
+}
+
+/*
+ * No operation action.
+ */
+static void fsm_nop(struct vfio_ccw_private *private,
+		    enum vfio_ccw_event event)
+{
+}
+
+static void fsm_io_error(struct vfio_ccw_private *private,
+			 enum vfio_ccw_event event)
+{
+	pr_err("vfio-ccw: FSM: I/O request from state:%d\n", private->state);
+	private->io_region.ret_code = -EIO;
+}
+
+static void fsm_io_busy(struct vfio_ccw_private *private,
+			enum vfio_ccw_event event)
+{
+	private->io_region.ret_code = -EBUSY;
+}
+
+static void fsm_disabled_irq(struct vfio_ccw_private *private,
+			     enum vfio_ccw_event event)
+{
+	struct subchannel *sch = private->sch;
+
+	/*
+	 * An interrupt in a disabled state means a previous disable was not
+	 * successful - should not happen, but we try to disable again.
+	 */
+	cio_disable_subchannel(sch);
+}
+
+/*
+ * Deal with the ccw command request from the userspace.
+ */
+static void fsm_io_request(struct vfio_ccw_private *private,
+			   enum vfio_ccw_event event)
+{
+	union orb *orb;
+	union scsw *scsw = &private->scsw;
+	struct ccw_io_region *io_region = &private->io_region;
+	struct mdev_device *mdev = private->mdev;
+
+	private->state = VFIO_CCW_STATE_BOXED;
+
+	memcpy(scsw, io_region->scsw_area, sizeof(*scsw));
+
+	if (scsw->cmd.fctl & SCSW_FCTL_START_FUNC) {
+		orb = (union orb *)io_region->orb_area;
+
+		io_region->ret_code = cp_init(&private->cp, mdev_dev(mdev),
+					      orb);
+		if (io_region->ret_code)
+			goto err_out;
+
+		io_region->ret_code = cp_prefetch(&private->cp);
+		if (io_region->ret_code) {
+			cp_free(&private->cp);
+			goto err_out;
+		}
+
+		/* Start channel program and wait for I/O interrupt. */
+		io_region->ret_code = fsm_io_helper(private);
+		if (io_region->ret_code) {
+			cp_free(&private->cp);
+			goto err_out;
+		}
+		return;
+	} else if (scsw->cmd.fctl & SCSW_FCTL_HALT_FUNC) {
+		/* XXX: Handle halt. */
+		io_region->ret_code = -EOPNOTSUPP;
+		goto err_out;
+	} else if (scsw->cmd.fctl & SCSW_FCTL_CLEAR_FUNC) {
+		/* XXX: Handle clear. */
+		io_region->ret_code = -EOPNOTSUPP;
+		goto err_out;
+	}
+
+err_out:
+	private->state = VFIO_CCW_STATE_IDLE;
+}
+
+/*
+ * Got an interrupt for a normal io (state busy).
+ */
+static void fsm_irq(struct vfio_ccw_private *private,
+		    enum vfio_ccw_event event)
+{
+	struct irb *irb;
+
+	if (!private)
+		return;
+
+	irb = this_cpu_ptr(&cio_irb);
+	memcpy(&private->irb, irb, sizeof(*irb));
+
+	queue_work(vfio_ccw_work_q, &private->io_work);
+
+	if (private->completion)
+		complete(private->completion);
+}
+
+/*
+ * Device statemachine
+ */
+fsm_func_t *vfio_ccw_jumptable[NR_VFIO_CCW_STATES][NR_VFIO_CCW_EVENTS] = {
+	[VFIO_CCW_STATE_NOT_OPER] = {
+		[VFIO_CCW_EVENT_NOT_OPER]	= fsm_nop,
+		[VFIO_CCW_EVENT_IO_REQ]		= fsm_io_error,
+		[VFIO_CCW_EVENT_INTERRUPT]	= fsm_disabled_irq,
+	},
+	[VFIO_CCW_STATE_STANDBY] = {
+		[VFIO_CCW_EVENT_NOT_OPER]	= fsm_notoper,
+		[VFIO_CCW_EVENT_IO_REQ]		= fsm_io_error,
+		[VFIO_CCW_EVENT_INTERRUPT]	= fsm_irq,
+	},
+	[VFIO_CCW_STATE_IDLE] = {
+		[VFIO_CCW_EVENT_NOT_OPER]	= fsm_notoper,
+		[VFIO_CCW_EVENT_IO_REQ]		= fsm_io_request,
+		[VFIO_CCW_EVENT_INTERRUPT]	= fsm_irq,
+	},
+	[VFIO_CCW_STATE_BOXED] = {
+		[VFIO_CCW_EVENT_NOT_OPER]	= fsm_notoper,
+		[VFIO_CCW_EVENT_IO_REQ]		= fsm_io_busy,
+		[VFIO_CCW_EVENT_INTERRUPT]	= fsm_irq,
+	},
+	[VFIO_CCW_STATE_BUSY] = {
+		[VFIO_CCW_EVENT_NOT_OPER]	= fsm_notoper,
+		[VFIO_CCW_EVENT_IO_REQ]		= fsm_io_busy,
+		[VFIO_CCW_EVENT_INTERRUPT]	= fsm_irq,
+	},
+};
diff --git a/drivers/s390/cio/vfio_ccw_ops.c b/drivers/s390/cio/vfio_ccw_ops.c
index 8e6754d..c7bed30a 100644
--- a/drivers/s390/cio/vfio_ccw_ops.c
+++ b/drivers/s390/cio/vfio_ccw_ops.c
@@ -35,7 +35,11 @@ static int vfio_ccw_mdev_reset(struct mdev_device *mdev)
 	if (ret)
 		return ret;
 
-	return cio_enable_subchannel(sch, (u32)(unsigned long)sch);
+	ret = cio_enable_subchannel(sch, (u32)(unsigned long)sch);
+	if (!ret)
+		private->state = VFIO_CCW_STATE_IDLE;
+
+	return ret;
 }
 
 static int vfio_ccw_mdev_notifier(struct notifier_block *nb,
@@ -112,10 +116,14 @@ static int vfio_ccw_mdev_create(struct kobject *kobj, struct mdev_device *mdev)
 	struct vfio_ccw_private *private =
 		dev_get_drvdata(mdev_parent_dev(mdev));
 
+	if (private->state == VFIO_CCW_STATE_NOT_OPER)
+		return -ENODEV;
+
 	if (atomic_dec_if_positive(&private->avail) < 0)
 		return -EPERM;
 
 	private->mdev = mdev;
+	private->state = VFIO_CCW_STATE_IDLE;
 
 	return 0;
 }
@@ -126,10 +134,20 @@ static int vfio_ccw_mdev_remove(struct mdev_device *mdev)
 		dev_get_drvdata(mdev_parent_dev(mdev));
 	int ret;
 
+	if (!private)
+		goto out;
+
+	if ((private->state == VFIO_CCW_STATE_NOT_OPER) ||
+	    (private->state == VFIO_CCW_STATE_STANDBY))
+		goto out;
+
 	ret = vfio_ccw_mdev_reset(mdev);
 	if (ret)
 		return ret;
 
+	private->state = VFIO_CCW_STATE_STANDBY;
+
+out:
 	private->mdev = NULL;
 	atomic_inc(&private->avail);
 
@@ -193,14 +211,18 @@ static ssize_t vfio_ccw_mdev_write(struct mdev_device *mdev,
 	private = dev_get_drvdata(mdev_parent_dev(mdev));
 	if (!private)
 		return -ENODEV;
+	if (private->state != VFIO_CCW_STATE_IDLE)
+		return -EACCES;
 
 	region = &private->io_region;
 	if (copy_from_user((void *)region + *ppos, buf, count))
 		return -EFAULT;
 
-	region->ret_code = vfio_ccw_sch_cmd_request(private);
-	if (region->ret_code != 0)
+	vfio_ccw_fsm_event(private, VFIO_CCW_EVENT_IO_REQ);
+	if (region->ret_code != 0) {
+		private->state = VFIO_CCW_STATE_IDLE;
 		return region->ret_code;
+	}
 
 	return count;
 }
diff --git a/drivers/s390/cio/vfio_ccw_private.h b/drivers/s390/cio/vfio_ccw_private.h
index 2503797..fc0f01c 100644
--- a/drivers/s390/cio/vfio_ccw_private.h
+++ b/drivers/s390/cio/vfio_ccw_private.h
@@ -21,6 +21,7 @@
 /**
  * struct vfio_ccw_private
  * @sch: pointer to the subchannel
+ * @state: internal state of the device
  * @completion: synchronization helper of the I/O completion
  * @avail: available for creating a mediated device
  * @mdev: pointer to the mediated device
@@ -34,6 +35,7 @@
  */
 struct vfio_ccw_private {
 	struct subchannel	*sch;
+	int			state;
 	struct completion	*completion;
 	atomic_t		avail;
 	struct mdev_device	*mdev;
@@ -52,6 +54,43 @@ extern int vfio_ccw_mdev_reg(struct subchannel *sch);
 extern void vfio_ccw_mdev_unreg(struct subchannel *sch);
 
 extern int vfio_ccw_sch_quiesce(struct subchannel *sch);
-extern int vfio_ccw_sch_cmd_request(struct vfio_ccw_private *private);
+
+/*
+ * States of the device statemachine.
+ */
+enum vfio_ccw_state {
+	VFIO_CCW_STATE_NOT_OPER,
+	VFIO_CCW_STATE_STANDBY,
+	VFIO_CCW_STATE_IDLE,
+	VFIO_CCW_STATE_BOXED,
+	VFIO_CCW_STATE_BUSY,
+	/* last element! */
+	NR_VFIO_CCW_STATES
+};
+
+/*
+ * Asynchronous events of the device statemachine.
+ */
+enum vfio_ccw_event {
+	VFIO_CCW_EVENT_NOT_OPER,
+	VFIO_CCW_EVENT_IO_REQ,
+	VFIO_CCW_EVENT_INTERRUPT,
+	/* last element! */
+	NR_VFIO_CCW_EVENTS
+};
+
+/*
+ * Action called through jumptable.
+ */
+typedef void (fsm_func_t)(struct vfio_ccw_private *, enum vfio_ccw_event);
+extern fsm_func_t *vfio_ccw_jumptable[NR_VFIO_CCW_STATES][NR_VFIO_CCW_EVENTS];
+
+static inline void vfio_ccw_fsm_event(struct vfio_ccw_private *private,
+				     int event)
+{
+	vfio_ccw_jumptable[private->state][event](private, event);
+}
+
+extern struct workqueue_struct *vfio_ccw_work_q;
 
 #endif
-- 
2.8.4

^ permalink raw reply related	[flat|nested] 61+ messages in thread

* [PATCH RFC v3 14/15] docs: add documentation for vfio-ccw
  2017-02-17  8:29 ` [Qemu-devel] " Dong Jia Shi
@ 2017-02-17  8:29   ` Dong Jia Shi
  -1 siblings, 0 replies; 61+ messages in thread
From: Dong Jia Shi @ 2017-02-17  8:29 UTC (permalink / raw)
  To: kvm, linux-s390, qemu-devel
  Cc: bjsdjshi, renxiaof, cornelia.huck, borntraeger, agraf,
	alex.williamson, pmorel, pasic, wkywang

Add file Documentation/s390/vfio-ccw.txt that includes details
of vfio-ccw.

Signed-off-by: Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com>
Acked-by: Pierre Morel <pmorel@linux.vnet.ibm.com>
---
 Documentation/s390/00-INDEX     |   2 +
 Documentation/s390/vfio-ccw.txt | 303 ++++++++++++++++++++++++++++++++++++++++
 2 files changed, 305 insertions(+)
 create mode 100644 Documentation/s390/vfio-ccw.txt

diff --git a/Documentation/s390/00-INDEX b/Documentation/s390/00-INDEX
index 9189535..317f037 100644
--- a/Documentation/s390/00-INDEX
+++ b/Documentation/s390/00-INDEX
@@ -22,5 +22,7 @@ qeth.txt
 	- HiperSockets Bridge Port Support.
 s390dbf.txt
 	- information on using the s390 debug feature.
+vfio-ccw.txt
+	  information on the vfio-ccw I/O subchannel driver.
 zfcpdump.txt
 	- information on the s390 SCSI dump tool.
diff --git a/Documentation/s390/vfio-ccw.txt b/Documentation/s390/vfio-ccw.txt
new file mode 100644
index 0000000..90b3dfe
--- /dev/null
+++ b/Documentation/s390/vfio-ccw.txt
@@ -0,0 +1,303 @@
+vfio-ccw: the basic infrastructure
+==================================
+
+Introduction
+------------
+
+Here we describe the vfio support for I/O subchannel devices for
+Linux/s390. Motivation for vfio-ccw is to passthrough subchannels to a
+virtual machine, while vfio is the means.
+
+Different than other hardware architectures, s390 has defined a unified
+I/O access method, which is so called Channel I/O. It has its own access
+patterns:
+- Channel programs run asynchronously on a separate (co)processor.
+- The channel subsystem will access any memory designated by the caller
+  in the channel program directly, i.e. there is no iommu involved.
+Thus when we introduce vfio support for these devices, we realize it
+with a mediated device (mdev) implementation. The vfio mdev will be
+added to an iommu group, so as to make itself able to be managed by the
+vfio framework. And we add read/write callbacks for special vfio I/O
+regions to pass the channel programs from the mdev to its parent device
+(the real I/O subchannel device) to do further address translation and
+to perform I/O instructions.
+
+This document does not intend to explain the s390 I/O architecture in
+every detail. More information/reference could be found here:
+- A good start to know Channel I/O in general:
+  https://en.wikipedia.org/wiki/Channel_I/O
+- s390 architecture:
+  s390 Principles of Operation manual (IBM Form. No. SA22-7832)
+- The existing Qemu code which implements a simple emulated channel
+  subsystem could also be a good reference. It makes it easier to follow
+  the flow.
+  qemu/hw/s390x/css.c
+
+For vfio mediated device framework:
+- Documentation/vfio-mediated-device.txt
+
+Motivation of vfio-ccw
+----------------------
+
+Currently, a guest virtualized via qemu/kvm on s390 only sees
+paravirtualized virtio devices via the "Virtio Over Channel I/O
+(virtio-ccw)" transport. This makes virtio devices discoverable via
+standard operating system algorithms for handling channel devices.
+
+However this is not enough. On s390 for the majority of devices, which
+use the standard Channel I/O based mechanism, we also need to provide
+the functionality of passing through them to a Qemu virtual machine.
+This includes devices that don't have a virtio counterpart (e.g. tape
+drives) or that have specific characteristics which guests want to
+exploit.
+
+For passing a device to a guest, we want to use the same interface as
+everybody else, namely vfio. Thus, we would like to introduce vfio
+support for channel devices. And we would like to name this new vfio
+device "vfio-ccw".
+
+Access patterns of CCW devices
+------------------------------
+
+s390 architecture has implemented a so called channel subsystem, that
+provides a unified view of the devices physically attached to the
+systems. Though the s390 hardware platform knows about a huge variety of
+different peripheral attachments like disk devices (aka. DASDs), tapes,
+communication controllers, etc. They can all be accessed by a well
+defined access method and they are presenting I/O completion a unified
+way: I/O interruptions.
+
+All I/O requires the use of channel command words (CCWs). A CCW is an
+instruction to a specialized I/O channel processor. A channel program is
+a sequence of CCWs which are executed by the I/O channel subsystem.  To
+issue a channel program to the channel subsystem, it is required to
+build an operation request block (ORB), which can be used to point out
+the format of the CCW and other control information to the system. The
+operating system signals the I/O channel subsystem to begin executing
+the channel program with a SSCH (start sub-channel) instruction. The
+central processor is then free to proceed with non-I/O instructions
+until interrupted. The I/O completion result is received by the
+interrupt handler in the form of interrupt response block (IRB).
+
+Back to vfio-ccw, in short:
+- ORBs and channel programs are built in guest kernel (with guest
+  physical addresses).
+- ORBs and channel programs are passed to the host kernel.
+- Host kernel translates the guest physical addresses to real addresses
+  and starts the I/O with issuing a privileged Channel I/O instruction
+  (e.g SSCH).
+- channel programs run asynchronously on a separate processor.
+- I/O completion will be signaled to the host with I/O interruptions.
+  And it will be copied as IRB to user space to pass it back to the
+  guest.
+
+Physical vfio ccw device and its child mdev
+-------------------------------------------
+
+As mentioned above, we realize vfio-ccw with a mdev implementation.
+
+Channel I/O does not have IOMMU hardware support, so the physical
+vfio-ccw device does not have an IOMMU level translation or isolation.
+
+Sub-channel I/O instructions are all privileged instructions, When
+handling the I/O instruction interception, vfio-ccw has the software
+policing and translation how the channel program is programmed before
+it gets sent to hardware.
+
+Within this implementation, we have two drivers for two types of
+devices:
+- The vfio_ccw driver for the physical subchannel device.
+  This is an I/O subchannel driver for the real subchannel device.  It
+  realizes a group of callbacks and registers to the mdev framework as a
+  parent (physical) device. As a consequence, mdev provides vfio_ccw a
+  generic interface (sysfs) to create mdev devices. A vfio mdev could be
+  created by vfio_ccw then and added to the mediated bus. It is the vfio
+  device that added to an IOMMU group and a vfio group.
+  vfio_ccw also provides an I/O region to accept channel program
+  request from user space and store I/O interrupt result for user
+  space to retrieve. To notify user space an I/O completion, it offers
+  an interface to setup an eventfd fd for asynchronous signaling.
+
+- The vfio_mdev driver for the mediated vfio ccw device.
+  This is provided by the mdev framework. It is a vfio device driver for
+  the mdev that created by vfio_ccw.
+  It realize a group of vfio device driver callbacks, adds itself to a
+  vfio group, and registers itself to the mdev framework as a mdev
+  driver.
+  It uses a vfio iommu backend that uses the existing map and unmap
+  ioctls, but rather than programming them into an IOMMU for a device,
+  it simply stores the translations for use by later requests. This
+  means that a device programmed in a VM with guest physical addresses
+  can have the vfio kernel convert that address to process virtual
+  address, pin the page and program the hardware with the host physical
+  address in one step.
+  For a mdev, the vfio iommu backend will not pin the pages during the
+  VFIO_IOMMU_MAP_DMA ioctl. Mdev framework will only maintain a database
+  of the iova<->vaddr mappings in this operation. And they export a
+  vfio_pin_pages and a vfio_unpin_pages interfaces from the vfio iommu
+  backend for the physical devices to pin and unpin pages by demand.
+
+Below is a high Level block diagram.
+
+ +-------------+
+ |             |
+ | +---------+ | mdev_register_driver() +--------------+
+ | |  Mdev   | +<-----------------------+              |
+ | |  bus    | |                        | vfio_mdev.ko |
+ | | driver  | +----------------------->+              |<-> VFIO user
+ | +---------+ |    probe()/remove()    +--------------+    APIs
+ |             |
+ |  MDEV CORE  |
+ |   MODULE    |
+ |   mdev.ko   |
+ | +---------+ | mdev_register_device() +--------------+
+ | |Physical | +<-----------------------+              |
+ | | device  | |                        |  vfio_ccw.ko |<-> subchannel
+ | |interface| +----------------------->+              |     device
+ | +---------+ |       callback         +--------------+
+ +-------------+
+
+The process of how these work together.
+1. vfio_ccw.ko drives the physical I/O subchannel, and registers the
+   physical device (with callbacks) to mdev framework.
+   When vfio_ccw probing the subchannel device, it registers device
+   pointer and callbacks to the mdev framework. Mdev related file nodes
+   under the device node in sysfs would be created for the subchannel
+   device, namely 'mdev_create', 'mdev_destroy' and
+   'mdev_supported_types'.
+2. Create a mediated vfio ccw device.
+   Use the 'mdev_create' sysfs file, we need to manually create one (and
+   only one for our case) mediated device.
+3. vfio_mdev.ko drives the mediated ccw device.
+   vfio_mdev is also the vfio device drvier. It will probe the mdev and
+   add it to an iommu_group and a vfio_group. Then we could pass through
+   the mdev to a guest.
+
+vfio-ccw I/O region
+-------------------
+
+An I/O region is used to accept channel program request from user
+space and store I/O interrupt result for user space to retrieve. The
+defination of the region is:
+
+struct ccw_io_region {
+#define ORB_AREA_SIZE 12
+	__u8	orb_area[ORB_AREA_SIZE];
+#define SCSW_AREA_SIZE 12
+	__u8	scsw_area[SCSW_AREA_SIZE];
+#define IRB_AREA_SIZE 96
+	__u8	irb_area[IRB_AREA_SIZE];
+	__u32	ret_code;
+} __packed;
+
+While starting an I/O request, orb_area should be filled with the
+guest ORB, and scsw_area should be filled with the SCSW of the Virtual
+Subchannel.
+
+irb_area stores the I/O result.
+
+ret_code stores a return code for each access of the region.
+
+vfio-ccw patches overview
+-------------------------
+
+For now, our patches are rebased on the latest mdev implementation.
+vfio-ccw follows what vfio-pci did on the s390 paltform and uses
+vfio-iommu-type1 as the vfio iommu backend. It's a good start to launch
+the code review for vfio-ccw. Note that the implementation is far from
+complete yet; but we'd like to get feedback for the general
+architecture.
+
+* CCW translation APIs
+- Description:
+  These introduce a group of APIs (start with 'cp_') to do CCW
+  translation. The CCWs passed in by a user space program are
+  organized with their guest physical memory addresses. These APIs
+  will copy the CCWs into the kernel space, and assemble a runnable
+  kernel channel program by updating the guest physical addresses with
+  their corresponding host physical addresses.
+- Patches:
+  vfio: ccw: introduce channel program interfaces
+
+* vfio_ccw device driver
+- Description:
+  The following patches utilizes the CCW translation APIs and introduce
+  vfio_ccw, which is the driver for the I/O subchannel devices you want
+  to pass through.
+  vfio_ccw implements the following vfio ioctls:
+    VFIO_DEVICE_GET_INFO
+    VFIO_DEVICE_GET_IRQ_INFO
+    VFIO_DEVICE_GET_REGION_INFO
+    VFIO_DEVICE_RESET
+    VFIO_DEVICE_SET_IRQS
+  This provides an I/O region, so that the user space program can pass a
+  channel program to the kernel, to do further CCW translation before
+  issuing them to a real device.
+  This also provides the SET_IRQ ioctl to setup an event notifier to
+  notify the user space program the I/O completion in an asynchronous
+  way.
+- Patches:
+  vfio: ccw: basic implementation for vfio_ccw driver
+  vfio: ccw: introduce ccw_io_region
+  vfio: ccw: realize VFIO_DEVICE_GET_REGION_INFO ioctl
+  vfio: ccw: realize VFIO_DEVICE_RESET ioctl
+  vfio: ccw: realize VFIO_DEVICE_G(S)ET_IRQ_INFO ioctls
+
+The user of vfio-ccw is not limited to Qemu, while Qemu is definitely a
+good example to get understand how these patches work. Here is a little
+bit more detail how an I/O request triggered by the Qemu guest will be
+handled (without error handling).
+
+Explanation:
+Q1-Q7: Qemu side process.
+K1-K5: Kernel side process.
+
+Q1. Get I/O region info during initialization.
+Q2. Setup event notifier and handler to handle I/O completion.
+
+... ...
+
+Q3. Intercept a ssch instruction.
+Q4. Write the guest channel program and ORB to the I/O region.
+    K1. Copy from guest to kernel.
+    K2. Translate the guest channel program to a host kernel space
+        channel program, which becomes runnable for a real device.
+    K3. With the necessary information contained in the orb passed in
+        by Qemu, issue the ccwchain to the device.
+    K4. Return the ssch CC code.
+Q5. Return the CC code to the guest.
+
+... ...
+
+    K5. Interrupt handler gets the I/O result and write the result to
+        the I/O region.
+    K6. Signal Qemu to retrieve the result.
+Q6. Get the signal and event handler reads out the result from the I/O
+    region.
+Q7. Update the irb for the guest.
+
+Limitations
+-----------
+
+The current vfio-ccw implementation focuses on supporting basic commands
+needed to implement block device functionality (read/write) of DASD/ECKD
+device only. Some commands may need special handling in the future, for
+example, anything related to path grouping.
+
+DASD is a kind of storage device. While ECKD is a data recording format.
+More information for DASD and ECKD could be found here:
+https://en.wikipedia.org/wiki/Direct-access_storage_device
+https://en.wikipedia.org/wiki/Count_key_data
+
+Together with the corresponding work in Qemu, we can bring the passed
+through DASD/ECKD device online in a guest now and use it as a block
+device.
+
+Reference
+---------
+1. ESA/s390 Principles of Operation manual (IBM Form. No. SA22-7832)
+2. ESA/390 Common I/O Device Commands manual (IBM Form. No. SA22-7204)
+3. https://en.wikipedia.org/wiki/Channel_I/O
+4. Documentation/s390/cds.txt
+5. Documentation/vfio.txt
+6. Documentation/vfio-mediated-device.txt
-- 
2.8.4

^ permalink raw reply related	[flat|nested] 61+ messages in thread

* [Qemu-devel] [PATCH RFC v3 14/15] docs: add documentation for vfio-ccw
@ 2017-02-17  8:29   ` Dong Jia Shi
  0 siblings, 0 replies; 61+ messages in thread
From: Dong Jia Shi @ 2017-02-17  8:29 UTC (permalink / raw)
  To: kvm, linux-s390, qemu-devel
  Cc: bjsdjshi, renxiaof, cornelia.huck, borntraeger, agraf,
	alex.williamson, pmorel, pasic, wkywang

Add file Documentation/s390/vfio-ccw.txt that includes details
of vfio-ccw.

Signed-off-by: Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com>
Acked-by: Pierre Morel <pmorel@linux.vnet.ibm.com>
---
 Documentation/s390/00-INDEX     |   2 +
 Documentation/s390/vfio-ccw.txt | 303 ++++++++++++++++++++++++++++++++++++++++
 2 files changed, 305 insertions(+)
 create mode 100644 Documentation/s390/vfio-ccw.txt

diff --git a/Documentation/s390/00-INDEX b/Documentation/s390/00-INDEX
index 9189535..317f037 100644
--- a/Documentation/s390/00-INDEX
+++ b/Documentation/s390/00-INDEX
@@ -22,5 +22,7 @@ qeth.txt
 	- HiperSockets Bridge Port Support.
 s390dbf.txt
 	- information on using the s390 debug feature.
+vfio-ccw.txt
+	  information on the vfio-ccw I/O subchannel driver.
 zfcpdump.txt
 	- information on the s390 SCSI dump tool.
diff --git a/Documentation/s390/vfio-ccw.txt b/Documentation/s390/vfio-ccw.txt
new file mode 100644
index 0000000..90b3dfe
--- /dev/null
+++ b/Documentation/s390/vfio-ccw.txt
@@ -0,0 +1,303 @@
+vfio-ccw: the basic infrastructure
+==================================
+
+Introduction
+------------
+
+Here we describe the vfio support for I/O subchannel devices for
+Linux/s390. Motivation for vfio-ccw is to passthrough subchannels to a
+virtual machine, while vfio is the means.
+
+Different than other hardware architectures, s390 has defined a unified
+I/O access method, which is so called Channel I/O. It has its own access
+patterns:
+- Channel programs run asynchronously on a separate (co)processor.
+- The channel subsystem will access any memory designated by the caller
+  in the channel program directly, i.e. there is no iommu involved.
+Thus when we introduce vfio support for these devices, we realize it
+with a mediated device (mdev) implementation. The vfio mdev will be
+added to an iommu group, so as to make itself able to be managed by the
+vfio framework. And we add read/write callbacks for special vfio I/O
+regions to pass the channel programs from the mdev to its parent device
+(the real I/O subchannel device) to do further address translation and
+to perform I/O instructions.
+
+This document does not intend to explain the s390 I/O architecture in
+every detail. More information/reference could be found here:
+- A good start to know Channel I/O in general:
+  https://en.wikipedia.org/wiki/Channel_I/O
+- s390 architecture:
+  s390 Principles of Operation manual (IBM Form. No. SA22-7832)
+- The existing Qemu code which implements a simple emulated channel
+  subsystem could also be a good reference. It makes it easier to follow
+  the flow.
+  qemu/hw/s390x/css.c
+
+For vfio mediated device framework:
+- Documentation/vfio-mediated-device.txt
+
+Motivation of vfio-ccw
+----------------------
+
+Currently, a guest virtualized via qemu/kvm on s390 only sees
+paravirtualized virtio devices via the "Virtio Over Channel I/O
+(virtio-ccw)" transport. This makes virtio devices discoverable via
+standard operating system algorithms for handling channel devices.
+
+However this is not enough. On s390 for the majority of devices, which
+use the standard Channel I/O based mechanism, we also need to provide
+the functionality of passing through them to a Qemu virtual machine.
+This includes devices that don't have a virtio counterpart (e.g. tape
+drives) or that have specific characteristics which guests want to
+exploit.
+
+For passing a device to a guest, we want to use the same interface as
+everybody else, namely vfio. Thus, we would like to introduce vfio
+support for channel devices. And we would like to name this new vfio
+device "vfio-ccw".
+
+Access patterns of CCW devices
+------------------------------
+
+s390 architecture has implemented a so called channel subsystem, that
+provides a unified view of the devices physically attached to the
+systems. Though the s390 hardware platform knows about a huge variety of
+different peripheral attachments like disk devices (aka. DASDs), tapes,
+communication controllers, etc. They can all be accessed by a well
+defined access method and they are presenting I/O completion a unified
+way: I/O interruptions.
+
+All I/O requires the use of channel command words (CCWs). A CCW is an
+instruction to a specialized I/O channel processor. A channel program is
+a sequence of CCWs which are executed by the I/O channel subsystem.  To
+issue a channel program to the channel subsystem, it is required to
+build an operation request block (ORB), which can be used to point out
+the format of the CCW and other control information to the system. The
+operating system signals the I/O channel subsystem to begin executing
+the channel program with a SSCH (start sub-channel) instruction. The
+central processor is then free to proceed with non-I/O instructions
+until interrupted. The I/O completion result is received by the
+interrupt handler in the form of interrupt response block (IRB).
+
+Back to vfio-ccw, in short:
+- ORBs and channel programs are built in guest kernel (with guest
+  physical addresses).
+- ORBs and channel programs are passed to the host kernel.
+- Host kernel translates the guest physical addresses to real addresses
+  and starts the I/O with issuing a privileged Channel I/O instruction
+  (e.g SSCH).
+- channel programs run asynchronously on a separate processor.
+- I/O completion will be signaled to the host with I/O interruptions.
+  And it will be copied as IRB to user space to pass it back to the
+  guest.
+
+Physical vfio ccw device and its child mdev
+-------------------------------------------
+
+As mentioned above, we realize vfio-ccw with a mdev implementation.
+
+Channel I/O does not have IOMMU hardware support, so the physical
+vfio-ccw device does not have an IOMMU level translation or isolation.
+
+Sub-channel I/O instructions are all privileged instructions, When
+handling the I/O instruction interception, vfio-ccw has the software
+policing and translation how the channel program is programmed before
+it gets sent to hardware.
+
+Within this implementation, we have two drivers for two types of
+devices:
+- The vfio_ccw driver for the physical subchannel device.
+  This is an I/O subchannel driver for the real subchannel device.  It
+  realizes a group of callbacks and registers to the mdev framework as a
+  parent (physical) device. As a consequence, mdev provides vfio_ccw a
+  generic interface (sysfs) to create mdev devices. A vfio mdev could be
+  created by vfio_ccw then and added to the mediated bus. It is the vfio
+  device that added to an IOMMU group and a vfio group.
+  vfio_ccw also provides an I/O region to accept channel program
+  request from user space and store I/O interrupt result for user
+  space to retrieve. To notify user space an I/O completion, it offers
+  an interface to setup an eventfd fd for asynchronous signaling.
+
+- The vfio_mdev driver for the mediated vfio ccw device.
+  This is provided by the mdev framework. It is a vfio device driver for
+  the mdev that created by vfio_ccw.
+  It realize a group of vfio device driver callbacks, adds itself to a
+  vfio group, and registers itself to the mdev framework as a mdev
+  driver.
+  It uses a vfio iommu backend that uses the existing map and unmap
+  ioctls, but rather than programming them into an IOMMU for a device,
+  it simply stores the translations for use by later requests. This
+  means that a device programmed in a VM with guest physical addresses
+  can have the vfio kernel convert that address to process virtual
+  address, pin the page and program the hardware with the host physical
+  address in one step.
+  For a mdev, the vfio iommu backend will not pin the pages during the
+  VFIO_IOMMU_MAP_DMA ioctl. Mdev framework will only maintain a database
+  of the iova<->vaddr mappings in this operation. And they export a
+  vfio_pin_pages and a vfio_unpin_pages interfaces from the vfio iommu
+  backend for the physical devices to pin and unpin pages by demand.
+
+Below is a high Level block diagram.
+
+ +-------------+
+ |             |
+ | +---------+ | mdev_register_driver() +--------------+
+ | |  Mdev   | +<-----------------------+              |
+ | |  bus    | |                        | vfio_mdev.ko |
+ | | driver  | +----------------------->+              |<-> VFIO user
+ | +---------+ |    probe()/remove()    +--------------+    APIs
+ |             |
+ |  MDEV CORE  |
+ |   MODULE    |
+ |   mdev.ko   |
+ | +---------+ | mdev_register_device() +--------------+
+ | |Physical | +<-----------------------+              |
+ | | device  | |                        |  vfio_ccw.ko |<-> subchannel
+ | |interface| +----------------------->+              |     device
+ | +---------+ |       callback         +--------------+
+ +-------------+
+
+The process of how these work together.
+1. vfio_ccw.ko drives the physical I/O subchannel, and registers the
+   physical device (with callbacks) to mdev framework.
+   When vfio_ccw probing the subchannel device, it registers device
+   pointer and callbacks to the mdev framework. Mdev related file nodes
+   under the device node in sysfs would be created for the subchannel
+   device, namely 'mdev_create', 'mdev_destroy' and
+   'mdev_supported_types'.
+2. Create a mediated vfio ccw device.
+   Use the 'mdev_create' sysfs file, we need to manually create one (and
+   only one for our case) mediated device.
+3. vfio_mdev.ko drives the mediated ccw device.
+   vfio_mdev is also the vfio device drvier. It will probe the mdev and
+   add it to an iommu_group and a vfio_group. Then we could pass through
+   the mdev to a guest.
+
+vfio-ccw I/O region
+-------------------
+
+An I/O region is used to accept channel program request from user
+space and store I/O interrupt result for user space to retrieve. The
+defination of the region is:
+
+struct ccw_io_region {
+#define ORB_AREA_SIZE 12
+	__u8	orb_area[ORB_AREA_SIZE];
+#define SCSW_AREA_SIZE 12
+	__u8	scsw_area[SCSW_AREA_SIZE];
+#define IRB_AREA_SIZE 96
+	__u8	irb_area[IRB_AREA_SIZE];
+	__u32	ret_code;
+} __packed;
+
+While starting an I/O request, orb_area should be filled with the
+guest ORB, and scsw_area should be filled with the SCSW of the Virtual
+Subchannel.
+
+irb_area stores the I/O result.
+
+ret_code stores a return code for each access of the region.
+
+vfio-ccw patches overview
+-------------------------
+
+For now, our patches are rebased on the latest mdev implementation.
+vfio-ccw follows what vfio-pci did on the s390 paltform and uses
+vfio-iommu-type1 as the vfio iommu backend. It's a good start to launch
+the code review for vfio-ccw. Note that the implementation is far from
+complete yet; but we'd like to get feedback for the general
+architecture.
+
+* CCW translation APIs
+- Description:
+  These introduce a group of APIs (start with 'cp_') to do CCW
+  translation. The CCWs passed in by a user space program are
+  organized with their guest physical memory addresses. These APIs
+  will copy the CCWs into the kernel space, and assemble a runnable
+  kernel channel program by updating the guest physical addresses with
+  their corresponding host physical addresses.
+- Patches:
+  vfio: ccw: introduce channel program interfaces
+
+* vfio_ccw device driver
+- Description:
+  The following patches utilizes the CCW translation APIs and introduce
+  vfio_ccw, which is the driver for the I/O subchannel devices you want
+  to pass through.
+  vfio_ccw implements the following vfio ioctls:
+    VFIO_DEVICE_GET_INFO
+    VFIO_DEVICE_GET_IRQ_INFO
+    VFIO_DEVICE_GET_REGION_INFO
+    VFIO_DEVICE_RESET
+    VFIO_DEVICE_SET_IRQS
+  This provides an I/O region, so that the user space program can pass a
+  channel program to the kernel, to do further CCW translation before
+  issuing them to a real device.
+  This also provides the SET_IRQ ioctl to setup an event notifier to
+  notify the user space program the I/O completion in an asynchronous
+  way.
+- Patches:
+  vfio: ccw: basic implementation for vfio_ccw driver
+  vfio: ccw: introduce ccw_io_region
+  vfio: ccw: realize VFIO_DEVICE_GET_REGION_INFO ioctl
+  vfio: ccw: realize VFIO_DEVICE_RESET ioctl
+  vfio: ccw: realize VFIO_DEVICE_G(S)ET_IRQ_INFO ioctls
+
+The user of vfio-ccw is not limited to Qemu, while Qemu is definitely a
+good example to get understand how these patches work. Here is a little
+bit more detail how an I/O request triggered by the Qemu guest will be
+handled (without error handling).
+
+Explanation:
+Q1-Q7: Qemu side process.
+K1-K5: Kernel side process.
+
+Q1. Get I/O region info during initialization.
+Q2. Setup event notifier and handler to handle I/O completion.
+
+... ...
+
+Q3. Intercept a ssch instruction.
+Q4. Write the guest channel program and ORB to the I/O region.
+    K1. Copy from guest to kernel.
+    K2. Translate the guest channel program to a host kernel space
+        channel program, which becomes runnable for a real device.
+    K3. With the necessary information contained in the orb passed in
+        by Qemu, issue the ccwchain to the device.
+    K4. Return the ssch CC code.
+Q5. Return the CC code to the guest.
+
+... ...
+
+    K5. Interrupt handler gets the I/O result and write the result to
+        the I/O region.
+    K6. Signal Qemu to retrieve the result.
+Q6. Get the signal and event handler reads out the result from the I/O
+    region.
+Q7. Update the irb for the guest.
+
+Limitations
+-----------
+
+The current vfio-ccw implementation focuses on supporting basic commands
+needed to implement block device functionality (read/write) of DASD/ECKD
+device only. Some commands may need special handling in the future, for
+example, anything related to path grouping.
+
+DASD is a kind of storage device. While ECKD is a data recording format.
+More information for DASD and ECKD could be found here:
+https://en.wikipedia.org/wiki/Direct-access_storage_device
+https://en.wikipedia.org/wiki/Count_key_data
+
+Together with the corresponding work in Qemu, we can bring the passed
+through DASD/ECKD device online in a guest now and use it as a block
+device.
+
+Reference
+---------
+1. ESA/s390 Principles of Operation manual (IBM Form. No. SA22-7832)
+2. ESA/390 Common I/O Device Commands manual (IBM Form. No. SA22-7204)
+3. https://en.wikipedia.org/wiki/Channel_I/O
+4. Documentation/s390/cds.txt
+5. Documentation/vfio.txt
+6. Documentation/vfio-mediated-device.txt
-- 
2.8.4

^ permalink raw reply related	[flat|nested] 61+ messages in thread

* [PATCH RFC v3 15/15] vfio: ccw: introduce support for ccw0
  2017-02-17  8:29 ` [Qemu-devel] " Dong Jia Shi
@ 2017-02-17  8:29   ` Dong Jia Shi
  -1 siblings, 0 replies; 61+ messages in thread
From: Dong Jia Shi @ 2017-02-17  8:29 UTC (permalink / raw)
  To: kvm, linux-s390, qemu-devel
  Cc: bjsdjshi, renxiaof, cornelia.huck, borntraeger, agraf,
	alex.williamson, pmorel, pasic, wkywang

Although Linux does not use format-0 channel command words (CCW0)
these are a non-optional part of the platform spec, and for the sake
of platform compliance, and possibly some non-Linux guests, we have
to support CCW0.

Making the kernel execute a format 0 channel program is too much hassle
because we would need to allocate and use memory which can be addressed
by 24 bit physical addresses (because of CCW0.cda). So we implement CCW0
support by translating the channel program into an equivalent CCW1
program instead.

Signed-off-by: Kai Yue Wang <wkywang@linux.vnet.ibm.com>
Signed-off-by: Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com>
---
 arch/s390/Kconfig              |  7 +++++
 drivers/s390/cio/vfio_ccw_cp.c | 58 ++++++++++++++++++++++++++++++++++++++++--
 2 files changed, 63 insertions(+), 2 deletions(-)

diff --git a/arch/s390/Kconfig b/arch/s390/Kconfig
index 32008b8..f25d077 100644
--- a/arch/s390/Kconfig
+++ b/arch/s390/Kconfig
@@ -680,6 +680,13 @@ config VFIO_CCW
 	  To compile this driver as a module, choose M here: the
 	  module will be called vfio_ccw.
 
+config VFIO_CCW_CCW0
+	def_bool n
+	prompt "Support for CCW0 translation"
+	depends on VFIO_CCW
+	help
+	  Enable translation for CCW0 programs for VFIO-CCW subchannels.
+
 endmenu
 
 menu "Dump support"
diff --git a/drivers/s390/cio/vfio_ccw_cp.c b/drivers/s390/cio/vfio_ccw_cp.c
index 16bbb54..b0a8bc05 100644
--- a/drivers/s390/cio/vfio_ccw_cp.c
+++ b/drivers/s390/cio/vfio_ccw_cp.c
@@ -15,6 +15,26 @@
 
 #include "vfio_ccw_cp.h"
 
+#ifdef CONFIG_VFIO_CCW_CCW0
+/**
+ * struct ccw0 - channel command word
+ * @cmd_code: command code
+ * @cda: data address
+ * @flags: flags, like IDA addressing, etc.
+ * @reserved: will be ignored
+ * @count: byte count
+ *
+ * The format-0 ccw structure.
+ */
+struct ccw0 {
+	__u8 cmd_code;
+	__u32 cda : 24;
+	__u8  flags;
+	__u8  reserved;
+	__u16 count;
+} __packed __aligned(8);
+#endif
+
 /*
  * Max length for ccw chain.
  * XXX: Limit to 256, need to check more?
@@ -243,12 +263,42 @@ static long copy_from_iova(struct device *mdev,
 	return l;
 }
 
+#ifdef CONFIG_VFIO_CCW_CCW0
+static long copy_ccw_from_iova(struct channel_program *cp,
+			       struct ccw1 *to, u64 iova,
+			       unsigned long len)
+{
+	struct ccw0 ccw0;
+	struct ccw1 *pccw1;
+	int ret;
+	int i;
+
+	ret = copy_from_iova(cp->mdev, to, iova, len * sizeof(struct ccw1));
+	if (ret)
+		return ret;
+
+	if (!cp->orb.cmd.fmt) {
+		pccw1 = to;
+		for (i = 0; i < len; i++) {
+			ccw0 = *(struct ccw0 *)pccw1;
+			pccw1->cmd_code = ccw0.cmd_code;
+			pccw1->flags = ccw0.flags;
+			pccw1->count = ccw0.count;
+			pccw1->cda = ccw0.cda;
+			pccw1++;
+		}
+	}
+
+	return ret;
+}
+#else
 static long copy_ccw_from_iova(struct channel_program *cp,
 			       struct ccw1 *to, u64 iova,
 			       unsigned long len)
 {
 	return copy_from_iova(cp->mdev, to, iova, len * sizeof(struct ccw1));
 }
+#endif
 
 /*
  * Helpers to operate ccwchain.
@@ -616,10 +666,14 @@ int cp_init(struct channel_program *cp, struct device *mdev, union orb *orb)
 	 * Only support prefetch enable mode now.
 	 * Only support 64bit addressing idal.
 	 * Only support 4k IDAW.
-	 * Only support ccw1.
 	 */
-	if (!orb->cmd.pfch || !orb->cmd.c64 || orb->cmd.i2k || !orb->cmd.fmt)
+	if (!orb->cmd.pfch || !orb->cmd.c64 || orb->cmd.i2k)
+		return -EOPNOTSUPP;
+
+#ifndef CONFIG_VFIO_CCW_CCW0
+	if (!orb->cmd.fmt)
 		return -EOPNOTSUPP;
+#endif
 
 	INIT_LIST_HEAD(&cp->ccwchain_list);
 	memcpy(&cp->orb, orb, sizeof(*orb));
-- 
2.8.4

^ permalink raw reply related	[flat|nested] 61+ messages in thread

* [Qemu-devel] [PATCH RFC v3 15/15] vfio: ccw: introduce support for ccw0
@ 2017-02-17  8:29   ` Dong Jia Shi
  0 siblings, 0 replies; 61+ messages in thread
From: Dong Jia Shi @ 2017-02-17  8:29 UTC (permalink / raw)
  To: kvm, linux-s390, qemu-devel
  Cc: bjsdjshi, renxiaof, cornelia.huck, borntraeger, agraf,
	alex.williamson, pmorel, pasic, wkywang

Although Linux does not use format-0 channel command words (CCW0)
these are a non-optional part of the platform spec, and for the sake
of platform compliance, and possibly some non-Linux guests, we have
to support CCW0.

Making the kernel execute a format 0 channel program is too much hassle
because we would need to allocate and use memory which can be addressed
by 24 bit physical addresses (because of CCW0.cda). So we implement CCW0
support by translating the channel program into an equivalent CCW1
program instead.

Signed-off-by: Kai Yue Wang <wkywang@linux.vnet.ibm.com>
Signed-off-by: Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com>
---
 arch/s390/Kconfig              |  7 +++++
 drivers/s390/cio/vfio_ccw_cp.c | 58 ++++++++++++++++++++++++++++++++++++++++--
 2 files changed, 63 insertions(+), 2 deletions(-)

diff --git a/arch/s390/Kconfig b/arch/s390/Kconfig
index 32008b8..f25d077 100644
--- a/arch/s390/Kconfig
+++ b/arch/s390/Kconfig
@@ -680,6 +680,13 @@ config VFIO_CCW
 	  To compile this driver as a module, choose M here: the
 	  module will be called vfio_ccw.
 
+config VFIO_CCW_CCW0
+	def_bool n
+	prompt "Support for CCW0 translation"
+	depends on VFIO_CCW
+	help
+	  Enable translation for CCW0 programs for VFIO-CCW subchannels.
+
 endmenu
 
 menu "Dump support"
diff --git a/drivers/s390/cio/vfio_ccw_cp.c b/drivers/s390/cio/vfio_ccw_cp.c
index 16bbb54..b0a8bc05 100644
--- a/drivers/s390/cio/vfio_ccw_cp.c
+++ b/drivers/s390/cio/vfio_ccw_cp.c
@@ -15,6 +15,26 @@
 
 #include "vfio_ccw_cp.h"
 
+#ifdef CONFIG_VFIO_CCW_CCW0
+/**
+ * struct ccw0 - channel command word
+ * @cmd_code: command code
+ * @cda: data address
+ * @flags: flags, like IDA addressing, etc.
+ * @reserved: will be ignored
+ * @count: byte count
+ *
+ * The format-0 ccw structure.
+ */
+struct ccw0 {
+	__u8 cmd_code;
+	__u32 cda : 24;
+	__u8  flags;
+	__u8  reserved;
+	__u16 count;
+} __packed __aligned(8);
+#endif
+
 /*
  * Max length for ccw chain.
  * XXX: Limit to 256, need to check more?
@@ -243,12 +263,42 @@ static long copy_from_iova(struct device *mdev,
 	return l;
 }
 
+#ifdef CONFIG_VFIO_CCW_CCW0
+static long copy_ccw_from_iova(struct channel_program *cp,
+			       struct ccw1 *to, u64 iova,
+			       unsigned long len)
+{
+	struct ccw0 ccw0;
+	struct ccw1 *pccw1;
+	int ret;
+	int i;
+
+	ret = copy_from_iova(cp->mdev, to, iova, len * sizeof(struct ccw1));
+	if (ret)
+		return ret;
+
+	if (!cp->orb.cmd.fmt) {
+		pccw1 = to;
+		for (i = 0; i < len; i++) {
+			ccw0 = *(struct ccw0 *)pccw1;
+			pccw1->cmd_code = ccw0.cmd_code;
+			pccw1->flags = ccw0.flags;
+			pccw1->count = ccw0.count;
+			pccw1->cda = ccw0.cda;
+			pccw1++;
+		}
+	}
+
+	return ret;
+}
+#else
 static long copy_ccw_from_iova(struct channel_program *cp,
 			       struct ccw1 *to, u64 iova,
 			       unsigned long len)
 {
 	return copy_from_iova(cp->mdev, to, iova, len * sizeof(struct ccw1));
 }
+#endif
 
 /*
  * Helpers to operate ccwchain.
@@ -616,10 +666,14 @@ int cp_init(struct channel_program *cp, struct device *mdev, union orb *orb)
 	 * Only support prefetch enable mode now.
 	 * Only support 64bit addressing idal.
 	 * Only support 4k IDAW.
-	 * Only support ccw1.
 	 */
-	if (!orb->cmd.pfch || !orb->cmd.c64 || orb->cmd.i2k || !orb->cmd.fmt)
+	if (!orb->cmd.pfch || !orb->cmd.c64 || orb->cmd.i2k)
+		return -EOPNOTSUPP;
+
+#ifndef CONFIG_VFIO_CCW_CCW0
+	if (!orb->cmd.fmt)
 		return -EOPNOTSUPP;
+#endif
 
 	INIT_LIST_HEAD(&cp->ccwchain_list);
 	memcpy(&cp->orb, orb, sizeof(*orb));
-- 
2.8.4

^ permalink raw reply related	[flat|nested] 61+ messages in thread

* Re: [PATCH RFC v3 04/15] vfio: ccw: basic implementation for vfio_ccw driver
  2017-02-17  8:29   ` [Qemu-devel] " Dong Jia Shi
@ 2017-02-20 18:31     ` Cornelia Huck
  -1 siblings, 0 replies; 61+ messages in thread
From: Cornelia Huck @ 2017-02-20 18:31 UTC (permalink / raw)
  To: Dong Jia Shi
  Cc: kvm, linux-s390, qemu-devel, renxiaof, borntraeger, agraf,
	alex.williamson, pmorel, pasic, wkywang

On Fri, 17 Feb 2017 09:29:28 +0100
Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com> wrote:

> To make vfio support subchannel devices, we need a css driver for
> the vfio subchannels. This patch adds a basic vfio-ccw subchannel
> driver for this purpose.
> 
> To enable VFIO for vfio-ccw, enable S390_CCW_IOMMU config option
> and configure VFIO as required.
> 
> Signed-off-by: Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com>
> Acked-by: Pierre Morel <pmorel@linux.vnet.ibm.com>
> ---
>  arch/s390/Kconfig                   |  10 ++
>  arch/s390/include/asm/isc.h         |   1 +
>  drivers/iommu/Kconfig               |   8 ++
>  drivers/s390/cio/Makefile           |   3 +
>  drivers/s390/cio/vfio_ccw_drv.c     | 262 ++++++++++++++++++++++++++++++++++++
>  drivers/s390/cio/vfio_ccw_private.h |  25 ++++
>  6 files changed, 309 insertions(+)
>  create mode 100644 drivers/s390/cio/vfio_ccw_drv.c
>  create mode 100644 drivers/s390/cio/vfio_ccw_private.h
> 
> diff --git a/arch/s390/Kconfig b/arch/s390/Kconfig
> index c6722112..b920df8 100644
> --- a/arch/s390/Kconfig
> +++ b/arch/s390/Kconfig
> @@ -670,6 +670,16 @@ config EADM_SCH
>  	  To compile this driver as a module, choose M here: the
>  	  module will be called eadm_sch.
> 
> +config VFIO_CCW
> +	def_tristate n
> +	prompt "Support for VFIO-CCW subchannels"
> +	depends on S390_CCW_IOMMU && VFIO
> +	help
> +	  This driver allows usage of VFIO-CCW subchannels.

Hm...

"This driver allows usage of I/O subchannels via VFIO-CCW."

?

> +
> +	  To compile this driver as a module, choose M here: the
> +	  module will be called vfio_ccw.
> +
>  endmenu
> 
>  menu "Dump support"
> diff --git a/arch/s390/include/asm/isc.h b/arch/s390/include/asm/isc.h
> index 68d7d68..8a0b721 100644
> --- a/arch/s390/include/asm/isc.h
> +++ b/arch/s390/include/asm/isc.h
> @@ -16,6 +16,7 @@
>  #define CONSOLE_ISC 1			/* console I/O subchannel */
>  #define EADM_SCH_ISC 4			/* EADM subchannels */
>  #define CHSC_SCH_ISC 7			/* CHSC subchannels */
> +#define VFIO_CCW_ISC IO_SCH_ISC		/* VFIO-CCW I/O subchannels */

This is OK for now, I guess; but do we want to have the isc
configurable in the long run? I.e., if a host wants to run its own I/O
devices at a different priority than the devices it passes to a guest?

>  /* Adapter interrupts. */
>  #define QDIO_AIRQ_ISC IO_SCH_ISC	/* I/O subchannel in qdio mode */
>  #define PCI_ISC 2			/* PCI I/O subchannels */

(...)

> diff --git a/drivers/s390/cio/vfio_ccw_drv.c b/drivers/s390/cio/vfio_ccw_drv.c
> new file mode 100644
> index 0000000..b068207
> --- /dev/null
> +++ b/drivers/s390/cio/vfio_ccw_drv.c
> @@ -0,0 +1,262 @@
> +/*
> + * VFIO based Physical Subchannel device driver
> + *
> + * Copyright IBM Corp. 2017
> + *
> + * Author(s): Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com>
> + *            Xiao Feng Ren <renxiaof@linux.vnet.ibm.com>
> + */
> +
> +#include <linux/module.h>
> +#include <linux/init.h>
> +#include <linux/device.h>
> +#include <linux/slab.h>
> +
> +#include <asm/isc.h>
> +
> +#include "vfio_ccw_private.h"
> +
> +/*
> + * Helpers
> + */
> +static int vfio_ccw_sch_quiesce(struct subchannel *sch)
> +{
> +	struct vfio_ccw_private *private = dev_get_drvdata(&sch->dev);
> +	DECLARE_COMPLETION_ONSTACK(completion);
> +	int iretry, ret = 0;
> +
> +	spin_lock_irq(sch->lock);
> +	if (!sch->schib.pmcw.ena)
> +		goto out_unlock;
> +	ret = cio_disable_subchannel(sch);
> +	if (ret != -EBUSY)
> +		goto out_unlock;
> +
> +	do {
> +		iretry = 255;
> +
> +		ret = cio_cancel_halt_clear(sch, &iretry);
> +		while (ret == -EBUSY) {
> +			/*
> +			 * Flushing all I/O and wait the

"Flush all I/O and wait for..."

> +			 * cancel/halt/clear completion.
> +			 */
> +			private->completion = &completion;
> +			spin_unlock_irq(sch->lock);
> +
> +			wait_for_completion(&completion);

What happens for cancel? It won't generate an interrupt.

> +
> +			spin_lock_irq(sch->lock);
> +			private->completion = NULL;
> +			ret = cio_cancel_halt_clear(sch, &iretry);
> +		};
> +
> +		ret = cio_disable_subchannel(sch);
> +	} while (ret == -EBUSY);
> +
> +out_unlock:
> +	spin_unlock_irq(sch->lock);
> +	return ret;
> +}
> +
> +/*
> + * Sysfs interfaces
> + */
> +static ssize_t chpids_show(struct device *dev,
> +			   struct device_attribute *attr,
> +			   char *buf)
> +{
> +	struct subchannel *sch = to_subchannel(dev);
> +	struct chsc_ssd_info *ssd = &sch->ssd_info;
> +	ssize_t ret = 0;
> +	int chp;
> +	int mask;
> +
> +	for (chp = 0; chp < 8; chp++) {
> +		mask = 0x80 >> chp;
> +		if (ssd->path_mask & mask)
> +			ret += sprintf(buf + ret, "%02x ", ssd->chpid[chp].id);
> +		else
> +			ret += sprintf(buf + ret, "00 ");
> +	}
> +	ret += sprintf(buf+ret, "\n");
> +	return ret;
> +}
> +
> +static ssize_t pimpampom_show(struct device *dev,
> +			      struct device_attribute *attr,
> +			      char *buf)
> +{
> +	struct subchannel *sch = to_subchannel(dev);
> +	struct pmcw *pmcw = &sch->schib.pmcw;
> +
> +	return sprintf(buf, "%02x %02x %02x\n",
> +		       pmcw->pim, pmcw->pam, pmcw->pom);
> +}
> +
> +static DEVICE_ATTR(chpids, 0444, chpids_show, NULL);
> +static DEVICE_ATTR(pimpampom, 0444, pimpampom_show, NULL);

Quick question: You need to duplicate these so that lscss shows a sane
output for vfio-ccw subchannels?

> +
> +static struct attribute *vfio_subchannel_attrs[] = {
> +	&dev_attr_chpids.attr,
> +	&dev_attr_pimpampom.attr,
> +	NULL,
> +};
> +
> +static struct attribute_group vfio_subchannel_attr_group = {
> +	.attrs = vfio_subchannel_attrs,
> +};
> +
> +/*
> + * Css driver callbacks
> + */
> +static void vfio_ccw_sch_irq(struct subchannel *sch)
> +{
> +	struct vfio_ccw_private *private = dev_get_drvdata(&sch->dev);
> +
> +	inc_irq_stat(IRQIO_CIO);
> +
> +	if (!private)
> +		return;
> +
> +	if (private->completion)
> +		complete(private->completion);
> +}
> +
> +static int vfio_ccw_sch_probe(struct subchannel *sch)
> +{
> +	struct pmcw *pmcw = &sch->schib.pmcw;
> +	struct vfio_ccw_private *private;
> +	int ret;
> +
> +	if (pmcw->qf) {
> +		dev_warn(&sch->dev, "vfio: ccw: do not support QDIO: %s\n",

s/do/does/

> +			 dev_name(&sch->dev));
> +		return -ENOTTY;

Is -ENOTTY the right return code here? -EINVAL?

> +	}
> +
> +	private = kzalloc(sizeof(*private), GFP_KERNEL | GFP_DMA);
> +	if (!private)
> +		return -ENOMEM;
> +	private->sch = sch;
> +	dev_set_drvdata(&sch->dev, private);
> +
> +	spin_lock_irq(sch->lock);
> +	sch->isc = VFIO_CCW_ISC;
> +	ret = cio_enable_subchannel(sch, (u32)(unsigned long)sch);
> +	spin_unlock_irq(sch->lock);
> +	if (ret)
> +		goto out_free;
> +
> +	ret = sysfs_create_group(&sch->dev.kobj, &vfio_subchannel_attr_group);
> +	if (ret)
> +		goto out_disable;
> +
> +	return 0;
> +
> +out_disable:
> +	cio_disable_subchannel(sch);
> +out_free:
> +	dev_set_drvdata(&sch->dev, NULL);
> +	kfree(private);
> +	return ret;
> +}
> +

(...)

> +/**
> + * vfio_ccw_sch_event - process subchannel event
> + * @sch: subchannel
> + * @process: non-zero if function is called in process context
> + *
> + * An unspecified event occurred for this subchannel. Adjust data according
> + * to the current operational state of the subchannel. Return zero when the
> + * event has been handled sufficiently or -EAGAIN when this function should
> + * be called again in process context.
> + */
> +static int vfio_ccw_sch_event(struct subchannel *sch, int process)
> +{
> +	unsigned long flags;
> +
> +	spin_lock_irqsave(sch->lock, flags);
> +	if (!device_is_registered(&sch->dev))
> +		goto out_unlock;
> +
> +	if (work_pending(&sch->todo_work))
> +		goto out_unlock;
> +
> +	if (cio_update_schib(sch)) {
> +		/* Not operational. */
> +		css_sched_sch_todo(sch, SCH_TODO_UNREG);
> +
> +		/*
> +		 * TODO:
> +		 * Probably we should send the machine check to the guest.

Yes, we should do that later on. Will user space notice that the device
is gone? (I think crw injection should be done by user space.)

> +		 */
> +		goto out_unlock;
> +	}
> +
> +out_unlock:
> +	spin_unlock_irqrestore(sch->lock, flags);
> +
> +	return 0;
> +}

(...)

Looks sane in general from my POV.

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [Qemu-devel] [PATCH RFC v3 04/15] vfio: ccw: basic implementation for vfio_ccw driver
@ 2017-02-20 18:31     ` Cornelia Huck
  0 siblings, 0 replies; 61+ messages in thread
From: Cornelia Huck @ 2017-02-20 18:31 UTC (permalink / raw)
  To: Dong Jia Shi
  Cc: kvm, linux-s390, qemu-devel, renxiaof, borntraeger, agraf,
	alex.williamson, pmorel, pasic, wkywang

On Fri, 17 Feb 2017 09:29:28 +0100
Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com> wrote:

> To make vfio support subchannel devices, we need a css driver for
> the vfio subchannels. This patch adds a basic vfio-ccw subchannel
> driver for this purpose.
> 
> To enable VFIO for vfio-ccw, enable S390_CCW_IOMMU config option
> and configure VFIO as required.
> 
> Signed-off-by: Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com>
> Acked-by: Pierre Morel <pmorel@linux.vnet.ibm.com>
> ---
>  arch/s390/Kconfig                   |  10 ++
>  arch/s390/include/asm/isc.h         |   1 +
>  drivers/iommu/Kconfig               |   8 ++
>  drivers/s390/cio/Makefile           |   3 +
>  drivers/s390/cio/vfio_ccw_drv.c     | 262 ++++++++++++++++++++++++++++++++++++
>  drivers/s390/cio/vfio_ccw_private.h |  25 ++++
>  6 files changed, 309 insertions(+)
>  create mode 100644 drivers/s390/cio/vfio_ccw_drv.c
>  create mode 100644 drivers/s390/cio/vfio_ccw_private.h
> 
> diff --git a/arch/s390/Kconfig b/arch/s390/Kconfig
> index c6722112..b920df8 100644
> --- a/arch/s390/Kconfig
> +++ b/arch/s390/Kconfig
> @@ -670,6 +670,16 @@ config EADM_SCH
>  	  To compile this driver as a module, choose M here: the
>  	  module will be called eadm_sch.
> 
> +config VFIO_CCW
> +	def_tristate n
> +	prompt "Support for VFIO-CCW subchannels"
> +	depends on S390_CCW_IOMMU && VFIO
> +	help
> +	  This driver allows usage of VFIO-CCW subchannels.

Hm...

"This driver allows usage of I/O subchannels via VFIO-CCW."

?

> +
> +	  To compile this driver as a module, choose M here: the
> +	  module will be called vfio_ccw.
> +
>  endmenu
> 
>  menu "Dump support"
> diff --git a/arch/s390/include/asm/isc.h b/arch/s390/include/asm/isc.h
> index 68d7d68..8a0b721 100644
> --- a/arch/s390/include/asm/isc.h
> +++ b/arch/s390/include/asm/isc.h
> @@ -16,6 +16,7 @@
>  #define CONSOLE_ISC 1			/* console I/O subchannel */
>  #define EADM_SCH_ISC 4			/* EADM subchannels */
>  #define CHSC_SCH_ISC 7			/* CHSC subchannels */
> +#define VFIO_CCW_ISC IO_SCH_ISC		/* VFIO-CCW I/O subchannels */

This is OK for now, I guess; but do we want to have the isc
configurable in the long run? I.e., if a host wants to run its own I/O
devices at a different priority than the devices it passes to a guest?

>  /* Adapter interrupts. */
>  #define QDIO_AIRQ_ISC IO_SCH_ISC	/* I/O subchannel in qdio mode */
>  #define PCI_ISC 2			/* PCI I/O subchannels */

(...)

> diff --git a/drivers/s390/cio/vfio_ccw_drv.c b/drivers/s390/cio/vfio_ccw_drv.c
> new file mode 100644
> index 0000000..b068207
> --- /dev/null
> +++ b/drivers/s390/cio/vfio_ccw_drv.c
> @@ -0,0 +1,262 @@
> +/*
> + * VFIO based Physical Subchannel device driver
> + *
> + * Copyright IBM Corp. 2017
> + *
> + * Author(s): Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com>
> + *            Xiao Feng Ren <renxiaof@linux.vnet.ibm.com>
> + */
> +
> +#include <linux/module.h>
> +#include <linux/init.h>
> +#include <linux/device.h>
> +#include <linux/slab.h>
> +
> +#include <asm/isc.h>
> +
> +#include "vfio_ccw_private.h"
> +
> +/*
> + * Helpers
> + */
> +static int vfio_ccw_sch_quiesce(struct subchannel *sch)
> +{
> +	struct vfio_ccw_private *private = dev_get_drvdata(&sch->dev);
> +	DECLARE_COMPLETION_ONSTACK(completion);
> +	int iretry, ret = 0;
> +
> +	spin_lock_irq(sch->lock);
> +	if (!sch->schib.pmcw.ena)
> +		goto out_unlock;
> +	ret = cio_disable_subchannel(sch);
> +	if (ret != -EBUSY)
> +		goto out_unlock;
> +
> +	do {
> +		iretry = 255;
> +
> +		ret = cio_cancel_halt_clear(sch, &iretry);
> +		while (ret == -EBUSY) {
> +			/*
> +			 * Flushing all I/O and wait the

"Flush all I/O and wait for..."

> +			 * cancel/halt/clear completion.
> +			 */
> +			private->completion = &completion;
> +			spin_unlock_irq(sch->lock);
> +
> +			wait_for_completion(&completion);

What happens for cancel? It won't generate an interrupt.

> +
> +			spin_lock_irq(sch->lock);
> +			private->completion = NULL;
> +			ret = cio_cancel_halt_clear(sch, &iretry);
> +		};
> +
> +		ret = cio_disable_subchannel(sch);
> +	} while (ret == -EBUSY);
> +
> +out_unlock:
> +	spin_unlock_irq(sch->lock);
> +	return ret;
> +}
> +
> +/*
> + * Sysfs interfaces
> + */
> +static ssize_t chpids_show(struct device *dev,
> +			   struct device_attribute *attr,
> +			   char *buf)
> +{
> +	struct subchannel *sch = to_subchannel(dev);
> +	struct chsc_ssd_info *ssd = &sch->ssd_info;
> +	ssize_t ret = 0;
> +	int chp;
> +	int mask;
> +
> +	for (chp = 0; chp < 8; chp++) {
> +		mask = 0x80 >> chp;
> +		if (ssd->path_mask & mask)
> +			ret += sprintf(buf + ret, "%02x ", ssd->chpid[chp].id);
> +		else
> +			ret += sprintf(buf + ret, "00 ");
> +	}
> +	ret += sprintf(buf+ret, "\n");
> +	return ret;
> +}
> +
> +static ssize_t pimpampom_show(struct device *dev,
> +			      struct device_attribute *attr,
> +			      char *buf)
> +{
> +	struct subchannel *sch = to_subchannel(dev);
> +	struct pmcw *pmcw = &sch->schib.pmcw;
> +
> +	return sprintf(buf, "%02x %02x %02x\n",
> +		       pmcw->pim, pmcw->pam, pmcw->pom);
> +}
> +
> +static DEVICE_ATTR(chpids, 0444, chpids_show, NULL);
> +static DEVICE_ATTR(pimpampom, 0444, pimpampom_show, NULL);

Quick question: You need to duplicate these so that lscss shows a sane
output for vfio-ccw subchannels?

> +
> +static struct attribute *vfio_subchannel_attrs[] = {
> +	&dev_attr_chpids.attr,
> +	&dev_attr_pimpampom.attr,
> +	NULL,
> +};
> +
> +static struct attribute_group vfio_subchannel_attr_group = {
> +	.attrs = vfio_subchannel_attrs,
> +};
> +
> +/*
> + * Css driver callbacks
> + */
> +static void vfio_ccw_sch_irq(struct subchannel *sch)
> +{
> +	struct vfio_ccw_private *private = dev_get_drvdata(&sch->dev);
> +
> +	inc_irq_stat(IRQIO_CIO);
> +
> +	if (!private)
> +		return;
> +
> +	if (private->completion)
> +		complete(private->completion);
> +}
> +
> +static int vfio_ccw_sch_probe(struct subchannel *sch)
> +{
> +	struct pmcw *pmcw = &sch->schib.pmcw;
> +	struct vfio_ccw_private *private;
> +	int ret;
> +
> +	if (pmcw->qf) {
> +		dev_warn(&sch->dev, "vfio: ccw: do not support QDIO: %s\n",

s/do/does/

> +			 dev_name(&sch->dev));
> +		return -ENOTTY;

Is -ENOTTY the right return code here? -EINVAL?

> +	}
> +
> +	private = kzalloc(sizeof(*private), GFP_KERNEL | GFP_DMA);
> +	if (!private)
> +		return -ENOMEM;
> +	private->sch = sch;
> +	dev_set_drvdata(&sch->dev, private);
> +
> +	spin_lock_irq(sch->lock);
> +	sch->isc = VFIO_CCW_ISC;
> +	ret = cio_enable_subchannel(sch, (u32)(unsigned long)sch);
> +	spin_unlock_irq(sch->lock);
> +	if (ret)
> +		goto out_free;
> +
> +	ret = sysfs_create_group(&sch->dev.kobj, &vfio_subchannel_attr_group);
> +	if (ret)
> +		goto out_disable;
> +
> +	return 0;
> +
> +out_disable:
> +	cio_disable_subchannel(sch);
> +out_free:
> +	dev_set_drvdata(&sch->dev, NULL);
> +	kfree(private);
> +	return ret;
> +}
> +

(...)

> +/**
> + * vfio_ccw_sch_event - process subchannel event
> + * @sch: subchannel
> + * @process: non-zero if function is called in process context
> + *
> + * An unspecified event occurred for this subchannel. Adjust data according
> + * to the current operational state of the subchannel. Return zero when the
> + * event has been handled sufficiently or -EAGAIN when this function should
> + * be called again in process context.
> + */
> +static int vfio_ccw_sch_event(struct subchannel *sch, int process)
> +{
> +	unsigned long flags;
> +
> +	spin_lock_irqsave(sch->lock, flags);
> +	if (!device_is_registered(&sch->dev))
> +		goto out_unlock;
> +
> +	if (work_pending(&sch->todo_work))
> +		goto out_unlock;
> +
> +	if (cio_update_schib(sch)) {
> +		/* Not operational. */
> +		css_sched_sch_todo(sch, SCH_TODO_UNREG);
> +
> +		/*
> +		 * TODO:
> +		 * Probably we should send the machine check to the guest.

Yes, we should do that later on. Will user space notice that the device
is gone? (I think crw injection should be done by user space.)

> +		 */
> +		goto out_unlock;
> +	}
> +
> +out_unlock:
> +	spin_unlock_irqrestore(sch->lock, flags);
> +
> +	return 0;
> +}

(...)

Looks sane in general from my POV.

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH RFC v3 15/15] vfio: ccw: introduce support for ccw0
  2017-02-17  8:29   ` [Qemu-devel] " Dong Jia Shi
@ 2017-02-20 18:59     ` Cornelia Huck
  -1 siblings, 0 replies; 61+ messages in thread
From: Cornelia Huck @ 2017-02-20 18:59 UTC (permalink / raw)
  To: Dong Jia Shi
  Cc: kvm, linux-s390, qemu-devel, renxiaof, borntraeger, agraf,
	alex.williamson, pmorel, pasic, wkywang

On Fri, 17 Feb 2017 09:29:39 +0100
Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com> wrote:

> Although Linux does not use format-0 channel command words (CCW0)
> these are a non-optional part of the platform spec, and for the sake
> of platform compliance, and possibly some non-Linux guests, we have
> to support CCW0.
> 
> Making the kernel execute a format 0 channel program is too much hassle
> because we would need to allocate and use memory which can be addressed
> by 24 bit physical addresses (because of CCW0.cda). So we implement CCW0
> support by translating the channel program into an equivalent CCW1
> program instead.
> 
> Signed-off-by: Kai Yue Wang <wkywang@linux.vnet.ibm.com>
> Signed-off-by: Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com>
> ---
>  arch/s390/Kconfig              |  7 +++++
>  drivers/s390/cio/vfio_ccw_cp.c | 58 ++++++++++++++++++++++++++++++++++++++++--
>  2 files changed, 63 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/s390/Kconfig b/arch/s390/Kconfig
> index 32008b8..f25d077 100644
> --- a/arch/s390/Kconfig
> +++ b/arch/s390/Kconfig
> @@ -680,6 +680,13 @@ config VFIO_CCW
>  	  To compile this driver as a module, choose M here: the
>  	  module will be called vfio_ccw.
> 
> +config VFIO_CCW_CCW0
> +	def_bool n
> +	prompt "Support for CCW0 translation"
> +	depends on VFIO_CCW
> +	help
> +	  Enable translation for CCW0 programs for VFIO-CCW subchannels.

Hm... if ccw0 is non-optional for the architecture, why are you making
it optional then?

> +
>  endmenu
> 
>  menu "Dump support"
> diff --git a/drivers/s390/cio/vfio_ccw_cp.c b/drivers/s390/cio/vfio_ccw_cp.c
> index 16bbb54..b0a8bc05 100644
> --- a/drivers/s390/cio/vfio_ccw_cp.c
> +++ b/drivers/s390/cio/vfio_ccw_cp.c
> @@ -15,6 +15,26 @@
> 
>  #include "vfio_ccw_cp.h"
> 
> +#ifdef CONFIG_VFIO_CCW_CCW0
> +/**
> + * struct ccw0 - channel command word
> + * @cmd_code: command code
> + * @cda: data address
> + * @flags: flags, like IDA addressing, etc.
> + * @reserved: will be ignored
> + * @count: byte count
> + *
> + * The format-0 ccw structure.
> + */
> +struct ccw0 {
> +	__u8 cmd_code;
> +	__u32 cda : 24;
> +	__u8  flags;
> +	__u8  reserved;
> +	__u16 count;
> +} __packed __aligned(8);
> +#endif

Would it be better to move this definition next to the struct ccw1
definition? Even if the kernel does not build format-0 ccws, it makes
sense from a logical POV.

> +
>  /*
>   * Max length for ccw chain.
>   * XXX: Limit to 256, need to check more?
> @@ -243,12 +263,42 @@ static long copy_from_iova(struct device *mdev,
>  	return l;
>  }
> 
> +#ifdef CONFIG_VFIO_CCW_CCW0
> +static long copy_ccw_from_iova(struct channel_program *cp,
> +			       struct ccw1 *to, u64 iova,
> +			       unsigned long len)
> +{
> +	struct ccw0 ccw0;
> +	struct ccw1 *pccw1;
> +	int ret;
> +	int i;
> +
> +	ret = copy_from_iova(cp->mdev, to, iova, len * sizeof(struct ccw1));
> +	if (ret)
> +		return ret;
> +
> +	if (!cp->orb.cmd.fmt) {
> +		pccw1 = to;
> +		for (i = 0; i < len; i++) {
> +			ccw0 = *(struct ccw0 *)pccw1;
> +			pccw1->cmd_code = ccw0.cmd_code;
> +			pccw1->flags = ccw0.flags;
> +			pccw1->count = ccw0.count;
> +			pccw1->cda = ccw0.cda;
> +			pccw1++;
> +		}

IIRC there are one or two very subtle differences between what format-0
and what format-1 ccws allow (see the ccw interpretation in qemu --
probably easier than combing through the PoP). We should either check
this or add a comment why we don't.

> +	}
> +
> +	return ret;
> +}
> +#else
>  static long copy_ccw_from_iova(struct channel_program *cp,
>  			       struct ccw1 *to, u64 iova,
>  			       unsigned long len)
>  {
>  	return copy_from_iova(cp->mdev, to, iova, len * sizeof(struct ccw1));
>  }
> +#endif

This would be nicer without #ifdefs :)

> 
>  /*
>   * Helpers to operate ccwchain.
> @@ -616,10 +666,14 @@ int cp_init(struct channel_program *cp, struct device *mdev, union orb *orb)
>  	 * Only support prefetch enable mode now.
>  	 * Only support 64bit addressing idal.
>  	 * Only support 4k IDAW.
> -	 * Only support ccw1.
>  	 */
> -	if (!orb->cmd.pfch || !orb->cmd.c64 || orb->cmd.i2k || !orb->cmd.fmt)
> +	if (!orb->cmd.pfch || !orb->cmd.c64 || orb->cmd.i2k)
> +		return -EOPNOTSUPP;
> +
> +#ifndef CONFIG_VFIO_CCW_CCW0
> +	if (!orb->cmd.fmt)
>  		return -EOPNOTSUPP;
> +#endif

dito

> 
>  	INIT_LIST_HEAD(&cp->ccwchain_list);
>  	memcpy(&cp->orb, orb, sizeof(*orb));

The "translate format-0 into format-1 ccws" approach is fine, but I
think you need to double check the subtle differences I mentioned.

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [Qemu-devel] [PATCH RFC v3 15/15] vfio: ccw: introduce support for ccw0
@ 2017-02-20 18:59     ` Cornelia Huck
  0 siblings, 0 replies; 61+ messages in thread
From: Cornelia Huck @ 2017-02-20 18:59 UTC (permalink / raw)
  To: Dong Jia Shi
  Cc: kvm, linux-s390, qemu-devel, renxiaof, borntraeger, agraf,
	alex.williamson, pmorel, pasic, wkywang

On Fri, 17 Feb 2017 09:29:39 +0100
Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com> wrote:

> Although Linux does not use format-0 channel command words (CCW0)
> these are a non-optional part of the platform spec, and for the sake
> of platform compliance, and possibly some non-Linux guests, we have
> to support CCW0.
> 
> Making the kernel execute a format 0 channel program is too much hassle
> because we would need to allocate and use memory which can be addressed
> by 24 bit physical addresses (because of CCW0.cda). So we implement CCW0
> support by translating the channel program into an equivalent CCW1
> program instead.
> 
> Signed-off-by: Kai Yue Wang <wkywang@linux.vnet.ibm.com>
> Signed-off-by: Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com>
> ---
>  arch/s390/Kconfig              |  7 +++++
>  drivers/s390/cio/vfio_ccw_cp.c | 58 ++++++++++++++++++++++++++++++++++++++++--
>  2 files changed, 63 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/s390/Kconfig b/arch/s390/Kconfig
> index 32008b8..f25d077 100644
> --- a/arch/s390/Kconfig
> +++ b/arch/s390/Kconfig
> @@ -680,6 +680,13 @@ config VFIO_CCW
>  	  To compile this driver as a module, choose M here: the
>  	  module will be called vfio_ccw.
> 
> +config VFIO_CCW_CCW0
> +	def_bool n
> +	prompt "Support for CCW0 translation"
> +	depends on VFIO_CCW
> +	help
> +	  Enable translation for CCW0 programs for VFIO-CCW subchannels.

Hm... if ccw0 is non-optional for the architecture, why are you making
it optional then?

> +
>  endmenu
> 
>  menu "Dump support"
> diff --git a/drivers/s390/cio/vfio_ccw_cp.c b/drivers/s390/cio/vfio_ccw_cp.c
> index 16bbb54..b0a8bc05 100644
> --- a/drivers/s390/cio/vfio_ccw_cp.c
> +++ b/drivers/s390/cio/vfio_ccw_cp.c
> @@ -15,6 +15,26 @@
> 
>  #include "vfio_ccw_cp.h"
> 
> +#ifdef CONFIG_VFIO_CCW_CCW0
> +/**
> + * struct ccw0 - channel command word
> + * @cmd_code: command code
> + * @cda: data address
> + * @flags: flags, like IDA addressing, etc.
> + * @reserved: will be ignored
> + * @count: byte count
> + *
> + * The format-0 ccw structure.
> + */
> +struct ccw0 {
> +	__u8 cmd_code;
> +	__u32 cda : 24;
> +	__u8  flags;
> +	__u8  reserved;
> +	__u16 count;
> +} __packed __aligned(8);
> +#endif

Would it be better to move this definition next to the struct ccw1
definition? Even if the kernel does not build format-0 ccws, it makes
sense from a logical POV.

> +
>  /*
>   * Max length for ccw chain.
>   * XXX: Limit to 256, need to check more?
> @@ -243,12 +263,42 @@ static long copy_from_iova(struct device *mdev,
>  	return l;
>  }
> 
> +#ifdef CONFIG_VFIO_CCW_CCW0
> +static long copy_ccw_from_iova(struct channel_program *cp,
> +			       struct ccw1 *to, u64 iova,
> +			       unsigned long len)
> +{
> +	struct ccw0 ccw0;
> +	struct ccw1 *pccw1;
> +	int ret;
> +	int i;
> +
> +	ret = copy_from_iova(cp->mdev, to, iova, len * sizeof(struct ccw1));
> +	if (ret)
> +		return ret;
> +
> +	if (!cp->orb.cmd.fmt) {
> +		pccw1 = to;
> +		for (i = 0; i < len; i++) {
> +			ccw0 = *(struct ccw0 *)pccw1;
> +			pccw1->cmd_code = ccw0.cmd_code;
> +			pccw1->flags = ccw0.flags;
> +			pccw1->count = ccw0.count;
> +			pccw1->cda = ccw0.cda;
> +			pccw1++;
> +		}

IIRC there are one or two very subtle differences between what format-0
and what format-1 ccws allow (see the ccw interpretation in qemu --
probably easier than combing through the PoP). We should either check
this or add a comment why we don't.

> +	}
> +
> +	return ret;
> +}
> +#else
>  static long copy_ccw_from_iova(struct channel_program *cp,
>  			       struct ccw1 *to, u64 iova,
>  			       unsigned long len)
>  {
>  	return copy_from_iova(cp->mdev, to, iova, len * sizeof(struct ccw1));
>  }
> +#endif

This would be nicer without #ifdefs :)

> 
>  /*
>   * Helpers to operate ccwchain.
> @@ -616,10 +666,14 @@ int cp_init(struct channel_program *cp, struct device *mdev, union orb *orb)
>  	 * Only support prefetch enable mode now.
>  	 * Only support 64bit addressing idal.
>  	 * Only support 4k IDAW.
> -	 * Only support ccw1.
>  	 */
> -	if (!orb->cmd.pfch || !orb->cmd.c64 || orb->cmd.i2k || !orb->cmd.fmt)
> +	if (!orb->cmd.pfch || !orb->cmd.c64 || orb->cmd.i2k)
> +		return -EOPNOTSUPP;
> +
> +#ifndef CONFIG_VFIO_CCW_CCW0
> +	if (!orb->cmd.fmt)
>  		return -EOPNOTSUPP;
> +#endif

dito

> 
>  	INIT_LIST_HEAD(&cp->ccwchain_list);
>  	memcpy(&cp->orb, orb, sizeof(*orb));

The "translate format-0 into format-1 ccws" approach is fine, but I
think you need to double check the subtle differences I mentioned.

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH RFC v3 04/15] vfio: ccw: basic implementation for vfio_ccw driver
  2017-02-20 18:31     ` [Qemu-devel] " Cornelia Huck
  (?)
@ 2017-02-21  7:36     ` Dong Jia Shi
  -1 siblings, 0 replies; 61+ messages in thread
From: Dong Jia Shi @ 2017-02-21  7:36 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: linux-s390, pasic, kvm, pmorel, qemu-devel, borntraeger,
	alex.williamson, renxiaof, wkywang, Dong Jia Shi, agraf

* Cornelia Huck <cornelia.huck@de.ibm.com> [2017-02-20 19:31:13 +0100]:

> On Fri, 17 Feb 2017 09:29:28 +0100
> Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com> wrote:
> 
> > To make vfio support subchannel devices, we need a css driver for
> > the vfio subchannels. This patch adds a basic vfio-ccw subchannel
> > driver for this purpose.
> > 
> > To enable VFIO for vfio-ccw, enable S390_CCW_IOMMU config option
> > and configure VFIO as required.
> > 
> > Signed-off-by: Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com>
> > Acked-by: Pierre Morel <pmorel@linux.vnet.ibm.com>
> > ---
> >  arch/s390/Kconfig                   |  10 ++
> >  arch/s390/include/asm/isc.h         |   1 +
> >  drivers/iommu/Kconfig               |   8 ++
> >  drivers/s390/cio/Makefile           |   3 +
> >  drivers/s390/cio/vfio_ccw_drv.c     | 262 ++++++++++++++++++++++++++++++++++++
> >  drivers/s390/cio/vfio_ccw_private.h |  25 ++++
> >  6 files changed, 309 insertions(+)
> >  create mode 100644 drivers/s390/cio/vfio_ccw_drv.c
> >  create mode 100644 drivers/s390/cio/vfio_ccw_private.h
> > 
> > diff --git a/arch/s390/Kconfig b/arch/s390/Kconfig
> > index c6722112..b920df8 100644
> > --- a/arch/s390/Kconfig
> > +++ b/arch/s390/Kconfig
> > @@ -670,6 +670,16 @@ config EADM_SCH
> >  	  To compile this driver as a module, choose M here: the
> >  	  module will be called eadm_sch.
> > 
> > +config VFIO_CCW
> > +	def_tristate n
> > +	prompt "Support for VFIO-CCW subchannels"
> > +	depends on S390_CCW_IOMMU && VFIO
> > +	help
> > +	  This driver allows usage of VFIO-CCW subchannels.
> 
> Hm...
> 
> "This driver allows usage of I/O subchannels via VFIO-CCW."
> 
> ?
> 
This is better. Will change.

> > +
> > +	  To compile this driver as a module, choose M here: the
> > +	  module will be called vfio_ccw.
> > +
> >  endmenu
> > 
> >  menu "Dump support"
> > diff --git a/arch/s390/include/asm/isc.h b/arch/s390/include/asm/isc.h
> > index 68d7d68..8a0b721 100644
> > --- a/arch/s390/include/asm/isc.h
> > +++ b/arch/s390/include/asm/isc.h
> > @@ -16,6 +16,7 @@
> >  #define CONSOLE_ISC 1			/* console I/O subchannel */
> >  #define EADM_SCH_ISC 4			/* EADM subchannels */
> >  #define CHSC_SCH_ISC 7			/* CHSC subchannels */
> > +#define VFIO_CCW_ISC IO_SCH_ISC		/* VFIO-CCW I/O subchannels */
> 
> This is OK for now, I guess; but do we want to have the isc
> configurable in the long run? I.e., if a host wants to run its own I/O
> devices at a different priority than the devices it passes to a guest?
> 
I think we can keep this as the default value, and provide a driver
param to customize the ISC value in the future once we need this. I put
this on my LATER list, or I do it in next version?

> >  /* Adapter interrupts. */
> >  #define QDIO_AIRQ_ISC IO_SCH_ISC	/* I/O subchannel in qdio mode */
> >  #define PCI_ISC 2			/* PCI I/O subchannels */
> 
> (...)
> 
> > diff --git a/drivers/s390/cio/vfio_ccw_drv.c b/drivers/s390/cio/vfio_ccw_drv.c
> > new file mode 100644
> > index 0000000..b068207
> > --- /dev/null
> > +++ b/drivers/s390/cio/vfio_ccw_drv.c
> > @@ -0,0 +1,262 @@
> > +/*
> > + * VFIO based Physical Subchannel device driver
> > + *
> > + * Copyright IBM Corp. 2017
> > + *
> > + * Author(s): Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com>
> > + *            Xiao Feng Ren <renxiaof@linux.vnet.ibm.com>
> > + */
> > +
> > +#include <linux/module.h>
> > +#include <linux/init.h>
> > +#include <linux/device.h>
> > +#include <linux/slab.h>
> > +
> > +#include <asm/isc.h>
> > +
> > +#include "vfio_ccw_private.h"
> > +
> > +/*
> > + * Helpers
> > + */
> > +static int vfio_ccw_sch_quiesce(struct subchannel *sch)
> > +{
> > +	struct vfio_ccw_private *private = dev_get_drvdata(&sch->dev);
> > +	DECLARE_COMPLETION_ONSTACK(completion);
> > +	int iretry, ret = 0;
> > +
> > +	spin_lock_irq(sch->lock);
> > +	if (!sch->schib.pmcw.ena)
> > +		goto out_unlock;
> > +	ret = cio_disable_subchannel(sch);
> > +	if (ret != -EBUSY)
> > +		goto out_unlock;
> > +
> > +	do {
> > +		iretry = 255;
> > +
> > +		ret = cio_cancel_halt_clear(sch, &iretry);
> > +		while (ret == -EBUSY) {
> > +			/*
> > +			 * Flushing all I/O and wait the
> 
> "Flush all I/O and wait for..."
> 
Ok.

> > +			 * cancel/halt/clear completion.
> > +			 */
> > +			private->completion = &completion;
> > +			spin_unlock_irq(sch->lock);
> > +
> > +			wait_for_completion(&completion);
> 
> What happens for cancel? It won't generate an interrupt.
> 
Right! How about using:
	wait_for_completion_timeout(&completion, 3*HZ);

(I stole '3*HZ' from ccw_device_kill_io.)

> > +
> > +			spin_lock_irq(sch->lock);
> > +			private->completion = NULL;
> > +			ret = cio_cancel_halt_clear(sch, &iretry);
> > +		};
> > +
> > +		ret = cio_disable_subchannel(sch);
> > +	} while (ret == -EBUSY);
> > +
> > +out_unlock:
> > +	spin_unlock_irq(sch->lock);
> > +	return ret;
> > +}
> > +
> > +/*
> > + * Sysfs interfaces
> > + */
> > +static ssize_t chpids_show(struct device *dev,
> > +			   struct device_attribute *attr,
> > +			   char *buf)
> > +{
> > +	struct subchannel *sch = to_subchannel(dev);
> > +	struct chsc_ssd_info *ssd = &sch->ssd_info;
> > +	ssize_t ret = 0;
> > +	int chp;
> > +	int mask;
> > +
> > +	for (chp = 0; chp < 8; chp++) {
> > +		mask = 0x80 >> chp;
> > +		if (ssd->path_mask & mask)
> > +			ret += sprintf(buf + ret, "%02x ", ssd->chpid[chp].id);
> > +		else
> > +			ret += sprintf(buf + ret, "00 ");
> > +	}
> > +	ret += sprintf(buf+ret, "\n");
> > +	return ret;
> > +}
> > +
> > +static ssize_t pimpampom_show(struct device *dev,
> > +			      struct device_attribute *attr,
> > +			      char *buf)
> > +{
> > +	struct subchannel *sch = to_subchannel(dev);
> > +	struct pmcw *pmcw = &sch->schib.pmcw;
> > +
> > +	return sprintf(buf, "%02x %02x %02x\n",
> > +		       pmcw->pim, pmcw->pam, pmcw->pom);
> > +}
> > +
> > +static DEVICE_ATTR(chpids, 0444, chpids_show, NULL);
> > +static DEVICE_ATTR(pimpampom, 0444, pimpampom_show, NULL);
> 
> Quick question: You need to duplicate these so that lscss shows a sane
> output for vfio-ccw subchannels?
> 
Yes, and also for retrieving these values in QEMU.

> > +
> > +static struct attribute *vfio_subchannel_attrs[] = {
> > +	&dev_attr_chpids.attr,
> > +	&dev_attr_pimpampom.attr,
> > +	NULL,
> > +};
> > +
> > +static struct attribute_group vfio_subchannel_attr_group = {
> > +	.attrs = vfio_subchannel_attrs,
> > +};
> > +
> > +/*
> > + * Css driver callbacks
> > + */
> > +static void vfio_ccw_sch_irq(struct subchannel *sch)
> > +{
> > +	struct vfio_ccw_private *private = dev_get_drvdata(&sch->dev);
> > +
> > +	inc_irq_stat(IRQIO_CIO);
> > +
> > +	if (!private)
> > +		return;
> > +
> > +	if (private->completion)
> > +		complete(private->completion);
> > +}
> > +
> > +static int vfio_ccw_sch_probe(struct subchannel *sch)
> > +{
> > +	struct pmcw *pmcw = &sch->schib.pmcw;
> > +	struct vfio_ccw_private *private;
> > +	int ret;
> > +
> > +	if (pmcw->qf) {
> > +		dev_warn(&sch->dev, "vfio: ccw: do not support QDIO: %s\n",
> 
> s/do/does/
> 
Ok.

> > +			 dev_name(&sch->dev));
> > +		return -ENOTTY;
> 
> Is -ENOTTY the right return code here? -EINVAL?
> 
Ok. Think it again. -EINVAL makes more sense. It's like:
"hey, I know it's an I/O subchannel, but not the kind we support".

> > +	}
> > +
> > +	private = kzalloc(sizeof(*private), GFP_KERNEL | GFP_DMA);
> > +	if (!private)
> > +		return -ENOMEM;
> > +	private->sch = sch;
> > +	dev_set_drvdata(&sch->dev, private);
> > +
> > +	spin_lock_irq(sch->lock);
> > +	sch->isc = VFIO_CCW_ISC;
> > +	ret = cio_enable_subchannel(sch, (u32)(unsigned long)sch);
> > +	spin_unlock_irq(sch->lock);
> > +	if (ret)
> > +		goto out_free;
> > +
> > +	ret = sysfs_create_group(&sch->dev.kobj, &vfio_subchannel_attr_group);
> > +	if (ret)
> > +		goto out_disable;
> > +
> > +	return 0;
> > +
> > +out_disable:
> > +	cio_disable_subchannel(sch);
> > +out_free:
> > +	dev_set_drvdata(&sch->dev, NULL);
> > +	kfree(private);
> > +	return ret;
> > +}
> > +
> 
> (...)
> 
> > +/**
> > + * vfio_ccw_sch_event - process subchannel event
> > + * @sch: subchannel
> > + * @process: non-zero if function is called in process context
> > + *
> > + * An unspecified event occurred for this subchannel. Adjust data according
> > + * to the current operational state of the subchannel. Return zero when the
> > + * event has been handled sufficiently or -EAGAIN when this function should
> > + * be called again in process context.
> > + */
> > +static int vfio_ccw_sch_event(struct subchannel *sch, int process)
> > +{
> > +	unsigned long flags;
> > +
> > +	spin_lock_irqsave(sch->lock, flags);
> > +	if (!device_is_registered(&sch->dev))
> > +		goto out_unlock;
> > +
> > +	if (work_pending(&sch->todo_work))
> > +		goto out_unlock;
> > +
> > +	if (cio_update_schib(sch)) {
> > +		/* Not operational. */
> > +		css_sched_sch_todo(sch, SCH_TODO_UNREG);
> > +
> > +		/*
> > +		 * TODO:
> > +		 * Probably we should send the machine check to the guest.
> 
> Yes, we should do that later on. Will user space notice that the device
> is gone? (I think crw injection should be done by user space.)
> 
Currently we lack this mechanism. I think there are many todos here. I
will investigate latter.

> > +		 */
> > +		goto out_unlock;
> > +	}
> > +
> > +out_unlock:
> > +	spin_unlock_irqrestore(sch->lock, flags);
> > +
> > +	return 0;
> > +}
> 
> (...)
> 
> Looks sane in general from my POV.
Thanks for the comments.

-- 
Dong Jia

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [Qemu-devel] [PATCH RFC v3 04/15] vfio: ccw: basic implementation for vfio_ccw driver
  2017-02-20 18:31     ` [Qemu-devel] " Cornelia Huck
  (?)
  (?)
@ 2017-02-21  7:36     ` Dong Jia Shi
  2017-02-21  7:43       ` Dong Jia Shi
                         ` (2 more replies)
  -1 siblings, 3 replies; 61+ messages in thread
From: Dong Jia Shi @ 2017-02-21  7:36 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: Dong Jia Shi, kvm, linux-s390, qemu-devel, renxiaof, borntraeger,
	agraf, alex.williamson, pmorel, pasic, wkywang

* Cornelia Huck <cornelia.huck@de.ibm.com> [2017-02-20 19:31:13 +0100]:

> On Fri, 17 Feb 2017 09:29:28 +0100
> Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com> wrote:
> 
> > To make vfio support subchannel devices, we need a css driver for
> > the vfio subchannels. This patch adds a basic vfio-ccw subchannel
> > driver for this purpose.
> > 
> > To enable VFIO for vfio-ccw, enable S390_CCW_IOMMU config option
> > and configure VFIO as required.
> > 
> > Signed-off-by: Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com>
> > Acked-by: Pierre Morel <pmorel@linux.vnet.ibm.com>
> > ---
> >  arch/s390/Kconfig                   |  10 ++
> >  arch/s390/include/asm/isc.h         |   1 +
> >  drivers/iommu/Kconfig               |   8 ++
> >  drivers/s390/cio/Makefile           |   3 +
> >  drivers/s390/cio/vfio_ccw_drv.c     | 262 ++++++++++++++++++++++++++++++++++++
> >  drivers/s390/cio/vfio_ccw_private.h |  25 ++++
> >  6 files changed, 309 insertions(+)
> >  create mode 100644 drivers/s390/cio/vfio_ccw_drv.c
> >  create mode 100644 drivers/s390/cio/vfio_ccw_private.h
> > 
> > diff --git a/arch/s390/Kconfig b/arch/s390/Kconfig
> > index c6722112..b920df8 100644
> > --- a/arch/s390/Kconfig
> > +++ b/arch/s390/Kconfig
> > @@ -670,6 +670,16 @@ config EADM_SCH
> >  	  To compile this driver as a module, choose M here: the
> >  	  module will be called eadm_sch.
> > 
> > +config VFIO_CCW
> > +	def_tristate n
> > +	prompt "Support for VFIO-CCW subchannels"
> > +	depends on S390_CCW_IOMMU && VFIO
> > +	help
> > +	  This driver allows usage of VFIO-CCW subchannels.
> 
> Hm...
> 
> "This driver allows usage of I/O subchannels via VFIO-CCW."
> 
> ?
> 
This is better. Will change.

> > +
> > +	  To compile this driver as a module, choose M here: the
> > +	  module will be called vfio_ccw.
> > +
> >  endmenu
> > 
> >  menu "Dump support"
> > diff --git a/arch/s390/include/asm/isc.h b/arch/s390/include/asm/isc.h
> > index 68d7d68..8a0b721 100644
> > --- a/arch/s390/include/asm/isc.h
> > +++ b/arch/s390/include/asm/isc.h
> > @@ -16,6 +16,7 @@
> >  #define CONSOLE_ISC 1			/* console I/O subchannel */
> >  #define EADM_SCH_ISC 4			/* EADM subchannels */
> >  #define CHSC_SCH_ISC 7			/* CHSC subchannels */
> > +#define VFIO_CCW_ISC IO_SCH_ISC		/* VFIO-CCW I/O subchannels */
> 
> This is OK for now, I guess; but do we want to have the isc
> configurable in the long run? I.e., if a host wants to run its own I/O
> devices at a different priority than the devices it passes to a guest?
> 
I think we can keep this as the default value, and provide a driver
param to customize the ISC value in the future once we need this. I put
this on my LATER list, or I do it in next version?

> >  /* Adapter interrupts. */
> >  #define QDIO_AIRQ_ISC IO_SCH_ISC	/* I/O subchannel in qdio mode */
> >  #define PCI_ISC 2			/* PCI I/O subchannels */
> 
> (...)
> 
> > diff --git a/drivers/s390/cio/vfio_ccw_drv.c b/drivers/s390/cio/vfio_ccw_drv.c
> > new file mode 100644
> > index 0000000..b068207
> > --- /dev/null
> > +++ b/drivers/s390/cio/vfio_ccw_drv.c
> > @@ -0,0 +1,262 @@
> > +/*
> > + * VFIO based Physical Subchannel device driver
> > + *
> > + * Copyright IBM Corp. 2017
> > + *
> > + * Author(s): Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com>
> > + *            Xiao Feng Ren <renxiaof@linux.vnet.ibm.com>
> > + */
> > +
> > +#include <linux/module.h>
> > +#include <linux/init.h>
> > +#include <linux/device.h>
> > +#include <linux/slab.h>
> > +
> > +#include <asm/isc.h>
> > +
> > +#include "vfio_ccw_private.h"
> > +
> > +/*
> > + * Helpers
> > + */
> > +static int vfio_ccw_sch_quiesce(struct subchannel *sch)
> > +{
> > +	struct vfio_ccw_private *private = dev_get_drvdata(&sch->dev);
> > +	DECLARE_COMPLETION_ONSTACK(completion);
> > +	int iretry, ret = 0;
> > +
> > +	spin_lock_irq(sch->lock);
> > +	if (!sch->schib.pmcw.ena)
> > +		goto out_unlock;
> > +	ret = cio_disable_subchannel(sch);
> > +	if (ret != -EBUSY)
> > +		goto out_unlock;
> > +
> > +	do {
> > +		iretry = 255;
> > +
> > +		ret = cio_cancel_halt_clear(sch, &iretry);
> > +		while (ret == -EBUSY) {
> > +			/*
> > +			 * Flushing all I/O and wait the
> 
> "Flush all I/O and wait for..."
> 
Ok.

> > +			 * cancel/halt/clear completion.
> > +			 */
> > +			private->completion = &completion;
> > +			spin_unlock_irq(sch->lock);
> > +
> > +			wait_for_completion(&completion);
> 
> What happens for cancel? It won't generate an interrupt.
> 
Right! How about using:
	wait_for_completion_timeout(&completion, 3*HZ);

(I stole '3*HZ' from ccw_device_kill_io.)

> > +
> > +			spin_lock_irq(sch->lock);
> > +			private->completion = NULL;
> > +			ret = cio_cancel_halt_clear(sch, &iretry);
> > +		};
> > +
> > +		ret = cio_disable_subchannel(sch);
> > +	} while (ret == -EBUSY);
> > +
> > +out_unlock:
> > +	spin_unlock_irq(sch->lock);
> > +	return ret;
> > +}
> > +
> > +/*
> > + * Sysfs interfaces
> > + */
> > +static ssize_t chpids_show(struct device *dev,
> > +			   struct device_attribute *attr,
> > +			   char *buf)
> > +{
> > +	struct subchannel *sch = to_subchannel(dev);
> > +	struct chsc_ssd_info *ssd = &sch->ssd_info;
> > +	ssize_t ret = 0;
> > +	int chp;
> > +	int mask;
> > +
> > +	for (chp = 0; chp < 8; chp++) {
> > +		mask = 0x80 >> chp;
> > +		if (ssd->path_mask & mask)
> > +			ret += sprintf(buf + ret, "%02x ", ssd->chpid[chp].id);
> > +		else
> > +			ret += sprintf(buf + ret, "00 ");
> > +	}
> > +	ret += sprintf(buf+ret, "\n");
> > +	return ret;
> > +}
> > +
> > +static ssize_t pimpampom_show(struct device *dev,
> > +			      struct device_attribute *attr,
> > +			      char *buf)
> > +{
> > +	struct subchannel *sch = to_subchannel(dev);
> > +	struct pmcw *pmcw = &sch->schib.pmcw;
> > +
> > +	return sprintf(buf, "%02x %02x %02x\n",
> > +		       pmcw->pim, pmcw->pam, pmcw->pom);
> > +}
> > +
> > +static DEVICE_ATTR(chpids, 0444, chpids_show, NULL);
> > +static DEVICE_ATTR(pimpampom, 0444, pimpampom_show, NULL);
> 
> Quick question: You need to duplicate these so that lscss shows a sane
> output for vfio-ccw subchannels?
> 
Yes, and also for retrieving these values in QEMU.

> > +
> > +static struct attribute *vfio_subchannel_attrs[] = {
> > +	&dev_attr_chpids.attr,
> > +	&dev_attr_pimpampom.attr,
> > +	NULL,
> > +};
> > +
> > +static struct attribute_group vfio_subchannel_attr_group = {
> > +	.attrs = vfio_subchannel_attrs,
> > +};
> > +
> > +/*
> > + * Css driver callbacks
> > + */
> > +static void vfio_ccw_sch_irq(struct subchannel *sch)
> > +{
> > +	struct vfio_ccw_private *private = dev_get_drvdata(&sch->dev);
> > +
> > +	inc_irq_stat(IRQIO_CIO);
> > +
> > +	if (!private)
> > +		return;
> > +
> > +	if (private->completion)
> > +		complete(private->completion);
> > +}
> > +
> > +static int vfio_ccw_sch_probe(struct subchannel *sch)
> > +{
> > +	struct pmcw *pmcw = &sch->schib.pmcw;
> > +	struct vfio_ccw_private *private;
> > +	int ret;
> > +
> > +	if (pmcw->qf) {
> > +		dev_warn(&sch->dev, "vfio: ccw: do not support QDIO: %s\n",
> 
> s/do/does/
> 
Ok.

> > +			 dev_name(&sch->dev));
> > +		return -ENOTTY;
> 
> Is -ENOTTY the right return code here? -EINVAL?
> 
Ok. Think it again. -EINVAL makes more sense. It's like:
"hey, I know it's an I/O subchannel, but not the kind we support".

> > +	}
> > +
> > +	private = kzalloc(sizeof(*private), GFP_KERNEL | GFP_DMA);
> > +	if (!private)
> > +		return -ENOMEM;
> > +	private->sch = sch;
> > +	dev_set_drvdata(&sch->dev, private);
> > +
> > +	spin_lock_irq(sch->lock);
> > +	sch->isc = VFIO_CCW_ISC;
> > +	ret = cio_enable_subchannel(sch, (u32)(unsigned long)sch);
> > +	spin_unlock_irq(sch->lock);
> > +	if (ret)
> > +		goto out_free;
> > +
> > +	ret = sysfs_create_group(&sch->dev.kobj, &vfio_subchannel_attr_group);
> > +	if (ret)
> > +		goto out_disable;
> > +
> > +	return 0;
> > +
> > +out_disable:
> > +	cio_disable_subchannel(sch);
> > +out_free:
> > +	dev_set_drvdata(&sch->dev, NULL);
> > +	kfree(private);
> > +	return ret;
> > +}
> > +
> 
> (...)
> 
> > +/**
> > + * vfio_ccw_sch_event - process subchannel event
> > + * @sch: subchannel
> > + * @process: non-zero if function is called in process context
> > + *
> > + * An unspecified event occurred for this subchannel. Adjust data according
> > + * to the current operational state of the subchannel. Return zero when the
> > + * event has been handled sufficiently or -EAGAIN when this function should
> > + * be called again in process context.
> > + */
> > +static int vfio_ccw_sch_event(struct subchannel *sch, int process)
> > +{
> > +	unsigned long flags;
> > +
> > +	spin_lock_irqsave(sch->lock, flags);
> > +	if (!device_is_registered(&sch->dev))
> > +		goto out_unlock;
> > +
> > +	if (work_pending(&sch->todo_work))
> > +		goto out_unlock;
> > +
> > +	if (cio_update_schib(sch)) {
> > +		/* Not operational. */
> > +		css_sched_sch_todo(sch, SCH_TODO_UNREG);
> > +
> > +		/*
> > +		 * TODO:
> > +		 * Probably we should send the machine check to the guest.
> 
> Yes, we should do that later on. Will user space notice that the device
> is gone? (I think crw injection should be done by user space.)
> 
Currently we lack this mechanism. I think there are many todos here. I
will investigate latter.

> > +		 */
> > +		goto out_unlock;
> > +	}
> > +
> > +out_unlock:
> > +	spin_unlock_irqrestore(sch->lock, flags);
> > +
> > +	return 0;
> > +}
> 
> (...)
> 
> Looks sane in general from my POV.
Thanks for the comments.

-- 
Dong Jia

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH RFC v3 04/15] vfio: ccw: basic implementation for vfio_ccw driver
  2017-02-21  7:36     ` [Qemu-devel] " Dong Jia Shi
@ 2017-02-21  7:43       ` Dong Jia Shi
  2017-02-21  7:43       ` [Qemu-devel] " Dong Jia Shi
  2017-02-21 15:43         ` [Qemu-devel] " Cornelia Huck
  2 siblings, 0 replies; 61+ messages in thread
From: Dong Jia Shi @ 2017-02-21  7:43 UTC (permalink / raw)
  To: Cornelia Huck, Dong Jia Shi, kvm, linux-s390, qemu-devel,
	renxiaof, borntraeger, agraf, alex.williamson, pmorel, pasic,
	wkywang

* Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com> [2017-02-21 15:36:23 +0800]:

[...]
> > > +static int vfio_ccw_sch_probe(struct subchannel *sch)
> > > +{
> > > +	struct pmcw *pmcw = &sch->schib.pmcw;
> > > +	struct vfio_ccw_private *private;
> > > +	int ret;
> > > +
> > > +	if (pmcw->qf) {
> > > +		dev_warn(&sch->dev, "vfio: ccw: do not support QDIO: %s\n",
> > 
> > s/do/does/
> > 
> Ok.
> 
> > > +			 dev_name(&sch->dev));
> > > +		return -ENOTTY;
> > 
> > Is -ENOTTY the right return code here? -EINVAL?
> > 
> Ok. Think it again. -EINVAL makes more sense. It's like:
> "hey, I know it's an I/O subchannel, but not the kind we support".
> 
Or -ENOTSUPP ?
> > > +	}
> > > +
[...]

-- 
Dong Jia

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [Qemu-devel] [PATCH RFC v3 04/15] vfio: ccw: basic implementation for vfio_ccw driver
  2017-02-21  7:36     ` [Qemu-devel] " Dong Jia Shi
  2017-02-21  7:43       ` Dong Jia Shi
@ 2017-02-21  7:43       ` Dong Jia Shi
  2017-02-21 15:43         ` [Qemu-devel] " Cornelia Huck
  2 siblings, 0 replies; 61+ messages in thread
From: Dong Jia Shi @ 2017-02-21  7:43 UTC (permalink / raw)
  To: Cornelia Huck, Dong Jia Shi, kvm, linux-s390, qemu-devel,
	renxiaof, borntraeger, agraf, alex.williamson, pmorel, pasic,
	wkywang

* Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com> [2017-02-21 15:36:23 +0800]:

[...]
> > > +static int vfio_ccw_sch_probe(struct subchannel *sch)
> > > +{
> > > +	struct pmcw *pmcw = &sch->schib.pmcw;
> > > +	struct vfio_ccw_private *private;
> > > +	int ret;
> > > +
> > > +	if (pmcw->qf) {
> > > +		dev_warn(&sch->dev, "vfio: ccw: do not support QDIO: %s\n",
> > 
> > s/do/does/
> > 
> Ok.
> 
> > > +			 dev_name(&sch->dev));
> > > +		return -ENOTTY;
> > 
> > Is -ENOTTY the right return code here? -EINVAL?
> > 
> Ok. Think it again. -EINVAL makes more sense. It's like:
> "hey, I know it's an I/O subchannel, but not the kind we support".
> 
Or -ENOTSUPP ?
> > > +	}
> > > +
[...]

-- 
Dong Jia

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH RFC v3 15/15] vfio: ccw: introduce support for ccw0
  2017-02-20 18:59     ` [Qemu-devel] " Cornelia Huck
  (?)
@ 2017-02-21  8:58     ` Dong Jia Shi
  -1 siblings, 0 replies; 61+ messages in thread
From: Dong Jia Shi @ 2017-02-21  8:58 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: linux-s390, pasic, kvm, pmorel, qemu-devel, borntraeger,
	alex.williamson, renxiaof, wkywang, Dong Jia Shi, agraf

* Cornelia Huck <cornelia.huck@de.ibm.com> [2017-02-20 19:59:01 +0100]:

> On Fri, 17 Feb 2017 09:29:39 +0100
> Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com> wrote:
> 
> > Although Linux does not use format-0 channel command words (CCW0)
> > these are a non-optional part of the platform spec, and for the sake
> > of platform compliance, and possibly some non-Linux guests, we have
> > to support CCW0.
> > 
> > Making the kernel execute a format 0 channel program is too much hassle
> > because we would need to allocate and use memory which can be addressed
> > by 24 bit physical addresses (because of CCW0.cda). So we implement CCW0
> > support by translating the channel program into an equivalent CCW1
> > program instead.
> > 
> > Signed-off-by: Kai Yue Wang <wkywang@linux.vnet.ibm.com>
> > Signed-off-by: Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com>
> > ---
> >  arch/s390/Kconfig              |  7 +++++
> >  drivers/s390/cio/vfio_ccw_cp.c | 58 ++++++++++++++++++++++++++++++++++++++++--
> >  2 files changed, 63 insertions(+), 2 deletions(-)
> > 
> > diff --git a/arch/s390/Kconfig b/arch/s390/Kconfig
> > index 32008b8..f25d077 100644
> > --- a/arch/s390/Kconfig
> > +++ b/arch/s390/Kconfig
> > @@ -680,6 +680,13 @@ config VFIO_CCW
> >  	  To compile this driver as a module, choose M here: the
> >  	  module will be called vfio_ccw.
> > 
> > +config VFIO_CCW_CCW0
> > +	def_bool n
> > +	prompt "Support for CCW0 translation"
> > +	depends on VFIO_CCW
> > +	help
> > +	  Enable translation for CCW0 programs for VFIO-CCW subchannels.
> 
> Hm... if ccw0 is non-optional for the architecture, why are you making
> it optional then?
> 
Yes, this is an unnecessary move. I overthought to give the user the
freedom to make their own choice. Will remove this.

> > +
> >  endmenu
> > 
> >  menu "Dump support"
> > diff --git a/drivers/s390/cio/vfio_ccw_cp.c b/drivers/s390/cio/vfio_ccw_cp.c
> > index 16bbb54..b0a8bc05 100644
> > --- a/drivers/s390/cio/vfio_ccw_cp.c
> > +++ b/drivers/s390/cio/vfio_ccw_cp.c
> > @@ -15,6 +15,26 @@
> > 
> >  #include "vfio_ccw_cp.h"
> > 
> > +#ifdef CONFIG_VFIO_CCW_CCW0
> > +/**
> > + * struct ccw0 - channel command word
> > + * @cmd_code: command code
> > + * @cda: data address
> > + * @flags: flags, like IDA addressing, etc.
> > + * @reserved: will be ignored
> > + * @count: byte count
> > + *
> > + * The format-0 ccw structure.
> > + */
> > +struct ccw0 {
> > +	__u8 cmd_code;
> > +	__u32 cda : 24;
> > +	__u8  flags;
> > +	__u8  reserved;
> > +	__u16 count;
> > +} __packed __aligned(8);
> > +#endif
> 
> Would it be better to move this definition next to the struct ccw1
> definition? Even if the kernel does not build format-0 ccws, it makes
> sense from a logical POV.
> 
Ok.

> > +
> >  /*
> >   * Max length for ccw chain.
> >   * XXX: Limit to 256, need to check more?
> > @@ -243,12 +263,42 @@ static long copy_from_iova(struct device *mdev,
> >  	return l;
> >  }
> > 
> > +#ifdef CONFIG_VFIO_CCW_CCW0
> > +static long copy_ccw_from_iova(struct channel_program *cp,
> > +			       struct ccw1 *to, u64 iova,
> > +			       unsigned long len)
> > +{
> > +	struct ccw0 ccw0;
> > +	struct ccw1 *pccw1;
> > +	int ret;
> > +	int i;
> > +
> > +	ret = copy_from_iova(cp->mdev, to, iova, len * sizeof(struct ccw1));
> > +	if (ret)
> > +		return ret;
> > +
> > +	if (!cp->orb.cmd.fmt) {
> > +		pccw1 = to;
> > +		for (i = 0; i < len; i++) {
> > +			ccw0 = *(struct ccw0 *)pccw1;
> > +			pccw1->cmd_code = ccw0.cmd_code;
> > +			pccw1->flags = ccw0.flags;
> > +			pccw1->count = ccw0.count;
> > +			pccw1->cda = ccw0.cda;
> > +			pccw1++;
> > +		}
> 
> IIRC there are one or two very subtle differences between what format-0
> and what format-1 ccws allow (see the ccw interpretation in qemu --
> probably easier than combing through the PoP). We should either check
> this or add a comment why we don't.
> 
Cool! Thanks for pointing out this!

Transfer in Channel, 15-73, PoP:
--------------------8<-----------------------------
The contents of the second half of the format-0 CCW,
bit positions 32-63, are ignored. Similarly, the con-
tents of bit positions 0-3 of the format-0 CCW are
ignored.

Bit positions 0-3 and 8-32 of the format-1 CCW must
contain zeros; otherwise, a program-check condition
is generated.
-------------------->8-----------------------------

I will fix according to the above description.

I also checked the code in qemu. It seems that it is not compliant with
the second paragraph above. Could I send out a separated patch to fix
that?

> > +	}
> > +
> > +	return ret;
> > +}
> > +#else
> >  static long copy_ccw_from_iova(struct channel_program *cp,
> >  			       struct ccw1 *to, u64 iova,
> >  			       unsigned long len)
> >  {
> >  	return copy_from_iova(cp->mdev, to, iova, len * sizeof(struct ccw1));
> >  }
> > +#endif
> 
> This would be nicer without #ifdefs :)
> 
No problem. Since I will remove CONFIG_VFIO_CCW_CCW0, this will
disappear either.

> > 
> >  /*
> >   * Helpers to operate ccwchain.
> > @@ -616,10 +666,14 @@ int cp_init(struct channel_program *cp, struct device *mdev, union orb *orb)
> >  	 * Only support prefetch enable mode now.
> >  	 * Only support 64bit addressing idal.
> >  	 * Only support 4k IDAW.
> > -	 * Only support ccw1.
> >  	 */
> > -	if (!orb->cmd.pfch || !orb->cmd.c64 || orb->cmd.i2k || !orb->cmd.fmt)
> > +	if (!orb->cmd.pfch || !orb->cmd.c64 || orb->cmd.i2k)
> > +		return -EOPNOTSUPP;
> > +
> > +#ifndef CONFIG_VFIO_CCW_CCW0
> > +	if (!orb->cmd.fmt)
> >  		return -EOPNOTSUPP;
> > +#endif
> 
> dito
> 
Ok.

> > 
> >  	INIT_LIST_HEAD(&cp->ccwchain_list);
> >  	memcpy(&cp->orb, orb, sizeof(*orb));
> 
> The "translate format-0 into format-1 ccws" approach is fine, but I
> think you need to double check the subtle differences I mentioned.
Thansk for the comments. Will do!

-- 
Dong Jia

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [Qemu-devel] [PATCH RFC v3 15/15] vfio: ccw: introduce support for ccw0
  2017-02-20 18:59     ` [Qemu-devel] " Cornelia Huck
  (?)
  (?)
@ 2017-02-21  8:58     ` Dong Jia Shi
  2017-02-21 15:47         ` [Qemu-devel] " Cornelia Huck
  -1 siblings, 1 reply; 61+ messages in thread
From: Dong Jia Shi @ 2017-02-21  8:58 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: Dong Jia Shi, kvm, linux-s390, qemu-devel, renxiaof, borntraeger,
	agraf, alex.williamson, pmorel, pasic, wkywang

* Cornelia Huck <cornelia.huck@de.ibm.com> [2017-02-20 19:59:01 +0100]:

> On Fri, 17 Feb 2017 09:29:39 +0100
> Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com> wrote:
> 
> > Although Linux does not use format-0 channel command words (CCW0)
> > these are a non-optional part of the platform spec, and for the sake
> > of platform compliance, and possibly some non-Linux guests, we have
> > to support CCW0.
> > 
> > Making the kernel execute a format 0 channel program is too much hassle
> > because we would need to allocate and use memory which can be addressed
> > by 24 bit physical addresses (because of CCW0.cda). So we implement CCW0
> > support by translating the channel program into an equivalent CCW1
> > program instead.
> > 
> > Signed-off-by: Kai Yue Wang <wkywang@linux.vnet.ibm.com>
> > Signed-off-by: Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com>
> > ---
> >  arch/s390/Kconfig              |  7 +++++
> >  drivers/s390/cio/vfio_ccw_cp.c | 58 ++++++++++++++++++++++++++++++++++++++++--
> >  2 files changed, 63 insertions(+), 2 deletions(-)
> > 
> > diff --git a/arch/s390/Kconfig b/arch/s390/Kconfig
> > index 32008b8..f25d077 100644
> > --- a/arch/s390/Kconfig
> > +++ b/arch/s390/Kconfig
> > @@ -680,6 +680,13 @@ config VFIO_CCW
> >  	  To compile this driver as a module, choose M here: the
> >  	  module will be called vfio_ccw.
> > 
> > +config VFIO_CCW_CCW0
> > +	def_bool n
> > +	prompt "Support for CCW0 translation"
> > +	depends on VFIO_CCW
> > +	help
> > +	  Enable translation for CCW0 programs for VFIO-CCW subchannels.
> 
> Hm... if ccw0 is non-optional for the architecture, why are you making
> it optional then?
> 
Yes, this is an unnecessary move. I overthought to give the user the
freedom to make their own choice. Will remove this.

> > +
> >  endmenu
> > 
> >  menu "Dump support"
> > diff --git a/drivers/s390/cio/vfio_ccw_cp.c b/drivers/s390/cio/vfio_ccw_cp.c
> > index 16bbb54..b0a8bc05 100644
> > --- a/drivers/s390/cio/vfio_ccw_cp.c
> > +++ b/drivers/s390/cio/vfio_ccw_cp.c
> > @@ -15,6 +15,26 @@
> > 
> >  #include "vfio_ccw_cp.h"
> > 
> > +#ifdef CONFIG_VFIO_CCW_CCW0
> > +/**
> > + * struct ccw0 - channel command word
> > + * @cmd_code: command code
> > + * @cda: data address
> > + * @flags: flags, like IDA addressing, etc.
> > + * @reserved: will be ignored
> > + * @count: byte count
> > + *
> > + * The format-0 ccw structure.
> > + */
> > +struct ccw0 {
> > +	__u8 cmd_code;
> > +	__u32 cda : 24;
> > +	__u8  flags;
> > +	__u8  reserved;
> > +	__u16 count;
> > +} __packed __aligned(8);
> > +#endif
> 
> Would it be better to move this definition next to the struct ccw1
> definition? Even if the kernel does not build format-0 ccws, it makes
> sense from a logical POV.
> 
Ok.

> > +
> >  /*
> >   * Max length for ccw chain.
> >   * XXX: Limit to 256, need to check more?
> > @@ -243,12 +263,42 @@ static long copy_from_iova(struct device *mdev,
> >  	return l;
> >  }
> > 
> > +#ifdef CONFIG_VFIO_CCW_CCW0
> > +static long copy_ccw_from_iova(struct channel_program *cp,
> > +			       struct ccw1 *to, u64 iova,
> > +			       unsigned long len)
> > +{
> > +	struct ccw0 ccw0;
> > +	struct ccw1 *pccw1;
> > +	int ret;
> > +	int i;
> > +
> > +	ret = copy_from_iova(cp->mdev, to, iova, len * sizeof(struct ccw1));
> > +	if (ret)
> > +		return ret;
> > +
> > +	if (!cp->orb.cmd.fmt) {
> > +		pccw1 = to;
> > +		for (i = 0; i < len; i++) {
> > +			ccw0 = *(struct ccw0 *)pccw1;
> > +			pccw1->cmd_code = ccw0.cmd_code;
> > +			pccw1->flags = ccw0.flags;
> > +			pccw1->count = ccw0.count;
> > +			pccw1->cda = ccw0.cda;
> > +			pccw1++;
> > +		}
> 
> IIRC there are one or two very subtle differences between what format-0
> and what format-1 ccws allow (see the ccw interpretation in qemu --
> probably easier than combing through the PoP). We should either check
> this or add a comment why we don't.
> 
Cool! Thanks for pointing out this!

Transfer in Channel, 15-73, PoP:
--------------------8<-----------------------------
The contents of the second half of the format-0 CCW,
bit positions 32-63, are ignored. Similarly, the con-
tents of bit positions 0-3 of the format-0 CCW are
ignored.

Bit positions 0-3 and 8-32 of the format-1 CCW must
contain zeros; otherwise, a program-check condition
is generated.
-------------------->8-----------------------------

I will fix according to the above description.

I also checked the code in qemu. It seems that it is not compliant with
the second paragraph above. Could I send out a separated patch to fix
that?

> > +	}
> > +
> > +	return ret;
> > +}
> > +#else
> >  static long copy_ccw_from_iova(struct channel_program *cp,
> >  			       struct ccw1 *to, u64 iova,
> >  			       unsigned long len)
> >  {
> >  	return copy_from_iova(cp->mdev, to, iova, len * sizeof(struct ccw1));
> >  }
> > +#endif
> 
> This would be nicer without #ifdefs :)
> 
No problem. Since I will remove CONFIG_VFIO_CCW_CCW0, this will
disappear either.

> > 
> >  /*
> >   * Helpers to operate ccwchain.
> > @@ -616,10 +666,14 @@ int cp_init(struct channel_program *cp, struct device *mdev, union orb *orb)
> >  	 * Only support prefetch enable mode now.
> >  	 * Only support 64bit addressing idal.
> >  	 * Only support 4k IDAW.
> > -	 * Only support ccw1.
> >  	 */
> > -	if (!orb->cmd.pfch || !orb->cmd.c64 || orb->cmd.i2k || !orb->cmd.fmt)
> > +	if (!orb->cmd.pfch || !orb->cmd.c64 || orb->cmd.i2k)
> > +		return -EOPNOTSUPP;
> > +
> > +#ifndef CONFIG_VFIO_CCW_CCW0
> > +	if (!orb->cmd.fmt)
> >  		return -EOPNOTSUPP;
> > +#endif
> 
> dito
> 
Ok.

> > 
> >  	INIT_LIST_HEAD(&cp->ccwchain_list);
> >  	memcpy(&cp->orb, orb, sizeof(*orb));
> 
> The "translate format-0 into format-1 ccws" approach is fine, but I
> think you need to double check the subtle differences I mentioned.
Thansk for the comments. Will do!

-- 
Dong Jia

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH RFC v3 04/15] vfio: ccw: basic implementation for vfio_ccw driver
  2017-02-21  7:36     ` [Qemu-devel] " Dong Jia Shi
@ 2017-02-21 15:43         ` Cornelia Huck
  2017-02-21  7:43       ` [Qemu-devel] " Dong Jia Shi
  2017-02-21 15:43         ` [Qemu-devel] " Cornelia Huck
  2 siblings, 0 replies; 61+ messages in thread
From: Cornelia Huck @ 2017-02-21 15:43 UTC (permalink / raw)
  To: Dong Jia Shi
  Cc: kvm, linux-s390, qemu-devel, renxiaof, borntraeger, agraf,
	alex.williamson, pmorel, pasic, wkywang

On Tue, 21 Feb 2017 15:36:23 +0800
Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com> wrote:

> * Cornelia Huck <cornelia.huck@de.ibm.com> [2017-02-20 19:31:13 +0100]:
> 
> > On Fri, 17 Feb 2017 09:29:28 +0100
> > Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com> wrote:
> > 
> > > To make vfio support subchannel devices, we need a css driver for
> > > the vfio subchannels. This patch adds a basic vfio-ccw subchannel
> > > driver for this purpose.
> > > 
> > > To enable VFIO for vfio-ccw, enable S390_CCW_IOMMU config option
> > > and configure VFIO as required.
> > > 
> > > Signed-off-by: Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com>
> > > Acked-by: Pierre Morel <pmorel@linux.vnet.ibm.com>
> > > ---
> > >  arch/s390/Kconfig                   |  10 ++
> > >  arch/s390/include/asm/isc.h         |   1 +
> > >  drivers/iommu/Kconfig               |   8 ++
> > >  drivers/s390/cio/Makefile           |   3 +
> > >  drivers/s390/cio/vfio_ccw_drv.c     | 262 ++++++++++++++++++++++++++++++++++++
> > >  drivers/s390/cio/vfio_ccw_private.h |  25 ++++
> > >  6 files changed, 309 insertions(+)
> > >  create mode 100644 drivers/s390/cio/vfio_ccw_drv.c
> > >  create mode 100644 drivers/s390/cio/vfio_ccw_private.h

> > > diff --git a/arch/s390/include/asm/isc.h b/arch/s390/include/asm/isc.h
> > > index 68d7d68..8a0b721 100644
> > > --- a/arch/s390/include/asm/isc.h
> > > +++ b/arch/s390/include/asm/isc.h
> > > @@ -16,6 +16,7 @@
> > >  #define CONSOLE_ISC 1			/* console I/O subchannel */
> > >  #define EADM_SCH_ISC 4			/* EADM subchannels */
> > >  #define CHSC_SCH_ISC 7			/* CHSC subchannels */
> > > +#define VFIO_CCW_ISC IO_SCH_ISC		/* VFIO-CCW I/O subchannels */
> > 
> > This is OK for now, I guess; but do we want to have the isc
> > configurable in the long run? I.e., if a host wants to run its own I/O
> > devices at a different priority than the devices it passes to a guest?
> > 
> I think we can keep this as the default value, and provide a driver
> param to customize the ISC value in the future once we need this. I put
> this on my LATER list, or I do it in next version?

This should a per-device knob, I think. It's ok to default to the
normal I/O subchannel isc.

We probably can add this at a later time when we're clear about the
requirements.

> 
> > >  /* Adapter interrupts. */
> > >  #define QDIO_AIRQ_ISC IO_SCH_ISC	/* I/O subchannel in qdio mode */
> > >  #define PCI_ISC 2			/* PCI I/O subchannels */
> > 
> > (...)
> > 
> > > diff --git a/drivers/s390/cio/vfio_ccw_drv.c b/drivers/s390/cio/vfio_ccw_drv.c
> > > new file mode 100644
> > > index 0000000..b068207
> > > --- /dev/null
> > > +++ b/drivers/s390/cio/vfio_ccw_drv.c
> > > @@ -0,0 +1,262 @@
> > > +/*
> > > + * VFIO based Physical Subchannel device driver
> > > + *
> > > + * Copyright IBM Corp. 2017
> > > + *
> > > + * Author(s): Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com>
> > > + *            Xiao Feng Ren <renxiaof@linux.vnet.ibm.com>
> > > + */
> > > +
> > > +#include <linux/module.h>
> > > +#include <linux/init.h>
> > > +#include <linux/device.h>
> > > +#include <linux/slab.h>
> > > +
> > > +#include <asm/isc.h>
> > > +
> > > +#include "vfio_ccw_private.h"
> > > +
> > > +/*
> > > + * Helpers
> > > + */
> > > +static int vfio_ccw_sch_quiesce(struct subchannel *sch)
> > > +{
> > > +	struct vfio_ccw_private *private = dev_get_drvdata(&sch->dev);
> > > +	DECLARE_COMPLETION_ONSTACK(completion);
> > > +	int iretry, ret = 0;
> > > +
> > > +	spin_lock_irq(sch->lock);
> > > +	if (!sch->schib.pmcw.ena)
> > > +		goto out_unlock;
> > > +	ret = cio_disable_subchannel(sch);
> > > +	if (ret != -EBUSY)
> > > +		goto out_unlock;
> > > +
> > > +	do {
> > > +		iretry = 255;
> > > +
> > > +		ret = cio_cancel_halt_clear(sch, &iretry);
> > > +		while (ret == -EBUSY) {
> > > +			/*
> > > +			 * Flushing all I/O and wait the
> > 
> > "Flush all I/O and wait for..."
> > 
> Ok.
> 
> > > +			 * cancel/halt/clear completion.
> > > +			 */
> > > +			private->completion = &completion;
> > > +			spin_unlock_irq(sch->lock);
> > > +
> > > +			wait_for_completion(&completion);
> > 
> > What happens for cancel? It won't generate an interrupt.
> > 
> Right! How about using:
> 	wait_for_completion_timeout(&completion, 3*HZ);
> 
> (I stole '3*HZ' from ccw_device_kill_io.)

That's likely a good place to steal from :)

> 
> > > +
> > > +			spin_lock_irq(sch->lock);
> > > +			private->completion = NULL;
> > > +			ret = cio_cancel_halt_clear(sch, &iretry);
> > > +		};
> > > +
> > > +		ret = cio_disable_subchannel(sch);
> > > +	} while (ret == -EBUSY);
> > > +
> > > +out_unlock:
> > > +	spin_unlock_irq(sch->lock);
> > > +	return ret;
> > > +}

> > > +static int vfio_ccw_sch_probe(struct subchannel *sch)
> > > +{
> > > +	struct pmcw *pmcw = &sch->schib.pmcw;
> > > +	struct vfio_ccw_private *private;
> > > +	int ret;
> > > +
> > > +	if (pmcw->qf) {
> > > +		dev_warn(&sch->dev, "vfio: ccw: do not support QDIO: %s\n",
> > 
> > s/do/does/
> > 
> Ok.
> 
> > > +			 dev_name(&sch->dev));
> > > +		return -ENOTTY;
> > 
> > Is -ENOTTY the right return code here? -EINVAL?
> > 
> Ok. Think it again. -EINVAL makes more sense. It's like:
> "hey, I know it's an I/O subchannel, but not the kind we support".

The driver core treats -ENODEV/-ENXIO as "driver matched, but rejected
the device". That's probably better, as we can't filter on device types
when binding at the subchannel level.

> 
> > > +	}
> > > +
> > > +	private = kzalloc(sizeof(*private), GFP_KERNEL | GFP_DMA);
> > > +	if (!private)
> > > +		return -ENOMEM;
> > > +	private->sch = sch;
> > > +	dev_set_drvdata(&sch->dev, private);
> > > +
> > > +	spin_lock_irq(sch->lock);
> > > +	sch->isc = VFIO_CCW_ISC;
> > > +	ret = cio_enable_subchannel(sch, (u32)(unsigned long)sch);
> > > +	spin_unlock_irq(sch->lock);
> > > +	if (ret)
> > > +		goto out_free;
> > > +
> > > +	ret = sysfs_create_group(&sch->dev.kobj, &vfio_subchannel_attr_group);
> > > +	if (ret)
> > > +		goto out_disable;
> > > +
> > > +	return 0;
> > > +
> > > +out_disable:
> > > +	cio_disable_subchannel(sch);
> > > +out_free:
> > > +	dev_set_drvdata(&sch->dev, NULL);
> > > +	kfree(private);
> > > +	return ret;
> > > +}
> > > +
> > 
> > (...)
> > 
> > > +/**
> > > + * vfio_ccw_sch_event - process subchannel event
> > > + * @sch: subchannel
> > > + * @process: non-zero if function is called in process context
> > > + *
> > > + * An unspecified event occurred for this subchannel. Adjust data according
> > > + * to the current operational state of the subchannel. Return zero when the
> > > + * event has been handled sufficiently or -EAGAIN when this function should
> > > + * be called again in process context.
> > > + */
> > > +static int vfio_ccw_sch_event(struct subchannel *sch, int process)
> > > +{
> > > +	unsigned long flags;
> > > +
> > > +	spin_lock_irqsave(sch->lock, flags);
> > > +	if (!device_is_registered(&sch->dev))
> > > +		goto out_unlock;
> > > +
> > > +	if (work_pending(&sch->todo_work))
> > > +		goto out_unlock;
> > > +
> > > +	if (cio_update_schib(sch)) {
> > > +		/* Not operational. */
> > > +		css_sched_sch_todo(sch, SCH_TODO_UNREG);
> > > +
> > > +		/*
> > > +		 * TODO:
> > > +		 * Probably we should send the machine check to the guest.
> > 
> > Yes, we should do that later on. Will user space notice that the device
> > is gone? (I think crw injection should be done by user space.)
> > 
> Currently we lack this mechanism. I think there are many todos here. I
> will investigate latter.

Yes. We just need to keep that in mind for later.

> 
> > > +		 */
> > > +		goto out_unlock;
> > > +	}
> > > +
> > > +out_unlock:
> > > +	spin_unlock_irqrestore(sch->lock, flags);
> > > +
> > > +	return 0;
> > > +}

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [Qemu-devel] [PATCH RFC v3 04/15] vfio: ccw: basic implementation for vfio_ccw driver
@ 2017-02-21 15:43         ` Cornelia Huck
  0 siblings, 0 replies; 61+ messages in thread
From: Cornelia Huck @ 2017-02-21 15:43 UTC (permalink / raw)
  To: Dong Jia Shi
  Cc: kvm, linux-s390, qemu-devel, renxiaof, borntraeger, agraf,
	alex.williamson, pmorel, pasic, wkywang

On Tue, 21 Feb 2017 15:36:23 +0800
Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com> wrote:

> * Cornelia Huck <cornelia.huck@de.ibm.com> [2017-02-20 19:31:13 +0100]:
> 
> > On Fri, 17 Feb 2017 09:29:28 +0100
> > Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com> wrote:
> > 
> > > To make vfio support subchannel devices, we need a css driver for
> > > the vfio subchannels. This patch adds a basic vfio-ccw subchannel
> > > driver for this purpose.
> > > 
> > > To enable VFIO for vfio-ccw, enable S390_CCW_IOMMU config option
> > > and configure VFIO as required.
> > > 
> > > Signed-off-by: Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com>
> > > Acked-by: Pierre Morel <pmorel@linux.vnet.ibm.com>
> > > ---
> > >  arch/s390/Kconfig                   |  10 ++
> > >  arch/s390/include/asm/isc.h         |   1 +
> > >  drivers/iommu/Kconfig               |   8 ++
> > >  drivers/s390/cio/Makefile           |   3 +
> > >  drivers/s390/cio/vfio_ccw_drv.c     | 262 ++++++++++++++++++++++++++++++++++++
> > >  drivers/s390/cio/vfio_ccw_private.h |  25 ++++
> > >  6 files changed, 309 insertions(+)
> > >  create mode 100644 drivers/s390/cio/vfio_ccw_drv.c
> > >  create mode 100644 drivers/s390/cio/vfio_ccw_private.h

> > > diff --git a/arch/s390/include/asm/isc.h b/arch/s390/include/asm/isc.h
> > > index 68d7d68..8a0b721 100644
> > > --- a/arch/s390/include/asm/isc.h
> > > +++ b/arch/s390/include/asm/isc.h
> > > @@ -16,6 +16,7 @@
> > >  #define CONSOLE_ISC 1			/* console I/O subchannel */
> > >  #define EADM_SCH_ISC 4			/* EADM subchannels */
> > >  #define CHSC_SCH_ISC 7			/* CHSC subchannels */
> > > +#define VFIO_CCW_ISC IO_SCH_ISC		/* VFIO-CCW I/O subchannels */
> > 
> > This is OK for now, I guess; but do we want to have the isc
> > configurable in the long run? I.e., if a host wants to run its own I/O
> > devices at a different priority than the devices it passes to a guest?
> > 
> I think we can keep this as the default value, and provide a driver
> param to customize the ISC value in the future once we need this. I put
> this on my LATER list, or I do it in next version?

This should a per-device knob, I think. It's ok to default to the
normal I/O subchannel isc.

We probably can add this at a later time when we're clear about the
requirements.

> 
> > >  /* Adapter interrupts. */
> > >  #define QDIO_AIRQ_ISC IO_SCH_ISC	/* I/O subchannel in qdio mode */
> > >  #define PCI_ISC 2			/* PCI I/O subchannels */
> > 
> > (...)
> > 
> > > diff --git a/drivers/s390/cio/vfio_ccw_drv.c b/drivers/s390/cio/vfio_ccw_drv.c
> > > new file mode 100644
> > > index 0000000..b068207
> > > --- /dev/null
> > > +++ b/drivers/s390/cio/vfio_ccw_drv.c
> > > @@ -0,0 +1,262 @@
> > > +/*
> > > + * VFIO based Physical Subchannel device driver
> > > + *
> > > + * Copyright IBM Corp. 2017
> > > + *
> > > + * Author(s): Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com>
> > > + *            Xiao Feng Ren <renxiaof@linux.vnet.ibm.com>
> > > + */
> > > +
> > > +#include <linux/module.h>
> > > +#include <linux/init.h>
> > > +#include <linux/device.h>
> > > +#include <linux/slab.h>
> > > +
> > > +#include <asm/isc.h>
> > > +
> > > +#include "vfio_ccw_private.h"
> > > +
> > > +/*
> > > + * Helpers
> > > + */
> > > +static int vfio_ccw_sch_quiesce(struct subchannel *sch)
> > > +{
> > > +	struct vfio_ccw_private *private = dev_get_drvdata(&sch->dev);
> > > +	DECLARE_COMPLETION_ONSTACK(completion);
> > > +	int iretry, ret = 0;
> > > +
> > > +	spin_lock_irq(sch->lock);
> > > +	if (!sch->schib.pmcw.ena)
> > > +		goto out_unlock;
> > > +	ret = cio_disable_subchannel(sch);
> > > +	if (ret != -EBUSY)
> > > +		goto out_unlock;
> > > +
> > > +	do {
> > > +		iretry = 255;
> > > +
> > > +		ret = cio_cancel_halt_clear(sch, &iretry);
> > > +		while (ret == -EBUSY) {
> > > +			/*
> > > +			 * Flushing all I/O and wait the
> > 
> > "Flush all I/O and wait for..."
> > 
> Ok.
> 
> > > +			 * cancel/halt/clear completion.
> > > +			 */
> > > +			private->completion = &completion;
> > > +			spin_unlock_irq(sch->lock);
> > > +
> > > +			wait_for_completion(&completion);
> > 
> > What happens for cancel? It won't generate an interrupt.
> > 
> Right! How about using:
> 	wait_for_completion_timeout(&completion, 3*HZ);
> 
> (I stole '3*HZ' from ccw_device_kill_io.)

That's likely a good place to steal from :)

> 
> > > +
> > > +			spin_lock_irq(sch->lock);
> > > +			private->completion = NULL;
> > > +			ret = cio_cancel_halt_clear(sch, &iretry);
> > > +		};
> > > +
> > > +		ret = cio_disable_subchannel(sch);
> > > +	} while (ret == -EBUSY);
> > > +
> > > +out_unlock:
> > > +	spin_unlock_irq(sch->lock);
> > > +	return ret;
> > > +}

> > > +static int vfio_ccw_sch_probe(struct subchannel *sch)
> > > +{
> > > +	struct pmcw *pmcw = &sch->schib.pmcw;
> > > +	struct vfio_ccw_private *private;
> > > +	int ret;
> > > +
> > > +	if (pmcw->qf) {
> > > +		dev_warn(&sch->dev, "vfio: ccw: do not support QDIO: %s\n",
> > 
> > s/do/does/
> > 
> Ok.
> 
> > > +			 dev_name(&sch->dev));
> > > +		return -ENOTTY;
> > 
> > Is -ENOTTY the right return code here? -EINVAL?
> > 
> Ok. Think it again. -EINVAL makes more sense. It's like:
> "hey, I know it's an I/O subchannel, but not the kind we support".

The driver core treats -ENODEV/-ENXIO as "driver matched, but rejected
the device". That's probably better, as we can't filter on device types
when binding at the subchannel level.

> 
> > > +	}
> > > +
> > > +	private = kzalloc(sizeof(*private), GFP_KERNEL | GFP_DMA);
> > > +	if (!private)
> > > +		return -ENOMEM;
> > > +	private->sch = sch;
> > > +	dev_set_drvdata(&sch->dev, private);
> > > +
> > > +	spin_lock_irq(sch->lock);
> > > +	sch->isc = VFIO_CCW_ISC;
> > > +	ret = cio_enable_subchannel(sch, (u32)(unsigned long)sch);
> > > +	spin_unlock_irq(sch->lock);
> > > +	if (ret)
> > > +		goto out_free;
> > > +
> > > +	ret = sysfs_create_group(&sch->dev.kobj, &vfio_subchannel_attr_group);
> > > +	if (ret)
> > > +		goto out_disable;
> > > +
> > > +	return 0;
> > > +
> > > +out_disable:
> > > +	cio_disable_subchannel(sch);
> > > +out_free:
> > > +	dev_set_drvdata(&sch->dev, NULL);
> > > +	kfree(private);
> > > +	return ret;
> > > +}
> > > +
> > 
> > (...)
> > 
> > > +/**
> > > + * vfio_ccw_sch_event - process subchannel event
> > > + * @sch: subchannel
> > > + * @process: non-zero if function is called in process context
> > > + *
> > > + * An unspecified event occurred for this subchannel. Adjust data according
> > > + * to the current operational state of the subchannel. Return zero when the
> > > + * event has been handled sufficiently or -EAGAIN when this function should
> > > + * be called again in process context.
> > > + */
> > > +static int vfio_ccw_sch_event(struct subchannel *sch, int process)
> > > +{
> > > +	unsigned long flags;
> > > +
> > > +	spin_lock_irqsave(sch->lock, flags);
> > > +	if (!device_is_registered(&sch->dev))
> > > +		goto out_unlock;
> > > +
> > > +	if (work_pending(&sch->todo_work))
> > > +		goto out_unlock;
> > > +
> > > +	if (cio_update_schib(sch)) {
> > > +		/* Not operational. */
> > > +		css_sched_sch_todo(sch, SCH_TODO_UNREG);
> > > +
> > > +		/*
> > > +		 * TODO:
> > > +		 * Probably we should send the machine check to the guest.
> > 
> > Yes, we should do that later on. Will user space notice that the device
> > is gone? (I think crw injection should be done by user space.)
> > 
> Currently we lack this mechanism. I think there are many todos here. I
> will investigate latter.

Yes. We just need to keep that in mind for later.

> 
> > > +		 */
> > > +		goto out_unlock;
> > > +	}
> > > +
> > > +out_unlock:
> > > +	spin_unlock_irqrestore(sch->lock, flags);
> > > +
> > > +	return 0;
> > > +}

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH RFC v3 15/15] vfio: ccw: introduce support for ccw0
  2017-02-21  8:58     ` [Qemu-devel] " Dong Jia Shi
@ 2017-02-21 15:47         ` Cornelia Huck
  0 siblings, 0 replies; 61+ messages in thread
From: Cornelia Huck @ 2017-02-21 15:47 UTC (permalink / raw)
  To: Dong Jia Shi
  Cc: kvm, linux-s390, qemu-devel, renxiaof, borntraeger, agraf,
	alex.williamson, pmorel, pasic, wkywang

On Tue, 21 Feb 2017 16:58:24 +0800
Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com> wrote:

> * Cornelia Huck <cornelia.huck@de.ibm.com> [2017-02-20 19:59:01 +0100]:
> 
> > On Fri, 17 Feb 2017 09:29:39 +0100
> > Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com> wrote:
> > 
> > > Although Linux does not use format-0 channel command words (CCW0)
> > > these are a non-optional part of the platform spec, and for the sake
> > > of platform compliance, and possibly some non-Linux guests, we have
> > > to support CCW0.
> > > 
> > > Making the kernel execute a format 0 channel program is too much hassle
> > > because we would need to allocate and use memory which can be addressed
> > > by 24 bit physical addresses (because of CCW0.cda). So we implement CCW0
> > > support by translating the channel program into an equivalent CCW1
> > > program instead.
> > > 
> > > Signed-off-by: Kai Yue Wang <wkywang@linux.vnet.ibm.com>
> > > Signed-off-by: Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com>
> > > ---
> > >  arch/s390/Kconfig              |  7 +++++
> > >  drivers/s390/cio/vfio_ccw_cp.c | 58 ++++++++++++++++++++++++++++++++++++++++--
> > >  2 files changed, 63 insertions(+), 2 deletions(-)

> > > +#ifdef CONFIG_VFIO_CCW_CCW0
> > > +static long copy_ccw_from_iova(struct channel_program *cp,
> > > +			       struct ccw1 *to, u64 iova,
> > > +			       unsigned long len)
> > > +{
> > > +	struct ccw0 ccw0;
> > > +	struct ccw1 *pccw1;
> > > +	int ret;
> > > +	int i;
> > > +
> > > +	ret = copy_from_iova(cp->mdev, to, iova, len * sizeof(struct ccw1));
> > > +	if (ret)
> > > +		return ret;
> > > +
> > > +	if (!cp->orb.cmd.fmt) {
> > > +		pccw1 = to;
> > > +		for (i = 0; i < len; i++) {
> > > +			ccw0 = *(struct ccw0 *)pccw1;
> > > +			pccw1->cmd_code = ccw0.cmd_code;
> > > +			pccw1->flags = ccw0.flags;
> > > +			pccw1->count = ccw0.count;
> > > +			pccw1->cda = ccw0.cda;
> > > +			pccw1++;
> > > +		}
> > 
> > IIRC there are one or two very subtle differences between what format-0
> > and what format-1 ccws allow (see the ccw interpretation in qemu --
> > probably easier than combing through the PoP). We should either check
> > this or add a comment why we don't.
> > 
> Cool! Thanks for pointing out this!
> 
> Transfer in Channel, 15-73, PoP:
> --------------------8<-----------------------------
> The contents of the second half of the format-0 CCW,
> bit positions 32-63, are ignored. Similarly, the con-
> tents of bit positions 0-3 of the format-0 CCW are
> ignored.
> 
> Bit positions 0-3 and 8-32 of the format-1 CCW must
> contain zeros; otherwise, a program-check condition
> is generated.
> -------------------->8-----------------------------
> 
> I will fix according to the above description.

Yes. It's a bit unfortunate that we have to inspect the ccws, but it
probably can't be helped if we want to be architecture compliant.

> 
> I also checked the code in qemu. It seems that it is not compliant with
> the second paragraph above. Could I send out a separated patch to fix
> that?

Please do, bugfixes are welcome :)

> 
> > > +	}
> > > +
> > > +	return ret;
> > > +}

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [Qemu-devel] [PATCH RFC v3 15/15] vfio: ccw: introduce support for ccw0
@ 2017-02-21 15:47         ` Cornelia Huck
  0 siblings, 0 replies; 61+ messages in thread
From: Cornelia Huck @ 2017-02-21 15:47 UTC (permalink / raw)
  To: Dong Jia Shi
  Cc: kvm, linux-s390, qemu-devel, renxiaof, borntraeger, agraf,
	alex.williamson, pmorel, pasic, wkywang

On Tue, 21 Feb 2017 16:58:24 +0800
Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com> wrote:

> * Cornelia Huck <cornelia.huck@de.ibm.com> [2017-02-20 19:59:01 +0100]:
> 
> > On Fri, 17 Feb 2017 09:29:39 +0100
> > Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com> wrote:
> > 
> > > Although Linux does not use format-0 channel command words (CCW0)
> > > these are a non-optional part of the platform spec, and for the sake
> > > of platform compliance, and possibly some non-Linux guests, we have
> > > to support CCW0.
> > > 
> > > Making the kernel execute a format 0 channel program is too much hassle
> > > because we would need to allocate and use memory which can be addressed
> > > by 24 bit physical addresses (because of CCW0.cda). So we implement CCW0
> > > support by translating the channel program into an equivalent CCW1
> > > program instead.
> > > 
> > > Signed-off-by: Kai Yue Wang <wkywang@linux.vnet.ibm.com>
> > > Signed-off-by: Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com>
> > > ---
> > >  arch/s390/Kconfig              |  7 +++++
> > >  drivers/s390/cio/vfio_ccw_cp.c | 58 ++++++++++++++++++++++++++++++++++++++++--
> > >  2 files changed, 63 insertions(+), 2 deletions(-)

> > > +#ifdef CONFIG_VFIO_CCW_CCW0
> > > +static long copy_ccw_from_iova(struct channel_program *cp,
> > > +			       struct ccw1 *to, u64 iova,
> > > +			       unsigned long len)
> > > +{
> > > +	struct ccw0 ccw0;
> > > +	struct ccw1 *pccw1;
> > > +	int ret;
> > > +	int i;
> > > +
> > > +	ret = copy_from_iova(cp->mdev, to, iova, len * sizeof(struct ccw1));
> > > +	if (ret)
> > > +		return ret;
> > > +
> > > +	if (!cp->orb.cmd.fmt) {
> > > +		pccw1 = to;
> > > +		for (i = 0; i < len; i++) {
> > > +			ccw0 = *(struct ccw0 *)pccw1;
> > > +			pccw1->cmd_code = ccw0.cmd_code;
> > > +			pccw1->flags = ccw0.flags;
> > > +			pccw1->count = ccw0.count;
> > > +			pccw1->cda = ccw0.cda;
> > > +			pccw1++;
> > > +		}
> > 
> > IIRC there are one or two very subtle differences between what format-0
> > and what format-1 ccws allow (see the ccw interpretation in qemu --
> > probably easier than combing through the PoP). We should either check
> > this or add a comment why we don't.
> > 
> Cool! Thanks for pointing out this!
> 
> Transfer in Channel, 15-73, PoP:
> --------------------8<-----------------------------
> The contents of the second half of the format-0 CCW,
> bit positions 32-63, are ignored. Similarly, the con-
> tents of bit positions 0-3 of the format-0 CCW are
> ignored.
> 
> Bit positions 0-3 and 8-32 of the format-1 CCW must
> contain zeros; otherwise, a program-check condition
> is generated.
> -------------------->8-----------------------------
> 
> I will fix according to the above description.

Yes. It's a bit unfortunate that we have to inspect the ccws, but it
probably can't be helped if we want to be architecture compliant.

> 
> I also checked the code in qemu. It seems that it is not compliant with
> the second paragraph above. Could I send out a separated patch to fix
> that?

Please do, bugfixes are welcome :)

> 
> > > +	}
> > > +
> > > +	return ret;
> > > +}

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH RFC v3 04/15] vfio: ccw: basic implementation for vfio_ccw driver
  2017-02-21 15:43         ` [Qemu-devel] " Cornelia Huck
  (?)
  (?)
@ 2017-02-22  3:05         ` Dong Jia Shi
  -1 siblings, 0 replies; 61+ messages in thread
From: Dong Jia Shi @ 2017-02-22  3:05 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: linux-s390, pasic, kvm, pmorel, qemu-devel, borntraeger,
	alex.williamson, renxiaof, wkywang, Dong Jia Shi, agraf

* Cornelia Huck <cornelia.huck@de.ibm.com> [2017-02-21 16:43:07 +0100]:

> On Tue, 21 Feb 2017 15:36:23 +0800
> Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com> wrote:
> 
> > * Cornelia Huck <cornelia.huck@de.ibm.com> [2017-02-20 19:31:13 +0100]:
> > 
> > > On Fri, 17 Feb 2017 09:29:28 +0100
> > > Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com> wrote:
> > > 
> > > > To make vfio support subchannel devices, we need a css driver for
> > > > the vfio subchannels. This patch adds a basic vfio-ccw subchannel
> > > > driver for this purpose.
> > > > 
> > > > To enable VFIO for vfio-ccw, enable S390_CCW_IOMMU config option
> > > > and configure VFIO as required.
> > > > 
> > > > Signed-off-by: Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com>
> > > > Acked-by: Pierre Morel <pmorel@linux.vnet.ibm.com>
> > > > ---
> > > >  arch/s390/Kconfig                   |  10 ++
> > > >  arch/s390/include/asm/isc.h         |   1 +
> > > >  drivers/iommu/Kconfig               |   8 ++
> > > >  drivers/s390/cio/Makefile           |   3 +
> > > >  drivers/s390/cio/vfio_ccw_drv.c     | 262 ++++++++++++++++++++++++++++++++++++
> > > >  drivers/s390/cio/vfio_ccw_private.h |  25 ++++
> > > >  6 files changed, 309 insertions(+)
> > > >  create mode 100644 drivers/s390/cio/vfio_ccw_drv.c
> > > >  create mode 100644 drivers/s390/cio/vfio_ccw_private.h
> 
> > > > diff --git a/arch/s390/include/asm/isc.h b/arch/s390/include/asm/isc.h
> > > > index 68d7d68..8a0b721 100644
> > > > --- a/arch/s390/include/asm/isc.h
> > > > +++ b/arch/s390/include/asm/isc.h
> > > > @@ -16,6 +16,7 @@
> > > >  #define CONSOLE_ISC 1			/* console I/O subchannel */
> > > >  #define EADM_SCH_ISC 4			/* EADM subchannels */
> > > >  #define CHSC_SCH_ISC 7			/* CHSC subchannels */
> > > > +#define VFIO_CCW_ISC IO_SCH_ISC		/* VFIO-CCW I/O subchannels */
> > > 
> > > This is OK for now, I guess; but do we want to have the isc
> > > configurable in the long run? I.e., if a host wants to run its own I/O
> > > devices at a different priority than the devices it passes to a guest?
> > > 
> > I think we can keep this as the default value, and provide a driver
> > param to customize the ISC value in the future once we need this. I put
> > this on my LATER list, or I do it in next version?
> 
> This should a per-device knob, I think. It's ok to default to the
> normal I/O subchannel isc.
> 
> We probably can add this at a later time when we're clear about the
> requirements.
> 
Ok.

> > 
> > > >  /* Adapter interrupts. */
> > > >  #define QDIO_AIRQ_ISC IO_SCH_ISC	/* I/O subchannel in qdio mode */
> > > >  #define PCI_ISC 2			/* PCI I/O subchannels */
> > > 
> > > (...)
> > > 
> > > > diff --git a/drivers/s390/cio/vfio_ccw_drv.c b/drivers/s390/cio/vfio_ccw_drv.c
> > > > new file mode 100644
> > > > index 0000000..b068207
> > > > --- /dev/null
> > > > +++ b/drivers/s390/cio/vfio_ccw_drv.c
> > > > @@ -0,0 +1,262 @@
> > > > +/*
> > > > + * VFIO based Physical Subchannel device driver
> > > > + *
> > > > + * Copyright IBM Corp. 2017
> > > > + *
> > > > + * Author(s): Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com>
> > > > + *            Xiao Feng Ren <renxiaof@linux.vnet.ibm.com>
> > > > + */
> > > > +
> > > > +#include <linux/module.h>
> > > > +#include <linux/init.h>
> > > > +#include <linux/device.h>
> > > > +#include <linux/slab.h>
> > > > +
> > > > +#include <asm/isc.h>
> > > > +
> > > > +#include "vfio_ccw_private.h"
> > > > +
> > > > +/*
> > > > + * Helpers
> > > > + */
> > > > +static int vfio_ccw_sch_quiesce(struct subchannel *sch)
> > > > +{
> > > > +	struct vfio_ccw_private *private = dev_get_drvdata(&sch->dev);
> > > > +	DECLARE_COMPLETION_ONSTACK(completion);
> > > > +	int iretry, ret = 0;
> > > > +
> > > > +	spin_lock_irq(sch->lock);
> > > > +	if (!sch->schib.pmcw.ena)
> > > > +		goto out_unlock;
> > > > +	ret = cio_disable_subchannel(sch);
> > > > +	if (ret != -EBUSY)
> > > > +		goto out_unlock;
> > > > +
> > > > +	do {
> > > > +		iretry = 255;
> > > > +
> > > > +		ret = cio_cancel_halt_clear(sch, &iretry);
> > > > +		while (ret == -EBUSY) {
> > > > +			/*
> > > > +			 * Flushing all I/O and wait the
> > > 
> > > "Flush all I/O and wait for..."
> > > 
> > Ok.
> > 
> > > > +			 * cancel/halt/clear completion.
> > > > +			 */
> > > > +			private->completion = &completion;
> > > > +			spin_unlock_irq(sch->lock);
> > > > +
> > > > +			wait_for_completion(&completion);
> > > 
> > > What happens for cancel? It won't generate an interrupt.
> > > 
> > Right! How about using:
> > 	wait_for_completion_timeout(&completion, 3*HZ);
> > 
> > (I stole '3*HZ' from ccw_device_kill_io.)
> 
> That's likely a good place to steal from :)
> 
> > 
> > > > +
> > > > +			spin_lock_irq(sch->lock);
> > > > +			private->completion = NULL;
> > > > +			ret = cio_cancel_halt_clear(sch, &iretry);
> > > > +		};
> > > > +
> > > > +		ret = cio_disable_subchannel(sch);
> > > > +	} while (ret == -EBUSY);
> > > > +
> > > > +out_unlock:
> > > > +	spin_unlock_irq(sch->lock);
> > > > +	return ret;
> > > > +}
> 
> > > > +static int vfio_ccw_sch_probe(struct subchannel *sch)
> > > > +{
> > > > +	struct pmcw *pmcw = &sch->schib.pmcw;
> > > > +	struct vfio_ccw_private *private;
> > > > +	int ret;
> > > > +
> > > > +	if (pmcw->qf) {
> > > > +		dev_warn(&sch->dev, "vfio: ccw: do not support QDIO: %s\n",
> > > 
> > > s/do/does/
> > > 
> > Ok.
> > 
> > > > +			 dev_name(&sch->dev));
> > > > +		return -ENOTTY;
> > > 
> > > Is -ENOTTY the right return code here? -EINVAL?
> > > 
> > Ok. Think it again. -EINVAL makes more sense. It's like:
> > "hey, I know it's an I/O subchannel, but not the kind we support".
> 
> The driver core treats -ENODEV/-ENXIO as "driver matched, but rejected
> the device". That's probably better, as we can't filter on device types
> when binding at the subchannel level.
> 
Ok. Will use -ENODEV.

> > 
> > > > +	}
> > > > +
> > > > +	private = kzalloc(sizeof(*private), GFP_KERNEL | GFP_DMA);
> > > > +	if (!private)
> > > > +		return -ENOMEM;
> > > > +	private->sch = sch;
> > > > +	dev_set_drvdata(&sch->dev, private);
> > > > +
> > > > +	spin_lock_irq(sch->lock);
> > > > +	sch->isc = VFIO_CCW_ISC;
> > > > +	ret = cio_enable_subchannel(sch, (u32)(unsigned long)sch);
> > > > +	spin_unlock_irq(sch->lock);
> > > > +	if (ret)
> > > > +		goto out_free;
> > > > +
> > > > +	ret = sysfs_create_group(&sch->dev.kobj, &vfio_subchannel_attr_group);
> > > > +	if (ret)
> > > > +		goto out_disable;
> > > > +
> > > > +	return 0;
> > > > +
> > > > +out_disable:
> > > > +	cio_disable_subchannel(sch);
> > > > +out_free:
> > > > +	dev_set_drvdata(&sch->dev, NULL);
> > > > +	kfree(private);
> > > > +	return ret;
> > > > +}
> > > > +
> > > 
> > > (...)
> > > 
> > > > +/**
> > > > + * vfio_ccw_sch_event - process subchannel event
> > > > + * @sch: subchannel
> > > > + * @process: non-zero if function is called in process context
> > > > + *
> > > > + * An unspecified event occurred for this subchannel. Adjust data according
> > > > + * to the current operational state of the subchannel. Return zero when the
> > > > + * event has been handled sufficiently or -EAGAIN when this function should
> > > > + * be called again in process context.
> > > > + */
> > > > +static int vfio_ccw_sch_event(struct subchannel *sch, int process)
> > > > +{
> > > > +	unsigned long flags;
> > > > +
> > > > +	spin_lock_irqsave(sch->lock, flags);
> > > > +	if (!device_is_registered(&sch->dev))
> > > > +		goto out_unlock;
> > > > +
> > > > +	if (work_pending(&sch->todo_work))
> > > > +		goto out_unlock;
> > > > +
> > > > +	if (cio_update_schib(sch)) {
> > > > +		/* Not operational. */
> > > > +		css_sched_sch_todo(sch, SCH_TODO_UNREG);
> > > > +
> > > > +		/*
> > > > +		 * TODO:
> > > > +		 * Probably we should send the machine check to the guest.
> > > 
> > > Yes, we should do that later on. Will user space notice that the device
> > > is gone? (I think crw injection should be done by user space.)
> > > 
> > Currently we lack this mechanism. I think there are many todos here. I
> > will investigate latter.
> 
> Yes. We just need to keep that in mind for later.
> 
Ok.

> > 
> > > > +		 */
> > > > +		goto out_unlock;
> > > > +	}
> > > > +
> > > > +out_unlock:
> > > > +	spin_unlock_irqrestore(sch->lock, flags);
> > > > +
> > > > +	return 0;
> > > > +}
> 

-- 
Dong Jia

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [Qemu-devel] [PATCH RFC v3 04/15] vfio: ccw: basic implementation for vfio_ccw driver
  2017-02-21 15:43         ` [Qemu-devel] " Cornelia Huck
  (?)
@ 2017-02-22  3:05         ` Dong Jia Shi
  -1 siblings, 0 replies; 61+ messages in thread
From: Dong Jia Shi @ 2017-02-22  3:05 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: Dong Jia Shi, kvm, linux-s390, qemu-devel, renxiaof, borntraeger,
	agraf, alex.williamson, pmorel, pasic, wkywang

* Cornelia Huck <cornelia.huck@de.ibm.com> [2017-02-21 16:43:07 +0100]:

> On Tue, 21 Feb 2017 15:36:23 +0800
> Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com> wrote:
> 
> > * Cornelia Huck <cornelia.huck@de.ibm.com> [2017-02-20 19:31:13 +0100]:
> > 
> > > On Fri, 17 Feb 2017 09:29:28 +0100
> > > Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com> wrote:
> > > 
> > > > To make vfio support subchannel devices, we need a css driver for
> > > > the vfio subchannels. This patch adds a basic vfio-ccw subchannel
> > > > driver for this purpose.
> > > > 
> > > > To enable VFIO for vfio-ccw, enable S390_CCW_IOMMU config option
> > > > and configure VFIO as required.
> > > > 
> > > > Signed-off-by: Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com>
> > > > Acked-by: Pierre Morel <pmorel@linux.vnet.ibm.com>
> > > > ---
> > > >  arch/s390/Kconfig                   |  10 ++
> > > >  arch/s390/include/asm/isc.h         |   1 +
> > > >  drivers/iommu/Kconfig               |   8 ++
> > > >  drivers/s390/cio/Makefile           |   3 +
> > > >  drivers/s390/cio/vfio_ccw_drv.c     | 262 ++++++++++++++++++++++++++++++++++++
> > > >  drivers/s390/cio/vfio_ccw_private.h |  25 ++++
> > > >  6 files changed, 309 insertions(+)
> > > >  create mode 100644 drivers/s390/cio/vfio_ccw_drv.c
> > > >  create mode 100644 drivers/s390/cio/vfio_ccw_private.h
> 
> > > > diff --git a/arch/s390/include/asm/isc.h b/arch/s390/include/asm/isc.h
> > > > index 68d7d68..8a0b721 100644
> > > > --- a/arch/s390/include/asm/isc.h
> > > > +++ b/arch/s390/include/asm/isc.h
> > > > @@ -16,6 +16,7 @@
> > > >  #define CONSOLE_ISC 1			/* console I/O subchannel */
> > > >  #define EADM_SCH_ISC 4			/* EADM subchannels */
> > > >  #define CHSC_SCH_ISC 7			/* CHSC subchannels */
> > > > +#define VFIO_CCW_ISC IO_SCH_ISC		/* VFIO-CCW I/O subchannels */
> > > 
> > > This is OK for now, I guess; but do we want to have the isc
> > > configurable in the long run? I.e., if a host wants to run its own I/O
> > > devices at a different priority than the devices it passes to a guest?
> > > 
> > I think we can keep this as the default value, and provide a driver
> > param to customize the ISC value in the future once we need this. I put
> > this on my LATER list, or I do it in next version?
> 
> This should a per-device knob, I think. It's ok to default to the
> normal I/O subchannel isc.
> 
> We probably can add this at a later time when we're clear about the
> requirements.
> 
Ok.

> > 
> > > >  /* Adapter interrupts. */
> > > >  #define QDIO_AIRQ_ISC IO_SCH_ISC	/* I/O subchannel in qdio mode */
> > > >  #define PCI_ISC 2			/* PCI I/O subchannels */
> > > 
> > > (...)
> > > 
> > > > diff --git a/drivers/s390/cio/vfio_ccw_drv.c b/drivers/s390/cio/vfio_ccw_drv.c
> > > > new file mode 100644
> > > > index 0000000..b068207
> > > > --- /dev/null
> > > > +++ b/drivers/s390/cio/vfio_ccw_drv.c
> > > > @@ -0,0 +1,262 @@
> > > > +/*
> > > > + * VFIO based Physical Subchannel device driver
> > > > + *
> > > > + * Copyright IBM Corp. 2017
> > > > + *
> > > > + * Author(s): Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com>
> > > > + *            Xiao Feng Ren <renxiaof@linux.vnet.ibm.com>
> > > > + */
> > > > +
> > > > +#include <linux/module.h>
> > > > +#include <linux/init.h>
> > > > +#include <linux/device.h>
> > > > +#include <linux/slab.h>
> > > > +
> > > > +#include <asm/isc.h>
> > > > +
> > > > +#include "vfio_ccw_private.h"
> > > > +
> > > > +/*
> > > > + * Helpers
> > > > + */
> > > > +static int vfio_ccw_sch_quiesce(struct subchannel *sch)
> > > > +{
> > > > +	struct vfio_ccw_private *private = dev_get_drvdata(&sch->dev);
> > > > +	DECLARE_COMPLETION_ONSTACK(completion);
> > > > +	int iretry, ret = 0;
> > > > +
> > > > +	spin_lock_irq(sch->lock);
> > > > +	if (!sch->schib.pmcw.ena)
> > > > +		goto out_unlock;
> > > > +	ret = cio_disable_subchannel(sch);
> > > > +	if (ret != -EBUSY)
> > > > +		goto out_unlock;
> > > > +
> > > > +	do {
> > > > +		iretry = 255;
> > > > +
> > > > +		ret = cio_cancel_halt_clear(sch, &iretry);
> > > > +		while (ret == -EBUSY) {
> > > > +			/*
> > > > +			 * Flushing all I/O and wait the
> > > 
> > > "Flush all I/O and wait for..."
> > > 
> > Ok.
> > 
> > > > +			 * cancel/halt/clear completion.
> > > > +			 */
> > > > +			private->completion = &completion;
> > > > +			spin_unlock_irq(sch->lock);
> > > > +
> > > > +			wait_for_completion(&completion);
> > > 
> > > What happens for cancel? It won't generate an interrupt.
> > > 
> > Right! How about using:
> > 	wait_for_completion_timeout(&completion, 3*HZ);
> > 
> > (I stole '3*HZ' from ccw_device_kill_io.)
> 
> That's likely a good place to steal from :)
> 
> > 
> > > > +
> > > > +			spin_lock_irq(sch->lock);
> > > > +			private->completion = NULL;
> > > > +			ret = cio_cancel_halt_clear(sch, &iretry);
> > > > +		};
> > > > +
> > > > +		ret = cio_disable_subchannel(sch);
> > > > +	} while (ret == -EBUSY);
> > > > +
> > > > +out_unlock:
> > > > +	spin_unlock_irq(sch->lock);
> > > > +	return ret;
> > > > +}
> 
> > > > +static int vfio_ccw_sch_probe(struct subchannel *sch)
> > > > +{
> > > > +	struct pmcw *pmcw = &sch->schib.pmcw;
> > > > +	struct vfio_ccw_private *private;
> > > > +	int ret;
> > > > +
> > > > +	if (pmcw->qf) {
> > > > +		dev_warn(&sch->dev, "vfio: ccw: do not support QDIO: %s\n",
> > > 
> > > s/do/does/
> > > 
> > Ok.
> > 
> > > > +			 dev_name(&sch->dev));
> > > > +		return -ENOTTY;
> > > 
> > > Is -ENOTTY the right return code here? -EINVAL?
> > > 
> > Ok. Think it again. -EINVAL makes more sense. It's like:
> > "hey, I know it's an I/O subchannel, but not the kind we support".
> 
> The driver core treats -ENODEV/-ENXIO as "driver matched, but rejected
> the device". That's probably better, as we can't filter on device types
> when binding at the subchannel level.
> 
Ok. Will use -ENODEV.

> > 
> > > > +	}
> > > > +
> > > > +	private = kzalloc(sizeof(*private), GFP_KERNEL | GFP_DMA);
> > > > +	if (!private)
> > > > +		return -ENOMEM;
> > > > +	private->sch = sch;
> > > > +	dev_set_drvdata(&sch->dev, private);
> > > > +
> > > > +	spin_lock_irq(sch->lock);
> > > > +	sch->isc = VFIO_CCW_ISC;
> > > > +	ret = cio_enable_subchannel(sch, (u32)(unsigned long)sch);
> > > > +	spin_unlock_irq(sch->lock);
> > > > +	if (ret)
> > > > +		goto out_free;
> > > > +
> > > > +	ret = sysfs_create_group(&sch->dev.kobj, &vfio_subchannel_attr_group);
> > > > +	if (ret)
> > > > +		goto out_disable;
> > > > +
> > > > +	return 0;
> > > > +
> > > > +out_disable:
> > > > +	cio_disable_subchannel(sch);
> > > > +out_free:
> > > > +	dev_set_drvdata(&sch->dev, NULL);
> > > > +	kfree(private);
> > > > +	return ret;
> > > > +}
> > > > +
> > > 
> > > (...)
> > > 
> > > > +/**
> > > > + * vfio_ccw_sch_event - process subchannel event
> > > > + * @sch: subchannel
> > > > + * @process: non-zero if function is called in process context
> > > > + *
> > > > + * An unspecified event occurred for this subchannel. Adjust data according
> > > > + * to the current operational state of the subchannel. Return zero when the
> > > > + * event has been handled sufficiently or -EAGAIN when this function should
> > > > + * be called again in process context.
> > > > + */
> > > > +static int vfio_ccw_sch_event(struct subchannel *sch, int process)
> > > > +{
> > > > +	unsigned long flags;
> > > > +
> > > > +	spin_lock_irqsave(sch->lock, flags);
> > > > +	if (!device_is_registered(&sch->dev))
> > > > +		goto out_unlock;
> > > > +
> > > > +	if (work_pending(&sch->todo_work))
> > > > +		goto out_unlock;
> > > > +
> > > > +	if (cio_update_schib(sch)) {
> > > > +		/* Not operational. */
> > > > +		css_sched_sch_todo(sch, SCH_TODO_UNREG);
> > > > +
> > > > +		/*
> > > > +		 * TODO:
> > > > +		 * Probably we should send the machine check to the guest.
> > > 
> > > Yes, we should do that later on. Will user space notice that the device
> > > is gone? (I think crw injection should be done by user space.)
> > > 
> > Currently we lack this mechanism. I think there are many todos here. I
> > will investigate latter.
> 
> Yes. We just need to keep that in mind for later.
> 
Ok.

> > 
> > > > +		 */
> > > > +		goto out_unlock;
> > > > +	}
> > > > +
> > > > +out_unlock:
> > > > +	spin_unlock_irqrestore(sch->lock, flags);
> > > > +
> > > > +	return 0;
> > > > +}
> 

-- 
Dong Jia

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH RFC v3 07/15] vfio: ccw: introduce ccw_io_region
  2017-02-17  8:29   ` [Qemu-devel] " Dong Jia Shi
@ 2017-02-24 23:13     ` Alex Williamson
  -1 siblings, 0 replies; 61+ messages in thread
From: Alex Williamson @ 2017-02-24 23:13 UTC (permalink / raw)
  To: Dong Jia Shi
  Cc: kvm, linux-s390, qemu-devel, renxiaof, cornelia.huck,
	borntraeger, agraf, pmorel, pasic, wkywang

On Fri, 17 Feb 2017 09:29:31 +0100
Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com> wrote:

> To provide user-space a set of interfaces to:
> 1. pass in a ccw program to perform an I/O operation.
> 2. read back I/O results of the completed I/O operations.
> We introduce an MMIO region for the vfio-ccw device here.
> 
> This region is defined to content:
> 1. areas to store arguments that an ssch required.
> 2. areas to store the I/O results.
> 
> Using pwrite/pread to the device on this region, a user-space program
> could write/read data to/from the vfio-ccw device.
> 
> Signed-off-by: Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com>
> Reviewed-by: Pierre Morel <pmorel@linux.vnet.ibm.com>
> ---
>  arch/s390/include/uapi/asm/Kbuild   |  1 +
>  drivers/s390/cio/vfio_ccw_ops.c     | 47 +++++++++++++++++++++++++++++++++++++
>  drivers/s390/cio/vfio_ccw_private.h |  4 ++++
>  include/uapi/linux/vfio_ccw.h       | 24 +++++++++++++++++++
>  4 files changed, 76 insertions(+)
>  create mode 100644 include/uapi/linux/vfio_ccw.h
> 
> diff --git a/arch/s390/include/uapi/asm/Kbuild b/arch/s390/include/uapi/asm/Kbuild
> index bf736e7..fdb9529 100644
> --- a/arch/s390/include/uapi/asm/Kbuild
> +++ b/arch/s390/include/uapi/asm/Kbuild
> @@ -52,3 +52,4 @@ header-y += unistd.h
>  header-y += virtio-ccw.h
>  header-y += vtoc.h
>  header-y += zcrypt.h
> +header-y += vfio_ccw.h
> diff --git a/drivers/s390/cio/vfio_ccw_ops.c b/drivers/s390/cio/vfio_ccw_ops.c
> index b8a2fed..6c06805 100644
> --- a/drivers/s390/cio/vfio_ccw_ops.c
> +++ b/drivers/s390/cio/vfio_ccw_ops.c
> @@ -127,6 +127,51 @@ void vfio_ccw_mdev_release(struct mdev_device *mdev)
>  				 &private->nb);
>  }
>  
> +static ssize_t vfio_ccw_mdev_read(struct mdev_device *mdev,
> +				  char __user *buf,
> +				  size_t count,
> +				  loff_t *ppos)
> +{
> +	struct vfio_ccw_private *private;
> +	struct ccw_io_region *region;
> +
> +	if (*ppos + count > sizeof(*region))
> +		return -EINVAL;

Couldn't this wrap-around and still satisfy this test?

> +
> +	private = dev_get_drvdata(mdev_parent_dev(mdev));
> +	if (!private)
> +		return -ENODEV;
> +
> +	region = &private->io_region;
> +	if (copy_to_user(buf, (void *)region + *ppos, count))
> +		return -EFAULT;
> +
> +	return count;
> +}
> +
> +static ssize_t vfio_ccw_mdev_write(struct mdev_device *mdev,
> +				   const char __user *buf,
> +				   size_t count,
> +				   loff_t *ppos)
> +{
> +	struct vfio_ccw_private *private;
> +	struct ccw_io_region *region;
> +
> +	if (*ppos + count > sizeof(*region))
> +		return -EINVAL;

Same here

> +
> +	private = dev_get_drvdata(mdev_parent_dev(mdev));
> +	if (!private)
> +		return -ENODEV;
> +
> +	region = &private->io_region;
> +	if (copy_from_user((void *)region + *ppos, buf, count))
> +		return -EFAULT;
> +	region->ret_code = 0;
> +
> +	return count;
> +}
> +
>  static const struct mdev_parent_ops vfio_ccw_mdev_ops = {
>  	.owner			= THIS_MODULE,
>  	.supported_type_groups  = mdev_type_groups,
> @@ -134,6 +179,8 @@ static const struct mdev_parent_ops vfio_ccw_mdev_ops = {
>  	.remove			= vfio_ccw_mdev_remove,
>  	.open			= vfio_ccw_mdev_open,
>  	.release		= vfio_ccw_mdev_release,
> +	.read			= vfio_ccw_mdev_read,
> +	.write			= vfio_ccw_mdev_write,
>  };
>  
>  int vfio_ccw_mdev_reg(struct subchannel *sch)
> diff --git a/drivers/s390/cio/vfio_ccw_private.h b/drivers/s390/cio/vfio_ccw_private.h
> index 5afb3ba..359e96b 100644
> --- a/drivers/s390/cio/vfio_ccw_private.h
> +++ b/drivers/s390/cio/vfio_ccw_private.h
> @@ -10,6 +10,8 @@
>  #ifndef _VFIO_CCW_PRIVATE_H_
>  #define _VFIO_CCW_PRIVATE_H_
>  
> +#include <linux/vfio_ccw.h>
> +
>  #include "css.h"
>  
>  /**
> @@ -19,6 +21,7 @@
>   * @avail: available for creating a mediated device
>   * @mdev: pointer to the mediated device
>   * @nb: notifier for vfio events
> + * @io_region: MMIO region to input/output I/O arguments/results
>   */
>  struct vfio_ccw_private {
>  	struct subchannel	*sch;
> @@ -26,6 +29,7 @@ struct vfio_ccw_private {
>  	atomic_t		avail;
>  	struct mdev_device	*mdev;
>  	struct notifier_block	nb;
> +	struct ccw_io_region	io_region;
>  } __aligned(8);
>  
>  extern int vfio_ccw_mdev_reg(struct subchannel *sch);
> diff --git a/include/uapi/linux/vfio_ccw.h b/include/uapi/linux/vfio_ccw.h
> new file mode 100644
> index 0000000..34a7f6f
> --- /dev/null
> +++ b/include/uapi/linux/vfio_ccw.h
> @@ -0,0 +1,24 @@
> +/*
> + * Interfaces for vfio-ccw
> + *
> + * Copyright IBM Corp. 2017
> + *
> + * Author(s): Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com>
> + */
> +
> +#ifndef _VFIO_CCW_H_
> +#define _VFIO_CCW_H_
> +
> +#include <linux/types.h>
> +
> +struct ccw_io_region {
> +#define ORB_AREA_SIZE 12
> +	__u8	orb_area[ORB_AREA_SIZE];
> +#define SCSW_AREA_SIZE 12
> +	__u8	scsw_area[SCSW_AREA_SIZE];
> +#define IRB_AREA_SIZE 96
> +	__u8	irb_area[IRB_AREA_SIZE];
> +	__u32	ret_code;
> +} __packed;
> +
> +#endif

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [Qemu-devel] [PATCH RFC v3 07/15] vfio: ccw: introduce ccw_io_region
@ 2017-02-24 23:13     ` Alex Williamson
  0 siblings, 0 replies; 61+ messages in thread
From: Alex Williamson @ 2017-02-24 23:13 UTC (permalink / raw)
  To: Dong Jia Shi
  Cc: kvm, linux-s390, qemu-devel, renxiaof, cornelia.huck,
	borntraeger, agraf, pmorel, pasic, wkywang

On Fri, 17 Feb 2017 09:29:31 +0100
Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com> wrote:

> To provide user-space a set of interfaces to:
> 1. pass in a ccw program to perform an I/O operation.
> 2. read back I/O results of the completed I/O operations.
> We introduce an MMIO region for the vfio-ccw device here.
> 
> This region is defined to content:
> 1. areas to store arguments that an ssch required.
> 2. areas to store the I/O results.
> 
> Using pwrite/pread to the device on this region, a user-space program
> could write/read data to/from the vfio-ccw device.
> 
> Signed-off-by: Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com>
> Reviewed-by: Pierre Morel <pmorel@linux.vnet.ibm.com>
> ---
>  arch/s390/include/uapi/asm/Kbuild   |  1 +
>  drivers/s390/cio/vfio_ccw_ops.c     | 47 +++++++++++++++++++++++++++++++++++++
>  drivers/s390/cio/vfio_ccw_private.h |  4 ++++
>  include/uapi/linux/vfio_ccw.h       | 24 +++++++++++++++++++
>  4 files changed, 76 insertions(+)
>  create mode 100644 include/uapi/linux/vfio_ccw.h
> 
> diff --git a/arch/s390/include/uapi/asm/Kbuild b/arch/s390/include/uapi/asm/Kbuild
> index bf736e7..fdb9529 100644
> --- a/arch/s390/include/uapi/asm/Kbuild
> +++ b/arch/s390/include/uapi/asm/Kbuild
> @@ -52,3 +52,4 @@ header-y += unistd.h
>  header-y += virtio-ccw.h
>  header-y += vtoc.h
>  header-y += zcrypt.h
> +header-y += vfio_ccw.h
> diff --git a/drivers/s390/cio/vfio_ccw_ops.c b/drivers/s390/cio/vfio_ccw_ops.c
> index b8a2fed..6c06805 100644
> --- a/drivers/s390/cio/vfio_ccw_ops.c
> +++ b/drivers/s390/cio/vfio_ccw_ops.c
> @@ -127,6 +127,51 @@ void vfio_ccw_mdev_release(struct mdev_device *mdev)
>  				 &private->nb);
>  }
>  
> +static ssize_t vfio_ccw_mdev_read(struct mdev_device *mdev,
> +				  char __user *buf,
> +				  size_t count,
> +				  loff_t *ppos)
> +{
> +	struct vfio_ccw_private *private;
> +	struct ccw_io_region *region;
> +
> +	if (*ppos + count > sizeof(*region))
> +		return -EINVAL;

Couldn't this wrap-around and still satisfy this test?

> +
> +	private = dev_get_drvdata(mdev_parent_dev(mdev));
> +	if (!private)
> +		return -ENODEV;
> +
> +	region = &private->io_region;
> +	if (copy_to_user(buf, (void *)region + *ppos, count))
> +		return -EFAULT;
> +
> +	return count;
> +}
> +
> +static ssize_t vfio_ccw_mdev_write(struct mdev_device *mdev,
> +				   const char __user *buf,
> +				   size_t count,
> +				   loff_t *ppos)
> +{
> +	struct vfio_ccw_private *private;
> +	struct ccw_io_region *region;
> +
> +	if (*ppos + count > sizeof(*region))
> +		return -EINVAL;

Same here

> +
> +	private = dev_get_drvdata(mdev_parent_dev(mdev));
> +	if (!private)
> +		return -ENODEV;
> +
> +	region = &private->io_region;
> +	if (copy_from_user((void *)region + *ppos, buf, count))
> +		return -EFAULT;
> +	region->ret_code = 0;
> +
> +	return count;
> +}
> +
>  static const struct mdev_parent_ops vfio_ccw_mdev_ops = {
>  	.owner			= THIS_MODULE,
>  	.supported_type_groups  = mdev_type_groups,
> @@ -134,6 +179,8 @@ static const struct mdev_parent_ops vfio_ccw_mdev_ops = {
>  	.remove			= vfio_ccw_mdev_remove,
>  	.open			= vfio_ccw_mdev_open,
>  	.release		= vfio_ccw_mdev_release,
> +	.read			= vfio_ccw_mdev_read,
> +	.write			= vfio_ccw_mdev_write,
>  };
>  
>  int vfio_ccw_mdev_reg(struct subchannel *sch)
> diff --git a/drivers/s390/cio/vfio_ccw_private.h b/drivers/s390/cio/vfio_ccw_private.h
> index 5afb3ba..359e96b 100644
> --- a/drivers/s390/cio/vfio_ccw_private.h
> +++ b/drivers/s390/cio/vfio_ccw_private.h
> @@ -10,6 +10,8 @@
>  #ifndef _VFIO_CCW_PRIVATE_H_
>  #define _VFIO_CCW_PRIVATE_H_
>  
> +#include <linux/vfio_ccw.h>
> +
>  #include "css.h"
>  
>  /**
> @@ -19,6 +21,7 @@
>   * @avail: available for creating a mediated device
>   * @mdev: pointer to the mediated device
>   * @nb: notifier for vfio events
> + * @io_region: MMIO region to input/output I/O arguments/results
>   */
>  struct vfio_ccw_private {
>  	struct subchannel	*sch;
> @@ -26,6 +29,7 @@ struct vfio_ccw_private {
>  	atomic_t		avail;
>  	struct mdev_device	*mdev;
>  	struct notifier_block	nb;
> +	struct ccw_io_region	io_region;
>  } __aligned(8);
>  
>  extern int vfio_ccw_mdev_reg(struct subchannel *sch);
> diff --git a/include/uapi/linux/vfio_ccw.h b/include/uapi/linux/vfio_ccw.h
> new file mode 100644
> index 0000000..34a7f6f
> --- /dev/null
> +++ b/include/uapi/linux/vfio_ccw.h
> @@ -0,0 +1,24 @@
> +/*
> + * Interfaces for vfio-ccw
> + *
> + * Copyright IBM Corp. 2017
> + *
> + * Author(s): Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com>
> + */
> +
> +#ifndef _VFIO_CCW_H_
> +#define _VFIO_CCW_H_
> +
> +#include <linux/types.h>
> +
> +struct ccw_io_region {
> +#define ORB_AREA_SIZE 12
> +	__u8	orb_area[ORB_AREA_SIZE];
> +#define SCSW_AREA_SIZE 12
> +	__u8	scsw_area[SCSW_AREA_SIZE];
> +#define IRB_AREA_SIZE 96
> +	__u8	irb_area[IRB_AREA_SIZE];
> +	__u32	ret_code;
> +} __packed;
> +
> +#endif

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH RFC v3 11/15] vfio: ccw: realize VFIO_DEVICE_G(S)ET_IRQ_INFO ioctls
  2017-02-17  8:29   ` [Qemu-devel] " Dong Jia Shi
@ 2017-02-24 23:27     ` Alex Williamson
  -1 siblings, 0 replies; 61+ messages in thread
From: Alex Williamson @ 2017-02-24 23:27 UTC (permalink / raw)
  To: Dong Jia Shi
  Cc: kvm, linux-s390, qemu-devel, renxiaof, cornelia.huck,
	borntraeger, agraf, pmorel, pasic, wkywang

On Fri, 17 Feb 2017 09:29:35 +0100
Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com> wrote:

> Realize VFIO_DEVICE_GET_IRQ_INFO ioctl to retrieve
> VFIO_CCW_IO_IRQ information.
> 
> Realize VFIO_DEVICE_SET_IRQS ioctl to set an eventfd fd for
> VFIO_CCW_IO_IRQ. Once a write operation to the ccw_io_region
> was performed, trigger a signal on this fd.
> 
> Signed-off-by: Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com>
> Reviewed-by: Pierre Morel <pmorel@linux.vnet.ibm.com>
> ---
>  drivers/s390/cio/vfio_ccw_ops.c     | 123 +++++++++++++++++++++++++++++++++++-
>  drivers/s390/cio/vfio_ccw_private.h |   4 ++
>  include/uapi/linux/vfio.h           |  10 ++-
>  3 files changed, 134 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/s390/cio/vfio_ccw_ops.c b/drivers/s390/cio/vfio_ccw_ops.c
> index 125818c..ebc38fb 100644
> --- a/drivers/s390/cio/vfio_ccw_ops.c
> +++ b/drivers/s390/cio/vfio_ccw_ops.c
> @@ -202,6 +202,9 @@ static ssize_t vfio_ccw_mdev_write(struct mdev_device *mdev,
>  	if (region->ret_code != 0)
>  		return region->ret_code;
>  
> +	if (private->io_trigger)
> +		eventfd_signal(private->io_trigger, 1);
> +
>  	return count;
>  }
>  
> @@ -209,7 +212,7 @@ static int vfio_ccw_mdev_get_device_info(struct vfio_device_info *info)
>  {
>  	info->flags = VFIO_DEVICE_FLAGS_CCW | VFIO_DEVICE_FLAGS_RESET;
>  	info->num_regions = VFIO_CCW_NUM_REGIONS;
> -	info->num_irqs = 0;
> +	info->num_irqs = VFIO_CCW_NUM_IRQS;
>  
>  	return 0;
>  }
> @@ -230,6 +233,83 @@ static int vfio_ccw_mdev_get_region_info(struct vfio_region_info *info,
>  	}
>  }
>  
> +int vfio_ccw_mdev_get_irq_info(struct vfio_irq_info *info)
> +{
> +	if (info->index != VFIO_CCW_IO_IRQ_INDEX)
> +		return -EINVAL;
> +
> +	info->count = 1;
> +	info->flags = VFIO_IRQ_INFO_EVENTFD | VFIO_IRQ_INFO_NORESIZE;

I tend to think that NORESIZE isn't relevant when count is one.  We
don't expose this flag on vfio-pci for INTx where there's only a single
interrupt.  I don't really see how it applies, it's really trying to
indicate if there are multiple sub-indexes within an index, if some are
enabled, we cannot enable others without first disabling them all.
Thus we can't resize the number of enabled sub-indexes.  There really
are no sub-indexes when count is one.  Thanks,

Alex

> +
> +	return 0;
> +}
> +
> +static int vfio_ccw_mdev_set_irqs(struct mdev_device *mdev,
> +				  uint32_t flags,
> +				  void __user *data)
> +{
> +	struct vfio_ccw_private *private;
> +	struct eventfd_ctx **ctx;
> +
> +	if (!(flags & VFIO_IRQ_SET_ACTION_TRIGGER))
> +		return -EINVAL;
> +
> +	private = dev_get_drvdata(mdev_parent_dev(mdev));
> +	if (!private)
> +		return -ENODEV;
> +
> +	ctx = &private->io_trigger;
> +
> +	switch (flags & VFIO_IRQ_SET_DATA_TYPE_MASK) {
> +	case VFIO_IRQ_SET_DATA_NONE:
> +	{
> +		if (*ctx)
> +			eventfd_signal(*ctx, 1);
> +		return 0;
> +	}
> +	case VFIO_IRQ_SET_DATA_BOOL:
> +	{
> +		uint8_t trigger;
> +
> +		if (get_user(trigger, (uint8_t __user *)data))
> +			return -EFAULT;
> +
> +		if (trigger && *ctx)
> +			eventfd_signal(*ctx, 1);
> +		return 0;
> +	}
> +	case VFIO_IRQ_SET_DATA_EVENTFD:
> +	{
> +		int32_t fd;
> +
> +		if (get_user(fd, (int32_t __user *)data))
> +			return -EFAULT;
> +
> +		if (fd == -1) {
> +			if (*ctx)
> +				eventfd_ctx_put(*ctx);
> +			*ctx = NULL;
> +		} else if (fd >= 0) {
> +			struct eventfd_ctx *efdctx;
> +
> +			efdctx = eventfd_ctx_fdget(fd);
> +			if (IS_ERR(efdctx))
> +				return PTR_ERR(efdctx);
> +
> +			if (*ctx)
> +				eventfd_ctx_put(*ctx);
> +
> +			*ctx = efdctx;
> +		} else
> +			return -EINVAL;
> +
> +		return 0;
> +	}
> +	default:
> +		return -EINVAL;
> +	}
> +}
> +
>  static ssize_t vfio_ccw_mdev_ioctl(struct mdev_device *mdev,
>  				   unsigned int cmd,
>  				   unsigned long arg)
> @@ -277,6 +357,47 @@ static ssize_t vfio_ccw_mdev_ioctl(struct mdev_device *mdev,
>  
>  		return copy_to_user((void __user *)arg, &info, minsz);
>  	}
> +	case VFIO_DEVICE_GET_IRQ_INFO:
> +	{
> +		struct vfio_irq_info info;
> +
> +		minsz = offsetofend(struct vfio_irq_info, count);
> +
> +		if (copy_from_user(&info, (void __user *)arg, minsz))
> +			return -EFAULT;
> +
> +		if (info.argsz < minsz || info.index >= VFIO_CCW_NUM_IRQS)
> +			return -EINVAL;
> +
> +		ret = vfio_ccw_mdev_get_irq_info(&info);
> +		if (ret)
> +			return ret;
> +
> +		if (info.count == -1)
> +			return -EINVAL;
> +
> +		return copy_to_user((void __user *)arg, &info, minsz);
> +	}
> +	case VFIO_DEVICE_SET_IRQS:
> +	{
> +		struct vfio_irq_set hdr;
> +		size_t data_size;
> +		void __user *data;
> +
> +		minsz = offsetofend(struct vfio_irq_set, count);
> +
> +		if (copy_from_user(&hdr, (void __user *)arg, minsz))
> +			return -EFAULT;
> +
> +		ret = vfio_set_irqs_validate_and_prepare(&hdr, 1,
> +							 VFIO_CCW_NUM_IRQS,
> +							 &data_size);
> +		if (ret)
> +			return ret;
> +
> +		data = (void __user *)(arg + minsz);
> +		return vfio_ccw_mdev_set_irqs(mdev, hdr.flags, data);
> +	}
>  	case VFIO_DEVICE_RESET:
>  		return vfio_ccw_mdev_reset(mdev);
>  	default:
> diff --git a/drivers/s390/cio/vfio_ccw_private.h b/drivers/s390/cio/vfio_ccw_private.h
> index 79e5337..dddab52 100644
> --- a/drivers/s390/cio/vfio_ccw_private.h
> +++ b/drivers/s390/cio/vfio_ccw_private.h
> @@ -11,6 +11,7 @@
>  #define _VFIO_CCW_PRIVATE_H_
>  
>  #include <linux/completion.h>
> +#include <linux/eventfd.h>
>  #include <linux/vfio_ccw.h>
>  
>  #include "css.h"
> @@ -29,6 +30,7 @@
>   * @cp: channel program for the current I/O operation
>   * @irb: irb info received from interrupt
>   * @scsw: scsw info
> + * @io_trigger: eventfd ctx for signaling userspace I/O results
>   */
>  struct vfio_ccw_private {
>  	struct subchannel	*sch;
> @@ -43,6 +45,8 @@ struct vfio_ccw_private {
>  	struct channel_program	cp;
>  	struct irb		irb;
>  	union scsw		scsw;
> +
> +	struct eventfd_ctx	*io_trigger;
>  } __aligned(8);
>  
>  extern int vfio_ccw_mdev_reg(struct subchannel *sch);
> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> index 3fd70ff..ae46105 100644
> --- a/include/uapi/linux/vfio.h
> +++ b/include/uapi/linux/vfio.h
> @@ -449,8 +449,9 @@ enum {
>  };
>  
>  /*
> - * The vfio-ccw bus driver makes use of the following fixed region.
> - * Unimplemented regions return a size of zero.
> + * The vfio-ccw bus driver makes use of the following fixed region and
> + * IRQ index mapping. Unimplemented regions return a size of zero.
> + * Unimplemented IRQ types return a count of zero.
>   */
>  
>  enum {
> @@ -458,6 +459,11 @@ enum {
>  	VFIO_CCW_NUM_REGIONS
>  };
>  
> +enum {
> +	VFIO_CCW_IO_IRQ_INDEX,
> +	VFIO_CCW_NUM_IRQS
> +};
> +
>  /**
>   * VFIO_DEVICE_GET_PCI_HOT_RESET_INFO - _IORW(VFIO_TYPE, VFIO_BASE + 12,
>   *					      struct vfio_pci_hot_reset_info)

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [Qemu-devel] [PATCH RFC v3 11/15] vfio: ccw: realize VFIO_DEVICE_G(S)ET_IRQ_INFO ioctls
@ 2017-02-24 23:27     ` Alex Williamson
  0 siblings, 0 replies; 61+ messages in thread
From: Alex Williamson @ 2017-02-24 23:27 UTC (permalink / raw)
  To: Dong Jia Shi
  Cc: kvm, linux-s390, qemu-devel, renxiaof, cornelia.huck,
	borntraeger, agraf, pmorel, pasic, wkywang

On Fri, 17 Feb 2017 09:29:35 +0100
Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com> wrote:

> Realize VFIO_DEVICE_GET_IRQ_INFO ioctl to retrieve
> VFIO_CCW_IO_IRQ information.
> 
> Realize VFIO_DEVICE_SET_IRQS ioctl to set an eventfd fd for
> VFIO_CCW_IO_IRQ. Once a write operation to the ccw_io_region
> was performed, trigger a signal on this fd.
> 
> Signed-off-by: Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com>
> Reviewed-by: Pierre Morel <pmorel@linux.vnet.ibm.com>
> ---
>  drivers/s390/cio/vfio_ccw_ops.c     | 123 +++++++++++++++++++++++++++++++++++-
>  drivers/s390/cio/vfio_ccw_private.h |   4 ++
>  include/uapi/linux/vfio.h           |  10 ++-
>  3 files changed, 134 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/s390/cio/vfio_ccw_ops.c b/drivers/s390/cio/vfio_ccw_ops.c
> index 125818c..ebc38fb 100644
> --- a/drivers/s390/cio/vfio_ccw_ops.c
> +++ b/drivers/s390/cio/vfio_ccw_ops.c
> @@ -202,6 +202,9 @@ static ssize_t vfio_ccw_mdev_write(struct mdev_device *mdev,
>  	if (region->ret_code != 0)
>  		return region->ret_code;
>  
> +	if (private->io_trigger)
> +		eventfd_signal(private->io_trigger, 1);
> +
>  	return count;
>  }
>  
> @@ -209,7 +212,7 @@ static int vfio_ccw_mdev_get_device_info(struct vfio_device_info *info)
>  {
>  	info->flags = VFIO_DEVICE_FLAGS_CCW | VFIO_DEVICE_FLAGS_RESET;
>  	info->num_regions = VFIO_CCW_NUM_REGIONS;
> -	info->num_irqs = 0;
> +	info->num_irqs = VFIO_CCW_NUM_IRQS;
>  
>  	return 0;
>  }
> @@ -230,6 +233,83 @@ static int vfio_ccw_mdev_get_region_info(struct vfio_region_info *info,
>  	}
>  }
>  
> +int vfio_ccw_mdev_get_irq_info(struct vfio_irq_info *info)
> +{
> +	if (info->index != VFIO_CCW_IO_IRQ_INDEX)
> +		return -EINVAL;
> +
> +	info->count = 1;
> +	info->flags = VFIO_IRQ_INFO_EVENTFD | VFIO_IRQ_INFO_NORESIZE;

I tend to think that NORESIZE isn't relevant when count is one.  We
don't expose this flag on vfio-pci for INTx where there's only a single
interrupt.  I don't really see how it applies, it's really trying to
indicate if there are multiple sub-indexes within an index, if some are
enabled, we cannot enable others without first disabling them all.
Thus we can't resize the number of enabled sub-indexes.  There really
are no sub-indexes when count is one.  Thanks,

Alex

> +
> +	return 0;
> +}
> +
> +static int vfio_ccw_mdev_set_irqs(struct mdev_device *mdev,
> +				  uint32_t flags,
> +				  void __user *data)
> +{
> +	struct vfio_ccw_private *private;
> +	struct eventfd_ctx **ctx;
> +
> +	if (!(flags & VFIO_IRQ_SET_ACTION_TRIGGER))
> +		return -EINVAL;
> +
> +	private = dev_get_drvdata(mdev_parent_dev(mdev));
> +	if (!private)
> +		return -ENODEV;
> +
> +	ctx = &private->io_trigger;
> +
> +	switch (flags & VFIO_IRQ_SET_DATA_TYPE_MASK) {
> +	case VFIO_IRQ_SET_DATA_NONE:
> +	{
> +		if (*ctx)
> +			eventfd_signal(*ctx, 1);
> +		return 0;
> +	}
> +	case VFIO_IRQ_SET_DATA_BOOL:
> +	{
> +		uint8_t trigger;
> +
> +		if (get_user(trigger, (uint8_t __user *)data))
> +			return -EFAULT;
> +
> +		if (trigger && *ctx)
> +			eventfd_signal(*ctx, 1);
> +		return 0;
> +	}
> +	case VFIO_IRQ_SET_DATA_EVENTFD:
> +	{
> +		int32_t fd;
> +
> +		if (get_user(fd, (int32_t __user *)data))
> +			return -EFAULT;
> +
> +		if (fd == -1) {
> +			if (*ctx)
> +				eventfd_ctx_put(*ctx);
> +			*ctx = NULL;
> +		} else if (fd >= 0) {
> +			struct eventfd_ctx *efdctx;
> +
> +			efdctx = eventfd_ctx_fdget(fd);
> +			if (IS_ERR(efdctx))
> +				return PTR_ERR(efdctx);
> +
> +			if (*ctx)
> +				eventfd_ctx_put(*ctx);
> +
> +			*ctx = efdctx;
> +		} else
> +			return -EINVAL;
> +
> +		return 0;
> +	}
> +	default:
> +		return -EINVAL;
> +	}
> +}
> +
>  static ssize_t vfio_ccw_mdev_ioctl(struct mdev_device *mdev,
>  				   unsigned int cmd,
>  				   unsigned long arg)
> @@ -277,6 +357,47 @@ static ssize_t vfio_ccw_mdev_ioctl(struct mdev_device *mdev,
>  
>  		return copy_to_user((void __user *)arg, &info, minsz);
>  	}
> +	case VFIO_DEVICE_GET_IRQ_INFO:
> +	{
> +		struct vfio_irq_info info;
> +
> +		minsz = offsetofend(struct vfio_irq_info, count);
> +
> +		if (copy_from_user(&info, (void __user *)arg, minsz))
> +			return -EFAULT;
> +
> +		if (info.argsz < minsz || info.index >= VFIO_CCW_NUM_IRQS)
> +			return -EINVAL;
> +
> +		ret = vfio_ccw_mdev_get_irq_info(&info);
> +		if (ret)
> +			return ret;
> +
> +		if (info.count == -1)
> +			return -EINVAL;
> +
> +		return copy_to_user((void __user *)arg, &info, minsz);
> +	}
> +	case VFIO_DEVICE_SET_IRQS:
> +	{
> +		struct vfio_irq_set hdr;
> +		size_t data_size;
> +		void __user *data;
> +
> +		minsz = offsetofend(struct vfio_irq_set, count);
> +
> +		if (copy_from_user(&hdr, (void __user *)arg, minsz))
> +			return -EFAULT;
> +
> +		ret = vfio_set_irqs_validate_and_prepare(&hdr, 1,
> +							 VFIO_CCW_NUM_IRQS,
> +							 &data_size);
> +		if (ret)
> +			return ret;
> +
> +		data = (void __user *)(arg + minsz);
> +		return vfio_ccw_mdev_set_irqs(mdev, hdr.flags, data);
> +	}
>  	case VFIO_DEVICE_RESET:
>  		return vfio_ccw_mdev_reset(mdev);
>  	default:
> diff --git a/drivers/s390/cio/vfio_ccw_private.h b/drivers/s390/cio/vfio_ccw_private.h
> index 79e5337..dddab52 100644
> --- a/drivers/s390/cio/vfio_ccw_private.h
> +++ b/drivers/s390/cio/vfio_ccw_private.h
> @@ -11,6 +11,7 @@
>  #define _VFIO_CCW_PRIVATE_H_
>  
>  #include <linux/completion.h>
> +#include <linux/eventfd.h>
>  #include <linux/vfio_ccw.h>
>  
>  #include "css.h"
> @@ -29,6 +30,7 @@
>   * @cp: channel program for the current I/O operation
>   * @irb: irb info received from interrupt
>   * @scsw: scsw info
> + * @io_trigger: eventfd ctx for signaling userspace I/O results
>   */
>  struct vfio_ccw_private {
>  	struct subchannel	*sch;
> @@ -43,6 +45,8 @@ struct vfio_ccw_private {
>  	struct channel_program	cp;
>  	struct irb		irb;
>  	union scsw		scsw;
> +
> +	struct eventfd_ctx	*io_trigger;
>  } __aligned(8);
>  
>  extern int vfio_ccw_mdev_reg(struct subchannel *sch);
> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> index 3fd70ff..ae46105 100644
> --- a/include/uapi/linux/vfio.h
> +++ b/include/uapi/linux/vfio.h
> @@ -449,8 +449,9 @@ enum {
>  };
>  
>  /*
> - * The vfio-ccw bus driver makes use of the following fixed region.
> - * Unimplemented regions return a size of zero.
> + * The vfio-ccw bus driver makes use of the following fixed region and
> + * IRQ index mapping. Unimplemented regions return a size of zero.
> + * Unimplemented IRQ types return a count of zero.
>   */
>  
>  enum {
> @@ -458,6 +459,11 @@ enum {
>  	VFIO_CCW_NUM_REGIONS
>  };
>  
> +enum {
> +	VFIO_CCW_IO_IRQ_INDEX,
> +	VFIO_CCW_NUM_IRQS
> +};
> +
>  /**
>   * VFIO_DEVICE_GET_PCI_HOT_RESET_INFO - _IORW(VFIO_TYPE, VFIO_BASE + 12,
>   *					      struct vfio_pci_hot_reset_info)

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH RFC v3 11/15] vfio: ccw: realize VFIO_DEVICE_G(S)ET_IRQ_INFO ioctls
  2017-02-24 23:27     ` [Qemu-devel] " Alex Williamson
  (?)
  (?)
@ 2017-02-27  1:44     ` Dong Jia Shi
  -1 siblings, 0 replies; 61+ messages in thread
From: Dong Jia Shi @ 2017-02-27  1:44 UTC (permalink / raw)
  To: Alex Williamson
  Cc: linux-s390, pasic, kvm, pmorel, qemu-devel, borntraeger,
	renxiaof, wkywang, cornelia.huck, Dong Jia Shi, agraf

* Alex Williamson <alex.williamson@redhat.com> [2017-02-24 16:27:24 -0700]:

> On Fri, 17 Feb 2017 09:29:35 +0100
> Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com> wrote:
> 
> > Realize VFIO_DEVICE_GET_IRQ_INFO ioctl to retrieve
> > VFIO_CCW_IO_IRQ information.
> > 
> > Realize VFIO_DEVICE_SET_IRQS ioctl to set an eventfd fd for
> > VFIO_CCW_IO_IRQ. Once a write operation to the ccw_io_region
> > was performed, trigger a signal on this fd.
> > 
> > Signed-off-by: Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com>
> > Reviewed-by: Pierre Morel <pmorel@linux.vnet.ibm.com>
> > ---
> >  drivers/s390/cio/vfio_ccw_ops.c     | 123 +++++++++++++++++++++++++++++++++++-
> >  drivers/s390/cio/vfio_ccw_private.h |   4 ++
> >  include/uapi/linux/vfio.h           |  10 ++-
> >  3 files changed, 134 insertions(+), 3 deletions(-)
> > 
> > diff --git a/drivers/s390/cio/vfio_ccw_ops.c b/drivers/s390/cio/vfio_ccw_ops.c
> > index 125818c..ebc38fb 100644
> > --- a/drivers/s390/cio/vfio_ccw_ops.c
> > +++ b/drivers/s390/cio/vfio_ccw_ops.c
> > @@ -202,6 +202,9 @@ static ssize_t vfio_ccw_mdev_write(struct mdev_device *mdev,
> >  	if (region->ret_code != 0)
> >  		return region->ret_code;
> >  
> > +	if (private->io_trigger)
> > +		eventfd_signal(private->io_trigger, 1);
> > +
> >  	return count;
> >  }
> >  
> > @@ -209,7 +212,7 @@ static int vfio_ccw_mdev_get_device_info(struct vfio_device_info *info)
> >  {
> >  	info->flags = VFIO_DEVICE_FLAGS_CCW | VFIO_DEVICE_FLAGS_RESET;
> >  	info->num_regions = VFIO_CCW_NUM_REGIONS;
> > -	info->num_irqs = 0;
> > +	info->num_irqs = VFIO_CCW_NUM_IRQS;
> >  
> >  	return 0;
> >  }
> > @@ -230,6 +233,83 @@ static int vfio_ccw_mdev_get_region_info(struct vfio_region_info *info,
> >  	}
> >  }
> >  
> > +int vfio_ccw_mdev_get_irq_info(struct vfio_irq_info *info)
> > +{
> > +	if (info->index != VFIO_CCW_IO_IRQ_INDEX)
> > +		return -EINVAL;
> > +
> > +	info->count = 1;
> > +	info->flags = VFIO_IRQ_INFO_EVENTFD | VFIO_IRQ_INFO_NORESIZE;
> 
> I tend to think that NORESIZE isn't relevant when count is one.  We
> don't expose this flag on vfio-pci for INTx where there's only a single
> interrupt.  I don't really see how it applies, it's really trying to
> indicate if there are multiple sub-indexes within an index, if some are
> enabled, we cannot enable others without first disabling them all.
> Thus we can't resize the number of enabled sub-indexes.  There really
> are no sub-indexes when count is one.  Thanks,
Hi Alex,

I read the description of NORESIZE again, and I think you are right.
Since we do not have a subset of indexes here, I will remove this flag.

Thanks!

> 
> Alex
> 
> > +
> > +	return 0;
> > +}
> > +
[...]

-- 
Dong Jia

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [Qemu-devel] [PATCH RFC v3 11/15] vfio: ccw: realize VFIO_DEVICE_G(S)ET_IRQ_INFO ioctls
  2017-02-24 23:27     ` [Qemu-devel] " Alex Williamson
  (?)
@ 2017-02-27  1:44     ` Dong Jia Shi
  -1 siblings, 0 replies; 61+ messages in thread
From: Dong Jia Shi @ 2017-02-27  1:44 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Dong Jia Shi, kvm, linux-s390, qemu-devel, renxiaof,
	cornelia.huck, borntraeger, agraf, pmorel, pasic, wkywang

* Alex Williamson <alex.williamson@redhat.com> [2017-02-24 16:27:24 -0700]:

> On Fri, 17 Feb 2017 09:29:35 +0100
> Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com> wrote:
> 
> > Realize VFIO_DEVICE_GET_IRQ_INFO ioctl to retrieve
> > VFIO_CCW_IO_IRQ information.
> > 
> > Realize VFIO_DEVICE_SET_IRQS ioctl to set an eventfd fd for
> > VFIO_CCW_IO_IRQ. Once a write operation to the ccw_io_region
> > was performed, trigger a signal on this fd.
> > 
> > Signed-off-by: Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com>
> > Reviewed-by: Pierre Morel <pmorel@linux.vnet.ibm.com>
> > ---
> >  drivers/s390/cio/vfio_ccw_ops.c     | 123 +++++++++++++++++++++++++++++++++++-
> >  drivers/s390/cio/vfio_ccw_private.h |   4 ++
> >  include/uapi/linux/vfio.h           |  10 ++-
> >  3 files changed, 134 insertions(+), 3 deletions(-)
> > 
> > diff --git a/drivers/s390/cio/vfio_ccw_ops.c b/drivers/s390/cio/vfio_ccw_ops.c
> > index 125818c..ebc38fb 100644
> > --- a/drivers/s390/cio/vfio_ccw_ops.c
> > +++ b/drivers/s390/cio/vfio_ccw_ops.c
> > @@ -202,6 +202,9 @@ static ssize_t vfio_ccw_mdev_write(struct mdev_device *mdev,
> >  	if (region->ret_code != 0)
> >  		return region->ret_code;
> >  
> > +	if (private->io_trigger)
> > +		eventfd_signal(private->io_trigger, 1);
> > +
> >  	return count;
> >  }
> >  
> > @@ -209,7 +212,7 @@ static int vfio_ccw_mdev_get_device_info(struct vfio_device_info *info)
> >  {
> >  	info->flags = VFIO_DEVICE_FLAGS_CCW | VFIO_DEVICE_FLAGS_RESET;
> >  	info->num_regions = VFIO_CCW_NUM_REGIONS;
> > -	info->num_irqs = 0;
> > +	info->num_irqs = VFIO_CCW_NUM_IRQS;
> >  
> >  	return 0;
> >  }
> > @@ -230,6 +233,83 @@ static int vfio_ccw_mdev_get_region_info(struct vfio_region_info *info,
> >  	}
> >  }
> >  
> > +int vfio_ccw_mdev_get_irq_info(struct vfio_irq_info *info)
> > +{
> > +	if (info->index != VFIO_CCW_IO_IRQ_INDEX)
> > +		return -EINVAL;
> > +
> > +	info->count = 1;
> > +	info->flags = VFIO_IRQ_INFO_EVENTFD | VFIO_IRQ_INFO_NORESIZE;
> 
> I tend to think that NORESIZE isn't relevant when count is one.  We
> don't expose this flag on vfio-pci for INTx where there's only a single
> interrupt.  I don't really see how it applies, it's really trying to
> indicate if there are multiple sub-indexes within an index, if some are
> enabled, we cannot enable others without first disabling them all.
> Thus we can't resize the number of enabled sub-indexes.  There really
> are no sub-indexes when count is one.  Thanks,
Hi Alex,

I read the description of NORESIZE again, and I think you are right.
Since we do not have a subset of indexes here, I will remove this flag.

Thanks!

> 
> Alex
> 
> > +
> > +	return 0;
> > +}
> > +
[...]

-- 
Dong Jia

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [Qemu-devel] [PATCH RFC v3 00/15] basic vfio-ccw infrastructure
  2017-02-17  8:29 ` [Qemu-devel] " Dong Jia Shi
                   ` (15 preceding siblings ...)
  (?)
@ 2017-03-09  9:25 ` Dong Jia Shi
  2017-03-10 10:46     ` [Qemu-devel] " Cornelia Huck
  -1 siblings, 1 reply; 61+ messages in thread
From: Dong Jia Shi @ 2017-03-09  9:25 UTC (permalink / raw)
  To: Dong Jia Shi
  Cc: kvm, linux-s390, qemu-devel, renxiaof, cornelia.huck,
	borntraeger, agraf, alex.williamson, pmorel, pasic, wkywang

* Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com> [2017-02-17 09:29:24 +0100]:

> vfio-ccw: the basic infrastructure
> ==================================
> 
> Introduction
> ------------
> 
> Here we describe the vfio support for I/O subchannel devices for
> Linux/s390. Motivation for vfio-ccw is to passthrough subchannels to a
> virtual machine, while vfio is the means.
> 

Ping.

Any more comment for this version?

Thanks.

[...]

-- 
Dong Jia

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH RFC v3 00/15] basic vfio-ccw infrastructure
  2017-03-09  9:25 ` [Qemu-devel] [PATCH RFC v3 00/15] basic vfio-ccw infrastructure Dong Jia Shi
@ 2017-03-10 10:46     ` Cornelia Huck
  0 siblings, 0 replies; 61+ messages in thread
From: Cornelia Huck @ 2017-03-10 10:46 UTC (permalink / raw)
  To: Dong Jia Shi
  Cc: kvm, linux-s390, qemu-devel, renxiaof, borntraeger, agraf,
	alex.williamson, pmorel, pasic, wkywang

On Thu, 9 Mar 2017 17:25:25 +0800
Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com> wrote:

> * Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com> [2017-02-17 09:29:24 +0100]:
> 
> > vfio-ccw: the basic infrastructure
> > ==================================
> > 
> > Introduction
> > ------------
> > 
> > Here we describe the vfio support for I/O subchannel devices for
> > Linux/s390. Motivation for vfio-ccw is to passthrough subchannels to a
> > virtual machine, while vfio is the means.
> > 
> 
> Ping.
> 
> Any more comment for this version?
> 
> Thanks.
> 
> [...]
> 

I just walked through the patch set again and did not notice anything
else. There's still the area of channel path handling, but I don't see
why it shouldn't be able to fit into the infrastructure you introduce
here. (I'd expect most of the interesting stuff there to be handled in
user space anyway.)

You should update MAINTAINERS, though. I'm not sure the s390 cio folks
are prepared to handle vfio-ccw :) (although I think you should cc:
them on the patches exporting cio interfaces.) It's probably best to
take vfio-ccw through the kvm-s390 tree.

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [Qemu-devel] [PATCH RFC v3 00/15] basic vfio-ccw infrastructure
@ 2017-03-10 10:46     ` Cornelia Huck
  0 siblings, 0 replies; 61+ messages in thread
From: Cornelia Huck @ 2017-03-10 10:46 UTC (permalink / raw)
  To: Dong Jia Shi
  Cc: kvm, linux-s390, qemu-devel, renxiaof, borntraeger, agraf,
	alex.williamson, pmorel, pasic, wkywang

On Thu, 9 Mar 2017 17:25:25 +0800
Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com> wrote:

> * Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com> [2017-02-17 09:29:24 +0100]:
> 
> > vfio-ccw: the basic infrastructure
> > ==================================
> > 
> > Introduction
> > ------------
> > 
> > Here we describe the vfio support for I/O subchannel devices for
> > Linux/s390. Motivation for vfio-ccw is to passthrough subchannels to a
> > virtual machine, while vfio is the means.
> > 
> 
> Ping.
> 
> Any more comment for this version?
> 
> Thanks.
> 
> [...]
> 

I just walked through the patch set again and did not notice anything
else. There's still the area of channel path handling, but I don't see
why it shouldn't be able to fit into the infrastructure you introduce
here. (I'd expect most of the interesting stuff there to be handled in
user space anyway.)

You should update MAINTAINERS, though. I'm not sure the s390 cio folks
are prepared to handle vfio-ccw :) (although I think you should cc:
them on the patches exporting cio interfaces.) It's probably best to
take vfio-ccw through the kvm-s390 tree.

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [Qemu-devel] [PATCH RFC v3 00/15] basic vfio-ccw infrastructure
  2017-03-10 10:46     ` [Qemu-devel] " Cornelia Huck
  (?)
@ 2017-03-13  7:16     ` Dong Jia Shi
  2017-03-16  9:25         ` [Qemu-devel] " Cornelia Huck
  -1 siblings, 1 reply; 61+ messages in thread
From: Dong Jia Shi @ 2017-03-13  7:16 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: Dong Jia Shi, kvm, linux-s390, qemu-devel, renxiaof, borntraeger,
	agraf, alex.williamson, pmorel, pasic, wkywang

* Cornelia Huck <cornelia.huck@de.ibm.com> [2017-03-10 11:46:05 +0100]:

> On Thu, 9 Mar 2017 17:25:25 +0800
> Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com> wrote:
> 
> > * Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com> [2017-02-17 09:29:24 +0100]:
> > 
> > > vfio-ccw: the basic infrastructure
> > > ==================================
> > > 
> > > Introduction
> > > ------------
> > > 
> > > Here we describe the vfio support for I/O subchannel devices for
> > > Linux/s390. Motivation for vfio-ccw is to passthrough subchannels to a
> > > virtual machine, while vfio is the means.
> > > 
> > 
> > Ping.
> > 
> > Any more comment for this version?
> > 
> > Thanks.
> > 
> > [...]
> > 
Hey Conny,

> 
> I just walked through the patch set again and did not notice anything
> else. There's still the area of channel path handling, but I don't see
> why it shouldn't be able to fit into the infrastructure you introduce
> here. (I'd expect most of the interesting stuff there to be handled in
> user space anyway.)
Thanks! I have the same expectation for the channel path handling part.

> 
> You should update MAINTAINERS, though. I'm not sure the s390 cio folks
> are prepared to handle vfio-ccw :) (although I think you should cc:
> them on the patches exporting cio interfaces.) It's probably best to
> take vfio-ccw through the kvm-s390 tree.

Beside the fixes for former comments form you and Alex, I will also do
the following stuff in the next version:
1. Remove the "RFC" tag from the mail subject.
2. Cc the s390 CIO layer maintainers for patch 1 and 2:
    Sebastian Ott <sebott@linux.vnet.ibm.com>
    Peter Oberparleiter <oberpar@linux.vnet.ibm.com>
3. For the new created files, update MAINTAINERS by adding:
S390 VFIO-CCW DRIVER
M:	Cornelia Huck <cornelia.huck@de.ibm.com>
L:	linux-s390@vger.kernel.org
L:	kvm@vger.kernel.org
S:	Supported
F:	drivers/s390/cio/vfio_ccw*
F:	Documentation/s390/vfio-ccw.txt
F:	include/uapi/linux/vfio_ccw.h

-- 
Dong Jia

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH RFC v3 00/15] basic vfio-ccw infrastructure
  2017-03-13  7:16     ` Dong Jia Shi
@ 2017-03-16  9:25         ` Cornelia Huck
  0 siblings, 0 replies; 61+ messages in thread
From: Cornelia Huck @ 2017-03-16  9:25 UTC (permalink / raw)
  To: Dong Jia Shi
  Cc: kvm, linux-s390, qemu-devel, renxiaof, borntraeger, agraf,
	alex.williamson, pmorel, pasic, wkywang

On Mon, 13 Mar 2017 15:16:52 +0800
Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com> wrote:

> Beside the fixes for former comments form you and Alex, I will also do
> the following stuff in the next version:
> 1. Remove the "RFC" tag from the mail subject.
> 2. Cc the s390 CIO layer maintainers for patch 1 and 2:
>     Sebastian Ott <sebott@linux.vnet.ibm.com>
>     Peter Oberparleiter <oberpar@linux.vnet.ibm.com>
> 3. For the new created files, update MAINTAINERS by adding:
> S390 VFIO-CCW DRIVER
> M:	Cornelia Huck <cornelia.huck@de.ibm.com>

Please add

M: Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com>

as well :)

> L:	linux-s390@vger.kernel.org
> L:	kvm@vger.kernel.org
> S:	Supported
> F:	drivers/s390/cio/vfio_ccw*
> F:	Documentation/s390/vfio-ccw.txt
> F:	include/uapi/linux/vfio_ccw.h

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [Qemu-devel] [PATCH RFC v3 00/15] basic vfio-ccw infrastructure
@ 2017-03-16  9:25         ` Cornelia Huck
  0 siblings, 0 replies; 61+ messages in thread
From: Cornelia Huck @ 2017-03-16  9:25 UTC (permalink / raw)
  To: Dong Jia Shi
  Cc: kvm, linux-s390, qemu-devel, renxiaof, borntraeger, agraf,
	alex.williamson, pmorel, pasic, wkywang

On Mon, 13 Mar 2017 15:16:52 +0800
Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com> wrote:

> Beside the fixes for former comments form you and Alex, I will also do
> the following stuff in the next version:
> 1. Remove the "RFC" tag from the mail subject.
> 2. Cc the s390 CIO layer maintainers for patch 1 and 2:
>     Sebastian Ott <sebott@linux.vnet.ibm.com>
>     Peter Oberparleiter <oberpar@linux.vnet.ibm.com>
> 3. For the new created files, update MAINTAINERS by adding:
> S390 VFIO-CCW DRIVER
> M:	Cornelia Huck <cornelia.huck@de.ibm.com>

Please add

M: Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com>

as well :)

> L:	linux-s390@vger.kernel.org
> L:	kvm@vger.kernel.org
> S:	Supported
> F:	drivers/s390/cio/vfio_ccw*
> F:	Documentation/s390/vfio-ccw.txt
> F:	include/uapi/linux/vfio_ccw.h

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [Qemu-devel] [PATCH RFC v3 00/15] basic vfio-ccw infrastructure
  2017-03-16  9:25         ` [Qemu-devel] " Cornelia Huck
  (?)
@ 2017-03-16  9:28         ` Dong Jia Shi
  -1 siblings, 0 replies; 61+ messages in thread
From: Dong Jia Shi @ 2017-03-16  9:28 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: Dong Jia Shi, kvm, linux-s390, qemu-devel, renxiaof, borntraeger,
	agraf, alex.williamson, pmorel, pasic, wkywang

* Cornelia Huck <cornelia.huck@de.ibm.com> [2017-03-16 10:25:16 +0100]:

> On Mon, 13 Mar 2017 15:16:52 +0800
> Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com> wrote:
> 
> > Beside the fixes for former comments form you and Alex, I will also do
> > the following stuff in the next version:
> > 1. Remove the "RFC" tag from the mail subject.
> > 2. Cc the s390 CIO layer maintainers for patch 1 and 2:
> >     Sebastian Ott <sebott@linux.vnet.ibm.com>
> >     Peter Oberparleiter <oberpar@linux.vnet.ibm.com>
> > 3. For the new created files, update MAINTAINERS by adding:
> > S390 VFIO-CCW DRIVER
> > M:	Cornelia Huck <cornelia.huck@de.ibm.com>
> 
> Please add
> 
> M: Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com>
> 
> as well :)
> 
NP. :)

> > L:	linux-s390@vger.kernel.org
> > L:	kvm@vger.kernel.org
> > S:	Supported
> > F:	drivers/s390/cio/vfio_ccw*
> > F:	Documentation/s390/vfio-ccw.txt
> > F:	include/uapi/linux/vfio_ccw.h

-- 
Dong Jia

^ permalink raw reply	[flat|nested] 61+ messages in thread

end of thread, other threads:[~2017-03-16  9:28 UTC | newest]

Thread overview: 61+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-02-17  8:29 [PATCH RFC v3 00/15] basic vfio-ccw infrastructure Dong Jia Shi
2017-02-17  8:29 ` [Qemu-devel] " Dong Jia Shi
2017-02-17  8:29 ` [PATCH RFC v3 01/15] s390: cio: introduce cio_cancel_halt_clear Dong Jia Shi
2017-02-17  8:29   ` [Qemu-devel] " Dong Jia Shi
2017-02-17  8:29 ` [PATCH RFC v3 02/15] s390: cio: export more interfaces Dong Jia Shi
2017-02-17  8:29   ` [Qemu-devel] " Dong Jia Shi
2017-02-17  8:29 ` [PATCH RFC v3 03/15] vfio: ccw: define device_api strings Dong Jia Shi
2017-02-17  8:29   ` [Qemu-devel] " Dong Jia Shi
2017-02-17  8:29 ` [PATCH RFC v3 04/15] vfio: ccw: basic implementation for vfio_ccw driver Dong Jia Shi
2017-02-17  8:29   ` [Qemu-devel] " Dong Jia Shi
2017-02-20 18:31   ` Cornelia Huck
2017-02-20 18:31     ` [Qemu-devel] " Cornelia Huck
2017-02-21  7:36     ` Dong Jia Shi
2017-02-21  7:36     ` [Qemu-devel] " Dong Jia Shi
2017-02-21  7:43       ` Dong Jia Shi
2017-02-21  7:43       ` [Qemu-devel] " Dong Jia Shi
2017-02-21 15:43       ` Cornelia Huck
2017-02-21 15:43         ` [Qemu-devel] " Cornelia Huck
2017-02-22  3:05         ` Dong Jia Shi
2017-02-22  3:05         ` Dong Jia Shi
2017-02-17  8:29 ` [PATCH RFC v3 05/15] vfio: ccw: introduce channel program interfaces Dong Jia Shi
2017-02-17  8:29   ` [Qemu-devel] " Dong Jia Shi
2017-02-17  8:29 ` [PATCH RFC v3 06/15] vfio: ccw: register vfio_ccw to the mediated device framework Dong Jia Shi
2017-02-17  8:29   ` [Qemu-devel] " Dong Jia Shi
2017-02-17  8:29 ` [PATCH RFC v3 07/15] vfio: ccw: introduce ccw_io_region Dong Jia Shi
2017-02-17  8:29   ` [Qemu-devel] " Dong Jia Shi
2017-02-24 23:13   ` Alex Williamson
2017-02-24 23:13     ` [Qemu-devel] " Alex Williamson
2017-02-17  8:29 ` [PATCH RFC v3 08/15] vfio: ccw: handle ccw command request Dong Jia Shi
2017-02-17  8:29   ` [Qemu-devel] " Dong Jia Shi
2017-02-17  8:29 ` [PATCH RFC v3 09/15] vfio: ccw: realize VFIO_DEVICE_GET_REGION_INFO ioctl Dong Jia Shi
2017-02-17  8:29   ` [Qemu-devel] " Dong Jia Shi
2017-02-17  8:29 ` [PATCH RFC v3 10/15] vfio: ccw: realize VFIO_DEVICE_RESET ioctl Dong Jia Shi
2017-02-17  8:29   ` [Qemu-devel] " Dong Jia Shi
2017-02-17  8:29 ` [PATCH RFC v3 11/15] vfio: ccw: realize VFIO_DEVICE_G(S)ET_IRQ_INFO ioctls Dong Jia Shi
2017-02-17  8:29   ` [Qemu-devel] " Dong Jia Shi
2017-02-24 23:27   ` Alex Williamson
2017-02-24 23:27     ` [Qemu-devel] " Alex Williamson
2017-02-27  1:44     ` Dong Jia Shi
2017-02-27  1:44     ` Dong Jia Shi
2017-02-17  8:29 ` [PATCH RFC v3 12/15] vfio: ccw: return I/O results asynchronously Dong Jia Shi
2017-02-17  8:29   ` [Qemu-devel] " Dong Jia Shi
2017-02-17  8:29 ` [PATCH RFC v3 13/15] vfio: ccw: introduce a finite state machine Dong Jia Shi
2017-02-17  8:29   ` [Qemu-devel] " Dong Jia Shi
2017-02-17  8:29 ` [PATCH RFC v3 14/15] docs: add documentation for vfio-ccw Dong Jia Shi
2017-02-17  8:29   ` [Qemu-devel] " Dong Jia Shi
2017-02-17  8:29 ` [PATCH RFC v3 15/15] vfio: ccw: introduce support for ccw0 Dong Jia Shi
2017-02-17  8:29   ` [Qemu-devel] " Dong Jia Shi
2017-02-20 18:59   ` Cornelia Huck
2017-02-20 18:59     ` [Qemu-devel] " Cornelia Huck
2017-02-21  8:58     ` Dong Jia Shi
2017-02-21  8:58     ` [Qemu-devel] " Dong Jia Shi
2017-02-21 15:47       ` Cornelia Huck
2017-02-21 15:47         ` [Qemu-devel] " Cornelia Huck
2017-03-09  9:25 ` [Qemu-devel] [PATCH RFC v3 00/15] basic vfio-ccw infrastructure Dong Jia Shi
2017-03-10 10:46   ` Cornelia Huck
2017-03-10 10:46     ` [Qemu-devel] " Cornelia Huck
2017-03-13  7:16     ` Dong Jia Shi
2017-03-16  9:25       ` Cornelia Huck
2017-03-16  9:25         ` [Qemu-devel] " Cornelia Huck
2017-03-16  9:28         ` Dong Jia Shi

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.