* virtio scsi host draft specification, v3
@ 2011-06-07 13:43 ` Paolo Bonzini
  0 siblings, 0 replies; 91+ messages in thread
From: Paolo Bonzini @ 2011-06-07 13:43 UTC (permalink / raw)
  To: Linux Virtualization, Linux Kernel Mailing List, qemu-devel
  Cc: Rusty Russell, Stefan Hajnoczi, Christoph Hellwig,
	Hannes Reinecke, Michael S. Tsirkin, kvm

Hi all,

after some preliminary discussion on the QEMU mailing list, I present a
draft specification for a virtio-based SCSI host (controller, HBA, you
name it).

The virtio SCSI host is the basis of an alternative storage stack for
KVM. This stack would overcome several limitations of the current
solution, virtio-blk:

1) scalability limitations: virtio-blk-over-PCI puts a strong upper
limit on the number of devices that can be added to a guest. Common
configurations have a limit of ~30 devices. While this can be worked
around by implementing a PCI-to-PCI bridge, or by using multifunction
virtio-blk devices, these solutions either have not been implemented
yet, or introduce management restrictions. On the other hand, the SCSI
architecture is well known for its scalability and virtio-scsi supports
advanced features such as multiqueueing.

2) limited flexibility: virtio-blk does not support all possible storage
scenarios. For example, it does not allow SCSI passthrough or persistent
reservations. In principle, virtio-scsi provides anything that the
underlying SCSI target (be it physical storage, iSCSI or the in-kernel
target) supports.

3) limited extensibility: over time, many features have been added
to virtio-blk. Each such change requires modifications to the virtio
specification, to the guest drivers, and to the device model in the
host. The virtio-scsi spec has been written to follow SAM conventions,
and exposing new features to the guest will only require changes to the
host's SCSI target implementation.


Comments are welcome.

Paolo 

------------------------------- >8 -----------------------------------


Virtio SCSI Host Device Spec
============================

The virtio SCSI host device groups together one or more simple virtual
devices (e.g. disks) and allows communicating with these devices using
the SCSI protocol.  An instance of the device represents a SCSI host
with possibly many buses, targets and LUNs attached.

The virtio SCSI device services two kinds of requests:

- command requests for a logical unit;

- task management functions related to a logical unit, target or
command.

The device is also able to send out notifications about added
and removed logical units.

v1:
    First public version

v2:
    Merged all virtqueues into one, removed separate TARGET fields

v3:
    Added configuration information and reworked descriptor structure.
    Added back multiqueue on Avi's request, while still leaving TARGET
    fields out.  Added dummy event and clarified some aspects of the
    event protocol.  First version sent to a wider audience (linux-kernel
    and virtio lists).

Configuration
-------------

Subsystem Device ID
    TBD

Virtqueues
    0:controlq
    1:eventq
    2..n:request queues

Feature bits
    VIRTIO_SCSI_F_INOUT (0) - Whether a single request can include both
        read-only and write-only data buffers.

Device configuration layout
    struct virtio_scsi_config {
        u32 num_queues;
        u32 event_info_size;
        u32 sense_size;
        u32 cdb_size;
    }

    num_queues is the total number of virtqueues exposed by the
    device.  The driver is free to use only one request queue, or
    it can use more to achieve better performance.

    event_info_size is the maximum number of bytes that the device will
    write into buffers that the driver places in the eventq.  The
    driver should always place buffers at least this large.

    sense_size is the maximum size of the sense data that the device
    will write.  The default value is written by the device and
    will always be 96, but the driver can modify it.

    cdb_size is the maximum size of the CDB that the driver
    will write.  The default value is written by the device and
    will always be 32, but the driver can likewise modify it.
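
The configuration layout above can be sketched as follows; the struct
mirrors virtio_scsi_config, and the helper that fills in the mandated
defaults (96 for sense_size, 32 for cdb_size) is a hypothetical
illustration, not part of the spec.  The event_info_size value shown is
purely illustrative, since that field is device-specific.

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors the device configuration layout described above. */
struct virtio_scsi_config {
    uint32_t num_queues;       /* total number of virtqueues */
    uint32_t event_info_size;  /* max bytes the device writes per event */
    uint32_t sense_size;       /* max sense data size, default 96 */
    uint32_t cdb_size;         /* max CDB size, default 32 */
};

/* Hypothetical helper: fill in the defaults a device would expose
 * before the driver modifies anything. */
static void virtio_scsi_config_defaults(struct virtio_scsi_config *cfg,
                                        uint32_t num_queues)
{
    cfg->num_queues = num_queues;
    cfg->event_info_size = 16;   /* illustrative, device-specific */
    cfg->sense_size = 96;        /* default mandated by the spec */
    cfg->cdb_size = 32;          /* default mandated by the spec */
}
```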

Device initialization
---------------------

The initialization routine should first of all discover the device's
virtqueues.

The driver should then place at least one buffer in the eventq.
Buffers returned by the device on the eventq may be referred
to as "events" in the rest of the document.

The driver can immediately issue requests (for example, INQUIRY or
REPORT LUNS) or task management functions (for example, I_T RESET).

Device operation: request queues
--------------------------------

The driver queues requests to an arbitrary request queue, and the
device consumes them from that same queue.

Requests have the following format:

    struct virtio_scsi_req_cmd {
        u8 lun[8];
        u64 id;
        u8 task_attr;
        u8 prio;
        u8 crn;
        char cdb[cdb_size];
        char dataout[];

        u8 sense[sense_size];
        u32 sense_len;
        u32 residual;
        u16 status_qualifier;
        u8 status;
        u8 response;
        char datain[];
    };

    /* command-specific response values */
    #define VIRTIO_SCSI_S_OK              0
    #define VIRTIO_SCSI_S_UNDERRUN        1
    #define VIRTIO_SCSI_S_ABORTED         2
    #define VIRTIO_SCSI_S_FAILURE         3

    /* task_attr */
    #define VIRTIO_SCSI_S_SIMPLE          0
    #define VIRTIO_SCSI_S_ORDERED         1
    #define VIRTIO_SCSI_S_HEAD            2
    #define VIRTIO_SCSI_S_ACA             3

    The lun field addresses a bus, target and logical unit in the SCSI
    host.  The id field is the command identifier as defined in SAM.

    Task_attr, prio and CRN are defined in SAM.  The prio field should
    always be zero, as command priority is explicitly not supported by
    this version of the device.  task_attr defines the task attribute as
    in the table above.  Note that all task attributes may be mapped to
    SIMPLE by the device.  CRN is generally expected to be 0, but drivers
    can provide it.  The maximum CRN value defined by the protocol is 255,
    since CRN is stored in an 8-bit integer.

    All of these fields are always read-only, as are the cdb and dataout
    fields.  The sense field and all subsequent fields are always write-only.

    The sense_len field indicates the number of bytes actually written
    to the sense buffer.  The residual field indicates the residual
    size, calculated as data_length - number_of_transferred_bytes, for
    read or write operations.
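
The residual computation above can be sketched as a small helper; the
function name is illustrative, not part of the spec.

```c
#include <assert.h>
#include <stdint.h>

/* The residual field as defined above: data_length minus the number
 * of bytes actually transferred. */
static uint32_t virtio_scsi_residual(uint32_t data_length,
                                     uint32_t transferred)
{
    /* The device never transfers more than the buffers allow. */
    assert(transferred <= data_length);
    return data_length - transferred;
}
```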

    The status byte is written by the device to be the SCSI status code.

    The response byte is written by the device to be one of the following:

    - VIRTIO_SCSI_S_OK when the request was completed and the status byte
      is filled with a SCSI status code (not necessarily "GOOD").

    - VIRTIO_SCSI_S_UNDERRUN if the content of the CDB requires transferring
      more data than is available in the data buffers.

    - VIRTIO_SCSI_S_ABORTED if the request was cancelled due to a reset
      or another task management function.

    - VIRTIO_SCSI_S_FAILURE for other host or guest errors.  In particular,
      if neither dataout nor datain is empty, and the VIRTIO_SCSI_F_INOUT
      feature has not been negotiated, the request will be immediately
      returned with a response equal to VIRTIO_SCSI_S_FAILURE.
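
A guest driver dispatching on the response byte might look like the
following sketch; the integer return codes are illustrative conventions
of this example, not mandated by the spec.

```c
#include <assert.h>
#include <stdint.h>

#define VIRTIO_SCSI_S_OK       0
#define VIRTIO_SCSI_S_UNDERRUN 1
#define VIRTIO_SCSI_S_ABORTED  2
#define VIRTIO_SCSI_S_FAILURE  3

/* Sketch: map the response byte to an error class.  Zero means the
 * status byte holds a valid SCSI status code. */
static int virtio_scsi_check_response(uint8_t response)
{
    switch (response) {
    case VIRTIO_SCSI_S_OK:
        return 0;      /* status byte is valid (not necessarily GOOD) */
    case VIRTIO_SCSI_S_UNDERRUN:
        return -1;     /* CDB needed more data than the buffers hold */
    case VIRTIO_SCSI_S_ABORTED:
        return -2;     /* cancelled by a reset or another TMF */
    case VIRTIO_SCSI_S_FAILURE:
    default:
        return -3;     /* other host or guest error */
    }
}
```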

Device operation: controlq
--------------------------

The controlq is used for other SCSI transport operations.
Requests have the following format:

    struct virtio_scsi_ctrl
    {
        u32 type;
        ...
        u8 response;
    }

    The type identifies the remaining fields.

The following commands are defined:

- Task management function

    #define VIRTIO_SCSI_T_TMF                      0

    #define VIRTIO_SCSI_T_TMF_ABORT_TASK           0
    #define VIRTIO_SCSI_T_TMF_ABORT_TASK_SET       1
    #define VIRTIO_SCSI_T_TMF_CLEAR_ACA            2
    #define VIRTIO_SCSI_T_TMF_CLEAR_TASK_SET       3
    #define VIRTIO_SCSI_T_TMF_I_T_NEXUS_RESET      4
    #define VIRTIO_SCSI_T_TMF_LOGICAL_UNIT_RESET   5
    #define VIRTIO_SCSI_T_TMF_QUERY_TASK           6
    #define VIRTIO_SCSI_T_TMF_QUERY_TASK_SET       7

    struct virtio_scsi_ctrl_tmf
    {
        u32 type;
        u32 subtype;
        u8 lun[8];
        u64 id;
        u8 additional[];
        u8 response;
    }

    /* command-specific response values */
    #define VIRTIO_SCSI_S_FUNCTION_COMPLETE        0
    #define VIRTIO_SCSI_S_FAILURE                  3
    #define VIRTIO_SCSI_S_FUNCTION_SUCCEEDED       4
    #define VIRTIO_SCSI_S_FUNCTION_REJECTED        5
    #define VIRTIO_SCSI_S_INCORRECT_LUN            6

    The type is VIRTIO_SCSI_T_TMF.  All fields but the last one are
    filled in by the driver; the response field is filled in by the
    device.  The id field must match the id in a SCSI command.  Fields
    that are irrelevant to the requested TMF are ignored.

    Note that since ACA is not supported by this version of the spec,
    VIRTIO_SCSI_T_TMF_CLEAR_ACA is always a no-operation.

    The outcome of the task management function is written by the device
    in the response field.  Return values map 1-to-1 with those defined
    in SAM.
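
As an example, filling in a LOGICAL UNIT RESET request could look like
the sketch below.  The fixed-size struct omits the variable-length
"additional" field, since this TMF carries no extra data, and the
helper name is illustrative.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define VIRTIO_SCSI_T_TMF                    0
#define VIRTIO_SCSI_T_TMF_LOGICAL_UNIT_RESET 5

/* Fixed-size view of virtio_scsi_ctrl_tmf; "additional" is omitted
 * because a LUN reset carries no additional data. */
struct virtio_scsi_ctrl_tmf {
    uint32_t type;
    uint32_t subtype;
    uint8_t  lun[8];
    uint64_t id;
    uint8_t  response;   /* written by the device */
};

/* Sketch: prepare a LOGICAL UNIT RESET as the driver would.  The id
 * field is irrelevant for this TMF and left as zero. */
static void tmf_lun_reset(struct virtio_scsi_ctrl_tmf *req,
                          const uint8_t lun[8])
{
    memset(req, 0, sizeof(*req));
    req->type = VIRTIO_SCSI_T_TMF;
    req->subtype = VIRTIO_SCSI_T_TMF_LOGICAL_UNIT_RESET;
    memcpy(req->lun, lun, 8);
}
```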

- Asynchronous notification query

    #define VIRTIO_SCSI_T_AN_QUERY                    1

    struct virtio_scsi_ctrl_an {
        u32 type;
        u8  lun[8];
        u32 event_requested;
        u32 event_actual;
        u8  response;
    }

    #define VIRTIO_SCSI_EVT_ASYNC_OPERATIONAL_CHANGE  2
    #define VIRTIO_SCSI_EVT_ASYNC_POWER_MGMT          4
    #define VIRTIO_SCSI_EVT_ASYNC_EXTERNAL_REQUEST    8
    #define VIRTIO_SCSI_EVT_ASYNC_MEDIA_CHANGE        16
    #define VIRTIO_SCSI_EVT_ASYNC_MULTI_HOST          32
    #define VIRTIO_SCSI_EVT_ASYNC_DEVICE_BUSY         64

    By sending this command, the driver asks the device which events
    the given LUN can report, as described in paragraphs 6.6 and A.6
    of the SCSI MMC specification.  The driver writes the events it is
    interested in into the event_requested field; the device responds by
    writing the events that it supports into event_actual.

    The type is VIRTIO_SCSI_T_AN_QUERY.  The lun and event_requested
    fields are written by the driver.  The event_actual and response
    fields are written by the device.

    Valid values of the response byte are VIRTIO_SCSI_S_OK or
    VIRTIO_SCSI_S_FAILURE (with the same meaning as above).
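
Since event_requested and event_actual are bitmasks, a driver can
compute which requested events the device does not support with a
single mask operation.  A sketch, with an illustrative helper name:

```c
#include <assert.h>
#include <stdint.h>

#define VIRTIO_SCSI_EVT_ASYNC_POWER_MGMT   4
#define VIRTIO_SCSI_EVT_ASYNC_MEDIA_CHANGE 16

/* Sketch: event_actual is the subset of requested events the device
 * supports, so the unsupported ones are the set difference. */
static uint32_t an_unsupported_events(uint32_t event_requested,
                                      uint32_t event_actual)
{
    return event_requested & ~event_actual;
}
```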

- Asynchronous notification subscription

    #define VIRTIO_SCSI_T_AN_SUBSCRIBE                2

    struct virtio_scsi_ctrl_an {
        u32 type;
        u8  lun[8];
        u32 event_requested;
        u32 event_actual;
        u8  response;
    }

    #define VIRTIO_SCSI_EVT_ASYNC_MEDIA_CHANGE        16

    By sending this command, the driver asks the specified LUN to report
    events for its physical interface, as described in Annex A of the SCSI
    MMC specification.  The driver writes the events it is interested in
    into the event_requested field; the device responds by writing the
    events that it supports into event_actual.

    The type is VIRTIO_SCSI_T_AN_SUBSCRIBE.  The lun and event_requested
    fields are written by the driver.  The event_actual and response
    fields are written by the device.

    Valid values of the response byte are VIRTIO_SCSI_S_OK or
    VIRTIO_SCSI_S_FAILURE (with the same meaning as above).

Device operation: eventq
------------------------

The eventq is used by the device to report information on logical units
that are attached to it.  The driver should always keep a few buffers
ready in the eventq.  The device will drop events if it finds no
buffer ready.

Buffers are placed in the eventq and filled by the device when interesting
events occur.  The buffers should be strictly write-only (device-filled)
and the size of the buffers should be at least the value given in the
device's configuration information.

Events have the following format:

    #define VIRTIO_SCSI_T_EVENTS_MISSED   0x80000000

    struct virtio_scsi_ctrl_recv {
        u32 event;
        ...
    }

If bit 31 is set in the event field, the device failed to report an
event due to missing buffers.  In this case, the driver should poll the
logical units for unit attention conditions, and/or do whatever form of
bus scan is appropriate for the guest operating system.
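
Because any event may carry the events-missed flag in bit 31, a driver
should test and strip that bit before dispatching on the event type.
A sketch, with illustrative helper names:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define VIRTIO_SCSI_T_EVENTS_MISSED 0x80000000u

/* Sketch: check bit 31 before looking at the event type. */
static bool events_missed(uint32_t event)
{
    return (event & VIRTIO_SCSI_T_EVENTS_MISSED) != 0;
}

/* Strip the flag to recover the plain event type. */
static uint32_t event_type(uint32_t event)
{
    return event & ~VIRTIO_SCSI_T_EVENTS_MISSED;
}
```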

Other data that the device writes to the buffer depends on the contents
of the event field.  The following events are defined:

- No event

    #define VIRTIO_SCSI_T_NO_EVENT         0

    This event is fired in the following cases:

    1) When the device detects in the eventq a buffer that is shorter
    than what is indicated in the configuration field, it will use
    it immediately and put this dummy value in the event field.
    A well-written driver will never observe this situation.

    2) When events are dropped, the device may signal this event as
    soon as the driver makes a buffer available, in order to request
    action from the driver.  In this case, of course, this event will
    be reported with the VIRTIO_SCSI_T_EVENTS_MISSED flag.

- Transport reset

    #define VIRTIO_SCSI_T_TRANSPORT_RESET  1

    struct virtio_scsi_reset {
        u32 event;
        u8  lun[8];
        u32 reason;
    }

    #define VIRTIO_SCSI_EVT_RESET_HARD         0
    #define VIRTIO_SCSI_EVT_RESET_RESCAN       1
    #define VIRTIO_SCSI_EVT_RESET_REMOVED      2

    By sending this event, the device signals that a logical unit
    on a target has been reset, including the case of a new device
    appearing or disappearing on the bus.

    The device fills in all fields.  The event field is set to
    VIRTIO_SCSI_T_TRANSPORT_RESET.  The lun field addresses a bus,
    target and logical unit in the SCSI host.

    The reason value is one of the three #define values appearing above.
    VIRTIO_SCSI_EVT_RESET_REMOVED is used if the target or logical unit
    is no longer able to receive commands.  VIRTIO_SCSI_EVT_RESET_HARD
    is used if the logical unit has been reset, but is still present.
    VIRTIO_SCSI_EVT_RESET_RESCAN is used if a target or logical unit has
    just appeared on the device.

    When VIRTIO_SCSI_EVT_RESET_REMOVED or VIRTIO_SCSI_EVT_RESET_RESCAN
    is sent for LUN 0, the driver should ask the initiator to rescan
    the target, in order to detect the case when an entire target has
    appeared or disappeared.
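
The rule above can be sketched as a small predicate.  Whether the lun
field addresses LUN 0 is passed in as a flag, since decoding the
8-byte lun field is transport-specific; the helper name is
illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define VIRTIO_SCSI_EVT_RESET_HARD    0
#define VIRTIO_SCSI_EVT_RESET_RESCAN  1
#define VIRTIO_SCSI_EVT_RESET_REMOVED 2

/* Sketch: a RESCAN or REMOVED event addressed to LUN 0 should trigger
 * a rescan of the whole target, because the target itself may have
 * appeared or disappeared. */
static bool needs_target_rescan(uint32_t reason, bool addresses_lun0)
{
    return addresses_lun0 &&
           (reason == VIRTIO_SCSI_EVT_RESET_RESCAN ||
            reason == VIRTIO_SCSI_EVT_RESET_REMOVED);
}
```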

    Events will also be reported via sense codes (this obviously does
    not apply to newly appeared buses or targets, since the driver
    has never discovered them):

    - VIRTIO_SCSI_EVT_RESET_HARD
      sense UNIT ATTENTION
      asc POWER ON, RESET OR BUS DEVICE RESET OCCURRED

    - VIRTIO_SCSI_EVT_RESET_RESCAN
      sense UNIT ATTENTION
      asc REPORTED LUNS DATA HAS CHANGED

    - VIRTIO_SCSI_EVT_RESET_REMOVED
      sense ILLEGAL REQUEST
      asc LOGICAL UNIT NOT SUPPORTED

    The preferred way to detect transport reset is always to use events,
    because sense codes are only seen by the driver when it sends a
    SCSI command to the logical unit or target.  However, in case events
    are dropped, the initiator will still be able to synchronize with the
    actual state of the controller if the driver asks the initiator to
    rescan the SCSI bus.  During the rescan, the initiator will be
    able to observe the above sense codes, and it will process them as
    if the driver had received the equivalent event.

- Asynchronous notification

    #define VIRTIO_SCSI_T_ASYNC_NOTIFY     2

    struct virtio_scsi_an_event {
        u32 event;
        u8  lun[8];
        u32 reason;
    }

    By sending this event, the device signals that an asynchronous
    event was fired from a physical interface.

    All fields are written by the device.  The event field is set to
    VIRTIO_SCSI_T_ASYNC_NOTIFY.  The reason field is a subset of the
    events that the driver has subscribed to via the "Asynchronous
    notification subscription" command.

    When dropped events are reported, the driver should poll for 
    asynchronous events manually using SCSI commands.

^ permalink raw reply	[flat|nested] 91+ messages in thread

* [Qemu-devel] virtio scsi host draft specification, v3
@ 2011-06-07 13:43 ` Paolo Bonzini
  0 siblings, 0 replies; 91+ messages in thread
From: Paolo Bonzini @ 2011-06-07 13:43 UTC (permalink / raw)
  To: Linux Virtualization, Linux Kernel Mailing List, qemu-devel
  Cc: Christoph Hellwig, Stefan Hajnoczi, kvm, Michael S. Tsirkin,
	Rusty Russell, Hannes Reinecke

Hi all,

after some preliminary discussion on the QEMU mailing list, I present a
draft specification for a virtio-based SCSI host (controller, HBA, you
name it).

The virtio SCSI host is the basis of an alternative storage stack for
KVM. This stack would overcome several limitations of the current
solution, virtio-blk:

1) scalability limitations: virtio-blk-over-PCI puts a strong upper
limit on the number of devices that can be added to a guest. Common
configurations have a limit of ~30 devices. While this can be worked
around by implementing a PCI-to-PCI bridge, or by using multifunction
virtio-blk devices, these solutions either have not been implemented
yet, or introduce management restrictions. On the other hand, the SCSI
architecture is well known for its scalability and virtio-scsi supports
advanced feature such as multiqueueing.

2) limited flexibility: virtio-blk does not support all possible storage
scenarios. For example, it does not allow SCSI passthrough or persistent
reservations. In principle, virtio-scsi provides anything that the
underlying SCSI target (be it physical storage, iSCSI or the in-kernel
target) supports.

3) limited extensibility: over the time, many features have been added
to virtio-blk. Each such change requires modifications to the virtio
specification, to the guest drivers, and to the device model in the
host. The virtio-scsi spec has been written to follow SAM conventions,
and exposing new features to the guest will only require changes to the
host's SCSI target implementation.


Comments are welcome.

Paolo 

------------------------------- >8 -----------------------------------


Virtio SCSI Host Device Spec
============================

The virtio SCSI host device groups together one or more simple virtual
devices (ie. disk), and allows communicating to these devices using the
SCSI protocol.  An instance of the device represents a SCSI host with
possibly many buses, targets and LUN attached.

The virtio SCSI device services two kinds of requests:

- command requests for a logical unit;

- task management functions related to a logical unit, target or
command.

The device is also able to send out notifications about added
and removed logical units.

v1:
    First public version

v2:
    Merged all virtqueues into one, removed separate TARGET fields

v3:
    Added configuration information and reworked descriptor structure.
    Added back multiqueue on Avi's request, while still leaving TARGET
    fields out.  Added dummy event and clarified some aspects of the
    event protocol.  First version sent to a wider audience (linux-kernel
    and virtio lists).

Configuration
-------------

Subsystem Device ID
    TBD

Virtqueues
    0:controlq
    1:eventq
    2..n:request queues

Feature bits
    VIRTIO_SCSI_F_INOUT (0) - Whether a single request can include both
        read-only and write-only data buffers.

Device configuration layout
    struct virtio_scsi_config {
        u32 num_queues;
        u32 event_info_size;
        u32 sense_size;
        u32 cdb_size;
    }

    num_queues is the total number of virtqueues exposed by the
    device.  The driver is free to use only one request queue, or
    it can use more to achieve better performance.

    event_info_size is the maximum size that the device will fill
    for buffers that the driver places in the eventq.  The
    driver should always put buffers at least of this size.

    sense_size is the maximum size of the sense data that the device
    will write.  The default value is written by the device and
    will always be 96, but the driver can modify it.

    cdb_size is the maximum size of the CBD that the driver
    will write.  The default value is written by the device and
    will always be 32, but the driver can likewise modify it.

Device initialization
---------------------

The initialization routine should first of all discover the device's
virtqueues.

The driver should then place at least a buffer in the eventq.
Buffers returned by the device on the eventq may be referred
to as "events" in the rest of the document.

The driver can immediately issue requests (for example, INQUIRY or
REPORT LUNS) or task management functions (for example, I_T RESET).

Device operation: request queues
--------------------------------

The driver queues requests to an arbitrary request queue, and they are
used by the device on that same queue.

Requests have the following format:

    struct virtio_scsi_req_cmd {
        u8 lun[8];
        u64 id;
        u8 task_attr;
        u8 prio;
        u8 crn;
        char cdb[cdb_size];
        char dataout[];

        u8 sense[sense_size];
        u32 sense_len;
        u32 residual;
        u16 status_qualifier;
        u8 status;
        u8 response;
        char datain[];
    };

    /* command-specific response values */
    #define VIRTIO_SCSI_S_OK              0
    #define VIRTIO_SCSI_S_UNDERRUN        1
    #define VIRTIO_SCSI_S_ABORTED         2
    #define VIRTIO_SCSI_S_FAILURE         3

    /* task_attr */
    #define VIRTIO_SCSI_S_SIMPLE          0
    #define VIRTIO_SCSI_S_ORDERED         1
    #define VIRTIO_SCSI_S_HEAD            2
    #define VIRTIO_SCSI_S_ACA             3

    The lun field addresses a bus, target and logical unit in the SCSI
    host.  The id field is the command identifier as defined in SAM.

    Task_attr, prio and CRN are defined in SAM.  The prio field should
    always be zero, as command priority is explicitly not supported by
    this version of the device.  task_attr defines the task attribute as
    in the table above, Note that all task attributes may be mapped to
    SIMPLE by the device.  CRN is generally expected to be 0, but clients
    can provide it.  The maximum CRN value defined by the protocol is 255,
    since CRN is stored in an 8-bit integer.

    All of these fields are always read-only, as are the cdb and dataout
    field.  sense and subsequent fields are always write-only.

    The sense_len field indicates the number of bytes actually written
    to the sense buffer.  The residual field indicates the residual
    size, calculated as data_length - number_of_transferred_bytes, for
    read or write operations.

    The status byte is written by the device to be the SCSI status code.

    The response byte is written by the device to be one of the following:

    - VIRTIO_SCSI_S_OK when the request was completed and the status byte
      is filled with a SCSI status code (not necessarily "GOOD").

    - VIRTIO_SCSI_S_UNDERRUN if the content of the CDB requires transferring
      more data than is available in the data buffers.

    - VIRTIO_SCSI_S_ABORTED if the request was cancelled due to a reset
      or another task management function.

    - VIRTIO_SCSI_S_FAILURE for other host or guest error.  In particular,
      if neither dataout nor datain is empty, and the VIRTIO_SCSI_F_INOUT
      feature has not been negotiated, the request will be immediately
      returned with a response equal to VIRTIO_SCSI_S_FAILURE.

Device operation: controlq
--------------------------

The controlq is used for other SCSI transport operations.
Requests have the following format:

    struct virtio_scsi_ctrl
    {
        u32 type;
        ...
        u8 response;
    }

    The type identifies the remaining fields.

The following commands are defined:

- Task management function

    #define VIRTIO_SCSI_T_TMF                      0

    #define VIRTIO_SCSI_T_TMF_ABORT_TASK           0
    #define VIRTIO_SCSI_T_TMF_ABORT_TASK_SET       1
    #define VIRTIO_SCSI_T_TMF_CLEAR_ACA            2
    #define VIRTIO_SCSI_T_TMF_CLEAR_TASK_SET       3
    #define VIRTIO_SCSI_T_TMF_I_T_NEXUS_RESET      4
    #define VIRTIO_SCSI_T_TMF_LOGICAL_UNIT_RESET   5
    #define VIRTIO_SCSI_T_TMF_QUERY_TASK           6
    #define VIRTIO_SCSI_T_TMF_QUERY_TASK_SET       7

    struct virtio_scsi_ctrl_tmf
    {
        u32 type;
        u32 subtype;
        u8 lun[8];
        u64 id;
        u8 additional[];
        u8 response;
    }

    /* command-specific response values */
    #define VIRTIO_SCSI_S_FUNCTION_COMPLETE        0
    #define VIRTIO_SCSI_S_FAILURE                  3
    #define VIRTIO_SCSI_S_FUNCTION_SUCCEEDED       4
    #define VIRTIO_SCSI_S_FUNCTION_REJECTED        5
    #define VIRTIO_SCSI_S_INCORRECT_LUN            6

    The type is VIRTIO_SCSI_T_TMF.  All fields but the last one are
    filled by the driver, the response field is filled in by the device.
    The id command must match the id in a SCSI command.  Irrelevant fields
    for the requested TMF are ignored.

    Note that since ACA is not supported by this version of the spec,
    VIRTIO_SCSI_T_TMF_CLEAR_ACA is always a no-operation.

    The outcome of the task management function is written by the device
    in the response field.  Return values map 1-to-1 with those defined
    in SAM.

- Asynchronous notification query

    #define VIRTIO_SCSI_T_AN_QUERY                    1

    struct virtio_scsi_ctrl_an {
        u32 type;
        u8  lun[8];
        u32 event_requested;
        u32 event_actual;
        u8  response;
    }

    #define VIRTIO_SCSI_EVT_ASYNC_OPERATIONAL_CHANGE  2
    #define VIRTIO_SCSI_EVT_ASYNC_POWER_MGMT          4
    #define VIRTIO_SCSI_EVT_ASYNC_EXTERNAL_REQUEST    8
    #define VIRTIO_SCSI_EVT_ASYNC_MEDIA_CHANGE        16
    #define VIRTIO_SCSI_EVT_ASYNC_MULTI_HOST          32
    #define VIRTIO_SCSI_EVT_ASYNC_DEVICE_BUSY         64

    By sending this command, the driver asks the device which events
    the given LUN can report, as described in paragraphs 6.6 and A.6
    of the SCSI MMC specification.  The driver writes the events it is
    interested in into the event_requested; the device responds by
    writing the events that it supports into event_actual.

    The type is VIRTIO_SCSI_T_AN_QUERY.  The lun and event_requested
    fields are written by the driver.  The event_actual and response
    fields are written by the device.

    Valid values of the response byte are VIRTIO_SCSI_S_OK or
    VIRTIO_SCSI_S_FAILURE (with the same meaning as above).

- Asynchronous notification subscription

    #define VIRTIO_SCSI_T_AN_SUBSCRIBE                2

    struct virtio_scsi_ctrl_an {
        u32 type;
        u8  lun[8];
        u32 event_requested;
        u32 event_actual;
        u8  response;
    }

    #define VIRTIO_SCSI_EVT_ASYNC_MEDIA_CHANGE        16

    By sending this command, the driver asks the specified LUN to report
    events for its physical interface, as described in Annex A of the SCSI
    MMC specification.  The driver writes the events it is interested in
    into the event_requested; the device responds by writing the events
    that it supports into event_actual.

    The type is VIRTIO_SCSI_T_AN_SUBSCRIBE.  The lun and event_requested
    fields are written by the driver.  The event_actual and response
    fields are written by the device.

    Valid values of the response byte are VIRTIO_SCSI_S_OK,
    VIRTIO_SCSI_S_FAILURE (with the same meaning as above).

Device operation: eventq
------------------------

The eventq is used by the device to report information on logical units
that are attached to it.  The driver should always leave a few (?) buffers
ready in the eventq.  The device will end up dropping events if it finds
no buffer ready.

Buffers are placed in the eventq and filled by the device when interesting
events occur.  The buffers should be strictly write-only (device-filled)
and the size of the buffers should be at least the value given in the
device's configuration information.

Events have the following format:

    #define VIRTIO_SCSI_T_EVENTS_MISSED   0x80000000

    struct virtio_scsi_ctrl_recv {
        u32 event;
        ...
    }

If bit 31 is set in the event field, the device failed to report an
event due to missing buffers.  In this case, the driver should poll the
logical units for unit attention conditions, and/or do whatever form of
bus scan is appropriate for the guest operating system.

Other data that the device writes to the buffer depends on the contents
of the event field.  The following events are defined:

- No event

    #define VIRTIO_SCSI_T_NO_EVENT         0

    This event is fired in the following cases:

    1) When the device detects in the eventq a buffer that is shorter
    than what is indicated in the configuration field, it will use
    it immediately and put this dummy value in the event field.
    A well-written driver will never observe this situation.

    2) When events are dropped, the device may signal this event as
    soon as the drivers makes a buffer available, in order to request
    action from the driver.  In this case, of course, this event will
    be reported with the VIRTIO_SCSI_T_EVENTS_MISSED flag.

- Transport reset

    #define VIRTIO_SCSI_T_TRANSPORT_RESET  1

    struct virtio_scsi_reset {
        u32 event;
        u8  lun[8];
        u32 reason;
    }

    #define VIRTIO_SCSI_EVT_RESET_HARD         0
    #define VIRTIO_SCSI_EVT_RESET_RESCAN       1
    #define VIRTIO_SCSI_EVT_RESET_REMOVED      2

    By sending this event, the device signals that a logical unit
    on a target has been reset, including the case of a new device
    appearing or disappearing on the bus.

    The device fills in all fields.  The event field is set to
    VIRTIO_SCSI_T_TRANSPORT_RESET.  The lun field addresses a bus,
    target and logical unit in the SCSI host.

    The reason value is one of the four #define values appearing above.
    VIRTIO_SCSI_EVT_RESET_REMOVED is used if the target or logical unit
    is no longer able to receive commands.  VIRTIO_SCSI_EVT_RESET_HARD
    is used if the logical unit has been reset, but is still present.
    VIRTIO_SCSI_EVT_RESET_RESCAN is used if a target or logical unit has
    just appeared on the device.

    When VIRTIO_SCSI_EVT_RESET_REMOVED or VIRTIO_SCSI_EVT_RESET_RESCAN
    is sent for LUN 0, the driver should ask the initiator to rescan
    the target, in order to detect the case when an entire target has
    appeared or disappeared.

    Events will also be reported via sense codes (this obviously does
    not apply to newly appeared buses or targets, since the application
    has never discovered them):

    - VIRTIO_SCSI_EVT_RESET_HARD
      sense UNIT ATTENTION
      asc POWER ON, RESET OR BUS DEVICE RESET OCCURRED

    - VIRTIO_SCSI_EVT_RESET_RESCAN
      sense UNIT ATTENTION
      asc REPORTED LUNS DATA HAS CHANGED

    - VIRTIO_SCSI_EVT_RESET_REMOVED
      sense ILLEGAL REQUEST
      asc LOGICAL UNIT NOT SUPPORTED

    The preferred way to detect transport reset is always to use events,
    because sense codes are only seen by the driver when it sends a
    SCSI command to the logical unit or target.  However, in case events
    are dropped, the initiator will still be able to synchronize with the
    actual state of the controller if the driver asks the initiator to
    rescan of the SCSI bus.  During the rescan, the initiator will be
    able to observe the above sense codes, and it will process them as
    if it the driver had received the equivalent event.

- Asynchronous notification

    #define VIRTIO_SCSI_T_ASYNC_NOTIFY     2

    struct virtio_scsi_an_event {
        u32 event;
        u8  lun[8];
        u32 reason;
    }

    By sending this event, the device signals that an asynchronous
    event was fired from a physical interface.

    All fields are written by the device.  The event field is set to
    VIRTIO_SCSI_T_ASYNC_NOTIFY.  The reason field is a subset of the
    events that the driver has subscribed to via the "Asynchronous
    notification subscription" command.

    When dropped events are reported, the driver should poll for 
    asynchronous events manually using SCSI commands.
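
    A driver-side sketch of consuming such an event is below.  The
    subscription mask and its bit assignments are hypothetical (the
    actual events come from the "Asynchronous notification
    subscription" command); only the struct layout and the event type
    check follow the text above.

```c
#include <stdint.h>

/* Assumed value for illustration; defined earlier in the draft. */
#define VIRTIO_SCSI_T_ASYNC_NOTIFY 2

struct virtio_scsi_an_event {
    uint32_t event;
    uint8_t  lun[8];
    uint32_t reason;
};

/* Hypothetical mask the driver would keep after subscribing; the bit
 * positions are illustrative, not from the spec. */
static const uint32_t subscribed_mask = 0x0000000c;

/* Returns the subset of reported reasons the driver subscribed to.
 * Per the spec, the reason field is already a subset of the
 * subscribed events, so masking is defensive. */
static uint32_t an_event_reasons(const struct virtio_scsi_an_event *ev)
{
    if (ev->event != VIRTIO_SCSI_T_ASYNC_NOTIFY)
        return 0;
    return ev->reason & subscribed_mask;
}
```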

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: virtio scsi host draft specification, v3
  2011-06-07 13:43 ` [Qemu-devel] " Paolo Bonzini
  (?)
@ 2011-06-08 23:28   ` Rusty Russell
  -1 siblings, 0 replies; 91+ messages in thread
From: Rusty Russell @ 2011-06-08 23:28 UTC (permalink / raw)
  To: Paolo Bonzini, Linux Virtualization, Linux Kernel Mailing List,
	qemu-devel
  Cc: Stefan Hajnoczi, Christoph Hellwig, Hannes Reinecke,
	Michael S. Tsirkin, kvm

On Tue, 07 Jun 2011 15:43:49 +0200, Paolo Bonzini <pbonzini@redhat.com> wrote:
> Hi all,
> 
> after some preliminary discussion on the QEMU mailing list, I present a
> draft specification for a virtio-based SCSI host (controller, HBA, you
> name it).

OK, I'm impressed.  This is very well written and it doesn't make any of
the obvious mistakes wrt. virtio.

Unfortunately, I know almost nothing of SCSI, so I have to leave it to
others to decide if this is actually useful and sufficient.

I assume you have an implementation, as well?

Thanks,
Rusty.

* Re: virtio scsi host draft specification, v3
  2011-06-08 23:28   ` Rusty Russell
@ 2011-06-09  6:59     ` Paolo Bonzini
  -1 siblings, 0 replies; 91+ messages in thread
From: Paolo Bonzini @ 2011-06-09  6:59 UTC (permalink / raw)
  To: Rusty Russell
  Cc: Linux Virtualization, Linux Kernel Mailing List, qemu-devel,
	Stefan Hajnoczi, Christoph Hellwig, Hannes Reinecke,
	Michael S. Tsirkin, kvm

On 06/09/2011 01:28 AM, Rusty Russell wrote:
>> >  after some preliminary discussion on the QEMU mailing list, I present a
>> >  draft specification for a virtio-based SCSI host (controller, HBA, you
>> >  name it).
>
> OK, I'm impressed.  This is very well written and it doesn't make any of
> the obvious mistakes wrt. virtio.

Thanks very much, and thanks to those who corrected my early mistakes.

> I assume you have an implementation, as well?

Unfortunately not; "we're working on it", which means I should start in 
July when I come back from vacation.

Do you prefer to wait for one before I make a patch to the LyX source? 
In the meanwhile, can you reserve a subsystem ID for me?

Paolo

* Re: virtio scsi host draft specification, v3
  2011-06-09  6:59     ` [Qemu-devel] " Paolo Bonzini
@ 2011-06-10 11:33       ` Rusty Russell
  -1 siblings, 0 replies; 91+ messages in thread
From: Rusty Russell @ 2011-06-10 11:33 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Linux Virtualization, Linux Kernel Mailing List, qemu-devel,
	Stefan Hajnoczi, Christoph Hellwig, Hannes Reinecke,
	Michael S. Tsirkin, kvm

On Thu, 09 Jun 2011 08:59:27 +0200, Paolo Bonzini <pbonzini@redhat.com> wrote:
> On 06/09/2011 01:28 AM, Rusty Russell wrote:
> >> >  after some preliminary discussion on the QEMU mailing list, I present a
> >> >  draft specification for a virtio-based SCSI host (controller, HBA, you
> >> >  name it).
> >
> > OK, I'm impressed.  This is very well written and it doesn't make any of
> > the obvious mistakes wrt. virtio.
> 
> Thanks very much, and thanks to those who corrected my early mistakes.
> 
> > I assume you have an implementation, as well?
> 
> Unfortunately not; "we're working on it", which means I should start in 
> July when I come back from vacation.
> 
> Do you prefer to wait for one before I make a patch to the LyX source? 
> In the meanwhile, can you reserve a subsystem ID for me?
> 
> Paolo

Sure, you can have the next subsystem ID.

It's a pain to patch once it's in LyX, so let's get the implementation
based on what you posted here and see how much it changes first...

Cheers,
Rusty.

* Re: virtio scsi host draft specification, v3
  2011-06-10 11:33       ` [Qemu-devel] " Rusty Russell
@ 2011-06-10 12:14         ` Stefan Hajnoczi
  -1 siblings, 0 replies; 91+ messages in thread
From: Stefan Hajnoczi @ 2011-06-10 12:14 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Linux Virtualization, Linux Kernel Mailing List, qemu-devel,
	Stefan Hajnoczi, Christoph Hellwig, Hannes Reinecke,
	Michael S. Tsirkin, kvm, Rusty Russell

On Fri, Jun 10, 2011 at 12:33 PM, Rusty Russell <rusty@rustcorp.com.au> wrote:
> On Thu, 09 Jun 2011 08:59:27 +0200, Paolo Bonzini <pbonzini@redhat.com> wrote:
>> On 06/09/2011 01:28 AM, Rusty Russell wrote:
>> >> >  after some preliminary discussion on the QEMU mailing list, I present a
>> >> >  draft specification for a virtio-based SCSI host (controller, HBA, you
>> >> >  name it).
>> >
>> > OK, I'm impressed.  This is very well written and it doesn't make any of
>> > the obvious mistakes wrt. virtio.
>>
>> Thanks very much, and thanks to those who corrected my early mistakes.
>>
>> > I assume you have an implementation, as well?
>>
>> Unfortunately not; "we're working on it", which means I should start in
>> July when I come back from vacation.
>>
>> Do you prefer to wait for one before I make a patch to the LyX source?
>> In the meanwhile, can you reserve a subsystem ID for me?
>>
>> Paolo
>
> Sure, you can have the next subsystem ID.
>
> It's a pain to patch once it's in LyX, so let's get the implementation
> base on what you posted here an see how much it changes first...

Paolo, I'll switch the Linux guest LLD and QEMU virtio-scsi skeleton
that I have to comply with the spec.  Does this sound good or did you
want to write these from scratch?

Stefan

* Re: virtio scsi host draft specification, v3
  2011-06-10 12:14         ` [Qemu-devel] " Stefan Hajnoczi
@ 2011-06-10 12:22           ` Paolo Bonzini
  -1 siblings, 0 replies; 91+ messages in thread
From: Paolo Bonzini @ 2011-06-10 12:22 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Linux Virtualization, Linux Kernel Mailing List, qemu-devel,
	Stefan Hajnoczi, Christoph Hellwig, Hannes Reinecke,
	Michael S. Tsirkin, kvm, Rusty Russell

On 06/10/2011 02:14 PM, Stefan Hajnoczi wrote:
> Paolo, I'll switch the Linux guest LLD and QEMU virtio-scsi skeleton
> that I have to comply with the spec.  Does this sound good or did you
> want to write these from scratch?

Why should I want to write things from scratch? :)  Just send me again a 
pointer to your git tree, I'll make sure to add it as a remote this time 
(private mail will do).

Thanks,

Paolo

* Re: virtio scsi host draft specification, v3
  2011-06-07 13:43 ` [Qemu-devel] " Paolo Bonzini
@ 2011-06-10 12:55   ` Hannes Reinecke
  -1 siblings, 0 replies; 91+ messages in thread
From: Hannes Reinecke @ 2011-06-10 12:55 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Linux Virtualization, Linux Kernel Mailing List, qemu-devel,
	Rusty Russell, Stefan Hajnoczi, Christoph Hellwig,
	Michael S. Tsirkin, kvm

On 06/07/2011 03:43 PM, Paolo Bonzini wrote:
> Hi all,
>
> after some preliminary discussion on the QEMU mailing list, I present a
> draft specification for a virtio-based SCSI host (controller, HBA, you
> name it).
>
> The virtio SCSI host is the basis of an alternative storage stack for
> KVM. This stack would overcome several limitations of the current
> solution, virtio-blk:
>
> 1) scalability limitations: virtio-blk-over-PCI puts a strong upper
> limit on the number of devices that can be added to a guest. Common
> configurations have a limit of ~30 devices. While this can be worked
> around by implementing a PCI-to-PCI bridge, or by using multifunction
> virtio-blk devices, these solutions either have not been implemented
> yet, or introduce management restrictions. On the other hand, the SCSI
> architecture is well known for its scalability and virtio-scsi supports
> advanced feature such as multiqueueing.
>
> 2) limited flexibility: virtio-blk does not support all possible storage
> scenarios. For example, it does not allow SCSI passthrough or persistent
> reservations. In principle, virtio-scsi provides anything that the
> underlying SCSI target (be it physical storage, iSCSI or the in-kernel
> target) supports.
>
> 3) limited extensibility: over the time, many features have been added
> to virtio-blk. Each such change requires modifications to the virtio
> specification, to the guest drivers, and to the device model in the
> host. The virtio-scsi spec has been written to follow SAM conventions,
> and exposing new features to the guest will only require changes to the
> host's SCSI target implementation.
>
>
> Comments are welcome.
>
> Paolo
>
> ------------------------------->8 -----------------------------------
>
>
> Virtio SCSI Host Device Spec
> ============================
>
> The virtio SCSI host device groups together one or more simple virtual
> devices (ie. disk), and allows communicating to these devices using the
> SCSI protocol.  An instance of the device represents a SCSI host with
> possibly many buses, targets and LUN attached.
>
> The virtio SCSI device services two kinds of requests:
>
> - command requests for a logical unit;
>
> - task management functions related to a logical unit, target or
> command.
>
> The device is also able to send out notifications about added
> and removed logical units.
>
> v1:
>      First public version
>
> v2:
>      Merged all virtqueues into one, removed separate TARGET fields
>
> v3:
>      Added configuration information and reworked descriptor structure.
>      Added back multiqueue on Avi's request, while still leaving TARGET
>      fields out.  Added dummy event and clarified some aspects of the
>      event protocol.  First version sent to a wider audience (linux-kernel
>      and virtio lists).
>
> Configuration
> -------------
>
> Subsystem Device ID
>      TBD
>
> Virtqueues
>      0:controlq
>      1:eventq
>      2..n:request queues
>
> Feature bits
>      VIRTIO_SCSI_F_INOUT (0) - Whether a single request can include both
>          read-only and write-only data buffers.
>
> Device configuration layout
>      struct virtio_scsi_config {
>          u32 num_queues;
>          u32 event_info_size;
>          u32 sense_size;
>          u32 cdb_size;
>      }
>
>      num_queues is the total number of virtqueues exposed by the
>      device.  The driver is free to use only one request queue, or
>      it can use more to achieve better performance.
>
>      event_info_size is the maximum size that the device will fill
>      for buffers that the driver places in the eventq.  The
>      driver should always put buffers at least of this size.
>
>      sense_size is the maximum size of the sense data that the device
>      will write.  The default value is written by the device and
>      will always be 96, but the driver can modify it.
>
>      cdb_size is the maximum size of the CBD that the driver
>      will write.  The default value is written by the device and
>      will always be 32, but the driver can likewise modify it.
>
> Device initialization
> ---------------------
>
> The initialization routine should first of all discover the device's
> virtqueues.
>
> The driver should then place at least a buffer in the eventq.
> Buffers returned by the device on the eventq may be referred
> to as "events" in the rest of the document.
>
> The driver can immediately issue requests (for example, INQUIRY or
> REPORT LUNS) or task management functions (for example, I_T RESET).
>
> Device operation: request queues
> --------------------------------
>
> The driver queues requests to an arbitrary request queue, and they are
> used by the device on that same queue.
>
What about request ordering?
If requests are placed on arbitrary queues you'll inevitably run into
locking issues to ensure strict request ordering.
I would add here:

If a device uses more than one queue it is the responsibility of the 
device to ensure strict request ordering.

> Requests have the following format:
>
>      struct virtio_scsi_req_cmd {
>          u8 lun[8];
>          u64 id;
>          u8 task_attr;
>          u8 prio;
>          u8 crn;
>          char cdb[cdb_size];
>          char dataout[];
>
>          u8 sense[sense_size];
>          u32 sense_len;
>          u32 residual;
>          u16 status_qualifier;
>          u8 status;
>          u8 response;
>          char datain[];
>      };
>
>      /* command-specific response values */
>      #define VIRTIO_SCSI_S_OK              0
>      #define VIRTIO_SCSI_S_UNDERRUN        1
>      #define VIRTIO_SCSI_S_ABORTED         2
>      #define VIRTIO_SCSI_S_FAILURE         3
>
>      /* task_attr */
>      #define VIRTIO_SCSI_S_SIMPLE          0
>      #define VIRTIO_SCSI_S_ORDERED         1
>      #define VIRTIO_SCSI_S_HEAD            2
>      #define VIRTIO_SCSI_S_ACA             3
>
>      The lun field addresses a bus, target and logical unit in the SCSI
>      host.  The id field is the command identifier as defined in SAM.
>
Please do not rely on bus/target/lun here. These are leftovers from
parallel SCSI and do not have any meaning on modern SCSI
implementations (e.g. FC or SAS). Rephrase that to

The lun field is the Logical Unit Number as defined in SAM.

>      Task_attr, prio and CRN are defined in SAM.  The prio field should
>      always be zero, as command priority is explicitly not supported by
>      this version of the device.  task_attr defines the task attribute as
>      in the table above, Note that all task attributes may be mapped to
>      SIMPLE by the device.  CRN is generally expected to be 0, but clients
>      can provide it.  The maximum CRN value defined by the protocol is 255,
>      since CRN is stored in an 8-bit integer.
>
>      All of these fields are always read-only, as are the cdb and dataout
>      field.  sense and subsequent fields are always write-only.
>
>      The sense_len field indicates the number of bytes actually written
>      to the sense buffer.  The residual field indicates the residual
>      size, calculated as data_length - number_of_transferred_bytes, for
>      read or write operations.
>
>      The status byte is written by the device to be the SCSI status code.
>
?? I doubt that exists. Make that:

The status byte is written by the device to be the status code as 
defined in SAM.

>      The response byte is written by the device to be one of the following:
>
>      - VIRTIO_SCSI_S_OK when the request was completed and the status byte
>        is filled with a SCSI status code (not necessarily "GOOD").
>
>      - VIRTIO_SCSI_S_UNDERRUN if the content of the CDB requires transferring
>        more data than is available in the data buffers.
>
>      - VIRTIO_SCSI_S_ABORTED if the request was cancelled due to a reset
>        or another task management function.
>
>      - VIRTIO_SCSI_S_FAILURE for other host or guest error.  In particular,
>        if neither dataout nor datain is empty, and the VIRTIO_SCSI_F_INOUT
>        feature has not been negotiated, the request will be immediately
>        returned with a response equal to VIRTIO_SCSI_S_FAILURE.
>
And, of course:

VIRTIO_SCSI_S_DISCONNECT if the request could not be processed due 
to a communication failure (eg device was removed or could not be
reached).
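
To illustrate why the extra value matters on the driver side, a
simplified completion handler could treat it as the one retryable
case.  The VIRTIO_SCSI_S_DISCONNECT number below is a guess (it is
only being proposed here); the other values are quoted from the draft,
and the retry policy is illustrative, not prescriptive.

```c
#include <stdint.h>

/* Response values from the draft; DISCONNECT is the value proposed
 * in this message and its number is hypothetical. */
#define VIRTIO_SCSI_S_OK         0
#define VIRTIO_SCSI_S_UNDERRUN   1
#define VIRTIO_SCSI_S_ABORTED    2
#define VIRTIO_SCSI_S_FAILURE    3
#define VIRTIO_SCSI_S_DISCONNECT 4   /* hypothetical */

/* Map a virtio response byte to whether the SCSI midlayer should
 * retry the command: a transport disconnect may be transient and is
 * retryable, while the other responses are final (simplified policy
 * for illustration). */
static int response_is_retryable(uint8_t response)
{
    switch (response) {
    case VIRTIO_SCSI_S_DISCONNECT:
        return 1;   /* path may come back; retry or fail over */
    case VIRTIO_SCSI_S_OK:        /* inspect the SCSI status instead */
    case VIRTIO_SCSI_S_UNDERRUN:
    case VIRTIO_SCSI_S_ABORTED:
    case VIRTIO_SCSI_S_FAILURE:
    default:
        return 0;
    }
}
```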

The remaining bits seem to be okay.

One general question:

This specification implies a strict one-to-one mapping between host 
and target. IE there is no way of specifying more than one target 
per host.
This will make things like ALUA (Asymmetric Logical Unit Access)
a bit tricky to implement, as the port states there are bound to 
target port groups. So with the virtio host spec we would need to 
specify two hosts to represent that.

If that's the intention here I'm fine, but maybe we should be 
specifying this expressis verbis somewhere.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		      zSeries & Storage
hare@suse.de			      +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [Qemu-devel] virtio scsi host draft specification, v3
@ 2011-06-10 12:55   ` Hannes Reinecke
  0 siblings, 0 replies; 91+ messages in thread
From: Hannes Reinecke @ 2011-06-10 12:55 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Christoph Hellwig, Stefan Hajnoczi, kvm, Michael S. Tsirkin,
	Rusty Russell, qemu-devel, Linux Kernel Mailing List,
	Linux Virtualization

On 06/07/2011 03:43 PM, Paolo Bonzini wrote:
> Hi all,
>
> after some preliminary discussion on the QEMU mailing list, I present a
> draft specification for a virtio-based SCSI host (controller, HBA, you
> name it).
>
> The virtio SCSI host is the basis of an alternative storage stack for
> KVM. This stack would overcome several limitations of the current
> solution, virtio-blk:
>
> 1) scalability limitations: virtio-blk-over-PCI puts a strong upper
> limit on the number of devices that can be added to a guest. Common
> configurations have a limit of ~30 devices. While this can be worked
> around by implementing a PCI-to-PCI bridge, or by using multifunction
> virtio-blk devices, these solutions either have not been implemented
> yet, or introduce management restrictions. On the other hand, the SCSI
> architecture is well known for its scalability and virtio-scsi supports
> advanced feature such as multiqueueing.
>
> 2) limited flexibility: virtio-blk does not support all possible storage
> scenarios. For example, it does not allow SCSI passthrough or persistent
> reservations. In principle, virtio-scsi provides anything that the
> underlying SCSI target (be it physical storage, iSCSI or the in-kernel
> target) supports.
>
> 3) limited extensibility: over the time, many features have been added
> to virtio-blk. Each such change requires modifications to the virtio
> specification, to the guest drivers, and to the device model in the
> host. The virtio-scsi spec has been written to follow SAM conventions,
> and exposing new features to the guest will only require changes to the
> host's SCSI target implementation.
>
>
> Comments are welcome.
>
> Paolo
>
> ------------------------------->8 -----------------------------------
>
>
> Virtio SCSI Host Device Spec
> ============================
>
> The virtio SCSI host device groups together one or more simple virtual
> devices (i.e. disks), and allows communicating with these devices using
> the SCSI protocol.  An instance of the device represents a SCSI host with
> possibly many buses, targets and LUNs attached.
>
> The virtio SCSI device services two kinds of requests:
>
> - command requests for a logical unit;
>
> - task management functions related to a logical unit, target or
> command.
>
> The device is also able to send out notifications about added
> and removed logical units.
>
> v1:
>      First public version
>
> v2:
>      Merged all virtqueues into one, removed separate TARGET fields
>
> v3:
>      Added configuration information and reworked descriptor structure.
>      Added back multiqueue on Avi's request, while still leaving TARGET
>      fields out.  Added dummy event and clarified some aspects of the
>      event protocol.  First version sent to a wider audience (linux-kernel
>      and virtio lists).
>
> Configuration
> -------------
>
> Subsystem Device ID
>      TBD
>
> Virtqueues
>      0:controlq
>      1:eventq
>      2..n:request queues
>
> Feature bits
>      VIRTIO_SCSI_F_INOUT (0) - Whether a single request can include both
>          read-only and write-only data buffers.
>
> Device configuration layout
>      struct virtio_scsi_config {
>          u32 num_queues;
>          u32 event_info_size;
>          u32 sense_size;
>          u32 cdb_size;
>      }
>
>      num_queues is the total number of virtqueues exposed by the
>      device.  The driver is free to use only one request queue, or
>      it can use more to achieve better performance.
>
>      event_info_size is the maximum size that the device will fill
>      for buffers that the driver places in the eventq.  The
>      driver should always put buffers of at least this size.
>
>      sense_size is the maximum size of the sense data that the device
>      will write.  The default value is written by the device and
>      will always be 96, but the driver can modify it.
>
>      cdb_size is the maximum size of the CDB that the driver
>      will write.  The default value is written by the device and
>      will always be 32, but the driver can likewise modify it.
>
> Device initialization
> ---------------------
>
> The initialization routine should first of all discover the device's
> virtqueues.
>
> The driver should then place at least one buffer in the eventq.
> Buffers returned by the device on the eventq may be referred
> to as "events" in the rest of the document.
>
> The driver can immediately issue requests (for example, INQUIRY or
> REPORT LUNS) or task management functions (for example, I_T RESET).
>
> Device operation: request queues
> --------------------------------
>
> The driver queues requests to an arbitrary request queue, and they are
> used by the device on that same queue.
>
What about request ordering?
If requests are placed on arbitrary queues you'll inevitably run into
locking issues to ensure strict request ordering.
I would add here:

If a device uses more than one queue it is the responsibility of the 
device to ensure strict request ordering.

> Requests have the following format:
>
>      struct virtio_scsi_req_cmd {
>          u8 lun[8];
>          u64 id;
>          u8 task_attr;
>          u8 prio;
>          u8 crn;
>          char cdb[cdb_size];
>          char dataout[];
>
>          u8 sense[sense_size];
>          u32 sense_len;
>          u32 residual;
>          u16 status_qualifier;
>          u8 status;
>          u8 response;
>          char datain[];
>      };
>
>      /* command-specific response values */
>      #define VIRTIO_SCSI_S_OK              0
>      #define VIRTIO_SCSI_S_UNDERRUN        1
>      #define VIRTIO_SCSI_S_ABORTED         2
>      #define VIRTIO_SCSI_S_FAILURE         3
>
>      /* task_attr */
>      #define VIRTIO_SCSI_S_SIMPLE          0
>      #define VIRTIO_SCSI_S_ORDERED         1
>      #define VIRTIO_SCSI_S_HEAD            2
>      #define VIRTIO_SCSI_S_ACA             3
>
>      The lun field addresses a bus, target and logical unit in the SCSI
>      host.  The id field is the command identifier as defined in SAM.
>
Please do not rely on bus/target/lun here. These are leftovers from
parallel SCSI and do not have any meaning on modern SCSI
implementations (e.g. FC or SAS). Rephrase that to

The lun field is the Logical Unit Number as defined in SAM.

>      Task_attr, prio and CRN are defined in SAM.  The prio field should
>      always be zero, as command priority is explicitly not supported by
>      this version of the device.  task_attr defines the task attribute as
>      in the table above.  Note that all task attributes may be mapped to
>      SIMPLE by the device.  CRN is generally expected to be 0, but clients
>      can provide it.  The maximum CRN value defined by the protocol is 255,
>      since CRN is stored in an 8-bit integer.
>
>      All of these fields are always read-only, as are the cdb and dataout
>      field.  sense and subsequent fields are always write-only.
>
>      The sense_len field indicates the number of bytes actually written
>      to the sense buffer.  The residual field indicates the residual
>      size, calculated as data_length - number_of_transferred_bytes, for
>      read or write operations.
>
>      The status byte is written by the device to be the SCSI status code.
>
?? I doubt that exists. Make that:

The status byte is written by the device to be the status code as 
defined in SAM.

>      The response byte is written by the device to be one of the following:
>
>      - VIRTIO_SCSI_S_OK when the request was completed and the status byte
>        is filled with a SCSI status code (not necessarily "GOOD").
>
>      - VIRTIO_SCSI_S_UNDERRUN if the content of the CDB requires transferring
>        more data than is available in the data buffers.
>
>      - VIRTIO_SCSI_S_ABORTED if the request was cancelled due to a reset
>        or another task management function.
>
>      - VIRTIO_SCSI_S_FAILURE for other host or guest error.  In particular,
>        if neither dataout nor datain is empty, and the VIRTIO_SCSI_F_INOUT
>        feature has not been negotiated, the request will be immediately
>        returned with a response equal to VIRTIO_SCSI_S_FAILURE.
>
And, of course:

VIRTIO_SCSI_S_DISCONNECT if the request could not be processed due 
to a communication failure (eg device was removed or could not be
reached).
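A guest driver's completion path would then dispatch on the response byte
roughly as follows. This is a sketch: the numeric value chosen for the
proposed VIRTIO_SCSI_S_DISCONNECT code (4) is an assumption, not part of
the draft:

```c
#include <string.h>

/* Response values from the draft, plus the proposed DISCONNECT code.
 * The value 4 for DISCONNECT is assumed for illustration. */
enum {
    VIRTIO_SCSI_S_OK         = 0,
    VIRTIO_SCSI_S_UNDERRUN   = 1,
    VIRTIO_SCSI_S_ABORTED    = 2,
    VIRTIO_SCSI_S_FAILURE    = 3,
    VIRTIO_SCSI_S_DISCONNECT = 4,   /* assumed value */
};

static const char *resp_name(int response)
{
    switch (response) {
    case VIRTIO_SCSI_S_OK:         return "completed, check status byte";
    case VIRTIO_SCSI_S_UNDERRUN:   return "CDB needs more data than buffered";
    case VIRTIO_SCSI_S_ABORTED:    return "cancelled by reset/TMF";
    case VIRTIO_SCSI_S_DISCONNECT: return "communication failure";
    default:                       return "other host/guest failure";
    }
}
```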

The remaining bits seem to be okay.

One general question:

This specification implies a strict one-to-one mapping between host 
and target. IE there is no way of specifying more than one target 
per host.
This will make things like ALUA (Asymmetric Logical Unit Access)
a bit tricky to implement, as the port states there are bound to 
target port groups. So with the virtio host spec we would need to 
specify two hosts to represent that.

If that's the intention here I'm fine, but maybe we should be 
specifying this expressis verbis somewhere.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		      zSeries & Storage
hare@suse.de			      +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: virtio scsi host draft specification, v3
  2011-06-10 12:55   ` [Qemu-devel] " Hannes Reinecke
@ 2011-06-10 14:35     ` Paolo Bonzini
  -1 siblings, 0 replies; 91+ messages in thread
From: Paolo Bonzini @ 2011-06-10 14:35 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: Linux Virtualization, Linux Kernel Mailing List, qemu-devel,
	Rusty Russell, Stefan Hajnoczi, Christoph Hellwig,
	Michael S. Tsirkin, kvm

> If requests are placed on arbitrary queues you'll inevitably run into
> locking issues to ensure strict request ordering.
> I would add here:
> 
> If a device uses more than one queue it is the responsibility of the
> device to ensure strict request ordering.

Applied with s/device/guest/g.

> Please do not rely on bus/target/lun here. These are leftovers from
> parallel SCSI and do not have any meaning on modern SCSI
> implementations (e.g. FC or SAS). Rephrase that to
> 
> The lun field is the Logical Unit Number as defined in SAM.

Ok.

> >      The status byte is written by the device to be the SCSI status
> >      code.
>
> ?? I doubt that exists. Make that:
> 
> The status byte is written by the device to be the status code as
> defined in SAM.

Ok.

> >      The response byte is written by the device to be one of the
> >      following:
> >
> >      - VIRTIO_SCSI_S_OK when the request was completed and the
> >      status byte
> >        is filled with a SCSI status code (not necessarily "GOOD").
> >
> >      - VIRTIO_SCSI_S_UNDERRUN if the content of the CDB requires
> >      transferring
> >        more data than is available in the data buffers.
> >
> >      - VIRTIO_SCSI_S_ABORTED if the request was cancelled due to a
> >      reset
> >        or another task management function.
> >
> >      - VIRTIO_SCSI_S_FAILURE for other host or guest error. In
> >      particular,
> >        if neither dataout nor datain is empty, and the
> >        VIRTIO_SCSI_F_INOUT
> >        feature has not been negotiated, the request will be
> >        immediately
> >        returned with a response equal to VIRTIO_SCSI_S_FAILURE.
> >
> And, of course:
> 
> VIRTIO_SCSI_S_DISCONNECT if the request could not be processed due
> to a communication failure (eg device was removed or could not be
> reached).

Ok.

> This specification implies a strict one-to-one mapping between host
> and target. IE there is no way of specifying more than one target
> per host.

Actually no, the intention is to use hierarchical LUNs to support
more than one target per host.

Paolo

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: virtio scsi host draft specification, v3
  2011-06-10 12:55   ` [Qemu-devel] " Hannes Reinecke
@ 2011-06-12  7:51     ` Michael S. Tsirkin
  -1 siblings, 0 replies; 91+ messages in thread
From: Michael S. Tsirkin @ 2011-06-12  7:51 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: Paolo Bonzini, Linux Virtualization, Linux Kernel Mailing List,
	qemu-devel, Rusty Russell, Stefan Hajnoczi, Christoph Hellwig,
	kvm

On Fri, Jun 10, 2011 at 02:55:35PM +0200, Hannes Reinecke wrote:
> >Device operation: request queues
> >--------------------------------
> >
> >The driver queues requests to an arbitrary request queue, and they are
> >used by the device on that same queue.
> >
> What about request ordering?
> If requests are placed on arbitrary queues you'll inevitably run into
> locking issues to ensure strict request ordering.
> I would add here:
> 
> If a device uses more than one queue it is the responsibility of the
> device to ensure strict request ordering.

Maybe I misunderstand - how can this be the responsibility of
the device if the device does not get the information about
the original ordering of the requests?

For example, if the driver is crazy enough to put
all write requests on one queue and all barriers
on another one, how is the device supposed to ensure
ordering?

-- 
MST

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: virtio scsi host draft specification, v3
  2011-06-10 14:35     ` [Qemu-devel] " Paolo Bonzini
@ 2011-06-14  8:39       ` Hannes Reinecke
  -1 siblings, 0 replies; 91+ messages in thread
From: Hannes Reinecke @ 2011-06-14  8:39 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Linux Virtualization, Linux Kernel Mailing List, qemu-devel,
	Rusty Russell, Stefan Hajnoczi, Christoph Hellwig,
	Michael S. Tsirkin, kvm

On 06/10/2011 04:35 PM, Paolo Bonzini wrote:
>> If requests are placed on arbitrary queues you'll inevitably run into
>> locking issues to ensure strict request ordering.
>> I would add here:
>>
>> If a device uses more than one queue it is the responsibility of the
>> device to ensure strict request ordering.
>
> Applied with s/device/guest/g.
>
>> Please do not rely on bus/target/lun here. These are leftovers from
>> parallel SCSI and do not have any meaning on modern SCSI
>> implementations (e.g. FC or SAS). Rephrase that to
>>
>> The lun field is the Logical Unit Number as defined in SAM.
>
> Ok.
>
>>>       The status byte is written by the device to be the SCSI status
>>>       code.
>>
>> ?? I doubt that exists. Make that:
>>
>> The status byte is written by the device to be the status code as
>> defined in SAM.
>
> Ok.
>
>>>       The response byte is written by the device to be one of the
>>>       following:
>>>
>>>       - VIRTIO_SCSI_S_OK when the request was completed and the
>>>       status byte
>>>         is filled with a SCSI status code (not necessarily "GOOD").
>>>
>>>       - VIRTIO_SCSI_S_UNDERRUN if the content of the CDB requires
>>>       transferring
>>>         more data than is available in the data buffers.
>>>
>>>       - VIRTIO_SCSI_S_ABORTED if the request was cancelled due to a
>>>       reset
>>>         or another task management function.
>>>
>>>       - VIRTIO_SCSI_S_FAILURE for other host or guest error. In
>>>       particular,
>>>         if neither dataout nor datain is empty, and the
>>>         VIRTIO_SCSI_F_INOUT
>>>         feature has not been negotiated, the request will be
>>>         immediately
>>>         returned with a response equal to VIRTIO_SCSI_S_FAILURE.
>>>
>> And, of course:
>>
>> VIRTIO_SCSI_S_DISCONNECT if the request could not be processed due
>> to a communication failure (eg device was removed or could not be
>> reached).
>
> Ok.
>
>> This specification implies a strict one-to-one mapping between host
>> and target. IE there is no way of specifying more than one target
>> per host.
>
> Actually no, the intention is to use hierarchical LUNs to support
> more than one target per host.
>
Can't.

Hierarchical LUN addressing is a target-internal representation.
The initiator (ie guest OS) should _not_ try to assume anything 
about the internal structure and just use the LUN as an opaque number.

Reason being that the LUN addressing is not unique, and there are 
several choices on how to represent a given LUN.
So the consensus here is that different LUN numbers represent
different physical devices, regardless of the (internal) LUN
representation.
Which in turn means we cannot use the LUN number to convey anything
other than a device identification relative to a target.

Cf SAM-3 paragraph 4.8:

A logical unit number is a field (see 4.9) containing 64 bits that 
identifies the logical unit within a SCSI target device
when accessed by a SCSI target port.

IE the LUN is dependent on the target, but you cannot make
assumptions about the target.

Consequently, it's the host's responsibility to figure out the
targets in the system. After that it invokes the 'scan' function
from the SCSI midlayer.
You can't start from a LUN and try to figure out the targets ...

If you want to support more than one target per host you need some 
sort of enumeration/callback which allows the host to figure out
the number of available targets.
But in general the targets are referenced by the target port 
identifier as specified in the appropriate standard (eg FC or SAS).
Sadly, we don't have any standard to fall back on for this.

If, however, we decide to expose some details about the backend, we
could use the values from the backend directly.
EG we could forward the SCSI target port identifier here
(if backed by real hardware) or create our own SAS-type
identifier when backed by qemu block. Then we could just query
the backend via a new command on the controlq
(eg 'list target ports') and wouldn't have to worry about any
protocol-specific details here.

Of course, when doing so we would lose the ability to freely
remap LUNs. But then remapping LUNs doesn't gain you much imho.
Plus you could always use the qemu block backend here if you want
to hide the details.
But we would finally be able to use NPIV for KVM, something
I've wanted to do for a _long_ time.

I personally _really_ would like to see the real backing device 
details exposed to the guest.
Otherwise the more advanced stuff like persistent reservations 
becomes _really_ hard if not impossible.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		      zSeries & Storage
hare@suse.de			      +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)

^ permalink raw reply	[flat|nested] 91+ messages in thread


* Re: virtio scsi host draft specification, v3
  2011-06-12  7:51     ` [Qemu-devel] " Michael S. Tsirkin
@ 2011-06-14 15:30       ` Hannes Reinecke
  -1 siblings, 0 replies; 91+ messages in thread
From: Hannes Reinecke @ 2011-06-14 15:30 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Paolo Bonzini, Linux Virtualization, Linux Kernel Mailing List,
	qemu-devel, Rusty Russell, Stefan Hajnoczi, Christoph Hellwig,
	kvm

On 06/12/2011 09:51 AM, Michael S. Tsirkin wrote:
> On Fri, Jun 10, 2011 at 02:55:35PM +0200, Hannes Reinecke wrote:
>>> Device operation: request queues
>>> --------------------------------
>>>
>>> The driver queues requests to an arbitrary request queue, and they are
>>> used by the device on that same queue.
>>>
>> What about request ordering?
>> If requests are placed on arbitrary queues you'll inevitably run into
>> locking issues to ensure strict request ordering.
>> I would add here:
>>
>> If a device uses more than one queue it is the responsibility of the
>> device to ensure strict request ordering.
>
> Maybe I misunderstand - how can this be the responsibility of
> the device if the device does not get the information about
> the original ordering of the requests?
>
> For example, if the driver is crazy enough to put
> all write requests on one queue and all barriers
> on another one, how is the device supposed to ensure
> ordering?
>
Which is exactly the problem I was referring to.
When using more than one channel the request ordering
_as seen by the initiator_ has to be preserved.

This is quite hard to do from a device's perspective;
it might be able to process the requests _in the order_ they've
arrived, but it won't be able to figure out the latency of each
request, ie how long it took the request to be delivered from the initiator.

What we need to do here is to ensure that virtio will deliver
the requests in-order across all virtqueues. Not sure whether it 
does this already.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		      zSeries & Storage
hare@suse.de			      +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)

^ permalink raw reply	[flat|nested] 91+ messages in thread


* Re: virtio scsi host draft specification, v3
  2011-06-14  8:39       ` [Qemu-devel] " Hannes Reinecke
@ 2011-06-14 15:53         ` Stefan Hajnoczi
  -1 siblings, 0 replies; 91+ messages in thread
From: Stefan Hajnoczi @ 2011-06-14 15:53 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: Paolo Bonzini, Linux Virtualization, Linux Kernel Mailing List,
	qemu-devel, Rusty Russell, Stefan Hajnoczi, Christoph Hellwig,
	Michael S. Tsirkin, kvm

On Tue, Jun 14, 2011 at 9:39 AM, Hannes Reinecke <hare@suse.de> wrote:
> On 06/10/2011 04:35 PM, Paolo Bonzini wrote:
>>>
>>> If requests are placed on arbitrary queues you'll inevitably run into
>>> locking issues to ensure strict request ordering.
>>> I would add here:
>>>
>>> If a device uses more than one queue it is the responsibility of the
>>> device to ensure strict request ordering.
>>
>> Applied with s/device/guest/g.
>>
>>> Please do not rely on bus/target/lun here. These are leftovers from
>>> parallel SCSI and do not have any meaning on modern SCSI
>>> implementation (eg FC or SAS). Rephrase that to
>>>
>>> The lun field is the Logical Unit Number as defined in SAM.
>>
>> Ok.
>>
>>>>      The status byte is written by the device to be the SCSI status
>>>>      code.
>>>
>>> ?? I doubt that exists. Make that:
>>>
>>> The status byte is written by the device to be the status code as
>>> defined in SAM.
>>
>> Ok.
>>
>>>>      The response byte is written by the device to be one of the
>>>>      following:
>>>>
>>>>      - VIRTIO_SCSI_S_OK when the request was completed and the
>>>>      status byte
>>>>        is filled with a SCSI status code (not necessarily "GOOD").
>>>>
>>>>      - VIRTIO_SCSI_S_UNDERRUN if the content of the CDB requires
>>>>      transferring
>>>>        more data than is available in the data buffers.
>>>>
>>>>      - VIRTIO_SCSI_S_ABORTED if the request was cancelled due to a
>>>>      reset
>>>>        or another task management function.
>>>>
>>>>      - VIRTIO_SCSI_S_FAILURE for other host or guest error. In
>>>>      particular,
>>>>        if neither dataout nor datain is empty, and the
>>>>        VIRTIO_SCSI_F_INOUT
>>>>        feature has not been negotiated, the request will be
>>>>        immediately
>>>>        returned with a response equal to VIRTIO_SCSI_S_FAILURE.
>>>>
>>> And, of course:
>>>
>>> VIRTIO_SCSI_S_DISCONNECT if the request could not be processed due
>>> to a communication failure (eg device was removed or could not be
>>> reached).
>>
>> Ok.
>>
>>> This specification implies a strict one-to-one mapping between host
>>> and target. IE there is no way of specifying more than one target
>>> per host.
>>
>> Actually no, the intention is to use hierarchical LUNs to support
>> more than one target per host.
>>
> Can't.
>
> Hierarchical LUNs is a target-internal representation.
> The initiator (ie guest OS) should _not_ try to assume anything about the
> internal structure and just use the LUN as an opaque number.
>
> Reason being that the LUN addressing is not unique, and there are several
> choices on how to represent a given LUN.
> So the consensus here is that different LUN numbers represent
> different physical devices, regardless on the (internal) LUN representation.
> Which in turn means we cannot use the LUN number to convey anything else
> than a device identification relative to a target.
>
> Cf SAM-3 paragraph 4.8:
>
> A logical unit number is a field (see 4.9) containing 64 bits that
> identifies the logical unit within a SCSI target device
> when accessed by a SCSI target port.
>
> IE the LUN is dependent on the target, but you cannot make assumptions on
> the target.
>
> Consequently, it's in the hosts' responsibility to figure out the targets in
> the system. After that it invokes the 'scan' function from the SCSI
> midlayer.
> You can't start from a LUN and try to figure out the targets ...
>
> If you want to support more than one target per host you need some sort of
> enumeration/callback which allows the host to figure out
> the number of available targets.
> But in general the targets are referenced by the target port identifier as
> specified in the appropriate standard (eg FC or SAS).
> Sadly, we don't have any standard to fall back on for this.
>
> If, however, we decide to expose some details about the backend, we could be
> using the values from the backend directly.
> EG we could be forwarding the SCSI target port identifier here
> (if backed by real hardware) or creating our own SAS-type
> identifier when backed by qemu block. Then we could just query
> the backend via a new command on the controlq
> (eg 'list target ports') and wouldn't have to worry about any protocol
> specific details here.

I think we want to be able to pass through one or more SCSI targets,
so we probably need a 'list target ports' control command.

Stefan

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [Qemu-devel] virtio scsi host draft specification, v3
@ 2011-06-14 15:53         ` Stefan Hajnoczi
  0 siblings, 0 replies; 91+ messages in thread
From: Stefan Hajnoczi @ 2011-06-14 15:53 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: Christoph Hellwig, Stefan Hajnoczi, kvm, Michael S. Tsirkin,
	Rusty Russell, qemu-devel, Linux Kernel Mailing List,
	Paolo Bonzini, Linux Virtualization

On Tue, Jun 14, 2011 at 9:39 AM, Hannes Reinecke <hare@suse.de> wrote:
> On 06/10/2011 04:35 PM, Paolo Bonzini wrote:
>>>
>>> If requests are placed on arbitrary queues you'll inevitably run on
>>> locking issues to ensure strict request ordering.
>>> I would add here:
>>>
>>> If a device uses more than one queue it is the responsibility of the
>>> device to ensure strict request ordering.
>>
>> Applied with s/device/guest/g.
>>
>>> Please do not rely in bus/target/lun here. These are leftovers from
>>> parallel SCSI and do not have any meaning on modern SCSI
>>> implementation (eg FC or SAS). Rephrase that to
>>>
>>> The lun field is the Logical Unit Number as defined in SAM.
>>
>> Ok.
>>
>>>>      The status byte is written by the device to be the SCSI status
>>>>      code.
>>>
>>> ?? I doubt that exists. Make that:
>>>
>>> The status byte is written by the device to be the status code as
>>> defined in SAM.
>>
>> Ok.
>>
>>>>      The response byte is written by the device to be one of the
>>>>      following:
>>>>
>>>>      - VIRTIO_SCSI_S_OK when the request was completed and the
>>>>      status byte
>>>>        is filled with a SCSI status code (not necessarily "GOOD").
>>>>
>>>>      - VIRTIO_SCSI_S_UNDERRUN if the content of the CDB requires
>>>>      transferring
>>>>        more data than is available in the data buffers.
>>>>
>>>>      - VIRTIO_SCSI_S_ABORTED if the request was cancelled due to a
>>>>      reset
>>>>        or another task management function.
>>>>
>>>>      - VIRTIO_SCSI_S_FAILURE for other host or guest error. In
>>>>      particular,
>>>>        if neither dataout nor datain is empty, and the
>>>>        VIRTIO_SCSI_F_INOUT
>>>>        feature has not been negotiated, the request will be
>>>>        immediately
>>>>        returned with a response equal to VIRTIO_SCSI_S_FAILURE.
>>>>
>>> And, of course:
>>>
>>> VIRTIO_SCSI_S_DISCONNECT if the request could not be processed due
>>> to a communication failure (eg device was removed or could not be
>>> reached).
>>
>> Ok.
>>
>>> This specification implies a strict one-to-one mapping between host
>>> and target. IE there is no way of specifying more than one target
>>> per host.
>>
>> Actually no, the intention is to use hierarchical LUNs to support
>> more than one target per host.
>>
> Can't.
>
> Hierarchical LUNs is a target-internal representation.
> The initiator (ie guest OS) should _not_ try to assume anything about the
> internal structure and just use the LUN as an opaque number.
>
> Reason being that the LUN addressing is not unique, and there are several
> choices on how to represent a given LUN.
> So the consensus here is that different LUN numbers represent
> different physical devices, regardless on the (internal) LUN representation.
> Which in turn means we cannot use the LUN number to convey anything else
> than a device identification relative to a target.
>
> Cf SAM-3 paragraph 4.8:
>
> A logical unit number is a field (see 4.9) containing 64 bits that
> identifies the logical unit within a SCSI target device
> when accessed by a SCSI target port.
>
> IE the LUN is dependent on the target, but you cannot make assumptions on
> the target.
>
> Consequently, it's in the hosts' responsibility to figure out the targets in
> the system. After that it invokes the 'scan' function from the SCSI
> midlayer.
> You can't start from a LUN and try to figure out the targets ...
>
> If you want to support more than on target per host you need some sort of
> enumeration/callback which allows the host to figure out
> the number of available targets.
> But in general the targets are referenced by the target port identifier as
> specified in the appropriate standard (eg FC or SAS).
> Sadly, we don't have any standard to fall back on for this.
>
> If, however, we decide to expose some details about the backend, we could be
> using the values from the backend directly.
> EG we could be forwarding the SCSI target port identifier here
> (if backed by real hardware) or creating our own SAS-type
> identifier when backed by qemu block. Then we could just query
> the backend via a new command on the controlq
> (eg 'list target ports') and wouldn't have to worry about any protocol
> specific details here.

I think we want to be able to pass through one or more SCSI targets,
so we probably need a 'list target ports' control command.

Stefan

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: virtio scsi host draft specification, v3
  2011-06-14  8:39       ` [Qemu-devel] " Hannes Reinecke
  (?)
@ 2011-06-14 15:53       ` Stefan Hajnoczi
  -1 siblings, 0 replies; 91+ messages in thread
From: Stefan Hajnoczi @ 2011-06-14 15:53 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: Christoph Hellwig, Stefan Hajnoczi, kvm, Michael S. Tsirkin,
	qemu-devel, Linux Kernel Mailing List, Paolo Bonzini,
	Linux Virtualization

On Tue, Jun 14, 2011 at 9:39 AM, Hannes Reinecke <hare@suse.de> wrote:
> On 06/10/2011 04:35 PM, Paolo Bonzini wrote:
>>>
>>> If requests are placed on arbitrary queues you'll inevitably run on
>>> locking issues to ensure strict request ordering.
>>> I would add here:
>>>
>>> If a device uses more than one queue it is the responsibility of the
>>> device to ensure strict request ordering.
>>
>> Applied with s/device/guest/g.
>>
>>> Please do not rely in bus/target/lun here. These are leftovers from
>>> parallel SCSI and do not have any meaning on modern SCSI
>>> implementation (eg FC or SAS). Rephrase that to
>>>
>>> The lun field is the Logical Unit Number as defined in SAM.
>>
>> Ok.
>>
>>>>      The status byte is written by the device to be the SCSI status
>>>>      code.
>>>
>>> ?? I doubt that exists. Make that:
>>>
>>> The status byte is written by the device to be the status code as
>>> defined in SAM.
>>
>> Ok.
>>
>>>>      The response byte is written by the device to be one of the
>>>>      following:
>>>>
>>>>      - VIRTIO_SCSI_S_OK when the request was completed and the
>>>>      status byte
>>>>        is filled with a SCSI status code (not necessarily "GOOD").
>>>>
>>>>      - VIRTIO_SCSI_S_UNDERRUN if the content of the CDB requires
>>>>      transferring
>>>>        more data than is available in the data buffers.
>>>>
>>>>      - VIRTIO_SCSI_S_ABORTED if the request was cancelled due to a
>>>>      reset
>>>>        or another task management function.
>>>>
>>>>      - VIRTIO_SCSI_S_FAILURE for other host or guest error. In
>>>>      particular,
>>>>        if neither dataout nor datain is empty, and the
>>>>        VIRTIO_SCSI_F_INOUT
>>>>        feature has not been negotiated, the request will be
>>>>        immediately
>>>>        returned with a response equal to VIRTIO_SCSI_S_FAILURE.
>>>>
>>> And, of course:
>>>
>>> VIRTIO_SCSI_S_DISCONNECT if the request could not be processed due
>>> to a communication failure (eg device was removed or could not be
>>> reached).
>>
>> Ok.
>>
>>> This specification implies a strict one-to-one mapping between host
>>> and target. IE there is no way of specifying more than one target
>>> per host.
>>
>> Actually no, the intention is to use hierarchical LUNs to support
>> more than one target per host.
>>
> Can't.
>
> Hierarchical LUNs is a target-internal representation.
> The initiator (ie guest OS) should _not_ try to assume anything about the
> internal structure and just use the LUN as an opaque number.
>
> Reason being that the LUN addressing is not unique, and there are several
> choices on how to represent a given LUN.
> So the consensus here is that different LUN numbers represent
> different physical devices, regardless of the (internal) LUN representation.
> Which in turn means we cannot use the LUN number to convey anything else
> than a device identification relative to a target.
>
> Cf SAM-3 paragraph 4.8:
>
> A logical unit number is a field (see 4.9) containing 64 bits that
> identifies the logical unit within a SCSI target device
> when accessed by a SCSI target port.
>
> IE the LUN is dependent on the target, but you cannot make assumptions on
> the target.
>
> Consequently, it's the host's responsibility to figure out the targets in
> the system. After that it invokes the 'scan' function from the SCSI
> midlayer.
> You can't start from a LUN and try to figure out the targets ...
>
> If you want to support more than one target per host you need some sort of
> enumeration/callback which allows the host to figure out
> the number of available targets.
> But in general the targets are referenced by the target port identifier as
> specified in the appropriate standard (eg FC or SAS).
> Sadly, we don't have any standard to fall back on for this.
>
> If, however, we decide to expose some details about the backend, we could be
> using the values from the backend directly.
> EG we could be forwarding the SCSI target port identifier here
> (if backed by real hardware) or creating our own SAS-type
> identifier when backed by qemu block. Then we could just query
> the backend via a new command on the controlq
> (eg 'list target ports') and wouldn't have to worry about any protocol
> specific details here.

I think we want to be able to pass through one or more SCSI targets,
so we probably need a 'list target ports' control command.
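
A minimal sketch of what such a control command might look like on the
wire; the request type value and the field layout are invented here for
discussion, not part of the draft:

```c
#include <stdint.h>

/* Hypothetical 'list target ports' controlq request/response.
 * Neither the type value nor the layout appears in the draft spec. */
#define VIRTIO_SCSI_T_LIST_TARGET_PORTS  0x100  /* invented value */

struct virtio_scsi_ctrl_list_ports_req {
    uint32_t type;        /* VIRTIO_SCSI_T_LIST_TARGET_PORTS */
    uint32_t max_entries; /* room available in the response buffer */
};

struct virtio_scsi_ctrl_list_ports_resp {
    uint32_t num_ports;   /* number of entries that follow */
    uint64_t port_id[];   /* e.g. SAS-style 64-bit port identifiers */
};
```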

Stefan

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: virtio scsi host draft specification, v3
  2011-06-12  7:51     ` [Qemu-devel] " Michael S. Tsirkin
@ 2011-06-29  8:23       ` Paolo Bonzini
  -1 siblings, 0 replies; 91+ messages in thread
From: Paolo Bonzini @ 2011-06-29  8:23 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Hannes Reinecke, Linux Virtualization, Linux Kernel Mailing List,
	qemu-devel, Rusty Russell, Stefan Hajnoczi, Christoph Hellwig,
	kvm

On 06/12/2011 09:51 AM, Michael S. Tsirkin wrote:
>> >
>> >  If a device uses more than one queue it is the responsibility of the
>> >  device to ensure strict request ordering.
> Maybe I misunderstand - how can this be the responsibility of
> the device if the device does not get the information about
> the original ordering of the requests?
>
> For example, if the driver is crazy enough to put
> all write requests on one queue and all barriers
> on another one, how is the device supposed to ensure
> ordering?

I agree here, in fact I misread Hannes's comment as "if a driver uses 
more than one queue it is the responsibility of the driver to ensure strict 
request ordering".  If you send requests to different queues, you know 
that those requests are independent.  I don't think anything else is 
feasible in the virtio framework.

Paolo

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: virtio scsi host draft specification, v3
  2011-06-14  8:39       ` [Qemu-devel] " Hannes Reinecke
@ 2011-06-29  8:33         ` Paolo Bonzini
  -1 siblings, 0 replies; 91+ messages in thread
From: Paolo Bonzini @ 2011-06-29  8:33 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: Linux Virtualization, Linux Kernel Mailing List, qemu-devel,
	Rusty Russell, Stefan Hajnoczi, Christoph Hellwig,
	Michael S. Tsirkin, kvm

On 06/14/2011 10:39 AM, Hannes Reinecke wrote:
> If, however, we decide to expose some details about the backend, we
> could be using the values from the backend directly.
> EG we could be forwarding the SCSI target port identifier here
> (if backed by real hardware) or creating our own SAS-type
> identifier when backed by qemu block. Then we could just query
> the backend via a new command on the controlq
> (eg 'list target ports') and wouldn't have to worry about any protocol
> specific details here.

Besides the controlq command, which I can certainly add, this is 
actually quite similar to what I had in mind (though my plan likely 
would not have worked because I was expecting hierarchical LUNs used 
uniformly).  So, "list target ports" would return a set of LUN values to 
which you can send REPORT LUNS, or something like that?  I suppose that 
if you're using real hardware as the backing storage the in-kernel 
target can provide that.

For the QEMU backend I'd keep hierarchical LUNs, though of course one 
could add a FC or SAS bus to QEMU, each implementing its own identifier 
scheme.

If I understand it correctly, it should remain possible to use a single 
host for both pass-through and emulated targets.

Would you draft the command structure, so I can incorporate it into the 
spec?

> Of course, when doing so we would lose the ability to freely remap
> LUNs. But then remapping LUNs doesn't gain you much imho.
> Plus you could always use qemu block backend here if you want
> to hide the details.

And you could always use the QEMU block backend with scsi-generic if you 
want to remap LUNs, instead of true passthrough via the kernel target.

Paolo

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: virtio scsi host draft specification, v3
  2011-06-29  8:23       ` [Qemu-devel] " Paolo Bonzini
@ 2011-06-29  8:46         ` Michael S. Tsirkin
  -1 siblings, 0 replies; 91+ messages in thread
From: Michael S. Tsirkin @ 2011-06-29  8:46 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Hannes Reinecke, Linux Virtualization, Linux Kernel Mailing List,
	qemu-devel, Rusty Russell, Stefan Hajnoczi, Christoph Hellwig,
	kvm

On Wed, Jun 29, 2011 at 10:23:26AM +0200, Paolo Bonzini wrote:
> On 06/12/2011 09:51 AM, Michael S. Tsirkin wrote:
> >>>
> >>>  If a device uses more than one queue it is the responsibility of the
> >>>  device to ensure strict request ordering.
> >Maybe I misunderstand - how can this be the responsibility of
> >the device if the device does not get the information about
> >the original ordering of the requests?
> >
> >For example, if the driver is crazy enough to put
> >all write requests on one queue and all barriers
> >on another one, how is the device supposed to ensure
> >ordering?
> 
> I agree here, in fact I misread Hannes's comment as "if a driver
> uses more than one queue it is the responsibility of the driver to
> ensure strict request ordering".  If you send requests to different
> queues, you know that those requests are independent.  I don't think
> anything else is feasible in the virtio framework.
> 
> Paolo

Like this then?

  If a driver uses more than one queue it is the responsibility of the
  driver to ensure strict request ordering: the device does not
  supply any guarantees about the ordering of requests between different
  virtqueues.
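
As a sketch of what that rule means in practice for a driver (the helper
and constants below are invented for illustration, not actual driver
code): requests whose order matters stay on one queue, while independent
requests may be spread across queues for parallelism:

```c
#include <stdint.h>

enum { NUM_QUEUES = 4 };

/* Invented helper: pick a request virtqueue.  Anything that must be
 * ordered with respect to earlier requests is pinned to queue 0, so
 * the device sees those requests in submission order; independent
 * requests are spread over the remaining queues. */
static unsigned pick_queue(uint64_t lun_hash, int needs_ordering)
{
    if (needs_ordering)
        return 0;                        /* one queue => order preserved */
    return 1 + lun_hash % (NUM_QUEUES - 1);
}
```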



^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: virtio scsi host draft specification, v3
  2011-06-29  8:33         ` [Qemu-devel] " Paolo Bonzini
@ 2011-06-29  9:39           ` Stefan Hajnoczi
  -1 siblings, 0 replies; 91+ messages in thread
From: Stefan Hajnoczi @ 2011-06-29  9:39 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Hannes Reinecke, Christoph Hellwig, Stefan Hajnoczi, kvm,
	Michael S. Tsirkin, qemu-devel, Linux Kernel Mailing List,
	Linux Virtualization, Nicholas A. Bellinger

On Wed, Jun 29, 2011 at 9:33 AM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> On 06/14/2011 10:39 AM, Hannes Reinecke wrote:
>> If, however, we decide to expose some details about the backend, we
>> could be using the values from the backend directly.
>> EG we could be forwarding the SCSI target port identifier here
>> (if backed by real hardware) or creating our own SAS-type
>> identifier when backed by qemu block. Then we could just query
>> the backend via a new command on the controlq
>> (eg 'list target ports') and wouldn't have to worry about any protocol
>> specific details here.
>
> Besides the controlq command, which I can certainly add, this is
> actually quite similar to what I had in mind (though my plan likely
> would not have worked because I was expecting hierarchical LUNs used
> uniformly).  So, "list target ports" would return a set of LUN values to
> which you can send REPORT LUNS, or something like that?

I think we're missing a level of addressing.  We need the ability to
talk to multiple target ports in order for "list target ports" to make
sense.  Right now there is one implicit target that handles all
commands.  That means there is one fixed I_T Nexus.

If we introduce "list target ports" we also need a way to say "This
CDB is destined for target port #0".  Then it is possible to enumerate
target ports and address targets independently of the LUN field in the
CDB.
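
One way to picture that extra addressing level (field names and widths
are invented here for discussion, not from the draft):

```c
#include <stdint.h>

/* Hypothetical request header extension: an explicit target port index
 * alongside the LUN, so a CDB can say "this is for target port #0".
 * The layout is invented for illustration. */
struct virtio_scsi_cmd_req_addr {
    uint32_t target_port; /* index obtained from 'list target ports' */
    uint8_t  lun[8];      /* SAM logical unit number, opaque to driver */
    uint64_t tag;         /* command identifier */
    /* ... CDB and remaining fields as in the draft ... */
};
```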

I'm pretty sure this is also how SAS and other transports work.  In
their framing they include the target port.

The question is whether we really need to support multiple targets on
a virtio-scsi adapter or not.  If you are selectively mapping LUNs
that the guest may access, then multiple targets are not necessary.
If we want to do pass-through of the entire SCSI bus then we need
multiple targets but I'm not sure if there are other challenges like
dependencies on the transport (Fibre Channel, SAS, etc) which make it
impossible to pass through bus-level access?

> If I understand it correctly, it should remain possible to use a single
> host for both pass-through and emulated targets.

Yes.

>> Of course, when doing so we would be lose the ability to freely remap
>> LUNs. But then remapping LUNs doesn't gain you much imho.
>> Plus you could always use qemu block backend here if you want
>> to hide the details.
>
> And you could always use the QEMU block backend with scsi-generic if you
> want to remap LUNs, instead of true passthrough via the kernel target.

IIUC the in-kernel target always does remapping.  It passes through
individual LUNs rather than entire targets and you pick LU Numbers to
map to the backing storage (which may or may not be a SCSI
pass-through device).  Nicholas Bellinger can confirm whether this is
correct.

Stefan

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: virtio scsi host draft specification, v3
  2011-06-14 15:30       ` [Qemu-devel] " Hannes Reinecke
@ 2011-06-29 10:00         ` Christoph Hellwig
  -1 siblings, 0 replies; 91+ messages in thread
From: Christoph Hellwig @ 2011-06-29 10:00 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: Michael S. Tsirkin, Paolo Bonzini, Linux Virtualization,
	Linux Kernel Mailing List, qemu-devel, Rusty Russell,
	Stefan Hajnoczi, Christoph Hellwig, kvm

On Tue, Jun 14, 2011 at 05:30:24PM +0200, Hannes Reinecke wrote:
> Which is exactly the problem I was referring to.
> When using more than one channel the request ordering
> _as seen by the initiator_ has to be preserved.
> 
> This is quite hard to do from a device's perspective;
> it might be able to process the requests _in the order_ they've
> arrived, but it won't be able to figure out the latency of each
> request, ie how long it'll take for the request to be delivered to the
> initiator.
> 
> What we need to do here is to ensure that virtio will deliver
> the requests in-order across all virtqueues. Not sure whether it
> does this already.

This only matters for ordered tags, or implicit or explicit HEAD OF
QUEUE tags.  For everything else there's no ordering requirement.
Given that ordered tags don't matter in practice and we don't have
to support them this just leaves HEAD OF QUEUE.  I suspect the
HEAD OF QUEUE semantics need to be implemented by draining all
queues underneath, which should be okay given that it's
usually used for slow-path commands.
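
A toy model of that draining approach (all names here are invented for
illustration, not driver code): the HEAD OF QUEUE command is only
issued once every virtqueue has no requests in flight.

```c
#include <stdint.h>

enum { NUM_QUEUES = 4 };

/* Per-queue count of requests submitted but not yet completed. */
static unsigned inflight[NUM_QUEUES];

static int all_queues_drained(void)
{
    for (int i = 0; i < NUM_QUEUES; i++)
        if (inflight[i] != 0)
            return 0;
    return 1;
}

/* Gate for a HEAD OF QUEUE command: submission must wait until every
 * other in-flight request has completed. */
static int may_issue_head_of_queue(void)
{
    return all_queues_drained();
}
```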


^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: virtio scsi host draft specification, v3
  2011-06-12  7:51     ` [Qemu-devel] " Michael S. Tsirkin
@ 2011-06-29 10:01       ` Christoph Hellwig
  -1 siblings, 0 replies; 91+ messages in thread
From: Christoph Hellwig @ 2011-06-29 10:01 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Hannes Reinecke, Paolo Bonzini, Linux Virtualization,
	Linux Kernel Mailing List, qemu-devel, Rusty Russell,
	Stefan Hajnoczi, Christoph Hellwig, kvm

On Sun, Jun 12, 2011 at 10:51:41AM +0300, Michael S. Tsirkin wrote:
> For example, if the driver is crazy enough to put
> all write requests on one queue and all barriers
> on another one, how is the device supposed to ensure
> ordering?

There is no such thing as barriers in SCSI.  The thing that comes
closest is ordered tags, which neither Linux nor any mainstream OS
uses, and which we don't have to (and generally don't want to)
implement.
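[Editor's note: a hedged illustration of the point above. With SIMPLE tags the
device may complete commands in any order, so an initiator that needs
ordering waits for completion before issuing the dependent command (the
approach Linux took when it replaced barriers with flush/FUA). The function
names are illustrative:]

```python
import random

def complete_batch(batch):
    # SIMPLE-tagged commands carry no ordering requirement: the device
    # is free to complete a submitted batch in any order.
    batch = list(batch)
    random.shuffle(batch)
    return batch

def submit_ordered(writes):
    # Initiator-enforced ordering: submit one dependent command at a
    # time and wait for its completion before issuing the next.
    completions = []
    for w in writes:
        completions += complete_batch([w])  # batch of one: order is forced
    return completions
```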


^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: virtio scsi host draft specification, v3
  2011-06-29  8:23       ` [Qemu-devel] " Paolo Bonzini
@ 2011-06-29 10:03         ` Christoph Hellwig
  -1 siblings, 0 replies; 91+ messages in thread
From: Christoph Hellwig @ 2011-06-29 10:03 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Michael S. Tsirkin, Hannes Reinecke, Linux Virtualization,
	Linux Kernel Mailing List, qemu-devel, Rusty Russell,
	Stefan Hajnoczi, Christoph Hellwig, kvm

On Wed, Jun 29, 2011 at 10:23:26AM +0200, Paolo Bonzini wrote:
> I agree here, in fact I misread Hannes's comment as "if a driver
> uses more than one queue it is the responsibility of the driver to
> ensure strict request ordering".  If you send requests to different
> queues, you know that those requests are independent.  I don't think
> anything else is feasible in the virtio framework.

That doesn't really fit very well with the SAM model.  If we want
to use multiple queues for a single LUN it has to be transparent to
the SCSI command stream.  Then again I don't quite see the use for
that anyway.
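[Editor's note: one way to reconcile the two positions above, sketched as an
assumption rather than spec text: route every command for a given target to
one fixed virtqueue. No single LUN is ever spread across queues, so the
per-LUN command stream stays transparent to SAM, while different targets can
still fan out across queues and MSI-X vectors.]

```python
# Illustrative queue-routing policy: per-target, never per-LUN.
def queue_for_target(target_id, num_queues):
    return target_id % num_queues

# Every command to target 7 lands on the same queue...
assert len({queue_for_target(7, 4) for _ in range(100)}) == 1
# ...while distinct targets spread across all available queues.
assert {queue_for_target(t, 4) for t in range(8)} == {0, 1, 2, 3}
```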


^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: virtio scsi host draft specification, v3
  2011-06-29 10:03         ` [Qemu-devel] " Christoph Hellwig
@ 2011-06-29 10:06           ` Paolo Bonzini
  -1 siblings, 0 replies; 91+ messages in thread
From: Paolo Bonzini @ 2011-06-29 10:06 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Michael S. Tsirkin, Hannes Reinecke, Linux Virtualization,
	Linux Kernel Mailing List, qemu-devel, Rusty Russell,
	Stefan Hajnoczi, Christoph Hellwig, kvm

On 06/29/2011 12:03 PM, Christoph Hellwig wrote:
> >  I agree here, in fact I misread Hannes's comment as "if a driver
> >  uses more than one queue it is the responsibility of the driver to
> >  ensure strict request ordering".  If you send requests to different
> >  queues, you know that those requests are independent.  I don't think
> >  anything else is feasible in the virtio framework.
>
> That doesn't really fit very well with the SAM model.  If we want
> to use multiple queues for a single LUN it has to be transparent to
> the SCSI command stream.  Then again I don't quite see the use for
> that anyway.

Agreed, I see a use for multiple queues (MSI-X), but not for multiple 
queues shared by a single LUN.

Paolo

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: virtio scsi host draft specification, v3
  2011-06-29  9:39           ` [Qemu-devel] " Stefan Hajnoczi
@ 2011-06-29 10:07             ` Christoph Hellwig
  -1 siblings, 0 replies; 91+ messages in thread
From: Christoph Hellwig @ 2011-06-29 10:07 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Paolo Bonzini, Hannes Reinecke, Christoph Hellwig,
	Stefan Hajnoczi, kvm, Michael S. Tsirkin, qemu-devel,
	Linux Kernel Mailing List, Linux Virtualization,
	Nicholas A. Bellinger

On Wed, Jun 29, 2011 at 10:39:42AM +0100, Stefan Hajnoczi wrote:
> I think we're missing a level of addressing.  We need the ability to
> talk to multiple target ports in order for "list target ports" to make
> sense.  Right now there is one implicit target that handles all
> commands.  That means there is one fixed I_T Nexus.
> 
> If we introduce "list target ports" we also need a way to say "This
> CDB is destined for target port #0".  Then it is possible to enumerate
> target ports and address targets independently of the LUN field in the
> CDB.
> 
> I'm pretty sure this is also how SAS and other transports work.  In
> their framing they include the target port.

Yes, exactly.  Hierarchical LUNs are a nasty fringe feature that we should
avoid as much as possible, that is, for everything but IBM vSCSI, which is
braindead enough to force them.

> The question is whether we really need to support multiple targets on
> a virtio-scsi adapter or not.  If you are selectively mapping LUNs
> that the guest may access, then multiple targets are not necessary.
> If we want to do pass-through of the entire SCSI bus then we need
> multiple targets but I'm not sure if there are other challenges like
> dependencies on the transport (Fibre Channel, SAS, etc) which make it
> impossible to pass through bus-level access?

I don't think bus-level pass-through is either easily possible or
desirable.  What multiple targets are useful for is allowing more
virtual disks than we have virtual PCI slots.  We could do this by
supporting multiple LUNs, but given that many SCSI resources are
target-based, doing multiple targets is most likely the more scalable
and more logical variant.  E.g. we could much more easily have one
virtqueue per target than per LUN.
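[Editor's note: the "This CDB is destined for target port #0" idea above can
be sketched as an explicit target field in the request framing, so targets
are enumerated independently of the LUN field in the CDB. This layout is a
hypothetical illustration, not the actual virtio-scsi wire format.]

```python
import struct

def pack_request(target_port, lun, tag):
    # Assumed layout: u32 target port, u64 LUN, u64 tag (little endian).
    return struct.pack("<IQQ", target_port, lun, tag)

def unpack_request(buf):
    target_port, lun, tag = struct.unpack("<IQQ", buf)
    return {"target": target_port, "lun": lun, "tag": tag}

req = pack_request(target_port=0, lun=3, tag=42)
```

With the target named in the framing, as SAS does, the LUN field stays flat
and hierarchical LUN encoding is never needed.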


^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: virtio scsi host draft specification, v3
  2011-06-29 10:07             ` [Qemu-devel] " Christoph Hellwig
@ 2011-06-29 10:23               ` Hannes Reinecke
  -1 siblings, 0 replies; 91+ messages in thread
From: Hannes Reinecke @ 2011-06-29 10:23 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Stefan Hajnoczi, Paolo Bonzini, Christoph Hellwig,
	Stefan Hajnoczi, kvm, Michael S. Tsirkin, qemu-devel,
	Linux Kernel Mailing List, Linux Virtualization,
	Nicholas A. Bellinger

On 06/29/2011 12:07 PM, Christoph Hellwig wrote:
> On Wed, Jun 29, 2011 at 10:39:42AM +0100, Stefan Hajnoczi wrote:
>> I think we're missing a level of addressing.  We need the ability to
>> talk to multiple target ports in order for "list target ports" to make
>> sense.  Right now there is one implicit target that handles all
>> commands.  That means there is one fixed I_T Nexus.
>>
>> If we introduce "list target ports" we also need a way to say "This
>> CDB is destined for target port #0".  Then it is possible to enumerate
>> target ports and address targets independently of the LUN field in the
>> CDB.
>>
>> I'm pretty sure this is also how SAS and other transports work.  In
>> their framing they include the target port.
>
> Yes, exactly.  Hierarchical LUNs are a nasty fringe feature that we should
> avoid as much as possible, that is for everything but IBM vSCSI which is
> braindead enough to force them.
>
Yep.

>> The question is whether we really need to support multiple targets on
>> a virtio-scsi adapter or not.  If you are selectively mapping LUNs
>> that the guest may access, then multiple targets are not necessary.
>> If we want to do pass-through of the entire SCSI bus then we need
>> multiple targets but I'm not sure if there are other challenges like
>> dependencies on the transport (Fibre Channel, SAS, etc) which make it
>> impossible to pass through bus-level access?
>
> I don't think bus-level pass-through is either easily possible or
> desirable.  What multiple targets are useful for is allowing more
> virtual disks than we have virtual PCI slots.  We could do this by
> supporting multiple LUNs, but given that many SCSI resources are
> target-based, doing multiple targets is most likely the more scalable
> and more logical variant.  E.g. we could much more easily have one
> virtqueue per target than per LUN.
>
The general idea here is that we can support NPIV.
With NPIV we'll have several scsi_hosts, each of which is assigned a 
different set of LUNs by the array.
With virtio we need to be able to react to LUN remapping on the array 
side, i.e. we need to be able to issue a 'REPORT LUNS' command and 
add/remove LUNs on the fly. This means we have to expose the 
scsi_host in some way via virtio.

This is impossible with a one-to-one mapping between targets and 
LUNs. The actual bus-level pass-through will be just on the SCSI 
layer, i.e. 'REPORT LUNS' should be possible. If and how we do a LUN 
remapping internally on the host is a totally different matter.
Same goes for the transport details; I doubt we will expose all the 
dingy details of the various transports, but rather restrict 
ourselves to an abstract transport.
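[Editor's note: a sketch of the REPORT LUNS reaction described above. The
response format follows SPC (4-byte LUN list length, 4 reserved bytes, then
8-byte LUN entries); the diff/hotplug helper and single-level peripheral
addressing assumption are illustrative, not spec text.]

```python
import struct

def parse_report_luns(data):
    (list_len,) = struct.unpack(">I", data[:4])
    luns = set()
    for off in range(8, 8 + list_len, 8):
        (entry,) = struct.unpack(">Q", data[off:off + 8])
        # Assume single-level peripheral addressing: LUN in the top 2 bytes.
        luns.add((entry >> 48) & 0x3FFF)
    return luns

def diff_luns(known, reported):
    # Returns (LUNs to hot-add, LUNs to hot-remove).
    return reported - known, known - reported

# Build a response advertising LUNs 0, 1 and 5, then diff against the
# currently known set {0, 1, 2}.
entries = b"".join(struct.pack(">Q", lun << 48) for lun in (0, 1, 5))
resp = struct.pack(">I", len(entries)) + b"\x00" * 4 + entries
added, removed = diff_luns({0, 1, 2}, parse_report_luns(resp))
```

A guest driver doing this periodically (or on a unit-attention event) is what
makes NPIV-style LUN remapping visible without any bus-level pass-through.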

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		      zSeries & Storage
hare@suse.de			      +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: virtio scsi host draft specification, v3
  2011-06-29 10:23               ` [Qemu-devel] " Hannes Reinecke
@ 2011-06-29 10:27                 ` Christoph Hellwig
  -1 siblings, 0 replies; 91+ messages in thread
From: Christoph Hellwig @ 2011-06-29 10:27 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: Christoph Hellwig, Stefan Hajnoczi, Paolo Bonzini,
	Christoph Hellwig, Stefan Hajnoczi, kvm, Michael S. Tsirkin,
	qemu-devel, Linux Kernel Mailing List, Linux Virtualization,
	Nicholas A. Bellinger

On Wed, Jun 29, 2011 at 12:23:38PM +0200, Hannes Reinecke wrote:
> The general idea here is that we can support NPIV.
> With NPIV we'll have several scsi_hosts, each of which is assigned a
> different set of LUNs by the array.
> With virtio we need to be able to react to LUN remapping on the array
> side, i.e. we need to be able to issue a 'REPORT LUNS' command and
> add/remove LUNs on the fly. This means we have to expose the
> scsi_host in some way via virtio.
> 
> This is impossible with a one-to-one mapping between targets and
> LUNs. The actual bus-level pass-through will be just on the SCSI
> layer, i.e. 'REPORT LUNS' should be possible. If and how we do a LUN
> remapping internally on the host is a totally different matter.
> Same goes for the transport details; I doubt we will expose all the
> dingy details of the various transports, but rather restrict
> ourselves to an abstract transport.

If we want to support traditional NPIV that's what we have to do.
I still hope we'll see broad SR-IOV support for FC adapters soon,
which would ease all this greatly.


^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: virtio scsi host draft specification, v3
  2011-06-29 10:06           ` [Qemu-devel] " Paolo Bonzini
@ 2011-06-29 10:31             ` Michael S. Tsirkin
  -1 siblings, 0 replies; 91+ messages in thread
From: Michael S. Tsirkin @ 2011-06-29 10:31 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Christoph Hellwig, Hannes Reinecke, Linux Virtualization,
	Linux Kernel Mailing List, qemu-devel, Rusty Russell,
	Stefan Hajnoczi, Christoph Hellwig, kvm

On Wed, Jun 29, 2011 at 12:06:29PM +0200, Paolo Bonzini wrote:
> On 06/29/2011 12:03 PM, Christoph Hellwig wrote:
> >>  I agree here, in fact I misread Hannes's comment as "if a driver
> >>  uses more than one queue it is responsibility of the driver to
> >>  ensure strict request ordering".  If you send requests to different
> >>  queues, you know that those requests are independent.  I don't think
> >>  anything else is feasible in the virtio framework.
> >
> >That doesn't really fit very well with the SAM model.  If we want
> >to use multiple queues for a single LUN it has to be transparent to
> >the SCSI command stream.  Then again I don't quite see the use for
> >that anyway.
> 
> Agreed, I see a use for multiple queues (MSI-X), but not for
> multiple queues shared by a single LUN.
> 
> Paolo

Then let's make it explicit in the spec?

-- 
MST

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: virtio scsi host draft specification, v3
  2011-06-29 10:31             ` [Qemu-devel] " Michael S. Tsirkin
@ 2011-06-29 10:35               ` Paolo Bonzini
  -1 siblings, 0 replies; 91+ messages in thread
From: Paolo Bonzini @ 2011-06-29 10:35 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Christoph Hellwig, Hannes Reinecke, Linux Virtualization,
	Linux Kernel Mailing List, qemu-devel, Rusty Russell,
	Stefan Hajnoczi, Christoph Hellwig, kvm

On 06/29/2011 12:31 PM, Michael S. Tsirkin wrote:
> On Wed, Jun 29, 2011 at 12:06:29PM +0200, Paolo Bonzini wrote:
>> On 06/29/2011 12:03 PM, Christoph Hellwig wrote:
>>>>   I agree here, in fact I misread Hannes's comment as "if a driver
>>>>   uses more than one queue it is responsibility of the driver to
>>>>   ensure strict request ordering".  If you send requests to different
>>>>   queues, you know that those requests are independent.  I don't think
>>>>   anything else is feasible in the virtio framework.
>>>
>>> That doesn't really fit very well with the SAM model.  If we want
>>> to use multiple queues for a single LUN it has to be transparent to
>>> the SCSI command stream.  Then again I don't quite see the use for
>>> that anyway.
>>
>> Agreed, I see a use for multiple queues (MSI-X), but not for
>> multiple queues shared by a single LUN.
>
> Then let's make it explicit in the spec?

What, forbid it or say ordering is not guaranteed?  The latter is 
already explicit with the wording suggested in the thread.

Paolo

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: virtio scsi host draft specification, v3
  2011-06-29  9:39           ` [Qemu-devel] " Stefan Hajnoczi
@ 2011-07-01  6:41             ` Paolo Bonzini
  -1 siblings, 0 replies; 91+ messages in thread
From: Paolo Bonzini @ 2011-07-01  6:41 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Hannes Reinecke, Christoph Hellwig, Stefan Hajnoczi, kvm,
	Michael S. Tsirkin, qemu-devel, Linux Kernel Mailing List,
	Linux Virtualization, Nicholas A. Bellinger

On 06/29/2011 11:39 AM, Stefan Hajnoczi wrote:
> > >  Of course, when doing so we would lose the ability to freely remap
> > >  LUNs. But then remapping LUNs doesn't gain you much imho.
> > >  Plus you could always use qemu block backend here if you want
> > >  to hide the details.
> >
> >  And you could always use the QEMU block backend with scsi-generic if you
> >  want to remap LUNs, instead of true passthrough via the kernel target.
>
> IIUC the in-kernel target always does remapping.  It passes through
> individual LUNs rather than entire targets and you pick LU Numbers to
> map to the backing storage (which may or may not be a SCSI
> pass-through device).  Nicholas Bellinger can confirm whether this is
> correct.

But then I don't understand.  If you pick LU numbers both with the 
in-kernel target and with QEMU, you do not need to use e.g. WWPNs with 
fiber channel, because we are not passing through the details of the 
transport protocol (one day we might have virtio-fc, but more likely 
not).  So the LUNs you use might as well be represented by hierarchical 
LUNs.
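
The hierarchical LUNs mentioned here are SAM's 8-byte structure. A sketch of
the common single-level "flat space" encoding (address method 01b; the helper
names are illustrative only):

```c
#include <stdint.h>

/* SAM flat-space addressing: a LUN up to 16383 goes in the first two
 * bytes of the 8-byte hierarchical LUN (top bits 01), rest zero. */
static void encode_flat_lun(uint8_t out[8], uint16_t lun)
{
    for (int i = 0; i < 8; i++)
        out[i] = 0;
    out[0] = 0x40 | (uint8_t)(lun >> 8);
    out[1] = (uint8_t)(lun & 0xff);
}

static uint16_t decode_flat_lun(const uint8_t in[8])
{
    return (uint16_t)(((in[0] & 0x3f) << 8) | in[1]);
}
```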

Using NPIV with KVM would be done by mapping the same virtual N_Port ID 
in the host(s) to the same LU number in the guest.  You might already do 
this now with virtio-blk, in fact.

Put in another way: the virtio-scsi device is itself a SCSI target, so 
yes, there is a single target port identifier in virtio-scsi.  But this 
SCSI target just passes requests down to multiple real targets, and so 
will let you do ALUA and all that.

Of course if I am dead wrong please correct me.

Paolo

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: virtio scsi host draft specification, v3
  2011-07-01  6:41             ` [Qemu-devel] " Paolo Bonzini
@ 2011-07-01  7:14               ` Hannes Reinecke
  -1 siblings, 0 replies; 91+ messages in thread
From: Hannes Reinecke @ 2011-07-01  7:14 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Stefan Hajnoczi, Christoph Hellwig, Stefan Hajnoczi, kvm,
	Michael S. Tsirkin, qemu-devel, Linux Kernel Mailing List,
	Linux Virtualization, Nicholas A. Bellinger

On 07/01/2011 08:41 AM, Paolo Bonzini wrote:
> On 06/29/2011 11:39 AM, Stefan Hajnoczi wrote:
>> > > Of course, when doing so we would lose the ability to
>> > > freely remap
>> > > LUNs. But then remapping LUNs doesn't gain you much imho.
>> > > Plus you could always use qemu block backend here if you want
>> > > to hide the details.
>> >
>> > And you could always use the QEMU block backend with
>> > scsi-generic if you want to remap LUNs, instead of true
>> > passthrough via the kernel target.
>>
>> IIUC the in-kernel target always does remapping. It passes through
>> individual LUNs rather than entire targets and you pick LU Numbers to
>> map to the backing storage (which may or may not be a SCSI
>> pass-through device). Nicholas Bellinger can confirm whether this is
>> correct.
>
> But then I don't understand. If you pick LU numbers both with the
> in-kernel target and with QEMU, you do not need to use e.g. WWPNs
> with fiber channel, because we are not passing through the details
> of the transport protocol (one day we might have virtio-fc, but more
> likely not). So the LUNs you use might as well be represented by
> hierarchical LUNs.
>

Actually, the kernel does _not_ do a LUN remapping. It just so 
happens that most storage arrays will present LUNs starting at 
0, so normally you wouldn't notice.

However, some arrays have an array-wide LUN range, so you start 
seeing LUNs at odd places:

[3:0:5:0]    disk    LSI      INF-01-00        0750  /dev/sdw
[3:0:5:7]    disk    LSI      Universal Xport  0750  /dev/sdx

> Using NPIV with KVM would be done by mapping the same virtual N_Port
> ID in the host(s) to the same LU number in the guest. You might
> already do this now with virtio-blk, in fact.
>
The point here is not the mapping. The point is rescanning.

You can map existing NPIV devices already. But you _cannot_ rescan
the host/device whatever _from the guest_ to detect if new devices
are present.
That is the problem I'm trying to describe here.

To be more explicit:
Currently you have to map existing devices directly as individual 
block or scsi devices to the guest.
And a rescan within the guest can only be sent to that device, so the 
only information you will be able to gather is whether the device itself 
is still present.
You are unable to detect if there are other devices attached to your 
guest which you should connect to.

So we have to have an enclosing instance (ie the equivalent of a 
SCSI target), which is capable of telling us exactly this.
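
The "enclosing instance" argument can be made concrete: what the guest needs
is a way to diff its view of the LUN inventory against the target's. An
illustrative sketch (sorted LUN lists assumed; not from any implementation):

```c
#include <stdint.h>

/* Count LUNs present in the current REPORT LUNS snapshot but absent
 * from the previous one -- exactly the "other devices attached to your
 * guest" information a target-level instance could report. Both
 * arrays are assumed sorted ascending. */
static int count_new_luns(const uint16_t *prev, int n_prev,
                          const uint16_t *cur, int n_cur)
{
    int i = 0, j = 0, added = 0;
    while (j < n_cur) {
        if (i < n_prev && prev[i] == cur[j]) {
            i++; j++;
        } else if (i < n_prev && prev[i] < cur[j]) {
            i++;                     /* LUN disappeared */
        } else {
            added++; j++;            /* LUN appeared */
        }
    }
    return added;
}
```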

> Put in another way: the virtio-scsi device is itself a SCSI target,
> so yes, there is a single target port identifier in virtio-scsi. But
> this SCSI target just passes requests down to multiple real targets,
> and so will let you do ALUA and all that.
>
Argl. No way. The virtio-scsi device has to map to a single LUN.

I thought I mentioned this already, but I'd better clarify this again:

The SCSI spec itself only deals with LUNs, so anything you'll read 
in there obviously will only handle the interaction between the 
initiator (read: host) and the LUN itself. However, the actual 
command is send via an intermediat target, hence you'll always see 
the reference to the ITL (initiator-target-lun) nexus.
The SCSI spec details discovery of the individual LUNs presented by 
a given target, it does _NOT_ detail the discovery of the targets 
themselves.
That is being delegated to the underlying transport, in most cases 
SAS or FibreChannel.
For the same reason the SCSI spec can afford to disdain any 
reference to path failure, device hot-plugging etc; all of these 
things are being delegated to the transport.

In our context the virtio-scsi device should map to the LUN, and the 
virtio-scsi _host_ backend should map to the target.
The virtio-scsi _guest_ driver will then map to the initiator.

So we should be able to attach more than one device to the backend,
which then will be presented to the initiator.

In the case of NPIV it would make sense to map the virtual SCSI host 
to the backend, so that all devices presented to the virtual SCSI 
host will be presented to the backend, too.
However, when doing so these devices will normally be referenced by 
their original LUN, as these will be presented to the guest via eg 
'REPORT LUNS'.

The above thread now tries to figure out if we should remap those 
LUN numbers or just expose them as they are.
If we decide on remapping, we have to emulate _all_ commands 
referring explicitly to those LUN numbers (persistent reservations, 
anyone?). If we don't, we would expose some hardware detail to the 
guest, but would save us _a lot_ of processing.

I'm all for the latter.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		      zSeries & Storage
hare@suse.de			      +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: virtio scsi host draft specification, v3
  2011-07-01  7:14               ` [Qemu-devel] " Hannes Reinecke
@ 2011-07-01  8:35                 ` Paolo Bonzini
  -1 siblings, 0 replies; 91+ messages in thread
From: Paolo Bonzini @ 2011-07-01  8:35 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: Stefan Hajnoczi, Christoph Hellwig, Stefan Hajnoczi, kvm,
	Michael S. Tsirkin, qemu-devel, Linux Kernel Mailing List,
	Linux Virtualization, Nicholas A. Bellinger

On 07/01/2011 09:14 AM, Hannes Reinecke wrote:
> Actually, the kernel does _not_ do a LUN remapping.

Not the kernel, the in-kernel target.  The in-kernel target can and will
map hardware LUNs (target_lun in drivers/target/*) to arbitrary LUNs
(mapped_lun).

>> Put in another way: the virtio-scsi device is itself a SCSI
>> target,
>
> Argl. No way. The virtio-scsi device has to map to a single LUN.

I think we are talking about different things. By "virtio-scsi device"
I meant the "virtio-scsi HBA".  When I referred to a LUN as seen by the
guest, I was calling it a "virtual SCSI device".  So yes, we were
calling things with different names.  Perhaps from now on
we can call them virtio-scsi {initiator,target,LUN} and have no
ambiguity?  I'll also modify the spec in this sense.

> The SCSI spec itself only deals with LUNs, so anything you'll read in
> there obviously will only handle the interaction between the
> initiator (read: host) and the LUN itself. However, the actual
> command is send via an intermediat target, hence you'll always see
> the reference to the ITL (initiator-target-lun) nexus.

Yes, this I understand.

> The SCSI spec details discovery of the individual LUNs presented by a
> given target, it does _NOT_ detail the discovery of the targets
> themselves.  That is being delegated to the underlying transport

And in fact I have this in virtio-scsi too, since virtio-scsi _is_ a
transport:

     When VIRTIO_SCSI_EVT_RESET_REMOVED or VIRTIO_SCSI_EVT_RESET_RESCAN
     is sent for LUN 0, the driver should ask the initiator to rescan
     the target, in order to detect the case when an entire target has
     appeared or disappeared.

     [If the device fails] to report an event due to missing buffers,
     [...] the driver should poll the logical units for unit attention
     conditions, and/or do whatever form of bus scan is appropriate for
     the guest operating system.

> In the case of NPIV it would make sense to map the virtual SCSI host
>  to the backend, so that all devices presented to the virtual SCSI
> host will be presented to the backend, too. However, when doing so
> these devices will normally be referenced by their original LUN, as
> these will be presented to the guest via eg 'REPORT LUNS'.

Right.

> The above thread now tries to figure out if we should remap those LUN
> numbers or just expose them as they are. If we decide on remapping,
> we have to emulate _all_ commands referring explicitly to those LUN
> numbers (persistent reservations, anyone?).

But it seems to me that commands referring explicitly to LUN numbers
most likely have to be reimplemented anyway for virtualization.  I'm
thinking exactly of persistent reservations.  If two guests on the same
host try a persistent reservation, they should conflict with each other.
If reservation commands were just passed through, they would be seen
as coming from the same initiator (the HBA driver or iSCSI initiator in
the host OS).

etc.

> If we don't, we would expose some hardware detail to the guest, but
> would save us _a lot_ of processing.

But can we afford it?  And would the architecture allow that at all?

Paolo

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: virtio scsi host draft specification, v3
  2011-07-01  8:35                 ` [Qemu-devel] " Paolo Bonzini
@ 2011-07-04 13:38                   ` Hai Dong,Li
  -1 siblings, 0 replies; 91+ messages in thread
From: Hai Dong,Li @ 2011-07-04 13:38 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Hannes Reinecke, Stefan Hajnoczi, Christoph Hellwig,
	Stefan Hajnoczi, kvm, Michael S. Tsirkin, qemu-devel,
	Linux Kernel Mailing List, Linux Virtualization,
	Nicholas A. Bellinger

On 07/01/2011 09:14 AM, Hannes Reinecke wrote:
>> Actually, the kernel does _not_ do a LUN remapping.
>
> Not the kernel, the in-kernel target.  The in-kernel target can and will
> map hardware LUNs (target_lun in drivers/target/*) to arbitrary LUNs
> (mapped_lun).
>
>>> Put in another way: the virtio-scsi device is itself a SCSI
>>> target,
>>
>> Argl. No way. The virtio-scsi device has to map to a single LUN.
>
> I think we are talking about different things. By "virtio-scsi device"
> I meant the "virtio-scsi HBA".  When I referred to a LUN as seen by the
> guest, I was calling it a "virtual SCSI device".  So yes, we were
> calling things with different names.  Perhaps from now on
> we can call them virtio-scsi {initiator,target,LUN} and have no
> ambiguity?  I'll also modify the spec in this sense.
>
>> The SCSI spec itself only deals with LUNs, so anything you'll read in
>> there obviously will only handle the interaction between the
>> initiator (read: host) and the LUN itself. However, the actual
>> command is sent via an intermediate target, hence you'll always see
>> the reference to the ITL (initiator-target-lun) nexus.
>
> Yes, this I understand.
>
>> The SCSI spec details discovery of the individual LUNs presented by a
>> given target, it does _NOT_ detail the discovery of the targets
>> themselves.  That is being delegated to the underlying transport
>
> And in fact I have this in virtio-scsi too, since virtio-scsi _is_ a
> transport:
Oh, here I catch up. I was wondering why there are ordering issues when
talking about virtio-scsi, since SAM-3 describes this clearly in the third
and last paragraphs of section 4.6.3, Request/Response ordering:

The manner in which ordering constraints are established is vendor
specific. An implementation may delegate this responsibility to the
application client (e.g., the device driver). In-order delivery may be an
intrinsic property of the service delivery subsystem or a requirement
established by the SCSI transport protocol standard.

To simplify the description of behavior, the SCSI architecture model
assumes in-order delivery of requests or responses to be a property of the
service delivery subsystem. This assumption does not constitute a
requirement.  The SCSI architecture model makes no assumption about and
places no requirement on the ordering of requests or responses for
different I_T nexuses.

So if I understand correctly, virtio-scsi looks like a SCSI transport
protocol, much like iSCSI, FCP and SRP, which use TCP/IP, FC and
InfiniBand RDMA respectively as the transfer medium, while virtio-scsi
uses virtio, a virtual I/O channel, as the transfer medium?

>
>     When VIRTIO_SCSI_EVT_RESET_REMOVED or VIRTIO_SCSI_EVT_RESET_RESCAN
>     is sent for LUN 0, the driver should ask the initiator to rescan
>     the target, in order to detect the case when an entire target has
>     appeared or disappeared.
>
>     [If the device fails] to report an event due to missing buffers,
>     [...] the driver should poll the logical units for unit attention
>     conditions, and/or do whatever form of bus scan is appropriate for
>     the guest operating system.
>
>> In the case of NPIV it would make sense to map the virtual SCSI host
>>  to the backend, so that all devices presented to the virtual SCSI
>> host will be presented to the backend, too. However, when doing so
>> these devices will normally be referenced by their original LUN, as
>> these will be presented to the guest via eg 'REPORT LUNS'.
>
> Right.
>
>> The above thread now tries to figure out if we should remap those LUN
>> numbers or just expose them as they are. If we decide on remapping,
>> we have to emulate _all_ commands referring explicitly to those LUN
>> numbers (persistent reservations, anyone?).
>
> But it seems to me that commands referring explicitly to LUN numbers
> most likely have to be reimplemented anyway for virtualization.  I'm
> thinking exactly of persistent reservations.  If two guests on the same
> host try a persistent reservation, they should conflict with each other.
> If reservation commands were just passed through, they would be seen
> as coming from the same initiator (the HBA driver or iSCSI initiator in
> the host OS).
>
> etc.
>
>> If we don't, we would expose some hardware detail to the guest, but
>> would save us _a lot_ of processing.
>
> But can we afford it?  And would the architecture allow that at all?
>
> Paolo
> -- 
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html



^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: virtio scsi host draft specification, v3
  2011-07-04 13:38                   ` [Qemu-devel] " Hai Dong,Li
@ 2011-07-04 14:22                     ` Stefan Hajnoczi
  -1 siblings, 0 replies; 91+ messages in thread
From: Stefan Hajnoczi @ 2011-07-04 14:22 UTC (permalink / raw)
  To: Hai Dong,Li
  Cc: Paolo Bonzini, Hannes Reinecke, Christoph Hellwig,
	Stefan Hajnoczi, kvm, Michael S. Tsirkin, qemu-devel,
	Linux Kernel Mailing List, Linux Virtualization,
	Nicholas A. Bellinger

On Mon, Jul 4, 2011 at 2:38 PM, Hai Dong,Li <haidongl@linux.vnet.ibm.com> wrote:
> So if I understand correctly, virtio-scsi looks like a SCSI transport
> protocol, much like iSCSI, FCP and SRP, which use TCP/IP, FC and
> InfiniBand RDMA respectively as the transfer medium, while virtio-scsi
> uses virtio, a virtual I/O channel, as the transfer medium?

Correct.

Stefan

^ permalink raw reply	[flat|nested] 91+ messages in thread

* virtio scsi host draft specification, v3
@ 2011-06-07 13:43 Paolo Bonzini
  0 siblings, 0 replies; 91+ messages in thread
From: Paolo Bonzini @ 2011-06-07 13:43 UTC (permalink / raw)
  To: Linux Virtualization, Linux Kernel Mailing List, qemu-devel
  Cc: Christoph Hellwig, Stefan Hajnoczi, kvm, Michael S. Tsirkin

Virtio SCSI Host Device Spec
============================

The virtio SCSI host device groups together one or more simple virtual
devices (i.e. disks), and allows communicating with these devices using
the SCSI protocol.  An instance of the device represents a SCSI host with
possibly many buses, targets and LUNs attached.

The virtio SCSI device services two kinds of requests:

- command requests for a logical unit;

- task management functions related to a logical unit, target or
command.

The device is also able to send out notifications about added
and removed logical units.

v1:
    First public version

v2:
    Merged all virtqueues into one, removed separate TARGET fields

v3:
    Added configuration information and reworked descriptor structure.
    Added back multiqueue on Avi's request, while still leaving TARGET
    fields out.  Added dummy event and clarified some aspects of the
    event protocol.  First version sent to a wider audience (linux-kernel
    and virtio lists).

Configuration
-------------

Subsystem Device ID
    TBD

Virtqueues
    0:controlq
    1:eventq
    2..n:request queues

Feature bits
    VIRTIO_SCSI_F_INOUT (0) - Whether a single request can include both
        read-only and write-only data buffers.

Device configuration layout
    struct virtio_scsi_config {
        u32 num_queues;
        u32 event_info_size;
        u32 sense_size;
        u32 cdb_size;
    }

    num_queues is the total number of virtqueues exposed by the
    device.  The driver is free to use only one request queue, or
    it can use more to achieve better performance.

    event_info_size is the maximum size that the device will fill
    for buffers that the driver places in the eventq.  The
    driver should always provide buffers of at least this size.

    sense_size is the maximum size of the sense data that the device
    will write.  The default value is written by the device and
    will always be 96, but the driver can modify it.

    cdb_size is the maximum size of the CDB that the driver
    will write.  The default value is written by the device and
    will always be 32, but the driver can likewise modify it.
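As a concrete sketch, the layout above can be written out in C.  The
validation helper is hypothetical (not part of the spec), but it checks
the two defaults the text guarantees: sense_size 96 and cdb_size 32.

```c
#include <stdint.h>

/* Configuration layout exactly as specified above. */
struct virtio_scsi_config {
    uint32_t num_queues;       /* total number of virtqueues */
    uint32_t event_info_size;  /* max size the device fills per event */
    uint32_t sense_size;       /* max sense data size, default 96 */
    uint32_t cdb_size;         /* max CDB size, default 32 */
};

/* Hypothetical helper: check the defaults the device writes before
 * the driver optionally overrides sense_size or cdb_size. */
static int virtio_scsi_config_defaults_ok(const struct virtio_scsi_config *cfg)
{
    return cfg->sense_size == 96 && cfg->cdb_size == 32;
}
```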

Device initialization
---------------------

The initialization routine should first of all discover the device's
virtqueues.

The driver should then place at least one buffer in the eventq.
Buffers returned by the device on the eventq may be referred
to as "events" in the rest of the document.

The driver can immediately issue requests (for example, INQUIRY or
REPORT LUNS) or task management functions (for example, I_T RESET).
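A minimal sketch of this ordering, where discover_virtqueues() and
add_event_buffer() are hypothetical stand-ins for the transport-specific
operations:

```c
/* Sketch of the initialization order described above. */
static int vqs_ready;    /* nonzero once virtqueues are discovered */
static int event_bufs;   /* number of buffers posted to the eventq */

static void discover_virtqueues(void)
{
    vqs_ready = 1;       /* controlq, eventq and request queues found */
}

static int add_event_buffer(void)
{
    if (!vqs_ready)
        return -1;       /* cannot post before the eventq is known */
    event_bufs++;
    return 0;
}

static int virtio_scsi_init(void)
{
    discover_virtqueues();       /* step 1: discover the virtqueues */
    if (add_event_buffer() < 0)  /* step 2: post at least one event buffer */
        return -1;
    /* step 3: requests and task management functions may now be issued */
    return 0;
}
```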

Device operation: request queues
--------------------------------

The driver queues requests to an arbitrary request queue, and the
device processes them on that same queue.

Requests have the following format:

    struct virtio_scsi_req_cmd {
        u8 lun[8];
        u64 id;
        u8 task_attr;
        u8 prio;
        u8 crn;
        char cdb[cdb_size];
        char dataout[];

        u8 sense[sense_size];
        u32 sense_len;
        u32 residual;
        u16 status_qualifier;
        u8 status;
        u8 response;
        char datain[];
    };

    /* command-specific response values */
    #define VIRTIO_SCSI_S_OK              0
    #define VIRTIO_SCSI_S_UNDERRUN        1
    #define VIRTIO_SCSI_S_ABORTED         2
    #define VIRTIO_SCSI_S_FAILURE         3

    /* task_attr */
    #define VIRTIO_SCSI_S_SIMPLE          0
    #define VIRTIO_SCSI_S_ORDERED         1
    #define VIRTIO_SCSI_S_HEAD            2
    #define VIRTIO_SCSI_S_ACA             3

    The lun field addresses a bus, target and logical unit in the SCSI
    host.  The id field is the command identifier as defined in SAM.

    Task_attr, prio and CRN are defined in SAM.  The prio field should
    always be zero, as command priority is explicitly not supported by
    this version of the device.  task_attr defines the task attribute as
    in the table above; note that the device may map all task attributes
    to SIMPLE.  CRN is generally expected to be 0, but drivers can
    provide it.  The maximum CRN value defined by the protocol is 255,
    since CRN is stored in an 8-bit integer.

    All of these fields are always read-only, as are the cdb and dataout
    fields.  The sense and subsequent fields are always write-only.

    The sense_len field indicates the number of bytes actually written
    to the sense buffer.  The residual field indicates the residual
    size, calculated as data_length - number_of_transferred_bytes, for
    read or write operations.

    The status byte is written by the device to be the SCSI status code.

    The response byte is written by the device to be one of the following:

    - VIRTIO_SCSI_S_OK when the request was completed and the status byte
      is filled with a SCSI status code (not necessarily "GOOD").

    - VIRTIO_SCSI_S_UNDERRUN if the content of the CDB requires transferring
      more data than is available in the data buffers.

    - VIRTIO_SCSI_S_ABORTED if the request was cancelled due to a reset
      or another task management function.

    - VIRTIO_SCSI_S_FAILURE for other host or guest errors.  In particular,
      if both dataout and datain are non-empty and the VIRTIO_SCSI_F_INOUT
      feature has not been negotiated, the request will be immediately
      returned with a response equal to VIRTIO_SCSI_S_FAILURE.
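As an illustration of the request layout, here is a hypothetical sketch of how a driver might assemble the read-only (driver-written) portion of struct virtio_scsi_req_cmd.  It assumes little-endian encoding, no padding between fields, the default cdb_size of 32, and an arbitrary example lun value; none of these helper names come from the spec:

```python
import struct

VIRTIO_SCSI_S_SIMPLE = 0  # task_attr value from the table above

def build_req_header(lun: bytes, req_id: int, cdb: bytes, cdb_size: int = 32) -> bytes:
    # lun[8], id (u64), task_attr, prio, crn, then the CDB padded
    # to cdb_size -- packed contiguously, little-endian assumed.
    assert len(lun) == 8 and len(cdb) <= cdb_size
    fixed = struct.pack("<8sQBBB", lun, req_id, VIRTIO_SCSI_S_SIMPLE, 0, 0)
    return fixed + cdb.ljust(cdb_size, b"\x00")

# A 6-byte INQUIRY CDB (opcode 0x12, allocation length 0x60), padded
# to cdb_size; the lun bytes here are a placeholder, not a defined format.
hdr = build_req_header(b"\x01\x00\x00\x00\x00\x00\x00\x00", 42,
                       bytes([0x12, 0x00, 0x00, 0x00, 0x60, 0x00]))
```

The fixed part is 19 bytes (8 + 8 + 3), so with the default cdb_size the driver-written header is 51 bytes, followed by any dataout payload.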

Device operation: controlq
--------------------------

The controlq is used for other SCSI transport operations.
Requests have the following format:

    struct virtio_scsi_ctrl
    {
        u32 type;
        ...
        u8 response;
    }

    The type identifies the remaining fields.

The following commands are defined:

- Task management function

    #define VIRTIO_SCSI_T_TMF                      0

    #define VIRTIO_SCSI_T_TMF_ABORT_TASK           0
    #define VIRTIO_SCSI_T_TMF_ABORT_TASK_SET       1
    #define VIRTIO_SCSI_T_TMF_CLEAR_ACA            2
    #define VIRTIO_SCSI_T_TMF_CLEAR_TASK_SET       3
    #define VIRTIO_SCSI_T_TMF_I_T_NEXUS_RESET      4
    #define VIRTIO_SCSI_T_TMF_LOGICAL_UNIT_RESET   5
    #define VIRTIO_SCSI_T_TMF_QUERY_TASK           6
    #define VIRTIO_SCSI_T_TMF_QUERY_TASK_SET       7

    struct virtio_scsi_ctrl_tmf
    {
        u32 type;
        u32 subtype;
        u8 lun[8];
        u64 id;
        u8 additional[];
        u8 response;
    }

    /* command-specific response values */
    #define VIRTIO_SCSI_S_FUNCTION_COMPLETE        0
    #define VIRTIO_SCSI_S_FAILURE                  3
    #define VIRTIO_SCSI_S_FUNCTION_SUCCEEDED       4
    #define VIRTIO_SCSI_S_FUNCTION_REJECTED        5
    #define VIRTIO_SCSI_S_INCORRECT_LUN            6

    The type is VIRTIO_SCSI_T_TMF.  All fields except the last one are
    filled in by the driver; the response field is filled in by the device.
    The id field must match the id of an outstanding SCSI command.  Fields
    that are irrelevant for the requested TMF are ignored.

    Note that since ACA is not supported by this version of the spec,
    VIRTIO_SCSI_T_TMF_CLEAR_ACA is always a no-operation.

    The outcome of the task management function is written by the device
    in the response field.  Return values map 1-to-1 with those defined
    in SAM.
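    A hypothetical sketch of assembling the driver-written part of
    struct virtio_scsi_ctrl_tmf, again assuming little-endian encoding,
    packed fields, and a placeholder lun value:

```python
import struct

VIRTIO_SCSI_T_TMF = 0
VIRTIO_SCSI_T_TMF_LOGICAL_UNIT_RESET = 5

def build_tmf(subtype: int, lun: bytes, task_id: int = 0) -> bytes:
    # type (u32), subtype (u32), lun[8], id (u64); the id is left 0
    # here because it is irrelevant for a LUN-wide TMF and is ignored.
    assert len(lun) == 8
    return struct.pack("<II8sQ", VIRTIO_SCSI_T_TMF, subtype, lun, task_id)

tmf = build_tmf(VIRTIO_SCSI_T_TMF_LOGICAL_UNIT_RESET,
                b"\x01\x02\x00\x00\x00\x00\x00\x00")
```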

- Asynchronous notification query

    #define VIRTIO_SCSI_T_AN_QUERY                    1

    struct virtio_scsi_ctrl_an {
        u32 type;
        u8  lun[8];
        u32 event_requested;
        u32 event_actual;
        u8  response;
    }

    #define VIRTIO_SCSI_EVT_ASYNC_OPERATIONAL_CHANGE  2
    #define VIRTIO_SCSI_EVT_ASYNC_POWER_MGMT          4
    #define VIRTIO_SCSI_EVT_ASYNC_EXTERNAL_REQUEST    8
    #define VIRTIO_SCSI_EVT_ASYNC_MEDIA_CHANGE        16
    #define VIRTIO_SCSI_EVT_ASYNC_MULTI_HOST          32
    #define VIRTIO_SCSI_EVT_ASYNC_DEVICE_BUSY         64

    By sending this command, the driver asks the device which events
    the given LUN can report, as described in paragraphs 6.6 and A.6
    of the SCSI MMC specification.  The driver writes the events it is
    interested in into the event_requested field; the device responds by
    writing the events that it supports into the event_actual field.

    The type is VIRTIO_SCSI_T_AN_QUERY.  The lun and event_requested
    fields are written by the driver.  The event_actual and response
    fields are written by the device.

    Valid values of the response byte are VIRTIO_SCSI_S_OK or
    VIRTIO_SCSI_S_FAILURE (with the same meaning as above).
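    One plausible device-side policy for answering the query (a sketch,
    not mandated by the spec) is to intersect the requested mask with
    the set of events the device supports for that LUN:

```python
VIRTIO_SCSI_EVT_ASYNC_POWER_MGMT = 4
VIRTIO_SCSI_EVT_ASYNC_MEDIA_CHANGE = 16

def answer_an_query(event_requested: int, supported: int) -> int:
    # event_actual = the requested events the device can actually report.
    return event_requested & supported

# A device that only supports media-change notification answers a
# query for power-management + media-change with media-change alone.
event_actual = answer_an_query(
    VIRTIO_SCSI_EVT_ASYNC_POWER_MGMT | VIRTIO_SCSI_EVT_ASYNC_MEDIA_CHANGE,
    VIRTIO_SCSI_EVT_ASYNC_MEDIA_CHANGE)
```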

- Asynchronous notification subscription

    #define VIRTIO_SCSI_T_AN_SUBSCRIBE                2

    struct virtio_scsi_ctrl_an {
        u32 type;
        u8  lun[8];
        u32 event_requested;
        u32 event_actual;
        u8  response;
    }

    #define VIRTIO_SCSI_EVT_ASYNC_MEDIA_CHANGE        16

    By sending this command, the driver asks the specified LUN to report
    events for its physical interface, as described in Annex A of the SCSI
    MMC specification.  The driver writes the events it is interested in
    into the event_requested field; the device responds by writing the
    events that it supports into the event_actual field.

    The type is VIRTIO_SCSI_T_AN_SUBSCRIBE.  The lun and event_requested
    fields are written by the driver.  The event_actual and response
    fields are written by the device.

    Valid values of the response byte are VIRTIO_SCSI_S_OK or
    VIRTIO_SCSI_S_FAILURE (with the same meaning as above).

Device operation: eventq
------------------------

The eventq is used by the device to report information on logical units
that are attached to it.  The driver should always keep several buffers
posted in the eventq; the device will drop events if it finds no buffer
ready.

Buffers are placed in the eventq and filled by the device when interesting
events occur.  The buffers should be strictly write-only (device-filled)
and the size of the buffers should be at least the value given in the
device's configuration information.

Events have the following format:

    #define VIRTIO_SCSI_T_EVENTS_MISSED   0x80000000

    struct virtio_scsi_ctrl_recv {
        u32 event;
        ...
    }

If bit 31 is set in the event field, the device failed to report an
event due to missing buffers.  In this case, the driver should poll the
logical units for unit attention conditions, and/or do whatever form of
bus scan is appropriate for the guest operating system.
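A driver-side sketch of this check (the helper name is made up): bit 31
of the event field is a flag, and the remaining bits carry the event type.

```python
VIRTIO_SCSI_T_EVENTS_MISSED = 0x80000000

def split_event(event: int):
    # Separate the events-missed flag (bit 31) from the event type
    # carried in the low 31 bits.
    missed = bool(event & VIRTIO_SCSI_T_EVENTS_MISSED)
    return missed, event & 0x7FFFFFFF

# A transport reset event (type 1) reported together with the
# events-missed flag.
missed, ev_type = split_event(0x80000001)
```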

Other data that the device writes to the buffer depends on the contents
of the event field.  The following events are defined:

- No event

    #define VIRTIO_SCSI_T_NO_EVENT         0

    This event is fired in the following cases:

    1) When the device finds in the eventq a buffer that is shorter
    than the size indicated in the configuration field, it will use
    it immediately and put this dummy value in the event field.
    A well-written driver will never observe this situation.

    2) When events are dropped, the device may signal this event as
    soon as the driver makes a buffer available, in order to request
    action from the driver.  In this case, of course, this event will
    be reported with the VIRTIO_SCSI_T_EVENTS_MISSED flag set.

- Transport reset

    #define VIRTIO_SCSI_T_TRANSPORT_RESET  1

    struct virtio_scsi_reset {
        u32 event;
        u8  lun[8];
        u32 reason;
    }

    #define VIRTIO_SCSI_EVT_RESET_HARD         0
    #define VIRTIO_SCSI_EVT_RESET_RESCAN       1
    #define VIRTIO_SCSI_EVT_RESET_REMOVED      2

    By sending this event, the device signals that a logical unit
    on a target has been reset, including the case of a new device
    appearing or disappearing on the bus.

    The device fills in all fields.  The event field is set to
    VIRTIO_SCSI_T_TRANSPORT_RESET.  The lun field addresses a bus,
    target and logical unit in the SCSI host.

    The reason value is one of the three #define values appearing above.
    VIRTIO_SCSI_EVT_RESET_REMOVED is used if the target or logical unit
    is no longer able to receive commands.  VIRTIO_SCSI_EVT_RESET_HARD
    is used if the logical unit has been reset, but is still present.
    VIRTIO_SCSI_EVT_RESET_RESCAN is used if a target or logical unit has
    just appeared on the device.

    When VIRTIO_SCSI_EVT_RESET_REMOVED or VIRTIO_SCSI_EVT_RESET_RESCAN
    is sent for LUN 0, the driver should ask the initiator to rescan
    the target, in order to detect the case when an entire target has
    appeared or disappeared.

    Events will also be reported via sense codes (this obviously does
    not apply to newly appeared buses or targets, since the initiator
    has never discovered them):

    - VIRTIO_SCSI_EVT_RESET_HARD
      sense UNIT ATTENTION
      asc POWER ON, RESET OR BUS DEVICE RESET OCCURRED

    - VIRTIO_SCSI_EVT_RESET_RESCAN
      sense UNIT ATTENTION
      asc REPORTED LUNS DATA HAS CHANGED

    - VIRTIO_SCSI_EVT_RESET_REMOVED
      sense ILLEGAL REQUEST
      asc LOGICAL UNIT NOT SUPPORTED

    The preferred way to detect a transport reset is always to use events,
    because sense codes are only seen by the driver when it sends a
    SCSI command to the logical unit or target.  However, in case events
    are dropped, the initiator will still be able to synchronize with the
    actual state of the controller if the driver asks the initiator to
    rescan the SCSI bus.  During the rescan, the initiator will be
    able to observe the above sense codes, and it will process them as
    if the driver had received the equivalent event.
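    A driver-side sketch of decoding struct virtio_scsi_reset, under the
    same illustrative assumptions as before (little-endian u32 fields,
    no padding, placeholder lun bytes):

```python
import struct

VIRTIO_SCSI_T_TRANSPORT_RESET = 1
VIRTIO_SCSI_EVT_RESET_RESCAN = 1

def parse_reset_event(raw: bytes):
    # event (u32), lun[8], reason (u32) -- 16 bytes total.
    event, lun, reason = struct.unpack_from("<I8sI", raw)
    assert event == VIRTIO_SCSI_T_TRANSPORT_RESET
    return lun, reason

# A device-built event announcing that a logical unit has appeared.
raw = struct.pack("<I8sI", VIRTIO_SCSI_T_TRANSPORT_RESET,
                  b"\x01\x03\x00\x00\x00\x00\x00\x00",
                  VIRTIO_SCSI_EVT_RESET_RESCAN)
lun, reason = parse_reset_event(raw)
```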

- Asynchronous notification

    #define VIRTIO_SCSI_T_ASYNC_NOTIFY     2

    struct virtio_scsi_an_event {
        u32 event;
        u8  lun[8];
        u32 reason;
    }

    By sending this event, the device signals that an asynchronous
    event was fired from a physical interface.

    All fields are written by the device.  The event field is set to
    VIRTIO_SCSI_T_ASYNC_NOTIFY.  The reason field is a subset of the
    events that the driver has subscribed to via the "Asynchronous
    notification subscription" command.

    When dropped events are reported, the driver should poll for 
    asynchronous events manually using SCSI commands.
