All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] virtio scsi host draft specification, v2
@ 2011-05-20  8:21 Paolo Bonzini
  2011-05-28 19:33 ` Stefan Hajnoczi
                   ` (2 more replies)
  0 siblings, 3 replies; 21+ messages in thread
From: Paolo Bonzini @ 2011-05-20  8:21 UTC (permalink / raw)
  To: qemu-devel; +Cc: Stefan Hajnoczi

[-- Attachment #1: Type: text/plain, Size: 657 bytes --]

Hi all,

here is the second version of the spec.  In the end I took the advice of 
merging all requestq's into one.  The reason for this is that I took a 
look at the vSCSI device and liked its approach of using SAM 8-byte LUNs 
directly.  While it _is_ complex (and not yet done right by QEMU---will 
send a patch for that), the scheme is actually quite natural to 
implement and use, and supporting generic bus/target/LUN topologies is 
good to have for passthrough, as well.

I also added a few more features from SAM to avoid redefining the 
structs in the future.

Of course it may be that I'm completely wrong. :)  Please comment on the 
spec!

Paolo

[-- Attachment #2: virtio-scsi-v5.txt --]
[-- Type: text/plain, Size: 11939 bytes --]

Virtio SCSI Host Device Spec
============================

The virtio SCSI host device groups together one or more simple virtual
devices (ie. disk), and allows communicating to these devices using the
SCSI protocol.  An instance of the device represents a SCSI host with
possibly many buses, targets and LUN attached.

The virtio SCSI device services two kinds of requests:

- command requests for a logical unit;

- task management functions related to a logical unit, target or
command.

The device is also able to send out notifications about added
and removed logical units.

v4:
    First public version

v5:
    Merged all virtqueues into one, removed separate TARGET fields

Configuration
-------------

Subsystem Device ID
    TBD

Virtqueues
    0:control transmitq
    1:control receiveq
    2:requestq

Feature bits
    VIRTIO_SCSI_F_INOUT - Whether a single request can include both
        read-only and write-only data buffers.

Device configuration layout
    struct virtio_scsi_config {
    }

    (Still empty)

Device initialization
---------------------

The initialization routine should first of all discover the device's
control virtqueues.

The driver should then place at least a buffer in the control receiveq.
Buffers returned by the device on the control receiveq may be referred
to as "events" in the rest of the document.

The driver can immediately issue requests (for example, INQUIRY or
REPORT LUNS) or task management functions (for example, I_T RESET).

Device operation: request queue
-------------------------------

The driver queues requests to the virtqueue, and they are used by the device
(not necessarily in order).

Requests have the following format:

    struct virtio_scsi_req_cmd {
        u8 lun[8];
        u64 id;
        u8 task_attr;
        u8 prio;
        u8 crn;
        u32 num_dataout, num_datain;
        char cdb[];
        char data[][num_dataout+num_datain];
        u8 sense[];
        u32 sense_len;
        u32 residual;
        u16 status_qualifier;
        u8 status;
        u8 response;
    };

    /* command-specific response values */
    #define VIRTIO_SCSI_S_OK              0
    #define VIRTIO_SCSI_S_UNDERRUN        1
    #define VIRTIO_SCSI_S_ABORTED         2
    #define VIRTIO_SCSI_S_FAILURE         3

    The lun field addresses a bus, target and logical unit in the SCSI
    host.  The id field is the command identifier as defined in SAM.

    The task_attr, prio field should always be zero, as task
    attributes other than SIMPLE, as well as command priority, are
    explicitly not supported by this version of the device.
    CRN is also as defined in SAM; while it is generally expected to
    be 0, clients can provide it.  The maximum CRN value defined by
    the protocol is 255, since CRN is stored in an 8-bit integer.

    All of these fields are always read-only.

    The cdb, data and sense fields must reside in separate buffers.
    The cdb field is always read-only.  The data buffers may be either
    read-only or write-only, depending on the request, with the read-only
    buffers coming first.  The sense buffer is always write-only.

    The request shall have num_dataout read-only data buffers and
    num_datain write-only data buffers.  One of these two values must be
    zero if the VIRTIO_SCSI_F_INOUT has not been negotiated.

    Remaining fields are filled in by the device.  The sense_len field
    indicates the number of bytes actually written to the sense buffer,
    while the residual field indicates the residual size, calculated as
    data_length - number_of_transferred_bytes.

    The status byte is written by the device to be the SCSI status code.

    The response byte is written by the device to be one of the following:

    - VIRTIO_SCSI_S_OK when the request was completed and the status byte
      is filled with a SCSI status code (not necessarily "GOOD").

    - VIRTIO_SCSI_S_UNDERRUN if the content of the CDB requires transferring
      more data than is available in the data buffers.

    - VIRTIO_SCSI_S_ABORTED if the request was cancelled due to a reset
      or another task management function.

    - VIRTIO_SCSI_S_FAILURE for other host or guest error.

Device operation: control transmitq
-----------------------------------

The control transmitq is used for other SCSI transport operations.
These are not placed on the requestq so that they can be sent out-of-band,
even when the requestq is full.  This is particularly important for task
management functions.

Requests have the following format:

    struct virtio_scsi_ctrl
    {
        u32 type;
        ...
        u8 response;
    }

    The type identifies the remaining fields.

The following commands are defined:

- Task management function

    #define VIRTIO_SCSI_T_TMF                      0

    #define VIRTIO_SCSI_T_TMF_ABORT_TASK           0
    #define VIRTIO_SCSI_T_TMF_ABORT_TASK_SET       1
    #define VIRTIO_SCSI_T_TMF_CLEAR_ACA            2
    #define VIRTIO_SCSI_T_TMF_CLEAR_TASK_SET       3
    #define VIRTIO_SCSI_T_TMF_I_T_NEXUS_RESET      4
    #define VIRTIO_SCSI_T_TMF_LOGICAL_UNIT_RESET   5
    #define VIRTIO_SCSI_T_TMF_QUERY_TASK           6
    #define VIRTIO_SCSI_T_TMF_QUERY_TASK_SET       7

    struct virtio_scsi_ctrl_tmf
    {
        u32 type;
        u32 subtype;
        u8 lun[8];
        u64 id;
        u8 additional[];
        u8 response;
    }

    /* command-specific response values */
    #define VIRTIO_SCSI_S_FUNCTION_COMPLETE        0
    #define VIRTIO_SCSI_S_FAILURE                  3
    #define VIRTIO_SCSI_S_FUNCTION_SUCCEEDED       4
    #define VIRTIO_SCSI_S_FUNCTION_REJECTED        5
    #define VIRTIO_SCSI_S_INCORRECT_LUN            6

    The type is VIRTIO_SCSI_T_TMF.  All fields but the last one are
    filled by the driver, the response field is filled in by the device.
    The id command must match the id in a SCSI command.  Irrelevant fields
    for the requested TMF are ignored.

    Note that since ACA is not supported by this version of the spec,
    VIRTIO_SCSI_T_TMF_CLEAR_ACA is always a no-operation.

    The outcome of the task management function is written by the device
    in the response field.  Return values map 1-to-1 with those defined
    in SAM.

- Asynchronous notification query

    #define VIRTIO_SCSI_T_AN_QUERY                    1

    struct virtio_scsi_ctrl_an {
        u32 type;
        u8  lun[8];
        u32 event_requested;
        u32 event_actual;
        u8  response;
    }

    #define VIRTIO_SCSI_EVT_ASYNC_MEDIA_CHANGE        16

    By sending this command, the driver asks the device which events
    the given LUN can report, as described in Annex A of the SCSI MMC
    specification.  The driver writes the events it is interested in
    into the event_requested; the device responds by writing the events
    that it supports into event_actual.

    The type is VIRTIO_SCSI_T_AN_QUERY.  The lun and event_requested
    fields are written by the driver.  The event_actual and response
    fields are written by the device.

    Valid values of the response byte are VIRTIO_SCSI_S_OK or
    VIRTIO_SCSI_S_FAILURE (with the same meaning as above).

- Asynchronous notification subscription

    #define VIRTIO_SCSI_T_AN_SUBSCRIBE                2

    struct virtio_scsi_ctrl_an {
        u32 type;
        u8  lun[8];
        u32 event_requested;
        u32 event_actual;
        u8  response;
    }

    #define VIRTIO_SCSI_EVT_ASYNC_MEDIA_CHANGE        16

    By sending this command, the driver asks the specified LUN to report
    events for its physical interface, as described in Annex A of the SCSI
    MMC specification.  The driver writes the events it is interested in
    into the event_requested; the device responds by writing the events
    that it supports into event_actual.

    The type is VIRTIO_SCSI_T_AN_SUBSCRIBE.  The lun and event_requested
    fields are written by the driver.  The event_actual and response
    fields are written by the device.

    Valid values of the response byte are VIRTIO_SCSI_S_OK,
    VIRTIO_SCSI_S_FAILURE (with the same meaning as above).

Device operation: control receiveq
----------------------------------

The control receiveq is used by the device to report information on
logical units that are attached to it.  The driver should always
leave a few (?) buffers ready in the control receiveq.  The device may
end up dropping events if it finds no buffer ready.

Buffers are placed in the control receiveq and filled by the device when
interesting events occur.  Events have the following format:

    #define VIRTIO_SCSI_T_EVENTS_MISSED   0x80000000

    struct virtio_scsi_ctrl_recv {
        u32 event;
        ...
    }

If bit 31 is set in the event, the device failed to report an event due
to missing buffers.  In this case, the driver should poll the logical
units for unit attention conditions, and/or do whatever form of bus scan
is appropriate for the guest operating system.

The following events are defined:

- Transport reset

    #define VIRTIO_SCSI_T_TRANSPORT_RESET  0

    struct virtio_scsi_reset {
        u32 event;
        u8  lun[8];
        u32 reason;
    }

    #define VIRTIO_SCSI_EVT_RESET_HARD         0
    #define VIRTIO_SCSI_EVT_RESET_RESCAN       1
    #define VIRTIO_SCSI_EVT_RESET_SHUTDOWN     2
    #define VIRTIO_SCSI_EVT_RESET_REMOVED      3

    By sending this event, the device signals that a logical unit
    on a target has been reset, including the case of a new device
    appearing or disappearing on the bus.

    The device fills in all fields.  The event field is set to
    VIRTIO_SCSI_T_TRANSPORT_RESET.  The following format is used
    to represent an event that affects all LUNs enumerated by
    a target:

    - for logical units at the top level ("bus 0"), the LUN should be
    (255,255,0,0,0,0,0,0), whose meaning is "logical unit not specified";

    - otherwise it should identify a bus and target indication in
    peripheral device addressing format, followed by the six bytes
    (255,255,0,0,0,0), whose meaning is also "logical unit not
    specified".

    The reason value is one of the four #define values appearing above.
    VIRTIO_SCSI_EVT_RESET_REMOVED is used if the target or logical unit
    is no longer able to receive commands.  VIRTIO_SCSI_EVT_RESET_HARD
    is used if the logical unit has been reset, but is still present.
    VIRTIO_SCSI_EVT_RESET_RESCAN is used if a target or logical unit has
    just appeared on the device.  VIRTIO_SCSI_EVT_RESET_SHUTDOWN
    is used when the host wants to initiate a graceful shutdown of a
    logical unit.

    Events should also be reported via sense codes or response codes
    (this obviously does not apply to newly appeared buses or targets,
    since the application has never discovered them):

    - VIRTIO_SCSI_EVT_RESET_HARD
      sense UNIT ATTENTION
      asc POWER ON, RESET OR BUS DEVICE RESET OCCURRED

    - VIRTIO_SCSI_EVT_RESET_RESCAN
      sense UNIT ATTENTION
      asc REPORTED LUNS DATA HAS CHANGED

    - VIRTIO_SCSI_EVT_RESET_SHUTDOWN
      sense UNIT ATTENTION
      asc TARGET OPERATING CONDITIONS HAVE CHANGED
      ascq 0x80 (vendor specific)

    - VIRTIO_SCSI_EVT_RESET_REMOVED
      sense ILLEGAL REQUEST
      asc LOGICAL UNIT NOT SUPPORTED

    However, in general events should be more easily handled by the
    driver than sense codes.

- Asynchronous notification

    #define VIRTIO_SCSI_T_ASYNC_NOTIFY     1

    struct virtio_scsi_an_event {
        u8  lun[8];
        u32 event;
    }

    #define VIRTIO_SCSI_EVT_ASYNC_MEDIA_CHANGE        16

    By sending this event, the device signals that an event was
    fired from a physical interface.  The device only sends events
    that the driver has subscribed to via the "Asynchronous notification
    subscription" command.

    All fields are written by the device.  The event field is set to
    VIRTIO_SCSI_T_ASYNC_NOTIFY.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Qemu-devel] virtio scsi host draft specification, v2
  2011-05-20  8:21 [Qemu-devel] virtio scsi host draft specification, v2 Paolo Bonzini
@ 2011-05-28 19:33 ` Stefan Hajnoczi
  2011-05-30  9:22   ` Paolo Bonzini
  2011-06-01  8:49 ` Michael S. Tsirkin
  2011-06-01 13:46 ` Avi Kivity
  2 siblings, 1 reply; 21+ messages in thread
From: Stefan Hajnoczi @ 2011-05-28 19:33 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-devel

On Fri, May 20, 2011 at 10:21:03AM +0200, Paolo Bonzini wrote:
> Configuration
> -------------
> 
> Subsystem Device ID
>     TBD
> 
> Virtqueues
>     0:control transmitq
>     1:control receiveq

I find these names weird because control commands are actually processed
and completed on the transmitq.  The receiveq is only for receiving
asynchronous notifications.

0:control commandq
1:control eventq

This seems clearer to me although it's not as generic as
transmit/receive if we want to extend its semantics in the future.

>     2:requestq
> 
> Feature bits
>     VIRTIO_SCSI_F_INOUT - Whether a single request can include both
>         read-only and write-only data buffers.
> 
> Device configuration layout
>     struct virtio_scsi_config {
>     }
> 
>     (Still empty)
> 
> Device initialization
> ---------------------
> 
> The initialization routine should first of all discover the device's
> control virtqueues.
> 
> The driver should then place at least a buffer in the control receiveq.
> Buffers returned by the device on the control receiveq may be referred
> to as "events" in the rest of the document.
> 
> The driver can immediately issue requests (for example, INQUIRY or
> REPORT LUNS) or task management functions (for example, I_T RESET).
> 
> Device operation: request queue
> -------------------------------
> 
> The driver queues requests to the virtqueue, and they are used by the device
> (not necessarily in order).

What do you mean by "no necessarily in order"?  Doesn't the SAM already
define the available ordering semantics and how the target processes
requests - I think there is no need to mention anything here.

> 
> Requests have the following format:
> 
>     struct virtio_scsi_req_cmd {
>         u8 lun[8];
>         u64 id;
>         u8 task_attr;
>         u8 prio;
>         u8 crn;
>         u32 num_dataout, num_datain;

These fields can be discovered from the in and out counts that virtio
provides.  They seem redundant to me.

>         char cdb[];
>         char data[][num_dataout+num_datain];
>         u8 sense[];
>         u32 sense_len;
>         u32 residual;
>         u16 status_qualifier;
>         u8 status;
>         u8 response;
>     };
> 
>     /* command-specific response values */
>     #define VIRTIO_SCSI_S_OK              0
>     #define VIRTIO_SCSI_S_UNDERRUN        1
>     #define VIRTIO_SCSI_S_ABORTED         2
>     #define VIRTIO_SCSI_S_FAILURE         3
> 
>     The lun field addresses a bus, target and logical unit in the SCSI
>     host.  The id field is the command identifier as defined in SAM.
> 
>     The task_attr, prio field should always be zero, as task
>     attributes other than SIMPLE, as well as command priority, are
>     explicitly not supported by this version of the device.
>     CRN is also as defined in SAM; while it is generally expected to
>     be 0, clients can provide it.  The maximum CRN value defined by
>     the protocol is 255, since CRN is stored in an 8-bit integer.

SAMr4 5.1 The Execute Command procedure call:
"The CRN value zero shall be reserved for use as defined by the SCSI
transport protocol."

FWIW the SRP spec simply doesn't include CRN and I think we could do the
same.  I don't know what it is actually used for in other transports...

> 
>     All of these fields are always read-only.
> 
>     The cdb, data and sense fields must reside in separate buffers.
>     The cdb field is always read-only.  The data buffers may be either
>     read-only or write-only, depending on the request, with the read-only
>     buffers coming first.  The sense buffer is always write-only.
> 
>     The request shall have num_dataout read-only data buffers and
>     num_datain write-only data buffers.  One of these two values must be
>     zero if the VIRTIO_SCSI_F_INOUT has not been negotiated.

What happens if this VIRTIO_SCSI_F_INOUT has not been negotiated and
both values are non-zero?  Perhaps the request should be immediately
returned with response = VIRTIO_SCSI_S_FAILURE.

I think we should define behavior for all inputs - otherwise we end up
with QEMU-side code that calls abort() which is bad ;).

> 
>     Remaining fields are filled in by the device.  The sense_len field
>     indicates the number of bytes actually written to the sense buffer,
>     while the residual field indicates the residual size, calculated as
>     data_length - number_of_transferred_bytes.
> 
>     The status byte is written by the device to be the SCSI status code.
> 
>     The response byte is written by the device to be one of the following:
> 
>     - VIRTIO_SCSI_S_OK when the request was completed and the status byte
>       is filled with a SCSI status code (not necessarily "GOOD").
> 
>     - VIRTIO_SCSI_S_UNDERRUN if the content of the CDB requires transferring
>       more data than is available in the data buffers.
> 
>     - VIRTIO_SCSI_S_ABORTED if the request was cancelled due to a reset
>       or another task management function.
> 
>     - VIRTIO_SCSI_S_FAILURE for other host or guest error.
> 
> Device operation: control transmitq
> -----------------------------------
> 
> The control transmitq is used for other SCSI transport operations.
> These are not placed on the requestq so that they can be sent out-of-band,
> even when the requestq is full.  This is particularly important for task
> management functions.
> 
> Requests have the following format:
> 
>     struct virtio_scsi_ctrl
>     {
>         u32 type;
>         ...
>         u8 response;
>     }
> 
>     The type identifies the remaining fields.
> 
> The following commands are defined:
> 
> - Task management function
> 
>     #define VIRTIO_SCSI_T_TMF                      0
> 
>     #define VIRTIO_SCSI_T_TMF_ABORT_TASK           0
>     #define VIRTIO_SCSI_T_TMF_ABORT_TASK_SET       1
>     #define VIRTIO_SCSI_T_TMF_CLEAR_ACA            2
>     #define VIRTIO_SCSI_T_TMF_CLEAR_TASK_SET       3
>     #define VIRTIO_SCSI_T_TMF_I_T_NEXUS_RESET      4
>     #define VIRTIO_SCSI_T_TMF_LOGICAL_UNIT_RESET   5
>     #define VIRTIO_SCSI_T_TMF_QUERY_TASK           6
>     #define VIRTIO_SCSI_T_TMF_QUERY_TASK_SET       7
> 
>     struct virtio_scsi_ctrl_tmf
>     {
>         u32 type;
>         u32 subtype;
>         u8 lun[8];
>         u64 id;
>         u8 additional[];
>         u8 response;
>     }
> 
>     /* command-specific response values */
>     #define VIRTIO_SCSI_S_FUNCTION_COMPLETE        0
>     #define VIRTIO_SCSI_S_FAILURE                  3
>     #define VIRTIO_SCSI_S_FUNCTION_SUCCEEDED       4
>     #define VIRTIO_SCSI_S_FUNCTION_REJECTED        5
>     #define VIRTIO_SCSI_S_INCORRECT_LUN            6
> 
>     The type is VIRTIO_SCSI_T_TMF.  All fields but the last one are
>     filled by the driver, the response field is filled in by the device.
>     The id command must match the id in a SCSI command.  Irrelevant fields
>     for the requested TMF are ignored.
> 
>     Note that since ACA is not supported by this version of the spec,
>     VIRTIO_SCSI_T_TMF_CLEAR_ACA is always a no-operation.
> 
>     The outcome of the task management function is written by the device
>     in the response field.  Return values map 1-to-1 with those defined
>     in SAM.

We have no transport-specific response field here like we do with CDBs.
I guess this is okay because the SAM defines SERVICE DELIVERY OR TARGET
FAILURE, which we could use if there is a problem.

> - Asynchronous notification query
> 
>     #define VIRTIO_SCSI_T_AN_QUERY                    1
> 
>     struct virtio_scsi_ctrl_an {
>         u32 type;
>         u8  lun[8];
>         u32 event_requested;
>         u32 event_actual;
>         u8  response;
>     }
> 
>     #define VIRTIO_SCSI_EVT_ASYNC_MEDIA_CHANGE        16
> 
>     By sending this command, the driver asks the device which events
>     the given LUN can report, as described in Annex A of the SCSI MMC
>     specification.  The driver writes the events it is interested in
>     into the event_requested; the device responds by writing the events
>     that it supports into event_actual.
> 
>     The type is VIRTIO_SCSI_T_AN_QUERY.  The lun and event_requested
>     fields are written by the driver.  The event_actual and response
>     fields are written by the device.
> 
>     Valid values of the response byte are VIRTIO_SCSI_S_OK or
>     VIRTIO_SCSI_S_FAILURE (with the same meaning as above).
> 
> - Asynchronous notification subscription
> 
>     #define VIRTIO_SCSI_T_AN_SUBSCRIBE                2
> 
>     struct virtio_scsi_ctrl_an {
>         u32 type;
>         u8  lun[8];
>         u32 event_requested;
>         u32 event_actual;
>         u8  response;
>     }
> 
>     #define VIRTIO_SCSI_EVT_ASYNC_MEDIA_CHANGE        16
> 
>     By sending this command, the driver asks the specified LUN to report
>     events for its physical interface, as described in Annex A of the SCSI
>     MMC specification.  The driver writes the events it is interested in
>     into the event_requested; the device responds by writing the events
>     that it supports into event_actual.
> 
>     The type is VIRTIO_SCSI_T_AN_SUBSCRIBE.  The lun and event_requested
>     fields are written by the driver.  The event_actual and response
>     fields are written by the device.
> 
>     Valid values of the response byte are VIRTIO_SCSI_S_OK,
>     VIRTIO_SCSI_S_FAILURE (with the same meaning as above).
> 
> Device operation: control receiveq
> ----------------------------------
> 
> The control receiveq is used by the device to report information on
> logical units that are attached to it.  The driver should always
> leave a few (?) buffers ready in the control receiveq.  The device may

"The driver should always leave buffers ready in the control receiveq"

Also, I think it should say "the device must drop events if it finds no
buffer ready".  The spec goes into detail on how to notify about dropped
events, using "must" instead of "may" seems right.

Stefan

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Qemu-devel] virtio scsi host draft specification, v2
  2011-05-28 19:33 ` Stefan Hajnoczi
@ 2011-05-30  9:22   ` Paolo Bonzini
  2011-06-01  4:44     ` Stefan Hajnoczi
  0 siblings, 1 reply; 21+ messages in thread
From: Paolo Bonzini @ 2011-05-30  9:22 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: qemu-devel

On 05/28/2011 09:33 PM, Stefan Hajnoczi wrote:
>> Virtqueues
>>      0:control transmitq
>>      1:control receiveq
>
> I find these names weird because control commands are actually processed
> and completed on the transmitq.  The receiveq is only for receiving
> asynchronous notifications.
>
> 0:control commandq
> 1:control eventq
>
> This seems clearer to me although it's not as generic as
> transmit/receive if we want to extend its semantics in the future.

I took the names from virtio-serial, IIRC.  But now I simplified your 
proposal further to controlq/eventq.

>> The driver queues requests to the virtqueue, and they are used by the device
>> (not necessarily in order).
>
> What do you mean by "no necessarily in order"?  Doesn't the SAM already
> define the available ordering semantics and how the target processes
> requests - I think there is no need to mention anything here.

Right.

>>
>> Requests have the following format:
>>
>>      struct virtio_scsi_req_cmd {
>>          u8 lun[8];
>>          u64 id;
>>          u8 task_attr;
>>          u8 prio;
>>          u8 crn;
>>          u32 num_dataout, num_datain;
>
> These fields can be discovered from the in and out counts that virtio
> provides.  They seem redundant to me.

I'm not sure, perhaps in the future more variable-sized fields may be 
added.  I added a note that requests will be failed if the driver 
detects inconsistencies between the actual number of buffers, and the 
count specified in num_dataout/num_datain.

> SAMr4 5.1 The Execute Command procedure call:
> "The CRN value zero shall be reserved for use as defined by the SCSI
> transport protocol."
>
> FWIW the SRP spec simply doesn't include CRN and I think we could do the
> same.  I don't know what it is actually used for in other transports...

I wasn't sure of what would happen in the case of SCSI passthrough to 
protocols that do use CRN.  It seems "free" to leave it in.

>>      The request shall have num_dataout read-only data buffers and
>>      num_datain write-only data buffers.  One of these two values must be
>>      zero if the VIRTIO_SCSI_F_INOUT has not been negotiated.
>
> What happens if this VIRTIO_SCSI_F_INOUT has not been negotiated and
> both values are non-zero?  Perhaps the request should be immediately
> returned with response = VIRTIO_SCSI_S_FAILURE.
>
> I think we should define behavior for all inputs - otherwise we end up
> with QEMU-side code that calls abort() which is bad ;).

I agree, and I made the change.

>>      The outcome of the task management function is written by the device
>>      in the response field.  Return values map 1-to-1 with those defined
>>      in SAM.
>
> We have no transport-specific response field here like we do with CDBs.
> I guess this is okay because the SAM defines SERVICE DELIVERY OR TARGET
> FAILURE, which we could use if there is a problem.

Yes, that is VIRTIO_SCSI_S_FAILURE.

>> The control receiveq is used by the device to report information on
>> logical units that are attached to it.  The driver should always
>> leave a few (?) buffers ready in the control receiveq.  The device may
>
> "The driver should always leave buffers ready in the control receiveq"
>
> Also, I think it should say "the device must drop events if it finds no
> buffer ready".  The spec goes into detail on how to notify about dropped
> events, using "must" instead of "may" seems right.

"Must" seems too strong.  Dropped events are a racy event, so it is not 
really possible to guve any guarantee.  I changed it to "will" though.

Paolo

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Qemu-devel] virtio scsi host draft specification, v2
  2011-05-30  9:22   ` Paolo Bonzini
@ 2011-06-01  4:44     ` Stefan Hajnoczi
  0 siblings, 0 replies; 21+ messages in thread
From: Stefan Hajnoczi @ 2011-06-01  4:44 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: Stefan Hajnoczi, qemu-devel

On Mon, May 30, 2011 at 10:22 AM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> On 05/28/2011 09:33 PM, Stefan Hajnoczi wrote:
>>> Requests have the following format:
>>>
>>>     struct virtio_scsi_req_cmd {
>>>         u8 lun[8];
>>>         u64 id;
>>>         u8 task_attr;
>>>         u8 prio;
>>>         u8 crn;
>>>         u32 num_dataout, num_datain;
>>
>> These fields can be discovered from the in and out counts that virtio
>> provides.  They seem redundant to me.
>
> I'm not sure, perhaps in the future more variable-sized fields may be added.
>  I added a note that requests will be failed if the driver detects
> inconsistencies between the actual number of buffers, and the count
> specified in num_dataout/num_datain.

Okay, you're right that it will be necessary if more multi-buffer data is added.

>> SAMr4 5.1 The Execute Command procedure call:
>> "The CRN value zero shall be reserved for use as defined by the SCSI
>> transport protocol."
>>
>> FWIW the SRP spec simply doesn't include CRN and I think we could do the
>> same.  I don't know what it is actually used for in other transports...
>
> I wasn't sure of what would happen in the case of SCSI passthrough to
> protocols that do use CRN.  It seems "free" to leave it in.

I don't know the answer here.

>>> The control receiveq is used by the device to report information on
>>> logical units that are attached to it.  The driver should always
>>> leave a few (?) buffers ready in the control receiveq.  The device may
>>
>> "The driver should always leave buffers ready in the control receiveq"
>>
>> Also, I think it should say "the device must drop events if it finds no
>> buffer ready".  The spec goes into detail on how to notify about dropped
>> events, using "must" instead of "may" seems right.
>
> "Must" seems too strong.  Dropped events are a racy event, so it is not
> really possible to guve any guarantee.  I changed it to "will" though.

Okay thanks, I was more worried about allowing the device to queue
when event buffers are not available.  If the dropping behavior is
part of the spec then it is clear what the device implementation
should do.

Stefan

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Qemu-devel] virtio scsi host draft specification, v2
  2011-05-20  8:21 [Qemu-devel] virtio scsi host draft specification, v2 Paolo Bonzini
  2011-05-28 19:33 ` Stefan Hajnoczi
@ 2011-06-01  8:49 ` Michael S. Tsirkin
  2011-06-01 11:59   ` Paolo Bonzini
  2011-06-01 13:46 ` Avi Kivity
  2 siblings, 1 reply; 21+ messages in thread
From: Michael S. Tsirkin @ 2011-06-01  8:49 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-devel, Stefan Hajnoczi

On Fri, May 20, 2011 at 10:21:03AM +0200, Paolo Bonzini wrote:
> Hi all,
> 
> here is the second version of the spec.  In the end I took the
> advice of merging all requestq's into one.  The reason for this is
> that I took a look at the vSCSI device and liked its approach of
> using SAM 8-byte LUNs directly.  While it _is_ complex (and not yet
> done right by QEMU---will send a patch for that), the scheme is
> actually quite natural to implement and use, and supporting generic
> bus/target/LUN topologies is good to have for passthrough, as well.
> 
> I also added a few more features from SAM to avoid redefining the
> structs in the future.
> 
> Of course it may be that I'm completely wrong. :)  Please comment on
> the spec!
> 
> Paolo

> Virtio SCSI Host Device Spec
> ============================
> 
> The virtio SCSI host device groups together one or more simple virtual
> devices (ie. disk), and allows communicating to these devices using the
> SCSI protocol.  An instance of the device represents a SCSI host with
> possibly many buses, targets and LUN attached.
> 
> The virtio SCSI device services two kinds of requests:
> 
> - command requests for a logical unit;
> 
> - task management functions related to a logical unit, target or
> command.
> 
> The device is also able to send out notifications about added
> and removed logical units.
> 
> v4:
>     First public version
> 
> v5:
>     Merged all virtqueues into one, removed separate TARGET fields

Document still seems to refer to multiple VQs - maybe I misunderstand?

> Configuration
> -------------
> 
> Subsystem Device ID
>     TBD
> 
> Virtqueues
>     0:control transmitq
>     1:control receiveq
>     2:requestq
> 
> Feature bits
>     VIRTIO_SCSI_F_INOUT - Whether a single request can include both
>         read-only and write-only data buffers.
> 
> Device configuration layout
>     struct virtio_scsi_config {
>     }
> 
>     (Still empty)
> 
> Device initialization
> ---------------------
> 
> The initialization routine should first of all discover the device's
> control virtqueues.
> 
> The driver should then place at least a buffer in the control receiveq.

Size of the buffer?


> Buffers returned by the device on the control receiveq may be referred
> to as "events" in the rest of the document.
> 
> The driver can immediately issue requests (for example, INQUIRY or
> REPORT LUNS) or task management functions (for example, I_T RESET).
> 
> Device operation: request queue
> -------------------------------
> 
> The driver queues requests to the virtqueue, and they are used by the device
> (not necessarily in order).
> 
> Requests have the following format:
> 
>     struct virtio_scsi_req_cmd {
>         u8 lun[8];
>         u64 id;
>         u8 task_attr;
>         u8 prio;
>         u8 crn;
>         u32 num_dataout, num_datain;
>         char cdb[];
>         char data[][num_dataout+num_datain];
>         u8 sense[];
>         u32 sense_len;
>         u32 residual;
>         u16 status_qualifier;
>         u8 status;
>         u8 response;
>     };
> 
>     /* command-specific response values */
>     #define VIRTIO_SCSI_S_OK              0
>     #define VIRTIO_SCSI_S_UNDERRUN        1
>     #define VIRTIO_SCSI_S_ABORTED         2
>     #define VIRTIO_SCSI_S_FAILURE         3
> 
>     The lun field addresses a bus, target and logical unit in the SCSI
>     host.  The id field is the command identifier as defined in SAM.
> 
>     The task_attr, prio field should always be zero, as task
>     attributes other than SIMPLE, as well as command priority, are
>     explicitly not supported by this version of the device.
>     CRN is also as defined in SAM; while it is generally expected to
>     be 0, clients can provide it.  The maximum CRN value defined by
>     the protocol is 255, since CRN is stored in an 8-bit integer.
> 
>     All of these fields are always read-only.
> 
>     The cdb, data and sense fields must reside in separate buffers.
>     The cdb field is always read-only.  The data buffers may be either
>     read-only or write-only, depending on the request, with the read-only
>     buffers coming first.  The sense buffer is always write-only.
> 
>     The request shall have num_dataout read-only data buffers and
>     num_datain write-only data buffers.  One of these two values must be
>     zero if the VIRTIO_SCSI_F_INOUT has not been negotiated.

Why do num_datain/num_dataout need to be there?
We can just look at the number of io/out bufs in
virtio descriptors, no?

Also, from experience, it's better not to have any layout
assumptions - let the guest stick everything in a single
in + single out buffer if it desires.


>     Remaining fields are filled in by the device.  The sense_len field
>     indicates the number of bytes actually written to the sense buffer,
>     while the residual field indicates the residual size, calculated as
>     data_length - number_of_transferred_bytes.

Again virtio gives you total number of written bytes in the used len
field.  So just one of these fields will be enough.

>     The status byte is written by the device to be the SCSI status code.
> 
>     The response byte is written by the device to be one of the following:
> 
>     - VIRTIO_SCSI_S_OK when the request was completed and the status byte
>       is filled with a SCSI status code (not necessarily "GOOD").
> 
>     - VIRTIO_SCSI_S_UNDERRUN if the content of the CDB requires transferring
>       more data than is available in the data buffers.
> 
>     - VIRTIO_SCSI_S_ABORTED if the request was cancelled due to a reset
>       or another task management function.
> 
>     - VIRTIO_SCSI_S_FAILURE for other host or guest error.
> 
> Device operation: control transmitq
> -----------------------------------
> 
> The control transmitq is used for other SCSI transport operations.
> These are not placed on the requestq so that they can be sent out-of-band,
> even when the requestq is full.  This is particularly important for task
> management functions.
> 
> Requests have the following format:
> 
>     struct virtio_scsi_ctrl
>     {
>         u32 type;
>         ...
>         u8 response;
>     }
> 
>     The type identifies the remaining fields.
> 
> The following commands are defined:
> 
> - Task management function
> 
>     #define VIRTIO_SCSI_T_TMF                      0
> 
>     #define VIRTIO_SCSI_T_TMF_ABORT_TASK           0
>     #define VIRTIO_SCSI_T_TMF_ABORT_TASK_SET       1
>     #define VIRTIO_SCSI_T_TMF_CLEAR_ACA            2
>     #define VIRTIO_SCSI_T_TMF_CLEAR_TASK_SET       3
>     #define VIRTIO_SCSI_T_TMF_I_T_NEXUS_RESET      4
>     #define VIRTIO_SCSI_T_TMF_LOGICAL_UNIT_RESET   5
>     #define VIRTIO_SCSI_T_TMF_QUERY_TASK           6
>     #define VIRTIO_SCSI_T_TMF_QUERY_TASK_SET       7
> 
>     struct virtio_scsi_ctrl_tmf
>     {
>         u32 type;
>         u32 subtype;
>         u8 lun[8];
>         u64 id;
>         u8 additional[];
>         u8 response;
>     }
> 
>     /* command-specific response values */
>     #define VIRTIO_SCSI_S_FUNCTION_COMPLETE        0
>     #define VIRTIO_SCSI_S_FAILURE                  3
>     #define VIRTIO_SCSI_S_FUNCTION_SUCCEEDED       4
>     #define VIRTIO_SCSI_S_FUNCTION_REJECTED        5
>     #define VIRTIO_SCSI_S_INCORRECT_LUN            6
> 
>     The type is VIRTIO_SCSI_T_TMF.  All fields but the last one are
>     filled by the driver, the response field is filled in by the device.
>     The id command must match the id in a SCSI command.  Irrelevant fields
>     for the requested TMF are ignored.
> 
>     Note that since ACA is not supported by this version of the spec,
>     VIRTIO_SCSI_T_TMF_CLEAR_ACA is always a no-operation.
> 
>     The outcome of the task management function is written by the device
>     in the response field.  Return values map 1-to-1 with those defined
>     in SAM.
> 
> - Asynchronous notification query
> 
>     #define VIRTIO_SCSI_T_AN_QUERY                    1
> 
>     struct virtio_scsi_ctrl_an {
>         u32 type;
>         u8  lun[8];
>         u32 event_requested;
>         u32 event_actual;
>         u8  response;
>     }
> 
>     #define VIRTIO_SCSI_EVT_ASYNC_MEDIA_CHANGE        16
> 
>     By sending this command, the driver asks the device which events
>     the given LUN can report, as described in Annex A of the SCSI MMC
>     specification.  The driver writes the events it is interested in
>     into the event_requested; the device responds by writing the events
>     that it supports into event_actual.
> 
>     The type is VIRTIO_SCSI_T_AN_QUERY.  The lun and event_requested
>     fields are written by the driver.  The event_actual and response
>     fields are written by the device.
> 
>     Valid values of the response byte are VIRTIO_SCSI_S_OK or
>     VIRTIO_SCSI_S_FAILURE (with the same meaning as above).
> 
> - Asynchronous notification subscription
> 
>     #define VIRTIO_SCSI_T_AN_SUBSCRIBE                2
> 
>     struct virtio_scsi_ctrl_an {
>         u32 type;
>         u8  lun[8];
>         u32 event_requested;
>         u32 event_actual;
>         u8  response;
>     }
> 
>     #define VIRTIO_SCSI_EVT_ASYNC_MEDIA_CHANGE        16
> 
>     By sending this command, the driver asks the specified LUN to report
>     events for its physical interface, as described in Annex A of the SCSI
>     MMC specification.  The driver writes the events it is interested in
>     into the event_requested; the device responds by writing the events
>     that it supports into event_actual.
> 
>     The type is VIRTIO_SCSI_T_AN_SUBSCRIBE.  The lun and event_requested
>     fields are written by the driver.  The event_actual and response
>     fields are written by the device.
> 
>     Valid values of the response byte are VIRTIO_SCSI_S_OK,
>     VIRTIO_SCSI_S_FAILURE (with the same meaning as above).
> 
> Device operation: control receiveq
> ----------------------------------
> 
> The control receiveq is used by the device to report information on
> logical units that are attached to it.  The driver should always
> leave a few (?) buffers ready in the control receiveq.  The device may
> end up dropping events if it finds no buffer ready.

Is this safe? It looks like there's a finite number of possible events.
Can't the device queue them until there's a buffer?
If this mechanism is unreliable, how is it useful?


> Buffers are placed in the control receiveq and filled by the device when
> interesting events occur.  Events have the following format:
> 
>     #define VIRTIO_SCSI_T_EVENTS_MISSED   0x80000000
> 
>     struct virtio_scsi_ctrl_recv {
>         u32 event;
>         ...
>     }
> 
> If bit 31 is set in the event, the device failed to report an event due
> to missing buffers.  In this case, the driver should poll the logical
> units for unit attention conditions, and/or do whatever form of bus scan
> is appropriate for the guest operating system.
> 
> The following events are defined:
> 
> - Transport reset
> 
>     #define VIRTIO_SCSI_T_TRANSPORT_RESET  0
> 
>     struct virtio_scsi_reset {
>         u32 event;
>         u8  lun[8];
>         u32 reason;
>     }
> 
>     #define VIRTIO_SCSI_EVT_RESET_HARD         0
>     #define VIRTIO_SCSI_EVT_RESET_RESCAN       1
>     #define VIRTIO_SCSI_EVT_RESET_SHUTDOWN     2
>     #define VIRTIO_SCSI_EVT_RESET_REMOVED      3
> 
>     By sending this event, the device signals that a logical unit
>     on a target has been reset, including the case of a new device
>     appearing or disappearing on the bus.
> 
>     The device fills in all fields.  The event field is set to
>     VIRTIO_SCSI_T_TRANSPORT_RESET.  The following format is used
>     to represent an event that affects all LUNs enumerated by
>     a target:
> 
>     - for logical units at the top level ("bus 0"), the LUN should be
>     (255,255,0,0,0,0,0,0), whose meaning is "logical unit not specified";
> 
>     - otherwise it should identify a bus and target indication in
>     peripheral device addressing format, followed by the six bytes
>     (255,255,0,0,0,0), whose meaning is also "logical unit not
>     specified".
> 
>     The reason value is one of the four #define values appearing above.
>     VIRTIO_SCSI_EVT_RESET_REMOVED is used if the target or logical unit
>     is no longer able to receive commands.  VIRTIO_SCSI_EVT_RESET_HARD
>     is used if the logical unit has been reset, but is still present.
>     VIRTIO_SCSI_EVT_RESET_RESCAN is used if a target or logical unit has
>     just appeared on the device.  VIRTIO_SCSI_EVT_RESET_SHUTDOWN
>     is used when the host wants to initiate a graceful shutdown of a
>     logical unit.
> 
>     Events should also be reported via sense codes or response codes
>     (this obviously does not apply to newly appeared buses or targets,
>     since the application has never discovered them):
> 
>     - VIRTIO_SCSI_EVT_RESET_HARD
>       sense UNIT ATTENTION
>       asc POWER ON, RESET OR BUS DEVICE RESET OCCURRED
> 
>     - VIRTIO_SCSI_EVT_RESET_RESCAN
>       sense UNIT ATTENTION
>       asc REPORTED LUNS DATA HAS CHANGED
> 
>     - VIRTIO_SCSI_EVT_RESET_SHUTDOWN
>       sense UNIT ATTENTION
>       asc TARGET OPERATING CONDITIONS HAVE CHANGED
>       ascq 0x80 (vendor specific)
> 
>     - VIRTIO_SCSI_EVT_RESET_REMOVED
>       sense ILLEGAL REQUEST
>       asc LOGICAL UNIT NOT SUPPORTED
> 
>     However, in general events should be more easily handled by the
>     driver than sense codes.
> 
> - Asynchronous notification
> 
>     #define VIRTIO_SCSI_T_ASYNC_NOTIFY     1
> 
>     struct virtio_scsi_an_event {
>         u8  lun[8];
>         u32 event;
>     }
> 
>     #define VIRTIO_SCSI_EVT_ASYNC_MEDIA_CHANGE        16
> 
>     By sending this event, the device signals that an event was
>     fired from a physical interface.  The device only sends events
>     that the driver has subscribed to via the "Asynchronous notification
>     subscription" command.
> 
>     All fields are written by the device.  The event field is set to
>     VIRTIO_SCSI_T_ASYNC_NOTIFY.

We'll have to define events, right?

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Qemu-devel] virtio scsi host draft specification, v2
  2011-06-01  8:49 ` Michael S. Tsirkin
@ 2011-06-01 11:59   ` Paolo Bonzini
  2011-06-01 12:51     ` Michael S. Tsirkin
  0 siblings, 1 reply; 21+ messages in thread
From: Paolo Bonzini @ 2011-06-01 11:59 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: qemu-devel, Stefan Hajnoczi

>> Device initialization
>> ---------------------
>>
>> The initialization routine should first of all discover the device's
>> control virtqueues.
>>
>> The driver should then place at least a buffer in the control receiveq.
>
> Size of the buffer?

Good catch.  I'll add this to the configuration information.

>>      The cdb, data and sense fields must reside in separate buffers.
>>      The cdb field is always read-only.  The data buffers may be either
>>      read-only or write-only, depending on the request, with the read-only
>>      buffers coming first.  The sense buffer is always write-only.
>>
>>      The request shall have num_dataout read-only data buffers and
>>      num_datain write-only data buffers.  One of these two values must be
>>      zero if the VIRTIO_SCSI_F_INOUT has not been negotiated.
>
> Why do num_datain/num_dataout need to be there?
> We can just look at the number of io/out bufs in
> virtio descriptors, no?

This depends on having a single variable-sized datum per direction.  I'd 
rather avoid this assumption.

> Also, from experience, it's better not to have any layout
> assumptions - let the guest stick everything in a single
> in + single out buffer if it desires.

Ok, changed.

>>      Remaining fields are filled in by the device.  The sense_len field
>>      indicates the number of bytes actually written to the sense buffer,
>>      while the residual field indicates the residual size, calculated as
>>      data_length - number_of_transferred_bytes.
>
> Again virtio gives you total number of written bytes in the used len
> field.  So just one of these fields will be enough.

The two fields give completely different information (sense vs. real 
data), and the math has to be done anyway in either the driver or the 
device.  The device is going to be written just once and actually it 
already has the separate information, so I put it in the struct and 
spared some annoyance to driver writers.

>> The control receiveq is used by the device to report information on
>> logical units that are attached to it.  The driver should always
>> leave a few (?) buffers ready in the control receiveq.  The device may
>> end up dropping events if it finds no buffer ready.
>
> [...] It looks like there's a finite number of possible events.

If you keep opening and closing the tray from the guest, you could fire 
a possibly unbounded number of events.

> If this mechanism is unreliable, how is it useful?

Events alone are unreliable, but the combination of events+sense is 
reliable.  And events+sense are still useful because:

1) sense codes only provide information when the driver next accesses 
the unit or, at best, the target.  Until then, the driver has no clue 
that the event happened.  Events can be reported at the time they 
happen.  This is important for example when the host requests a clean 
hot-unplug of a disk: if the disk is idle in the guest, the driver may 
never see the event and acknowledge the hot-unplug!

2) for this reason, unit attention has no way to signal events on a 
target that is unknown to the driver (because it has just been hotplugged).

Events and sense codes together are reliable because the driver is aware 
of dropped events.  The driver can react to it by rescanning the bus 
(which will let the driver see the unit attention conditions) and 
polling CD-ROM units for events it had subscribed to.

I'll add a short version of the above text to the spec, and I'll also 
add the following dummy event:

- No event

     #define VIRTIO_SCSI_T_NO_EVENT         0

     This event is fired in the following cases:

     1) When the device detects in the eventq a buffer that is shorter
     than what is indicated in the configuration field, it will use
     it immediately and put this dummy value in the event field.
     A well-written driver will never observe this situation.

     2) When events are dropped, the device may signal this event as
     soon as the drivers makes a buffer available, in order to request
     action from the driver.  In this case, of course, this event will
     be reported with the VIRTIO_SCSI_T_EVENTS_MISSED flag.

This will make it even clearer that no queuing is needed.

>> - Asynchronous notification
>>
>>      #define VIRTIO_SCSI_T_ASYNC_NOTIFY     1
>>
>>      struct virtio_scsi_an_event {
>>          u8  lun[8];
>>          u32 event;
>>      }
>>
>>      #define VIRTIO_SCSI_EVT_ASYNC_MEDIA_CHANGE        16
>>
>>      By sending this event, the device signals that an event was
>>      fired from a physical interface.  The device only sends events
>>      that the driver has subscribed to via the "Asynchronous notification
>>      subscription" command.
>>
>>      All fields are written by the device.  The event field is set to
>>      VIRTIO_SCSI_T_ASYNC_NOTIFY.
>
> We'll have to define events, right?

They are defined in terms of annex A of the MMC spec (see the 
"Asynchronous notification query" command).  Media change is the only 
supported event for now, but others are already defined by the MMC spec.

Paolo

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Qemu-devel] virtio scsi host draft specification, v2
  2011-06-01 11:59   ` Paolo Bonzini
@ 2011-06-01 12:51     ` Michael S. Tsirkin
  2011-06-01 13:31       ` Paolo Bonzini
  0 siblings, 1 reply; 21+ messages in thread
From: Michael S. Tsirkin @ 2011-06-01 12:51 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-devel, Stefan Hajnoczi

On Wed, Jun 01, 2011 at 01:59:14PM +0200, Paolo Bonzini wrote:
> >>Device initialization
> >>---------------------
> >>
> >>The initialization routine should first of all discover the device's
> >>control virtqueues.
> >>
> >>The driver should then place at least a buffer in the control receiveq.
> >
> >Size of the buffer?
> 
> Good catch.  I'll add this to the configuration information.
> 
> >>     The cdb, data and sense fields must reside in separate buffers.
> >>     The cdb field is always read-only.  The data buffers may be either
> >>     read-only or write-only, depending on the request, with the read-only
> >>     buffers coming first.  The sense buffer is always write-only.
> >>
> >>     The request shall have num_dataout read-only data buffers and
> >>     num_datain write-only data buffers.  One of these two values must be
> >>     zero if the VIRTIO_SCSI_F_INOUT has not been negotiated.
> >
> >Why do num_datain/num_dataout need to be there?
> >We can just look at the number of io/out bufs in
> >virtio descriptors, no?
> 
> This depends on having a single variable-sized datum per direction.
> I'd rather avoid this assumption.

I think it's a sane assumption: does scsi ever give you more?  If you
have many how do you know the size of each?

> >Also, from experience, it's better not to have any layout
> >assumptions - let the guest stick everything in a single
> >in + single out buffer if it desires.
> 
> Ok, changed.
> 
> >>     Remaining fields are filled in by the device.  The sense_len field
> >>     indicates the number of bytes actually written to the sense buffer,
> >>     while the residual field indicates the residual size, calculated as
> >>     data_length - number_of_transferred_bytes.
> >
> >Again virtio gives you total number of written bytes in the used len
> >field.  So just one of these fields will be enough.
> 
> The two fields give completely different information (sense vs. real
> data), and the math has to be done anyway in either the driver or
> the device.  The device is going to be written just once and
> actually it already has the separate information, so I put it in the
> struct and spared some annoyance to driver writers.

Yes but this way you get duplicate information which means
it can get out of sync. Go figure who's right then ...
Further less data->less cache pressure ...

> >>The control receiveq is used by the device to report information on
> >>logical units that are attached to it.  The driver should always
> >>leave a few (?) buffers ready in the control receiveq.  The device may
> >>end up dropping events if it finds no buffer ready.
> >
> >[...] It looks like there's a finite number of possible events.
> 
> If you keep opening and closing the tray from the guest, you could
> fire a possibly unbounded number of events.

In that case only the last one is really interesting.

> >If this mechanism is unreliable, how is it useful?
> 
> Events alone are unreliable, but the combination of events+sense is
> reliable.  And events+sense are still useful because:
> 
> 1) sense codes only provide information when the driver next
> accesses the unit or, at best, the target.  Until then, the driver
> has no clue that the event happened.  Events can be reported at the
> time they happen.  This is important for example when the host
> requests a clean hot-unplug of a disk: if the disk is idle in the
> guest, the driver may never see the event and acknowledge the
> hot-unplug!

right. But if you then drop this because you don't have a buffer,
you get the same problem.

> 2) for this reason, unit attention has no way to signal events on a
> target that is unknown to the driver (because it has just been
> hotplugged).
> 
> Events and sense codes together are reliable because the driver is
> aware of dropped events.

It is?  How is it notified of dropped events?

>  The driver can react to it by rescanning
> the bus (which will let the driver see the unit attention
> conditions) and polling CD-ROM units for events it had subscribed
> to.
> 
> I'll add a short version of the above text to the spec, and I'll
> also add the following dummy event:
> 
> - No event
> 
>     #define VIRTIO_SCSI_T_NO_EVENT         0
> 
>     This event is fired in the following cases:
> 
>     1) When the device detects in the eventq a buffer that is shorter
>     than what is indicated in the configuration field, it will use
>     it immediately and put this dummy value in the event field.
>     A well-written driver will never observe this situation.
> 
>     2) When events are dropped, the device may signal this event as
>     soon as the drivers makes a buffer available, in order to request
>     action from the driver.  In this case, of course, this event will
>     be reported with the VIRTIO_SCSI_T_EVENTS_MISSED flag.

For 2, you don't have a buffer - so how is this event reported?

> This will make it even clearer that no queuing is needed.
> 
> >>- Asynchronous notification
> >>
> >>     #define VIRTIO_SCSI_T_ASYNC_NOTIFY     1
> >>
> >>     struct virtio_scsi_an_event {
> >>         u8  lun[8];
> >>         u32 event;
> >>     }
> >>
> >>     #define VIRTIO_SCSI_EVT_ASYNC_MEDIA_CHANGE        16
> >>
> >>     By sending this event, the device signals that an event was
> >>     fired from a physical interface.  The device only sends events
> >>     that the driver has subscribed to via the "Asynchronous notification
> >>     subscription" command.
> >>
> >>     All fields are written by the device.  The event field is set to
> >>     VIRTIO_SCSI_T_ASYNC_NOTIFY.
> >
> >We'll have to define events, right?
> 
> They are defined in terms of annex A of the MMC spec (see the
> "Asynchronous notification query" command).  Media change is the
> only supported event for now, but others are already defined by the
> MMC spec.
> 
> Paolo

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Qemu-devel] virtio scsi host draft specification, v2
  2011-06-01 12:51     ` Michael S. Tsirkin
@ 2011-06-01 13:31       ` Paolo Bonzini
  2011-06-01 14:36         ` Michael S. Tsirkin
  0 siblings, 1 reply; 21+ messages in thread
From: Paolo Bonzini @ 2011-06-01 13:31 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: qemu-devel, Stefan Hajnoczi

On 06/01/2011 02:51 PM, Michael S. Tsirkin wrote:
>>>>      The cdb, data and sense fields must reside in separate buffers.
>>>>      The cdb field is always read-only.  The data buffers may be either
>>>>      read-only or write-only, depending on the request, with the read-only
>>>>      buffers coming first.  The sense buffer is always write-only.
>>>>
>>>>      The request shall have num_dataout read-only data buffers and
>>>>      num_datain write-only data buffers.  One of these two values must be
>>>>      zero if the VIRTIO_SCSI_F_INOUT has not been negotiated.
>>>
>>> Why do num_datain/num_dataout need to be there?
>>> We can just look at the number of io/out bufs in
>>> virtio descriptors, no?
>>
>> This depends on having a single variable-sized datum per direction.
>> I'd rather avoid this assumption.
>
> I think it's a sane assumption: does scsi ever give you more?

Sense data, and data from the device, are both variable-length.  With 
your suggestion that they may be in a single buffer, we do need size 
information for them.

> If you have many how do you know the size of each?

That's up to the transport protocol to devise.

However, I'll make a few changes:

1) num_dataout/num_datain will become byte counts rather than buffer 
counts.  This fits better if the driver has the liberty has to use 
single buffers.

2) I'll add a sense_size field to give the size of the sense buffer, 
since now it won't necessarily be in a separate buffer (which gave this 
knowledge to the .  The sense_len will tell the driver how many bytes 
were actually written (usually 0).

>>>>      Remaining fields are filled in by the device.  The sense_len field
>>>>      indicates the number of bytes actually written to the sense buffer,
>>>>      while the residual field indicates the residual size, calculated as
>>>>      data_length - number_of_transferred_bytes.
>>>
>>> Again virtio gives you total number of written bytes in the used len
>>> field.  So just one of these fields will be enough.
>>
>> The two fields give completely different information (sense vs. real
>> data), and the math has to be done anyway in either the driver or
>> the device.  The device is going to be written just once and
>> actually it already has the separate information, so I put it in the
>> struct and spared some annoyance to driver writers.
>
> Yes but this way you get duplicate information which means
> it can get out of sync. Go figure who's right then ...

But is it duplicate actually?  The write-only buffer(s) start with 
datain_size bytes of data, and sense_size bytes of sense.  If everything 
is placed in a single buffer, the used len field cannot capture how many 
bytes were written in the data and how many were written in the sense. 
In addition, if everything is placed in a single buffer, the device will 
have to report that it has written the full buffer because other parts 
of the response come after datain and sense.

So, if I am going to give this liberty with buffers to the driver, I 
_have_ to keep the size information.  Otherwise, I agree that it is 
redundant and I will remove it.  What poison do you prefer?

>>> [...] It looks like there's a finite number of possible events.
>>
>> If you keep opening and closing the tray from the guest, you could
>> fire a possibly unbounded number of events.
>
> In that case only the last one is really interesting.

That's true.  But events are rare enough that you should not have any 
problems feeding enough buffers to the device, and even in that case 
there is a way out in case of dropped events (see below).  Assuming I am 
going to implement the device, requiring the device to queue events 
seems a bit masochistic. :)

>>> If this mechanism is unreliable, how is it useful?
>>
>> Events alone are unreliable, but the combination of events+sense is
>> reliable.  And events+sense are still useful because:
>>
>> 1) sense codes only provide information when the driver next
>> accesses the unit or, at best, the target.  Until then, the driver
>> has no clue that the event happened.  Events can be reported at the
>> time they happen.  This is important for example when the host
>> requests a clean hot-unplug of a disk: if the disk is idle in the
>> guest, the driver may never see the event and acknowledge the
>> hot-unplug!
>
> right. But if you then drop this because you don't have a buffer,
> you get the same problem.

First of all, it should be clear that while lack of buffers may happen, 
it is expected to be an exceptional case.  If the driver keeps on not 
providing buffers, so goes life.  PCI hot-unplug requires a cooperating 
operating system too, after all.

So, it may happen that the driver misses an event because it doesn't 
provide a buffer at the right time.  But if the driver works correctly, 
it will provide event buffers regularly to the device, and the device 
will tell the driver that events were dropped through the 
VIRTIO_SCSI_T_EVENTS_MISSED flag.  The driver will then ask the SCSI 
subsystem to rescan the bus, and will see the unit attention condition 
during the rescan.

I guess this answered your next questions too:

>> 2) for this reason, unit attention has no way to signal events on a
>> target that is unknown to the driver (because it has just been
>> hotplugged).
>>
>> Events and sense codes together are reliable because the driver is
>> aware of dropped events.
>
> It is?  How is it notified of dropped events?
>
>> - No event
>>
>>      #define VIRTIO_SCSI_T_NO_EVENT         0
>>
>>      This event is fired in the following cases:
>>
>>      1) When the device detects in the eventq a buffer that is shorter
>>      than what is indicated in the configuration field, it will use
>>      it immediately and put this dummy value in the event field.
>>      A well-written driver will never observe this situation.
>>
>>      2) When events are dropped, the device may signal this event as
>>      soon as the drivers makes a buffer available, in order to request
>>      action from the driver.  In this case, of course, this event will
>>      be reported with the VIRTIO_SCSI_T_EVENTS_MISSED flag.
>
> For 2, you don't have a buffer - so how is this event reported?

It is reported _eventually_: "as soon as the driver makes a buffer 
available".  After it is reported, a bus rescan synchronizes the device 
and the driver.

Paolo

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Qemu-devel] virtio scsi host draft specification, v2
  2011-05-20  8:21 [Qemu-devel] virtio scsi host draft specification, v2 Paolo Bonzini
  2011-05-28 19:33 ` Stefan Hajnoczi
  2011-06-01  8:49 ` Michael S. Tsirkin
@ 2011-06-01 13:46 ` Avi Kivity
  2011-06-01 16:25   ` Paolo Bonzini
  2 siblings, 1 reply; 21+ messages in thread
From: Avi Kivity @ 2011-06-01 13:46 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-devel, Stefan Hajnoczi

On 05/20/2011 11:21 AM, Paolo Bonzini wrote:
> Hi all,
>
> here is the second version of the spec.  In the end I took the advice 
> of merging all requestq's into one.  The reason for this is that I 
> took a look at the vSCSI device and liked its approach of using SAM 
> 8-byte LUNs directly.  While it _is_ complex (and not yet done right 
> by QEMU---will send a patch for that), the scheme is actually quite 
> natural to implement and use, and supporting generic bus/target/LUN 
> topologies is good to have for passthrough, as well.
>
> I also added a few more features from SAM to avoid redefining the 
> structs in the future.
>
> Of course it may be that I'm completely wrong. :)  Please comment on 
> the spec!

> Virtqueues
>      0:control transmitq
>      1:control receiveq
>      2:requestq
>

Shouldn't we plan multiqueue for this from day 1?

> Requests have the following format:
>
>      struct virtio_scsi_req_cmd {
>          u8 lun[8];
>          u64 id;
>          u8 task_attr;
>          u8 prio;
>          u8 crn;
>          u32 num_dataout, num_datain;
>          char cdb[];
>          char data[][num_dataout+num_datain];
>          u8 sense[];
>          u32 sense_len;
>          u32 residual;
>          u16 status_qualifier;
>          u8 status;
>          u8 response;
>      };

flags? room for growth?

-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Qemu-devel] virtio scsi host draft specification, v2
  2011-06-01 13:31       ` Paolo Bonzini
@ 2011-06-01 14:36         ` Michael S. Tsirkin
  2011-06-01 14:51           ` Avi Kivity
  2011-06-01 15:59           ` Paolo Bonzini
  0 siblings, 2 replies; 21+ messages in thread
From: Michael S. Tsirkin @ 2011-06-01 14:36 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-devel, Stefan Hajnoczi

On Wed, Jun 01, 2011 at 03:31:42PM +0200, Paolo Bonzini wrote:
> On 06/01/2011 02:51 PM, Michael S. Tsirkin wrote:
> >>>>     The cdb, data and sense fields must reside in separate buffers.
> >>>>     The cdb field is always read-only.  The data buffers may be either
> >>>>     read-only or write-only, depending on the request, with the read-only
> >>>>     buffers coming first.  The sense buffer is always write-only.
> >>>>
> >>>>     The request shall have num_dataout read-only data buffers and
> >>>>     num_datain write-only data buffers.  One of these two values must be
> >>>>     zero if the VIRTIO_SCSI_F_INOUT has not been negotiated.
> >>>
> >>>Why do num_datain/num_dataout need to be there?
> >>>We can just look at the number of io/out bufs in
> >>>virtio descriptors, no?
> >>
> >>This depends on having a single variable-sized datum per direction.
> >>I'd rather avoid this assumption.
> >
> >I think it's a sane assumption: does scsi ever give you more?
> 
> Sense data, and data from the device, are both variable-length.
> With your suggestion that they may be in a single buffer, we do need
> size information for them.
> 
> >If you have many how do you know the size of each?
> 
> That's up to the transport protocol to devise.
> 
> However, I'll make a few changes:
> 
> 1) num_dataout/num_datain will become byte counts rather than buffer
> counts.  This fits better if the driver has the liberty has to use
> single buffers.
> 2) I'll add a sense_size field to give the size of the sense buffer,
> since now it won't necessarily be in a separate buffer (which gave
> this knowledge to the .  The sense_len will tell the driver how many
> bytes were actually written (usually 0).
> >>>>     Remaining fields are filled in by the device.  The sense_len field
> >>>>     indicates the number of bytes actually written to the sense buffer,
> >>>>     while the residual field indicates the residual size, calculated as
> >>>>     data_length - number_of_transferred_bytes.
> >>>
> >>>Again virtio gives you total number of written bytes in the used len
> >>>field.  So just one of these fields will be enough.
> >>
> >>The two fields give completely different information (sense vs. real
> >>data), and the math has to be done anyway in either the driver or
> >>the device.  The device is going to be written just once and
> >>actually it already has the separate information, so I put it in the
> >>struct and spared some annoyance to driver writers.
> >
> >Yes but this way you get duplicate information which means
> >it can get out of sync. Go figure who's right then ...
> 
> But is it duplicate actually?  The write-only buffer(s) start with
> datain_size bytes of data, and sense_size bytes of sense.  If
> everything is placed in a single buffer, the used len field cannot
> capture how many bytes were written in the data and how many were
> written in the sense. In addition, if everything is placed in a
> single buffer, the device will have to report that it has written
> the full buffer because other parts of the response come after
> datain and sense.
> 
> So, if I am going to give this liberty with buffers to the driver, I
> _have_ to keep the size information.  Otherwise, I agree that it is
> redundant and I will remove it.  What poison do you prefer?
> 

Ah, I think I understand now. Both sense and data have in
fields that might only be used partially?
In that case I think I agree: it's best to require the use of separate
buffers for them, in this way used len will give you useful information
and you won't need sense_len and data_len: just a flag to
mark the fact that there *is* a sense buffer following.
And the num field does that.
virtio-net does something vaguely similar with mergeable buffers.

However some questions:
1. I think you don't need numdatain/numdataout: each
buffer can include in and out segments. Just tell device how many
buffers are there.
2. Put first out then in for data, then out then in for
sense, so that you can get away with less descriptors as all of data
can be in a single indirect descriptor.
2. Is it true that device really expects optional data +
optional sense? Then an explicit two bit field 'have data','have sense'
might be better than 'number'.


What do you think?

> >>>[...] It looks like there's a finite number of possible events.
> >>
> >>If you keep opening and closing the tray from the guest, you could
> >>fire a possibly unbounded number of events.
> >
> >In that case only the last one is really interesting.
> 
> That's true.  But events are rare enough that you should not have
> any problems feeding enough buffers to the device, and even in that
> case there is a way out in case of dropped events (see below).
> Assuming I am going to implement the device, requiring the device to
> queue events seems a bit masochistic. :)
> 
> >>>If this mechanism is unreliable, how is it useful?
> >>
> >>Events alone are unreliable, but the combination of events+sense is
> >>reliable.  And events+sense are still useful because:
> >>
> >>1) sense codes only provide information when the driver next
> >>accesses the unit or, at best, the target.  Until then, the driver
> >>has no clue that the event happened.  Events can be reported at the
> >>time they happen.  This is important for example when the host
> >>requests a clean hot-unplug of a disk: if the disk is idle in the
> >>guest, the driver may never see the event and acknowledge the
> >>hot-unplug!
> >
> >right. But if you then drop this because you don't have a buffer,
> >you get the same problem.
> 
> First of all, it should be clear that while lack of buffers may
> happen, it is expected to be an exceptional case.  If the driver
> keeps on not providing buffers, so goes life.  PCI hot-unplug
> requires a cooperating operating system too, after all.
> 
> So, it may happen that the driver misses an event because it doesn't
> provide a buffer at the right time.  But if the driver works
> correctly, it will provide event buffers regularly to the device,
> and the device will tell the driver that events were dropped through
> the VIRTIO_SCSI_T_EVENTS_MISSED flag.  The driver will then ask the
> SCSI subsystem to rescan the bus, and will see the unit attention
> condition during the rescan.
> 
> I guess this answered your next questions too:
> 
> >>2) for this reason, unit attention has no way to signal events on a
> >>target that is unknown to the driver (because it has just been
> >>hotplugged).
> >>
> >>Events and sense codes together are reliable because the driver is
> >>aware of dropped events.
> >
> >It is?  How is it notified of dropped events?
> >
> >>- No event
> >>
> >>     #define VIRTIO_SCSI_T_NO_EVENT         0
> >>
> >>     This event is fired in the following cases:
> >>
> >>     1) When the device detects in the eventq a buffer that is shorter
> >>     than what is indicated in the configuration field, it will use
> >>     it immediately and put this dummy value in the event field.
> >>     A well-written driver will never observe this situation.
> >>
> >>     2) When events are dropped, the device may signal this event as
> >>     soon as the drivers makes a buffer available, in order to request
> >>     action from the driver.  In this case, of course, this event will
> >>     be reported with the VIRTIO_SCSI_T_EVENTS_MISSED flag.
> >
> >For 2, you don't have a buffer - so how is this event reported?
> 
> It is reported _eventually_: "as soon as the driver makes a buffer
> available".  After it is reported, a bus rescan synchronizes the
> device and the driver.
> 
> Paolo

I see, that's what I missed. Sure, that sounds fine.

-- 
MST

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Qemu-devel] virtio scsi host draft specification, v2
  2011-06-01 14:36         ` Michael S. Tsirkin
@ 2011-06-01 14:51           ` Avi Kivity
  2011-06-02 10:42             ` Michael S. Tsirkin
  2011-06-01 15:59           ` Paolo Bonzini
  1 sibling, 1 reply; 21+ messages in thread
From: Avi Kivity @ 2011-06-01 14:51 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Paolo Bonzini, qemu-devel, Stefan Hajnoczi

On 06/01/2011 05:36 PM, Michael S. Tsirkin wrote:
> >
> >  So, if I am going to give this liberty with buffers to the driver, I
> >  _have_ to keep the size information.  Otherwise, I agree that it is
> >  redundant and I will remove it.  What poison do you prefer?
> >
>
> Ah, I think I understand now. Both sense and data have in
> fields that might only be used partially?
> In that case I think I agree: it's best to require the use of separate
> buffers for them, in this way used len will give you useful information
> and you won't need sense_len and data_len: just a flag to
> mark the fact that there *is* a sense buffer following.
> And the num field does that.


Do you mean to use the virtio iovec length to determine information 
about the message (like splitting it into buffers)?

I think that's a bad idea.  Splitting into buffers is a function of 
memory management.  For example, a driver in userspace (or a nested 
guest) will have additional fragmentation into 4K pages after it passes 
through the iommu.

Let's not mix layers here.

-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Qemu-devel] virtio scsi host draft specification, v2
  2011-06-01 14:36         ` Michael S. Tsirkin
  2011-06-01 14:51           ` Avi Kivity
@ 2011-06-01 15:59           ` Paolo Bonzini
  2011-06-02 11:41             ` Michael S. Tsirkin
  1 sibling, 1 reply; 21+ messages in thread
From: Paolo Bonzini @ 2011-06-01 15:59 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: qemu-devel, Stefan Hajnoczi

On 06/01/2011 04:36 PM, Michael S. Tsirkin wrote:
> Ah, I think I understand now. Both sense and data have in
> fields that might only be used partially?
> In that case I think I agree: it's best to require the use of separate
> buffers for them, in this way used len will give you useful information
> and you won't need sense_len and data_len: just a flag to
> mark the fact that there*is*  a sense buffer following.

If the device wants a sense buffer to be there always, that's sensible. 
  No flag needed here, then.  Also, sense is always "in", there is no out.

But I do not understand how the used len helps me.  If I read the spec 
correctly, the length will be the number of bytes written, but this will 
always point to after the last field.  If sense or data are written 
partially, this will not be written in the fields---in fact, virtio_blk 
does contain both sense_len and residual.  Its sense field is fixed 
size, which is probably why it doesn't contain something like 
datain_size (there is just one variable-size field).

Strictly speaking I wouldn't need dataout_size too, because I have only 
one variable-size read-only field, but I prefer to be future-proof.

I think the following is a good compromise:

1) keep dataout_size, datain_size and sense_size;

2) dataout, datain and sense shall each start a separate buffer, and 
sense shall be contained in a single buffer; it is permissible to put 
sense and the subsequent fields in the same buffer.  This will make it 
easy for the QEMU implementation to pick up its iovecs.

It will also let the device detect mistakes in filling the data sizes. 
In practice I expect 3 descriptors will be used (one direct for 
read-only stuff up to data; one possibly indirect for data; one direct 
for sense and other write-only stuff).

> However some questions:
> 1. I think you don't need numdatain/numdataout: each
> buffer can include in and out segments. Just tell device how many
> buffers are there.

I don't understand.

Paolo

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Qemu-devel] virtio scsi host draft specification, v2
  2011-06-01 13:46 ` Avi Kivity
@ 2011-06-01 16:25   ` Paolo Bonzini
  2011-06-01 16:29     ` Avi Kivity
  0 siblings, 1 reply; 21+ messages in thread
From: Paolo Bonzini @ 2011-06-01 16:25 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Paolo Bonzini, qemu-devel, Stefan Hajnoczi

On 06/01/2011 03:46 PM, Avi Kivity wrote:
>
>> Virtqueues
>>      0:control transmitq
>>      1:control receiveq
>>      2:requestq
>>
>
> Shouldn't we plan multiqueue for this from day 1?

How would you do multiqueue?  Just provide N queues, and the device can 
place requests for any LUN on any queue?

If that's correct, it doesn't sound too hard to do, but also doesn't 
sound problematic to fit it later.  So I'm quite ambivalent

>> Requests have the following format:
>>
>>      struct virtio_scsi_req_cmd {
>>          u8 lun[8];
>>          u64 id;
>>          u8 task_attr;
>>          u8 prio;
>>          u8 crn;
>>          u32 num_dataout, num_datain;
>>          char cdb[];
>>          char data[][num_dataout+num_datain];
>>          u8 sense[];
>>          u32 sense_len;
>>          u32 residual;
>>          u16 status_qualifier;
>>          u8 status;
>>          u8 response;
>>      };
>
> flags? room for growth?

Feature bits can be used to negotiate the exact format of the request. 
I don't expect many changes, since we're closely mimicking the SCSI 
requests.

Paolo

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Qemu-devel] virtio scsi host draft specification, v2
  2011-06-01 16:25   ` Paolo Bonzini
@ 2011-06-01 16:29     ` Avi Kivity
  0 siblings, 0 replies; 21+ messages in thread
From: Avi Kivity @ 2011-06-01 16:29 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: Paolo Bonzini, qemu-devel, Stefan Hajnoczi

On 06/01/2011 07:25 PM, Paolo Bonzini wrote:
> On 06/01/2011 03:46 PM, Avi Kivity wrote:
>>
>>> Virtqueues
>>>      0:control transmitq
>>>      1:control receiveq
>>>      2:requestq
>>>
>>
>> Shouldn't we plan multiqueue for this from day 1?
>
> How would you do multiqueue?  Just provide N queues, and the device 
> can place requests for any LUN on any queue?

Yes.

> If that's correct, it doesn't sound too hard to do, but also doesn't 
> sound problematic to fit it later.  So I'm quite ambivalent

Since it's easy, why not?  Reducing feature bit proliferation means 
easier testing.

>>
>> flags? room for growth?
>
> Feature bits can be used to negotiate the exact format of the request. 
> I don't expect many changes, since we're closely mimicking the SCSI 
> requests.

Ok.  If we're following a standard we should be safe.

-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Qemu-devel] virtio scsi host draft specification, v2
  2011-06-01 14:51           ` Avi Kivity
@ 2011-06-02 10:42             ` Michael S. Tsirkin
  2011-06-02 11:42               ` Avi Kivity
  0 siblings, 1 reply; 21+ messages in thread
From: Michael S. Tsirkin @ 2011-06-02 10:42 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Paolo Bonzini, qemu-devel, Stefan Hajnoczi

On Wed, Jun 01, 2011 at 05:51:54PM +0300, Avi Kivity wrote:
> On 06/01/2011 05:36 PM, Michael S. Tsirkin wrote:
> >>
> >>  So, if I am going to give this liberty with buffers to the driver, I
> >>  _have_ to keep the size information.  Otherwise, I agree that it is
> >>  redundant and I will remove it.  What poison do you prefer?
> >>
> >
> >Ah, I think I understand now. Both sense and data have in
> >fields that might only be used partially?
> >In that case I think I agree: it's best to require the use of separate
> >buffers for them, in this way used len will give you useful information
> >and you won't need sense_len and data_len: just a flag to
> >mark the fact that there *is* a sense buffer following.
> >And the num field does that.
> 
> 
> Do you mean to use the virtio iovec length to determine information
> about the message (like splitting it into buffers)?

Exactly the reverse :)

> I think that's a bad idea.  Splitting into buffers is a function of
> memory management.  For example, a driver in userspace (or a nested
> guest) will have additional fragmentation into 4K pages after it
> passes through the iommu.
> 
> Let's not mix layers here.

Right. If there are two buffers of variable length there
should be two add_buf calls.

> -- 
> error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Qemu-devel] virtio scsi host draft specification, v2
  2011-06-01 15:59           ` Paolo Bonzini
@ 2011-06-02 11:41             ` Michael S. Tsirkin
  2011-06-02 12:02               ` Paolo Bonzini
  0 siblings, 1 reply; 21+ messages in thread
From: Michael S. Tsirkin @ 2011-06-02 11:41 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-devel, Stefan Hajnoczi

BTW, I think Cc the virtio mailing list
in the next version is a good idea.

On Wed, Jun 01, 2011 at 05:59:28PM +0200, Paolo Bonzini wrote:
> On 06/01/2011 04:36 PM, Michael S. Tsirkin wrote:
> >Ah, I think I understand now. Both sense and data have in
> >fields that might only be used partially?
> >In that case I think I agree: it's best to require the use of separate
> >buffers for them, in this way used len will give you useful information
> >and you won't need sense_len and data_len: just a flag to
> >mark the fact that there*is*  a sense buffer following.
> 
> If the device wants a sense buffer to be there always, that's
> sensible.  No flag needed here, then.  Also, sense is always "in",
> there is no out.
> 
> But I do not understand how the used len helps me.  If I read the
> spec correctly, the length will be the number of bytes written, but
> this will always point to after the last field.  If sense or data
> are written partially, this will not be written in the fields---in
> fact, virtio_blk does contain both sense_len and residual.  Its
> sense field is fixed size, which is probably why it doesn't contain
> something like datain_size (there is just one variable-size field).
> 
> Strictly speaking I wouldn't need dataout_size too, because I have
> only one variable-size read-only field, but I prefer to be
> future-proof.
> 
> I think the following is a good compromise:
> 
> 1) keep dataout_size, datain_size and sense_size;
> 
> 2) dataout, datain and sense shall each start a separate buffer, and
> sense shall be contained in a single buffer; it is permissible to
> put sense and the subsequent fields in the same buffer.  This will
> make it easy for the QEMU implementation to pick up its iovecs.
> 
> It will also let the device detect mistakes in filling the data
> sizes.

I am not sure whether the length info in the header is redundant. If so, I
think it's better not to duplicate it on the assumption that this will
let us detect bugs: the bugs will be in the header as likely as not.
If the overlap is not complete, some redundancy is not too bad.

> In practice I expect 3 descriptors will be used (one direct
> for read-only stuff up to data; one possibly indirect for data; one
> direct for sense and other write-only stuff).
> 
> >However some questions:
> >1. I think you don't need numdatain/numdataout: each
> >buffer can include in and out segments. Just tell device how many
> >buffers are there.
> 
> I don't understand.
> 
> Paolo

I think I didn't express myself clearly. Follows a somewhat
lengthy background to make sure we use the same terms.
Feel free to ignore until ----


There seems to be some confusion about the terminology: the term buffers
is confusing. I'm guilty of mixing terms too. Let's refer to the virtio
spec.  There are two kinds of entities there:

- descriptor: each descriptor points to a location
  in memory. It can be in or out. descriptors can be
  chained together.

- head: internally, points to a chain of descriptors, both in and out.
  * driver makes head available to device, thus adding some
    memory for out + some memory for in.
  * device uses the memory, and reports to driver how much
    in memory was used. The assumption is always that
    device consumes in memory from the beginning
    in a contigious way. That's why length is enough.
    That's also why it's called add_buf: conceptually
    it is a single buffer.

Drivers and devices never operate in terms of descriptors: these are
internal to the virtio ring transport.  Instead, they always operate in
term of heads. virtio interface does let you pass in
s/g entries but this is an artifact of linux.

At the moment many devices in qemu assume that drivers put some info in
specific descriptors. This is not a good idea, they should always
operate in terms of offsets from start of descriptor.  vhost does this
correctly BTW.  I posted a patch to fix that previously, need to dust it
up and merge.

----

Now to our problem:
As far as I can tell there are two input buffers in each request: sense
and data. Right?

If sense is fixed length, we can simply put it first, have device write
sense then data.  This does not seem too limiting, if you want a lot of
flexibility sense length can be in device config.  If we don't want to
limit ourselves to fixed length sense, we would have driver use two
heads for a request.  This is possible but one needs to be careful in
the driver to make sure there's enough space for both requests. Maybe
add_bufs API to add multiple bufs might be a good idea here.


-- 
MST

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Qemu-devel] virtio scsi host draft specification, v2
  2011-06-02 10:42             ` Michael S. Tsirkin
@ 2011-06-02 11:42               ` Avi Kivity
  2011-06-02 11:56                 ` Michael S. Tsirkin
  0 siblings, 1 reply; 21+ messages in thread
From: Avi Kivity @ 2011-06-02 11:42 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Paolo Bonzini, qemu-devel, Stefan Hajnoczi

On 06/02/2011 01:42 PM, Michael S. Tsirkin wrote:
> On Wed, Jun 01, 2011 at 05:51:54PM +0300, Avi Kivity wrote:
> >  On 06/01/2011 05:36 PM, Michael S. Tsirkin wrote:
> >  >>
> >  >>   So, if I am going to give this liberty with buffers to the driver, I
> >  >>   _have_ to keep the size information.  Otherwise, I agree that it is
> >  >>   redundant and I will remove it.  What poison do you prefer?
> >  >>
> >  >
> >  >Ah, I think I understand now. Both sense and data have in
> >  >fields that might only be used partially?
> >  >In that case I think I agree: it's best to require the use of separate
> >  >buffers for them, in this way used len will give you useful information
> >  >and you won't need sense_len and data_len: just a flag to
> >  >mark the fact that there *is* a sense buffer following.
> >  >And the num field does that.
> >
> >
> >  Do you mean to use the virtio iovec length to determine information
> >  about the message (like splitting it into buffers)?
>
> Exactly the reverse :)

They're both equally bad.

> >  I think that's a bad idea.  Splitting into buffers is a function of
> >  memory management.  For example, a driver in userspace (or a nested
> >  guest) will have additional fragmentation into 4K pages after it
> >  passes through the iommu.
> >
> >  Let's not mix layers here.
>
> Right. If there are two buffers of variable length there
> should be two add_buf calls.

No.  The guest should be free to use one large continuous buffer of size 
N, of N buffers of size 1.

-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Qemu-devel] virtio scsi host draft specification, v2
  2011-06-02 11:42               ` Avi Kivity
@ 2011-06-02 11:56                 ` Michael S. Tsirkin
  2011-06-02 12:18                   ` Avi Kivity
  0 siblings, 1 reply; 21+ messages in thread
From: Michael S. Tsirkin @ 2011-06-02 11:56 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Paolo Bonzini, qemu-devel, Stefan Hajnoczi

On Thu, Jun 02, 2011 at 02:42:51PM +0300, Avi Kivity wrote:
> On 06/02/2011 01:42 PM, Michael S. Tsirkin wrote:
> >On Wed, Jun 01, 2011 at 05:51:54PM +0300, Avi Kivity wrote:
> >>  On 06/01/2011 05:36 PM, Michael S. Tsirkin wrote:
> >>  >>
> >>  >>   So, if I am going to give this liberty with buffers to the driver, I
> >>  >>   _have_ to keep the size information.  Otherwise, I agree that it is
> >>  >>   redundant and I will remove it.  What poison do you prefer?
> >>  >>
> >>  >
> >>  >Ah, I think I understand now. Both sense and data have in
> >>  >fields that might only be used partially?
> >>  >In that case I think I agree: it's best to require the use of separate
> >>  >buffers for them, in this way used len will give you useful information
> >>  >and you won't need sense_len and data_len: just a flag to
> >>  >mark the fact that there *is* a sense buffer following.
> >>  >And the num field does that.
> >>
> >>
> >>  Do you mean to use the virtio iovec length to determine information
> >>  about the message (like splitting it into buffers)?
> >
> >Exactly the reverse :)
> 
> They're both equally bad.
> 
> >>  I think that's a bad idea.  Splitting into buffers is a function of
> >>  memory management.  For example, a driver in userspace (or a nested
> >>  guest) will have additional fragmentation into 4K pages after it
> >>  passes through the iommu.
> >>
> >>  Let's not mix layers here.
> >
> >Right. If there are two buffers of variable length there
> >should be two add_buf calls.
> 
> No.  The guest should be free to use one large continuous buffer of
> size N, of N buffers of size 1.

That's exactly what I was saying.

> -- 
> error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Qemu-devel] virtio scsi host draft specification, v2
  2011-06-02 11:41             ` Michael S. Tsirkin
@ 2011-06-02 12:02               ` Paolo Bonzini
  2011-06-02 12:54                 ` Michael S. Tsirkin
  0 siblings, 1 reply; 21+ messages in thread
From: Paolo Bonzini @ 2011-06-02 12:02 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: qemu-devel, Stefan Hajnoczi

On 06/02/2011 01:41 PM, Michael S. Tsirkin wrote:
> Now to our problem:
> As far as I can tell there are two input buffers in each request: sense
> and data. Right?
>
> If sense is fixed length, we can simply put it first, have device write
> sense then data.  This does not seem too limiting, if you want a lot of
> flexibility sense length can be in device config.  If we don't want to
> limit ourselves to fixed length sense, we would have driver use two
> heads for a request.  This is possible but one needs to be careful in
> the driver to make sure there's enough space for both requests. Maybe
> add_bufs API to add multiple bufs might be a good idea here.

I should be on holiday today so I'll answer this quickly.  Sounds like 
we can converge, I'll put data at the end and define the length of sense 
in the config: the device writes a default (defined by the spec to be 
always 96) and the driver can modify it.  The _single_ head would contain:

- read-only: command etc.

followed by:

- write-only: sense, status etc.

followed by:

- read-only: data to device
- write-only: data from device

IIUC, qemu only sees a bunch of read-only and write-only buffers.  It 
doesn't see the relative ordering of read-only vs. write-only.  But it 
knows the sizes of read-only and write-only data, so it can figure out 
datain_size and dataout_size.  sense_size is in the config, so neither 
of the three needs to be in the request.

sense_len needs to stay, since any number of bytes can be written in the 
sense buffer.  The used-length field should be usable for 
uni-directional commands, but I'm not sure about commands that have both 
datain and dataout.  I'll read the SCSI spec about it tomorrow.

Making qemu support arbitrarily partitioned buffers may require some 
extra utility functions to work on iovecs, but nothing too complex.  If 
your patches already contain something like that, please dig them up so 
I can avoid duplicate work!

Paolo

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Qemu-devel] virtio scsi host draft specification, v2
  2011-06-02 11:56                 ` Michael S. Tsirkin
@ 2011-06-02 12:18                   ` Avi Kivity
  0 siblings, 0 replies; 21+ messages in thread
From: Avi Kivity @ 2011-06-02 12:18 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Paolo Bonzini, qemu-devel, Stefan Hajnoczi

On 06/02/2011 02:56 PM, Michael S. Tsirkin wrote:
> >  >
> >  >Right. If there are two buffers of variable length there
> >  >should be two add_buf calls.
> >
> >  No.  The guest should be free to use one large continuous buffer of
> >  size N, of N buffers of size 1.
>
> That's exactly what I was saying.

I'm really confused then.

-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Qemu-devel] virtio scsi host draft specification, v2
  2011-06-02 12:02               ` Paolo Bonzini
@ 2011-06-02 12:54                 ` Michael S. Tsirkin
  0 siblings, 0 replies; 21+ messages in thread
From: Michael S. Tsirkin @ 2011-06-02 12:54 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-devel, Stefan Hajnoczi

On Thu, Jun 02, 2011 at 02:02:52PM +0200, Paolo Bonzini wrote:
> On 06/02/2011 01:41 PM, Michael S. Tsirkin wrote:
> >Now to our problem:
> >As far as I can tell there are two input buffers in each request: sense
> >and data. Right?
> >
> >If sense is fixed length, we can simply put it first, have device write
> >sense then data.  This does not seem too limiting, if you want a lot of
> >flexibility sense length can be in device config.  If we don't want to
> >limit ourselves to fixed length sense, we would have driver use two
> >heads for a request.  This is possible but one needs to be careful in
> >the driver to make sure there's enough space for both requests. Maybe
> >add_bufs API to add multiple bufs might be a good idea here.
> 
> I should be on holiday today so I'll answer this quickly.  Sounds
> like we can converge, I'll put data at the end and define the length
> of sense in the config: the device writes a default (defined by the
> spec to be always 96) and the driver can modify it.  The _single_
> head would contain:
> 
> - read-only: command etc.
> 
> followed by:
> 
> - write-only: sense, status etc.
> 
> followed by:
> 
> - read-only: data to device
> - write-only: data from device

Yes, this works.

> IIUC, qemu only sees a bunch of read-only and write-only buffers.
> It doesn't see the relative ordering of read-only vs. write-only.

In virtio write is always before read.

> But it knows the sizes of read-only and write-only data, so it can
> figure out datain_size and dataout_size.  sense_size is in the
> config, so neither of the three needs to be in the request.
> 
> sense_len needs to stay, since any number of bytes can be written in
> the sense buffer.

I think this means sense_len needs to go into the in buffer
as well. As head is first it's an out buffer.

>  The used-length field should be usable for
> uni-directional commands, but I'm not sure about commands that have
> both datain and dataout.  I'll read the SCSI spec about it tomorrow.

used-length is the part of in buffer actually written.
actual data len is thus used-length - 96
(sense is assumed to be fixed length by virtio ring,
 sense_len tells driver how many bytes are
 actually valid).


> 
> Making qemu support arbitrarily partitioned buffers may require some
> extra utility functions to work on iovecs, but nothing too complex.
> If your patches already contain something like that, please dig them
> up so I can avoid duplicate work!
> 
> Paolo

Yes. Will do.

-- 
MST

^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2011-06-02 12:54 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-05-20  8:21 [Qemu-devel] virtio scsi host draft specification, v2 Paolo Bonzini
2011-05-28 19:33 ` Stefan Hajnoczi
2011-05-30  9:22   ` Paolo Bonzini
2011-06-01  4:44     ` Stefan Hajnoczi
2011-06-01  8:49 ` Michael S. Tsirkin
2011-06-01 11:59   ` Paolo Bonzini
2011-06-01 12:51     ` Michael S. Tsirkin
2011-06-01 13:31       ` Paolo Bonzini
2011-06-01 14:36         ` Michael S. Tsirkin
2011-06-01 14:51           ` Avi Kivity
2011-06-02 10:42             ` Michael S. Tsirkin
2011-06-02 11:42               ` Avi Kivity
2011-06-02 11:56                 ` Michael S. Tsirkin
2011-06-02 12:18                   ` Avi Kivity
2011-06-01 15:59           ` Paolo Bonzini
2011-06-02 11:41             ` Michael S. Tsirkin
2011-06-02 12:02               ` Paolo Bonzini
2011-06-02 12:54                 ` Michael S. Tsirkin
2011-06-01 13:46 ` Avi Kivity
2011-06-01 16:25   ` Paolo Bonzini
2011-06-01 16:29     ` Avi Kivity

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.