* [Qemu-devel] [sneak preview] major scsi overhaul
@ 2009-11-06 23:09 Gerd Hoffmann
  2009-11-07 15:22 ` Blue Swirl
                   ` (2 more replies)
  0 siblings, 3 replies; 43+ messages in thread
From: Gerd Hoffmann @ 2009-11-06 23:09 UTC (permalink / raw)
  To: qemu-devel

   Hi,

http://repo.or.cz/w/qemu/kraxel.git/shortlog/refs/heads/scsi.v6

What is in there?

   (1) Reworked scsi data structures.  There is a SCSIRequest struct now
       (see the sketch after this list).
   (2) Restructured emulation in the scsi-disk driver.
   (3) New interface for HBA <=> SCSIDevice interaction:
       * building on the new SCSIRequest struct.
       * the silly read_data/write_data ping-pong game is gone.
       * support for scatter lists and zerocopy I/O is there.
   (4) Added support for the new interface to scsi-disk and scsi-generic.
   (5) Switched esp and usb-storage to the new interface.
       * fixed usb-storage state engine along the way.  And, no, a
         backport to stable isn't going to happen.  With the new
         interface state tracking is simpler and easier to get right.
   (6) A bunch of misc fixes and improvements.
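
For illustration, a minimal sketch of what the request object mentioned in
(1) might carry; the field names here are assumptions, not the actual
definition in the branch:

    struct SCSIDevice;                /* the emulated target, e.g. scsi-disk */

    typedef struct SCSIRequest {
        struct SCSIDevice *dev;       /* device the command is addressed to  */
        uint32_t tag;                 /* HBA-assigned tag, used for queueing */
        uint8_t  cdb[16];             /* the SCSI command block              */
        uint32_t xfer_len;            /* transfer length learned by parsing  */
        int      status;              /* SCSI status reported on completion  */
    } SCSIRequest;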

What needs to be done?

   (1) Better patch descriptions.
   (2) Submit patches to the list for review.
   (3) Switch over lsi to the new interface.
   (4) Zap old interface, killing tons of dead code.
   (5) Final cleanups.

Why should you have a look?

   * You want to use usb-storage.
     - works rock-solid now for me, it didn't before.
   * You are writing a megaraid-sas HBA.
   * You are writing a virtio-scsi HBA.
   * You are considering merging this for 0.12.

cheers,
   Gerd

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [Qemu-devel] [sneak preview] major scsi overhaul
  2009-11-06 23:09 [Qemu-devel] [sneak preview] major scsi overhaul Gerd Hoffmann
@ 2009-11-07 15:22 ` Blue Swirl
  2009-11-09  9:08   ` Gerd Hoffmann
  2009-11-11  4:06 ` Paul Brook
  2009-11-11 11:21 ` [Qemu-devel] " Gerd Hoffmann
  2 siblings, 1 reply; 43+ messages in thread
From: Blue Swirl @ 2009-11-07 15:22 UTC (permalink / raw)
  To: Gerd Hoffmann; +Cc: qemu-devel

On Sat, Nov 7, 2009 at 1:09 AM, Gerd Hoffmann <kraxel@redhat.com> wrote:
>  Hi,
>
> http://repo.or.cz/w/qemu/kraxel.git/shortlog/refs/heads/scsi.v6
>
> What is in there?
>
>  (1) Reworked scsi data structures.  There is a SCSIRequest struct now.
>  (2) Restructured emulation in the scsi-disk driver.
>  (3) New interface for HBA <=> SCSIDevice interaction:
>      * building on the new SCSIRequest struct.
>      * the silly read_data/write_data ping-pong game is gone.
>      * support for scatter lists and zerocopy I/O is there.

Cool.

>  (4) Added support for the new interface to scsi-disk and scsi-generic.
>  (5) Switched esp and usb-storage to the new interface.
>      * fixed usb-storage state engine along the way.  And, no, a
>        backport to stable isn't going to happen.  With the new
>        interface state tracking is simpler and easier to get right.
>  (6) A bunch of misc fixes and improvements.
>
> What needs to be done?
>
>  (1) Better patch descriptions.
>  (2) Submit patches to the list for review.
>  (3) Switch over lsi to the new interface.
>  (4) Zap old interface, killing tons of dead code.
>  (5) Final cleanups.

In general the commits look good, there are many obviously correct cleanups.

However, what happens to the DPRINTFs? It looks like they are removed
in the process.

You are also moving the compilation to Makefile.hw, which is not
exactly an improvement. Is this needed because of the QEMUIOVector
stuff?

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [Qemu-devel] [sneak preview] major scsi overhaul
  2009-11-07 15:22 ` Blue Swirl
@ 2009-11-09  9:08   ` Gerd Hoffmann
  2009-11-09 12:37     ` Avi Kivity
  0 siblings, 1 reply; 43+ messages in thread
From: Gerd Hoffmann @ 2009-11-09  9:08 UTC (permalink / raw)
  To: Blue Swirl; +Cc: qemu-devel

On 11/07/09 16:22, Blue Swirl wrote:
> In general the commits look good, there are many obviously correct cleanups.
>
> However, what happens to the DPRINTFs? It looks like they are removed
> in the process.

I guess you are talking about the ones for each emulated command in 
scsi-disk.c?  There is scsi_print_req() now, filling this hole.  I'll 
stick in a call, wrapped in #ifdef DEBUG_SCSI, so you'll get this 
printed by default when compiling with debugging enabled.
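
Schematically the call would look like this (exact call site and argument
are assumptions):

    #ifdef DEBUG_SCSI
        scsi_print_req(req);   /* dump the parsed command when debugging */
    #endif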

> You are also moving the compilation to Makefile.hw, which is not
> exactly an improvement. Is this needed because of the QEMUIOVector
> stuff?

Almost correct ;)

It is because of QEMUSGList which drags in a target_phys_addr_t dependency.
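
For reference, a rough reconstruction of the types in question (field names
as used by dma-helpers.c, the exact definition may differ); the
target_phys_addr_t members are what tie the code to a target:

    typedef struct ScatterGatherEntry {
        target_phys_addr_t base;      /* guest physical address */
        target_phys_addr_t len;       /* length in bytes        */
    } ScatterGatherEntry;

    typedef struct QEMUSGList {
        ScatterGatherEntry *sg;       /* array of entries    */
        int nsg;                      /* entries in use      */
        int nalloc;                   /* entries allocated   */
        target_phys_addr_t size;      /* total transfer size */
    } QEMUSGList;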

cheers,
   Gerd

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [Qemu-devel] [sneak preview] major scsi overhaul
  2009-11-09  9:08   ` Gerd Hoffmann
@ 2009-11-09 12:37     ` Avi Kivity
  2009-11-09 13:03       ` Gerd Hoffmann
  2009-11-09 20:38       ` Blue Swirl
  0 siblings, 2 replies; 43+ messages in thread
From: Avi Kivity @ 2009-11-09 12:37 UTC (permalink / raw)
  To: Gerd Hoffmann; +Cc: Blue Swirl, qemu-devel

On 11/09/2009 11:08 AM, Gerd Hoffmann wrote:
>
>> You are also moving the compilation to Makefile.hw, which is not
>> exactly an improvement. Is this needed because of the QEMUIOVector
>> stuff?
>
> Almost correct ;)
>
> It is because of QEMUSGList which drags in a target_phys_addr_t 
> dependency.

As Michael notes, devices have physical address sizes independent of the 
target platform; a PCI device that supports 64-bit addresses can be 
plugged into a motherboard that supports 32-bit address bus processors.

We can fix this in several ways:
- creating QEMUSG64List and QEMUSG32List (and typedefing PCISGList to 
the former)
- making QEMUSGList always use 64-bit addresses since it will almost 
always be used with devices (which are often 64-bit capable)
- making target_phys_addr_t always 64-bit (which loses some performance 
with 32-on-32 emulation)
- others?

-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [Qemu-devel] [sneak preview] major scsi overhaul
  2009-11-09 12:37     ` Avi Kivity
@ 2009-11-09 13:03       ` Gerd Hoffmann
  2009-11-09 13:17         ` Avi Kivity
  2009-11-09 20:38       ` Blue Swirl
  1 sibling, 1 reply; 43+ messages in thread
From: Gerd Hoffmann @ 2009-11-09 13:03 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Blue Swirl, qemu-devel

On 11/09/09 13:37, Avi Kivity wrote:
> On 11/09/2009 11:08 AM, Gerd Hoffmann wrote:
>> It is because of QEMUSGList which drags in a target_phys_addr_t
>> dependency.
>
> As Michael notes, devices have physical address sizes independent of the
> target platform; a PCI device that supports 64-bit addresses can be
> plugged into a motherboard that supports 32-bit address bus processors.
>
> We can fix this in several ways:
> - creating QEMUSG64List and QEMUSG32List (and typedefing PCISGList to
> the former)

Doesn't fly.  Either every SCSI HBA uses QEMUSG64List (thus making this 
equivalent to the next option from the scsi point of view), or 
scsi-disk would have to support both formats.

> - making QEMUSGList always use 64-bit addresses since it will almost
> always be used with devices (which are often 64-bit capable)

Possible.

> - making target_phys_addr_t always 64-bit (which loses some performance
> with 32-on-32 emulation)

Possible too.

Has the advantage that more code can be compiled only once, often 
target_phys_addr_t is the only reason a source file is in Makefile.hw 
instead of Makefile.

Question is how big the performance hit actually is and whether we are 
willing to accept that or not ...

cheers,
   Gerd

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [Qemu-devel] [sneak preview] major scsi overhaul
  2009-11-09 13:03       ` Gerd Hoffmann
@ 2009-11-09 13:17         ` Avi Kivity
  2009-11-09 13:39           ` Gerd Hoffmann
  0 siblings, 1 reply; 43+ messages in thread
From: Avi Kivity @ 2009-11-09 13:17 UTC (permalink / raw)
  To: Gerd Hoffmann; +Cc: Blue Swirl, qemu-devel

On 11/09/2009 03:03 PM, Gerd Hoffmann wrote:
> On 11/09/09 13:37, Avi Kivity wrote:
>> On 11/09/2009 11:08 AM, Gerd Hoffmann wrote:
>>> It is because of QEMUSGList which drags in a target_phys_addr_t
>>> dependency.
>>
>> As Michael notes, devices have physical address sizes independent of the
>> target platform; a PCI device that supports 64-bit addresses can be
>> plugged into a motherboard that supports 32-bit address bus processors.
>>
>> We can fix this in several ways:
>> - creating QEMUSG64List and QEMUSG32List (and typedefing PCISGList to
>> the former)
>
> Doesn't fly.  Either every SCSI HBA uses QEMUSG64List (thus making 
> this equivalent to the next option from the scsi point of view), or 
> scsi-disk would have to support both formats.

A device or device set which is always 64-bit would always use 
QEMUSG64List.  A device which is always 32-bit would use QEMUSG32List.

>
>> - making QEMUSGList always use 64-bit addresses since it will almost
>> always be used with devices (which are often 64-bit capable)
>
> Possible.

This is my preferred option btw.  There's no performance impact since 
actual device emulation will dwarf a 64-bit impact.
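
Sketched, that option amounts to something like this (illustrative only):

    /* Fixed-width entries: QEMUSGList no longer depends on
       target_phys_addr_t and can be compiled once per host. */
    typedef struct ScatterGatherEntry {
        uint64_t base;
        uint64_t len;
    } ScatterGatherEntry;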

>> - making target_phys_addr_t always 64-bit (which loses some performance
>> with 32-on-32 emulation)
>
> Possible too.
>
> Has the advantage that more code can be compiled only once, often 
> target_phys_addr_t is the only reason a source file is in Makefile.hw 
> instead of Makefile.
>
> Question is how big the performance hit actually is and whether we 
> are willing to accept that or not ...
>

Probably unmeasurable, but the qemu-devel thread size will be very 
measurable so not worth it.

Note platform-independent devices (like pci) shouldn't depend on 
target_phys_addr_t anyway; QEMUSG64List is one part of this.

-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [Qemu-devel] [sneak preview] major scsi overhaul
  2009-11-09 13:17         ` Avi Kivity
@ 2009-11-09 13:39           ` Gerd Hoffmann
  2009-11-09 13:48             ` Avi Kivity
  0 siblings, 1 reply; 43+ messages in thread
From: Gerd Hoffmann @ 2009-11-09 13:39 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Blue Swirl, qemu-devel

   Hi,

>> Doesn't fly. Either every SCSI HBA uses QEMUSG64List (thus making this
>> equivalent to the next option from the scsi point of view), or
>> scsi-disk would have to support both formats.
>
> A device or device set which is always 64-bit would always use
> QEMUSG64List. A device which is always 32-bit would use QEMUSG32List.

A scsi-disk can be connected to both 32-bit and 64-bit HBAs though ...

>> Question is how big the performance hit actually is and whether we
>> are willing to accept that or not ...
>
> Probably unmeasurable, but the qemu-devel thread size will be very
> measurable so not worth it.

;)

cheers,
   Gerd

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [Qemu-devel] [sneak preview] major scsi overhaul
  2009-11-09 13:39           ` Gerd Hoffmann
@ 2009-11-09 13:48             ` Avi Kivity
  0 siblings, 0 replies; 43+ messages in thread
From: Avi Kivity @ 2009-11-09 13:48 UTC (permalink / raw)
  To: Gerd Hoffmann; +Cc: Blue Swirl, qemu-devel

On 11/09/2009 03:39 PM, Gerd Hoffmann wrote:
>   Hi,
>
>>> Doesn't fly. Either every SCSI HBA uses QEMUSG64List (thus making this
>>> equivalent to the next option from the scsi point of view), or
>>> scsi-disk would have to support both formats.
>>
>> A device or device set which is always 64-bit would always use
>> QEMUSG64List. A device which is always 32-bit would use QEMUSG32List.
>
> A scsi-disk can be connected to both 32-bit and 64-bit HBAs though ...
>

A line has to be drawn.  Since we have a limited number of HBAs, and 
since the performance impact will be minimal, we can safely use 64-bit 
for scsi.

-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [Qemu-devel] [sneak preview] major scsi overhaul
  2009-11-09 12:37     ` Avi Kivity
  2009-11-09 13:03       ` Gerd Hoffmann
@ 2009-11-09 20:38       ` Blue Swirl
  2009-11-09 21:25         ` Gerd Hoffmann
  1 sibling, 1 reply; 43+ messages in thread
From: Blue Swirl @ 2009-11-09 20:38 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Gerd Hoffmann, qemu-devel

On Mon, Nov 9, 2009 at 2:37 PM, Avi Kivity <avi@redhat.com> wrote:
> On 11/09/2009 11:08 AM, Gerd Hoffmann wrote:
>>
>>> You are also moving the compilation to Makefile.hw, which is not
>>> exactly an improvement. Is this needed because of the QEMUIOVector
>>> stuff?
>>
>> Almost correct ;)
>>
>> It is because of QEMUSGList which drags in a target_phys_addr_t
>> dependency.
>
> As Michael notes, devices have physical address sizes independent of the
> target platform; a PCI device that supports 64-bit addresses can be plugged
> into a motherboard that supports 32-bit address bus processors.

True. But I think the solution is not to make all buses maximum width,
but to support multiple buses with width translation in between.

> We can fix this in several ways:
> - creating QEMUSG64List and QEMUSG32List (and typedefing PCISGList to the
> former)
> - making QEMUSGList always use 64-bit addresses since it will almost always
> be used with devices (which are often 64-bit capable)
> - making target_phys_addr_t always 64-bit (which loses some performance with
> 32-on-32 emulation)
> - others?

We could improve the compilation system so that on 64-bit hosts the
benefit of a single-size target_phys_addr_t results in compiling the
files only once.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [Qemu-devel] [sneak preview] major scsi overhaul
  2009-11-09 20:38       ` Blue Swirl
@ 2009-11-09 21:25         ` Gerd Hoffmann
  0 siblings, 0 replies; 43+ messages in thread
From: Gerd Hoffmann @ 2009-11-09 21:25 UTC (permalink / raw)
  To: Blue Swirl; +Cc: Avi Kivity, qemu-devel

On 11/09/09 21:38, Blue Swirl wrote:
>> As Michael notes, devices have physical address sizes independent of the
>> target platform; a PCI device that supports 64-bit addresses can be plugged
>> into a motherboard that supports 32-bit address bus processors.
>
> True. But I think the solution is not to make all buses maximum width,
> but to support multiple buses with width translation in between.

I suspect translating between 32 and 64 bit is more complex and adds more 
overhead than simply using 64-bit all the time.

> We could improve the compilation system so that on 64-bit hosts the
> benefit of a single-size target_phys_addr_t results in compiling the
> files only once.

I think this is already the case.  When building on an x86_64 host (both 
32-bit and 64-bit targets) I find the libhw32 directory empty ...

cheers,
   Gerd

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [Qemu-devel] [sneak preview] major scsi overhaul
  2009-11-06 23:09 [Qemu-devel] [sneak preview] major scsi overhaul Gerd Hoffmann
  2009-11-07 15:22 ` Blue Swirl
@ 2009-11-11  4:06 ` Paul Brook
  2009-11-11  9:41   ` Gerd Hoffmann
  2009-11-11 11:21 ` [Qemu-devel] " Gerd Hoffmann
  2 siblings, 1 reply; 43+ messages in thread
From: Paul Brook @ 2009-11-11  4:06 UTC (permalink / raw)
  To: qemu-devel; +Cc: Gerd Hoffmann

On Friday 06 November 2009, Gerd Hoffmann wrote:
>    Hi,
> 
> http://repo.or.cz/w/qemu/kraxel.git/shortlog/refs/heads/scsi.v6
> 
> What is in there?
>   (3) New interface for HBA <=> SCSIDevice interaction:
>       * building on the new SCSIRequest struct.
>       * the silly read_data/write_data ping-pong game is gone.
>       * support for scatter lists and zerocopy I/O is there.

The "silly ping-pong game" is there for a reason.

Your code assumes that the whole transfer will be completed in a single 
request. This is not true for any of the current HBAs. Expecting the HBA to 
cache the entire response is simply not acceptable. Remember that a single 
transfer can be very large (greater than available RAM on the host). Even with 
an SG-capable HBA, you cannot assume all the data will be transferred in a 
single request.

You should also consider how this interacts with command queueing. 
IIUC an Initiator (HBA) typically sends a tagged command, then disconnects 
from the target (disk). At some later time the target reconnects, and the 
initiator starts the DMA transfer. By my reading your code does not issue any 
I/O requests until after the HBA starts transferring data. The only way to 
achieve this is for the target to pretend it has data available immediately, 
at which point the transfer stalls and we lose the opportunity for parallel 
command queueing.

On a cosmetic level, you seem somewhat schizophrenic between _req_ and 
_request_.

Paul

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [Qemu-devel] [sneak preview] major scsi overhaul
  2009-11-11  4:06 ` Paul Brook
@ 2009-11-11  9:41   ` Gerd Hoffmann
  2009-11-11 14:13     ` Paul Brook
  0 siblings, 1 reply; 43+ messages in thread
From: Gerd Hoffmann @ 2009-11-11  9:41 UTC (permalink / raw)
  To: Paul Brook; +Cc: qemu-devel

On 11/11/09 05:06, Paul Brook wrote:
> On Friday 06 November 2009, Gerd Hoffmann wrote:
>>     Hi,
>>
>> http://repo.or.cz/w/qemu/kraxel.git/shortlog/refs/heads/scsi.v6
>>
>> What is in there?
>>    (3) New interface for HBA<=>  SCSIDevice interaction:
>>        * building on the new SCSIRequest struct.
>>        * the silly read_data/write_data ping-pong game is gone.
>>        * support for scatter lists and zerocopy I/O is there.
>
> The "silly ping-pong game" is there for a reason.
>
> Your code assumes that the whole transfer will be completed in a single
> request.

... to the qemu block layer.  Yes.

The HBA <=> guest driver communication is a completely different story 
though.

> This is not true for any of the current HBAs.  Expecting the HBA to
> cache the entire response is simply not acceptable.

The current qemu code *does* cache the response.  scsi-disk caps the 
buffer at 128k (which is big enough for any request I've seen in my 
testing).  scsi-generic has no cap.

With the old interface scsi-disk and scsi-generic do the caching. 
Unconditionally.

With the new interface the HBA has to handle the caching if needed.  But 
the HBA also has the option to pass scatter lists, in which case qemu 
doesn't do any caching; the data is transferred directly from/to guest 
memory.  Which is clearly an improvement IMO.

> Remember that a single
> transfer can be very large (greater than available ram on the host). Even with
> a SG capable HBA, you can not assume all the data will be transferred in a
> single request.

scsi-generic: It must be a single request anyway, and it already is today.

scsi-disk: dma_bdrv_{read,write} will split it into smaller chunks if 
needed.

> You should also consider how this interacts with command queueing.
> IIUC an Initiator (HBA) typically sends a tagged command, then disconnects
> from the target (disk). At some later time the target reconnects, and the
> initiator starts the DMA transfer. By my reading your code does not issue any
> IO requests until after the HBA starts transferring data.

Hmm?  What code are you looking at?

For esp and usb-storage, reads and writes are handled slightly differently. 
They roughly work like this:

read requests:
   - allocate + parse scsi command.      scsi_req_get+scsi_req_parse
   - submit I/O to qemu block layer.     scsi_req_buf
   - copy data to guest.
   - return status, release request      scsi_req_put

write requests:
   - allocate + parse scsi command.      scsi_req_get+scsi_req_parse
   - copy data from guest.
   - submit I/O to qemu block layer.     scsi_req_buf
   - return status, release request      scsi_req_put

Oh, and both do not support command queuing anyway.

lsi (the only in-tree HBA with TCQ support) works like this:

- allocate + parse scsi command.        scsi_req_get+scsi_req_parse
- continue script processing, collect
   DMA addresses and stick them into
   a scatter list until it is complete.
- queue command and disconnect.
- submit I/O to the qemu block layer    scsi_req_sgl

*can process more scsi commands here*

- when I/O is finished reselect tag
- return status, release request.       scsi_req_put

[ Yes, this should go to the changelog.  As mentioned in the
   announcement the commit comments need some work ... ]
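
In code the flow would look roughly like this; the scsi_req_* signatures
below are assumptions based on the names above, not the exact API:

    /* Sketch of the lsi-style flow with the new interface. */
    static void hba_run_command(SCSIDevice *dev, uint32_t tag, uint8_t *cdb,
                                QEMUSGList *sgl,
                                void (*complete)(SCSIRequest *req))
    {
        SCSIRequest *req = scsi_req_get(dev, tag);  /* allocate the request */
        scsi_req_parse(req, cdb);                   /* parse the command    */

        /* SCRIPTS processing fills sgl with guest DMA addresses, then the
           command is queued and the device disconnects.                    */

        scsi_req_sgl(req, sgl, complete);           /* submit I/O, zerocopy */

        /* More commands can be processed while the I/O is in flight; the
           completion callback reselects the tag, returns status and calls
           scsi_req_put(req) to release the request.                        */
    }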

> The only way to
> achieve this is for the target to pretend it has data available immediately,
> at which point the transfer stalls and we lose the opportunity for parallel
> command queueing.

Note that command parsing and I/O submission are separate now.  So the 
HBA knows how much data is going to be transferred by the command before 
actually submitting the I/O.

cheers,
   Gerd

^ permalink raw reply	[flat|nested] 43+ messages in thread

* [Qemu-devel] Re: [sneak preview] major scsi overhaul
  2009-11-06 23:09 [Qemu-devel] [sneak preview] major scsi overhaul Gerd Hoffmann
  2009-11-07 15:22 ` Blue Swirl
  2009-11-11  4:06 ` Paul Brook
@ 2009-11-11 11:21 ` Gerd Hoffmann
  2009-11-11 11:52   ` Hannes Reinecke
  2 siblings, 1 reply; 43+ messages in thread
From: Gerd Hoffmann @ 2009-11-11 11:21 UTC (permalink / raw)
  To: qemu-devel

On 11/07/09 00:09, Gerd Hoffmann wrote:
>   Hi,
>
> http://repo.or.cz/w/qemu/kraxel.git/shortlog/refs/heads/scsi.v6

Pushed an update, same branch.

> What needs to be done?
>
> (1) Better patch descriptions.
> (2) Submit patches to the list for review.

Still to do.  Also:

  - fold incremental fixes into the original patches.  Will happen next
    time I rebase the patches.
  - SCSI_XFER_* will lose the META/DATA separation; it will
    only be "none / from device / to device".  Also planned for the next
    rebase.

> (3) Switch over lsi to the new interface.
> (4) Zap old interface, killing tons of dead code.
> (5) Final cleanups.

All done now.  So the final code tree is almost where I want to have it, 
although the patch series needs some more work to be reviewable.

cheers,
   Gerd

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [Qemu-devel] Re: [sneak preview] major scsi overhaul
  2009-11-11 11:21 ` [Qemu-devel] " Gerd Hoffmann
@ 2009-11-11 11:52   ` Hannes Reinecke
  2009-11-11 13:02     ` Gerd Hoffmann
  0 siblings, 1 reply; 43+ messages in thread
From: Hannes Reinecke @ 2009-11-11 11:52 UTC (permalink / raw)
  To: Gerd Hoffmann; +Cc: qemu-devel

Gerd Hoffmann wrote:
> On 11/07/09 00:09, Gerd Hoffmann wrote:
>>   Hi,
>>
>> http://repo.or.cz/w/qemu/kraxel.git/shortlog/refs/heads/scsi.v6
> 
> Pushed an update, same branch.
> 
>> What needs to be done?
>>
>> (1) Better patch descriptions.
>> (2) Submit patches to the list for review.
> 
> Still to do.  Also:
> 
>  - fold incremental fixes into the original patches.  Will happen next
>    time I rebase the patches.
>  - SCSI_XFER_* will lose the META/DATA separation; it will
>    only be "none / from device / to device".  Also planned for the next
>    rebase.
> 
Thanks. I'm rediffing my megasas driver.

I still don't quite agree with the new scsi callback, which has
just the SCSIRequest as an argument. So when supporting command
queueing we have no idea which (internal) request structure
the SCSIRequest belongs to, and still have to do a lookup,
which is painful. Can we have a second 'void *arg' argument
to the callback which will allow us to pass some driver-specific
pointer?

>> (3) Switch over lsi to the new interface.
>> (4) Zap old interface, killing tons of dead code.
>> (5) Final cleanups.
> 
> All done now.  So the final code tree is almost where I want to have it,
> although the patch series needs some more work to be reviewable.
> 
I have two more patches here: one to support more than 8 SCSI devices,
and another updating the scsi-generic infrastructure to latch
onto any device supporting SG_IO.
I will be posting them soon.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		      zSeries & Storage
hare@suse.de			      +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Markus Rex, HRB 16746 (AG Nürnberg)

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [Qemu-devel] Re: [sneak preview] major scsi overhaul
  2009-11-11 11:52   ` Hannes Reinecke
@ 2009-11-11 13:02     ` Gerd Hoffmann
  2009-11-11 13:30       ` Hannes Reinecke
  0 siblings, 1 reply; 43+ messages in thread
From: Gerd Hoffmann @ 2009-11-11 13:02 UTC (permalink / raw)
  To: Hannes Reinecke; +Cc: qemu-devel

On 11/11/09 12:52, Hannes Reinecke wrote:
> I still don't quite agree with the new scsi callback, which has
> just the SCSIRequest as an argument. So when supporting more command
> queueing we have no idea to which (internal) request structure
> the SCSIRequest belongs, and still have to do a lookup.
> Which is painful. Can we have a second 'void *arg' argument
> to the callback which will allow us some driver specific
> pointer?

How about sticking a 'void *hba_private' element into SCSIRequest instead?
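
For illustration, roughly how an HBA could use such a field (the HBA-side
names are made up):

    typedef struct MyHBACommand MyHBACommand;  /* the HBA's per-command state */

    /* At submission time the HBA stashes its own request structure ... */
    static void my_hba_submit(SCSIRequest *req, MyHBACommand *cmd)
    {
        req->hba_private = cmd;
        /* ... then scsi_req_buf()/scsi_req_sgl() as usual ... */
    }

    /* ... and gets it back in the completion callback without a lookup. */
    static void my_hba_complete(SCSIRequest *req)
    {
        MyHBACommand *cmd = req->hba_private;
        /* finish the command using cmd, raise the interrupt, etc. */
    }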

cheers,
   Gerd

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [Qemu-devel] Re: [sneak preview] major scsi overhaul
  2009-11-11 13:02     ` Gerd Hoffmann
@ 2009-11-11 13:30       ` Hannes Reinecke
  2009-11-11 14:37         ` Gerd Hoffmann
  0 siblings, 1 reply; 43+ messages in thread
From: Hannes Reinecke @ 2009-11-11 13:30 UTC (permalink / raw)
  To: Gerd Hoffmann; +Cc: qemu-devel

Gerd Hoffmann wrote:
> On 11/11/09 12:52, Hannes Reinecke wrote:
>> I still don't quite agree with the new scsi callback, which has
>> just the SCSIRequest as an argument. So when supporting more command
>> queueing we have no idea to which (internal) request structure
>> the SCSIRequest belongs, and still have to do a lookup.
>> Which is painful. Can we have a second 'void *arg' argument
>> to the callback which will allow us some driver specific
>> pointer?
> 
> How about sticking a 'void *hba_private' element into SCSIRequest instead?
> 
Would work for me, too.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		      zSeries & Storage
hare@suse.de			      +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Markus Rex, HRB 16746 (AG Nürnberg)

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [Qemu-devel] [sneak preview] major scsi overhaul
  2009-11-11  9:41   ` Gerd Hoffmann
@ 2009-11-11 14:13     ` Paul Brook
  2009-11-11 15:26       ` Gerd Hoffmann
  0 siblings, 1 reply; 43+ messages in thread
From: Paul Brook @ 2009-11-11 14:13 UTC (permalink / raw)
  To: Gerd Hoffmann; +Cc: qemu-devel

> The current qemu code *does* cache the response.  scsi-disk caps the
> buffer at 128k (which is big enough for any request I've seen in my
> testing).  scsi-generic has no cap.

That cap is important.
For scsi-generic you probably don't have a choice because of the way the 
kernel interface works.
 
> With the new interface the HBA has to handle the caching if needed.  But
> the HBA also has the option to pass scatter lists, in which case qemu
> doesn't do any caching; the data is transferred directly from/to guest 
> memory.  Which is clearly an improvement IMO.

> > Remember that a single
> > transfer can be very large (greater than available ram on the host). Even
> > with a SG capable HBA, you can not assume all the data will be
> > transferred in a single request.
> 
> scsi-generic: It must be a single request anyway, and it already is today.
> 
> scsi-disk: dma_bdrv_{read,write} will split it into smaller chunks if
> needed.

You seem to be assuming the HBA knows where it's going to put the data before 
it issues the command. This is not true (see below).

> > You should also consider how this interacts with command queueing.
> > IIUC an Initiator (HBA) typically sends a tagged command, then
> > disconnects from the target (disk). At some later time the target
> > reconnects, and the initiator starts the DMA transfer. By my reading your
> > code does not issue any IO requests until after the HBA starts
> > transferring data.
> 
> Hmm?  What code are you looking at?
> 
> For esp and usb-storage, reads and writes are handled slightly differently.
> They roughly work like this:

Neither ESP nor usb-storage implements command queueing, so they aren't interesting.

> lsi (only one in-tree with TCQ support) works like this:
> 
> - allocate + parse scsi command.        scsi_req_get+scsi_req_parse
> - continue script processing, collect
>    DMA addresses and stick them into
>    a scatter list until it is complete.
> - queue command and disconnect.
> - submit I/O to the qemu block layer    scsi_req_sgl
> 
> *can process more scsi commands here*
>
> - when I/O is finished reselect tag
> - return status, release request.       scsi_req_put

I'm pretty sure this is wrong, and what actually happens is:

1) Wait for device to reconnect (goto 5), or commands from host (goto 2).

2) SCRIPTS connect to device, and send command.
3) If device has data immediately (metadata command) then goto 6
4) Device disconnects. goto 1

5) Device has data ready, and reconnects
6) SCRIPTS locate the next DMA block for this command, and initiate a (linear) 
DMA transfer.
7) DATA is transferred. Note that DMA stalls the SCRIPTS processor until the 
transfer completes.
8) If the device still has data then goto 6.
9) If the device runs out of data before the command completes then goto 3.
10) Command complete. goto 1

Note that the I/O command is parsed at stage 2, but the data transfer is not 
requested until stage 6, i.e. after the command has partially completed. This 
window between issue and data transfer is where other commands are issued.

The only way to make your API work is to skip straight from step 3 to step 6, 
which effectively loses the command queueing capability. It may be that it's 
hard/impossible to get both command queueing and zero-copy. In that case I say 
command queueing wins.

Also note that use of self-modifying SCRIPTS is common.

Paul

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [Qemu-devel] Re: [sneak preview] major scsi overhaul
  2009-11-11 13:30       ` Hannes Reinecke
@ 2009-11-11 14:37         ` Gerd Hoffmann
  2009-11-12  9:54           ` Hannes Reinecke
  0 siblings, 1 reply; 43+ messages in thread
From: Gerd Hoffmann @ 2009-11-11 14:37 UTC (permalink / raw)
  To: Hannes Reinecke; +Cc: qemu-devel

On 11/11/09 14:30, Hannes Reinecke wrote:
> Gerd Hoffmann wrote:
>> How about sticking a 'void *hba_private' element into SCSIRequest instead?
>>
> Would work for me, too.

Pushed (scsi.v7 now).

cheers,
   Gerd

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [Qemu-devel] [sneak preview] major scsi overhaul
  2009-11-11 14:13     ` Paul Brook
@ 2009-11-11 15:26       ` Gerd Hoffmann
  2009-11-11 16:38         ` Paul Brook
  0 siblings, 1 reply; 43+ messages in thread
From: Gerd Hoffmann @ 2009-11-11 15:26 UTC (permalink / raw)
  To: Paul Brook; +Cc: qemu-devel

On 11/11/09 15:13, Paul Brook wrote:
>> The current qemu code *does* cache the response.  scsi-disk caps the
>> buffer at 128k (which is big enough for any request I've seen in my
>> testing).  scsi-generic has no cap.
>
> That cap is important.
> For scsi-generic you probably don't have a choice because of the way the
> kernel interface works.

Exactly.  And why is the cap important for scsi-disk if scsi-generic 
does fine without?

>> scsi-disk: dma_bdrv_{read,write} will split it into smaller chunks if
>> needed.
>
> You seem to be assuming the HBA knows where it's going to put the data before
> it issues the command.  This is not true (see below).

You are talking about a real HBA I guess?

>> lsi (only one in-tree with TCQ support) works like this:
>>
>> - allocate + parse scsi command.        scsi_req_get+scsi_req_parse
>> - continue script processing, collect
>>     DMA addresses and stick them into
>>     a scatter list until it is complete.
>> - queue command and disconnect.
>> - submit I/O to the qemu block layer    scsi_req_sgl
>>
>> *can process more scsi commands here*
>>
>> - when I/O is finished reselect tag
>> - return status, release request.       scsi_req_put
>
> I'm pretty sure this is wrong,

This describes what the lsi emulation does with my patches applied.

> and what actually happens is:
>
> 1) Wait for device to reconnect (goto 5), or commands from host (goto 2).
>
> 2) SCRIPTS connect to device, and send command.
> 3) If device has data immediately (metadata command) then goto 6
> 4) Device disconnects. goto 1
>
> 5) Device has data ready, and reconnects
> 6) SCRIPTS locate the next DMA block for this command, and initiate a (linear)
> DMA transfer.
> 7) DATA is transferred. Note that DMA stalls the SCRIPTS processor until the
> transfer completes.
> 8) If the device still has data then goto 6.
> 9) If the device runs out of data before the command completes then goto 3.
> 10) Command complete. goto 1

This is what a real HBA does.

> Note that the IO command is parsed at stage 2, but the data transfer is not
> requested until stage 6. i.e. after the command has partially completed. This
> window between issue and data transfer is where other commands are issued.

Or when the device disconnects again in the middle of a transfer.

> The only way to make your API work is to skip straight from step 3 to step 6,
> which effectively loses the command queueing capability.

It doesn't.  The disconnect and thus the opportunity to submit more 
commands while the device is busy doing the actual I/O is there.

> It may be that it's
> hard/impossible to get both command queueing and zero-copy.

I have it working.

cheers,
   Gerd

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [Qemu-devel] [sneak preview] major scsi overhaul
  2009-11-11 15:26       ` Gerd Hoffmann
@ 2009-11-11 16:38         ` Paul Brook
  2009-11-16 16:35           ` Gerd Hoffmann
  0 siblings, 1 reply; 43+ messages in thread
From: Paul Brook @ 2009-11-11 16:38 UTC (permalink / raw)
  To: qemu-devel; +Cc: Gerd Hoffmann

>> That cap is important.
>> For scsi-generic you probably don't have a choice because of the way the
>> kernel interface works.
>
>Exactly.  And why is the cap important for scsi-disk if scsi-generic
>does fine without?

With scsi-generic you're at the mercy of what the kernel API gives you, and if 
the guest hardware/OS isn't cooperative then you lose. With scsi-disk we can 
do significantly better.

> > The only way to make your API work is to skip straight from step 3 to
> > step 6, which effectively loses the command queueing capability.
> 
> It doesn't.  The disconnect and thus the opportunity to submit more
> commands while the device is busy doing the actual I/O is there.

Disconnecting on the first DMA request (after switching to a data phase and 
transferring zero bytes) is bizarre behavior, but probably allowable.

However by my reading DMA transfers must be performed synchronously by the 
SCRIPTS engine, so you need to do a lot of extra checking to prove that you 
can safely continue execution without actually performing the transfer.

> > It may be that it's
> > hard/impossible to get both command queueing and zero-copy.
> 
> I have it working.

More likely you have a nasty hack that happens to work with the Linux drivers. 
IIUC you're pretending that the DMA completed and eventually disconnecting the 
device, assuming that nothing will read that data until the command complete 
notification is received.

Consider the case where the guest transfers the data from a single command in 
two blocks, and has the HBA raise an interrupt in-between so that it can start 
processing before the command completes. This processing could even be done by 
the SCRIPTS engine itself, or a guest could even reuse the buffer for the 
second DMA block.

Paul

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [Qemu-devel] Re: [sneak preview] major scsi overhaul
  2009-11-11 14:37         ` Gerd Hoffmann
@ 2009-11-12  9:54           ` Hannes Reinecke
  0 siblings, 0 replies; 43+ messages in thread
From: Hannes Reinecke @ 2009-11-12  9:54 UTC (permalink / raw)
  To: Gerd Hoffmann; +Cc: qemu-devel

Gerd Hoffmann wrote:
> On 11/11/09 14:30, Hannes Reinecke wrote:
>> Gerd Hoffmann wrote:
>>> How about sticking a 'void *hba_private' element into SCSIRequest
>>> instead?
>>>
>> Would work for me, too.
> 
> Pushed (scsi.v7 now).
> 
Okay, I've converted the driver.
Works so far.

One minor nitpick, though:

diff --git a/dma-helpers.c b/dma-helpers.c
index 712ed89..0c57648 100644
--- a/dma-helpers.c
+++ b/dma-helpers.c
@@ -12,7 +12,10 @@
 
 void qemu_sglist_init(QEMUSGList *qsg, int alloc_hint)
 {
-    qsg->sg = qemu_malloc(alloc_hint * sizeof(ScatterGatherEntry));
+    if (alloc_hint > 0)
+       qsg->sg = qemu_malloc(alloc_hint * sizeof(ScatterGatherEntry));
+    else
+       qsg->sg = NULL;
     qsg->nsg = 0;
     qsg->nalloc = alloc_hint;
     qsg->size = 0;

For some commands (eg TUR) alloc_hint would be '0' here,
causing qemu to barf.

Can you include this patch, too?

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		      zSeries & Storage
hare@suse.de			      +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Markus Rex, HRB 16746 (AG Nürnberg)

^ permalink raw reply related	[flat|nested] 43+ messages in thread

* Re: [Qemu-devel] [sneak preview] major scsi overhaul
  2009-11-11 16:38         ` Paul Brook
@ 2009-11-16 16:35           ` Gerd Hoffmann
  2009-11-16 18:53             ` Paul Brook
  2009-11-16 19:08             ` Ryan Harper
  0 siblings, 2 replies; 43+ messages in thread
From: Gerd Hoffmann @ 2009-11-16 16:35 UTC (permalink / raw)
  To: Paul Brook; +Cc: qemu-devel

On 11/11/09 17:38, Paul Brook wrote:
>>> That cap is important.
>>> For scsi-generic you probably don't have a choice because of the way the
>>> kernel interface works.
>>
>> Exactly.  And why is the cap important for scsi-disk if scsi-generic
>> does fine without?
>
> With scsi-generic you're at the mercy of what the kernel API gives you, and if
> the guest hardware/OS isn't cooperative then you lose.

The guest will lose with unreasonably large requests.  qemu_malloc() -> 
oom() -> abort() -> guest is gone.

We can also limit the amount of host memory we allow the guest to 
consume, so uncooperative guests can't push the host into swap.  This is 
not implemented today, indicating that it hasn't been a problem so far. 
And with zerocopy it will be even less of a problem as we don't need host 
memory to buffer the data ...

>> It doesn't.  The disconnect and thus the opportunity to submit more
>> commands while the device is busy doing the actual I/O is there.
>
> Disconnecting on the first DMA request (after switching to a data phase and
> transferring zero bytes) is bizarre behavior, but probably allowable.

The new lsi code doesn't.  The old code could do that under certain 
circumstances.  And what is bizarre about that?  A real hard drive will 
most likely do exactly that on reads (unless it has the data cached and 
can start the transfer instantly).

> However by my reading DMA transfers must be performed synchronously by the
> SCRIPTS engine, so you need to do a lot of extra checking to prove that you
> can safely continue execution without actually performing the transfer.

I'll happily add a 'strict' mode which does data transfers synchronously 
in case any compatibility issues show up.

Such a mode would be slower of course.  We'll have to either do the I/O 
in lots of little chunks or lose zerocopy.  Large transfers + memcpy is 
probably the faster option.

>>> It may be that it's
>>> hard/impossible to get both command queueing and zero-copy.
>>
>> I have it working.
>
> More likely you have a nasty hack that happens to work with the Linux drivers.

Linux works.  Windows XP works.  FreeBSD works.  More regression testing 
is planned.

Suggestions for other guests which might be sensitive to changes like 
this?  Maybe even some which don't work with the current lsi emulation?

> IIUC you're pretending that the DMA completed and eventually disconnecting the
> device, assuming that nothing will read that data until the command complete
> notification is received.

Yes.

> Consider the case where the guest transfers the data from a single command in
> two blocks, and has the HBA raise an interrupt in-between so that it can start
> processing before the command completes. This processing could even be done by
> the SCRIPTS engine itself, or a guest could even reuse the buffer for the
> second DMA block.

In theory you can do all sorts of crazy stuff with the lsi scripts 
engine, like implementing an ext2 filesystem driver.

In practice I expect guests will use the scripts engine only for stuff 
which is also supported by other scsi controllers.

cheers,
   Gerd

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [Qemu-devel] [sneak preview] major scsi overhaul
  2009-11-16 16:35           ` Gerd Hoffmann
@ 2009-11-16 18:53             ` Paul Brook
  2009-11-16 21:50               ` Gerd Hoffmann
  2009-11-24 11:59               ` Gerd Hoffmann
  2009-11-16 19:08             ` Ryan Harper
  1 sibling, 2 replies; 43+ messages in thread
From: Paul Brook @ 2009-11-16 18:53 UTC (permalink / raw)
  To: Gerd Hoffmann; +Cc: qemu-devel

On Monday 16 November 2009, Gerd Hoffmann wrote:
> On 11/11/09 17:38, Paul Brook wrote:
> >>> That cap is important.
> >>> For scsi-generic you probably don't have a choice because of the way
> >>> the kernel interface works.
> >>
> >> Exactly.  And why is the cap important for scsi-disk if scsi-generic
> >> does fine without?
> >
> > With scsi-generic you're at the mercy of what the kernel API gives you,
> > and if the guest hardware/OS isn't cooperative then you loose.
> 
> The guest will lose with unreasonably large requests.  qemu_malloc() ->
> oom() -> abort() -> guest is gone.

Exactly. This lossage is not acceptable.

> We can also limit the amount of host memory we allow the guest to
> consume, so uncooperative guests can't push the host into swap.  This is
> not implemented today, indicating that it hasn't been a problem so far.

Capping the amount of memory required for a transfer *is* implemented, in both 
LSI and virtio-blk.  The exception being SCSI passthrough where the kernel API 
makes it impossible.

>   And with zerocopy it will be even less of a problem as we don't need host
> memory to buffer the data ...

zero-copy isn't possible in many cases. You must handle the other cases 
gracefully.

> >> It doesn't.  The disconnect and thus the opportunity to submit more
> >> commands while the device is busy doing the actual I/O is there.
> >
> > Disconnecting on the first DMA request (after switching to a data phase
> > and transferring zero bytes) is bizarre behavior, but probably allowable.
> 
> The new lsi code doesn't.  The old code could do that under certain
> circumstances.  And what is bizarre about that?  A real hard drive will
> most likely do exactly that on reads (unless it has the data cached and
> can start the transfer instantly).

No. The old code goes directly from the command phase to the message 
(disconnect) phase.

> > However by my reading DMA transfers must be performed synchronously by
> > the SCRIPTS engine, so you need to do a lot of extra checking to prove
> > that you can safely continue execution without actually performing the
> > transfer.
> 
> I'll happily add a 'strict' mode which does data transfers synchronously
> in case any compatibility issues show up.
> 
> Such a mode would be slower of course.  We'll have to either do the I/O
> in lots of little chunks or lose zerocopy.  Large transfers + memcpy is
> probably the faster option.

But as you agreed above, large transfers+memcpy is not a realistic option 
because it can have excessive memory requirements.

Paul

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [Qemu-devel] [sneak preview] major scsi overhaul
  2009-11-16 16:35           ` Gerd Hoffmann
  2009-11-16 18:53             ` Paul Brook
@ 2009-11-16 19:08             ` Ryan Harper
  2009-11-16 20:40               ` Gerd Hoffmann
  1 sibling, 1 reply; 43+ messages in thread
From: Ryan Harper @ 2009-11-16 19:08 UTC (permalink / raw)
  To: Gerd Hoffmann; +Cc: Paul Brook, qemu-devel

* Gerd Hoffmann <kraxel@redhat.com> [2009-11-16 12:38]:
> On 11/11/09 17:38, Paul Brook wrote:

> >>>It may be that it's
> >>>hard/impossible to get both command queueing and zero-copy.
> >>
> >>I have it working.
> >
> >More likely you have a nasty hack that happens to work with the Linux 
> >drivers.
> 
> Linux works.  Windows XP works.  FreeBSD works.  More regression testing 
> is planned.
> 
> Suggestions for other guests which might be sensitive to changes like 
> this?  Maybe even some which don't work with the current lsi emulation?

XP 64-bit, 2k3 32/64, 2k8 32/64

64-bit is important because the DMA mode of the lsi driver is different
with the 64-bit versions of the win drivers.

-- 
Ryan Harper
Software Engineer; Linux Technology Center
IBM Corp., Austin, Tx
ryanh@us.ibm.com

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [Qemu-devel] [sneak preview] major scsi overhaul
  2009-11-16 19:08             ` Ryan Harper
@ 2009-11-16 20:40               ` Gerd Hoffmann
  2009-11-16 21:45                 ` Ryan Harper
  0 siblings, 1 reply; 43+ messages in thread
From: Gerd Hoffmann @ 2009-11-16 20:40 UTC (permalink / raw)
  To: Ryan Harper; +Cc: Paul Brook, qemu-devel

   Hi,

>> Suggestions for other guests which might be sensitive to changes like
>> this?  Maybe even some which don't work with the current lsi emulation?
>
> XP 64-bit, 2k3 32/64, 2k8 32/64
>
> 64-bit is important because the DMA mode of the lsi driver is different
> with the 64-bit versions of the win drivers.

Are these in the "needs regression testing" or in the "known broken" group?

thanks,
   Gerd

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [Qemu-devel] [sneak preview] major scsi overhaul
  2009-11-16 20:40               ` Gerd Hoffmann
@ 2009-11-16 21:45                 ` Ryan Harper
  0 siblings, 0 replies; 43+ messages in thread
From: Ryan Harper @ 2009-11-16 21:45 UTC (permalink / raw)
  To: Gerd Hoffmann; +Cc: Ryan Harper, Paul Brook, qemu-devel

* Gerd Hoffmann <kraxel@redhat.com> [2009-11-16 14:42]:
>   Hi,
> 
> >>Suggestions for other guests which might be sensitive to changes like
> >>this?  Maybe even some which don't work with the current lsi emulation?
> >
> >XP 64-bit, 2k3 32/64, 2k8 32/64
> >
> >64-bit is important because the DMA mode of the lsi driver is different
> >with the 64-bit versions of the win drivers.
> 
> Are these in the "needs regression testing" or in the "known broken" group?

Regression:
xp-64
2k3-32/64

Known Broken:
2k8-32/64

-- 
Ryan Harper
Software Engineer; Linux Technology Center
IBM Corp., Austin, Tx
ryanh@us.ibm.com

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [Qemu-devel] [sneak preview] major scsi overhaul
  2009-11-16 18:53             ` Paul Brook
@ 2009-11-16 21:50               ` Gerd Hoffmann
  2009-11-24 11:59               ` Gerd Hoffmann
  1 sibling, 0 replies; 43+ messages in thread
From: Gerd Hoffmann @ 2009-11-16 21:50 UTC (permalink / raw)
  To: Paul Brook; +Cc: qemu-devel

On 11/16/09 19:53, Paul Brook wrote:
>> We can also limit the amout of host memory we allow the guest to
>> consume, so uncooperative guests can't push the host into swap.  This is
>> not implemented today, indicating that it hasn't been a problem so far.
>
> Capping the amount of memory required for a transfer *is* implemented, in both
> LSI and virtio-blk.  The exception being SCSI passthrough where the kernel API
> makes it impossible.

I was talking about scsi-generic.  There is no option to reject 
excessively large requests, and it has not been a problem so far.

>>    And with zerocopy it will be even less a problem as we don't need host
>> memory to buffer the data ...
>
> zero-copy isn't possible in many cases. You must handle the other cases
> gracefully.

I haven't yet found a guest OS where lsi can't do zerocopy.
Name one where it doesn't work and I'll have a look.

>>> Disconnecting on the first DMA request (after switching to a data phase
>>> and transferring zero bytes) is bizarre behavior, but probably allowable.
>>
>> The new lsi code doesn't.  The old code could do that under certain
>> circumstances.  And what is bizarre about that?  A real hard drive will
>> most likely do exactly that on reads (unless it has the data cached and
>> can start the transfer instantly).
>
> No. The old code goes directly from the command phase to the message
> (disconnect) phase.

Hmm, well.  It switches from DI / DO to MI before the guest runs again 
so the guest will not notice the switch ...

>>> However by my reading DMA transfers must be performed synchronously by
>>> the SCRIPTS engine, so you need to do a lot of extra checking to prove
>>> that you can safely continue execution without actually performing the
>>> transfer.
>>
>> I'll happily add a 'strict' mode which does data transfers synchronously
>> in case any compatibility issues show up.
>>
>> Such a mode would be slower of course.  We'll have to either do the I/O
>> in lots of little chunks or loose zerocopy.  Large transfers + memcpy is
>> probably the faster option.
>
> But as you agreed above, large transfers+memcpy is not a realistic option
> because it can have excessive memory requirements.

This "large" refers to normal request sizes (which are large compared to 
page-sized scatter list entries).  Having a 64k request submitted as a 
single I/O plus a memcpy is most likely faster than submitting 16 I/O 
requests of 4k each, one after another.  Buffering would be no problem 
here.

But I still don't expect problems with zerocopy.  And zerocopy 
has no noticeable host memory requirements even for excessively large requests.

cheers,
   Gerd

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [Qemu-devel] [sneak preview] major scsi overhaul
  2009-11-16 18:53             ` Paul Brook
  2009-11-16 21:50               ` Gerd Hoffmann
@ 2009-11-24 11:59               ` Gerd Hoffmann
  2009-11-24 13:51                 ` Paul Brook
  1 sibling, 1 reply; 43+ messages in thread
From: Gerd Hoffmann @ 2009-11-24 11:59 UTC (permalink / raw)
  To: Paul Brook; +Cc: qemu-devel

On 11/16/09 19:53, Paul Brook wrote:
> Capping the amount of memory required for a transfer *is* implemented, in both
> LSI and virtio-blk.  The exception being SCSI passthrough where the kernel API
> makes it impossible.

Well.  Figured while doing more testing:  The allowed request size is 
limited by the kernel, so scsi-generic requests larger than (currently) 
128k fail.

Now, how to handle *that*?  Is there some way to signal to the guest 
that the request was too big?

At least for known commands such as READ+WRITE, which are likely to be 
big, we could split the request internally into two (or more) if needed.
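
A sketch of that idea (not from the tree; the helper below is invented):

    #include <stdint.h>

    /* Split one big READ/WRITE into chunks the kernel will accept. */
    static void submit_split_rw(uint64_t lba, uint32_t nb_blocks,
                                uint32_t max_blocks)
    {
        while (nb_blocks > 0) {
            uint32_t chunk = nb_blocks < max_blocks ? nb_blocks : max_blocks;
            /* build a READ(10)/WRITE(10) CDB for (lba, chunk) and submit
               it as its own SG_IO request */
            lba       += chunk;
            nb_blocks -= chunk;
        }
    }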

cheers,
   Gerd

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [Qemu-devel] [sneak preview] major scsi overhaul
  2009-11-24 11:59               ` Gerd Hoffmann
@ 2009-11-24 13:51                 ` Paul Brook
  2009-11-25 16:37                   ` Gerd Hoffmann
  0 siblings, 1 reply; 43+ messages in thread
From: Paul Brook @ 2009-11-24 13:51 UTC (permalink / raw)
  To: qemu-devel; +Cc: Gerd Hoffmann

On Tuesday 24 November 2009, Gerd Hoffmann wrote:
> On 11/16/09 19:53, Paul Brook wrote:
> > Capping the amount of memory required for a transfer *is* implemented, in
> > both LSI and virtio-blk.  The exception being SCSI passthrough where the
> > kernel API makes it impossible.
> 
> Well.  Figured while doing more testing:  The allowed request size is
> limited by the kernel, so scsi-generic requests larger than (currently)
> 128k fail.
> 
> Now, how to handle *that*?  Is there some way to signal to the guest
> that the request was too big?

Same as real hardware.  You probably also want to populate the Block Limits VPD 
page appropriately.
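
For instance, a sketch of advertising such a cap (offsets per the SBC Block
Limits page, 0xB0; the helper name is invented, and MAXIMUM TRANSFER LENGTH
is given in blocks, so a 128k kernel cap with 512-byte blocks becomes 256):

    #include <stdint.h>
    #include <string.h>

    static int scsi_fill_block_limits(uint8_t *outbuf, uint32_t max_bytes,
                                      uint32_t block_size)
    {
        uint32_t max_blocks = max_bytes / block_size;

        memset(outbuf, 0, 64);
        outbuf[1]  = 0xb0;                /* page code: Block Limits     */
        outbuf[3]  = 60;                  /* page length                 */
        outbuf[8]  = max_blocks >> 24;    /* MAXIMUM TRANSFER LENGTH ... */
        outbuf[9]  = max_blocks >> 16;
        outbuf[10] = max_blocks >> 8;
        outbuf[11] = max_blocks;          /* ... in logical blocks       */
        return 64;
    }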

Paul

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [Qemu-devel] [sneak preview] major scsi overhaul
  2009-11-24 13:51                 ` Paul Brook
@ 2009-11-25 16:37                   ` Gerd Hoffmann
  2009-11-26  7:31                     ` Hannes Reinecke
  0 siblings, 1 reply; 43+ messages in thread
From: Gerd Hoffmann @ 2009-11-25 16:37 UTC (permalink / raw)
  To: Paul Brook; +Cc: qemu-devel

On 11/24/09 14:51, Paul Brook wrote:
> On Tuesday 24 November 2009, Gerd Hoffmann wrote:
>> On 11/16/09 19:53, Paul Brook wrote:
>>> Capping the amount of memory required for a transfer *is* implemented, in
>>> both LSI and virtio-blk.  The exception being SCSI passthrough where the
>>> kernel API makes it impossible.
>>
>> Well.  Figured while doing more testing:  The allowed request size is
>> limited by the kernel, so scsi-generic requests larger than (currently)
>> 128k fail.
>>
>> Now, how to handle *that*?  Is there some way to signal to the guest
>> that the request was too big?
>
> Same as real hardware.  Probably also want to populate the Block Limits VPD
> page appropriately

Some experiments later.

Linux reads the block limits vpd, but seems to ignore the hard limit.

Answering large requests with "Illegal request, Invalid field in CDB" 
doesn't make Linux try smaller requests; instead it reports I/O errors 
to the syslog.

Hmm.

netbsd and winxp don't try large requests in the first place (they do 
64k max).  Other guests not tried yet.

cheers,
   Gerd

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [Qemu-devel] [sneak preview] major scsi overhaul
  2009-11-25 16:37                   ` Gerd Hoffmann
@ 2009-11-26  7:31                     ` Hannes Reinecke
  2009-11-26  8:25                       ` Gerd Hoffmann
  0 siblings, 1 reply; 43+ messages in thread
From: Hannes Reinecke @ 2009-11-26  7:31 UTC (permalink / raw)
  To: Gerd Hoffmann; +Cc: Paul Brook, qemu-devel

Gerd Hoffmann wrote:
> On 11/24/09 14:51, Paul Brook wrote:
>> On Tuesday 24 November 2009, Gerd Hoffmann wrote:
>>> On 11/16/09 19:53, Paul Brook wrote:
>>>> Capping the amount of memory required for a transfer *is*
>>>> implemented, in
>>>> both LSI and virtio-blk.  The exception being SCSI passthrough where
>>>> the
>>>> kernel API makes it impossible.
>>>
>>> Well.  Figured while doing more testing:  The allowed request size is
>>> limited by the kernel, so scsi-generic requests larger than (currently)
>>> 128k fail.
>>>
>>> Now, how to handle *that*?  Is there some way to signal to the guest
>>> that the request was too big?
>>
>> Same as real hardware.  Probably also want to populate the Block
>> Limits VPD
>> page appropriately
> 
> Some experiments later.
> 
> Linux reads the block limits vpd, but seems to ignore the hard limit.
> 
Ah. Maybe we should fix that up then ...

> Answering large requests with "Illegal request, Invalid field in CDB"
> doesn't make Linux try smaller requests; instead it reports I/O errors
> to the syslog.
> 
> Hmm.
>
Can't we just put residuals to good use here?
I.e. finish up the request up to the size we can handle, and return the
original request with the transfer size set correctly.
Then the SCSI stack of the guest should retry the leftovers, which
we should then be able to handle. Or redo from the start until we are
able to.
Should be straightforward to implement, one would assume.
And we could probably encapsulate it entirely within the bdrv,
as we don't actually need to expose those limits when the block
driver layer is handling it correctly.
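
Schematically (the field names below are invented for illustration):

    /* Cap the transfer and report the rest as residual, so the guest's
       SCSI stack can retry the remainder.                               */
    if (req->xfer_len > host_max_xfer) {
        req->residual = req->xfer_len - host_max_xfer;
        req->xfer_len = host_max_xfer;
    }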

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		      zSeries & Storage
hare@suse.de			      +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Markus Rex, HRB 16746 (AG Nürnberg)

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [Qemu-devel] [sneak preview] major scsi overhaul
  2009-11-26  7:31                     ` Hannes Reinecke
@ 2009-11-26  8:25                       ` Gerd Hoffmann
  2009-11-26 10:57                         ` Hannes Reinecke
  0 siblings, 1 reply; 43+ messages in thread
From: Gerd Hoffmann @ 2009-11-26  8:25 UTC (permalink / raw)
  To: Hannes Reinecke; +Cc: Paul Brook, qemu-devel

   Hi,

>> Answering large requests with "Illegal request, Invalid field in CDB"
>> doesn't make Linux try smaller requests; instead it reports I/O errors
>> to the syslog.
>>
>> Hmm.
>>
> Can't we just put residuals to good use here?
> Ie finish up the request up to the size we can handle, and return the
> original request with the transfer size set correctly.

I'll try.

> Should be straightforward to implement, one would assume.

The infrastructure and the HBAs have to handle that already.  It 
frequently happens with MODE SENSE for example (guest passing a 256 byte 
buffer and we have less data to fill in).  So it should be easy.

> And we could probably encapsulate it entirely within the bdrv
> as we don't actually need to expose those limits when the block
> driver layer is handling it correctly.

Hmm, I don't think so.  SG_IO just returns EINVAL.

cheers,
   Gerd

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [Qemu-devel] [sneak preview] major scsi overhaul
  2009-11-26  8:25                       ` Gerd Hoffmann
@ 2009-11-26 10:57                         ` Hannes Reinecke
  2009-11-26 11:04                           ` Gerd Hoffmann
  0 siblings, 1 reply; 43+ messages in thread
From: Hannes Reinecke @ 2009-11-26 10:57 UTC (permalink / raw)
  To: Gerd Hoffmann; +Cc: Paul Brook, qemu-devel

Gerd Hoffmann wrote:
>   Hi,
> 
>>> Answering large requests with "Illegal request, Invalid field in CDB"
>>> doesn't make linux try smaller requests; instead it reports I/O errors
>>> to the syslog.
>>>
>>> Hmm.
>>>
>> Can't we just put residuals to good use here?
>> I.e. finish up the request up to the size we can handle, and return the
>> original request with the transfer size set correctly.
> 
> I'll try.
> 
>> Should be straightforward to implement, one would assume.
> 
> The infrastructure and the HBAs have to handle that already.  It
> frequently happens with MODE SENSE for example (guest passing a 256 byte
> buffer and we have less data to fill in).  So it should be easy.
> 
>> And we could probably encapsulate it entirely within the bdrv
>> as we don't actually need to expose those limits when the block
>> driver layer is handling it correctly.
> 
> Hmm, I don't think so.  SG_IO just returns EINVAL.
> 
Yes, but we can hook into scsi_generic_map() to cap the resulting
iovec to the limits of the underlying block device.
Then we submit an iovec which matches the capabilities of the
underlying device.
Of course we should take care to update the resulting xferlen
values to match the actually submitted data, but that should
be easily done.

Then the guest would see a partial request and retry the
remainder of the request, which (possibly after some iterations)
would result in all data transferred, albeit at a lower speed.
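
A rough sketch of the capping step; cap_iovec() is a made-up helper,
not the actual scsi_generic_map() interface:

#include <stddef.h>
#include <sys/uio.h>

/* Trim an iovec so the total transfer does not exceed max_bytes.
 * Returns the number of entries still in use; *capped receives the
 * byte count that will actually be submitted. */
static int cap_iovec(struct iovec *iov, int iovcnt, size_t max_bytes,
                     size_t *capped)
{
    size_t total = 0;
    int i;

    for (i = 0; i < iovcnt && total < max_bytes; i++) {
        if (iov[i].iov_len > max_bytes - total)
            iov[i].iov_len = max_bytes - total;   /* shorten the last entry */
        total += iov[i].iov_len;
    }
    *capped = total;
    return i;
}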

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		      zSeries & Storage
hare@suse.de			      +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Markus Rex, HRB 16746 (AG Nürnberg)

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [Qemu-devel] [sneak preview] major scsi overhaul
  2009-11-26 10:57                         ` Hannes Reinecke
@ 2009-11-26 11:04                           ` Gerd Hoffmann
  2009-11-26 11:20                             ` Hannes Reinecke
  0 siblings, 1 reply; 43+ messages in thread
From: Gerd Hoffmann @ 2009-11-26 11:04 UTC (permalink / raw)
  To: Hannes Reinecke; +Cc: Paul Brook, qemu-devel

On 11/26/09 11:57, Hannes Reinecke wrote:
> Then the guest would see a partial request and retry the
> remainder of the request, which (possibly after some iterations)
> would result in all data transferred, albeit at a lower speed.

Except they don't.

/me looks at drivers/scsi/sd.c:sd_done()

I can't see any sane way to tell linux that the request was too big.

cheers,
   Gerd

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [Qemu-devel] [sneak preview] major scsi overhaul
  2009-11-26 11:04                           ` Gerd Hoffmann
@ 2009-11-26 11:20                             ` Hannes Reinecke
  2009-11-26 14:21                               ` Gerd Hoffmann
  0 siblings, 1 reply; 43+ messages in thread
From: Hannes Reinecke @ 2009-11-26 11:20 UTC (permalink / raw)
  To: Gerd Hoffmann; +Cc: Paul Brook, qemu-devel

Gerd Hoffmann wrote:
> On 11/26/09 11:57, Hannes Reinecke wrote:
>> Then the guest would see a partial request and retry the
>> remainder of the request, which (possibly after some iterations)
>> would result in all data transferred, albeit at a lower speed.
> 
> Except they don't.
> 
> /me looks at drivers/scsi/sd.c:sd_done()
> 
> I can't see any sane way to tell linux that the request was too big.
> 
residuals is the key:

drivers/scsi/scsi.c:scsi_finish_command()


	good_bytes = scsi_bufflen(cmd);
        if (cmd->request->cmd_type != REQ_TYPE_BLOCK_PC) {
		int old_good_bytes = good_bytes;
		drv = scsi_cmd_to_driver(cmd);
		if (drv->done)
			good_bytes = drv->done(cmd);
		/*
		 * USB may not give sense identifying bad sector and
		 * simply return a residue instead, so subtract off the
		 * residue if drv->done() error processing indicates no
		 * change to the completion length.
		 */
		if (good_bytes == old_good_bytes)
			good_bytes -= scsi_get_resid(cmd);
	}
	scsi_io_completion(cmd, good_bytes);

and in

drivers/scsi/scsi_lib.c:scsi_io_completion()

	/*
	 * A number of bytes were successfully read.  If there
	 * are leftovers and there is some kind of error
	 * (result != 0), retry the rest.
	 */
	if (scsi_end_request(cmd, error, good_bytes, result == 0) == NULL)
		return;


No-one forces us to complete the _entire_ request; just completing
the request partially is perfectly okay as long as we mark it properly.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		      zSeries & Storage
hare@suse.de			      +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Markus Rex, HRB 16746 (AG Nürnberg)

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [Qemu-devel] [sneak preview] major scsi overhaul
  2009-11-26 11:20                             ` Hannes Reinecke
@ 2009-11-26 14:21                               ` Gerd Hoffmann
  2009-11-26 14:27                                 ` Hannes Reinecke
  0 siblings, 1 reply; 43+ messages in thread
From: Gerd Hoffmann @ 2009-11-26 14:21 UTC (permalink / raw)
  To: Hannes Reinecke; +Cc: Paul Brook, qemu-devel

On 11/26/09 12:20, Hannes Reinecke wrote:
> Gerd Hoffmann wrote:
>> /me looks at drivers/scsi/sd.c:sd_done()
>>
>> I can't see any sane way to tell linux that the request was too big.
>>
> residuals is the key:
>
> drivers/scsi/scsi.c:scsi_finish_command()
>
>
> 	good_bytes = scsi_bufflen(cmd);
>          if (cmd->request->cmd_type != REQ_TYPE_BLOCK_PC) {
> 		int old_good_bytes = good_bytes;
> 		drv = scsi_cmd_to_driver(cmd);
> 		if (drv->done)
> 			good_bytes = drv->done(cmd);

drv->done() actually is sd_done() mentioned above.

sd_done() tries to figure out how many sectors it actually got for serious 
errors.  I don't feel like signaling "medium error" for the first sector 
behind our limit just because we'd like to have smaller requests.

> 		/*
> 		 * USB may not give sense identifying bad sector and
> 		 * simply return a residue instead, so subtract off the
> 		 * residue if drv->done() error processing indicates no
> 		 * change to the completion length.
> 		 */
> 		if (good_bytes == old_good_bytes)
> 			good_bytes -= scsi_get_resid(cmd);

Poor man's bad sector identification.  Same issue as above IMHO.  On top 
of that, I wouldn't expect all other guest OSes to have the same quirk in 
there to handle usb disks.

cheers,
   Gerd

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [Qemu-devel] [sneak preview] major scsi overhaul
  2009-11-26 14:21                               ` Gerd Hoffmann
@ 2009-11-26 14:27                                 ` Hannes Reinecke
  2009-11-26 14:37                                   ` Gerd Hoffmann
  0 siblings, 1 reply; 43+ messages in thread
From: Hannes Reinecke @ 2009-11-26 14:27 UTC (permalink / raw)
  To: Gerd Hoffmann; +Cc: Paul Brook, qemu-devel

Gerd Hoffmann wrote:
> On 11/26/09 12:20, Hannes Reinecke wrote:
>> Gerd Hoffmann wrote:
>>> /me looks at drivers/scsi/sd.c:sd_done()
>>>
>>> I can't see any sane way to tell linux that the request was too big.
>>>
>> residuals is the key:
>>
>> drivers/scsi/scsi.c:scsi_finish_command()
>>
>>
>>     good_bytes = scsi_bufflen(cmd);
>>          if (cmd->request->cmd_type != REQ_TYPE_BLOCK_PC) {
>>         int old_good_bytes = good_bytes;
>>         drv = scsi_cmd_to_driver(cmd);
>>         if (drv->done)
>>             good_bytes = drv->done(cmd);
> 
> drv->done() actually is sd_done() mentioned above.
> 
> sd_done() tries to figure out how many sectors it actually got for serious
> errors.  I don't feel like signaling "medium error" for the first sector
> behind our limit just because we'd like to have smaller requests.
> 
>>         /*
>>          * USB may not give sense identifying bad sector and
>>          * simply return a residue instead, so subtract off the
>>          * residue if drv->done() error processing indicates no
>>          * change to the completion length.
>>          */
>>         if (good_bytes == old_good_bytes)
>>             good_bytes -= scsi_get_resid(cmd);
> 
> Poor mans bad sector identification.  Same issue as above IMHO.  On top
> of that I wouldn't expect all other guest OSes having the same quirk in
> there to handle usb disks.
> 
Sigh. Communication seems to be tricky here.

Please, just ignore scsi_finish_command(), it's not doing anything
useful here.
Skip it and study the next snippet, scsi_io_completion():

drivers/scsi/scsi_lib.c:scsi_io_completion()

	/*
	 * A number of bytes were successfully read.  If there
	 * are leftovers and there is some kind of error
	 * (result != 0), retry the rest.
	 */
	if (scsi_end_request(cmd, error, good_bytes, result == 0) == NULL)
		return;

scsi_end_request is being called with the number of bytes _actually processed_,
which might be less than the number of bytes requested.
And the remainder will be retried by the upper layers.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		      zSeries & Storage
hare@suse.de			      +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Markus Rex, HRB 16746 (AG Nürnberg)

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [Qemu-devel] [sneak preview] major scsi overhaul
  2009-11-26 14:27                                 ` Hannes Reinecke
@ 2009-11-26 14:37                                   ` Gerd Hoffmann
  2009-11-26 15:50                                     ` Hannes Reinecke
  0 siblings, 1 reply; 43+ messages in thread
From: Gerd Hoffmann @ 2009-11-26 14:37 UTC (permalink / raw)
  To: Hannes Reinecke; +Cc: Paul Brook, qemu-devel

On 11/26/09 15:27, Hannes Reinecke wrote:
> Gerd Hoffmann wrote:
>> sd_done() tries to figure out how many sectors it actually got for serious
>> errors.  I don't feel like signaling "medium error" for the first sector
>> behind our limit just because we'd like to have smaller requests.

> scsi_end_request is being called with the number of bytes _actually processed_,
> which might be less than the number of bytes requested.
> And the remainder will be retried by the upper layers.

The "number of bytes _actually processed_" must come from somewhere. 
And that somewhere is sd_done() in case of scsi-disks.  See my point now?

cheers,
   Gerd

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [Qemu-devel] [sneak preview] major scsi overhaul
  2009-11-26 14:37                                   ` Gerd Hoffmann
@ 2009-11-26 15:50                                     ` Hannes Reinecke
  2009-11-27 11:08                                       ` Gerd Hoffmann
  0 siblings, 1 reply; 43+ messages in thread
From: Hannes Reinecke @ 2009-11-26 15:50 UTC (permalink / raw)
  To: Gerd Hoffmann; +Cc: Paul Brook, qemu-devel

Gerd Hoffmann wrote:
> On 11/26/09 15:27, Hannes Reinecke wrote:
>> Gerd Hoffmann wrote:
>>> sd_done() tries to figure out how many sectors it actually got for serious
>>> errors.  I don't feel like signaling "medium error" for the first sector
>>> behind our limit just because we'd like to have smaller requests.
> 
>> scsi_end_request is being called with the number of bytes _actually
>> processed_,
>> which might be less than the number of bytes requested.
>> And the remainder will be retried by the upper layers.
> 
> The "number of bytes _actually processed_" must come from somewhere. And
> that somewhere is sd_done() in case of scsi-disks.  See my point now?
> 
Ah. Now I see. I concur.

So indeed, this approach would only work if we signal some sense code
back to the host.
I, OTOH, don't have any qualms about returning HARDWARE_ERROR,
0x26/0x08 (TOO MANY SEGMENT DESCRIPTORS) or 0x26/0x0B (INLINE DATA LENGTH EXCEEDED).
Feels only fair to notify the guest it has done something wrong.

The alternative would be to split the I/O into smaller sections.
But this really feels a bit like overkill.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		      zSeries & Storage
hare@suse.de			      +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Markus Rex, HRB 16746 (AG Nürnberg)

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [Qemu-devel] [sneak preview] major scsi overhaul
  2009-11-26 15:50                                     ` Hannes Reinecke
@ 2009-11-27 11:08                                       ` Gerd Hoffmann
  2009-12-02 13:47                                         ` Gerd Hoffmann
  0 siblings, 1 reply; 43+ messages in thread
From: Gerd Hoffmann @ 2009-11-27 11:08 UTC (permalink / raw)
  To: Hannes Reinecke; +Cc: Paul Brook, qemu-devel

On 11/26/09 16:50, Hannes Reinecke wrote:
> So indeed, this approach would only work if we signal some sense code
> back to the host.
> I, OTOH, don't have any qualms about returning HARDWARE_ERROR,
> 0x26/0x08 (TOO MANY SEGMENT DESCRIPTORS) or 0x26/0x0B (INLINE DATA LENGTH EXCEEDED).
> Feels only fair to notify the guest it has done something wrong.

Also set the info field which linux uses to figure how many sectors it 
actually got.
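
A minimal sketch of such a fixed-format sense buffer, with the VALID
bit set and the INFORMATION field carrying the first LBA we could not
handle (sense key and ASC/ASCQ as proposed above; the exact codes are
still up for debate):

#include <stdint.h>
#include <string.h>

static void build_overlength_sense(uint8_t *sense /* 18 bytes */,
                                   uint32_t first_failed_lba)
{
    memset(sense, 0, 18);
    sense[0]  = 0x80 | 0x70;                     /* VALID | fixed format, current */
    sense[2]  = 0x04;                            /* sense key: HARDWARE ERROR     */
    sense[3]  = (first_failed_lba >> 24) & 0xff; /* INFORMATION, big endian       */
    sense[4]  = (first_failed_lba >> 16) & 0xff;
    sense[5]  = (first_failed_lba >>  8) & 0xff;
    sense[6]  = first_failed_lba & 0xff;
    sense[7]  = 10;                              /* additional sense length       */
    sense[12] = 0x26;                            /* ASC, e.g. TOO MANY SEGMENT    */
    sense[13] = 0x08;                            /* ASCQ    DESCRIPTORS           */
}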

Now I really need some more up-to-date scsi specs.  Unfortunately t10.org 
doesn't allow public access to the drafts any more.  Seems to be a 
recent change, and googling found me tons of references to t10.org but 
not the documents themselves.

Does anyone have a source for me?

cheers,
   Gerd

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [Qemu-devel] [sneak preview] major scsi overhaul
  2009-11-27 11:08                                       ` Gerd Hoffmann
@ 2009-12-02 13:47                                         ` Gerd Hoffmann
  2009-12-07  8:28                                           ` Hannes Reinecke
  0 siblings, 1 reply; 43+ messages in thread
From: Gerd Hoffmann @ 2009-12-02 13:47 UTC (permalink / raw)
  To: Hannes Reinecke; +Cc: Paul Brook, qemu-devel

On 11/27/09 12:08, Gerd Hoffmann wrote:
> On 11/26/09 16:50, Hannes Reinecke wrote:
>> So indeed, this approach would only work if we signal some sense code
>> back to the host.
>> I, OTOH, don't have any qualms about returning HARDWARE_ERROR,
>> 0x26/0x08 (TOO MANY SEGMENT DESCRIPTORS) or 0x26/0x0B (INLINE DATA
>> LENGTH EXCEEDED).
>> Feels only fair to notify the guest it has done something wrong.
>
> Also set the info field which linux uses to figure how many sectors it
> actually got.

Hmm.  Well.  Seems to work out at least for linux, i.e. it figures it 
got a bunch of sectors and tries to continue.  Linux logs an I/O error. 
  Also I didn't try other guests (yet).

Using that as a way to limit scsi-disk request sizes probably isn't a 
good idea.  For scsi-generic that would be an improvement over the 
current situation though.

cheers,
   Gerd

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [Qemu-devel] [sneak preview] major scsi overhaul
  2009-12-02 13:47                                         ` Gerd Hoffmann
@ 2009-12-07  8:28                                           ` Hannes Reinecke
  2009-12-07  8:50                                             ` Gerd Hoffmann
  0 siblings, 1 reply; 43+ messages in thread
From: Hannes Reinecke @ 2009-12-07  8:28 UTC (permalink / raw)
  To: Gerd Hoffmann; +Cc: Paul Brook, qemu-devel

Gerd Hoffmann wrote:
> On 11/27/09 12:08, Gerd Hoffmann wrote:
>> On 11/26/09 16:50, Hannes Reinecke wrote:
>>> So indeed, this approach would only work if we signal some sense code
>>> back to the host.
>>> I, OTOH, don't have any qualms about returning HARDWARE_ERROR,
>>> 0x26/0x08 (TOO MANY SEGMENT DESCRIPTORS) or 0x26/0x0B (INLINE DATA
>>> LENGTH EXCEEDED).
>>> Feels only fair to notify the guest it has done something wrong.
>>
>> Also set the info field which linux uses to figure how many sectors it
>> actually got.
> 
> Hmm.  Well.  Seems to work out at least for linux, i.e. it figures it
> got a bunch of sectors and tries to continue.  Linux logs an I/O error.
>  Also I didn't try other guests (yet).
> 
> Using that as a way to limit scsi-disk request sizes probably isn't a
> good idea.  For scsi-generic that would be an improvement over the
> current situation though.
> 
Yes, quite.

But for scsi-disk we could always fall back to using bounce buffers,
could we not?
Provided we get a detailed enough error code, but that could be arranged,
methinks.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		      zSeries & Storage
hare@suse.de			      +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Markus Rex, HRB 16746 (AG Nürnberg)

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [Qemu-devel] [sneak preview] major scsi overhaul
  2009-12-07  8:28                                           ` Hannes Reinecke
@ 2009-12-07  8:50                                             ` Gerd Hoffmann
  0 siblings, 0 replies; 43+ messages in thread
From: Gerd Hoffmann @ 2009-12-07  8:50 UTC (permalink / raw)
  To: Hannes Reinecke; +Cc: Paul Brook, qemu-devel

On 12/07/09 09:28, Hannes Reinecke wrote:
>> Hmm.  Well.  Seems to work out at least for linux, i.e. it figures it
>> got a bunch of sectors and tries to continue.  Linux logs an I/O error.
>>   Also I didn't try other guests (yet).
>>
>> Using that as a way to limit scsi-disk request sizes probably isn't a
>> good idea.  For scsi-generic that would be an improvement over the
>> current situation though.
>>
> Yes, quite.
>
> But for scsi-disk we could always fallback to using bounce-buffers,
> could we not?

We want to limit the bounce buffer size, though.  As there seems to be no 
easy way to make sure the guest doesn't submit requests larger than a 
certain limit, I guess there is no way around splitting the request into 
pieces for the bounce buffer case.

We could add offset+size arguments to scsi_req_buf() to accomplish this.
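
A rough sketch of that splitting loop; the offset+size variant of
scsi_req_buf() and the backend read helper are hypothetical signatures,
just to show the shape of it:

#include <stddef.h>
#include <stdint.h>

#define MAX_BOUNCE_BYTES (128 * 1024)   /* assumed per-request limit */

/* Stubs standing in for the hypothetical scsi_req_buf(offset, size)
 * and for the block backend read. */
static uint8_t bounce[MAX_BOUNCE_BYTES];

static uint8_t *scsi_req_buf(void *req, size_t offset, size_t size)
{
    (void)req; (void)offset;
    return size <= sizeof(bounce) ? bounce : NULL;
}

static int backend_read(void *bs, uint64_t offset, uint8_t *buf, size_t size)
{
    (void)bs; (void)offset; (void)buf; (void)size;
    return 0;                           /* pretend the read succeeded */
}

/* Process a large read in bounce-buffer sized pieces. */
static int read_in_pieces(void *req, void *bs, uint64_t start, size_t total)
{
    size_t done = 0;

    while (done < total) {
        size_t chunk = total - done;
        if (chunk > MAX_BOUNCE_BYTES)
            chunk = MAX_BOUNCE_BYTES;
        uint8_t *buf = scsi_req_buf(req, done, chunk);
        if (!buf || backend_read(bs, start + done, buf, chunk) < 0)
            return -1;
        done += chunk;
    }
    return 0;
}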

cheers,
   Gerd


PS: FYI: I suspect I won't find time this year to continue working on
     this seriously, especially on the time-consuming testing part.  Top
     priority right now is finishing touches for the 0.12 release.
     x-mas will be family time.

^ permalink raw reply	[flat|nested] 43+ messages in thread

end of thread, other threads:[~2009-12-07  8:51 UTC | newest]

Thread overview: 43+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-11-06 23:09 [Qemu-devel] [sneak preview] major scsi overhaul Gerd Hoffmann
2009-11-07 15:22 ` Blue Swirl
2009-11-09  9:08   ` Gerd Hoffmann
2009-11-09 12:37     ` Avi Kivity
2009-11-09 13:03       ` Gerd Hoffmann
2009-11-09 13:17         ` Avi Kivity
2009-11-09 13:39           ` Gerd Hoffmann
2009-11-09 13:48             ` Avi Kivity
2009-11-09 20:38       ` Blue Swirl
2009-11-09 21:25         ` Gerd Hoffmann
2009-11-11  4:06 ` Paul Brook
2009-11-11  9:41   ` Gerd Hoffmann
2009-11-11 14:13     ` Paul Brook
2009-11-11 15:26       ` Gerd Hoffmann
2009-11-11 16:38         ` Paul Brook
2009-11-16 16:35           ` Gerd Hoffmann
2009-11-16 18:53             ` Paul Brook
2009-11-16 21:50               ` Gerd Hoffmann
2009-11-24 11:59               ` Gerd Hoffmann
2009-11-24 13:51                 ` Paul Brook
2009-11-25 16:37                   ` Gerd Hoffmann
2009-11-26  7:31                     ` Hannes Reinecke
2009-11-26  8:25                       ` Gerd Hoffmann
2009-11-26 10:57                         ` Hannes Reinecke
2009-11-26 11:04                           ` Gerd Hoffmann
2009-11-26 11:20                             ` Hannes Reinecke
2009-11-26 14:21                               ` Gerd Hoffmann
2009-11-26 14:27                                 ` Hannes Reinecke
2009-11-26 14:37                                   ` Gerd Hoffmann
2009-11-26 15:50                                     ` Hannes Reinecke
2009-11-27 11:08                                       ` Gerd Hoffmann
2009-12-02 13:47                                         ` Gerd Hoffmann
2009-12-07  8:28                                           ` Hannes Reinecke
2009-12-07  8:50                                             ` Gerd Hoffmann
2009-11-16 19:08             ` Ryan Harper
2009-11-16 20:40               ` Gerd Hoffmann
2009-11-16 21:45                 ` Ryan Harper
2009-11-11 11:21 ` [Qemu-devel] " Gerd Hoffmann
2009-11-11 11:52   ` Hannes Reinecke
2009-11-11 13:02     ` Gerd Hoffmann
2009-11-11 13:30       ` Hannes Reinecke
2009-11-11 14:37         ` Gerd Hoffmann
2009-11-12  9:54           ` Hannes Reinecke
