* [Qemu-devel] [RFC] virtio-fc: draft idea of virtual fibre channel HBA
@ 2017-02-15  7:15 Lin Ma
  2017-02-15 15:33 ` Stefan Hajnoczi
  0 siblings, 1 reply; 15+ messages in thread
From: Lin Ma @ 2017-02-15  7:15 UTC (permalink / raw)
  To: qemu-devel; +Cc: mst, pbonzini, stefanha, Zhiqiang Zhou, hare

Hi all,
 
We know that libvirt can create fibre channel vHBAs on the host based
on NPIV and present the LUNs to the guest.
 
I'd like to implement a virtual fibre channel HBA for qemu, although I
haven't investigated it deeply yet. The idea is to present an FC vHBA
in the guest that interacts with the remote FC switch through NPIV, so
the LUNs are recognized inside the guest. I'm sending this email here
to see whether you agree with this approach, and I hope to get some
ideas/suggestions.
 
The frontend is based on virtio, say virtio-fc-pci; the backend is
based on NPIV on a physical FC HBA on the host.
This virtual FC HBA wouldn't support FC-AL, only Fabric. It wraps SCSI
data into FC frames and forwards them to the backend, essentially SCSI
over FC.
(Maybe I can reuse some of the virtio-scsi code/ideas to handle the SCSI data.)
 
The minimum invocation may look like:
qemu-system-x86_64 \
...... \
-object fibrechannel-backend,id=fcdev0,host=0000:81:00.0 \
-device virtio-fc-pci,id=vfc0,fc_backend=fcdev0,wwpn=1000000000000001,wwnn=1100000000000001 \
......
 
BTW, I have no idea yet how to make migration work:
How to deal with the BDF during migration?
How to deal with the Fabric ID during migration?
 
This is a draft idea; there is a lot of related code I still need to
investigate, and these are all the thoughts I have at the moment.
 
Hello Paolo and Stefan, you are the authors of virtio-scsi and had
some in-depth discussions about virtio-scsi with Hannes back in 2011.
May I have your ideas/thoughts?
 
Thanks,
Lin


* Re: [Qemu-devel] [RFC] virtio-fc: draft idea of virtual fibre channel HBA
  2017-02-15  7:15 [Qemu-devel] [RFC] virtio-fc: draft idea of virtual fibre channel HBA Lin Ma
@ 2017-02-15 15:33 ` Stefan Hajnoczi
  2017-02-16  7:16   ` [Qemu-devel] 答复: " Lin Ma
  0 siblings, 1 reply; 15+ messages in thread
From: Stefan Hajnoczi @ 2017-02-15 15:33 UTC (permalink / raw)
  To: Lin Ma
  Cc: qemu-devel, pbonzini, Zhiqiang Zhou, hare, stefanha, mst, Fam Zheng


On Wed, Feb 15, 2017 at 12:15:02AM -0700, Lin Ma wrote:
> Hi all,
>  
> We know that libvirt can create fibre channels vHBA on host
> based on npiv, and present the LUNs to guest.
>  
> I'd like to implement a virtual fibre channel HBA for qemu,
> I havn't investigate it deeply yet. The idea presents a fc
> vHBA in guest, interact with remote fc switch through npiv,
> The LUNs will be recognized inside guest. I sent this email
> here to see if you are in agreement with this approach and
> hope to get some ideas/suggestions.
>  
> The frontend is based on virtio, say virtio-fc-pci; the backend
> is based on npiv of physical fc hba on host.
> The implementation of this virtual fc hba doesn't support Fc-al,
> only supports Fabric. It wrappers scsi data info fc frames, then
> forwards them to backend, sounds like scsi over fc.
> (maybe I can re-use some of virtio-scsi code/idea to deal with scsi data)
>  
> The minimum invocation may look like:
> qemu-system-x86_64 \
> ...... \
> -object fibrechannel-backend,id=fcdev0,host=0000:81:00.0 \
> -device virtio-fc-pci,id=vfc0,fc_backend=fcdev0,wwpn=1000000000000001,wwnn=1100000000000001 \
> ......
>  
> BTW, I have no idea how to make migration works:
> How to deal with the BDF during migration?
> How to deal with the Fabric ID during migration?
>  
> It's a draft idea, There are lots of related code I need to
> investigate, Currently this is all thoughts I have.
>  
> Hello Paolo and Stefan, You are the authors of virtio-scsi,
> and had some in-depth discuss about virtio-scsi in 2011
> with Hannes, May I have your ideas/thoughts?

I'm not sure it's necessary for the guest to have FC access.  Fam Zheng
and Paolo are working on virtio-scsi FC NPIV enhancements.  It should
make NPIV work better without adding a whole new FC device.

https://lkml.org/lkml/2017/1/16/439

The plan is:

1. libvirt listens to udev events on the host so it can add/remove
   QEMU SCSI LUNs.

2. virtio-scsi is extended to include a WWPN that the guest can see.

The guest doesn't do any FC fabric level stuff, it just does virtio-scsi
as usual.

It supports live migration by swapping between a pair of WWPNs across
migration.
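
As a rough sketch (not the actual patches), point 2 could be as simple
as appending a port name to the existing config space; the existing
fields below are from the current virtio-scsi spec, while the wwpn
field and its placement are only my assumption for illustration:

struct virtio_scsi_config {
        le32 num_queues;
        le32 seg_max;
        le32 max_sectors;
        le32 cmd_per_lun;
        le32 event_info_size;
        le32 sense_size;
        le32 cdb_size;
        le16 max_channel;
        le16 max_target;
        le32 max_lun;
        /* hypothetical: WWPN exposed to the guest, 0 if none assigned */
        le64 wwpn;
};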

What are the benefits of having FC access from the guest?

Stefan



* [Qemu-devel] 答复: Re:  [RFC] virtio-fc: draft idea of virtual fibre channel HBA
  2017-02-15 15:33 ` Stefan Hajnoczi
@ 2017-02-16  7:16   ` Lin Ma
  2017-02-16  8:39     ` Paolo Bonzini
  0 siblings, 1 reply; 15+ messages in thread
From: Lin Ma @ 2017-02-16  7:16 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: qemu-devel, Fam Zheng, mst, pbonzini, stefanha, Zhiqiang Zhou, hare



>>> Stefan Hajnoczi <stefanha@gmail.com> 2/15/2017 11:33 PM >>>
>On Wed, Feb 15, 2017 at 12:15:02AM -0700, Lin Ma wrote:
>> Hi all,
>>  
>> We know that libvirt can create fibre channels vHBA on host
>> based on npiv, and present the LUNs to guest.
>>  
>> I'd like to implement a virtual fibre channel HBA for qemu,
>> I havn't investigate it deeply yet. The idea presents a fc
>> vHBA in guest, interact with remote fc switch through npiv,
>> The LUNs will be recognized inside guest. I sent this email
>> here to see if you are in agreement with this approach and
>> hope to get some ideas/suggestions.
>>  
>> The frontend is based on virtio, say virtio-fc-pci; the backend
>> is based on npiv of physical fc hba on host.
>> The implementation of this virtual fc hba doesn't support Fc-al,
>> only supports Fabric. It wrappers scsi data info fc frames, then
>> forwards them to backend, sounds like scsi over fc.
>> (maybe I can re-use some of virtio-scsi code/idea to deal with scsi data)
>>  
>> The minimum invocation may look like:
>> qemu-system-x86_64 \
>> ...... \
>> -object fibrechannel-backend,id=fcdev0,host=0000:81:00.0 \
>> -device virtio-fc-pci,id=vfc0,fc_backend=fcdev0,wwpn=1000000000000001,wwnn=1100000000000001 \
>> ......
>>  
>> BTW, I have no idea how to make migration works:
>> How to deal with the BDF during migration?
>> How to deal with the Fabric ID during migration?
>>  
>> It's a draft idea, There are lots of related code I need to
>> investigate, Currently this is all thoughts I have.
>>  
>> Hello Paolo and Stefan, You are the authors of virtio-scsi,
>> and had some in-depth discuss about virtio-scsi in 2011
>> with Hannes, May I have your ideas/thoughts?
>
>I'm not sure it's necessary for the guest to have FC access.  Fam Zheng
>and Paolo are working on virtio-scsi FC NPIV enhancements.  It should
>make NPIV work better without adding a whole new FC device.
>
>https://lkml.org/lkml/2017/1/16/439
>
>The plan is:
>
>1. libvirt listens to udev events on the host so it can add/remove
>   QEMU SCSI LUNs.
>
>2. virtio-scsi is extended to include a WWPN that the guest can see.
>
>The guest doesn't do any FC fabric level stuff, it just does virtio-scsi
>as usual.
>
>It supports live migration by swapping between a pair of WWPNs across
>migration.
OK, thanks for the information. This approach still makes qemu act as
a SCSI target. I'm not sure which way makes more sense, but it does
solve the live migration case.

>What are the benefits of having FC access from the guest?
Actually, I haven't dug into it too much. I just thought that, from a
virtualization perspective, when interacting with FC storage, having
complete FC access from the guest is the way it should be done.
Lin


* Re: [Qemu-devel] 答复: Re: [RFC] virtio-fc: draft idea of virtual fibre channel HBA
  2017-02-16  7:16   ` [Qemu-devel] 答复: " Lin Ma
@ 2017-02-16  8:39     ` Paolo Bonzini
  2017-02-16  9:02       ` Lin Ma
  2017-02-16  9:56       ` Hannes Reinecke
  0 siblings, 2 replies; 15+ messages in thread
From: Paolo Bonzini @ 2017-02-16  8:39 UTC (permalink / raw)
  To: Lin Ma, Stefan Hajnoczi
  Cc: Zhiqiang Zhou, Fam Zheng, mst, qemu-devel, hare, stefanha



On 16/02/2017 08:16, Lin Ma wrote:
>> What are the benefits of having FC access from the guest?
> 
> Actually, I havn't dug it too much, Just thought that from virtualization's
> perspective, when interact with FC storage, having complete FC access
> from the guest is the way it should use.

How much of this requires a completely new spec?  Could we get enough of
the benefit (e.g. the ability to issue rescans or LIPs on the host) by
extending virtio-scsi?

Paolo


* Re: [Qemu-devel] 答复: Re: [RFC] virtio-fc: draft idea of virtual fibre channel HBA
  2017-02-16  8:39     ` Paolo Bonzini
@ 2017-02-16  9:02       ` Lin Ma
  2017-02-16  9:56       ` Hannes Reinecke
  1 sibling, 0 replies; 15+ messages in thread
From: Lin Ma @ 2017-02-16  9:02 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-devel, Fam Zheng, mst, stefanha, Zhiqiang Zhou, hare



>>> Paolo Bonzini <pbonzini@redhat.com> 2/16/2017 4:39 PM >>>
>
>
>On 16/02/2017 08:16, Lin Ma wrote:
>>> What are the benefits of having FC access from the guest?
>> 
>> Actually, I havn't dug it too much, Just thought that from virtualization's
>> perspective, when interact with FC storage, having complete FC access
>> from the guest is the way it should use.
>
>How much of this requires a completely new spec?  Could we get enough of
>the benefit (e.g. the ability to issue rescans or LIPs on the host) by
>extending virtio-scsi?

I understand, it takes too much work to build such an all-new device.
Furthermore, from a performance perspective, the extended-virtio-scsi way
doesn't get involved at the FC fabric level but leaves that to the physical
HBA to handle, so it may get better performance.
 
Lin


* Re: [Qemu-devel] 答复: Re: [RFC] virtio-fc: draft idea of virtual fibre channel HBA
  2017-02-16  8:39     ` Paolo Bonzini
  2017-02-16  9:02       ` Lin Ma
@ 2017-02-16  9:56       ` Hannes Reinecke
  2017-02-22  8:19         ` Lin Ma
  1 sibling, 1 reply; 15+ messages in thread
From: Hannes Reinecke @ 2017-02-16  9:56 UTC (permalink / raw)
  To: Paolo Bonzini, Lin Ma, Stefan Hajnoczi
  Cc: Zhiqiang Zhou, Fam Zheng, mst, qemu-devel, stefanha

On 02/16/2017 09:39 AM, Paolo Bonzini wrote:
> 
> 
> On 16/02/2017 08:16, Lin Ma wrote:
>>> What are the benefits of having FC access from the guest?
>>
>> Actually, I havn't dug it too much, Just thought that from virtualization's
>> perspective, when interact with FC storage, having complete FC access
>> from the guest is the way it should use.
> 
> How much of this requires a completely new spec?  Could we get enough of
> the benefit (e.g. the ability to issue rescans or LIPs on the host) by
> extending virtio-scsi?
> 
I guess I'd need to chime in with my favourite topic :-)

Initially I really would go with extending the virtio-scsi spec, as a
'real' virtio-fc suffers from some issues:
While it's of course possible to implement a full FC stack in qemu
itself, it's not easily possible to assign 'real' FC devices to the guest.
The problem here is that any virtio-fc would basically have to use the
standard FC frame format for virtio itself, but all 'real' FC HBAs
(excluding FCoE drivers) do _not_ allow access to the actual FC frames,
and rather present you with a 'cooked' version of them.
So if you were attempting to pass FC devices to the guest you would have
to implement yet another conversion between raw FC frames and the
version the HBA would like.
And that doesn't even deal with the real complexity like generating
correct OXID/RXIDs etc.

So initially I would propose to update the virtio spec to include:
- Full 64-bit LUNs
- A 64-bit target port ID (well, _actually_ it should be a SCSI-compliant
  target port ID, but as this essentially is a free text field I'd
  restrict it to something sensible)
For full compatibility we'd also need a (64-bit) initiator ID, but that is
essentially a property of the virtio queue, so we don't need to transmit
it with every command; once during queue setup is enough.
And if we restrict virtio-scsi to point-to-point topology we can even
associate the target port ID with the virtio queue, making
implementation even easier.
I'm not sure if that is a good idea long-term, as one might want to
identify an NPIV host with a virtio device, in which case we'd have a
1:M topology and that model won't work anymore.

To be precise:

I'd propose to update

struct virtio_scsi_config
with a field 'u8 initiator_id[8]'

and

struct virtio_scsi_req_cmd
with a field 'u8 target_id[8]'

and do away with the weird LUN remapping qemu has nowadays.

That should be enough to get proper NPIV passthrough working.
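
In spec notation the result would look roughly like this (existing
fields as in the current virtio-scsi spec, optional T10 PI fields
omitted; exactly where the two new 8-byte IDs sit is illustrative only):

struct virtio_scsi_config {
        /* ... existing fields (num_queues ... max_lun) unchanged ... */
        u8 initiator_id[8];     /* new: initiator port ID, per device/queue */
};

struct virtio_scsi_req_cmd {
        /* device-readable part */
        u8 lun[8];              /* full 64-bit LUN, no remapping */
        le64 id;
        u8 task_attr;
        u8 prio;
        u8 crn;
        u8 target_id[8];        /* new: target port ID, per command */
        u8 cdb[cdb_size];
        u8 dataout[];
        /* device-writable part */
        le32 sense_len;
        le32 residual;
        le16 status_qualifier;
        u8 status;
        u8 response;
        u8 sense[sense_size];
        u8 datain[];
};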

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		   Teamlead Storage & Networking
hare@suse.de			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)


* Re: [Qemu-devel] 答复: Re: [RFC] virtio-fc: draft idea of virtual fibre channel HBA
  2017-02-16  9:56       ` Hannes Reinecke
@ 2017-02-22  8:19         ` Lin Ma
  2017-02-22  9:20           ` Hannes Reinecke
  0 siblings, 1 reply; 15+ messages in thread
From: Lin Ma @ 2017-02-22  8:19 UTC (permalink / raw)
  To: Stefan Hajnoczi, Paolo Bonzini, Hannes Reinecke
  Cc: qemu-devel, Fam Zheng, mst, stefanha, Zhiqiang Zhou

Hi Hannes,

>>> Hannes Reinecke <hare@suse.de> 2017/2/16 Thursday 5:56 PM >>>
>On 02/16/2017 09:39 AM, Paolo Bonzini wrote:
>> 
>> 
>> On 16/02/2017 08:16, Lin Ma wrote:
>>>> What are the benefits of having FC access from the guest?
>>>
>>> Actually, I havn't dug it too much, Just thought that from virtualization's
>>> perspective, when interact with FC storage, having complete FC access
>>> from the guest is the way it should use.
>> 
>> How much of this requires a completely new spec?  Could we get enough of
>> the benefit (e.g. the ability to issue rescans or LIPs on the host) by
>> extending virtio-scsi?
>> 
>I guess I'd need to chime in with my favourite topic :-)
>
>Initially I really would go with extending the virtio-scsi spec, as
>'real' virtio-fc suffers from some issues:
>While it's of course possible to implement a full fc stack in qemu
>itself, it's not easily possible assign 'real' FC devices to the guest.
>Problem here is that any virtio-fc would basically have to use the
>standard FC frame format for virtio itself, but all 'real' FC HBAs
>(excluding FCoE drivers) do _not_ allow access to the actual FC frames,
>but rather present you with a 'cooked' version of them.
>So if you were attempting to pass FC devices to the guest you would have
>to implement yet-another conversion between raw FC frames and the
>version the HBA would like.
>And that doesn't even deal with the real complexity like generating
>correct OXID/RXIDs etc.
>
>So initially I would propose to update the virtio spec to include:
>- Full 64bit LUNs
>- A 64bit target port ID (well, _actually_ it should be a SCSI-compliant
>  target port ID, but as this essentially is a free text field I'd
>  restrict it to something sensible)
>For full compability we'd also need a (64-bit) initiator ID, but that is
>essentially a property of the virtio queue, so we don't need to transmit
>it with every command; once during queue setup is enough.
>And if we restrict virtio-scsi to point-to-point topology we can even
>associate the target port ID with the virtio queue, making
>implementation even easier.
>I'm not sure if that is a good idea long-term, as one might want to
>identify an NPIV host with a virtio device, in which case we're having a
>1-M topology and that model won't work anymore.
>
>To be precise:
>
>I'd propose to update
>
>struct virtio_scsi_config
>with a field 'u8 initiator_id[8]'
>
>and
>
>struct virtio_scsi_req_cmd
>with a field 'u8 target_id[8]'
>
>and do away with the weird LUN remapping qemu has nowadays.
Does that mean we don't need to provide the '-drive' and '-device scsi-hd'
options on the qemu command line, so the guest can get its own LUNs
through the FC switch, right?

>That should be enough to get proper NPIV passthrough working.

Lin


* Re: [Qemu-devel] 答复: Re: [RFC] virtio-fc: draft idea of virtual fibre channel HBA
  2017-02-22  8:19         ` Lin Ma
@ 2017-02-22  9:20           ` Hannes Reinecke
  2017-05-15 17:21             ` Paolo Bonzini
  0 siblings, 1 reply; 15+ messages in thread
From: Hannes Reinecke @ 2017-02-22  9:20 UTC (permalink / raw)
  To: Lin Ma, Stefan Hajnoczi, Paolo Bonzini
  Cc: qemu-devel, Fam Zheng, mst, stefanha, Zhiqiang Zhou

On 02/22/2017 09:19 AM, Lin Ma wrote:
> Hi Hannes,
> 
>>>> Hannes Reinecke <hare@suse.de> 2017/2/16 Thursday 5:56 PM >>>
>>On 02/16/2017 09:39 AM, Paolo Bonzini wrote:
>>>
>>>
>>> On 16/02/2017 08:16, Lin Ma wrote:
>>>>> What are the benefits of having FC access from the guest?
>>>>
>>>> Actually, I havn't dug it too much, Just thought that from
> virtualization's
>>>> perspective, when interact with FC storage, having complete FC access
>>>> from the guest is the way it should use.
>>>
>>> How much of this requires a completely new spec?  Could we get enough of
>>> the benefit (e.g. the ability to issue rescans or LIPs on the host) by
>>> extending virtio-scsi?
>>>
>> I guess I'd need to chime in with my favourite topic :-)
>>
>> Initially I really would go with extending the virtio-scsi spec, as
>> 'real' virtio-fc suffers from some issues:
>> While it's of course possible to implement a full fc stack in qemu
>> itself, it's not easily possible assign 'real' FC devices to the guest.
>> Problem here is that any virtio-fc would basically have to use the
>> standard FC frame format for virtio itself, but all 'real' FC HBAs
>> (excluding FCoE drivers) do _not_ allow access to the actual FC frames,
>> but rather present you with a 'cooked' version of them.
>> So if you were attempting to pass FC devices to the guest you would have
>> to implement yet-another conversion between raw FC frames and the
>> version the HBA would like.
>> And that doesn't even deal with the real complexity like generating
>> correct OXID/RXIDs etc.
>> 
>> So initially I would propose to update the virtio spec to include:
>> - Full 64bit LUNs
>> - A 64bit target port ID (well, _actually_ it should be a SCSI-compliant
>>   target port ID, but as this essentially is a free text field I'd
>>   restrict it to something sensible)
>> For full compability we'd also need a (64-bit) initiator ID, but that is
>> essentially a property of the virtio queue, so we don't need to transmit
>> it with every command; once during queue setup is enough.
>> And if we restrict virtio-scsi to point-to-point topology we can even
>> associate the target port ID with the virtio queue, making
>> implementation even easier.
>> I'm not sure if that is a good idea long-term, as one might want to
>> identify an NPIV host with a virtio device, in which case we're having a
>> 1-M topology and that model won't work anymore.
>> 
>> To be precise:
>> 
>> I'd propose to update
>> 
>> struct virtio_scsi_config
>> with a field 'u8 initiator_id[8]'
>> 
>> and
>> 
>> struct virtio_scsi_req_cmd
>> with a field 'u8 target_id[8]'
>> 
>> and do away with the weird LUN remapping qemu has nowadays.
> Does it mean we dont need to provide '-drive' and '-device scsi-hd'
> option in qemu command line? so the guest can get its own LUNs
> through fc switch, right?
> 
No, you still would need that (at least initially).
But with the modifications above we can add tooling around qemu to
establish the correct (host) device mappings.
Without it we
a) have no idea from the host side which devices should be attached to
any given guest
b) have no idea from the guest side what the initiator and target IDs
are; which will get _really_ tricky if someone decides to use persistent
reservations from within the guest...

For handling NPIV proper we would need to update qemu to:
a) locate the NPIV host based on the initiator ID from the guest
b) stop exposing the devices attached to that NPIV host to the guest
c) establish a 'rescan' routine to capture any state changes (LUN
remapping etc) of the NPIV host.

But having the initiator and target IDs in virtio scsi is a mandatory
step before we can attempt anything else.
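
For a), a rough sketch of what such host-side tooling could already do
today, i.e. match a guest-supplied WWPN against the standard FC
transport class attributes in sysfs (error handling trimmed; only a
sketch, of course):

#include <dirent.h>
#include <inttypes.h>
#include <stdio.h>

/* Return the SCSI host number whose fc_host port_name matches the
 * given WWPN, or -1 if no (NPIV) host carries that initiator ID. */
static int find_npiv_host(uint64_t wwpn)
{
        DIR *dir = opendir("/sys/class/fc_host");
        struct dirent *de;
        int found = -1;

        if (!dir)
                return -1;
        while (found < 0 && (de = readdir(dir)) != NULL) {
                char path[288];
                uint64_t port_name;
                int host;
                FILE *f;

                if (sscanf(de->d_name, "host%d", &host) != 1)
                        continue;
                snprintf(path, sizeof(path),
                         "/sys/class/fc_host/%s/port_name", de->d_name);
                f = fopen(path, "r");
                if (!f)
                        continue;
                /* port_name reads back as "0x..." */
                if (fscanf(f, "%" SCNx64, &port_name) == 1 &&
                    port_name == wwpn)
                        found = host;
                fclose(f);
        }
        closedir(dir);
        return found;
}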

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		   Teamlead Storage & Networking
hare@suse.de			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)


* Re: [Qemu-devel] 答复: Re: [RFC] virtio-fc: draft idea of virtual fibre channel HBA
  2017-02-22  9:20           ` Hannes Reinecke
@ 2017-05-15 17:21             ` Paolo Bonzini
  2017-05-16  6:34               ` Hannes Reinecke
  0 siblings, 1 reply; 15+ messages in thread
From: Paolo Bonzini @ 2017-05-15 17:21 UTC (permalink / raw)
  To: Hannes Reinecke, Lin Ma, Stefan Hajnoczi
  Cc: qemu-devel, Fam Zheng, mst, stefanha, Zhiqiang Zhou

Thread necromancy after doing my homework and studying a bunch of specs...

>>> I'd propose to update
>>>
>>> struct virtio_scsi_config
>>> with a field 'u8 initiator_id[8]'
>>>
>>> and
>>>
>>> struct virtio_scsi_req_cmd
>>> with a field 'u8 target_id[8]'
>>>
>>> and do away with the weird LUN remapping qemu has nowadays.
>> Does it mean we dont need to provide '-drive' and '-device scsi-hd'
>> option in qemu command line? so the guest can get its own LUNs
>> through fc switch, right?
>
> No, you still would need that (at least initially).
> But with the modifications above we can add tooling around qemu to
> establish the correct (host) device mappings.
> Without it we
> a) have no idea from the host side which devices should be attached to
> any given guest
> b) have no idea from the guest side what the initiator and target IDs
> are; which will get _really_ tricky if someone decides to use persistent
> reservations from within the guest...
> 
> For handling NPIV proper we would need to update qemu
> a) locate the NPIV host based on the initiator ID from the guest

1) How would the initiator ID (8 bytes) relate to the WWNN/WWPN (2*8
bytes) on the host?  Likewise for the target ID which, as I understand
it, matches the rport's WWNN/WWPN in Linux's FC transport.

2) If the initiator ID is the moral equivalent of a MAC address,
shouldn't it be the host that provides the initiator ID to the guest in
the virtio-scsi config space?  (From your proposal, I'd guess it's the
latter, but maybe I am not reading correctly).

3) An initiator ID in the virtio-scsi config space is orthogonal to
initiator IDs in the request.  The former is host->guest, the latter is
guest->host and can be useful to support virtual (nested) NPIV.

> b) stop exposing the devices attached to that NPIV host to the guest

What do you mean exactly?

> c) establish a 'rescan' routine to capture any state changes (LUN
> remapping etc) of the NPIV host.

You'd also need "target add" and "target removed" events.  At this
point, this looks a lot less virtio-scsi and a lot more like virtio-fc
(with a 'cooked' FCP-based format of its own).
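
(For reference, the existing virtio-scsi event is tiny and could in
principle carry such events; the two event codes below are made up
purely for illustration.)

struct virtio_scsi_event {
        le32 event;     /* VIRTIO_SCSI_T_TRANSPORT_RESET, _PARAM_CHANGE, ... */
        u8   lun[8];
        le32 reason;
};

/* hypothetical additions for fabric-level notifications */
#define VIRTIO_SCSI_T_TARGET_ADD        4
#define VIRTIO_SCSI_T_TARGET_REMOVE     5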

At this point, I can think of several ways to do this, one being SG_IO
in QEMU while the others are more esoteric.

1) use virtio-scsi with userspace passthrough (current solution).

Advantages:
- guests can be stopped/restarted across hosts with different HBAs
- completely oblivious to host HBA driver
- no new guest drivers are needed (well, almost due to above issues)
- out-of-the-box support for live migration, albeit with hacks required
such as Hyper-V's two WWNN/WWPN pairs

Disadvantages:
- no full FCP support
- guest devices exposed as /dev nodes to the host


2) the exact opposite: use the recently added "mediated device
passthrough" (mdev) framework to present a "fake" PCI device to the
guest.  mdev is currently used for vGPU and will also be used by s390
for CCW passthrough.  It lets the host driver take care of device
emulation, and the result is similar to an SR-IOV virtual function but
without requiring SR-IOV in the host.  The PCI device would presumably
reuse in the guest the same driver as the host.

Advantages:
- no new guest drivers are needed
- solution confined entirely within the host driver
- each driver can use its own native 'cooked' format for FC frames

Disadvantages:
- specific to each HBA driver
- guests cannot be stopped/restarted across hosts with different HBAs
- it's still device passthrough, so live migration is a mess (and would
require guest-specific code in QEMU)


3) handle passthrough with a kernel driver.  Under this model, the guest
uses the virtio device, but the passthrough of commands and TMFs is
performed by the host driver.  The host driver grows the option to
present an NPIV vport through a vhost interface (*not* the same thing as
LIO's vhost-scsi target, but a similar API with a different /dev node or
even one node per scsi_host).

We can then choose whether to do it with virtio-scsi or with a new
virtio-fc.

Advantages:
- guests can be stopped/restarted across hosts with different HBAs
- no need to do the "two WWNN/WWPN pairs" hack for live migration,
unlike e.g. Hyper-V
- a bit Rube Goldberg, but the vhost interface can be consumed by any
userspace program, not just by virtual machines

Disadvantages:
- requires a new generalized vhost-scsi (or vhost-fc) layer
- not sure about support for live migration (what to do about in-flight
commands?)

I don't know the Linux code well enough to know if it would require code
specific to each HBA driver.  Maybe just some refactoring.


4) same as (3), but in userspace with a "macvtap" like layer (e.g.,
socket+bind creates an NPIV vport).  This layer can work on some kind of
FCP encapsulation, not the raw thing, and virtio-fc could be designed
according to a similar format for simplicity.

Advantages:
- less dependencies on kernel code
- simplest for live migration
- most flexible for userspace usage

Disadvantages:
- possibly two packs of cats to herd (SCSI + networking)?
- haven't thought much about it, so I'm not sure about the feasibility

Again, I don't know the Linux code well enough to know if it would
require code specific to each HBA driver.


If we can get the hardware manufacturers (and the SCSI maintainers...)
on board, (3) would probably be pretty easy to achieve, even accounting
for the extra complication of writing a virtio-fc specification.  Really,
just one hardware manufacturer would be needed; the others would follow suit.

(2) would probably be what the manufacturers like best, but it would be
worse for lock in.  Or... they would like it best *because* it would be
worse for lock in.

The main disadvantage of (2)/(3) against (1) is more complex testing.  I
guess we can add a vhost-fc target for testing to LIO, so as not to
require an FC card for guest development.  And if it is still a problem
'cause configfs requires root, we can add a fake FC target in QEMU.

Any opinions?  Does the above even make sense?

Paolo


* Re: [Qemu-devel] 答复: Re: [RFC] virtio-fc: draft idea of virtual fibre channel HBA
  2017-05-15 17:21             ` Paolo Bonzini
@ 2017-05-16  6:34               ` Hannes Reinecke
  2017-05-16  8:19                 ` Paolo Bonzini
  0 siblings, 1 reply; 15+ messages in thread
From: Hannes Reinecke @ 2017-05-16  6:34 UTC (permalink / raw)
  To: Paolo Bonzini, Lin Ma, Stefan Hajnoczi
  Cc: qemu-devel, Fam Zheng, mst, stefanha, Zhiqiang Zhou

On 05/15/2017 07:21 PM, Paolo Bonzini wrote:
> Thread necromancy after doing my homework and studying a bunch of specs...
> 
>>>> I'd propose to update
>>>>
>>>> struct virtio_scsi_config
>>>> with a field 'u8 initiator_id[8]'
>>>>
>>>> and
>>>>
>>>> struct virtio_scsi_req_cmd
>>>> with a field 'u8 target_id[8]'
>>>>
>>>> and do away with the weird LUN remapping qemu has nowadays.
>>> Does it mean we dont need to provide '-drive' and '-device scsi-hd'
>>> option in qemu command line? so the guest can get its own LUNs
>>> through fc switch, right?
>>
>> No, you still would need that (at least initially).
>> But with the modifications above we can add tooling around qemu to
>> establish the correct (host) device mappings.
>> Without it we
>> a) have no idea from the host side which devices should be attached to
>> any given guest
>> b) have no idea from the guest side what the initiator and target IDs
>> are; which will get _really_ tricky if someone decides to use persistent
>> reservations from within the guest...
>>
>> For handling NPIV proper we would need to update qemu
>> a) locate the NPIV host based on the initiator ID from the guest
> 
> 1) How would the initiator ID (8 bytes) relate to the WWNN/WWPN (2*8
> bytes) on the host?  Likewise for the target ID which, as I understand
> it, matches the rport's WWNN/WWPN in Linux's FC transport.
> 
Actually, there's no need to keep WWNN and WWPN separate. The original
idea was to have a WWNN (world-wide node name) to refer to the system,
and the WWPN (world-wide port name) to refer to the FC port.
But as most FC cards are standalone each card will have a unique WWNN,
and a unique WWPN per port.
So if the card only has one port, it'll have one WWNN and one WWPN.
And in basically all instances the one is derived from the other.
Plus SAM only knows about a single initiator identifier; it's an FC
peculiarity that it has _two_.

So indeed, it might be better to keep it in a broader sense.

Maybe a union with an overall size of 256 bytes (to hold the iSCSI iqn
string), which for FC carries the WWPN and the WWNN?
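
Something like the following, perhaps (sizes as suggested above, the
names are mine):

/* Sketch of a transport-neutral SCSI port name, sized for the worst
 * case (the 256-byte iSCSI iqn string); for FC it carries WWPN + WWNN. */
union virtio_scsi_port_name {
        u8 iqn[256];            /* iSCSI: NUL-terminated iqn./eui. string */
        struct {
                u8 wwpn[8];     /* FC: world-wide port name */
                u8 wwnn[8];     /* FC: world-wide node name */
        } fc;
};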


> 2) If the initiator ID is the moral equivalent of a MAC address,
> shouldn't it be the host that provides the initiator ID to the host in
> the virtio-scsi config space?  (From your proposal, I'd guess it's the
> latter, but maybe I am not reading correctly).
> 
That would be dependent on the emulation. For an emulated SCSI disk I
guess we need to specify it on the command line somewhere, but for SCSI
passthrough we could grab it from the underlying device.

> 3) An initiator ID in virtio-scsi config space is orthogonal to an
> initiator IDs in the request.  The former is host->guest, the latter is
> guest->host and can be useful to support virtual (nested) NPIV.
> 
I don't think so. My idea is to have the initiator ID tied to the virtio
queue, so it wouldn't really matter _who_ sets the ID.
On the host we would use the (guest) initiator ID to establish the
connection between the virtio queue and the underlying device, be it a
qemu block device or a 'real' host block device.


>> b) stop exposing the devices attached to that NPIV host to the guest
> 
> What do you mean exactly?
> 
That's one of the longer term plans I have.
When doing NPIV, currently all devices from the NPIV host appear on the
host, including all partitions, LVM devices and whatnot.
This can lead to unwanted side-effects (systemd helpfully enables the
swap device on a partition for the host, when the actual block device is
being passed through to a guest ...). So ideally I would _not_ parse any
partition or metadata on those devices. But that requires a priori
knowledge like "That device whose number I don't know and whose identity
is unknown but which I'm sure will appear shortly is going to be
forwarded to a guest."
If we make the (guest) initiator ID identical to the NPIV WWPN we can
tag the _host_ to not expose any partitions on any LUNs, making the
above quite easy.

>> c) establish a 'rescan' routine to capture any state changes (LUN
>> remapping etc) of the NPIV host.
> 
> You'd also need "target add" and "target removed" events.  At this
> point, this looks a lot less virtio-scsi and a lot more like virtio-fc
> (with a 'cooked' FCP-based format of its own).
> 
Yeah, that's a long shot indeed.

> At this point, I can think of several ways  to do this, one being SG_IO
> in QEMU while the other are more exoteric.
> 
> 1) use virtio-scsi with userspace passthrough (current solution).
> 
> Advantages:
> - guests can be stopped/restarted across hosts with different HBAs
> - completely oblivious to host HBA driver
> - no new guest drivers are needed (well, almost due to above issues)
> - out-of-the-box support for live migration, albeit with hacks required
> such as Hyper-V's two WWNN/WWPN pairs
> 
> Disadvantages:
> - no full FCP support
> - guest devices exposed as /dev nodes to the host
> 
> 
> 2) the exact opposite: use the recently added "mediated device
> passthrough" (mdev) framework to present a "fake" PCI device to the
> guest.  mdev is currently used for vGPU and will also be used by s390
> for CCW passthrough.  It lets the host driver take care of device
> emulation, and the result is similar to an SR-IOV virtual function but
> without requiring SR-IOV in the host.  The PCI device would presumably
> reuse in the guest the same driver as the host.
> 
> Advantages:
> - no new guest drivers are needed
> - solution confined entirely within the host driver
> - each driver can use its own native 'cooked' format for FC frames
> 
> Disadvantages:
> - specific to each HBA driver
> - guests cannot be stopped/restarted across hosts with different HBAs
> - it's still device passthrough, so live migration is a mess (and would
> require guest-specific code in QEMU)
> 
> 
> 3) handle passthrough with a kernel driver.  Under this model, the guest
> uses the virtio device, but the passthrough of commands and TMFs is
> performed by the host driver.  The host driver grows the option to
> present an NPIV vport through a vhost interface (*not* the same thing as
> LIO's vhost-scsi target, but a similar API with a different /dev node or
> even one node per scsi_host).
> 
> We can then choose whether to do it with virtio-scsi or with a new
> virtio-fc.
> 
> Advantages:
> - guests can be stopped/restarted across hosts with different HBAs
> - no need to do the "two WWNN/WWPN pairs" hack for live migration,
> unlike e.g. Hyper-V
> - a bit Rube Goldberg, but the vhost interface can be consumed by any
> userspace program, not just by virtual machines
> 
> Disadvantages:
> - requires a new generalized vhost-scsi (or vhost-fc) layer
> - not sure about support for live migration (what to do about in-flight
> commands?)
> 
> I don't know the Linux code well enough to know if it would require code
> specific to each HBA driver.  Maybe just some refactoring.
> 
> 
> 4) same as (3), but in userspace with a "macvtap" like layer (e.g.,
> socket+bind creates an NPIV vport).  This layer can work on some kind of
> FCP encapsulation, not the raw thing, and virtio-fc could be designed
> according to a similar format for simplicity.
> 
> Advantages:
> - less dependencies on kernel code
> - simplest for live migration
> - most flexible for userspace usage
> 
> Disadvantages:
> - possibly two packs of cats to herd (SCSI + networking)?
> - haven't thought much about it, so I'm not sure about the feasibility
> 
> Again, I don't know the Linux code well enough to know if it would
> require code specific to each HBA driver.
> 
> 
> If we can get the hardware manufacturers (and the SCSI maintainers...)
> on board, (3) would probably be pretty easy to achieve, even accounting
> for the extra complication of writing a virtio-fc specification.  Really
> just one hardware manufacturer, the others would follow suit.
> 
With option (1) and the target/initiator ID extensions we should be able
to get basic NPIV support to work, and would even be able to handle
reservations in a sane manner.

(4) would require raw FCP frame access, which is one thing we do _not_
have. Each card (except for the pure FCoE ones like bnx2fc, fnic, and
fcoe) only allows access to pre-formatted I/O commands, and each has its
own mechanism for generating sequence IDs etc. So anything requiring raw FCP
access is basically out of the game.

(3) would be feasible, as it would effectively mean 'just' updating the
current NPIV mechanism. However, this would essentially lock us in to
FC; any other types (think NVMe) would require yet another solution.

(2) sounds interesting, but I'd have to have a look into the code to
figure out if it could easily be done.

> (2) would probably be what the manufacturers like best, but it would be
> worse for lock in.  Or... they would like it best *because* it would be
> worse for lock in.
> 
> The main disadvantage of (2)/(3) against (1) is more complex testing.  I
> guess we can add a vhost-fc target for testing to LIO, so as not to
> require an FC card for guest development.  And if it is still a problem
> 'cause configfs requires root, we can add a fake FC target in QEMU.
> 
Overall, I would vote to specify a new virtio scsi format _first_,
keeping in mind all of these options.
(1), (3), and (4) all require an update anyway :-)

The big advantage I see with (1) is that it can be added with just some
code changes to qemu and virtio-scsi. Every other option requires some
vendor buy-in, which inevitably leads to more discussions, delays, and
more complex interaction (changes to qemu, virtio, _and_ the affected HBAs).

While we're at it: We also need a 'timeout' field in the virtio request
structure. I even posted an RFC for it :-)

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		   Teamlead Storage & Networking
hare@suse.de			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)


* Re: [Qemu-devel] 答复: Re: [RFC] virtio-fc: draft idea of virtual fibre channel HBA
  2017-05-16  6:34               ` Hannes Reinecke
@ 2017-05-16  8:19                 ` Paolo Bonzini
  2017-05-16 15:22                   ` Hannes Reinecke
  0 siblings, 1 reply; 15+ messages in thread
From: Paolo Bonzini @ 2017-05-16  8:19 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: Lin Ma, Stefan Hajnoczi, Zhiqiang Zhou, Fam Zheng, qemu-devel,
	stefanha, mst


> Maybe a union with an overall size of 256 byte (to hold the iSCSI iqn
> string), which for FC carries the WWPN and the WWNN?

That depends on how you would like to do controller passthrough in
general.  iSCSI doesn't have the 64-bit target ID, and doesn't have
(AFAIK) hot-plug/hot-unplug support, so it's less important than FC.

> > 2) If the initiator ID is the moral equivalent of a MAC address,
> > shouldn't it be the host that provides the initiator ID to the host in
> > the virtio-scsi config space?  (From your proposal, I'd guess it's the
> > latter, but maybe I am not reading correctly).
> 
> That would be dependent on the emulation. For emulated SCSI disk I guess
> we need to specify it in the commandline somewhere, but for scsi
> passthrough we could grab it from the underlying device.

Wait, that would be the target ID.  The initiator ID would be the NPIV
vport's WWNN/WWPN.  It could be specified on the QEMU command line, or
it could be tied to some file descriptor (created and initialized by
libvirt, which has CAP_SYS_ADMIN, and then passed to QEMU; similar to
tap file descriptors).

> >> b) stop exposing the devices attached to that NPIV host to the guest
> > 
> > What do you mean exactly?
> > 
> That's one of the longer term plans I have.
> When doing NPIV currently all devices from the NPIV host appear on the
> host. Including all partitions, LVM devices and what not. [...]
> If we make the (guest) initiator ID identical to the NPIV WWPN we can
> tag the _host_ to not expose any partitions on any LUNs, making the
> above quite easy.

Yes, definitely.

> > At this point, I can think of several ways  to do this, one being SG_IO
> > in QEMU while the other are more exoteric.
> > 
> > 1) use virtio-scsi with userspace passthrough (current solution).
> 
> With option (1) and the target/initiator ID extensions we should be able
> to get basic NPIV support to work, and would even be able to handle
> reservations in a sane manner.

Agreed, but I'm no longer so sure that the advantages outweigh the
disadvantages.  Also, let's add the lack of FC-NVMe support to the
disadvantages.

> > 2) the exact opposite: use the recently added "mediated device
> > passthrough" (mdev) framework to present a "fake" PCI device to the
> > guest.
> 
> (2) sounds interesting, but I'd have to have a look into the code to
> figure out if it could easily be done.

Not that easy, but it's the bread and butter of the hardware manufacturers.
If we want them to do it alone, (2) is the way.  Both nVidia and Intel are
using it.

> > 3) handle passthrough with a kernel driver.  Under this model, the guest
> > uses the virtio device, but the passthrough of commands and TMFs is
> > performed by the host driver.
> > 
> > We can then choose whether to do it with virtio-scsi or with a new
> > virtio-fc.
>
> (3) would be feasible, as it would effectively mean 'just' to update the
> current NPIV mechanism. However, this would essentially lock us in for
> FC; any other types (think NVMe) will require yet another solution.

An FC-NVMe driver could also expose the same vhost interface, couldn't it?
FC-NVMe doesn't have to share the Linux code; but sharing the virtio standard
and the userspace ABI would be great.

In fact, the main advantage of virtio-fc would be that (if we define it properly)
it could be reused for FC-NVMe instead of having to extend e.g. virtio-blk.
For example virtio-scsi has request, to-device payload, response, from-device
payload.  virtio-fc's request format could be the initiator and target port
identifiers, followed by FCP_CMD, to-device payload, FCP_RSP, from-device
payload.
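
i.e., in the same spec-style notation, something along these lines (a
sketch only; FCP_CMND/FCP_RSP are the standard FCP IUs, everything else
is illustrative):

struct virtio_fc_req {
        /* device-readable part */
        u8 initiator_id[8];     /* port name of the (v)port issuing the command */
        u8 target_id[8];        /* port name of the remote target port */
        u8 fcp_cmnd[];          /* FCP_CMND IU: LUN, task attributes, CDB */
        u8 dataout[];           /* to-device payload */
        /* device-writable part */
        u8 fcp_rsp[];           /* FCP_RSP IU: status, residual, sense */
        u8 datain[];            /* from-device payload */
};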

> > 4) same as (3), but in userspace with a "macvtap" like layer (e.g.,
> > socket+bind creates an NPIV vport).  This layer can work on some kind of
> > FCP encapsulation, not the raw thing, and virtio-fc could be designed
> > according to a similar format for simplicity.
>
> (4) would require raw FCP frame access, which is one thing we do _not_
> have. Each card (except for the pure FCoE ones like bnx2fc, fnic, and
> fcoe) only allows access to pre-formatted I/O commands. And has it's own
> mechanism for generatind sequence IDs etc. So anything requiring raw FCP
> access is basically out of the game.

Not raw.  It could even be defined at the exchange level (plus some special
things for discovery and login services).  But I agree that (4) is a bit
pie-in-the-sky.

> Overall, I would vote to specify a new virtio scsi format _first_,
> keeping in mind all of these options.
> (1), (3), and (4) all require an update anyway :-)
> 
> The big advantage I see with (1) is that it can be added with just some
> code changes to qemu and virtio-scsi. Every other option require some
> vendor buy-in, which inevitably leads to more discussions, delays, and
> more complex interaction (changes to qemu, virtio, _and_ the affected HBAs).

I agree.  But if we have to reinvent everything in a couple years for
NVMe over fabrics, maybe it's not worth it.

> While we're at it: We also need a 'timeout' field to the virtion request
> structure. I even posted an RFC for it :-)

Yup, I've seen it. :)

Paolo


* Re: [Qemu-devel] 答复: Re: [RFC] virtio-fc: draft idea of virtual fibre channel HBA
  2017-05-16  8:19                 ` Paolo Bonzini
@ 2017-05-16 15:22                   ` Hannes Reinecke
  2017-05-16 16:22                     ` Paolo Bonzini
  0 siblings, 1 reply; 15+ messages in thread
From: Hannes Reinecke @ 2017-05-16 15:22 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Lin Ma, Stefan Hajnoczi, Zhiqiang Zhou, Fam Zheng, qemu-devel,
	stefanha, mst

On 05/16/2017 10:19 AM, Paolo Bonzini wrote:
> 
>> Maybe a union with an overall size of 256 byte (to hold the iSCSI iqn
>> string), which for FC carries the WWPN and the WWNN?
> 
> That depends on how you would like to do controller passthrough in
> general.  iSCSI doesn't have the 64-bit target ID, and doesn't have
> (AFAIK) hot-plug/hot-unplug support, so it's less important than FC.
> 
iSCSI has its 'iqn' string, which is defined to be a 256-byte string.
Hence the number :-)
And if we're updating virtio anyway, we might as well update it to carry
_all_ possible SCSI IDs.

>>> 2) If the initiator ID is the moral equivalent of a MAC address,
>>> shouldn't it be the host that provides the initiator ID to the host in
>>> the virtio-scsi config space?  (From your proposal, I'd guess it's the
>>> latter, but maybe I am not reading correctly).
>>
>> That would be dependent on the emulation. For emulated SCSI disk I guess
>> we need to specify it in the commandline somewhere, but for scsi
>> passthrough we could grab it from the underlying device.
> 
> Wait, that would be the target ID.  The initiator ID would be the NPIV
> vport's WWNN/WWPN.  It could be specified on the QEMU command line, or
> it could be tied to some file descriptor (created and initialized by
> libvirt, which has CAP_SYS_ADMIN, and then passed to QEMU; similar to
> tap file descriptors).
> 
No, I do mean the initiator ID.
If we allow qemu to specify an initiator ID, qemu could find the host
NPIV instance and expose that ID via virtio to the guest.
Or we could specify the NPIV host for qemu, and qemu does the magic
internally.

>>>> b) stop exposing the devices attached to that NPIV host to the guest
>>>
>>> What do you mean exactly?
>>>
>> That's one of the longer term plans I have.
>> When doing NPIV currently all devices from the NPIV host appear on the
>> host. Including all partitions, LVM devices and what not. [...]
>> If we make the (guest) initiator ID identical to the NPIV WWPN we can
>> tag the _host_ to not expose any partitions on any LUNs, making the
>> above quite easy.
> 
> Yes, definitely.
> 
>>> At this point, I can think of several ways  to do this, one being SG_IO
>>> in QEMU while the other are more exoteric.
>>>
>>> 1) use virtio-scsi with userspace passthrough (current solution).
>>
>> With option (1) and the target/initiator ID extensions we should be able
>> to get basic NPIV support to work, and would even be able to handle
>> reservations in a sane manner.
> 
> Agreed, but I'm not anymore that sure that the advantages outweigh the
> disadvantages.  Also, let's add no FC-NVMe support to the disadvantages.
> 
>>> 2) the exact opposite: use the recently added "mediated device
>>> passthrough" (mdev) framework to present a "fake" PCI device to the
>>> guest.
>>
>> (2) sounds interesting, but I'd have to have a look into the code to
>> figure out if it could easily be done.
> 
> Not that easy, but it's the bread and butter of the hardware manufacturers.
> If we want them to do it alone, (2) is the way.  Both nVidia and Intel are
> using it.
> 
>>> 3) handle passthrough with a kernel driver.  Under this model, the guest
>>> uses the virtio device, but the passthrough of commands and TMFs is
>>> performed by the host driver.
>>>
>>> We can then choose whether to do it with virtio-scsi or with a new
>>> virtio-fc.
>>
>> (3) would be feasible, as it would effectively mean 'just' to update the
>> current NPIV mechanism. However, this would essentially lock us in for
>> FC; any other types (think NVMe) will require yet another solution.
> 
> An FC-NVMe driver could also expose the same vhost interface, couldn't it?
> FC-NVMe doesn't have to share the Linux code; but sharing the virtio standard
> and the userspace ABI would be great.
> 
> In fact, the main advantage of virtio-fc would be that (if we define it properly)
> it could be reused for FC-NVMe instead of having to extend e.g. virtio-blk.
> For example virtio-scsi has request, to-device payload, response, from-device
> payload.  virtio-fc's request format could be the initiator and target port
> identifiers, followed by FCP_CMD, to-device payload, FCP_RSP, from-device
> payload.
> 
As already said: We do _not_ have access to the FCP frames.
So designing a virtio-fc protocol will only work for libfc-based HBAs,
namely fnic, bnx2fc, and fcoe.
Given that the future of FCoE is somewhat unclear I doubt it's a good
idea to restrict ourselves to that.

>>> 4) same as (3), but in userspace with a "macvtap" like layer (e.g.,
>>> socket+bind creates an NPIV vport).  This layer can work on some kind of
>>> FCP encapsulation, not the raw thing, and virtio-fc could be designed
>>> according to a similar format for simplicity.
>>
>> (4) would require raw FCP frame access, which is one thing we do _not_
>> have. Each card (except for the pure FCoE ones like bnx2fc, fnic, and
>> fcoe) only allows access to pre-formatted I/O commands. And has it's own
>> mechanism for generatind sequence IDs etc. So anything requiring raw FCP
>> access is basically out of the game.
> 
> Not raw.  It could even be defined at the exchange level (plus some special
> things for discovery and login services).  But I agree that (4) is a bit
> pie-in-the-sky.
> 
>> Overall, I would vote to specify a new virtio scsi format _first_,
>> keeping in mind all of these options.
>> (1), (3), and (4) all require an update anyway :-)
>>
>> The big advantage I see with (1) is that it can be added with just some
>> code changes to qemu and virtio-scsi. Every other option require some
>> vendor buy-in, which inevitably leads to more discussions, delays, and
>> more complex interaction (changes to qemu, virtio, _and_ the affected HBAs).
> 
> I agree.  But if we have to reinvent everything in a couple years for
> NVMe over fabrics, maybe it's not worth it.
> 
>> While we're at it: We also need a 'timeout' field to the virtion request
>> structure. I even posted an RFC for it :-)
> 
> Yup, I've seen it. :)
> 
Cool. Thanks.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		   Teamlead Storage & Networking
hare@suse.de			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)


* Re: [Qemu-devel] 答复: Re: [RFC] virtio-fc: draft idea of virtual fibre channel HBA
  2017-05-16 15:22                   ` Hannes Reinecke
@ 2017-05-16 16:22                     ` Paolo Bonzini
  2017-05-17  6:01                       ` Hannes Reinecke
  0 siblings, 1 reply; 15+ messages in thread
From: Paolo Bonzini @ 2017-05-16 16:22 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: Lin Ma, Stefan Hajnoczi, Zhiqiang Zhou, Fam Zheng, qemu-devel,
	stefanha, mst

Pruning to sort out the basic disagreements.

On 16/05/2017 17:22, Hannes Reinecke wrote:
>> That depends on how you would like to do controller passthrough in
>> general.  iSCSI doesn't have the 64-bit target ID, and doesn't have
>> (AFAIK) hot-plug/hot-unplug support, so it's less important than FC.
>>
> iSCSI has its 'iqn' string, which is defined to be a 256-byte string.
> Hence the number 
> And if we're updating virtio anyway, we could as well update it to carry
> _all_ possible scsi IDs.

Yes, but one iSCSI connection maps to one initiator and target IQN.
It's not like FC where each frame can specify its own initiator ID.

>>> (3) would be feasible, as it would effectively mean 'just' to update the
>>> current NPIV mechanism. However, this would essentially lock us in for
>>> FC; any other types (think NVMe) will require yet another solution.
>> An FC-NVMe driver could also expose the same vhost interface, couldn't it?
>> FC-NVMe doesn't have to share the Linux code; but sharing the virtio standard
>> and the userspace ABI would be great.
>>
>> In fact, the main advantage of virtio-fc would be that (if we define it properly)
>> it could be reused for FC-NVMe instead of having to extend e.g. virtio-blk.
>> For example virtio-scsi has request, to-device payload, response, from-device
>> payload.  virtio-fc's request format could be the initiator and target port
>> identifiers, followed by FCP_CMD, to-device payload, FCP_RSP, from-device
>> payload.
>>
> As already said: We do _not_ have access to the FCP frames.
> So designing a virtio-fc protocol will only work for libfc-based HBAs,
> namely fnic, bnx2fc, and fcoe.
> Given that the future of FCoE is somewhat unclear I doubt it's a good
> idea to restrict ourselves to that.

I understand that.  It doesn't have to be a 1:1 match with FCP frames or
even IUs; in fact the above minimal example is not (no OXID/RXID and no
FCP_XFER_RDY IU are just the first two things that come to mind).

The only purpose is to have a *transport* that is (roughly speaking)
flexible enough to support future NPIV use cases, which will certainly
include FC-NVMe.  In other words: I'm inventing my own cooked FCP
format, but I might as well base it on FCP itself as much as possible.

Likewise, I'm not going to even mention ELS, but we would need _some_
kind of protocol to query name servers, receive state change
notifications, and get service parameters.  No idea yet how to do that,
probably something similar to virtio-scsi control and event queues, but
why not make the requests/responses look a little like PLOGI and PRLI?
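
e.g. a control-queue request whose header names the fabric service and
whose payload borrows the ELS layout; the struct below is entirely made
up, just to show the shape:

struct virtio_fc_ctrl_req {
        /* device-readable part */
        le32 type;              /* hypothetical: VIRTIO_FC_T_PLOGI, _PRLI,
                                   _SCR, _NS_QUERY, ... */
        u8   port_id[8];        /* port the request is issued on behalf of */
        u8   payload[];         /* ELS-like request payload */
        /* device-writable part */
        le32 status;
        u8   response[];        /* ELS-like accept/reject payload */
};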

Thanks,

Paolo


* Re: [Qemu-devel] 答复: Re: [RFC] virtio-fc: draft idea of virtual fibre channel HBA
  2017-05-16 16:22                     ` Paolo Bonzini
@ 2017-05-17  6:01                       ` Hannes Reinecke
  2017-05-17  7:33                         ` Paolo Bonzini
  0 siblings, 1 reply; 15+ messages in thread
From: Hannes Reinecke @ 2017-05-17  6:01 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Lin Ma, Stefan Hajnoczi, Zhiqiang Zhou, Fam Zheng, qemu-devel,
	stefanha, mst

On 05/16/2017 06:22 PM, Paolo Bonzini wrote:
> Pruning to sort out the basic disagreements.
> 
> On 16/05/2017 17:22, Hannes Reinecke wrote:
>>> That depends on how you would like to do controller passthrough in
>>> general.  iSCSI doesn't have the 64-bit target ID, and doesn't have
>>> (AFAIK) hot-plug/hot-unplug support, so it's less important than FC.
>>>
>> iSCSI has its 'iqn' string, which is defined to be a 256-byte string.
>> Hence the number 
>> And if we're updating virtio anyway, we could as well update it to carry
>> _all_ possible scsi IDs.
> 
> Yes, but one iSCSI connection maps to one initiator and target IQN.
> It's not like FC where each frame can specify its own initiator ID.
> 
Sure. But updating the format to hold _any_ SCSI Name would allow us to
reflect the actual initiator port name used by the host.
So the guest could be
>>>> (3) would be feasible, as it would effectively mean 'just' to update the
>>>> current NPIV mechanism. However, this would essentially lock us in for
>>>> FC; any other types (think NVMe) will require yet another solution.
>>> An FC-NVMe driver could also expose the same vhost interface, couldn't it?
>>> FC-NVMe doesn't have to share the Linux code; but sharing the virtio standard
>>> and the userspace ABI would be great.
>>>
>>> In fact, the main advantage of virtio-fc would be that (if we define it properly)
>>> it could be reused for FC-NVMe instead of having to extend e.g. virtio-blk.
>>> For example virtio-scsi has request, to-device payload, response, from-device
>>> payload.  virtio-fc's request format could be the initiator and target port
>>> identifiers, followed by FCP_CMD, to-device payload, FCP_RSP, from-device
>>> payload.
>>>
>> As already said: We do _not_ have access to the FCP frames.
>> So designing a virtio-fc protocol will only work for libfc-based HBAs,
>> namely fnic, bnx2fc, and fcoe.
>> Given that the future of FCoE is somewhat unclear I doubt it's a good
>> idea to restrict ourselves to that.
> 
> I understand that.  It doesn't have to be a 1:1 match with FCP frames or
> even IUs; in fact the above minimal example is not (no OXID/RXID and no
> FCP_XFER_RDY IU are just the first two things that come to mind).
> 
> The only purpose is to have a *transport* that is (roughly speaking)
> flexible enough to support future NPIV usecases which will certainly
> include FC-NVMe.  In other words: I'm inventing my own cooked FCP
> format, but I might as well base it on FCP itself as much as possible.
> 
Weeelll ... I don't want to go into nit-picking here, but FC-NVMe is
_NOT_ FCP. In fact, it's a different FC-4 provider with its own set of
FC-4 commands etc.

> Likewise, I'm not going to even mention ELS, but we would need _some_
> kind of protocol to query name servers, receive state change
> notifications, and get service parameters.  No idea yet how to do that,
> probably something similar to virtio-scsi control and event queues, but
> why not make the requests/responses look a little like PLOGI and PRLI?
> 
And my idea here is to keep virtio-scsi as the basic mode of (command)
transfer, but add a set of transport management commands which would
allow us to do things like:
- port discovery / scan
- port instantiation / login
- port reset
- transport link notification / status check
- transport reset

Those could be defined transport independently; and the neat thing is
they could even be made to work with the current NPIV implementation
with some tooling.
And we could define things such that all current transport protocols can
be mapped onto it.
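
As a strawman, those could simply become control-queue request types;
the names and codes below are invented for illustration only:

/* hypothetical transport-management requests for the control queue,
 * defined transport-independently as described above */
#define VIRTIO_SCSI_T_PORT_SCAN         0x10    /* port discovery / scan */
#define VIRTIO_SCSI_T_PORT_LOGIN        0x11    /* port instantiation / login */
#define VIRTIO_SCSI_T_PORT_RESET        0x12    /* port reset */
#define VIRTIO_SCSI_T_LINK_STATUS       0x13    /* link notification / status check */
#define VIRTIO_SCSI_T_XPORT_RESET       0x14    /* transport reset */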

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		   Teamlead Storage & Networking
hare@suse.de			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)


* Re: [Qemu-devel] 答复: Re: [RFC] virtio-fc: draft idea of virtual fibre channel HBA
  2017-05-17  6:01                       ` Hannes Reinecke
@ 2017-05-17  7:33                         ` Paolo Bonzini
  0 siblings, 0 replies; 15+ messages in thread
From: Paolo Bonzini @ 2017-05-17  7:33 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: Lin Ma, Stefan Hajnoczi, Zhiqiang Zhou, Fam Zheng, qemu-devel,
	stefanha, mst



On 17/05/2017 08:01, Hannes Reinecke wrote:
> On 05/16/2017 06:22 PM, Paolo Bonzini wrote:
>> On 16/05/2017 17:22, Hannes Reinecke wrote:
>>> iSCSI has its 'iqn' string, which is defined to be a 256-byte string.
>>> Hence the number 
>>> And if we're updating virtio anyway, we could as well update it to carry
>>> _all_ possible scsi IDs.
>>
>> Yes, but one iSCSI connection maps to one initiator and target IQN.
>> It's not like FC where each frame can specify its own initiator ID.
>>
> Sure. But updating the format to hold _any_ SCSI Name would allow us to
> reflect the actual initiator port name used by the host.
> So the guest could be

... aware of it for things such as PERSISTENT RESERVE IN?

>>>>> (3) would be feasible, as it would effectively mean 'just' to update the
>>>>> current NPIV mechanism. However, this would essentially lock us in for
>>>>> FC; any other types (think NVMe) will require yet another solution.
>>>> An FC-NVMe driver could also expose the same vhost interface, couldn't it?
>>>> FC-NVMe doesn't have to share the Linux code; but sharing the virtio standard
>>>> and the userspace ABI would be great.
>>>>
>>>> In fact, the main advantage of virtio-fc would be that (if we define it properly)
>>>> it could be reused for FC-NVMe instead of having to extend e.g. virtio-blk.
>>>> For example virtio-scsi has request, to-device payload, response, from-device
>>>> payload.  virtio-fc's request format could be the initiator and target port
>>>> identifiers, followed by FCP_CMD, to-device payload, FCP_RSP, from-device
>>>> payload.
>>>>
>>> As already said: We do _not_ have access to the FCP frames.
>>> So designing a virtio-fc protocol will only work for libfc-based HBAs,
>>> namely fnic, bnx2fc, and fcoe.
>>> Given that the future of FCoE is somewhat unclear I doubt it's a good
>>> idea to restrict ourselves to that.
>>
>> I understand that.  It doesn't have to be a 1:1 match with FCP frames or
>> even IUs; in fact the above minimal example is not (no OXID/RXID and no
>> FCP_XFER_RDY IU are just the first two things that come to mind).
>>
>> The only purpose is to have a *transport* that is (roughly speaking)
>> flexible enough to support future NPIV usecases which will certainly
>> include FC-NVMe.  In other words: I'm inventing my own cooked FCP
>> format, but I might as well base it on FCP itself as much as possible.
>
> Weeelll ... I don't want to go into nit-picking here, but FC-NVMe is
> _NOT_ FCP. In fact, it's a different FC-4 provider with its own set of
> FC-4 commands etc.

Yes, but it reuses the IU format and the overall look of the exchange.
It's not FCP, but it looks and quacks very much like it AFAIU.

>> Likewise, I'm not going to even mention ELS, but we would need _some_
>> kind of protocol to query name servers, receive state change
>> notifications, and get service parameters.  No idea yet how to do that,
>> probably something similar to virtio-scsi control and event queues, but
>> why not make the requests/responses look a little like PLOGI and PRLI?
>>
> And my idea here is to keep virtio-scsi as the basic mode of (command)
> transfer, but add a set of transport management commands which would
> allow us to do things like:
> - port discovery / scan
> - port instantiation / login
> - port reset
> - transport link notification / status check
> - transport reset
> 
> Those could be defined transport independently; and the neat thing is
> they could even be made to work with the current NPIV implementation
> with some tooling.
> And we could define things such that all current transport protocols can
> be mapped onto it.

Okay, got it.  So some kind of virtio-scsi 2.0.  I think we should weigh
the two proposals.  Would yours be useful for anything except NPIV (e.g.
the iSCSI + persistent reservations case)?  What software would use it?
And please speak up loudly if I'm completely mistaken about FC-NVMe!

Thanks,

Paolo

