* [SPDK] virtio-vhost-user with virtio-scsi: end-to-end setup
@ 2018-09-20 22:06 Nikos Dragazis
  2018-09-21  7:31 ` Wodkowski, PawelX
  0 siblings, 1 reply; 4+ messages in thread
From: Nikos Dragazis @ 2018-09-20 22:06 UTC (permalink / raw)
  To: spdk

Dear Stefan,

I hope you are the right person to contact about this; if not, please
point me in the right direction. I am also Cc:ing the SPDK and
qemu-devel mailing lists to solicit community feedback.

As part of my internship at Arrikto, I have spent the last few months
working on the SPDK vhost target application. I was inspired by the
“VirtioVhostUser” feature you proposed for QEMU
[https://wiki.qemu.org/Features/VirtioVhostUser] and made it my end goal
to have an end-to-end system running, where a slave VM offers storage to
a master VM over vhost-user and exposes an underlying SCSI block device.
My current approach is to use virtio-scsi-based storage inside the slave
VM.

I see that you have managed to move the vhost-user backend inside a VM
over a virtio-vhost-user transport. I have experimented with running the
SPDK vhost app over vhost-user, but have run into quite a few problems
with the virtio-pci driver. Apologies in advance for the rather lengthy
email. I would definitely value any short-term hints you may have, as
well as any longer-term feedback you may offer on my general direction.

My current state is:

I started with your DPDK code at
https://github.com/stefanha/dpdk/tree/virtio-vhost-user, and read about
your effort to integrate the DPDK vhost-scsi application with
virtio-vhost-user, here:
http://mails.dpdk.org/archives/dev/2018-January/088155.html

My initial approach was to replicate your work, but with the SPDK vhost
library running over virtio-vhost-user. I have pushed all of my code to
the following repository; it is still a WIP and I really need to tidy up
the commits:

https://bitbucket.org/ndragazis/spdk.git

Hacks I had to do:
- I use the modified script usertools/dpdk-devbind.py found in your DPDK
  repository here: https://github.com/stefanha/dpdk to bind the
  virtio-vhost-user device to the vfio-pci kernel driver. The SPDK setup
  script in scripts/setup.sh does not handle unclassified devices like
  the virtio-vhost-user device. I plan to fix this later.
- I pass the PCI address of the virtio-vhost-user device to the vhost
  library, by repurposing the existing -S option; it no longer refers to
  the UNIX socket, as in the case of the UNIX transport. This means the
  virtio-vhost-user transport is hardcoded and not configurable by the
  user. I plan to fix this later.
- I copied your code that implements the virtio-vhost-user transport and
  made the necessary changes to abstract the transport implementation.
  I also copied the virtio-pci code from DPDK rte_vhost into the SPDK
  vhost library, so the virtio-vhost-user driver could use it. I saw
  this is what you did as a quick hack to make the DPDK vhost-scsi
  application handle the virtio-vhost-user device.

Having done that, I tried to demo my integration end-to-end, and
everything worked fine with a Malloc block device, but things broke
when I switched to a virtio-scsi block device inside the slave. My
attempts to call construct_vhost_scsi_controller failed with an I/O
error. Here is the log:

-- cut here --
$ export VVU_DEVICE="0000:00:06.0"
$ sudo modprobe vfio enable_unsafe_noiommu_mode=1
$ sudo modprobe vfio-pci
$ sudo ./dpdk-devbind.py -b vfio-pci $VVU_DEVICE
$ cd spdk
$ sudo scripts/setup.sh
Active mountpoints on /dev/vda, so not binding PCI dev 0000:00:04.0
0000:00:05.0 (1af4 1004): virtio-pci -> vfio-pci
$ sudo app/vhost/vhost -S "$VVU_DEVICE" -m 0x3 &
[1] 3917
$ Starting SPDK v18.07-pre / DPDK 18.02.0 initialization...
[ DPDK EAL parameters: vhost -c 0x3 -m 1024 --file-prefix=spdk_pid3918 ]
EAL: Multi-process socket /var/run/.spdk_pid3918_unix
EAL: Probing VFIO support...
EAL: VFIO support initialized
EAL: WARNING: cpu flags constant_tsc=yes nonstop_tsc=no -> using unreliable clock cycles !
EAL: PCI device 0000:00:06.0 on NUMA socket -1
EAL:   Invalid NUMA socket, default to 0
EAL:   probe driver: 1af4:1017 virtio_vhost_user
EAL:   using IOMMU type 8 (No-IOMMU)
EAL: Ignore mapping IO port bar(0)
VIRTIO_PCI_CONFIG: found modern virtio pci device.
VIRTIO_PCI_CONFIG: modern virtio pci detected.
VHOST_CONFIG: Added virtio-vhost-user device at 0000:00:06.0

$ sudo scripts/rpc.py construct_virtio_pci_scsi_bdev 0000:00:05.0 VirtioScsi0
EAL: PCI device 0000:00:05.0 on NUMA socket -1
EAL:   Invalid NUMA socket, default to 0
EAL:   probe driver: 1af4:1004 spdk_virtio
EAL: Ignore mapping IO port bar(0)
[
  "VirtioScsi0t0"
]

$ sudo scripts/rpc.py construct_vhost_scsi_controller --cpumask 0x1 vhost.0
VHOST_CONFIG: BAR 2 not availabled
Got JSON-RPC error response
request:
{
  "params": {
    "cpumask": "0x1",
    "ctrlr": "vhost.0"
  },
  "jsonrpc": "2.0",
  "method": "construct_vhost_scsi_controller",
  "id": 1
}
response:
{
  "message": "Input/output error",
  "code": -32602
}
-- cut here --

This was really painful to debug. Yesterday I finally found the cause.
I had bumped into this DPDK bug:

https://bugs.dpdk.org/show_bug.cgi?id=85

and I worked around it, essentially by short-circuiting the point where
the DPDK runtime rescans the PCI bus and corrupts the
dev->mem_resource[] field for the already-mapped-in-userspace
virtio-vhost-user PCI device. I just commented out this line:

https://github.com/spdk/dpdk/blob/08332d13b3a66cb1a8c3a184def76b039052d676/drivers/bus/pci/linux/pci.c#L355

This seems to be a good enough workaround for now. I’m not sure whether
this bug has been fixed; I will comment on the DPDK bugzilla.
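
The failure mode described above can be pictured as a rescan that copies
freshly scanned resource entries, whose userspace address is still unset,
over a device entry that has already been mapped. The sketch below is
only an illustration of that idea under stated assumptions: the types
and the function fake_rescan_update() are hypothetical stand-ins, not
the DPDK structures, and this is not the fix that DPDK adopted. It shows
one possible alternative to commenting the update out, namely refreshing
the entry while keeping any mapping that already exists.

```c
#include <stddef.h>

#define FAKE_MAX_RESOURCE 6   /* hypothetical number of BARs tracked per device */

/* Hypothetical stand-in for one BAR entry of a PCI device. */
struct fake_mem_resource {
    unsigned long phys_addr;  /* physical address reported by the scan */
    unsigned long len;        /* length of the region */
    void *addr;               /* userspace mapping, NULL while unmapped */
};

/* Hypothetical stand-in for a PCI device known to the runtime. */
struct fake_pci_device {
    struct fake_mem_resource mem_resource[FAKE_MAX_RESOURCE];
};

/*
 * Refresh 'known' from a freshly scanned copy of the same device.
 * The scanned copy never carries a userspace mapping, so an existing
 * mapping in 'known' is preserved instead of being overwritten.
 */
void fake_rescan_update(struct fake_pci_device *known,
                        const struct fake_pci_device *scanned)
{
    for (size_t i = 0; i < FAKE_MAX_RESOURCE; i++) {
        void *mapped = known->mem_resource[i].addr;

        known->mem_resource[i] = scanned->mem_resource[i];
        if (mapped != NULL) {
            known->mem_resource[i].addr = mapped;
        }
    }
}
```

With a plain copy in place of the guarded one, the address of an
already-mapped BAR would be reset by the rescan, which is consistent
with the "BAR 2 not availabled" message in the log above.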

But now I have really hit a roadblock. When I run the exact same
commands as shown above, I get a segfault with this backtrace:

-- cut here --

#0  0x000000000046ae42 in spdk_bdev_get_io (channel=0x30) at bdev.c:920
#1  0x000000000046c985 in spdk_bdev_readv_blocks (desc=0x93f8a0, ch=0x0,
    iov=0x7ffff2fb7c88, iovcnt=1, offset_blocks=0, num_blocks=8,
    cb=0x453e1a <spdk_bdev_scsi_task_complete_cmd>, cb_arg=0x7ffff2fb7bc0) at bdev.c:1696
#2  0x000000000046c911 in spdk_bdev_readv (desc=0x93f8a0, ch=0x0, iov=0x7ffff2fb7c88,
    iovcnt=1, offset=0, nbytes=4096, cb=0x453e1a <spdk_bdev_scsi_task_complete_cmd>,
    cb_arg=0x7ffff2fb7bc0) at bdev.c:1680
#3  0x0000000000453fe2 in spdk_bdev_scsi_read (bdev=0x941c80, bdev_desc=0x93f8a0,
    bdev_ch=0x0, task=0x7ffff2fb7bc0, lba=0, len=8) at scsi_bdev.c:1317
#4  0x000000000045462e in spdk_bdev_scsi_readwrite (task=0x7ffff2fb7bc0, lba=0,
    xfer_len=8, is_read=true) at scsi_bdev.c:1477
#5  0x0000000000454c95 in spdk_bdev_scsi_process_block (task=0x7ffff2fb7bc0)
    at scsi_bdev.c:1662
#6  0x00000000004559ce in spdk_bdev_scsi_execute (task=0x7ffff2fb7bc0)
    at scsi_bdev.c:2029
#7  0x00000000004512e4 in spdk_scsi_lun_execute_task (lun=0x93f830, task=0x7ffff2fb7bc0)
    at lun.c:162
#8  0x0000000000450a87 in spdk_scsi_dev_queue_task (dev=0x713c80 <g_devs>,
    task=0x7ffff2fb7bc0) at dev.c:264
#9  0x000000000045ae48 in task_submit (task=0x7ffff2fb7bc0) at vhost_scsi.c:268
#10 0x000000000045c2b8 in process_requestq (svdev=0x7ffff31d9dc0, vq=0x7ffff31d9f40)
    at vhost_scsi.c:649
#11 0x000000000045c4ad in vdev_worker (arg=0x7ffff31d9dc0) at vhost_scsi.c:685
#12 0x00000000004797f2 in _spdk_reactor_run (arg=0x944540) at reactor.c:471
#13 0x0000000000479dad in spdk_reactors_start () at reactor.c:633
#14 0x00000000004783b1 in spdk_app_start (opts=0x7fffffffe390,
    start_fn=0x404df8 <vhost_started>, arg1=0x0, arg2=0x0) at app.c:570
#15 0x0000000000404ec0 in main (argc=7, argv=0x7fffffffe4f8) at vhost.c:115

-- cut here --

I have not yet been able to debug this. It is most probably my bug, but I
am wondering whether there could be a conflict between the two distinct
virtio drivers: (1) the pre-existing one in the SPDK virtio library
under lib/virtio/, and (2) the one I copied into lib/vhost/rte_vhost/ as
part of the vhost library.

I understand that even if I make it work for now, this cannot be a
long-term solution. I would like to re-use the pre-existing virtio-pci
code from the virtio library to support virtio-vhost-user.
Do you see any potential problems in this? Did you change the virtio
code that you placed inside rte_vhost? It seems there are subtle
differences between the two codebases.

These are my short-term issues. In the longer term, I’d be happy to
contribute to VirtioVhostUser development any way I can. I have seen
some TODOs in your QEMU code here:

https://github.com/stefanha/qemu/blob/virtio-vhost-user/hw/virtio/virtio-vhost-user.c

and I would like to contribute, but it’s not obvious to me what
progress you’ve made since.
As an example, I’d love to explore the possibility of adding support for
interrupt-driven vhost-user backends over the virtio-vhost-user
transport.

To summarize:
- I will follow up on the DPDK bug regarding a proposed fix here:
  https://bugs.dpdk.org/show_bug.cgi?id=85
- Any hints on my segfault? I will definitely continue troubleshooting.
- Once I’ve sorted this out, how can I start using a single copy of the
  virtio-pci codebase? I guess I have to make some changes to comply
  with the API and check the dependencies.
- My current plan to contribute towards an IRQ-based implementation of
  the virtio-vhost-user transport would be to use the vhost-user kick
  file descriptors as a trigger to inject virtual interrupts and handle
  them in userspace. The virtio-vhost-user device could use the KVM
  irqfd mechanism for this purpose. I will keep you and the list posted
  on this, and I would appreciate any early feedback you may have.
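
For readers unfamiliar with kick file descriptors, the sketch below
shows only the userspace half of the idea: an eventfd that one side
signals and the other side waits on instead of busy-polling. It is a
minimal illustration on Linux, not part of any SPDK or QEMU code; the
function names kick_fd_create(), kick_fd_signal() and kick_fd_wait() are
hypothetical, and the KVM irqfd side is not modelled at all.

```c
#include <poll.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <sys/types.h>
#include <unistd.h>

/* Create a non-blocking eventfd that plays the role of a kick fd. */
int kick_fd_create(void)
{
    return eventfd(0, EFD_NONBLOCK);
}

/* Signal the fd once, as the driver side would do for a kick. */
int kick_fd_signal(int fd)
{
    uint64_t one = 1;

    return write(fd, &one, sizeof(one)) == (ssize_t)sizeof(one) ? 0 : -1;
}

/*
 * Wait up to timeout_ms for a kick. Returns the number of kicks that
 * accumulated since the last read, 0 on timeout, or -1 on error.
 * Reading the eventfd resets its counter to zero.
 */
int64_t kick_fd_wait(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    uint64_t count = 0;
    int rc = poll(&pfd, 1, timeout_ms);

    if (rc < 0) {
        return -1;
    }
    if (rc == 0) {
        return 0;
    }
    if (read(fd, &count, sizeof(count)) != (ssize_t)sizeof(count)) {
        return -1;
    }
    return (int64_t)count;
}
```

A backend thread could block in kick_fd_wait() and process the
virtqueue only when the call returns a positive count, which is the
behaviour an interrupt-driven backend would replace polling with.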

Looking forward to any comments/feedback/pointers you may have. I am
rather inexperienced with this stuff, but it’s definitely exciting and
I’d love to contribute more to QEMU and SPDK.

Thank you for reading this far,
Nikos

--
Nikos Dragazis
Undergraduate Student
School of Electrical and Computer Engineering
National Technical University of Athens



^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [SPDK] virtio-vhost-user with virtio-scsi: end-to-end setup
@ 2018-09-21  7:31 ` Wodkowski, PawelX
  2018-10-06 18:55     ` Nikos Dragazis
  0 siblings, 1 reply; 4+ messages in thread
From: Wodkowski, PawelX @ 2018-09-21  7:31 UTC (permalink / raw)
  To: spdk

Hi Nikos,

About the SPDK backtrace you got: there is something wrong with IO channel allocation.
SPDK vhost-scsi should check the result of spdk_scsi_dev_allocate_io_channels() in
spdk_vhost_scsi_dev_add_tgt(), but this result is not checked :(
You can add a check or an assert there.
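
A minimal sketch of the kind of check meant here, under stated
assumptions: the types and functions below are hypothetical stand-ins
written for illustration, not the SPDK definitions, and
fake_allocate_io_channels() only models a call that returns nonzero when
no channel could be obtained.

```c
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins for the SPDK types; not the real definitions. */
struct fake_io_channel { int unused; };
struct fake_lun { struct fake_io_channel *io_channel; };

/*
 * Stand-in for the allocation call: stores the channel in the LUN and
 * returns 0 on success, -1 on failure. Passing ch == NULL models a
 * failed allocation.
 */
int fake_allocate_io_channels(struct fake_lun *lun, struct fake_io_channel *ch)
{
    lun->io_channel = ch;
    return ch != NULL ? 0 : -1;
}

/*
 * Setup path that checks the result and refuses to continue, instead of
 * letting a NULL channel reach the IO path later.
 */
int fake_vhost_scsi_setup(struct fake_lun *lun, struct fake_io_channel *ch)
{
    int rc = fake_allocate_io_channels(lun, ch);

    if (rc != 0) {
        fprintf(stderr, "failed to allocate IO channels\n");
        return -EIO;
    }
    return 0;
}
```

Failing early like this turns a later NULL dereference in the IO path
into an error at setup time, which is much easier to diagnose.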

Paweł

> 
> But, now, I have really hit a roadblock. I get a segfault, I run the
> exact same commands as shown above, and end up with this backtrace:
> 
> -- cut here --
> 
> #0  0x000000000046ae42 in spdk_bdev_get_io (channel=0x30) at bdev.c:920
> #1  0x000000000046c985 in spdk_bdev_readv_blocks (desc=0x93f8a0,
> ch=0x0,
>     iov=0x7ffff2fb7c88, iovcnt=1, offset_blocks=0, num_blocks=8,
>     cb=0x453e1a <spdk_bdev_scsi_task_complete_cmd>,
> cb_arg=0x7ffff2fb7bc0) at bdev.c:1696
> #2  0x000000000046c911 in spdk_bdev_readv (desc=0x93f8a0, ch=0x0,
> iov=0x7ffff2fb7c88,
>     iovcnt=1, offset=0, nbytes=4096, cb=0x453e1a
> <spdk_bdev_scsi_task_complete_cmd>,
>     cb_arg=0x7ffff2fb7bc0) at bdev.c:1680
> #3  0x0000000000453fe2 in spdk_bdev_scsi_read (bdev=0x941c80,
> bdev_desc=0x93f8a0,
>     bdev_ch=0x0, task=0x7ffff2fb7bc0, lba=0, len=8) at scsi_bdev.c:1317
> #4  0x000000000045462e in spdk_bdev_scsi_readwrite (task=0x7ffff2fb7bc0,
> lba=0,
>     xfer_len=8, is_read=true) at scsi_bdev.c:1477
> #5  0x0000000000454c95 in spdk_bdev_scsi_process_block
> (task=0x7ffff2fb7bc0)
>     at scsi_bdev.c:1662
> #6  0x00000000004559ce in spdk_bdev_scsi_execute (task=0x7ffff2fb7bc0)
>     at scsi_bdev.c:2029
> #7  0x00000000004512e4 in spdk_scsi_lun_execute_task (lun=0x93f830,
> task=0x7ffff2fb7bc0)
>     at lun.c:162
> #8  0x0000000000450a87 in spdk_scsi_dev_queue_task (dev=0x713c80
> <g_devs>,
>     task=0x7ffff2fb7bc0) at dev.c:264
> #9  0x000000000045ae48 in task_submit (task=0x7ffff2fb7bc0) at
> vhost_scsi.c:268
> #10 0x000000000045c2b8 in process_requestq (svdev=0x7ffff31d9dc0,
> vq=0x7ffff31d9f40)
>     at vhost_scsi.c:649
> #11 0x000000000045c4ad in vdev_worker (arg=0x7ffff31d9dc0) at
> vhost_scsi.c:685
> #12 0x00000000004797f2 in _spdk_reactor_run (arg=0x944540) at
> reactor.c:471
> #13 0x0000000000479dad in spdk_reactors_start () at reactor.c:633
> #14 0x00000000004783b1 in spdk_app_start (opts=0x7fffffffe390,
>     start_fn=0x404df8 <vhost_started>, arg1=0x0, arg2=0x0) at app.c:570
> #15 0x0000000000404ec0 in main (argc=7, argv=0x7fffffffe4f8) at vhost.c:115
> 
> -- cut here --
> 
> I have not yet been able to debug this, it’s most probably my bug, but I
> am wondering whether there could be a conflict between the two distinct
> virtio drivers: (1) the pre-existing one in the SPDK virtio library
> under lib/virtio/, and (2) the one I copied into lib/vhost/rte_vhost/ as
> part of the vhost library.
> 



^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [Qemu-devel] [SPDK] virtio-vhost-user with virtio-scsi: end-to-end setup
@ 2018-10-06 18:55     ` Nikos Dragazis
  0 siblings, 0 replies; 4+ messages in thread
From: Nikos Dragazis @ 2018-10-06 18:55 UTC (permalink / raw)
  To: spdk; +Cc: Vangelis Koukis, Stefan Hajnoczi, qemu-devel

Hi Pawel,

Thank you for your quick reply. I appreciate your help.

I’m sorry for the late response. I am glad to tell you that I have a
working demo at last. I have managed to solve my problem.

You were right about the IO channels. Function
spdk_scsi_dev_allocate_io_channels() fails to allocate the IO channel
for the virtio-scsi bdev target and function spdk_vhost_scsi_start()
fails to verify its return value. My actual segfault was due to a race
on the unique virtio-scsi bdev request queue between the creation and
the destruction of the IO channel in the vhost device backend. This led
to the IO channel pointer lun->io_channel being NULL after the
vhost-user negotiation, and the bdev layer segfaulted when accessing it
in response to an IO request.
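
The backtrace in the first message is consistent with this: frames #1 to
#3 carry ch=0x0 (bdev_ch=0x0), while frame #0 receives channel=0x30,
which is what a small fixed offset added to a NULL pointer looks like.
The sketch below illustrates that arithmetic only. It assumes that the
per-channel context sits directly after a 0x30-byte channel header; that
size is inferred from the backtrace, and the struct and function names
are hypothetical, not taken from the SPDK headers.

```c
#include <stdint.h>

/*
 * Hypothetical channel header. The 0x30-byte size is an assumption
 * inferred from the backtrace, not the real SPDK layout.
 */
struct fake_io_channel {
    unsigned char header[0x30];
};

/*
 * Address of the per-channel context, assumed to follow the header
 * directly. Integer arithmetic is used so that the NULL case is
 * well-defined in C.
 */
uintptr_t fake_channel_ctx_addr(uintptr_t ch_addr)
{
    return ch_addr + sizeof(struct fake_io_channel);
}
```

Under these assumptions, a NULL channel (address 0) yields a context
address of 0x30, so the first access through that context faults.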

After discovering this, and spending quite some time debugging it, I
searched the bug tracker and the commit history in case I had missed
something. It seems this was a recently discovered bug, which has
fortunately already been solved:

https://github.com/spdk/spdk/commit/9ddf6438310cc97b346d805a5969af7507e84fde#diff-d361b53e911663e8c6c5890fb046a79b

I had overlooked pulling from the official repo for a while, so I missed
the patch. It works just fine after pulling the newest changes.

So, I’ll make sure to work on the latest commits next time :)

Thanks again,
Nikos


On 21/09/2018 10:31 AM, Wodkowski, PawelX wrote:
> Hi Nikos,
>
> About SPDK backtrace you got. There is something wrong with IO channel allocation.
> SPDK vhost-scsi should check the result of spdk_scsi_dev_allocate_io_channels() in
> spdk_vhost_scsi_dev_add_tgt(). But this result is not checked :(
> You can add some check or assert there.
>
> Paweł


^ permalink raw reply	[flat|nested] 4+ messages in thread
