* [Qemu-devel] tap networking - how?
@ 2014-02-13  7:34 Alexey Kardashevskiy
  2014-02-13  8:40 ` Max Filippov
  0 siblings, 1 reply; 9+ messages in thread
From: Alexey Kardashevskiy @ 2014-02-13  7:34 UTC (permalink / raw)
  To: qemu-devel

Hi!

I am debugging spapr-vlan and hit the following issue.

When I run QEMU as shown below, the kernel's DHCP client makes no progress
until I press a key in the console. If I replace spapr-vlan with
e1000/rtl8139/virtio-net, everything is fine. With "user" networking
everything is fine too. So the problem is with the combination of
spapr-vlan + tap.

The symptom is that the guest kernel boots and then prints:
Sending DHCP requests ..
and keeps printing dots until I press a key or the timeout expires. tcpdump
(running on the tap interface) shows one DHCP request and one DHCP response.

What normally happens is that QEMU calls os_host_main_loop_wait(), which
calls qemu_poll_ns() and sits there until the eventfd signals.
This eventfd is registered via qemu_init_main_loop() -> aio_context_new()
-> aio_set_event_notifier(), but I cannot find where it gets passed to the
kernel (otherwise why would we need an eventfd?). When the eventfd signals,
QEMU calls qemu_iohandler_poll(), which checks whether the TAP device has
something to read and eventually calls tap_send().

However, in my failing case QEMU does not leave qemu_poll_ns() on the
eventfd, only on a stdin event.
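
The wake-up path described above (a loop blocked in poll until an eventfd
becomes readable) can be illustrated with a small Linux-only sketch. It is
not QEMU code: the toy_* helpers are hypothetical names that only loosely
mirror what an event notifier does, and the sketch says nothing about who
signals the eventfd in QEMU itself.

```c
#include <assert.h>
#include <poll.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <sys/types.h>
#include <unistd.h>

/* Create a non-blocking eventfd used purely as a wake-up channel. */
static int toy_notifier_init(void)
{
    return eventfd(0, EFD_NONBLOCK | EFD_CLOEXEC);
}

/* Signal the notifier: any poll() watching it becomes readable. */
static int toy_notifier_set(int efd)
{
    uint64_t one = 1;
    return write(efd, &one, sizeof(one)) == (ssize_t)sizeof(one) ? 0 : -1;
}

/* Drain the counter; returns 1 if it had been signalled, 0 otherwise. */
static int toy_notifier_test_and_clear(int efd)
{
    uint64_t count;
    return read(efd, &count, sizeof(count)) == (ssize_t)sizeof(count) ? 1 : 0;
}

/* Wait up to timeout_ms: 1 if the notifier fired, 0 on timeout, -1 on error. */
static int toy_wait(int efd, int timeout_ms)
{
    assert(efd >= 0);
    struct pollfd pfd = { .fd = efd, .events = POLLIN, .revents = 0 };
    int ret = poll(&pfd, 1, timeout_ms);
    if (ret < 0) {
        return -1;
    }
    return (ret > 0 && (pfd.revents & POLLIN)) ? 1 : 0;
}
```

Calling toy_wait() before toy_notifier_set() times out; calling it after
returns immediately, and toy_notifier_test_and_clear() re-arms the channel.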

I can see the AIO eventfd being created, and event_notifier_test_and_clear()
is called on it before the kernel starts using spapr-vlan.

So, h_send_logical_lan() is called to send a DHCP request packet. Now I
expect the eventfd to signal, but this does not happen. Have I missed some
reset or notification request or "bottom half" (virtio-net uses them but
e1000/rtl8139 do not)?


Please, help. Thanks!


./qemu-system-ppc64 \
	-enable-kvm \
	-m 2048 \
	-L qemu-ppc64-bios/ \
	-machine pseries \
	-trace events=qemu_trace_events \
	-nographic \
	-vga none \
	-netdev tap,id=id0,ifname=tap-id0,script=ifup.sh,downscript=ifdown.sh \
	-device spapr-vlan,id=id1,netdev=id0,mac=C0:41:49:4b:00:00 \
	-kernel vml313 \
	-append "root=/dev/nfs ip=dhcp selinux=0 nfsroot=10.61.145.11:/scratch/alexey/fc19nfs_/"




-- 
Alexey

* Re: [Qemu-devel] tap networking - how?
  2014-02-13  7:34 [Qemu-devel] tap networking - how? Alexey Kardashevskiy
@ 2014-02-13  8:40 ` Max Filippov
  2014-02-13 10:34   ` Alexey Kardashevskiy
  0 siblings, 1 reply; 9+ messages in thread
From: Max Filippov @ 2014-02-13  8:40 UTC (permalink / raw)
  To: Alexey Kardashevskiy; +Cc: qemu-devel

Hi,

Sounds pretty much like the problem I had recently with the OpenCores
10/100 MAC: https://lists.gnu.org/archive/html/qemu-devel/2014-02/msg00073.html

Does the following help?

diff --git a/hw/net/spapr_llan.c b/hw/net/spapr_llan.c
index 1bd6f50..2436b5e 100644
--- a/hw/net/spapr_llan.c
+++ b/hw/net/spapr_llan.c
@@ -404,6 +404,7 @@ static target_ulong h_add_logical_lan_buffer(PowerPCCPU *cpu,
     vio_stq(sdev, dev->buf_list + dev->add_buf_ptr, buf);

     dev->rx_bufs++;
+    qemu_flush_queued_packets(qemu_get_queue(dev->nic));

     DPRINTF("h_add_logical_lan_buffer():  Added buf  ptr=%d  rx_bufs=%d"
             " bd=0x%016llx\n", dev->add_buf_ptr, dev->rx_bufs,


-- 
Thanks.
-- Max

* Re: [Qemu-devel] tap networking - how?
  2014-02-13  8:40 ` Max Filippov
@ 2014-02-13 10:34   ` Alexey Kardashevskiy
  2014-02-13 12:23     ` Max Filippov
  0 siblings, 1 reply; 9+ messages in thread
From: Alexey Kardashevskiy @ 2014-02-13 10:34 UTC (permalink / raw)
  To: Max Filippov; +Cc: qemu-devel

On 02/13/2014 07:40 PM, Max Filippov wrote:
> Hi,
> 
> Sounds pretty much like the problem I had recently with opencores
> 10/100 MAC: https://lists.gnu.org/archive/html/qemu-devel/2014-02/msg00073.html
> 
> Does the following help?:


Yes, it does, thanks a lot!

While we are here, and since you seem to understand this stuff:
how is tap expected to deliver a packet from the external network to the
guest? I mean, which events should be triggered, and in what order? My brain
is melting :( I just cannot see how receiving a packet on "tap" in the host
kernel can make os_host_main_loop_wait() exit in QEMU so that it can call
qemu_iohandler_poll() and do the job. Thanks!


-- 
Alexey

* Re: [Qemu-devel] tap networking - how?
  2014-02-13 10:34   ` Alexey Kardashevskiy
@ 2014-02-13 12:23     ` Max Filippov
  2014-02-13 13:42       ` Alexey Kardashevskiy
  0 siblings, 1 reply; 9+ messages in thread
From: Max Filippov @ 2014-02-13 12:23 UTC (permalink / raw)
  To: Alexey Kardashevskiy; +Cc: qemu-devel

On Thu, Feb 13, 2014 at 2:34 PM, Alexey Kardashevskiy <aik@ozlabs.ru> wrote:
> Yes, it does, thanks a lot!
>
> While we are here and you seem to understand this stuff -
> how is tap expected to work to deliver a packet from the external network
> to the guest? I mean what event should be triggered in what order? My brain
> is melting :( I just cannot see how receiving a packet on "tap" in the host
> kernel can make os_host_main_loop_wait() exit in QEMU so it could call
> qemu_iohandler_poll() and do the job. Thanks!

I'm not very experienced in this area of QEMU, so the following may not be
100% accurate.
The tap file descriptor is registered, among other file descriptors, in an
array that os_host_main_loop_wait uses to poll for events. So normally a
packet arrives at the host, the fd becomes readable, the poll function
completes and the registered handler (see tap_update_fd_handler) is called.
The handler reads packets and calls the attached NIC's
NetClientInfo::receive callback through the network queuing infrastructure.
But once the NIC doesn't process a packet, or its NetClientInfo::can_receive
returns false, tap stops polling for new packets by updating the handlers
associated with its fd. So the NIC needs to inform the networking core when
it can receive more packets by calling qemu_flush_queued_packets, which
will also complete polling and deliver already queued packets.
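
To make this flow control concrete, here is a minimal standalone sketch in
C. It is a toy model and not QEMU code: ToyNic, ToyBackend and all toy_*
functions are hypothetical names that only mirror the roles of can_receive,
receive and qemu_flush_queued_packets as described above. The `notify` flag
models whether the device flushes when the guest posts a receive buffer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: none of these names are QEMU's real API. */
#define TOY_QUEUE_MAX 8

typedef struct ToyNic {
    int rx_bufs;     /* receive buffers the guest has posted */
    int delivered;   /* packets handed to the guest so far */
} ToyNic;

typedef struct ToyBackend {
    ToyNic *nic;
    int queued;      /* packets parked because the NIC was not ready */
    bool read_poll;  /* true while the backend fd would be polled */
} ToyBackend;

/* Counterpart of a can_receive callback: ready only with a free buffer. */
static bool toy_can_receive(const ToyNic *nic)
{
    return nic->rx_bufs > 0;
}

/* Counterpart of a receive callback: consumes one buffer per packet. */
static void toy_receive(ToyNic *nic)
{
    assert(nic->rx_bufs > 0);
    nic->rx_bufs--;
    nic->delivered++;
}

/* The backend got a packet from the host side. */
static void toy_backend_packet(ToyBackend *be)
{
    if (toy_can_receive(be->nic)) {
        toy_receive(be->nic);
    } else {
        if (be->queued < TOY_QUEUE_MAX) {
            be->queued++;          /* park the packet */
        }
        be->read_poll = false;     /* stop polling until flushed */
    }
}

/* Counterpart of a flush: deliver parked packets, then resume polling. */
static void toy_flush_queued_packets(ToyBackend *be)
{
    while (be->queued > 0 && toy_can_receive(be->nic)) {
        toy_receive(be->nic);
        be->queued--;
    }
    if (be->queued == 0) {
        be->read_poll = true;
    }
}

/* The guest posts a buffer; without `notify` nobody restarts the backend. */
static void toy_add_rx_buffer(ToyBackend *be, bool notify)
{
    be->nic->rx_bufs++;
    if (notify) {
        toy_flush_queued_packets(be);
    }
}
```

With notify=false a parked packet stays parked and polling stays off after
the guest posts a buffer, which roughly matches the stall described in this
thread; with notify=true the packet is delivered and polling resumes.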

-- 
Thanks.
-- Max

* Re: [Qemu-devel] tap networking - how?
  2014-02-13 12:23     ` Max Filippov
@ 2014-02-13 13:42       ` Alexey Kardashevskiy
  2014-02-13 14:02         ` Max Filippov
  0 siblings, 1 reply; 9+ messages in thread
From: Alexey Kardashevskiy @ 2014-02-13 13:42 UTC (permalink / raw)
  To: Max Filippov; +Cc: qemu-devel

On 02/13/2014 11:23 PM, Max Filippov wrote:
> I'm not very experienced in this area of QEMU, so the following may be not
> 100% accurate.
> Tap file descriptor is registered among other file descriptors in an array
> that os_host_main_loop_wait use to poll for events. So normally packet
> arrives to the host, fd becomes readable, poll function completes and
> registered handler (see tap_update_fd_handler) is called. The handler reads
> packets and calls the attached NIC's NetClientInfo::receive callback through
> network queuing infrastructure. But once NIC doesn't process a packet or its
> NetClientInfo::can_receive returns false it stops polling for new packets
> by updating handlers associated with its fd. So NIC needs to inform the
> networking core when it can receive more packets by calling
> qemu_flush_queued_packets, which will also complete polling and deliver
> already queued packets.


I am more interested in the details :)
os_host_main_loop_wait() calls glib_pollfds_fill(), which puts the actual
fds into the gpollfds GArray. Before the tap device is started, its fd is
not there, but with the patch you proposed the tap fd gets into the list.
The actual fds are put into the array by g_main_context_query() (if I read
the gdb output correctly). So there must be some callback somewhere which
tells g_main_context_query() what to poll for. I set a million breakpoints
to find out what is called, but to no avail.



-- 
Alexey

* Re: [Qemu-devel] tap networking - how?
  2014-02-13 13:42       ` Alexey Kardashevskiy
@ 2014-02-13 14:02         ` Max Filippov
  2014-02-13 14:06           ` Alexey Kardashevskiy
  0 siblings, 1 reply; 9+ messages in thread
From: Max Filippov @ 2014-02-13 14:02 UTC (permalink / raw)
  To: Alexey Kardashevskiy; +Cc: qemu-devel

On Thu, Feb 13, 2014 at 5:42 PM, Alexey Kardashevskiy <aik@ozlabs.ru> wrote:
> I am more interested in details :)
> os_host_main_loop_wait() calls glib_pollfds_fill() which puts actual fds
> into gpollfds GArray thing. Before the tap device started, its fd is not
> there but after the patch you proposed, tap's fd gets to the list.
> The actual fds are put into array by g_main_context_query() (if I read gdb
> output correctly). So there must be some callback somewhere which tells
> this g_main_context_query() what to poll for. I put a million breakpoints
> to know what is called but to no avail.

I see that qemu_iohandler_fill puts fds into this array. And it only adds
those that have a write handler, or a read handler that can read at the
moment.

-- 
Thanks.
-- Max

* Re: [Qemu-devel] tap networking - how?
  2014-02-13 14:02         ` Max Filippov
@ 2014-02-13 14:06           ` Alexey Kardashevskiy
  2014-02-13 14:17             ` Max Filippov
  0 siblings, 1 reply; 9+ messages in thread
From: Alexey Kardashevskiy @ 2014-02-13 14:06 UTC (permalink / raw)
  To: Max Filippov; +Cc: qemu-devel

On 02/14/2014 01:02 AM, Max Filippov wrote:
> I see that qemu_iohandler_fill puts fds into this array. And it only puts those
> that have write handler or read handler and can read at the moment.


When things work, os_host_main_loop_wait() waits on the tap device fd too.
Without your patch it does not (i.e. this fd is not put into the array of
fds by glib_pollfds_fill()). Where does this difference come from? That is
my question...





-- 
Alexey

* Re: [Qemu-devel] tap networking - how?
  2014-02-13 14:06           ` Alexey Kardashevskiy
@ 2014-02-13 14:17             ` Max Filippov
  2014-02-13 14:25               ` Alexey Kardashevskiy
  0 siblings, 1 reply; 9+ messages in thread
From: Max Filippov @ 2014-02-13 14:17 UTC (permalink / raw)
  To: Alexey Kardashevskiy; +Cc: qemu-devel

On Thu, Feb 13, 2014 at 6:06 PM, Alexey Kardashevskiy <aik@ozlabs.ru> wrote:
> os_host_main_loop_wait() - when things work, it waits on the tap device
> too. Without your patch, it does not wait on the tap device fd (i.e. this
> fd is not put to the array of fds by glib_pollfds_fill()). Where does this
> difference happen - this is my question...

It is triggered by the guest adding a new descriptor to the NIC RX ring.
The added qemu_flush_queued_packets call completes the poll that doesn't
have the TAP fd in its array, and (assuming there were no packets queued)
the next main_loop_wait -> qemu_iohandler_fill puts the TAP fd into
that array:

        if (ioh->fd_read &&
            (!ioh->fd_read_poll ||
             ioh->fd_read_poll(ioh->opaque) != 0)) {
            events |= G_IO_IN | G_IO_HUP | G_IO_ERR;
        }

because now the NIC's can_receive (called here through ioh->fd_read_poll)
returns true.
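
The selection logic quoted above can be tried in isolation with plain
poll(2) structures. This is a sketch under assumptions, not QEMU's code:
ToyIOHandler, toy_iohandler_fill and toy_nic_can_receive are made-up names,
and POLLIN stands in for the G_IO_* flags.

```c
#include <assert.h>
#include <poll.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an I/O handler record; not QEMU's struct. */
typedef struct ToyIOHandler {
    int fd;
    bool has_read_handler;            /* models ioh->fd_read != NULL */
    bool (*read_poll)(void *opaque);  /* models ioh->fd_read_poll, may be NULL */
    void *opaque;
} ToyIOHandler;

/* Fill `out` with the fds worth polling; returns how many were added. */
static size_t toy_iohandler_fill(const ToyIOHandler *handlers, size_t n,
                                 struct pollfd *out)
{
    assert(handlers != NULL && out != NULL);
    size_t used = 0;
    for (size_t i = 0; i < n; i++) {
        const ToyIOHandler *ioh = &handlers[i];
        /* Same shape as the quoted condition: a read handler exists and
         * either there is no read_poll callback or it says "ready". */
        if (ioh->has_read_handler &&
            (!ioh->read_poll || ioh->read_poll(ioh->opaque))) {
            out[used].fd = ioh->fd;
            out[used].events = POLLIN;  /* poll reports HUP/ERR regardless */
            out[used].revents = 0;
            used++;
        }
    }
    return used;
}

/* Models the NIC's can_receive: true once the guest has posted a buffer. */
static bool toy_nic_can_receive(void *opaque)
{
    const int *rx_bufs = opaque;
    return *rx_bufs > 0;
}
```

With a buffer count of zero the second handler's fd is left out of the
array, so a poll on that array can never wake up for it; once the count is
nonzero, the next fill includes it.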

-- 
Thanks.
-- Max

* Re: [Qemu-devel] tap networking - how?
  2014-02-13 14:17             ` Max Filippov
@ 2014-02-13 14:25               ` Alexey Kardashevskiy
  0 siblings, 0 replies; 9+ messages in thread
From: Alexey Kardashevskiy @ 2014-02-13 14:25 UTC (permalink / raw)
  To: Max Filippov; +Cc: qemu-devel

On 02/14/2014 01:17 AM, Max Filippov wrote:
> On Thu, Feb 13, 2014 at 6:06 PM, Alexey Kardashevskiy <aik@ozlabs.ru> wrote:
>> On 02/14/2014 01:02 AM, Max Filippov wrote:
>>> On Thu, Feb 13, 2014 at 5:42 PM, Alexey Kardashevskiy <aik@ozlabs.ru> wrote:
>>>> On 02/13/2014 11:23 PM, Max Filippov wrote:
>>>>> On Thu, Feb 13, 2014 at 2:34 PM, Alexey Kardashevskiy <aik@ozlabs.ru> wrote:
>>>>>> On 02/13/2014 07:40 PM, Max Filippov wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> On Thu, Feb 13, 2014 at 11:34 AM, Alexey Kardashevskiy <aik@ozlabs.ru> wrote:
>>>>>>>> Hi!
>>>>>>>>
>>>>>>>> I am debugging spapr-vlan and hit the following issue.
>>>>>>>>
>>>>>>>> When I run QEMU as below, the kernel's DHCP client does not continue till I
>>>>>>>> hit any key in console. If I replace spapr-vlan with
>>>>>>>> e1000/rtl8139/virtio-net, everything is just fine. If I use "user" network
>>>>>>>> - everything is fine too. So the problem is with combination of spapr-vlan
>>>>>>>> + tap.
>>>>>>>>
>>>>>>>> The issue looks like - the guest kernel boots and then prints:
>>>>>>>> Sending DHCP requests ..
>>>>>>>> and it keeps printing dots till I press key or timeout expires. tcpdump
>>>>>>>> (running on the tap interface) shows one DHCP request and one DHCP response.
>>>>>>>>
>>>>>>>> What normally happens is that QEMU calls os_host_main_loop_wait() which
>>>>>>>> calls qemu_poll_ns() and it is sitting there till eventfd signals.
>>>>>>>> This eventfd is registered via qemu_init_main_loop() -> aio_context_new()
>>>>>>>> -> aio_set_event_notifier() but I cannot find where it gets passed to the
>>>>>>>> kernel (otherwise why would we need eventfd?).  When eventfd signals, QEMU
>>>>>>>> calls qemu_iohandler_poll() which checks if TAP device has something to
>>>>>>>> read and eventually calls tap_send().
>>>>>>>>
>>>>>>>> However in my bad example QEMU does not exit qemu_poll_ns() on eventfd,
>>>>>>>> only on stdin event.
>>>>>>>>
>>>>>>>> I can see AIO eventfd created and event_notifier_test_and_clear() is called
>>>>>>>> on it before the kernel starts using spapr-vlan.
>>>>>>>>
>>>>>>>> So. h_send_logical_lan() is called to sent a DHCP request packet. Now I
>>>>>>>> expect eventfd to signal but this does not happen. Have I missed some reset
>>>>>>>> or notification request or "bottom half" (virtio-net uses them but
>>>>>>>> e1000/rtl8139 do not)?
>>>>>>>
>>>>>>> Sounds pretty much like the problem I had recently with opencores
>>>>>>> 10/100 MAC: https://lists.gnu.org/archive/html/qemu-devel/2014-02/msg00073.html
>>>>>>>
>>>>>>> Does the following help?:
>>>>>>
>>>>>> Yes, it does, thanks a lot!
>>>>>>
>>>>>> While we are here and you seem to understand this stuff -
>>>>>> how is tap expected to work to deliver a packet from the external network
>>>>>> to the guest? I mean what event should be triggered in what order? My brain
>>>>>> is melting :( I just cannot see how receiving a packet on "tap" in the host
>>>>>> kernel can make os_host_main_loop_wait() exit in QEMU so it could call
>>>>>> qemu_iohandler_poll() and do the job. Thanks!
>>>>>
>>>>> I'm not very experienced in this area of QEMU, so the following may be not
>>>>> 100% accurate.
>>>>> Tap file descriptor is registered among other file descriptors in an array
>>>>> that os_host_main_loop_wait use to poll for events. So normally packet
>>>>> arrives to the host, fd becomes readable, poll function completes and
>>>>> registered handler (see tap_update_fd_handler) is called. The handler reads
>>>>> packets and calls the attached NIC's NetClientInfo::receive callback through
>>>>> network queuing infrastructure. But once NIC doesn't process a packet or its
>>>>> NetClientInfo::can_receive returns false it stops polling for new packets
>>>>> by updating handlers associated with its fd. So NIC needs to inform the
>>>>> networking core when it can receive more packets by calling
>>>>> qemu_flush_queued_packets, which will also complete polling and deliver
>>>>> already queued packets.
>>>>
>>>>
>>>> I am more interested in details :)
>>>> os_host_main_loop_wait() calls glib_pollfds_fill() which puts actual fds
>>>> into gpollfds GArray thing. Before the tap device started, its fd is not
>>>> there but after the patch you proposed, tap's fd gets to the list.
>>>> The actual fds are put into array by g_main_context_query() (if I read gdb
>>>> output correctly). So there must be some callback somewhere which tells
>>>> this g_main_context_query() what to poll for. I put a million breakpoints
>>>> to know what is called but to no avail.
>>>
>>> I see that qemu_iohandler_fill puts fds into this array. And it only puts those
>>> that have write handler or read handler and can read at the moment.
>>
>>
>> os_host_main_loop_wait() - when things work, it waits on the tap device
>> too. Without your patch, it does not wait on the tap device fd (i.e. this
>> fd is not put to the array of fds by glib_pollfds_fill()). Where does this
>> difference happen - this is my question...
> 
> It is triggered by the guest adding new descriptor to the NIC RX ring.
> Added qemu_flush_queued_packets completes poll that doesn't have
> TAP fd in the array, and (assuming there were no packets queued)
> the next main_loop_wait -> qemu_iohandler_fill puts the TAP fd into
> that array:
> 
>         if (ioh->fd_read &&
>             (!ioh->fd_read_poll ||
>              ioh->fd_read_poll(ioh->opaque) != 0)) {
>             events |= G_IO_IN | G_IO_HUP | G_IO_ERR;
>         }
> 
> because now NIC's can_receive (called here through ioh->fd_read_poll)
> returns true.
> 

Oh. Right. The mosaic became a picture :) Thanks!


-- 
Alexey


Thread overview: 9+ messages
2014-02-13  7:34 [Qemu-devel] tap networking - how? Alexey Kardashevskiy
2014-02-13  8:40 ` Max Filippov
2014-02-13 10:34   ` Alexey Kardashevskiy
2014-02-13 12:23     ` Max Filippov
2014-02-13 13:42       ` Alexey Kardashevskiy
2014-02-13 14:02         ` Max Filippov
2014-02-13 14:06           ` Alexey Kardashevskiy
2014-02-13 14:17             ` Max Filippov
2014-02-13 14:25               ` Alexey Kardashevskiy
