All of lore.kernel.org
 help / color / mirror / Atom feed
* Strange qemu6 regression cauing disabled usb controller.
@ 2021-09-30 13:48 Remy Noel
  2021-09-30 15:05 ` Daniel P. Berrangé
  0 siblings, 1 reply; 4+ messages in thread
From: Remy Noel @ 2021-09-30 13:48 UTC (permalink / raw)
  To: qemu-devel; +Cc: Gerd Hoffmann, Maxim Levitsky

Hello,

I'm encountering a nagging issue with usbredir and a windows guest, but although I did pinpoint the commit that caused the issue, I have a hard time understanding it.

The issue occurs when a two usbredir devices are added to a guest windows vm. When the second device is added, the UHCI usb controller is disabled by windows with an error code 43 (can be seen with in the usb adapters section of the device manager).

The commit causing the issue (everything works well when reverting it) is 7bed89958bfbf40df9ca681cefbdca63abdde39d : device_core: use drain_call_rcu in in qmp_device_add.

I narrowed the problem to the unlock of the iothread: the minimum drain_call_rcu code that still reproduce the issue is:

void drain_call_rcu(void)
{
     bool locked = qemu_mutex_iothread_locked();
     if (locked) {
         qemu_mutex_unlock_iothread();
     }
     usleep(50000); // time spent draining the rcu on a few slow cases.

     if (locked) {
         qemu_mutex_lock_iothread();
     }
}

The issue can be reproduced from a basic windows home instalation and the following qemu command line:

qemu-system-x86_64 -M pc -cpu host,hv_time,hv_synic,hv_stimer,hv_vpindex -enable-kvm -m 4096 -device piix3-usb-uhci,id=uhci -qmp tcp:127.0.0.1:4444,server=on,wait=off,ipv4 -drive <disk-parameters> -snapshot

Note that the hv parameters are needed to trigger the issue I do not know why.

Two usbredir devices are then added to the vm (qmp client script added at the end of the mail) in order to trigger the issue.

The second device can be added right after or much later than the first and the bug still triggers.

I tried to find what was able to take avantage of the free iothread lock, but the only thing I got so far is that the iothread lock is not taken during the first drain (from the first device add), but is taken many times during the second drain by physmem's IOs (from kvm-accel, but at this point, I'm a bit lost).

I'm looking for pointers as to what could trigger the issue in order to narrow it down, as, so far, I do not understand exactly what causes the regression.
I am unsure of how this would even transcribe in a linux vm so i didn't try to reproduce the issue with one.

Below is the reproduction script (python):

import asyncio
import json
import socket
import sys
import time

char1 = {
     "execute": "chardev-add",
     "arguments": {
         "id": "r-usb0",
         "backend": {
             "type": "socket",
             "data": {
                 "nodelay": True,
                 "addr": {
                     "type": "inet",
                     "data": {
                         "host": "::",
                         "port": "10100"
                     }
                 }
             }
         }
     }
}
dev1 = {
     "execute": "device_add",
     "arguments": {
         "driver": "usb-redir",
         "id": "r-usb0",
         "chardev": "r-usb0",
         "bus": "uhci.0"
     }
}
char2 = {
     "execute": "chardev-add",
     "arguments": {
         "id": "r-usb1",
         "backend": {
             "type": "socket",
             "data": {
                 "nodelay": True,
                 "addr": {
                     "type": "inet",
                     "data": {
                         "host": "::",
                         "port": "10101"
                     }
                 }
             }
         }
     }
}
dev2 = {
     "execute": "device_add",
     "arguments": {
         "driver": "usb-redir",
         "id": "r-usb1",
         "chardev": "r-usb1",
         "bus": "uhci.0"
     }
}


cond = None
ids = {}

async def pull_answers(reader):
     while True:
         data = await reader.read(1024)
         if not data:
             sys.exit(0)
         data = data.decode('ascii')
         while data:
             try:
                 msg, remain = json.JSONDecoder().raw_decode(data)
             except ValueError as vae:
                 break
             if not msg:
                 break
             msg_id = msg.get('id')
             if msg_id:
                 async with cond:
                     ids[msg_id] = msg
                     cond.notify_all()
             else:
                 print("QMP:", msg)
             data = data[remain:]


async def wait_answer(async_msg_id):
     async with cond:
         msg_id = await async_msg_id
         await cond.wait_for(lambda : msg_id in ids)
         return ids[msg_id]


ID = 0

async def send_message(writer, obj):
     global ID
     obj['id'] = ID
     ID += 1
     writer.write(json.dumps(obj).encode())
     await writer.drain()
     return obj['id']


async def main():
     global cond
     cond = asyncio.Condition()
     with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
         sock.connect(('127.0.0.1', 4444))
         reader, writer = await asyncio.open_connection(sock=sock)
         waiter = asyncio.create_task(pull_answers(reader))
         await send_message(writer, {'execute': 'qmp_capabilities'})
         sender = asyncio.create_task(send_message(writer, char1))
         resp = await wait_answer(sender)
         sender = asyncio.create_task(send_message(writer, dev1))
         resp = await wait_answer(sender)
         time.sleep(.01)
         sender = send_message(writer, char2)
         resp = await wait_answer(sender)
         sender = send_message(writer, dev2)
         resp = await wait_answer(sender)
         print("Added. Awaiting disconnect ...")
         await waiter
         return 0


asyncio.run(main())


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: Strange qemu6 regression cauing disabled usb controller.
  2021-09-30 13:48 Strange qemu6 regression cauing disabled usb controller Remy Noel
@ 2021-09-30 15:05 ` Daniel P. Berrangé
  2021-09-30 15:15   ` Remy Noel
  2021-10-05  7:46   ` Remy Noel
  0 siblings, 2 replies; 4+ messages in thread
From: Daniel P. Berrangé @ 2021-09-30 15:05 UTC (permalink / raw)
  To: Remy Noel; +Cc: Maxim Levitsky, qemu-devel, Gerd Hoffmann

On Thu, Sep 30, 2021 at 03:48:44PM +0200, Remy Noel wrote:
> Hello,
> 
> I'm encountering a nagging issue with usbredir and a windows guest, but although I did pinpoint the commit that caused the issue, I have a hard time understanding it.
> 
> The issue occurs when a two usbredir devices are added to a guest windows vm. When the second device is added, the UHCI usb controller is disabled by windows with an error code 43 (can be seen with in the usb adapters section of the device manager).
> 
> The commit causing the issue (everything works well when reverting it) is 7bed89958bfbf40df9ca681cefbdca63abdde39d : device_core: use drain_call_rcu in in qmp_device_add.

Co-incidentally we've just had another bug report filed today that
suggests 7bed89958bfbf40df9ca681cefbdca63abdde39d as a buggy commit
causing deadlock in QEMU

  https://gitlab.com/qemu-project/qemu/-/issues/650

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: Strange qemu6 regression cauing disabled usb controller.
  2021-09-30 15:05 ` Daniel P. Berrangé
@ 2021-09-30 15:15   ` Remy Noel
  2021-10-05  7:46   ` Remy Noel
  1 sibling, 0 replies; 4+ messages in thread
From: Remy Noel @ 2021-09-30 15:15 UTC (permalink / raw)
  To: Daniel P. Berrangé; +Cc: Maxim Levitsky, qemu-devel, Gerd Hoffmann

On Thu, Sep 30, 2021 at 04:05:52PM +0100, Daniel P. Berrangé wrote:
>On Thu, Sep 30, 2021 at 03:48:44PM +0200, Remy Noel wrote:
>
>Co-incidentally we've just had another bug report filed today that
>suggests 7bed89958bfbf40df9ca681cefbdca63abdde39d as a buggy commit
>causing deadlock in QEMU
>
>  https://gitlab.com/qemu-project/qemu/-/issues/650

Thanks, although, in this issue, the deadlock seems to be caused by the rcu 
while in the usb one, it is caused by the iothread unlock (or it may 
highlight an existing issue) even without rcu waiting.


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: Strange qemu6 regression cauing disabled usb controller.
  2021-09-30 15:05 ` Daniel P. Berrangé
  2021-09-30 15:15   ` Remy Noel
@ 2021-10-05  7:46   ` Remy Noel
  1 sibling, 0 replies; 4+ messages in thread
From: Remy Noel @ 2021-10-05  7:46 UTC (permalink / raw)
  To: Daniel P. Berrangé; +Cc: Maxim Levitsky, qemu-devel, Gerd Hoffmann

On Thu, Sep 30, 2021 at 04:05:52PM +0100, Daniel P. Berrangé wrote:
>Co-incidentally we've just had another bug report filed today that
>suggests 7bed89958bfbf40df9ca681cefbdca63abdde39d as a buggy commit
>causing deadlock in QEMU
>
>  https://gitlab.com/qemu-project/qemu/-/issues/650

Is opening a gitlab ticket the prefered way to report issues now ? Should i 
do that ?

Thanks.

Remy.


^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2021-10-05  7:48 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-09-30 13:48 Strange qemu6 regression cauing disabled usb controller Remy Noel
2021-09-30 15:05 ` Daniel P. Berrangé
2021-09-30 15:15   ` Remy Noel
2021-10-05  7:46   ` Remy Noel

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.