All of lore.kernel.org
 help / color / mirror / Atom feed
From: Cornelia Huck <cohuck@redhat.com>
To: "Michael S. Tsirkin" <mst@redhat.com>, Parav Pandit <parav@nvidia.com>
Cc: virtualization@lists.linux-foundation.org
Subject: Re: [PATCH 4/4] virtio_pci: Support surprise removal of virtio pci device
Date: Mon, 19 Jul 2021 11:40:08 +0200	[thread overview]
Message-ID: <878s22bq13.fsf@redhat.com> (raw)
In-Reply-To: <20210717164152-mutt-send-email-mst@kernel.org>

On Sat, Jul 17 2021, "Michael S. Tsirkin" <mst@redhat.com> wrote:

> On Sat, Jul 17, 2021 at 10:42:58AM +0300, Parav Pandit wrote:
>> When a virtio pci device undergo surprise removal (aka async removaln in
>
> typo
>
>> PCIe spec), mark the device is broken so that any upper layer drivers can
>> abort any outstanding operation.
>> 
>> When a virtio net pci device undergo surprise removal which is used by a
>> NetworkManager, a below call trace was observed.
>> 
>> kernel:watchdog: BUG: soft lockup - CPU#1 stuck for 26s! [kworker/1:1:27059]
>> watchdog: BUG: soft lockup - CPU#1 stuck for 52s! [kworker/1:1:27059]
>> CPU: 1 PID: 27059 Comm: kworker/1:1 Tainted: G S      W I  L    5.13.0-hotplug+ #8
>> Hardware name: Dell Inc. PowerEdge R640/0H28RR, BIOS 2.9.4 11/06/2020
>> Workqueue: events linkwatch_event
>> RIP: 0010:virtnet_send_command+0xfc/0x150 [virtio_net]
>> Call Trace:
>>  virtnet_set_rx_mode+0xcf/0x2a7 [virtio_net]
>>  ? __hw_addr_create_ex+0x85/0xc0
>>  __dev_mc_add+0x72/0x80
>>  igmp6_group_added+0xa7/0xd0
>>  ipv6_mc_up+0x3c/0x60
>>  ipv6_find_idev+0x36/0x80
>>  addrconf_add_dev+0x1e/0xa0
>>  addrconf_dev_config+0x71/0x130
>>  addrconf_notify+0x1f5/0xb40
>>  ? rtnl_is_locked+0x11/0x20
>>  ? __switch_to_asm+0x42/0x70
>>  ? finish_task_switch+0xaf/0x2c0
>>  ? raw_notifier_call_chain+0x3e/0x50
>>  raw_notifier_call_chain+0x3e/0x50
>>  netdev_state_change+0x67/0x90
>>  linkwatch_do_dev+0x3c/0x50
>>  __linkwatch_run_queue+0xd2/0x220
>>  linkwatch_event+0x21/0x30
>>  process_one_work+0x1c8/0x370
>>  worker_thread+0x30/0x380
>>  ? process_one_work+0x370/0x370
>>  kthread+0x118/0x140
>>  ? set_kthread_struct+0x40/0x40
>>  ret_from_fork+0x1f/0x30
>> 
>> Hence, add the ability to abort the command on surprise removal
>> which prevents infinite loop and system lockup.
>> 
>> Signed-off-by: Parav Pandit <parav@nvidia.com>
>
> OK that's nice, but I suspect this is not enough.
> For example we need to also fix up all config space reads
> to handle all-ones patterns specially.
>
> E.g.
>
>         /* After writing 0 to device_status, the driver MUST wait for a read of
>          * device_status to return 0 before reinitializing the device.
>          * This will flush out the status write, and flush in device writes,
>          * including MSI-X interrupts, if any.
>          */
>         while (vp_modern_get_status(mdev))
>                 msleep(1);
>
> lots of code in drivers needs to be fixed too.
>
> I guess we need to annotate drivers somehow to mark up
> whether they support surprise removal? And maybe find a
> way to let host know?

I'm wondering whether virtio-pci surprise removal would need more
support in drivers than virtio-ccw surprise removal; given that
virtio-ccw *only* supports surprise removal and I don't remember any
problem reports, the situation is probably not that bad.

Is surprise removal of block devices still a big problem? We have some
support for (non-virtio) ccw devices (e.g. dasd) via a 'disconnected'
state that was designed to mitigate problems with block devices that are
suddenly gone.

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

      parent reply	other threads:[~2021-07-19  9:40 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-07-17  7:42 [PATCH 0/4] virtio short improvements Parav Pandit via Virtualization
2021-07-17  7:42 ` [PATCH 1/4] virtio: Improve vq->broken access to avoid any compiler optimization Parav Pandit via Virtualization
2021-07-17 20:38   ` Michael S. Tsirkin
2021-07-19  5:26     ` Parav Pandit via Virtualization
2021-07-19 12:04       ` Michael S. Tsirkin
2021-07-19 14:20         ` Parav Pandit via Virtualization
2021-07-17  7:42 ` [PATCH 2/4] virtio: Keep vring_del_virtqueue() mirror of VQ create Parav Pandit via Virtualization
2021-07-17  7:42 ` [PATCH 3/4] virtio: Protect vqs list access Parav Pandit via Virtualization
2021-07-17 20:41   ` Michael S. Tsirkin
2021-07-19  5:37     ` Parav Pandit via Virtualization
2021-07-17  7:42 ` [PATCH 4/4] virtio_pci: Support surprise removal of virtio pci device Parav Pandit via Virtualization
2021-07-17 20:46   ` Michael S. Tsirkin
2021-07-19  5:44     ` Parav Pandit via Virtualization
2021-07-19 12:07       ` Michael S. Tsirkin
2021-07-19 14:15         ` Parav Pandit via Virtualization
2021-07-19  9:40     ` Cornelia Huck [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=878s22bq13.fsf@redhat.com \
    --to=cohuck@redhat.com \
    --cc=mst@redhat.com \
    --cc=parav@nvidia.com \
    --cc=virtualization@lists.linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.