From: liulongfang via <qemu-devel@nongnu.org>
To: Avihai Horon <avihaih@nvidia.com>, <qemu-devel@nongnu.org>,
	Cornelia Huck <cohuck@redhat.com>,
	Alex Williamson <alex.williamson@redhat.com>,
	"Juan Quintela" <quintela@redhat.com>,
	"Dr . David Alan Gilbert" <dgilbert@redhat.com>
Cc: Joao Martins <joao.m.martins@oracle.com>,
	Yishai Hadas <yishaih@nvidia.com>,
	Jason Gunthorpe <jgg@nvidia.com>, Mark Bloch <mbloch@nvidia.com>,
	Maor Gottlieb <maorg@nvidia.com>,
	Kirti Wankhede <kwankhede@nvidia.com>,
	Tarun Gupta <targupta@nvidia.com>
Subject: Re: [PATCH v2 09/11] vfio/migration: Reset device if setting recover state fails
Date: Tue, 11 Oct 2022 09:41:03 +0800
Message-ID: <6e83765a-cbae-781f-5bd0-b7a34c477740@huawei.com>
In-Reply-To: <20220530170739.19072-10-avihaih@nvidia.com>

On 2022/5/31 1:07, Avihai Horon wrote:
> If vfio_migration_set_state() fails to set the device in the requested
> state it tries to put it in a recover state. If setting the device in
> the recover state fails as well, hw_error is triggered and the VM is
> aborted.
> 
> To improve user experience and avoid VM data loss, reset the device with
> VFIO_RESET_DEVICE instead of aborting the VM.
> 
> Signed-off-by: Avihai Horon <avihaih@nvidia.com>
> ---
>  hw/vfio/migration.c | 12 ++++++++++--
>  1 file changed, 10 insertions(+), 2 deletions(-)
> 
> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
> index 852759e6ca..6c34502611 100644
> --- a/hw/vfio/migration.c
> +++ b/hw/vfio/migration.c
> @@ -89,8 +89,16 @@ static int vfio_migration_set_state(VFIODevice *vbasedev,
>          /* Try to put the device in some good state */
>          mig_state->device_state = recover_state;
>          if (ioctl(vbasedev->fd, VFIO_DEVICE_FEATURE, feature)) {
> -            hw_error("%s: Device in error state, can't recover",
> -                     vbasedev->name);
> +            if (ioctl(vbasedev->fd, VFIO_DEVICE_RESET)) {
> +                hw_error("%s: Device in error state, can't recover",
> +                         vbasedev->name);
> +            }
> +
> +            error_report(
> +                "%s: Device was reset due to failure in changing device state to recover state %s",
> +                vbasedev->name, mig_state_to_str(recover_state));
> +
> +            return -1;
>          }
> 
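For readers following the thread, the fallback order this hunk introduces
can be sketched as a small standalone C model. This is only an illustration
under stated assumptions: FakeDevice, Outcome and set_state_with_fallback()
are hypothetical names that do not exist in QEMU, and the three flags stand
in for the results of the ioctl() calls instead of issuing them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical model of a device: each flag says whether the matching
 * ioctl() in the hunk would fail (non-zero) or succeed (zero). */
typedef struct {
    int set_requested_fails; /* VFIO_DEVICE_FEATURE with the requested state */
    int set_recover_fails;   /* VFIO_DEVICE_FEATURE with recover_state */
    int reset_fails;         /* VFIO_DEVICE_RESET */
} FakeDevice;

/* Hypothetical outcome labels, ordered from best to worst. */
typedef enum {
    OUTCOME_OK,        /* requested state was set */
    OUTCOME_RECOVERED, /* fell back to recover_state; failure is reported */
    OUTCOME_RESET,     /* recover_state failed too; device was reset */
    OUTCOME_FATAL      /* reset failed as well; the patch calls hw_error() */
} Outcome;

/* Walk the fallback chain: requested state, recover state, device reset. */
static Outcome set_state_with_fallback(const FakeDevice *dev)
{
    assert(dev != NULL);

    if (!dev->set_requested_fails) {
        return OUTCOME_OK;
    }
    if (!dev->set_recover_fails) {
        return OUTCOME_RECOVERED;
    }
    if (!dev->reset_fails) {
        /* Mirrors the error_report() added by the patch, in spirit only. */
        fprintf(stderr, "device was reset after recover state failed\n");
        return OUTCOME_RESET;
    }
    return OUTCOME_FATAL;
}
```

With this model, a device described by { 1, 1, 0 } ends in OUTCOME_RESET,
which is the new path the patch adds; before the patch the same situation
went straight to hw_error().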

I tested QEMU 7.1.50 built with this patch set. When a live migration
fails because the destination VM is disconnected mid-migration, exiting
the source QEMU afterwards produces the following kernel error:

[100337.287047] BUG: Bad page state in process qemu-system-aar  pfn:82199518
[100337.295815] page:00000000356de4da refcount:-2 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x82199518
[100337.306403] flags: 0xbfff80000000000(node=0|zone=2|lastcpupid=0x7fff)
[100337.314091] raw: 0bfff80000000000 dead000000000100 dead000000000122 0000000000000000
[100337.322589] raw: 0000000000000000 0000000000000000 fffffffeffffffff 0000000000000000
[100337.330630] page dumped because: nonzero _refcount
[100337.335840] Modules linked in: hisi_acc_vfio_pci hisi_sec2 hisi_zip hisi_hpre hisi_qm uacce vfio_iommu_type1 vfio_pci vfio_pci_core vfio_virqfd vfio pv680_mii(O) [last unloaded: hisi_sec2]
[100337.354564] CPU: 1 PID: 786 Comm: qemu-system-aar Tainted: G    B      O      6.0.0-rc4+ #1
[100337.377378] Call trace:
[100337.380382]  dump_backtrace.part.0+0xc4/0xd0
[100337.385791]  show_stack+0x24/0x40
[100337.389478]  dump_stack_lvl+0x68/0x84
[100337.394155]  dump_stack+0x18/0x34
[100337.398006]  bad_page+0xf0/0x120
[100337.401796]  check_free_page_bad+0x84/0x90
[100337.406404]  free_pcppages_bulk+0x1bc/0x2b0
[100337.411126]  free_unref_page_commit+0x120/0x15c
[100337.416935]  free_unref_page+0x15c/0x254
[100337.421436]  free_compound_page+0x6c/0x100
[100337.425868]  free_transhuge_page+0xd4/0x140
[100337.430535]  destroy_large_folio+0x30/0x40
[100337.434953]  release_pages+0x1bc/0x4d0
[100337.439268]  free_pages_and_swap_cache+0x68/0x80
[100337.444224]  tlb_batch_pages_flush+0x5c/0x94
[100337.448976]  tlb_flush_mmu+0x4c/0xd4
[100337.453062]  unmap_page_range+0x8d0/0xbd0
[100337.457432]  unmap_single_vma+0x90/0x12c
[100337.461673]  unmap_vmas+0x84/0xfc
[100337.465354]  exit_mmap+0x88/0x1b0
[100337.469008]  __mmput+0x48/0x134
[100337.472637]  mmput+0x44/0x50
[100337.475857]  do_exit+0x2b8/0x970
[100337.479641]  do_group_exit+0x40/0xac
[100337.484079]  get_signal+0x8c0/0x934
[100337.488215]  do_notify_resume+0x1d0/0x1570
[100337.492795]  el0_svc+0xa8/0xc0
[100337.496452]  el0t_64_sync_handler+0x1ac/0x1b0
[100337.501187]  el0t_64_sync+0x19c/0x1a0

Does anyone know what could be causing this error?

>          error_report("%s: Failed changing device state to %s", vbasedev->name,
> 
Thanks
Longfang.

