All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/2] vfio: Some fixes about vfio-pci MMIO RAM mapping
@ 2021-09-03  9:36 Kunkun Jiang
  2021-09-03  9:36 ` [PATCH 1/2] vfio/pci: Fix vfio-pci sub-page MMIO BAR mmaping in live migration Kunkun Jiang
  2021-09-03  9:36 ` [PATCH 2/2] vfio/common: Add trace point when a MMIO RAM section less than PAGE_SIZE Kunkun Jiang
  0 siblings, 2 replies; 9+ messages in thread
From: Kunkun Jiang @ 2021-09-03  9:36 UTC (permalink / raw)
  To: Alex Williamson, Eric Auger, Kirti Wankhede,
	open list:All patches CC here
  Cc: wanghaibin.wang, Kunkun Jiang, tangnianyao, ganqixin

This series include patches as below:

Patch 1:
- Deleted a check to fix vfio-pci sub-page MMIO BAR mmaping in live migration

Patch 2:
- Added a trace point to informe users when a MMIO RAM ection less than PAGE_SIZE

Kunkun Jiang (2):
  vfio/pci: Fix vfio-pci sub-page MMIO BAR mmaping in live migration
  vfio/common: Add trace point when a MMIO RAM section less than
    PAGE_SIZE

 hw/vfio/common.c | 7 +++++++
 hw/vfio/pci.c    | 8 +-------
 2 files changed, 8 insertions(+), 7 deletions(-)

-- 
2.23.0



^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH 1/2] vfio/pci: Fix vfio-pci sub-page MMIO BAR mmaping in live migration
  2021-09-03  9:36 [PATCH 0/2] vfio: Some fixes about vfio-pci MMIO RAM mapping Kunkun Jiang
@ 2021-09-03  9:36 ` Kunkun Jiang
  2021-09-08 20:45   ` Alex Williamson
  2021-09-03  9:36 ` [PATCH 2/2] vfio/common: Add trace point when a MMIO RAM section less than PAGE_SIZE Kunkun Jiang
  1 sibling, 1 reply; 9+ messages in thread
From: Kunkun Jiang @ 2021-09-03  9:36 UTC (permalink / raw)
  To: Alex Williamson, Eric Auger, Kirti Wankhede,
	open list:All patches CC here
  Cc: wanghaibin.wang, Kunkun Jiang, tangnianyao, ganqixin

We expand MemoryRegions of vfio-pci sub-page MMIO BARs to
vfio_pci_write_config to improve IO performance.
The MemoryRegions of destination VM will not be expanded
successful in live migration, because their addresses have
been updated in vmstate_load_state (vfio_pci_load_config).

Remove the restriction on base address change in
vfio_pci_write_config for correct mmapping sub-page MMIO
BARs. Accroding to my analysis, the remaining parameter
verification is enough.

Fixes: c5e2fb3ce4d (vfio: Add save and load functions for VFIO PCI devices)
Reported-by: Nianyao Tang <tangnianyao@huawei.com>
Reported-by: Qixin Gan <ganqixin@huawei.com>
Signed-off-by: Kunkun Jiang <jiangkunkun@huawei.com>
---
 hw/vfio/pci.c | 8 +-------
 1 file changed, 1 insertion(+), 7 deletions(-)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index e1ea1d8a23..891b211ddf 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -1189,18 +1189,12 @@ void vfio_pci_write_config(PCIDevice *pdev,
         }
     } else if (ranges_overlap(addr, len, PCI_BASE_ADDRESS_0, 24) ||
         range_covers_byte(addr, len, PCI_COMMAND)) {
-        pcibus_t old_addr[PCI_NUM_REGIONS - 1];
         int bar;
 
-        for (bar = 0; bar < PCI_ROM_SLOT; bar++) {
-            old_addr[bar] = pdev->io_regions[bar].addr;
-        }
-
         pci_default_write_config(pdev, addr, val, len);
 
         for (bar = 0; bar < PCI_ROM_SLOT; bar++) {
-            if (old_addr[bar] != pdev->io_regions[bar].addr &&
-                vdev->bars[bar].region.size > 0 &&
+            if (vdev->bars[bar].region.size > 0 &&
                 vdev->bars[bar].region.size < qemu_real_host_page_size) {
                 vfio_sub_page_bar_update_mapping(pdev, bar);
             }
-- 
2.23.0



^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH 2/2] vfio/common: Add trace point when a MMIO RAM section less than PAGE_SIZE
  2021-09-03  9:36 [PATCH 0/2] vfio: Some fixes about vfio-pci MMIO RAM mapping Kunkun Jiang
  2021-09-03  9:36 ` [PATCH 1/2] vfio/pci: Fix vfio-pci sub-page MMIO BAR mmaping in live migration Kunkun Jiang
@ 2021-09-03  9:36 ` Kunkun Jiang
  2021-09-08 20:45   ` Alex Williamson
  1 sibling, 1 reply; 9+ messages in thread
From: Kunkun Jiang @ 2021-09-03  9:36 UTC (permalink / raw)
  To: Alex Williamson, Eric Auger, Kirti Wankhede,
	open list:All patches CC here
  Cc: wanghaibin.wang, Kunkun Jiang, tangnianyao, ganqixin

The MSI-X structures of some devices and other non-MSI-X structures
are in the same BAR. They may share one host page, especially in the
case of large page granularity, suck as 64K.

For example, MSIX-Table size of 82599 NIC is 0x30 and the offset in
Bar 3(size 64KB) is 0x0. If host page size is 64KB.
vfio_listenerregion_add() will be called to map the remaining range
(0x30-0xffff). And it will return early at
'int128_ge((int128_make64(iova), llend))' and hasn't any message.
Let's add a trace point to informed users like 5c08600547c did.

Signed-off-by: Kunkun Jiang <jiangkunkun@huawei.com>
---
 hw/vfio/common.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 8728d4d5c2..2fc6213c0f 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -892,6 +892,13 @@ static void vfio_listener_region_add(MemoryListener *listener,
     llend = int128_and(llend, int128_exts64(qemu_real_host_page_mask));
 
     if (int128_ge(int128_make64(iova), llend)) {
+        if (memory_region_is_ram_device(section->mr)) {
+            trace_vfio_listener_region_add_no_dma_map(
+                memory_region_name(section->mr),
+                section->offset_within_address_space,
+                int128_getlo(section->size),
+                qemu_real_host_page_size);
+        }
         return;
     }
     end = int128_get64(int128_sub(llend, int128_one()));
-- 
2.23.0



^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/2] vfio/pci: Fix vfio-pci sub-page MMIO BAR mmaping in live migration
  2021-09-03  9:36 ` [PATCH 1/2] vfio/pci: Fix vfio-pci sub-page MMIO BAR mmaping in live migration Kunkun Jiang
@ 2021-09-08 20:45   ` Alex Williamson
  2021-09-10  8:33     ` Kunkun Jiang
  0 siblings, 1 reply; 9+ messages in thread
From: Alex Williamson @ 2021-09-08 20:45 UTC (permalink / raw)
  To: Kunkun Jiang
  Cc: open list:All patches CC here, Eric Auger, Kirti Wankhede,
	tangnianyao, ganqixin, wanghaibin.wang

On Fri, 3 Sep 2021 17:36:10 +0800
Kunkun Jiang <jiangkunkun@huawei.com> wrote:

> We expand MemoryRegions of vfio-pci sub-page MMIO BARs to
> vfio_pci_write_config to improve IO performance.
> The MemoryRegions of destination VM will not be expanded
> successful in live migration, because their addresses have
> been updated in vmstate_load_state (vfio_pci_load_config).

What's the call path through vfio_pci_write_config() that you're
relying on to get triggered to enable this and why wouldn't we just
walk all sub-page BARs in vfio_pci_load_config() to resolve the issue
then?  It's my understanding that we do this update in write-config
because it's required that the VM sizes the BAR before using it, which
is not the case when we resume from migration.  Thanks,

Alex
 
> Remove the restriction on base address change in
> vfio_pci_write_config for correct mmapping sub-page MMIO
> BARs. Accroding to my analysis, the remaining parameter
> verification is enough.
> 
> Fixes: c5e2fb3ce4d (vfio: Add save and load functions for VFIO PCI devices)
> Reported-by: Nianyao Tang <tangnianyao@huawei.com>
> Reported-by: Qixin Gan <ganqixin@huawei.com>
> Signed-off-by: Kunkun Jiang <jiangkunkun@huawei.com>
> ---
>  hw/vfio/pci.c | 8 +-------
>  1 file changed, 1 insertion(+), 7 deletions(-)
> 
> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> index e1ea1d8a23..891b211ddf 100644
> --- a/hw/vfio/pci.c
> +++ b/hw/vfio/pci.c
> @@ -1189,18 +1189,12 @@ void vfio_pci_write_config(PCIDevice *pdev,
>          }
>      } else if (ranges_overlap(addr, len, PCI_BASE_ADDRESS_0, 24) ||
>          range_covers_byte(addr, len, PCI_COMMAND)) {
> -        pcibus_t old_addr[PCI_NUM_REGIONS - 1];
>          int bar;
>  
> -        for (bar = 0; bar < PCI_ROM_SLOT; bar++) {
> -            old_addr[bar] = pdev->io_regions[bar].addr;
> -        }
> -
>          pci_default_write_config(pdev, addr, val, len);
>  
>          for (bar = 0; bar < PCI_ROM_SLOT; bar++) {
> -            if (old_addr[bar] != pdev->io_regions[bar].addr &&
> -                vdev->bars[bar].region.size > 0 &&
> +            if (vdev->bars[bar].region.size > 0 &&
>                  vdev->bars[bar].region.size < qemu_real_host_page_size) {
>                  vfio_sub_page_bar_update_mapping(pdev, bar);
>              }



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 2/2] vfio/common: Add trace point when a MMIO RAM section less than PAGE_SIZE
  2021-09-03  9:36 ` [PATCH 2/2] vfio/common: Add trace point when a MMIO RAM section less than PAGE_SIZE Kunkun Jiang
@ 2021-09-08 20:45   ` Alex Williamson
  2021-09-10  8:38     ` Kunkun Jiang
  0 siblings, 1 reply; 9+ messages in thread
From: Alex Williamson @ 2021-09-08 20:45 UTC (permalink / raw)
  To: Kunkun Jiang
  Cc: open list:All patches CC here, Eric Auger, Kirti Wankhede,
	tangnianyao, ganqixin, wanghaibin.wang

On Fri, 3 Sep 2021 17:36:11 +0800
Kunkun Jiang <jiangkunkun@huawei.com> wrote:

> The MSI-X structures of some devices and other non-MSI-X structures
> are in the same BAR. They may share one host page, especially in the
> case of large page granularity, suck as 64K.

s/suck/such/

> For example, MSIX-Table size of 82599 NIC is 0x30 and the offset in
> Bar 3(size 64KB) is 0x0. If host page size is 64KB.
> vfio_listenerregion_add() will be called to map the remaining range

s/vfio_listenerregion_add/vfio_listener_region_add/

> (0x30-0xffff). And it will return early at
> 'int128_ge((int128_make64(iova), llend))' and hasn't any message.
> Let's add a trace point to informed users like 5c08600547c did.

Please use the following syntax for referencing previous commits
(12-char sha1 is standard):

  commit 5c08600547c0 ("vfio: Use a trace point when a RAM section
  cannot be DMA mapped")

Thanks,
Alex

> Signed-off-by: Kunkun Jiang <jiangkunkun@huawei.com>
> ---
>  hw/vfio/common.c | 7 +++++++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index 8728d4d5c2..2fc6213c0f 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -892,6 +892,13 @@ static void vfio_listener_region_add(MemoryListener *listener,
>      llend = int128_and(llend, int128_exts64(qemu_real_host_page_mask));
>  
>      if (int128_ge(int128_make64(iova), llend)) {
> +        if (memory_region_is_ram_device(section->mr)) {
> +            trace_vfio_listener_region_add_no_dma_map(
> +                memory_region_name(section->mr),
> +                section->offset_within_address_space,
> +                int128_getlo(section->size),
> +                qemu_real_host_page_size);
> +        }
>          return;
>      }
>      end = int128_get64(int128_sub(llend, int128_one()));



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/2] vfio/pci: Fix vfio-pci sub-page MMIO BAR mmaping in live migration
  2021-09-08 20:45   ` Alex Williamson
@ 2021-09-10  8:33     ` Kunkun Jiang
  2021-09-10 22:24       ` Alex Williamson
  0 siblings, 1 reply; 9+ messages in thread
From: Kunkun Jiang @ 2021-09-10  8:33 UTC (permalink / raw)
  To: Alex Williamson
  Cc: open list:All patches CC here, Eric Auger, Kirti Wankhede,
	tangnianyao, ganqixin, wanghaibin.wang

Hi Alex,

On 2021/9/9 4:45, Alex Williamson wrote:
> On Fri, 3 Sep 2021 17:36:10 +0800
> Kunkun Jiang <jiangkunkun@huawei.com> wrote:
>
>> We expand MemoryRegions of vfio-pci sub-page MMIO BARs to
>> vfio_pci_write_config to improve IO performance.
>> The MemoryRegions of destination VM will not be expanded
>> successful in live migration, because their addresses have
>> been updated in vmstate_load_state (vfio_pci_load_config).
> What's the call path through vfio_pci_write_config() that you're
> relying on to get triggered to enable this and why wouldn't we just
> walk all sub-page BARs in vfio_pci_load_config() to resolve the issue
> then?  It's my understanding that we do this update in write-config
> because it's required that the VM sizes the BAR before using it, which
> is not the case when we resume from migration.  Thanks,
Let's take an example:

AArch64
host page granularity: 64KB
PCI device: *Bar2 size 32KB* [mem 0x8000200000-0x800020ffff 64bit pref]

When enable Command register bit 1(Memory Space), the code flow is
as follows:

vfio_pci_write_config (addr: 4 val: 2 len: 2)
     // record the old address of each bar, 0xffffffffffffffff
     old_addr[bar] = pdev->io_regions[bar].addr;
     pci_default_write_config
         pci_update_mappings
             new_addr = pci_bar_address    // 0x8000200000
             r->addr = new_addr;
             memory_region_addr_subregion_overlap
                 ...
*vfio_listener_region_add*
                     alignment check of the ram section address and size 
fail, return
*kvm_region_add*
                     kvm_set_phys_mem
                         alignment check of the ram section address and 
size fail, return

     // old_addr[bar] != pdev->io_regions[bar].addr &&
     // 0 < vdev->bars[bar].region.size < qemu_real_host_page_size
     vfio_sub_page_bar_update_mapping
*bar size = qemu_real_host_page_size*
             ...
             vfio_listener_region_add
                 map success
             kvm_region_add
                 kvm_set_phys_mem
                     map success

In live migration, only pci config data is sent to the destination VM.
Therefore, we need to update the bar's size before destination VM
using it.

In vfio_pci_load_config, the code flow is as follows:

vfio_pci_load_config
     vmstate_load_state
         *get_pci_config_device*
             pci_update_mappings
                 ...
                 // bar's addr is updated(0x8000200000), but bar's size 
is still 32KB, so map failed
     vfio_pci_write_config
         // bar's addr will not be changed, so 
vfio_sub_page_bar_update_mapping won't be called

My idea is that removing the check 'old_addr[bar] != 
pdev->io_regions[bar].addr' doesn't
affect the previous process. There's also a bar size check. In 
vfio_sub_page_bar_update_mapping,
it will check if bar is mapped and page aligned.
1) If bar's addr is 0xffffffffffffffff, it will not pass the 
vfio_sub_page_bar_update_mapping check.
2) If bar's size has been updated, it will not pass the bar size check 
in vfio_pci_write_config.

Thanks,
Kunkun Jiang

> Alex
>   
>> Remove the restriction on base address change in
>> vfio_pci_write_config for correct mmapping sub-page MMIO
>> BARs. Accroding to my analysis, the remaining parameter
>> verification is enough.
>>
>> Fixes: c5e2fb3ce4d (vfio: Add save and load functions for VFIO PCI devices)
>> Reported-by: Nianyao Tang <tangnianyao@huawei.com>
>> Reported-by: Qixin Gan <ganqixin@huawei.com>
>> Signed-off-by: Kunkun Jiang <jiangkunkun@huawei.com>
>> ---
>>   hw/vfio/pci.c | 8 +-------
>>   1 file changed, 1 insertion(+), 7 deletions(-)
>>
>> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
>> index e1ea1d8a23..891b211ddf 100644
>> --- a/hw/vfio/pci.c
>> +++ b/hw/vfio/pci.c
>> @@ -1189,18 +1189,12 @@ void vfio_pci_write_config(PCIDevice *pdev,
>>           }
>>       } else if (ranges_overlap(addr, len, PCI_BASE_ADDRESS_0, 24) ||
>>           range_covers_byte(addr, len, PCI_COMMAND)) {
>> -        pcibus_t old_addr[PCI_NUM_REGIONS - 1];
>>           int bar;
>>   
>> -        for (bar = 0; bar < PCI_ROM_SLOT; bar++) {
>> -            old_addr[bar] = pdev->io_regions[bar].addr;
>> -        }
>> -
>>           pci_default_write_config(pdev, addr, val, len);
>>   
>>           for (bar = 0; bar < PCI_ROM_SLOT; bar++) {
>> -            if (old_addr[bar] != pdev->io_regions[bar].addr &&
>> -                vdev->bars[bar].region.size > 0 &&
>> +            if (vdev->bars[bar].region.size > 0 &&
>>                   vdev->bars[bar].region.size < qemu_real_host_page_size) {
>>                   vfio_sub_page_bar_update_mapping(pdev, bar);
>>               }
> .




^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 2/2] vfio/common: Add trace point when a MMIO RAM section less than PAGE_SIZE
  2021-09-08 20:45   ` Alex Williamson
@ 2021-09-10  8:38     ` Kunkun Jiang
  0 siblings, 0 replies; 9+ messages in thread
From: Kunkun Jiang @ 2021-09-10  8:38 UTC (permalink / raw)
  To: Alex Williamson
  Cc: open list:All patches CC here, Eric Auger, Kirti Wankhede,
	tangnianyao, ganqixin, wanghaibin.wang

Hi Alex,

On 2021/9/9 4:45, Alex Williamson wrote:
> On Fri, 3 Sep 2021 17:36:11 +0800
> Kunkun Jiang <jiangkunkun@huawei.com> wrote:
>
>> The MSI-X structures of some devices and other non-MSI-X structures
>> are in the same BAR. They may share one host page, especially in the
>> case of large page granularity, suck as 64K.
> s/suck/such/
ok
>> For example, MSIX-Table size of 82599 NIC is 0x30 and the offset in
>> Bar 3(size 64KB) is 0x0. If host page size is 64KB.
>> vfio_listenerregion_add() will be called to map the remaining range
> s/vfio_listenerregion_add/vfio_listener_region_add/
ok
>> (0x30-0xffff). And it will return early at
>> 'int128_ge((int128_make64(iova), llend))' and hasn't any message.
>> Let's add a trace point to informed users like 5c08600547c did.
> Please use the following syntax for referencing previous commits
> (12-char sha1 is standard):
>
>    commit 5c08600547c0 ("vfio: Use a trace point when a RAM section
>    cannot be DMA mapped")
Thank you for your reply. I will correct it in V2.

Thanks,
Kunkun Jiang
> Thanks,
> Alex
>
>> Signed-off-by: Kunkun Jiang <jiangkunkun@huawei.com>
>> ---
>>   hw/vfio/common.c | 7 +++++++
>>   1 file changed, 7 insertions(+)
>>
>> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
>> index 8728d4d5c2..2fc6213c0f 100644
>> --- a/hw/vfio/common.c
>> +++ b/hw/vfio/common.c
>> @@ -892,6 +892,13 @@ static void vfio_listener_region_add(MemoryListener *listener,
>>       llend = int128_and(llend, int128_exts64(qemu_real_host_page_mask));
>>   
>>       if (int128_ge(int128_make64(iova), llend)) {
>> +        if (memory_region_is_ram_device(section->mr)) {
>> +            trace_vfio_listener_region_add_no_dma_map(
>> +                memory_region_name(section->mr),
>> +                section->offset_within_address_space,
>> +                int128_getlo(section->size),
>> +                qemu_real_host_page_size);
>> +        }
>>           return;
>>       }
>>       end = int128_get64(int128_sub(llend, int128_one()));
> .




^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/2] vfio/pci: Fix vfio-pci sub-page MMIO BAR mmaping in live migration
  2021-09-10  8:33     ` Kunkun Jiang
@ 2021-09-10 22:24       ` Alex Williamson
  2021-09-13  2:44         ` Kunkun Jiang
  0 siblings, 1 reply; 9+ messages in thread
From: Alex Williamson @ 2021-09-10 22:24 UTC (permalink / raw)
  To: Kunkun Jiang
  Cc: open list:All patches CC here, Eric Auger, Kirti Wankhede,
	tangnianyao, ganqixin, wanghaibin.wang

On Fri, 10 Sep 2021 16:33:12 +0800
Kunkun Jiang <jiangkunkun@huawei.com> wrote:

> Hi Alex,
> 
> On 2021/9/9 4:45, Alex Williamson wrote:
> > On Fri, 3 Sep 2021 17:36:10 +0800
> > Kunkun Jiang <jiangkunkun@huawei.com> wrote:
> >  
> >> We expand MemoryRegions of vfio-pci sub-page MMIO BARs to
> >> vfio_pci_write_config to improve IO performance.
> >> The MemoryRegions of destination VM will not be expanded
> >> successful in live migration, because their addresses have
> >> been updated in vmstate_load_state (vfio_pci_load_config).  
> > What's the call path through vfio_pci_write_config() that you're
> > relying on to get triggered to enable this and why wouldn't we just
> > walk all sub-page BARs in vfio_pci_load_config() to resolve the issue
> > then?  It's my understanding that we do this update in write-config
> > because it's required that the VM sizes the BAR before using it, which
> > is not the case when we resume from migration.  Thanks,  
> Let's take an example:
> 
> AArch64
> host page granularity: 64KB
> PCI device: *Bar2 size 32KB* [mem 0x8000200000-0x800020ffff 64bit pref]
> 
> When enable Command register bit 1(Memory Space), the code flow is
> as follows:
> 
> vfio_pci_write_config (addr: 4 val: 2 len: 2)
>      // record the old address of each bar, 0xffffffffffffffff
>      old_addr[bar] = pdev->io_regions[bar].addr;
>      pci_default_write_config
>          pci_update_mappings
>              new_addr = pci_bar_address    // 0x8000200000
>              r->addr = new_addr;
>              memory_region_addr_subregion_overlap
>                  ...
> *vfio_listener_region_add*
>                      alignment check of the ram section address and size 
> fail, return
> *kvm_region_add*
>                      kvm_set_phys_mem
>                          alignment check of the ram section address and 
> size fail, return
> 
>      // old_addr[bar] != pdev->io_regions[bar].addr &&
>      // 0 < vdev->bars[bar].region.size < qemu_real_host_page_size
>      vfio_sub_page_bar_update_mapping
> *bar size = qemu_real_host_page_size*
>              ...
>              vfio_listener_region_add
>                  map success
>              kvm_region_add
>                  kvm_set_phys_mem
>                      map success
> 
> In live migration, only pci config data is sent to the destination VM.
> Therefore, we need to update the bar's size before destination VM
> using it.
> 
> In vfio_pci_load_config, the code flow is as follows:
> 
> vfio_pci_load_config
>      vmstate_load_state
>          *get_pci_config_device*
>              pci_update_mappings
>                  ...
>                  // bar's addr is updated(0x8000200000), but bar's size 
> is still 32KB, so map failed
>      vfio_pci_write_config
>          // bar's addr will not be changed, so 
> vfio_sub_page_bar_update_mapping won't be called
> 
> My idea is that removing the check 'old_addr[bar] != 
> pdev->io_regions[bar].addr' doesn't
> affect the previous process. There's also a bar size check. In 
> vfio_sub_page_bar_update_mapping,
> it will check if bar is mapped and page aligned.
> 1) If bar's addr is 0xffffffffffffffff, it will not pass the 
> vfio_sub_page_bar_update_mapping check.
> 2) If bar's size has been updated, it will not pass the bar size check 
> in vfio_pci_write_config.

The bar size check in vfio_pci_write_config() only tests if the vfio
region is >0 and <qemu_real_host_page_size, afaict our sub-page support
affects the size of the MemoryRegions mapping the vfio region, but we
never change the vfio region size itself.  So if you're expecting
(vdev->bars[bar].region.size == qemu_real_host_page_size) once we setup
the sub-page support, I'm not convinced that's true.

So yes, sub-page-update can reject invalid addresses and we already
rely on it to do so, but the code being removed avoids that redundant
writes to the BAR won't trigger redundant MemoryRegion manipulation.
Maybe those are harmless, but that's not your argument for allowing it.

OTOH, why wouldn't vfio_pci_load_config() iterate sub-page BARs and try
to update them at that time?  Thanks,

Alex


> >     
> >> Remove the restriction on base address change in
> >> vfio_pci_write_config for correct mmapping sub-page MMIO
> >> BARs. Accroding to my analysis, the remaining parameter
> >> verification is enough.
> >>
> >> Fixes: c5e2fb3ce4d (vfio: Add save and load functions for VFIO PCI devices)
> >> Reported-by: Nianyao Tang <tangnianyao@huawei.com>
> >> Reported-by: Qixin Gan <ganqixin@huawei.com>
> >> Signed-off-by: Kunkun Jiang <jiangkunkun@huawei.com>
> >> ---
> >>   hw/vfio/pci.c | 8 +-------
> >>   1 file changed, 1 insertion(+), 7 deletions(-)
> >>
> >> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> >> index e1ea1d8a23..891b211ddf 100644
> >> --- a/hw/vfio/pci.c
> >> +++ b/hw/vfio/pci.c
> >> @@ -1189,18 +1189,12 @@ void vfio_pci_write_config(PCIDevice *pdev,
> >>           }
> >>       } else if (ranges_overlap(addr, len, PCI_BASE_ADDRESS_0, 24) ||
> >>           range_covers_byte(addr, len, PCI_COMMAND)) {
> >> -        pcibus_t old_addr[PCI_NUM_REGIONS - 1];
> >>           int bar;
> >>   
> >> -        for (bar = 0; bar < PCI_ROM_SLOT; bar++) {
> >> -            old_addr[bar] = pdev->io_regions[bar].addr;
> >> -        }
> >> -
> >>           pci_default_write_config(pdev, addr, val, len);
> >>   
> >>           for (bar = 0; bar < PCI_ROM_SLOT; bar++) {
> >> -            if (old_addr[bar] != pdev->io_regions[bar].addr &&
> >> -                vdev->bars[bar].region.size > 0 &&
> >> +            if (vdev->bars[bar].region.size > 0 &&
> >>                   vdev->bars[bar].region.size < qemu_real_host_page_size) {
> >>                   vfio_sub_page_bar_update_mapping(pdev, bar);
> >>               }  
> > .  
> 
> 



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/2] vfio/pci: Fix vfio-pci sub-page MMIO BAR mmaping in live migration
  2021-09-10 22:24       ` Alex Williamson
@ 2021-09-13  2:44         ` Kunkun Jiang
  0 siblings, 0 replies; 9+ messages in thread
From: Kunkun Jiang @ 2021-09-13  2:44 UTC (permalink / raw)
  To: Alex Williamson
  Cc: open list:All patches CC here, Eric Auger, Kirti Wankhede,
	tangnianyao, ganqixin, wanghaibin.wang

On 2021/9/11 6:24, Alex Williamson wrote:
> On Fri, 10 Sep 2021 16:33:12 +0800
> Kunkun Jiang <jiangkunkun@huawei.com> wrote:
>
>> Hi Alex,
>>
>> On 2021/9/9 4:45, Alex Williamson wrote:
>>> On Fri, 3 Sep 2021 17:36:10 +0800
>>> Kunkun Jiang <jiangkunkun@huawei.com> wrote:
>>>   
>>>> We expand MemoryRegions of vfio-pci sub-page MMIO BARs to
>>>> vfio_pci_write_config to improve IO performance.
>>>> The MemoryRegions of destination VM will not be expanded
>>>> successful in live migration, because their addresses have
>>>> been updated in vmstate_load_state (vfio_pci_load_config).
>>> What's the call path through vfio_pci_write_config() that you're
>>> relying on to get triggered to enable this and why wouldn't we just
>>> walk all sub-page BARs in vfio_pci_load_config() to resolve the issue
>>> then?  It's my understanding that we do this update in write-config
>>> because it's required that the VM sizes the BAR before using it, which
>>> is not the case when we resume from migration.  Thanks,
>> Let's take an example:
>>
>> AArch64
>> host page granularity: 64KB
>> PCI device: *Bar2 size 32KB* [mem 0x8000200000-0x800020ffff 64bit pref]
>>
>> When enable Command register bit 1(Memory Space), the code flow is
>> as follows:
>>
>> vfio_pci_write_config (addr: 4 val: 2 len: 2)
>>       // record the old address of each bar, 0xffffffffffffffff
>>       old_addr[bar] = pdev->io_regions[bar].addr;
>>       pci_default_write_config
>>           pci_update_mappings
>>               new_addr = pci_bar_address    // 0x8000200000
>>               r->addr = new_addr;
>>               memory_region_addr_subregion_overlap
>>                   ...
>> *vfio_listener_region_add*
>>                       alignment check of the ram section address and size
>> fail, return
>> *kvm_region_add*
>>                       kvm_set_phys_mem
>>                           alignment check of the ram section address and
>> size fail, return
>>
>>       // old_addr[bar] != pdev->io_regions[bar].addr &&
>>       // 0 < vdev->bars[bar].region.size < qemu_real_host_page_size
>>       vfio_sub_page_bar_update_mapping
>> *bar size = qemu_real_host_page_size*
>>               ...
>>               vfio_listener_region_add
>>                   map success
>>               kvm_region_add
>>                   kvm_set_phys_mem
>>                       map success
>>
>> In live migration, only pci config data is sent to the destination VM.
>> Therefore, we need to update the bar's size before destination VM
>> using it.
>>
>> In vfio_pci_load_config, the code flow is as follows:
>>
>> vfio_pci_load_config
>>       vmstate_load_state
>>           *get_pci_config_device*
>>               pci_update_mappings
>>                   ...
>>                   // bar's addr is updated(0x8000200000), but bar's size
>> is still 32KB, so map failed
>>       vfio_pci_write_config
>>           // bar's addr will not be changed, so
>> vfio_sub_page_bar_update_mapping won't be called
>>
>> My idea is that removing the check 'old_addr[bar] !=
>> pdev->io_regions[bar].addr' doesn't
>> affect the previous process. There's also a bar size check. In
>> vfio_sub_page_bar_update_mapping,
>> it will check if bar is mapped and page aligned.
>> 1) If bar's addr is 0xffffffffffffffff, it will not pass the
>> vfio_sub_page_bar_update_mapping check.
>> 2) If bar's size has been updated, it will not pass the bar size check
>> in vfio_pci_write_config.
> The bar size check in vfio_pci_write_config() only tests if the vfio
> region is >0 and <qemu_real_host_page_size, afaict our sub-page support
> affects the size of the MemoryRegions mapping the vfio region, but we
> never change the vfio region size itself.  So if you're expecting
> (vdev->bars[bar].region.size == qemu_real_host_page_size) once we setup
> the sub-page support, I'm not convinced that's true.
>
> So yes, sub-page-update can reject invalid addresses and we already
> rely on it to do so, but the code being removed avoids that redundant
> writes to the BAR won't trigger redundant MemoryRegion manipulation.
> Maybe those are harmless, but that's not your argument for allowing it.
>
> OTOH, why wouldn't vfio_pci_load_config() iterate sub-page BARs and try
> to update them at that time?  Thanks,
Like this? I've tested it, and it's okay.
> vfio_pci_load_config
> {
>      for (bar = 0; bar < PCI_ROM_SLOT; bar++) {
>          if (old_addr[bar] != pdev->io_regions[bar].addr &&
>              vdev->bars[bar].region.size > 0 &&
>              vdev->bars[bar].region.size < qemu_real_host_page_size) {
>              vfio_sub_page_bar_update_mapping(pdev, bar);
>          }
>      }
> }

Thanks,
Kunkun Jiang
>
> Alex
>
>
>>>      
>>>> Remove the restriction on base address change in
>>>> vfio_pci_write_config for correct mmapping sub-page MMIO
>>>> BARs. Accroding to my analysis, the remaining parameter
>>>> verification is enough.
>>>>
>>>> Fixes: c5e2fb3ce4d (vfio: Add save and load functions for VFIO PCI devices)
>>>> Reported-by: Nianyao Tang <tangnianyao@huawei.com>
>>>> Reported-by: Qixin Gan <ganqixin@huawei.com>
>>>> Signed-off-by: Kunkun Jiang <jiangkunkun@huawei.com>
>>>> ---
>>>>    hw/vfio/pci.c | 8 +-------
>>>>    1 file changed, 1 insertion(+), 7 deletions(-)
>>>>
>>>> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
>>>> index e1ea1d8a23..891b211ddf 100644
>>>> --- a/hw/vfio/pci.c
>>>> +++ b/hw/vfio/pci.c
>>>> @@ -1189,18 +1189,12 @@ void vfio_pci_write_config(PCIDevice *pdev,
>>>>            }
>>>>        } else if (ranges_overlap(addr, len, PCI_BASE_ADDRESS_0, 24) ||
>>>>            range_covers_byte(addr, len, PCI_COMMAND)) {
>>>> -        pcibus_t old_addr[PCI_NUM_REGIONS - 1];
>>>>            int bar;
>>>>    
>>>> -        for (bar = 0; bar < PCI_ROM_SLOT; bar++) {
>>>> -            old_addr[bar] = pdev->io_regions[bar].addr;
>>>> -        }
>>>> -
>>>>            pci_default_write_config(pdev, addr, val, len);
>>>>    
>>>>            for (bar = 0; bar < PCI_ROM_SLOT; bar++) {
>>>> -            if (old_addr[bar] != pdev->io_regions[bar].addr &&
>>>> -                vdev->bars[bar].region.size > 0 &&
>>>> +            if (vdev->bars[bar].region.size > 0 &&
>>>>                    vdev->bars[bar].region.size < qemu_real_host_page_size) {
>>>>                    vfio_sub_page_bar_update_mapping(pdev, bar);
>>>>                }
>>> .
>>
> .




^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2021-09-13  2:46 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-09-03  9:36 [PATCH 0/2] vfio: Some fixes about vfio-pci MMIO RAM mapping Kunkun Jiang
2021-09-03  9:36 ` [PATCH 1/2] vfio/pci: Fix vfio-pci sub-page MMIO BAR mmaping in live migration Kunkun Jiang
2021-09-08 20:45   ` Alex Williamson
2021-09-10  8:33     ` Kunkun Jiang
2021-09-10 22:24       ` Alex Williamson
2021-09-13  2:44         ` Kunkun Jiang
2021-09-03  9:36 ` [PATCH 2/2] vfio/common: Add trace point when a MMIO RAM section less than PAGE_SIZE Kunkun Jiang
2021-09-08 20:45   ` Alex Williamson
2021-09-10  8:38     ` Kunkun Jiang

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.