All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v6 00/15] hw/pci: SR-IOV related fixes and improvements
@ 2024-02-20 12:24 Akihiko Odaki
  2024-02-20 12:24 ` [PATCH v6 01/15] hw/nvme: Use pcie_sriov_num_vfs() Akihiko Odaki
                   ` (14 more replies)
  0 siblings, 15 replies; 29+ messages in thread
From: Akihiko Odaki @ 2024-02-20 12:24 UTC (permalink / raw)
  To: Philippe Mathieu-Daudé,
	Michael S. Tsirkin, Marcel Apfelbaum, Alex Williamson,
	Cédric Le Goater, Paolo Bonzini, Daniel P. Berrangé,
	Eduardo Habkost, Sriram Yagnaraman, Jason Wang, Keith Busch,
	Klaus Jensen
  Cc: qemu-devel, qemu-block, Akihiko Odaki, qemu-stable

I submitted a RFC series[1] to add support for SR-IOV emulation to
virtio-net-pci. During the development of the series, I fixed some
trivial bugs and made improvements that I think are independently
useful. This series extracts those fixes and improvements from the RFC
series.

[1]: https://patchew.org/QEMU/20231210-sriov-v2-0-b959e8a6dfaf@daynix.com/

Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
---
Changes in v6:
- Fixed migration.
- Added patch "pcie_sriov: Do not manually unrealize".
- Restored patch "pcie_sriov: Release VFs failed to realize" that was
  missed in v5.
- Link to v5: https://lore.kernel.org/r/20240218-reuse-v5-0-e4fc1c19b5a9@daynix.com

Changes in v5:
- Added patch "hw/pci: Always call pcie_sriov_pf_reset()".
- Added patch "pcie_sriov: Reset SR-IOV extended capability".
- Removed a reference to PCI_SRIOV_CTRL_VFE in hw/nvme.
  (Michael S. Tsirkin)
- Noted the impact on the guest of patch "pcie_sriov: Do not reset
  NumVFs after unregistering VFs". (Michael S. Tsirkin)
- Changed to use pcie_sriov_num_vfs().
- Restored pci_set_power() and changed it to call pci_set_enabled() only
  for PFs with an expalanation. (Michael S. Tsirkin)
- Reordered patches.
- Link to v4: https://lore.kernel.org/r/20240214-reuse-v4-0-89ad093a07f4@daynix.com

Changes in v4:
- Reverted the change to pci_rom_bar_explicitly_enabled().
  (Michael S. Tsirkin)
- Added patch "pcie_sriov: Do not reset NumVFs after unregistering VFs".
- Added patch "hw/nvme: Refer to dev->exp.sriov_pf.num_vfs".
- Link to v3: https://lore.kernel.org/r/20240212-reuse-v3-0-8017b689ce7f@daynix.com

Changes in v3:
- Extracted patch "hw/pci: Use -1 as a default value for rombar" from
  patch "hw/pci: Determine if rombar is explicitly enabled"
  (Philippe Mathieu-Daudé)
- Added an audit result of PCIDevice::rom_bar to the message of patch
  "hw/pci: Use -1 as a default value for rombar"
  (Philippe Mathieu-Daudé)
- Link to v2: https://lore.kernel.org/r/20240210-reuse-v2-0-24ba2a502692@daynix.com

Changes in v2:
- Reset after enabling a function so that NVMe VF state gets updated.
- Link to v1: https://lore.kernel.org/r/20240203-reuse-v1-0-5be8c5ce6338@daynix.com

---
Akihiko Odaki (15):
      hw/nvme: Use pcie_sriov_num_vfs()
      pcie_sriov: Validate NumVFs
      pcie_sriov: Reset SR-IOV extended capability
      pcie_sriov: Do not reset NumVFs after disabling VFs
      hw/pci: Always call pcie_sriov_pf_reset()
      hw/pci: Rename has_power to enabled
      pcie_sriov: Do not manually unrealize
      pcie_sriov: Reuse SR-IOV VF device instances
      pcie_sriov: Release VFs failed to realize
      pcie_sriov: Remove num_vfs from PCIESriovPF
      pcie_sriov: Register VFs after migration
      hw/pci: Use -1 as a default value for rombar
      hw/pci: Determine if rombar is explicitly enabled
      vfio: Avoid inspecting option QDict for rombar
      hw/qdev: Remove opts member

 docs/pcie_sriov.txt         |   8 ++-
 include/hw/pci/pci.h        |   2 +-
 include/hw/pci/pci_device.h |  22 +++++-
 include/hw/pci/pcie_sriov.h |  13 ++--
 include/hw/qdev-core.h      |   4 --
 hw/core/qdev.c              |   1 -
 hw/net/igb.c                |  15 ++--
 hw/nvme/ctrl.c              |  54 +++++++-------
 hw/pci/pci.c                |  24 ++++---
 hw/pci/pci_host.c           |   4 +-
 hw/pci/pcie_sriov.c         | 170 ++++++++++++++++++++++++--------------------
 hw/vfio/pci.c               |   3 +-
 system/qdev-monitor.c       |  12 ++--
 hw/pci/trace-events         |   2 +-
 14 files changed, 189 insertions(+), 145 deletions(-)
---
base-commit: 5005aed8a7e728d028efb40e243ecfc2b4f3df3a
change-id: 20240129-reuse-faae22b11934

Best regards,
-- 
Akihiko Odaki <akihiko.odaki@daynix.com>



^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH v6 01/15] hw/nvme: Use pcie_sriov_num_vfs()
  2024-02-20 12:24 [PATCH v6 00/15] hw/pci: SR-IOV related fixes and improvements Akihiko Odaki
@ 2024-02-20 12:24 ` Akihiko Odaki
  2024-02-20 14:29   ` Kevin Wolf
  2024-02-20 12:24 ` [PATCH v6 02/15] pcie_sriov: Validate NumVFs Akihiko Odaki
                   ` (13 subsequent siblings)
  14 siblings, 1 reply; 29+ messages in thread
From: Akihiko Odaki @ 2024-02-20 12:24 UTC (permalink / raw)
  To: Philippe Mathieu-Daudé,
	Michael S. Tsirkin, Marcel Apfelbaum, Alex Williamson,
	Cédric Le Goater, Paolo Bonzini, Daniel P. Berrangé,
	Eduardo Habkost, Sriram Yagnaraman, Jason Wang, Keith Busch,
	Klaus Jensen
  Cc: qemu-devel, qemu-block, Akihiko Odaki, qemu-stable

nvme_sriov_pre_write_ctrl() used to directly inspect SR-IOV
configurations to know the number of VFs being disabled due to SR-IOV
configuration writes, but the logic was flawed and resulted in
out-of-bound memory access.

It assumed PCI_SRIOV_NUM_VF always has the number of currently enabled
VFs, but it actually doesn't in the following cases:
- PCI_SRIOV_NUM_VF has been set but PCI_SRIOV_CTRL_VFE has never been.
- PCI_SRIOV_NUM_VF was written after PCI_SRIOV_CTRL_VFE was set.
- VFs were only partially enabled because of realization failure.

It is a responsibility of pcie_sriov to interpret SR-IOV configurations
and pcie_sriov does it correctly, so use pcie_sriov_num_vfs(), which it
provides, to get the number of enabled VFs before and after SR-IOV
configuration writes.

Cc: qemu-stable@nongnu.org
Fixes: CVE-2024-26328
Fixes: 11871f53ef8e ("hw/nvme: Add support for the Virtualization Management command")
Suggested-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
---
 hw/nvme/ctrl.c | 26 ++++++++------------------
 1 file changed, 8 insertions(+), 18 deletions(-)

diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index f026245d1e9e..7a56e7b79b4d 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -8466,36 +8466,26 @@ static void nvme_pci_reset(DeviceState *qdev)
     nvme_ctrl_reset(n, NVME_RESET_FUNCTION);
 }
 
-static void nvme_sriov_pre_write_ctrl(PCIDevice *dev, uint32_t address,
-                                      uint32_t val, int len)
+static void nvme_sriov_post_write_config(PCIDevice *dev, uint16_t old_num_vfs)
 {
     NvmeCtrl *n = NVME(dev);
     NvmeSecCtrlEntry *sctrl;
-    uint16_t sriov_cap = dev->exp.sriov_cap;
-    uint32_t off = address - sriov_cap;
-    int i, num_vfs;
+    int i;
 
-    if (!sriov_cap) {
-        return;
-    }
-
-    if (range_covers_byte(off, len, PCI_SRIOV_CTRL)) {
-        if (!(val & PCI_SRIOV_CTRL_VFE)) {
-            num_vfs = pci_get_word(dev->config + sriov_cap + PCI_SRIOV_NUM_VF);
-            for (i = 0; i < num_vfs; i++) {
-                sctrl = &n->sec_ctrl_list.sec[i];
-                nvme_virt_set_state(n, le16_to_cpu(sctrl->scid), false);
-            }
-        }
+    for (i = pcie_sriov_num_vfs(dev); i < old_num_vfs; i++) {
+        sctrl = &n->sec_ctrl_list.sec[i];
+        nvme_virt_set_state(n, le16_to_cpu(sctrl->scid), false);
     }
 }
 
 static void nvme_pci_write_config(PCIDevice *dev, uint32_t address,
                                   uint32_t val, int len)
 {
-    nvme_sriov_pre_write_ctrl(dev, address, val, len);
+    uint16_t old_num_vfs = pcie_sriov_num_vfs(dev);
+
     pci_default_write_config(dev, address, val, len);
     pcie_cap_flr_write_config(dev, address, val, len);
+    nvme_sriov_post_write_config(dev, old_num_vfs);
 }
 
 static const VMStateDescription nvme_vmstate = {

-- 
2.43.1



^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v6 02/15] pcie_sriov: Validate NumVFs
  2024-02-20 12:24 [PATCH v6 00/15] hw/pci: SR-IOV related fixes and improvements Akihiko Odaki
  2024-02-20 12:24 ` [PATCH v6 01/15] hw/nvme: Use pcie_sriov_num_vfs() Akihiko Odaki
@ 2024-02-20 12:24 ` Akihiko Odaki
  2024-02-20 12:24 ` [PATCH v6 03/15] pcie_sriov: Reset SR-IOV extended capability Akihiko Odaki
                   ` (12 subsequent siblings)
  14 siblings, 0 replies; 29+ messages in thread
From: Akihiko Odaki @ 2024-02-20 12:24 UTC (permalink / raw)
  To: Philippe Mathieu-Daudé,
	Michael S. Tsirkin, Marcel Apfelbaum, Alex Williamson,
	Cédric Le Goater, Paolo Bonzini, Daniel P. Berrangé,
	Eduardo Habkost, Sriram Yagnaraman, Jason Wang, Keith Busch,
	Klaus Jensen
  Cc: qemu-devel, qemu-block, Akihiko Odaki, qemu-stable

The guest may write NumVFs greater than TotalVFs and that can lead
to buffer overflow in VF implementations.

Cc: qemu-stable@nongnu.org
Fixes: CVE-2024-26327
Fixes: 7c0fa8dff811 ("pcie: Add support for Single Root I/O Virtualization (SR/IOV)")
Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
---
 hw/pci/pcie_sriov.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/hw/pci/pcie_sriov.c b/hw/pci/pcie_sriov.c
index a1fe65f5d801..da209b7f47fd 100644
--- a/hw/pci/pcie_sriov.c
+++ b/hw/pci/pcie_sriov.c
@@ -176,6 +176,9 @@ static void register_vfs(PCIDevice *dev)
 
     assert(sriov_cap > 0);
     num_vfs = pci_get_word(dev->config + sriov_cap + PCI_SRIOV_NUM_VF);
+    if (num_vfs > pci_get_word(dev->config + sriov_cap + PCI_SRIOV_TOTAL_VF)) {
+        return;
+    }
 
     dev->exp.sriov_pf.vf = g_new(PCIDevice *, num_vfs);
 

-- 
2.43.1



^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v6 03/15] pcie_sriov: Reset SR-IOV extended capability
  2024-02-20 12:24 [PATCH v6 00/15] hw/pci: SR-IOV related fixes and improvements Akihiko Odaki
  2024-02-20 12:24 ` [PATCH v6 01/15] hw/nvme: Use pcie_sriov_num_vfs() Akihiko Odaki
  2024-02-20 12:24 ` [PATCH v6 02/15] pcie_sriov: Validate NumVFs Akihiko Odaki
@ 2024-02-20 12:24 ` Akihiko Odaki
  2024-02-20 12:24 ` [PATCH v6 04/15] pcie_sriov: Do not reset NumVFs after disabling VFs Akihiko Odaki
                   ` (11 subsequent siblings)
  14 siblings, 0 replies; 29+ messages in thread
From: Akihiko Odaki @ 2024-02-20 12:24 UTC (permalink / raw)
  To: Philippe Mathieu-Daudé,
	Michael S. Tsirkin, Marcel Apfelbaum, Alex Williamson,
	Cédric Le Goater, Paolo Bonzini, Daniel P. Berrangé,
	Eduardo Habkost, Sriram Yagnaraman, Jason Wang, Keith Busch,
	Klaus Jensen
  Cc: qemu-devel, qemu-block, Akihiko Odaki

pcie_sriov_pf_disable_vfs() is called when resetting the PF, but it only
disables VFs and does not reset SR-IOV extended capability, leaking the
state and making the VF Enable register inconsistent with the actual
state.

Replace pcie_sriov_pf_disable_vfs() with pcie_sriov_pf_reset(), which
does not only disable VFs but also resets the capability.

Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
---
 include/hw/pci/pcie_sriov.h |  4 ++--
 hw/net/igb.c                |  2 +-
 hw/nvme/ctrl.c              |  2 +-
 hw/pci/pcie_sriov.c         | 26 ++++++++++++++++++--------
 4 files changed, 22 insertions(+), 12 deletions(-)

diff --git a/include/hw/pci/pcie_sriov.h b/include/hw/pci/pcie_sriov.h
index 095fb0c9edf9..b77eb7bf58ac 100644
--- a/include/hw/pci/pcie_sriov.h
+++ b/include/hw/pci/pcie_sriov.h
@@ -58,8 +58,8 @@ void pcie_sriov_pf_add_sup_pgsize(PCIDevice *dev, uint16_t opt_sup_pgsize);
 void pcie_sriov_config_write(PCIDevice *dev, uint32_t address,
                              uint32_t val, int len);
 
-/* Reset SR/IOV VF Enable bit to unregister all VFs */
-void pcie_sriov_pf_disable_vfs(PCIDevice *dev);
+/* Reset SR/IOV */
+void pcie_sriov_pf_reset(PCIDevice *dev);
 
 /* Get logical VF number of a VF - only valid for VFs */
 uint16_t pcie_sriov_vf_number(PCIDevice *dev);
diff --git a/hw/net/igb.c b/hw/net/igb.c
index 0b5c31a58bba..9345506f81ec 100644
--- a/hw/net/igb.c
+++ b/hw/net/igb.c
@@ -493,7 +493,7 @@ static void igb_qdev_reset_hold(Object *obj)
 
     trace_e1000e_cb_qdev_reset_hold();
 
-    pcie_sriov_pf_disable_vfs(d);
+    pcie_sriov_pf_reset(d);
     igb_core_reset(&s->core);
 }
 
diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index 7a56e7b79b4d..7c0d3f108724 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -7116,7 +7116,7 @@ static void nvme_ctrl_reset(NvmeCtrl *n, NvmeResetType rst)
             }
 
             if (rst != NVME_RESET_CONTROLLER) {
-                pcie_sriov_pf_disable_vfs(pci_dev);
+                pcie_sriov_pf_reset(pci_dev);
             }
         }
 
diff --git a/hw/pci/pcie_sriov.c b/hw/pci/pcie_sriov.c
index da209b7f47fd..51b66d1bb342 100644
--- a/hw/pci/pcie_sriov.c
+++ b/hw/pci/pcie_sriov.c
@@ -249,16 +249,26 @@ void pcie_sriov_config_write(PCIDevice *dev, uint32_t address,
 }
 
 
-/* Reset SR/IOV VF Enable bit to trigger an unregister of all VFs */
-void pcie_sriov_pf_disable_vfs(PCIDevice *dev)
+/* Reset SR/IOV */
+void pcie_sriov_pf_reset(PCIDevice *dev)
 {
     uint16_t sriov_cap = dev->exp.sriov_cap;
-    if (sriov_cap) {
-        uint32_t val = pci_get_byte(dev->config + sriov_cap + PCI_SRIOV_CTRL);
-        if (val & PCI_SRIOV_CTRL_VFE) {
-            val &= ~PCI_SRIOV_CTRL_VFE;
-            pcie_sriov_config_write(dev, sriov_cap + PCI_SRIOV_CTRL, val, 1);
-        }
+    if (!sriov_cap) {
+        return;
+    }
+
+    pci_set_word(dev->config + sriov_cap + PCI_SRIOV_CTRL, 0);
+    unregister_vfs(dev);
+
+    /*
+     * Default is to use 4K pages, software can modify it
+     * to any of the supported bits
+     */
+    pci_set_word(dev->config + sriov_cap + PCI_SRIOV_SYS_PGSIZE, 0x1);
+
+    for (uint16_t i = 0; i < PCI_NUM_REGIONS; i++) {
+        pci_set_quad(dev->config + sriov_cap + PCI_SRIOV_BAR + i * 4,
+                     dev->exp.sriov_pf.vf_bar_type[i]);
     }
 }
 

-- 
2.43.1



^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v6 04/15] pcie_sriov: Do not reset NumVFs after disabling VFs
  2024-02-20 12:24 [PATCH v6 00/15] hw/pci: SR-IOV related fixes and improvements Akihiko Odaki
                   ` (2 preceding siblings ...)
  2024-02-20 12:24 ` [PATCH v6 03/15] pcie_sriov: Reset SR-IOV extended capability Akihiko Odaki
@ 2024-02-20 12:24 ` Akihiko Odaki
  2024-02-20 12:24 ` [PATCH v6 05/15] hw/pci: Always call pcie_sriov_pf_reset() Akihiko Odaki
                   ` (10 subsequent siblings)
  14 siblings, 0 replies; 29+ messages in thread
From: Akihiko Odaki @ 2024-02-20 12:24 UTC (permalink / raw)
  To: Philippe Mathieu-Daudé,
	Michael S. Tsirkin, Marcel Apfelbaum, Alex Williamson,
	Cédric Le Goater, Paolo Bonzini, Daniel P. Berrangé,
	Eduardo Habkost, Sriram Yagnaraman, Jason Wang, Keith Busch,
	Klaus Jensen
  Cc: qemu-devel, qemu-block, Akihiko Odaki

The spec does not NumVFs is reset after disabling VFs except when
resetting the PF. Clearing it is guest visible and out of spec, even
though Linux doesn't rely on this value being preserved, so we never
noticed.

Fixes: 7c0fa8dff811 ("pcie: Add support for Single Root I/O Virtualization (SR/IOV)")
Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
---
 hw/pci/pcie_sriov.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/hw/pci/pcie_sriov.c b/hw/pci/pcie_sriov.c
index 51b66d1bb342..e9b23221d713 100644
--- a/hw/pci/pcie_sriov.c
+++ b/hw/pci/pcie_sriov.c
@@ -215,7 +215,6 @@ static void unregister_vfs(PCIDevice *dev)
     g_free(dev->exp.sriov_pf.vf);
     dev->exp.sriov_pf.vf = NULL;
     dev->exp.sriov_pf.num_vfs = 0;
-    pci_set_word(dev->config + dev->exp.sriov_cap + PCI_SRIOV_NUM_VF, 0);
 }
 
 void pcie_sriov_config_write(PCIDevice *dev, uint32_t address,
@@ -260,6 +259,8 @@ void pcie_sriov_pf_reset(PCIDevice *dev)
     pci_set_word(dev->config + sriov_cap + PCI_SRIOV_CTRL, 0);
     unregister_vfs(dev);
 
+    pci_set_word(dev->config + sriov_cap + PCI_SRIOV_NUM_VF, 0);
+
     /*
      * Default is to use 4K pages, software can modify it
      * to any of the supported bits

-- 
2.43.1



^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v6 05/15] hw/pci: Always call pcie_sriov_pf_reset()
  2024-02-20 12:24 [PATCH v6 00/15] hw/pci: SR-IOV related fixes and improvements Akihiko Odaki
                   ` (3 preceding siblings ...)
  2024-02-20 12:24 ` [PATCH v6 04/15] pcie_sriov: Do not reset NumVFs after disabling VFs Akihiko Odaki
@ 2024-02-20 12:24 ` Akihiko Odaki
  2024-02-20 12:24 ` [PATCH v6 06/15] hw/pci: Rename has_power to enabled Akihiko Odaki
                   ` (9 subsequent siblings)
  14 siblings, 0 replies; 29+ messages in thread
From: Akihiko Odaki @ 2024-02-20 12:24 UTC (permalink / raw)
  To: Philippe Mathieu-Daudé,
	Michael S. Tsirkin, Marcel Apfelbaum, Alex Williamson,
	Cédric Le Goater, Paolo Bonzini, Daniel P. Berrangé,
	Eduardo Habkost, Sriram Yagnaraman, Jason Wang, Keith Busch,
	Klaus Jensen
  Cc: qemu-devel, qemu-block, Akihiko Odaki

Call pcie_sriov_pf_reset() from pci_do_device_reset() just as we do
for msi_reset() and msix_reset() to prevent duplicating code for each
SR-IOV PF.

Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
---
 hw/net/igb.c   | 2 --
 hw/nvme/ctrl.c | 4 ----
 hw/pci/pci.c   | 1 +
 3 files changed, 1 insertion(+), 6 deletions(-)

diff --git a/hw/net/igb.c b/hw/net/igb.c
index 9345506f81ec..9b37523d6df8 100644
--- a/hw/net/igb.c
+++ b/hw/net/igb.c
@@ -488,12 +488,10 @@ static void igb_pci_uninit(PCIDevice *pci_dev)
 
 static void igb_qdev_reset_hold(Object *obj)
 {
-    PCIDevice *d = PCI_DEVICE(obj);
     IGBState *s = IGB(obj);
 
     trace_e1000e_cb_qdev_reset_hold();
 
-    pcie_sriov_pf_reset(d);
     igb_core_reset(&s->core);
 }
 
diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index 7c0d3f108724..c1af4b87b34a 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -7114,10 +7114,6 @@ static void nvme_ctrl_reset(NvmeCtrl *n, NvmeResetType rst)
                 sctrl = &n->sec_ctrl_list.sec[i];
                 nvme_virt_set_state(n, le16_to_cpu(sctrl->scid), false);
             }
-
-            if (rst != NVME_RESET_CONTROLLER) {
-                pcie_sriov_pf_reset(pci_dev);
-            }
         }
 
         if (rst != NVME_RESET_CONTROLLER) {
diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index 6496d027ca61..e7a39cb203ae 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -409,6 +409,7 @@ static void pci_do_device_reset(PCIDevice *dev)
 
     msi_reset(dev);
     msix_reset(dev);
+    pcie_sriov_pf_reset(dev);
 }
 
 /*

-- 
2.43.1



^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v6 06/15] hw/pci: Rename has_power to enabled
  2024-02-20 12:24 [PATCH v6 00/15] hw/pci: SR-IOV related fixes and improvements Akihiko Odaki
                   ` (4 preceding siblings ...)
  2024-02-20 12:24 ` [PATCH v6 05/15] hw/pci: Always call pcie_sriov_pf_reset() Akihiko Odaki
@ 2024-02-20 12:24 ` Akihiko Odaki
  2024-02-20 12:24 ` [PATCH v6 07/15] pcie_sriov: Do not manually unrealize Akihiko Odaki
                   ` (8 subsequent siblings)
  14 siblings, 0 replies; 29+ messages in thread
From: Akihiko Odaki @ 2024-02-20 12:24 UTC (permalink / raw)
  To: Philippe Mathieu-Daudé,
	Michael S. Tsirkin, Marcel Apfelbaum, Alex Williamson,
	Cédric Le Goater, Paolo Bonzini, Daniel P. Berrangé,
	Eduardo Habkost, Sriram Yagnaraman, Jason Wang, Keith Busch,
	Klaus Jensen
  Cc: qemu-devel, qemu-block, Akihiko Odaki

The renamed state will not only represent powering state of PFs, but
also represent SR-IOV VF enablement in the future.

Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
---
 include/hw/pci/pci.h        |  7 ++++++-
 include/hw/pci/pci_device.h |  2 +-
 hw/pci/pci.c                | 14 +++++++-------
 hw/pci/pci_host.c           |  4 ++--
 4 files changed, 16 insertions(+), 11 deletions(-)

diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
index eaa3fc99d884..6c92b2f70008 100644
--- a/include/hw/pci/pci.h
+++ b/include/hw/pci/pci.h
@@ -642,6 +642,11 @@ static inline void pci_irq_pulse(PCIDevice *pci_dev)
 }
 
 MSIMessage pci_get_msi_message(PCIDevice *dev, int vector);
-void pci_set_power(PCIDevice *pci_dev, bool state);
+void pci_set_enabled(PCIDevice *pci_dev, bool state);
+
+static inline void pci_set_power(PCIDevice *pci_dev, bool state)
+{
+    pci_set_enabled(pci_dev, state);
+}
 
 #endif
diff --git a/include/hw/pci/pci_device.h b/include/hw/pci/pci_device.h
index d3dd0f64b273..d57f9ce83884 100644
--- a/include/hw/pci/pci_device.h
+++ b/include/hw/pci/pci_device.h
@@ -56,7 +56,7 @@ typedef struct PCIReqIDCache PCIReqIDCache;
 struct PCIDevice {
     DeviceState qdev;
     bool partially_hotplugged;
-    bool has_power;
+    bool enabled;
 
     /* PCI config space */
     uint8_t *config;
diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index e7a39cb203ae..8bde13f7cd1e 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -1525,7 +1525,7 @@ static void pci_update_mappings(PCIDevice *d)
             continue;
 
         new_addr = pci_bar_address(d, i, r->type, r->size);
-        if (!d->has_power) {
+        if (!d->enabled) {
             new_addr = PCI_BAR_UNMAPPED;
         }
 
@@ -1613,7 +1613,7 @@ void pci_default_write_config(PCIDevice *d, uint32_t addr, uint32_t val_in, int
         pci_update_irq_disabled(d, was_irq_disabled);
         memory_region_set_enabled(&d->bus_master_enable_region,
                                   (pci_get_word(d->config + PCI_COMMAND)
-                                   & PCI_COMMAND_MASTER) && d->has_power);
+                                   & PCI_COMMAND_MASTER) && d->enabled);
     }
 
     msi_write_config(d, addr, val_in, l);
@@ -2811,18 +2811,18 @@ MSIMessage pci_get_msi_message(PCIDevice *dev, int vector)
     return msg;
 }
 
-void pci_set_power(PCIDevice *d, bool state)
+void pci_set_enabled(PCIDevice *d, bool state)
 {
-    if (d->has_power == state) {
+    if (d->enabled == state) {
         return;
     }
 
-    d->has_power = state;
+    d->enabled = state;
     pci_update_mappings(d);
     memory_region_set_enabled(&d->bus_master_enable_region,
                               (pci_get_word(d->config + PCI_COMMAND)
-                               & PCI_COMMAND_MASTER) && d->has_power);
-    if (!d->has_power) {
+                               & PCI_COMMAND_MASTER) && d->enabled);
+    if (!d->enabled) {
         pci_device_reset(d);
     }
 }
diff --git a/hw/pci/pci_host.c b/hw/pci/pci_host.c
index dfe6fe618401..0d82727cc9dd 100644
--- a/hw/pci/pci_host.c
+++ b/hw/pci/pci_host.c
@@ -86,7 +86,7 @@ void pci_host_config_write_common(PCIDevice *pci_dev, uint32_t addr,
      * allowing direct removal of unexposed functions.
      */
     if ((pci_dev->qdev.hotplugged && !pci_get_function_0(pci_dev)) ||
-        !pci_dev->has_power || is_pci_dev_ejected(pci_dev)) {
+        !pci_dev->enabled || is_pci_dev_ejected(pci_dev)) {
         return;
     }
 
@@ -111,7 +111,7 @@ uint32_t pci_host_config_read_common(PCIDevice *pci_dev, uint32_t addr,
      * allowing direct removal of unexposed functions.
      */
     if ((pci_dev->qdev.hotplugged && !pci_get_function_0(pci_dev)) ||
-        !pci_dev->has_power || is_pci_dev_ejected(pci_dev)) {
+        !pci_dev->enabled || is_pci_dev_ejected(pci_dev)) {
         return ~0x0;
     }
 

-- 
2.43.1



^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v6 07/15] pcie_sriov: Do not manually unrealize
  2024-02-20 12:24 [PATCH v6 00/15] hw/pci: SR-IOV related fixes and improvements Akihiko Odaki
                   ` (5 preceding siblings ...)
  2024-02-20 12:24 ` [PATCH v6 06/15] hw/pci: Rename has_power to enabled Akihiko Odaki
@ 2024-02-20 12:24 ` Akihiko Odaki
  2024-03-12 19:37   ` Michael S. Tsirkin
  2024-02-20 12:24 ` [PATCH v6 08/15] pcie_sriov: Reuse SR-IOV VF device instances Akihiko Odaki
                   ` (7 subsequent siblings)
  14 siblings, 1 reply; 29+ messages in thread
From: Akihiko Odaki @ 2024-02-20 12:24 UTC (permalink / raw)
  To: Philippe Mathieu-Daudé,
	Michael S. Tsirkin, Marcel Apfelbaum, Alex Williamson,
	Cédric Le Goater, Paolo Bonzini, Daniel P. Berrangé,
	Eduardo Habkost, Sriram Yagnaraman, Jason Wang, Keith Busch,
	Klaus Jensen
  Cc: qemu-devel, qemu-block, Akihiko Odaki

A device gets automatically unrealized when being unparented.

Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
---
 hw/pci/pcie_sriov.c | 5 -----
 1 file changed, 5 deletions(-)

diff --git a/hw/pci/pcie_sriov.c b/hw/pci/pcie_sriov.c
index e9b23221d713..8b1fd2a89ad7 100644
--- a/hw/pci/pcie_sriov.c
+++ b/hw/pci/pcie_sriov.c
@@ -17,7 +17,6 @@
 #include "hw/qdev-properties.h"
 #include "qemu/error-report.h"
 #include "qemu/range.h"
-#include "qapi/error.h"
 #include "trace.h"
 
 static PCIDevice *register_vf(PCIDevice *pf, int devfn,
@@ -204,11 +203,7 @@ static void unregister_vfs(PCIDevice *dev)
     trace_sriov_unregister_vfs(dev->name, PCI_SLOT(dev->devfn),
                                PCI_FUNC(dev->devfn), num_vfs);
     for (i = 0; i < num_vfs; i++) {
-        Error *err = NULL;
         PCIDevice *vf = dev->exp.sriov_pf.vf[i];
-        if (!object_property_set_bool(OBJECT(vf), "realized", false, &err)) {
-            error_reportf_err(err, "Failed to unplug: ");
-        }
         object_unparent(OBJECT(vf));
         object_unref(OBJECT(vf));
     }

-- 
2.43.1



^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v6 08/15] pcie_sriov: Reuse SR-IOV VF device instances
  2024-02-20 12:24 [PATCH v6 00/15] hw/pci: SR-IOV related fixes and improvements Akihiko Odaki
                   ` (6 preceding siblings ...)
  2024-02-20 12:24 ` [PATCH v6 07/15] pcie_sriov: Do not manually unrealize Akihiko Odaki
@ 2024-02-20 12:24 ` Akihiko Odaki
  2024-03-12 19:47   ` Michael S. Tsirkin
  2024-02-20 12:24 ` [PATCH v6 09/15] pcie_sriov: Release VFs failed to realize Akihiko Odaki
                   ` (6 subsequent siblings)
  14 siblings, 1 reply; 29+ messages in thread
From: Akihiko Odaki @ 2024-02-20 12:24 UTC (permalink / raw)
  To: Philippe Mathieu-Daudé,
	Michael S. Tsirkin, Marcel Apfelbaum, Alex Williamson,
	Cédric Le Goater, Paolo Bonzini, Daniel P. Berrangé,
	Eduardo Habkost, Sriram Yagnaraman, Jason Wang, Keith Busch,
	Klaus Jensen
  Cc: qemu-devel, qemu-block, Akihiko Odaki

Disable SR-IOV VF devices by reusing code to power down PCI devices
instead of removing them when the guest requests to disable VFs. This
allows to realize devices and report VF realization errors at PF
realization time.

Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
---
 docs/pcie_sriov.txt         |   8 ++--
 include/hw/pci/pci.h        |   5 ---
 include/hw/pci/pci_device.h |  15 +++++++
 include/hw/pci/pcie_sriov.h |   6 +--
 hw/net/igb.c                |  13 ++++--
 hw/nvme/ctrl.c              |  24 +++++++----
 hw/pci/pci.c                |   2 +-
 hw/pci/pcie_sriov.c         | 102 +++++++++++++++++++-------------------------
 8 files changed, 95 insertions(+), 80 deletions(-)

diff --git a/docs/pcie_sriov.txt b/docs/pcie_sriov.txt
index a47aad0bfab0..ab2142807f79 100644
--- a/docs/pcie_sriov.txt
+++ b/docs/pcie_sriov.txt
@@ -52,9 +52,11 @@ setting up a BAR for a VF.
       ...
 
       /* Add and initialize the SR/IOV capability */
-      pcie_sriov_pf_init(d, 0x200, "your_virtual_dev",
-                       vf_devid, initial_vfs, total_vfs,
-                       fun_offset, stride);
+      if (!pcie_sriov_pf_init(d, 0x200, "your_virtual_dev",
+                              vf_devid, initial_vfs, total_vfs,
+                              fun_offset, stride, errp)) {
+         return;
+      }
 
       /* Set up individual VF BARs (parameters as for normal BARs) */
       pcie_sriov_pf_init_vf_bar( ... )
diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
index 6c92b2f70008..442017b4865d 100644
--- a/include/hw/pci/pci.h
+++ b/include/hw/pci/pci.h
@@ -644,9 +644,4 @@ static inline void pci_irq_pulse(PCIDevice *pci_dev)
 MSIMessage pci_get_msi_message(PCIDevice *dev, int vector);
 void pci_set_enabled(PCIDevice *pci_dev, bool state);
 
-static inline void pci_set_power(PCIDevice *pci_dev, bool state)
-{
-    pci_set_enabled(pci_dev, state);
-}
-
 #endif
diff --git a/include/hw/pci/pci_device.h b/include/hw/pci/pci_device.h
index d57f9ce83884..ca151325085d 100644
--- a/include/hw/pci/pci_device.h
+++ b/include/hw/pci/pci_device.h
@@ -205,6 +205,21 @@ static inline uint16_t pci_get_bdf(PCIDevice *dev)
     return PCI_BUILD_BDF(pci_bus_num(pci_get_bus(dev)), dev->devfn);
 }
 
+static inline void pci_set_power(PCIDevice *pci_dev, bool state)
+{
+    /*
+     * Don't change the enabled state of VFs when powering on/off the device.
+     *
+     * When powering on, VFs must not be enabled immediately but they must
+     * wait until the guest configures SR-IOV.
+     * When powering off, their corresponding PFs will be reset and disable
+     * VFs.
+     */
+    if (!pci_is_vf(pci_dev)) {
+        pci_set_enabled(pci_dev, state);
+    }
+}
+
 uint16_t pci_requester_id(PCIDevice *dev);
 
 /* DMA access functions */
diff --git a/include/hw/pci/pcie_sriov.h b/include/hw/pci/pcie_sriov.h
index b77eb7bf58ac..4b1133f79e15 100644
--- a/include/hw/pci/pcie_sriov.h
+++ b/include/hw/pci/pcie_sriov.h
@@ -18,7 +18,6 @@
 struct PCIESriovPF {
     uint16_t num_vfs;   /* Number of virtual functions created */
     uint8_t vf_bar_type[PCI_NUM_REGIONS];   /* Store type for each VF bar */
-    const char *vfname; /* Reference to the device type used for the VFs */
     PCIDevice **vf;     /* Pointer to an array of num_vfs VF devices */
 };
 
@@ -27,10 +26,11 @@ struct PCIESriovVF {
     uint16_t vf_number; /* Logical VF number of this function */
 };
 
-void pcie_sriov_pf_init(PCIDevice *dev, uint16_t offset,
+bool pcie_sriov_pf_init(PCIDevice *dev, uint16_t offset,
                         const char *vfname, uint16_t vf_dev_id,
                         uint16_t init_vfs, uint16_t total_vfs,
-                        uint16_t vf_offset, uint16_t vf_stride);
+                        uint16_t vf_offset, uint16_t vf_stride,
+                        Error **errp);
 void pcie_sriov_pf_exit(PCIDevice *dev);
 
 /* Set up a VF bar in the SR/IOV bar area */
diff --git a/hw/net/igb.c b/hw/net/igb.c
index 9b37523d6df8..907259fd8b3b 100644
--- a/hw/net/igb.c
+++ b/hw/net/igb.c
@@ -447,9 +447,16 @@ static void igb_pci_realize(PCIDevice *pci_dev, Error **errp)
 
     pcie_ari_init(pci_dev, 0x150);
 
-    pcie_sriov_pf_init(pci_dev, IGB_CAP_SRIOV_OFFSET, TYPE_IGBVF,
-        IGB_82576_VF_DEV_ID, IGB_MAX_VF_FUNCTIONS, IGB_MAX_VF_FUNCTIONS,
-        IGB_VF_OFFSET, IGB_VF_STRIDE);
+    if (!pcie_sriov_pf_init(pci_dev, IGB_CAP_SRIOV_OFFSET,
+                            TYPE_IGBVF, IGB_82576_VF_DEV_ID,
+                            IGB_MAX_VF_FUNCTIONS, IGB_MAX_VF_FUNCTIONS,
+                            IGB_VF_OFFSET, IGB_VF_STRIDE,
+                            errp)) {
+        pcie_cap_exit(pci_dev);
+        igb_cleanup_msix(s);
+        msi_uninit(pci_dev);
+        return;
+    }
 
     pcie_sriov_pf_init_vf_bar(pci_dev, IGBVF_MMIO_BAR_IDX,
         PCI_BASE_ADDRESS_MEM_TYPE_64 | PCI_BASE_ADDRESS_MEM_PREFETCH,
diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index c1af4b87b34a..98c3e942077c 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -8027,7 +8027,8 @@ static uint64_t nvme_bar_size(unsigned total_queues, unsigned total_irqs,
     return bar_size;
 }
 
-static void nvme_init_sriov(NvmeCtrl *n, PCIDevice *pci_dev, uint16_t offset)
+static bool nvme_init_sriov(NvmeCtrl *n, PCIDevice *pci_dev, uint16_t offset,
+                            Error **errp)
 {
     uint16_t vf_dev_id = n->params.use_intel_id ?
                          PCI_DEVICE_ID_INTEL_NVME : PCI_DEVICE_ID_REDHAT_NVME;
@@ -8036,12 +8037,17 @@ static void nvme_init_sriov(NvmeCtrl *n, PCIDevice *pci_dev, uint16_t offset)
                                       le16_to_cpu(cap->vifrsm),
                                       NULL, NULL);
 
-    pcie_sriov_pf_init(pci_dev, offset, "nvme", vf_dev_id,
-                       n->params.sriov_max_vfs, n->params.sriov_max_vfs,
-                       NVME_VF_OFFSET, NVME_VF_STRIDE);
+    if (!pcie_sriov_pf_init(pci_dev, offset, "nvme", vf_dev_id,
+                            n->params.sriov_max_vfs, n->params.sriov_max_vfs,
+                            NVME_VF_OFFSET, NVME_VF_STRIDE,
+                            errp)) {
+        return false;
+    }
 
     pcie_sriov_pf_init_vf_bar(pci_dev, 0, PCI_BASE_ADDRESS_SPACE_MEMORY |
                               PCI_BASE_ADDRESS_MEM_TYPE_64, bar_size);
+
+    return true;
 }
 
 static int nvme_add_pm_capability(PCIDevice *pci_dev, uint8_t offset)
@@ -8120,6 +8126,12 @@ static bool nvme_init_pci(NvmeCtrl *n, PCIDevice *pci_dev, Error **errp)
         return false;
     }
 
+    if (!pci_is_vf(pci_dev) && n->params.sriov_max_vfs &&
+        !nvme_init_sriov(n, pci_dev, 0x120, errp)) {
+        msix_uninit(pci_dev, &n->bar0, &n->bar0);
+        return false;
+    }
+
     nvme_update_msixcap_ts(pci_dev, n->conf_msix_qsize);
 
     if (n->params.cmb_size_mb) {
@@ -8130,10 +8142,6 @@ static bool nvme_init_pci(NvmeCtrl *n, PCIDevice *pci_dev, Error **errp)
         nvme_init_pmr(n, pci_dev);
     }
 
-    if (!pci_is_vf(pci_dev) && n->params.sriov_max_vfs) {
-        nvme_init_sriov(n, pci_dev, 0x120);
-    }
-
     return true;
 }
 
diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index 8bde13f7cd1e..750c2ba696d1 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -2822,7 +2822,7 @@ void pci_set_enabled(PCIDevice *d, bool state)
     memory_region_set_enabled(&d->bus_master_enable_region,
                               (pci_get_word(d->config + PCI_COMMAND)
                                & PCI_COMMAND_MASTER) && d->enabled);
-    if (!d->enabled) {
+    if (d->qdev.realized) {
         pci_device_reset(d);
     }
 }
diff --git a/hw/pci/pcie_sriov.c b/hw/pci/pcie_sriov.c
index 8b1fd2a89ad7..d934cd7d0e64 100644
--- a/hw/pci/pcie_sriov.c
+++ b/hw/pci/pcie_sriov.c
@@ -19,15 +19,25 @@
 #include "qemu/range.h"
 #include "trace.h"
 
-static PCIDevice *register_vf(PCIDevice *pf, int devfn,
-                              const char *name, uint16_t vf_num);
-static void unregister_vfs(PCIDevice *dev);
+static void unparent_vfs(PCIDevice *dev, uint16_t total_vfs)
+{
+    for (uint16_t i = 0; i < total_vfs; i++) {
+        PCIDevice *vf = dev->exp.sriov_pf.vf[i];
+        object_unparent(OBJECT(vf));
+        object_unref(OBJECT(vf));
+    }
+    g_free(dev->exp.sriov_pf.vf);
+    dev->exp.sriov_pf.vf = NULL;
+}
 
-void pcie_sriov_pf_init(PCIDevice *dev, uint16_t offset,
+bool pcie_sriov_pf_init(PCIDevice *dev, uint16_t offset,
                         const char *vfname, uint16_t vf_dev_id,
                         uint16_t init_vfs, uint16_t total_vfs,
-                        uint16_t vf_offset, uint16_t vf_stride)
+                        uint16_t vf_offset, uint16_t vf_stride,
+                        Error **errp)
 {
+    BusState *bus = qdev_get_parent_bus(&dev->qdev);
+    int32_t devfn = dev->devfn + vf_offset;
     uint8_t *cfg = dev->config + offset;
     uint8_t *wmask;
 
@@ -35,7 +45,6 @@ void pcie_sriov_pf_init(PCIDevice *dev, uint16_t offset,
                         offset, PCI_EXT_CAP_SRIOV_SIZEOF);
     dev->exp.sriov_cap = offset;
     dev->exp.sriov_pf.num_vfs = 0;
-    dev->exp.sriov_pf.vfname = g_strdup(vfname);
     dev->exp.sriov_pf.vf = NULL;
 
     pci_set_word(cfg + PCI_SRIOV_VF_OFFSET, vf_offset);
@@ -68,13 +77,35 @@ void pcie_sriov_pf_init(PCIDevice *dev, uint16_t offset,
     pci_set_word(wmask + PCI_SRIOV_SYS_PGSIZE, 0x553);
 
     qdev_prop_set_bit(&dev->qdev, "multifunction", true);
+
+    dev->exp.sriov_pf.vf = g_new(PCIDevice *, total_vfs);
+
+    for (uint16_t i = 0; i < total_vfs; i++) {
+        PCIDevice *vf = pci_new(devfn, vfname);
+        vf->exp.sriov_vf.pf = dev;
+        vf->exp.sriov_vf.vf_number = i;
+
+        if (!qdev_realize(&vf->qdev, bus, errp)) {
+            unparent_vfs(dev, i);
+            return false;
+        }
+
+        /* set vid/did according to sr/iov spec - they are not used */
+        pci_config_set_vendor_id(vf->config, 0xffff);
+        pci_config_set_device_id(vf->config, 0xffff);
+
+        dev->exp.sriov_pf.vf[i] = vf;
+        devfn += vf_stride;
+    }
+
+    return true;
 }
 
 void pcie_sriov_pf_exit(PCIDevice *dev)
 {
-    unregister_vfs(dev);
-    g_free((char *)dev->exp.sriov_pf.vfname);
-    dev->exp.sriov_pf.vfname = NULL;
+    uint8_t *cfg = dev->config + dev->exp.sriov_cap;
+
+    unparent_vfs(dev, pci_get_word(cfg + PCI_SRIOV_TOTAL_VF));
 }
 
 void pcie_sriov_pf_init_vf_bar(PCIDevice *dev, int region_num,
@@ -140,38 +171,11 @@ void pcie_sriov_vf_register_bar(PCIDevice *dev, int region_num,
     }
 }
 
-static PCIDevice *register_vf(PCIDevice *pf, int devfn, const char *name,
-                              uint16_t vf_num)
-{
-    PCIDevice *dev = pci_new(devfn, name);
-    dev->exp.sriov_vf.pf = pf;
-    dev->exp.sriov_vf.vf_number = vf_num;
-    PCIBus *bus = pci_get_bus(pf);
-    Error *local_err = NULL;
-
-    qdev_realize(&dev->qdev, &bus->qbus, &local_err);
-    if (local_err) {
-        error_report_err(local_err);
-        return NULL;
-    }
-
-    /* set vid/did according to sr/iov spec - they are not used */
-    pci_config_set_vendor_id(dev->config, 0xffff);
-    pci_config_set_device_id(dev->config, 0xffff);
-
-    return dev;
-}
-
 static void register_vfs(PCIDevice *dev)
 {
     uint16_t num_vfs;
     uint16_t i;
     uint16_t sriov_cap = dev->exp.sriov_cap;
-    uint16_t vf_offset =
-        pci_get_word(dev->config + sriov_cap + PCI_SRIOV_VF_OFFSET);
-    uint16_t vf_stride =
-        pci_get_word(dev->config + sriov_cap + PCI_SRIOV_VF_STRIDE);
-    int32_t devfn = dev->devfn + vf_offset;
 
     assert(sriov_cap > 0);
     num_vfs = pci_get_word(dev->config + sriov_cap + PCI_SRIOV_NUM_VF);
@@ -179,18 +183,10 @@ static void register_vfs(PCIDevice *dev)
         return;
     }
 
-    dev->exp.sriov_pf.vf = g_new(PCIDevice *, num_vfs);
-
     trace_sriov_register_vfs(dev->name, PCI_SLOT(dev->devfn),
                              PCI_FUNC(dev->devfn), num_vfs);
     for (i = 0; i < num_vfs; i++) {
-        dev->exp.sriov_pf.vf[i] = register_vf(dev, devfn,
-                                              dev->exp.sriov_pf.vfname, i);
-        if (!dev->exp.sriov_pf.vf[i]) {
-            num_vfs = i;
-            break;
-        }
-        devfn += vf_stride;
+        pci_set_enabled(dev->exp.sriov_pf.vf[i], true);
     }
     dev->exp.sriov_pf.num_vfs = num_vfs;
 }
@@ -203,12 +199,8 @@ static void unregister_vfs(PCIDevice *dev)
     trace_sriov_unregister_vfs(dev->name, PCI_SLOT(dev->devfn),
                                PCI_FUNC(dev->devfn), num_vfs);
     for (i = 0; i < num_vfs; i++) {
-        PCIDevice *vf = dev->exp.sriov_pf.vf[i];
-        object_unparent(OBJECT(vf));
-        object_unref(OBJECT(vf));
+        pci_set_enabled(dev->exp.sriov_pf.vf[i], false);
     }
-    g_free(dev->exp.sriov_pf.vf);
-    dev->exp.sriov_pf.vf = NULL;
     dev->exp.sriov_pf.num_vfs = 0;
 }
 
@@ -230,14 +222,10 @@ void pcie_sriov_config_write(PCIDevice *dev, uint32_t address,
                              PCI_FUNC(dev->devfn), off, val, len);
 
     if (range_covers_byte(off, len, PCI_SRIOV_CTRL)) {
-        if (dev->exp.sriov_pf.num_vfs) {
-            if (!(val & PCI_SRIOV_CTRL_VFE)) {
-                unregister_vfs(dev);
-            }
+        if (val & PCI_SRIOV_CTRL_VFE) {
+            register_vfs(dev);
         } else {
-            if (val & PCI_SRIOV_CTRL_VFE) {
-                register_vfs(dev);
-            }
+            unregister_vfs(dev);
         }
     }
 }

-- 
2.43.1



^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v6 09/15] pcie_sriov: Release VFs failed to realize
  2024-02-20 12:24 [PATCH v6 00/15] hw/pci: SR-IOV related fixes and improvements Akihiko Odaki
                   ` (7 preceding siblings ...)
  2024-02-20 12:24 ` [PATCH v6 08/15] pcie_sriov: Reuse SR-IOV VF device instances Akihiko Odaki
@ 2024-02-20 12:24 ` Akihiko Odaki
  2024-02-20 12:24 ` [PATCH v6 10/15] pcie_sriov: Remove num_vfs from PCIESriovPF Akihiko Odaki
                   ` (5 subsequent siblings)
  14 siblings, 0 replies; 29+ messages in thread
From: Akihiko Odaki @ 2024-02-20 12:24 UTC (permalink / raw)
  To: Philippe Mathieu-Daudé,
	Michael S. Tsirkin, Marcel Apfelbaum, Alex Williamson,
	Cédric Le Goater, Paolo Bonzini, Daniel P. Berrangé,
	Eduardo Habkost, Sriram Yagnaraman, Jason Wang, Keith Busch,
	Klaus Jensen
  Cc: qemu-devel, qemu-block, Akihiko Odaki

Release VFs failed to realize just as we do in unregister_vfs().

Fixes: 7c0fa8dff811 ("pcie: Add support for Single Root I/O Virtualization (SR/IOV)")
Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
---
 hw/pci/pcie_sriov.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/hw/pci/pcie_sriov.c b/hw/pci/pcie_sriov.c
index d934cd7d0e64..8710ee95b26d 100644
--- a/hw/pci/pcie_sriov.c
+++ b/hw/pci/pcie_sriov.c
@@ -86,6 +86,8 @@ bool pcie_sriov_pf_init(PCIDevice *dev, uint16_t offset,
         vf->exp.sriov_vf.vf_number = i;
 
         if (!qdev_realize(&vf->qdev, bus, errp)) {
+            object_unparent(OBJECT(vf));
+            object_unref(vf);
             unparent_vfs(dev, i);
             return false;
         }

-- 
2.43.1



^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v6 10/15] pcie_sriov: Remove num_vfs from PCIESriovPF
  2024-02-20 12:24 [PATCH v6 00/15] hw/pci: SR-IOV related fixes and improvements Akihiko Odaki
                   ` (8 preceding siblings ...)
  2024-02-20 12:24 ` [PATCH v6 09/15] pcie_sriov: Release VFs failed to realize Akihiko Odaki
@ 2024-02-20 12:24 ` Akihiko Odaki
  2024-02-20 12:24 ` [PATCH v6 11/15] pcie_sriov: Register VFs after migration Akihiko Odaki
                   ` (4 subsequent siblings)
  14 siblings, 0 replies; 29+ messages in thread
From: Akihiko Odaki @ 2024-02-20 12:24 UTC (permalink / raw)
  To: Philippe Mathieu-Daudé,
	Michael S. Tsirkin, Marcel Apfelbaum, Alex Williamson,
	Cédric Le Goater, Paolo Bonzini, Daniel P. Berrangé,
	Eduardo Habkost, Sriram Yagnaraman, Jason Wang, Keith Busch,
	Klaus Jensen
  Cc: qemu-devel, qemu-block, Akihiko Odaki

num_vfs is not migrated so use PCI_SRIOV_CTRL_VFE and PCI_SRIOV_NUM_VF
instead.

Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
---
 include/hw/pci/pcie_sriov.h |  1 -
 hw/pci/pcie_sriov.c         | 28 ++++++++++++++++++++--------
 hw/pci/trace-events         |  2 +-
 3 files changed, 21 insertions(+), 10 deletions(-)

diff --git a/include/hw/pci/pcie_sriov.h b/include/hw/pci/pcie_sriov.h
index 4b1133f79e15..793d03c5f12e 100644
--- a/include/hw/pci/pcie_sriov.h
+++ b/include/hw/pci/pcie_sriov.h
@@ -16,7 +16,6 @@
 #include "hw/pci/pci.h"
 
 struct PCIESriovPF {
-    uint16_t num_vfs;   /* Number of virtual functions created */
     uint8_t vf_bar_type[PCI_NUM_REGIONS];   /* Store type for each VF bar */
     PCIDevice **vf;     /* Pointer to an array of num_vfs VF devices */
 };
diff --git a/hw/pci/pcie_sriov.c b/hw/pci/pcie_sriov.c
index 8710ee95b26d..fb48c1a18c9a 100644
--- a/hw/pci/pcie_sriov.c
+++ b/hw/pci/pcie_sriov.c
@@ -44,7 +44,6 @@ bool pcie_sriov_pf_init(PCIDevice *dev, uint16_t offset,
     pcie_add_capability(dev, PCI_EXT_CAP_ID_SRIOV, 1,
                         offset, PCI_EXT_CAP_SRIOV_SIZEOF);
     dev->exp.sriov_cap = offset;
-    dev->exp.sriov_pf.num_vfs = 0;
     dev->exp.sriov_pf.vf = NULL;
 
     pci_set_word(cfg + PCI_SRIOV_VF_OFFSET, vf_offset);
@@ -173,6 +172,12 @@ void pcie_sriov_vf_register_bar(PCIDevice *dev, int region_num,
     }
 }
 
+static void clear_ctrl_vfe(PCIDevice *dev)
+{
+    uint8_t *ctrl = dev->config + dev->exp.sriov_cap + PCI_SRIOV_CTRL;
+    pci_set_word(ctrl, pci_get_word(ctrl) & ~PCI_SRIOV_CTRL_VFE);
+}
+
 static void register_vfs(PCIDevice *dev)
 {
     uint16_t num_vfs;
@@ -182,6 +187,7 @@ static void register_vfs(PCIDevice *dev)
     assert(sriov_cap > 0);
     num_vfs = pci_get_word(dev->config + sriov_cap + PCI_SRIOV_NUM_VF);
     if (num_vfs > pci_get_word(dev->config + sriov_cap + PCI_SRIOV_TOTAL_VF)) {
+        clear_ctrl_vfe(dev);
         return;
     }
 
@@ -190,20 +196,18 @@ static void register_vfs(PCIDevice *dev)
     for (i = 0; i < num_vfs; i++) {
         pci_set_enabled(dev->exp.sriov_pf.vf[i], true);
     }
-    dev->exp.sriov_pf.num_vfs = num_vfs;
 }
 
 static void unregister_vfs(PCIDevice *dev)
 {
-    uint16_t num_vfs = dev->exp.sriov_pf.num_vfs;
     uint16_t i;
+    uint8_t *cfg = dev->config + dev->exp.sriov_cap;
 
     trace_sriov_unregister_vfs(dev->name, PCI_SLOT(dev->devfn),
-                               PCI_FUNC(dev->devfn), num_vfs);
-    for (i = 0; i < num_vfs; i++) {
+                               PCI_FUNC(dev->devfn));
+    for (i = 0; i < pci_get_word(cfg + PCI_SRIOV_TOTAL_VF); i++) {
         pci_set_enabled(dev->exp.sriov_pf.vf[i], false);
     }
-    dev->exp.sriov_pf.num_vfs = 0;
 }
 
 void pcie_sriov_config_write(PCIDevice *dev, uint32_t address,
@@ -229,6 +233,9 @@ void pcie_sriov_config_write(PCIDevice *dev, uint32_t address,
         } else {
             unregister_vfs(dev);
         }
+    } else if (range_covers_byte(off, len, PCI_SRIOV_NUM_VF)) {
+        clear_ctrl_vfe(dev);
+        unregister_vfs(dev);
     }
 }
 
@@ -291,7 +298,7 @@ PCIDevice *pcie_sriov_get_pf(PCIDevice *dev)
 PCIDevice *pcie_sriov_get_vf_at_index(PCIDevice *dev, int n)
 {
     assert(!pci_is_vf(dev));
-    if (n < dev->exp.sriov_pf.num_vfs) {
+    if (n < pcie_sriov_num_vfs(dev)) {
         return dev->exp.sriov_pf.vf[n];
     }
     return NULL;
@@ -299,5 +306,10 @@ PCIDevice *pcie_sriov_get_vf_at_index(PCIDevice *dev, int n)
 
 uint16_t pcie_sriov_num_vfs(PCIDevice *dev)
 {
-    return dev->exp.sriov_pf.num_vfs;
+    uint16_t sriov_cap = dev->exp.sriov_cap;
+    uint8_t *cfg = dev->config + sriov_cap;
+
+    return sriov_cap &&
+           (pci_get_word(cfg + PCI_SRIOV_CTRL) & PCI_SRIOV_CTRL_VFE) ?
+           pci_get_word(cfg + PCI_SRIOV_NUM_VF) : 0;
 }
diff --git a/hw/pci/trace-events b/hw/pci/trace-events
index 42430869ce05..b30e31619787 100644
--- a/hw/pci/trace-events
+++ b/hw/pci/trace-events
@@ -14,5 +14,5 @@ msix_write_config(char *name, bool enabled, bool masked) "dev %s enabled %d mask
 
 # hw/pci/pcie_sriov.c
 sriov_register_vfs(const char *name, int slot, int function, int num_vfs) "%s %02x:%x: creating %d vf devs"
-sriov_unregister_vfs(const char *name, int slot, int function, int num_vfs) "%s %02x:%x: Unregistering %d vf devs"
+sriov_unregister_vfs(const char *name, int slot, int function) "%s %02x:%x: Unregistering vf devs"
 sriov_config_write(const char *name, int slot, int fun, uint32_t offset, uint32_t val, uint32_t len) "%s %02x:%x: sriov offset 0x%x val 0x%x len %d"

-- 
2.43.1



^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v6 11/15] pcie_sriov: Register VFs after migration
  2024-02-20 12:24 [PATCH v6 00/15] hw/pci: SR-IOV related fixes and improvements Akihiko Odaki
                   ` (9 preceding siblings ...)
  2024-02-20 12:24 ` [PATCH v6 10/15] pcie_sriov: Remove num_vfs from PCIESriovPF Akihiko Odaki
@ 2024-02-20 12:24 ` Akihiko Odaki
  2024-02-20 12:24 ` [PATCH v6 12/15] hw/pci: Use -1 as a default value for rombar Akihiko Odaki
                   ` (3 subsequent siblings)
  14 siblings, 0 replies; 29+ messages in thread
From: Akihiko Odaki @ 2024-02-20 12:24 UTC (permalink / raw)
  To: Philippe Mathieu-Daudé,
	Michael S. Tsirkin, Marcel Apfelbaum, Alex Williamson,
	Cédric Le Goater, Paolo Bonzini, Daniel P. Berrangé,
	Eduardo Habkost, Sriram Yagnaraman, Jason Wang, Keith Busch,
	Klaus Jensen
  Cc: qemu-devel, qemu-block, Akihiko Odaki

pcie_sriov doesn't have code to restore its state after migration, but
igb, which uses pcie_sriov, naively claimed its migration capability.

Add code to register VFs after migration and fix igb migration.

Fixes: 3a977deebe6b ("Intrdocue igb device emulation")
Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
---
 include/hw/pci/pcie_sriov.h | 2 ++
 hw/pci/pci.c                | 7 +++++++
 hw/pci/pcie_sriov.c         | 7 +++++++
 3 files changed, 16 insertions(+)

diff --git a/include/hw/pci/pcie_sriov.h b/include/hw/pci/pcie_sriov.h
index 793d03c5f12e..d576a8c6be19 100644
--- a/include/hw/pci/pcie_sriov.h
+++ b/include/hw/pci/pcie_sriov.h
@@ -57,6 +57,8 @@ void pcie_sriov_pf_add_sup_pgsize(PCIDevice *dev, uint16_t opt_sup_pgsize);
 void pcie_sriov_config_write(PCIDevice *dev, uint32_t address,
                              uint32_t val, int len);
 
+void pcie_sriov_pf_post_load(PCIDevice *dev);
+
 /* Reset SR/IOV */
 void pcie_sriov_pf_reset(PCIDevice *dev);
 
diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index 750c2ba696d1..54b375da2d26 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -733,10 +733,17 @@ static bool migrate_is_not_pcie(void *opaque, int version_id)
     return !pci_is_express((PCIDevice *)opaque);
 }
 
+static int pci_post_load(void *opaque, int version_id)
+{
+    pcie_sriov_pf_post_load(opaque);
+    return 0;
+}
+
 const VMStateDescription vmstate_pci_device = {
     .name = "PCIDevice",
     .version_id = 2,
     .minimum_version_id = 1,
+    .post_load = pci_post_load,
     .fields = (const VMStateField[]) {
         VMSTATE_INT32_POSITIVE_LE(version_id, PCIDevice),
         VMSTATE_BUFFER_UNSAFE_INFO_TEST(config, PCIDevice,
diff --git a/hw/pci/pcie_sriov.c b/hw/pci/pcie_sriov.c
index fb48c1a18c9a..09a53ed30027 100644
--- a/hw/pci/pcie_sriov.c
+++ b/hw/pci/pcie_sriov.c
@@ -239,6 +239,13 @@ void pcie_sriov_config_write(PCIDevice *dev, uint32_t address,
     }
 }
 
+void pcie_sriov_pf_post_load(PCIDevice *dev)
+{
+    if (dev->exp.sriov_cap) {
+        register_vfs(dev);
+    }
+}
+
 
 /* Reset SR/IOV */
 void pcie_sriov_pf_reset(PCIDevice *dev)

-- 
2.43.1



^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v6 12/15] hw/pci: Use -1 as a default value for rombar
  2024-02-20 12:24 [PATCH v6 00/15] hw/pci: SR-IOV related fixes and improvements Akihiko Odaki
                   ` (10 preceding siblings ...)
  2024-02-20 12:24 ` [PATCH v6 11/15] pcie_sriov: Register VFs after migration Akihiko Odaki
@ 2024-02-20 12:24 ` Akihiko Odaki
  2024-02-21  7:59   ` Markus Armbruster
  2024-02-20 12:24 ` [PATCH v6 13/15] hw/pci: Determine if rombar is explicitly enabled Akihiko Odaki
                   ` (2 subsequent siblings)
  14 siblings, 1 reply; 29+ messages in thread
From: Akihiko Odaki @ 2024-02-20 12:24 UTC (permalink / raw)
  To: Philippe Mathieu-Daudé,
	Michael S. Tsirkin, Marcel Apfelbaum, Alex Williamson,
	Cédric Le Goater, Paolo Bonzini, Daniel P. Berrangé,
	Eduardo Habkost, Sriram Yagnaraman, Jason Wang, Keith Busch,
	Klaus Jensen
  Cc: qemu-devel, qemu-block, Akihiko Odaki

Currently there is no way to distinguish the case that rombar is
explicitly specified as 1 and the case that rombar is not specified.

Set rombar -1 by default to distinguish these cases just as it is done
for addr and romsize. It was confirmed that changing the default value
to -1 will not change the behavior by looking at occurences of rom_bar.

$ git grep -w rom_bar
hw/display/qxl.c:328:    QXLRom *rom = memory_region_get_ram_ptr(&d->rom_bar);
hw/display/qxl.c:431:    qxl_set_dirty(&qxl->rom_bar, 0, qxl->rom_size);
hw/display/qxl.c:1048:    QXLRom *rom = memory_region_get_ram_ptr(&qxl->rom_bar);
hw/display/qxl.c:2131:    memory_region_init_rom(&qxl->rom_bar, OBJECT(qxl), "qxl.vrom",
hw/display/qxl.c:2154: PCI_BASE_ADDRESS_SPACE_MEMORY, &qxl->rom_bar);
hw/display/qxl.h:101:    MemoryRegion       rom_bar;
hw/pci/pci.c:74:    DEFINE_PROP_UINT32("rombar",  PCIDevice, rom_bar, 1),
hw/pci/pci.c:2329:    if (!pdev->rom_bar) {
hw/vfio/pci.c:1019:    if (vdev->pdev.romfile || !vdev->pdev.rom_bar) {
hw/xen/xen_pt_load_rom.c:29:    if (dev->romfile || !dev->rom_bar) {
include/hw/pci/pci_device.h:150:    uint32_t rom_bar;

rom_bar refers to a different variable in qxl. It is only tested if
the value is 0 or not in the other places.

Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
---
 hw/pci/pci.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index 54b375da2d26..909c5b3ee4ee 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -71,7 +71,7 @@ static Property pci_props[] = {
     DEFINE_PROP_PCI_DEVFN("addr", PCIDevice, devfn, -1),
     DEFINE_PROP_STRING("romfile", PCIDevice, romfile),
     DEFINE_PROP_UINT32("romsize", PCIDevice, romsize, -1),
-    DEFINE_PROP_UINT32("rombar",  PCIDevice, rom_bar, 1),
+    DEFINE_PROP_UINT32("rombar",  PCIDevice, rom_bar, -1),
     DEFINE_PROP_BIT("multifunction", PCIDevice, cap_present,
                     QEMU_PCI_CAP_MULTIFUNCTION_BITNR, false),
     DEFINE_PROP_BIT("x-pcie-lnksta-dllla", PCIDevice, cap_present,

-- 
2.43.1



^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v6 13/15] hw/pci: Determine if rombar is explicitly enabled
  2024-02-20 12:24 [PATCH v6 00/15] hw/pci: SR-IOV related fixes and improvements Akihiko Odaki
                   ` (11 preceding siblings ...)
  2024-02-20 12:24 ` [PATCH v6 12/15] hw/pci: Use -1 as a default value for rombar Akihiko Odaki
@ 2024-02-20 12:24 ` Akihiko Odaki
  2024-02-21  8:15   ` Markus Armbruster
  2024-02-20 12:24 ` [PATCH v6 14/15] vfio: Avoid inspecting option QDict for rombar Akihiko Odaki
  2024-02-20 12:24 ` [PATCH v6 15/15] hw/qdev: Remove opts member Akihiko Odaki
  14 siblings, 1 reply; 29+ messages in thread
From: Akihiko Odaki @ 2024-02-20 12:24 UTC (permalink / raw)
  To: Philippe Mathieu-Daudé,
	Michael S. Tsirkin, Marcel Apfelbaum, Alex Williamson,
	Cédric Le Goater, Paolo Bonzini, Daniel P. Berrangé,
	Eduardo Habkost, Sriram Yagnaraman, Jason Wang, Keith Busch,
	Klaus Jensen
  Cc: qemu-devel, qemu-block, Akihiko Odaki

vfio determines if rombar is explicitly enabled by inspecting QDict.
Inspecting QDict is not nice because QDict is untyped and depends on the
details on the external interface. Add an infrastructure to determine if
rombar is explicitly enabled to hw/pci.

Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
---
 include/hw/pci/pci_device.h | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/include/hw/pci/pci_device.h b/include/hw/pci/pci_device.h
index ca151325085d..c4fdc96ef50d 100644
--- a/include/hw/pci/pci_device.h
+++ b/include/hw/pci/pci_device.h
@@ -205,6 +205,11 @@ static inline uint16_t pci_get_bdf(PCIDevice *dev)
     return PCI_BUILD_BDF(pci_bus_num(pci_get_bus(dev)), dev->devfn);
 }
 
+static inline bool pci_rom_bar_explicitly_enabled(PCIDevice *dev)
+{
+    return dev->rom_bar && dev->rom_bar != -1;
+}
+
 static inline void pci_set_power(PCIDevice *pci_dev, bool state)
 {
     /*

-- 
2.43.1



^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v6 14/15] vfio: Avoid inspecting option QDict for rombar
  2024-02-20 12:24 [PATCH v6 00/15] hw/pci: SR-IOV related fixes and improvements Akihiko Odaki
                   ` (12 preceding siblings ...)
  2024-02-20 12:24 ` [PATCH v6 13/15] hw/pci: Determine if rombar is explicitly enabled Akihiko Odaki
@ 2024-02-20 12:24 ` Akihiko Odaki
  2024-02-21  8:16   ` Markus Armbruster
  2024-02-20 12:24 ` [PATCH v6 15/15] hw/qdev: Remove opts member Akihiko Odaki
  14 siblings, 1 reply; 29+ messages in thread
From: Akihiko Odaki @ 2024-02-20 12:24 UTC (permalink / raw)
  To: Philippe Mathieu-Daudé,
	Michael S. Tsirkin, Marcel Apfelbaum, Alex Williamson,
	Cédric Le Goater, Paolo Bonzini, Daniel P. Berrangé,
	Eduardo Habkost, Sriram Yagnaraman, Jason Wang, Keith Busch,
	Klaus Jensen
  Cc: qemu-devel, qemu-block, Akihiko Odaki

Use pci_rom_bar_explicitly_enabled() to determine if rombar is explicitly
enabled.

Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
---
 hw/vfio/pci.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 4fa387f0430d..647f15b2a060 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -1012,7 +1012,6 @@ static void vfio_pci_size_rom(VFIOPCIDevice *vdev)
 {
     uint32_t orig, size = cpu_to_le32((uint32_t)PCI_ROM_ADDRESS_MASK);
     off_t offset = vdev->config_offset + PCI_ROM_ADDRESS;
-    DeviceState *dev = DEVICE(vdev);
     char *name;
     int fd = vdev->vbasedev.fd;
 
@@ -1046,7 +1045,7 @@ static void vfio_pci_size_rom(VFIOPCIDevice *vdev)
     }
 
     if (vfio_opt_rom_in_denylist(vdev)) {
-        if (dev->opts && qdict_haskey(dev->opts, "rombar")) {
+        if (pci_rom_bar_explicitly_enabled(&vdev->pdev)) {
             warn_report("Device at %s is known to cause system instability"
                         " issues during option rom execution",
                         vdev->vbasedev.name);

-- 
2.43.1



^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v6 15/15] hw/qdev: Remove opts member
  2024-02-20 12:24 [PATCH v6 00/15] hw/pci: SR-IOV related fixes and improvements Akihiko Odaki
                   ` (13 preceding siblings ...)
  2024-02-20 12:24 ` [PATCH v6 14/15] vfio: Avoid inspecting option QDict for rombar Akihiko Odaki
@ 2024-02-20 12:24 ` Akihiko Odaki
  2024-02-21  8:18   ` Markus Armbruster
  14 siblings, 1 reply; 29+ messages in thread
From: Akihiko Odaki @ 2024-02-20 12:24 UTC (permalink / raw)
  To: Philippe Mathieu-Daudé,
	Michael S. Tsirkin, Marcel Apfelbaum, Alex Williamson,
	Cédric Le Goater, Paolo Bonzini, Daniel P. Berrangé,
	Eduardo Habkost, Sriram Yagnaraman, Jason Wang, Keith Busch,
	Klaus Jensen
  Cc: qemu-devel, qemu-block, Akihiko Odaki

It is no longer used.

Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
---
 include/hw/qdev-core.h |  4 ----
 hw/core/qdev.c         |  1 -
 system/qdev-monitor.c  | 12 +++++++-----
 3 files changed, 7 insertions(+), 10 deletions(-)

diff --git a/include/hw/qdev-core.h b/include/hw/qdev-core.h
index 9228e96c87e9..5954404dcbfe 100644
--- a/include/hw/qdev-core.h
+++ b/include/hw/qdev-core.h
@@ -237,10 +237,6 @@ struct DeviceState {
      * @pending_deleted_expires_ms: optional timeout for deletion events
      */
     int64_t pending_deleted_expires_ms;
-    /**
-     * @opts: QDict of options for the device
-     */
-    QDict *opts;
     /**
      * @hotplugged: was device added after PHASE_MACHINE_READY?
      */
diff --git a/hw/core/qdev.c b/hw/core/qdev.c
index c68d0f7c512f..7349c9a86be8 100644
--- a/hw/core/qdev.c
+++ b/hw/core/qdev.c
@@ -706,7 +706,6 @@ static void device_finalize(Object *obj)
         dev->canonical_path = NULL;
     }
 
-    qobject_unref(dev->opts);
     g_free(dev->id);
 }
 
diff --git a/system/qdev-monitor.c b/system/qdev-monitor.c
index a13db763e5dd..71c00f62ee38 100644
--- a/system/qdev-monitor.c
+++ b/system/qdev-monitor.c
@@ -625,6 +625,7 @@ DeviceState *qdev_device_add_from_qdict(const QDict *opts,
     char *id;
     DeviceState *dev = NULL;
     BusState *bus = NULL;
+    QDict *properties;
 
     driver = qdict_get_try_str(opts, "driver");
     if (!driver) {
@@ -705,13 +706,14 @@ DeviceState *qdev_device_add_from_qdict(const QDict *opts,
     }
 
     /* set properties */
-    dev->opts = qdict_clone_shallow(opts);
-    qdict_del(dev->opts, "driver");
-    qdict_del(dev->opts, "bus");
-    qdict_del(dev->opts, "id");
+    properties = qdict_clone_shallow(opts);
+    qdict_del(properties, "driver");
+    qdict_del(properties, "bus");
+    qdict_del(properties, "id");
 
-    object_set_properties_from_keyval(&dev->parent_obj, dev->opts, from_json,
+    object_set_properties_from_keyval(&dev->parent_obj, properties, from_json,
                                       errp);
+    qobject_unref(properties);
     if (*errp) {
         goto err_del_dev;
     }

-- 
2.43.1



^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [PATCH v6 01/15] hw/nvme: Use pcie_sriov_num_vfs()
  2024-02-20 12:24 ` [PATCH v6 01/15] hw/nvme: Use pcie_sriov_num_vfs() Akihiko Odaki
@ 2024-02-20 14:29   ` Kevin Wolf
  2024-02-20 14:47     ` Klaus Jensen
  2024-02-20 14:53     ` Kevin Wolf
  0 siblings, 2 replies; 29+ messages in thread
From: Kevin Wolf @ 2024-02-20 14:29 UTC (permalink / raw)
  To: Akihiko Odaki
  Cc: Philippe Mathieu-Daudé,
	Michael S. Tsirkin, Marcel Apfelbaum, Alex Williamson,
	Cédric Le Goater, Paolo Bonzini, Daniel P. Berrangé,
	Eduardo Habkost, Sriram Yagnaraman, Jason Wang, Keith Busch,
	Klaus Jensen, qemu-devel, qemu-block, qemu-stable

Am 20.02.2024 um 13:24 hat Akihiko Odaki geschrieben:
> nvme_sriov_pre_write_ctrl() used to directly inspect SR-IOV
> configurations to know the number of VFs being disabled due to SR-IOV
> configuration writes, but the logic was flawed and resulted in
> out-of-bound memory access.
> 
> It assumed PCI_SRIOV_NUM_VF always has the number of currently enabled
> VFs, but it actually doesn't in the following cases:
> - PCI_SRIOV_NUM_VF has been set but PCI_SRIOV_CTRL_VFE has never been.
> - PCI_SRIOV_NUM_VF was written after PCI_SRIOV_CTRL_VFE was set.
> - VFs were only partially enabled because of realization failure.
> 
> It is a responsibility of pcie_sriov to interpret SR-IOV configurations
> and pcie_sriov does it correctly, so use pcie_sriov_num_vfs(), which it
> provides, to get the number of enabled VFs before and after SR-IOV
> configuration writes.
> 
> Cc: qemu-stable@nongnu.org
> Fixes: CVE-2024-26328
> Fixes: 11871f53ef8e ("hw/nvme: Add support for the Virtualization Management command")
> Suggested-by: Michael S. Tsirkin <mst@redhat.com>
> Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
> ---
>  hw/nvme/ctrl.c | 26 ++++++++------------------
>  1 file changed, 8 insertions(+), 18 deletions(-)
> 
> diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
> index f026245d1e9e..7a56e7b79b4d 100644
> --- a/hw/nvme/ctrl.c
> +++ b/hw/nvme/ctrl.c
> @@ -8466,36 +8466,26 @@ static void nvme_pci_reset(DeviceState *qdev)
>      nvme_ctrl_reset(n, NVME_RESET_FUNCTION);
>  }
>  
> -static void nvme_sriov_pre_write_ctrl(PCIDevice *dev, uint32_t address,
> -                                      uint32_t val, int len)
> +static void nvme_sriov_post_write_config(PCIDevice *dev, uint16_t old_num_vfs)
>  {
>      NvmeCtrl *n = NVME(dev);
>      NvmeSecCtrlEntry *sctrl;
> -    uint16_t sriov_cap = dev->exp.sriov_cap;
> -    uint32_t off = address - sriov_cap;
> -    int i, num_vfs;
> +    int i;
>  
> -    if (!sriov_cap) {
> -        return;
> -    }
> -
> -    if (range_covers_byte(off, len, PCI_SRIOV_CTRL)) {
> -        if (!(val & PCI_SRIOV_CTRL_VFE)) {
> -            num_vfs = pci_get_word(dev->config + sriov_cap + PCI_SRIOV_NUM_VF);
> -            for (i = 0; i < num_vfs; i++) {
> -                sctrl = &n->sec_ctrl_list.sec[i];
> -                nvme_virt_set_state(n, le16_to_cpu(sctrl->scid), false);
> -            }
> -        }
> +    for (i = pcie_sriov_num_vfs(dev); i < old_num_vfs; i++) {
> +        sctrl = &n->sec_ctrl_list.sec[i];
> +        nvme_virt_set_state(n, le16_to_cpu(sctrl->scid), false);
>      }
>  }

Maybe I'm missing something, but if the concern is that 'i' could run
beyond the end of the array, I don't see anything that limits
pcie_sriov_num_vfs() to the static size of 127 that n->sec_ctrl_list.sec
has. register_vfs() seems to just take whatever 16 bit value the guest
wrote without imposing additional restrictions.

If there is some mechanism that makes register_vf() fail if we exceed
the limit, maybe an assertion with a comment would be in order because
it doesn't seem obvious. I couldn't find any code that enforces it,
sriov_max_vfs only ever seems to be used as a hint for the guest.

If not, do we need another check that fails gracefully in the error
case?

Kevin



^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v6 01/15] hw/nvme: Use pcie_sriov_num_vfs()
  2024-02-20 14:29   ` Kevin Wolf
@ 2024-02-20 14:47     ` Klaus Jensen
  2024-02-20 14:53     ` Kevin Wolf
  1 sibling, 0 replies; 29+ messages in thread
From: Klaus Jensen @ 2024-02-20 14:47 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: Akihiko Odaki, Philippe Mathieu-Daudé,
	Michael S. Tsirkin, Marcel Apfelbaum, Alex Williamson,
	Cédric Le Goater, Paolo Bonzini, Daniel P. Berrangé,
	Eduardo Habkost, Sriram Yagnaraman, Jason Wang, Keith Busch,
	qemu-devel, qemu-block, qemu-stable

[-- Attachment #1: Type: text/plain, Size: 3242 bytes --]

On Feb 20 15:29, Kevin Wolf wrote:
> Am 20.02.2024 um 13:24 hat Akihiko Odaki geschrieben:
> > nvme_sriov_pre_write_ctrl() used to directly inspect SR-IOV
> > configurations to know the number of VFs being disabled due to SR-IOV
> > configuration writes, but the logic was flawed and resulted in
> > out-of-bound memory access.
> > 
> > It assumed PCI_SRIOV_NUM_VF always has the number of currently enabled
> > VFs, but it actually doesn't in the following cases:
> > - PCI_SRIOV_NUM_VF has been set but PCI_SRIOV_CTRL_VFE has never been.
> > - PCI_SRIOV_NUM_VF was written after PCI_SRIOV_CTRL_VFE was set.
> > - VFs were only partially enabled because of realization failure.
> > 
> > It is a responsibility of pcie_sriov to interpret SR-IOV configurations
> > and pcie_sriov does it correctly, so use pcie_sriov_num_vfs(), which it
> > provides, to get the number of enabled VFs before and after SR-IOV
> > configuration writes.
> > 
> > Cc: qemu-stable@nongnu.org
> > Fixes: CVE-2024-26328
> > Fixes: 11871f53ef8e ("hw/nvme: Add support for the Virtualization Management command")
> > Suggested-by: Michael S. Tsirkin <mst@redhat.com>
> > Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
> > ---
> >  hw/nvme/ctrl.c | 26 ++++++++------------------
> >  1 file changed, 8 insertions(+), 18 deletions(-)
> > 
> > diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
> > index f026245d1e9e..7a56e7b79b4d 100644
> > --- a/hw/nvme/ctrl.c
> > +++ b/hw/nvme/ctrl.c
> > @@ -8466,36 +8466,26 @@ static void nvme_pci_reset(DeviceState *qdev)
> >      nvme_ctrl_reset(n, NVME_RESET_FUNCTION);
> >  }
> >  
> > -static void nvme_sriov_pre_write_ctrl(PCIDevice *dev, uint32_t address,
> > -                                      uint32_t val, int len)
> > +static void nvme_sriov_post_write_config(PCIDevice *dev, uint16_t old_num_vfs)
> >  {
> >      NvmeCtrl *n = NVME(dev);
> >      NvmeSecCtrlEntry *sctrl;
> > -    uint16_t sriov_cap = dev->exp.sriov_cap;
> > -    uint32_t off = address - sriov_cap;
> > -    int i, num_vfs;
> > +    int i;
> >  
> > -    if (!sriov_cap) {
> > -        return;
> > -    }
> > -
> > -    if (range_covers_byte(off, len, PCI_SRIOV_CTRL)) {
> > -        if (!(val & PCI_SRIOV_CTRL_VFE)) {
> > -            num_vfs = pci_get_word(dev->config + sriov_cap + PCI_SRIOV_NUM_VF);
> > -            for (i = 0; i < num_vfs; i++) {
> > -                sctrl = &n->sec_ctrl_list.sec[i];
> > -                nvme_virt_set_state(n, le16_to_cpu(sctrl->scid), false);
> > -            }
> > -        }
> > +    for (i = pcie_sriov_num_vfs(dev); i < old_num_vfs; i++) {
> > +        sctrl = &n->sec_ctrl_list.sec[i];
> > +        nvme_virt_set_state(n, le16_to_cpu(sctrl->scid), false);
> >      }
> >  }
> 
> Maybe I'm missing something, but if the concern is that 'i' could run
> beyond the end of the array, I don't see anything that limits
> pcie_sriov_num_vfs() to the static size of 127 that n->sec_ctrl_list.sec
> has. register_vfs() seems to just take whatever 16 bit value the guest
> wrote without imposing additional restrictions.
> 

Hi Kevin,

Thanks for reviewing, I believe patch 2 in this series fixes that
missing validation of NumVFs.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v6 01/15] hw/nvme: Use pcie_sriov_num_vfs()
  2024-02-20 14:29   ` Kevin Wolf
  2024-02-20 14:47     ` Klaus Jensen
@ 2024-02-20 14:53     ` Kevin Wolf
  2024-02-20 15:33       ` Akihiko Odaki
  1 sibling, 1 reply; 29+ messages in thread
From: Kevin Wolf @ 2024-02-20 14:53 UTC (permalink / raw)
  To: Akihiko Odaki
  Cc: Philippe Mathieu-Daudé,
	Michael S. Tsirkin, Marcel Apfelbaum, Alex Williamson,
	Cédric Le Goater, Paolo Bonzini, Daniel P. Berrangé,
	Eduardo Habkost, Sriram Yagnaraman, Jason Wang, Keith Busch,
	Klaus Jensen, qemu-devel, qemu-block, qemu-stable

Am 20.02.2024 um 15:29 hat Kevin Wolf geschrieben:
> Am 20.02.2024 um 13:24 hat Akihiko Odaki geschrieben:
> > nvme_sriov_pre_write_ctrl() used to directly inspect SR-IOV
> > configurations to know the number of VFs being disabled due to SR-IOV
> > configuration writes, but the logic was flawed and resulted in
> > out-of-bound memory access.
> > 
> > It assumed PCI_SRIOV_NUM_VF always has the number of currently enabled
> > VFs, but it actually doesn't in the following cases:
> > - PCI_SRIOV_NUM_VF has been set but PCI_SRIOV_CTRL_VFE has never been.
> > - PCI_SRIOV_NUM_VF was written after PCI_SRIOV_CTRL_VFE was set.
> > - VFs were only partially enabled because of realization failure.
> > 
> > It is a responsibility of pcie_sriov to interpret SR-IOV configurations
> > and pcie_sriov does it correctly, so use pcie_sriov_num_vfs(), which it
> > provides, to get the number of enabled VFs before and after SR-IOV
> > configuration writes.
> > 
> > Cc: qemu-stable@nongnu.org
> > Fixes: CVE-2024-26328
> > Fixes: 11871f53ef8e ("hw/nvme: Add support for the Virtualization Management command")
> > Suggested-by: Michael S. Tsirkin <mst@redhat.com>
> > Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
> > ---
> >  hw/nvme/ctrl.c | 26 ++++++++------------------
> >  1 file changed, 8 insertions(+), 18 deletions(-)
> > 
> > diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
> > index f026245d1e9e..7a56e7b79b4d 100644
> > --- a/hw/nvme/ctrl.c
> > +++ b/hw/nvme/ctrl.c
> > @@ -8466,36 +8466,26 @@ static void nvme_pci_reset(DeviceState *qdev)
> >      nvme_ctrl_reset(n, NVME_RESET_FUNCTION);
> >  }
> >  
> > -static void nvme_sriov_pre_write_ctrl(PCIDevice *dev, uint32_t address,
> > -                                      uint32_t val, int len)
> > +static void nvme_sriov_post_write_config(PCIDevice *dev, uint16_t old_num_vfs)
> >  {
> >      NvmeCtrl *n = NVME(dev);
> >      NvmeSecCtrlEntry *sctrl;
> > -    uint16_t sriov_cap = dev->exp.sriov_cap;
> > -    uint32_t off = address - sriov_cap;
> > -    int i, num_vfs;
> > +    int i;
> >  
> > -    if (!sriov_cap) {
> > -        return;
> > -    }
> > -
> > -    if (range_covers_byte(off, len, PCI_SRIOV_CTRL)) {
> > -        if (!(val & PCI_SRIOV_CTRL_VFE)) {
> > -            num_vfs = pci_get_word(dev->config + sriov_cap + PCI_SRIOV_NUM_VF);
> > -            for (i = 0; i < num_vfs; i++) {
> > -                sctrl = &n->sec_ctrl_list.sec[i];
> > -                nvme_virt_set_state(n, le16_to_cpu(sctrl->scid), false);
> > -            }
> > -        }
> > +    for (i = pcie_sriov_num_vfs(dev); i < old_num_vfs; i++) {
> > +        sctrl = &n->sec_ctrl_list.sec[i];
> > +        nvme_virt_set_state(n, le16_to_cpu(sctrl->scid), false);
> >      }
> >  }
> 
> Maybe I'm missing something, but if the concern is that 'i' could run
> beyond the end of the array, I don't see anything that limits
> pcie_sriov_num_vfs() to the static size of 127 that n->sec_ctrl_list.sec
> has. register_vfs() seems to just take whatever 16 bit value the guest
> wrote without imposing additional restrictions.
> 
> If there is some mechanism that makes register_vf() fail if we exceed
> the limit, maybe an assertion with a comment would be in order because
> it doesn't seem obvious. I couldn't find any code that enforces it,
> sriov_max_vfs only ever seems to be used as a hint for the guest.
> 
> If not, do we need another check that fails gracefully in the error
> case?

Ok, I see now that patch 2 fixes this. But then the commit message is
wrong because it implies that this patch is the only thing you need to
fix the problem with nvme. You can't say "Fixes: CVE-2024-26328" if half
of the fix is still missing.

Maybe asserting old_num_vfs < n->sec_ctrl_list.numcntl would still be a
good idea. But looking at this one, it seems to me that numcntl isn't
completely correct either:

    list->numcntl = cpu_to_le16(max_vfs);

Both list->numcntl and max_vfs are uint8_t, so I think this will always
be 0 on big endian hosts?

Kevin



^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v6 01/15] hw/nvme: Use pcie_sriov_num_vfs()
  2024-02-20 14:53     ` Kevin Wolf
@ 2024-02-20 15:33       ` Akihiko Odaki
  2024-02-20 20:40         ` Klaus Jensen
  0 siblings, 1 reply; 29+ messages in thread
From: Akihiko Odaki @ 2024-02-20 15:33 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: Philippe Mathieu-Daudé,
	Michael S. Tsirkin, Marcel Apfelbaum, Alex Williamson,
	Cédric Le Goater, Paolo Bonzini, Daniel P. Berrangé,
	Eduardo Habkost, Sriram Yagnaraman, Jason Wang, Keith Busch,
	Klaus Jensen, qemu-devel, qemu-block, qemu-stable

On 2024/02/20 23:53, Kevin Wolf wrote:
> Am 20.02.2024 um 15:29 hat Kevin Wolf geschrieben:
>> Am 20.02.2024 um 13:24 hat Akihiko Odaki geschrieben:
>>> nvme_sriov_pre_write_ctrl() used to directly inspect SR-IOV
>>> configurations to know the number of VFs being disabled due to SR-IOV
>>> configuration writes, but the logic was flawed and resulted in
>>> out-of-bound memory access.
>>>
>>> It assumed PCI_SRIOV_NUM_VF always has the number of currently enabled
>>> VFs, but it actually doesn't in the following cases:
>>> - PCI_SRIOV_NUM_VF has been set but PCI_SRIOV_CTRL_VFE has never been.
>>> - PCI_SRIOV_NUM_VF was written after PCI_SRIOV_CTRL_VFE was set.
>>> - VFs were only partially enabled because of realization failure.
>>>
>>> It is a responsibility of pcie_sriov to interpret SR-IOV configurations
>>> and pcie_sriov does it correctly, so use pcie_sriov_num_vfs(), which it
>>> provides, to get the number of enabled VFs before and after SR-IOV
>>> configuration writes.
>>>
>>> Cc: qemu-stable@nongnu.org
>>> Fixes: CVE-2024-26328
>>> Fixes: 11871f53ef8e ("hw/nvme: Add support for the Virtualization Management command")
>>> Suggested-by: Michael S. Tsirkin <mst@redhat.com>
>>> Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
>>> ---
>>>   hw/nvme/ctrl.c | 26 ++++++++------------------
>>>   1 file changed, 8 insertions(+), 18 deletions(-)
>>>
>>> diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
>>> index f026245d1e9e..7a56e7b79b4d 100644
>>> --- a/hw/nvme/ctrl.c
>>> +++ b/hw/nvme/ctrl.c
>>> @@ -8466,36 +8466,26 @@ static void nvme_pci_reset(DeviceState *qdev)
>>>       nvme_ctrl_reset(n, NVME_RESET_FUNCTION);
>>>   }
>>>   
>>> -static void nvme_sriov_pre_write_ctrl(PCIDevice *dev, uint32_t address,
>>> -                                      uint32_t val, int len)
>>> +static void nvme_sriov_post_write_config(PCIDevice *dev, uint16_t old_num_vfs)
>>>   {
>>>       NvmeCtrl *n = NVME(dev);
>>>       NvmeSecCtrlEntry *sctrl;
>>> -    uint16_t sriov_cap = dev->exp.sriov_cap;
>>> -    uint32_t off = address - sriov_cap;
>>> -    int i, num_vfs;
>>> +    int i;
>>>   
>>> -    if (!sriov_cap) {
>>> -        return;
>>> -    }
>>> -
>>> -    if (range_covers_byte(off, len, PCI_SRIOV_CTRL)) {
>>> -        if (!(val & PCI_SRIOV_CTRL_VFE)) {
>>> -            num_vfs = pci_get_word(dev->config + sriov_cap + PCI_SRIOV_NUM_VF);
>>> -            for (i = 0; i < num_vfs; i++) {
>>> -                sctrl = &n->sec_ctrl_list.sec[i];
>>> -                nvme_virt_set_state(n, le16_to_cpu(sctrl->scid), false);
>>> -            }
>>> -        }
>>> +    for (i = pcie_sriov_num_vfs(dev); i < old_num_vfs; i++) {
>>> +        sctrl = &n->sec_ctrl_list.sec[i];
>>> +        nvme_virt_set_state(n, le16_to_cpu(sctrl->scid), false);
>>>       }
>>>   }
>>
>> Maybe I'm missing something, but if the concern is that 'i' could run
>> beyond the end of the array, I don't see anything that limits
>> pcie_sriov_num_vfs() to the static size of 127 that n->sec_ctrl_list.sec
>> has. register_vfs() seems to just take whatever 16 bit value the guest
>> wrote without imposing additional restrictions.
>>
>> If there is some mechanism that makes register_vf() fail if we exceed
>> the limit, maybe an assertion with a comment would be in order because
>> it doesn't seem obvious. I couldn't find any code that enforces it,
>> sriov_max_vfs only ever seems to be used as a hint for the guest.
>>
>> If not, do we need another check that fails gracefully in the error
>> case?
> 
> Ok, I see now that patch 2 fixes this. But then the commit message is
> wrong because it implies that this patch is the only thing you need to
> fix the problem with nvme. You can't say "Fixes: CVE-2024-26328" if half
> of the fix is still missing.

I didn't assign CVE-2024-26328 for the case that the value of 
PCI_SRIOV_NUM_VF is greater than PCI_SRIOV_TOTAL_VF; it's what 
CVE-2024-26327 deals with.

The problem I dealt here is that the value of PCI_SRIOV_NUM_VF may not 
represent the actual number of enabled VFs because another register 
(PCI_SRIOV_CTRL_VFE) is not set, for example.

If an assertion to be added, I think it should be in 
pcie_sriov_num_vfs() and ensure the returning value is less than the 
value of PCI_SRIOV_TOTAL_VF (aka sriov_max_vfs in hw/nvme/ctrl.c), but I 
think it's fine without it.

> 
> Maybe asserting old_num_vfs < n->sec_ctrl_list.numcntl would still be a
> good idea. But looking at this one, it seems to me that numcntl isn't
> completely correct either:
> 
>      list->numcntl = cpu_to_le16(max_vfs);
> 
> Both list->numcntl and max_vfs are uint8_t, so I think this will always
> be 0 on big endian hosts?

Indeed it looks wrong. Will you write a patch?

Regards,
Akihiko Odaki


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v6 01/15] hw/nvme: Use pcie_sriov_num_vfs()
  2024-02-20 15:33       ` Akihiko Odaki
@ 2024-02-20 20:40         ` Klaus Jensen
  0 siblings, 0 replies; 29+ messages in thread
From: Klaus Jensen @ 2024-02-20 20:40 UTC (permalink / raw)
  To: Akihiko Odaki
  Cc: Kevin Wolf, Philippe Mathieu-Daudé,
	Michael S. Tsirkin, Marcel Apfelbaum, Alex Williamson,
	Cédric Le Goater, Paolo Bonzini, Daniel P. Berrangé,
	Eduardo Habkost, Sriram Yagnaraman, Jason Wang, Keith Busch,
	qemu-devel, qemu-block, qemu-stable

[-- Attachment #1: Type: text/plain, Size: 5255 bytes --]

On Feb 21 00:33, Akihiko Odaki wrote:
> On 2024/02/20 23:53, Kevin Wolf wrote:
> > Am 20.02.2024 um 15:29 hat Kevin Wolf geschrieben:
> > > Am 20.02.2024 um 13:24 hat Akihiko Odaki geschrieben:
> > > > nvme_sriov_pre_write_ctrl() used to directly inspect SR-IOV
> > > > configurations to know the number of VFs being disabled due to SR-IOV
> > > > configuration writes, but the logic was flawed and resulted in
> > > > out-of-bound memory access.
> > > > 
> > > > It assumed PCI_SRIOV_NUM_VF always has the number of currently enabled
> > > > VFs, but it actually doesn't in the following cases:
> > > > - PCI_SRIOV_NUM_VF has been set but PCI_SRIOV_CTRL_VFE has never been.
> > > > - PCI_SRIOV_NUM_VF was written after PCI_SRIOV_CTRL_VFE was set.
> > > > - VFs were only partially enabled because of realization failure.
> > > > 
> > > > It is a responsibility of pcie_sriov to interpret SR-IOV configurations
> > > > and pcie_sriov does it correctly, so use pcie_sriov_num_vfs(), which it
> > > > provides, to get the number of enabled VFs before and after SR-IOV
> > > > configuration writes.
> > > > 
> > > > Cc: qemu-stable@nongnu.org
> > > > Fixes: CVE-2024-26328
> > > > Fixes: 11871f53ef8e ("hw/nvme: Add support for the Virtualization Management command")
> > > > Suggested-by: Michael S. Tsirkin <mst@redhat.com>
> > > > Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
> > > > ---
> > > >   hw/nvme/ctrl.c | 26 ++++++++------------------
> > > >   1 file changed, 8 insertions(+), 18 deletions(-)
> > > > 
> > > > diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
> > > > index f026245d1e9e..7a56e7b79b4d 100644
> > > > --- a/hw/nvme/ctrl.c
> > > > +++ b/hw/nvme/ctrl.c
> > > > @@ -8466,36 +8466,26 @@ static void nvme_pci_reset(DeviceState *qdev)
> > > >       nvme_ctrl_reset(n, NVME_RESET_FUNCTION);
> > > >   }
> > > > -static void nvme_sriov_pre_write_ctrl(PCIDevice *dev, uint32_t address,
> > > > -                                      uint32_t val, int len)
> > > > +static void nvme_sriov_post_write_config(PCIDevice *dev, uint16_t old_num_vfs)
> > > >   {
> > > >       NvmeCtrl *n = NVME(dev);
> > > >       NvmeSecCtrlEntry *sctrl;
> > > > -    uint16_t sriov_cap = dev->exp.sriov_cap;
> > > > -    uint32_t off = address - sriov_cap;
> > > > -    int i, num_vfs;
> > > > +    int i;
> > > > -    if (!sriov_cap) {
> > > > -        return;
> > > > -    }
> > > > -
> > > > -    if (range_covers_byte(off, len, PCI_SRIOV_CTRL)) {
> > > > -        if (!(val & PCI_SRIOV_CTRL_VFE)) {
> > > > -            num_vfs = pci_get_word(dev->config + sriov_cap + PCI_SRIOV_NUM_VF);
> > > > -            for (i = 0; i < num_vfs; i++) {
> > > > -                sctrl = &n->sec_ctrl_list.sec[i];
> > > > -                nvme_virt_set_state(n, le16_to_cpu(sctrl->scid), false);
> > > > -            }
> > > > -        }
> > > > +    for (i = pcie_sriov_num_vfs(dev); i < old_num_vfs; i++) {
> > > > +        sctrl = &n->sec_ctrl_list.sec[i];
> > > > +        nvme_virt_set_state(n, le16_to_cpu(sctrl->scid), false);
> > > >       }
> > > >   }
> > > 
> > > Maybe I'm missing something, but if the concern is that 'i' could run
> > > beyond the end of the array, I don't see anything that limits
> > > pcie_sriov_num_vfs() to the static size of 127 that n->sec_ctrl_list.sec
> > > has. register_vfs() seems to just take whatever 16 bit value the guest
> > > wrote without imposing additional restrictions.
> > > 
> > > If there is some mechanism that makes register_vf() fail if we exceed
> > > the limit, maybe an assertion with a comment would be in order because
> > > it doesn't seem obvious. I couldn't find any code that enforces it,
> > > sriov_max_vfs only ever seems to be used as a hint for the guest.
> > > 
> > > If not, do we need another check that fails gracefully in the error
> > > case?
> > 
> > Ok, I see now that patch 2 fixes this. But then the commit message is
> > wrong because it implies that this patch is the only thing you need to
> > fix the problem with nvme. You can't say "Fixes: CVE-2024-26328" if half
> > of the fix is still missing.
> 
> I didn't assign CVE-2024-26328 for the case that the value of
> PCI_SRIOV_NUM_VF is greater than PCI_SRIOV_TOTAL_VF; it's what
> CVE-2024-26327 deals with.
> 
> The problem I dealt here is that the value of PCI_SRIOV_NUM_VF may not
> represent the actual number of enabled VFs because another register
> (PCI_SRIOV_CTRL_VFE) is not set, for example.
> 
> If an assertion to be added, I think it should be in pcie_sriov_num_vfs()
> and ensure the returning value is less than the value of PCI_SRIOV_TOTAL_VF
> (aka sriov_max_vfs in hw/nvme/ctrl.c), but I think it's fine without it.
> 
> > 
> > Maybe asserting old_num_vfs < n->sec_ctrl_list.numcntl would still be a
> > good idea. But looking at this one, it seems to me that numcntl isn't
> > completely correct either:
> > 
> >      list->numcntl = cpu_to_le16(max_vfs);
> > 
> > Both list->numcntl and max_vfs are uint8_t, so I think this will always
> > be 0 on big endian hosts?
> 
> Indeed it looks wrong. Will you write a patch?
> 

I'll fix it. And give the SR-IOV parts of hw/nvme some love all around.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v6 12/15] hw/pci: Use -1 as a default value for rombar
  2024-02-20 12:24 ` [PATCH v6 12/15] hw/pci: Use -1 as a default value for rombar Akihiko Odaki
@ 2024-02-21  7:59   ` Markus Armbruster
  2024-02-22  6:40     ` Akihiko Odaki
  0 siblings, 1 reply; 29+ messages in thread
From: Markus Armbruster @ 2024-02-21  7:59 UTC (permalink / raw)
  To: Akihiko Odaki
  Cc: Philippe Mathieu-Daudé,
	Michael S. Tsirkin, Marcel Apfelbaum, Alex Williamson,
	Cédric Le Goater, Paolo Bonzini, Daniel P. Berrangé,
	Eduardo Habkost, Sriram Yagnaraman, Jason Wang, Keith Busch,
	Klaus Jensen, qemu-devel, qemu-block

Akihiko Odaki <akihiko.odaki@daynix.com> writes:

> Currently there is no way to distinguish the case that rombar is
> explicitly specified as 1 and the case that rombar is not specified.
>
> Set rombar -1 by default to distinguish these cases just as it is done
> for addr and romsize. It was confirmed that changing the default value
> to -1 will not change the behavior by looking at occurences of rom_bar.
>
> $ git grep -w rom_bar
> hw/display/qxl.c:328:    QXLRom *rom = memory_region_get_ram_ptr(&d->rom_bar);
> hw/display/qxl.c:431:    qxl_set_dirty(&qxl->rom_bar, 0, qxl->rom_size);
> hw/display/qxl.c:1048:    QXLRom *rom = memory_region_get_ram_ptr(&qxl->rom_bar);
> hw/display/qxl.c:2131:    memory_region_init_rom(&qxl->rom_bar, OBJECT(qxl), "qxl.vrom",
> hw/display/qxl.c:2154: PCI_BASE_ADDRESS_SPACE_MEMORY, &qxl->rom_bar);
> hw/display/qxl.h:101:    MemoryRegion       rom_bar;
> hw/pci/pci.c:74:    DEFINE_PROP_UINT32("rombar",  PCIDevice, rom_bar, 1),
> hw/pci/pci.c:2329:    if (!pdev->rom_bar) {
> hw/vfio/pci.c:1019:    if (vdev->pdev.romfile || !vdev->pdev.rom_bar) {
> hw/xen/xen_pt_load_rom.c:29:    if (dev->romfile || !dev->rom_bar) {
> include/hw/pci/pci_device.h:150:    uint32_t rom_bar;
>
> rom_bar refers to a different variable in qxl. It is only tested if
> the value is 0 or not in the other places.

Makes me wonder why it's uint32_t.  Not this patch's problem.

> Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
> ---
>  hw/pci/pci.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/hw/pci/pci.c b/hw/pci/pci.c
> index 54b375da2d26..909c5b3ee4ee 100644
> --- a/hw/pci/pci.c
> +++ b/hw/pci/pci.c
> @@ -71,7 +71,7 @@ static Property pci_props[] = {
>      DEFINE_PROP_PCI_DEVFN("addr", PCIDevice, devfn, -1),
>      DEFINE_PROP_STRING("romfile", PCIDevice, romfile),
>      DEFINE_PROP_UINT32("romsize", PCIDevice, romsize, -1),
> -    DEFINE_PROP_UINT32("rombar",  PCIDevice, rom_bar, 1),
> +    DEFINE_PROP_UINT32("rombar",  PCIDevice, rom_bar, -1),
>      DEFINE_PROP_BIT("multifunction", PCIDevice, cap_present,
>                      QEMU_PCI_CAP_MULTIFUNCTION_BITNR, false),
>      DEFINE_PROP_BIT("x-pcie-lnksta-dllla", PCIDevice, cap_present,



^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v6 13/15] hw/pci: Determine if rombar is explicitly enabled
  2024-02-20 12:24 ` [PATCH v6 13/15] hw/pci: Determine if rombar is explicitly enabled Akihiko Odaki
@ 2024-02-21  8:15   ` Markus Armbruster
  2024-02-22  6:50     ` Akihiko Odaki
  0 siblings, 1 reply; 29+ messages in thread
From: Markus Armbruster @ 2024-02-21  8:15 UTC (permalink / raw)
  To: Akihiko Odaki
  Cc: Philippe Mathieu-Daudé,
	Michael S. Tsirkin, Marcel Apfelbaum, Alex Williamson,
	Cédric Le Goater, Paolo Bonzini, Daniel P. Berrangé,
	Eduardo Habkost, Sriram Yagnaraman, Jason Wang, Keith Busch,
	Klaus Jensen, qemu-devel, qemu-block

Akihiko Odaki <akihiko.odaki@daynix.com> writes:

> vfio determines if rombar is explicitly enabled by inspecting QDict.
> Inspecting QDict is not nice because QDict is untyped and depends on the
> details on the external interface. Add an infrastructure to determine if
> rombar is explicitly enabled to hw/pci.
>
> Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
> ---
>  include/hw/pci/pci_device.h | 5 +++++
>  1 file changed, 5 insertions(+)
>
> diff --git a/include/hw/pci/pci_device.h b/include/hw/pci/pci_device.h
> index ca151325085d..c4fdc96ef50d 100644
> --- a/include/hw/pci/pci_device.h
> +++ b/include/hw/pci/pci_device.h
> @@ -205,6 +205,11 @@ static inline uint16_t pci_get_bdf(PCIDevice *dev)
>      return PCI_BUILD_BDF(pci_bus_num(pci_get_bus(dev)), dev->devfn);
>  }
>  
> +static inline bool pci_rom_bar_explicitly_enabled(PCIDevice *dev)
> +{
> +    return dev->rom_bar && dev->rom_bar != -1;

Compares signed with unsigned: rom_bar is uint32_t, -1 is int.

If int is at most 32 bits, the comparison converts -1 to
(uint32_t)0xffffffff.

If int is wider, it converts dev->rom_bar to int without changing its
value.  In particular, it converts the default value 0xffffffff (written
as -1 in the previous patch) to (int)0xffffffff.  Oops.

Best not to mix signed and unsigned in comparisons.  Easy enough to
avoid here.

Still, we don't actually test "rom_bar has not been set".  We test
"rom_bar has the default value".  Nothing stops a user from passing
rombar=0xffffffff to -device.  See my review of the next patch.

> +}
> +
>  static inline void pci_set_power(PCIDevice *pci_dev, bool state)
>  {
>      /*



^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v6 14/15] vfio: Avoid inspecting option QDict for rombar
  2024-02-20 12:24 ` [PATCH v6 14/15] vfio: Avoid inspecting option QDict for rombar Akihiko Odaki
@ 2024-02-21  8:16   ` Markus Armbruster
  0 siblings, 0 replies; 29+ messages in thread
From: Markus Armbruster @ 2024-02-21  8:16 UTC (permalink / raw)
  To: Akihiko Odaki
  Cc: Philippe Mathieu-Daudé,
	Michael S. Tsirkin, Marcel Apfelbaum, Alex Williamson,
	Cédric Le Goater, Paolo Bonzini, Daniel P. Berrangé,
	Eduardo Habkost, Sriram Yagnaraman, Jason Wang, Keith Busch,
	Klaus Jensen, qemu-devel, qemu-block

Akihiko Odaki <akihiko.odaki@daynix.com> writes:

> Use pci_rom_bar_explicitly_enabled() to determine if rombar is explicitly
> enabled.
>
> Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
> ---
>  hw/vfio/pci.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> index 4fa387f0430d..647f15b2a060 100644
> --- a/hw/vfio/pci.c
> +++ b/hw/vfio/pci.c
> @@ -1012,7 +1012,6 @@ static void vfio_pci_size_rom(VFIOPCIDevice *vdev)
>  {
>      uint32_t orig, size = cpu_to_le32((uint32_t)PCI_ROM_ADDRESS_MASK);
>      off_t offset = vdev->config_offset + PCI_ROM_ADDRESS;
> -    DeviceState *dev = DEVICE(vdev);
>      char *name;
>      int fd = vdev->vbasedev.fd;
>  
> @@ -1046,7 +1045,7 @@ static void vfio_pci_size_rom(VFIOPCIDevice *vdev)
>      }
>  
>      if (vfio_opt_rom_in_denylist(vdev)) {
> -        if (dev->opts && qdict_haskey(dev->opts, "rombar")) {
> +        if (pci_rom_bar_explicitly_enabled(&vdev->pdev)) {
>              warn_report("Device at %s is known to cause system instability"
>                          " issues during option rom execution",
>                          vdev->vbasedev.name);

Consider -device ...,rombar=0xffffffff.

Before the patch, the condition is true.

Afterwards, it's false.

Do we care?



^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v6 15/15] hw/qdev: Remove opts member
  2024-02-20 12:24 ` [PATCH v6 15/15] hw/qdev: Remove opts member Akihiko Odaki
@ 2024-02-21  8:18   ` Markus Armbruster
  0 siblings, 0 replies; 29+ messages in thread
From: Markus Armbruster @ 2024-02-21  8:18 UTC (permalink / raw)
  To: Akihiko Odaki
  Cc: Philippe Mathieu-Daudé,
	Michael S. Tsirkin, Marcel Apfelbaum, Alex Williamson,
	Cédric Le Goater, Paolo Bonzini, Daniel P. Berrangé,
	Eduardo Habkost, Sriram Yagnaraman, Jason Wang, Keith Busch,
	Klaus Jensen, qemu-devel, qemu-block

Akihiko Odaki <akihiko.odaki@daynix.com> writes:

> It is no longer used.
>
> Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
> ---
>  include/hw/qdev-core.h |  4 ----
>  hw/core/qdev.c         |  1 -
>  system/qdev-monitor.c  | 12 +++++++-----
>  3 files changed, 7 insertions(+), 10 deletions(-)
>
> diff --git a/include/hw/qdev-core.h b/include/hw/qdev-core.h
> index 9228e96c87e9..5954404dcbfe 100644
> --- a/include/hw/qdev-core.h
> +++ b/include/hw/qdev-core.h
> @@ -237,10 +237,6 @@ struct DeviceState {
>       * @pending_deleted_expires_ms: optional timeout for deletion events
>       */
>      int64_t pending_deleted_expires_ms;
> -    /**
> -     * @opts: QDict of options for the device
> -     */
> -    QDict *opts;
>      /**
>       * @hotplugged: was device added after PHASE_MACHINE_READY?
>       */
> diff --git a/hw/core/qdev.c b/hw/core/qdev.c
> index c68d0f7c512f..7349c9a86be8 100644
> --- a/hw/core/qdev.c
> +++ b/hw/core/qdev.c
> @@ -706,7 +706,6 @@ static void device_finalize(Object *obj)
>          dev->canonical_path = NULL;
>      }
>  
> -    qobject_unref(dev->opts);
>      g_free(dev->id);
>  }
>  
> diff --git a/system/qdev-monitor.c b/system/qdev-monitor.c
> index a13db763e5dd..71c00f62ee38 100644
> --- a/system/qdev-monitor.c
> +++ b/system/qdev-monitor.c
> @@ -625,6 +625,7 @@ DeviceState *qdev_device_add_from_qdict(const QDict *opts,
>      char *id;
>      DeviceState *dev = NULL;
>      BusState *bus = NULL;
> +    QDict *properties;
>  
>      driver = qdict_get_try_str(opts, "driver");
>      if (!driver) {
> @@ -705,13 +706,14 @@ DeviceState *qdev_device_add_from_qdict(const QDict *opts,
>      }
>  
>      /* set properties */
> -    dev->opts = qdict_clone_shallow(opts);
> -    qdict_del(dev->opts, "driver");
> -    qdict_del(dev->opts, "bus");
> -    qdict_del(dev->opts, "id");
> +    properties = qdict_clone_shallow(opts);
> +    qdict_del(properties, "driver");
> +    qdict_del(properties, "bus");
> +    qdict_del(properties, "id");
>  
> -    object_set_properties_from_keyval(&dev->parent_obj, dev->opts, from_json,
> +    object_set_properties_from_keyval(&dev->parent_obj, properties, from_json,
>                                        errp);
> +    qobject_unref(properties);
>      if (*errp) {
>          goto err_del_dev;
>      }

Reviewed-by: Markus Armbruster <armbru@redhat.com>

Depends on the previous few patches, of course.



^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v6 12/15] hw/pci: Use -1 as a default value for rombar
  2024-02-21  7:59   ` Markus Armbruster
@ 2024-02-22  6:40     ` Akihiko Odaki
  0 siblings, 0 replies; 29+ messages in thread
From: Akihiko Odaki @ 2024-02-22  6:40 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: Philippe Mathieu-Daudé,
	Michael S. Tsirkin, Marcel Apfelbaum, Alex Williamson,
	Cédric Le Goater, Paolo Bonzini, Daniel P. Berrangé,
	Eduardo Habkost, Sriram Yagnaraman, Jason Wang, Keith Busch,
	Klaus Jensen, qemu-devel, qemu-block

On 2024/02/21 16:59, Markus Armbruster wrote:
> Akihiko Odaki <akihiko.odaki@daynix.com> writes:
> 
>> Currently there is no way to distinguish the case that rombar is
>> explicitly specified as 1 and the case that rombar is not specified.
>>
>> Set rombar -1 by default to distinguish these cases just as it is done
>> for addr and romsize. It was confirmed that changing the default value
>> to -1 will not change the behavior by looking at occurences of rom_bar.
>>
>> $ git grep -w rom_bar
>> hw/display/qxl.c:328:    QXLRom *rom = memory_region_get_ram_ptr(&d->rom_bar);
>> hw/display/qxl.c:431:    qxl_set_dirty(&qxl->rom_bar, 0, qxl->rom_size);
>> hw/display/qxl.c:1048:    QXLRom *rom = memory_region_get_ram_ptr(&qxl->rom_bar);
>> hw/display/qxl.c:2131:    memory_region_init_rom(&qxl->rom_bar, OBJECT(qxl), "qxl.vrom",
>> hw/display/qxl.c:2154: PCI_BASE_ADDRESS_SPACE_MEMORY, &qxl->rom_bar);
>> hw/display/qxl.h:101:    MemoryRegion       rom_bar;
>> hw/pci/pci.c:74:    DEFINE_PROP_UINT32("rombar",  PCIDevice, rom_bar, 1),
>> hw/pci/pci.c:2329:    if (!pdev->rom_bar) {
>> hw/vfio/pci.c:1019:    if (vdev->pdev.romfile || !vdev->pdev.rom_bar) {
>> hw/xen/xen_pt_load_rom.c:29:    if (dev->romfile || !dev->rom_bar) {
>> include/hw/pci/pci_device.h:150:    uint32_t rom_bar;
>>
>> rom_bar refers to a different variable in qxl. It is only tested if
>> the value is 0 or not in the other places.
> 
> Makes me wonder why it's uint32_t.  Not this patch's problem.

Indeed. It should have been OnOffAuto. I'm not changing it now though 
because I'm too worried if it is too disruptive.

> 
>> Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
>> ---
>>   hw/pci/pci.c | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/hw/pci/pci.c b/hw/pci/pci.c
>> index 54b375da2d26..909c5b3ee4ee 100644
>> --- a/hw/pci/pci.c
>> +++ b/hw/pci/pci.c
>> @@ -71,7 +71,7 @@ static Property pci_props[] = {
>>       DEFINE_PROP_PCI_DEVFN("addr", PCIDevice, devfn, -1),
>>       DEFINE_PROP_STRING("romfile", PCIDevice, romfile),
>>       DEFINE_PROP_UINT32("romsize", PCIDevice, romsize, -1),
>> -    DEFINE_PROP_UINT32("rombar",  PCIDevice, rom_bar, 1),
>> +    DEFINE_PROP_UINT32("rombar",  PCIDevice, rom_bar, -1),
>>       DEFINE_PROP_BIT("multifunction", PCIDevice, cap_present,
>>                       QEMU_PCI_CAP_MULTIFUNCTION_BITNR, false),
>>       DEFINE_PROP_BIT("x-pcie-lnksta-dllla", PCIDevice, cap_present,
> 


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v6 13/15] hw/pci: Determine if rombar is explicitly enabled
  2024-02-21  8:15   ` Markus Armbruster
@ 2024-02-22  6:50     ` Akihiko Odaki
  0 siblings, 0 replies; 29+ messages in thread
From: Akihiko Odaki @ 2024-02-22  6:50 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: Philippe Mathieu-Daudé,
	Michael S. Tsirkin, Marcel Apfelbaum, Alex Williamson,
	Cédric Le Goater, Paolo Bonzini, Daniel P. Berrangé,
	Eduardo Habkost, Sriram Yagnaraman, Jason Wang, Keith Busch,
	Klaus Jensen, qemu-devel, qemu-block

On 2024/02/21 17:15, Markus Armbruster wrote:
> Akihiko Odaki <akihiko.odaki@daynix.com> writes:
> 
>> vfio determines if rombar is explicitly enabled by inspecting QDict.
>> Inspecting QDict is not nice because QDict is untyped and depends on the
>> details on the external interface. Add an infrastructure to determine if
>> rombar is explicitly enabled to hw/pci.
>>
>> Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
>> ---
>>   include/hw/pci/pci_device.h | 5 +++++
>>   1 file changed, 5 insertions(+)
>>
>> diff --git a/include/hw/pci/pci_device.h b/include/hw/pci/pci_device.h
>> index ca151325085d..c4fdc96ef50d 100644
>> --- a/include/hw/pci/pci_device.h
>> +++ b/include/hw/pci/pci_device.h
>> @@ -205,6 +205,11 @@ static inline uint16_t pci_get_bdf(PCIDevice *dev)
>>       return PCI_BUILD_BDF(pci_bus_num(pci_get_bus(dev)), dev->devfn);
>>   }
>>   
>> +static inline bool pci_rom_bar_explicitly_enabled(PCIDevice *dev)
>> +{
>> +    return dev->rom_bar && dev->rom_bar != -1;
> 
> Compares signed with unsigned: rom_bar is uint32_t, -1 is int.
> 
> If int is at most 32 bits, the comparison converts -1 to
> (uint32_t)0xffffffff.
> 
> If int is wider, it converts dev->rom_bar to int without changing its
> value.  In particular, it converts the default value 0xffffffff (written
> as -1 in the previous patch) to (int)0xffffffff.  Oops.
> 
> Best not to mix signed and unsigned in comparisons.  Easy enough to
> avoid here.
> 
> Still, we don't actually test "rom_bar has not been set".  We test
> "rom_bar has the default value".  Nothing stops a user from passing
> rombar=0xffffffff to -device.  See my review of the next patch.

I followed addr and romsize properties that already use -1 as the 
default value. addr is int32_t, but romsize is uint32_t. I'll change 
romsize and rom_bar to use UINT32_MAX as the default value.

This changes the behavior of 0xffffffff, but that should be fine. Most 
people should only set 0 or 1. Maybe someone types wrongly and sets a 
number like 2 or 12, but nobody types 0xffffffff by mistake.


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v6 07/15] pcie_sriov: Do not manually unrealize
  2024-02-20 12:24 ` [PATCH v6 07/15] pcie_sriov: Do not manually unrealize Akihiko Odaki
@ 2024-03-12 19:37   ` Michael S. Tsirkin
  0 siblings, 0 replies; 29+ messages in thread
From: Michael S. Tsirkin @ 2024-03-12 19:37 UTC (permalink / raw)
  To: Akihiko Odaki
  Cc: Philippe Mathieu-Daudé,
	Marcel Apfelbaum, Alex Williamson, Cédric Le Goater,
	Paolo Bonzini, Daniel P. Berrangé,
	Eduardo Habkost, Sriram Yagnaraman, Jason Wang, Keith Busch,
	Klaus Jensen, qemu-devel, qemu-block

On Tue, Feb 20, 2024 at 09:24:42PM +0900, Akihiko Odaki wrote:
> A device gets automatically unrealized when being unparented.
> 
> Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
> ---
>  hw/pci/pcie_sriov.c | 5 -----
>  1 file changed, 5 deletions(-)
> 
> diff --git a/hw/pci/pcie_sriov.c b/hw/pci/pcie_sriov.c
> index e9b23221d713..8b1fd2a89ad7 100644
> --- a/hw/pci/pcie_sriov.c
> +++ b/hw/pci/pcie_sriov.c
> @@ -17,7 +17,6 @@
>  #include "hw/qdev-properties.h"
>  #include "qemu/error-report.h"
>  #include "qemu/range.h"
> -#include "qapi/error.h"
>  #include "trace.h"
>  
>  static PCIDevice *register_vf(PCIDevice *pf, int devfn,

error.h is used elsewhere in this file too, removing the
include is not a good idea, doing so breaks bisect.

> @@ -204,11 +203,7 @@ static void unregister_vfs(PCIDevice *dev)
>      trace_sriov_unregister_vfs(dev->name, PCI_SLOT(dev->devfn),
>                                 PCI_FUNC(dev->devfn), num_vfs);
>      for (i = 0; i < num_vfs; i++) {
> -        Error *err = NULL;
>          PCIDevice *vf = dev->exp.sriov_pf.vf[i];
> -        if (!object_property_set_bool(OBJECT(vf), "realized", false, &err)) {
> -            error_reportf_err(err, "Failed to unplug: ");
> -        }
>          object_unparent(OBJECT(vf));
>          object_unref(OBJECT(vf));
>      }
> 
> -- 
> 2.43.1



^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v6 08/15] pcie_sriov: Reuse SR-IOV VF device instances
  2024-02-20 12:24 ` [PATCH v6 08/15] pcie_sriov: Reuse SR-IOV VF device instances Akihiko Odaki
@ 2024-03-12 19:47   ` Michael S. Tsirkin
  0 siblings, 0 replies; 29+ messages in thread
From: Michael S. Tsirkin @ 2024-03-12 19:47 UTC (permalink / raw)
  To: Akihiko Odaki
  Cc: Philippe Mathieu-Daudé,
	Marcel Apfelbaum, Alex Williamson, Cédric Le Goater,
	Paolo Bonzini, Daniel P. Berrangé,
	Eduardo Habkost, Sriram Yagnaraman, Jason Wang, Keith Busch,
	Klaus Jensen, qemu-devel, qemu-block

On Tue, Feb 20, 2024 at 09:24:43PM +0900, Akihiko Odaki wrote:
> Disable SR-IOV VF devices by reusing code to power down PCI devices
> instead of removing them when the guest requests to disable VFs. This
> allows to realize devices and report VF realization errors at PF
> realization time.
> 
> Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
> ---
>  docs/pcie_sriov.txt         |   8 ++--
>  include/hw/pci/pci.h        |   5 ---
>  include/hw/pci/pci_device.h |  15 +++++++
>  include/hw/pci/pcie_sriov.h |   6 +--
>  hw/net/igb.c                |  13 ++++--
>  hw/nvme/ctrl.c              |  24 +++++++----
>  hw/pci/pci.c                |   2 +-
>  hw/pci/pcie_sriov.c         | 102 +++++++++++++++++++-------------------------
>  8 files changed, 95 insertions(+), 80 deletions(-)
> 
> diff --git a/docs/pcie_sriov.txt b/docs/pcie_sriov.txt
> index a47aad0bfab0..ab2142807f79 100644
> --- a/docs/pcie_sriov.txt
> +++ b/docs/pcie_sriov.txt
> @@ -52,9 +52,11 @@ setting up a BAR for a VF.
>        ...
>  
>        /* Add and initialize the SR/IOV capability */
> -      pcie_sriov_pf_init(d, 0x200, "your_virtual_dev",
> -                       vf_devid, initial_vfs, total_vfs,
> -                       fun_offset, stride);
> +      if (!pcie_sriov_pf_init(d, 0x200, "your_virtual_dev",
> +                              vf_devid, initial_vfs, total_vfs,
> +                              fun_offset, stride, errp)) {
> +         return;
> +      }
>  
>        /* Set up individual VF BARs (parameters as for normal BARs) */
>        pcie_sriov_pf_init_vf_bar( ... )
> diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
> index 6c92b2f70008..442017b4865d 100644
> --- a/include/hw/pci/pci.h
> +++ b/include/hw/pci/pci.h
> @@ -644,9 +644,4 @@ static inline void pci_irq_pulse(PCIDevice *pci_dev)
>  MSIMessage pci_get_msi_message(PCIDevice *dev, int vector);
>  void pci_set_enabled(PCIDevice *pci_dev, bool state);
>  
> -static inline void pci_set_power(PCIDevice *pci_dev, bool state)
> -{
> -    pci_set_enabled(pci_dev, state);
> -}
> -
>  #endif
> diff --git a/include/hw/pci/pci_device.h b/include/hw/pci/pci_device.h
> index d57f9ce83884..ca151325085d 100644
> --- a/include/hw/pci/pci_device.h
> +++ b/include/hw/pci/pci_device.h
> @@ -205,6 +205,21 @@ static inline uint16_t pci_get_bdf(PCIDevice *dev)
>      return PCI_BUILD_BDF(pci_bus_num(pci_get_bus(dev)), dev->devfn);
>  }
>  
> +static inline void pci_set_power(PCIDevice *pci_dev, bool state)
> +{
> +    /*
> +     * Don't change the enabled state of VFs when powering on/off the device.
> +     *
> +     * When powering on, VFs must not be enabled immediately but they must
> +     * wait until the guest configures SR-IOV.
> +     * When powering off, their corresponding PFs will be reset and disable
> +     * VFs.
> +     */
> +    if (!pci_is_vf(pci_dev)) {
> +        pci_set_enabled(pci_dev, state);
> +    }
> +}
> +
>  uint16_t pci_requester_id(PCIDevice *dev);
>  
>  /* DMA access functions */
> diff --git a/include/hw/pci/pcie_sriov.h b/include/hw/pci/pcie_sriov.h
> index b77eb7bf58ac..4b1133f79e15 100644
> --- a/include/hw/pci/pcie_sriov.h
> +++ b/include/hw/pci/pcie_sriov.h
> @@ -18,7 +18,6 @@
>  struct PCIESriovPF {
>      uint16_t num_vfs;   /* Number of virtual functions created */
>      uint8_t vf_bar_type[PCI_NUM_REGIONS];   /* Store type for each VF bar */
> -    const char *vfname; /* Reference to the device type used for the VFs */
>      PCIDevice **vf;     /* Pointer to an array of num_vfs VF devices */
>  };
>  
> @@ -27,10 +26,11 @@ struct PCIESriovVF {
>      uint16_t vf_number; /* Logical VF number of this function */
>  };
>  
> -void pcie_sriov_pf_init(PCIDevice *dev, uint16_t offset,
> +bool pcie_sriov_pf_init(PCIDevice *dev, uint16_t offset,
>                          const char *vfname, uint16_t vf_dev_id,
>                          uint16_t init_vfs, uint16_t total_vfs,
> -                        uint16_t vf_offset, uint16_t vf_stride);
> +                        uint16_t vf_offset, uint16_t vf_stride,
> +                        Error **errp);
>  void pcie_sriov_pf_exit(PCIDevice *dev);
>  
>  /* Set up a VF bar in the SR/IOV bar area */
> diff --git a/hw/net/igb.c b/hw/net/igb.c
> index 9b37523d6df8..907259fd8b3b 100644
> --- a/hw/net/igb.c
> +++ b/hw/net/igb.c
> @@ -447,9 +447,16 @@ static void igb_pci_realize(PCIDevice *pci_dev, Error **errp)
>  
>      pcie_ari_init(pci_dev, 0x150);
>  
> -    pcie_sriov_pf_init(pci_dev, IGB_CAP_SRIOV_OFFSET, TYPE_IGBVF,
> -        IGB_82576_VF_DEV_ID, IGB_MAX_VF_FUNCTIONS, IGB_MAX_VF_FUNCTIONS,
> -        IGB_VF_OFFSET, IGB_VF_STRIDE);
> +    if (!pcie_sriov_pf_init(pci_dev, IGB_CAP_SRIOV_OFFSET,
> +                            TYPE_IGBVF, IGB_82576_VF_DEV_ID,
> +                            IGB_MAX_VF_FUNCTIONS, IGB_MAX_VF_FUNCTIONS,
> +                            IGB_VF_OFFSET, IGB_VF_STRIDE,
> +                            errp)) {
> +        pcie_cap_exit(pci_dev);
> +        igb_cleanup_msix(s);
> +        msi_uninit(pci_dev);
> +        return;
> +    }
>  
>      pcie_sriov_pf_init_vf_bar(pci_dev, IGBVF_MMIO_BAR_IDX,
>          PCI_BASE_ADDRESS_MEM_TYPE_64 | PCI_BASE_ADDRESS_MEM_PREFETCH,


So, this breaks igb:

 ./build/qemu-system-ppc -S -machine mpc8544ds,accel=tcg -device igb
Unexpected error in set_pci_devfn() at ../hw/core/qdev-properties-system.c:795:
qemu-system-ppc: -device igb: Parameter 'addr' expects a value between -1 and 255
Aborted (core dumped)


Dropped for now.


> diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
> index c1af4b87b34a..98c3e942077c 100644
> --- a/hw/nvme/ctrl.c
> +++ b/hw/nvme/ctrl.c
> @@ -8027,7 +8027,8 @@ static uint64_t nvme_bar_size(unsigned total_queues, unsigned total_irqs,
>      return bar_size;
>  }
>  
> -static void nvme_init_sriov(NvmeCtrl *n, PCIDevice *pci_dev, uint16_t offset)
> +static bool nvme_init_sriov(NvmeCtrl *n, PCIDevice *pci_dev, uint16_t offset,
> +                            Error **errp)
>  {
>      uint16_t vf_dev_id = n->params.use_intel_id ?
>                           PCI_DEVICE_ID_INTEL_NVME : PCI_DEVICE_ID_REDHAT_NVME;
> @@ -8036,12 +8037,17 @@ static void nvme_init_sriov(NvmeCtrl *n, PCIDevice *pci_dev, uint16_t offset)
>                                        le16_to_cpu(cap->vifrsm),
>                                        NULL, NULL);
>  
> -    pcie_sriov_pf_init(pci_dev, offset, "nvme", vf_dev_id,
> -                       n->params.sriov_max_vfs, n->params.sriov_max_vfs,
> -                       NVME_VF_OFFSET, NVME_VF_STRIDE);
> +    if (!pcie_sriov_pf_init(pci_dev, offset, "nvme", vf_dev_id,
> +                            n->params.sriov_max_vfs, n->params.sriov_max_vfs,
> +                            NVME_VF_OFFSET, NVME_VF_STRIDE,
> +                            errp)) {
> +        return false;
> +    }
>  
>      pcie_sriov_pf_init_vf_bar(pci_dev, 0, PCI_BASE_ADDRESS_SPACE_MEMORY |
>                                PCI_BASE_ADDRESS_MEM_TYPE_64, bar_size);
> +
> +    return true;
>  }
>  
>  static int nvme_add_pm_capability(PCIDevice *pci_dev, uint8_t offset)
> @@ -8120,6 +8126,12 @@ static bool nvme_init_pci(NvmeCtrl *n, PCIDevice *pci_dev, Error **errp)
>          return false;
>      }
>  
> +    if (!pci_is_vf(pci_dev) && n->params.sriov_max_vfs &&
> +        !nvme_init_sriov(n, pci_dev, 0x120, errp)) {
> +        msix_uninit(pci_dev, &n->bar0, &n->bar0);
> +        return false;
> +    }
> +
>      nvme_update_msixcap_ts(pci_dev, n->conf_msix_qsize);
>  
>      if (n->params.cmb_size_mb) {
> @@ -8130,10 +8142,6 @@ static bool nvme_init_pci(NvmeCtrl *n, PCIDevice *pci_dev, Error **errp)
>          nvme_init_pmr(n, pci_dev);
>      }
>  
> -    if (!pci_is_vf(pci_dev) && n->params.sriov_max_vfs) {
> -        nvme_init_sriov(n, pci_dev, 0x120);
> -    }
> -
>      return true;
>  }
>  
> diff --git a/hw/pci/pci.c b/hw/pci/pci.c
> index 8bde13f7cd1e..750c2ba696d1 100644
> --- a/hw/pci/pci.c
> +++ b/hw/pci/pci.c
> @@ -2822,7 +2822,7 @@ void pci_set_enabled(PCIDevice *d, bool state)
>      memory_region_set_enabled(&d->bus_master_enable_region,
>                                (pci_get_word(d->config + PCI_COMMAND)
>                                 & PCI_COMMAND_MASTER) && d->enabled);
> -    if (!d->enabled) {
> +    if (d->qdev.realized) {
>          pci_device_reset(d);
>      }
>  }
> diff --git a/hw/pci/pcie_sriov.c b/hw/pci/pcie_sriov.c
> index 8b1fd2a89ad7..d934cd7d0e64 100644
> --- a/hw/pci/pcie_sriov.c
> +++ b/hw/pci/pcie_sriov.c
> @@ -19,15 +19,25 @@
>  #include "qemu/range.h"
>  #include "trace.h"
>  
> -static PCIDevice *register_vf(PCIDevice *pf, int devfn,
> -                              const char *name, uint16_t vf_num);
> -static void unregister_vfs(PCIDevice *dev);
> +static void unparent_vfs(PCIDevice *dev, uint16_t total_vfs)
> +{
> +    for (uint16_t i = 0; i < total_vfs; i++) {
> +        PCIDevice *vf = dev->exp.sriov_pf.vf[i];
> +        object_unparent(OBJECT(vf));
> +        object_unref(OBJECT(vf));
> +    }
> +    g_free(dev->exp.sriov_pf.vf);
> +    dev->exp.sriov_pf.vf = NULL;
> +}
>  
> -void pcie_sriov_pf_init(PCIDevice *dev, uint16_t offset,
> +bool pcie_sriov_pf_init(PCIDevice *dev, uint16_t offset,
>                          const char *vfname, uint16_t vf_dev_id,
>                          uint16_t init_vfs, uint16_t total_vfs,
> -                        uint16_t vf_offset, uint16_t vf_stride)
> +                        uint16_t vf_offset, uint16_t vf_stride,
> +                        Error **errp)
>  {
> +    BusState *bus = qdev_get_parent_bus(&dev->qdev);
> +    int32_t devfn = dev->devfn + vf_offset;
>      uint8_t *cfg = dev->config + offset;
>      uint8_t *wmask;
>  
> @@ -35,7 +45,6 @@ void pcie_sriov_pf_init(PCIDevice *dev, uint16_t offset,
>                          offset, PCI_EXT_CAP_SRIOV_SIZEOF);
>      dev->exp.sriov_cap = offset;
>      dev->exp.sriov_pf.num_vfs = 0;
> -    dev->exp.sriov_pf.vfname = g_strdup(vfname);
>      dev->exp.sriov_pf.vf = NULL;
>  
>      pci_set_word(cfg + PCI_SRIOV_VF_OFFSET, vf_offset);
> @@ -68,13 +77,35 @@ void pcie_sriov_pf_init(PCIDevice *dev, uint16_t offset,
>      pci_set_word(wmask + PCI_SRIOV_SYS_PGSIZE, 0x553);
>  
>      qdev_prop_set_bit(&dev->qdev, "multifunction", true);
> +
> +    dev->exp.sriov_pf.vf = g_new(PCIDevice *, total_vfs);
> +
> +    for (uint16_t i = 0; i < total_vfs; i++) {
> +        PCIDevice *vf = pci_new(devfn, vfname);
> +        vf->exp.sriov_vf.pf = dev;
> +        vf->exp.sriov_vf.vf_number = i;
> +
> +        if (!qdev_realize(&vf->qdev, bus, errp)) {
> +            unparent_vfs(dev, i);
> +            return false;
> +        }
> +
> +        /* set vid/did according to sr/iov spec - they are not used */
> +        pci_config_set_vendor_id(vf->config, 0xffff);
> +        pci_config_set_device_id(vf->config, 0xffff);
> +
> +        dev->exp.sriov_pf.vf[i] = vf;
> +        devfn += vf_stride;
> +    }
> +
> +    return true;
>  }
>  
>  void pcie_sriov_pf_exit(PCIDevice *dev)
>  {
> -    unregister_vfs(dev);
> -    g_free((char *)dev->exp.sriov_pf.vfname);
> -    dev->exp.sriov_pf.vfname = NULL;
> +    uint8_t *cfg = dev->config + dev->exp.sriov_cap;
> +
> +    unparent_vfs(dev, pci_get_word(cfg + PCI_SRIOV_TOTAL_VF));
>  }
>  
>  void pcie_sriov_pf_init_vf_bar(PCIDevice *dev, int region_num,
> @@ -140,38 +171,11 @@ void pcie_sriov_vf_register_bar(PCIDevice *dev, int region_num,
>      }
>  }
>  
> -static PCIDevice *register_vf(PCIDevice *pf, int devfn, const char *name,
> -                              uint16_t vf_num)
> -{
> -    PCIDevice *dev = pci_new(devfn, name);
> -    dev->exp.sriov_vf.pf = pf;
> -    dev->exp.sriov_vf.vf_number = vf_num;
> -    PCIBus *bus = pci_get_bus(pf);
> -    Error *local_err = NULL;
> -
> -    qdev_realize(&dev->qdev, &bus->qbus, &local_err);
> -    if (local_err) {
> -        error_report_err(local_err);
> -        return NULL;
> -    }
> -
> -    /* set vid/did according to sr/iov spec - they are not used */
> -    pci_config_set_vendor_id(dev->config, 0xffff);
> -    pci_config_set_device_id(dev->config, 0xffff);
> -
> -    return dev;
> -}
> -
>  static void register_vfs(PCIDevice *dev)
>  {
>      uint16_t num_vfs;
>      uint16_t i;
>      uint16_t sriov_cap = dev->exp.sriov_cap;
> -    uint16_t vf_offset =
> -        pci_get_word(dev->config + sriov_cap + PCI_SRIOV_VF_OFFSET);
> -    uint16_t vf_stride =
> -        pci_get_word(dev->config + sriov_cap + PCI_SRIOV_VF_STRIDE);
> -    int32_t devfn = dev->devfn + vf_offset;
>  
>      assert(sriov_cap > 0);
>      num_vfs = pci_get_word(dev->config + sriov_cap + PCI_SRIOV_NUM_VF);
> @@ -179,18 +183,10 @@ static void register_vfs(PCIDevice *dev)
>          return;
>      }
>  
> -    dev->exp.sriov_pf.vf = g_new(PCIDevice *, num_vfs);
> -
>      trace_sriov_register_vfs(dev->name, PCI_SLOT(dev->devfn),
>                               PCI_FUNC(dev->devfn), num_vfs);
>      for (i = 0; i < num_vfs; i++) {
> -        dev->exp.sriov_pf.vf[i] = register_vf(dev, devfn,
> -                                              dev->exp.sriov_pf.vfname, i);
> -        if (!dev->exp.sriov_pf.vf[i]) {
> -            num_vfs = i;
> -            break;
> -        }
> -        devfn += vf_stride;
> +        pci_set_enabled(dev->exp.sriov_pf.vf[i], true);
>      }
>      dev->exp.sriov_pf.num_vfs = num_vfs;
>  }
> @@ -203,12 +199,8 @@ static void unregister_vfs(PCIDevice *dev)
>      trace_sriov_unregister_vfs(dev->name, PCI_SLOT(dev->devfn),
>                                 PCI_FUNC(dev->devfn), num_vfs);
>      for (i = 0; i < num_vfs; i++) {
> -        PCIDevice *vf = dev->exp.sriov_pf.vf[i];
> -        object_unparent(OBJECT(vf));
> -        object_unref(OBJECT(vf));
> +        pci_set_enabled(dev->exp.sriov_pf.vf[i], false);
>      }
> -    g_free(dev->exp.sriov_pf.vf);
> -    dev->exp.sriov_pf.vf = NULL;
>      dev->exp.sriov_pf.num_vfs = 0;
>  }
>  
> @@ -230,14 +222,10 @@ void pcie_sriov_config_write(PCIDevice *dev, uint32_t address,
>                               PCI_FUNC(dev->devfn), off, val, len);
>  
>      if (range_covers_byte(off, len, PCI_SRIOV_CTRL)) {
> -        if (dev->exp.sriov_pf.num_vfs) {
> -            if (!(val & PCI_SRIOV_CTRL_VFE)) {
> -                unregister_vfs(dev);
> -            }
> +        if (val & PCI_SRIOV_CTRL_VFE) {
> +            register_vfs(dev);
>          } else {
> -            if (val & PCI_SRIOV_CTRL_VFE) {
> -                register_vfs(dev);
> -            }
> +            unregister_vfs(dev);
>          }
>      }
>  }
> 
> -- 
> 2.43.1



^ permalink raw reply	[flat|nested] 29+ messages in thread

end of thread, other threads:[~2024-03-12 19:48 UTC | newest]

Thread overview: 29+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-02-20 12:24 [PATCH v6 00/15] hw/pci: SR-IOV related fixes and improvements Akihiko Odaki
2024-02-20 12:24 ` [PATCH v6 01/15] hw/nvme: Use pcie_sriov_num_vfs() Akihiko Odaki
2024-02-20 14:29   ` Kevin Wolf
2024-02-20 14:47     ` Klaus Jensen
2024-02-20 14:53     ` Kevin Wolf
2024-02-20 15:33       ` Akihiko Odaki
2024-02-20 20:40         ` Klaus Jensen
2024-02-20 12:24 ` [PATCH v6 02/15] pcie_sriov: Validate NumVFs Akihiko Odaki
2024-02-20 12:24 ` [PATCH v6 03/15] pcie_sriov: Reset SR-IOV extended capability Akihiko Odaki
2024-02-20 12:24 ` [PATCH v6 04/15] pcie_sriov: Do not reset NumVFs after disabling VFs Akihiko Odaki
2024-02-20 12:24 ` [PATCH v6 05/15] hw/pci: Always call pcie_sriov_pf_reset() Akihiko Odaki
2024-02-20 12:24 ` [PATCH v6 06/15] hw/pci: Rename has_power to enabled Akihiko Odaki
2024-02-20 12:24 ` [PATCH v6 07/15] pcie_sriov: Do not manually unrealize Akihiko Odaki
2024-03-12 19:37   ` Michael S. Tsirkin
2024-02-20 12:24 ` [PATCH v6 08/15] pcie_sriov: Reuse SR-IOV VF device instances Akihiko Odaki
2024-03-12 19:47   ` Michael S. Tsirkin
2024-02-20 12:24 ` [PATCH v6 09/15] pcie_sriov: Release VFs failed to realize Akihiko Odaki
2024-02-20 12:24 ` [PATCH v6 10/15] pcie_sriov: Remove num_vfs from PCIESriovPF Akihiko Odaki
2024-02-20 12:24 ` [PATCH v6 11/15] pcie_sriov: Register VFs after migration Akihiko Odaki
2024-02-20 12:24 ` [PATCH v6 12/15] hw/pci: Use -1 as a default value for rombar Akihiko Odaki
2024-02-21  7:59   ` Markus Armbruster
2024-02-22  6:40     ` Akihiko Odaki
2024-02-20 12:24 ` [PATCH v6 13/15] hw/pci: Determine if rombar is explicitly enabled Akihiko Odaki
2024-02-21  8:15   ` Markus Armbruster
2024-02-22  6:50     ` Akihiko Odaki
2024-02-20 12:24 ` [PATCH v6 14/15] vfio: Avoid inspecting option QDict for rombar Akihiko Odaki
2024-02-21  8:16   ` Markus Armbruster
2024-02-20 12:24 ` [PATCH v6 15/15] hw/qdev: Remove opts member Akihiko Odaki
2024-02-21  8:18   ` Markus Armbruster

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.