All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jean-Philippe Brucker <jean-philippe@linaro.org>
To: will@kernel.org
Cc: andre.przywara@arm.com, alexandru.elisei@arm.com,
	kvm@vger.kernel.org,
	Jean-Philippe Brucker <jean-philippe@linaro.org>,
	Pierre Gondois <pierre.gondois@arm.com>
Subject: [PATCH kvmtool v2] pci: Disable writes to Status register
Date: Thu, 20 Oct 2022 18:34:53 +0100	[thread overview]
Message-ID: <20221020173452.203043-1-jean-philippe@linaro.org> (raw)

Although the PCI Status register only contains read-only and
write-1-to-clear bits, we currently keep anything written there, which
can confuse a guest.

The problem was highlighted by recent Linux commit 6cd514e58f12 ("PCI:
Clear PCI_STATUS when setting up device"), which unconditionally writes
0xffff to the Status register in order to clear pending errors. Then the
EDAC driver sees the parity status bits set and attempts to clear them
by writing 0xc100, which in turn clears the Capabilities List bit.
Later on, when the virtio-pci driver starts probing, it assumes due to
missing capabilities that the device is using the legacy transport, and
fails to setup the device because of mismatched protocol.

Filter writes to the config space, keeping only those to writable
fields. Tighten the access size check while we're at it, to prevent
overflow. This is only a small step in the right direction, not a
foolproof solution, because a guest could still write both Command and
Status registers using a single 32-bit write. More work is needed for:
* Supporting arbitrary sized writes.
* Sanitizing accesses to capabilities, which are device-specific.

Also remove the old hack that filtered accesses. It was most likely
guarding against ROM BAR writes, which is now handled by the
pci_config_writable bitmap.

Reported-by: Pierre Gondois <pierre.gondois@arm.com>
Tested-by: Pierre Gondois <pierre.gondois@arm.com>
Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
---
Since v1 [1], I added Alex' suggestions:
* Filter PCI_COMMAND bits
* Also sanitize PIO access length. I'm not as comfortable rejecting a
  boundary-crossing accesses as for ECAM, since the PCI spec is not as
  clear cut, so just update the length. The guests I looked at seem to
  be sane anyway.
* Call VFIO write after checking the mask.

Note that the issue described here only shows up during ACPI boot for
me, because edac_init() happens after PCI enumeration. With DT boot,
edac_pci_clear_parity_errors() runs earlier and doesn't find any device.

[1] https://lore.kernel.org/kvm/20220908144208.231272-1-jean-philippe@linaro.org/
---
 pci.c | 54 ++++++++++++++++++++++++++++++++++++++++--------------
 1 file changed, 40 insertions(+), 14 deletions(-)

diff --git a/pci.c b/pci.c
index a769ae27..b170885c 100644
--- a/pci.c
+++ b/pci.c
@@ -117,9 +117,6 @@ static void pci_config_data_mmio(struct kvm_cpu *vcpu, u64 addr, u8 *data,
 {
 	union pci_config_address pci_config_address;
 
-	if (len > 4)
-		len = 4;
-
 	pci_config_address.w = ioport__read32(&pci_config_address_bits);
 	/*
 	 * If someone accesses PCI configuration space offsets that are not
@@ -127,6 +124,9 @@ static void pci_config_data_mmio(struct kvm_cpu *vcpu, u64 addr, u8 *data,
 	 */
 	pci_config_address.reg_offset = addr - PCI_CONFIG_DATA;
 
+	/* Ensure the access does not cross a 4-byte boundary */
+	len = min(len, 4U - pci_config_address.reg_offset);
+
 	if (is_write)
 		pci__config_wr(vcpu->kvm, pci_config_address, data, len);
 	else
@@ -350,6 +350,24 @@ static void pci_config_bar_wr(struct kvm *kvm,
 	pci_activate_bar_regions(kvm, old_addr, bar_size);
 }
 
+/*
+ * Bits that are writable in the config space header.
+ * Write-1-to-clear Status bits are missing since we never set them.
+ */
+static const u8 pci_config_writable[PCI_STD_HEADER_SIZEOF] = {
+	[PCI_COMMAND] =
+		PCI_COMMAND_IO |
+		PCI_COMMAND_MEMORY |
+		PCI_COMMAND_MASTER |
+		PCI_COMMAND_PARITY,
+	[PCI_COMMAND + 1] =
+		(PCI_COMMAND_SERR |
+		 PCI_COMMAND_INTX_DISABLE) >> 8,
+	[PCI_INTERRUPT_LINE] = 0xff,
+	[PCI_BASE_ADDRESS_0 ... PCI_BASE_ADDRESS_5 + 3] = 0xff,
+	[PCI_CACHE_LINE_SIZE] = 0xff,
+};
+
 void pci__config_wr(struct kvm *kvm, union pci_config_address addr, void *data, int size)
 {
 	void *base;
@@ -357,7 +375,7 @@ void pci__config_wr(struct kvm *kvm, union pci_config_address addr, void *data,
 	u16 offset;
 	struct pci_device_header *pci_hdr;
 	u8 dev_num = addr.device_number;
-	u32 value = 0;
+	u32 value = 0, mask = 0;
 
 	if (!pci_device_exists(addr.bus_number, dev_num, 0))
 		return;
@@ -365,19 +383,19 @@ void pci__config_wr(struct kvm *kvm, union pci_config_address addr, void *data,
 	offset = addr.w & PCI_DEV_CFG_MASK;
 	base = pci_hdr = device__find_dev(DEVICE_BUS_PCI, dev_num)->data;
 
+	/* We don't sanity-check capabilities for the moment */
+	if (offset < PCI_STD_HEADER_SIZEOF) {
+		memcpy(&mask, pci_config_writable + offset, size);
+		if (!mask)
+			return;
+	}
+
 	if (pci_hdr->cfg_ops.write)
 		pci_hdr->cfg_ops.write(kvm, pci_hdr, offset, data, size);
 
-	/*
-	 * legacy hack: ignore writes to uninitialized regions (e.g. ROM BAR).
-	 * Not very nice but has been working so far.
-	 */
-	if (*(u32 *)(base + offset) == 0)
-		return;
-
 	if (offset == PCI_COMMAND) {
 		memcpy(&value, data, size);
-		pci_config_command_wr(kvm, pci_hdr, (u16)value);
+		pci_config_command_wr(kvm, pci_hdr, (u16)value & mask);
 		return;
 	}
 
@@ -419,8 +437,16 @@ static void pci_config_mmio_access(struct kvm_cpu *vcpu, u64 addr, u8 *data,
 	cfg_addr.w		= (u32)addr;
 	cfg_addr.enable_bit	= 1;
 
-	if (len > 4)
-		len = 4;
+	/*
+	 * To prevent some overflows, reject accesses that cross a 4-byte
+	 * boundary. The PCIe specification says:
+	 *
+	 *  "Root Complex implementations are not required to support the
+	 *  generation of Configuration Requests from accesses that cross DW
+	 *  [4 bytes] boundaries."
+	 */
+	if ((addr & 3) + len > 4)
+		return;
 
 	if (is_write)
 		pci__config_wr(kvm, cfg_addr, data, len);
-- 
2.37.3


             reply	other threads:[~2022-10-20 17:36 UTC|newest]

Thread overview: 2+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-10-20 17:34 Jean-Philippe Brucker [this message]
2022-11-08 17:38 ` [PATCH kvmtool v2] pci: Disable writes to Status register Will Deacon

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20221020173452.203043-1-jean-philippe@linaro.org \
    --to=jean-philippe@linaro.org \
    --cc=alexandru.elisei@arm.com \
    --cc=andre.przywara@arm.com \
    --cc=kvm@vger.kernel.org \
    --cc=pierre.gondois@arm.com \
    --cc=will@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.