linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v3 0/4] SysFS driver for QEMU fw_cfg device
@ 2015-10-03 23:28 Gabriel L. Somlo
  2015-10-03 23:28 ` [PATCH v3 1/4] firmware: introduce sysfs driver for QEMU's " Gabriel L. Somlo
                   ` (4 more replies)
  0 siblings, 5 replies; 26+ messages in thread
From: Gabriel L. Somlo @ 2015-10-03 23:28 UTC (permalink / raw)
  To: gregkh, paul, galak, will.deacon, agross, mark.rutland, zajec5,
	hanjun.guo, catalin.marinas, linux-api, linux-kernel
  Cc: kernelnewbies, matt.fleming, lersek, jordan.l.justen, mst,
	peter.maydell, leif.lindholm, ard.biesheuvel, pbonzini, kraxel,
	qemu-devel

From: "Gabriel Somlo" <somlo@cmu.edu>

Allow access to QEMU firmware blobs, passed into the guest VM via
the fw_cfg device, through SysFS entries. Blob meta-data (e.g. name,
size, and fw_cfg key), as well as the raw binary blob data may be
accessed.

The SysFS access location is /sys/firmware/qemu_fw_cfg/... and was
selected based on overall similarity to the type of information
exposed under /sys/firmware/dmi/entries/...


NEW (since v2): Using ACPI to detect the presence and details of the
fw_cfg virtual hardware device.

    Device Tree has been suggested by Ard as a comment on v2 of this
    patch, but after some deliberation I decided to go with ACPI,
    since it's supported on both x86 and some (uefi-enabled) versions
    of aarch64. I really don't see how I'd reasonably use *both* DT (on
    ARM) *and* ACPI (on x86), and after all I'm mostly concerned with
    x86, but originally wanted to maximize portability (which is where
    the register probing in earlier versions came from).

    A patch set generating an ACPI device node for qemu's fw_cfg is
    currently under review on the qemu-devel list:

    http://lists.nongnu.org/archive/html/qemu-devel/2015-09/msg06946.html
    (sorry, gmane appears down at the moment...)

In consequence:

	- Patch 1/4 is mostly the same as in v2;
	- Patch 2/4 switches device initialization from register
	  probing to using ACPI; this is a separate patch only to
	  illustrate the transition from probing to ACPI, and I'm
	  assuming it will end up squashed on top of patch 1/4 in
	  the final version.

	- Patches 3/4 and 4/4 add a "human-readable" directory
	  hierarchy built from tokenizing fw_cfg blob names into
	  '/'-separated components, with symlinks to each 'by_key'
	  blob folder (same as in earlier versions). At Greg's
	  suggestion I tried to build this folder hierarchy and
	  leaf symlinks using udev rules, but so far I haven't been
	  successful in figuring that out. If udev turns out to 
	  be applicable after all, these two patches can be dropped
	  from this series.

In other words, patches 1 and 2 give us the following "by_key" listing
of blobs contained in the qemu fw_cfg device (example pulled from a PC
qemu guest running Fedora 22), with the value of each "name" attribute
shown on the right:

$ tree /sys/firmware/qemu_fw_cfg/
/sys/firmware/qemu_fw_cfg/
|-- by_key
|   |-- 32
|   |   |-- key
|   |   |-- name			("etc/boot-fail-wait")
|   |   |-- raw
|   |   `-- size
|   |-- 33
|   |   |-- key
|   |   |-- name			("etc/smbios/smbios-tables")
|   |   |-- raw
|   |   `-- size
|   |-- 34
|   |   |-- key
|   |   |-- name			("etc/smbios/smbios-anchor")
|   |   |-- raw
|   |   `-- size
|   |-- 35
|   |   |-- key
|   |   |-- name			("etc/e820")
|   |   |-- raw
|   |   `-- size
|   |-- 36
|   |   |-- key
|   |   |-- name			("genroms/kvmvapic.bin")
|   |   |-- raw
|   |   `-- size
|   |-- 37
|   |   |-- key
|   |   |-- name			("etc/system-states")
|   |   |-- raw
|   |   `-- size
|   |-- 38
|   |   |-- key
|   |   |-- name			("etc/acpi/tables")
|   |   |-- raw
|   |   `-- size
|   |-- 39
|   |   |-- key
|   |   |-- name			("etc/table-loader")
|   |   |-- raw
|   |   `-- size
|   |-- 40
|   |   |-- key
|   |   |-- name			("etc/tpm/log")
|   |   |-- raw
|   |   `-- size
|   |-- 41
|   |   |-- key
|   |   |-- name			("etc/acpi/rsdp")
|   |   |-- raw
|   |   `-- size
|   `-- 42
|       |-- key
|       |-- name			("bootorder")
|       |-- raw
|       `-- size
|
...

Additionally, patches 3 and 4 (mostly 4) give us the following
"user friendly" directory hierarchy as a complement to the above,
based on tokenizing each blob name into symlink-tipped (sub)directories:

...
|-- by_name
|   |-- bootorder -> ../by_key/42
|   |-- etc
|   |   |-- acpi
|   |   |   |-- rsdp -> ../../../by_key/41
|   |   |   `-- tables -> ../../../by_key/38
|   |   |-- boot-fail-wait -> ../../by_key/32
|   |   |-- e820 -> ../../by_key/35
|   |   |-- smbios
|   |   |   |-- smbios-anchor -> ../../../by_key/34
|   |   |   `-- smbios-tables -> ../../../by_key/33
|   |   |-- system-states -> ../../by_key/37
|   |   |-- table-loader -> ../../by_key/39
|   |   `-- tpm
|   |       `-- log -> ../../../by_key/40
|   `-- genroms
|       `-- kvmvapic.bin -> ../../by_key/36
`-- rev

The trick is to figure out how to replace patches 3 and 4 with a
udev rule that would read the contents of each "name" attribute,
and build the "by_name" hierarchy and symlinks in userspace.

I tried:

$ udevadm info -a -p /sys/firmware/qemu_fw_cfg/by_key/33

  looking at device '/firmware/qemu_fw_cfg/by_key/33':
    KERNEL=="33"
    SUBSYSTEM==""
    DRIVER==""
    ATTR{key}=="33"
    ATTR{name}=="etc/smbios/smbios-tables"
    ATTR{size}=="388"

  looking at parent device '/firmware/qemu_fw_cfg/by_key':
    KERNELS=="by_key"
    SUBSYSTEMS==""
    DRIVERS==""

  looking at parent device '/firmware/qemu_fw_cfg':
    KERNELS=="qemu_fw_cfg"
    SUBSYSTEMS==""
    DRIVERS==""
    ATTRS{rev}=="1"

Then I tried creating a file, /usr/lib/udev/rules.d/99-qemu-fw-cfg.rules
containing the following line:

KERNELS=="qemu_fw_cfg", ATTRS{rev}=="1", SYMLINK="%p/%s{name}"

but NOTHING happens when I insert/remove qemu_fw_cfg.ko.  I also tried:

KERNELS=="qemu_fw_cfg", ATTRS{rev}=="1", PROGRAM="/foo %k"

where "/foo" basically did "echo $* > /tmp/bar", but no /tmp/bar file ever
showed up as a consequence of inserting/removing the qemu_fw_cfg.ko module.

At this point, I need help figuring out whether udev is really what would
get the second, user-friendly, "by_name" /sysfs directory tree created,
and how I'd go about that...

Thanks much,
  --Gabriel

Gabriel Somlo (4):
  firmware: introduce sysfs driver for QEMU's fw_cfg device
  firmware: use acpi to detect QEMU fw_cfg device for sysfs fw_cfg
    driver
  kobject: export kset_find_obj() for module use
  firmware: create directory hierarchy for sysfs fw_cfg entries

 .../ABI/testing/sysfs-firmware-qemu_fw_cfg         | 213 ++++++++
 drivers/firmware/Kconfig                           |  10 +
 drivers/firmware/Makefile                          |   1 +
 drivers/firmware/qemu_fw_cfg.c                     | 575 +++++++++++++++++++++
 lib/kobject.c                                      |   1 +
 5 files changed, 800 insertions(+)
 create mode 100644 Documentation/ABI/testing/sysfs-firmware-qemu_fw_cfg
 create mode 100644 drivers/firmware/qemu_fw_cfg.c

-- 
2.4.3


^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH v3 1/4] firmware: introduce sysfs driver for QEMU's fw_cfg device
  2015-10-03 23:28 [PATCH v3 0/4] SysFS driver for QEMU fw_cfg device Gabriel L. Somlo
@ 2015-10-03 23:28 ` Gabriel L. Somlo
  2015-10-04  1:34   ` kbuild test robot
                     ` (3 more replies)
  2015-10-03 23:28 ` [PATCH v3 2/4] firmware: use acpi to detect QEMU fw_cfg device for sysfs fw_cfg driver Gabriel L. Somlo
                   ` (3 subsequent siblings)
  4 siblings, 4 replies; 26+ messages in thread
From: Gabriel L. Somlo @ 2015-10-03 23:28 UTC (permalink / raw)
  To: gregkh, paul, galak, will.deacon, agross, mark.rutland, zajec5,
	hanjun.guo, catalin.marinas, linux-api, linux-kernel
  Cc: kernelnewbies, matt.fleming, lersek, jordan.l.justen, mst,
	peter.maydell, leif.lindholm, ard.biesheuvel, pbonzini, kraxel,
	qemu-devel

From: Gabriel Somlo <somlo@cmu.edu>

Make fw_cfg entries of type "file" available via sysfs. Entries
are listed under /sys/firmware/qemu_fw_cfg/by_key, in folders
named after each entry's selector key. Filename, selector value,
and size read-only attributes are included for each entry. Also,
a "raw" attribute allows retrieval of the full binary content of
each entry.

This patch also provides a documentation file outlining the
guest-side "hardware" interface exposed by the QEMU fw_cfg device.

Signed-off-by: Gabriel Somlo <somlo@cmu.edu>
---
 .../ABI/testing/sysfs-firmware-qemu_fw_cfg         | 167 ++++++++
 drivers/firmware/Kconfig                           |  10 +
 drivers/firmware/Makefile                          |   1 +
 drivers/firmware/qemu_fw_cfg.c                     | 456 +++++++++++++++++++++
 4 files changed, 634 insertions(+)
 create mode 100644 Documentation/ABI/testing/sysfs-firmware-qemu_fw_cfg
 create mode 100644 drivers/firmware/qemu_fw_cfg.c

diff --git a/Documentation/ABI/testing/sysfs-firmware-qemu_fw_cfg b/Documentation/ABI/testing/sysfs-firmware-qemu_fw_cfg
new file mode 100644
index 0000000..f1ef44e
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-firmware-qemu_fw_cfg
@@ -0,0 +1,167 @@
+What:		/sys/firmware/qemu_fw_cfg/
+Date:		August 2015
+Contact:	Gabriel Somlo <somlo@cmu.edu>
+Description:
+		Several different architectures supported by QEMU (x86, arm,
+		sun4*, ppc/mac) are provisioned with a firmware configuration
+		(fw_cfg) device, used by the host to provide configuration data
+		to the starting guest. While most of this data is meant for use
+		by the guest firmware, starting with QEMU v2.4, guest VMs may
+		be given arbitrary fw_cfg entries supplied directly on the
+		command line, which therefore may be of interest to userspace.
+
+		=== Guest-side Hardware Interface ===
+
+		The fw_cfg device is available to guest VMs as a register pair
+		(control and data), accessible as either a IO ports or as MMIO
+		addresses, depending on the architecture.
+
+		--- Control Register ---
+
+		Width: 16-bit
+		Access: Write-Only
+		Endianness: LE (if IOport) or BE (if MMIO)
+
+		A write to the control register selects the index for one of
+		the firmware configuration items (or "blobs") available on the
+		fw_cfg device, which can subsequently be read from the data
+		register.
+
+		Each time the control register is written, an data offset
+		internal to the fw_cfg device will be set to zero. This data
+		offset impacts which portion of the selected fw_cfg blob is
+		accessed by reading the data register, as explained below.
+
+		--- Data Register ---
+
+		Width: 8-bit (if IOport), or 8/16/32/64-bit (if MMIO)
+		Access: Read-Only
+		Endianness: string preserving
+
+		The data register allows access to an array of bytes which
+		represent the fw_cfg blob last selected by a write to the
+		control register.
+
+		Immediately following a write to the control register, the data
+		offset will be set to zero. Each successful read access to the
+		data register will increment the data offset by the appropriate
+		access width.
+
+		Each fw_cfg blob has a maximum associated data length. Once the
+		data offset exceeds this maximum length, any subsequent reads
+		via the data register will return 0x00.
+
+		An N-byte wide read of the data register will return the next
+		available N bytes of the selected fw_cfg blob, as a substring,
+		in increasing address order, similar to memcpy(), zero-padded
+		if necessary should the maximum data length of the selected
+		item be reached, as described above.
+
+		--- Per-arch Register Details ---
+
+		-------------------------------------------------------------
+		arch	access	       base	ctrl	ctrl	data	max.
+			mode	    address	offset	endian	offset	data
+						(bytes)			(bytes)
+		-------------------------------------------------------------
+		x86*	IOport	      0x510	0	LE	1	1
+		arm	MMIO	  0x9020000	8	BE	0	8
+		sun4u	IOport	      0x510	0	LE	1	1
+		sun4m	MMIO	0xd00000510	0	BE	2	1
+		ppc/mac	MMIO	 0xf0000510	0	BE	2	1
+		-------------------------------------------------------------
+
+		NOTE 1. On platforms where the fw_cfg registers are exposed as
+		IO ports, the data port number will always be one greater than
+		the port number of the control register. I.e., the two ports
+		are overlapping, and can not be mapped separately.
+
+		=== Firmware Configuration Items of Interest ===
+
+		Originally, the index key, size, and formatting of blobs in
+		fw_cfg was hard coded by mutual agreement between QEMU on the
+		host side, and the guest-side firmware. Later on, a file
+		transfer interface was added: by reading a special blob, the
+		fw_cfg consumer can retrieve a list of records containing the
+		name, selector key, and size of further fw_cfg blobs made
+		available by the host. Below we describe three fw_cfg blobs
+		of interest to the sysfs driver.
+
+		--- Signature (Key 0x0000, FW_CFG_SIGNATURE) ---
+
+		The presence of the fw_cfg device can be verified by selecting
+		the signature blob by writing 0x0000 to the control register,
+		and reading four bytes from the data register. If the fw_cfg
+		device is present, the four bytes read will match the ASCII
+		characters "QEMU".
+
+		--- Revision (Key 0x0001, FW_CFG_ID) ---
+
+		A 32-bit little-endian unsigned integer, this item is used as
+		an interface revision number.
+
+		--- File Directory (Key 0x0019, FW_CFG_FILE_DIR) ---
+
+		Any fw_cfg blobs stored at key 0x0020 FW_CFG_FILE_FIRST() or
+		higher will have an associated entry in this "directory" blob,
+		which facilitates the discovery of available items by software
+		(e.g. BIOS) running on the guest. The format of the directory
+		blob is shown below.
+
+		NOTE: All integers are stored in big-endian format!
+
+		/* the entire file directory "blob" */
+		struct FWCfgFiles {
+			u32 count;		/* total number of entries */
+			struct FWCfgFile f[];	/* entry array, see below */
+		};
+
+		/* an individual directory entry, 64 bytes total */
+		struct FWCfgFile {
+			u32 size;	/* size of referenced blob */
+			u16 select;	/* selector key for referenced blob */
+			u16 reserved;
+			char name[56];	/* blob name, nul-terminated ASCII */
+		};
+
+		=== SysFS fw_cfg Interface ===
+
+		The fw_cfg sysfs interface described in this document is only
+		intended to display discoverable blobs (i.e., those registered
+		with the file directory), as there is no way to determine the
+		presence or size of "legacy" blobs (with selector keys between
+		0x0002 and 0x0018) programmatically.
+
+		All fw_cfg information is shown under:
+
+			/sys/firmware/qemu_fw_cfg/
+
+		The only legacy blob displayed is the fw_cfg device revision:
+
+			/sys/firmware/qemu_fw_cfg/rev
+
+		--- Discoverable fw_cfg blobs by selector key ---
+
+		All discoverable blobs listed in the fw_cfg file directory are
+		displayed as entries named after their unique selector key
+		value, e.g.:
+
+			/sys/firmware/qemu_fw_cfg/by_key/32
+			/sys/firmware/qemu_fw_cfg/by_key/33
+			/sys/firmware/qemu_fw_cfg/by_key/34
+			...
+
+		Each such fw_cfg sysfs entry has the following values exported
+		as attributes:
+
+		name  	: The 56-byte nul-terminated ASCII string used as the
+			  blob's 'file name' in the fw_cfg directory.
+		size  	: The length of the blob, as given in the fw_cfg
+			  directory.
+		key	: The value of the blob's selector key as given in the
+			  fw_cfg directory. This value is the same as used in
+			  the parent directory name.
+		raw	: The raw bytes of the blob, obtained by selecting the
+			  entry via the control register, and reading a number
+			  of bytes equal to the blob size from the data
+			  register.
diff --git a/drivers/firmware/Kconfig b/drivers/firmware/Kconfig
index 665efca..0466e80 100644
--- a/drivers/firmware/Kconfig
+++ b/drivers/firmware/Kconfig
@@ -135,6 +135,16 @@ config ISCSI_IBFT
 	  detect iSCSI boot parameters dynamically during system boot, say Y.
 	  Otherwise, say N.
 
+config FW_CFG_SYSFS
+	tristate "QEMU fw_cfg device support in sysfs"
+	depends on SYSFS
+	default n
+	help
+	  Say Y or M here to enable the exporting of the QEMU firmware
+	  configuration (fw_cfg) file entries via sysfs. Entries are
+	  found under /sys/firmware/fw_cfg when this option is enabled
+	  and loaded.
+
 config QCOM_SCM
 	bool
 	depends on ARM || ARM64
diff --git a/drivers/firmware/Makefile b/drivers/firmware/Makefile
index 2ee8347..efba22a 100644
--- a/drivers/firmware/Makefile
+++ b/drivers/firmware/Makefile
@@ -12,6 +12,7 @@ obj-$(CONFIG_DMIID)		+= dmi-id.o
 obj-$(CONFIG_ISCSI_IBFT_FIND)	+= iscsi_ibft_find.o
 obj-$(CONFIG_ISCSI_IBFT)	+= iscsi_ibft.o
 obj-$(CONFIG_FIRMWARE_MEMMAP)	+= memmap.o
+obj-$(CONFIG_FW_CFG_SYSFS)	+= qemu_fw_cfg.o
 obj-$(CONFIG_QCOM_SCM)		+= qcom_scm.o
 obj-$(CONFIG_QCOM_SCM_64)	+= qcom_scm-64.o
 obj-$(CONFIG_QCOM_SCM_32)	+= qcom_scm-32.o
diff --git a/drivers/firmware/qemu_fw_cfg.c b/drivers/firmware/qemu_fw_cfg.c
new file mode 100644
index 0000000..3a67a16
--- /dev/null
+++ b/drivers/firmware/qemu_fw_cfg.c
@@ -0,0 +1,456 @@
+/*
+ * drivers/firmware/qemu_fw_cfg.c
+ *
+ * Expose entries from QEMU's firmware configuration (fw_cfg) device in
+ * sysfs (read-only, under "/sys/firmware/qemu_fw_cfg/...").
+ *
+ * Copyright 2015 Carnegie Mellon University
+ */
+
+#include <linux/module.h>
+#include <linux/slab.h>
+#include <linux/io.h>
+#include <linux/ioport.h>
+
+MODULE_AUTHOR("Gabriel L. Somlo <somlo@cmu.edu>");
+MODULE_DESCRIPTION("QEMU fw_cfg sysfs support");
+MODULE_LICENSE("GPL");
+
+/* selector key values for "well-known" fw_cfg entries */
+#define FW_CFG_SIGNATURE  0x00
+#define FW_CFG_ID         0x01
+#define FW_CFG_FILE_DIR   0x19
+
+/* size in bytes of fw_cfg signature */
+#define FW_CFG_SIG_SIZE 4
+
+/* fw_cfg "file name" is up to 56 characters (including terminating nul) */
+#define FW_CFG_MAX_FILE_PATH 56
+
+/* fw_cfg file directory entry type */
+struct fw_cfg_file {
+	u32 size;
+	u16 select;
+	u16 reserved;
+	char name[FW_CFG_MAX_FILE_PATH];
+};
+
+/* fw_cfg device i/o access options type */
+struct fw_cfg_access {
+	const char *name;
+	phys_addr_t base;
+	u8 size;
+	u8 ctrl_offset;
+	u8 data_offset;
+	bool is_mmio;
+};
+
+/* table of fw_cfg device i/o access options for known architectures */
+static struct fw_cfg_access fw_cfg_modes[] = {
+	{
+		.name = "fw_cfg IOport on i386, sun4u",
+		.base = 0x510,
+		.size = 0x02,
+		.ctrl_offset = 0x00,
+		.data_offset = 0x01,
+		.is_mmio = false,
+	}, {
+		.name = "fw_cfg MMIO on arm",
+		.base = 0x9020000,
+		.size = 0x0a,
+		.ctrl_offset = 0x08,
+		.data_offset = 0x00,
+		.is_mmio = true,
+	}, {
+		.name = "fw_cfg MMIO on sun4m",
+		.base = 0xd00000510,
+		.size = 0x03,
+		.ctrl_offset = 0x00,
+		.data_offset = 0x02,
+		.is_mmio = true,
+	}, {
+		.name = "fw_cfg MMIO on ppc/mac",
+		.base = 0xf0000510,
+		.size = 0x03,
+		.ctrl_offset = 0x00,
+		.data_offset = 0x02,
+		.is_mmio = true,
+	}, { } /* END */
+};
+
+/* fw_cfg device i/o currently selected option set */
+static struct fw_cfg_access *fw_cfg_mode;
+
+/* fw_cfg device i/o register addresses */
+static void __iomem *fw_cfg_dev_base;
+static void __iomem *fw_cfg_reg_ctrl;
+static void __iomem *fw_cfg_reg_data;
+
+/* atomic access to fw_cfg device (potentially slow i/o, so using mutex) */
+static DEFINE_MUTEX(fw_cfg_dev_lock);
+
+/* pick appropriate endianness for selector key */
+static inline u16 fw_cfg_sel_endianness(u16 key)
+{
+	return fw_cfg_mode->is_mmio ? cpu_to_be16(key) : cpu_to_le16(key);
+}
+
+/* type for fw_cfg "directory scan" visitor/callback function */
+typedef int (*fw_cfg_file_callback)(const struct fw_cfg_file *f);
+
+/* run a given callback on each fw_cfg directory entry */
+static int fw_cfg_scan_dir(fw_cfg_file_callback callback)
+{
+	int ret = 0;
+	u32 count, i;
+	struct fw_cfg_file f;
+
+	mutex_lock(&fw_cfg_dev_lock);
+	iowrite16(fw_cfg_sel_endianness(FW_CFG_FILE_DIR), fw_cfg_reg_ctrl);
+	ioread8_rep(fw_cfg_reg_data, &count, sizeof(count));
+	for (i = 0; i < be32_to_cpu(count); i++) {
+		ioread8_rep(fw_cfg_reg_data, &f, sizeof(f));
+		ret = callback(&f);
+		if (ret)
+			break;
+	}
+	mutex_unlock(&fw_cfg_dev_lock);
+	return ret;
+}
+
+/* read chunk of given fw_cfg blob (caller responsible for sanity-check) */
+static inline void fw_cfg_read_blob(u16 key,
+				    void *buf, loff_t pos, size_t count)
+{
+	mutex_lock(&fw_cfg_dev_lock);
+	iowrite16(fw_cfg_sel_endianness(key), fw_cfg_reg_ctrl);
+	while (pos-- > 0)
+		ioread8(fw_cfg_reg_data);
+	ioread8_rep(fw_cfg_reg_data, buf, count);
+	mutex_unlock(&fw_cfg_dev_lock);
+}
+
+/* clean up fw_cfg device i/o */
+static void fw_cfg_io_cleanup(void)
+{
+	if (fw_cfg_mode->is_mmio) {
+		iounmap(fw_cfg_dev_base);
+		release_mem_region(fw_cfg_mode->base, fw_cfg_mode->size);
+	} else {
+		ioport_unmap(fw_cfg_dev_base);
+		release_region(fw_cfg_mode->base, fw_cfg_mode->size);
+	}
+}
+
+/* probe and map fw_cfg device */
+static int __init fw_cfg_io_probe(void)
+{
+	char sig[FW_CFG_SIG_SIZE];
+
+	for (fw_cfg_mode = &fw_cfg_modes[0];
+	     fw_cfg_mode->base; fw_cfg_mode++) {
+
+		phys_addr_t base = fw_cfg_mode->base;
+		u8 size = fw_cfg_mode->size;
+
+		/* reserve and map mmio or ioport region */
+		if (fw_cfg_mode->is_mmio) {
+			if (!request_mem_region(base, size, fw_cfg_mode->name))
+				continue;
+			fw_cfg_dev_base = ioremap(base, size);
+			if (!fw_cfg_dev_base) {
+				release_mem_region(base, size);
+				continue;
+			}
+		} else {
+			if (!request_region(base, size, fw_cfg_mode->name))
+				continue;
+			fw_cfg_dev_base = ioport_map(base, size);
+			if (!fw_cfg_dev_base) {
+				release_region(base, size);
+				continue;
+			}
+		}
+
+		/* set control and data register addresses */
+		fw_cfg_reg_ctrl = fw_cfg_dev_base + fw_cfg_mode->ctrl_offset;
+		fw_cfg_reg_data = fw_cfg_dev_base + fw_cfg_mode->data_offset;
+
+		/* verify fw_cfg device signature */
+		fw_cfg_read_blob(FW_CFG_SIGNATURE, sig, 0, FW_CFG_SIG_SIZE);
+		if (memcmp(sig, "QEMU", FW_CFG_SIG_SIZE) == 0)
+			/* success, we're done */
+			return 0;
+
+		/* clean up before probing next access mode */
+		fw_cfg_io_cleanup();
+	}
+
+	return -ENODEV;
+}
+
+/* fw_cfg revision attribute, in /sys/firmware/qemu_fw_cfg top-level dir. */
+static u32 fw_cfg_rev;
+
+static ssize_t fw_cfg_showrev(struct kobject *k, struct attribute *a, char *buf)
+{
+	return sprintf(buf, "%u\n", fw_cfg_rev);
+}
+
+static const struct {
+	struct attribute attr;
+	ssize_t (*show)(struct kobject *k, struct attribute *a, char *buf);
+} fw_cfg_rev_attr = {
+	.attr = { .name = "rev", .mode = S_IRUSR },
+	.show = fw_cfg_showrev,
+};
+
+/* fw_cfg_sysfs_entry type */
+struct fw_cfg_sysfs_entry {
+	struct kobject kobj;
+	struct fw_cfg_file f;
+	struct list_head list;
+};
+
+/* get fw_cfg_sysfs_entry from kobject member */
+static inline struct fw_cfg_sysfs_entry *to_entry(struct kobject *kobj)
+{
+	return container_of(kobj, struct fw_cfg_sysfs_entry, kobj);
+}
+
+/* fw_cfg_sysfs_attribute type */
+struct fw_cfg_sysfs_attribute {
+	struct attribute attr;
+	ssize_t (*show)(struct fw_cfg_sysfs_entry *entry, char *buf);
+};
+
+/* get fw_cfg_sysfs_attribute from attribute member */
+static inline struct fw_cfg_sysfs_attribute *to_attr(struct attribute *attr)
+{
+	return container_of(attr, struct fw_cfg_sysfs_attribute, attr);
+}
+
+/* global cache of fw_cfg_sysfs_entry objects */
+static LIST_HEAD(fw_cfg_entry_cache);
+
+/* kobjects removed lazily by kernel, mutual exclusion needed */
+static DEFINE_SPINLOCK(fw_cfg_cache_lock);
+
+static inline void fw_cfg_sysfs_cache_enlist(struct fw_cfg_sysfs_entry *entry)
+{
+	spin_lock(&fw_cfg_cache_lock);
+	list_add_tail(&entry->list, &fw_cfg_entry_cache);
+	spin_unlock(&fw_cfg_cache_lock);
+}
+
+static inline void fw_cfg_sysfs_cache_delist(struct fw_cfg_sysfs_entry *entry)
+{
+	spin_lock(&fw_cfg_cache_lock);
+	list_del(&entry->list);
+	spin_unlock(&fw_cfg_cache_lock);
+}
+
+static void fw_cfg_sysfs_cache_cleanup(void)
+{
+	struct fw_cfg_sysfs_entry *entry, *next;
+
+	list_for_each_entry_safe(entry, next, &fw_cfg_entry_cache, list) {
+		/* will end up invoking fw_cfg_sysfs_cache_delist()
+		 * via each object's release() method (i.e. destructor)
+		 */
+		kobject_put(&entry->kobj);
+	}
+}
+
+/* default_attrs: per-entry attributes and show methods */
+
+#define FW_CFG_SYSFS_ATTR(_attr) \
+struct fw_cfg_sysfs_attribute fw_cfg_sysfs_attr_##_attr = { \
+	.attr = { .name = __stringify(_attr), .mode = S_IRUSR }, \
+	.show = fw_cfg_sysfs_show_##_attr, \
+}
+
+static ssize_t fw_cfg_sysfs_show_size(struct fw_cfg_sysfs_entry *e, char *buf)
+{
+	return sprintf(buf, "%u\n", e->f.size);
+}
+
+static ssize_t fw_cfg_sysfs_show_key(struct fw_cfg_sysfs_entry *e, char *buf)
+{
+	return sprintf(buf, "%u\n", e->f.select);
+}
+
+static ssize_t fw_cfg_sysfs_show_name(struct fw_cfg_sysfs_entry *e, char *buf)
+{
+	return sprintf(buf, "%s\n", e->f.name);
+}
+
+static FW_CFG_SYSFS_ATTR(size);
+static FW_CFG_SYSFS_ATTR(key);
+static FW_CFG_SYSFS_ATTR(name);
+
+static struct attribute *fw_cfg_sysfs_entry_attrs[] = {
+	&fw_cfg_sysfs_attr_size.attr,
+	&fw_cfg_sysfs_attr_key.attr,
+	&fw_cfg_sysfs_attr_name.attr,
+	NULL,
+};
+
+/* sysfs_ops: find fw_cfg_[entry, attribute] and call appropriate show method */
+static ssize_t fw_cfg_sysfs_attr_show(struct kobject *kobj, struct attribute *a,
+				      char *buf)
+{
+	struct fw_cfg_sysfs_entry *entry = to_entry(kobj);
+	struct fw_cfg_sysfs_attribute *attr = to_attr(a);
+
+	return attr->show(entry, buf);
+}
+
+static const struct sysfs_ops fw_cfg_sysfs_attr_ops = {
+	.show = fw_cfg_sysfs_attr_show,
+};
+
+/* release: destructor, to be called via kobject_put() */
+static void fw_cfg_sysfs_release_entry(struct kobject *kobj)
+{
+	struct fw_cfg_sysfs_entry *entry = to_entry(kobj);
+
+	fw_cfg_sysfs_cache_delist(entry);
+	kfree(entry);
+}
+
+/* kobj_type: ties together all properties required to register an entry */
+static struct kobj_type fw_cfg_sysfs_entry_ktype = {
+	.default_attrs = fw_cfg_sysfs_entry_attrs,
+	.sysfs_ops = &fw_cfg_sysfs_attr_ops,
+	.release = fw_cfg_sysfs_release_entry,
+};
+
+/* raw-read method and attribute */
+static ssize_t fw_cfg_sysfs_read_raw(struct file *filp, struct kobject *kobj,
+				     struct bin_attribute *bin_attr,
+				     char *buf, loff_t pos, size_t count)
+{
+	struct fw_cfg_sysfs_entry *entry = to_entry(kobj);
+
+	if (pos > entry->f.size)
+		return -EINVAL;
+
+	if (count > entry->f.size - pos)
+		count = entry->f.size - pos;
+
+	fw_cfg_read_blob(entry->f.select, buf, pos, count);
+	return count;
+}
+
+static struct bin_attribute fw_cfg_sysfs_attr_raw = {
+	.attr = { .name = "raw", .mode = 0400 },
+	.read = fw_cfg_sysfs_read_raw,
+};
+
+/* kobjects & kset representing top-level, by_key, and by_name folders */
+static struct kobject *fw_cfg_top_ko;
+static struct kobject *fw_cfg_sel_ko;
+
+/* callback function to register an individual fw_cfg file */
+static int __init fw_cfg_register_file(const struct fw_cfg_file *f)
+{
+	int err;
+	struct fw_cfg_sysfs_entry *entry;
+
+	/* allocate new entry */
+	entry = kzalloc(sizeof(*entry), GFP_KERNEL);
+	if (!entry)
+		return -ENOMEM;
+
+	/* set file entry information */
+	entry->f.size = be32_to_cpu(f->size);
+	entry->f.select = be16_to_cpu(f->select);
+	strcpy(entry->f.name, f->name);
+
+	/* register entry under "/sys/firmware/qemu_fw_cfg/by_key/" */
+	err = kobject_init_and_add(&entry->kobj, &fw_cfg_sysfs_entry_ktype,
+				   fw_cfg_sel_ko, "%d", entry->f.select);
+	if (err)
+		goto err_register;
+
+	/* add raw binary content access */
+	err = sysfs_create_bin_file(&entry->kobj, &fw_cfg_sysfs_attr_raw);
+	if (err)
+		goto err_add_raw;
+
+	/* success, add entry to global cache */
+	fw_cfg_sysfs_cache_enlist(entry);
+	return 0;
+
+err_add_raw:
+	kobject_del(&entry->kobj);
+err_register:
+	kfree(entry);
+	return err;
+}
+
+/* unregister top-level or by_key folder */
+static inline void fw_cfg_kobj_cleanup(struct kobject *kobj)
+{
+	kobject_del(kobj);
+	kobject_put(kobj);
+}
+
+static int __init fw_cfg_sysfs_init(void)
+{
+	int err;
+
+	/* probe for the fw_cfg "hardware" */
+	err = fw_cfg_io_probe();
+	if (err)
+		return err;
+
+	/* create /sys/firmware/qemu_fw_cfg/ and its subdirectories */
+	err = -ENOMEM;
+	fw_cfg_top_ko = kobject_create_and_add("qemu_fw_cfg", firmware_kobj);
+	if (!fw_cfg_top_ko)
+		goto err_top;
+	fw_cfg_sel_ko = kobject_create_and_add("by_key", fw_cfg_top_ko);
+	if (!fw_cfg_sel_ko)
+		goto err_sel;
+
+	/* get revision number, add matching top-level attribute */
+	fw_cfg_read_blob(FW_CFG_ID, &fw_cfg_rev, 0, sizeof(fw_cfg_rev));
+	fw_cfg_rev = le32_to_cpu(fw_cfg_rev);
+	err = sysfs_create_file(fw_cfg_top_ko, &fw_cfg_rev_attr.attr);
+	if (err)
+		goto err_rev;
+
+	/* process fw_cfg file directory entry, registering each file */
+	err = fw_cfg_scan_dir(fw_cfg_register_file);
+	if (err)
+		goto err_scan;
+
+	/* success */
+	pr_debug("fw_cfg: loaded.\n");
+	return 0;
+
+err_scan:
+	fw_cfg_sysfs_cache_cleanup();
+	sysfs_remove_file(fw_cfg_top_ko, &fw_cfg_rev_attr.attr);
+err_rev:
+	fw_cfg_kobj_cleanup(fw_cfg_sel_ko);
+err_sel:
+	fw_cfg_kobj_cleanup(fw_cfg_top_ko);
+err_top:
+	fw_cfg_io_cleanup();
+	return err;
+}
+
+static void __exit fw_cfg_sysfs_exit(void)
+{
+	pr_debug("fw_cfg: unloading.\n");
+	fw_cfg_sysfs_cache_cleanup();
+	fw_cfg_kobj_cleanup(fw_cfg_sel_ko);
+	fw_cfg_kobj_cleanup(fw_cfg_top_ko);
+	fw_cfg_io_cleanup();
+}
+
+module_init(fw_cfg_sysfs_init);
+module_exit(fw_cfg_sysfs_exit);
-- 
2.4.3


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH v3 2/4] firmware: use acpi to detect QEMU fw_cfg device for sysfs fw_cfg driver
  2015-10-03 23:28 [PATCH v3 0/4] SysFS driver for QEMU fw_cfg device Gabriel L. Somlo
  2015-10-03 23:28 ` [PATCH v3 1/4] firmware: introduce sysfs driver for QEMU's " Gabriel L. Somlo
@ 2015-10-03 23:28 ` Gabriel L. Somlo
  2015-10-04  7:54   ` Michael S. Tsirkin
  2015-10-03 23:28 ` [PATCH v3 3/4] kobject: export kset_find_obj() for module use Gabriel L. Somlo
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 26+ messages in thread
From: Gabriel L. Somlo @ 2015-10-03 23:28 UTC (permalink / raw)
  To: gregkh, paul, galak, will.deacon, agross, mark.rutland, zajec5,
	hanjun.guo, catalin.marinas, linux-api, linux-kernel
  Cc: kernelnewbies, matt.fleming, lersek, jordan.l.justen, mst,
	peter.maydell, leif.lindholm, ard.biesheuvel, pbonzini, kraxel,
	qemu-devel

From: Gabriel Somlo <somlo@cmu.edu>

Instead of blindly probing fw_cfg registers at known IOport and MMIO
locations, use the ACPI subsystem to determine whether a QEMU fw_cfg
device is present, and, if found, to initialize it.

This limits portability to architectures which support ACPI (x86 and
UEFI-enabled aarch64), but avoids touching hardware registers before
being certain that our device is present.

NOTE: The standard way to verify the presence of fw_cfg on arm VMs
would have been to use the device tree, but that would have left out
x86, which is the primary architecture targeted by this patch.

Signed-off-by: Gabriel Somlo <somlo@cmu.edu>
---
 .../ABI/testing/sysfs-firmware-qemu_fw_cfg         |   4 +
 drivers/firmware/Kconfig                           |   2 +-
 drivers/firmware/qemu_fw_cfg.c                     | 201 +++++++++++----------
 3 files changed, 113 insertions(+), 94 deletions(-)

diff --git a/Documentation/ABI/testing/sysfs-firmware-qemu_fw_cfg b/Documentation/ABI/testing/sysfs-firmware-qemu_fw_cfg
index f1ef44e..e9761bf 100644
--- a/Documentation/ABI/testing/sysfs-firmware-qemu_fw_cfg
+++ b/Documentation/ABI/testing/sysfs-firmware-qemu_fw_cfg
@@ -76,6 +76,10 @@ Description:
 		the port number of the control register. I.e., the two ports
 		are overlapping, and can not be mapped separately.
 
+		NOTE 2. QEMU publishes the register details in the device tree
+		on arm guests, and in ACPI (under _HID "QEMU0002") on x86 and
+		select arm (aarch64) VM types.
+
 		=== Firmware Configuration Items of Interest ===
 
 		Originally, the index key, size, and formatting of blobs in
diff --git a/drivers/firmware/Kconfig b/drivers/firmware/Kconfig
index 0466e80..bc12d31 100644
--- a/drivers/firmware/Kconfig
+++ b/drivers/firmware/Kconfig
@@ -137,7 +137,7 @@ config ISCSI_IBFT
 
 config FW_CFG_SYSFS
 	tristate "QEMU fw_cfg device support in sysfs"
-	depends on SYSFS
+	depends on SYSFS && ACPI
 	default n
 	help
 	  Say Y or M here to enable the exporting of the QEMU firmware
diff --git a/drivers/firmware/qemu_fw_cfg.c b/drivers/firmware/qemu_fw_cfg.c
index 3a67a16..f935afb 100644
--- a/drivers/firmware/qemu_fw_cfg.c
+++ b/drivers/firmware/qemu_fw_cfg.c
@@ -8,6 +8,7 @@
  */
 
 #include <linux/module.h>
+#include <linux/acpi.h>
 #include <linux/slab.h>
 #include <linux/io.h>
 #include <linux/ioport.h>
@@ -35,53 +36,10 @@ struct fw_cfg_file {
 	char name[FW_CFG_MAX_FILE_PATH];
 };
 
-/* fw_cfg device i/o access options type */
-struct fw_cfg_access {
-	const char *name;
-	phys_addr_t base;
-	u8 size;
-	u8 ctrl_offset;
-	u8 data_offset;
-	bool is_mmio;
-};
-
-/* table of fw_cfg device i/o access options for known architectures */
-static struct fw_cfg_access fw_cfg_modes[] = {
-	{
-		.name = "fw_cfg IOport on i386, sun4u",
-		.base = 0x510,
-		.size = 0x02,
-		.ctrl_offset = 0x00,
-		.data_offset = 0x01,
-		.is_mmio = false,
-	}, {
-		.name = "fw_cfg MMIO on arm",
-		.base = 0x9020000,
-		.size = 0x0a,
-		.ctrl_offset = 0x08,
-		.data_offset = 0x00,
-		.is_mmio = true,
-	}, {
-		.name = "fw_cfg MMIO on sun4m",
-		.base = 0xd00000510,
-		.size = 0x03,
-		.ctrl_offset = 0x00,
-		.data_offset = 0x02,
-		.is_mmio = true,
-	}, {
-		.name = "fw_cfg MMIO on ppc/mac",
-		.base = 0xf0000510,
-		.size = 0x03,
-		.ctrl_offset = 0x00,
-		.data_offset = 0x02,
-		.is_mmio = true,
-	}, { } /* END */
-};
-
-/* fw_cfg device i/o currently selected option set */
-static struct fw_cfg_access *fw_cfg_mode;
-
 /* fw_cfg device i/o register addresses */
+static bool fw_cfg_is_mmio;
+static phys_addr_t fw_cfg_phys_base;
+static u32 fw_cfg_phys_size;
 static void __iomem *fw_cfg_dev_base;
 static void __iomem *fw_cfg_reg_ctrl;
 static void __iomem *fw_cfg_reg_data;
@@ -92,7 +50,7 @@ static DEFINE_MUTEX(fw_cfg_dev_lock);
 /* pick appropriate endianness for selector key */
 static inline u16 fw_cfg_sel_endianness(u16 key)
 {
-	return fw_cfg_mode->is_mmio ? cpu_to_be16(key) : cpu_to_le16(key);
+	return fw_cfg_is_mmio ? cpu_to_be16(key) : cpu_to_le16(key);
 }
 
 /* type for fw_cfg "directory scan" visitor/callback function */
@@ -133,60 +91,100 @@ static inline void fw_cfg_read_blob(u16 key,
 /* clean up fw_cfg device i/o */
 static void fw_cfg_io_cleanup(void)
 {
-	if (fw_cfg_mode->is_mmio) {
+	if (fw_cfg_is_mmio) {
 		iounmap(fw_cfg_dev_base);
-		release_mem_region(fw_cfg_mode->base, fw_cfg_mode->size);
+		release_mem_region(fw_cfg_phys_base, fw_cfg_phys_size);
 	} else {
 		ioport_unmap(fw_cfg_dev_base);
-		release_region(fw_cfg_mode->base, fw_cfg_mode->size);
+		release_region(fw_cfg_phys_base, fw_cfg_phys_size);
 	}
 }
 
-/* probe and map fw_cfg device */
-static int __init fw_cfg_io_probe(void)
+/* configure fw_cfg device i/o from ACPI _CRS method */
+static acpi_status fw_cfg_walk_crs(struct acpi_resource *r, void *context)
+{
+	struct acpi_resource_io *io;
+	struct acpi_resource_fixed_memory32 *mmio;
+
+	switch (r->type) {
+	case ACPI_RESOURCE_TYPE_END_TAG:
+		return AE_OK;
+	case ACPI_RESOURCE_TYPE_IO:
+		io = &r->data.io;
+		/* physical base addr should NOT be already set */
+		if (fw_cfg_phys_base)
+			return AE_ERROR;
+		if (!request_region(io->minimum,
+				    io->address_length, "fw_cfg_io"))
+			return AE_ERROR;
+		fw_cfg_dev_base = ioport_map(io->minimum, io->address_length);
+		if (!fw_cfg_dev_base) {
+			release_region(io->minimum, io->address_length);
+			return AE_ERROR;
+		}
+		fw_cfg_phys_base = io->minimum;
+		fw_cfg_phys_size = io->address_length;
+		fw_cfg_is_mmio = false;
+		/* set register addresses (pc/i386 offsets) */
+		fw_cfg_reg_ctrl = fw_cfg_dev_base + 0x00;
+		fw_cfg_reg_data = fw_cfg_dev_base + 0x01;
+		return AE_OK;
+	case ACPI_RESOURCE_TYPE_FIXED_MEMORY32:
+		mmio = &r->data.fixed_memory32;
+		/* physical base addr should NOT be already set */
+		if (fw_cfg_phys_base)
+			return AE_ERROR;
+		/* MMIO and ACPI, but not on ARM ?!?! */
+		if (mmio->address_length < 0x0a)
+			return AE_ERROR;
+		if (!request_mem_region(mmio->address,
+					mmio->address_length, "fw_cfg_mem"))
+			return AE_ERROR;
+		fw_cfg_dev_base = ioremap(mmio->address, mmio->address_length);
+		if (!fw_cfg_dev_base) {
+			release_mem_region(mmio->address, mmio->address_length);
+			return AE_ERROR;
+		}
+		fw_cfg_phys_base = mmio->address;
+		fw_cfg_phys_size = mmio->address_length;
+		fw_cfg_is_mmio = true;
+		/* set register addresses (arm offsets) */
+		fw_cfg_reg_ctrl = fw_cfg_dev_base + 0x08;
+		fw_cfg_reg_data = fw_cfg_dev_base + 0x00;
+		return AE_OK;
+	default:
+		return AE_ERROR;
+	}
+}
+
+/* initialize fw_cfg device i/o from ACPI data */
+static int fw_cfg_acpi_init(struct acpi_device *dev)
 {
 	char sig[FW_CFG_SIG_SIZE];
+	acpi_status status;
+	int err;
 
-	for (fw_cfg_mode = &fw_cfg_modes[0];
-	     fw_cfg_mode->base; fw_cfg_mode++) {
-
-		phys_addr_t base = fw_cfg_mode->base;
-		u8 size = fw_cfg_mode->size;
-
-		/* reserve and map mmio or ioport region */
-		if (fw_cfg_mode->is_mmio) {
-			if (!request_mem_region(base, size, fw_cfg_mode->name))
-				continue;
-			fw_cfg_dev_base = ioremap(base, size);
-			if (!fw_cfg_dev_base) {
-				release_mem_region(base, size);
-				continue;
-			}
-		} else {
-			if (!request_region(base, size, fw_cfg_mode->name))
-				continue;
-			fw_cfg_dev_base = ioport_map(base, size);
-			if (!fw_cfg_dev_base) {
-				release_region(base, size);
-				continue;
-			}
-		}
+	err = acpi_bus_get_status(dev);
+	if (err < 0)
+		return err;
 
-		/* set control and data register addresses */
-		fw_cfg_reg_ctrl = fw_cfg_dev_base + fw_cfg_mode->ctrl_offset;
-		fw_cfg_reg_data = fw_cfg_dev_base + fw_cfg_mode->data_offset;
+	if (!(dev->status.enabled && dev->status.functional))
+		return -ENODEV;
 
-		/* verify fw_cfg device signature */
-		fw_cfg_read_blob(FW_CFG_SIGNATURE, sig, 0, FW_CFG_SIG_SIZE);
-		if (memcmp(sig, "QEMU", FW_CFG_SIG_SIZE) == 0)
-			/* success, we're done */
-			return 0;
+	/* extract device i/o details from _CRS */
+	status = acpi_walk_resources(dev->handle, METHOD_NAME__CRS,
+				     fw_cfg_walk_crs, NULL);
+	if (status != AE_OK || !fw_cfg_phys_base)
+		return -ENODEV;
 
-		/* clean up before probing next access mode */
+	/* verify fw_cfg device signature */
+	fw_cfg_read_blob(FW_CFG_SIGNATURE, sig, 0, FW_CFG_SIG_SIZE);
+	if (memcmp(sig, "QEMU", FW_CFG_SIG_SIZE) != 0) {
 		fw_cfg_io_cleanup();
+		return -ENODEV;
 	}
 
-	return -ENODEV;
+	return 0;
 }
 
 /* fw_cfg revision attribute, in /sys/firmware/qemu_fw_cfg top-level dir. */
@@ -353,7 +351,7 @@ static struct kobject *fw_cfg_top_ko;
 static struct kobject *fw_cfg_sel_ko;
 
 /* callback function to register an individual fw_cfg file */
-static int __init fw_cfg_register_file(const struct fw_cfg_file *f)
+static int fw_cfg_register_file(const struct fw_cfg_file *f)
 {
 	int err;
 	struct fw_cfg_sysfs_entry *entry;
@@ -397,12 +395,12 @@ static inline void fw_cfg_kobj_cleanup(struct kobject *kobj)
 	kobject_put(kobj);
 }
 
-static int __init fw_cfg_sysfs_init(void)
+static int fw_cfg_sysfs_add(struct acpi_device *dev)
 {
 	int err;
 
-	/* probe for the fw_cfg "hardware" */
-	err = fw_cfg_io_probe();
+	/* initialize fw_cfg device i/o from ACPI data */
+	err = fw_cfg_acpi_init(dev);
 	if (err)
 		return err;
 
@@ -443,14 +441,31 @@ err_top:
 	return err;
 }
 
-static void __exit fw_cfg_sysfs_exit(void)
+static int fw_cfg_sysfs_remove(struct acpi_device *dev)
 {
 	pr_debug("fw_cfg: unloading.\n");
 	fw_cfg_sysfs_cache_cleanup();
 	fw_cfg_kobj_cleanup(fw_cfg_sel_ko);
 	fw_cfg_kobj_cleanup(fw_cfg_top_ko);
 	fw_cfg_io_cleanup();
+	return 0;
 }
 
-module_init(fw_cfg_sysfs_init);
-module_exit(fw_cfg_sysfs_exit);
+static const struct acpi_device_id fw_cfg_sysfs_device_ids[] = {
+	{ "QEMU0002", 0 },
+	{ "", 0 },
+};
+MODULE_DEVICE_TABLE(acpi, fw_cfg_sysfs_device_ids);
+
+static struct acpi_driver fw_cfg_sysfs_driver = {
+	.name =		"fw_cfg",
+	.class =	"QEMU",
+	.ids =		fw_cfg_sysfs_device_ids,
+	.ops =		{
+				.add =		fw_cfg_sysfs_add,
+				.remove =	fw_cfg_sysfs_remove,
+			},
+	.owner =	THIS_MODULE,
+};
+
+module_acpi_driver(fw_cfg_sysfs_driver);
-- 
2.4.3


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH v3 3/4] kobject: export kset_find_obj() for module use
  2015-10-03 23:28 [PATCH v3 0/4] SysFS driver for QEMU fw_cfg device Gabriel L. Somlo
  2015-10-03 23:28 ` [PATCH v3 1/4] firmware: introduce sysfs driver for QEMU's " Gabriel L. Somlo
  2015-10-03 23:28 ` [PATCH v3 2/4] firmware: use acpi to detect QEMU fw_cfg device for sysfs fw_cfg driver Gabriel L. Somlo
@ 2015-10-03 23:28 ` Gabriel L. Somlo
  2015-10-03 23:28 ` [PATCH v3 4/4] firmware: create directory hierarchy for sysfs fw_cfg entries Gabriel L. Somlo
  2015-10-05 10:00 ` [PATCH v3 0/4] SysFS driver for QEMU fw_cfg device Mark Rutland
  4 siblings, 0 replies; 26+ messages in thread
From: Gabriel L. Somlo @ 2015-10-03 23:28 UTC (permalink / raw)
  To: gregkh, paul, galak, will.deacon, agross, mark.rutland, zajec5,
	hanjun.guo, catalin.marinas, linux-api, linux-kernel
  Cc: kernelnewbies, matt.fleming, lersek, jordan.l.justen, mst,
	peter.maydell, leif.lindholm, ard.biesheuvel, pbonzini, kraxel,
	qemu-devel

From: Gabriel Somlo <somlo@cmu.edu>

Signed-off-by: Gabriel Somlo <somlo@cmu.edu>
---
 lib/kobject.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/lib/kobject.c b/lib/kobject.c
index 3e3a5c3..bea2c9b 100644
--- a/lib/kobject.c
+++ b/lib/kobject.c
@@ -842,6 +842,7 @@ struct kobject *kset_find_obj(struct kset *kset, const char *name)
 	spin_unlock(&kset->list_lock);
 	return ret;
 }
+EXPORT_SYMBOL(kset_find_obj);
 
 static void kset_release(struct kobject *kobj)
 {
-- 
2.4.3


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH v3 4/4] firmware: create directory hierarchy for sysfs fw_cfg entries
  2015-10-03 23:28 [PATCH v3 0/4] SysFS driver for QEMU fw_cfg device Gabriel L. Somlo
                   ` (2 preceding siblings ...)
  2015-10-03 23:28 ` [PATCH v3 3/4] kobject: export kset_find_obj() for module use Gabriel L. Somlo
@ 2015-10-03 23:28 ` Gabriel L. Somlo
  2015-10-05 10:00 ` [PATCH v3 0/4] SysFS driver for QEMU fw_cfg device Mark Rutland
  4 siblings, 0 replies; 26+ messages in thread
From: Gabriel L. Somlo @ 2015-10-03 23:28 UTC (permalink / raw)
  To: gregkh, paul, galak, will.deacon, agross, mark.rutland, zajec5,
	hanjun.guo, catalin.marinas, linux-api, linux-kernel
  Cc: kernelnewbies, matt.fleming, lersek, jordan.l.justen, mst,
	peter.maydell, leif.lindholm, ard.biesheuvel, pbonzini, kraxel,
	qemu-devel

From: Gabriel Somlo <somlo@cmu.edu>

Each fw_cfg entry of type "file" has an associated 56-char,
nul-terminated ASCII string which represents its name. While
the fw_cfg device doesn't itself impose any specific naming
convention, QEMU developers have traditionally used path name
semantics (i.e. "etc/acpi/rsdp") to descriptively name the
various fw_cfg "blobs" passed into the guest.

This patch attempts, on a best effort basis, to create a
directory hierarchy representing the content of fw_cfg file
names, under /sys/firmware/qemu_fw_cfg/by_name.

Upon successful creation of all directories representing the
"dirname" portion of a fw_cfg file, a symlink will be created
to represent the "basename", pointing at the appropriate
/sys/firmware/qemu_fw_cfg/by_key entry. If a file name is not
suitable for this procedure (e.g., if its basename or dirname
components collide with an already existing dirname component
or basename, respectively) the corresponding fw_cfg blob is
skipped and will remain available in sysfs only by its selector
key value.

Signed-off-by: Gabriel Somlo <somlo@cmu.edu>
---
 .../ABI/testing/sysfs-firmware-qemu_fw_cfg         |  42 +++++++++
 drivers/firmware/qemu_fw_cfg.c                     | 104 +++++++++++++++++++++
 2 files changed, 146 insertions(+)

diff --git a/Documentation/ABI/testing/sysfs-firmware-qemu_fw_cfg b/Documentation/ABI/testing/sysfs-firmware-qemu_fw_cfg
index e9761bf..2e3eaba 100644
--- a/Documentation/ABI/testing/sysfs-firmware-qemu_fw_cfg
+++ b/Documentation/ABI/testing/sysfs-firmware-qemu_fw_cfg
@@ -169,3 +169,45 @@ Description:
 			  entry via the control register, and reading a number
 			  of bytes equal to the blob size from the data
 			  register.
+
+		--- Listing fw_cfg blobs by file name ---
+
+		While the fw_cfg device does not impose any specific naming
+		convention on the blobs registered in the file directory,
+		QEMU developers have traditionally used path name semantics
+		to give each blob a descriptive name. For example:
+
+			"bootorder"
+			"genroms/kvmvapic.bin"
+			"etc/e820"
+			"etc/boot-fail-wait"
+			"etc/system-states"
+			"etc/table-loader"
+			"etc/acpi/rsdp"
+			"etc/acpi/tables"
+			"etc/smbios/smbios-tables"
+			"etc/smbios/smbios-anchor"
+			...
+
+		In addition to the listing by unique selector key described
+		above, the fw_cfg sysfs driver also attempts to build a tree
+		of directories matching the path name components of fw_cfg
+		blob names, ending in symlinks to the by_key entry for each
+		"basename", as illustrated below (assume current directory is
+		/sys/firmware):
+
+		    qemu_fw_cfg/by_name/bootorder -> ../by_key/38
+		    qemu_fw_cfg/by_name/etc/e820 -> ../../by_key/35
+		    qemu_fw_cfg/by_name/etc/acpi/rsdp -> ../../../by_key/41
+		    ...
+
+		Construction of the directory tree and symlinks is done on a
+		"best-effort" basis, as there is no guarantee that components
+		of fw_cfg blob names are always "well behaved". I.e., there is
+		the possibility that a symlink (basename) will conflict with
+		a dirname component of another fw_cfg blob, in which case the
+		creation of the offending /sys/firmware/qemu_fw_cfg/by_name
+		entry will be skipped.
+
+		The authoritative list of entries will continue to be found
+		under the /sys/firmware/qemu_fw_cfg/by_key directory.
diff --git a/drivers/firmware/qemu_fw_cfg.c b/drivers/firmware/qemu_fw_cfg.c
index f935afb..880f8ff 100644
--- a/drivers/firmware/qemu_fw_cfg.c
+++ b/drivers/firmware/qemu_fw_cfg.c
@@ -346,9 +346,104 @@ static struct bin_attribute fw_cfg_sysfs_attr_raw = {
 	.read = fw_cfg_sysfs_read_raw,
 };
 
+/*
+ * Create a kset subdirectory matching each '/' delimited dirname token
+ * in 'name', starting with sysfs kset/folder 'dir'; At the end, create
+ * a symlink directed at the given 'target'.
+ * NOTE: We do this on a best-effort basis, since 'name' is not guaranteed
+ * to be a well-behaved path name. Whenever a symlink vs. kset directory
+ * name collision occurs, the kernel will issue big scary warnings while
+ * refusing to add the offending link or directory. We follow up with our
+ * own, slightly less scary error messages explaining the situation :)
+ */
+static int __init fw_cfg_build_symlink(struct kset *dir,
+				       struct kobject *target,
+				       const char *name)
+{
+	int ret;
+	struct kset *subdir;
+	struct kobject *ko;
+	char *name_copy, *p, *tok;
+
+	if (!dir || !target || !name || !*name)
+		return -EINVAL;
+
+	/* clone a copy of name for parsing */
+	name_copy = p = kstrdup(name, GFP_KERNEL);
+	if (!name_copy)
+		return -ENOMEM;
+
+	/* create folders for each dirname token, then symlink for basename */
+	while ((tok = strsep(&p, "/")) && *tok) {
+
+		/* last (basename) token? If so, add symlink here */
+		if (!p || !*p) {
+			ret = sysfs_create_link(&dir->kobj, target, tok);
+			break;
+		}
+
+		/* does the current dir contain an item named after tok ? */
+		ko = kset_find_obj(dir, tok);
+		if (ko) {
+			/* drop reference added by kset_find_obj */
+			kobject_put(ko);
+
+			/* ko MUST be a kset - we're about to use it as one ! */
+			if (ko->ktype != dir->kobj.ktype) {
+				ret = -EINVAL;
+				break;
+			}
+
+			/* descend into already existing subdirectory */
+			dir = to_kset(ko);
+		} else {
+			/* create new subdirectory kset */
+			subdir = kzalloc(sizeof(struct kset), GFP_KERNEL);
+			if (!subdir) {
+				ret = -ENOMEM;
+				break;
+			}
+			subdir->kobj.kset = dir;
+			subdir->kobj.ktype = dir->kobj.ktype;
+			ret = kobject_set_name(&subdir->kobj, "%s", tok);
+			if (ret) {
+				kfree(subdir);
+				break;
+			}
+			ret = kset_register(subdir);
+			if (ret) {
+				kfree(subdir);
+				break;
+			}
+
+			/* descend into newly created subdirectory */
+			dir = subdir;
+		}
+	}
+
+	/* we're done with cloned copy of name */
+	kfree(name_copy);
+	return ret;
+}
+
+/* recursively unregister fw_cfg/by_name/ kset directory tree */
+static void fw_cfg_kset_unregister_recursive(struct kset *kset)
+{
+	struct kobject *k, *next;
+
+	list_for_each_entry_safe(k, next, &kset->list, entry)
+		/* all set members are ksets too, but check just in case... */
+		if (k->ktype == kset->kobj.ktype)
+			fw_cfg_kset_unregister_recursive(to_kset(k));
+
+	/* symlinks are cleanly and automatically removed with the directory */
+	kset_unregister(kset);
+}
+
 /* kobjects & kset representing top-level, by_key, and by_name folders */
 static struct kobject *fw_cfg_top_ko;
 static struct kobject *fw_cfg_sel_ko;
+static struct kset *fw_cfg_fname_kset;
 
 /* callback function to register an individual fw_cfg file */
 static int fw_cfg_register_file(const struct fw_cfg_file *f)
@@ -377,6 +472,9 @@ static int fw_cfg_register_file(const struct fw_cfg_file *f)
 	if (err)
 		goto err_add_raw;
 
+	/* try adding "/sys/firmware/qemu_fw_cfg/by_name/" symlink */
+	fw_cfg_build_symlink(fw_cfg_fname_kset, &entry->kobj, entry->f.name);
+
 	/* success, add entry to global cache */
 	fw_cfg_sysfs_cache_enlist(entry);
 	return 0;
@@ -412,6 +510,9 @@ static int fw_cfg_sysfs_add(struct acpi_device *dev)
 	fw_cfg_sel_ko = kobject_create_and_add("by_key", fw_cfg_top_ko);
 	if (!fw_cfg_sel_ko)
 		goto err_sel;
+	fw_cfg_fname_kset = kset_create_and_add("by_name", NULL, fw_cfg_top_ko);
+	if (!fw_cfg_fname_kset)
+		goto err_name;
 
 	/* get revision number, add matching top-level attribute */
 	fw_cfg_read_blob(FW_CFG_ID, &fw_cfg_rev, 0, sizeof(fw_cfg_rev));
@@ -433,6 +534,8 @@ err_scan:
 	fw_cfg_sysfs_cache_cleanup();
 	sysfs_remove_file(fw_cfg_top_ko, &fw_cfg_rev_attr.attr);
 err_rev:
+	fw_cfg_kset_unregister_recursive(fw_cfg_fname_kset);
+err_name:
 	fw_cfg_kobj_cleanup(fw_cfg_sel_ko);
 err_sel:
 	fw_cfg_kobj_cleanup(fw_cfg_top_ko);
@@ -445,6 +548,7 @@ static int fw_cfg_sysfs_remove(struct acpi_device *dev)
 {
 	pr_debug("fw_cfg: unloading.\n");
 	fw_cfg_sysfs_cache_cleanup();
+	fw_cfg_kset_unregister_recursive(fw_cfg_fname_kset);
 	fw_cfg_kobj_cleanup(fw_cfg_sel_ko);
 	fw_cfg_kobj_cleanup(fw_cfg_top_ko);
 	fw_cfg_io_cleanup();
-- 
2.4.3


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH v3 1/4] firmware: introduce sysfs driver for QEMU's fw_cfg device
  2015-10-03 23:28 ` [PATCH v3 1/4] firmware: introduce sysfs driver for QEMU's " Gabriel L. Somlo
@ 2015-10-04  1:34   ` kbuild test robot
  2015-10-06  8:40   ` [Qemu-devel] " Stefan Hajnoczi
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 26+ messages in thread
From: kbuild test robot @ 2015-10-04  1:34 UTC (permalink / raw)
  To: Gabriel L. Somlo
  Cc: kbuild-all, gregkh, paul, galak, will.deacon, agross,
	mark.rutland, zajec5, hanjun.guo, catalin.marinas, linux-api,
	linux-kernel, kernelnewbies, matt.fleming, lersek,
	jordan.l.justen, mst, peter.maydell, leif.lindholm,
	ard.biesheuvel, pbonzini, kraxel, qemu-devel

Hi Gabriel,

[auto build test results on v4.3-rc3 -- if it's inappropriate base, please ignore]

reproduce:
        # apt-get install sparse
        make ARCH=x86_64 allmodconfig
        make C=1 CF=-D__CHECK_ENDIAN__


sparse warnings: (new ones prefixed by >>)

   drivers/firmware/qemu_fw_cfg.c:66:25: sparse: constant 0xd00000510 is so big it is long
>> drivers/firmware/qemu_fw_cfg.c:95:39: sparse: restricted __be16 degrades to integer
>> drivers/firmware/qemu_fw_cfg.c:95:58: sparse: restricted __le16 degrades to integer
>> drivers/firmware/qemu_fw_cfg.c:111:25: sparse: cast to restricted __be32
>> drivers/firmware/qemu_fw_cfg.c:111:25: sparse: cast to restricted __be32
>> drivers/firmware/qemu_fw_cfg.c:111:25: sparse: cast to restricted __be32
>> drivers/firmware/qemu_fw_cfg.c:111:25: sparse: cast to restricted __be32
>> drivers/firmware/qemu_fw_cfg.c:111:25: sparse: cast to restricted __be32
>> drivers/firmware/qemu_fw_cfg.c:111:25: sparse: cast to restricted __be32
>> drivers/firmware/qemu_fw_cfg.c:95:39: sparse: restricted __be16 degrades to integer
>> drivers/firmware/qemu_fw_cfg.c:95:58: sparse: restricted __le16 degrades to integer
>> drivers/firmware/qemu_fw_cfg.c:95:39: sparse: restricted __be16 degrades to integer
>> drivers/firmware/qemu_fw_cfg.c:95:58: sparse: restricted __le16 degrades to integer
   drivers/firmware/qemu_fw_cfg.c:367:25: sparse: cast to restricted __be32
   drivers/firmware/qemu_fw_cfg.c:367:25: sparse: cast to restricted __be32
   drivers/firmware/qemu_fw_cfg.c:367:25: sparse: cast to restricted __be32
   drivers/firmware/qemu_fw_cfg.c:367:25: sparse: cast to restricted __be32
   drivers/firmware/qemu_fw_cfg.c:367:25: sparse: cast to restricted __be32
   drivers/firmware/qemu_fw_cfg.c:367:25: sparse: cast to restricted __be32
>> drivers/firmware/qemu_fw_cfg.c:368:27: sparse: cast to restricted __be16
>> drivers/firmware/qemu_fw_cfg.c:368:27: sparse: cast to restricted __be16
>> drivers/firmware/qemu_fw_cfg.c:368:27: sparse: cast to restricted __be16
>> drivers/firmware/qemu_fw_cfg.c:368:27: sparse: cast to restricted __be16
>> drivers/firmware/qemu_fw_cfg.c:95:39: sparse: restricted __be16 degrades to integer
>> drivers/firmware/qemu_fw_cfg.c:95:58: sparse: restricted __le16 degrades to integer
>> drivers/firmware/qemu_fw_cfg.c:420:22: sparse: cast to restricted __le32

vim +95 drivers/firmware/qemu_fw_cfg.c

    60			.size = 0x0a,
    61			.ctrl_offset = 0x08,
    62			.data_offset = 0x00,
    63			.is_mmio = true,
    64		}, {
    65			.name = "fw_cfg MMIO on sun4m",
  > 66			.base = 0xd00000510,
    67			.size = 0x03,
    68			.ctrl_offset = 0x00,
    69			.data_offset = 0x02,
    70			.is_mmio = true,
    71		}, {
    72			.name = "fw_cfg MMIO on ppc/mac",
    73			.base = 0xf0000510,
    74			.size = 0x03,
    75			.ctrl_offset = 0x00,
    76			.data_offset = 0x02,
    77			.is_mmio = true,
    78		}, { } /* END */
    79	};
    80	
    81	/* fw_cfg device i/o currently selected option set */
    82	static struct fw_cfg_access *fw_cfg_mode;
    83	
    84	/* fw_cfg device i/o register addresses */
    85	static void __iomem *fw_cfg_dev_base;
    86	static void __iomem *fw_cfg_reg_ctrl;
    87	static void __iomem *fw_cfg_reg_data;
    88	
    89	/* atomic access to fw_cfg device (potentially slow i/o, so using mutex) */
    90	static DEFINE_MUTEX(fw_cfg_dev_lock);
    91	
    92	/* pick appropriate endianness for selector key */
    93	static inline u16 fw_cfg_sel_endianness(u16 key)
    94	{
  > 95		return fw_cfg_mode->is_mmio ? cpu_to_be16(key) : cpu_to_le16(key);
    96	}
    97	
    98	/* type for fw_cfg "directory scan" visitor/callback function */
    99	typedef int (*fw_cfg_file_callback)(const struct fw_cfg_file *f);
   100	
   101	/* run a given callback on each fw_cfg directory entry */
   102	static int fw_cfg_scan_dir(fw_cfg_file_callback callback)
   103	{
   104		int ret = 0;
   105		u32 count, i;
   106		struct fw_cfg_file f;
   107	
   108		mutex_lock(&fw_cfg_dev_lock);
   109		iowrite16(fw_cfg_sel_endianness(FW_CFG_FILE_DIR), fw_cfg_reg_ctrl);
   110		ioread8_rep(fw_cfg_reg_data, &count, sizeof(count));
 > 111		for (i = 0; i < be32_to_cpu(count); i++) {
   112			ioread8_rep(fw_cfg_reg_data, &f, sizeof(f));
   113			ret = callback(&f);
   114			if (ret)

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v3 2/4] firmware: use acpi to detect QEMU fw_cfg device for sysfs fw_cfg driver
  2015-10-03 23:28 ` [PATCH v3 2/4] firmware: use acpi to detect QEMU fw_cfg device for sysfs fw_cfg driver Gabriel L. Somlo
@ 2015-10-04  7:54   ` Michael S. Tsirkin
  2015-10-04 20:24     ` Gabriel L. Somlo
  0 siblings, 1 reply; 26+ messages in thread
From: Michael S. Tsirkin @ 2015-10-04  7:54 UTC (permalink / raw)
  To: Gabriel L. Somlo
  Cc: gregkh, paul, galak, will.deacon, agross, mark.rutland, zajec5,
	hanjun.guo, catalin.marinas, linux-api, linux-kernel,
	kernelnewbies, matt.fleming, lersek, jordan.l.justen,
	peter.maydell, leif.lindholm, ard.biesheuvel, pbonzini, kraxel,
	qemu-devel

On Sat, Oct 03, 2015 at 07:28:07PM -0400, Gabriel L. Somlo wrote:
> From: Gabriel Somlo <somlo@cmu.edu>
> 
> Instead of blindly probing fw_cfg registers at known IOport and MMIO
> locations, use the ACPI subsystem to determine whether a QEMU fw_cfg
> device is present, and, if found, to initialize it.
> 
> This limits portability to architectures which support ACPI (x86 and
> UEFI-enabled aarch64), but avoids touching hardware registers before
> being certain that our device is present.
> 
> NOTE: The standard way to verify the presence of fw_cfg on arm VMs
> would have been to use the device tree, but that would have left out
> x86, which is the primary architecture targeted by this patch.
> 
> Signed-off-by: Gabriel Somlo <somlo@cmu.edu>

IMHO it's not a good idea to probe registers provided
by CRS like this.
It seems quite reasonable that we'd want to add some
extra registers in the future, and this probing will break.

Further, accessing registers directly means that there's
no way to have ACPI code access them as that would
cause race conditions.

Maybe we should provide access methods in ACPI instead?


> ---
>  .../ABI/testing/sysfs-firmware-qemu_fw_cfg         |   4 +
>  drivers/firmware/Kconfig                           |   2 +-
>  drivers/firmware/qemu_fw_cfg.c                     | 201 +++++++++++----------
>  3 files changed, 113 insertions(+), 94 deletions(-)
> 
> diff --git a/Documentation/ABI/testing/sysfs-firmware-qemu_fw_cfg b/Documentation/ABI/testing/sysfs-firmware-qemu_fw_cfg
> index f1ef44e..e9761bf 100644
> --- a/Documentation/ABI/testing/sysfs-firmware-qemu_fw_cfg
> +++ b/Documentation/ABI/testing/sysfs-firmware-qemu_fw_cfg
> @@ -76,6 +76,10 @@ Description:
>  		the port number of the control register. I.e., the two ports
>  		are overlapping, and can not be mapped separately.
>  
> +		NOTE 2. QEMU publishes the register details in the device tree
> +		on arm guests, and in ACPI (under _HID "QEMU0002") on x86 and
> +		select arm (aarch64) VM types.
> +
>  		=== Firmware Configuration Items of Interest ===
>  
>  		Originally, the index key, size, and formatting of blobs in
> diff --git a/drivers/firmware/Kconfig b/drivers/firmware/Kconfig
> index 0466e80..bc12d31 100644
> --- a/drivers/firmware/Kconfig
> +++ b/drivers/firmware/Kconfig
> @@ -137,7 +137,7 @@ config ISCSI_IBFT
>  
>  config FW_CFG_SYSFS
>  	tristate "QEMU fw_cfg device support in sysfs"
> -	depends on SYSFS
> +	depends on SYSFS && ACPI
>  	default n
>  	help
>  	  Say Y or M here to enable the exporting of the QEMU firmware
> diff --git a/drivers/firmware/qemu_fw_cfg.c b/drivers/firmware/qemu_fw_cfg.c
> index 3a67a16..f935afb 100644
> --- a/drivers/firmware/qemu_fw_cfg.c
> +++ b/drivers/firmware/qemu_fw_cfg.c
> @@ -8,6 +8,7 @@
>   */
>  
>  #include <linux/module.h>
> +#include <linux/acpi.h>
>  #include <linux/slab.h>
>  #include <linux/io.h>
>  #include <linux/ioport.h>
> @@ -35,53 +36,10 @@ struct fw_cfg_file {
>  	char name[FW_CFG_MAX_FILE_PATH];
>  };
>  
> -/* fw_cfg device i/o access options type */
> -struct fw_cfg_access {
> -	const char *name;
> -	phys_addr_t base;
> -	u8 size;
> -	u8 ctrl_offset;
> -	u8 data_offset;
> -	bool is_mmio;
> -};
> -
> -/* table of fw_cfg device i/o access options for known architectures */
> -static struct fw_cfg_access fw_cfg_modes[] = {
> -	{
> -		.name = "fw_cfg IOport on i386, sun4u",
> -		.base = 0x510,
> -		.size = 0x02,
> -		.ctrl_offset = 0x00,
> -		.data_offset = 0x01,
> -		.is_mmio = false,
> -	}, {
> -		.name = "fw_cfg MMIO on arm",
> -		.base = 0x9020000,
> -		.size = 0x0a,
> -		.ctrl_offset = 0x08,
> -		.data_offset = 0x00,
> -		.is_mmio = true,
> -	}, {
> -		.name = "fw_cfg MMIO on sun4m",
> -		.base = 0xd00000510,
> -		.size = 0x03,
> -		.ctrl_offset = 0x00,
> -		.data_offset = 0x02,
> -		.is_mmio = true,
> -	}, {
> -		.name = "fw_cfg MMIO on ppc/mac",
> -		.base = 0xf0000510,
> -		.size = 0x03,
> -		.ctrl_offset = 0x00,
> -		.data_offset = 0x02,
> -		.is_mmio = true,
> -	}, { } /* END */
> -};
> -
> -/* fw_cfg device i/o currently selected option set */
> -static struct fw_cfg_access *fw_cfg_mode;
> -
>  /* fw_cfg device i/o register addresses */
> +static bool fw_cfg_is_mmio;
> +static phys_addr_t fw_cfg_phys_base;
> +static u32 fw_cfg_phys_size;
>  static void __iomem *fw_cfg_dev_base;
>  static void __iomem *fw_cfg_reg_ctrl;
>  static void __iomem *fw_cfg_reg_data;
> @@ -92,7 +50,7 @@ static DEFINE_MUTEX(fw_cfg_dev_lock);
>  /* pick appropriate endianness for selector key */
>  static inline u16 fw_cfg_sel_endianness(u16 key)
>  {
> -	return fw_cfg_mode->is_mmio ? cpu_to_be16(key) : cpu_to_le16(key);
> +	return fw_cfg_is_mmio ? cpu_to_be16(key) : cpu_to_le16(key);
>  }
>  
>  /* type for fw_cfg "directory scan" visitor/callback function */
> @@ -133,60 +91,100 @@ static inline void fw_cfg_read_blob(u16 key,
>  /* clean up fw_cfg device i/o */
>  static void fw_cfg_io_cleanup(void)
>  {
> -	if (fw_cfg_mode->is_mmio) {
> +	if (fw_cfg_is_mmio) {
>  		iounmap(fw_cfg_dev_base);
> -		release_mem_region(fw_cfg_mode->base, fw_cfg_mode->size);
> +		release_mem_region(fw_cfg_phys_base, fw_cfg_phys_size);
>  	} else {
>  		ioport_unmap(fw_cfg_dev_base);
> -		release_region(fw_cfg_mode->base, fw_cfg_mode->size);
> +		release_region(fw_cfg_phys_base, fw_cfg_phys_size);
>  	}
>  }
>  
> -/* probe and map fw_cfg device */
> -static int __init fw_cfg_io_probe(void)
> +/* configure fw_cfg device i/o from ACPI _CRS method */
> +static acpi_status fw_cfg_walk_crs(struct acpi_resource *r, void *context)
> +{
> +	struct acpi_resource_io *io;
> +	struct acpi_resource_fixed_memory32 *mmio;
> +
> +	switch (r->type) {
> +	case ACPI_RESOURCE_TYPE_END_TAG:
> +		return AE_OK;
> +	case ACPI_RESOURCE_TYPE_IO:
> +		io = &r->data.io;
> +		/* physical base addr should NOT be already set */
> +		if (fw_cfg_phys_base)
> +			return AE_ERROR;
> +		if (!request_region(io->minimum,
> +				    io->address_length, "fw_cfg_io"))
> +			return AE_ERROR;
> +		fw_cfg_dev_base = ioport_map(io->minimum, io->address_length);
> +		if (!fw_cfg_dev_base) {
> +			release_region(io->minimum, io->address_length);
> +			return AE_ERROR;
> +		}
> +		fw_cfg_phys_base = io->minimum;
> +		fw_cfg_phys_size = io->address_length;
> +		fw_cfg_is_mmio = false;
> +		/* set register addresses (pc/i386 offsets) */
> +		fw_cfg_reg_ctrl = fw_cfg_dev_base + 0x00;
> +		fw_cfg_reg_data = fw_cfg_dev_base + 0x01;
> +		return AE_OK;
> +	case ACPI_RESOURCE_TYPE_FIXED_MEMORY32:
> +		mmio = &r->data.fixed_memory32;
> +		/* physical base addr should NOT be already set */
> +		if (fw_cfg_phys_base)
> +			return AE_ERROR;
> +		/* MMIO and ACPI, but not on ARM ?!?! */
> +		if (mmio->address_length < 0x0a)
> +			return AE_ERROR;
> +		if (!request_mem_region(mmio->address,
> +					mmio->address_length, "fw_cfg_mem"))
> +			return AE_ERROR;
> +		fw_cfg_dev_base = ioremap(mmio->address, mmio->address_length);
> +		if (!fw_cfg_dev_base) {
> +			release_mem_region(mmio->address, mmio->address_length);
> +			return AE_ERROR;
> +		}
> +		fw_cfg_phys_base = mmio->address;
> +		fw_cfg_phys_size = mmio->address_length;
> +		fw_cfg_is_mmio = true;
> +		/* set register addresses (arm offsets) */
> +		fw_cfg_reg_ctrl = fw_cfg_dev_base + 0x08;
> +		fw_cfg_reg_data = fw_cfg_dev_base + 0x00;
> +		return AE_OK;
> +	default:
> +		return AE_ERROR;
> +	}
> +}
> +
> +/* initialize fw_cfg device i/o from ACPI data */
> +static int fw_cfg_acpi_init(struct acpi_device *dev)
>  {
>  	char sig[FW_CFG_SIG_SIZE];
> +	acpi_status status;
> +	int err;
>  
> -	for (fw_cfg_mode = &fw_cfg_modes[0];
> -	     fw_cfg_mode->base; fw_cfg_mode++) {
> -
> -		phys_addr_t base = fw_cfg_mode->base;
> -		u8 size = fw_cfg_mode->size;
> -
> -		/* reserve and map mmio or ioport region */
> -		if (fw_cfg_mode->is_mmio) {
> -			if (!request_mem_region(base, size, fw_cfg_mode->name))
> -				continue;
> -			fw_cfg_dev_base = ioremap(base, size);
> -			if (!fw_cfg_dev_base) {
> -				release_mem_region(base, size);
> -				continue;
> -			}
> -		} else {
> -			if (!request_region(base, size, fw_cfg_mode->name))
> -				continue;
> -			fw_cfg_dev_base = ioport_map(base, size);
> -			if (!fw_cfg_dev_base) {
> -				release_region(base, size);
> -				continue;
> -			}
> -		}
> +	err = acpi_bus_get_status(dev);
> +	if (err < 0)
> +		return err;
>  
> -		/* set control and data register addresses */
> -		fw_cfg_reg_ctrl = fw_cfg_dev_base + fw_cfg_mode->ctrl_offset;
> -		fw_cfg_reg_data = fw_cfg_dev_base + fw_cfg_mode->data_offset;
> +	if (!(dev->status.enabled && dev->status.functional))
> +		return -ENODEV;
>  
> -		/* verify fw_cfg device signature */
> -		fw_cfg_read_blob(FW_CFG_SIGNATURE, sig, 0, FW_CFG_SIG_SIZE);
> -		if (memcmp(sig, "QEMU", FW_CFG_SIG_SIZE) == 0)
> -			/* success, we're done */
> -			return 0;
> +	/* extract device i/o details from _CRS */
> +	status = acpi_walk_resources(dev->handle, METHOD_NAME__CRS,
> +				     fw_cfg_walk_crs, NULL);
> +	if (status != AE_OK || !fw_cfg_phys_base)
> +		return -ENODEV;
>  
> -		/* clean up before probing next access mode */
> +	/* verify fw_cfg device signature */
> +	fw_cfg_read_blob(FW_CFG_SIGNATURE, sig, 0, FW_CFG_SIG_SIZE);
> +	if (memcmp(sig, "QEMU", FW_CFG_SIG_SIZE) != 0) {
>  		fw_cfg_io_cleanup();
> +		return -ENODEV;
>  	}
>  
> -	return -ENODEV;
> +	return 0;
>  }
>  
>  /* fw_cfg revision attribute, in /sys/firmware/qemu_fw_cfg top-level dir. */
> @@ -353,7 +351,7 @@ static struct kobject *fw_cfg_top_ko;
>  static struct kobject *fw_cfg_sel_ko;
>  
>  /* callback function to register an individual fw_cfg file */
> -static int __init fw_cfg_register_file(const struct fw_cfg_file *f)
> +static int fw_cfg_register_file(const struct fw_cfg_file *f)
>  {
>  	int err;
>  	struct fw_cfg_sysfs_entry *entry;
> @@ -397,12 +395,12 @@ static inline void fw_cfg_kobj_cleanup(struct kobject *kobj)
>  	kobject_put(kobj);
>  }
>  
> -static int __init fw_cfg_sysfs_init(void)
> +static int fw_cfg_sysfs_add(struct acpi_device *dev)
>  {
>  	int err;
>  
> -	/* probe for the fw_cfg "hardware" */
> -	err = fw_cfg_io_probe();
> +	/* initialize fw_cfg device i/o from ACPI data */
> +	err = fw_cfg_acpi_init(dev);
>  	if (err)
>  		return err;
>  
> @@ -443,14 +441,31 @@ err_top:
>  	return err;
>  }
>  
> -static void __exit fw_cfg_sysfs_exit(void)
> +static int fw_cfg_sysfs_remove(struct acpi_device *dev)
>  {
>  	pr_debug("fw_cfg: unloading.\n");
>  	fw_cfg_sysfs_cache_cleanup();
>  	fw_cfg_kobj_cleanup(fw_cfg_sel_ko);
>  	fw_cfg_kobj_cleanup(fw_cfg_top_ko);
>  	fw_cfg_io_cleanup();
> +	return 0;
>  }
>  
> -module_init(fw_cfg_sysfs_init);
> -module_exit(fw_cfg_sysfs_exit);
> +static const struct acpi_device_id fw_cfg_sysfs_device_ids[] = {
> +	{ "QEMU0002", 0 },
> +	{ "", 0 },
> +};
> +MODULE_DEVICE_TABLE(acpi, fw_cfg_sysfs_device_ids);
> +
> +static struct acpi_driver fw_cfg_sysfs_driver = {
> +	.name =		"fw_cfg",
> +	.class =	"QEMU",
> +	.ids =		fw_cfg_sysfs_device_ids,
> +	.ops =		{
> +				.add =		fw_cfg_sysfs_add,
> +				.remove =	fw_cfg_sysfs_remove,
> +			},
> +	.owner =	THIS_MODULE,
> +};
> +
> +module_acpi_driver(fw_cfg_sysfs_driver);
> -- 
> 2.4.3

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v3 2/4] firmware: use acpi to detect QEMU fw_cfg device for sysfs fw_cfg driver
  2015-10-04  7:54   ` Michael S. Tsirkin
@ 2015-10-04 20:24     ` Gabriel L. Somlo
  2015-10-04 20:27       ` Gabriel L. Somlo
  0 siblings, 1 reply; 26+ messages in thread
From: Gabriel L. Somlo @ 2015-10-04 20:24 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: gregkh, paul, galak, will.deacon, agross, mark.rutland, zajec5,
	hanjun.guo, catalin.marinas, linux-api, linux-kernel,
	kernelnewbies, matt.fleming, lersek, jordan.l.justen,
	peter.maydell, leif.lindholm, ard.biesheuvel, pbonzini, kraxel,
	qemu-devel

On Sun, Oct 04, 2015 at 10:54:57AM +0300, Michael S. Tsirkin wrote:
> On Sat, Oct 03, 2015 at 07:28:07PM -0400, Gabriel L. Somlo wrote:
> > From: Gabriel Somlo <somlo@cmu.edu>
> > 
> > Instead of blindly probing fw_cfg registers at known IOport and MMIO
> > locations, use the ACPI subsystem to determine whether a QEMU fw_cfg
> > device is present, and, if found, to initialize it.
> > 
> > This limits portability to architectures which support ACPI (x86 and
> > UEFI-enabled aarch64), but avoids touching hardware registers before
> > being certain that our device is present.
> > 
> > NOTE: The standard way to verify the presence of fw_cfg on arm VMs
> > would have been to use the device tree, but that would have left out
> > x86, which is the primary architecture targeted by this patch.
> > 
> > Signed-off-by: Gabriel Somlo <somlo@cmu.edu>
> 
> IMHO it's not a good idea to probe registers provided
> by CRS like this.
> It seems quite reasonable that we'd want to add some
> extra registers in the future, and this probing will break.
> 
> Further, accessing registers directly means that there's
> no way to have ACPI code access them as that would
> cause race conditions.
> 
> Maybe we should provide access methods in ACPI instead?

OK, I think I understand what you meant by "don't poke CRS" in the
other thread...

So, you're proposing I move the follwing bits:

  /* atomic access to fw_cfg device (potentially slow i/o, so using
   * mutex) */
  static DEFINE_MUTEX(fw_cfg_dev_lock);

  /* pick appropriate endianness for selector key */
  static inline u16 fw_cfg_sel_endianness(u16 key)
  {
          return fw_cfg_is_mmio ? cpu_to_be16(key) : cpu_to_le16(key);
  }

  /* type for fw_cfg "directory scan" visitor/callback function */
  typedef int (*fw_cfg_file_callback)(const struct fw_cfg_file *f);

  /* run a given callback on each fw_cfg directory entry */
  static int fw_cfg_scan_dir(fw_cfg_file_callback callback)
  {
          int ret = 0;
          u32 count, i;
          struct fw_cfg_file f;

          mutex_lock(&fw_cfg_dev_lock);
          iowrite16(fw_cfg_sel_endianness(FW_CFG_FILE_DIR), fw_cfg_reg_ctrl);
          ioread8_rep(fw_cfg_reg_data, &count, sizeof(count));
          for (i = 0; i < be32_to_cpu(count); i++) {
                  ioread8_rep(fw_cfg_reg_data, &f, sizeof(f));
                  ret = callback(&f);
                  if (ret)
                          break;
          }
          mutex_unlock(&fw_cfg_dev_lock);
          return ret;
  }

  /* read chunk of given fw_cfg blob (caller responsible for
   * sanity-check) */
  static inline void fw_cfg_read_blob(u16 key,
                                      void *buf, loff_t pos, size_t count)
  {
          mutex_lock(&fw_cfg_dev_lock);
          iowrite16(fw_cfg_sel_endianness(key), fw_cfg_reg_ctrl);
          while (pos-- > 0)
                  ioread8(fw_cfg_reg_data);
          ioread8_rep(fw_cfg_reg_data, buf, count);
          mutex_unlock(&fw_cfg_dev_lock);
  }

into the FWCF, "QEMU0002" node as an AML method ? Have ACPI provide
mutual exclusion against competing readers, and somehow figure out how
to call the ACPI/AML code from the guest-side kernel driver whenever
I need to call fw_cfg_read_blob() ?

I guess I could implement fw_cfg_scan_dir() using fw_cfg_read_blob():

  u32 count;
  size_t  bufsize;
  void *buf;
  fw_cfg_read_blob(FW_CFG_FILE_DIR, &count, 0, sizeof(u32));
  bufsize = sizeof(u32) + count * sizeof(struct fw_cfg_file);
  buf = kalloc(bufsize);
  fw_cfg_read_blob(FW_CFG_FILE_DIR, buf, 0, bufsize);
  ...
  /* now read all the blob meta-data from buf ... */

It would be 100% atomic, but since we can safely assume the fw_cfg
contents never change, it'd be OK.

The atomicity of the ACPI version of fw_cfg_read_blob(), picking the
right endianness for the selector, etc. would have to be done in AML
within the QEMU host-side patch.

If you know of anything I can look at for a good ASL example, please
point it out to me. I'm going to go away now and spend some quality
time with the ACPI spec :)

Thanks,
--Gabriel

> 
> 
> > ---
> >  .../ABI/testing/sysfs-firmware-qemu_fw_cfg         |   4 +
> >  drivers/firmware/Kconfig                           |   2 +-
> >  drivers/firmware/qemu_fw_cfg.c                     | 201 +++++++++++----------
> >  3 files changed, 113 insertions(+), 94 deletions(-)
> > 
> > diff --git a/Documentation/ABI/testing/sysfs-firmware-qemu_fw_cfg b/Documentation/ABI/testing/sysfs-firmware-qemu_fw_cfg
> > index f1ef44e..e9761bf 100644
> > --- a/Documentation/ABI/testing/sysfs-firmware-qemu_fw_cfg
> > +++ b/Documentation/ABI/testing/sysfs-firmware-qemu_fw_cfg
> > @@ -76,6 +76,10 @@ Description:
> >  		the port number of the control register. I.e., the two ports
> >  		are overlapping, and can not be mapped separately.
> >  
> > +		NOTE 2. QEMU publishes the register details in the device tree
> > +		on arm guests, and in ACPI (under _HID "QEMU0002") on x86 and
> > +		select arm (aarch64) VM types.
> > +
> >  		=== Firmware Configuration Items of Interest ===
> >  
> >  		Originally, the index key, size, and formatting of blobs in
> > diff --git a/drivers/firmware/Kconfig b/drivers/firmware/Kconfig
> > index 0466e80..bc12d31 100644
> > --- a/drivers/firmware/Kconfig
> > +++ b/drivers/firmware/Kconfig
> > @@ -137,7 +137,7 @@ config ISCSI_IBFT
> >  
> >  config FW_CFG_SYSFS
> >  	tristate "QEMU fw_cfg device support in sysfs"
> > -	depends on SYSFS
> > +	depends on SYSFS && ACPI
> >  	default n
> >  	help
> >  	  Say Y or M here to enable the exporting of the QEMU firmware
> > diff --git a/drivers/firmware/qemu_fw_cfg.c b/drivers/firmware/qemu_fw_cfg.c
> > index 3a67a16..f935afb 100644
> > --- a/drivers/firmware/qemu_fw_cfg.c
> > +++ b/drivers/firmware/qemu_fw_cfg.c
> > @@ -8,6 +8,7 @@
> >   */
> >  
> >  #include <linux/module.h>
> > +#include <linux/acpi.h>
> >  #include <linux/slab.h>
> >  #include <linux/io.h>
> >  #include <linux/ioport.h>
> > @@ -35,53 +36,10 @@ struct fw_cfg_file {
> >  	char name[FW_CFG_MAX_FILE_PATH];
> >  };
> >  
> > -/* fw_cfg device i/o access options type */
> > -struct fw_cfg_access {
> > -	const char *name;
> > -	phys_addr_t base;
> > -	u8 size;
> > -	u8 ctrl_offset;
> > -	u8 data_offset;
> > -	bool is_mmio;
> > -};
> > -
> > -/* table of fw_cfg device i/o access options for known architectures */
> > -static struct fw_cfg_access fw_cfg_modes[] = {
> > -	{
> > -		.name = "fw_cfg IOport on i386, sun4u",
> > -		.base = 0x510,
> > -		.size = 0x02,
> > -		.ctrl_offset = 0x00,
> > -		.data_offset = 0x01,
> > -		.is_mmio = false,
> > -	}, {
> > -		.name = "fw_cfg MMIO on arm",
> > -		.base = 0x9020000,
> > -		.size = 0x0a,
> > -		.ctrl_offset = 0x08,
> > -		.data_offset = 0x00,
> > -		.is_mmio = true,
> > -	}, {
> > -		.name = "fw_cfg MMIO on sun4m",
> > -		.base = 0xd00000510,
> > -		.size = 0x03,
> > -		.ctrl_offset = 0x00,
> > -		.data_offset = 0x02,
> > -		.is_mmio = true,
> > -	}, {
> > -		.name = "fw_cfg MMIO on ppc/mac",
> > -		.base = 0xf0000510,
> > -		.size = 0x03,
> > -		.ctrl_offset = 0x00,
> > -		.data_offset = 0x02,
> > -		.is_mmio = true,
> > -	}, { } /* END */
> > -};
> > -
> > -/* fw_cfg device i/o currently selected option set */
> > -static struct fw_cfg_access *fw_cfg_mode;
> > -
> >  /* fw_cfg device i/o register addresses */
> > +static bool fw_cfg_is_mmio;
> > +static phys_addr_t fw_cfg_phys_base;
> > +static u32 fw_cfg_phys_size;
> >  static void __iomem *fw_cfg_dev_base;
> >  static void __iomem *fw_cfg_reg_ctrl;
> >  static void __iomem *fw_cfg_reg_data;
> > @@ -92,7 +50,7 @@ static DEFINE_MUTEX(fw_cfg_dev_lock);
> >  /* pick appropriate endianness for selector key */
> >  static inline u16 fw_cfg_sel_endianness(u16 key)
> >  {
> > -	return fw_cfg_mode->is_mmio ? cpu_to_be16(key) : cpu_to_le16(key);
> > +	return fw_cfg_is_mmio ? cpu_to_be16(key) : cpu_to_le16(key);
> >  }
> >  
> >  /* type for fw_cfg "directory scan" visitor/callback function */
> > @@ -133,60 +91,100 @@ static inline void fw_cfg_read_blob(u16 key,
> >  /* clean up fw_cfg device i/o */
> >  static void fw_cfg_io_cleanup(void)
> >  {
> > -	if (fw_cfg_mode->is_mmio) {
> > +	if (fw_cfg_is_mmio) {
> >  		iounmap(fw_cfg_dev_base);
> > -		release_mem_region(fw_cfg_mode->base, fw_cfg_mode->size);
> > +		release_mem_region(fw_cfg_phys_base, fw_cfg_phys_size);
> >  	} else {
> >  		ioport_unmap(fw_cfg_dev_base);
> > -		release_region(fw_cfg_mode->base, fw_cfg_mode->size);
> > +		release_region(fw_cfg_phys_base, fw_cfg_phys_size);
> >  	}
> >  }
> >  
> > -/* probe and map fw_cfg device */
> > -static int __init fw_cfg_io_probe(void)
> > +/* configure fw_cfg device i/o from ACPI _CRS method */
> > +static acpi_status fw_cfg_walk_crs(struct acpi_resource *r, void *context)
> > +{
> > +	struct acpi_resource_io *io;
> > +	struct acpi_resource_fixed_memory32 *mmio;
> > +
> > +	switch (r->type) {
> > +	case ACPI_RESOURCE_TYPE_END_TAG:
> > +		return AE_OK;
> > +	case ACPI_RESOURCE_TYPE_IO:
> > +		io = &r->data.io;
> > +		/* physical base addr should NOT be already set */
> > +		if (fw_cfg_phys_base)
> > +			return AE_ERROR;
> > +		if (!request_region(io->minimum,
> > +				    io->address_length, "fw_cfg_io"))
> > +			return AE_ERROR;
> > +		fw_cfg_dev_base = ioport_map(io->minimum, io->address_length);
> > +		if (!fw_cfg_dev_base) {
> > +			release_region(io->minimum, io->address_length);
> > +			return AE_ERROR;
> > +		}
> > +		fw_cfg_phys_base = io->minimum;
> > +		fw_cfg_phys_size = io->address_length;
> > +		fw_cfg_is_mmio = false;
> > +		/* set register addresses (pc/i386 offsets) */
> > +		fw_cfg_reg_ctrl = fw_cfg_dev_base + 0x00;
> > +		fw_cfg_reg_data = fw_cfg_dev_base + 0x01;
> > +		return AE_OK;
> > +	case ACPI_RESOURCE_TYPE_FIXED_MEMORY32:
> > +		mmio = &r->data.fixed_memory32;
> > +		/* physical base addr should NOT be already set */
> > +		if (fw_cfg_phys_base)
> > +			return AE_ERROR;
> > +		/* MMIO and ACPI, but not on ARM ?!?! */
> > +		if (mmio->address_length < 0x0a)
> > +			return AE_ERROR;
> > +		if (!request_mem_region(mmio->address,
> > +					mmio->address_length, "fw_cfg_mem"))
> > +			return AE_ERROR;
> > +		fw_cfg_dev_base = ioremap(mmio->address, mmio->address_length);
> > +		if (!fw_cfg_dev_base) {
> > +			release_mem_region(mmio->address, mmio->address_length);
> > +			return AE_ERROR;
> > +		}
> > +		fw_cfg_phys_base = mmio->address;
> > +		fw_cfg_phys_size = mmio->address_length;
> > +		fw_cfg_is_mmio = true;
> > +		/* set register addresses (arm offsets) */
> > +		fw_cfg_reg_ctrl = fw_cfg_dev_base + 0x08;
> > +		fw_cfg_reg_data = fw_cfg_dev_base + 0x00;
> > +		return AE_OK;
> > +	default:
> > +		return AE_ERROR;
> > +	}
> > +}
> > +
> > +/* initialize fw_cfg device i/o from ACPI data */
> > +static int fw_cfg_acpi_init(struct acpi_device *dev)
> >  {
> >  	char sig[FW_CFG_SIG_SIZE];
> > +	acpi_status status;
> > +	int err;
> >  
> > -	for (fw_cfg_mode = &fw_cfg_modes[0];
> > -	     fw_cfg_mode->base; fw_cfg_mode++) {
> > -
> > -		phys_addr_t base = fw_cfg_mode->base;
> > -		u8 size = fw_cfg_mode->size;
> > -
> > -		/* reserve and map mmio or ioport region */
> > -		if (fw_cfg_mode->is_mmio) {
> > -			if (!request_mem_region(base, size, fw_cfg_mode->name))
> > -				continue;
> > -			fw_cfg_dev_base = ioremap(base, size);
> > -			if (!fw_cfg_dev_base) {
> > -				release_mem_region(base, size);
> > -				continue;
> > -			}
> > -		} else {
> > -			if (!request_region(base, size, fw_cfg_mode->name))
> > -				continue;
> > -			fw_cfg_dev_base = ioport_map(base, size);
> > -			if (!fw_cfg_dev_base) {
> > -				release_region(base, size);
> > -				continue;
> > -			}
> > -		}
> > +	err = acpi_bus_get_status(dev);
> > +	if (err < 0)
> > +		return err;
> >  
> > -		/* set control and data register addresses */
> > -		fw_cfg_reg_ctrl = fw_cfg_dev_base + fw_cfg_mode->ctrl_offset;
> > -		fw_cfg_reg_data = fw_cfg_dev_base + fw_cfg_mode->data_offset;
> > +	if (!(dev->status.enabled && dev->status.functional))
> > +		return -ENODEV;
> >  
> > -		/* verify fw_cfg device signature */
> > -		fw_cfg_read_blob(FW_CFG_SIGNATURE, sig, 0, FW_CFG_SIG_SIZE);
> > -		if (memcmp(sig, "QEMU", FW_CFG_SIG_SIZE) == 0)
> > -			/* success, we're done */
> > -			return 0;
> > +	/* extract device i/o details from _CRS */
> > +	status = acpi_walk_resources(dev->handle, METHOD_NAME__CRS,
> > +				     fw_cfg_walk_crs, NULL);
> > +	if (status != AE_OK || !fw_cfg_phys_base)
> > +		return -ENODEV;
> >  
> > -		/* clean up before probing next access mode */
> > +	/* verify fw_cfg device signature */
> > +	fw_cfg_read_blob(FW_CFG_SIGNATURE, sig, 0, FW_CFG_SIG_SIZE);
> > +	if (memcmp(sig, "QEMU", FW_CFG_SIG_SIZE) != 0) {
> >  		fw_cfg_io_cleanup();
> > +		return -ENODEV;
> >  	}
> >  
> > -	return -ENODEV;
> > +	return 0;
> >  }
> >  
> >  /* fw_cfg revision attribute, in /sys/firmware/qemu_fw_cfg top-level dir. */
> > @@ -353,7 +351,7 @@ static struct kobject *fw_cfg_top_ko;
> >  static struct kobject *fw_cfg_sel_ko;
> >  
> >  /* callback function to register an individual fw_cfg file */
> > -static int __init fw_cfg_register_file(const struct fw_cfg_file *f)
> > +static int fw_cfg_register_file(const struct fw_cfg_file *f)
> >  {
> >  	int err;
> >  	struct fw_cfg_sysfs_entry *entry;
> > @@ -397,12 +395,12 @@ static inline void fw_cfg_kobj_cleanup(struct kobject *kobj)
> >  	kobject_put(kobj);
> >  }
> >  
> > -static int __init fw_cfg_sysfs_init(void)
> > +static int fw_cfg_sysfs_add(struct acpi_device *dev)
> >  {
> >  	int err;
> >  
> > -	/* probe for the fw_cfg "hardware" */
> > -	err = fw_cfg_io_probe();
> > +	/* initialize fw_cfg device i/o from ACPI data */
> > +	err = fw_cfg_acpi_init(dev);
> >  	if (err)
> >  		return err;
> >  
> > @@ -443,14 +441,31 @@ err_top:
> >  	return err;
> >  }
> >  
> > -static void __exit fw_cfg_sysfs_exit(void)
> > +static int fw_cfg_sysfs_remove(struct acpi_device *dev)
> >  {
> >  	pr_debug("fw_cfg: unloading.\n");
> >  	fw_cfg_sysfs_cache_cleanup();
> >  	fw_cfg_kobj_cleanup(fw_cfg_sel_ko);
> >  	fw_cfg_kobj_cleanup(fw_cfg_top_ko);
> >  	fw_cfg_io_cleanup();
> > +	return 0;
> >  }
> >  
> > -module_init(fw_cfg_sysfs_init);
> > -module_exit(fw_cfg_sysfs_exit);
> > +static const struct acpi_device_id fw_cfg_sysfs_device_ids[] = {
> > +	{ "QEMU0002", 0 },
> > +	{ "", 0 },
> > +};
> > +MODULE_DEVICE_TABLE(acpi, fw_cfg_sysfs_device_ids);
> > +
> > +static struct acpi_driver fw_cfg_sysfs_driver = {
> > +	.name =		"fw_cfg",
> > +	.class =	"QEMU",
> > +	.ids =		fw_cfg_sysfs_device_ids,
> > +	.ops =		{
> > +				.add =		fw_cfg_sysfs_add,
> > +				.remove =	fw_cfg_sysfs_remove,
> > +			},
> > +	.owner =	THIS_MODULE,
> > +};
> > +
> > +module_acpi_driver(fw_cfg_sysfs_driver);
> > -- 
> > 2.4.3

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v3 2/4] firmware: use acpi to detect QEMU fw_cfg device for sysfs fw_cfg driver
  2015-10-04 20:24     ` Gabriel L. Somlo
@ 2015-10-04 20:27       ` Gabriel L. Somlo
  0 siblings, 0 replies; 26+ messages in thread
From: Gabriel L. Somlo @ 2015-10-04 20:27 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: gregkh, paul, galak, will.deacon, agross, mark.rutland, zajec5,
	hanjun.guo, catalin.marinas, linux-api, linux-kernel,
	kernelnewbies, matt.fleming, lersek, jordan.l.justen,
	peter.maydell, leif.lindholm, ard.biesheuvel, pbonzini, kraxel,
	qemu-devel

On Sun, Oct 04, 2015 at 04:24:00PM -0400, Gabriel L. Somlo wrote:
> On Sun, Oct 04, 2015 at 10:54:57AM +0300, Michael S. Tsirkin wrote:
> > On Sat, Oct 03, 2015 at 07:28:07PM -0400, Gabriel L. Somlo wrote:
> > > 
> > > Instead of blindly probing fw_cfg registers at known IOport and MMIO
> > > locations, use the ACPI subsystem to determine whether a QEMU fw_cfg
> > > device is present, and, if found, to initialize it.
> > > 
> > > This limits portability to architectures which support ACPI (x86 and
> > > UEFI-enabled aarch64), but avoids touching hardware registers before
> > > being certain that our device is present.
> > > 
> > > NOTE: The standard way to verify the presence of fw_cfg on arm VMs
> > > would have been to use the device tree, but that would have left out
> > > x86, which is the primary architecture targeted by this patch.
> > > 
> > > Signed-off-by: Gabriel Somlo <somlo@cmu.edu>
> > 
> > IMHO it's not a good idea to probe registers provided
> > by CRS like this.
> > It seems quite reasonable that we'd want to add some
> > extra registers in the future, and this probing will break.
> > 
> > Further, accessing registers directly means that there's
> > no way to have ACPI code access them as that would
> > cause race conditions.
> > 
> > Maybe we should provide access methods in ACPI instead?
> 
> OK, I think I understand what you meant by "don't poke CRS" in the
> other thread...
> 
> So, you're proposing I move the follwing bits:
> 
>   /* atomic access to fw_cfg device (potentially slow i/o, so using
>    * mutex) */
>   static DEFINE_MUTEX(fw_cfg_dev_lock);
> 
>   /* pick appropriate endianness for selector key */
>   static inline u16 fw_cfg_sel_endianness(u16 key)
>   {
>           return fw_cfg_is_mmio ? cpu_to_be16(key) : cpu_to_le16(key);
>   }
> 
>   /* type for fw_cfg "directory scan" visitor/callback function */
>   typedef int (*fw_cfg_file_callback)(const struct fw_cfg_file *f);
> 
>   /* run a given callback on each fw_cfg directory entry */
>   static int fw_cfg_scan_dir(fw_cfg_file_callback callback)
>   {
>           int ret = 0;
>           u32 count, i;
>           struct fw_cfg_file f;
> 
>           mutex_lock(&fw_cfg_dev_lock);
>           iowrite16(fw_cfg_sel_endianness(FW_CFG_FILE_DIR), fw_cfg_reg_ctrl);
>           ioread8_rep(fw_cfg_reg_data, &count, sizeof(count));
>           for (i = 0; i < be32_to_cpu(count); i++) {
>                   ioread8_rep(fw_cfg_reg_data, &f, sizeof(f));
>                   ret = callback(&f);
>                   if (ret)
>                           break;
>           }
>           mutex_unlock(&fw_cfg_dev_lock);
>           return ret;
>   }
> 
>   /* read chunk of given fw_cfg blob (caller responsible for
>    * sanity-check) */
>   static inline void fw_cfg_read_blob(u16 key,
>                                       void *buf, loff_t pos, size_t count)
>   {
>           mutex_lock(&fw_cfg_dev_lock);
>           iowrite16(fw_cfg_sel_endianness(key), fw_cfg_reg_ctrl);
>           while (pos-- > 0)
>                   ioread8(fw_cfg_reg_data);
>           ioread8_rep(fw_cfg_reg_data, buf, count);
>           mutex_unlock(&fw_cfg_dev_lock);
>   }
> 
> into the FWCF, "QEMU0002" node as an AML method ? Have ACPI provide
> mutual exclusion against competing readers, and somehow figure out how
> to call the ACPI/AML code from the guest-side kernel driver whenever
> I need to call fw_cfg_read_blob() ?
> 
> I guess I could implement fw_cfg_scan_dir() using fw_cfg_read_blob():
> 
>   u32 count;
>   size_t  bufsize;
>   void *buf;
>   fw_cfg_read_blob(FW_CFG_FILE_DIR, &count, 0, sizeof(u32));
>   bufsize = sizeof(u32) + count * sizeof(struct fw_cfg_file);
>   buf = kalloc(bufsize);
>   fw_cfg_read_blob(FW_CFG_FILE_DIR, buf, 0, bufsize);
>   ...
>   /* now read all the blob meta-data from buf ... */
> 
> It would be 100% atomic, but since we can safely assume the fw_cfg
> contents never change, it'd be OK.

I meant "wouldn't be 100% atomic", as in "it would be a case of
verify-then-use"...

Sorry,
--Gabriel

> 
> The atomicity of the ACPI version of fw_cfg_read_blob(), picking the
> right endianness for the selector, etc. would have to be done in AML
> within the QEMU host-side patch.
> 
> If you know of anything I can look at for a good ASL example, please
> point it out to me. I'm going to go away now and spend some quality
> time with the ACPI spec :)
> 
> Thanks,
> --Gabriel

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v3 0/4] SysFS driver for QEMU fw_cfg device
  2015-10-03 23:28 [PATCH v3 0/4] SysFS driver for QEMU fw_cfg device Gabriel L. Somlo
                   ` (3 preceding siblings ...)
  2015-10-03 23:28 ` [PATCH v3 4/4] firmware: create directory hierarchy for sysfs fw_cfg entries Gabriel L. Somlo
@ 2015-10-05 10:00 ` Mark Rutland
  2015-10-05 11:48   ` Paolo Bonzini
  2015-10-05 12:40   ` Gabriel L. Somlo
  4 siblings, 2 replies; 26+ messages in thread
From: Mark Rutland @ 2015-10-05 10:00 UTC (permalink / raw)
  To: Gabriel L. Somlo
  Cc: gregkh, paul, galak, will.deacon, agross, zajec5, hanjun.guo,
	catalin.marinas, linux-api, linux-kernel, kernelnewbies,
	matt.fleming, lersek, jordan.l.justen, mst, peter.maydell,
	leif.lindholm, ard.biesheuvel, pbonzini, kraxel, qemu-devel

On Sat, Oct 03, 2015 at 07:28:05PM -0400, Gabriel L. Somlo wrote:
> From: "Gabriel Somlo" <somlo@cmu.edu>
> 
> Allow access to QEMU firmware blobs, passed into the guest VM via
> the fw_cfg device, through SysFS entries. Blob meta-data (e.g. name,
> size, and fw_cfg key), as well as the raw binary blob data may be
> accessed.
> 
> The SysFS access location is /sys/firmware/qemu_fw_cfg/... and was
> selected based on overall similarity to the type of information
> exposed under /sys/firmware/dmi/entries/...

What is the intended use of these?

Some of the keys in the example look like they'd come from other sources
(e.g. the *-tables entries), while others look like kernel/bootloader
configuration options (e.g. etc/boot-fail-wait, bootorder) -- I'm
concerned about redundancy here.

> NEW (since v2): Using ACPI to detect the presence and details of the
> fw_cfg virtual hardware device.
> 
>     Device Tree has been suggested by Ard as a comment on v2 of this
>     patch, but after some deliberation I decided to go with ACPI,
>     since it's supported on both x86 and some (uefi-enabled) versions
>     of aarch64. I really don't see how I'd reasonably use *both* DT (on
>     ARM) *and* ACPI (on x86), and after all I'm mostly concerned with
>     x86, but originally wanted to maximize portability (which is where
>     the register probing in earlier versions came from).

There are defintitely going to be arm64 VMs that don't use ACPI, so we
may need DT support depending on what the intended use is.

I'm not sure I follow what the difficulty with supporting DT in addition
to ACPI is? It looks like all you need is a compatible string and a reg
entry.

Thanks,
Mark.

>     A patch set generating an ACPI device node for qemu's fw_cfg is
>     currently under review on the qemu-devel list:
> 
>     http://lists.nongnu.org/archive/html/qemu-devel/2015-09/msg06946.html
>     (sorry, gmane appears down at the moment...)
> 
> In consequence:
> 
> 	- Patch 1/4 is mostly the same as in v2;
> 	- Patch 2/4 switches device initialization from register
> 	  probing to using ACPI; this is a separate patch only to
> 	  illustrate the transition from probing to ACPI, and I'm
> 	  assuming it will end up squashed on top of patch 1/4 in
> 	  the final version.
> 
> 	- Patches 3/4 and 4/4 add a "human-readable" directory
> 	  hierarchy built from tokenizing fw_cfg blob names into
> 	  '/'-separated components, with symlinks to each 'by_key'
> 	  blob folder (same as in earlier versions). At Greg's
> 	  suggestion I tried to build this folder hierarchy and
> 	  leaf symlinks using udev rules, but so far I haven't been
> 	  successful in figuring that out. If udev turns out to 
> 	  be applicable after all, these two patches can be dropped
> 	  from this series.
> 
> In other words, patches 1 and 2 give us the following "by_key" listing
> of blobs contained in the qemu fw_cfg device (example pulled from a PC
> qemu guest running Fedora 22), with the value of each "name" attribute
> shown on the right:
> 
> $ tree /sys/firmware/qemu_fw_cfg/
> /sys/firmware/qemu_fw_cfg/
> |-- by_key
> |   |-- 32
> |   |   |-- key
> |   |   |-- name			("etc/boot-fail-wait")
> |   |   |-- raw
> |   |   `-- size
> |   |-- 33
> |   |   |-- key
> |   |   |-- name			("etc/smbios/smbios-tables")
> |   |   |-- raw
> |   |   `-- size
> |   |-- 34
> |   |   |-- key
> |   |   |-- name			("etc/smbios/smbios-anchor")
> |   |   |-- raw
> |   |   `-- size
> |   |-- 35
> |   |   |-- key
> |   |   |-- name			("etc/e820")
> |   |   |-- raw
> |   |   `-- size
> |   |-- 36
> |   |   |-- key
> |   |   |-- name			("genroms/kvmvapic.bin")
> |   |   |-- raw
> |   |   `-- size
> |   |-- 37
> |   |   |-- key
> |   |   |-- name			("etc/system-states")
> |   |   |-- raw
> |   |   `-- size
> |   |-- 38
> |   |   |-- key
> |   |   |-- name			("etc/acpi/tables")
> |   |   |-- raw
> |   |   `-- size
> |   |-- 39
> |   |   |-- key
> |   |   |-- name			("etc/table-loader")
> |   |   |-- raw
> |   |   `-- size
> |   |-- 40
> |   |   |-- key
> |   |   |-- name			("etc/tpm/log")
> |   |   |-- raw
> |   |   `-- size
> |   |-- 41
> |   |   |-- key
> |   |   |-- name			("etc/acpi/rsdp")
> |   |   |-- raw
> |   |   `-- size
> |   `-- 42
> |       |-- key
> |       |-- name			("bootorder")
> |       |-- raw
> |       `-- size
> |
> ...
> 
> Additionally, patches 3 and 4 (mostly 4) give us the following
> "user friendly" directory hierarchy as a complement to the above,
> based on tokenizing each blob name into symlink-tipped (sub)directories:
> 
> ...
> |-- by_name
> |   |-- bootorder -> ../by_key/42
> |   |-- etc
> |   |   |-- acpi
> |   |   |   |-- rsdp -> ../../../by_key/41
> |   |   |   `-- tables -> ../../../by_key/38
> |   |   |-- boot-fail-wait -> ../../by_key/32
> |   |   |-- e820 -> ../../by_key/35
> |   |   |-- smbios
> |   |   |   |-- smbios-anchor -> ../../../by_key/34
> |   |   |   `-- smbios-tables -> ../../../by_key/33
> |   |   |-- system-states -> ../../by_key/37
> |   |   |-- table-loader -> ../../by_key/39
> |   |   `-- tpm
> |   |       `-- log -> ../../../by_key/40
> |   `-- genroms
> |       `-- kvmvapic.bin -> ../../by_key/36
> `-- rev
> 
> The trick is to figure out how to replace patches 3 and 4 with a
> udev rule that would read the contents of each "name" attribute,
> and build the "by_name" hierarchy and symlinks in userspace.
> 
> I tried:
> 
> $ udevadm info -a -p /sys/firmware/qemu_fw_cfg/by_key/33
> 
>   looking at device '/firmware/qemu_fw_cfg/by_key/33':
>     KERNEL=="33"
>     SUBSYSTEM==""
>     DRIVER==""
>     ATTR{key}=="33"
>     ATTR{name}=="etc/smbios/smbios-tables"
>     ATTR{size}=="388"
> 
>   looking at parent device '/firmware/qemu_fw_cfg/by_key':
>     KERNELS=="by_key"
>     SUBSYSTEMS==""
>     DRIVERS==""
> 
>   looking at parent device '/firmware/qemu_fw_cfg':
>     KERNELS=="qemu_fw_cfg"
>     SUBSYSTEMS==""
>     DRIVERS==""
>     ATTRS{rev}=="1"
> 
> Then I tried creating a file, /usr/lib/udev/rules.d/99-qemu-fw-cfg.rules
> containing the following line:
> 
> KERNELS=="qemu_fw_cfg", ATTRS{rev}=="1", SYMLINK="%p/%s{name}"
> 
> but NOTHING happens when I insert/remove qemu_fw_cfg.ko.  I also tried:
> 
> KERNELS=="qemu_fw_cfg", ATTRS{rev}=="1", PROGRAM="/foo %k"
> 
> where "/foo" basically did "echo $* > /tmp/bar", but no /tmp/bar file ever
> showed up as a consequence of inserting/removing the qemu_fw_cfg.ko module.
> 
> At this point, I need help figuring out whether udev is really what would
> get the second, user-friendly, "by_name" /sysfs directory tree created,
> and how I'd go about that...
> 
> Thanks much,
>   --Gabriel
> 
> Gabriel Somlo (4):
>   firmware: introduce sysfs driver for QEMU's fw_cfg device
>   firmware: use acpi to detect QEMU fw_cfg device for sysfs fw_cfg
>     driver
>   kobject: export kset_find_obj() for module use
>   firmware: create directory hierarchy for sysfs fw_cfg entries
> 
>  .../ABI/testing/sysfs-firmware-qemu_fw_cfg         | 213 ++++++++
>  drivers/firmware/Kconfig                           |  10 +
>  drivers/firmware/Makefile                          |   1 +
>  drivers/firmware/qemu_fw_cfg.c                     | 575 +++++++++++++++++++++
>  lib/kobject.c                                      |   1 +
>  5 files changed, 800 insertions(+)
>  create mode 100644 Documentation/ABI/testing/sysfs-firmware-qemu_fw_cfg
>  create mode 100644 drivers/firmware/qemu_fw_cfg.c
> 
> -- 
> 2.4.3
> 

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v3 0/4] SysFS driver for QEMU fw_cfg device
  2015-10-05 10:00 ` [PATCH v3 0/4] SysFS driver for QEMU fw_cfg device Mark Rutland
@ 2015-10-05 11:48   ` Paolo Bonzini
  2015-10-05 12:23     ` Mark Rutland
  2015-10-05 12:40   ` Gabriel L. Somlo
  1 sibling, 1 reply; 26+ messages in thread
From: Paolo Bonzini @ 2015-10-05 11:48 UTC (permalink / raw)
  To: Mark Rutland, Gabriel L. Somlo
  Cc: gregkh, paul, galak, will.deacon, agross, zajec5, hanjun.guo,
	catalin.marinas, linux-api, linux-kernel, kernelnewbies,
	matt.fleming, lersek, jordan.l.justen, mst, peter.maydell,
	leif.lindholm, ard.biesheuvel, kraxel, qemu-devel



On 05/10/2015 12:00, Mark Rutland wrote:
> Some of the keys in the example look like they'd come from other sources
> (e.g. the *-tables entries), while others look like kernel/bootloader
> configuration options (e.g. etc/boot-fail-wait, bootorder) -- I'm
> concerned about redundancy here.

The redundancy is because the firmware and the bootloader actually
_consume_ these fw_cfg strings to produce the others (the ACPI tables,
the kernel configuration options).

On the other hand, hiding some strings just because they ought to have
been consumed already makes little sense.

Paolo

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v3 0/4] SysFS driver for QEMU fw_cfg device
  2015-10-05 11:48   ` Paolo Bonzini
@ 2015-10-05 12:23     ` Mark Rutland
  2015-10-05 12:43       ` Gabriel L. Somlo
  0 siblings, 1 reply; 26+ messages in thread
From: Mark Rutland @ 2015-10-05 12:23 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Gabriel L. Somlo, gregkh, paul, galak, will.deacon, agross,
	zajec5, hanjun.guo, catalin.marinas, linux-api, linux-kernel,
	kernelnewbies, matt.fleming, lersek, jordan.l.justen, mst,
	peter.maydell, leif.lindholm, ard.biesheuvel, kraxel, qemu-devel

On Mon, Oct 05, 2015 at 01:48:52PM +0200, Paolo Bonzini wrote:
> 
> 
> On 05/10/2015 12:00, Mark Rutland wrote:
> > Some of the keys in the example look like they'd come from other sources
> > (e.g. the *-tables entries), while others look like kernel/bootloader
> > configuration options (e.g. etc/boot-fail-wait, bootorder) -- I'm
> > concerned about redundancy here.
> 
> The redundancy is because the firmware and the bootloader actually
> _consume_ these fw_cfg strings to produce the others (the ACPI tables,
> the kernel configuration options).
> 
> On the other hand, hiding some strings just because they ought to have
> been consumed already makes little sense.

Sure. However, I'm concerned that providing redundant interfaces for
those could lead to people grabbing information from here (because it's
convenient) rather than the existing canonical locations, which means we
get more software that works on fewer systems for no good reason.

What I couldn't figure out was what _additional_ information this
provided; it looked like a mixed bag of details we could already get
from disparate sources. If that's all it does, then it seems to me like
it doesn't add any benefit and potentially makes things worse.

So what do we get from this interface that we cannot get elsewhere, and
why is this the best way of exposing it?

Mark.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v3 0/4] SysFS driver for QEMU fw_cfg device
  2015-10-05 10:00 ` [PATCH v3 0/4] SysFS driver for QEMU fw_cfg device Mark Rutland
  2015-10-05 11:48   ` Paolo Bonzini
@ 2015-10-05 12:40   ` Gabriel L. Somlo
  2015-10-05 12:50     ` Peter Maydell
  2015-10-05 13:05     ` Mark Rutland
  1 sibling, 2 replies; 26+ messages in thread
From: Gabriel L. Somlo @ 2015-10-05 12:40 UTC (permalink / raw)
  To: Mark Rutland
  Cc: gregkh, paul, galak, will.deacon, agross, zajec5, hanjun.guo,
	catalin.marinas, linux-api, linux-kernel, kernelnewbies,
	matt.fleming, lersek, jordan.l.justen, mst, peter.maydell,
	leif.lindholm, ard.biesheuvel, pbonzini, kraxel, qemu-devel

On Mon, Oct 05, 2015 at 11:00:36AM +0100, Mark Rutland wrote:
> On Sat, Oct 03, 2015 at 07:28:05PM -0400, Gabriel L. Somlo wrote:
> > From: "Gabriel Somlo" <somlo@cmu.edu>
> > 
> > Allow access to QEMU firmware blobs, passed into the guest VM via
> > the fw_cfg device, through SysFS entries. Blob meta-data (e.g. name,
> > size, and fw_cfg key), as well as the raw binary blob data may be
> > accessed.
> > 
> > The SysFS access location is /sys/firmware/qemu_fw_cfg/... and was
> > selected based on overall similarity to the type of information
> > exposed under /sys/firmware/dmi/entries/...
> 
> What is the intended use of these?
> 
> Some of the keys in the example look like they'd come from other sources
> (e.g. the *-tables entries), while others look like kernel/bootloader
> configuration options (e.g. etc/boot-fail-wait, bootorder) -- I'm
> concerned about redundancy here.

Paolo already answered that (more eloquently than I would have) so I'll
leave it at that, for now...

> 
> > NEW (since v2): Using ACPI to detect the presence and details of the
> > fw_cfg virtual hardware device.
> > 
> >     Device Tree has been suggested by Ard as a comment on v2 of this
> >     patch, but after some deliberation I decided to go with ACPI,
> >     since it's supported on both x86 and some (uefi-enabled) versions
> >     of aarch64. I really don't see how I'd reasonably use *both* DT (on
> >     ARM) *and* ACPI (on x86), and after all I'm mostly concerned with
> >     x86, but originally wanted to maximize portability (which is where
> >     the register probing in earlier versions came from).
> 
> There are defintitely going to be arm64 VMs that don't use ACPI, so we
> may need DT support depending on what the intended use is.
> 
> I'm not sure I follow what the difficulty with supporting DT in addition
> to ACPI is? It looks like all you need is a compatible string and a reg
> entry.

Bearing in mind that I have almost no experience with arm:

I started out by probing all possible port-io and mmio locations where
fw_cfg registers might have been found, from a "classic" module_init
method.

Arm has DT, which as far as I understand will answer the following two
questions: 1. Do I have fw_cfg ? 2. If yes, what address range does it use ?
So that I could continue using a classic module_init, but won't need
to probe for the device.

PC (my primary architecture, the one I actually care about) does not
have DT. If I want to share the same code, I can't probe, so if I try
DT and don't find fw_cfg there (or somehow DT is no-op-ed out because
I'm on a PC guest), I could somehow look it up in ACPI the same way
(i.e., use ACPI as sort of a stand-in for DT).

But all ACPI-enabled drivers I could find use dedicated macros (i.e.
no more classic module_init() and module_exit(), but rather
module_acpi_driver() with .add and .remove methods on an acpi_driver
object, etc.) Not sure how I'd glue DT back into something like that.

In addition, Michael's comment earlier in the thread suggests that
even my current acpi version isn't sufficiently "orthodox" w.r.t.
ACPI, and I should be providing the hardware access routine as
an ACPI/AML routine, to avoid race conditions with the rest of ACPI,
and for encapsulation. I.e. it's even rude to use the fw_cfg node's
ACPI _CRS method (the part where I'd be treating it like a DT stand-in
only to query fw_cfg's hardware specifics).

So far, all the information I've been able to pull together points
away from a dual DT + ACPI all-in-one solution for fw_cfg. If you know
of an example where that's done in an acceptable way, please let
me know so I can use it for inspiration...

Thanks much,
--Gabriel

> 
> >     A patch set generating an ACPI device node for qemu's fw_cfg is
> >     currently under review on the qemu-devel list:
> > 
> >     http://lists.nongnu.org/archive/html/qemu-devel/2015-09/msg06946.html
> >     (sorry, gmane appears down at the moment...)
> > 
> > In consequence:
> > 
> > 	- Patch 1/4 is mostly the same as in v2;
> > 	- Patch 2/4 switches device initialization from register
> > 	  probing to using ACPI; this is a separate patch only to
> > 	  illustrate the transition from probing to ACPI, and I'm
> > 	  assuming it will end up squashed on top of patch 1/4 in
> > 	  the final version.
> > 
> > 	- Patches 3/4 and 4/4 add a "human-readable" directory
> > 	  hierarchy built from tokenizing fw_cfg blob names into
> > 	  '/'-separated components, with symlinks to each 'by_key'
> > 	  blob folder (same as in earlier versions). At Greg's
> > 	  suggestion I tried to build this folder hierarchy and
> > 	  leaf symlinks using udev rules, but so far I haven't been
> > 	  successful in figuring that out. If udev turns out to 
> > 	  be applicable after all, these two patches can be dropped
> > 	  from this series.
> > 
> > In other words, patches 1 and 2 give us the following "by_key" listing
> > of blobs contained in the qemu fw_cfg device (example pulled from a PC
> > qemu guest running Fedora 22), with the value of each "name" attribute
> > shown on the right:
> > 
> > $ tree /sys/firmware/qemu_fw_cfg/
> > /sys/firmware/qemu_fw_cfg/
> > |-- by_key
> > |   |-- 32
> > |   |   |-- key
> > |   |   |-- name			("etc/boot-fail-wait")
> > |   |   |-- raw
> > |   |   `-- size
> > |   |-- 33
> > |   |   |-- key
> > |   |   |-- name			("etc/smbios/smbios-tables")
> > |   |   |-- raw
> > |   |   `-- size
> > |   |-- 34
> > |   |   |-- key
> > |   |   |-- name			("etc/smbios/smbios-anchor")
> > |   |   |-- raw
> > |   |   `-- size
> > |   |-- 35
> > |   |   |-- key
> > |   |   |-- name			("etc/e820")
> > |   |   |-- raw
> > |   |   `-- size
> > |   |-- 36
> > |   |   |-- key
> > |   |   |-- name			("genroms/kvmvapic.bin")
> > |   |   |-- raw
> > |   |   `-- size
> > |   |-- 37
> > |   |   |-- key
> > |   |   |-- name			("etc/system-states")
> > |   |   |-- raw
> > |   |   `-- size
> > |   |-- 38
> > |   |   |-- key
> > |   |   |-- name			("etc/acpi/tables")
> > |   |   |-- raw
> > |   |   `-- size
> > |   |-- 39
> > |   |   |-- key
> > |   |   |-- name			("etc/table-loader")
> > |   |   |-- raw
> > |   |   `-- size
> > |   |-- 40
> > |   |   |-- key
> > |   |   |-- name			("etc/tpm/log")
> > |   |   |-- raw
> > |   |   `-- size
> > |   |-- 41
> > |   |   |-- key
> > |   |   |-- name			("etc/acpi/rsdp")
> > |   |   |-- raw
> > |   |   `-- size
> > |   `-- 42
> > |       |-- key
> > |       |-- name			("bootorder")
> > |       |-- raw
> > |       `-- size
> > |
> > ...
> > 
> > Additionally, patches 3 and 4 (mostly 4) give us the following
> > "user friendly" directory hierarchy as a complement to the above,
> > based on tokenizing each blob name into symlink-tipped (sub)directories:
> > 
> > ...
> > |-- by_name
> > |   |-- bootorder -> ../by_key/42
> > |   |-- etc
> > |   |   |-- acpi
> > |   |   |   |-- rsdp -> ../../../by_key/41
> > |   |   |   `-- tables -> ../../../by_key/38
> > |   |   |-- boot-fail-wait -> ../../by_key/32
> > |   |   |-- e820 -> ../../by_key/35
> > |   |   |-- smbios
> > |   |   |   |-- smbios-anchor -> ../../../by_key/34
> > |   |   |   `-- smbios-tables -> ../../../by_key/33
> > |   |   |-- system-states -> ../../by_key/37
> > |   |   |-- table-loader -> ../../by_key/39
> > |   |   `-- tpm
> > |   |       `-- log -> ../../../by_key/40
> > |   `-- genroms
> > |       `-- kvmvapic.bin -> ../../by_key/36
> > `-- rev
> > 
> > The trick is to figure out how to replace patches 3 and 4 with a
> > udev rule that would read the contents of each "name" attribute,
> > and build the "by_name" hierarchy and symlinks in userspace.
> > 
> > I tried:
> > 
> > $ udevadm info -a -p /sys/firmware/qemu_fw_cfg/by_key/33
> > 
> >   looking at device '/firmware/qemu_fw_cfg/by_key/33':
> >     KERNEL=="33"
> >     SUBSYSTEM==""
> >     DRIVER==""
> >     ATTR{key}=="33"
> >     ATTR{name}=="etc/smbios/smbios-tables"
> >     ATTR{size}=="388"
> > 
> >   looking at parent device '/firmware/qemu_fw_cfg/by_key':
> >     KERNELS=="by_key"
> >     SUBSYSTEMS==""
> >     DRIVERS==""
> > 
> >   looking at parent device '/firmware/qemu_fw_cfg':
> >     KERNELS=="qemu_fw_cfg"
> >     SUBSYSTEMS==""
> >     DRIVERS==""
> >     ATTRS{rev}=="1"
> > 
> > Then I tried creating a file, /usr/lib/udev/rules.d/99-qemu-fw-cfg.rules
> > containing the following line:
> > 
> > KERNELS=="qemu_fw_cfg", ATTRS{rev}=="1", SYMLINK="%p/%s{name}"
> > 
> > but NOTHING happens when I insert/remove qemu_fw_cfg.ko.  I also tried:
> > 
> > KERNELS=="qemu_fw_cfg", ATTRS{rev}=="1", PROGRAM="/foo %k"
> > 
> > where "/foo" basically did "echo $* > /tmp/bar", but no /tmp/bar file ever
> > showed up as a consequence of inserting/removing the qemu_fw_cfg.ko module.
> > 
> > At this point, I need help figuring out whether udev is really what would
> > get the second, user-friendly, "by_name" /sysfs directory tree created,
> > and how I'd go about that...
> > 
> > Thanks much,
> >   --Gabriel
> > 
> > Gabriel Somlo (4):
> >   firmware: introduce sysfs driver for QEMU's fw_cfg device
> >   firmware: use acpi to detect QEMU fw_cfg device for sysfs fw_cfg
> >     driver
> >   kobject: export kset_find_obj() for module use
> >   firmware: create directory hierarchy for sysfs fw_cfg entries
> > 
> >  .../ABI/testing/sysfs-firmware-qemu_fw_cfg         | 213 ++++++++
> >  drivers/firmware/Kconfig                           |  10 +
> >  drivers/firmware/Makefile                          |   1 +
> >  drivers/firmware/qemu_fw_cfg.c                     | 575 +++++++++++++++++++++
> >  lib/kobject.c                                      |   1 +
> >  5 files changed, 800 insertions(+)
> >  create mode 100644 Documentation/ABI/testing/sysfs-firmware-qemu_fw_cfg
> >  create mode 100644 drivers/firmware/qemu_fw_cfg.c
> > 
> > -- 
> > 2.4.3
> > 

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v3 0/4] SysFS driver for QEMU fw_cfg device
  2015-10-05 12:23     ` Mark Rutland
@ 2015-10-05 12:43       ` Gabriel L. Somlo
  2015-10-05 12:56         ` Mark Rutland
  0 siblings, 1 reply; 26+ messages in thread
From: Gabriel L. Somlo @ 2015-10-05 12:43 UTC (permalink / raw)
  To: Mark Rutland
  Cc: Paolo Bonzini, gregkh, paul, galak, will.deacon, agross, zajec5,
	hanjun.guo, catalin.marinas, linux-api, linux-kernel,
	kernelnewbies, matt.fleming, lersek, jordan.l.justen, mst,
	peter.maydell, leif.lindholm, ard.biesheuvel, kraxel, qemu-devel

On Mon, Oct 05, 2015 at 01:23:33PM +0100, Mark Rutland wrote:
> On Mon, Oct 05, 2015 at 01:48:52PM +0200, Paolo Bonzini wrote:
> > 
> > 
> > On 05/10/2015 12:00, Mark Rutland wrote:
> > > Some of the keys in the example look like they'd come from other sources
> > > (e.g. the *-tables entries), while others look like kernel/bootloader
> > > configuration options (e.g. etc/boot-fail-wait, bootorder) -- I'm
> > > concerned about redundancy here.
> > 
> > The redundancy is because the firmware and the bootloader actually
> > _consume_ these fw_cfg strings to produce the others (the ACPI tables,
> > the kernel configuration options).
> > 
> > On the other hand, hiding some strings just because they ought to have
> > been consumed already makes little sense.
> 
> Sure. However, I'm concerned that providing redundant interfaces for
> those could lead to people grabbing information from here (because it's
> convenient) rather than the existing canonical locations, which means we
> get more software that works on fewer systems for no good reason.
> 
> What I couldn't figure out was what _additional_ information this
> provided; it looked like a mixed bag of details we could already get
> from disparate sources. If that's all it does, then it seems to me like
> it doesn't add any benefit and potentially makes things worse.
> 
> So what do we get from this interface that we cannot get elsewhere, and
> why is this the best way of exposing it?

Starting with qemu 2.4, it is possible to insert arbitrary named
blobs into fw_cfg from the qemu command line. *Those* entries
might be interesting to userspace, which is why it might be handy
to access to fw_cfg blobs in general.

Thanks,
--Gabriel

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v3 0/4] SysFS driver for QEMU fw_cfg device
  2015-10-05 12:40   ` Gabriel L. Somlo
@ 2015-10-05 12:50     ` Peter Maydell
  2015-10-05 13:13       ` Gabriel L. Somlo
  2015-10-05 13:18       ` Paolo Bonzini
  2015-10-05 13:05     ` Mark Rutland
  1 sibling, 2 replies; 26+ messages in thread
From: Peter Maydell @ 2015-10-05 12:50 UTC (permalink / raw)
  To: Gabriel L. Somlo
  Cc: Mark Rutland, gregkh, paul, Kumar Gala, Will Deacon, agross,
	zajec5, Hanjun Guo, Catalin Marinas, linux-api,
	lkml - Kernel Mailing List, kernelnewbies, Matt Fleming,
	Laszlo Ersek, Jordan Justen, Michael S. Tsirkin, Leif Lindholm,
	Ard Biesheuvel, Paolo Bonzini, Gerd Hoffmann, QEMU Developers

On 5 October 2015 at 13:40, Gabriel L. Somlo <somlo@cmu.edu> wrote:
> In addition, Michael's comment earlier in the thread suggests that
> even my current acpi version isn't sufficiently "orthodox" w.r.t.
> ACPI, and I should be providing the hardware access routine as
> an ACPI/AML routine, to avoid race conditions with the rest of ACPI,
> and for encapsulation. I.e. it's even rude to use the fw_cfg node's
> ACPI _CRS method (the part where I'd be treating it like a DT stand-in
> only to query fw_cfg's hardware specifics).

If you want to try to support "firmware might also be reading
fw_cfg at the same time as the kernel" this is a (painful)
problem regardless of how the kernel figures out whether a
fw_cfg device is present. I had assumed that one of the design
assumptions of this series was that firmware would only
read the fw_cfg before booting the guest kernel and never touch
it afterwards. If it might touch it later then letting the
guest kernel also mess with fw_cfg seems like a really bad idea.

thanks
-- PMM

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v3 0/4] SysFS driver for QEMU fw_cfg device
  2015-10-05 12:43       ` Gabriel L. Somlo
@ 2015-10-05 12:56         ` Mark Rutland
  2015-10-05 13:21           ` Gabriel L. Somlo
  0 siblings, 1 reply; 26+ messages in thread
From: Mark Rutland @ 2015-10-05 12:56 UTC (permalink / raw)
  To: Gabriel L. Somlo
  Cc: Paolo Bonzini, gregkh, paul, galak, will.deacon, agross, zajec5,
	hanjun.guo, catalin.marinas, linux-api, linux-kernel,
	kernelnewbies, matt.fleming, lersek, jordan.l.justen, mst,
	peter.maydell, leif.lindholm, ard.biesheuvel, kraxel, qemu-devel

On Mon, Oct 05, 2015 at 08:43:46AM -0400, Gabriel L. Somlo wrote:
> On Mon, Oct 05, 2015 at 01:23:33PM +0100, Mark Rutland wrote:
> > On Mon, Oct 05, 2015 at 01:48:52PM +0200, Paolo Bonzini wrote:
> > > 
> > > 
> > > On 05/10/2015 12:00, Mark Rutland wrote:
> > > > Some of the keys in the example look like they'd come from other sources
> > > > (e.g. the *-tables entries), while others look like kernel/bootloader
> > > > configuration options (e.g. etc/boot-fail-wait, bootorder) -- I'm
> > > > concerned about redundancy here.
> > > 
> > > The redundancy is because the firmware and the bootloader actually
> > > _consume_ these fw_cfg strings to produce the others (the ACPI tables,
> > > the kernel configuration options).
> > > 
> > > On the other hand, hiding some strings just because they ought to have
> > > been consumed already makes little sense.
> > 
> > Sure. However, I'm concerned that providing redundant interfaces for
> > those could lead to people grabbing information from here (because it's
> > convenient) rather than the existing canonical locations, which means we
> > get more software that works on fewer systems for no good reason.
> > 
> > What I couldn't figure out was what _additional_ information this
> > provided; it looked like a mixed bag of details we could already get
> > from disparate sources. If that's all it does, then it seems to me like
> > it doesn't add any benefit and potentially makes things worse.
> > 
> > So what do we get from this interface that we cannot get elsewhere, and
> > why is this the best way of exposing it?
> 
> Starting with qemu 2.4, it is possible to insert arbitrary named
> blobs into fw_cfg from the qemu command line. *Those* entries
> might be interesting to userspace, which is why it might be handy
> to access to fw_cfg blobs in general.

So this is a mechanism to pass arbitrary key:value pairs to a guest
userspace? What would those be used for, and why would this be the
correct location for that?

How do we avoid clashes between user-selected names and those we need to
pass actual FW data?

Mark.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v3 0/4] SysFS driver for QEMU fw_cfg device
  2015-10-05 12:40   ` Gabriel L. Somlo
  2015-10-05 12:50     ` Peter Maydell
@ 2015-10-05 13:05     ` Mark Rutland
  2015-10-06  7:18       ` Laszlo Ersek
  1 sibling, 1 reply; 26+ messages in thread
From: Mark Rutland @ 2015-10-05 13:05 UTC (permalink / raw)
  To: Gabriel L. Somlo
  Cc: gregkh, paul, galak, will.deacon, agross, zajec5, hanjun.guo,
	catalin.marinas, linux-api, linux-kernel, kernelnewbies,
	matt.fleming, lersek, jordan.l.justen, mst, peter.maydell,
	leif.lindholm, ard.biesheuvel, pbonzini, kraxel, qemu-devel

> > I'm not sure I follow what the difficulty with supporting DT in addition
> > to ACPI is? It looks like all you need is a compatible string and a reg
> > entry.
> 
> Bearing in mind that I have almost no experience with arm:
> 
> I started out by probing all possible port-io and mmio locations where
> fw_cfg registers might have been found, from a "classic" module_init
> method.
> 
> Arm has DT, which as far as I understand will answer the following two
> questions: 1. Do I have fw_cfg ? 2. If yes, what address range does it use ?
> So that I could continue using a classic module_init, but won't need
> to probe for the device.
> 
> PC (my primary architecture, the one I actually care about) does not
> have DT. If I want to share the same code, I can't probe, so if I try
> DT and don't find fw_cfg there (or somehow DT is no-op-ed out because
> I'm on a PC guest), I could somehow look it up in ACPI the same way
> (i.e., use ACPI as sort of a stand-in for DT).

I'd imagine that it's simple to have something in your probe path like:

if (pdev->dev.of_node)
	parse_dt(pdev);
else
	parse_acpi(pdev);

> But all ACPI-enabled drivers I could find use dedicated macros (i.e.
> no more classic module_init() and module_exit(), but rather
> module_acpi_driver() with .add and .remove methods on an acpi_driver
> object, etc.) Not sure how I'd glue DT back into something like that.

You don't have to use those macros, and can simply use the classic
module_{init,exit} functions, calling the requisite acpi driver
registration functions at module {init,exit} time.

> In addition, Michael's comment earlier in the thread suggests that
> even my current acpi version isn't sufficiently "orthodox" w.r.t.
> ACPI, and I should be providing the hardware access routine as
> an ACPI/AML routine, to avoid race conditions with the rest of ACPI,
> and for encapsulation. I.e. it's even rude to use the fw_cfg node's
> ACPI _CRS method (the part where I'd be treating it like a DT stand-in
> only to query fw_cfg's hardware specifics).

As Peter stated, this sounds very much like it rules out sharing the
interface with FW generally (and is certainly scary).

> So far, all the information I've been able to pull together points
> away from a dual DT + ACPI all-in-one solution for fw_cfg. If you know
> of an example where that's done in an acceptable way, please let
> me know so I can use it for inspiration...

I'm not immediately aware, but I would imagine you could search for
files that had both an of_match_table and a acpi_bus_register_driver
call.

Thanks,
Mark.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v3 0/4] SysFS driver for QEMU fw_cfg device
  2015-10-05 12:50     ` Peter Maydell
@ 2015-10-05 13:13       ` Gabriel L. Somlo
  2015-10-05 13:18       ` Paolo Bonzini
  1 sibling, 0 replies; 26+ messages in thread
From: Gabriel L. Somlo @ 2015-10-05 13:13 UTC (permalink / raw)
  To: Peter Maydell
  Cc: Mark Rutland, gregkh, paul, Kumar Gala, Will Deacon, agross,
	zajec5, Hanjun Guo, Catalin Marinas, linux-api,
	lkml - Kernel Mailing List, kernelnewbies, Matt Fleming,
	Laszlo Ersek, Jordan Justen, Michael S. Tsirkin, Leif Lindholm,
	Ard Biesheuvel, Paolo Bonzini, Gerd Hoffmann, QEMU Developers

On Mon, Oct 05, 2015 at 01:50:47PM +0100, Peter Maydell wrote:
> On 5 October 2015 at 13:40, Gabriel L. Somlo <somlo@cmu.edu> wrote:
> > In addition, Michael's comment earlier in the thread suggests that
> > even my current acpi version isn't sufficiently "orthodox" w.r.t.
> > ACPI, and I should be providing the hardware access routine as
> > an ACPI/AML routine, to avoid race conditions with the rest of ACPI,
> > and for encapsulation. I.e. it's even rude to use the fw_cfg node's
> > ACPI _CRS method (the part where I'd be treating it like a DT stand-in
> > only to query fw_cfg's hardware specifics).
> 
> If you want to try to support "firmware might also be reading
> fw_cfg at the same time as the kernel" this is a (painful)
> problem regardless of how the kernel figures out whether a
> fw_cfg device is present. I had assumed that one of the design
> assumptions of this series was that firmware would only
> read the fw_cfg before booting the guest kernel and never touch
> it afterwards. If it might touch it later then letting the
> guest kernel also mess with fw_cfg seems like a really bad idea.

I don't know of any case where firmware and kernel might race each
other to access fw_cfg.

The issue AFAICT is whether it's safe (future-proof) to rely on
parsing _CRS for the fw_cfg i/o access information, or whether
such logic could be rendered obsolete by potential future updates
to fw_cfg's _CRS. If I "outsource" the fw_cfg_dump_blob_by_key()
functionality entirely to an ACPI method, my kernel driver won't
have to worry about keeping up with said future updates.

On the down-side, that means the kernel driver will be ACPI or
nothing (but I'm OK with that, at my curent level of understanding :)

Thanks,
--Gabriel

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v3 0/4] SysFS driver for QEMU fw_cfg device
  2015-10-05 12:50     ` Peter Maydell
  2015-10-05 13:13       ` Gabriel L. Somlo
@ 2015-10-05 13:18       ` Paolo Bonzini
  2015-11-04 20:48         ` Gabriel L. Somlo
  1 sibling, 1 reply; 26+ messages in thread
From: Paolo Bonzini @ 2015-10-05 13:18 UTC (permalink / raw)
  To: Peter Maydell, Gabriel L. Somlo
  Cc: Mark Rutland, gregkh, paul, Kumar Gala, Will Deacon, agross,
	zajec5, Hanjun Guo, Catalin Marinas, linux-api,
	lkml - Kernel Mailing List, kernelnewbies, Matt Fleming,
	Laszlo Ersek, Jordan Justen, Michael S. Tsirkin, Leif Lindholm,
	Ard Biesheuvel, Gerd Hoffmann, QEMU Developers



On 05/10/2015 14:50, Peter Maydell wrote:
> If you want to try to support "firmware might also be reading
> fw_cfg at the same time as the kernel" this is a (painful)
> problem regardless of how the kernel figures out whether a
> fw_cfg device is present. I had assumed that one of the design
> assumptions of this series was that firmware would only
> read the fw_cfg before booting the guest kernel and never touch
> it afterwards. If it might touch it later then letting the
> guest kernel also mess with fw_cfg seems like a really bad idea.

The idea of tinkering with fw_cfg from the AML code (DSDT/SSDT) has been
proposed many times, and always dropped.  One of the reasons was that
the OS could have a driver for fw_cfg.

So I think that we can define the QEMU0002 id as owned by the OSPM,
similar to the various standard ACPI ids that are usually found in the
x86 world (e.g. PNP0B00 is a mc146818 RTC, PNP0303 is an 8042 keyboard
controller, PNP0501 is a 16550 or similar UART, and so on).  This
basically sanctions _CRS as the way to pass information from the
firmware to the OSPM, also similarly to those standard PNP ids.

Paolo

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v3 0/4] SysFS driver for QEMU fw_cfg device
  2015-10-05 12:56         ` Mark Rutland
@ 2015-10-05 13:21           ` Gabriel L. Somlo
  0 siblings, 0 replies; 26+ messages in thread
From: Gabriel L. Somlo @ 2015-10-05 13:21 UTC (permalink / raw)
  To: Mark Rutland
  Cc: Paolo Bonzini, gregkh, paul, galak, will.deacon, agross, zajec5,
	hanjun.guo, catalin.marinas, linux-api, linux-kernel,
	kernelnewbies, matt.fleming, lersek, jordan.l.justen, mst,
	peter.maydell, leif.lindholm, ard.biesheuvel, kraxel, qemu-devel

On Mon, Oct 05, 2015 at 01:56:47PM +0100, Mark Rutland wrote:
> On Mon, Oct 05, 2015 at 08:43:46AM -0400, Gabriel L. Somlo wrote:
> > On Mon, Oct 05, 2015 at 01:23:33PM +0100, Mark Rutland wrote:
> > > On Mon, Oct 05, 2015 at 01:48:52PM +0200, Paolo Bonzini wrote:
> > > > 
> > > > 
> > > > On 05/10/2015 12:00, Mark Rutland wrote:
> > > > > Some of the keys in the example look like they'd come from other sources
> > > > > (e.g. the *-tables entries), while others look like kernel/bootloader
> > > > > configuration options (e.g. etc/boot-fail-wait, bootorder) -- I'm
> > > > > concerned about redundancy here.
> > > > 
> > > > The redundancy is because the firmware and the bootloader actually
> > > > _consume_ these fw_cfg strings to produce the others (the ACPI tables,
> > > > the kernel configuration options).
> > > > 
> > > > On the other hand, hiding some strings just because they ought to have
> > > > been consumed already makes little sense.
> > > 
> > > Sure. However, I'm concerned that providing redundant interfaces for
> > > those could lead to people grabbing information from here (because it's
> > > convenient) rather than the existing canonical locations, which means we
> > > get more software that works on fewer systems for no good reason.
> > > 
> > > What I couldn't figure out was what _additional_ information this
> > > provided; it looked like a mixed bag of details we could already get
> > > from disparate sources. If that's all it does, then it seems to me like
> > > it doesn't add any benefit and potentially makes things worse.
> > > 
> > > So what do we get from this interface that we cannot get elsewhere, and
> > > why is this the best way of exposing it?
> > 
> > Starting with qemu 2.4, it is possible to insert arbitrary named
> > blobs into fw_cfg from the qemu command line. *Those* entries
> > might be interesting to userspace, which is why it might be handy
> > to access to fw_cfg blobs in general.
> 
> So this is a mechanism to pass arbitrary key:value pairs to a guest
> userspace? What would those be used for, and why would this be the
> correct location for that?

Yes to arbitrary host->guest arbitrary key:value pairs.
fw_cfg because it's asynchronous (host supplies the data at guest
start time, and no longer has to worry about whether and when guests
may or may not start some sort of agent in order to be able to accept
connections, etc); also because it's guest-os agnostic (no
piggy-backing on e.g. kernel command line). Drivers to make data
available to guest userspace can be written for any guest OS.

> How do we avoid clashes between user-selected names and those we need to
> pass actual FW data?

Internally supplied blobs (by QEMU) meant for the firmware are, by
convention, prefixed with "/etc/...". Command-line blobs are expected
to use "opt/...". QEMU issues a warning if a name is used on the
command line that doesn't begin with 'opt/'.

Thanks,
--Gabriel

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v3 0/4] SysFS driver for QEMU fw_cfg device
  2015-10-05 13:05     ` Mark Rutland
@ 2015-10-06  7:18       ` Laszlo Ersek
  0 siblings, 0 replies; 26+ messages in thread
From: Laszlo Ersek @ 2015-10-06  7:18 UTC (permalink / raw)
  To: Mark Rutland, Gabriel L. Somlo
  Cc: gregkh, paul, galak, will.deacon, agross, zajec5, hanjun.guo,
	catalin.marinas, linux-api, linux-kernel, kernelnewbies,
	matt.fleming, jordan.l.justen, mst, peter.maydell, leif.lindholm,
	ard.biesheuvel, pbonzini, kraxel, qemu-devel

On 10/05/15 15:05, Mark Rutland wrote:
>>> I'm not sure I follow what the difficulty with supporting DT in addition
>>> to ACPI is? It looks like all you need is a compatible string and a reg
>>> entry.
>>
>> Bearing in mind that I have almost no experience with arm:
>>
>> I started out by probing all possible port-io and mmio locations where
>> fw_cfg registers might have been found, from a "classic" module_init
>> method.
>>
>> Arm has DT, which as far as I understand will answer the following two
>> questions: 1. Do I have fw_cfg ? 2. If yes, what address range does it use ?
>> So that I could continue using a classic module_init, but won't need
>> to probe for the device.
>>
>> PC (my primary architecture, the one I actually care about) does not
>> have DT. If I want to share the same code, I can't probe, so if I try
>> DT and don't find fw_cfg there (or somehow DT is no-op-ed out because
>> I'm on a PC guest), I could somehow look it up in ACPI the same way
>> (i.e., use ACPI as sort of a stand-in for DT).
> 
> I'd imagine that it's simple to have something in your probe path like:
> 
> if (pdev->dev.of_node)
> 	parse_dt(pdev);
> else
> 	parse_acpi(pdev);
> 
>> But all ACPI-enabled drivers I could find use dedicated macros (i.e.
>> no more classic module_init() and module_exit(), but rather
>> module_acpi_driver() with .add and .remove methods on an acpi_driver
>> object, etc.) Not sure how I'd glue DT back into something like that.
> 
> You don't have to use those macros, and can simply use the classic
> module_{init,exit} functions, calling the requisite acpi driver
> registration functions at module {init,exit} time.
> 
>> In addition, Michael's comment earlier in the thread suggests that
>> even my current acpi version isn't sufficiently "orthodox" w.r.t.
>> ACPI, and I should be providing the hardware access routine as
>> an ACPI/AML routine, to avoid race conditions with the rest of ACPI,
>> and for encapsulation. I.e. it's even rude to use the fw_cfg node's
>> ACPI _CRS method (the part where I'd be treating it like a DT stand-in
>> only to query fw_cfg's hardware specifics).
> 
> As Peter stated, this sounds very much like it rules out sharing the
> interface with FW generally (and is certainly scary).
> 
>> So far, all the information I've been able to pull together points
>> away from a dual DT + ACPI all-in-one solution for fw_cfg. If you know
>> of an example where that's done in an acceptable way, please let
>> me know so I can use it for inspiration...
> 
> I'm not immediately aware, but I would imagine you could search for
> files that had both an of_match_table and a acpi_bus_register_driver
> call.

One file that I think is an example for this (and I have looked at
before) is: "drivers/virtio/virtio_mmio.c".

Virtio-mmio is supposed to be enumerable in both ACPI and DT virtual
machines. For the QEMU side, grep QEMU for "LNRO0005" vs. "virtio,mmio".

Thanks
Laszlo


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Qemu-devel] [PATCH v3 1/4] firmware: introduce sysfs driver for QEMU's fw_cfg device
  2015-10-03 23:28 ` [PATCH v3 1/4] firmware: introduce sysfs driver for QEMU's " Gabriel L. Somlo
  2015-10-04  1:34   ` kbuild test robot
@ 2015-10-06  8:40   ` Stefan Hajnoczi
  2015-10-06 12:53   ` Laszlo Ersek
  2015-10-06 17:54   ` Andy Lutomirski
  3 siblings, 0 replies; 26+ messages in thread
From: Stefan Hajnoczi @ 2015-10-06  8:40 UTC (permalink / raw)
  To: Gabriel L. Somlo
  Cc: gregkh, paul, galak, will.deacon, agross, mark.rutland, zajec5,
	hanjun.guo, catalin.marinas, linux-api, linux-kernel,
	peter.maydell, matt.fleming, mst, jordan.l.justen, kernelnewbies,
	qemu-devel, leif.lindholm, ard.biesheuvel, kraxel, pbonzini,
	lersek

On Sat, Oct 03, 2015 at 07:28:06PM -0400, Gabriel L. Somlo wrote:
> +/* read chunk of given fw_cfg blob (caller responsible for sanity-check) */
> +static inline void fw_cfg_read_blob(u16 key,
> +				    void *buf, loff_t pos, size_t count)
> +{
> +	mutex_lock(&fw_cfg_dev_lock);
> +	iowrite16(fw_cfg_sel_endianness(key), fw_cfg_reg_ctrl);
> +	while (pos-- > 0)
> +		ioread8(fw_cfg_reg_data);
> +	ioread8_rep(fw_cfg_reg_data, buf, count);
> +	mutex_unlock(&fw_cfg_dev_lock);
> +}

Have you had a chance to play with Marc Mari's fw_cfg DMA interface
patches?  They should make this operation much faster.

https://www.mail-archive.com/qemu-devel@nongnu.org/msg325541.html

Stefan

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v3 1/4] firmware: introduce sysfs driver for QEMU's fw_cfg device
  2015-10-03 23:28 ` [PATCH v3 1/4] firmware: introduce sysfs driver for QEMU's " Gabriel L. Somlo
  2015-10-04  1:34   ` kbuild test robot
  2015-10-06  8:40   ` [Qemu-devel] " Stefan Hajnoczi
@ 2015-10-06 12:53   ` Laszlo Ersek
  2015-10-06 17:54   ` Andy Lutomirski
  3 siblings, 0 replies; 26+ messages in thread
From: Laszlo Ersek @ 2015-10-06 12:53 UTC (permalink / raw)
  To: Gabriel L. Somlo
  Cc: gregkh, paul, galak, will.deacon, agross, mark.rutland, zajec5,
	hanjun.guo, catalin.marinas, linux-api, linux-kernel,
	kernelnewbies, matt.fleming, jordan.l.justen, mst, peter.maydell,
	leif.lindholm, ard.biesheuvel, pbonzini, kraxel, qemu-devel

On 10/04/15 01:28, Gabriel L. Somlo wrote:
> From: Gabriel Somlo <somlo@cmu.edu>
> 
> Make fw_cfg entries of type "file" available via sysfs. Entries
> are listed under /sys/firmware/qemu_fw_cfg/by_key, in folders
> named after each entry's selector key. Filename, selector value,
> and size read-only attributes are included for each entry. Also,
> a "raw" attribute allows retrieval of the full binary content of
> each entry.
> 
> This patch also provides a documentation file outlining the
> guest-side "hardware" interface exposed by the QEMU fw_cfg device.
> 
> Signed-off-by: Gabriel Somlo <somlo@cmu.edu>
> ---
>  .../ABI/testing/sysfs-firmware-qemu_fw_cfg         | 167 ++++++++
>  drivers/firmware/Kconfig                           |  10 +
>  drivers/firmware/Makefile                          |   1 +
>  drivers/firmware/qemu_fw_cfg.c                     | 456 +++++++++++++++++++++
>  4 files changed, 634 insertions(+)
>  create mode 100644 Documentation/ABI/testing/sysfs-firmware-qemu_fw_cfg
>  create mode 100644 drivers/firmware/qemu_fw_cfg.c
> 
> diff --git a/Documentation/ABI/testing/sysfs-firmware-qemu_fw_cfg b/Documentation/ABI/testing/sysfs-firmware-qemu_fw_cfg
> new file mode 100644
> index 0000000..f1ef44e
> --- /dev/null
> +++ b/Documentation/ABI/testing/sysfs-firmware-qemu_fw_cfg
> @@ -0,0 +1,167 @@
> +What:		/sys/firmware/qemu_fw_cfg/
> +Date:		August 2015
> +Contact:	Gabriel Somlo <somlo@cmu.edu>
> +Description:
> +		Several different architectures supported by QEMU (x86, arm,
> +		sun4*, ppc/mac) are provisioned with a firmware configuration
> +		(fw_cfg) device, used by the host to provide configuration data
> +		to the starting guest. While most of this data is meant for use
> +		by the guest firmware, starting with QEMU v2.4, guest VMs may
> +		be given arbitrary fw_cfg entries supplied directly on the
> +		command line, which therefore may be of interest to userspace.
> +
> +		=== Guest-side Hardware Interface ===
> +
> +		The fw_cfg device is available to guest VMs as a register pair
> +		(control and data), accessible as either a IO ports or as MMIO
> +		addresses, depending on the architecture.
> +
> +		--- Control Register ---
> +
> +		Width: 16-bit
> +		Access: Write-Only
> +		Endianness: LE (if IOport) or BE (if MMIO)
> +
> +		A write to the control register selects the index for one of
> +		the firmware configuration items (or "blobs") available on the
> +		fw_cfg device, which can subsequently be read from the data
> +		register.
> +
> +		Each time the control register is written, an data offset
> +		internal to the fw_cfg device will be set to zero. This data
> +		offset impacts which portion of the selected fw_cfg blob is
> +		accessed by reading the data register, as explained below.
> +
> +		--- Data Register ---
> +
> +		Width: 8-bit (if IOport), or 8/16/32/64-bit (if MMIO)
> +		Access: Read-Only
> +		Endianness: string preserving
> +
> +		The data register allows access to an array of bytes which
> +		represent the fw_cfg blob last selected by a write to the
> +		control register.
> +
> +		Immediately following a write to the control register, the data
> +		offset will be set to zero. Each successful read access to the
> +		data register will increment the data offset by the appropriate
> +		access width.
> +
> +		Each fw_cfg blob has a maximum associated data length. Once the
> +		data offset exceeds this maximum length, any subsequent reads
> +		via the data register will return 0x00.
> +
> +		An N-byte wide read of the data register will return the next
> +		available N bytes of the selected fw_cfg blob, as a substring,
> +		in increasing address order, similar to memcpy(), zero-padded
> +		if necessary should the maximum data length of the selected
> +		item be reached, as described above.
> +
> +		--- Per-arch Register Details ---
> +
> +		-------------------------------------------------------------
> +		arch	access	       base	ctrl	ctrl	data	max.
> +			mode	    address	offset	endian	offset	data
> +						(bytes)			(bytes)
> +		-------------------------------------------------------------
> +		x86*	IOport	      0x510	0	LE	1	1
> +		arm	MMIO	  0x9020000	8	BE	0	8
> +		sun4u	IOport	      0x510	0	LE	1	1
> +		sun4m	MMIO	0xd00000510	0	BE	2	1
> +		ppc/mac	MMIO	 0xf0000510	0	BE	2	1
> +		-------------------------------------------------------------
> +
> +		NOTE 1. On platforms where the fw_cfg registers are exposed as
> +		IO ports, the data port number will always be one greater than
> +		the port number of the control register. I.e., the two ports
> +		are overlapping, and can not be mapped separately.
> +
> +		=== Firmware Configuration Items of Interest ===
> +
> +		Originally, the index key, size, and formatting of blobs in
> +		fw_cfg was hard coded by mutual agreement between QEMU on the
> +		host side, and the guest-side firmware. Later on, a file
> +		transfer interface was added: by reading a special blob, the
> +		fw_cfg consumer can retrieve a list of records containing the
> +		name, selector key, and size of further fw_cfg blobs made
> +		available by the host. Below we describe three fw_cfg blobs
> +		of interest to the sysfs driver.
> +
> +		--- Signature (Key 0x0000, FW_CFG_SIGNATURE) ---
> +
> +		The presence of the fw_cfg device can be verified by selecting
> +		the signature blob by writing 0x0000 to the control register,
> +		and reading four bytes from the data register. If the fw_cfg
> +		device is present, the four bytes read will match the ASCII
> +		characters "QEMU".
> +
> +		--- Revision (Key 0x0001, FW_CFG_ID) ---
> +
> +		A 32-bit little-endian unsigned integer, this item is used as
> +		an interface revision number.
> +
> +		--- File Directory (Key 0x0019, FW_CFG_FILE_DIR) ---
> +
> +		Any fw_cfg blobs stored at key 0x0020 FW_CFG_FILE_FIRST() or
> +		higher will have an associated entry in this "directory" blob,
> +		which facilitates the discovery of available items by software
> +		(e.g. BIOS) running on the guest. The format of the directory
> +		blob is shown below.
> +
> +		NOTE: All integers are stored in big-endian format!
> +
> +		/* the entire file directory "blob" */
> +		struct FWCfgFiles {
> +			u32 count;		/* total number of entries */
> +			struct FWCfgFile f[];	/* entry array, see below */
> +		};
> +
> +		/* an individual directory entry, 64 bytes total */
> +		struct FWCfgFile {
> +			u32 size;	/* size of referenced blob */
> +			u16 select;	/* selector key for referenced blob */
> +			u16 reserved;
> +			char name[56];	/* blob name, nul-terminated ASCII */
> +		};

I think the above is somewhat redundant with regard to
"Documentation/devicetree/bindings/arm/fw-cfg.txt". That may not
necessarily be a problem, I'd just like to make you aware of that file
too. Perhaps that file should be updated too.

... In any case, I wanted to call your attention to that file, because
it's been now raised that the driver should handle both ACPI and DT
guests. In DT, the device is already exposed -- that's how the UEFI
guest firmware for aarch64 guests learns about it.

So, if you'd like to follow the example of
"drivers/virtio/virtio_mmio.c" (which I think would be a good idea), in
order to recognize the device in both DT and ACPI guests, then on the
QEMU side, you won't have to do anything for DT, because
"qemu,fw-cfg-mmio" is already exposed in create_fw_cfg() [hw/arm/virt.c].

Thanks
Laszlo

> +
> +		=== SysFS fw_cfg Interface ===
> +
> +		The fw_cfg sysfs interface described in this document is only
> +		intended to display discoverable blobs (i.e., those registered
> +		with the file directory), as there is no way to determine the
> +		presence or size of "legacy" blobs (with selector keys between
> +		0x0002 and 0x0018) programmatically.
> +
> +		All fw_cfg information is shown under:
> +
> +			/sys/firmware/qemu_fw_cfg/
> +
> +		The only legacy blob displayed is the fw_cfg device revision:
> +
> +			/sys/firmware/qemu_fw_cfg/rev
> +
> +		--- Discoverable fw_cfg blobs by selector key ---
> +
> +		All discoverable blobs listed in the fw_cfg file directory are
> +		displayed as entries named after their unique selector key
> +		value, e.g.:
> +
> +			/sys/firmware/qemu_fw_cfg/by_key/32
> +			/sys/firmware/qemu_fw_cfg/by_key/33
> +			/sys/firmware/qemu_fw_cfg/by_key/34
> +			...
> +
> +		Each such fw_cfg sysfs entry has the following values exported
> +		as attributes:
> +
> +		name  	: The 56-byte nul-terminated ASCII string used as the
> +			  blob's 'file name' in the fw_cfg directory.
> +		size  	: The length of the blob, as given in the fw_cfg
> +			  directory.
> +		key	: The value of the blob's selector key as given in the
> +			  fw_cfg directory. This value is the same as used in
> +			  the parent directory name.
> +		raw	: The raw bytes of the blob, obtained by selecting the
> +			  entry via the control register, and reading a number
> +			  of bytes equal to the blob size from the data
> +			  register.
> diff --git a/drivers/firmware/Kconfig b/drivers/firmware/Kconfig
> index 665efca..0466e80 100644
> --- a/drivers/firmware/Kconfig
> +++ b/drivers/firmware/Kconfig
> @@ -135,6 +135,16 @@ config ISCSI_IBFT
>  	  detect iSCSI boot parameters dynamically during system boot, say Y.
>  	  Otherwise, say N.
>  
> +config FW_CFG_SYSFS
> +	tristate "QEMU fw_cfg device support in sysfs"
> +	depends on SYSFS
> +	default n
> +	help
> +	  Say Y or M here to enable the exporting of the QEMU firmware
> +	  configuration (fw_cfg) file entries via sysfs. Entries are
> +	  found under /sys/firmware/fw_cfg when this option is enabled
> +	  and loaded.
> +
>  config QCOM_SCM
>  	bool
>  	depends on ARM || ARM64
> diff --git a/drivers/firmware/Makefile b/drivers/firmware/Makefile
> index 2ee8347..efba22a 100644
> --- a/drivers/firmware/Makefile
> +++ b/drivers/firmware/Makefile
> @@ -12,6 +12,7 @@ obj-$(CONFIG_DMIID)		+= dmi-id.o
>  obj-$(CONFIG_ISCSI_IBFT_FIND)	+= iscsi_ibft_find.o
>  obj-$(CONFIG_ISCSI_IBFT)	+= iscsi_ibft.o
>  obj-$(CONFIG_FIRMWARE_MEMMAP)	+= memmap.o
> +obj-$(CONFIG_FW_CFG_SYSFS)	+= qemu_fw_cfg.o
>  obj-$(CONFIG_QCOM_SCM)		+= qcom_scm.o
>  obj-$(CONFIG_QCOM_SCM_64)	+= qcom_scm-64.o
>  obj-$(CONFIG_QCOM_SCM_32)	+= qcom_scm-32.o
> diff --git a/drivers/firmware/qemu_fw_cfg.c b/drivers/firmware/qemu_fw_cfg.c
> new file mode 100644
> index 0000000..3a67a16
> --- /dev/null
> +++ b/drivers/firmware/qemu_fw_cfg.c
> @@ -0,0 +1,456 @@
> +/*
> + * drivers/firmware/qemu_fw_cfg.c
> + *
> + * Expose entries from QEMU's firmware configuration (fw_cfg) device in
> + * sysfs (read-only, under "/sys/firmware/qemu_fw_cfg/...").
> + *
> + * Copyright 2015 Carnegie Mellon University
> + */
> +
> +#include <linux/module.h>
> +#include <linux/slab.h>
> +#include <linux/io.h>
> +#include <linux/ioport.h>
> +
> +MODULE_AUTHOR("Gabriel L. Somlo <somlo@cmu.edu>");
> +MODULE_DESCRIPTION("QEMU fw_cfg sysfs support");
> +MODULE_LICENSE("GPL");
> +
> +/* selector key values for "well-known" fw_cfg entries */
> +#define FW_CFG_SIGNATURE  0x00
> +#define FW_CFG_ID         0x01
> +#define FW_CFG_FILE_DIR   0x19
> +
> +/* size in bytes of fw_cfg signature */
> +#define FW_CFG_SIG_SIZE 4
> +
> +/* fw_cfg "file name" is up to 56 characters (including terminating nul) */
> +#define FW_CFG_MAX_FILE_PATH 56
> +
> +/* fw_cfg file directory entry type */
> +struct fw_cfg_file {
> +	u32 size;
> +	u16 select;
> +	u16 reserved;
> +	char name[FW_CFG_MAX_FILE_PATH];
> +};
> +
> +/* fw_cfg device i/o access options type */
> +struct fw_cfg_access {
> +	const char *name;
> +	phys_addr_t base;
> +	u8 size;
> +	u8 ctrl_offset;
> +	u8 data_offset;
> +	bool is_mmio;
> +};
> +
> +/* table of fw_cfg device i/o access options for known architectures */
> +static struct fw_cfg_access fw_cfg_modes[] = {
> +	{
> +		.name = "fw_cfg IOport on i386, sun4u",
> +		.base = 0x510,
> +		.size = 0x02,
> +		.ctrl_offset = 0x00,
> +		.data_offset = 0x01,
> +		.is_mmio = false,
> +	}, {
> +		.name = "fw_cfg MMIO on arm",
> +		.base = 0x9020000,
> +		.size = 0x0a,
> +		.ctrl_offset = 0x08,
> +		.data_offset = 0x00,
> +		.is_mmio = true,
> +	}, {
> +		.name = "fw_cfg MMIO on sun4m",
> +		.base = 0xd00000510,
> +		.size = 0x03,
> +		.ctrl_offset = 0x00,
> +		.data_offset = 0x02,
> +		.is_mmio = true,
> +	}, {
> +		.name = "fw_cfg MMIO on ppc/mac",
> +		.base = 0xf0000510,
> +		.size = 0x03,
> +		.ctrl_offset = 0x00,
> +		.data_offset = 0x02,
> +		.is_mmio = true,
> +	}, { } /* END */
> +};
> +
> +/* fw_cfg device i/o currently selected option set */
> +static struct fw_cfg_access *fw_cfg_mode;
> +
> +/* fw_cfg device i/o register addresses */
> +static void __iomem *fw_cfg_dev_base;
> +static void __iomem *fw_cfg_reg_ctrl;
> +static void __iomem *fw_cfg_reg_data;
> +
> +/* atomic access to fw_cfg device (potentially slow i/o, so using mutex) */
> +static DEFINE_MUTEX(fw_cfg_dev_lock);
> +
> +/* pick appropriate endianness for selector key */
> +static inline u16 fw_cfg_sel_endianness(u16 key)
> +{
> +	return fw_cfg_mode->is_mmio ? cpu_to_be16(key) : cpu_to_le16(key);
> +}
> +
> +/* type for fw_cfg "directory scan" visitor/callback function */
> +typedef int (*fw_cfg_file_callback)(const struct fw_cfg_file *f);
> +
> +/* run a given callback on each fw_cfg directory entry */
> +static int fw_cfg_scan_dir(fw_cfg_file_callback callback)
> +{
> +	int ret = 0;
> +	u32 count, i;
> +	struct fw_cfg_file f;
> +
> +	mutex_lock(&fw_cfg_dev_lock);
> +	iowrite16(fw_cfg_sel_endianness(FW_CFG_FILE_DIR), fw_cfg_reg_ctrl);
> +	ioread8_rep(fw_cfg_reg_data, &count, sizeof(count));
> +	for (i = 0; i < be32_to_cpu(count); i++) {
> +		ioread8_rep(fw_cfg_reg_data, &f, sizeof(f));
> +		ret = callback(&f);
> +		if (ret)
> +			break;
> +	}
> +	mutex_unlock(&fw_cfg_dev_lock);
> +	return ret;
> +}
> +
> +/* read chunk of given fw_cfg blob (caller responsible for sanity-check) */
> +static inline void fw_cfg_read_blob(u16 key,
> +				    void *buf, loff_t pos, size_t count)
> +{
> +	mutex_lock(&fw_cfg_dev_lock);
> +	iowrite16(fw_cfg_sel_endianness(key), fw_cfg_reg_ctrl);
> +	while (pos-- > 0)
> +		ioread8(fw_cfg_reg_data);
> +	ioread8_rep(fw_cfg_reg_data, buf, count);
> +	mutex_unlock(&fw_cfg_dev_lock);
> +}
> +
> +/* clean up fw_cfg device i/o */
> +static void fw_cfg_io_cleanup(void)
> +{
> +	if (fw_cfg_mode->is_mmio) {
> +		iounmap(fw_cfg_dev_base);
> +		release_mem_region(fw_cfg_mode->base, fw_cfg_mode->size);
> +	} else {
> +		ioport_unmap(fw_cfg_dev_base);
> +		release_region(fw_cfg_mode->base, fw_cfg_mode->size);
> +	}
> +}
> +
> +/* probe and map fw_cfg device */
> +static int __init fw_cfg_io_probe(void)
> +{
> +	char sig[FW_CFG_SIG_SIZE];
> +
> +	for (fw_cfg_mode = &fw_cfg_modes[0];
> +	     fw_cfg_mode->base; fw_cfg_mode++) {
> +
> +		phys_addr_t base = fw_cfg_mode->base;
> +		u8 size = fw_cfg_mode->size;
> +
> +		/* reserve and map mmio or ioport region */
> +		if (fw_cfg_mode->is_mmio) {
> +			if (!request_mem_region(base, size, fw_cfg_mode->name))
> +				continue;
> +			fw_cfg_dev_base = ioremap(base, size);
> +			if (!fw_cfg_dev_base) {
> +				release_mem_region(base, size);
> +				continue;
> +			}
> +		} else {
> +			if (!request_region(base, size, fw_cfg_mode->name))
> +				continue;
> +			fw_cfg_dev_base = ioport_map(base, size);
> +			if (!fw_cfg_dev_base) {
> +				release_region(base, size);
> +				continue;
> +			}
> +		}
> +
> +		/* set control and data register addresses */
> +		fw_cfg_reg_ctrl = fw_cfg_dev_base + fw_cfg_mode->ctrl_offset;
> +		fw_cfg_reg_data = fw_cfg_dev_base + fw_cfg_mode->data_offset;
> +
> +		/* verify fw_cfg device signature */
> +		fw_cfg_read_blob(FW_CFG_SIGNATURE, sig, 0, FW_CFG_SIG_SIZE);
> +		if (memcmp(sig, "QEMU", FW_CFG_SIG_SIZE) == 0)
> +			/* success, we're done */
> +			return 0;
> +
> +		/* clean up before probing next access mode */
> +		fw_cfg_io_cleanup();
> +	}
> +
> +	return -ENODEV;
> +}
> +
> +/* fw_cfg revision attribute, in /sys/firmware/qemu_fw_cfg top-level dir. */
> +static u32 fw_cfg_rev;
> +
> +static ssize_t fw_cfg_showrev(struct kobject *k, struct attribute *a, char *buf)
> +{
> +	return sprintf(buf, "%u\n", fw_cfg_rev);
> +}
> +
> +static const struct {
> +	struct attribute attr;
> +	ssize_t (*show)(struct kobject *k, struct attribute *a, char *buf);
> +} fw_cfg_rev_attr = {
> +	.attr = { .name = "rev", .mode = S_IRUSR },
> +	.show = fw_cfg_showrev,
> +};
> +
> +/* fw_cfg_sysfs_entry type */
> +struct fw_cfg_sysfs_entry {
> +	struct kobject kobj;
> +	struct fw_cfg_file f;
> +	struct list_head list;
> +};
> +
> +/* get fw_cfg_sysfs_entry from kobject member */
> +static inline struct fw_cfg_sysfs_entry *to_entry(struct kobject *kobj)
> +{
> +	return container_of(kobj, struct fw_cfg_sysfs_entry, kobj);
> +}
> +
> +/* fw_cfg_sysfs_attribute type */
> +struct fw_cfg_sysfs_attribute {
> +	struct attribute attr;
> +	ssize_t (*show)(struct fw_cfg_sysfs_entry *entry, char *buf);
> +};
> +
> +/* get fw_cfg_sysfs_attribute from attribute member */
> +static inline struct fw_cfg_sysfs_attribute *to_attr(struct attribute *attr)
> +{
> +	return container_of(attr, struct fw_cfg_sysfs_attribute, attr);
> +}
> +
> +/* global cache of fw_cfg_sysfs_entry objects */
> +static LIST_HEAD(fw_cfg_entry_cache);
> +
> +/* kobjects removed lazily by kernel, mutual exclusion needed */
> +static DEFINE_SPINLOCK(fw_cfg_cache_lock);
> +
> +static inline void fw_cfg_sysfs_cache_enlist(struct fw_cfg_sysfs_entry *entry)
> +{
> +	spin_lock(&fw_cfg_cache_lock);
> +	list_add_tail(&entry->list, &fw_cfg_entry_cache);
> +	spin_unlock(&fw_cfg_cache_lock);
> +}
> +
> +static inline void fw_cfg_sysfs_cache_delist(struct fw_cfg_sysfs_entry *entry)
> +{
> +	spin_lock(&fw_cfg_cache_lock);
> +	list_del(&entry->list);
> +	spin_unlock(&fw_cfg_cache_lock);
> +}
> +
> +static void fw_cfg_sysfs_cache_cleanup(void)
> +{
> +	struct fw_cfg_sysfs_entry *entry, *next;
> +
> +	list_for_each_entry_safe(entry, next, &fw_cfg_entry_cache, list) {
> +		/* will end up invoking fw_cfg_sysfs_cache_delist()
> +		 * via each object's release() method (i.e. destructor)
> +		 */
> +		kobject_put(&entry->kobj);
> +	}
> +}
> +
> +/* default_attrs: per-entry attributes and show methods */
> +
> +#define FW_CFG_SYSFS_ATTR(_attr) \
> +struct fw_cfg_sysfs_attribute fw_cfg_sysfs_attr_##_attr = { \
> +	.attr = { .name = __stringify(_attr), .mode = S_IRUSR }, \
> +	.show = fw_cfg_sysfs_show_##_attr, \
> +}
> +
> +static ssize_t fw_cfg_sysfs_show_size(struct fw_cfg_sysfs_entry *e, char *buf)
> +{
> +	return sprintf(buf, "%u\n", e->f.size);
> +}
> +
> +static ssize_t fw_cfg_sysfs_show_key(struct fw_cfg_sysfs_entry *e, char *buf)
> +{
> +	return sprintf(buf, "%u\n", e->f.select);
> +}
> +
> +static ssize_t fw_cfg_sysfs_show_name(struct fw_cfg_sysfs_entry *e, char *buf)
> +{
> +	return sprintf(buf, "%s\n", e->f.name);
> +}
> +
> +static FW_CFG_SYSFS_ATTR(size);
> +static FW_CFG_SYSFS_ATTR(key);
> +static FW_CFG_SYSFS_ATTR(name);
> +
> +static struct attribute *fw_cfg_sysfs_entry_attrs[] = {
> +	&fw_cfg_sysfs_attr_size.attr,
> +	&fw_cfg_sysfs_attr_key.attr,
> +	&fw_cfg_sysfs_attr_name.attr,
> +	NULL,
> +};
> +
> +/* sysfs_ops: find fw_cfg_[entry, attribute] and call appropriate show method */
> +static ssize_t fw_cfg_sysfs_attr_show(struct kobject *kobj, struct attribute *a,
> +				      char *buf)
> +{
> +	struct fw_cfg_sysfs_entry *entry = to_entry(kobj);
> +	struct fw_cfg_sysfs_attribute *attr = to_attr(a);
> +
> +	return attr->show(entry, buf);
> +}
> +
> +static const struct sysfs_ops fw_cfg_sysfs_attr_ops = {
> +	.show = fw_cfg_sysfs_attr_show,
> +};
> +
> +/* release: destructor, to be called via kobject_put() */
> +static void fw_cfg_sysfs_release_entry(struct kobject *kobj)
> +{
> +	struct fw_cfg_sysfs_entry *entry = to_entry(kobj);
> +
> +	fw_cfg_sysfs_cache_delist(entry);
> +	kfree(entry);
> +}
> +
> +/* kobj_type: ties together all properties required to register an entry */
> +static struct kobj_type fw_cfg_sysfs_entry_ktype = {
> +	.default_attrs = fw_cfg_sysfs_entry_attrs,
> +	.sysfs_ops = &fw_cfg_sysfs_attr_ops,
> +	.release = fw_cfg_sysfs_release_entry,
> +};
> +
> +/* raw-read method and attribute */
> +static ssize_t fw_cfg_sysfs_read_raw(struct file *filp, struct kobject *kobj,
> +				     struct bin_attribute *bin_attr,
> +				     char *buf, loff_t pos, size_t count)
> +{
> +	struct fw_cfg_sysfs_entry *entry = to_entry(kobj);
> +
> +	if (pos > entry->f.size)
> +		return -EINVAL;
> +
> +	if (count > entry->f.size - pos)
> +		count = entry->f.size - pos;
> +
> +	fw_cfg_read_blob(entry->f.select, buf, pos, count);
> +	return count;
> +}
> +
> +static struct bin_attribute fw_cfg_sysfs_attr_raw = {
> +	.attr = { .name = "raw", .mode = 0400 },
> +	.read = fw_cfg_sysfs_read_raw,
> +};
> +
> +/* kobjects & kset representing top-level, by_key, and by_name folders */
> +static struct kobject *fw_cfg_top_ko;
> +static struct kobject *fw_cfg_sel_ko;
> +
> +/* callback function to register an individual fw_cfg file */
> +static int __init fw_cfg_register_file(const struct fw_cfg_file *f)
> +{
> +	int err;
> +	struct fw_cfg_sysfs_entry *entry;
> +
> +	/* allocate new entry */
> +	entry = kzalloc(sizeof(*entry), GFP_KERNEL);
> +	if (!entry)
> +		return -ENOMEM;
> +
> +	/* set file entry information */
> +	entry->f.size = be32_to_cpu(f->size);
> +	entry->f.select = be16_to_cpu(f->select);
> +	strcpy(entry->f.name, f->name);
> +
> +	/* register entry under "/sys/firmware/qemu_fw_cfg/by_key/" */
> +	err = kobject_init_and_add(&entry->kobj, &fw_cfg_sysfs_entry_ktype,
> +				   fw_cfg_sel_ko, "%d", entry->f.select);
> +	if (err)
> +		goto err_register;
> +
> +	/* add raw binary content access */
> +	err = sysfs_create_bin_file(&entry->kobj, &fw_cfg_sysfs_attr_raw);
> +	if (err)
> +		goto err_add_raw;
> +
> +	/* success, add entry to global cache */
> +	fw_cfg_sysfs_cache_enlist(entry);
> +	return 0;
> +
> +err_add_raw:
> +	kobject_del(&entry->kobj);
> +err_register:
> +	kfree(entry);
> +	return err;
> +}
> +
> +/* unregister top-level or by_key folder */
> +static inline void fw_cfg_kobj_cleanup(struct kobject *kobj)
> +{
> +	kobject_del(kobj);
> +	kobject_put(kobj);
> +}
> +
> +static int __init fw_cfg_sysfs_init(void)
> +{
> +	int err;
> +
> +	/* probe for the fw_cfg "hardware" */
> +	err = fw_cfg_io_probe();
> +	if (err)
> +		return err;
> +
> +	/* create /sys/firmware/qemu_fw_cfg/ and its subdirectories */
> +	err = -ENOMEM;
> +	fw_cfg_top_ko = kobject_create_and_add("qemu_fw_cfg", firmware_kobj);
> +	if (!fw_cfg_top_ko)
> +		goto err_top;
> +	fw_cfg_sel_ko = kobject_create_and_add("by_key", fw_cfg_top_ko);
> +	if (!fw_cfg_sel_ko)
> +		goto err_sel;
> +
> +	/* get revision number, add matching top-level attribute */
> +	fw_cfg_read_blob(FW_CFG_ID, &fw_cfg_rev, 0, sizeof(fw_cfg_rev));
> +	fw_cfg_rev = le32_to_cpu(fw_cfg_rev);
> +	err = sysfs_create_file(fw_cfg_top_ko, &fw_cfg_rev_attr.attr);
> +	if (err)
> +		goto err_rev;
> +
> +	/* process fw_cfg file directory entry, registering each file */
> +	err = fw_cfg_scan_dir(fw_cfg_register_file);
> +	if (err)
> +		goto err_scan;
> +
> +	/* success */
> +	pr_debug("fw_cfg: loaded.\n");
> +	return 0;
> +
> +err_scan:
> +	fw_cfg_sysfs_cache_cleanup();
> +	sysfs_remove_file(fw_cfg_top_ko, &fw_cfg_rev_attr.attr);
> +err_rev:
> +	fw_cfg_kobj_cleanup(fw_cfg_sel_ko);
> +err_sel:
> +	fw_cfg_kobj_cleanup(fw_cfg_top_ko);
> +err_top:
> +	fw_cfg_io_cleanup();
> +	return err;
> +}
> +
> +static void __exit fw_cfg_sysfs_exit(void)
> +{
> +	pr_debug("fw_cfg: unloading.\n");
> +	fw_cfg_sysfs_cache_cleanup();
> +	fw_cfg_kobj_cleanup(fw_cfg_sel_ko);
> +	fw_cfg_kobj_cleanup(fw_cfg_top_ko);
> +	fw_cfg_io_cleanup();
> +}
> +
> +module_init(fw_cfg_sysfs_init);
> +module_exit(fw_cfg_sysfs_exit);
> 


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v3 1/4] firmware: introduce sysfs driver for QEMU's fw_cfg device
  2015-10-03 23:28 ` [PATCH v3 1/4] firmware: introduce sysfs driver for QEMU's " Gabriel L. Somlo
                     ` (2 preceding siblings ...)
  2015-10-06 12:53   ` Laszlo Ersek
@ 2015-10-06 17:54   ` Andy Lutomirski
  2015-10-06 18:17     ` Gabriel L. Somlo
  3 siblings, 1 reply; 26+ messages in thread
From: Andy Lutomirski @ 2015-10-06 17:54 UTC (permalink / raw)
  To: Gabriel L. Somlo
  Cc: Greg KH, paul, galak, Will Deacon, agross, Mark Rutland, zajec5,
	hanjun.guo, Catalin Marinas, Linux API, linux-kernel,
	kernelnewbies, Matt Fleming, Laszlo Ersek,
	Jordan Justen (Intel address),
	Michael S. Tsirkin, peter.maydell, Leif Lindholm, Ard Biesheuvel,
	Paolo Bonzini, kraxel, qemu-devel@nongnu.org Developers

On Sat, Oct 3, 2015 at 4:28 PM, Gabriel L. Somlo <somlo@cmu.edu> wrote:
> From: Gabriel Somlo <somlo@cmu.edu>
>
> Make fw_cfg entries of type "file" available via sysfs. Entries
> are listed under /sys/firmware/qemu_fw_cfg/by_key, in folders
> named after each entry's selector key. Filename, selector value,
> and size read-only attributes are included for each entry. Also,
> a "raw" attribute allows retrieval of the full binary content of
> each entry.
>
> This patch also provides a documentation file outlining the
> guest-side "hardware" interface exposed by the QEMU fw_cfg device.
>

What's the status of "by_name"?  There's a single (presumably
incorrect) mention of it in a comment in this patch.

I would prefer if the kernel populated by_name itself rather than
deferring that to udev, since I'd like to use this facility in virtme,
and I'd like to use fw_cfg very early on boot before I even start
udev.

--Andy

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v3 1/4] firmware: introduce sysfs driver for QEMU's fw_cfg device
  2015-10-06 17:54   ` Andy Lutomirski
@ 2015-10-06 18:17     ` Gabriel L. Somlo
  0 siblings, 0 replies; 26+ messages in thread
From: Gabriel L. Somlo @ 2015-10-06 18:17 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Greg KH, paul, galak, Will Deacon, agross, Mark Rutland, zajec5,
	hanjun.guo, Catalin Marinas, Linux API, linux-kernel,
	kernelnewbies, Matt Fleming, Laszlo Ersek,
	Jordan Justen (Intel address),
	Michael S. Tsirkin, peter.maydell, Leif Lindholm, Ard Biesheuvel,
	Paolo Bonzini, kraxel, qemu-devel@nongnu.org Developers

On Tue, Oct 06, 2015 at 10:54:42AM -0700, Andy Lutomirski wrote:
> On Sat, Oct 3, 2015 at 4:28 PM, Gabriel L. Somlo <somlo@cmu.edu> wrote:
> > From: Gabriel Somlo <somlo@cmu.edu>
> >
> > Make fw_cfg entries of type "file" available via sysfs. Entries
> > are listed under /sys/firmware/qemu_fw_cfg/by_key, in folders
> > named after each entry's selector key. Filename, selector value,
> > and size read-only attributes are included for each entry. Also,
> > a "raw" attribute allows retrieval of the full binary content of
> > each entry.
> >
> > This patch also provides a documentation file outlining the
> > guest-side "hardware" interface exposed by the QEMU fw_cfg device.
> >
> 
> What's the status of "by_name"?  There's a single (presumably
> incorrect) mention of it in a comment in this patch.
> 
> I would prefer if the kernel populated by_name itself rather than
> deferring that to udev, since I'd like to use this facility in virtme,
> and I'd like to use fw_cfg very early on boot before I even start
> udev.

"by_name" is added with patch 4/4 of the series, which I kept separate
due to the "To udev or not to udev" conversation from earlier. So far
I haven't yet figured out just HOW I'd set it up in udev, but it works
already if done in the kernel :)

Thanks,
--Gabriel

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v3 0/4] SysFS driver for QEMU fw_cfg device
  2015-10-05 13:18       ` Paolo Bonzini
@ 2015-11-04 20:48         ` Gabriel L. Somlo
  0 siblings, 0 replies; 26+ messages in thread
From: Gabriel L. Somlo @ 2015-11-04 20:48 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Peter Maydell, Mark Rutland, gregkh, paul, Kumar Gala,
	Will Deacon, agross, zajec5, Hanjun Guo, Catalin Marinas,
	linux-api, lkml - Kernel Mailing List, kernelnewbies,
	Matt Fleming, Laszlo Ersek, Jordan Justen, Michael S. Tsirkin,
	Leif Lindholm, Ard Biesheuvel, Gerd Hoffmann, QEMU Developers

On Mon, Oct 05, 2015 at 03:18:05PM +0200, Paolo Bonzini wrote:
> 
> 
> On 05/10/2015 14:50, Peter Maydell wrote:
> > If you want to try to support "firmware might also be reading
> > fw_cfg at the same time as the kernel" this is a (painful)
> > problem regardless of how the kernel figures out whether a
> > fw_cfg device is present. I had assumed that one of the design
> > assumptions of this series was that firmware would only
> > read the fw_cfg before booting the guest kernel and never touch
> > it afterwards. If it might touch it later then letting the
> > guest kernel also mess with fw_cfg seems like a really bad idea.
> 
> The idea of tinkering with fw_cfg from the AML code (DSDT/SSDT) has been
> proposed many times, and always dropped.  One of the reasons was that
> the OS could have a driver for fw_cfg.
> 
> So I think that we can define the QEMU0002 id as owned by the OSPM,
> similar to the various standard ACPI ids that are usually found in the
> x86 world (e.g. PNP0B00 is a mc146818 RTC, PNP0303 is an 8042 keyboard
> controller, PNP0501 is a 16550 or similar UART, and so on).  This
> basically sanctions _CRS as the way to pass information from the
> firmware to the OSPM, also similarly to those standard PNP ids.

OK, so I don't expect to go the "pure ACPI" route in the final
version, mainly because I'm still hoping to fill the requirement
of writing a driver which can use either ACPI or DT to detect the
presence of fw_cfg (how I'm going to compile this on kernels with
no ACPI or no DT support is still TBD, and probably will have to
involve #ifdef, but I digress :)

However, Michael's idea of having ACPI supply an "accessor method" for
the guest kernel driver to call, without having to worry about the
specific details in _CRS, sounded intriguing enough to at least explore
in further detail.

So, assuming we have the following fw_cfg AML in ssdt (i386) or
dsdt (arm) (using cpp-style #ifdef to highlight the arch-specific
bits):

Scope (\_SB)
{
    Device (FWCF)
    {
        Name (_HID, EisaId ("QMU0002"))  // _HID: Hardware ID
        Name (_STA, 0x0B)  // _STA: Status

#ifdef ARCH_X86

        Name (_CRS, ResourceTemplate ()
        {
            IO (Decode16,
                0x0510,             // Range Minimum
                0x0510,             // Range Maximum
                0x01,               // Alignment
                0x02,               // Length
                )
        })

#else /* ARCH_ARM */

        NAME (_CRS, ResourceTemplate ()
        {
            Memory32Fixed (ReadWrite,
                           0x09020000,  // Address Base
                           0x0000000a,  // Address Length
                          )
        })

#endif

    }
}

I can easily patch QEMU to additionally insert the following into
"Device (FWCF)":

#ifdef ARCH_X86

        OperationRegion (FWCR, SystemIO, 0x0510, 0x02)
        Field (FWCR, WordAcc, NoLock, Preserve)
        {
            FWCC,   16
        }
        Field (FWCR, ByteAcc, NoLock, Preserve)
        {
            Offset (0x01),
            FWCD,   8
        }

#else /* ARCH_ARM */

        OperationRegion (FWCR, SystemMemory, 0x09020000, 0x0a)
        Field (FWCR, WordAcc, NoLock, Preserve)
        {
            Offset (0x08),
            FWCC,   16  // not sure about endianness on ARM here, but
                        // I can still address this from the userspace
                        // kernel driver if needed
        }
        Field (FWCR, ByteAcc, NoLock, Preserve)
        {
            FWCD,   8
        }

#endif

        Method (RDBL, 3, Serialized) // (Arg0 = key, Arg1 = skip, Arg2 = count)
        {
            FWCC = Arg0
            Local0 = Zero
            While ((Local0 < Arg1))
            {
                Local1 = FWCD
                Local0++
            }
            Name (BBUF, Buffer (Arg2) {})
            Local0 = Zero
            While ((Local0 < Arg2))
            {
                Index (BBUF, Local0) = FWCD /* \_SB_.FWCF.FWCD */
                Local0++
            }
            Return (BBUF) /* \_SB_.FWCF.RDBL.BBUF */
        }

With the host generating the additional RDBL method above, I could
write a "pure ACPI" driver for the guest kernel, where instead of the
current fw_cfg_read_blob() logic:


  static DEFINE_MUTEX(fw_cfg_dev_lock);
  static bool fw_cfg_is_mmio;

  static inline u16 fw_cfg_sel_endianness(u16 key)
  {
          return fw_cfg_is_mmio ? cpu_to_be16(key) : cpu_to_le16(key);
  }

  static inline void fw_cfg_read_blob(u16 key,
                                      void *buf, loff_t pos, size_t count)
  {
          mutex_lock(&fw_cfg_dev_lock);
          iowrite16(fw_cfg_sel_endianness(key), fw_cfg_reg_ctrl);
          while (pos-- > 0)
                  ioread8(fw_cfg_reg_data);
          ioread8_rep(fw_cfg_reg_data, buf, count);
          mutex_unlock(&fw_cfg_dev_lock);
  }


I could instead write something like this:


  struct acpi_device *dev;    /* set during module init.  */

  static inline void fw_cfg_read_blob(u16 key,
                                      void *buf, loff_t pos, size_t count)
  {
          union acpi_object arg_elem[3], *obj;
          struct acpi_object_list arg;
          struct acpi_buffer acpi_buf = { ACPI_ALLOCATE_BUFFER, NULL };
          acpi_status status;

          arg.count = 3;
          arg.pointer = &arg_elem[0];

          arg_elem[0].type =
          arg_elem[1].type =
          arg_elem[2].type = ACPI_TYPE_INTEGER;

          arg_elem[0].integer.value = key;
          arg_elem[1].integer.value = pos;
          arg_elem[2].integer.value = count;

          status = acpi_evaluate_object_typed(dev->handle, "RDBL", &arg,
                                              &acpi_buf, ACPI_TYPE_BUFFER);
          if (ACPI_FAILURE(status)) {
              return; /* FIXME: actual error signaling to caller TBD */
          }
 
          obj = (union acpi_object *)acpi_buf.pointer;

          /* FIXME: in lieu of better error signaling to caller: */
          assert(count == obj->buffer.length);

          memcpy(buf, obj->buffer.pointer, obj->buffer.length);
          kfree(acpi_buf.pointer);
  }

Now, if ACPI-less DT-enabled architectures are to be supported, this
wouldn't get me out of having to spell out the original ioread8_rep()
based fw_cfg_read_blob(), so probably not worth doing.

But I simply *had* to try and chase this down before writing it off,
since part of the reason I'm doing all this in the first place is to
teach myself new tricks... :)

Sorry for going off on a somewhat lengthy (and maybe just a *little*
bit off-topic) tangent, but I figured I'd put it out there to at least
facilitate future archaeology :)

Obviously, any comments or feedback much appreciated!

Cheers,
--Gabriel

^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2015-11-04 20:48 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-10-03 23:28 [PATCH v3 0/4] SysFS driver for QEMU fw_cfg device Gabriel L. Somlo
2015-10-03 23:28 ` [PATCH v3 1/4] firmware: introduce sysfs driver for QEMU's " Gabriel L. Somlo
2015-10-04  1:34   ` kbuild test robot
2015-10-06  8:40   ` [Qemu-devel] " Stefan Hajnoczi
2015-10-06 12:53   ` Laszlo Ersek
2015-10-06 17:54   ` Andy Lutomirski
2015-10-06 18:17     ` Gabriel L. Somlo
2015-10-03 23:28 ` [PATCH v3 2/4] firmware: use acpi to detect QEMU fw_cfg device for sysfs fw_cfg driver Gabriel L. Somlo
2015-10-04  7:54   ` Michael S. Tsirkin
2015-10-04 20:24     ` Gabriel L. Somlo
2015-10-04 20:27       ` Gabriel L. Somlo
2015-10-03 23:28 ` [PATCH v3 3/4] kobject: export kset_find_obj() for module use Gabriel L. Somlo
2015-10-03 23:28 ` [PATCH v3 4/4] firmware: create directory hierarchy for sysfs fw_cfg entries Gabriel L. Somlo
2015-10-05 10:00 ` [PATCH v3 0/4] SysFS driver for QEMU fw_cfg device Mark Rutland
2015-10-05 11:48   ` Paolo Bonzini
2015-10-05 12:23     ` Mark Rutland
2015-10-05 12:43       ` Gabriel L. Somlo
2015-10-05 12:56         ` Mark Rutland
2015-10-05 13:21           ` Gabriel L. Somlo
2015-10-05 12:40   ` Gabriel L. Somlo
2015-10-05 12:50     ` Peter Maydell
2015-10-05 13:13       ` Gabriel L. Somlo
2015-10-05 13:18       ` Paolo Bonzini
2015-11-04 20:48         ` Gabriel L. Somlo
2015-10-05 13:05     ` Mark Rutland
2015-10-06  7:18       ` Laszlo Ersek

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).