All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [PULL v2 00/50] virtio, vhost, pci, pc: features, cleanups
@ 2018-03-20  3:16 Michael S. Tsirkin
  2018-03-20  3:16   ` [virtio-dev] " Michael S. Tsirkin
                   ` (50 more replies)
  0 siblings, 51 replies; 69+ messages in thread
From: Michael S. Tsirkin @ 2018-03-20  3:16 UTC (permalink / raw)
  To: qemu-devel; +Cc: Peter Maydell

Changes from v1:
- dropped include change for one generated file - proposed a tree-wide refactoring
- dropped vhost used slot refactoring due to alignment issues found by clang
- added vhost-user post-copy support

The following changes since commit 026aaf47c02b79036feb830206cfebb2a726510d:

  Merge remote-tracking branch 'remotes/ehabkost/tags/python-next-pull-request' into staging (2018-03-13 16:26:44 +0000)

are available in the git repository at:

  git://git.kernel.org/pub/scm/virt/kvm/mst/qemu.git tags/for_upstream

for you to fetch changes up to a466e2cdb09d7b1262e24bae8cc47a51550d3af3:

  postcopy shared docs (2018-03-20 05:03:30 +0200)

----------------------------------------------------------------
virtio,vhost,pci,pc: features, cleanups

SRAT tables for DIMM devices
new virtio net flags for speed/duplex
post-copy migration support in vhost
cleanups in pci

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

----------------------------------------------------------------
Dr. David Alan Gilbert (29):
      migrate: Update ram_block_discard_range for shared
      qemu_ram_block_host_offset
      postcopy: use UFFDIO_ZEROPAGE only when available
      postcopy: Add notifier chain
      postcopy: Add vhost-user flag for postcopy and check it
      vhost-user: Add 'VHOST_USER_POSTCOPY_ADVISE' message
      libvhost-user: Support sending fds back to qemu
      libvhost-user: Open userfaultfd
      postcopy: Allow registering of fd handler
      vhost+postcopy: Register shared ufd with postcopy
      vhost+postcopy: Transmit 'listen' to slave
      postcopy+vhost-user: Split set_mem_table for postcopy
      migration/ram: ramblock_recv_bitmap_test_byte_offset
      libvhost-user+postcopy: Register new regions with the ufd
      vhost+postcopy: Send address back to qemu
      vhost+postcopy: Stash RAMBlock and offset
      vhost+postcopy: Helper to send requests to source for shared pages
      vhost+postcopy: Resolve client address
      postcopy: helper for waking shared
      postcopy: postcopy_notify_shared_wake
      vhost+postcopy: Add vhost waker
      vhost+postcopy: Call wakeups
      libvhost-user: mprotect & madvises for postcopy
      vhost-user: Add VHOST_USER_POSTCOPY_END message
      vhost+postcopy: Wire up POSTCOPY_END notify
      vhost: Huge page align and merge
      postcopy: Allow shared memory
      libvhost-user: Claim support for postcopy
      postcopy shared docs

Haozhong Zhang (5):
      pc-dimm: make qmp_pc_dimm_device_list() sort devices by address
      qmp: distinguish PC-DIMM and NVDIMM in MemoryDeviceInfoList
      hw/acpi-build: build SRAT memory affinity structures for DIMM devices
      tests/bios-tables-test: add test cases for DIMM proximity
      test/acpi-test-data: add ACPI tables for dimmpxm test

Igor Mammedov (10):
      acpi: remove unused acpi-dsdt.aml
      pc: replace pm object initialization with one-liner in acpi_get_pm_info()
      acpi: reuse AcpiGenericAddress instead of Acpi20GenericAddress
      acpi: add build_append_gas() helper for Generic Address Structure
      acpi: move ACPI_PORT_SMI_CMD define to header it belongs to
      pc: acpi: isolate FADT specific data into AcpiFadtData structure
      pc: acpi: use build_append_foo() API to construct FADT
      acpi: move build_fadt() from i386 specific to generic ACPI source
      virt_arm: acpi: reuse common build_fadt()
      tests: acpi: don't read all fields in test_acpi_fadt_table()

Jason Baron (3):
      scripts/update-linux-headers: add ethtool.h and update to 4.16.0-rc4
      virtio-net: use 64-bit values for feature flags
      virtio-net: add linkspeed and duplex settings to virtio-net

Michael S. Tsirkin (2):
      standard-headers: update virtio_net.h
      Makefile: add target to print generated files

Philippe Mathieu-Daudé (1):
      hw/pci: remove obsolete PCIDevice->init()

 docs/interop/vhost-user.txt                 |   52 +
 Makefile                                    |    4 +-
 qapi/misc.json                              |    6 +-
 contrib/libvhost-user/libvhost-user.h       |   11 +
 include/exec/cpu-common.h                   |    4 +
 include/hw/acpi/acpi-defs.h                 |  136 +-
 include/hw/acpi/aml-build.h                 |   23 +
 include/hw/isa/apm.h                        |    3 +
 include/hw/mem/pc-dimm.h                    |    2 +-
 include/hw/pci/pci.h                        |    1 -
 include/hw/virtio/virtio-net.h              |    5 +-
 include/standard-headers/linux/ethtool.h    | 1821 +++++++++++++++++++++++++++
 include/standard-headers/linux/kernel.h     |   15 +
 include/standard-headers/linux/sysinfo.h    |   25 +
 include/standard-headers/linux/virtio_net.h |   13 +
 migration/migration.h                       |    4 +
 migration/postcopy-ram.h                    |   73 ++
 migration/ram.h                             |    1 +
 contrib/libvhost-user/libvhost-user.c       |  302 ++++-
 exec.c                                      |   86 +-
 hmp.c                                       |   14 +-
 hw/acpi/aml-build.c                         |  140 ++
 hw/arm/virt-acpi-build.c                    |   39 +-
 hw/i386/acpi-build.c                        |  252 ++--
 hw/isa/apm.c                                |    1 -
 hw/mem/pc-dimm.c                            |   91 +-
 hw/net/virtio-net.c                         |   81 +-
 hw/pci/pci.c                                |   14 -
 hw/ppc/spapr.c                              |    3 +-
 hw/virtio/vhost-user.c                      |  411 +++++-
 hw/virtio/vhost.c                           |   66 +-
 migration/migration.c                       |    6 +
 migration/postcopy-ram.c                    |  353 +++++-
 migration/ram.c                             |    5 +
 migration/savevm.c                          |   13 +
 numa.c                                      |   23 +-
 qmp.c                                       |    7 +-
 stubs/qmp_pc_dimm.c                         |    4 +-
 tests/bios-tables-test.c                    |  120 +-
 vl.c                                        |    2 +
 docs/devel/migration.rst                    |   41 +
 hw/virtio/trace-events                      |   16 +-
 migration/trace-events                      |    6 +
 pc-bios/acpi-dsdt.aml                       |  Bin 4405 -> 0 bytes
 scripts/update-linux-headers.sh             |   11 +-
 tests/acpi-test-data/pc/APIC.dimmpxm        |  Bin 0 -> 144 bytes
 tests/acpi-test-data/pc/DSDT.dimmpxm        |  Bin 0 -> 6803 bytes
 tests/acpi-test-data/pc/NFIT.dimmpxm        |  Bin 0 -> 224 bytes
 tests/acpi-test-data/pc/SRAT.dimmpxm        |  Bin 0 -> 472 bytes
 tests/acpi-test-data/pc/SSDT.dimmpxm        |  Bin 0 -> 685 bytes
 tests/acpi-test-data/q35/APIC.dimmpxm       |  Bin 0 -> 144 bytes
 tests/acpi-test-data/q35/DSDT.dimmpxm       |  Bin 0 -> 9487 bytes
 tests/acpi-test-data/q35/NFIT.dimmpxm       |  Bin 0 -> 224 bytes
 tests/acpi-test-data/q35/SRAT.dimmpxm       |  Bin 0 -> 472 bytes
 tests/acpi-test-data/q35/SSDT.dimmpxm       |  Bin 0 -> 685 bytes
 trace-events                                |    3 +-
 56 files changed, 3773 insertions(+), 536 deletions(-)
 create mode 100644 include/standard-headers/linux/ethtool.h
 create mode 100644 include/standard-headers/linux/kernel.h
 create mode 100644 include/standard-headers/linux/sysinfo.h
 delete mode 100644 pc-bios/acpi-dsdt.aml
 create mode 100644 tests/acpi-test-data/pc/APIC.dimmpxm
 create mode 100644 tests/acpi-test-data/pc/DSDT.dimmpxm
 create mode 100644 tests/acpi-test-data/pc/NFIT.dimmpxm
 create mode 100644 tests/acpi-test-data/pc/SRAT.dimmpxm
 create mode 100644 tests/acpi-test-data/pc/SSDT.dimmpxm
 create mode 100644 tests/acpi-test-data/q35/APIC.dimmpxm
 create mode 100644 tests/acpi-test-data/q35/DSDT.dimmpxm
 create mode 100644 tests/acpi-test-data/q35/NFIT.dimmpxm
 create mode 100644 tests/acpi-test-data/q35/SRAT.dimmpxm
 create mode 100644 tests/acpi-test-data/q35/SSDT.dimmpxm

^ permalink raw reply	[flat|nested] 69+ messages in thread

* [Qemu-devel] [PULL v2 01/50] scripts/update-linux-headers: add ethtool.h and update to 4.16.0-rc4
  2018-03-20  3:16 [Qemu-devel] [PULL v2 00/50] virtio, vhost, pci, pc: features, cleanups Michael S. Tsirkin
@ 2018-03-20  3:16   ` Michael S. Tsirkin
  2018-03-20  3:17   ` [virtio-dev] " Michael S. Tsirkin
                     ` (49 subsequent siblings)
  50 siblings, 0 replies; 69+ messages in thread
From: Michael S. Tsirkin @ 2018-03-20  3:16 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Maydell, Jason Baron, Jason Wang, virtio-dev, Roman Kagan,
	Paolo Bonzini, Cornelia Huck, Yuval Shaia, Stefan Hajnoczi

From: Jason Baron <jbaron@akamai.com>

A subsequent patch to add support for setting linkspeed/duplex in
virtio-net, requires a few definitions from ethtool.h, which ends up
pulling in kernel.h and sysinfo.h as well.

Signed-off-by: Jason Baron <jbaron@akamai.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: virtio-dev@lists.oasis-open.org
---
 include/standard-headers/linux/ethtool.h | 1821 ++++++++++++++++++++++++++++++
 include/standard-headers/linux/kernel.h  |   15 +
 include/standard-headers/linux/sysinfo.h |   25 +
 scripts/update-linux-headers.sh          |   11 +-
 4 files changed, 1871 insertions(+), 1 deletion(-)
 create mode 100644 include/standard-headers/linux/ethtool.h
 create mode 100644 include/standard-headers/linux/kernel.h
 create mode 100644 include/standard-headers/linux/sysinfo.h

diff --git a/include/standard-headers/linux/ethtool.h b/include/standard-headers/linux/ethtool.h
new file mode 100644
index 0000000..94aacb7
--- /dev/null
+++ b/include/standard-headers/linux/ethtool.h
@@ -0,0 +1,1821 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+/*
+ * ethtool.h: Defines for Linux ethtool.
+ *
+ * Copyright (C) 1998 David S. Miller (davem@redhat.com)
+ * Copyright 2001 Jeff Garzik <jgarzik@pobox.com>
+ * Portions Copyright 2001 Sun Microsystems (thockin@sun.com)
+ * Portions Copyright 2002 Intel (eli.kupermann@intel.com,
+ *                                christopher.leech@intel.com,
+ *                                scott.feldman@intel.com)
+ * Portions Copyright (C) Sun Microsystems 2008
+ */
+
+#ifndef _LINUX_ETHTOOL_H
+#define _LINUX_ETHTOOL_H
+
+#include "net/eth.h"
+
+#include "standard-headers/linux/kernel.h"
+#include "standard-headers/linux/types.h"
+#include "standard-headers/linux/if_ether.h"
+
+#include <limits.h> /* for INT_MAX */
+
+/* All structures exposed to userland should be defined such that they
+ * have the same layout for 32-bit and 64-bit userland.
+ */
+
+/**
+ * struct ethtool_cmd - DEPRECATED, link control and status
+ * This structure is DEPRECATED, please use struct ethtool_link_settings.
+ * @cmd: Command number = %ETHTOOL_GSET or %ETHTOOL_SSET
+ * @supported: Bitmask of %SUPPORTED_* flags for the link modes,
+ *	physical connectors and other link features for which the
+ *	interface supports autonegotiation or auto-detection.
+ *	Read-only.
+ * @advertising: Bitmask of %ADVERTISED_* flags for the link modes,
+ *	physical connectors and other link features that are
+ *	advertised through autonegotiation or enabled for
+ *	auto-detection.
+ * @speed: Low bits of the speed, 1Mb units, 0 to INT_MAX or SPEED_UNKNOWN
+ * @duplex: Duplex mode; one of %DUPLEX_*
+ * @port: Physical connector type; one of %PORT_*
+ * @phy_address: MDIO address of PHY (transceiver); 0 or 255 if not
+ *	applicable.  For clause 45 PHYs this is the PRTAD.
+ * @transceiver: Historically used to distinguish different possible
+ *	PHY types, but not in a consistent way.  Deprecated.
+ * @autoneg: Enable/disable autonegotiation and auto-detection;
+ *	either %AUTONEG_DISABLE or %AUTONEG_ENABLE
+ * @mdio_support: Bitmask of %ETH_MDIO_SUPPORTS_* flags for the MDIO
+ *	protocols supported by the interface; 0 if unknown.
+ *	Read-only.
+ * @maxtxpkt: Historically used to report TX IRQ coalescing; now
+ *	obsoleted by &struct ethtool_coalesce.  Read-only; deprecated.
+ * @maxrxpkt: Historically used to report RX IRQ coalescing; now
+ *	obsoleted by &struct ethtool_coalesce.  Read-only; deprecated.
+ * @speed_hi: High bits of the speed, 1Mb units, 0 to INT_MAX or SPEED_UNKNOWN
+ * @eth_tp_mdix: Ethernet twisted-pair MDI(-X) status; one of
+ *	%ETH_TP_MDI_*.  If the status is unknown or not applicable, the
+ *	value will be %ETH_TP_MDI_INVALID.  Read-only.
+ * @eth_tp_mdix_ctrl: Ethernet twisted pair MDI(-X) control; one of
+ *	%ETH_TP_MDI_*.  If MDI(-X) control is not implemented, reads
+ *	yield %ETH_TP_MDI_INVALID and writes may be ignored or rejected.
+ *	When written successfully, the link should be renegotiated if
+ *	necessary.
+ * @lp_advertising: Bitmask of %ADVERTISED_* flags for the link modes
+ *	and other link features that the link partner advertised
+ *	through autonegotiation; 0 if unknown or not applicable.
+ *	Read-only.
+ *
+ * The link speed in Mbps is split between @speed and @speed_hi.  Use
+ * the ethtool_cmd_speed() and ethtool_cmd_speed_set() functions to
+ * access it.
+ *
+ * If autonegotiation is disabled, the speed and @duplex represent the
+ * fixed link mode and are writable if the driver supports multiple
+ * link modes.  If it is enabled then they are read-only; if the link
+ * is up they represent the negotiated link mode; if the link is down,
+ * the speed is 0, %SPEED_UNKNOWN or the highest enabled speed and
+ * @duplex is %DUPLEX_UNKNOWN or the best enabled duplex mode.
+ *
+ * Some hardware interfaces may have multiple PHYs and/or physical
+ * connectors fitted or do not allow the driver to detect which are
+ * fitted.  For these interfaces @port and/or @phy_address may be
+ * writable, possibly dependent on @autoneg being %AUTONEG_DISABLE.
+ * Otherwise, attempts to write different values may be ignored or
+ * rejected.
+ *
+ * Users should assume that all fields not marked read-only are
+ * writable and subject to validation by the driver.  They should use
+ * %ETHTOOL_GSET to get the current values before making specific
+ * changes and then applying them with %ETHTOOL_SSET.
+ *
+ * Drivers that implement set_settings() should validate all fields
+ * other than @cmd that are not described as read-only or deprecated,
+ * and must ignore all fields described as read-only.
+ *
+ * Deprecated fields should be ignored by both users and drivers.
+ */
+struct ethtool_cmd {
+	uint32_t	cmd;
+	uint32_t	supported;
+	uint32_t	advertising;
+	uint16_t	speed;
+	uint8_t	duplex;
+	uint8_t	port;
+	uint8_t	phy_address;
+	uint8_t	transceiver;
+	uint8_t	autoneg;
+	uint8_t	mdio_support;
+	uint32_t	maxtxpkt;
+	uint32_t	maxrxpkt;
+	uint16_t	speed_hi;
+	uint8_t	eth_tp_mdix;
+	uint8_t	eth_tp_mdix_ctrl;
+	uint32_t	lp_advertising;
+	uint32_t	reserved[2];
+};
+
+static inline void ethtool_cmd_speed_set(struct ethtool_cmd *ep,
+					 uint32_t speed)
+{
+	ep->speed = (uint16_t)(speed & 0xFFFF);
+	ep->speed_hi = (uint16_t)(speed >> 16);
+}
+
+static inline uint32_t ethtool_cmd_speed(const struct ethtool_cmd *ep)
+{
+	return (ep->speed_hi << 16) | ep->speed;
+}
+
+/* Device supports clause 22 register access to PHY or peripherals
+ * using the interface defined in "standard-headers/linux/mii.h".  This should not be
+ * set if there are known to be no such peripherals present or if
+ * the driver only emulates clause 22 registers for compatibility.
+ */
+#define ETH_MDIO_SUPPORTS_C22	1
+
+/* Device supports clause 45 register access to PHY or peripherals
+ * using the interface defined in "standard-headers/linux/mii.h" and <linux/mdio.h>.
+ * This should not be set if there are known to be no such peripherals
+ * present.
+ */
+#define ETH_MDIO_SUPPORTS_C45	2
+
+#define ETHTOOL_FWVERS_LEN	32
+#define ETHTOOL_BUSINFO_LEN	32
+#define ETHTOOL_EROMVERS_LEN	32
+
+/**
+ * struct ethtool_drvinfo - general driver and device information
+ * @cmd: Command number = %ETHTOOL_GDRVINFO
+ * @driver: Driver short name.  This should normally match the name
+ *	in its bus driver structure (e.g. pci_driver::name).  Must
+ *	not be an empty string.
+ * @version: Driver version string; may be an empty string
+ * @fw_version: Firmware version string; may be an empty string
+ * @erom_version: Expansion ROM version string; may be an empty string
+ * @bus_info: Device bus address.  This should match the dev_name()
+ *	string for the underlying bus device, if there is one.  May be
+ *	an empty string.
+ * @n_priv_flags: Number of flags valid for %ETHTOOL_GPFLAGS and
+ *	%ETHTOOL_SPFLAGS commands; also the number of strings in the
+ *	%ETH_SS_PRIV_FLAGS set
+ * @n_stats: Number of uint64_t statistics returned by the %ETHTOOL_GSTATS
+ *	command; also the number of strings in the %ETH_SS_STATS set
+ * @testinfo_len: Number of results returned by the %ETHTOOL_TEST
+ *	command; also the number of strings in the %ETH_SS_TEST set
+ * @eedump_len: Size of EEPROM accessible through the %ETHTOOL_GEEPROM
+ *	and %ETHTOOL_SEEPROM commands, in bytes
+ * @regdump_len: Size of register dump returned by the %ETHTOOL_GREGS
+ *	command, in bytes
+ *
+ * Users can use the %ETHTOOL_GSSET_INFO command to get the number of
+ * strings in any string set (from Linux 2.6.34).
+ *
+ * Drivers should set at most @driver, @version, @fw_version and
+ * @bus_info in their get_drvinfo() implementation.  The ethtool
+ * core fills in the other fields using other driver operations.
+ */
+struct ethtool_drvinfo {
+	uint32_t	cmd;
+	char	driver[32];
+	char	version[32];
+	char	fw_version[ETHTOOL_FWVERS_LEN];
+	char	bus_info[ETHTOOL_BUSINFO_LEN];
+	char	erom_version[ETHTOOL_EROMVERS_LEN];
+	char	reserved2[12];
+	uint32_t	n_priv_flags;
+	uint32_t	n_stats;
+	uint32_t	testinfo_len;
+	uint32_t	eedump_len;
+	uint32_t	regdump_len;
+};
+
+#define SOPASS_MAX	6
+
+/**
+ * struct ethtool_wolinfo - Wake-On-Lan configuration
+ * @cmd: Command number = %ETHTOOL_GWOL or %ETHTOOL_SWOL
+ * @supported: Bitmask of %WAKE_* flags for supported Wake-On-Lan modes.
+ *	Read-only.
+ * @wolopts: Bitmask of %WAKE_* flags for enabled Wake-On-Lan modes.
+ * @sopass: SecureOn(tm) password; meaningful only if %WAKE_MAGICSECURE
+ *	is set in @wolopts.
+ */
+struct ethtool_wolinfo {
+	uint32_t	cmd;
+	uint32_t	supported;
+	uint32_t	wolopts;
+	uint8_t	sopass[SOPASS_MAX];
+};
+
+/* for passing single values */
+struct ethtool_value {
+	uint32_t	cmd;
+	uint32_t	data;
+};
+
+enum tunable_id {
+	ETHTOOL_ID_UNSPEC,
+	ETHTOOL_RX_COPYBREAK,
+	ETHTOOL_TX_COPYBREAK,
+	/*
+	 * Add your fresh new tubale attribute above and remember to update
+	 * tunable_strings[] in net/core/ethtool.c
+	 */
+	__ETHTOOL_TUNABLE_COUNT,
+};
+
+enum tunable_type_id {
+	ETHTOOL_TUNABLE_UNSPEC,
+	ETHTOOL_TUNABLE_U8,
+	ETHTOOL_TUNABLE_U16,
+	ETHTOOL_TUNABLE_U32,
+	ETHTOOL_TUNABLE_U64,
+	ETHTOOL_TUNABLE_STRING,
+	ETHTOOL_TUNABLE_S8,
+	ETHTOOL_TUNABLE_S16,
+	ETHTOOL_TUNABLE_S32,
+	ETHTOOL_TUNABLE_S64,
+};
+
+struct ethtool_tunable {
+	uint32_t	cmd;
+	uint32_t	id;
+	uint32_t	type_id;
+	uint32_t	len;
+	void	*data[0];
+};
+
+#define DOWNSHIFT_DEV_DEFAULT_COUNT	0xff
+#define DOWNSHIFT_DEV_DISABLE		0
+
+enum phy_tunable_id {
+	ETHTOOL_PHY_ID_UNSPEC,
+	ETHTOOL_PHY_DOWNSHIFT,
+	/*
+	 * Add your fresh new phy tunable attribute above and remember to update
+	 * phy_tunable_strings[] in net/core/ethtool.c
+	 */
+	__ETHTOOL_PHY_TUNABLE_COUNT,
+};
+
+/**
+ * struct ethtool_regs - hardware register dump
+ * @cmd: Command number = %ETHTOOL_GREGS
+ * @version: Dump format version.  This is driver-specific and may
+ *	distinguish different chips/revisions.  Drivers must use new
+ *	version numbers whenever the dump format changes in an
+ *	incompatible way.
+ * @len: On entry, the real length of @data.  On return, the number of
+ *	bytes used.
+ * @data: Buffer for the register dump
+ *
+ * Users should use %ETHTOOL_GDRVINFO to find the maximum length of
+ * a register dump for the interface.  They must allocate the buffer
+ * immediately following this structure.
+ */
+struct ethtool_regs {
+	uint32_t	cmd;
+	uint32_t	version;
+	uint32_t	len;
+	uint8_t	data[0];
+};
+
+/**
+ * struct ethtool_eeprom - EEPROM dump
+ * @cmd: Command number = %ETHTOOL_GEEPROM, %ETHTOOL_GMODULEEEPROM or
+ *	%ETHTOOL_SEEPROM
+ * @magic: A 'magic cookie' value to guard against accidental changes.
+ *	The value passed in to %ETHTOOL_SEEPROM must match the value
+ *	returned by %ETHTOOL_GEEPROM for the same device.  This is
+ *	unused when @cmd is %ETHTOOL_GMODULEEEPROM.
+ * @offset: Offset within the EEPROM to begin reading/writing, in bytes
+ * @len: On entry, number of bytes to read/write.  On successful
+ *	return, number of bytes actually read/written.  In case of
+ *	error, this may indicate at what point the error occurred.
+ * @data: Buffer to read/write from
+ *
+ * Users may use %ETHTOOL_GDRVINFO or %ETHTOOL_GMODULEINFO to find
+ * the length of an on-board or module EEPROM, respectively.  They
+ * must allocate the buffer immediately following this structure.
+ */
+struct ethtool_eeprom {
+	uint32_t	cmd;
+	uint32_t	magic;
+	uint32_t	offset;
+	uint32_t	len;
+	uint8_t	data[0];
+};
+
+/**
+ * struct ethtool_eee - Energy Efficient Ethernet information
+ * @cmd: ETHTOOL_{G,S}EEE
+ * @supported: Mask of %SUPPORTED_* flags for the speed/duplex combinations
+ *	for which there is EEE support.
+ * @advertised: Mask of %ADVERTISED_* flags for the speed/duplex combinations
+ *	advertised as eee capable.
+ * @lp_advertised: Mask of %ADVERTISED_* flags for the speed/duplex
+ *	combinations advertised by the link partner as eee capable.
+ * @eee_active: Result of the eee auto negotiation.
+ * @eee_enabled: EEE configured mode (enabled/disabled).
+ * @tx_lpi_enabled: Whether the interface should assert its tx lpi, given
+ *	that eee was negotiated.
+ * @tx_lpi_timer: Time in microseconds the interface delays prior to asserting
+ *	its tx lpi (after reaching 'idle' state). Effective only when eee
+ *	was negotiated and tx_lpi_enabled was set.
+ */
+struct ethtool_eee {
+	uint32_t	cmd;
+	uint32_t	supported;
+	uint32_t	advertised;
+	uint32_t	lp_advertised;
+	uint32_t	eee_active;
+	uint32_t	eee_enabled;
+	uint32_t	tx_lpi_enabled;
+	uint32_t	tx_lpi_timer;
+	uint32_t	reserved[2];
+};
+
+/**
+ * struct ethtool_modinfo - plugin module eeprom information
+ * @cmd: %ETHTOOL_GMODULEINFO
+ * @type: Standard the module information conforms to %ETH_MODULE_SFF_xxxx
+ * @eeprom_len: Length of the eeprom
+ *
+ * This structure is used to return the information to
+ * properly size memory for a subsequent call to %ETHTOOL_GMODULEEEPROM.
+ * The type code indicates the eeprom data format
+ */
+struct ethtool_modinfo {
+	uint32_t   cmd;
+	uint32_t   type;
+	uint32_t   eeprom_len;
+	uint32_t   reserved[8];
+};
+
+/**
+ * struct ethtool_coalesce - coalescing parameters for IRQs and stats updates
+ * @cmd: ETHTOOL_{G,S}COALESCE
+ * @rx_coalesce_usecs: How many usecs to delay an RX interrupt after
+ *	a packet arrives.
+ * @rx_max_coalesced_frames: Maximum number of packets to receive
+ *	before an RX interrupt.
+ * @rx_coalesce_usecs_irq: Same as @rx_coalesce_usecs, except that
+ *	this value applies while an IRQ is being serviced by the host.
+ * @rx_max_coalesced_frames_irq: Same as @rx_max_coalesced_frames,
+ *	except that this value applies while an IRQ is being serviced
+ *	by the host.
+ * @tx_coalesce_usecs: How many usecs to delay a TX interrupt after
+ *	a packet is sent.
+ * @tx_max_coalesced_frames: Maximum number of packets to be sent
+ *	before a TX interrupt.
+ * @tx_coalesce_usecs_irq: Same as @tx_coalesce_usecs, except that
+ *	this value applies while an IRQ is being serviced by the host.
+ * @tx_max_coalesced_frames_irq: Same as @tx_max_coalesced_frames,
+ *	except that this value applies while an IRQ is being serviced
+ *	by the host.
+ * @stats_block_coalesce_usecs: How many usecs to delay in-memory
+ *	statistics block updates.  Some drivers do not have an
+ *	in-memory statistic block, and in such cases this value is
+ *	ignored.  This value must not be zero.
+ * @use_adaptive_rx_coalesce: Enable adaptive RX coalescing.
+ * @use_adaptive_tx_coalesce: Enable adaptive TX coalescing.
+ * @pkt_rate_low: Threshold for low packet rate (packets per second).
+ * @rx_coalesce_usecs_low: How many usecs to delay an RX interrupt after
+ *	a packet arrives, when the packet rate is below @pkt_rate_low.
+ * @rx_max_coalesced_frames_low: Maximum number of packets to be received
+ *	before an RX interrupt, when the packet rate is below @pkt_rate_low.
+ * @tx_coalesce_usecs_low: How many usecs to delay a TX interrupt after
+ *	a packet is sent, when the packet rate is below @pkt_rate_low.
+ * @tx_max_coalesced_frames_low: Maximum nuumber of packets to be sent before
+ *	a TX interrupt, when the packet rate is below @pkt_rate_low.
+ * @pkt_rate_high: Threshold for high packet rate (packets per second).
+ * @rx_coalesce_usecs_high: How many usecs to delay an RX interrupt after
+ *	a packet arrives, when the packet rate is above @pkt_rate_high.
+ * @rx_max_coalesced_frames_high: Maximum number of packets to be received
+ *	before an RX interrupt, when the packet rate is above @pkt_rate_high.
+ * @tx_coalesce_usecs_high: How many usecs to delay a TX interrupt after
+ *	a packet is sent, when the packet rate is above @pkt_rate_high.
+ * @tx_max_coalesced_frames_high: Maximum number of packets to be sent before
+ *	a TX interrupt, when the packet rate is above @pkt_rate_high.
+ * @rate_sample_interval: How often to do adaptive coalescing packet rate
+ *	sampling, measured in seconds.  Must not be zero.
+ *
+ * Each pair of (usecs, max_frames) fields specifies that interrupts
+ * should be coalesced until
+ *	(usecs > 0 && time_since_first_completion >= usecs) ||
+ *	(max_frames > 0 && completed_frames >= max_frames)
+ *
+ * It is illegal to set both usecs and max_frames to zero as this
+ * would cause interrupts to never be generated.  To disable
+ * coalescing, set usecs = 0 and max_frames = 1.
+ *
+ * Some implementations ignore the value of max_frames and use the
+ * condition time_since_first_completion >= usecs
+ *
+ * This is deprecated.  Drivers for hardware that does not support
+ * counting completions should validate that max_frames == !rx_usecs.
+ *
+ * Adaptive RX/TX coalescing is an algorithm implemented by some
+ * drivers to improve latency under low packet rates and improve
+ * throughput under high packet rates.  Some drivers only implement
+ * one of RX or TX adaptive coalescing.  Anything not implemented by
+ * the driver causes these values to be silently ignored.
+ *
+ * When the packet rate is below @pkt_rate_high but above
+ * @pkt_rate_low (both measured in packets per second) the
+ * normal {rx,tx}_* coalescing parameters are used.
+ */
+struct ethtool_coalesce {
+	uint32_t	cmd;
+	uint32_t	rx_coalesce_usecs;
+	uint32_t	rx_max_coalesced_frames;
+	uint32_t	rx_coalesce_usecs_irq;
+	uint32_t	rx_max_coalesced_frames_irq;
+	uint32_t	tx_coalesce_usecs;
+	uint32_t	tx_max_coalesced_frames;
+	uint32_t	tx_coalesce_usecs_irq;
+	uint32_t	tx_max_coalesced_frames_irq;
+	uint32_t	stats_block_coalesce_usecs;
+	uint32_t	use_adaptive_rx_coalesce;
+	uint32_t	use_adaptive_tx_coalesce;
+	uint32_t	pkt_rate_low;
+	uint32_t	rx_coalesce_usecs_low;
+	uint32_t	rx_max_coalesced_frames_low;
+	uint32_t	tx_coalesce_usecs_low;
+	uint32_t	tx_max_coalesced_frames_low;
+	uint32_t	pkt_rate_high;
+	uint32_t	rx_coalesce_usecs_high;
+	uint32_t	rx_max_coalesced_frames_high;
+	uint32_t	tx_coalesce_usecs_high;
+	uint32_t	tx_max_coalesced_frames_high;
+	uint32_t	rate_sample_interval;
+};
+
+/**
+ * struct ethtool_ringparam - RX/TX ring parameters
+ * @cmd: Command number = %ETHTOOL_GRINGPARAM or %ETHTOOL_SRINGPARAM
+ * @rx_max_pending: Maximum supported number of pending entries per
+ *	RX ring.  Read-only.
+ * @rx_mini_max_pending: Maximum supported number of pending entries
+ *	per RX mini ring.  Read-only.
+ * @rx_jumbo_max_pending: Maximum supported number of pending entries
+ *	per RX jumbo ring.  Read-only.
+ * @tx_max_pending: Maximum supported number of pending entries per
+ *	TX ring.  Read-only.
+ * @rx_pending: Current maximum number of pending entries per RX ring
+ * @rx_mini_pending: Current maximum number of pending entries per RX
+ *	mini ring
+ * @rx_jumbo_pending: Current maximum number of pending entries per RX
+ *	jumbo ring
+ * @tx_pending: Current maximum supported number of pending entries
+ *	per TX ring
+ *
+ * If the interface does not have separate RX mini and/or jumbo rings,
+ * @rx_mini_max_pending and/or @rx_jumbo_max_pending will be 0.
+ *
+ * There may also be driver-dependent minimum values for the number
+ * of entries per ring.
+ */
+struct ethtool_ringparam {
+	uint32_t	cmd;
+	uint32_t	rx_max_pending;
+	uint32_t	rx_mini_max_pending;
+	uint32_t	rx_jumbo_max_pending;
+	uint32_t	tx_max_pending;
+	uint32_t	rx_pending;
+	uint32_t	rx_mini_pending;
+	uint32_t	rx_jumbo_pending;
+	uint32_t	tx_pending;
+};
+
+/**
+ * struct ethtool_channels - configuring number of network channel
+ * @cmd: ETHTOOL_{G,S}CHANNELS
+ * @max_rx: Read only. Maximum number of receive channel the driver support.
+ * @max_tx: Read only. Maximum number of transmit channel the driver support.
+ * @max_other: Read only. Maximum number of other channel the driver support.
+ * @max_combined: Read only. Maximum number of combined channel the driver
+ *	support. Set of queues RX, TX or other.
+ * @rx_count: Valid values are in the range 1 to the max_rx.
+ * @tx_count: Valid values are in the range 1 to the max_tx.
+ * @other_count: Valid values are in the range 1 to the max_other.
+ * @combined_count: Valid values are in the range 1 to the max_combined.
+ *
+ * This can be used to configure RX, TX and other channels.
+ */
+
+struct ethtool_channels {
+	uint32_t	cmd;
+	uint32_t	max_rx;
+	uint32_t	max_tx;
+	uint32_t	max_other;
+	uint32_t	max_combined;
+	uint32_t	rx_count;
+	uint32_t	tx_count;
+	uint32_t	other_count;
+	uint32_t	combined_count;
+};
+
+/**
+ * struct ethtool_pauseparam - Ethernet pause (flow control) parameters
+ * @cmd: Command number = %ETHTOOL_GPAUSEPARAM or %ETHTOOL_SPAUSEPARAM
+ * @autoneg: Flag to enable autonegotiation of pause frame use
+ * @rx_pause: Flag to enable reception of pause frames
+ * @tx_pause: Flag to enable transmission of pause frames
+ *
+ * Drivers should reject a non-zero setting of @autoneg when
+ * autoneogotiation is disabled (or not supported) for the link.
+ *
+ * If the link is autonegotiated, drivers should use
+ * mii_advertise_flowctrl() or similar code to set the advertised
+ * pause frame capabilities based on the @rx_pause and @tx_pause flags,
+ * even if @autoneg is zero.  They should also allow the advertised
+ * pause frame capabilities to be controlled directly through the
+ * advertising field of &struct ethtool_cmd.
+ *
+ * If @autoneg is non-zero, the MAC is configured to send and/or
+ * receive pause frames according to the result of autonegotiation.
+ * Otherwise, it is configured directly based on the @rx_pause and
+ * @tx_pause flags.
+ */
+struct ethtool_pauseparam {
+	uint32_t	cmd;
+	uint32_t	autoneg;
+	uint32_t	rx_pause;
+	uint32_t	tx_pause;
+};
+
+#define ETH_GSTRING_LEN		32
+
+/**
+ * enum ethtool_stringset - string set ID
+ * @ETH_SS_TEST: Self-test result names, for use with %ETHTOOL_TEST
+ * @ETH_SS_STATS: Statistic names, for use with %ETHTOOL_GSTATS
+ * @ETH_SS_PRIV_FLAGS: Driver private flag names, for use with
+ *	%ETHTOOL_GPFLAGS and %ETHTOOL_SPFLAGS
+ * @ETH_SS_NTUPLE_FILTERS: Previously used with %ETHTOOL_GRXNTUPLE;
+ *	now deprecated
+ * @ETH_SS_FEATURES: Device feature names
+ * @ETH_SS_RSS_HASH_FUNCS: RSS hush function names
+ * @ETH_SS_PHY_STATS: Statistic names, for use with %ETHTOOL_GPHYSTATS
+ * @ETH_SS_PHY_TUNABLES: PHY tunable names
+ */
+enum ethtool_stringset {
+	ETH_SS_TEST		= 0,
+	ETH_SS_STATS,
+	ETH_SS_PRIV_FLAGS,
+	ETH_SS_NTUPLE_FILTERS,
+	ETH_SS_FEATURES,
+	ETH_SS_RSS_HASH_FUNCS,
+	ETH_SS_TUNABLES,
+	ETH_SS_PHY_STATS,
+	ETH_SS_PHY_TUNABLES,
+};
+
+/**
+ * struct ethtool_gstrings - string set for data tagging
+ * @cmd: Command number = %ETHTOOL_GSTRINGS
+ * @string_set: String set ID; one of &enum ethtool_stringset
+ * @len: On return, the number of strings in the string set
+ * @data: Buffer for strings.  Each string is null-padded to a size of
+ *	%ETH_GSTRING_LEN.
+ *
+ * Users must use %ETHTOOL_GSSET_INFO to find the number of strings in
+ * the string set.  They must allocate a buffer of the appropriate
+ * size immediately following this structure.
+ */
+struct ethtool_gstrings {
+	uint32_t	cmd;
+	uint32_t	string_set;
+	uint32_t	len;
+	uint8_t	data[0];
+};
+
+/**
+ * struct ethtool_sset_info - string set information
+ * @cmd: Command number = %ETHTOOL_GSSET_INFO
+ * @sset_mask: On entry, a bitmask of string sets to query, with bits
+ *	numbered according to &enum ethtool_stringset.  On return, a
+ *	bitmask of those string sets queried that are supported.
+ * @data: Buffer for string set sizes.  On return, this contains the
+ *	size of each string set that was queried and supported, in
+ *	order of ID.
+ *
+ * Example: The user passes in @sset_mask = 0x7 (sets 0, 1, 2) and on
+ * return @sset_mask == 0x6 (sets 1, 2).  Then @data[0] contains the
+ * size of set 1 and @data[1] contains the size of set 2.
+ *
+ * Users must allocate a buffer of the appropriate size (4 * number of
+ * sets queried) immediately following this structure.
+ */
+struct ethtool_sset_info {
+	uint32_t	cmd;
+	uint32_t	reserved;
+	uint64_t	sset_mask;
+	uint32_t	data[0];
+};
+
+/**
+ * enum ethtool_test_flags - flags definition of ethtool_test
+ * @ETH_TEST_FL_OFFLINE: if set perform online and offline tests, otherwise
+ *	only online tests.
+ * @ETH_TEST_FL_FAILED: Driver set this flag if test fails.
+ * @ETH_TEST_FL_EXTERNAL_LB: Application request to perform external loopback
+ *	test.
+ * @ETH_TEST_FL_EXTERNAL_LB_DONE: Driver performed the external loopback test
+ */
+
+enum ethtool_test_flags {
+	ETH_TEST_FL_OFFLINE	= (1 << 0),
+	ETH_TEST_FL_FAILED	= (1 << 1),
+	ETH_TEST_FL_EXTERNAL_LB	= (1 << 2),
+	ETH_TEST_FL_EXTERNAL_LB_DONE	= (1 << 3),
+};
+
+/**
+ * struct ethtool_test - device self-test invocation
+ * @cmd: Command number = %ETHTOOL_TEST
+ * @flags: A bitmask of flags from &enum ethtool_test_flags.  Some
+ *	flags may be set by the user on entry; others may be set by
+ *	the driver on return.
+ * @len: On return, the number of test results
+ * @data: Array of test results
+ *
+ * Users must use %ETHTOOL_GSSET_INFO or %ETHTOOL_GDRVINFO to find the
+ * number of test results that will be returned.  They must allocate a
+ * buffer of the appropriate size (8 * number of results) immediately
+ * following this structure.
+ */
+struct ethtool_test {
+	uint32_t	cmd;
+	uint32_t	flags;
+	uint32_t	reserved;
+	uint32_t	len;
+	uint64_t	data[0];
+};
+
+/**
+ * struct ethtool_stats - device-specific statistics
+ * @cmd: Command number = %ETHTOOL_GSTATS
+ * @n_stats: On return, the number of statistics
+ * @data: Array of statistics
+ *
+ * Users must use %ETHTOOL_GSSET_INFO or %ETHTOOL_GDRVINFO to find the
+ * number of statistics that will be returned.  They must allocate a
+ * buffer of the appropriate size (8 * number of statistics)
+ * immediately following this structure.
+ */
+struct ethtool_stats {
+	uint32_t	cmd;
+	uint32_t	n_stats;
+	uint64_t	data[0];
+};
+
+/**
+ * struct ethtool_perm_addr - permanent hardware address
+ * @cmd: Command number = %ETHTOOL_GPERMADDR
+ * @size: On entry, the size of the buffer.  On return, the size of the
+ *	address.  The command fails if the buffer is too small.
+ * @data: Buffer for the address
+ *
+ * Users must allocate the buffer immediately following this structure.
+ * A buffer size of %MAX_ADDR_LEN should be sufficient for any address
+ * type.
+ */
+struct ethtool_perm_addr {
+	uint32_t	cmd;
+	uint32_t	size;
+	uint8_t	data[0];
+};
+
+/* boolean flags controlling per-interface behavior characteristics.
+ * When reading, the flag indicates whether or not a certain behavior
+ * is enabled/present.  When writing, the flag indicates whether
+ * or not the driver should turn on (set) or off (clear) a behavior.
+ *
+ * Some behaviors may read-only (unconditionally absent or present).
+ * If such is the case, return EINVAL in the set-flags operation if the
+ * flag differs from the read-only value.
+ */
+enum ethtool_flags {
+	ETH_FLAG_TXVLAN		= (1 << 7),	/* TX VLAN offload enabled */
+	ETH_FLAG_RXVLAN		= (1 << 8),	/* RX VLAN offload enabled */
+	ETH_FLAG_LRO		= (1 << 15),	/* LRO is enabled */
+	ETH_FLAG_NTUPLE		= (1 << 27),	/* N-tuple filters enabled */
+	ETH_FLAG_RXHASH		= (1 << 28),
+};
+
+/* The following structures are for supporting RX network flow
+ * classification and RX n-tuple configuration. Note, all multibyte
+ * fields, e.g., ip4src, ip4dst, psrc, pdst, spi, etc. are expected to
+ * be in network byte order.
+ */
+
+/**
+ * struct ethtool_tcpip4_spec - flow specification for TCP/IPv4 etc.
+ * @ip4src: Source host
+ * @ip4dst: Destination host
+ * @psrc: Source port
+ * @pdst: Destination port
+ * @tos: Type-of-service
+ *
+ * This can be used to specify a TCP/IPv4, UDP/IPv4 or SCTP/IPv4 flow.
+ */
+struct ethtool_tcpip4_spec {
+	uint32_t	ip4src;
+	uint32_t	ip4dst;
+	uint16_t	psrc;
+	uint16_t	pdst;
+	uint8_t    tos;
+};
+
+/**
+ * struct ethtool_ah_espip4_spec - flow specification for IPsec/IPv4
+ * @ip4src: Source host
+ * @ip4dst: Destination host
+ * @spi: Security parameters index
+ * @tos: Type-of-service
+ *
+ * This can be used to specify an IPsec transport or tunnel over IPv4.
+ */
+struct ethtool_ah_espip4_spec {
+	uint32_t	ip4src;
+	uint32_t	ip4dst;
+	uint32_t	spi;
+	uint8_t    tos;
+};
+
+#define	ETH_RX_NFC_IP4	1
+
+/**
+ * struct ethtool_usrip4_spec - general flow specification for IPv4
+ * @ip4src: Source host
+ * @ip4dst: Destination host
+ * @l4_4_bytes: First 4 bytes of transport (layer 4) header
+ * @tos: Type-of-service
+ * @ip_ver: Value must be %ETH_RX_NFC_IP4; mask must be 0
+ * @proto: Transport protocol number; mask must be 0
+ */
+struct ethtool_usrip4_spec {
+	uint32_t	ip4src;
+	uint32_t	ip4dst;
+	uint32_t	l4_4_bytes;
+	uint8_t    tos;
+	uint8_t    ip_ver;
+	uint8_t    proto;
+};
+
+/**
+ * struct ethtool_tcpip6_spec - flow specification for TCP/IPv6 etc.
+ * @ip6src: Source host
+ * @ip6dst: Destination host
+ * @psrc: Source port
+ * @pdst: Destination port
+ * @tclass: Traffic Class
+ *
+ * This can be used to specify a TCP/IPv6, UDP/IPv6 or SCTP/IPv6 flow.
+ */
+struct ethtool_tcpip6_spec {
+	uint32_t	ip6src[4];
+	uint32_t	ip6dst[4];
+	uint16_t	psrc;
+	uint16_t	pdst;
+	uint8_t    tclass;
+};
+
+/**
+ * struct ethtool_ah_espip6_spec - flow specification for IPsec/IPv6
+ * @ip6src: Source host
+ * @ip6dst: Destination host
+ * @spi: Security parameters index
+ * @tclass: Traffic Class
+ *
+ * This can be used to specify an IPsec transport or tunnel over IPv6.
+ */
+struct ethtool_ah_espip6_spec {
+	uint32_t	ip6src[4];
+	uint32_t	ip6dst[4];
+	uint32_t	spi;
+	uint8_t    tclass;
+};
+
+/**
+ * struct ethtool_usrip6_spec - general flow specification for IPv6
+ * @ip6src: Source host
+ * @ip6dst: Destination host
+ * @l4_4_bytes: First 4 bytes of transport (layer 4) header
+ * @tclass: Traffic Class
+ * @l4_proto: Transport protocol number (nexthdr after any Extension Headers)
+ */
+struct ethtool_usrip6_spec {
+	uint32_t	ip6src[4];
+	uint32_t	ip6dst[4];
+	uint32_t	l4_4_bytes;
+	uint8_t    tclass;
+	uint8_t    l4_proto;
+};
+
+union ethtool_flow_union {
+	struct ethtool_tcpip4_spec		tcp_ip4_spec;
+	struct ethtool_tcpip4_spec		udp_ip4_spec;
+	struct ethtool_tcpip4_spec		sctp_ip4_spec;
+	struct ethtool_ah_espip4_spec		ah_ip4_spec;
+	struct ethtool_ah_espip4_spec		esp_ip4_spec;
+	struct ethtool_usrip4_spec		usr_ip4_spec;
+	struct ethtool_tcpip6_spec		tcp_ip6_spec;
+	struct ethtool_tcpip6_spec		udp_ip6_spec;
+	struct ethtool_tcpip6_spec		sctp_ip6_spec;
+	struct ethtool_ah_espip6_spec		ah_ip6_spec;
+	struct ethtool_ah_espip6_spec		esp_ip6_spec;
+	struct ethtool_usrip6_spec		usr_ip6_spec;
+	struct eth_header				ether_spec;
+	uint8_t					hdata[52];
+};
+
+/**
+ * struct ethtool_flow_ext - additional RX flow fields
+ * @h_dest: destination MAC address
+ * @vlan_etype: VLAN EtherType
+ * @vlan_tci: VLAN tag control information
+ * @data: user defined data
+ *
+ * Note, @vlan_etype, @vlan_tci, and @data are only valid if %FLOW_EXT
+ * is set in &struct ethtool_rx_flow_spec @flow_type.
+ * @h_dest is valid if %FLOW_MAC_EXT is set.
+ */
+struct ethtool_flow_ext {
+	uint8_t		padding[2];
+	unsigned char	h_dest[ETH_ALEN];
+	uint16_t		vlan_etype;
+	uint16_t		vlan_tci;
+	uint32_t		data[2];
+};
+
+/**
+ * struct ethtool_rx_flow_spec - classification rule for RX flows
+ * @flow_type: Type of match to perform, e.g. %TCP_V4_FLOW
+ * @h_u: Flow fields to match (dependent on @flow_type)
+ * @h_ext: Additional fields to match
+ * @m_u: Masks for flow field bits to be matched
+ * @m_ext: Masks for additional field bits to be matched
+ *	Note, all additional fields must be ignored unless @flow_type
+ *	includes the %FLOW_EXT or %FLOW_MAC_EXT flag
+ *	(see &struct ethtool_flow_ext description).
+ * @ring_cookie: RX ring/queue index to deliver to, or %RX_CLS_FLOW_DISC
+ *	if packets should be discarded
+ * @location: Location of rule in the table.  Locations must be
+ *	numbered such that a flow matching multiple rules will be
+ *	classified according to the first (lowest numbered) rule.
+ */
+struct ethtool_rx_flow_spec {
+	uint32_t		flow_type;
+	union ethtool_flow_union h_u;
+	struct ethtool_flow_ext h_ext;
+	union ethtool_flow_union m_u;
+	struct ethtool_flow_ext m_ext;
+	uint64_t		ring_cookie;
+	uint32_t		location;
+};
+
+/* How rings are layed out when accessing virtual functions or
+ * offloaded queues is device specific. To allow users to do flow
+ * steering and specify these queues the ring cookie is partitioned
+ * into a 32bit queue index with an 8 bit virtual function id.
+ * This also leaves the 3bytes for further specifiers. It is possible
+ * future devices may support more than 256 virtual functions if
+ * devices start supporting PCIe w/ARI. However at the moment I
+ * do not know of any devices that support this so I do not reserve
+ * space for this at this time. If a future patch consumes the next
+ * byte it should be aware of this possiblity.
+ */
+#define ETHTOOL_RX_FLOW_SPEC_RING	0x00000000FFFFFFFFLL
+#define ETHTOOL_RX_FLOW_SPEC_RING_VF	0x000000FF00000000LL
+#define ETHTOOL_RX_FLOW_SPEC_RING_VF_OFF 32
+static inline uint64_t ethtool_get_flow_spec_ring(uint64_t ring_cookie)
+{
+	return ETHTOOL_RX_FLOW_SPEC_RING & ring_cookie;
+};
+
+static inline uint64_t ethtool_get_flow_spec_ring_vf(uint64_t ring_cookie)
+{
+	return (ETHTOOL_RX_FLOW_SPEC_RING_VF & ring_cookie) >>
+				ETHTOOL_RX_FLOW_SPEC_RING_VF_OFF;
+};
+
+/**
+ * struct ethtool_rxnfc - command to get or set RX flow classification rules
+ * @cmd: Specific command number - %ETHTOOL_GRXFH, %ETHTOOL_SRXFH,
+ *	%ETHTOOL_GRXRINGS, %ETHTOOL_GRXCLSRLCNT, %ETHTOOL_GRXCLSRULE,
+ *	%ETHTOOL_GRXCLSRLALL, %ETHTOOL_SRXCLSRLDEL or %ETHTOOL_SRXCLSRLINS
+ * @flow_type: Type of flow to be affected, e.g. %TCP_V4_FLOW
+ * @data: Command-dependent value
+ * @fs: Flow classification rule
+ * @rule_cnt: Number of rules to be affected
+ * @rule_locs: Array of used rule locations
+ *
+ * For %ETHTOOL_GRXFH and %ETHTOOL_SRXFH, @data is a bitmask indicating
+ * the fields included in the flow hash, e.g. %RXH_IP_SRC.  The following
+ * structure fields must not be used.
+ *
+ * For %ETHTOOL_GRXRINGS, @data is set to the number of RX rings/queues
+ * on return.
+ *
+ * For %ETHTOOL_GRXCLSRLCNT, @rule_cnt is set to the number of defined
+ * rules on return.  If @data is non-zero on return then it is the
+ * size of the rule table, plus the flag %RX_CLS_LOC_SPECIAL if the
+ * driver supports any special location values.  If that flag is not
+ * set in @data then special location values should not be used.
+ *
+ * For %ETHTOOL_GRXCLSRULE, @fs.@location specifies the location of an
+ * existing rule on entry and @fs contains the rule on return.
+ *
+ * For %ETHTOOL_GRXCLSRLALL, @rule_cnt specifies the array size of the
+ * user buffer for @rule_locs on entry.  On return, @data is the size
+ * of the rule table, @rule_cnt is the number of defined rules, and
+ * @rule_locs contains the locations of the defined rules.  Drivers
+ * must use the second parameter to get_rxnfc() instead of @rule_locs.
+ *
+ * For %ETHTOOL_SRXCLSRLINS, @fs specifies the rule to add or update.
+ * @fs.@location either specifies the location to use or is a special
+ * location value with %RX_CLS_LOC_SPECIAL flag set.  On return,
+ * @fs.@location is the actual rule location.
+ *
+ * For %ETHTOOL_SRXCLSRLDEL, @fs.@location specifies the location of an
+ * existing rule on entry.
+ *
+ * A driver supporting the special location values for
+ * %ETHTOOL_SRXCLSRLINS may add the rule at any suitable unused
+ * location, and may remove a rule at a later location (lower
+ * priority) that matches exactly the same set of flows.  The special
+ * values are %RX_CLS_LOC_ANY, selecting any location;
+ * %RX_CLS_LOC_FIRST, selecting the first suitable location (maximum
+ * priority); and %RX_CLS_LOC_LAST, selecting the last suitable
+ * location (minimum priority).  Additional special values may be
+ * defined in future and drivers must return -%EINVAL for any
+ * unrecognised value.
+ */
+struct ethtool_rxnfc {
+	uint32_t				cmd;
+	uint32_t				flow_type;
+	uint64_t				data;
+	struct ethtool_rx_flow_spec	fs;
+	uint32_t				rule_cnt;
+	uint32_t				rule_locs[0];
+};
+
+
+/**
+ * struct ethtool_rxfh_indir - command to get or set RX flow hash indirection
+ * @cmd: Specific command number - %ETHTOOL_GRXFHINDIR or %ETHTOOL_SRXFHINDIR
+ * @size: On entry, the array size of the user buffer, which may be zero.
+ *	On return from %ETHTOOL_GRXFHINDIR, the array size of the hardware
+ *	indirection table.
+ * @ring_index: RX ring/queue index for each hash value
+ *
+ * For %ETHTOOL_GRXFHINDIR, a @size of zero means that only the size
+ * should be returned.  For %ETHTOOL_SRXFHINDIR, a @size of zero means
+ * the table should be reset to default values.  This last feature
+ * is not supported by the original implementations.
+ */
+struct ethtool_rxfh_indir {
+	uint32_t	cmd;
+	uint32_t	size;
+	uint32_t	ring_index[0];
+};
+
+/**
+ * struct ethtool_rxfh - command to get/set RX flow hash indir or/and hash key.
+ * @cmd: Specific command number - %ETHTOOL_GRSSH or %ETHTOOL_SRSSH
+ * @rss_context: RSS context identifier.
+ * @indir_size: On entry, the array size of the user buffer for the
+ *	indirection table, which may be zero, or (for %ETHTOOL_SRSSH),
+ *	%ETH_RXFH_INDIR_NO_CHANGE.  On return from %ETHTOOL_GRSSH,
+ *	the array size of the hardware indirection table.
+ * @key_size: On entry, the array size of the user buffer for the hash key,
+ *	which may be zero.  On return from %ETHTOOL_GRSSH, the size of the
+ *	hardware hash key.
+ * @hfunc: Defines the current RSS hash function used by HW (or to be set to).
+ *	Valid values are one of the %ETH_RSS_HASH_*.
+ * @rsvd:	Reserved for future extensions.
+ * @rss_config: RX ring/queue index for each hash value i.e., indirection table
+ *	of @indir_size uint32_t elements, followed by hash key of @key_size
+ *	bytes.
+ *
+ * For %ETHTOOL_GRSSH, a @indir_size and key_size of zero means that only the
+ * size should be returned.  For %ETHTOOL_SRSSH, an @indir_size of
+ * %ETH_RXFH_INDIR_NO_CHANGE means that indir table setting is not requested
+ * and a @indir_size of zero means the indir table should be reset to default
+ * values. An hfunc of zero means that hash function setting is not requested.
+ */
+struct ethtool_rxfh {
+	uint32_t   cmd;
+	uint32_t	rss_context;
+	uint32_t   indir_size;
+	uint32_t   key_size;
+	uint8_t	hfunc;
+	uint8_t	rsvd8[3];
+	uint32_t	rsvd32;
+	uint32_t   rss_config[0];
+};
+#define ETH_RXFH_INDIR_NO_CHANGE	0xffffffff
+
+/**
+ * struct ethtool_rx_ntuple_flow_spec - specification for RX flow filter
+ * @flow_type: Type of match to perform, e.g. %TCP_V4_FLOW
+ * @h_u: Flow field values to match (dependent on @flow_type)
+ * @m_u: Masks for flow field value bits to be ignored
+ * @vlan_tag: VLAN tag to match
+ * @vlan_tag_mask: Mask for VLAN tag bits to be ignored
+ * @data: Driver-dependent data to match
+ * @data_mask: Mask for driver-dependent data bits to be ignored
+ * @action: RX ring/queue index to deliver to (non-negative) or other action
+ *	(negative, e.g. %ETHTOOL_RXNTUPLE_ACTION_DROP)
+ *
+ * For flow types %TCP_V4_FLOW, %UDP_V4_FLOW and %SCTP_V4_FLOW, where
+ * a field value and mask are both zero this is treated as if all mask
+ * bits are set i.e. the field is ignored.
+ */
+struct ethtool_rx_ntuple_flow_spec {
+	uint32_t		 flow_type;
+	union {
+		struct ethtool_tcpip4_spec		tcp_ip4_spec;
+		struct ethtool_tcpip4_spec		udp_ip4_spec;
+		struct ethtool_tcpip4_spec		sctp_ip4_spec;
+		struct ethtool_ah_espip4_spec		ah_ip4_spec;
+		struct ethtool_ah_espip4_spec		esp_ip4_spec;
+		struct ethtool_usrip4_spec		usr_ip4_spec;
+		struct eth_header				ether_spec;
+		uint8_t					hdata[72];
+	} h_u, m_u;
+
+	uint16_t	        vlan_tag;
+	uint16_t	        vlan_tag_mask;
+	uint64_t		data;
+	uint64_t		data_mask;
+
+	int32_t		action;
+#define ETHTOOL_RXNTUPLE_ACTION_DROP	(-1)	/* drop packet */
+#define ETHTOOL_RXNTUPLE_ACTION_CLEAR	(-2)	/* clear filter */
+};
+
+/**
+ * struct ethtool_rx_ntuple - command to set or clear RX flow filter
+ * @cmd: Command number - %ETHTOOL_SRXNTUPLE
+ * @fs: Flow filter specification
+ */
+struct ethtool_rx_ntuple {
+	uint32_t					cmd;
+	struct ethtool_rx_ntuple_flow_spec	fs;
+};
+
+#define ETHTOOL_FLASH_MAX_FILENAME	128
+enum ethtool_flash_op_type {
+	ETHTOOL_FLASH_ALL_REGIONS	= 0,
+};
+
+/* for passing firmware flashing related parameters */
+struct ethtool_flash {
+	uint32_t	cmd;
+	uint32_t	region;
+	char	data[ETHTOOL_FLASH_MAX_FILENAME];
+};
+
+/**
+ * struct ethtool_dump - used for retrieving, setting device dump
+ * @cmd: Command number - %ETHTOOL_GET_DUMP_FLAG, %ETHTOOL_GET_DUMP_DATA, or
+ * 	%ETHTOOL_SET_DUMP
+ * @version: FW version of the dump, filled in by driver
+ * @flag: driver dependent flag for dump setting, filled in by driver during
+ *        get and filled in by ethtool for set operation.
+ *        flag must be initialized by macro ETH_FW_DUMP_DISABLE value when
+ *        firmware dump is disabled.
+ * @len: length of dump data, used as the length of the user buffer on entry to
+ * 	 %ETHTOOL_GET_DUMP_DATA and this is returned as dump length by driver
+ * 	 for %ETHTOOL_GET_DUMP_FLAG command
+ * @data: data collected for get dump data operation
+ */
+struct ethtool_dump {
+	uint32_t	cmd;
+	uint32_t	version;
+	uint32_t	flag;
+	uint32_t	len;
+	uint8_t	data[0];
+};
+
+#define ETH_FW_DUMP_DISABLE 0
+
+/* for returning and changing feature sets */
+
+/**
+ * struct ethtool_get_features_block - block with state of 32 features
+ * @available: mask of changeable features
+ * @requested: mask of features requested to be enabled if possible
+ * @active: mask of currently enabled features
+ * @never_changed: mask of features not changeable for any device
+ */
+struct ethtool_get_features_block {
+	uint32_t	available;
+	uint32_t	requested;
+	uint32_t	active;
+	uint32_t	never_changed;
+};
+
+/**
+ * struct ethtool_gfeatures - command to get state of device's features
+ * @cmd: command number = %ETHTOOL_GFEATURES
+ * @size: On entry, the number of elements in the features[] array;
+ *	on return, the number of elements in features[] needed to hold
+ *	all features
+ * @features: state of features
+ */
+struct ethtool_gfeatures {
+	uint32_t	cmd;
+	uint32_t	size;
+	struct ethtool_get_features_block features[0];
+};
+
+/**
+ * struct ethtool_set_features_block - block with request for 32 features
+ * @valid: mask of features to be changed
+ * @requested: values of features to be changed
+ */
+struct ethtool_set_features_block {
+	uint32_t	valid;
+	uint32_t	requested;
+};
+
+/**
+ * struct ethtool_sfeatures - command to request change in device's features
+ * @cmd: command number = %ETHTOOL_SFEATURES
+ * @size: array size of the features[] array
+ * @features: feature change masks
+ */
+struct ethtool_sfeatures {
+	uint32_t	cmd;
+	uint32_t	size;
+	struct ethtool_set_features_block features[0];
+};
+
+/**
+ * struct ethtool_ts_info - holds a device's timestamping and PHC association
+ * @cmd: command number = %ETHTOOL_GET_TS_INFO
+ * @so_timestamping: bit mask of the sum of the supported SO_TIMESTAMPING flags
+ * @phc_index: device index of the associated PHC, or -1 if there is none
+ * @tx_types: bit mask of the supported hwtstamp_tx_types enumeration values
+ * @rx_filters: bit mask of the supported hwtstamp_rx_filters enumeration values
+ *
+ * The bits in the 'tx_types' and 'rx_filters' fields correspond to
+ * the 'hwtstamp_tx_types' and 'hwtstamp_rx_filters' enumeration values,
+ * respectively.  For example, if the device supports HWTSTAMP_TX_ON,
+ * then (1 << HWTSTAMP_TX_ON) in 'tx_types' will be set.
+ *
+ * Drivers should only report the filters they actually support without
+ * upscaling in the SIOCSHWTSTAMP ioctl. If the SIOCSHWSTAMP request for
+ * HWTSTAMP_FILTER_V1_SYNC is supported by HWTSTAMP_FILTER_V1_EVENT, then the
+ * driver should only report HWTSTAMP_FILTER_V1_EVENT in this op.
+ */
+struct ethtool_ts_info {
+	uint32_t	cmd;
+	uint32_t	so_timestamping;
+	int32_t	phc_index;
+	uint32_t	tx_types;
+	uint32_t	tx_reserved[3];
+	uint32_t	rx_filters;
+	uint32_t	rx_reserved[3];
+};
+
+/*
+ * %ETHTOOL_SFEATURES changes features present in features[].valid to the
+ * values of corresponding bits in features[].requested. Bits in .requested
+ * not set in .valid or not changeable are ignored.
+ *
+ * Returns %EINVAL when .valid contains undefined or never-changeable bits
+ * or size is not equal to required number of features words (32-bit blocks).
+ * Returns >= 0 if request was completed; bits set in the value mean:
+ *   %ETHTOOL_F_UNSUPPORTED - there were bits set in .valid that are not
+ *	changeable (not present in %ETHTOOL_GFEATURES' features[].available)
+ *	those bits were ignored.
+ *   %ETHTOOL_F_WISH - some or all changes requested were recorded but the
+ *      resulting state of bits masked by .valid is not equal to .requested.
+ *      Probably there are other device-specific constraints on some features
+ *      in the set. When %ETHTOOL_F_UNSUPPORTED is set, .valid is considered
+ *      here as though ignored bits were cleared.
+ *   %ETHTOOL_F_COMPAT - some or all changes requested were made by calling
+ *      compatibility functions. Requested offload state cannot be properly
+ *      managed by kernel.
+ *
+ * Meaning of bits in the masks are obtained by %ETHTOOL_GSSET_INFO (number of
+ * bits in the arrays - always multiple of 32) and %ETHTOOL_GSTRINGS commands
+ * for ETH_SS_FEATURES string set. First entry in the table corresponds to least
+ * significant bit in features[0] fields. Empty strings mark undefined features.
+ */
+enum ethtool_sfeatures_retval_bits {
+	ETHTOOL_F_UNSUPPORTED__BIT,
+	ETHTOOL_F_WISH__BIT,
+	ETHTOOL_F_COMPAT__BIT,
+};
+
+#define ETHTOOL_F_UNSUPPORTED   (1 << ETHTOOL_F_UNSUPPORTED__BIT)
+#define ETHTOOL_F_WISH          (1 << ETHTOOL_F_WISH__BIT)
+#define ETHTOOL_F_COMPAT        (1 << ETHTOOL_F_COMPAT__BIT)
+
+#define MAX_NUM_QUEUE		4096
+
+/**
+ * struct ethtool_per_queue_op - apply sub command to the queues in mask.
+ * @cmd: ETHTOOL_PERQUEUE
+ * @sub_command: the sub command which apply to each queues
+ * @queue_mask: Bitmap of the queues which sub command apply to
+ * @data: A complete command structure following for each of the queues addressed
+ */
+struct ethtool_per_queue_op {
+	uint32_t	cmd;
+	uint32_t	sub_command;
+	uint32_t	queue_mask[__KERNEL_DIV_ROUND_UP(MAX_NUM_QUEUE, 32)];
+	char	data[];
+};
+
+/**
+ * struct ethtool_fecparam - Ethernet forward error correction(fec) parameters
+ * @cmd: Command number = %ETHTOOL_GFECPARAM or %ETHTOOL_SFECPARAM
+ * @active_fec: FEC mode which is active on porte
+ * @fec: Bitmask of supported/configured FEC modes
+ * @rsvd: Reserved for future extensions. i.e FEC bypass feature.
+ *
+ * Drivers should reject a non-zero setting of @autoneg when
+ * autoneogotiation is disabled (or not supported) for the link.
+ *
+ */
+struct ethtool_fecparam {
+	uint32_t   cmd;
+	/* bitmask of FEC modes */
+	uint32_t   active_fec;
+	uint32_t   fec;
+	uint32_t   reserved;
+};
+
+/**
+ * enum ethtool_fec_config_bits - flags definition of ethtool_fec_configuration
+ * @ETHTOOL_FEC_NONE: FEC mode configuration is not supported
+ * @ETHTOOL_FEC_AUTO: Default/Best FEC mode provided by driver
+ * @ETHTOOL_FEC_OFF: No FEC Mode
+ * @ETHTOOL_FEC_RS: Reed-Solomon Forward Error Detection mode
+ * @ETHTOOL_FEC_BASER: Base-R/Reed-Solomon Forward Error Detection mode
+ */
+enum ethtool_fec_config_bits {
+	ETHTOOL_FEC_NONE_BIT,
+	ETHTOOL_FEC_AUTO_BIT,
+	ETHTOOL_FEC_OFF_BIT,
+	ETHTOOL_FEC_RS_BIT,
+	ETHTOOL_FEC_BASER_BIT,
+};
+
+#define ETHTOOL_FEC_NONE		(1 << ETHTOOL_FEC_NONE_BIT)
+#define ETHTOOL_FEC_AUTO		(1 << ETHTOOL_FEC_AUTO_BIT)
+#define ETHTOOL_FEC_OFF			(1 << ETHTOOL_FEC_OFF_BIT)
+#define ETHTOOL_FEC_RS			(1 << ETHTOOL_FEC_RS_BIT)
+#define ETHTOOL_FEC_BASER		(1 << ETHTOOL_FEC_BASER_BIT)
+
+/* CMDs currently supported */
+#define ETHTOOL_GSET		0x00000001 /* DEPRECATED, Get settings.
+					    * Please use ETHTOOL_GLINKSETTINGS
+					    */
+#define ETHTOOL_SSET		0x00000002 /* DEPRECATED, Set settings.
+					    * Please use ETHTOOL_SLINKSETTINGS
+					    */
+#define ETHTOOL_GDRVINFO	0x00000003 /* Get driver info. */
+#define ETHTOOL_GREGS		0x00000004 /* Get NIC registers. */
+#define ETHTOOL_GWOL		0x00000005 /* Get wake-on-lan options. */
+#define ETHTOOL_SWOL		0x00000006 /* Set wake-on-lan options. */
+#define ETHTOOL_GMSGLVL		0x00000007 /* Get driver message level */
+#define ETHTOOL_SMSGLVL		0x00000008 /* Set driver msg level. */
+#define ETHTOOL_NWAY_RST	0x00000009 /* Restart autonegotiation. */
+/* Get link status for host, i.e. whether the interface *and* the
+ * physical port (if there is one) are up (ethtool_value). */
+#define ETHTOOL_GLINK		0x0000000a
+#define ETHTOOL_GEEPROM		0x0000000b /* Get EEPROM data */
+#define ETHTOOL_SEEPROM		0x0000000c /* Set EEPROM data. */
+#define ETHTOOL_GCOALESCE	0x0000000e /* Get coalesce config */
+#define ETHTOOL_SCOALESCE	0x0000000f /* Set coalesce config. */
+#define ETHTOOL_GRINGPARAM	0x00000010 /* Get ring parameters */
+#define ETHTOOL_SRINGPARAM	0x00000011 /* Set ring parameters. */
+#define ETHTOOL_GPAUSEPARAM	0x00000012 /* Get pause parameters */
+#define ETHTOOL_SPAUSEPARAM	0x00000013 /* Set pause parameters. */
+#define ETHTOOL_GRXCSUM		0x00000014 /* Get RX hw csum enable (ethtool_value) */
+#define ETHTOOL_SRXCSUM		0x00000015 /* Set RX hw csum enable (ethtool_value) */
+#define ETHTOOL_GTXCSUM		0x00000016 /* Get TX hw csum enable (ethtool_value) */
+#define ETHTOOL_STXCSUM		0x00000017 /* Set TX hw csum enable (ethtool_value) */
+#define ETHTOOL_GSG		0x00000018 /* Get scatter-gather enable
+					    * (ethtool_value) */
+#define ETHTOOL_SSG		0x00000019 /* Set scatter-gather enable
+					    * (ethtool_value). */
+#define ETHTOOL_TEST		0x0000001a /* execute NIC self-test. */
+#define ETHTOOL_GSTRINGS	0x0000001b /* get specified string set */
+#define ETHTOOL_PHYS_ID		0x0000001c /* identify the NIC */
+#define ETHTOOL_GSTATS		0x0000001d /* get NIC-specific statistics */
+#define ETHTOOL_GTSO		0x0000001e /* Get TSO enable (ethtool_value) */
+#define ETHTOOL_STSO		0x0000001f /* Set TSO enable (ethtool_value) */
+#define ETHTOOL_GPERMADDR	0x00000020 /* Get permanent hardware address */
+#define ETHTOOL_GUFO		0x00000021 /* Get UFO enable (ethtool_value) */
+#define ETHTOOL_SUFO		0x00000022 /* Set UFO enable (ethtool_value) */
+#define ETHTOOL_GGSO		0x00000023 /* Get GSO enable (ethtool_value) */
+#define ETHTOOL_SGSO		0x00000024 /* Set GSO enable (ethtool_value) */
+#define ETHTOOL_GFLAGS		0x00000025 /* Get flags bitmap(ethtool_value) */
+#define ETHTOOL_SFLAGS		0x00000026 /* Set flags bitmap(ethtool_value) */
+#define ETHTOOL_GPFLAGS		0x00000027 /* Get driver-private flags bitmap */
+#define ETHTOOL_SPFLAGS		0x00000028 /* Set driver-private flags bitmap */
+
+#define ETHTOOL_GRXFH		0x00000029 /* Get RX flow hash configuration */
+#define ETHTOOL_SRXFH		0x0000002a /* Set RX flow hash configuration */
+#define ETHTOOL_GGRO		0x0000002b /* Get GRO enable (ethtool_value) */
+#define ETHTOOL_SGRO		0x0000002c /* Set GRO enable (ethtool_value) */
+#define ETHTOOL_GRXRINGS	0x0000002d /* Get RX rings available for LB */
+#define ETHTOOL_GRXCLSRLCNT	0x0000002e /* Get RX class rule count */
+#define ETHTOOL_GRXCLSRULE	0x0000002f /* Get RX classification rule */
+#define ETHTOOL_GRXCLSRLALL	0x00000030 /* Get all RX classification rule */
+#define ETHTOOL_SRXCLSRLDEL	0x00000031 /* Delete RX classification rule */
+#define ETHTOOL_SRXCLSRLINS	0x00000032 /* Insert RX classification rule */
+#define ETHTOOL_FLASHDEV	0x00000033 /* Flash firmware to device */
+#define ETHTOOL_RESET		0x00000034 /* Reset hardware */
+#define ETHTOOL_SRXNTUPLE	0x00000035 /* Add an n-tuple filter to device */
+#define ETHTOOL_GRXNTUPLE	0x00000036 /* deprecated */
+#define ETHTOOL_GSSET_INFO	0x00000037 /* Get string set info */
+#define ETHTOOL_GRXFHINDIR	0x00000038 /* Get RX flow hash indir'n table */
+#define ETHTOOL_SRXFHINDIR	0x00000039 /* Set RX flow hash indir'n table */
+
+#define ETHTOOL_GFEATURES	0x0000003a /* Get device offload settings */
+#define ETHTOOL_SFEATURES	0x0000003b /* Change device offload settings */
+#define ETHTOOL_GCHANNELS	0x0000003c /* Get no of channels */
+#define ETHTOOL_SCHANNELS	0x0000003d /* Set no of channels */
+#define ETHTOOL_SET_DUMP	0x0000003e /* Set dump settings */
+#define ETHTOOL_GET_DUMP_FLAG	0x0000003f /* Get dump settings */
+#define ETHTOOL_GET_DUMP_DATA	0x00000040 /* Get dump data */
+#define ETHTOOL_GET_TS_INFO	0x00000041 /* Get time stamping and PHC info */
+#define ETHTOOL_GMODULEINFO	0x00000042 /* Get plug-in module information */
+#define ETHTOOL_GMODULEEEPROM	0x00000043 /* Get plug-in module eeprom */
+#define ETHTOOL_GEEE		0x00000044 /* Get EEE settings */
+#define ETHTOOL_SEEE		0x00000045 /* Set EEE settings */
+
+#define ETHTOOL_GRSSH		0x00000046 /* Get RX flow hash configuration */
+#define ETHTOOL_SRSSH		0x00000047 /* Set RX flow hash configuration */
+#define ETHTOOL_GTUNABLE	0x00000048 /* Get tunable configuration */
+#define ETHTOOL_STUNABLE	0x00000049 /* Set tunable configuration */
+#define ETHTOOL_GPHYSTATS	0x0000004a /* get PHY-specific statistics */
+
+#define ETHTOOL_PERQUEUE	0x0000004b /* Set per queue options */
+
+#define ETHTOOL_GLINKSETTINGS	0x0000004c /* Get ethtool_link_settings */
+#define ETHTOOL_SLINKSETTINGS	0x0000004d /* Set ethtool_link_settings */
+#define ETHTOOL_PHY_GTUNABLE	0x0000004e /* Get PHY tunable configuration */
+#define ETHTOOL_PHY_STUNABLE	0x0000004f /* Set PHY tunable configuration */
+#define ETHTOOL_GFECPARAM	0x00000050 /* Get FEC settings */
+#define ETHTOOL_SFECPARAM	0x00000051 /* Set FEC settings */
+
+/* compatibility with older code */
+#define SPARC_ETH_GSET		ETHTOOL_GSET
+#define SPARC_ETH_SSET		ETHTOOL_SSET
+
+/* Link mode bit indices */
+enum ethtool_link_mode_bit_indices {
+	ETHTOOL_LINK_MODE_10baseT_Half_BIT	= 0,
+	ETHTOOL_LINK_MODE_10baseT_Full_BIT	= 1,
+	ETHTOOL_LINK_MODE_100baseT_Half_BIT	= 2,
+	ETHTOOL_LINK_MODE_100baseT_Full_BIT	= 3,
+	ETHTOOL_LINK_MODE_1000baseT_Half_BIT	= 4,
+	ETHTOOL_LINK_MODE_1000baseT_Full_BIT	= 5,
+	ETHTOOL_LINK_MODE_Autoneg_BIT		= 6,
+	ETHTOOL_LINK_MODE_TP_BIT		= 7,
+	ETHTOOL_LINK_MODE_AUI_BIT		= 8,
+	ETHTOOL_LINK_MODE_MII_BIT		= 9,
+	ETHTOOL_LINK_MODE_FIBRE_BIT		= 10,
+	ETHTOOL_LINK_MODE_BNC_BIT		= 11,
+	ETHTOOL_LINK_MODE_10000baseT_Full_BIT	= 12,
+	ETHTOOL_LINK_MODE_Pause_BIT		= 13,
+	ETHTOOL_LINK_MODE_Asym_Pause_BIT	= 14,
+	ETHTOOL_LINK_MODE_2500baseX_Full_BIT	= 15,
+	ETHTOOL_LINK_MODE_Backplane_BIT		= 16,
+	ETHTOOL_LINK_MODE_1000baseKX_Full_BIT	= 17,
+	ETHTOOL_LINK_MODE_10000baseKX4_Full_BIT	= 18,
+	ETHTOOL_LINK_MODE_10000baseKR_Full_BIT	= 19,
+	ETHTOOL_LINK_MODE_10000baseR_FEC_BIT	= 20,
+	ETHTOOL_LINK_MODE_20000baseMLD2_Full_BIT = 21,
+	ETHTOOL_LINK_MODE_20000baseKR2_Full_BIT	= 22,
+	ETHTOOL_LINK_MODE_40000baseKR4_Full_BIT	= 23,
+	ETHTOOL_LINK_MODE_40000baseCR4_Full_BIT	= 24,
+	ETHTOOL_LINK_MODE_40000baseSR4_Full_BIT	= 25,
+	ETHTOOL_LINK_MODE_40000baseLR4_Full_BIT	= 26,
+	ETHTOOL_LINK_MODE_56000baseKR4_Full_BIT	= 27,
+	ETHTOOL_LINK_MODE_56000baseCR4_Full_BIT	= 28,
+	ETHTOOL_LINK_MODE_56000baseSR4_Full_BIT	= 29,
+	ETHTOOL_LINK_MODE_56000baseLR4_Full_BIT	= 30,
+	ETHTOOL_LINK_MODE_25000baseCR_Full_BIT	= 31,
+	ETHTOOL_LINK_MODE_25000baseKR_Full_BIT	= 32,
+	ETHTOOL_LINK_MODE_25000baseSR_Full_BIT	= 33,
+	ETHTOOL_LINK_MODE_50000baseCR2_Full_BIT	= 34,
+	ETHTOOL_LINK_MODE_50000baseKR2_Full_BIT	= 35,
+	ETHTOOL_LINK_MODE_100000baseKR4_Full_BIT	= 36,
+	ETHTOOL_LINK_MODE_100000baseSR4_Full_BIT	= 37,
+	ETHTOOL_LINK_MODE_100000baseCR4_Full_BIT	= 38,
+	ETHTOOL_LINK_MODE_100000baseLR4_ER4_Full_BIT	= 39,
+	ETHTOOL_LINK_MODE_50000baseSR2_Full_BIT		= 40,
+	ETHTOOL_LINK_MODE_1000baseX_Full_BIT	= 41,
+	ETHTOOL_LINK_MODE_10000baseCR_Full_BIT	= 42,
+	ETHTOOL_LINK_MODE_10000baseSR_Full_BIT	= 43,
+	ETHTOOL_LINK_MODE_10000baseLR_Full_BIT	= 44,
+	ETHTOOL_LINK_MODE_10000baseLRM_Full_BIT	= 45,
+	ETHTOOL_LINK_MODE_10000baseER_Full_BIT	= 46,
+	ETHTOOL_LINK_MODE_2500baseT_Full_BIT	= 47,
+	ETHTOOL_LINK_MODE_5000baseT_Full_BIT	= 48,
+
+	ETHTOOL_LINK_MODE_FEC_NONE_BIT	= 49,
+	ETHTOOL_LINK_MODE_FEC_RS_BIT	= 50,
+	ETHTOOL_LINK_MODE_FEC_BASER_BIT	= 51,
+
+	/* Last allowed bit for __ETHTOOL_LINK_MODE_LEGACY_MASK is bit
+	 * 31. Please do NOT define any SUPPORTED_* or ADVERTISED_*
+	 * macro for bits > 31. The only way to use indices > 31 is to
+	 * use the new ETHTOOL_GLINKSETTINGS/ETHTOOL_SLINKSETTINGS API.
+	 */
+
+	__ETHTOOL_LINK_MODE_LAST
+	  = ETHTOOL_LINK_MODE_FEC_BASER_BIT,
+};
+
+#define __ETHTOOL_LINK_MODE_LEGACY_MASK(base_name)	\
+	(1UL << (ETHTOOL_LINK_MODE_ ## base_name ## _BIT))
+
+/* DEPRECATED macros. Please migrate to
+ * ETHTOOL_GLINKSETTINGS/ETHTOOL_SLINKSETTINGS API. Please do NOT
+ * define any new SUPPORTED_* macro for bits > 31.
+ */
+#define SUPPORTED_10baseT_Half		__ETHTOOL_LINK_MODE_LEGACY_MASK(10baseT_Half)
+#define SUPPORTED_10baseT_Full		__ETHTOOL_LINK_MODE_LEGACY_MASK(10baseT_Full)
+#define SUPPORTED_100baseT_Half		__ETHTOOL_LINK_MODE_LEGACY_MASK(100baseT_Half)
+#define SUPPORTED_100baseT_Full		__ETHTOOL_LINK_MODE_LEGACY_MASK(100baseT_Full)
+#define SUPPORTED_1000baseT_Half	__ETHTOOL_LINK_MODE_LEGACY_MASK(1000baseT_Half)
+#define SUPPORTED_1000baseT_Full	__ETHTOOL_LINK_MODE_LEGACY_MASK(1000baseT_Full)
+#define SUPPORTED_Autoneg		__ETHTOOL_LINK_MODE_LEGACY_MASK(Autoneg)
+#define SUPPORTED_TP			__ETHTOOL_LINK_MODE_LEGACY_MASK(TP)
+#define SUPPORTED_AUI			__ETHTOOL_LINK_MODE_LEGACY_MASK(AUI)
+#define SUPPORTED_MII			__ETHTOOL_LINK_MODE_LEGACY_MASK(MII)
+#define SUPPORTED_FIBRE			__ETHTOOL_LINK_MODE_LEGACY_MASK(FIBRE)
+#define SUPPORTED_BNC			__ETHTOOL_LINK_MODE_LEGACY_MASK(BNC)
+#define SUPPORTED_10000baseT_Full	__ETHTOOL_LINK_MODE_LEGACY_MASK(10000baseT_Full)
+#define SUPPORTED_Pause			__ETHTOOL_LINK_MODE_LEGACY_MASK(Pause)
+#define SUPPORTED_Asym_Pause		__ETHTOOL_LINK_MODE_LEGACY_MASK(Asym_Pause)
+#define SUPPORTED_2500baseX_Full	__ETHTOOL_LINK_MODE_LEGACY_MASK(2500baseX_Full)
+#define SUPPORTED_Backplane		__ETHTOOL_LINK_MODE_LEGACY_MASK(Backplane)
+#define SUPPORTED_1000baseKX_Full	__ETHTOOL_LINK_MODE_LEGACY_MASK(1000baseKX_Full)
+#define SUPPORTED_10000baseKX4_Full	__ETHTOOL_LINK_MODE_LEGACY_MASK(10000baseKX4_Full)
+#define SUPPORTED_10000baseKR_Full	__ETHTOOL_LINK_MODE_LEGACY_MASK(10000baseKR_Full)
+#define SUPPORTED_10000baseR_FEC	__ETHTOOL_LINK_MODE_LEGACY_MASK(10000baseR_FEC)
+#define SUPPORTED_20000baseMLD2_Full	__ETHTOOL_LINK_MODE_LEGACY_MASK(20000baseMLD2_Full)
+#define SUPPORTED_20000baseKR2_Full	__ETHTOOL_LINK_MODE_LEGACY_MASK(20000baseKR2_Full)
+#define SUPPORTED_40000baseKR4_Full	__ETHTOOL_LINK_MODE_LEGACY_MASK(40000baseKR4_Full)
+#define SUPPORTED_40000baseCR4_Full	__ETHTOOL_LINK_MODE_LEGACY_MASK(40000baseCR4_Full)
+#define SUPPORTED_40000baseSR4_Full	__ETHTOOL_LINK_MODE_LEGACY_MASK(40000baseSR4_Full)
+#define SUPPORTED_40000baseLR4_Full	__ETHTOOL_LINK_MODE_LEGACY_MASK(40000baseLR4_Full)
+#define SUPPORTED_56000baseKR4_Full	__ETHTOOL_LINK_MODE_LEGACY_MASK(56000baseKR4_Full)
+#define SUPPORTED_56000baseCR4_Full	__ETHTOOL_LINK_MODE_LEGACY_MASK(56000baseCR4_Full)
+#define SUPPORTED_56000baseSR4_Full	__ETHTOOL_LINK_MODE_LEGACY_MASK(56000baseSR4_Full)
+#define SUPPORTED_56000baseLR4_Full	__ETHTOOL_LINK_MODE_LEGACY_MASK(56000baseLR4_Full)
+/* Please do not define any new SUPPORTED_* macro for bits > 31, see
+ * notice above.
+ */
+
+/*
+ * DEPRECATED macros. Please migrate to
+ * ETHTOOL_GLINKSETTINGS/ETHTOOL_SLINKSETTINGS API. Please do NOT
+ * define any new ADERTISE_* macro for bits > 31.
+ */
+#define ADVERTISED_10baseT_Half		__ETHTOOL_LINK_MODE_LEGACY_MASK(10baseT_Half)
+#define ADVERTISED_10baseT_Full		__ETHTOOL_LINK_MODE_LEGACY_MASK(10baseT_Full)
+#define ADVERTISED_100baseT_Half	__ETHTOOL_LINK_MODE_LEGACY_MASK(100baseT_Half)
+#define ADVERTISED_100baseT_Full	__ETHTOOL_LINK_MODE_LEGACY_MASK(100baseT_Full)
+#define ADVERTISED_1000baseT_Half	__ETHTOOL_LINK_MODE_LEGACY_MASK(1000baseT_Half)
+#define ADVERTISED_1000baseT_Full	__ETHTOOL_LINK_MODE_LEGACY_MASK(1000baseT_Full)
+#define ADVERTISED_Autoneg		__ETHTOOL_LINK_MODE_LEGACY_MASK(Autoneg)
+#define ADVERTISED_TP			__ETHTOOL_LINK_MODE_LEGACY_MASK(TP)
+#define ADVERTISED_AUI			__ETHTOOL_LINK_MODE_LEGACY_MASK(AUI)
+#define ADVERTISED_MII			__ETHTOOL_LINK_MODE_LEGACY_MASK(MII)
+#define ADVERTISED_FIBRE		__ETHTOOL_LINK_MODE_LEGACY_MASK(FIBRE)
+#define ADVERTISED_BNC			__ETHTOOL_LINK_MODE_LEGACY_MASK(BNC)
+#define ADVERTISED_10000baseT_Full	__ETHTOOL_LINK_MODE_LEGACY_MASK(10000baseT_Full)
+#define ADVERTISED_Pause		__ETHTOOL_LINK_MODE_LEGACY_MASK(Pause)
+#define ADVERTISED_Asym_Pause		__ETHTOOL_LINK_MODE_LEGACY_MASK(Asym_Pause)
+#define ADVERTISED_2500baseX_Full	__ETHTOOL_LINK_MODE_LEGACY_MASK(2500baseX_Full)
+#define ADVERTISED_Backplane		__ETHTOOL_LINK_MODE_LEGACY_MASK(Backplane)
+#define ADVERTISED_1000baseKX_Full	__ETHTOOL_LINK_MODE_LEGACY_MASK(1000baseKX_Full)
+#define ADVERTISED_10000baseKX4_Full	__ETHTOOL_LINK_MODE_LEGACY_MASK(10000baseKX4_Full)
+#define ADVERTISED_10000baseKR_Full	__ETHTOOL_LINK_MODE_LEGACY_MASK(10000baseKR_Full)
+#define ADVERTISED_10000baseR_FEC	__ETHTOOL_LINK_MODE_LEGACY_MASK(10000baseR_FEC)
+#define ADVERTISED_20000baseMLD2_Full	__ETHTOOL_LINK_MODE_LEGACY_MASK(20000baseMLD2_Full)
+#define ADVERTISED_20000baseKR2_Full	__ETHTOOL_LINK_MODE_LEGACY_MASK(20000baseKR2_Full)
+#define ADVERTISED_40000baseKR4_Full	__ETHTOOL_LINK_MODE_LEGACY_MASK(40000baseKR4_Full)
+#define ADVERTISED_40000baseCR4_Full	__ETHTOOL_LINK_MODE_LEGACY_MASK(40000baseCR4_Full)
+#define ADVERTISED_40000baseSR4_Full	__ETHTOOL_LINK_MODE_LEGACY_MASK(40000baseSR4_Full)
+#define ADVERTISED_40000baseLR4_Full	__ETHTOOL_LINK_MODE_LEGACY_MASK(40000baseLR4_Full)
+#define ADVERTISED_56000baseKR4_Full	__ETHTOOL_LINK_MODE_LEGACY_MASK(56000baseKR4_Full)
+#define ADVERTISED_56000baseCR4_Full	__ETHTOOL_LINK_MODE_LEGACY_MASK(56000baseCR4_Full)
+#define ADVERTISED_56000baseSR4_Full	__ETHTOOL_LINK_MODE_LEGACY_MASK(56000baseSR4_Full)
+#define ADVERTISED_56000baseLR4_Full	__ETHTOOL_LINK_MODE_LEGACY_MASK(56000baseLR4_Full)
+/* Please do not define any new ADVERTISED_* macro for bits > 31, see
+ * notice above.
+ */
+
+/* The following are all involved in forcing a particular link
+ * mode for the device for setting things.  When getting the
+ * devices settings, these indicate the current mode and whether
+ * it was forced up into this mode or autonegotiated.
+ */
+
+/* The forced speed, in units of 1Mb. All values 0 to INT_MAX are legal.
+ * Update drivers/net/phy/phy.c:phy_speed_to_str() and
+ * drivers/net/bonding/bond_3ad.c:__get_link_speed() when adding new values.
+ */
+#define SPEED_10		10
+#define SPEED_100		100
+#define SPEED_1000		1000
+#define SPEED_2500		2500
+#define SPEED_5000		5000
+#define SPEED_10000		10000
+#define SPEED_14000		14000
+#define SPEED_20000		20000
+#define SPEED_25000		25000
+#define SPEED_40000		40000
+#define SPEED_50000		50000
+#define SPEED_56000		56000
+#define SPEED_100000		100000
+
+#define SPEED_UNKNOWN		-1
+
+static inline int ethtool_validate_speed(uint32_t speed)
+{
+	return speed <= INT_MAX || speed == SPEED_UNKNOWN;
+}
+
+/* Duplex, half or full. */
+#define DUPLEX_HALF		0x00
+#define DUPLEX_FULL		0x01
+#define DUPLEX_UNKNOWN		0xff
+
+static inline int ethtool_validate_duplex(uint8_t duplex)
+{
+	switch (duplex) {
+	case DUPLEX_HALF:
+	case DUPLEX_FULL:
+	case DUPLEX_UNKNOWN:
+		return 1;
+	}
+
+	return 0;
+}
+
+/* Which connector port. */
+#define PORT_TP			0x00
+#define PORT_AUI		0x01
+#define PORT_MII		0x02
+#define PORT_FIBRE		0x03
+#define PORT_BNC		0x04
+#define PORT_DA			0x05
+#define PORT_NONE		0xef
+#define PORT_OTHER		0xff
+
+/* Which transceiver to use. */
+#define XCVR_INTERNAL		0x00 /* PHY and MAC are in the same package */
+#define XCVR_EXTERNAL		0x01 /* PHY and MAC are in different packages */
+#define XCVR_DUMMY1		0x02
+#define XCVR_DUMMY2		0x03
+#define XCVR_DUMMY3		0x04
+
+/* Enable or disable autonegotiation. */
+#define AUTONEG_DISABLE		0x00
+#define AUTONEG_ENABLE		0x01
+
+/* MDI or MDI-X status/control - if MDI/MDI_X/AUTO is set then
+ * the driver is required to renegotiate link
+ */
+#define ETH_TP_MDI_INVALID	0x00 /* status: unknown; control: unsupported */
+#define ETH_TP_MDI		0x01 /* status: MDI;     control: force MDI */
+#define ETH_TP_MDI_X		0x02 /* status: MDI-X;   control: force MDI-X */
+#define ETH_TP_MDI_AUTO		0x03 /*                  control: auto-select */
+
+/* Wake-On-Lan options. */
+#define WAKE_PHY		(1 << 0)
+#define WAKE_UCAST		(1 << 1)
+#define WAKE_MCAST		(1 << 2)
+#define WAKE_BCAST		(1 << 3)
+#define WAKE_ARP		(1 << 4)
+#define WAKE_MAGIC		(1 << 5)
+#define WAKE_MAGICSECURE	(1 << 6) /* only meaningful if WAKE_MAGIC */
+
+/* L2-L4 network traffic flow types */
+#define	TCP_V4_FLOW	0x01	/* hash or spec (tcp_ip4_spec) */
+#define	UDP_V4_FLOW	0x02	/* hash or spec (udp_ip4_spec) */
+#define	SCTP_V4_FLOW	0x03	/* hash or spec (sctp_ip4_spec) */
+#define	AH_ESP_V4_FLOW	0x04	/* hash only */
+#define	TCP_V6_FLOW	0x05	/* hash or spec (tcp_ip6_spec; nfc only) */
+#define	UDP_V6_FLOW	0x06	/* hash or spec (udp_ip6_spec; nfc only) */
+#define	SCTP_V6_FLOW	0x07	/* hash or spec (sctp_ip6_spec; nfc only) */
+#define	AH_ESP_V6_FLOW	0x08	/* hash only */
+#define	AH_V4_FLOW	0x09	/* hash or spec (ah_ip4_spec) */
+#define	ESP_V4_FLOW	0x0a	/* hash or spec (esp_ip4_spec) */
+#define	AH_V6_FLOW	0x0b	/* hash or spec (ah_ip6_spec; nfc only) */
+#define	ESP_V6_FLOW	0x0c	/* hash or spec (esp_ip6_spec; nfc only) */
+#define	IPV4_USER_FLOW	0x0d	/* spec only (usr_ip4_spec) */
+#define	IP_USER_FLOW	IPV4_USER_FLOW
+#define	IPV6_USER_FLOW	0x0e	/* spec only (usr_ip6_spec; nfc only) */
+#define	IPV4_FLOW	0x10	/* hash only */
+#define	IPV6_FLOW	0x11	/* hash only */
+#define	ETHER_FLOW	0x12	/* spec only (ether_spec) */
+/* Flag to enable additional fields in struct ethtool_rx_flow_spec */
+#define	FLOW_EXT	0x80000000
+#define	FLOW_MAC_EXT	0x40000000
+
+/* L3-L4 network traffic flow hash options */
+#define	RXH_L2DA	(1 << 1)
+#define	RXH_VLAN	(1 << 2)
+#define	RXH_L3_PROTO	(1 << 3)
+#define	RXH_IP_SRC	(1 << 4)
+#define	RXH_IP_DST	(1 << 5)
+#define	RXH_L4_B_0_1	(1 << 6) /* src port in case of TCP/UDP/SCTP */
+#define	RXH_L4_B_2_3	(1 << 7) /* dst port in case of TCP/UDP/SCTP */
+#define	RXH_DISCARD	(1 << 31)
+
+#define	RX_CLS_FLOW_DISC	0xffffffffffffffffULL
+
+/* Special RX classification rule insert location values */
+#define RX_CLS_LOC_SPECIAL	0x80000000	/* flag */
+#define RX_CLS_LOC_ANY		0xffffffff
+#define RX_CLS_LOC_FIRST	0xfffffffe
+#define RX_CLS_LOC_LAST		0xfffffffd
+
+/* EEPROM Standards for plug in modules */
+#define ETH_MODULE_SFF_8079		0x1
+#define ETH_MODULE_SFF_8079_LEN		256
+#define ETH_MODULE_SFF_8472		0x2
+#define ETH_MODULE_SFF_8472_LEN		512
+#define ETH_MODULE_SFF_8636		0x3
+#define ETH_MODULE_SFF_8636_LEN		256
+#define ETH_MODULE_SFF_8436		0x4
+#define ETH_MODULE_SFF_8436_LEN		256
+
+/* Reset flags */
+/* The reset() operation must clear the flags for the components which
+ * were actually reset.  On successful return, the flags indicate the
+ * components which were not reset, either because they do not exist
+ * in the hardware or because they cannot be reset independently.  The
+ * driver must never reset any components that were not requested.
+ */
+enum ethtool_reset_flags {
+	/* These flags represent components dedicated to the interface
+	 * the command is addressed to.  Shift any flag left by
+	 * ETH_RESET_SHARED_SHIFT to reset a shared component of the
+	 * same type.
+	 */
+	ETH_RESET_MGMT		= 1 << 0,	/* Management processor */
+	ETH_RESET_IRQ		= 1 << 1,	/* Interrupt requester */
+	ETH_RESET_DMA		= 1 << 2,	/* DMA engine */
+	ETH_RESET_FILTER	= 1 << 3,	/* Filtering/flow direction */
+	ETH_RESET_OFFLOAD	= 1 << 4,	/* Protocol offload */
+	ETH_RESET_MAC		= 1 << 5,	/* Media access controller */
+	ETH_RESET_PHY		= 1 << 6,	/* Transceiver/PHY */
+	ETH_RESET_RAM		= 1 << 7,	/* RAM shared between
+						 * multiple components */
+	ETH_RESET_AP		= 1 << 8,	/* Application processor */
+
+	ETH_RESET_DEDICATED	= 0x0000ffff,	/* All components dedicated to
+						 * this interface */
+	ETH_RESET_ALL		= 0xffffffff,	/* All components used by this
+						 * interface, even if shared */
+};
+#define ETH_RESET_SHARED_SHIFT	16
+
+
+/**
+ * struct ethtool_link_settings - link control and status
+ *
+ * IMPORTANT, Backward compatibility notice: When implementing new
+ *	user-space tools, please first try %ETHTOOL_GLINKSETTINGS, and
+ *	if it succeeds use %ETHTOOL_SLINKSETTINGS to change link
+ *	settings; do not use %ETHTOOL_SSET if %ETHTOOL_GLINKSETTINGS
+ *	succeeded: stick to %ETHTOOL_GLINKSETTINGS/%SLINKSETTINGS in
+ *	that case.  Conversely, if %ETHTOOL_GLINKSETTINGS fails, use
+ *	%ETHTOOL_GSET to query and %ETHTOOL_SSET to change link
+ *	settings; do not use %ETHTOOL_SLINKSETTINGS if
+ *	%ETHTOOL_GLINKSETTINGS failed: stick to
+ *	%ETHTOOL_GSET/%ETHTOOL_SSET in that case.
+ *
+ * @cmd: Command number = %ETHTOOL_GLINKSETTINGS or %ETHTOOL_SLINKSETTINGS
+ * @speed: Link speed (Mbps)
+ * @duplex: Duplex mode; one of %DUPLEX_*
+ * @port: Physical connector type; one of %PORT_*
+ * @phy_address: MDIO address of PHY (transceiver); 0 or 255 if not
+ *	applicable.  For clause 45 PHYs this is the PRTAD.
+ * @autoneg: Enable/disable autonegotiation and auto-detection;
+ *	either %AUTONEG_DISABLE or %AUTONEG_ENABLE
+ * @mdio_support: Bitmask of %ETH_MDIO_SUPPORTS_* flags for the MDIO
+ *	protocols supported by the interface; 0 if unknown.
+ *	Read-only.
+ * @eth_tp_mdix: Ethernet twisted-pair MDI(-X) status; one of
+ *	%ETH_TP_MDI_*.  If the status is unknown or not applicable, the
+ *	value will be %ETH_TP_MDI_INVALID.  Read-only.
+ * @eth_tp_mdix_ctrl: Ethernet twisted pair MDI(-X) control; one of
+ *	%ETH_TP_MDI_*.  If MDI(-X) control is not implemented, reads
+ *	yield %ETH_TP_MDI_INVALID and writes may be ignored or rejected.
+ *	When written successfully, the link should be renegotiated if
+ *	necessary.
+ * @link_mode_masks_nwords: Number of 32-bit words for each of the
+ *	supported, advertising, lp_advertising link mode bitmaps. For
+ *	%ETHTOOL_GLINKSETTINGS: on entry, number of words passed by user
+ *	(>= 0); on return, if handshake in progress, negative if
+ *	request size unsupported by kernel: absolute value indicates
+ *	kernel expected size and all the other fields but cmd
+ *	are 0; otherwise (handshake completed), strictly positive
+ *	to indicate size used by kernel and cmd field stays
+ *	%ETHTOOL_GLINKSETTINGS, all other fields populated by driver. For
+ *	%ETHTOOL_SLINKSETTINGS: must be valid on entry, ie. a positive
+ *	value returned previously by %ETHTOOL_GLINKSETTINGS, otherwise
+ *	refused. For drivers: ignore this field (use kernel's
+ *	__ETHTOOL_LINK_MODE_MASK_NBITS instead), any change to it will
+ *	be overwritten by kernel.
+ * @supported: Bitmap with each bit meaning given by
+ *	%ethtool_link_mode_bit_indices for the link modes, physical
+ *	connectors and other link features for which the interface
+ *	supports autonegotiation or auto-detection.  Read-only.
+ * @advertising: Bitmap with each bit meaning given by
+ *	%ethtool_link_mode_bit_indices for the link modes, physical
+ *	connectors and other link features that are advertised through
+ *	autonegotiation or enabled for auto-detection.
+ * @lp_advertising: Bitmap with each bit meaning given by
+ *	%ethtool_link_mode_bit_indices for the link modes, and other
+ *	link features that the link partner advertised through
+ *	autonegotiation; 0 if unknown or not applicable.  Read-only.
+ * @transceiver: Used to distinguish different possible PHY types,
+ *	reported consistently by PHYLIB.  Read-only.
+ *
+ * If autonegotiation is disabled, the speed and @duplex represent the
+ * fixed link mode and are writable if the driver supports multiple
+ * link modes.  If it is enabled then they are read-only; if the link
+ * is up they represent the negotiated link mode; if the link is down,
+ * the speed is 0, %SPEED_UNKNOWN or the highest enabled speed and
+ * @duplex is %DUPLEX_UNKNOWN or the best enabled duplex mode.
+ *
+ * Some hardware interfaces may have multiple PHYs and/or physical
+ * connectors fitted or do not allow the driver to detect which are
+ * fitted.  For these interfaces @port and/or @phy_address may be
+ * writable, possibly dependent on @autoneg being %AUTONEG_DISABLE.
+ * Otherwise, attempts to write different values may be ignored or
+ * rejected.
+ *
+ * Deprecated %ethtool_cmd fields transceiver, maxtxpkt and maxrxpkt
+ * are not available in %ethtool_link_settings. Until all drivers are
+ * converted to ignore them or to the new %ethtool_link_settings API,
+ * for both queries and changes, users should always try
+ * %ETHTOOL_GLINKSETTINGS first, and if it fails with -ENOTSUPP stick
+ * only to %ETHTOOL_GSET and %ETHTOOL_SSET consistently. If it
+ * succeeds, then users should stick to %ETHTOOL_GLINKSETTINGS and
+ * %ETHTOOL_SLINKSETTINGS (which would support drivers implementing
+ * either %ethtool_cmd or %ethtool_link_settings).
+ *
+ * Users should assume that all fields not marked read-only are
+ * writable and subject to validation by the driver.  They should use
+ * %ETHTOOL_GLINKSETTINGS to get the current values before making specific
+ * changes and then applying them with %ETHTOOL_SLINKSETTINGS.
+ *
+ * Drivers that implement %get_link_ksettings and/or
+ * %set_link_ksettings should ignore the @cmd
+ * and @link_mode_masks_nwords fields (any change to them overwritten
+ * by kernel), and rely only on kernel's internal
+ * %__ETHTOOL_LINK_MODE_MASK_NBITS and
+ * %ethtool_link_mode_mask_t. Drivers that implement
+ * %set_link_ksettings() should validate all fields other than @cmd
+ * and @link_mode_masks_nwords that are not described as read-only or
+ * deprecated, and must ignore all fields described as read-only.
+ */
+struct ethtool_link_settings {
+	uint32_t	cmd;
+	uint32_t	speed;
+	uint8_t	duplex;
+	uint8_t	port;
+	uint8_t	phy_address;
+	uint8_t	autoneg;
+	uint8_t	mdio_support;
+	uint8_t	eth_tp_mdix;
+	uint8_t	eth_tp_mdix_ctrl;
+	int8_t	link_mode_masks_nwords;
+	uint8_t	transceiver;
+	uint8_t	reserved1[3];
+	uint32_t	reserved[7];
+	uint32_t	link_mode_masks[0];
+	/* layout of link_mode_masks fields:
+	 * uint32_t map_supported[link_mode_masks_nwords];
+	 * uint32_t map_advertising[link_mode_masks_nwords];
+	 * uint32_t map_lp_advertising[link_mode_masks_nwords];
+	 */
+};
+#endif /* _LINUX_ETHTOOL_H */
diff --git a/include/standard-headers/linux/kernel.h b/include/standard-headers/linux/kernel.h
new file mode 100644
index 0000000..1eeba2e
--- /dev/null
+++ b/include/standard-headers/linux/kernel.h
@@ -0,0 +1,15 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+#ifndef _LINUX_KERNEL_H
+#define _LINUX_KERNEL_H
+
+#include "standard-headers/linux/sysinfo.h"
+
+/*
+ * 'kernel.h' contains some often-used function prototypes etc
+ */
+#define __ALIGN_KERNEL(x, a)		__ALIGN_KERNEL_MASK(x, (typeof(x))(a) - 1)
+#define __ALIGN_KERNEL_MASK(x, mask)	(((x) + (mask)) & ~(mask))
+
+#define __KERNEL_DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))
+
+#endif /* _LINUX_KERNEL_H */
diff --git a/include/standard-headers/linux/sysinfo.h b/include/standard-headers/linux/sysinfo.h
new file mode 100644
index 0000000..e3c06ac
--- /dev/null
+++ b/include/standard-headers/linux/sysinfo.h
@@ -0,0 +1,25 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+#ifndef _LINUX_SYSINFO_H
+#define _LINUX_SYSINFO_H
+
+#include "standard-headers/linux/types.h"
+
+#define SI_LOAD_SHIFT	16
+struct sysinfo {
+	long uptime;		/* Seconds since boot */
+	unsigned long loads[3];	/* 1, 5, and 15 minute load averages */
+	unsigned long totalram;	/* Total usable main memory size */
+	unsigned long freeram;	/* Available memory size */
+	unsigned long sharedram;	/* Amount of shared memory */
+	unsigned long bufferram;	/* Memory used by buffers */
+	unsigned long totalswap;	/* Total swap space size */
+	unsigned long freeswap;	/* swap space still available */
+	uint16_t procs;		   	/* Number of current processes */
+	uint16_t pad;		   	/* Explicit padding for m68k */
+	unsigned long totalhigh;	/* Total high memory size */
+	unsigned long freehigh;	/* Available high memory size */
+	uint32_t mem_unit;			/* Memory unit size in bytes */
+	char _f[20-2*sizeof(unsigned long)-sizeof(uint32_t)];	/* Padding: libc5 uses this.. */
+};
+
+#endif /* _LINUX_SYSINFO_H */
diff --git a/scripts/update-linux-headers.sh b/scripts/update-linux-headers.sh
index be06570..d18e2f1 100755
--- a/scripts/update-linux-headers.sh
+++ b/scripts/update-linux-headers.sh
@@ -39,6 +39,9 @@ cp_portable() {
                                      -e 'input-event-codes' \
                                      -e 'sys/' \
                                      -e 'pvrdma_verbs' \
+                                     -e 'limits' \
+                                     -e 'linux/kernel' \
+                                     -e 'linux/sysinfo' \
                                      > /dev/null
     then
         echo "Unexpected #include in input file $f".
@@ -59,6 +62,10 @@ cp_portable() {
         -e '/sys\/ioctl.h/d' \
         -e 's/SW_MAX/SW_MAX_/' \
         -e 's/atomic_t/int/' \
+        -e 's/__kernel_long_t/long/' \
+        -e 's/__kernel_ulong_t/unsigned long/' \
+        -e 's/struct ethhdr/struct eth_header/' \
+        -e '/\#define _LINUX_ETHTOOL_H/a \\n\#include "net/eth.h"' \
         "$f" > "$to/$header";
 }
 
@@ -146,7 +153,9 @@ rm -rf "$output/include/standard-headers/linux"
 mkdir -p "$output/include/standard-headers/linux"
 for i in "$tmpdir"/include/linux/*virtio*.h "$tmpdir/include/linux/input.h" \
          "$tmpdir/include/linux/input-event-codes.h" \
-         "$tmpdir/include/linux/pci_regs.h"; do
+         "$tmpdir/include/linux/pci_regs.h" \
+         "$tmpdir/include/linux/ethtool.h" "$tmpdir/include/linux/kernel.h" \
+         "$tmpdir/include/linux/sysinfo.h"; do
     cp_portable "$i" "$output/include/standard-headers/linux"
 done
 
-- 
MST

^ permalink raw reply related	[flat|nested] 69+ messages in thread

* [virtio-dev] [PULL v2 01/50] scripts/update-linux-headers: add ethtool.h and update to 4.16.0-rc4
@ 2018-03-20  3:16   ` Michael S. Tsirkin
  0 siblings, 0 replies; 69+ messages in thread
From: Michael S. Tsirkin @ 2018-03-20  3:16 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Maydell, Jason Baron, Jason Wang, virtio-dev, Roman Kagan,
	Paolo Bonzini, Cornelia Huck, Yuval Shaia, Stefan Hajnoczi

From: Jason Baron <jbaron@akamai.com>

A subsequent patch to add support for setting linkspeed/duplex in
virtio-net, requires a few definitions from ethtool.h, which ends up
pulling in kernel.h and sysinfo.h as well.

Signed-off-by: Jason Baron <jbaron@akamai.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: virtio-dev@lists.oasis-open.org
---
 include/standard-headers/linux/ethtool.h | 1821 ++++++++++++++++++++++++++++++
 include/standard-headers/linux/kernel.h  |   15 +
 include/standard-headers/linux/sysinfo.h |   25 +
 scripts/update-linux-headers.sh          |   11 +-
 4 files changed, 1871 insertions(+), 1 deletion(-)
 create mode 100644 include/standard-headers/linux/ethtool.h
 create mode 100644 include/standard-headers/linux/kernel.h
 create mode 100644 include/standard-headers/linux/sysinfo.h

diff --git a/include/standard-headers/linux/ethtool.h b/include/standard-headers/linux/ethtool.h
new file mode 100644
index 0000000..94aacb7
--- /dev/null
+++ b/include/standard-headers/linux/ethtool.h
@@ -0,0 +1,1821 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+/*
+ * ethtool.h: Defines for Linux ethtool.
+ *
+ * Copyright (C) 1998 David S. Miller (davem@redhat.com)
+ * Copyright 2001 Jeff Garzik <jgarzik@pobox.com>
+ * Portions Copyright 2001 Sun Microsystems (thockin@sun.com)
+ * Portions Copyright 2002 Intel (eli.kupermann@intel.com,
+ *                                christopher.leech@intel.com,
+ *                                scott.feldman@intel.com)
+ * Portions Copyright (C) Sun Microsystems 2008
+ */
+
+#ifndef _LINUX_ETHTOOL_H
+#define _LINUX_ETHTOOL_H
+
+#include "net/eth.h"
+
+#include "standard-headers/linux/kernel.h"
+#include "standard-headers/linux/types.h"
+#include "standard-headers/linux/if_ether.h"
+
+#include <limits.h> /* for INT_MAX */
+
+/* All structures exposed to userland should be defined such that they
+ * have the same layout for 32-bit and 64-bit userland.
+ */
+
+/**
+ * struct ethtool_cmd - DEPRECATED, link control and status
+ * This structure is DEPRECATED, please use struct ethtool_link_settings.
+ * @cmd: Command number = %ETHTOOL_GSET or %ETHTOOL_SSET
+ * @supported: Bitmask of %SUPPORTED_* flags for the link modes,
+ *	physical connectors and other link features for which the
+ *	interface supports autonegotiation or auto-detection.
+ *	Read-only.
+ * @advertising: Bitmask of %ADVERTISED_* flags for the link modes,
+ *	physical connectors and other link features that are
+ *	advertised through autonegotiation or enabled for
+ *	auto-detection.
+ * @speed: Low bits of the speed, 1Mb units, 0 to INT_MAX or SPEED_UNKNOWN
+ * @duplex: Duplex mode; one of %DUPLEX_*
+ * @port: Physical connector type; one of %PORT_*
+ * @phy_address: MDIO address of PHY (transceiver); 0 or 255 if not
+ *	applicable.  For clause 45 PHYs this is the PRTAD.
+ * @transceiver: Historically used to distinguish different possible
+ *	PHY types, but not in a consistent way.  Deprecated.
+ * @autoneg: Enable/disable autonegotiation and auto-detection;
+ *	either %AUTONEG_DISABLE or %AUTONEG_ENABLE
+ * @mdio_support: Bitmask of %ETH_MDIO_SUPPORTS_* flags for the MDIO
+ *	protocols supported by the interface; 0 if unknown.
+ *	Read-only.
+ * @maxtxpkt: Historically used to report TX IRQ coalescing; now
+ *	obsoleted by &struct ethtool_coalesce.  Read-only; deprecated.
+ * @maxrxpkt: Historically used to report RX IRQ coalescing; now
+ *	obsoleted by &struct ethtool_coalesce.  Read-only; deprecated.
+ * @speed_hi: High bits of the speed, 1Mb units, 0 to INT_MAX or SPEED_UNKNOWN
+ * @eth_tp_mdix: Ethernet twisted-pair MDI(-X) status; one of
+ *	%ETH_TP_MDI_*.  If the status is unknown or not applicable, the
+ *	value will be %ETH_TP_MDI_INVALID.  Read-only.
+ * @eth_tp_mdix_ctrl: Ethernet twisted pair MDI(-X) control; one of
+ *	%ETH_TP_MDI_*.  If MDI(-X) control is not implemented, reads
+ *	yield %ETH_TP_MDI_INVALID and writes may be ignored or rejected.
+ *	When written successfully, the link should be renegotiated if
+ *	necessary.
+ * @lp_advertising: Bitmask of %ADVERTISED_* flags for the link modes
+ *	and other link features that the link partner advertised
+ *	through autonegotiation; 0 if unknown or not applicable.
+ *	Read-only.
+ *
+ * The link speed in Mbps is split between @speed and @speed_hi.  Use
+ * the ethtool_cmd_speed() and ethtool_cmd_speed_set() functions to
+ * access it.
+ *
+ * If autonegotiation is disabled, the speed and @duplex represent the
+ * fixed link mode and are writable if the driver supports multiple
+ * link modes.  If it is enabled then they are read-only; if the link
+ * is up they represent the negotiated link mode; if the link is down,
+ * the speed is 0, %SPEED_UNKNOWN or the highest enabled speed and
+ * @duplex is %DUPLEX_UNKNOWN or the best enabled duplex mode.
+ *
+ * Some hardware interfaces may have multiple PHYs and/or physical
+ * connectors fitted or do not allow the driver to detect which are
+ * fitted.  For these interfaces @port and/or @phy_address may be
+ * writable, possibly dependent on @autoneg being %AUTONEG_DISABLE.
+ * Otherwise, attempts to write different values may be ignored or
+ * rejected.
+ *
+ * Users should assume that all fields not marked read-only are
+ * writable and subject to validation by the driver.  They should use
+ * %ETHTOOL_GSET to get the current values before making specific
+ * changes and then applying them with %ETHTOOL_SSET.
+ *
+ * Drivers that implement set_settings() should validate all fields
+ * other than @cmd that are not described as read-only or deprecated,
+ * and must ignore all fields described as read-only.
+ *
+ * Deprecated fields should be ignored by both users and drivers.
+ */
+struct ethtool_cmd {
+	uint32_t	cmd;
+	uint32_t	supported;
+	uint32_t	advertising;
+	uint16_t	speed;
+	uint8_t	duplex;
+	uint8_t	port;
+	uint8_t	phy_address;
+	uint8_t	transceiver;
+	uint8_t	autoneg;
+	uint8_t	mdio_support;
+	uint32_t	maxtxpkt;
+	uint32_t	maxrxpkt;
+	uint16_t	speed_hi;
+	uint8_t	eth_tp_mdix;
+	uint8_t	eth_tp_mdix_ctrl;
+	uint32_t	lp_advertising;
+	uint32_t	reserved[2];
+};
+
+static inline void ethtool_cmd_speed_set(struct ethtool_cmd *ep,
+					 uint32_t speed)
+{
+	ep->speed = (uint16_t)(speed & 0xFFFF);
+	ep->speed_hi = (uint16_t)(speed >> 16);
+}
+
+static inline uint32_t ethtool_cmd_speed(const struct ethtool_cmd *ep)
+{
+	return (ep->speed_hi << 16) | ep->speed;
+}
+
+/* Device supports clause 22 register access to PHY or peripherals
+ * using the interface defined in "standard-headers/linux/mii.h".  This should not be
+ * set if there are known to be no such peripherals present or if
+ * the driver only emulates clause 22 registers for compatibility.
+ */
+#define ETH_MDIO_SUPPORTS_C22	1
+
+/* Device supports clause 45 register access to PHY or peripherals
+ * using the interface defined in "standard-headers/linux/mii.h" and <linux/mdio.h>.
+ * This should not be set if there are known to be no such peripherals
+ * present.
+ */
+#define ETH_MDIO_SUPPORTS_C45	2
+
+#define ETHTOOL_FWVERS_LEN	32
+#define ETHTOOL_BUSINFO_LEN	32
+#define ETHTOOL_EROMVERS_LEN	32
+
+/**
+ * struct ethtool_drvinfo - general driver and device information
+ * @cmd: Command number = %ETHTOOL_GDRVINFO
+ * @driver: Driver short name.  This should normally match the name
+ *	in its bus driver structure (e.g. pci_driver::name).  Must
+ *	not be an empty string.
+ * @version: Driver version string; may be an empty string
+ * @fw_version: Firmware version string; may be an empty string
+ * @erom_version: Expansion ROM version string; may be an empty string
+ * @bus_info: Device bus address.  This should match the dev_name()
+ *	string for the underlying bus device, if there is one.  May be
+ *	an empty string.
+ * @n_priv_flags: Number of flags valid for %ETHTOOL_GPFLAGS and
+ *	%ETHTOOL_SPFLAGS commands; also the number of strings in the
+ *	%ETH_SS_PRIV_FLAGS set
+ * @n_stats: Number of uint64_t statistics returned by the %ETHTOOL_GSTATS
+ *	command; also the number of strings in the %ETH_SS_STATS set
+ * @testinfo_len: Number of results returned by the %ETHTOOL_TEST
+ *	command; also the number of strings in the %ETH_SS_TEST set
+ * @eedump_len: Size of EEPROM accessible through the %ETHTOOL_GEEPROM
+ *	and %ETHTOOL_SEEPROM commands, in bytes
+ * @regdump_len: Size of register dump returned by the %ETHTOOL_GREGS
+ *	command, in bytes
+ *
+ * Users can use the %ETHTOOL_GSSET_INFO command to get the number of
+ * strings in any string set (from Linux 2.6.34).
+ *
+ * Drivers should set at most @driver, @version, @fw_version and
+ * @bus_info in their get_drvinfo() implementation.  The ethtool
+ * core fills in the other fields using other driver operations.
+ */
+struct ethtool_drvinfo {
+	uint32_t	cmd;
+	char	driver[32];
+	char	version[32];
+	char	fw_version[ETHTOOL_FWVERS_LEN];
+	char	bus_info[ETHTOOL_BUSINFO_LEN];
+	char	erom_version[ETHTOOL_EROMVERS_LEN];
+	char	reserved2[12];
+	uint32_t	n_priv_flags;
+	uint32_t	n_stats;
+	uint32_t	testinfo_len;
+	uint32_t	eedump_len;
+	uint32_t	regdump_len;
+};
+
+#define SOPASS_MAX	6
+
+/**
+ * struct ethtool_wolinfo - Wake-On-Lan configuration
+ * @cmd: Command number = %ETHTOOL_GWOL or %ETHTOOL_SWOL
+ * @supported: Bitmask of %WAKE_* flags for supported Wake-On-Lan modes.
+ *	Read-only.
+ * @wolopts: Bitmask of %WAKE_* flags for enabled Wake-On-Lan modes.
+ * @sopass: SecureOn(tm) password; meaningful only if %WAKE_MAGICSECURE
+ *	is set in @wolopts.
+ */
+struct ethtool_wolinfo {
+	uint32_t	cmd;
+	uint32_t	supported;
+	uint32_t	wolopts;
+	uint8_t	sopass[SOPASS_MAX];
+};
+
+/* for passing single values */
+struct ethtool_value {
+	uint32_t	cmd;
+	uint32_t	data;
+};
+
+enum tunable_id {
+	ETHTOOL_ID_UNSPEC,
+	ETHTOOL_RX_COPYBREAK,
+	ETHTOOL_TX_COPYBREAK,
+	/*
+	 * Add your fresh new tubale attribute above and remember to update
+	 * tunable_strings[] in net/core/ethtool.c
+	 */
+	__ETHTOOL_TUNABLE_COUNT,
+};
+
+enum tunable_type_id {
+	ETHTOOL_TUNABLE_UNSPEC,
+	ETHTOOL_TUNABLE_U8,
+	ETHTOOL_TUNABLE_U16,
+	ETHTOOL_TUNABLE_U32,
+	ETHTOOL_TUNABLE_U64,
+	ETHTOOL_TUNABLE_STRING,
+	ETHTOOL_TUNABLE_S8,
+	ETHTOOL_TUNABLE_S16,
+	ETHTOOL_TUNABLE_S32,
+	ETHTOOL_TUNABLE_S64,
+};
+
+struct ethtool_tunable {
+	uint32_t	cmd;
+	uint32_t	id;
+	uint32_t	type_id;
+	uint32_t	len;
+	void	*data[0];
+};
+
+#define DOWNSHIFT_DEV_DEFAULT_COUNT	0xff
+#define DOWNSHIFT_DEV_DISABLE		0
+
+enum phy_tunable_id {
+	ETHTOOL_PHY_ID_UNSPEC,
+	ETHTOOL_PHY_DOWNSHIFT,
+	/*
+	 * Add your fresh new phy tunable attribute above and remember to update
+	 * phy_tunable_strings[] in net/core/ethtool.c
+	 */
+	__ETHTOOL_PHY_TUNABLE_COUNT,
+};
+
+/**
+ * struct ethtool_regs - hardware register dump
+ * @cmd: Command number = %ETHTOOL_GREGS
+ * @version: Dump format version.  This is driver-specific and may
+ *	distinguish different chips/revisions.  Drivers must use new
+ *	version numbers whenever the dump format changes in an
+ *	incompatible way.
+ * @len: On entry, the real length of @data.  On return, the number of
+ *	bytes used.
+ * @data: Buffer for the register dump
+ *
+ * Users should use %ETHTOOL_GDRVINFO to find the maximum length of
+ * a register dump for the interface.  They must allocate the buffer
+ * immediately following this structure.
+ */
+struct ethtool_regs {
+	uint32_t	cmd;
+	uint32_t	version;
+	uint32_t	len;
+	uint8_t	data[0];
+};
+
+/**
+ * struct ethtool_eeprom - EEPROM dump
+ * @cmd: Command number = %ETHTOOL_GEEPROM, %ETHTOOL_GMODULEEEPROM or
+ *	%ETHTOOL_SEEPROM
+ * @magic: A 'magic cookie' value to guard against accidental changes.
+ *	The value passed in to %ETHTOOL_SEEPROM must match the value
+ *	returned by %ETHTOOL_GEEPROM for the same device.  This is
+ *	unused when @cmd is %ETHTOOL_GMODULEEEPROM.
+ * @offset: Offset within the EEPROM to begin reading/writing, in bytes
+ * @len: On entry, number of bytes to read/write.  On successful
+ *	return, number of bytes actually read/written.  In case of
+ *	error, this may indicate at what point the error occurred.
+ * @data: Buffer to read/write from
+ *
+ * Users may use %ETHTOOL_GDRVINFO or %ETHTOOL_GMODULEINFO to find
+ * the length of an on-board or module EEPROM, respectively.  They
+ * must allocate the buffer immediately following this structure.
+ */
+struct ethtool_eeprom {
+	uint32_t	cmd;
+	uint32_t	magic;
+	uint32_t	offset;
+	uint32_t	len;
+	uint8_t	data[0];
+};
+
+/**
+ * struct ethtool_eee - Energy Efficient Ethernet information
+ * @cmd: ETHTOOL_{G,S}EEE
+ * @supported: Mask of %SUPPORTED_* flags for the speed/duplex combinations
+ *	for which there is EEE support.
+ * @advertised: Mask of %ADVERTISED_* flags for the speed/duplex combinations
+ *	advertised as eee capable.
+ * @lp_advertised: Mask of %ADVERTISED_* flags for the speed/duplex
+ *	combinations advertised by the link partner as eee capable.
+ * @eee_active: Result of the eee auto negotiation.
+ * @eee_enabled: EEE configured mode (enabled/disabled).
+ * @tx_lpi_enabled: Whether the interface should assert its tx lpi, given
+ *	that eee was negotiated.
+ * @tx_lpi_timer: Time in microseconds the interface delays prior to asserting
+ *	its tx lpi (after reaching 'idle' state). Effective only when eee
+ *	was negotiated and tx_lpi_enabled was set.
+ */
+struct ethtool_eee {
+	uint32_t	cmd;
+	uint32_t	supported;
+	uint32_t	advertised;
+	uint32_t	lp_advertised;
+	uint32_t	eee_active;
+	uint32_t	eee_enabled;
+	uint32_t	tx_lpi_enabled;
+	uint32_t	tx_lpi_timer;
+	uint32_t	reserved[2];
+};
+
+/**
+ * struct ethtool_modinfo - plugin module eeprom information
+ * @cmd: %ETHTOOL_GMODULEINFO
+ * @type: Standard the module information conforms to %ETH_MODULE_SFF_xxxx
+ * @eeprom_len: Length of the eeprom
+ *
+ * This structure is used to return the information to
+ * properly size memory for a subsequent call to %ETHTOOL_GMODULEEEPROM.
+ * The type code indicates the eeprom data format
+ */
+struct ethtool_modinfo {
+	uint32_t   cmd;
+	uint32_t   type;
+	uint32_t   eeprom_len;
+	uint32_t   reserved[8];
+};
+
+/**
+ * struct ethtool_coalesce - coalescing parameters for IRQs and stats updates
+ * @cmd: ETHTOOL_{G,S}COALESCE
+ * @rx_coalesce_usecs: How many usecs to delay an RX interrupt after
+ *	a packet arrives.
+ * @rx_max_coalesced_frames: Maximum number of packets to receive
+ *	before an RX interrupt.
+ * @rx_coalesce_usecs_irq: Same as @rx_coalesce_usecs, except that
+ *	this value applies while an IRQ is being serviced by the host.
+ * @rx_max_coalesced_frames_irq: Same as @rx_max_coalesced_frames,
+ *	except that this value applies while an IRQ is being serviced
+ *	by the host.
+ * @tx_coalesce_usecs: How many usecs to delay a TX interrupt after
+ *	a packet is sent.
+ * @tx_max_coalesced_frames: Maximum number of packets to be sent
+ *	before a TX interrupt.
+ * @tx_coalesce_usecs_irq: Same as @tx_coalesce_usecs, except that
+ *	this value applies while an IRQ is being serviced by the host.
+ * @tx_max_coalesced_frames_irq: Same as @tx_max_coalesced_frames,
+ *	except that this value applies while an IRQ is being serviced
+ *	by the host.
+ * @stats_block_coalesce_usecs: How many usecs to delay in-memory
+ *	statistics block updates.  Some drivers do not have an
+ *	in-memory statistic block, and in such cases this value is
+ *	ignored.  This value must not be zero.
+ * @use_adaptive_rx_coalesce: Enable adaptive RX coalescing.
+ * @use_adaptive_tx_coalesce: Enable adaptive TX coalescing.
+ * @pkt_rate_low: Threshold for low packet rate (packets per second).
+ * @rx_coalesce_usecs_low: How many usecs to delay an RX interrupt after
+ *	a packet arrives, when the packet rate is below @pkt_rate_low.
+ * @rx_max_coalesced_frames_low: Maximum number of packets to be received
+ *	before an RX interrupt, when the packet rate is below @pkt_rate_low.
+ * @tx_coalesce_usecs_low: How many usecs to delay a TX interrupt after
+ *	a packet is sent, when the packet rate is below @pkt_rate_low.
+ * @tx_max_coalesced_frames_low: Maximum nuumber of packets to be sent before
+ *	a TX interrupt, when the packet rate is below @pkt_rate_low.
+ * @pkt_rate_high: Threshold for high packet rate (packets per second).
+ * @rx_coalesce_usecs_high: How many usecs to delay an RX interrupt after
+ *	a packet arrives, when the packet rate is above @pkt_rate_high.
+ * @rx_max_coalesced_frames_high: Maximum number of packets to be received
+ *	before an RX interrupt, when the packet rate is above @pkt_rate_high.
+ * @tx_coalesce_usecs_high: How many usecs to delay a TX interrupt after
+ *	a packet is sent, when the packet rate is above @pkt_rate_high.
+ * @tx_max_coalesced_frames_high: Maximum number of packets to be sent before
+ *	a TX interrupt, when the packet rate is above @pkt_rate_high.
+ * @rate_sample_interval: How often to do adaptive coalescing packet rate
+ *	sampling, measured in seconds.  Must not be zero.
+ *
+ * Each pair of (usecs, max_frames) fields specifies that interrupts
+ * should be coalesced until
+ *	(usecs > 0 && time_since_first_completion >= usecs) ||
+ *	(max_frames > 0 && completed_frames >= max_frames)
+ *
+ * It is illegal to set both usecs and max_frames to zero as this
+ * would cause interrupts to never be generated.  To disable
+ * coalescing, set usecs = 0 and max_frames = 1.
+ *
+ * Some implementations ignore the value of max_frames and use the
+ * condition time_since_first_completion >= usecs
+ *
+ * This is deprecated.  Drivers for hardware that does not support
+ * counting completions should validate that max_frames == !rx_usecs.
+ *
+ * Adaptive RX/TX coalescing is an algorithm implemented by some
+ * drivers to improve latency under low packet rates and improve
+ * throughput under high packet rates.  Some drivers only implement
+ * one of RX or TX adaptive coalescing.  Anything not implemented by
+ * the driver causes these values to be silently ignored.
+ *
+ * When the packet rate is below @pkt_rate_high but above
+ * @pkt_rate_low (both measured in packets per second) the
+ * normal {rx,tx}_* coalescing parameters are used.
+ */
+struct ethtool_coalesce {
+	uint32_t	cmd;
+	uint32_t	rx_coalesce_usecs;
+	uint32_t	rx_max_coalesced_frames;
+	uint32_t	rx_coalesce_usecs_irq;
+	uint32_t	rx_max_coalesced_frames_irq;
+	uint32_t	tx_coalesce_usecs;
+	uint32_t	tx_max_coalesced_frames;
+	uint32_t	tx_coalesce_usecs_irq;
+	uint32_t	tx_max_coalesced_frames_irq;
+	uint32_t	stats_block_coalesce_usecs;
+	uint32_t	use_adaptive_rx_coalesce;
+	uint32_t	use_adaptive_tx_coalesce;
+	uint32_t	pkt_rate_low;
+	uint32_t	rx_coalesce_usecs_low;
+	uint32_t	rx_max_coalesced_frames_low;
+	uint32_t	tx_coalesce_usecs_low;
+	uint32_t	tx_max_coalesced_frames_low;
+	uint32_t	pkt_rate_high;
+	uint32_t	rx_coalesce_usecs_high;
+	uint32_t	rx_max_coalesced_frames_high;
+	uint32_t	tx_coalesce_usecs_high;
+	uint32_t	tx_max_coalesced_frames_high;
+	uint32_t	rate_sample_interval;
+};
+
+/**
+ * struct ethtool_ringparam - RX/TX ring parameters
+ * @cmd: Command number = %ETHTOOL_GRINGPARAM or %ETHTOOL_SRINGPARAM
+ * @rx_max_pending: Maximum supported number of pending entries per
+ *	RX ring.  Read-only.
+ * @rx_mini_max_pending: Maximum supported number of pending entries
+ *	per RX mini ring.  Read-only.
+ * @rx_jumbo_max_pending: Maximum supported number of pending entries
+ *	per RX jumbo ring.  Read-only.
+ * @tx_max_pending: Maximum supported number of pending entries per
+ *	TX ring.  Read-only.
+ * @rx_pending: Current maximum number of pending entries per RX ring
+ * @rx_mini_pending: Current maximum number of pending entries per RX
+ *	mini ring
+ * @rx_jumbo_pending: Current maximum number of pending entries per RX
+ *	jumbo ring
+ * @tx_pending: Current maximum supported number of pending entries
+ *	per TX ring
+ *
+ * If the interface does not have separate RX mini and/or jumbo rings,
+ * @rx_mini_max_pending and/or @rx_jumbo_max_pending will be 0.
+ *
+ * There may also be driver-dependent minimum values for the number
+ * of entries per ring.
+ */
+struct ethtool_ringparam {
+	uint32_t	cmd;
+	uint32_t	rx_max_pending;
+	uint32_t	rx_mini_max_pending;
+	uint32_t	rx_jumbo_max_pending;
+	uint32_t	tx_max_pending;
+	uint32_t	rx_pending;
+	uint32_t	rx_mini_pending;
+	uint32_t	rx_jumbo_pending;
+	uint32_t	tx_pending;
+};
+
+/**
+ * struct ethtool_channels - configuring number of network channel
+ * @cmd: ETHTOOL_{G,S}CHANNELS
+ * @max_rx: Read only. Maximum number of receive channel the driver support.
+ * @max_tx: Read only. Maximum number of transmit channel the driver support.
+ * @max_other: Read only. Maximum number of other channel the driver support.
+ * @max_combined: Read only. Maximum number of combined channel the driver
+ *	support. Set of queues RX, TX or other.
+ * @rx_count: Valid values are in the range 1 to the max_rx.
+ * @tx_count: Valid values are in the range 1 to the max_tx.
+ * @other_count: Valid values are in the range 1 to the max_other.
+ * @combined_count: Valid values are in the range 1 to the max_combined.
+ *
+ * This can be used to configure RX, TX and other channels.
+ */
+
+struct ethtool_channels {
+	uint32_t	cmd;
+	uint32_t	max_rx;
+	uint32_t	max_tx;
+	uint32_t	max_other;
+	uint32_t	max_combined;
+	uint32_t	rx_count;
+	uint32_t	tx_count;
+	uint32_t	other_count;
+	uint32_t	combined_count;
+};
+
+/**
+ * struct ethtool_pauseparam - Ethernet pause (flow control) parameters
+ * @cmd: Command number = %ETHTOOL_GPAUSEPARAM or %ETHTOOL_SPAUSEPARAM
+ * @autoneg: Flag to enable autonegotiation of pause frame use
+ * @rx_pause: Flag to enable reception of pause frames
+ * @tx_pause: Flag to enable transmission of pause frames
+ *
+ * Drivers should reject a non-zero setting of @autoneg when
+ * autoneogotiation is disabled (or not supported) for the link.
+ *
+ * If the link is autonegotiated, drivers should use
+ * mii_advertise_flowctrl() or similar code to set the advertised
+ * pause frame capabilities based on the @rx_pause and @tx_pause flags,
+ * even if @autoneg is zero.  They should also allow the advertised
+ * pause frame capabilities to be controlled directly through the
+ * advertising field of &struct ethtool_cmd.
+ *
+ * If @autoneg is non-zero, the MAC is configured to send and/or
+ * receive pause frames according to the result of autonegotiation.
+ * Otherwise, it is configured directly based on the @rx_pause and
+ * @tx_pause flags.
+ */
+struct ethtool_pauseparam {
+	uint32_t	cmd;
+	uint32_t	autoneg;
+	uint32_t	rx_pause;
+	uint32_t	tx_pause;
+};
+
+#define ETH_GSTRING_LEN		32
+
+/**
+ * enum ethtool_stringset - string set ID
+ * @ETH_SS_TEST: Self-test result names, for use with %ETHTOOL_TEST
+ * @ETH_SS_STATS: Statistic names, for use with %ETHTOOL_GSTATS
+ * @ETH_SS_PRIV_FLAGS: Driver private flag names, for use with
+ *	%ETHTOOL_GPFLAGS and %ETHTOOL_SPFLAGS
+ * @ETH_SS_NTUPLE_FILTERS: Previously used with %ETHTOOL_GRXNTUPLE;
+ *	now deprecated
+ * @ETH_SS_FEATURES: Device feature names
+ * @ETH_SS_RSS_HASH_FUNCS: RSS hush function names
+ * @ETH_SS_PHY_STATS: Statistic names, for use with %ETHTOOL_GPHYSTATS
+ * @ETH_SS_PHY_TUNABLES: PHY tunable names
+ */
+enum ethtool_stringset {
+	ETH_SS_TEST		= 0,
+	ETH_SS_STATS,
+	ETH_SS_PRIV_FLAGS,
+	ETH_SS_NTUPLE_FILTERS,
+	ETH_SS_FEATURES,
+	ETH_SS_RSS_HASH_FUNCS,
+	ETH_SS_TUNABLES,
+	ETH_SS_PHY_STATS,
+	ETH_SS_PHY_TUNABLES,
+};
+
+/**
+ * struct ethtool_gstrings - string set for data tagging
+ * @cmd: Command number = %ETHTOOL_GSTRINGS
+ * @string_set: String set ID; one of &enum ethtool_stringset
+ * @len: On return, the number of strings in the string set
+ * @data: Buffer for strings.  Each string is null-padded to a size of
+ *	%ETH_GSTRING_LEN.
+ *
+ * Users must use %ETHTOOL_GSSET_INFO to find the number of strings in
+ * the string set.  They must allocate a buffer of the appropriate
+ * size immediately following this structure.
+ */
+struct ethtool_gstrings {
+	uint32_t	cmd;
+	uint32_t	string_set;
+	uint32_t	len;
+	uint8_t	data[0];
+};
+
+/**
+ * struct ethtool_sset_info - string set information
+ * @cmd: Command number = %ETHTOOL_GSSET_INFO
+ * @sset_mask: On entry, a bitmask of string sets to query, with bits
+ *	numbered according to &enum ethtool_stringset.  On return, a
+ *	bitmask of those string sets queried that are supported.
+ * @data: Buffer for string set sizes.  On return, this contains the
+ *	size of each string set that was queried and supported, in
+ *	order of ID.
+ *
+ * Example: The user passes in @sset_mask = 0x7 (sets 0, 1, 2) and on
+ * return @sset_mask == 0x6 (sets 1, 2).  Then @data[0] contains the
+ * size of set 1 and @data[1] contains the size of set 2.
+ *
+ * Users must allocate a buffer of the appropriate size (4 * number of
+ * sets queried) immediately following this structure.
+ */
+struct ethtool_sset_info {
+	uint32_t	cmd;
+	uint32_t	reserved;
+	uint64_t	sset_mask;
+	uint32_t	data[0];
+};
+
+/**
+ * enum ethtool_test_flags - flags definition of ethtool_test
+ * @ETH_TEST_FL_OFFLINE: if set perform online and offline tests, otherwise
+ *	only online tests.
+ * @ETH_TEST_FL_FAILED: Driver set this flag if test fails.
+ * @ETH_TEST_FL_EXTERNAL_LB: Application request to perform external loopback
+ *	test.
+ * @ETH_TEST_FL_EXTERNAL_LB_DONE: Driver performed the external loopback test
+ */
+
+enum ethtool_test_flags {
+	ETH_TEST_FL_OFFLINE	= (1 << 0),
+	ETH_TEST_FL_FAILED	= (1 << 1),
+	ETH_TEST_FL_EXTERNAL_LB	= (1 << 2),
+	ETH_TEST_FL_EXTERNAL_LB_DONE	= (1 << 3),
+};
+
+/**
+ * struct ethtool_test - device self-test invocation
+ * @cmd: Command number = %ETHTOOL_TEST
+ * @flags: A bitmask of flags from &enum ethtool_test_flags.  Some
+ *	flags may be set by the user on entry; others may be set by
+ *	the driver on return.
+ * @len: On return, the number of test results
+ * @data: Array of test results
+ *
+ * Users must use %ETHTOOL_GSSET_INFO or %ETHTOOL_GDRVINFO to find the
+ * number of test results that will be returned.  They must allocate a
+ * buffer of the appropriate size (8 * number of results) immediately
+ * following this structure.
+ */
+struct ethtool_test {
+	uint32_t	cmd;
+	uint32_t	flags;
+	uint32_t	reserved;
+	uint32_t	len;
+	uint64_t	data[0];
+};
+
+/**
+ * struct ethtool_stats - device-specific statistics
+ * @cmd: Command number = %ETHTOOL_GSTATS
+ * @n_stats: On return, the number of statistics
+ * @data: Array of statistics
+ *
+ * Users must use %ETHTOOL_GSSET_INFO or %ETHTOOL_GDRVINFO to find the
+ * number of statistics that will be returned.  They must allocate a
+ * buffer of the appropriate size (8 * number of statistics)
+ * immediately following this structure.
+ */
+struct ethtool_stats {
+	uint32_t	cmd;
+	uint32_t	n_stats;
+	uint64_t	data[0];
+};
+
+/**
+ * struct ethtool_perm_addr - permanent hardware address
+ * @cmd: Command number = %ETHTOOL_GPERMADDR
+ * @size: On entry, the size of the buffer.  On return, the size of the
+ *	address.  The command fails if the buffer is too small.
+ * @data: Buffer for the address
+ *
+ * Users must allocate the buffer immediately following this structure.
+ * A buffer size of %MAX_ADDR_LEN should be sufficient for any address
+ * type.
+ */
+struct ethtool_perm_addr {
+	uint32_t	cmd;
+	uint32_t	size;
+	uint8_t	data[0];
+};
+
+/* boolean flags controlling per-interface behavior characteristics.
+ * When reading, the flag indicates whether or not a certain behavior
+ * is enabled/present.  When writing, the flag indicates whether
+ * or not the driver should turn on (set) or off (clear) a behavior.
+ *
+ * Some behaviors may read-only (unconditionally absent or present).
+ * If such is the case, return EINVAL in the set-flags operation if the
+ * flag differs from the read-only value.
+ */
+enum ethtool_flags {
+	ETH_FLAG_TXVLAN		= (1 << 7),	/* TX VLAN offload enabled */
+	ETH_FLAG_RXVLAN		= (1 << 8),	/* RX VLAN offload enabled */
+	ETH_FLAG_LRO		= (1 << 15),	/* LRO is enabled */
+	ETH_FLAG_NTUPLE		= (1 << 27),	/* N-tuple filters enabled */
+	ETH_FLAG_RXHASH		= (1 << 28),
+};
+
+/* The following structures are for supporting RX network flow
+ * classification and RX n-tuple configuration. Note, all multibyte
+ * fields, e.g., ip4src, ip4dst, psrc, pdst, spi, etc. are expected to
+ * be in network byte order.
+ */
+
+/**
+ * struct ethtool_tcpip4_spec - flow specification for TCP/IPv4 etc.
+ * @ip4src: Source host
+ * @ip4dst: Destination host
+ * @psrc: Source port
+ * @pdst: Destination port
+ * @tos: Type-of-service
+ *
+ * This can be used to specify a TCP/IPv4, UDP/IPv4 or SCTP/IPv4 flow.
+ */
+struct ethtool_tcpip4_spec {
+	uint32_t	ip4src;
+	uint32_t	ip4dst;
+	uint16_t	psrc;
+	uint16_t	pdst;
+	uint8_t    tos;
+};
+
+/**
+ * struct ethtool_ah_espip4_spec - flow specification for IPsec/IPv4
+ * @ip4src: Source host
+ * @ip4dst: Destination host
+ * @spi: Security parameters index
+ * @tos: Type-of-service
+ *
+ * This can be used to specify an IPsec transport or tunnel over IPv4.
+ */
+struct ethtool_ah_espip4_spec {
+	uint32_t	ip4src;
+	uint32_t	ip4dst;
+	uint32_t	spi;
+	uint8_t    tos;
+};
+
+#define	ETH_RX_NFC_IP4	1
+
+/**
+ * struct ethtool_usrip4_spec - general flow specification for IPv4
+ * @ip4src: Source host
+ * @ip4dst: Destination host
+ * @l4_4_bytes: First 4 bytes of transport (layer 4) header
+ * @tos: Type-of-service
+ * @ip_ver: Value must be %ETH_RX_NFC_IP4; mask must be 0
+ * @proto: Transport protocol number; mask must be 0
+ */
+struct ethtool_usrip4_spec {
+	uint32_t	ip4src;
+	uint32_t	ip4dst;
+	uint32_t	l4_4_bytes;
+	uint8_t    tos;
+	uint8_t    ip_ver;
+	uint8_t    proto;
+};
+
+/**
+ * struct ethtool_tcpip6_spec - flow specification for TCP/IPv6 etc.
+ * @ip6src: Source host
+ * @ip6dst: Destination host
+ * @psrc: Source port
+ * @pdst: Destination port
+ * @tclass: Traffic Class
+ *
+ * This can be used to specify a TCP/IPv6, UDP/IPv6 or SCTP/IPv6 flow.
+ */
+struct ethtool_tcpip6_spec {
+	uint32_t	ip6src[4];
+	uint32_t	ip6dst[4];
+	uint16_t	psrc;
+	uint16_t	pdst;
+	uint8_t    tclass;
+};
+
+/**
+ * struct ethtool_ah_espip6_spec - flow specification for IPsec/IPv6
+ * @ip6src: Source host
+ * @ip6dst: Destination host
+ * @spi: Security parameters index
+ * @tclass: Traffic Class
+ *
+ * This can be used to specify an IPsec transport or tunnel over IPv6.
+ */
+struct ethtool_ah_espip6_spec {
+	uint32_t	ip6src[4];
+	uint32_t	ip6dst[4];
+	uint32_t	spi;
+	uint8_t    tclass;
+};
+
+/**
+ * struct ethtool_usrip6_spec - general flow specification for IPv6
+ * @ip6src: Source host
+ * @ip6dst: Destination host
+ * @l4_4_bytes: First 4 bytes of transport (layer 4) header
+ * @tclass: Traffic Class
+ * @l4_proto: Transport protocol number (nexthdr after any Extension Headers)
+ */
+struct ethtool_usrip6_spec {
+	uint32_t	ip6src[4];
+	uint32_t	ip6dst[4];
+	uint32_t	l4_4_bytes;
+	uint8_t    tclass;
+	uint8_t    l4_proto;
+};
+
+union ethtool_flow_union {
+	struct ethtool_tcpip4_spec		tcp_ip4_spec;
+	struct ethtool_tcpip4_spec		udp_ip4_spec;
+	struct ethtool_tcpip4_spec		sctp_ip4_spec;
+	struct ethtool_ah_espip4_spec		ah_ip4_spec;
+	struct ethtool_ah_espip4_spec		esp_ip4_spec;
+	struct ethtool_usrip4_spec		usr_ip4_spec;
+	struct ethtool_tcpip6_spec		tcp_ip6_spec;
+	struct ethtool_tcpip6_spec		udp_ip6_spec;
+	struct ethtool_tcpip6_spec		sctp_ip6_spec;
+	struct ethtool_ah_espip6_spec		ah_ip6_spec;
+	struct ethtool_ah_espip6_spec		esp_ip6_spec;
+	struct ethtool_usrip6_spec		usr_ip6_spec;
+	struct eth_header				ether_spec;
+	uint8_t					hdata[52];
+};
+
+/**
+ * struct ethtool_flow_ext - additional RX flow fields
+ * @h_dest: destination MAC address
+ * @vlan_etype: VLAN EtherType
+ * @vlan_tci: VLAN tag control information
+ * @data: user defined data
+ *
+ * Note, @vlan_etype, @vlan_tci, and @data are only valid if %FLOW_EXT
+ * is set in &struct ethtool_rx_flow_spec @flow_type.
+ * @h_dest is valid if %FLOW_MAC_EXT is set.
+ */
+struct ethtool_flow_ext {
+	uint8_t		padding[2];
+	unsigned char	h_dest[ETH_ALEN];
+	uint16_t		vlan_etype;
+	uint16_t		vlan_tci;
+	uint32_t		data[2];
+};
+
+/**
+ * struct ethtool_rx_flow_spec - classification rule for RX flows
+ * @flow_type: Type of match to perform, e.g. %TCP_V4_FLOW
+ * @h_u: Flow fields to match (dependent on @flow_type)
+ * @h_ext: Additional fields to match
+ * @m_u: Masks for flow field bits to be matched
+ * @m_ext: Masks for additional field bits to be matched
+ *	Note, all additional fields must be ignored unless @flow_type
+ *	includes the %FLOW_EXT or %FLOW_MAC_EXT flag
+ *	(see &struct ethtool_flow_ext description).
+ * @ring_cookie: RX ring/queue index to deliver to, or %RX_CLS_FLOW_DISC
+ *	if packets should be discarded
+ * @location: Location of rule in the table.  Locations must be
+ *	numbered such that a flow matching multiple rules will be
+ *	classified according to the first (lowest numbered) rule.
+ */
+struct ethtool_rx_flow_spec {
+	uint32_t		flow_type;
+	union ethtool_flow_union h_u;
+	struct ethtool_flow_ext h_ext;
+	union ethtool_flow_union m_u;
+	struct ethtool_flow_ext m_ext;
+	uint64_t		ring_cookie;
+	uint32_t		location;
+};
+
+/* How rings are layed out when accessing virtual functions or
+ * offloaded queues is device specific. To allow users to do flow
+ * steering and specify these queues the ring cookie is partitioned
+ * into a 32bit queue index with an 8 bit virtual function id.
+ * This also leaves the 3bytes for further specifiers. It is possible
+ * future devices may support more than 256 virtual functions if
+ * devices start supporting PCIe w/ARI. However at the moment I
+ * do not know of any devices that support this so I do not reserve
+ * space for this at this time. If a future patch consumes the next
+ * byte it should be aware of this possiblity.
+ */
+#define ETHTOOL_RX_FLOW_SPEC_RING	0x00000000FFFFFFFFLL
+#define ETHTOOL_RX_FLOW_SPEC_RING_VF	0x000000FF00000000LL
+#define ETHTOOL_RX_FLOW_SPEC_RING_VF_OFF 32
+static inline uint64_t ethtool_get_flow_spec_ring(uint64_t ring_cookie)
+{
+	return ETHTOOL_RX_FLOW_SPEC_RING & ring_cookie;
+};
+
+static inline uint64_t ethtool_get_flow_spec_ring_vf(uint64_t ring_cookie)
+{
+	return (ETHTOOL_RX_FLOW_SPEC_RING_VF & ring_cookie) >>
+				ETHTOOL_RX_FLOW_SPEC_RING_VF_OFF;
+};
+
+/**
+ * struct ethtool_rxnfc - command to get or set RX flow classification rules
+ * @cmd: Specific command number - %ETHTOOL_GRXFH, %ETHTOOL_SRXFH,
+ *	%ETHTOOL_GRXRINGS, %ETHTOOL_GRXCLSRLCNT, %ETHTOOL_GRXCLSRULE,
+ *	%ETHTOOL_GRXCLSRLALL, %ETHTOOL_SRXCLSRLDEL or %ETHTOOL_SRXCLSRLINS
+ * @flow_type: Type of flow to be affected, e.g. %TCP_V4_FLOW
+ * @data: Command-dependent value
+ * @fs: Flow classification rule
+ * @rule_cnt: Number of rules to be affected
+ * @rule_locs: Array of used rule locations
+ *
+ * For %ETHTOOL_GRXFH and %ETHTOOL_SRXFH, @data is a bitmask indicating
+ * the fields included in the flow hash, e.g. %RXH_IP_SRC.  The following
+ * structure fields must not be used.
+ *
+ * For %ETHTOOL_GRXRINGS, @data is set to the number of RX rings/queues
+ * on return.
+ *
+ * For %ETHTOOL_GRXCLSRLCNT, @rule_cnt is set to the number of defined
+ * rules on return.  If @data is non-zero on return then it is the
+ * size of the rule table, plus the flag %RX_CLS_LOC_SPECIAL if the
+ * driver supports any special location values.  If that flag is not
+ * set in @data then special location values should not be used.
+ *
+ * For %ETHTOOL_GRXCLSRULE, @fs.@location specifies the location of an
+ * existing rule on entry and @fs contains the rule on return.
+ *
+ * For %ETHTOOL_GRXCLSRLALL, @rule_cnt specifies the array size of the
+ * user buffer for @rule_locs on entry.  On return, @data is the size
+ * of the rule table, @rule_cnt is the number of defined rules, and
+ * @rule_locs contains the locations of the defined rules.  Drivers
+ * must use the second parameter to get_rxnfc() instead of @rule_locs.
+ *
+ * For %ETHTOOL_SRXCLSRLINS, @fs specifies the rule to add or update.
+ * @fs.@location either specifies the location to use or is a special
+ * location value with %RX_CLS_LOC_SPECIAL flag set.  On return,
+ * @fs.@location is the actual rule location.
+ *
+ * For %ETHTOOL_SRXCLSRLDEL, @fs.@location specifies the location of an
+ * existing rule on entry.
+ *
+ * A driver supporting the special location values for
+ * %ETHTOOL_SRXCLSRLINS may add the rule at any suitable unused
+ * location, and may remove a rule at a later location (lower
+ * priority) that matches exactly the same set of flows.  The special
+ * values are %RX_CLS_LOC_ANY, selecting any location;
+ * %RX_CLS_LOC_FIRST, selecting the first suitable location (maximum
+ * priority); and %RX_CLS_LOC_LAST, selecting the last suitable
+ * location (minimum priority).  Additional special values may be
+ * defined in future and drivers must return -%EINVAL for any
+ * unrecognised value.
+ */
+struct ethtool_rxnfc {
+	uint32_t				cmd;
+	uint32_t				flow_type;
+	uint64_t				data;
+	struct ethtool_rx_flow_spec	fs;
+	uint32_t				rule_cnt;
+	uint32_t				rule_locs[0];
+};
+
+
+/**
+ * struct ethtool_rxfh_indir - command to get or set RX flow hash indirection
+ * @cmd: Specific command number - %ETHTOOL_GRXFHINDIR or %ETHTOOL_SRXFHINDIR
+ * @size: On entry, the array size of the user buffer, which may be zero.
+ *	On return from %ETHTOOL_GRXFHINDIR, the array size of the hardware
+ *	indirection table.
+ * @ring_index: RX ring/queue index for each hash value
+ *
+ * For %ETHTOOL_GRXFHINDIR, a @size of zero means that only the size
+ * should be returned.  For %ETHTOOL_SRXFHINDIR, a @size of zero means
+ * the table should be reset to default values.  This last feature
+ * is not supported by the original implementations.
+ */
+struct ethtool_rxfh_indir {
+	uint32_t	cmd;
+	uint32_t	size;
+	uint32_t	ring_index[0];
+};
+
+/**
+ * struct ethtool_rxfh - command to get/set RX flow hash indir or/and hash key.
+ * @cmd: Specific command number - %ETHTOOL_GRSSH or %ETHTOOL_SRSSH
+ * @rss_context: RSS context identifier.
+ * @indir_size: On entry, the array size of the user buffer for the
+ *	indirection table, which may be zero, or (for %ETHTOOL_SRSSH),
+ *	%ETH_RXFH_INDIR_NO_CHANGE.  On return from %ETHTOOL_GRSSH,
+ *	the array size of the hardware indirection table.
+ * @key_size: On entry, the array size of the user buffer for the hash key,
+ *	which may be zero.  On return from %ETHTOOL_GRSSH, the size of the
+ *	hardware hash key.
+ * @hfunc: Defines the current RSS hash function used by HW (or to be set to).
+ *	Valid values are one of the %ETH_RSS_HASH_*.
+ * @rsvd:	Reserved for future extensions.
+ * @rss_config: RX ring/queue index for each hash value i.e., indirection table
+ *	of @indir_size uint32_t elements, followed by hash key of @key_size
+ *	bytes.
+ *
+ * For %ETHTOOL_GRSSH, a @indir_size and key_size of zero means that only the
+ * size should be returned.  For %ETHTOOL_SRSSH, an @indir_size of
+ * %ETH_RXFH_INDIR_NO_CHANGE means that indir table setting is not requested
+ * and a @indir_size of zero means the indir table should be reset to default
+ * values. An hfunc of zero means that hash function setting is not requested.
+ */
+struct ethtool_rxfh {
+	uint32_t   cmd;
+	uint32_t	rss_context;
+	uint32_t   indir_size;
+	uint32_t   key_size;
+	uint8_t	hfunc;
+	uint8_t	rsvd8[3];
+	uint32_t	rsvd32;
+	uint32_t   rss_config[0];
+};
+#define ETH_RXFH_INDIR_NO_CHANGE	0xffffffff
+
+/**
+ * struct ethtool_rx_ntuple_flow_spec - specification for RX flow filter
+ * @flow_type: Type of match to perform, e.g. %TCP_V4_FLOW
+ * @h_u: Flow field values to match (dependent on @flow_type)
+ * @m_u: Masks for flow field value bits to be ignored
+ * @vlan_tag: VLAN tag to match
+ * @vlan_tag_mask: Mask for VLAN tag bits to be ignored
+ * @data: Driver-dependent data to match
+ * @data_mask: Mask for driver-dependent data bits to be ignored
+ * @action: RX ring/queue index to deliver to (non-negative) or other action
+ *	(negative, e.g. %ETHTOOL_RXNTUPLE_ACTION_DROP)
+ *
+ * For flow types %TCP_V4_FLOW, %UDP_V4_FLOW and %SCTP_V4_FLOW, where
+ * a field value and mask are both zero this is treated as if all mask
+ * bits are set i.e. the field is ignored.
+ */
+struct ethtool_rx_ntuple_flow_spec {
+	uint32_t		 flow_type;
+	union {
+		struct ethtool_tcpip4_spec		tcp_ip4_spec;
+		struct ethtool_tcpip4_spec		udp_ip4_spec;
+		struct ethtool_tcpip4_spec		sctp_ip4_spec;
+		struct ethtool_ah_espip4_spec		ah_ip4_spec;
+		struct ethtool_ah_espip4_spec		esp_ip4_spec;
+		struct ethtool_usrip4_spec		usr_ip4_spec;
+		struct eth_header				ether_spec;
+		uint8_t					hdata[72];
+	} h_u, m_u;
+
+	uint16_t	        vlan_tag;
+	uint16_t	        vlan_tag_mask;
+	uint64_t		data;
+	uint64_t		data_mask;
+
+	int32_t		action;
+#define ETHTOOL_RXNTUPLE_ACTION_DROP	(-1)	/* drop packet */
+#define ETHTOOL_RXNTUPLE_ACTION_CLEAR	(-2)	/* clear filter */
+};
+
+/**
+ * struct ethtool_rx_ntuple - command to set or clear RX flow filter
+ * @cmd: Command number - %ETHTOOL_SRXNTUPLE
+ * @fs: Flow filter specification
+ */
+struct ethtool_rx_ntuple {
+	uint32_t					cmd;
+	struct ethtool_rx_ntuple_flow_spec	fs;
+};
+
+#define ETHTOOL_FLASH_MAX_FILENAME	128
+enum ethtool_flash_op_type {
+	ETHTOOL_FLASH_ALL_REGIONS	= 0,
+};
+
+/* for passing firmware flashing related parameters */
+struct ethtool_flash {
+	uint32_t	cmd;
+	uint32_t	region;
+	char	data[ETHTOOL_FLASH_MAX_FILENAME];
+};
+
+/**
+ * struct ethtool_dump - used for retrieving, setting device dump
+ * @cmd: Command number - %ETHTOOL_GET_DUMP_FLAG, %ETHTOOL_GET_DUMP_DATA, or
+ * 	%ETHTOOL_SET_DUMP
+ * @version: FW version of the dump, filled in by driver
+ * @flag: driver dependent flag for dump setting, filled in by driver during
+ *        get and filled in by ethtool for set operation.
+ *        flag must be initialized by macro ETH_FW_DUMP_DISABLE value when
+ *        firmware dump is disabled.
+ * @len: length of dump data, used as the length of the user buffer on entry to
+ * 	 %ETHTOOL_GET_DUMP_DATA and this is returned as dump length by driver
+ * 	 for %ETHTOOL_GET_DUMP_FLAG command
+ * @data: data collected for get dump data operation
+ */
+struct ethtool_dump {
+	uint32_t	cmd;
+	uint32_t	version;
+	uint32_t	flag;
+	uint32_t	len;
+	uint8_t	data[0];
+};
+
+#define ETH_FW_DUMP_DISABLE 0
+
+/* for returning and changing feature sets */
+
+/**
+ * struct ethtool_get_features_block - block with state of 32 features
+ * @available: mask of changeable features
+ * @requested: mask of features requested to be enabled if possible
+ * @active: mask of currently enabled features
+ * @never_changed: mask of features not changeable for any device
+ */
+struct ethtool_get_features_block {
+	uint32_t	available;
+	uint32_t	requested;
+	uint32_t	active;
+	uint32_t	never_changed;
+};
+
+/**
+ * struct ethtool_gfeatures - command to get state of device's features
+ * @cmd: command number = %ETHTOOL_GFEATURES
+ * @size: On entry, the number of elements in the features[] array;
+ *	on return, the number of elements in features[] needed to hold
+ *	all features
+ * @features: state of features
+ */
+struct ethtool_gfeatures {
+	uint32_t	cmd;
+	uint32_t	size;
+	struct ethtool_get_features_block features[0];
+};
+
+/**
+ * struct ethtool_set_features_block - block with request for 32 features
+ * @valid: mask of features to be changed
+ * @requested: values of features to be changed
+ */
+struct ethtool_set_features_block {
+	uint32_t	valid;
+	uint32_t	requested;
+};
+
+/**
+ * struct ethtool_sfeatures - command to request change in device's features
+ * @cmd: command number = %ETHTOOL_SFEATURES
+ * @size: array size of the features[] array
+ * @features: feature change masks
+ */
+struct ethtool_sfeatures {
+	uint32_t	cmd;
+	uint32_t	size;
+	struct ethtool_set_features_block features[0];
+};
+
+/**
+ * struct ethtool_ts_info - holds a device's timestamping and PHC association
+ * @cmd: command number = %ETHTOOL_GET_TS_INFO
+ * @so_timestamping: bit mask of the sum of the supported SO_TIMESTAMPING flags
+ * @phc_index: device index of the associated PHC, or -1 if there is none
+ * @tx_types: bit mask of the supported hwtstamp_tx_types enumeration values
+ * @rx_filters: bit mask of the supported hwtstamp_rx_filters enumeration values
+ *
+ * The bits in the 'tx_types' and 'rx_filters' fields correspond to
+ * the 'hwtstamp_tx_types' and 'hwtstamp_rx_filters' enumeration values,
+ * respectively.  For example, if the device supports HWTSTAMP_TX_ON,
+ * then (1 << HWTSTAMP_TX_ON) in 'tx_types' will be set.
+ *
+ * Drivers should only report the filters they actually support without
+ * upscaling in the SIOCSHWTSTAMP ioctl. If the SIOCSHWSTAMP request for
+ * HWTSTAMP_FILTER_V1_SYNC is supported by HWTSTAMP_FILTER_V1_EVENT, then the
+ * driver should only report HWTSTAMP_FILTER_V1_EVENT in this op.
+ */
+struct ethtool_ts_info {
+	uint32_t	cmd;
+	uint32_t	so_timestamping;
+	int32_t	phc_index;
+	uint32_t	tx_types;
+	uint32_t	tx_reserved[3];
+	uint32_t	rx_filters;
+	uint32_t	rx_reserved[3];
+};
+
+/*
+ * %ETHTOOL_SFEATURES changes features present in features[].valid to the
+ * values of corresponding bits in features[].requested. Bits in .requested
+ * not set in .valid or not changeable are ignored.
+ *
+ * Returns %EINVAL when .valid contains undefined or never-changeable bits
+ * or size is not equal to required number of features words (32-bit blocks).
+ * Returns >= 0 if request was completed; bits set in the value mean:
+ *   %ETHTOOL_F_UNSUPPORTED - there were bits set in .valid that are not
+ *	changeable (not present in %ETHTOOL_GFEATURES' features[].available)
+ *	those bits were ignored.
+ *   %ETHTOOL_F_WISH - some or all changes requested were recorded but the
+ *      resulting state of bits masked by .valid is not equal to .requested.
+ *      Probably there are other device-specific constraints on some features
+ *      in the set. When %ETHTOOL_F_UNSUPPORTED is set, .valid is considered
+ *      here as though ignored bits were cleared.
+ *   %ETHTOOL_F_COMPAT - some or all changes requested were made by calling
+ *      compatibility functions. Requested offload state cannot be properly
+ *      managed by kernel.
+ *
+ * Meaning of bits in the masks are obtained by %ETHTOOL_GSSET_INFO (number of
+ * bits in the arrays - always multiple of 32) and %ETHTOOL_GSTRINGS commands
+ * for ETH_SS_FEATURES string set. First entry in the table corresponds to least
+ * significant bit in features[0] fields. Empty strings mark undefined features.
+ */
+enum ethtool_sfeatures_retval_bits {
+	ETHTOOL_F_UNSUPPORTED__BIT,
+	ETHTOOL_F_WISH__BIT,
+	ETHTOOL_F_COMPAT__BIT,
+};
+
+#define ETHTOOL_F_UNSUPPORTED   (1 << ETHTOOL_F_UNSUPPORTED__BIT)
+#define ETHTOOL_F_WISH          (1 << ETHTOOL_F_WISH__BIT)
+#define ETHTOOL_F_COMPAT        (1 << ETHTOOL_F_COMPAT__BIT)
+
+#define MAX_NUM_QUEUE		4096
+
+/**
+ * struct ethtool_per_queue_op - apply sub command to the queues in mask.
+ * @cmd: ETHTOOL_PERQUEUE
+ * @sub_command: the sub command which apply to each queues
+ * @queue_mask: Bitmap of the queues which sub command apply to
+ * @data: A complete command structure following for each of the queues addressed
+ */
+struct ethtool_per_queue_op {
+	uint32_t	cmd;
+	uint32_t	sub_command;
+	uint32_t	queue_mask[__KERNEL_DIV_ROUND_UP(MAX_NUM_QUEUE, 32)];
+	char	data[];
+};
+
+/**
+ * struct ethtool_fecparam - Ethernet forward error correction(fec) parameters
+ * @cmd: Command number = %ETHTOOL_GFECPARAM or %ETHTOOL_SFECPARAM
+ * @active_fec: FEC mode which is active on porte
+ * @fec: Bitmask of supported/configured FEC modes
+ * @rsvd: Reserved for future extensions. i.e FEC bypass feature.
+ *
+ * Drivers should reject a non-zero setting of @autoneg when
+ * autoneogotiation is disabled (or not supported) for the link.
+ *
+ */
+struct ethtool_fecparam {
+	uint32_t   cmd;
+	/* bitmask of FEC modes */
+	uint32_t   active_fec;
+	uint32_t   fec;
+	uint32_t   reserved;
+};
+
+/**
+ * enum ethtool_fec_config_bits - flags definition of ethtool_fec_configuration
+ * @ETHTOOL_FEC_NONE: FEC mode configuration is not supported
+ * @ETHTOOL_FEC_AUTO: Default/Best FEC mode provided by driver
+ * @ETHTOOL_FEC_OFF: No FEC Mode
+ * @ETHTOOL_FEC_RS: Reed-Solomon Forward Error Detection mode
+ * @ETHTOOL_FEC_BASER: Base-R/Reed-Solomon Forward Error Detection mode
+ */
+enum ethtool_fec_config_bits {
+	ETHTOOL_FEC_NONE_BIT,
+	ETHTOOL_FEC_AUTO_BIT,
+	ETHTOOL_FEC_OFF_BIT,
+	ETHTOOL_FEC_RS_BIT,
+	ETHTOOL_FEC_BASER_BIT,
+};
+
+#define ETHTOOL_FEC_NONE		(1 << ETHTOOL_FEC_NONE_BIT)
+#define ETHTOOL_FEC_AUTO		(1 << ETHTOOL_FEC_AUTO_BIT)
+#define ETHTOOL_FEC_OFF			(1 << ETHTOOL_FEC_OFF_BIT)
+#define ETHTOOL_FEC_RS			(1 << ETHTOOL_FEC_RS_BIT)
+#define ETHTOOL_FEC_BASER		(1 << ETHTOOL_FEC_BASER_BIT)
+
+/* CMDs currently supported */
+#define ETHTOOL_GSET		0x00000001 /* DEPRECATED, Get settings.
+					    * Please use ETHTOOL_GLINKSETTINGS
+					    */
+#define ETHTOOL_SSET		0x00000002 /* DEPRECATED, Set settings.
+					    * Please use ETHTOOL_SLINKSETTINGS
+					    */
+#define ETHTOOL_GDRVINFO	0x00000003 /* Get driver info. */
+#define ETHTOOL_GREGS		0x00000004 /* Get NIC registers. */
+#define ETHTOOL_GWOL		0x00000005 /* Get wake-on-lan options. */
+#define ETHTOOL_SWOL		0x00000006 /* Set wake-on-lan options. */
+#define ETHTOOL_GMSGLVL		0x00000007 /* Get driver message level */
+#define ETHTOOL_SMSGLVL		0x00000008 /* Set driver msg level. */
+#define ETHTOOL_NWAY_RST	0x00000009 /* Restart autonegotiation. */
+/* Get link status for host, i.e. whether the interface *and* the
+ * physical port (if there is one) are up (ethtool_value). */
+#define ETHTOOL_GLINK		0x0000000a
+#define ETHTOOL_GEEPROM		0x0000000b /* Get EEPROM data */
+#define ETHTOOL_SEEPROM		0x0000000c /* Set EEPROM data. */
+#define ETHTOOL_GCOALESCE	0x0000000e /* Get coalesce config */
+#define ETHTOOL_SCOALESCE	0x0000000f /* Set coalesce config. */
+#define ETHTOOL_GRINGPARAM	0x00000010 /* Get ring parameters */
+#define ETHTOOL_SRINGPARAM	0x00000011 /* Set ring parameters. */
+#define ETHTOOL_GPAUSEPARAM	0x00000012 /* Get pause parameters */
+#define ETHTOOL_SPAUSEPARAM	0x00000013 /* Set pause parameters. */
+#define ETHTOOL_GRXCSUM		0x00000014 /* Get RX hw csum enable (ethtool_value) */
+#define ETHTOOL_SRXCSUM		0x00000015 /* Set RX hw csum enable (ethtool_value) */
+#define ETHTOOL_GTXCSUM		0x00000016 /* Get TX hw csum enable (ethtool_value) */
+#define ETHTOOL_STXCSUM		0x00000017 /* Set TX hw csum enable (ethtool_value) */
+#define ETHTOOL_GSG		0x00000018 /* Get scatter-gather enable
+					    * (ethtool_value) */
+#define ETHTOOL_SSG		0x00000019 /* Set scatter-gather enable
+					    * (ethtool_value). */
+#define ETHTOOL_TEST		0x0000001a /* execute NIC self-test. */
+#define ETHTOOL_GSTRINGS	0x0000001b /* get specified string set */
+#define ETHTOOL_PHYS_ID		0x0000001c /* identify the NIC */
+#define ETHTOOL_GSTATS		0x0000001d /* get NIC-specific statistics */
+#define ETHTOOL_GTSO		0x0000001e /* Get TSO enable (ethtool_value) */
+#define ETHTOOL_STSO		0x0000001f /* Set TSO enable (ethtool_value) */
+#define ETHTOOL_GPERMADDR	0x00000020 /* Get permanent hardware address */
+#define ETHTOOL_GUFO		0x00000021 /* Get UFO enable (ethtool_value) */
+#define ETHTOOL_SUFO		0x00000022 /* Set UFO enable (ethtool_value) */
+#define ETHTOOL_GGSO		0x00000023 /* Get GSO enable (ethtool_value) */
+#define ETHTOOL_SGSO		0x00000024 /* Set GSO enable (ethtool_value) */
+#define ETHTOOL_GFLAGS		0x00000025 /* Get flags bitmap(ethtool_value) */
+#define ETHTOOL_SFLAGS		0x00000026 /* Set flags bitmap(ethtool_value) */
+#define ETHTOOL_GPFLAGS		0x00000027 /* Get driver-private flags bitmap */
+#define ETHTOOL_SPFLAGS		0x00000028 /* Set driver-private flags bitmap */
+
+#define ETHTOOL_GRXFH		0x00000029 /* Get RX flow hash configuration */
+#define ETHTOOL_SRXFH		0x0000002a /* Set RX flow hash configuration */
+#define ETHTOOL_GGRO		0x0000002b /* Get GRO enable (ethtool_value) */
+#define ETHTOOL_SGRO		0x0000002c /* Set GRO enable (ethtool_value) */
+#define ETHTOOL_GRXRINGS	0x0000002d /* Get RX rings available for LB */
+#define ETHTOOL_GRXCLSRLCNT	0x0000002e /* Get RX class rule count */
+#define ETHTOOL_GRXCLSRULE	0x0000002f /* Get RX classification rule */
+#define ETHTOOL_GRXCLSRLALL	0x00000030 /* Get all RX classification rule */
+#define ETHTOOL_SRXCLSRLDEL	0x00000031 /* Delete RX classification rule */
+#define ETHTOOL_SRXCLSRLINS	0x00000032 /* Insert RX classification rule */
+#define ETHTOOL_FLASHDEV	0x00000033 /* Flash firmware to device */
+#define ETHTOOL_RESET		0x00000034 /* Reset hardware */
+#define ETHTOOL_SRXNTUPLE	0x00000035 /* Add an n-tuple filter to device */
+#define ETHTOOL_GRXNTUPLE	0x00000036 /* deprecated */
+#define ETHTOOL_GSSET_INFO	0x00000037 /* Get string set info */
+#define ETHTOOL_GRXFHINDIR	0x00000038 /* Get RX flow hash indir'n table */
+#define ETHTOOL_SRXFHINDIR	0x00000039 /* Set RX flow hash indir'n table */
+
+#define ETHTOOL_GFEATURES	0x0000003a /* Get device offload settings */
+#define ETHTOOL_SFEATURES	0x0000003b /* Change device offload settings */
+#define ETHTOOL_GCHANNELS	0x0000003c /* Get no of channels */
+#define ETHTOOL_SCHANNELS	0x0000003d /* Set no of channels */
+#define ETHTOOL_SET_DUMP	0x0000003e /* Set dump settings */
+#define ETHTOOL_GET_DUMP_FLAG	0x0000003f /* Get dump settings */
+#define ETHTOOL_GET_DUMP_DATA	0x00000040 /* Get dump data */
+#define ETHTOOL_GET_TS_INFO	0x00000041 /* Get time stamping and PHC info */
+#define ETHTOOL_GMODULEINFO	0x00000042 /* Get plug-in module information */
+#define ETHTOOL_GMODULEEEPROM	0x00000043 /* Get plug-in module eeprom */
+#define ETHTOOL_GEEE		0x00000044 /* Get EEE settings */
+#define ETHTOOL_SEEE		0x00000045 /* Set EEE settings */
+
+#define ETHTOOL_GRSSH		0x00000046 /* Get RX flow hash configuration */
+#define ETHTOOL_SRSSH		0x00000047 /* Set RX flow hash configuration */
+#define ETHTOOL_GTUNABLE	0x00000048 /* Get tunable configuration */
+#define ETHTOOL_STUNABLE	0x00000049 /* Set tunable configuration */
+#define ETHTOOL_GPHYSTATS	0x0000004a /* get PHY-specific statistics */
+
+#define ETHTOOL_PERQUEUE	0x0000004b /* Set per queue options */
+
+#define ETHTOOL_GLINKSETTINGS	0x0000004c /* Get ethtool_link_settings */
+#define ETHTOOL_SLINKSETTINGS	0x0000004d /* Set ethtool_link_settings */
+#define ETHTOOL_PHY_GTUNABLE	0x0000004e /* Get PHY tunable configuration */
+#define ETHTOOL_PHY_STUNABLE	0x0000004f /* Set PHY tunable configuration */
+#define ETHTOOL_GFECPARAM	0x00000050 /* Get FEC settings */
+#define ETHTOOL_SFECPARAM	0x00000051 /* Set FEC settings */
+
+/* compatibility with older code */
+#define SPARC_ETH_GSET		ETHTOOL_GSET
+#define SPARC_ETH_SSET		ETHTOOL_SSET
+
+/* Link mode bit indices */
+enum ethtool_link_mode_bit_indices {
+	ETHTOOL_LINK_MODE_10baseT_Half_BIT	= 0,
+	ETHTOOL_LINK_MODE_10baseT_Full_BIT	= 1,
+	ETHTOOL_LINK_MODE_100baseT_Half_BIT	= 2,
+	ETHTOOL_LINK_MODE_100baseT_Full_BIT	= 3,
+	ETHTOOL_LINK_MODE_1000baseT_Half_BIT	= 4,
+	ETHTOOL_LINK_MODE_1000baseT_Full_BIT	= 5,
+	ETHTOOL_LINK_MODE_Autoneg_BIT		= 6,
+	ETHTOOL_LINK_MODE_TP_BIT		= 7,
+	ETHTOOL_LINK_MODE_AUI_BIT		= 8,
+	ETHTOOL_LINK_MODE_MII_BIT		= 9,
+	ETHTOOL_LINK_MODE_FIBRE_BIT		= 10,
+	ETHTOOL_LINK_MODE_BNC_BIT		= 11,
+	ETHTOOL_LINK_MODE_10000baseT_Full_BIT	= 12,
+	ETHTOOL_LINK_MODE_Pause_BIT		= 13,
+	ETHTOOL_LINK_MODE_Asym_Pause_BIT	= 14,
+	ETHTOOL_LINK_MODE_2500baseX_Full_BIT	= 15,
+	ETHTOOL_LINK_MODE_Backplane_BIT		= 16,
+	ETHTOOL_LINK_MODE_1000baseKX_Full_BIT	= 17,
+	ETHTOOL_LINK_MODE_10000baseKX4_Full_BIT	= 18,
+	ETHTOOL_LINK_MODE_10000baseKR_Full_BIT	= 19,
+	ETHTOOL_LINK_MODE_10000baseR_FEC_BIT	= 20,
+	ETHTOOL_LINK_MODE_20000baseMLD2_Full_BIT = 21,
+	ETHTOOL_LINK_MODE_20000baseKR2_Full_BIT	= 22,
+	ETHTOOL_LINK_MODE_40000baseKR4_Full_BIT	= 23,
+	ETHTOOL_LINK_MODE_40000baseCR4_Full_BIT	= 24,
+	ETHTOOL_LINK_MODE_40000baseSR4_Full_BIT	= 25,
+	ETHTOOL_LINK_MODE_40000baseLR4_Full_BIT	= 26,
+	ETHTOOL_LINK_MODE_56000baseKR4_Full_BIT	= 27,
+	ETHTOOL_LINK_MODE_56000baseCR4_Full_BIT	= 28,
+	ETHTOOL_LINK_MODE_56000baseSR4_Full_BIT	= 29,
+	ETHTOOL_LINK_MODE_56000baseLR4_Full_BIT	= 30,
+	ETHTOOL_LINK_MODE_25000baseCR_Full_BIT	= 31,
+	ETHTOOL_LINK_MODE_25000baseKR_Full_BIT	= 32,
+	ETHTOOL_LINK_MODE_25000baseSR_Full_BIT	= 33,
+	ETHTOOL_LINK_MODE_50000baseCR2_Full_BIT	= 34,
+	ETHTOOL_LINK_MODE_50000baseKR2_Full_BIT	= 35,
+	ETHTOOL_LINK_MODE_100000baseKR4_Full_BIT	= 36,
+	ETHTOOL_LINK_MODE_100000baseSR4_Full_BIT	= 37,
+	ETHTOOL_LINK_MODE_100000baseCR4_Full_BIT	= 38,
+	ETHTOOL_LINK_MODE_100000baseLR4_ER4_Full_BIT	= 39,
+	ETHTOOL_LINK_MODE_50000baseSR2_Full_BIT		= 40,
+	ETHTOOL_LINK_MODE_1000baseX_Full_BIT	= 41,
+	ETHTOOL_LINK_MODE_10000baseCR_Full_BIT	= 42,
+	ETHTOOL_LINK_MODE_10000baseSR_Full_BIT	= 43,
+	ETHTOOL_LINK_MODE_10000baseLR_Full_BIT	= 44,
+	ETHTOOL_LINK_MODE_10000baseLRM_Full_BIT	= 45,
+	ETHTOOL_LINK_MODE_10000baseER_Full_BIT	= 46,
+	ETHTOOL_LINK_MODE_2500baseT_Full_BIT	= 47,
+	ETHTOOL_LINK_MODE_5000baseT_Full_BIT	= 48,
+
+	ETHTOOL_LINK_MODE_FEC_NONE_BIT	= 49,
+	ETHTOOL_LINK_MODE_FEC_RS_BIT	= 50,
+	ETHTOOL_LINK_MODE_FEC_BASER_BIT	= 51,
+
+	/* Last allowed bit for __ETHTOOL_LINK_MODE_LEGACY_MASK is bit
+	 * 31. Please do NOT define any SUPPORTED_* or ADVERTISED_*
+	 * macro for bits > 31. The only way to use indices > 31 is to
+	 * use the new ETHTOOL_GLINKSETTINGS/ETHTOOL_SLINKSETTINGS API.
+	 */
+
+	__ETHTOOL_LINK_MODE_LAST
+	  = ETHTOOL_LINK_MODE_FEC_BASER_BIT,
+};
+
+#define __ETHTOOL_LINK_MODE_LEGACY_MASK(base_name)	\
+	(1UL << (ETHTOOL_LINK_MODE_ ## base_name ## _BIT))
+
+/* DEPRECATED macros. Please migrate to
+ * ETHTOOL_GLINKSETTINGS/ETHTOOL_SLINKSETTINGS API. Please do NOT
+ * define any new SUPPORTED_* macro for bits > 31.
+ */
+#define SUPPORTED_10baseT_Half		__ETHTOOL_LINK_MODE_LEGACY_MASK(10baseT_Half)
+#define SUPPORTED_10baseT_Full		__ETHTOOL_LINK_MODE_LEGACY_MASK(10baseT_Full)
+#define SUPPORTED_100baseT_Half		__ETHTOOL_LINK_MODE_LEGACY_MASK(100baseT_Half)
+#define SUPPORTED_100baseT_Full		__ETHTOOL_LINK_MODE_LEGACY_MASK(100baseT_Full)
+#define SUPPORTED_1000baseT_Half	__ETHTOOL_LINK_MODE_LEGACY_MASK(1000baseT_Half)
+#define SUPPORTED_1000baseT_Full	__ETHTOOL_LINK_MODE_LEGACY_MASK(1000baseT_Full)
+#define SUPPORTED_Autoneg		__ETHTOOL_LINK_MODE_LEGACY_MASK(Autoneg)
+#define SUPPORTED_TP			__ETHTOOL_LINK_MODE_LEGACY_MASK(TP)
+#define SUPPORTED_AUI			__ETHTOOL_LINK_MODE_LEGACY_MASK(AUI)
+#define SUPPORTED_MII			__ETHTOOL_LINK_MODE_LEGACY_MASK(MII)
+#define SUPPORTED_FIBRE			__ETHTOOL_LINK_MODE_LEGACY_MASK(FIBRE)
+#define SUPPORTED_BNC			__ETHTOOL_LINK_MODE_LEGACY_MASK(BNC)
+#define SUPPORTED_10000baseT_Full	__ETHTOOL_LINK_MODE_LEGACY_MASK(10000baseT_Full)
+#define SUPPORTED_Pause			__ETHTOOL_LINK_MODE_LEGACY_MASK(Pause)
+#define SUPPORTED_Asym_Pause		__ETHTOOL_LINK_MODE_LEGACY_MASK(Asym_Pause)
+#define SUPPORTED_2500baseX_Full	__ETHTOOL_LINK_MODE_LEGACY_MASK(2500baseX_Full)
+#define SUPPORTED_Backplane		__ETHTOOL_LINK_MODE_LEGACY_MASK(Backplane)
+#define SUPPORTED_1000baseKX_Full	__ETHTOOL_LINK_MODE_LEGACY_MASK(1000baseKX_Full)
+#define SUPPORTED_10000baseKX4_Full	__ETHTOOL_LINK_MODE_LEGACY_MASK(10000baseKX4_Full)
+#define SUPPORTED_10000baseKR_Full	__ETHTOOL_LINK_MODE_LEGACY_MASK(10000baseKR_Full)
+#define SUPPORTED_10000baseR_FEC	__ETHTOOL_LINK_MODE_LEGACY_MASK(10000baseR_FEC)
+#define SUPPORTED_20000baseMLD2_Full	__ETHTOOL_LINK_MODE_LEGACY_MASK(20000baseMLD2_Full)
+#define SUPPORTED_20000baseKR2_Full	__ETHTOOL_LINK_MODE_LEGACY_MASK(20000baseKR2_Full)
+#define SUPPORTED_40000baseKR4_Full	__ETHTOOL_LINK_MODE_LEGACY_MASK(40000baseKR4_Full)
+#define SUPPORTED_40000baseCR4_Full	__ETHTOOL_LINK_MODE_LEGACY_MASK(40000baseCR4_Full)
+#define SUPPORTED_40000baseSR4_Full	__ETHTOOL_LINK_MODE_LEGACY_MASK(40000baseSR4_Full)
+#define SUPPORTED_40000baseLR4_Full	__ETHTOOL_LINK_MODE_LEGACY_MASK(40000baseLR4_Full)
+#define SUPPORTED_56000baseKR4_Full	__ETHTOOL_LINK_MODE_LEGACY_MASK(56000baseKR4_Full)
+#define SUPPORTED_56000baseCR4_Full	__ETHTOOL_LINK_MODE_LEGACY_MASK(56000baseCR4_Full)
+#define SUPPORTED_56000baseSR4_Full	__ETHTOOL_LINK_MODE_LEGACY_MASK(56000baseSR4_Full)
+#define SUPPORTED_56000baseLR4_Full	__ETHTOOL_LINK_MODE_LEGACY_MASK(56000baseLR4_Full)
+/* Please do not define any new SUPPORTED_* macro for bits > 31, see
+ * notice above.
+ */
+
+/*
+ * DEPRECATED macros. Please migrate to
+ * ETHTOOL_GLINKSETTINGS/ETHTOOL_SLINKSETTINGS API. Please do NOT
+ * define any new ADERTISE_* macro for bits > 31.
+ */
+#define ADVERTISED_10baseT_Half		__ETHTOOL_LINK_MODE_LEGACY_MASK(10baseT_Half)
+#define ADVERTISED_10baseT_Full		__ETHTOOL_LINK_MODE_LEGACY_MASK(10baseT_Full)
+#define ADVERTISED_100baseT_Half	__ETHTOOL_LINK_MODE_LEGACY_MASK(100baseT_Half)
+#define ADVERTISED_100baseT_Full	__ETHTOOL_LINK_MODE_LEGACY_MASK(100baseT_Full)
+#define ADVERTISED_1000baseT_Half	__ETHTOOL_LINK_MODE_LEGACY_MASK(1000baseT_Half)
+#define ADVERTISED_1000baseT_Full	__ETHTOOL_LINK_MODE_LEGACY_MASK(1000baseT_Full)
+#define ADVERTISED_Autoneg		__ETHTOOL_LINK_MODE_LEGACY_MASK(Autoneg)
+#define ADVERTISED_TP			__ETHTOOL_LINK_MODE_LEGACY_MASK(TP)
+#define ADVERTISED_AUI			__ETHTOOL_LINK_MODE_LEGACY_MASK(AUI)
+#define ADVERTISED_MII			__ETHTOOL_LINK_MODE_LEGACY_MASK(MII)
+#define ADVERTISED_FIBRE		__ETHTOOL_LINK_MODE_LEGACY_MASK(FIBRE)
+#define ADVERTISED_BNC			__ETHTOOL_LINK_MODE_LEGACY_MASK(BNC)
+#define ADVERTISED_10000baseT_Full	__ETHTOOL_LINK_MODE_LEGACY_MASK(10000baseT_Full)
+#define ADVERTISED_Pause		__ETHTOOL_LINK_MODE_LEGACY_MASK(Pause)
+#define ADVERTISED_Asym_Pause		__ETHTOOL_LINK_MODE_LEGACY_MASK(Asym_Pause)
+#define ADVERTISED_2500baseX_Full	__ETHTOOL_LINK_MODE_LEGACY_MASK(2500baseX_Full)
+#define ADVERTISED_Backplane		__ETHTOOL_LINK_MODE_LEGACY_MASK(Backplane)
+#define ADVERTISED_1000baseKX_Full	__ETHTOOL_LINK_MODE_LEGACY_MASK(1000baseKX_Full)
+#define ADVERTISED_10000baseKX4_Full	__ETHTOOL_LINK_MODE_LEGACY_MASK(10000baseKX4_Full)
+#define ADVERTISED_10000baseKR_Full	__ETHTOOL_LINK_MODE_LEGACY_MASK(10000baseKR_Full)
+#define ADVERTISED_10000baseR_FEC	__ETHTOOL_LINK_MODE_LEGACY_MASK(10000baseR_FEC)
+#define ADVERTISED_20000baseMLD2_Full	__ETHTOOL_LINK_MODE_LEGACY_MASK(20000baseMLD2_Full)
+#define ADVERTISED_20000baseKR2_Full	__ETHTOOL_LINK_MODE_LEGACY_MASK(20000baseKR2_Full)
+#define ADVERTISED_40000baseKR4_Full	__ETHTOOL_LINK_MODE_LEGACY_MASK(40000baseKR4_Full)
+#define ADVERTISED_40000baseCR4_Full	__ETHTOOL_LINK_MODE_LEGACY_MASK(40000baseCR4_Full)
+#define ADVERTISED_40000baseSR4_Full	__ETHTOOL_LINK_MODE_LEGACY_MASK(40000baseSR4_Full)
+#define ADVERTISED_40000baseLR4_Full	__ETHTOOL_LINK_MODE_LEGACY_MASK(40000baseLR4_Full)
+#define ADVERTISED_56000baseKR4_Full	__ETHTOOL_LINK_MODE_LEGACY_MASK(56000baseKR4_Full)
+#define ADVERTISED_56000baseCR4_Full	__ETHTOOL_LINK_MODE_LEGACY_MASK(56000baseCR4_Full)
+#define ADVERTISED_56000baseSR4_Full	__ETHTOOL_LINK_MODE_LEGACY_MASK(56000baseSR4_Full)
+#define ADVERTISED_56000baseLR4_Full	__ETHTOOL_LINK_MODE_LEGACY_MASK(56000baseLR4_Full)
+/* Please do not define any new ADVERTISED_* macro for bits > 31, see
+ * notice above.
+ */
+
+/* The following are all involved in forcing a particular link
+ * mode for the device for setting things.  When getting the
+ * devices settings, these indicate the current mode and whether
+ * it was forced up into this mode or autonegotiated.
+ */
+
+/* The forced speed, in units of 1Mb. All values 0 to INT_MAX are legal.
+ * Update drivers/net/phy/phy.c:phy_speed_to_str() and
+ * drivers/net/bonding/bond_3ad.c:__get_link_speed() when adding new values.
+ */
+#define SPEED_10		10
+#define SPEED_100		100
+#define SPEED_1000		1000
+#define SPEED_2500		2500
+#define SPEED_5000		5000
+#define SPEED_10000		10000
+#define SPEED_14000		14000
+#define SPEED_20000		20000
+#define SPEED_25000		25000
+#define SPEED_40000		40000
+#define SPEED_50000		50000
+#define SPEED_56000		56000
+#define SPEED_100000		100000
+
+#define SPEED_UNKNOWN		-1
+
+static inline int ethtool_validate_speed(uint32_t speed)
+{
+	return speed <= INT_MAX || speed == SPEED_UNKNOWN;
+}
+
+/* Duplex, half or full. */
+#define DUPLEX_HALF		0x00
+#define DUPLEX_FULL		0x01
+#define DUPLEX_UNKNOWN		0xff
+
+static inline int ethtool_validate_duplex(uint8_t duplex)
+{
+	switch (duplex) {
+	case DUPLEX_HALF:
+	case DUPLEX_FULL:
+	case DUPLEX_UNKNOWN:
+		return 1;
+	}
+
+	return 0;
+}
+
+/* Which connector port. */
+#define PORT_TP			0x00
+#define PORT_AUI		0x01
+#define PORT_MII		0x02
+#define PORT_FIBRE		0x03
+#define PORT_BNC		0x04
+#define PORT_DA			0x05
+#define PORT_NONE		0xef
+#define PORT_OTHER		0xff
+
+/* Which transceiver to use. */
+#define XCVR_INTERNAL		0x00 /* PHY and MAC are in the same package */
+#define XCVR_EXTERNAL		0x01 /* PHY and MAC are in different packages */
+#define XCVR_DUMMY1		0x02
+#define XCVR_DUMMY2		0x03
+#define XCVR_DUMMY3		0x04
+
+/* Enable or disable autonegotiation. */
+#define AUTONEG_DISABLE		0x00
+#define AUTONEG_ENABLE		0x01
+
+/* MDI or MDI-X status/control - if MDI/MDI_X/AUTO is set then
+ * the driver is required to renegotiate link
+ */
+#define ETH_TP_MDI_INVALID	0x00 /* status: unknown; control: unsupported */
+#define ETH_TP_MDI		0x01 /* status: MDI;     control: force MDI */
+#define ETH_TP_MDI_X		0x02 /* status: MDI-X;   control: force MDI-X */
+#define ETH_TP_MDI_AUTO		0x03 /*                  control: auto-select */
+
+/* Wake-On-Lan options. */
+#define WAKE_PHY		(1 << 0)
+#define WAKE_UCAST		(1 << 1)
+#define WAKE_MCAST		(1 << 2)
+#define WAKE_BCAST		(1 << 3)
+#define WAKE_ARP		(1 << 4)
+#define WAKE_MAGIC		(1 << 5)
+#define WAKE_MAGICSECURE	(1 << 6) /* only meaningful if WAKE_MAGIC */
+
+/* L2-L4 network traffic flow types */
+#define	TCP_V4_FLOW	0x01	/* hash or spec (tcp_ip4_spec) */
+#define	UDP_V4_FLOW	0x02	/* hash or spec (udp_ip4_spec) */
+#define	SCTP_V4_FLOW	0x03	/* hash or spec (sctp_ip4_spec) */
+#define	AH_ESP_V4_FLOW	0x04	/* hash only */
+#define	TCP_V6_FLOW	0x05	/* hash or spec (tcp_ip6_spec; nfc only) */
+#define	UDP_V6_FLOW	0x06	/* hash or spec (udp_ip6_spec; nfc only) */
+#define	SCTP_V6_FLOW	0x07	/* hash or spec (sctp_ip6_spec; nfc only) */
+#define	AH_ESP_V6_FLOW	0x08	/* hash only */
+#define	AH_V4_FLOW	0x09	/* hash or spec (ah_ip4_spec) */
+#define	ESP_V4_FLOW	0x0a	/* hash or spec (esp_ip4_spec) */
+#define	AH_V6_FLOW	0x0b	/* hash or spec (ah_ip6_spec; nfc only) */
+#define	ESP_V6_FLOW	0x0c	/* hash or spec (esp_ip6_spec; nfc only) */
+#define	IPV4_USER_FLOW	0x0d	/* spec only (usr_ip4_spec) */
+#define	IP_USER_FLOW	IPV4_USER_FLOW
+#define	IPV6_USER_FLOW	0x0e	/* spec only (usr_ip6_spec; nfc only) */
+#define	IPV4_FLOW	0x10	/* hash only */
+#define	IPV6_FLOW	0x11	/* hash only */
+#define	ETHER_FLOW	0x12	/* spec only (ether_spec) */
+/* Flag to enable additional fields in struct ethtool_rx_flow_spec */
+#define	FLOW_EXT	0x80000000
+#define	FLOW_MAC_EXT	0x40000000
+
+/* L3-L4 network traffic flow hash options */
+#define	RXH_L2DA	(1 << 1)
+#define	RXH_VLAN	(1 << 2)
+#define	RXH_L3_PROTO	(1 << 3)
+#define	RXH_IP_SRC	(1 << 4)
+#define	RXH_IP_DST	(1 << 5)
+#define	RXH_L4_B_0_1	(1 << 6) /* src port in case of TCP/UDP/SCTP */
+#define	RXH_L4_B_2_3	(1 << 7) /* dst port in case of TCP/UDP/SCTP */
+#define	RXH_DISCARD	(1 << 31)
+
+#define	RX_CLS_FLOW_DISC	0xffffffffffffffffULL
+
+/* Special RX classification rule insert location values */
+#define RX_CLS_LOC_SPECIAL	0x80000000	/* flag */
+#define RX_CLS_LOC_ANY		0xffffffff
+#define RX_CLS_LOC_FIRST	0xfffffffe
+#define RX_CLS_LOC_LAST		0xfffffffd
+
+/* EEPROM Standards for plug in modules */
+#define ETH_MODULE_SFF_8079		0x1
+#define ETH_MODULE_SFF_8079_LEN		256
+#define ETH_MODULE_SFF_8472		0x2
+#define ETH_MODULE_SFF_8472_LEN		512
+#define ETH_MODULE_SFF_8636		0x3
+#define ETH_MODULE_SFF_8636_LEN		256
+#define ETH_MODULE_SFF_8436		0x4
+#define ETH_MODULE_SFF_8436_LEN		256
+
+/* Reset flags */
+/* The reset() operation must clear the flags for the components which
+ * were actually reset.  On successful return, the flags indicate the
+ * components which were not reset, either because they do not exist
+ * in the hardware or because they cannot be reset independently.  The
+ * driver must never reset any components that were not requested.
+ */
+enum ethtool_reset_flags {
+	/* These flags represent components dedicated to the interface
+	 * the command is addressed to.  Shift any flag left by
+	 * ETH_RESET_SHARED_SHIFT to reset a shared component of the
+	 * same type.
+	 */
+	ETH_RESET_MGMT		= 1 << 0,	/* Management processor */
+	ETH_RESET_IRQ		= 1 << 1,	/* Interrupt requester */
+	ETH_RESET_DMA		= 1 << 2,	/* DMA engine */
+	ETH_RESET_FILTER	= 1 << 3,	/* Filtering/flow direction */
+	ETH_RESET_OFFLOAD	= 1 << 4,	/* Protocol offload */
+	ETH_RESET_MAC		= 1 << 5,	/* Media access controller */
+	ETH_RESET_PHY		= 1 << 6,	/* Transceiver/PHY */
+	ETH_RESET_RAM		= 1 << 7,	/* RAM shared between
+						 * multiple components */
+	ETH_RESET_AP		= 1 << 8,	/* Application processor */
+
+	ETH_RESET_DEDICATED	= 0x0000ffff,	/* All components dedicated to
+						 * this interface */
+	ETH_RESET_ALL		= 0xffffffff,	/* All components used by this
+						 * interface, even if shared */
+};
+#define ETH_RESET_SHARED_SHIFT	16
+
+
+/**
+ * struct ethtool_link_settings - link control and status
+ *
+ * IMPORTANT, Backward compatibility notice: When implementing new
+ *	user-space tools, please first try %ETHTOOL_GLINKSETTINGS, and
+ *	if it succeeds use %ETHTOOL_SLINKSETTINGS to change link
+ *	settings; do not use %ETHTOOL_SSET if %ETHTOOL_GLINKSETTINGS
+ *	succeeded: stick to %ETHTOOL_GLINKSETTINGS/%SLINKSETTINGS in
+ *	that case.  Conversely, if %ETHTOOL_GLINKSETTINGS fails, use
+ *	%ETHTOOL_GSET to query and %ETHTOOL_SSET to change link
+ *	settings; do not use %ETHTOOL_SLINKSETTINGS if
+ *	%ETHTOOL_GLINKSETTINGS failed: stick to
+ *	%ETHTOOL_GSET/%ETHTOOL_SSET in that case.
+ *
+ * @cmd: Command number = %ETHTOOL_GLINKSETTINGS or %ETHTOOL_SLINKSETTINGS
+ * @speed: Link speed (Mbps)
+ * @duplex: Duplex mode; one of %DUPLEX_*
+ * @port: Physical connector type; one of %PORT_*
+ * @phy_address: MDIO address of PHY (transceiver); 0 or 255 if not
+ *	applicable.  For clause 45 PHYs this is the PRTAD.
+ * @autoneg: Enable/disable autonegotiation and auto-detection;
+ *	either %AUTONEG_DISABLE or %AUTONEG_ENABLE
+ * @mdio_support: Bitmask of %ETH_MDIO_SUPPORTS_* flags for the MDIO
+ *	protocols supported by the interface; 0 if unknown.
+ *	Read-only.
+ * @eth_tp_mdix: Ethernet twisted-pair MDI(-X) status; one of
+ *	%ETH_TP_MDI_*.  If the status is unknown or not applicable, the
+ *	value will be %ETH_TP_MDI_INVALID.  Read-only.
+ * @eth_tp_mdix_ctrl: Ethernet twisted pair MDI(-X) control; one of
+ *	%ETH_TP_MDI_*.  If MDI(-X) control is not implemented, reads
+ *	yield %ETH_TP_MDI_INVALID and writes may be ignored or rejected.
+ *	When written successfully, the link should be renegotiated if
+ *	necessary.
+ * @link_mode_masks_nwords: Number of 32-bit words for each of the
+ *	supported, advertising, lp_advertising link mode bitmaps. For
+ *	%ETHTOOL_GLINKSETTINGS: on entry, number of words passed by user
+ *	(>= 0); on return, if handshake in progress, negative if
+ *	request size unsupported by kernel: absolute value indicates
+ *	kernel expected size and all the other fields but cmd
+ *	are 0; otherwise (handshake completed), strictly positive
+ *	to indicate size used by kernel and cmd field stays
+ *	%ETHTOOL_GLINKSETTINGS, all other fields populated by driver. For
+ *	%ETHTOOL_SLINKSETTINGS: must be valid on entry, ie. a positive
+ *	value returned previously by %ETHTOOL_GLINKSETTINGS, otherwise
+ *	refused. For drivers: ignore this field (use kernel's
+ *	__ETHTOOL_LINK_MODE_MASK_NBITS instead), any change to it will
+ *	be overwritten by kernel.
+ * @supported: Bitmap with each bit meaning given by
+ *	%ethtool_link_mode_bit_indices for the link modes, physical
+ *	connectors and other link features for which the interface
+ *	supports autonegotiation or auto-detection.  Read-only.
+ * @advertising: Bitmap with each bit meaning given by
+ *	%ethtool_link_mode_bit_indices for the link modes, physical
+ *	connectors and other link features that are advertised through
+ *	autonegotiation or enabled for auto-detection.
+ * @lp_advertising: Bitmap with each bit meaning given by
+ *	%ethtool_link_mode_bit_indices for the link modes, and other
+ *	link features that the link partner advertised through
+ *	autonegotiation; 0 if unknown or not applicable.  Read-only.
+ * @transceiver: Used to distinguish different possible PHY types,
+ *	reported consistently by PHYLIB.  Read-only.
+ *
+ * If autonegotiation is disabled, the speed and @duplex represent the
+ * fixed link mode and are writable if the driver supports multiple
+ * link modes.  If it is enabled then they are read-only; if the link
+ * is up they represent the negotiated link mode; if the link is down,
+ * the speed is 0, %SPEED_UNKNOWN or the highest enabled speed and
+ * @duplex is %DUPLEX_UNKNOWN or the best enabled duplex mode.
+ *
+ * Some hardware interfaces may have multiple PHYs and/or physical
+ * connectors fitted or do not allow the driver to detect which are
+ * fitted.  For these interfaces @port and/or @phy_address may be
+ * writable, possibly dependent on @autoneg being %AUTONEG_DISABLE.
+ * Otherwise, attempts to write different values may be ignored or
+ * rejected.
+ *
+ * Deprecated %ethtool_cmd fields transceiver, maxtxpkt and maxrxpkt
+ * are not available in %ethtool_link_settings. Until all drivers are
+ * converted to ignore them or to the new %ethtool_link_settings API,
+ * for both queries and changes, users should always try
+ * %ETHTOOL_GLINKSETTINGS first, and if it fails with -ENOTSUPP stick
+ * only to %ETHTOOL_GSET and %ETHTOOL_SSET consistently. If it
+ * succeeds, then users should stick to %ETHTOOL_GLINKSETTINGS and
+ * %ETHTOOL_SLINKSETTINGS (which would support drivers implementing
+ * either %ethtool_cmd or %ethtool_link_settings).
+ *
+ * Users should assume that all fields not marked read-only are
+ * writable and subject to validation by the driver.  They should use
+ * %ETHTOOL_GLINKSETTINGS to get the current values before making specific
+ * changes and then applying them with %ETHTOOL_SLINKSETTINGS.
+ *
+ * Drivers that implement %get_link_ksettings and/or
+ * %set_link_ksettings should ignore the @cmd
+ * and @link_mode_masks_nwords fields (any change to them overwritten
+ * by kernel), and rely only on kernel's internal
+ * %__ETHTOOL_LINK_MODE_MASK_NBITS and
+ * %ethtool_link_mode_mask_t. Drivers that implement
+ * %set_link_ksettings() should validate all fields other than @cmd
+ * and @link_mode_masks_nwords that are not described as read-only or
+ * deprecated, and must ignore all fields described as read-only.
+ */
+struct ethtool_link_settings {
+	uint32_t	cmd;
+	uint32_t	speed;
+	uint8_t	duplex;
+	uint8_t	port;
+	uint8_t	phy_address;
+	uint8_t	autoneg;
+	uint8_t	mdio_support;
+	uint8_t	eth_tp_mdix;
+	uint8_t	eth_tp_mdix_ctrl;
+	int8_t	link_mode_masks_nwords;
+	uint8_t	transceiver;
+	uint8_t	reserved1[3];
+	uint32_t	reserved[7];
+	uint32_t	link_mode_masks[0];
+	/* layout of link_mode_masks fields:
+	 * uint32_t map_supported[link_mode_masks_nwords];
+	 * uint32_t map_advertising[link_mode_masks_nwords];
+	 * uint32_t map_lp_advertising[link_mode_masks_nwords];
+	 */
+};
+#endif /* _LINUX_ETHTOOL_H */
diff --git a/include/standard-headers/linux/kernel.h b/include/standard-headers/linux/kernel.h
new file mode 100644
index 0000000..1eeba2e
--- /dev/null
+++ b/include/standard-headers/linux/kernel.h
@@ -0,0 +1,15 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+#ifndef _LINUX_KERNEL_H
+#define _LINUX_KERNEL_H
+
+#include "standard-headers/linux/sysinfo.h"
+
+/*
+ * 'kernel.h' contains some often-used function prototypes etc
+ */
+#define __ALIGN_KERNEL(x, a)		__ALIGN_KERNEL_MASK(x, (typeof(x))(a) - 1)
+#define __ALIGN_KERNEL_MASK(x, mask)	(((x) + (mask)) & ~(mask))
+
+#define __KERNEL_DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))
+
+#endif /* _LINUX_KERNEL_H */
diff --git a/include/standard-headers/linux/sysinfo.h b/include/standard-headers/linux/sysinfo.h
new file mode 100644
index 0000000..e3c06ac
--- /dev/null
+++ b/include/standard-headers/linux/sysinfo.h
@@ -0,0 +1,25 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+#ifndef _LINUX_SYSINFO_H
+#define _LINUX_SYSINFO_H
+
+#include "standard-headers/linux/types.h"
+
+#define SI_LOAD_SHIFT	16
+struct sysinfo {
+	long uptime;		/* Seconds since boot */
+	unsigned long loads[3];	/* 1, 5, and 15 minute load averages */
+	unsigned long totalram;	/* Total usable main memory size */
+	unsigned long freeram;	/* Available memory size */
+	unsigned long sharedram;	/* Amount of shared memory */
+	unsigned long bufferram;	/* Memory used by buffers */
+	unsigned long totalswap;	/* Total swap space size */
+	unsigned long freeswap;	/* swap space still available */
+	uint16_t procs;		   	/* Number of current processes */
+	uint16_t pad;		   	/* Explicit padding for m68k */
+	unsigned long totalhigh;	/* Total high memory size */
+	unsigned long freehigh;	/* Available high memory size */
+	uint32_t mem_unit;			/* Memory unit size in bytes */
+	char _f[20-2*sizeof(unsigned long)-sizeof(uint32_t)];	/* Padding: libc5 uses this.. */
+};
+
+#endif /* _LINUX_SYSINFO_H */
diff --git a/scripts/update-linux-headers.sh b/scripts/update-linux-headers.sh
index be06570..d18e2f1 100755
--- a/scripts/update-linux-headers.sh
+++ b/scripts/update-linux-headers.sh
@@ -39,6 +39,9 @@ cp_portable() {
                                      -e 'input-event-codes' \
                                      -e 'sys/' \
                                      -e 'pvrdma_verbs' \
+                                     -e 'limits' \
+                                     -e 'linux/kernel' \
+                                     -e 'linux/sysinfo' \
                                      > /dev/null
     then
         echo "Unexpected #include in input file $f".
@@ -59,6 +62,10 @@ cp_portable() {
         -e '/sys\/ioctl.h/d' \
         -e 's/SW_MAX/SW_MAX_/' \
         -e 's/atomic_t/int/' \
+        -e 's/__kernel_long_t/long/' \
+        -e 's/__kernel_ulong_t/unsigned long/' \
+        -e 's/struct ethhdr/struct eth_header/' \
+        -e '/\#define _LINUX_ETHTOOL_H/a \\n\#include "net/eth.h"' \
         "$f" > "$to/$header";
 }
 
@@ -146,7 +153,9 @@ rm -rf "$output/include/standard-headers/linux"
 mkdir -p "$output/include/standard-headers/linux"
 for i in "$tmpdir"/include/linux/*virtio*.h "$tmpdir/include/linux/input.h" \
          "$tmpdir/include/linux/input-event-codes.h" \
-         "$tmpdir/include/linux/pci_regs.h"; do
+         "$tmpdir/include/linux/pci_regs.h" \
+         "$tmpdir/include/linux/ethtool.h" "$tmpdir/include/linux/kernel.h" \
+         "$tmpdir/include/linux/sysinfo.h"; do
     cp_portable "$i" "$output/include/standard-headers/linux"
 done
 
-- 
MST


---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply related	[flat|nested] 69+ messages in thread

* [Qemu-devel] [PULL v2 02/50] virtio-net: use 64-bit values for feature flags
  2018-03-20  3:16 [Qemu-devel] [PULL v2 00/50] virtio, vhost, pci, pc: features, cleanups Michael S. Tsirkin
@ 2018-03-20  3:17   ` Michael S. Tsirkin
  2018-03-20  3:17   ` [virtio-dev] " Michael S. Tsirkin
                     ` (49 subsequent siblings)
  50 siblings, 0 replies; 69+ messages in thread
From: Michael S. Tsirkin @ 2018-03-20  3:17 UTC (permalink / raw)
  To: qemu-devel; +Cc: Peter Maydell, Jason Baron, Jason Wang, virtio-dev

From: Jason Baron <jbaron@akamai.com>

In prepartion for using some of the high order feature bits, make sure that
virtio-net uses 64-bit values everywhere.

Signed-off-by: Jason Baron <jbaron@akamai.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: virtio-dev@lists.oasis-open.org
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 include/hw/virtio/virtio-net.h |  2 +-
 hw/net/virtio-net.c            | 55 +++++++++++++++++++++---------------------
 2 files changed, 29 insertions(+), 28 deletions(-)

diff --git a/include/hw/virtio/virtio-net.h b/include/hw/virtio/virtio-net.h
index b81b6a4..e7634c9 100644
--- a/include/hw/virtio/virtio-net.h
+++ b/include/hw/virtio/virtio-net.h
@@ -67,7 +67,7 @@ typedef struct VirtIONet {
     uint32_t has_vnet_hdr;
     size_t host_hdr_len;
     size_t guest_hdr_len;
-    uint32_t host_features;
+    uint64_t host_features;
     uint8_t has_ufo;
     uint32_t mergeable_rx_bufs;
     uint8_t promisc;
diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 188744e..ab06f93 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -48,18 +48,18 @@
     (offsetof(container, field) + sizeof(((container *)0)->field))
 
 typedef struct VirtIOFeature {
-    uint32_t flags;
+    uint64_t flags;
     size_t end;
 } VirtIOFeature;
 
 static VirtIOFeature feature_sizes[] = {
-    {.flags = 1 << VIRTIO_NET_F_MAC,
+    {.flags = 1ULL << VIRTIO_NET_F_MAC,
      .end = endof(struct virtio_net_config, mac)},
-    {.flags = 1 << VIRTIO_NET_F_STATUS,
+    {.flags = 1ULL << VIRTIO_NET_F_STATUS,
      .end = endof(struct virtio_net_config, status)},
-    {.flags = 1 << VIRTIO_NET_F_MQ,
+    {.flags = 1ULL << VIRTIO_NET_F_MQ,
      .end = endof(struct virtio_net_config, max_virtqueue_pairs)},
-    {.flags = 1 << VIRTIO_NET_F_MTU,
+    {.flags = 1ULL << VIRTIO_NET_F_MTU,
      .end = endof(struct virtio_net_config, mtu)},
     {}
 };
@@ -1938,7 +1938,7 @@ static void virtio_net_device_realize(DeviceState *dev, Error **errp)
     int i;
 
     if (n->net_conf.mtu) {
-        n->host_features |= (0x1 << VIRTIO_NET_F_MTU);
+        n->host_features |= (1ULL << VIRTIO_NET_F_MTU);
     }
 
     virtio_net_set_config_size(n, n->host_features);
@@ -2109,45 +2109,46 @@ static const VMStateDescription vmstate_virtio_net = {
 };
 
 static Property virtio_net_properties[] = {
-    DEFINE_PROP_BIT("csum", VirtIONet, host_features, VIRTIO_NET_F_CSUM, true),
-    DEFINE_PROP_BIT("guest_csum", VirtIONet, host_features,
+    DEFINE_PROP_BIT64("csum", VirtIONet, host_features,
+                    VIRTIO_NET_F_CSUM, true),
+    DEFINE_PROP_BIT64("guest_csum", VirtIONet, host_features,
                     VIRTIO_NET_F_GUEST_CSUM, true),
-    DEFINE_PROP_BIT("gso", VirtIONet, host_features, VIRTIO_NET_F_GSO, true),
-    DEFINE_PROP_BIT("guest_tso4", VirtIONet, host_features,
+    DEFINE_PROP_BIT64("gso", VirtIONet, host_features, VIRTIO_NET_F_GSO, true),
+    DEFINE_PROP_BIT64("guest_tso4", VirtIONet, host_features,
                     VIRTIO_NET_F_GUEST_TSO4, true),
-    DEFINE_PROP_BIT("guest_tso6", VirtIONet, host_features,
+    DEFINE_PROP_BIT64("guest_tso6", VirtIONet, host_features,
                     VIRTIO_NET_F_GUEST_TSO6, true),
-    DEFINE_PROP_BIT("guest_ecn", VirtIONet, host_features,
+    DEFINE_PROP_BIT64("guest_ecn", VirtIONet, host_features,
                     VIRTIO_NET_F_GUEST_ECN, true),
-    DEFINE_PROP_BIT("guest_ufo", VirtIONet, host_features,
+    DEFINE_PROP_BIT64("guest_ufo", VirtIONet, host_features,
                     VIRTIO_NET_F_GUEST_UFO, true),
-    DEFINE_PROP_BIT("guest_announce", VirtIONet, host_features,
+    DEFINE_PROP_BIT64("guest_announce", VirtIONet, host_features,
                     VIRTIO_NET_F_GUEST_ANNOUNCE, true),
-    DEFINE_PROP_BIT("host_tso4", VirtIONet, host_features,
+    DEFINE_PROP_BIT64("host_tso4", VirtIONet, host_features,
                     VIRTIO_NET_F_HOST_TSO4, true),
-    DEFINE_PROP_BIT("host_tso6", VirtIONet, host_features,
+    DEFINE_PROP_BIT64("host_tso6", VirtIONet, host_features,
                     VIRTIO_NET_F_HOST_TSO6, true),
-    DEFINE_PROP_BIT("host_ecn", VirtIONet, host_features,
+    DEFINE_PROP_BIT64("host_ecn", VirtIONet, host_features,
                     VIRTIO_NET_F_HOST_ECN, true),
-    DEFINE_PROP_BIT("host_ufo", VirtIONet, host_features,
+    DEFINE_PROP_BIT64("host_ufo", VirtIONet, host_features,
                     VIRTIO_NET_F_HOST_UFO, true),
-    DEFINE_PROP_BIT("mrg_rxbuf", VirtIONet, host_features,
+    DEFINE_PROP_BIT64("mrg_rxbuf", VirtIONet, host_features,
                     VIRTIO_NET_F_MRG_RXBUF, true),
-    DEFINE_PROP_BIT("status", VirtIONet, host_features,
+    DEFINE_PROP_BIT64("status", VirtIONet, host_features,
                     VIRTIO_NET_F_STATUS, true),
-    DEFINE_PROP_BIT("ctrl_vq", VirtIONet, host_features,
+    DEFINE_PROP_BIT64("ctrl_vq", VirtIONet, host_features,
                     VIRTIO_NET_F_CTRL_VQ, true),
-    DEFINE_PROP_BIT("ctrl_rx", VirtIONet, host_features,
+    DEFINE_PROP_BIT64("ctrl_rx", VirtIONet, host_features,
                     VIRTIO_NET_F_CTRL_RX, true),
-    DEFINE_PROP_BIT("ctrl_vlan", VirtIONet, host_features,
+    DEFINE_PROP_BIT64("ctrl_vlan", VirtIONet, host_features,
                     VIRTIO_NET_F_CTRL_VLAN, true),
-    DEFINE_PROP_BIT("ctrl_rx_extra", VirtIONet, host_features,
+    DEFINE_PROP_BIT64("ctrl_rx_extra", VirtIONet, host_features,
                     VIRTIO_NET_F_CTRL_RX_EXTRA, true),
-    DEFINE_PROP_BIT("ctrl_mac_addr", VirtIONet, host_features,
+    DEFINE_PROP_BIT64("ctrl_mac_addr", VirtIONet, host_features,
                     VIRTIO_NET_F_CTRL_MAC_ADDR, true),
-    DEFINE_PROP_BIT("ctrl_guest_offloads", VirtIONet, host_features,
+    DEFINE_PROP_BIT64("ctrl_guest_offloads", VirtIONet, host_features,
                     VIRTIO_NET_F_CTRL_GUEST_OFFLOADS, true),
-    DEFINE_PROP_BIT("mq", VirtIONet, host_features, VIRTIO_NET_F_MQ, false),
+    DEFINE_PROP_BIT64("mq", VirtIONet, host_features, VIRTIO_NET_F_MQ, false),
     DEFINE_NIC_PROPERTIES(VirtIONet, nic_conf),
     DEFINE_PROP_UINT32("x-txtimer", VirtIONet, net_conf.txtimer,
                        TX_TIMER_INTERVAL),
-- 
MST

^ permalink raw reply related	[flat|nested] 69+ messages in thread

* [Qemu-devel] [PULL v2 03/50] virtio-net: add linkspeed and duplex settings to virtio-net
  2018-03-20  3:16 [Qemu-devel] [PULL v2 00/50] virtio, vhost, pci, pc: features, cleanups Michael S. Tsirkin
@ 2018-03-20  3:17   ` Michael S. Tsirkin
  2018-03-20  3:17   ` [virtio-dev] " Michael S. Tsirkin
                     ` (49 subsequent siblings)
  50 siblings, 0 replies; 69+ messages in thread
From: Michael S. Tsirkin @ 2018-03-20  3:17 UTC (permalink / raw)
  To: qemu-devel; +Cc: Peter Maydell, Jason Baron, Jason Wang, virtio-dev

From: Jason Baron <jbaron@akamai.com>

Although linkspeed and duplex can be set in a linux guest via 'ethtool -s',
this requires custom ethtool commands for virtio-net by default.

Introduce a new feature flag, VIRTIO_NET_F_SPEED_DUPLEX, which allows
the hypervisor to export a linkspeed and duplex setting. The user can
subsequently overwrite it later if desired via: 'ethtool -s'.

Linkspeed and duplex settings can be set as:
'-device virtio-net,speed=10000,duplex=full'

where speed is [0...INT_MAX], and duplex is ["half"|"full"].

Signed-off-by: Jason Baron <jbaron@akamai.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: virtio-dev@lists.oasis-open.org
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 include/hw/virtio/virtio-net.h |  3 +++
 hw/net/virtio-net.c            | 26 ++++++++++++++++++++++++++
 2 files changed, 29 insertions(+)

diff --git a/include/hw/virtio/virtio-net.h b/include/hw/virtio/virtio-net.h
index e7634c9..02484dc 100644
--- a/include/hw/virtio/virtio-net.h
+++ b/include/hw/virtio/virtio-net.h
@@ -38,6 +38,9 @@ typedef struct virtio_net_conf
     uint16_t rx_queue_size;
     uint16_t tx_queue_size;
     uint16_t mtu;
+    int32_t speed;
+    char *duplex_str;
+    uint8_t duplex;
 } virtio_net_conf;
 
 /* Maximum packet size we can receive from tap device: header + 64k */
diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index ab06f93..67ad38c 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -26,6 +26,7 @@
 #include "qapi/qapi-events-net.h"
 #include "hw/virtio/virtio-access.h"
 #include "migration/misc.h"
+#include "standard-headers/linux/ethtool.h"
 
 #define VIRTIO_NET_VM_VERSION    11
 
@@ -61,6 +62,8 @@ static VirtIOFeature feature_sizes[] = {
      .end = endof(struct virtio_net_config, max_virtqueue_pairs)},
     {.flags = 1ULL << VIRTIO_NET_F_MTU,
      .end = endof(struct virtio_net_config, mtu)},
+    {.flags = 1ULL << VIRTIO_NET_F_SPEED_DUPLEX,
+     .end = endof(struct virtio_net_config, duplex)},
     {}
 };
 
@@ -89,6 +92,8 @@ static void virtio_net_get_config(VirtIODevice *vdev, uint8_t *config)
     virtio_stw_p(vdev, &netcfg.max_virtqueue_pairs, n->max_queues);
     virtio_stw_p(vdev, &netcfg.mtu, n->net_conf.mtu);
     memcpy(netcfg.mac, n->mac, ETH_ALEN);
+    virtio_stl_p(vdev, &netcfg.speed, n->net_conf.speed);
+    netcfg.duplex = n->net_conf.duplex;
     memcpy(config, &netcfg, n->config_size);
 }
 
@@ -1941,6 +1946,25 @@ static void virtio_net_device_realize(DeviceState *dev, Error **errp)
         n->host_features |= (1ULL << VIRTIO_NET_F_MTU);
     }
 
+    if (n->net_conf.duplex_str) {
+        if (strncmp(n->net_conf.duplex_str, "half", 5) == 0) {
+            n->net_conf.duplex = DUPLEX_HALF;
+        } else if (strncmp(n->net_conf.duplex_str, "full", 5) == 0) {
+            n->net_conf.duplex = DUPLEX_FULL;
+        } else {
+            error_setg(errp, "'duplex' must be 'half' or 'full'");
+        }
+        n->host_features |= (1ULL << VIRTIO_NET_F_SPEED_DUPLEX);
+    } else {
+        n->net_conf.duplex = DUPLEX_UNKNOWN;
+    }
+
+    if (n->net_conf.speed < SPEED_UNKNOWN) {
+        error_setg(errp, "'speed' must be between 0 and INT_MAX");
+    } else if (n->net_conf.speed >= 0) {
+        n->host_features |= (1ULL << VIRTIO_NET_F_SPEED_DUPLEX);
+    }
+
     virtio_net_set_config_size(n, n->host_features);
     virtio_init(vdev, "virtio-net", VIRTIO_ID_NET, n->config_size);
 
@@ -2161,6 +2185,8 @@ static Property virtio_net_properties[] = {
     DEFINE_PROP_UINT16("host_mtu", VirtIONet, net_conf.mtu, 0),
     DEFINE_PROP_BOOL("x-mtu-bypass-backend", VirtIONet, mtu_bypass_backend,
                      true),
+    DEFINE_PROP_INT32("speed", VirtIONet, net_conf.speed, SPEED_UNKNOWN),
+    DEFINE_PROP_STRING("duplex", VirtIONet, net_conf.duplex_str),
     DEFINE_PROP_END_OF_LIST(),
 };
 
-- 
MST

^ permalink raw reply related	[flat|nested] 69+ messages in thread

* [virtio-dev] [PULL v2 03/50] virtio-net: add linkspeed and duplex settings to virtio-net
@ 2018-03-20  3:17   ` Michael S. Tsirkin
  0 siblings, 0 replies; 69+ messages in thread
From: Michael S. Tsirkin @ 2018-03-20  3:17 UTC (permalink / raw)
  To: qemu-devel; +Cc: Peter Maydell, Jason Baron, Jason Wang, virtio-dev

From: Jason Baron <jbaron@akamai.com>

Although linkspeed and duplex can be set in a linux guest via 'ethtool -s',
this requires custom ethtool commands for virtio-net by default.

Introduce a new feature flag, VIRTIO_NET_F_SPEED_DUPLEX, which allows
the hypervisor to export a linkspeed and duplex setting. The user can
subsequently overwrite it later if desired via: 'ethtool -s'.

Linkspeed and duplex settings can be set as:
'-device virtio-net,speed=10000,duplex=full'

where speed is [0...INT_MAX], and duplex is ["half"|"full"].

Signed-off-by: Jason Baron <jbaron@akamai.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: virtio-dev@lists.oasis-open.org
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 include/hw/virtio/virtio-net.h |  3 +++
 hw/net/virtio-net.c            | 26 ++++++++++++++++++++++++++
 2 files changed, 29 insertions(+)

diff --git a/include/hw/virtio/virtio-net.h b/include/hw/virtio/virtio-net.h
index e7634c9..02484dc 100644
--- a/include/hw/virtio/virtio-net.h
+++ b/include/hw/virtio/virtio-net.h
@@ -38,6 +38,9 @@ typedef struct virtio_net_conf
     uint16_t rx_queue_size;
     uint16_t tx_queue_size;
     uint16_t mtu;
+    int32_t speed;
+    char *duplex_str;
+    uint8_t duplex;
 } virtio_net_conf;
 
 /* Maximum packet size we can receive from tap device: header + 64k */
diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index ab06f93..67ad38c 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -26,6 +26,7 @@
 #include "qapi/qapi-events-net.h"
 #include "hw/virtio/virtio-access.h"
 #include "migration/misc.h"
+#include "standard-headers/linux/ethtool.h"
 
 #define VIRTIO_NET_VM_VERSION    11
 
@@ -61,6 +62,8 @@ static VirtIOFeature feature_sizes[] = {
      .end = endof(struct virtio_net_config, max_virtqueue_pairs)},
     {.flags = 1ULL << VIRTIO_NET_F_MTU,
      .end = endof(struct virtio_net_config, mtu)},
+    {.flags = 1ULL << VIRTIO_NET_F_SPEED_DUPLEX,
+     .end = endof(struct virtio_net_config, duplex)},
     {}
 };
 
@@ -89,6 +92,8 @@ static void virtio_net_get_config(VirtIODevice *vdev, uint8_t *config)
     virtio_stw_p(vdev, &netcfg.max_virtqueue_pairs, n->max_queues);
     virtio_stw_p(vdev, &netcfg.mtu, n->net_conf.mtu);
     memcpy(netcfg.mac, n->mac, ETH_ALEN);
+    virtio_stl_p(vdev, &netcfg.speed, n->net_conf.speed);
+    netcfg.duplex = n->net_conf.duplex;
     memcpy(config, &netcfg, n->config_size);
 }
 
@@ -1941,6 +1946,25 @@ static void virtio_net_device_realize(DeviceState *dev, Error **errp)
         n->host_features |= (1ULL << VIRTIO_NET_F_MTU);
     }
 
+    if (n->net_conf.duplex_str) {
+        if (strncmp(n->net_conf.duplex_str, "half", 5) == 0) {
+            n->net_conf.duplex = DUPLEX_HALF;
+        } else if (strncmp(n->net_conf.duplex_str, "full", 5) == 0) {
+            n->net_conf.duplex = DUPLEX_FULL;
+        } else {
+            error_setg(errp, "'duplex' must be 'half' or 'full'");
+        }
+        n->host_features |= (1ULL << VIRTIO_NET_F_SPEED_DUPLEX);
+    } else {
+        n->net_conf.duplex = DUPLEX_UNKNOWN;
+    }
+
+    if (n->net_conf.speed < SPEED_UNKNOWN) {
+        error_setg(errp, "'speed' must be between 0 and INT_MAX");
+    } else if (n->net_conf.speed >= 0) {
+        n->host_features |= (1ULL << VIRTIO_NET_F_SPEED_DUPLEX);
+    }
+
     virtio_net_set_config_size(n, n->host_features);
     virtio_init(vdev, "virtio-net", VIRTIO_ID_NET, n->config_size);
 
@@ -2161,6 +2185,8 @@ static Property virtio_net_properties[] = {
     DEFINE_PROP_UINT16("host_mtu", VirtIONet, net_conf.mtu, 0),
     DEFINE_PROP_BOOL("x-mtu-bypass-backend", VirtIONet, mtu_bypass_backend,
                      true),
+    DEFINE_PROP_INT32("speed", VirtIONet, net_conf.speed, SPEED_UNKNOWN),
+    DEFINE_PROP_STRING("duplex", VirtIONet, net_conf.duplex_str),
     DEFINE_PROP_END_OF_LIST(),
 };
 
-- 
MST


---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply related	[flat|nested] 69+ messages in thread

* [virtio-dev] [PULL v2 02/50] virtio-net: use 64-bit values for feature flags
@ 2018-03-20  3:17   ` Michael S. Tsirkin
  0 siblings, 0 replies; 69+ messages in thread
From: Michael S. Tsirkin @ 2018-03-20  3:17 UTC (permalink / raw)
  To: qemu-devel; +Cc: Peter Maydell, Jason Baron, Jason Wang, virtio-dev

From: Jason Baron <jbaron@akamai.com>

In prepartion for using some of the high order feature bits, make sure that
virtio-net uses 64-bit values everywhere.

Signed-off-by: Jason Baron <jbaron@akamai.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: virtio-dev@lists.oasis-open.org
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 include/hw/virtio/virtio-net.h |  2 +-
 hw/net/virtio-net.c            | 55 +++++++++++++++++++++---------------------
 2 files changed, 29 insertions(+), 28 deletions(-)

diff --git a/include/hw/virtio/virtio-net.h b/include/hw/virtio/virtio-net.h
index b81b6a4..e7634c9 100644
--- a/include/hw/virtio/virtio-net.h
+++ b/include/hw/virtio/virtio-net.h
@@ -67,7 +67,7 @@ typedef struct VirtIONet {
     uint32_t has_vnet_hdr;
     size_t host_hdr_len;
     size_t guest_hdr_len;
-    uint32_t host_features;
+    uint64_t host_features;
     uint8_t has_ufo;
     uint32_t mergeable_rx_bufs;
     uint8_t promisc;
diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 188744e..ab06f93 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -48,18 +48,18 @@
     (offsetof(container, field) + sizeof(((container *)0)->field))
 
 typedef struct VirtIOFeature {
-    uint32_t flags;
+    uint64_t flags;
     size_t end;
 } VirtIOFeature;
 
 static VirtIOFeature feature_sizes[] = {
-    {.flags = 1 << VIRTIO_NET_F_MAC,
+    {.flags = 1ULL << VIRTIO_NET_F_MAC,
      .end = endof(struct virtio_net_config, mac)},
-    {.flags = 1 << VIRTIO_NET_F_STATUS,
+    {.flags = 1ULL << VIRTIO_NET_F_STATUS,
      .end = endof(struct virtio_net_config, status)},
-    {.flags = 1 << VIRTIO_NET_F_MQ,
+    {.flags = 1ULL << VIRTIO_NET_F_MQ,
      .end = endof(struct virtio_net_config, max_virtqueue_pairs)},
-    {.flags = 1 << VIRTIO_NET_F_MTU,
+    {.flags = 1ULL << VIRTIO_NET_F_MTU,
      .end = endof(struct virtio_net_config, mtu)},
     {}
 };
@@ -1938,7 +1938,7 @@ static void virtio_net_device_realize(DeviceState *dev, Error **errp)
     int i;
 
     if (n->net_conf.mtu) {
-        n->host_features |= (0x1 << VIRTIO_NET_F_MTU);
+        n->host_features |= (1ULL << VIRTIO_NET_F_MTU);
     }
 
     virtio_net_set_config_size(n, n->host_features);
@@ -2109,45 +2109,46 @@ static const VMStateDescription vmstate_virtio_net = {
 };
 
 static Property virtio_net_properties[] = {
-    DEFINE_PROP_BIT("csum", VirtIONet, host_features, VIRTIO_NET_F_CSUM, true),
-    DEFINE_PROP_BIT("guest_csum", VirtIONet, host_features,
+    DEFINE_PROP_BIT64("csum", VirtIONet, host_features,
+                    VIRTIO_NET_F_CSUM, true),
+    DEFINE_PROP_BIT64("guest_csum", VirtIONet, host_features,
                     VIRTIO_NET_F_GUEST_CSUM, true),
-    DEFINE_PROP_BIT("gso", VirtIONet, host_features, VIRTIO_NET_F_GSO, true),
-    DEFINE_PROP_BIT("guest_tso4", VirtIONet, host_features,
+    DEFINE_PROP_BIT64("gso", VirtIONet, host_features, VIRTIO_NET_F_GSO, true),
+    DEFINE_PROP_BIT64("guest_tso4", VirtIONet, host_features,
                     VIRTIO_NET_F_GUEST_TSO4, true),
-    DEFINE_PROP_BIT("guest_tso6", VirtIONet, host_features,
+    DEFINE_PROP_BIT64("guest_tso6", VirtIONet, host_features,
                     VIRTIO_NET_F_GUEST_TSO6, true),
-    DEFINE_PROP_BIT("guest_ecn", VirtIONet, host_features,
+    DEFINE_PROP_BIT64("guest_ecn", VirtIONet, host_features,
                     VIRTIO_NET_F_GUEST_ECN, true),
-    DEFINE_PROP_BIT("guest_ufo", VirtIONet, host_features,
+    DEFINE_PROP_BIT64("guest_ufo", VirtIONet, host_features,
                     VIRTIO_NET_F_GUEST_UFO, true),
-    DEFINE_PROP_BIT("guest_announce", VirtIONet, host_features,
+    DEFINE_PROP_BIT64("guest_announce", VirtIONet, host_features,
                     VIRTIO_NET_F_GUEST_ANNOUNCE, true),
-    DEFINE_PROP_BIT("host_tso4", VirtIONet, host_features,
+    DEFINE_PROP_BIT64("host_tso4", VirtIONet, host_features,
                     VIRTIO_NET_F_HOST_TSO4, true),
-    DEFINE_PROP_BIT("host_tso6", VirtIONet, host_features,
+    DEFINE_PROP_BIT64("host_tso6", VirtIONet, host_features,
                     VIRTIO_NET_F_HOST_TSO6, true),
-    DEFINE_PROP_BIT("host_ecn", VirtIONet, host_features,
+    DEFINE_PROP_BIT64("host_ecn", VirtIONet, host_features,
                     VIRTIO_NET_F_HOST_ECN, true),
-    DEFINE_PROP_BIT("host_ufo", VirtIONet, host_features,
+    DEFINE_PROP_BIT64("host_ufo", VirtIONet, host_features,
                     VIRTIO_NET_F_HOST_UFO, true),
-    DEFINE_PROP_BIT("mrg_rxbuf", VirtIONet, host_features,
+    DEFINE_PROP_BIT64("mrg_rxbuf", VirtIONet, host_features,
                     VIRTIO_NET_F_MRG_RXBUF, true),
-    DEFINE_PROP_BIT("status", VirtIONet, host_features,
+    DEFINE_PROP_BIT64("status", VirtIONet, host_features,
                     VIRTIO_NET_F_STATUS, true),
-    DEFINE_PROP_BIT("ctrl_vq", VirtIONet, host_features,
+    DEFINE_PROP_BIT64("ctrl_vq", VirtIONet, host_features,
                     VIRTIO_NET_F_CTRL_VQ, true),
-    DEFINE_PROP_BIT("ctrl_rx", VirtIONet, host_features,
+    DEFINE_PROP_BIT64("ctrl_rx", VirtIONet, host_features,
                     VIRTIO_NET_F_CTRL_RX, true),
-    DEFINE_PROP_BIT("ctrl_vlan", VirtIONet, host_features,
+    DEFINE_PROP_BIT64("ctrl_vlan", VirtIONet, host_features,
                     VIRTIO_NET_F_CTRL_VLAN, true),
-    DEFINE_PROP_BIT("ctrl_rx_extra", VirtIONet, host_features,
+    DEFINE_PROP_BIT64("ctrl_rx_extra", VirtIONet, host_features,
                     VIRTIO_NET_F_CTRL_RX_EXTRA, true),
-    DEFINE_PROP_BIT("ctrl_mac_addr", VirtIONet, host_features,
+    DEFINE_PROP_BIT64("ctrl_mac_addr", VirtIONet, host_features,
                     VIRTIO_NET_F_CTRL_MAC_ADDR, true),
-    DEFINE_PROP_BIT("ctrl_guest_offloads", VirtIONet, host_features,
+    DEFINE_PROP_BIT64("ctrl_guest_offloads", VirtIONet, host_features,
                     VIRTIO_NET_F_CTRL_GUEST_OFFLOADS, true),
-    DEFINE_PROP_BIT("mq", VirtIONet, host_features, VIRTIO_NET_F_MQ, false),
+    DEFINE_PROP_BIT64("mq", VirtIONet, host_features, VIRTIO_NET_F_MQ, false),
     DEFINE_NIC_PROPERTIES(VirtIONet, nic_conf),
     DEFINE_PROP_UINT32("x-txtimer", VirtIONet, net_conf.txtimer,
                        TX_TIMER_INTERVAL),
-- 
MST


---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply related	[flat|nested] 69+ messages in thread

* [Qemu-devel] [PULL v2 04/50] acpi: remove unused acpi-dsdt.aml
  2018-03-20  3:16 [Qemu-devel] [PULL v2 00/50] virtio, vhost, pci, pc: features, cleanups Michael S. Tsirkin
                   ` (2 preceding siblings ...)
  2018-03-20  3:17   ` [virtio-dev] " Michael S. Tsirkin
@ 2018-03-20  3:17 ` Michael S. Tsirkin
  2018-03-20  3:17 ` [Qemu-devel] [PULL v2 05/50] pc: replace pm object initialization with one-liner in acpi_get_pm_info() Michael S. Tsirkin
                   ` (46 subsequent siblings)
  50 siblings, 0 replies; 69+ messages in thread
From: Michael S. Tsirkin @ 2018-03-20  3:17 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Maydell, Igor Mammedov, Gerd Hoffmann, Eric Auger,
	Marc-André Lureau, Daniel P. Berrangé,
	Markus Armbruster, Eric Blake

From: Igor Mammedov <imammedo@redhat.com>

SeaBIOS blob which is currently shipped with QEMU
doesn't need acpi-dsdt.aml nor is able to use it
and code that loaded it in QEMU was removed by
(commit 9fb7aaaf4c "pc: drop external DSDT loading")
in 2013.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Gerd Hoffmann <kraxel@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 Makefile              |   1 -
 pc-bios/acpi-dsdt.aml | Bin 4405 -> 0 bytes
 2 files changed, 1 deletion(-)
 delete mode 100644 pc-bios/acpi-dsdt.aml

diff --git a/Makefile b/Makefile
index c811669..677a54b 100644
--- a/Makefile
+++ b/Makefile
@@ -775,7 +775,6 @@ bepo    cz
 ifdef INSTALL_BLOBS
 BLOBS=bios.bin bios-256k.bin sgabios.bin vgabios.bin vgabios-cirrus.bin \
 vgabios-stdvga.bin vgabios-vmware.bin vgabios-qxl.bin vgabios-virtio.bin \
-acpi-dsdt.aml \
 ppc_rom.bin openbios-sparc32 openbios-sparc64 openbios-ppc QEMU,tcx.bin QEMU,cgthree.bin \
 pxe-e1000.rom pxe-eepro100.rom pxe-ne2k_pci.rom \
 pxe-pcnet.rom pxe-rtl8139.rom pxe-virtio.rom \
diff --git a/pc-bios/acpi-dsdt.aml b/pc-bios/acpi-dsdt.aml
deleted file mode 100644
index 558c10f51ccbbf9ec2f47a4a998a7055059a8963..0000000000000000000000000000000000000000
GIT binary patch
literal 0
HcmV?d00001

literal 4405
zcmb7{O>f)C8OI;KNTx=#Ov$uk$9XY?b_=xIjb0q@_EJP5WkrsxFrpHqP*76NE}&CE
zqzSOzpn&5RO*WTe*G<u*ve0?^5!z#q`3UV-*rJF}QJ-hVlxK_$u;qa>|Kam8znLL9
z<A?s>dJ#bTx_LkF0GjuGY(WhGo!+3koGWcQ9rFPU5B+94((<~g4WH$C9dAv`{m^gT
zZEJrW$A5|A$IoMJl)(Ns&a3@V^7|L@K9JFq{e&^9IOQm8M#H0x!0S}3=w`>a8*i9l
zMGe0XR&=-HYff+FBQoL^UcVI<n+xnWFBU<!u}c6mx@m3g#6Gb#3)?l@pr*I@_{5&;
z#Tgm?=lKNy@f?7`Y?dceyma7Ch?1^<&1QdpC#wIrasZas-`*<LS>@%=&fLZ0Lo8-=
z2|2%0`vJHO7J2;;UQ)-|gCMNeL^TdtX>}ZQ>$N1Pgb_W)N-Ls=$)m@-itRY~WHYgk
zgX+BqrWEW?)FoC3!tE_lT@6}k^@E_hy_E!2ipVPzkypAAJ^BL$Apdw8JA0Oxf|hkN
zXbsXi&~OfL^l_GN27^7ov3&C`59aWhLwfmMtLJY9eLvcCx1(^-fP`A&gqlWQ#LS5&
z_E*O-9LM?DYzmXYSH~mx^T>vO|2H#*DO<8=Sc*kf_+yTS?9DqcX}p{p+4*D-kG#yi
zb|d180Xv{$XK)dCIxy@<o~j1#hl>PNAErQ+8n3KJVco~H$7En{Z-6#s#%p5=&X1+|
z>%skMU4%Dqj4^z*PT^<HPDV28n4R#f8{BTI;^{1=ethujk0=Ux0^Ga?3*DgA)8G>@
zyarVauZe}V<Kx}wZd^0cwM;RGM?dcmJR}qgKaWeEhGmVdw6z2haP%@Q?MLtk^y~o)
zk3PQD^ylV=;pX_@&&QKH#t?&sUZ29JSeA7h*5T1l_HN&uJ1#AsceGfh3=SFYnmfKX
ze-#(NT@&+50P!S?b2^3B<~-vDTWf3I8Q&RTwzap$TO7yo4fv_alm4<B4CYDAc_<p8
z?+N9w#kTgj@ws7H<wNe@GQHb-)pT?+n)o23J)-e_Uzii)!~m=8@Gv_Rrgkn2)8}z;
zg5DcPKhZIcg>jsYb+#sOA%+7j58pBiUkMT(uE)EZc=I=hhhb|Mzl_$me4&!?nw8e>
zrn?k)tz9iS(8fRw?snkyc5t5KZ$5k#v#U?yN%1MgInZJNd^U)+Nr_tgvleDJyAxVO
zPWv%F!Kn)RgVL?v9+vVZzHHF#-SR=yHLN$FWK%oSQ8ZIwpzxryXxg(Ge)CX;b46Zg
zSP;*+ADX6;JTX4^)VU|xo+|Q8P4P9NjA+U|QIaS2hTGy7TG*Z{@=Q$);fbc)6D4`3
zS@1I<Y`Lcir;Oax6rO44QOcYd?wR%=!#z{ejPOi5k5cB$Dx6vFnVM!*PLwj|g2K7L
zJyXsFl@q1RX(^nR!fC0TC}mFkpyCLnoH>Ovr*fi{Ihn%A6i%jcqLexDH;OrNO!%zi
z70$fMiBjgo54$v<w!&$voG4|^MTHap`xqyk&qb9JrOa7SI137ALFGg#b1o^IOA6<b
z%863ubQDfU;dE3^lrkr7#-#ZyDx5`?6Q#^qQaDQrXG!HmDRV9>oXZO5vdW24=5!TK
zSK)M3PLwj|io&^~aIUDFC}qx7g>zNmTva(y%AB|xl-BJ9h4X^SiBjfVQ#jWY&NY=2
zrOdgmaIPzy>nbNone(E;c~RlKsB)r|IX4te+zyF%j(;^bR8EvK=Ou;nlEQgO<wPlS
zURF3SE1Z{APLwj|6@~MP!g)pIL@9G#RXDFIoL5y&l!9~k>_^uO`jgU*EWn+e7WD5_
zEWB0eR-;?pa+f=I@RvWyJ!OYu+`;CiEbnf2?s)wi8uTm00?U7yg&aRY9KcIzV;Q`6
zCiz!mc9@K*KBea2QFnpf=yXS7<8GMt+Vm$6i>qw;%L3#K{9WM*1%OT{c#v2UJ3Z<I
zb<ZtEekX+AQJo#~mL-1Dm{OOxz7U1|P<uHRy}+$`zeDY(*_-FG<L2rIXRk`xt2}!Z
z`$y-TG<((k{_NG^(H^mT=dv^X|43hx(${$U+PU<#_oT0#ruWaM$J5Rarmsus>pXq^
zT>AQZ($|Maw@suE&!;y<`g94=kqD<e-Q4HhET3#QFUFX<icK`TPP;%`LHD{B>@_qz
zV0*#s-WcMfm}eH?9)hk>GJY{)I`G1PBt~VzbmU(20$kE(UXx6W8+$q?xy%b%yZW%q
z{)wleKHuy9jcpE}*<9c)y5YFHS*&<~ODl{%!&5$OWOp*J;^)+h)37m&ChTd<7T}A0
zZU426&7a``5jTMQ$<uue9!|<%ACDd;4|&&Pn6TrAT5quPt5|z&@sb%&VyBmj+Chtt
z+hW5DI+aRg8*mW1l?u2kQL9pg2WMw1+E%*`w$|W**tBCmxpiGQZHeN#C{81NEYv5W
U_=PAMqG*c36NN8|mMC`Mf01wzJpcdz

-- 
MST

^ permalink raw reply related	[flat|nested] 69+ messages in thread

* [Qemu-devel] [PULL v2 05/50] pc: replace pm object initialization with one-liner in acpi_get_pm_info()
  2018-03-20  3:16 [Qemu-devel] [PULL v2 00/50] virtio, vhost, pci, pc: features, cleanups Michael S. Tsirkin
                   ` (3 preceding siblings ...)
  2018-03-20  3:17 ` [Qemu-devel] [PULL v2 04/50] acpi: remove unused acpi-dsdt.aml Michael S. Tsirkin
@ 2018-03-20  3:17 ` Michael S. Tsirkin
  2018-03-20  3:17 ` [Qemu-devel] [PULL v2 06/50] acpi: reuse AcpiGenericAddress instead of Acpi20GenericAddress Michael S. Tsirkin
                   ` (45 subsequent siblings)
  50 siblings, 0 replies; 69+ messages in thread
From: Michael S. Tsirkin @ 2018-03-20  3:17 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Maydell, Igor Mammedov, Eric Auger,
	Philippe Mathieu-Daudé,
	Marcel Apfelbaum, Paolo Bonzini, Richard Henderson,
	Eduardo Habkost

From: Igor Mammedov <imammedo@redhat.com>

next patch will need it before it gets to piix4/lpc branches
that initializes 'obj' now.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Tested-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 hw/i386/acpi-build.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index deb440f..b85fefe 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -128,7 +128,7 @@ static void acpi_get_pm_info(AcpiPmInfo *pm)
 {
     Object *piix = piix4_pm_find();
     Object *lpc = ich9_lpc_find();
-    Object *obj = NULL;
+    Object *obj = piix ? piix : lpc;
     QObject *o;
 
     pm->force_rev1_fadt = false;
@@ -138,7 +138,6 @@ static void acpi_get_pm_info(AcpiPmInfo *pm)
     if (piix) {
         /* w2k requires FADT(rev1) or it won't boot, keep PC compatible */
         pm->force_rev1_fadt = true;
-        obj = piix;
         pm->cpu_hp_io_base = PIIX4_CPU_HOTPLUG_IO_BASE;
         pm->pcihp_io_base =
             object_property_get_uint(obj, ACPI_PCIHP_IO_BASE_PROP, NULL);
@@ -146,7 +145,6 @@ static void acpi_get_pm_info(AcpiPmInfo *pm)
             object_property_get_uint(obj, ACPI_PCIHP_IO_LEN_PROP, NULL);
     }
     if (lpc) {
-        obj = lpc;
         pm->cpu_hp_io_base = ICH9_CPU_HOTPLUG_IO_BASE;
     }
     assert(obj);
-- 
MST

^ permalink raw reply related	[flat|nested] 69+ messages in thread

* [Qemu-devel] [PULL v2 07/50] acpi: add build_append_gas() helper for Generic Address Structure
  2018-03-20  3:16 [Qemu-devel] [PULL v2 00/50] virtio, vhost, pci, pc: features, cleanups Michael S. Tsirkin
                   ` (5 preceding siblings ...)
  2018-03-20  3:17 ` [Qemu-devel] [PULL v2 06/50] acpi: reuse AcpiGenericAddress instead of Acpi20GenericAddress Michael S. Tsirkin
@ 2018-03-20  3:17 ` Michael S. Tsirkin
  2018-03-20  3:17 ` [Qemu-devel] [PULL v2 08/50] acpi: move ACPI_PORT_SMI_CMD define to header it belongs to Michael S. Tsirkin
                   ` (43 subsequent siblings)
  50 siblings, 0 replies; 69+ messages in thread
From: Michael S. Tsirkin @ 2018-03-20  3:17 UTC (permalink / raw)
  To: qemu-devel; +Cc: Peter Maydell, Igor Mammedov, Eric Auger

From: Igor Mammedov <imammedo@redhat.com>

it will help to add Generic Address Structure to ACPI tables
without using packed C structures and avoid endianness
issues as API doesn't need an explicit conversion.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 include/hw/acpi/aml-build.h | 20 ++++++++++++++++++++
 hw/acpi/aml-build.c         | 16 ++++++++++++++++
 2 files changed, 36 insertions(+)

diff --git a/include/hw/acpi/aml-build.h b/include/hw/acpi/aml-build.h
index 88d0738..8692ccc 100644
--- a/include/hw/acpi/aml-build.h
+++ b/include/hw/acpi/aml-build.h
@@ -78,6 +78,15 @@ typedef enum {
 } AmlUpdateRule;
 
 typedef enum {
+    AML_AS_SYSTEM_MEMORY = 0X00,
+    AML_AS_SYSTEM_IO = 0X01,
+    AML_AS_PCI_CONFIG = 0X02,
+    AML_AS_EMBEDDED_CTRL = 0X03,
+    AML_AS_SMBUS = 0X04,
+    AML_AS_FFH = 0X7F,
+} AmlAddressSpace;
+
+typedef enum {
     AML_SYSTEM_MEMORY = 0X00,
     AML_SYSTEM_IO = 0X01,
     AML_PCI_CONFIG = 0X02,
@@ -389,6 +398,17 @@ int
 build_append_named_dword(GArray *array, const char *name_format, ...)
 GCC_FMT_ATTR(2, 3);
 
+void build_append_gas(GArray *table, AmlAddressSpace as,
+                      uint8_t bit_width, uint8_t bit_offset,
+                      uint8_t access_width, uint64_t address);
+
+static inline void
+build_append_gas_from_struct(GArray *table, const struct AcpiGenericAddress *s)
+{
+    build_append_gas(table, s->space_id, s->bit_width, s->bit_offset,
+                     s->access_width, s->address);
+}
+
 void build_srat_memory(AcpiSratMemoryAffinity *numamem, uint64_t base,
                        uint64_t len, int node, MemoryAffinityFlags flags);
 
diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c
index 36a6cc4..3fef5f6 100644
--- a/hw/acpi/aml-build.c
+++ b/hw/acpi/aml-build.c
@@ -258,6 +258,22 @@ static void build_append_int(GArray *table, uint64_t value)
     }
 }
 
+/* Generic Address Structure (GAS)
+ * ACPI 2.0/3.0: 5.2.3.1 Generic Address Structure
+ * 2.0 compat note:
+ *    @access_width must be 0, see ACPI 2.0:Table 5-1
+ */
+void build_append_gas(GArray *table, AmlAddressSpace as,
+                      uint8_t bit_width, uint8_t bit_offset,
+                      uint8_t access_width, uint64_t address)
+{
+    build_append_int_noprefix(table, as, 1);
+    build_append_int_noprefix(table, bit_width, 1);
+    build_append_int_noprefix(table, bit_offset, 1);
+    build_append_int_noprefix(table, access_width, 1);
+    build_append_int_noprefix(table, address, 8);
+}
+
 /*
  * Build NAME(XXXX, 0x00000000) where 0x00000000 is encoded as a dword,
  * and return the offset to 0x00000000 for runtime patching.
-- 
MST

^ permalink raw reply related	[flat|nested] 69+ messages in thread

* [Qemu-devel] [PULL v2 06/50] acpi: reuse AcpiGenericAddress instead of Acpi20GenericAddress
  2018-03-20  3:16 [Qemu-devel] [PULL v2 00/50] virtio, vhost, pci, pc: features, cleanups Michael S. Tsirkin
                   ` (4 preceding siblings ...)
  2018-03-20  3:17 ` [Qemu-devel] [PULL v2 05/50] pc: replace pm object initialization with one-liner in acpi_get_pm_info() Michael S. Tsirkin
@ 2018-03-20  3:17 ` Michael S. Tsirkin
  2018-03-20  3:17 ` [Qemu-devel] [PULL v2 07/50] acpi: add build_append_gas() helper for Generic Address Structure Michael S. Tsirkin
                   ` (44 subsequent siblings)
  50 siblings, 0 replies; 69+ messages in thread
From: Michael S. Tsirkin @ 2018-03-20  3:17 UTC (permalink / raw)
  To: qemu-devel; +Cc: Peter Maydell, Igor Mammedov, Eric Auger

From: Igor Mammedov <imammedo@redhat.com>

Drop duplicate in form of Acpi20GenericAddress and reuse
AcpiGenericAddress.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 include/hw/acpi/acpi-defs.h | 17 +++--------------
 1 file changed, 3 insertions(+), 14 deletions(-)

diff --git a/include/hw/acpi/acpi-defs.h b/include/hw/acpi/acpi-defs.h
index 80c8099..9942bc5 100644
--- a/include/hw/acpi/acpi-defs.h
+++ b/include/hw/acpi/acpi-defs.h
@@ -40,18 +40,6 @@ enum {
     ACPI_FADT_F_LOW_POWER_S0_IDLE_CAPABLE,
 };
 
-/*
- * ACPI 2.0 Generic Address Space definition.
- */
-struct Acpi20GenericAddress {
-    uint8_t  address_space_id;
-    uint8_t  register_bit_width;
-    uint8_t  register_bit_offset;
-    uint8_t  reserved;
-    uint64_t address;
-} QEMU_PACKED;
-typedef struct Acpi20GenericAddress Acpi20GenericAddress;
-
 struct AcpiRsdpDescriptor {        /* Root System Descriptor Pointer */
     uint64_t signature;              /* ACPI signature, contains "RSD PTR " */
     uint8_t  checksum;               /* To make sum of struct == 0 */
@@ -167,7 +155,8 @@ struct AcpiGenericAddress {
     uint8_t space_id;        /* Address space where struct or register exists */
     uint8_t bit_width;       /* Size in bits of given register */
     uint8_t bit_offset;      /* Bit offset within the register */
-    uint8_t access_width;    /* Minimum Access size (ACPI 3.0) */
+    uint8_t access_width;    /* ACPI 3.0: Minimum Access size (ACPI 3.0),
+                                ACPI 2.0: Reserved, Table 5-1 */
     uint64_t address;        /* 64-bit address of struct or register */
 } QEMU_PACKED;
 
@@ -456,7 +445,7 @@ typedef struct AcpiGenericTimerTable AcpiGenericTimerTable;
 struct Acpi20Hpet {
     ACPI_TABLE_HEADER_DEF                    /* ACPI common table header */
     uint32_t           timer_block_id;
-    Acpi20GenericAddress addr;
+    struct AcpiGenericAddress addr;
     uint8_t            hpet_number;
     uint16_t           min_tick;
     uint8_t            page_protect;
-- 
MST

^ permalink raw reply related	[flat|nested] 69+ messages in thread

* [Qemu-devel] [PULL v2 08/50] acpi: move ACPI_PORT_SMI_CMD define to header it belongs to
  2018-03-20  3:16 [Qemu-devel] [PULL v2 00/50] virtio, vhost, pci, pc: features, cleanups Michael S. Tsirkin
                   ` (6 preceding siblings ...)
  2018-03-20  3:17 ` [Qemu-devel] [PULL v2 07/50] acpi: add build_append_gas() helper for Generic Address Structure Michael S. Tsirkin
@ 2018-03-20  3:17 ` Michael S. Tsirkin
  2018-03-20  3:17 ` [Qemu-devel] [PULL v2 09/50] pc: acpi: isolate FADT specific data into AcpiFadtData structure Michael S. Tsirkin
                   ` (42 subsequent siblings)
  50 siblings, 0 replies; 69+ messages in thread
From: Michael S. Tsirkin @ 2018-03-20  3:17 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Maydell, Igor Mammedov, Eric Auger, Marcel Apfelbaum,
	Paolo Bonzini, Richard Henderson, Eduardo Habkost

From: Igor Mammedov <imammedo@redhat.com>

ACPI_PORT_SMI_CMD is alias for APM_CNT_IOPORT,
so make it really one instead of duplicating its value.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 include/hw/isa/apm.h | 3 +++
 hw/i386/acpi-build.c | 2 --
 hw/isa/apm.c         | 1 -
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/include/hw/isa/apm.h b/include/hw/isa/apm.h
index 4839ff1..b7098bf 100644
--- a/include/hw/isa/apm.h
+++ b/include/hw/isa/apm.h
@@ -5,6 +5,9 @@
 #include "hw/hw.h"
 #include "exec/memory.h"
 
+#define APM_CNT_IOPORT  0xb2
+#define ACPI_PORT_SMI_CMD APM_CNT_IOPORT
+
 typedef void (*apm_ctrl_changed_t)(uint32_t val, void *arg);
 
 typedef struct APMState {
diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index b85fefe..699f3a0 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -255,8 +255,6 @@ static void acpi_get_pci_holes(Range *hole, Range *hole64)
                                                NULL));
 }
 
-#define ACPI_PORT_SMI_CMD           0x00b2 /* TODO: this is APM_CNT_IOPORT */
-
 static void acpi_align_size(GArray *blob, unsigned align)
 {
     /* Align size to multiple of given size. This reduces the chance
diff --git a/hw/isa/apm.c b/hw/isa/apm.c
index e232b0d..c3101ef 100644
--- a/hw/isa/apm.c
+++ b/hw/isa/apm.c
@@ -34,7 +34,6 @@
 #endif
 
 /* fixed I/O location */
-#define APM_CNT_IOPORT  0xb2
 #define APM_STS_IOPORT  0xb3
 
 static void apm_ioport_writeb(void *opaque, hwaddr addr, uint64_t val,
-- 
MST

^ permalink raw reply related	[flat|nested] 69+ messages in thread

* [Qemu-devel] [PULL v2 09/50] pc: acpi: isolate FADT specific data into AcpiFadtData structure
  2018-03-20  3:16 [Qemu-devel] [PULL v2 00/50] virtio, vhost, pci, pc: features, cleanups Michael S. Tsirkin
                   ` (7 preceding siblings ...)
  2018-03-20  3:17 ` [Qemu-devel] [PULL v2 08/50] acpi: move ACPI_PORT_SMI_CMD define to header it belongs to Michael S. Tsirkin
@ 2018-03-20  3:17 ` Michael S. Tsirkin
  2018-03-20  3:17 ` [Qemu-devel] [PULL v2 10/50] pc: acpi: use build_append_foo() API to construct FADT Michael S. Tsirkin
                   ` (41 subsequent siblings)
  50 siblings, 0 replies; 69+ messages in thread
From: Michael S. Tsirkin @ 2018-03-20  3:17 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Maydell, Igor Mammedov, Eric Auger, Paolo Bonzini,
	Richard Henderson, Eduardo Habkost, Marcel Apfelbaum

From: Igor Mammedov <imammedo@redhat.com>

move FADT data initialization out of fadt_setup() into dedicated
init_fadt_data() that will set common for pc/q35 values in
AcpiFadtData structure and acpi_get_pm_info() will complement
it with pc/q35 specific values initialization.

That will allow to get rid of fadt_setup() and generalize
build_fadt() so it could be easily extended for rev5 and
reused by ARM target.

While at it also move facs/dsdt/xdsdt offsets from build_fadt()
arg list into AcpiFadtData, as they belong to the same dataset.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 include/hw/acpi/acpi-defs.h |  28 +++++++
 hw/i386/acpi-build.c        | 190 ++++++++++++++++++++++++--------------------
 2 files changed, 130 insertions(+), 88 deletions(-)

diff --git a/include/hw/acpi/acpi-defs.h b/include/hw/acpi/acpi-defs.h
index 9942bc5..3fb0ace 100644
--- a/include/hw/acpi/acpi-defs.h
+++ b/include/hw/acpi/acpi-defs.h
@@ -175,6 +175,34 @@ struct AcpiFadtDescriptorRev5_1 {
 
 typedef struct AcpiFadtDescriptorRev5_1 AcpiFadtDescriptorRev5_1;
 
+typedef struct AcpiFadtData {
+    struct AcpiGenericAddress pm1a_cnt;   /* PM1a_CNT_BLK */
+    struct AcpiGenericAddress pm1a_evt;   /* PM1a_EVT_BLK */
+    struct AcpiGenericAddress pm_tmr;    /* PM_TMR_BLK */
+    struct AcpiGenericAddress gpe0_blk;  /* GPE0_BLK */
+    struct AcpiGenericAddress reset_reg; /* RESET_REG */
+    uint8_t reset_val;         /* RESET_VALUE */
+    uint8_t  rev;              /* Revision */
+    uint32_t flags;            /* Flags */
+    uint32_t smi_cmd;          /* SMI_CMD */
+    uint16_t sci_int;          /* SCI_INT */
+    uint8_t  int_model;        /* INT_MODEL */
+    uint8_t  acpi_enable_cmd;  /* ACPI_ENABLE */
+    uint8_t  acpi_disable_cmd; /* ACPI_DISABLE */
+    uint8_t  rtc_century;      /* CENTURY */
+    uint16_t plvl2_lat;        /* P_LVL2_LAT */
+    uint16_t plvl3_lat;        /* P_LVL3_LAT */
+
+    /*
+     * respective tables offsets within ACPI_BUILD_TABLE_FILE,
+     * NULL if table doesn't exist (in that case field's value
+     * won't be patched by linker and will be kept set to 0)
+     */
+    unsigned *facs_tbl_offset; /* FACS offset in */
+    unsigned *dsdt_tbl_offset;
+    unsigned *xdsdt_tbl_offset;
+} AcpiFadtData;
+
 #define ACPI_FADT_ARM_PSCI_COMPLIANT  (1 << 0)
 #define ACPI_FADT_ARM_PSCI_USE_HVC    (1 << 1)
 
diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index 699f3a0..1f88ed1 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -91,17 +91,11 @@ typedef struct AcpiMcfgInfo {
 } AcpiMcfgInfo;
 
 typedef struct AcpiPmInfo {
-    bool force_rev1_fadt;
     bool s3_disabled;
     bool s4_disabled;
     bool pcihp_bridge_en;
     uint8_t s4_val;
-    uint16_t sci_int;
-    uint8_t acpi_enable_cmd;
-    uint8_t acpi_disable_cmd;
-    uint32_t gpe0_blk;
-    uint32_t gpe0_blk_len;
-    uint32_t io_base;
+    AcpiFadtData fadt;
     uint16_t cpu_hp_io_base;
     uint16_t pcihp_io_base;
     uint16_t pcihp_io_len;
@@ -124,20 +118,59 @@ typedef struct AcpiBuildPciBusHotplugState {
     bool pcihp_bridge_en;
 } AcpiBuildPciBusHotplugState;
 
+static void init_common_fadt_data(Object *o, AcpiFadtData *data)
+{
+    uint32_t io = object_property_get_uint(o, ACPI_PM_PROP_PM_IO_BASE, NULL);
+    AmlAddressSpace as = AML_AS_SYSTEM_IO;
+    AcpiFadtData fadt = {
+        .rev = 3,
+        .flags =
+            (1 << ACPI_FADT_F_WBINVD) |
+            (1 << ACPI_FADT_F_PROC_C1) |
+            (1 << ACPI_FADT_F_SLP_BUTTON) |
+            (1 << ACPI_FADT_F_RTC_S4) |
+            (1 << ACPI_FADT_F_USE_PLATFORM_CLOCK) |
+            /* APIC destination mode ("Flat Logical") has an upper limit of 8
+             * CPUs for more than 8 CPUs, "Clustered Logical" mode has to be
+             * used
+             */
+            ((max_cpus > 8) ? (1 << ACPI_FADT_F_FORCE_APIC_CLUSTER_MODEL) : 0),
+        .int_model = 1 /* Multiple APIC */,
+        .rtc_century = RTC_CENTURY,
+        .plvl2_lat = 0xfff /* C2 state not supported */,
+        .plvl3_lat = 0xfff /* C3 state not supported */,
+        .smi_cmd = ACPI_PORT_SMI_CMD,
+        .sci_int = object_property_get_uint(o, ACPI_PM_PROP_SCI_INT, NULL),
+        .acpi_enable_cmd =
+            object_property_get_uint(o, ACPI_PM_PROP_ACPI_ENABLE_CMD, NULL),
+        .acpi_disable_cmd =
+            object_property_get_uint(o, ACPI_PM_PROP_ACPI_DISABLE_CMD, NULL),
+        .pm1a_evt = { .space_id = as, .bit_width = 4 * 8, .address = io },
+        .pm1a_cnt = { .space_id = as, .bit_width = 2 * 8,
+                      .address = io + 0x04 },
+        .pm_tmr = { .space_id = as, .bit_width = 4 * 8, .address = io + 0x08 },
+        .gpe0_blk = { .space_id = as, .bit_width =
+            object_property_get_uint(o, ACPI_PM_PROP_GPE0_BLK_LEN, NULL) * 8,
+            .address = object_property_get_uint(o, ACPI_PM_PROP_GPE0_BLK, NULL)
+        },
+    };
+    *data = fadt;
+}
+
 static void acpi_get_pm_info(AcpiPmInfo *pm)
 {
     Object *piix = piix4_pm_find();
     Object *lpc = ich9_lpc_find();
     Object *obj = piix ? piix : lpc;
     QObject *o;
-
-    pm->force_rev1_fadt = false;
     pm->cpu_hp_io_base = 0;
     pm->pcihp_io_base = 0;
     pm->pcihp_io_len = 0;
+
+    init_common_fadt_data(obj, &pm->fadt);
     if (piix) {
         /* w2k requires FADT(rev1) or it won't boot, keep PC compatible */
-        pm->force_rev1_fadt = true;
+        pm->fadt.rev = 1;
         pm->cpu_hp_io_base = PIIX4_CPU_HOTPLUG_IO_BASE;
         pm->pcihp_io_base =
             object_property_get_uint(obj, ACPI_PCIHP_IO_BASE_PROP, NULL);
@@ -145,10 +178,19 @@ static void acpi_get_pm_info(AcpiPmInfo *pm)
             object_property_get_uint(obj, ACPI_PCIHP_IO_LEN_PROP, NULL);
     }
     if (lpc) {
+        struct AcpiGenericAddress r = { .space_id = AML_AS_SYSTEM_IO,
+            .bit_width = 8, .address = ICH9_RST_CNT_IOPORT };
+        pm->fadt.reset_reg = r;
+        pm->fadt.reset_val = 0xf;
+        pm->fadt.flags |= 1 << ACPI_FADT_F_RESET_REG_SUP;
         pm->cpu_hp_io_base = ICH9_CPU_HOTPLUG_IO_BASE;
     }
     assert(obj);
 
+    /* The above need not be conditional on machine type because the reset port
+     * happens to be the same on PIIX (pc) and ICH9 (q35). */
+    QEMU_BUILD_BUG_ON(ICH9_RST_CNT_IOPORT != RCR_IOPORT);
+
     /* Fill in optional s3/s4 related properties */
     o = object_property_get_qobject(obj, ACPI_PM_PROP_S3_DISABLED, NULL);
     if (o) {
@@ -172,22 +214,6 @@ static void acpi_get_pm_info(AcpiPmInfo *pm)
     }
     qobject_decref(o);
 
-    /* Fill in mandatory properties */
-    pm->sci_int = object_property_get_uint(obj, ACPI_PM_PROP_SCI_INT, NULL);
-
-    pm->acpi_enable_cmd = object_property_get_uint(obj,
-                                                   ACPI_PM_PROP_ACPI_ENABLE_CMD,
-                                                   NULL);
-    pm->acpi_disable_cmd =
-        object_property_get_uint(obj,
-                                 ACPI_PM_PROP_ACPI_DISABLE_CMD,
-                                 NULL);
-    pm->io_base = object_property_get_uint(obj, ACPI_PM_PROP_PM_IO_BASE,
-                                           NULL);
-    pm->gpe0_blk = object_property_get_uint(obj, ACPI_PM_PROP_GPE0_BLK,
-                                            NULL);
-    pm->gpe0_blk_len = object_property_get_uint(obj, ACPI_PM_PROP_GPE0_BLK_LEN,
-                                                NULL);
     pm->pcihp_bridge_en =
         object_property_get_bool(obj, "acpi-pci-hotplug-with-bridge-support",
                                  NULL);
@@ -273,73 +299,53 @@ build_facs(GArray *table_data, BIOSLinker *linker)
 }
 
 /* Load chipset information in FADT */
-static void fadt_setup(AcpiFadtDescriptorRev3 *fadt, AcpiPmInfo *pm)
+static void fadt_setup(AcpiFadtDescriptorRev3 *fadt, AcpiFadtData f)
 {
-    fadt->model = 1;
+    fadt->model = f.int_model;
     fadt->reserved1 = 0;
-    fadt->sci_int = cpu_to_le16(pm->sci_int);
-    fadt->smi_cmd = cpu_to_le32(ACPI_PORT_SMI_CMD);
-    fadt->acpi_enable = pm->acpi_enable_cmd;
-    fadt->acpi_disable = pm->acpi_disable_cmd;
+    fadt->sci_int = cpu_to_le16(f.sci_int);
+    fadt->smi_cmd = cpu_to_le32(f.smi_cmd);
+    fadt->acpi_enable = f.acpi_enable_cmd;
+    fadt->acpi_disable = f.acpi_disable_cmd;
     /* EVT, CNT, TMR offset matches hw/acpi/core.c */
-    fadt->pm1a_evt_blk = cpu_to_le32(pm->io_base);
-    fadt->pm1a_cnt_blk = cpu_to_le32(pm->io_base + 0x04);
-    fadt->pm_tmr_blk = cpu_to_le32(pm->io_base + 0x08);
-    fadt->gpe0_blk = cpu_to_le32(pm->gpe0_blk);
+    fadt->pm1a_evt_blk = cpu_to_le32(f.pm1a_evt.address);
+    fadt->pm1a_cnt_blk = cpu_to_le32(f.pm1a_cnt.address);
+    fadt->pm_tmr_blk = cpu_to_le32(f.pm_tmr.address);
+    fadt->gpe0_blk = cpu_to_le32(f.gpe0_blk.address);
     /* EVT, CNT, TMR length matches hw/acpi/core.c */
-    fadt->pm1_evt_len = 4;
-    fadt->pm1_cnt_len = 2;
-    fadt->pm_tmr_len = 4;
-    fadt->gpe0_blk_len = pm->gpe0_blk_len;
-    fadt->plvl2_lat = cpu_to_le16(0xfff); /* C2 state not supported */
-    fadt->plvl3_lat = cpu_to_le16(0xfff); /* C3 state not supported */
-    fadt->flags = cpu_to_le32((1 << ACPI_FADT_F_WBINVD) |
-                              (1 << ACPI_FADT_F_PROC_C1) |
-                              (1 << ACPI_FADT_F_SLP_BUTTON) |
-                              (1 << ACPI_FADT_F_RTC_S4));
-    fadt->flags |= cpu_to_le32(1 << ACPI_FADT_F_USE_PLATFORM_CLOCK);
-    /* APIC destination mode ("Flat Logical") has an upper limit of 8 CPUs
-     * For more than 8 CPUs, "Clustered Logical" mode has to be used
-     */
-    if (max_cpus > 8) {
-        fadt->flags |= cpu_to_le32(1 << ACPI_FADT_F_FORCE_APIC_CLUSTER_MODEL);
-    }
-    fadt->century = RTC_CENTURY;
-    if (pm->force_rev1_fadt) {
+    fadt->pm1_evt_len = f.pm1a_evt.bit_width / 8;
+    fadt->pm1_cnt_len = f.pm1a_cnt.bit_width / 8;
+    fadt->pm_tmr_len = f.pm_tmr.bit_width / 8;
+    fadt->gpe0_blk_len = f.gpe0_blk.bit_width / 8;
+    fadt->plvl2_lat = cpu_to_le16(f.plvl2_lat);
+    fadt->plvl3_lat = cpu_to_le16(f.plvl3_lat);
+    fadt->flags = cpu_to_le32(f.flags);
+    fadt->century = f.rtc_century;
+    if (f.rev == 1) {
         return;
     }
 
-    fadt->flags |= cpu_to_le32(1 << ACPI_FADT_F_RESET_REG_SUP);
-    fadt->reset_value = 0xf;
-    fadt->reset_register.space_id = AML_SYSTEM_IO;
-    fadt->reset_register.bit_width = 8;
-    fadt->reset_register.address = cpu_to_le64(ICH9_RST_CNT_IOPORT);
-    /* The above need not be conditional on machine type because the reset port
-     * happens to be the same on PIIX (pc) and ICH9 (q35). */
-    QEMU_BUILD_BUG_ON(ICH9_RST_CNT_IOPORT != RCR_IOPORT);
+    fadt->reset_value = f.reset_val;
+    fadt->reset_register = f.reset_reg;
+    fadt->reset_register.address = cpu_to_le64(f.reset_reg.address);
 
-    fadt->xpm1a_event_block.space_id = AML_SYSTEM_IO;
-    fadt->xpm1a_event_block.bit_width = fadt->pm1_evt_len * 8;
-    fadt->xpm1a_event_block.address = cpu_to_le64(pm->io_base);
+    fadt->xpm1a_event_block = f.pm1a_evt;
+    fadt->xpm1a_event_block.address = cpu_to_le64(f.pm1a_evt.address);
 
-    fadt->xpm1a_control_block.space_id = AML_SYSTEM_IO;
-    fadt->xpm1a_control_block.bit_width = fadt->pm1_cnt_len * 8;
-    fadt->xpm1a_control_block.address = cpu_to_le64(pm->io_base + 0x4);
+    fadt->xpm1a_control_block = f.pm1a_cnt;
+    fadt->xpm1a_control_block.address = cpu_to_le64(f.pm1a_cnt.address);
 
-    fadt->xpm_timer_block.space_id = AML_SYSTEM_IO;
-    fadt->xpm_timer_block.bit_width = fadt->pm_tmr_len * 8;
-    fadt->xpm_timer_block.address = cpu_to_le64(pm->io_base + 0x8);
+    fadt->xpm_timer_block = f.pm_tmr;
+    fadt->xpm_timer_block.address = cpu_to_le64(f.pm_tmr.address);
 
-    fadt->xgpe0_block.space_id = AML_SYSTEM_IO;
-    fadt->xgpe0_block.bit_width = pm->gpe0_blk_len * 8;
-    fadt->xgpe0_block.address = cpu_to_le64(pm->gpe0_blk);
+    fadt->xgpe0_block = f.gpe0_blk;
+    fadt->xgpe0_block.address = cpu_to_le64(f.gpe0_blk.address);
 }
 
 
 /* FADT */
 static void
-build_fadt(GArray *table_data, BIOSLinker *linker, AcpiPmInfo *pm,
-           unsigned facs_tbl_offset, unsigned dsdt_tbl_offset,
+build_fadt(GArray *table_data, BIOSLinker *linker, AcpiFadtData *f,
            const char *oem_id, const char *oem_table_id)
 {
     AcpiFadtDescriptorRev3 *fadt = acpi_data_push(table_data, sizeof(*fadt));
@@ -347,29 +353,29 @@ build_fadt(GArray *table_data, BIOSLinker *linker, AcpiPmInfo *pm,
     unsigned dsdt_entry_offset = (char *)&fadt->dsdt - table_data->data;
     unsigned xdsdt_entry_offset = (char *)&fadt->x_dsdt - table_data->data;
     int fadt_size = sizeof(*fadt);
-    int rev = 3;
 
     /* FACS address to be filled by Guest linker */
     bios_linker_loader_add_pointer(linker,
         ACPI_BUILD_TABLE_FILE, fw_ctrl_offset, sizeof(fadt->firmware_ctrl),
-        ACPI_BUILD_TABLE_FILE, facs_tbl_offset);
+        ACPI_BUILD_TABLE_FILE, *f->facs_tbl_offset);
 
     /* DSDT address to be filled by Guest linker */
-    fadt_setup(fadt, pm);
+    fadt_setup(fadt, *f);
     bios_linker_loader_add_pointer(linker,
         ACPI_BUILD_TABLE_FILE, dsdt_entry_offset, sizeof(fadt->dsdt),
-        ACPI_BUILD_TABLE_FILE, dsdt_tbl_offset);
-    if (pm->force_rev1_fadt) {
-        rev = 1;
+        ACPI_BUILD_TABLE_FILE, *f->dsdt_tbl_offset);
+
+    if (f->rev == 1) {
         fadt_size = offsetof(typeof(*fadt), reset_register);
-    } else {
+    } else if (f->xdsdt_tbl_offset) {
         bios_linker_loader_add_pointer(linker,
             ACPI_BUILD_TABLE_FILE, xdsdt_entry_offset, sizeof(fadt->x_dsdt),
-            ACPI_BUILD_TABLE_FILE, dsdt_tbl_offset);
+            ACPI_BUILD_TABLE_FILE, *f->xdsdt_tbl_offset);
     }
 
     build_header(linker, table_data,
-                 (void *)fadt, "FACP", fadt_size, rev, oem_id, oem_table_id);
+                (void *)fadt, "FACP", fadt_size, f->rev,
+                oem_id, oem_table_id);
 }
 
 void pc_madt_cpu_entry(AcpiDeviceIf *adev, int uid,
@@ -2049,7 +2055,12 @@ build_dsdt(GArray *table_data, BIOSLinker *linker,
     aml_append(dev, aml_name_decl("_STA", aml_int(0xB)));
     crs = aml_resource_template();
     aml_append(crs,
-        aml_io(AML_DECODE16, pm->gpe0_blk, pm->gpe0_blk, 1, pm->gpe0_blk_len)
+        aml_io(
+               AML_DECODE16,
+               pm->fadt.gpe0_blk.address,
+               pm->fadt.gpe0_blk.address,
+               1,
+               pm->fadt.gpe0_blk.bit_width / 8)
     );
     aml_append(dev, aml_name_decl("_CRS", crs));
     aml_append(scope, dev);
@@ -2696,7 +2707,10 @@ void acpi_build(AcpiBuildTables *tables, MachineState *machine)
     /* ACPI tables pointed to by RSDT */
     fadt = tables_blob->len;
     acpi_add_table(table_offsets, tables_blob);
-    build_fadt(tables_blob, tables->linker, &pm, facs, dsdt,
+    pm.fadt.facs_tbl_offset = &facs;
+    pm.fadt.dsdt_tbl_offset = &dsdt;
+    pm.fadt.xdsdt_tbl_offset = &dsdt;
+    build_fadt(tables_blob, tables->linker, &pm.fadt,
                slic_oem.id, slic_oem.table_id);
     aml_len += tables_blob->len - fadt;
 
-- 
MST

^ permalink raw reply related	[flat|nested] 69+ messages in thread

* [Qemu-devel] [PULL v2 10/50] pc: acpi: use build_append_foo() API to construct FADT
  2018-03-20  3:16 [Qemu-devel] [PULL v2 00/50] virtio, vhost, pci, pc: features, cleanups Michael S. Tsirkin
                   ` (8 preceding siblings ...)
  2018-03-20  3:17 ` [Qemu-devel] [PULL v2 09/50] pc: acpi: isolate FADT specific data into AcpiFadtData structure Michael S. Tsirkin
@ 2018-03-20  3:17 ` Michael S. Tsirkin
  2018-03-20  3:17 ` [Qemu-devel] [PULL v2 11/50] acpi: move build_fadt() from i386 specific to generic ACPI source Michael S. Tsirkin
                   ` (40 subsequent siblings)
  50 siblings, 0 replies; 69+ messages in thread
From: Michael S. Tsirkin @ 2018-03-20  3:17 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Maydell, Igor Mammedov, Eric Auger, Marcel Apfelbaum,
	Paolo Bonzini, Richard Henderson, Eduardo Habkost

From: Igor Mammedov <imammedo@redhat.com>

build_append_foo() API doesn't need explicit endianness
conversions which eliminates a source of errors and
it makes build_fadt() look like declarative definition of
FADT table in ACPI spec, which makes it easy to review.
Also it allows easily extending FADT to support other
revisions which will be used by follow up patches
where build_fadt() will be reused for ARM target.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 hw/i386/acpi-build.c | 146 +++++++++++++++++++++++++++++----------------------
 1 file changed, 84 insertions(+), 62 deletions(-)

diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index 1f88ed1..d1b387e 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -298,84 +298,106 @@ build_facs(GArray *table_data, BIOSLinker *linker)
     facs->length = cpu_to_le32(sizeof(*facs));
 }
 
-/* Load chipset information in FADT */
-static void fadt_setup(AcpiFadtDescriptorRev3 *fadt, AcpiFadtData f)
-{
-    fadt->model = f.int_model;
-    fadt->reserved1 = 0;
-    fadt->sci_int = cpu_to_le16(f.sci_int);
-    fadt->smi_cmd = cpu_to_le32(f.smi_cmd);
-    fadt->acpi_enable = f.acpi_enable_cmd;
-    fadt->acpi_disable = f.acpi_disable_cmd;
-    /* EVT, CNT, TMR offset matches hw/acpi/core.c */
-    fadt->pm1a_evt_blk = cpu_to_le32(f.pm1a_evt.address);
-    fadt->pm1a_cnt_blk = cpu_to_le32(f.pm1a_cnt.address);
-    fadt->pm_tmr_blk = cpu_to_le32(f.pm_tmr.address);
-    fadt->gpe0_blk = cpu_to_le32(f.gpe0_blk.address);
-    /* EVT, CNT, TMR length matches hw/acpi/core.c */
-    fadt->pm1_evt_len = f.pm1a_evt.bit_width / 8;
-    fadt->pm1_cnt_len = f.pm1a_cnt.bit_width / 8;
-    fadt->pm_tmr_len = f.pm_tmr.bit_width / 8;
-    fadt->gpe0_blk_len = f.gpe0_blk.bit_width / 8;
-    fadt->plvl2_lat = cpu_to_le16(f.plvl2_lat);
-    fadt->plvl3_lat = cpu_to_le16(f.plvl3_lat);
-    fadt->flags = cpu_to_le32(f.flags);
-    fadt->century = f.rtc_century;
-    if (f.rev == 1) {
-        return;
-    }
-
-    fadt->reset_value = f.reset_val;
-    fadt->reset_register = f.reset_reg;
-    fadt->reset_register.address = cpu_to_le64(f.reset_reg.address);
-
-    fadt->xpm1a_event_block = f.pm1a_evt;
-    fadt->xpm1a_event_block.address = cpu_to_le64(f.pm1a_evt.address);
-
-    fadt->xpm1a_control_block = f.pm1a_cnt;
-    fadt->xpm1a_control_block.address = cpu_to_le64(f.pm1a_cnt.address);
-
-    fadt->xpm_timer_block = f.pm_tmr;
-    fadt->xpm_timer_block.address = cpu_to_le64(f.pm_tmr.address);
-
-    fadt->xgpe0_block = f.gpe0_blk;
-    fadt->xgpe0_block.address = cpu_to_le64(f.gpe0_blk.address);
-}
-
-
 /* FADT */
 static void
-build_fadt(GArray *table_data, BIOSLinker *linker, AcpiFadtData *f,
+build_fadt(GArray *tbl, BIOSLinker *linker, AcpiFadtData *f,
            const char *oem_id, const char *oem_table_id)
 {
-    AcpiFadtDescriptorRev3 *fadt = acpi_data_push(table_data, sizeof(*fadt));
-    unsigned fw_ctrl_offset = (char *)&fadt->firmware_ctrl - table_data->data;
-    unsigned dsdt_entry_offset = (char *)&fadt->dsdt - table_data->data;
-    unsigned xdsdt_entry_offset = (char *)&fadt->x_dsdt - table_data->data;
-    int fadt_size = sizeof(*fadt);
+    int off;
+    int fadt_start = tbl->len;
 
-    /* FACS address to be filled by Guest linker */
+    acpi_data_push(tbl, sizeof(AcpiTableHeader));
+
+    /* FACS address to be filled by Guest linker at runtime */
+    off = tbl->len;
+    build_append_int_noprefix(tbl, 0, 4); /* FIRMWARE_CTRL */
     bios_linker_loader_add_pointer(linker,
-        ACPI_BUILD_TABLE_FILE, fw_ctrl_offset, sizeof(fadt->firmware_ctrl),
+        ACPI_BUILD_TABLE_FILE, off, 4,
         ACPI_BUILD_TABLE_FILE, *f->facs_tbl_offset);
 
-    /* DSDT address to be filled by Guest linker */
-    fadt_setup(fadt, *f);
+    /* DSDT address to be filled by Guest linker at runtime */
+    off = tbl->len;
+    build_append_int_noprefix(tbl, 0, 4); /* DSDT */
     bios_linker_loader_add_pointer(linker,
-        ACPI_BUILD_TABLE_FILE, dsdt_entry_offset, sizeof(fadt->dsdt),
+        ACPI_BUILD_TABLE_FILE, off, 4,
         ACPI_BUILD_TABLE_FILE, *f->dsdt_tbl_offset);
 
+    /* ACPI1.0: INT_MODEL, ACPI2.0+: Reserved */
+    build_append_int_noprefix(tbl, f->int_model /* Multiple APIC */, 1);
+    /* Preferred_PM_Profile */
+    build_append_int_noprefix(tbl, 0 /* Unspecified */, 1);
+    build_append_int_noprefix(tbl, f->sci_int, 2); /* SCI_INT */
+    build_append_int_noprefix(tbl, f->smi_cmd, 4); /* SMI_CMD */
+    build_append_int_noprefix(tbl, f->acpi_enable_cmd, 1); /* ACPI_ENABLE */
+    build_append_int_noprefix(tbl, f->acpi_disable_cmd, 1); /* ACPI_DISABLE */
+    build_append_int_noprefix(tbl, 0 /* not supported */, 1); /* S4BIOS_REQ */
+    /* ACPI1.0: Reserved, ACPI2.0+: PSTATE_CNT */
+    build_append_int_noprefix(tbl, 0, 1);
+    build_append_int_noprefix(tbl, f->pm1a_evt.address, 4); /* PM1a_EVT_BLK */
+    build_append_int_noprefix(tbl, 0, 4); /* PM1b_EVT_BLK */
+    build_append_int_noprefix(tbl, f->pm1a_cnt.address, 4); /* PM1a_CNT_BLK */
+    build_append_int_noprefix(tbl, 0, 4); /* PM1b_CNT_BLK */
+    build_append_int_noprefix(tbl, 0, 4); /* PM2_CNT_BLK */
+    build_append_int_noprefix(tbl, f->pm_tmr.address, 4); /* PM_TMR_BLK */
+    build_append_int_noprefix(tbl, f->gpe0_blk.address, 4); /* GPE0_BLK */
+    build_append_int_noprefix(tbl, 0, 4); /* GPE1_BLK */
+    /* PM1_EVT_LEN */
+    build_append_int_noprefix(tbl, f->pm1a_evt.bit_width / 8, 1);
+    /* PM1_CNT_LEN */
+    build_append_int_noprefix(tbl, f->pm1a_cnt.bit_width / 8, 1);
+    build_append_int_noprefix(tbl, 0, 1); /* PM2_CNT_LEN */
+    build_append_int_noprefix(tbl, f->pm_tmr.bit_width / 8, 1); /* PM_TMR_LEN */
+    /* GPE0_BLK_LEN */
+    build_append_int_noprefix(tbl, f->gpe0_blk.bit_width / 8, 1);
+    build_append_int_noprefix(tbl, 0, 1); /* GPE1_BLK_LEN */
+    build_append_int_noprefix(tbl, 0, 1); /* GPE1_BASE */
+    build_append_int_noprefix(tbl, 0, 1); /* CST_CNT */
+    build_append_int_noprefix(tbl, f->plvl2_lat, 2); /* P_LVL2_LAT */
+    build_append_int_noprefix(tbl, f->plvl3_lat, 2); /* P_LVL3_LAT */
+    build_append_int_noprefix(tbl, 0, 2); /* FLUSH_SIZE */
+    build_append_int_noprefix(tbl, 0, 2); /* FLUSH_STRIDE */
+    build_append_int_noprefix(tbl, 0, 1); /* DUTY_OFFSET */
+    build_append_int_noprefix(tbl, 0, 1); /* DUTY_WIDTH */
+    build_append_int_noprefix(tbl, 0, 1); /* DAY_ALRM */
+    build_append_int_noprefix(tbl, 0, 1); /* MON_ALRM */
+    build_append_int_noprefix(tbl, f->rtc_century, 1); /* CENTURY */
+    build_append_int_noprefix(tbl, 0, 2); /* IAPC_BOOT_ARCH */
+    build_append_int_noprefix(tbl, 0, 1); /* Reserved */
+    build_append_int_noprefix(tbl, f->flags, 4); /* Flags */
+
     if (f->rev == 1) {
-        fadt_size = offsetof(typeof(*fadt), reset_register);
-    } else if (f->xdsdt_tbl_offset) {
+        goto build_hdr;
+    }
+
+    build_append_gas_from_struct(tbl, &f->reset_reg); /* RESET_REG */
+    build_append_int_noprefix(tbl, f->reset_val, 1); /* RESET_VALUE */
+    build_append_int_noprefix(tbl, 0, 3); /* Reserved, ACPI 3.0 */
+    build_append_int_noprefix(tbl, 0, 8); /* X_FIRMWARE_CTRL */
+
+    /* XDSDT address to be filled by Guest linker at runtime */
+    off = tbl->len;
+    build_append_int_noprefix(tbl, 0, 8); /* X_DSDT */
+    if (f->xdsdt_tbl_offset) {
         bios_linker_loader_add_pointer(linker,
-            ACPI_BUILD_TABLE_FILE, xdsdt_entry_offset, sizeof(fadt->x_dsdt),
+            ACPI_BUILD_TABLE_FILE, off, 8,
             ACPI_BUILD_TABLE_FILE, *f->xdsdt_tbl_offset);
     }
 
-    build_header(linker, table_data,
-                (void *)fadt, "FACP", fadt_size, f->rev,
-                oem_id, oem_table_id);
+    build_append_gas_from_struct(tbl, &f->pm1a_evt); /* X_PM1a_EVT_BLK */
+    /* X_PM1b_EVT_BLK */
+    build_append_gas(tbl, AML_AS_SYSTEM_MEMORY, 0 , 0, 0, 0);
+    build_append_gas_from_struct(tbl, &f->pm1a_cnt); /* X_PM1a_CNT_BLK */
+    /* X_PM1b_CNT_BLK */
+    build_append_gas(tbl, AML_AS_SYSTEM_MEMORY, 0 , 0, 0, 0);
+    /* X_PM2_CNT_BLK */
+    build_append_gas(tbl, AML_AS_SYSTEM_MEMORY, 0 , 0, 0, 0);
+    build_append_gas_from_struct(tbl, &f->pm_tmr); /* X_PM_TMR_BLK */
+    build_append_gas_from_struct(tbl, &f->gpe0_blk); /* X_GPE0_BLK */
+    build_append_gas(tbl, AML_AS_SYSTEM_MEMORY, 0 , 0, 0, 0); /* X_GPE1_BLK */
+
+build_hdr:
+    build_header(linker, tbl, (void *)(tbl->data + fadt_start),
+                 "FACP", tbl->len - fadt_start, f->rev, oem_id, oem_table_id);
 }
 
 void pc_madt_cpu_entry(AcpiDeviceIf *adev, int uid,
-- 
MST

^ permalink raw reply related	[flat|nested] 69+ messages in thread

* [Qemu-devel] [PULL v2 12/50] virt_arm: acpi: reuse common build_fadt()
  2018-03-20  3:16 [Qemu-devel] [PULL v2 00/50] virtio, vhost, pci, pc: features, cleanups Michael S. Tsirkin
                   ` (10 preceding siblings ...)
  2018-03-20  3:17 ` [Qemu-devel] [PULL v2 11/50] acpi: move build_fadt() from i386 specific to generic ACPI source Michael S. Tsirkin
@ 2018-03-20  3:17 ` Michael S. Tsirkin
  2018-03-20  3:17 ` [Qemu-devel] [PULL v2 13/50] tests: acpi: don't read all fields in test_acpi_fadt_table() Michael S. Tsirkin
                   ` (38 subsequent siblings)
  50 siblings, 0 replies; 69+ messages in thread
From: Michael S. Tsirkin @ 2018-03-20  3:17 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Maydell, Igor Mammedov, Eric Auger, Shannon Zhao, qemu-arm

From: Igor Mammedov <imammedo@redhat.com>

Extend generic build_fadt() to support rev5.1 FADT
and reuse it for 'virt' board, it would allow to
phase out usage of AcpiFadtDescriptorRev5_1 and
later ACPI_FADT_COMMON_DEF.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 include/hw/acpi/acpi-defs.h | 12 ++----------
 hw/acpi/aml-build.c         | 23 +++++++++++++++++++++--
 hw/arm/virt-acpi-build.c    | 33 ++++++++++++---------------------
 3 files changed, 35 insertions(+), 33 deletions(-)

diff --git a/include/hw/acpi/acpi-defs.h b/include/hw/acpi/acpi-defs.h
index 3fb0ace..fe15e95 100644
--- a/include/hw/acpi/acpi-defs.h
+++ b/include/hw/acpi/acpi-defs.h
@@ -165,16 +165,6 @@ struct AcpiFadtDescriptorRev3 {
 } QEMU_PACKED;
 typedef struct AcpiFadtDescriptorRev3 AcpiFadtDescriptorRev3;
 
-struct AcpiFadtDescriptorRev5_1 {
-    ACPI_FADT_COMMON_DEF
-    /* 64-bit Sleep Control register (ACPI 5.0) */
-    struct AcpiGenericAddress sleep_control;
-    /* 64-bit Sleep Status register (ACPI 5.0) */
-    struct AcpiGenericAddress sleep_status;
-} QEMU_PACKED;
-
-typedef struct AcpiFadtDescriptorRev5_1 AcpiFadtDescriptorRev5_1;
-
 typedef struct AcpiFadtData {
     struct AcpiGenericAddress pm1a_cnt;   /* PM1a_CNT_BLK */
     struct AcpiGenericAddress pm1a_evt;   /* PM1a_EVT_BLK */
@@ -192,6 +182,8 @@ typedef struct AcpiFadtData {
     uint8_t  rtc_century;      /* CENTURY */
     uint16_t plvl2_lat;        /* P_LVL2_LAT */
     uint16_t plvl3_lat;        /* P_LVL3_LAT */
+    uint16_t arm_boot_arch;    /* ARM_BOOT_ARCH */
+    uint8_t minor_ver;         /* FADT Minor Version */
 
     /*
      * respective tables offsets within ACPI_BUILD_TABLE_FILE,
diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c
index 8f45298..3fa557c 100644
--- a/hw/acpi/aml-build.c
+++ b/hw/acpi/aml-build.c
@@ -1679,7 +1679,7 @@ void build_slit(GArray *table_data, BIOSLinker *linker)
                  table_data->len - slit_start, 1, NULL, NULL);
 }
 
-/* build rev1/rev3 FADT */
+/* build rev1/rev3/rev5.1 FADT */
 void build_fadt(GArray *tbl, BIOSLinker *linker, const AcpiFadtData *f,
                 const char *oem_id, const char *oem_table_id)
 {
@@ -1755,7 +1755,14 @@ void build_fadt(GArray *tbl, BIOSLinker *linker, const AcpiFadtData *f,
 
     build_append_gas_from_struct(tbl, &f->reset_reg); /* RESET_REG */
     build_append_int_noprefix(tbl, f->reset_val, 1); /* RESET_VALUE */
-    build_append_int_noprefix(tbl, 0, 3); /* Reserved, ACPI 3.0 */
+    /* Since ACPI 5.1 */
+    if ((f->rev >= 6) || ((f->rev == 5) && f->minor_ver > 0)) {
+        build_append_int_noprefix(tbl, f->arm_boot_arch, 2); /* ARM_BOOT_ARCH */
+        /* FADT Minor Version */
+        build_append_int_noprefix(tbl, f->minor_ver, 1);
+    } else {
+        build_append_int_noprefix(tbl, 0, 3); /* Reserved upto ACPI 5.0 */
+    }
     build_append_int_noprefix(tbl, 0, 8); /* X_FIRMWARE_CTRL */
 
     /* XDSDT address to be filled by Guest linker at runtime */
@@ -1779,6 +1786,18 @@ void build_fadt(GArray *tbl, BIOSLinker *linker, const AcpiFadtData *f,
     build_append_gas_from_struct(tbl, &f->gpe0_blk); /* X_GPE0_BLK */
     build_append_gas(tbl, AML_AS_SYSTEM_MEMORY, 0 , 0, 0, 0); /* X_GPE1_BLK */
 
+    if (f->rev <= 4) {
+        goto build_hdr;
+    }
+
+    /* SLEEP_CONTROL_REG */
+    build_append_gas(tbl, AML_AS_SYSTEM_MEMORY, 0 , 0, 0, 0);
+    /* SLEEP_STATUS_REG */
+    build_append_gas(tbl, AML_AS_SYSTEM_MEMORY, 0 , 0, 0, 0);
+
+    /* TODO: extra fields need to be added to support revisions above rev5 */
+    assert(f->rev == 5);
+
 build_hdr:
     build_header(linker, tbl, (void *)(tbl->data + fadt_start),
                  "FACP", tbl->len - fadt_start, f->rev, oem_id, oem_table_id);
diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
index b644da9..c7c6a57 100644
--- a/hw/arm/virt-acpi-build.c
+++ b/hw/arm/virt-acpi-build.c
@@ -654,39 +654,30 @@ build_madt(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
 static void build_fadt_rev5(GArray *table_data, BIOSLinker *linker,
                             VirtMachineState *vms, unsigned dsdt_tbl_offset)
 {
-    int fadt_start = table_data->len;
-    AcpiFadtDescriptorRev5_1 *fadt = acpi_data_push(table_data, sizeof(*fadt));
-    unsigned xdsdt_entry_offset = (char *)&fadt->x_dsdt - table_data->data;
-    uint16_t bootflags;
+    /* ACPI v5.1 */
+    AcpiFadtData fadt = {
+        .rev = 5,
+        .minor_ver = 1,
+        .flags = 1 << ACPI_FADT_F_HW_REDUCED_ACPI,
+        .xdsdt_tbl_offset = &dsdt_tbl_offset,
+    };
 
     switch (vms->psci_conduit) {
     case QEMU_PSCI_CONDUIT_DISABLED:
-        bootflags = 0;
+        fadt.arm_boot_arch = 0;
         break;
     case QEMU_PSCI_CONDUIT_HVC:
-        bootflags = ACPI_FADT_ARM_PSCI_COMPLIANT | ACPI_FADT_ARM_PSCI_USE_HVC;
+        fadt.arm_boot_arch = ACPI_FADT_ARM_PSCI_COMPLIANT |
+                             ACPI_FADT_ARM_PSCI_USE_HVC;
         break;
     case QEMU_PSCI_CONDUIT_SMC:
-        bootflags = ACPI_FADT_ARM_PSCI_COMPLIANT;
+        fadt.arm_boot_arch = ACPI_FADT_ARM_PSCI_COMPLIANT;
         break;
     default:
         g_assert_not_reached();
     }
 
-    /* Hardware Reduced = 1 and use PSCI 0.2+ */
-    fadt->flags = cpu_to_le32(1 << ACPI_FADT_F_HW_REDUCED_ACPI);
-    fadt->arm_boot_flags = cpu_to_le16(bootflags);
-
-    /* ACPI v5.1 (fadt->revision.fadt->minor_revision) */
-    fadt->minor_revision = 0x1;
-
-    /* DSDT address to be filled by Guest linker */
-    bios_linker_loader_add_pointer(linker,
-        ACPI_BUILD_TABLE_FILE, xdsdt_entry_offset, sizeof(fadt->x_dsdt),
-        ACPI_BUILD_TABLE_FILE, dsdt_tbl_offset);
-
-    build_header(linker, table_data, (void *)(table_data->data + fadt_start),
-                 "FACP", table_data->len - fadt_start, 5, NULL, NULL);
+    build_fadt(table_data, linker, &fadt, NULL, NULL);
 }
 
 /* DSDT */
-- 
MST

^ permalink raw reply related	[flat|nested] 69+ messages in thread

* [Qemu-devel] [PULL v2 11/50] acpi: move build_fadt() from i386 specific to generic ACPI source
  2018-03-20  3:16 [Qemu-devel] [PULL v2 00/50] virtio, vhost, pci, pc: features, cleanups Michael S. Tsirkin
                   ` (9 preceding siblings ...)
  2018-03-20  3:17 ` [Qemu-devel] [PULL v2 10/50] pc: acpi: use build_append_foo() API to construct FADT Michael S. Tsirkin
@ 2018-03-20  3:17 ` Michael S. Tsirkin
  2018-03-20  3:17 ` [Qemu-devel] [PULL v2 12/50] virt_arm: acpi: reuse common build_fadt() Michael S. Tsirkin
                   ` (39 subsequent siblings)
  50 siblings, 0 replies; 69+ messages in thread
From: Michael S. Tsirkin @ 2018-03-20  3:17 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Maydell, Igor Mammedov, Eric Auger, Shannon Zhao,
	Marcel Apfelbaum, Paolo Bonzini, Richard Henderson,
	Eduardo Habkost, qemu-arm

From: Igor Mammedov <imammedo@redhat.com>

It will be extended and reused by follow up patch for ARM target.

PS:
Since it's generic function now, don't patch FIRMWARE_CTRL, DSDT
fields if they don't point to tables since platform might not
provide them and use X_ variants instead if applicable.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 include/hw/acpi/aml-build.h |   3 ++
 hw/acpi/aml-build.c         | 105 ++++++++++++++++++++++++++++++++++++++++++++
 hw/arm/virt-acpi-build.c    |   6 +--
 hw/i386/acpi-build.c        | 102 ------------------------------------------
 4 files changed, 111 insertions(+), 105 deletions(-)

diff --git a/include/hw/acpi/aml-build.h b/include/hw/acpi/aml-build.h
index 8692ccc..6c36903 100644
--- a/include/hw/acpi/aml-build.h
+++ b/include/hw/acpi/aml-build.h
@@ -413,4 +413,7 @@ void build_srat_memory(AcpiSratMemoryAffinity *numamem, uint64_t base,
                        uint64_t len, int node, MemoryAffinityFlags flags);
 
 void build_slit(GArray *table_data, BIOSLinker *linker);
+
+void build_fadt(GArray *tbl, BIOSLinker *linker, const AcpiFadtData *f,
+                const char *oem_id, const char *oem_table_id);
 #endif
diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c
index 3fef5f6..8f45298 100644
--- a/hw/acpi/aml-build.c
+++ b/hw/acpi/aml-build.c
@@ -1678,3 +1678,108 @@ void build_slit(GArray *table_data, BIOSLinker *linker)
                  "SLIT",
                  table_data->len - slit_start, 1, NULL, NULL);
 }
+
+/* build rev1/rev3 FADT */
+void build_fadt(GArray *tbl, BIOSLinker *linker, const AcpiFadtData *f,
+                const char *oem_id, const char *oem_table_id)
+{
+    int off;
+    int fadt_start = tbl->len;
+
+    acpi_data_push(tbl, sizeof(AcpiTableHeader));
+
+    /* FACS address to be filled by Guest linker at runtime */
+    off = tbl->len;
+    build_append_int_noprefix(tbl, 0, 4); /* FIRMWARE_CTRL */
+    if (f->facs_tbl_offset) { /* don't patch if not supported by platform */
+        bios_linker_loader_add_pointer(linker,
+            ACPI_BUILD_TABLE_FILE, off, 4,
+            ACPI_BUILD_TABLE_FILE, *f->facs_tbl_offset);
+    }
+
+    /* DSDT address to be filled by Guest linker at runtime */
+    off = tbl->len;
+    build_append_int_noprefix(tbl, 0, 4); /* DSDT */
+    if (f->dsdt_tbl_offset) { /* don't patch if not supported by platform */
+        bios_linker_loader_add_pointer(linker,
+            ACPI_BUILD_TABLE_FILE, off, 4,
+            ACPI_BUILD_TABLE_FILE, *f->dsdt_tbl_offset);
+    }
+
+    /* ACPI1.0: INT_MODEL, ACPI2.0+: Reserved */
+    build_append_int_noprefix(tbl, f->int_model /* Multiple APIC */, 1);
+    /* Preferred_PM_Profile */
+    build_append_int_noprefix(tbl, 0 /* Unspecified */, 1);
+    build_append_int_noprefix(tbl, f->sci_int, 2); /* SCI_INT */
+    build_append_int_noprefix(tbl, f->smi_cmd, 4); /* SMI_CMD */
+    build_append_int_noprefix(tbl, f->acpi_enable_cmd, 1); /* ACPI_ENABLE */
+    build_append_int_noprefix(tbl, f->acpi_disable_cmd, 1); /* ACPI_DISABLE */
+    build_append_int_noprefix(tbl, 0 /* not supported */, 1); /* S4BIOS_REQ */
+    /* ACPI1.0: Reserved, ACPI2.0+: PSTATE_CNT */
+    build_append_int_noprefix(tbl, 0, 1);
+    build_append_int_noprefix(tbl, f->pm1a_evt.address, 4); /* PM1a_EVT_BLK */
+    build_append_int_noprefix(tbl, 0, 4); /* PM1b_EVT_BLK */
+    build_append_int_noprefix(tbl, f->pm1a_cnt.address, 4); /* PM1a_CNT_BLK */
+    build_append_int_noprefix(tbl, 0, 4); /* PM1b_CNT_BLK */
+    build_append_int_noprefix(tbl, 0, 4); /* PM2_CNT_BLK */
+    build_append_int_noprefix(tbl, f->pm_tmr.address, 4); /* PM_TMR_BLK */
+    build_append_int_noprefix(tbl, f->gpe0_blk.address, 4); /* GPE0_BLK */
+    build_append_int_noprefix(tbl, 0, 4); /* GPE1_BLK */
+    /* PM1_EVT_LEN */
+    build_append_int_noprefix(tbl, f->pm1a_evt.bit_width / 8, 1);
+    /* PM1_CNT_LEN */
+    build_append_int_noprefix(tbl, f->pm1a_cnt.bit_width / 8, 1);
+    build_append_int_noprefix(tbl, 0, 1); /* PM2_CNT_LEN */
+    build_append_int_noprefix(tbl, f->pm_tmr.bit_width / 8, 1); /* PM_TMR_LEN */
+    /* GPE0_BLK_LEN */
+    build_append_int_noprefix(tbl, f->gpe0_blk.bit_width / 8, 1);
+    build_append_int_noprefix(tbl, 0, 1); /* GPE1_BLK_LEN */
+    build_append_int_noprefix(tbl, 0, 1); /* GPE1_BASE */
+    build_append_int_noprefix(tbl, 0, 1); /* CST_CNT */
+    build_append_int_noprefix(tbl, f->plvl2_lat, 2); /* P_LVL2_LAT */
+    build_append_int_noprefix(tbl, f->plvl3_lat, 2); /* P_LVL3_LAT */
+    build_append_int_noprefix(tbl, 0, 2); /* FLUSH_SIZE */
+    build_append_int_noprefix(tbl, 0, 2); /* FLUSH_STRIDE */
+    build_append_int_noprefix(tbl, 0, 1); /* DUTY_OFFSET */
+    build_append_int_noprefix(tbl, 0, 1); /* DUTY_WIDTH */
+    build_append_int_noprefix(tbl, 0, 1); /* DAY_ALRM */
+    build_append_int_noprefix(tbl, 0, 1); /* MON_ALRM */
+    build_append_int_noprefix(tbl, f->rtc_century, 1); /* CENTURY */
+    build_append_int_noprefix(tbl, 0, 2); /* IAPC_BOOT_ARCH */
+    build_append_int_noprefix(tbl, 0, 1); /* Reserved */
+    build_append_int_noprefix(tbl, f->flags, 4); /* Flags */
+
+    if (f->rev == 1) {
+        goto build_hdr;
+    }
+
+    build_append_gas_from_struct(tbl, &f->reset_reg); /* RESET_REG */
+    build_append_int_noprefix(tbl, f->reset_val, 1); /* RESET_VALUE */
+    build_append_int_noprefix(tbl, 0, 3); /* Reserved, ACPI 3.0 */
+    build_append_int_noprefix(tbl, 0, 8); /* X_FIRMWARE_CTRL */
+
+    /* XDSDT address to be filled by Guest linker at runtime */
+    off = tbl->len;
+    build_append_int_noprefix(tbl, 0, 8); /* X_DSDT */
+    if (f->xdsdt_tbl_offset) {
+        bios_linker_loader_add_pointer(linker,
+            ACPI_BUILD_TABLE_FILE, off, 8,
+            ACPI_BUILD_TABLE_FILE, *f->xdsdt_tbl_offset);
+    }
+
+    build_append_gas_from_struct(tbl, &f->pm1a_evt); /* X_PM1a_EVT_BLK */
+    /* X_PM1b_EVT_BLK */
+    build_append_gas(tbl, AML_AS_SYSTEM_MEMORY, 0 , 0, 0, 0);
+    build_append_gas_from_struct(tbl, &f->pm1a_cnt); /* X_PM1a_CNT_BLK */
+    /* X_PM1b_CNT_BLK */
+    build_append_gas(tbl, AML_AS_SYSTEM_MEMORY, 0 , 0, 0, 0);
+    /* X_PM2_CNT_BLK */
+    build_append_gas(tbl, AML_AS_SYSTEM_MEMORY, 0 , 0, 0, 0);
+    build_append_gas_from_struct(tbl, &f->pm_tmr); /* X_PM_TMR_BLK */
+    build_append_gas_from_struct(tbl, &f->gpe0_blk); /* X_GPE0_BLK */
+    build_append_gas(tbl, AML_AS_SYSTEM_MEMORY, 0 , 0, 0, 0); /* X_GPE1_BLK */
+
+build_hdr:
+    build_header(linker, tbl, (void *)(tbl->data + fadt_start),
+                 "FACP", tbl->len - fadt_start, f->rev, oem_id, oem_table_id);
+}
diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
index f7fa795..b644da9 100644
--- a/hw/arm/virt-acpi-build.c
+++ b/hw/arm/virt-acpi-build.c
@@ -651,8 +651,8 @@ build_madt(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
 }
 
 /* FADT */
-static void build_fadt(GArray *table_data, BIOSLinker *linker,
-                       VirtMachineState *vms, unsigned dsdt_tbl_offset)
+static void build_fadt_rev5(GArray *table_data, BIOSLinker *linker,
+                            VirtMachineState *vms, unsigned dsdt_tbl_offset)
 {
     int fadt_start = table_data->len;
     AcpiFadtDescriptorRev5_1 *fadt = acpi_data_push(table_data, sizeof(*fadt));
@@ -761,7 +761,7 @@ void virt_acpi_build(VirtMachineState *vms, AcpiBuildTables *tables)
 
     /* FADT MADT GTDT MCFG SPCR pointed to by RSDT */
     acpi_add_table(table_offsets, tables_blob);
-    build_fadt(tables_blob, tables->linker, vms, dsdt);
+    build_fadt_rev5(tables_blob, tables->linker, vms, dsdt);
 
     acpi_add_table(table_offsets, tables_blob);
     build_madt(tables_blob, tables->linker, vms);
diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index d1b387e..ebde2cd 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -298,108 +298,6 @@ build_facs(GArray *table_data, BIOSLinker *linker)
     facs->length = cpu_to_le32(sizeof(*facs));
 }
 
-/* FADT */
-static void
-build_fadt(GArray *tbl, BIOSLinker *linker, AcpiFadtData *f,
-           const char *oem_id, const char *oem_table_id)
-{
-    int off;
-    int fadt_start = tbl->len;
-
-    acpi_data_push(tbl, sizeof(AcpiTableHeader));
-
-    /* FACS address to be filled by Guest linker at runtime */
-    off = tbl->len;
-    build_append_int_noprefix(tbl, 0, 4); /* FIRMWARE_CTRL */
-    bios_linker_loader_add_pointer(linker,
-        ACPI_BUILD_TABLE_FILE, off, 4,
-        ACPI_BUILD_TABLE_FILE, *f->facs_tbl_offset);
-
-    /* DSDT address to be filled by Guest linker at runtime */
-    off = tbl->len;
-    build_append_int_noprefix(tbl, 0, 4); /* DSDT */
-    bios_linker_loader_add_pointer(linker,
-        ACPI_BUILD_TABLE_FILE, off, 4,
-        ACPI_BUILD_TABLE_FILE, *f->dsdt_tbl_offset);
-
-    /* ACPI1.0: INT_MODEL, ACPI2.0+: Reserved */
-    build_append_int_noprefix(tbl, f->int_model /* Multiple APIC */, 1);
-    /* Preferred_PM_Profile */
-    build_append_int_noprefix(tbl, 0 /* Unspecified */, 1);
-    build_append_int_noprefix(tbl, f->sci_int, 2); /* SCI_INT */
-    build_append_int_noprefix(tbl, f->smi_cmd, 4); /* SMI_CMD */
-    build_append_int_noprefix(tbl, f->acpi_enable_cmd, 1); /* ACPI_ENABLE */
-    build_append_int_noprefix(tbl, f->acpi_disable_cmd, 1); /* ACPI_DISABLE */
-    build_append_int_noprefix(tbl, 0 /* not supported */, 1); /* S4BIOS_REQ */
-    /* ACPI1.0: Reserved, ACPI2.0+: PSTATE_CNT */
-    build_append_int_noprefix(tbl, 0, 1);
-    build_append_int_noprefix(tbl, f->pm1a_evt.address, 4); /* PM1a_EVT_BLK */
-    build_append_int_noprefix(tbl, 0, 4); /* PM1b_EVT_BLK */
-    build_append_int_noprefix(tbl, f->pm1a_cnt.address, 4); /* PM1a_CNT_BLK */
-    build_append_int_noprefix(tbl, 0, 4); /* PM1b_CNT_BLK */
-    build_append_int_noprefix(tbl, 0, 4); /* PM2_CNT_BLK */
-    build_append_int_noprefix(tbl, f->pm_tmr.address, 4); /* PM_TMR_BLK */
-    build_append_int_noprefix(tbl, f->gpe0_blk.address, 4); /* GPE0_BLK */
-    build_append_int_noprefix(tbl, 0, 4); /* GPE1_BLK */
-    /* PM1_EVT_LEN */
-    build_append_int_noprefix(tbl, f->pm1a_evt.bit_width / 8, 1);
-    /* PM1_CNT_LEN */
-    build_append_int_noprefix(tbl, f->pm1a_cnt.bit_width / 8, 1);
-    build_append_int_noprefix(tbl, 0, 1); /* PM2_CNT_LEN */
-    build_append_int_noprefix(tbl, f->pm_tmr.bit_width / 8, 1); /* PM_TMR_LEN */
-    /* GPE0_BLK_LEN */
-    build_append_int_noprefix(tbl, f->gpe0_blk.bit_width / 8, 1);
-    build_append_int_noprefix(tbl, 0, 1); /* GPE1_BLK_LEN */
-    build_append_int_noprefix(tbl, 0, 1); /* GPE1_BASE */
-    build_append_int_noprefix(tbl, 0, 1); /* CST_CNT */
-    build_append_int_noprefix(tbl, f->plvl2_lat, 2); /* P_LVL2_LAT */
-    build_append_int_noprefix(tbl, f->plvl3_lat, 2); /* P_LVL3_LAT */
-    build_append_int_noprefix(tbl, 0, 2); /* FLUSH_SIZE */
-    build_append_int_noprefix(tbl, 0, 2); /* FLUSH_STRIDE */
-    build_append_int_noprefix(tbl, 0, 1); /* DUTY_OFFSET */
-    build_append_int_noprefix(tbl, 0, 1); /* DUTY_WIDTH */
-    build_append_int_noprefix(tbl, 0, 1); /* DAY_ALRM */
-    build_append_int_noprefix(tbl, 0, 1); /* MON_ALRM */
-    build_append_int_noprefix(tbl, f->rtc_century, 1); /* CENTURY */
-    build_append_int_noprefix(tbl, 0, 2); /* IAPC_BOOT_ARCH */
-    build_append_int_noprefix(tbl, 0, 1); /* Reserved */
-    build_append_int_noprefix(tbl, f->flags, 4); /* Flags */
-
-    if (f->rev == 1) {
-        goto build_hdr;
-    }
-
-    build_append_gas_from_struct(tbl, &f->reset_reg); /* RESET_REG */
-    build_append_int_noprefix(tbl, f->reset_val, 1); /* RESET_VALUE */
-    build_append_int_noprefix(tbl, 0, 3); /* Reserved, ACPI 3.0 */
-    build_append_int_noprefix(tbl, 0, 8); /* X_FIRMWARE_CTRL */
-
-    /* XDSDT address to be filled by Guest linker at runtime */
-    off = tbl->len;
-    build_append_int_noprefix(tbl, 0, 8); /* X_DSDT */
-    if (f->xdsdt_tbl_offset) {
-        bios_linker_loader_add_pointer(linker,
-            ACPI_BUILD_TABLE_FILE, off, 8,
-            ACPI_BUILD_TABLE_FILE, *f->xdsdt_tbl_offset);
-    }
-
-    build_append_gas_from_struct(tbl, &f->pm1a_evt); /* X_PM1a_EVT_BLK */
-    /* X_PM1b_EVT_BLK */
-    build_append_gas(tbl, AML_AS_SYSTEM_MEMORY, 0 , 0, 0, 0);
-    build_append_gas_from_struct(tbl, &f->pm1a_cnt); /* X_PM1a_CNT_BLK */
-    /* X_PM1b_CNT_BLK */
-    build_append_gas(tbl, AML_AS_SYSTEM_MEMORY, 0 , 0, 0, 0);
-    /* X_PM2_CNT_BLK */
-    build_append_gas(tbl, AML_AS_SYSTEM_MEMORY, 0 , 0, 0, 0);
-    build_append_gas_from_struct(tbl, &f->pm_tmr); /* X_PM_TMR_BLK */
-    build_append_gas_from_struct(tbl, &f->gpe0_blk); /* X_GPE0_BLK */
-    build_append_gas(tbl, AML_AS_SYSTEM_MEMORY, 0 , 0, 0, 0); /* X_GPE1_BLK */
-
-build_hdr:
-    build_header(linker, tbl, (void *)(tbl->data + fadt_start),
-                 "FACP", tbl->len - fadt_start, f->rev, oem_id, oem_table_id);
-}
-
 void pc_madt_cpu_entry(AcpiDeviceIf *adev, int uid,
                        const CPUArchIdList *apic_ids, GArray *entry)
 {
-- 
MST

^ permalink raw reply related	[flat|nested] 69+ messages in thread

* [Qemu-devel] [PULL v2 13/50] tests: acpi: don't read all fields in test_acpi_fadt_table()
  2018-03-20  3:16 [Qemu-devel] [PULL v2 00/50] virtio, vhost, pci, pc: features, cleanups Michael S. Tsirkin
                   ` (11 preceding siblings ...)
  2018-03-20  3:17 ` [Qemu-devel] [PULL v2 12/50] virt_arm: acpi: reuse common build_fadt() Michael S. Tsirkin
@ 2018-03-20  3:17 ` Michael S. Tsirkin
  2018-03-20  3:17 ` [Qemu-devel] [PULL v2 14/50] standard-headers: update virtio_net.h Michael S. Tsirkin
                   ` (37 subsequent siblings)
  50 siblings, 0 replies; 69+ messages in thread
From: Michael S. Tsirkin @ 2018-03-20  3:17 UTC (permalink / raw)
  To: qemu-devel; +Cc: Peter Maydell, Igor Mammedov, Eric Auger

From: Igor Mammedov <imammedo@redhat.com>

there is no point to read fields here but not actually
checking them so drop it and read only header + dsdt/facs
addresses since it's needed later to fetch that tables.

With this cleanup we can get rid of AcpiFadtDescriptorRev3/
ACPI_FADT_COMMON_DEF which have no users left.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 include/hw/acpi/acpi-defs.h | 81 --------------------------------------------
 tests/bios-tables-test.c    | 82 ++++++++++-----------------------------------
 2 files changed, 18 insertions(+), 145 deletions(-)

diff --git a/include/hw/acpi/acpi-defs.h b/include/hw/acpi/acpi-defs.h
index fe15e95..5955eb4 100644
--- a/include/hw/acpi/acpi-defs.h
+++ b/include/hw/acpi/acpi-defs.h
@@ -75,82 +75,6 @@ struct AcpiTableHeader {
 } QEMU_PACKED;
 typedef struct AcpiTableHeader AcpiTableHeader;
 
-/*
- * ACPI Fixed ACPI Description Table (FADT)
- */
-#define ACPI_FADT_COMMON_DEF /* FADT common definition */ \
-    ACPI_TABLE_HEADER_DEF    /* ACPI common table header */ \
-    uint32_t firmware_ctrl;  /* Physical address of FACS */ \
-    uint32_t dsdt;         /* Physical address of DSDT */ \
-    uint8_t  model;        /* System Interrupt Model */ \
-    uint8_t  reserved1;    /* Reserved */ \
-    uint16_t sci_int;      /* System vector of SCI interrupt */ \
-    uint32_t smi_cmd;      /* Port address of SMI command port */ \
-    uint8_t  acpi_enable;  /* Value to write to smi_cmd to enable ACPI */ \
-    uint8_t  acpi_disable; /* Value to write to smi_cmd to disable ACPI */ \
-    /* Value to write to SMI CMD to enter S4BIOS state */ \
-    uint8_t  S4bios_req; \
-    uint8_t  reserved2;    /* Reserved - must be zero */ \
-    /* Port address of Power Mgt 1a acpi_event Reg Blk */ \
-    uint32_t pm1a_evt_blk; \
-    /* Port address of Power Mgt 1b acpi_event Reg Blk */ \
-    uint32_t pm1b_evt_blk; \
-    uint32_t pm1a_cnt_blk; /* Port address of Power Mgt 1a Control Reg Blk */ \
-    uint32_t pm1b_cnt_blk; /* Port address of Power Mgt 1b Control Reg Blk */ \
-    uint32_t pm2_cnt_blk;  /* Port address of Power Mgt 2 Control Reg Blk */ \
-    uint32_t pm_tmr_blk;   /* Port address of Power Mgt Timer Ctrl Reg Blk */ \
-    /* Port addr of General Purpose acpi_event 0 Reg Blk */ \
-    uint32_t gpe0_blk; \
-    /* Port addr of General Purpose acpi_event 1 Reg Blk */ \
-    uint32_t gpe1_blk; \
-    uint8_t  pm1_evt_len;  /* Byte length of ports at pm1_x_evt_blk */ \
-    uint8_t  pm1_cnt_len;  /* Byte length of ports at pm1_x_cnt_blk */ \
-    uint8_t  pm2_cnt_len;  /* Byte Length of ports at pm2_cnt_blk */ \
-    uint8_t  pm_tmr_len;   /* Byte Length of ports at pm_tm_blk */ \
-    uint8_t  gpe0_blk_len; /* Byte Length of ports at gpe0_blk */ \
-    uint8_t  gpe1_blk_len; /* Byte Length of ports at gpe1_blk */ \
-    uint8_t  gpe1_base;    /* Offset in gpe model where gpe1 events start */ \
-    uint8_t  reserved3;    /* Reserved */ \
-    uint16_t plvl2_lat;    /* Worst case HW latency to enter/exit C2 state */ \
-    uint16_t plvl3_lat;    /* Worst case HW latency to enter/exit C3 state */ \
-    uint16_t flush_size;   /* Size of area read to flush caches */ \
-    uint16_t flush_stride; /* Stride used in flushing caches */ \
-    uint8_t  duty_offset;  /* Bit location of duty cycle field in p_cnt reg */ \
-    uint8_t  duty_width;   /* Bit width of duty cycle field in p_cnt reg */ \
-    uint8_t  day_alrm;     /* Index to day-of-month alarm in RTC CMOS RAM */ \
-    uint8_t  mon_alrm;     /* Index to month-of-year alarm in RTC CMOS RAM */ \
-    uint8_t  century;      /* Index to century in RTC CMOS RAM */ \
-    /* IA-PC Boot Architecture Flags (see below for individual flags) */ \
-    uint16_t boot_flags; \
-    uint8_t reserved;    /* Reserved, must be zero */ \
-    /* Miscellaneous flag bits (see below for individual flags) */ \
-    uint32_t flags; \
-    /* 64-bit address of the Reset register */ \
-    struct AcpiGenericAddress reset_register; \
-    /* Value to write to the reset_register port to reset the system */ \
-    uint8_t reset_value; \
-    /* ARM-Specific Boot Flags (see below for individual flags) (ACPI 5.1) */ \
-    uint16_t arm_boot_flags; \
-    uint8_t minor_revision;  /* FADT Minor Revision (ACPI 5.1) */ \
-    uint64_t x_facs;          /* 64-bit physical address of FACS */ \
-    uint64_t x_dsdt;          /* 64-bit physical address of DSDT */ \
-    /* 64-bit Extended Power Mgt 1a Event Reg Blk address */ \
-    struct AcpiGenericAddress xpm1a_event_block; \
-    /* 64-bit Extended Power Mgt 1b Event Reg Blk address */ \
-    struct AcpiGenericAddress xpm1b_event_block; \
-    /* 64-bit Extended Power Mgt 1a Control Reg Blk address */ \
-    struct AcpiGenericAddress xpm1a_control_block; \
-    /* 64-bit Extended Power Mgt 1b Control Reg Blk address */ \
-    struct AcpiGenericAddress xpm1b_control_block; \
-    /* 64-bit Extended Power Mgt 2 Control Reg Blk address */ \
-    struct AcpiGenericAddress xpm2_control_block; \
-    /* 64-bit Extended Power Mgt Timer Ctrl Reg Blk address */ \
-    struct AcpiGenericAddress xpm_timer_block; \
-    /* 64-bit Extended General Purpose Event 0 Reg Blk address */ \
-    struct AcpiGenericAddress xgpe0_block; \
-    /* 64-bit Extended General Purpose Event 1 Reg Blk address */ \
-    struct AcpiGenericAddress xgpe1_block; \
-
 struct AcpiGenericAddress {
     uint8_t space_id;        /* Address space where struct or register exists */
     uint8_t bit_width;       /* Size in bits of given register */
@@ -160,11 +84,6 @@ struct AcpiGenericAddress {
     uint64_t address;        /* 64-bit address of struct or register */
 } QEMU_PACKED;
 
-struct AcpiFadtDescriptorRev3 {
-    ACPI_FADT_COMMON_DEF
-} QEMU_PACKED;
-typedef struct AcpiFadtDescriptorRev3 AcpiFadtDescriptorRev3;
-
 typedef struct AcpiFadtData {
     struct AcpiGenericAddress pm1a_cnt;   /* PM1a_CNT_BLK */
     struct AcpiGenericAddress pm1a_evt;   /* PM1a_EVT_BLK */
diff --git a/tests/bios-tables-test.c b/tests/bios-tables-test.c
index 65b271a..cd753ff 100644
--- a/tests/bios-tables-test.c
+++ b/tests/bios-tables-test.c
@@ -29,7 +29,8 @@ typedef struct {
     uint32_t rsdp_addr;
     AcpiRsdpDescriptor rsdp_table;
     AcpiRsdtDescriptorRev1 rsdt_table;
-    AcpiFadtDescriptorRev3 fadt_table;
+    uint32_t dsdt_addr;
+    uint32_t facs_addr;
     AcpiFacsDescriptorRev1 facs_table;
     uint32_t *rsdt_tables_addr;
     int rsdt_tables_nr;
@@ -127,71 +128,18 @@ static void test_acpi_rsdt_table(test_data *data)
     data->rsdt_tables_nr = tables_nr;
 }
 
-static void test_acpi_fadt_table(test_data *data)
+static void fadt_fetch_facs_and_dsdt_ptrs(test_data *data)
 {
-    AcpiFadtDescriptorRev3 *fadt_table = &data->fadt_table;
     uint32_t addr;
+    AcpiTableHeader hdr;
 
     /* FADT table comes first */
     addr = le32_to_cpu(data->rsdt_tables_addr[0]);
-    ACPI_READ_TABLE_HEADER(fadt_table, addr);
-
-    ACPI_READ_FIELD(fadt_table->firmware_ctrl, addr);
-    ACPI_READ_FIELD(fadt_table->dsdt, addr);
-    ACPI_READ_FIELD(fadt_table->model, addr);
-    ACPI_READ_FIELD(fadt_table->reserved1, addr);
-    ACPI_READ_FIELD(fadt_table->sci_int, addr);
-    ACPI_READ_FIELD(fadt_table->smi_cmd, addr);
-    ACPI_READ_FIELD(fadt_table->acpi_enable, addr);
-    ACPI_READ_FIELD(fadt_table->acpi_disable, addr);
-    ACPI_READ_FIELD(fadt_table->S4bios_req, addr);
-    ACPI_READ_FIELD(fadt_table->reserved2, addr);
-    ACPI_READ_FIELD(fadt_table->pm1a_evt_blk, addr);
-    ACPI_READ_FIELD(fadt_table->pm1b_evt_blk, addr);
-    ACPI_READ_FIELD(fadt_table->pm1a_cnt_blk, addr);
-    ACPI_READ_FIELD(fadt_table->pm1b_cnt_blk, addr);
-    ACPI_READ_FIELD(fadt_table->pm2_cnt_blk, addr);
-    ACPI_READ_FIELD(fadt_table->pm_tmr_blk, addr);
-    ACPI_READ_FIELD(fadt_table->gpe0_blk, addr);
-    ACPI_READ_FIELD(fadt_table->gpe1_blk, addr);
-    ACPI_READ_FIELD(fadt_table->pm1_evt_len, addr);
-    ACPI_READ_FIELD(fadt_table->pm1_cnt_len, addr);
-    ACPI_READ_FIELD(fadt_table->pm2_cnt_len, addr);
-    ACPI_READ_FIELD(fadt_table->pm_tmr_len, addr);
-    ACPI_READ_FIELD(fadt_table->gpe0_blk_len, addr);
-    ACPI_READ_FIELD(fadt_table->gpe1_blk_len, addr);
-    ACPI_READ_FIELD(fadt_table->gpe1_base, addr);
-    ACPI_READ_FIELD(fadt_table->reserved3, addr);
-    ACPI_READ_FIELD(fadt_table->plvl2_lat, addr);
-    ACPI_READ_FIELD(fadt_table->plvl3_lat, addr);
-    ACPI_READ_FIELD(fadt_table->flush_size, addr);
-    ACPI_READ_FIELD(fadt_table->flush_stride, addr);
-    ACPI_READ_FIELD(fadt_table->duty_offset, addr);
-    ACPI_READ_FIELD(fadt_table->duty_width, addr);
-    ACPI_READ_FIELD(fadt_table->day_alrm, addr);
-    ACPI_READ_FIELD(fadt_table->mon_alrm, addr);
-    ACPI_READ_FIELD(fadt_table->century, addr);
-    ACPI_READ_FIELD(fadt_table->boot_flags, addr);
-    ACPI_READ_FIELD(fadt_table->reserved, addr);
-    ACPI_READ_FIELD(fadt_table->flags, addr);
-    ACPI_READ_GENERIC_ADDRESS(fadt_table->reset_register, addr);
-    ACPI_READ_FIELD(fadt_table->reset_value, addr);
-    ACPI_READ_FIELD(fadt_table->arm_boot_flags, addr);
-    ACPI_READ_FIELD(fadt_table->minor_revision, addr);
-    ACPI_READ_FIELD(fadt_table->x_facs, addr);
-    ACPI_READ_FIELD(fadt_table->x_dsdt, addr);
-    ACPI_READ_GENERIC_ADDRESS(fadt_table->xpm1a_event_block, addr);
-    ACPI_READ_GENERIC_ADDRESS(fadt_table->xpm1b_event_block, addr);
-    ACPI_READ_GENERIC_ADDRESS(fadt_table->xpm1a_control_block, addr);
-    ACPI_READ_GENERIC_ADDRESS(fadt_table->xpm1b_control_block, addr);
-    ACPI_READ_GENERIC_ADDRESS(fadt_table->xpm2_control_block, addr);
-    ACPI_READ_GENERIC_ADDRESS(fadt_table->xpm_timer_block, addr);
-    ACPI_READ_GENERIC_ADDRESS(fadt_table->xgpe0_block, addr);
-    ACPI_READ_GENERIC_ADDRESS(fadt_table->xgpe1_block, addr);
-
-    ACPI_ASSERT_CMP(fadt_table->signature, "FACP");
-    g_assert(!acpi_calc_checksum((uint8_t *)fadt_table,
-                                 le32_to_cpu(fadt_table->length)));
+    ACPI_READ_TABLE_HEADER(&hdr, addr);
+    ACPI_ASSERT_CMP(hdr.signature, "FACP");
+
+    ACPI_READ_FIELD(data->facs_addr, addr);
+    ACPI_READ_FIELD(data->dsdt_addr, addr);
 }
 
 static void sanitize_fadt_ptrs(test_data *data)
@@ -206,6 +154,12 @@ static void sanitize_fadt_ptrs(test_data *data)
             continue;
         }
 
+        /* check original FADT checksum before sanitizing table */
+        g_assert(!(uint8_t)(
+            acpi_calc_checksum((uint8_t *)sdt, sizeof(AcpiTableHeader)) +
+            acpi_calc_checksum((uint8_t *)sdt->aml, sdt->aml_len)
+        ));
+
         /* sdt->aml field offset := spec offset - header size */
         memset(sdt->aml + 0, 0, 4); /* sanitize FIRMWARE_CTRL(36) ptr */
         memset(sdt->aml + 4, 0, 4); /* sanitize DSDT(40) ptr */
@@ -226,7 +180,7 @@ static void sanitize_fadt_ptrs(test_data *data)
 static void test_acpi_facs_table(test_data *data)
 {
     AcpiFacsDescriptorRev1 *facs_table = &data->facs_table;
-    uint32_t addr = le32_to_cpu(data->fadt_table.firmware_ctrl);
+    uint32_t addr = le32_to_cpu(data->facs_addr);
 
     ACPI_READ_FIELD(facs_table->signature, addr);
     ACPI_READ_FIELD(facs_table->length, addr);
@@ -265,7 +219,7 @@ static void fetch_table(AcpiSdtTable *sdt_table, uint32_t addr)
 static void test_acpi_dsdt_table(test_data *data)
 {
     AcpiSdtTable dsdt_table;
-    uint32_t addr = le32_to_cpu(data->fadt_table.dsdt);
+    uint32_t addr = le32_to_cpu(data->dsdt_addr);
 
     fetch_table(&dsdt_table, addr);
     ACPI_ASSERT_CMP(dsdt_table.header.signature, "DSDT");
@@ -674,7 +628,7 @@ static void test_acpi_one(const char *params, test_data *data)
     test_acpi_rsdp_address(data);
     test_acpi_rsdp_table(data);
     test_acpi_rsdt_table(data);
-    test_acpi_fadt_table(data);
+    fadt_fetch_facs_and_dsdt_ptrs(data);
     test_acpi_facs_table(data);
     test_acpi_dsdt_table(data);
     fetch_rsdt_referenced_tables(data);
-- 
MST

^ permalink raw reply related	[flat|nested] 69+ messages in thread

* [Qemu-devel] [PULL v2 15/50] hw/pci: remove obsolete PCIDevice->init()
  2018-03-20  3:16 [Qemu-devel] [PULL v2 00/50] virtio, vhost, pci, pc: features, cleanups Michael S. Tsirkin
                   ` (13 preceding siblings ...)
  2018-03-20  3:17 ` [Qemu-devel] [PULL v2 14/50] standard-headers: update virtio_net.h Michael S. Tsirkin
@ 2018-03-20  3:17 ` Michael S. Tsirkin
  2018-03-20  3:17 ` [Qemu-devel] [PULL v2 16/50] pc-dimm: make qmp_pc_dimm_device_list() sort devices by address Michael S. Tsirkin
                   ` (35 subsequent siblings)
  50 siblings, 0 replies; 69+ messages in thread
From: Michael S. Tsirkin @ 2018-03-20  3:17 UTC (permalink / raw)
  To: qemu-devel; +Cc: Peter Maydell, Philippe Mathieu-Daudé, Marcel Apfelbaum

From: Philippe Mathieu-Daudé <f4bug@amsat.org>

All PCI devices are now QOM'ified.

Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Marcel Apfelbaum <marcel@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 include/hw/pci/pci.h |  1 -
 hw/pci/pci.c         | 14 --------------
 2 files changed, 15 deletions(-)

diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
index d8c18c7..e28f3fa 100644
--- a/include/hw/pci/pci.h
+++ b/include/hw/pci/pci.h
@@ -217,7 +217,6 @@ typedef struct PCIDeviceClass {
     DeviceClass parent_class;
 
     void (*realize)(PCIDevice *dev, Error **errp);
-    int (*init)(PCIDevice *dev);/* TODO convert to realize() and remove */
     PCIUnregisterFunc *exit;
     PCIConfigReadFunc *config_read;
     PCIConfigWriteFunc *config_write;
diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index 2174c25..f98efdc 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -2049,18 +2049,6 @@ static void pci_qdev_realize(DeviceState *qdev, Error **errp)
     }
 }
 
-static void pci_default_realize(PCIDevice *dev, Error **errp)
-{
-    PCIDeviceClass *pc = PCI_DEVICE_GET_CLASS(dev);
-
-    if (pc->init) {
-        if (pc->init(dev) < 0) {
-            error_setg(errp, "Device initialization failed");
-            return;
-        }
-    }
-}
-
 PCIDevice *pci_create_multifunction(PCIBus *bus, int devfn, bool multifunction,
                                     const char *name)
 {
@@ -2533,13 +2521,11 @@ MemoryRegion *pci_address_space_io(PCIDevice *dev)
 static void pci_device_class_init(ObjectClass *klass, void *data)
 {
     DeviceClass *k = DEVICE_CLASS(klass);
-    PCIDeviceClass *pc = PCI_DEVICE_CLASS(klass);
 
     k->realize = pci_qdev_realize;
     k->unrealize = pci_qdev_unrealize;
     k->bus_type = TYPE_PCI_BUS;
     k->props = pci_props;
-    pc->realize = pci_default_realize;
 }
 
 static void pci_device_class_base_init(ObjectClass *klass, void *data)
-- 
MST

^ permalink raw reply related	[flat|nested] 69+ messages in thread

* [Qemu-devel] [PULL v2 14/50] standard-headers: update virtio_net.h
  2018-03-20  3:16 [Qemu-devel] [PULL v2 00/50] virtio, vhost, pci, pc: features, cleanups Michael S. Tsirkin
                   ` (12 preceding siblings ...)
  2018-03-20  3:17 ` [Qemu-devel] [PULL v2 13/50] tests: acpi: don't read all fields in test_acpi_fadt_table() Michael S. Tsirkin
@ 2018-03-20  3:17 ` Michael S. Tsirkin
  2018-03-20  3:17 ` [Qemu-devel] [PULL v2 15/50] hw/pci: remove obsolete PCIDevice->init() Michael S. Tsirkin
                   ` (36 subsequent siblings)
  50 siblings, 0 replies; 69+ messages in thread
From: Michael S. Tsirkin @ 2018-03-20  3:17 UTC (permalink / raw)
  To: qemu-devel; +Cc: Peter Maydell

include speed/duplex fields

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 include/standard-headers/linux/virtio_net.h | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/include/standard-headers/linux/virtio_net.h b/include/standard-headers/linux/virtio_net.h
index 30ff249..e9f255e 100644
--- a/include/standard-headers/linux/virtio_net.h
+++ b/include/standard-headers/linux/virtio_net.h
@@ -57,6 +57,8 @@
 					 * Steering */
 #define VIRTIO_NET_F_CTRL_MAC_ADDR 23	/* Set MAC address */
 
+#define VIRTIO_NET_F_SPEED_DUPLEX 63	/* Device set linkspeed and duplex */
+
 #ifndef VIRTIO_NET_NO_LEGACY
 #define VIRTIO_NET_F_GSO	6	/* Host handles pkts w/ any GSO type */
 #endif /* VIRTIO_NET_NO_LEGACY */
@@ -76,6 +78,17 @@ struct virtio_net_config {
 	uint16_t max_virtqueue_pairs;
 	/* Default maximum transmit unit advice */
 	uint16_t mtu;
+	/*
+	 * speed, in units of 1Mb. All values 0 to INT_MAX are legal.
+	 * Any other value stands for unknown.
+	 */
+	uint32_t speed;
+	/*
+	 * 0x00 - half duplex
+	 * 0x01 - full duplex
+	 * Any other value stands for unknown.
+	 */
+	uint8_t duplex;
 } QEMU_PACKED;
 
 /*
-- 
MST

^ permalink raw reply related	[flat|nested] 69+ messages in thread

* [Qemu-devel] [PULL v2 16/50] pc-dimm: make qmp_pc_dimm_device_list() sort devices by address
  2018-03-20  3:16 [Qemu-devel] [PULL v2 00/50] virtio, vhost, pci, pc: features, cleanups Michael S. Tsirkin
                   ` (14 preceding siblings ...)
  2018-03-20  3:17 ` [Qemu-devel] [PULL v2 15/50] hw/pci: remove obsolete PCIDevice->init() Michael S. Tsirkin
@ 2018-03-20  3:17 ` Michael S. Tsirkin
  2018-03-20  3:17 ` [Qemu-devel] [PULL v2 17/50] qmp: distinguish PC-DIMM and NVDIMM in MemoryDeviceInfoList Michael S. Tsirkin
                   ` (34 subsequent siblings)
  50 siblings, 0 replies; 69+ messages in thread
From: Michael S. Tsirkin @ 2018-03-20  3:17 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Maydell, Haozhong Zhang, Igor Mammedov,
	David Gibson for ppc part, Bharata B Rao, Alexander Graf,
	Eduardo Habkost, Markus Armbruster, Paolo Bonzini, qemu-ppc

From: Haozhong Zhang <haozhong.zhang@intel.com>

Make qmp_pc_dimm_device_list() return sorted by start address
list of devices so that it could be reused in places that
would need sorted list*. Reuse existing pc_dimm_built_list()
to get sorted list.

While at it hide recursive callbacks from callers, so that:

  qmp_pc_dimm_device_list(qdev_get_machine(), &list);

could be replaced with simpler:

  list = qmp_pc_dimm_device_list();

* follow up patch will use it in build_srat()

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Acked-by: David Gibson <david@gibson.dropbear.id.au> for ppc part
Reviewed-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 include/hw/mem/pc-dimm.h |  2 +-
 hw/mem/pc-dimm.c         | 83 +++++++++++++++++++++++++-----------------------
 hw/ppc/spapr.c           |  3 +-
 numa.c                   |  4 +--
 qmp.c                    |  7 +---
 stubs/qmp_pc_dimm.c      |  4 +--
 6 files changed, 50 insertions(+), 53 deletions(-)

diff --git a/include/hw/mem/pc-dimm.h b/include/hw/mem/pc-dimm.h
index d83b957..1fc4792 100644
--- a/include/hw/mem/pc-dimm.h
+++ b/include/hw/mem/pc-dimm.h
@@ -93,7 +93,7 @@ uint64_t pc_dimm_get_free_addr(uint64_t address_space_start,
 
 int pc_dimm_get_free_slot(const int *hint, int max_slots, Error **errp);
 
-int qmp_pc_dimm_device_list(Object *obj, void *opaque);
+MemoryDeviceInfoList *qmp_pc_dimm_device_list(void);
 uint64_t pc_existing_dimms_capacity(Error **errp);
 uint64_t get_plugged_memory_size(void);
 void pc_dimm_memory_plug(DeviceState *dev, MemoryHotplugState *hpms,
diff --git a/hw/mem/pc-dimm.c b/hw/mem/pc-dimm.c
index 6e74b61..4d050fe 100644
--- a/hw/mem/pc-dimm.c
+++ b/hw/mem/pc-dimm.c
@@ -162,45 +162,6 @@ uint64_t get_plugged_memory_size(void)
     return pc_existing_dimms_capacity(&error_abort);
 }
 
-int qmp_pc_dimm_device_list(Object *obj, void *opaque)
-{
-    MemoryDeviceInfoList ***prev = opaque;
-
-    if (object_dynamic_cast(obj, TYPE_PC_DIMM)) {
-        DeviceState *dev = DEVICE(obj);
-
-        if (dev->realized) {
-            MemoryDeviceInfoList *elem = g_new0(MemoryDeviceInfoList, 1);
-            MemoryDeviceInfo *info = g_new0(MemoryDeviceInfo, 1);
-            PCDIMMDeviceInfo *di = g_new0(PCDIMMDeviceInfo, 1);
-            DeviceClass *dc = DEVICE_GET_CLASS(obj);
-            PCDIMMDevice *dimm = PC_DIMM(obj);
-
-            if (dev->id) {
-                di->has_id = true;
-                di->id = g_strdup(dev->id);
-            }
-            di->hotplugged = dev->hotplugged;
-            di->hotpluggable = dc->hotpluggable;
-            di->addr = dimm->addr;
-            di->slot = dimm->slot;
-            di->node = dimm->node;
-            di->size = object_property_get_uint(OBJECT(dimm), PC_DIMM_SIZE_PROP,
-                                                NULL);
-            di->memdev = object_get_canonical_path(OBJECT(dimm->hostmem));
-
-            info->u.dimm.data = di;
-            elem->value = info;
-            elem->next = NULL;
-            **prev = elem;
-            *prev = &elem->next;
-        }
-    }
-
-    object_child_foreach(obj, qmp_pc_dimm_device_list, opaque);
-    return 0;
-}
-
 static int pc_dimm_slot2bitmap(Object *obj, void *opaque)
 {
     unsigned long *bitmap = opaque;
@@ -276,6 +237,50 @@ static int pc_dimm_built_list(Object *obj, void *opaque)
     return 0;
 }
 
+MemoryDeviceInfoList *qmp_pc_dimm_device_list(void)
+{
+    GSList *dimms = NULL, *item;
+    MemoryDeviceInfoList *list = NULL, *prev = NULL;
+
+    object_child_foreach(qdev_get_machine(), pc_dimm_built_list, &dimms);
+
+    for (item = dimms; item; item = g_slist_next(item)) {
+        PCDIMMDevice *dimm = PC_DIMM(item->data);
+        Object *obj = OBJECT(dimm);
+        MemoryDeviceInfoList *elem = g_new0(MemoryDeviceInfoList, 1);
+        MemoryDeviceInfo *info = g_new0(MemoryDeviceInfo, 1);
+        PCDIMMDeviceInfo *di = g_new0(PCDIMMDeviceInfo, 1);
+        DeviceClass *dc = DEVICE_GET_CLASS(obj);
+        DeviceState *dev = DEVICE(obj);
+
+        if (dev->id) {
+            di->has_id = true;
+            di->id = g_strdup(dev->id);
+        }
+        di->hotplugged = dev->hotplugged;
+        di->hotpluggable = dc->hotpluggable;
+        di->addr = dimm->addr;
+        di->slot = dimm->slot;
+        di->node = dimm->node;
+        di->size = object_property_get_uint(obj, PC_DIMM_SIZE_PROP, NULL);
+        di->memdev = object_get_canonical_path(OBJECT(dimm->hostmem));
+
+        info->u.dimm.data = di;
+        elem->value = info;
+        elem->next = NULL;
+        if (prev) {
+            prev->next = elem;
+        } else {
+            list = elem;
+        }
+        prev = elem;
+    }
+
+    g_slist_free(dimms);
+
+    return list;
+}
+
 uint64_t pc_dimm_get_free_addr(uint64_t address_space_start,
                                uint64_t address_space_size,
                                uint64_t *hint, uint64_t align, uint64_t size,
diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 7e1c858..44a0670 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -722,8 +722,7 @@ static int spapr_populate_drconf_memory(sPAPRMachineState *spapr, void *fdt)
     }
 
     if (hotplug_lmb_start) {
-        MemoryDeviceInfoList **prev = &dimms;
-        qmp_pc_dimm_device_list(qdev_get_machine(), &prev);
+        dimms = qmp_pc_dimm_device_list();
     }
 
     /* ibm,dynamic-memory */
diff --git a/numa.c b/numa.c
index 398e2c9..9442704 100644
--- a/numa.c
+++ b/numa.c
@@ -520,12 +520,10 @@ void memory_region_allocate_system_memory(MemoryRegion *mr, Object *owner,
 
 static void numa_stat_memory_devices(NumaNodeMem node_mem[])
 {
-    MemoryDeviceInfoList *info_list = NULL;
-    MemoryDeviceInfoList **prev = &info_list;
+    MemoryDeviceInfoList *info_list = qmp_pc_dimm_device_list();
     MemoryDeviceInfoList *info;
     PCDIMMDeviceInfo     *pcdimm_info;
 
-    qmp_pc_dimm_device_list(qdev_get_machine(), &prev);
     for (info = info_list; info; info = info->next) {
         MemoryDeviceInfo *value = info->value;
 
diff --git a/qmp.c b/qmp.c
index 8c7d1cc..4b2517d 100644
--- a/qmp.c
+++ b/qmp.c
@@ -731,12 +731,7 @@ void qmp_object_del(const char *id, Error **errp)
 
 MemoryDeviceInfoList *qmp_query_memory_devices(Error **errp)
 {
-    MemoryDeviceInfoList *head = NULL;
-    MemoryDeviceInfoList **prev = &head;
-
-    qmp_pc_dimm_device_list(qdev_get_machine(), &prev);
-
-    return head;
+    return qmp_pc_dimm_device_list();
 }
 
 ACPIOSTInfoList *qmp_query_acpi_ospm_status(Error **errp)
diff --git a/stubs/qmp_pc_dimm.c b/stubs/qmp_pc_dimm.c
index 9ddc4f6..b6b2cca 100644
--- a/stubs/qmp_pc_dimm.c
+++ b/stubs/qmp_pc_dimm.c
@@ -2,9 +2,9 @@
 #include "qom/object.h"
 #include "hw/mem/pc-dimm.h"
 
-int qmp_pc_dimm_device_list(Object *obj, void *opaque)
+MemoryDeviceInfoList *qmp_pc_dimm_device_list(void)
 {
-   return 0;
+   return NULL;
 }
 
 uint64_t get_plugged_memory_size(void)
-- 
MST

^ permalink raw reply related	[flat|nested] 69+ messages in thread

* [Qemu-devel] [PULL v2 17/50] qmp: distinguish PC-DIMM and NVDIMM in MemoryDeviceInfoList
  2018-03-20  3:16 [Qemu-devel] [PULL v2 00/50] virtio, vhost, pci, pc: features, cleanups Michael S. Tsirkin
                   ` (15 preceding siblings ...)
  2018-03-20  3:17 ` [Qemu-devel] [PULL v2 16/50] pc-dimm: make qmp_pc_dimm_device_list() sort devices by address Michael S. Tsirkin
@ 2018-03-20  3:17 ` Michael S. Tsirkin
  2018-03-20  3:17 ` [Qemu-devel] [PULL v2 18/50] hw/acpi-build: build SRAT memory affinity structures for DIMM devices Michael S. Tsirkin
                   ` (33 subsequent siblings)
  50 siblings, 0 replies; 69+ messages in thread
From: Michael S. Tsirkin @ 2018-03-20  3:17 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Maydell, Haozhong Zhang, Eric Blake, Igor Mammedov,
	Dr. David Alan Gilbert, Eduardo Habkost, Markus Armbruster

From: Haozhong Zhang <haozhong.zhang@intel.com>

It may need to treat PC-DIMM and NVDIMM differently, e.g., when
deciding the necessity of non-volatile flag bit in SRAT memory
affinity structures.

A new field 'nvdimm' is added to the union type MemoryDeviceInfo for
such purpose. Its type is currently PCDIMMDeviceInfo and will be
updated when necessary in the future.

It also fixes "info memory-devices"/query-memory-devices which
currently show nvdimm devices as dimm devices since
object_dynamic_cast(obj, TYPE_PC_DIMM) happily cast nvdimm to
TYPE_PC_DIMM which it's been inherited from.

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 qapi/misc.json   |  6 +++++-
 hmp.c            | 14 +++++++++++---
 hw/mem/pc-dimm.c | 10 +++++++++-
 numa.c           | 19 +++++++++++++------
 4 files changed, 38 insertions(+), 11 deletions(-)

diff --git a/qapi/misc.json b/qapi/misc.json
index bcd5d10..6bf082f 100644
--- a/qapi/misc.json
+++ b/qapi/misc.json
@@ -2852,7 +2852,11 @@
 #
 # Since: 2.1
 ##
-{ 'union': 'MemoryDeviceInfo', 'data': {'dimm': 'PCDIMMDeviceInfo'} }
+{ 'union': 'MemoryDeviceInfo',
+  'data': { 'dimm': 'PCDIMMDeviceInfo',
+            'nvdimm': 'PCDIMMDeviceInfo'
+          }
+}
 
 ##
 # @query-memory-devices:
diff --git a/hmp.c b/hmp.c
index ba9e299..a277517 100644
--- a/hmp.c
+++ b/hmp.c
@@ -2423,7 +2423,18 @@ void hmp_info_memory_devices(Monitor *mon, const QDict *qdict)
             switch (value->type) {
             case MEMORY_DEVICE_INFO_KIND_DIMM:
                 di = value->u.dimm.data;
+                break;
+
+            case MEMORY_DEVICE_INFO_KIND_NVDIMM:
+                di = value->u.nvdimm.data;
+                break;
+
+            default:
+                di = NULL;
+                break;
+            }
 
+            if (di) {
                 monitor_printf(mon, "Memory device [%s]: \"%s\"\n",
                                MemoryDeviceInfoKind_str(value->type),
                                di->id ? di->id : "");
@@ -2436,9 +2447,6 @@ void hmp_info_memory_devices(Monitor *mon, const QDict *qdict)
                                di->hotplugged ? "true" : "false");
                 monitor_printf(mon, "  hotpluggable: %s\n",
                                di->hotpluggable ? "true" : "false");
-                break;
-            default:
-                break;
             }
         }
     }
diff --git a/hw/mem/pc-dimm.c b/hw/mem/pc-dimm.c
index 4d050fe..51350d9 100644
--- a/hw/mem/pc-dimm.c
+++ b/hw/mem/pc-dimm.c
@@ -20,6 +20,7 @@
 
 #include "qemu/osdep.h"
 #include "hw/mem/pc-dimm.h"
+#include "hw/mem/nvdimm.h"
 #include "qapi/error.h"
 #include "qemu/config-file.h"
 #include "qapi/visitor.h"
@@ -250,6 +251,7 @@ MemoryDeviceInfoList *qmp_pc_dimm_device_list(void)
         MemoryDeviceInfoList *elem = g_new0(MemoryDeviceInfoList, 1);
         MemoryDeviceInfo *info = g_new0(MemoryDeviceInfo, 1);
         PCDIMMDeviceInfo *di = g_new0(PCDIMMDeviceInfo, 1);
+        bool is_nvdimm = object_dynamic_cast(obj, TYPE_NVDIMM);
         DeviceClass *dc = DEVICE_GET_CLASS(obj);
         DeviceState *dev = DEVICE(obj);
 
@@ -265,7 +267,13 @@ MemoryDeviceInfoList *qmp_pc_dimm_device_list(void)
         di->size = object_property_get_uint(obj, PC_DIMM_SIZE_PROP, NULL);
         di->memdev = object_get_canonical_path(OBJECT(dimm->hostmem));
 
-        info->u.dimm.data = di;
+        if (!is_nvdimm) {
+            info->u.dimm.data = di;
+            info->type = MEMORY_DEVICE_INFO_KIND_DIMM;
+        } else {
+            info->u.nvdimm.data = di;
+            info->type = MEMORY_DEVICE_INFO_KIND_NVDIMM;
+        }
         elem->value = info;
         elem->next = NULL;
         if (prev) {
diff --git a/numa.c b/numa.c
index 9442704..1116c90 100644
--- a/numa.c
+++ b/numa.c
@@ -529,18 +529,25 @@ static void numa_stat_memory_devices(NumaNodeMem node_mem[])
 
         if (value) {
             switch (value->type) {
-            case MEMORY_DEVICE_INFO_KIND_DIMM: {
+            case MEMORY_DEVICE_INFO_KIND_DIMM:
                 pcdimm_info = value->u.dimm.data;
+                break;
+
+            case MEMORY_DEVICE_INFO_KIND_NVDIMM:
+                pcdimm_info = value->u.nvdimm.data;
+                break;
+
+            default:
+                pcdimm_info = NULL;
+                break;
+            }
+
+            if (pcdimm_info) {
                 node_mem[pcdimm_info->node].node_mem += pcdimm_info->size;
                 if (pcdimm_info->hotpluggable && pcdimm_info->hotplugged) {
                     node_mem[pcdimm_info->node].node_plugged_mem +=
                         pcdimm_info->size;
                 }
-                break;
-            }
-
-            default:
-                break;
             }
         }
     }
-- 
MST

^ permalink raw reply related	[flat|nested] 69+ messages in thread

* [Qemu-devel] [PULL v2 18/50] hw/acpi-build: build SRAT memory affinity structures for DIMM devices
  2018-03-20  3:16 [Qemu-devel] [PULL v2 00/50] virtio, vhost, pci, pc: features, cleanups Michael S. Tsirkin
                   ` (16 preceding siblings ...)
  2018-03-20  3:17 ` [Qemu-devel] [PULL v2 17/50] qmp: distinguish PC-DIMM and NVDIMM in MemoryDeviceInfoList Michael S. Tsirkin
@ 2018-03-20  3:17 ` Michael S. Tsirkin
  2018-03-20  3:17 ` [Qemu-devel] [PULL v2 19/50] tests/bios-tables-test: add test cases for DIMM proximity Michael S. Tsirkin
                   ` (32 subsequent siblings)
  50 siblings, 0 replies; 69+ messages in thread
From: Michael S. Tsirkin @ 2018-03-20  3:17 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Maydell, Haozhong Zhang, Igor Mammedov, Paolo Bonzini,
	Richard Henderson, Eduardo Habkost, Marcel Apfelbaum

From: Haozhong Zhang <haozhong.zhang@intel.com>

ACPI 6.2A Table 5-129 "SPA Range Structure" requires the proximity
domain of a NVDIMM SPA range must match with corresponding entry in
SRAT table.

The address ranges of vNVDIMM in QEMU are allocated from the
hot-pluggable address space, which is entirely covered by one SRAT
memory affinity structure. However, users can set the vNVDIMM
proximity domain in NFIT SPA range structure by the 'node' property of
'-device nvdimm' to a value different than the one in the above SRAT
memory affinity structure.

In order to solve such proximity domain mismatch, this patch builds
one SRAT memory affinity structure for each DIMM device present at
boot time, including both PC-DIMM and NVDIMM, with the proximity
domain specified in '-device pc-dimm' or '-device nvdimm'.

The remaining hot-pluggable address space is covered by one or multiple
SRAT memory affinity structures with the proximity domain of the last
node as before.

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 hw/i386/acpi-build.c | 56 ++++++++++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 52 insertions(+), 4 deletions(-)

diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index ebde2cd..1df9ed2 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -2250,6 +2250,55 @@ build_tpm2(GArray *table_data, BIOSLinker *linker, GArray *tcpalog)
 #define HOLE_640K_START  (640 * 1024)
 #define HOLE_640K_END   (1024 * 1024)
 
+static void build_srat_hotpluggable_memory(GArray *table_data, uint64_t base,
+                                           uint64_t len, int default_node)
+{
+    MemoryDeviceInfoList *info_list = qmp_pc_dimm_device_list();
+    MemoryDeviceInfoList *info;
+    MemoryDeviceInfo *mi;
+    PCDIMMDeviceInfo *di;
+    uint64_t end = base + len, cur, size;
+    bool is_nvdimm;
+    AcpiSratMemoryAffinity *numamem;
+    MemoryAffinityFlags flags;
+
+    for (cur = base, info = info_list;
+         cur < end;
+         cur += size, info = info->next) {
+        numamem = acpi_data_push(table_data, sizeof *numamem);
+
+        if (!info) {
+            build_srat_memory(numamem, cur, end - cur, default_node,
+                              MEM_AFFINITY_HOTPLUGGABLE | MEM_AFFINITY_ENABLED);
+            break;
+        }
+
+        mi = info->value;
+        is_nvdimm = (mi->type == MEMORY_DEVICE_INFO_KIND_NVDIMM);
+        di = !is_nvdimm ? mi->u.dimm.data : mi->u.nvdimm.data;
+
+        if (cur < di->addr) {
+            build_srat_memory(numamem, cur, di->addr - cur, default_node,
+                              MEM_AFFINITY_HOTPLUGGABLE | MEM_AFFINITY_ENABLED);
+            numamem = acpi_data_push(table_data, sizeof *numamem);
+        }
+
+        size = di->size;
+
+        flags = MEM_AFFINITY_ENABLED;
+        if (di->hotpluggable) {
+            flags |= MEM_AFFINITY_HOTPLUGGABLE;
+        }
+        if (is_nvdimm) {
+            flags |= MEM_AFFINITY_NON_VOLATILE;
+        }
+
+        build_srat_memory(numamem, di->addr, size, di->node, flags);
+    }
+
+    qapi_free_MemoryDeviceInfoList(info_list);
+}
+
 static void
 build_srat(GArray *table_data, BIOSLinker *linker, MachineState *machine)
 {
@@ -2361,10 +2410,9 @@ build_srat(GArray *table_data, BIOSLinker *linker, MachineState *machine)
      * providing _PXM method if necessary.
      */
     if (hotplugabble_address_space_size) {
-        numamem = acpi_data_push(table_data, sizeof *numamem);
-        build_srat_memory(numamem, pcms->hotplug_memory.base,
-                          hotplugabble_address_space_size, pcms->numa_nodes - 1,
-                          MEM_AFFINITY_HOTPLUGGABLE | MEM_AFFINITY_ENABLED);
+        build_srat_hotpluggable_memory(table_data, pcms->hotplug_memory.base,
+                                       hotplugabble_address_space_size,
+                                       pcms->numa_nodes - 1);
     }
 
     build_header(linker, table_data,
-- 
MST

^ permalink raw reply related	[flat|nested] 69+ messages in thread

* [Qemu-devel] [PULL v2 19/50] tests/bios-tables-test: add test cases for DIMM proximity
  2018-03-20  3:16 [Qemu-devel] [PULL v2 00/50] virtio, vhost, pci, pc: features, cleanups Michael S. Tsirkin
                   ` (17 preceding siblings ...)
  2018-03-20  3:17 ` [Qemu-devel] [PULL v2 18/50] hw/acpi-build: build SRAT memory affinity structures for DIMM devices Michael S. Tsirkin
@ 2018-03-20  3:17 ` Michael S. Tsirkin
  2018-03-20  3:17 ` [Qemu-devel] [PULL v2 20/50] test/acpi-test-data: add ACPI tables for dimmpxm test Michael S. Tsirkin
                   ` (31 subsequent siblings)
  50 siblings, 0 replies; 69+ messages in thread
From: Michael S. Tsirkin @ 2018-03-20  3:17 UTC (permalink / raw)
  To: qemu-devel; +Cc: Peter Maydell, Haozhong Zhang, Igor Mammedov

From: Haozhong Zhang <haozhong.zhang@intel.com>

QEMU now builds one SRAT memory affinity structure for each PC-DIMM
and NVDIMM device presented at boot time with the proximity domain
specified in the device option 'node', rather than only one SRAT
memory affinity structure covering the entire hotpluggable address
space with the proximity domain of the last node.

Add test cases on PC and Q35 machines with 4 proximity domains, and
one PC-DIMM and one NVDIMM attached to the 2nd and 3rd proximity
domains respectively. Check whether the QEMU-built SRAT tables match
with the expected ones.

The following ACPI tables need to be added for this test:
  tests/acpi-test-data/pc/APIC.dimmpxm
  tests/acpi-test-data/pc/DSDT.dimmpxm
  tests/acpi-test-data/pc/NFIT.dimmpxm
  tests/acpi-test-data/pc/SRAT.dimmpxm
  tests/acpi-test-data/pc/SSDT.dimmpxm
  tests/acpi-test-data/q35/APIC.dimmpxm
  tests/acpi-test-data/q35/DSDT.dimmpxm
  tests/acpi-test-data/q35/NFIT.dimmpxm
  tests/acpi-test-data/q35/SRAT.dimmpxm
  tests/acpi-test-data/q35/SSDT.dimmpxm
New APIC and DSDT are needed because of the multiple processors
configuration. New NFIT and SSDT are needed because of NVDIMM.

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
Suggested-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 tests/bios-tables-test.c | 38 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 38 insertions(+)

diff --git a/tests/bios-tables-test.c b/tests/bios-tables-test.c
index cd753ff..bf3e193 100644
--- a/tests/bios-tables-test.c
+++ b/tests/bios-tables-test.c
@@ -823,6 +823,42 @@ static void test_acpi_piix4_tcg_numamem(void)
     free_test_data(&data);
 }
 
+static void test_acpi_tcg_dimm_pxm(const char *machine)
+{
+    test_data data;
+
+    memset(&data, 0, sizeof(data));
+    data.machine = machine;
+    data.variant = ".dimmpxm";
+    test_acpi_one(" -machine nvdimm=on"
+                  " -smp 4,sockets=4"
+                  " -m 128M,slots=3,maxmem=1G"
+                  " -numa node,mem=32M,nodeid=0"
+                  " -numa node,mem=32M,nodeid=1"
+                  " -numa node,mem=32M,nodeid=2"
+                  " -numa node,mem=32M,nodeid=3"
+                  " -numa cpu,node-id=0,socket-id=0"
+                  " -numa cpu,node-id=1,socket-id=1"
+                  " -numa cpu,node-id=2,socket-id=2"
+                  " -numa cpu,node-id=3,socket-id=3"
+                  " -object memory-backend-ram,id=ram0,size=128M"
+                  " -object memory-backend-ram,id=nvm0,size=128M"
+                  " -device pc-dimm,id=dimm0,memdev=ram0,node=1"
+                  " -device nvdimm,id=dimm1,memdev=nvm0,node=2",
+                  &data);
+    free_test_data(&data);
+}
+
+static void test_acpi_q35_tcg_dimm_pxm(void)
+{
+    test_acpi_tcg_dimm_pxm(MACHINE_Q35);
+}
+
+static void test_acpi_piix4_tcg_dimm_pxm(void)
+{
+    test_acpi_tcg_dimm_pxm(MACHINE_PC);
+}
+
 int main(int argc, char *argv[])
 {
     const char *arch = qtest_get_arch();
@@ -847,6 +883,8 @@ int main(int argc, char *argv[])
         qtest_add_func("acpi/q35/memhp", test_acpi_q35_tcg_memhp);
         qtest_add_func("acpi/piix4/numamem", test_acpi_piix4_tcg_numamem);
         qtest_add_func("acpi/q35/numamem", test_acpi_q35_tcg_numamem);
+        qtest_add_func("acpi/piix4/dimmpxm", test_acpi_piix4_tcg_dimm_pxm);
+        qtest_add_func("acpi/q35/dimmpxm", test_acpi_q35_tcg_dimm_pxm);
     }
     ret = g_test_run();
     boot_sector_cleanup(disk);
-- 
MST

^ permalink raw reply related	[flat|nested] 69+ messages in thread

* [Qemu-devel] [PULL v2 20/50] test/acpi-test-data: add ACPI tables for dimmpxm test
  2018-03-20  3:16 [Qemu-devel] [PULL v2 00/50] virtio, vhost, pci, pc: features, cleanups Michael S. Tsirkin
                   ` (18 preceding siblings ...)
  2018-03-20  3:17 ` [Qemu-devel] [PULL v2 19/50] tests/bios-tables-test: add test cases for DIMM proximity Michael S. Tsirkin
@ 2018-03-20  3:17 ` Michael S. Tsirkin
  2018-03-20  3:17 ` [Qemu-devel] [PULL v2 21/50] Makefile: add target to print generated files Michael S. Tsirkin
                   ` (30 subsequent siblings)
  50 siblings, 0 replies; 69+ messages in thread
From: Michael S. Tsirkin @ 2018-03-20  3:17 UTC (permalink / raw)
  To: qemu-devel; +Cc: Peter Maydell, Haozhong Zhang

From: Haozhong Zhang <haozhong.zhang@intel.com>

Reviewers can use ACPI tables in this patch to run
test_acpi_{piix4,q35}_tcg_dimm_pxm cases.

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 tests/acpi-test-data/pc/APIC.dimmpxm  | Bin 0 -> 144 bytes
 tests/acpi-test-data/pc/DSDT.dimmpxm  | Bin 0 -> 6803 bytes
 tests/acpi-test-data/pc/NFIT.dimmpxm  | Bin 0 -> 224 bytes
 tests/acpi-test-data/pc/SRAT.dimmpxm  | Bin 0 -> 472 bytes
 tests/acpi-test-data/pc/SSDT.dimmpxm  | Bin 0 -> 685 bytes
 tests/acpi-test-data/q35/APIC.dimmpxm | Bin 0 -> 144 bytes
 tests/acpi-test-data/q35/DSDT.dimmpxm | Bin 0 -> 9487 bytes
 tests/acpi-test-data/q35/NFIT.dimmpxm | Bin 0 -> 224 bytes
 tests/acpi-test-data/q35/SRAT.dimmpxm | Bin 0 -> 472 bytes
 tests/acpi-test-data/q35/SSDT.dimmpxm | Bin 0 -> 685 bytes
 10 files changed, 0 insertions(+), 0 deletions(-)
 create mode 100644 tests/acpi-test-data/pc/APIC.dimmpxm
 create mode 100644 tests/acpi-test-data/pc/DSDT.dimmpxm
 create mode 100644 tests/acpi-test-data/pc/NFIT.dimmpxm
 create mode 100644 tests/acpi-test-data/pc/SRAT.dimmpxm
 create mode 100644 tests/acpi-test-data/pc/SSDT.dimmpxm
 create mode 100644 tests/acpi-test-data/q35/APIC.dimmpxm
 create mode 100644 tests/acpi-test-data/q35/DSDT.dimmpxm
 create mode 100644 tests/acpi-test-data/q35/NFIT.dimmpxm
 create mode 100644 tests/acpi-test-data/q35/SRAT.dimmpxm
 create mode 100644 tests/acpi-test-data/q35/SSDT.dimmpxm

diff --git a/tests/acpi-test-data/pc/APIC.dimmpxm b/tests/acpi-test-data/pc/APIC.dimmpxm
new file mode 100644
index 0000000000000000000000000000000000000000..427bb08248e6a029c1c988f74f5e48f93ee4ebe0
GIT binary patch
literal 144
zcmZ<^@N}NQz`($`&dJ}|BUr&HBEZ=ZD8>jB1F=Cg1H*%VV44G{4#eePWQ5R6Oc0ux
t8ALPkfFuw61CdNzKn!AlSgfo-nis_4<b<)ffC?aD+}vOm3)_F75dauy4FLcE

literal 0
HcmV?d00001

diff --git a/tests/acpi-test-data/pc/DSDT.dimmpxm b/tests/acpi-test-data/pc/DSDT.dimmpxm
new file mode 100644
index 0000000000000000000000000000000000000000..38661cb13ee348718ab45bfc69452cd642cf9bb9
GIT binary patch
literal 6803
zcmcgxUvJyi6~C9H9O_E4DVt54IBf){f7Y%|^v88tY$z;|vZYv*8BxwMFc>Mv!Q`St
z2satx2E`N=aMQdMT8a(WgA(XD`3T!b=-Xbquh3zwpX!}M^3tko0`0>lAoM-={O<3Z
zd+#~tUNX9+w+H74q5rIGXf-QWxnXKL^ie_xw(+l0mu?cfr&rb-ni3>xKTP>;cvNKd
zZN0L&M*q@vzjEEXpS!f<k*#OSldX>T*&}z0An4wf#~3!0MaQZ*c7MUt>Ur6z)%A4w
zYbMH0S#J^9*{thSG2{SKm1}0T%|R4EpawT;X4@cXLcabXMI`&7g7Vz;YE#ddE#1kf
z%Z}A96Ayb_54$>_xJ+?}=`RN^8Mvv#!x0%ye>v!oKX=mPU;jyr$cW9zj@GiWSvI|&
zgc$=lkqFr%%4?U<8+6z1pEYk3O1`gYkx+2OER-~XutQ}fF$UA5x>a@p94sv2mhzgw
zTn6sG{<}-af+Gp3T_&d*X0=JldqmXA*bub}{86+Emql(E+3fy1t+ocF!IGt5vC!Xn
z_R<>lh({D*k<c}|OLmDcwMXp!mGz1q!9`I``l}L6)F0>)mrxkBO}63hq5$)?`)?Q<
zo6*3TxygYtODqxvfn|AB*P=~46?(M5=GW-A;<qA^*68=*_pnflE95Q7=Ps-^%rf8X
zRlPo&QwU424VI_u1ryk7@NbqautW9{`;N^pS$0<SYW56MF$~9l0tj~hgNE6Te3ghg
zA8b7?sXLs4?H-Y*QZ#3UW!C6@@xCa_i#^|;s-$fg1-_^W8blzc!3~L{IS>y-A}=aF
z%_`CqCuo=u@xYG8@(2e4@W{ZU)a0We>Y19=rYZ8A$q?cwXb^*&Ie6$fyJuCeLMqD7
zF``l^Xq9~RDkv&VqeW9npg=*ZG&hZ$O)Cv;ov5#wTJ@@6UqtEf(Cebo+oL-Khud#p
z_lPJ>NTG_OtTPOIRcDUbY7=i(=(!&0JgX$1bXd>(q{9TV<vrN#Y!N1sTSH}V3qVOo
zn?`e8C)>kU2e;@jJLoVe>bA)C(@Z3l0hArwVnWnY346q-M<d(br+ZsWA&|J_(KaF8
zgAUkxv`vY({mAW*d!3PMKYaDFh~8qZV75`SbuBN$qkxYhK1#AWSHA|UI!r!by)Gnu
za>P>Va{ZdtN&=vj&rY+{7gHqZ2iqQbjD0Kt&-yU+qziNIVta($cE527rU}6uBxD*2
z2$m-q*2>DBW^-Rmlcbn{C}r?31^@PlIuqm|I)Uz~Sx2v1<Wp?3p5HTrSxrXhk}sce
ztd>O|cLOh=nzicKA6l<WScmxi$<_;F)(gJ%0{Ay>y`Y|Ky<qnZoVzw*oe4j~d~wWt
z(Klb5Fkg%@UkuIbljh-o_{p;`jhQd`=1UXiOEKn4c7H>H*;|w5rNBH7Av{H3%X!&N
zsZ;)DtEpg((N*Ze-Brr<;K0+^&7-1kwyKc{HsuFbMl&~awL5pckM8|Gw|o2JJNI^P
z-Ts)^R5KgvVfoE4wnoVd@`9$JMnh2fRpbRr+Yc|IP$oGI4;-!Dw5ZlHu2U!oc}gTr
zGju35vj`G3tJ=r`QZKi2YTBtc>#|3%)9&jZ-1Msy_!9V+DQjjupD*OmgWx<*s7qRV
z^|!*14Z37s%jfeu*reDj><m+bE)%d_4B8^LOFL?93d&qLX<xi~S>k6OE(1;vi`8T|
z_~Q|Gcy2JMzzKS6#<kx#rfIB?T-8Y6q<@|vI!Of#C-SScY0c7_a@;hi`>dQAxNvGi
zHJv1dO&G38$0p=&7Odzb9QKEL$2<eHN8P9E(y(@%%HQ&nshD`23*f+Y@<ffKwqdo?
zaT7gpzm7QzGxl~)n3u<d$zFthIN1qHaH4niXX_G19;6}sAc!Hf<PioK#HmqkXH!fj
zGG5O>n{qG-Fer#R?ZBi`I5X1S`4E!&hCxEILU0QiCxen3kx@E9M#sn~C!_Xf8|YaP
zuP|UcbL1IT(1=YCe71Dt8eAx5BHx(6`IrzAmc-+PP!l6UQJf?c#|g!VP*fKn$JflN
zQ_UA4ME#2><~znURNur{nKEi-P>3^T)6AFi%dom|4rYwof4H-|m+Ky@R>8eBC{qj$
z$9XJMH4|?wgt2+MUonCL1I)n*Gr_Fa4I{UG`;R+V`(#6JwwP!?yfhXU=o2!EMyt}u
z!J`I`2DTc|GH*CJ`{COs;LBu%8CA=n2IiZAnPT8Q3oaKYphW|Vq_I)G4i8JqsdN76
zK1>eKC%AdS<-(?hf9)zy8L)KZNC(MpUqs#E;j>>qadCv_BH?gu5Lky4g&N_z^7Qmj
z9R%)RFQzF<yDlasl8mVOj)9eajMn>Wm-vLKfj_699{OlkEzcWp(nA3ZT;N#QXe}>g
zODzBRmxRD8&cyGB!{EoeL-7!9N;r^BgXfK)ISb8N(0sj7<-Kn~GweNWly8LCbI`1L
zxJ@_+8x16aMx%xUu+c!0UF^mNjzz&&<LIIK1p83Gi!${)vwkPN8}(qTfx5;}KQ+K^
z2%uJ}n7zQxe|~0s$~aY=CQi|xUa80!&^s_EXime}uz4CBay+z*Y7fa#>7k#f3U>&H
zMa|_U^;L*NgHCRMhtbJ5)m481fz_6dfp#$Hybm$z0!noe;xrG}`X6s6gb9Ri0D)V}
zlws!Kbq%vKe;*2Cbb;yds}BYR>OsajIl`C<WvqfD^x2nZu?~qNbPr{y9IE2isS!Hj
zcZCMw<tj8-`2QG$*RarF@qcU(wg=KNyyW28hL;>@Mwc9D7k9?WX^EFptZrBn6m1GC
zt-lxHO$utL`aYE20>P6t`U()dvW<0^TGK!JeuRK3RLw7u{h)?6uj?(=Q#3UY@dYWU
zzblOW?NNpZ@EHZ893Su<;2?dFrZV)?Ao;%s^+7t4Gk#9|dpuJZWC*_8;7=gFCO9Jr
zkq;05{zIVo$9KmI3Z8zDTz?8q!2k!(9=7o7f+oST1YZ{>4gl;QyZ*x0qIQ|3#?F;*
zUVQU{N=R~5GHYflSll0<Pp}jxT&0h5k>}(&RpI6Q&6^NTR>_&2lJJ;^_L4IzOYp%0
zW&qN=s6pqIUmY#B-M_T@8*Mdk;9Aw#gbN9V<)p_-LP_9-U8#CE{a0_}j=xtGI1ykP
zDJ!3c&m8!T%<jKs6+!`aM<6P&6?8amuu<fV5;pAqn^xgH7-(%a!*6UbL?r60pR4#O
z01coTp+%%cD7?vlDS*Nkl^7^zO>k%;4qD=@@D3BOPQ^kw>nRRB6$f47tnh^>V3lK`
zoHfaz$vEh{oRx7Xi-mGl2)w|7R2+1fv!*#T9Sh~G84k_FL9cMuX%3x^g>qK-MiKZr
z69;{dvnm{_#6mf1mP4~~(Dym(EQij<LOJUkht9=8%bXP=KbV#p3+1dYap+5N&<bZg
z&7r4bp`7&$hn|UpYMgbRL+4|mob@b+o{fX*ob?=so{NQY)&&k-h=W!+>v;}69}DHI
zFLUV2anM!H`U;1>5)0+57dZ67Nl>-6pkHscg<<+Z7vAOevDWSQf&v0mvp~q9z%?r%
zKt&2PJrq!rdC*P4i{QQmWhImZlp>u35)_9}hqm60bZ87xJk+7J<w%F-AW=gd+Pi|f
znFn2{u5gO%Taa*EhI0huBR!lZ`xc}Q*Tc&y$VYkwWk>?9&-wC^eg$O+XRh!0@{!)3
zF$5;p!(p;-X=8evHH28C$9d~xdYm}~OQg@CJf>d(8MZyr_fQ7CdI7I4@Shg=^%=Ji
M&bmz+HgqQb7xYSyTL1t6

literal 0
HcmV?d00001

diff --git a/tests/acpi-test-data/pc/NFIT.dimmpxm b/tests/acpi-test-data/pc/NFIT.dimmpxm
new file mode 100644
index 0000000000000000000000000000000000000000..2bfc6c51f31c25a052803c494c933d4948fc0106
GIT binary patch
literal 224
zcmeZs^9*^wz`(%h@8s|75v<@85#a0x6k`O6f!H7#0xTF<7?{CKCLmdP`9s?0EhP?X
zoOz8Uw)fly3UNTya)1<ZG=NB;xeNvjAoU=?!oUim!15plDuC!_VF&=KYHMHw>O=<N
OCPEC15bKeJ39<o`))N5$

literal 0
HcmV?d00001

diff --git a/tests/acpi-test-data/pc/SRAT.dimmpxm b/tests/acpi-test-data/pc/SRAT.dimmpxm
new file mode 100644
index 0000000000000000000000000000000000000000..3b10a607d5bba6cebb97d9174c2a54a577bba9a8
GIT binary patch
literal 472
zcmWFzatyh_$iTq3-O1nCBUr&HBEUHqC<YW_0I@+d2*ZH@I-ijdRi23nmCwwK%xBbq
zn*?QW!3D6Z16l|MAK=n(22h+)1I}ZDDumG}?q<}03$x%?#|)KbV8gEtrVKxg<UW{t
kIAA*9HUR~Y+{Xd+5nLTROaoXQT$cb;-3ypBTm~or07$zG0RR91

literal 0
HcmV?d00001

diff --git a/tests/acpi-test-data/pc/SSDT.dimmpxm b/tests/acpi-test-data/pc/SSDT.dimmpxm
new file mode 100644
index 0000000000000000000000000000000000000000..8ba0e67cb72daa81a65da4906d37a5e0f4af1fd4
GIT binary patch
literal 685
zcmZXSKZw(C6vtnha!u17ByGVzlq1`X<~j)uk|vD}G-*lFBIF?dq}OXZ{P1oO5!yPO
zo*?wHiAX9L6?ehS)yc`t;lSNRa8Q3QhlA(x-Y@U_^4{nB<L5Y<`?dhU4BCCQ>qyo}
zGfb0y13>%kK*cOryZgS=_PteSm+Cg>cMWY@Q3r-B@3o-Ot68ej+a_kmRL0)I8W?@1
za+T+c^lU38j4L2`%L>+6%he6ZTQ*T(yIQX!*`1Li=|fAEbj7~2_*wFnwOqA(9ZTwK
zio5t#O0Oq#AYz>tvTwqT^{aF7>F3(5<j4N|U~@CwN#<2V&KthJe0Fd1sivNOF+RR)
zeTah1mAo#$DXZf4xwu|)AXQ(CgY?>?WDIA?B!IM>Od%6lCJzjmBN;hFG%`iDwD~Z3
zKI4nY$&9XfG6RUn<0vLEGLtd7I!0c;7?KAe&qC<c5go#Qd#E3Y1;Bie9W(@AbIf9f
zH#Rw(&VaKWSAm9EvUS5PbA8=$flM$F>_N+y9WhL8i@}cEbZ{B~9Wf*ra9CPBOY%xa
z*OHSUJPs*V$|WInR{*ab@KVl5iS!IZ<F-$ibA+l9f%vt|5TuC%{5usBoLT|quf7q|
zEgTlzkHh#V3L<aSv_`Vb`HE&UmmM<RYKN+OxylzB;=dQb7cTVHh0gw`vmCywDt!H2
F`U7Br!vO#Q

literal 0
HcmV?d00001

diff --git a/tests/acpi-test-data/q35/APIC.dimmpxm b/tests/acpi-test-data/q35/APIC.dimmpxm
new file mode 100644
index 0000000000000000000000000000000000000000..427bb08248e6a029c1c988f74f5e48f93ee4ebe0
GIT binary patch
literal 144
zcmZ<^@N}NQz`($`&dJ}|BUr&HBEZ=ZD8>jB1F=Cg1H*%VV44G{4#eePWQ5R6Oc0ux
t8ALPkfFuw61CdNzKn!AlSgfo-nis_4<b<)ffC?aD+}vOm3)_F75dauy4FLcE

literal 0
HcmV?d00001

diff --git a/tests/acpi-test-data/q35/DSDT.dimmpxm b/tests/acpi-test-data/q35/DSDT.dimmpxm
new file mode 100644
index 0000000000000000000000000000000000000000..14904e8ea2376abd989aa9e99f5bf388a3b85032
GIT binary patch
literal 9487
zcmcgyTW=f38J#65YBi*!rL>miOJc%Fn>5XZl3cgwB`~>5k&L(!P12478sOTETPbOg
zh2j)Rg8-5O<l=_{36r1&`alQ#AI(q5TLbi|uYE0w_$li7W_IYAB?ZI}tsYkM&7Sjp
zb7p7fESK~<es}Q)j9DL6cD#D0Sh>;ieDqn2F>2F)r;)kIdIx@`*0*x0jMY2Li8c-u
z+kMurT&r1s-VMJ9!@D1b)~$%${?hsU_O0mskHXuGKyTfSIH!iQ#rvH~zjx&Eme(lR
z{d&XCm%rw=-S=cGZTHK5o7w$q4c~H`v;Ccpm$~;k^Zb|BhTAiP-NG_=dci+7zP)h%
z$`>!+Er0c^zyIdm>pTO%I{w@EzY)<Pd^hA5!lCo&V9$Dw=;GYkaQV}LI4%1eIunJ|
zb3ZS;DAn5NbtkapSgUruRqd*=S{Nb5hWV(sx&g~G_Vsr;mgyZj6fa+|Zu`Yn+wT|M
z%l%Hb%p&RuBkH>K$B;YWF#pLQWP>nwd^}>qg--^z*k`x$?4SRc8L<&x#7wk1g#usr
zU=CA{<SfJ3JIuK<`#cSYrVZ1VZFI_gF;y7A1q`OuXkSh7kmq{`O<chgKj5j};TdW(
zZzm|HnT3B9{A;$leoCOKDSkw#Q$SUjSFUbbUIm9{3kT=Q=@-(fB30`hJ=gAfC@Qa6
z4%diAy2?-0tcXR-NnK=C&Dv-CER}tS^^ShJRq@C0blRO0^D&kvh#?D^1=g$VRs6r0
zg3{j6Q^C%)F>!G@NwoRQMdlFjMQ9lVGt6C;Gfv|Vhgr<>h~3YO3p#whX1$$$J8jz+
z4@9AA26M8e6wO%is*BUmgq)P-LHa?O*%uv=W|PvaC(`hg8V=3riFr~FQpjT?kKJl`
zo6ODk@!yJeDRTbwiba>e@%|-lmsY#mTH1Q^U@4cu>S4|8ttyjk3++v|l&5~4LQqpd
z8bO^c&1dq*$GDj#E{=!=;DS6Scz~*qn8+`%DZvCbj)<wCDJGbJ3MN8gVl)Dq7ECn)
zT7M#|fTkrAP!$qWRl&NKSdk|qm<pPaOf`6fy3QFx=Zs(~=!|5l!6VdlW(}QL!Bo(!
zsS}~DbJox~Yv`Ocbt2St+J;Ws&}o}G5$ZazLufNSXXu<Wbt2StaziIKbaGQCLS1Lh
z(3vxI=1iRkb)DF$wCkBSbmmQ+2z8zFhR%6I=e(&Cp{{em(79mfTrhPa)O9WzIu{L{
zi>6M5x=zQ?=@>d4Qzt@QXTi`}Fmx77od|WEONP!RL+6sI6QQni+0eOc=v+2+BGh%d
zhECVe>6$tb>N-yuI!_roPnkLq>N-~pohydU6;mfdUFT^-=V?RdX;UXcUFWKybJftf
zYU)I&>pUZv<#;DNBbb$VvpXZ1-oucXevCb9FwYvyvnCUv&OB!@&l${fCKI8~JZ~`1
z8_e@26QRz0%wRrdFds9S2zBP;f~j2Qalur3{Ns|THg1H(R8DfisCmJtdBLoSP}a0z
zpk>WKYn{f}K+#6w*gz4WtPIi!R8bhH#0g8X@Vp~{$}v+CszPEaX)sWURR*d-$v`Dk
zFv&m>Vl>J?B{q(zrt(@cPzfCyC_)TfU}2yV8%M-clS>9Fp@K;Uicsl<fl91k!ay}B
z8K{H`CK)I~r4t4!v4RN$)u3dc5-ONvpa_*t7^uVwCJa=Al7UL7V3L6%R61dx5-XT6
zPz_23Dxrc&28vMWgn>$|V8TE(C>f}P3MLsSLZuT1DzSnI1J$5ppb{#WWS|I@P8g`f
z3MLFxgOY(ts9=(TB2+qIpb{&XFi;Ij1}dR~Nd}5g>4bqwtYE@GH7FUVgbF4ZC_<$Z
z1}d?F2?N!jWS|l%m}H;`l};F_#0n-1RD+U%N~mCxfg)5oVW1K#m@rTcN(L&Sf=LF7
zQ0at$N~~bQKs6{CsDuh887M-f69y`=f(ZlFpk$yDDwt%T2$fD4sKg2;3{-=Xfl8=g
zl7S*rI$@v^E0{1)4N3+op@K;Uicsl<fl91k!ay}B8K{H`CK)I~r4t4!v4RN$)u3dc
z5-ONvpa_*t7^uVwCJa=Al7UL7V3L6%R61dx5-XT6Pz_23Dxrc&28vMWgn=T`3>1-W
zpa^vXMW`95#)N@tOfpc7Nd~GhVW1il2C6a1Ks6>AsK$hWYD^fY#v}vPm}H<D69%d=
zVW1k53{+#1foe<`C?a)rVW5a`#l&Qifg;3ZP$4nZ+`>Q+skw!LB2sfp28vM4Eg2|6
zpSfrIuuwM455$Mn5q%)NpQV50-r>(*NYkfOdRIViBdk{YY8j4uwL%Av4!+IsZscl}
z+M9H!(V=d;%Z;m@t~H!{mmlonCCJ=}=iEz;t6qB!fOe^{z;hpG*&Mx$!YAc>)W>IV
zY(($w@<m2145E!UBh6VnBb3!=@jPCq(90Sd!|oJT`0~dKY%UFVW7xkLc4B067v*~i
z8UjXBb_;#K(P8gVn;(dtMS9c0Ml>K1pYC_s9qZn1JO(OH{c-h5qB>qHeDCTNt$IaN
zuV8$7^@=;adZl+1kajOoou?k5yr-4-M0qb!-aDbZ7nk>w<?*`gd#m@g^1dkVC(8RL
zl=tKE)nxh1Bb2Xd<*TB6HBr8LLiuW3eluBq<`K$oYUMXY`OQT6%@fLR#^r0t^4Uiy
zU(?FhMEP2xeC>qtwYYpeS$_5r%Gb5>by2>aC|^IJd>!R8w3Q{xJF>jmJn&dSui3;~
zOS8|-id!E4=)R8AC2wP1Fw@~#V<MgGo0@LozFacX;Q?eKo$Q;MZsI;$Hq+s$WFnpH
zo0@LozH!ZTcvP84C;O(Ro45~7nd$KSGLcU9O-(nku2;--c&M33C;O(Rn^>o(&2)Iu
znMf!5%ydq<A-^H;_|1p!%3JMfN3BGT8xa08@3pQK>u=nRe)raIi~D!peB-VCU*CC?
z^(?Q|+=^G^Jm)p*oBTI<E00&(@EsUp?wdSb=pQ~oL75djdN1!awmX*B4z62X^`(-<
zGM*RscJFYBKxwa2Z5Dy_%iSVgB{#MkcEqT&cjUi#Q+$JfvB>ewm%Ub_R9Y+12;{df
zGwvm?yZK6L$OpUZ1Fuvn?NQUB=GxvcH`wJVYOl+7n%WzCZawQ^fOEYkZoZIFUm4Ie
zki6sLXgAr7@kA^fo*g7lkidv9hP|gI&NR11p&QJQ=tnDLB~u~8)ckIJ!RvV2_DM3V
z@XK;;5aP>@Fk;E1xCp~qW<rp-SeVfy9M{K+CPoHyWQF&)GQ;|0DsjoDr{d|!T!16j
zbfLkVx6|q7PZB!ydQA)#ile<Lx+0!R_7e29WGALz>gam!Z)IrZ;U>f%7BQBVJ)#AM
zbsChrg&emTKJwGUpquN{O+br+xeNHNu$v=gY<KT9+|oA!7AIR1PmV@1Je7@;Sym#M
zr6sdeGOPFg4vlOb?{cX2{KJvaL)JK3DkZYz$3w$cL5t(D@s!Gkuij+B4SYwEOo+#(
zal&*unJ}#<Opg<$-E+9dYprU(T3W{v72n^{{Z8u|wl<?}Q$f8YJ!HPM-71w!modAv
zRRwo`|Hl41#odYBN*brwrttL4*(z;Q^;Xcf@S1|J_Kqrn++dJdnq4BZ-w1-@t9Sme
zM|+<g_}gus?j1g_tqU3xWlRU{O0$SV9fu~hnrO9d^oza2XQ$xH><SO6oxT;&{T9%j
z5>#L6<BRGir=ZgTZIZ!G^*Rn6ujnp|4*<it0gVaY8RFAN&kCB}7IVwl08_^vp2N-G
zdxFj#AAkAF&qHEFAtf9x4`g-RqSzzEQ0}h`-GLkq_0nxh(SArbCM6kJ?sEa{^jg~J
zFS|@6%ngJlxgQF2=$2Old>Ft6D2Jf1cF?|jsj$II_kPS+UM?rGrsUzfWBI4L;_w(B
zrHIHEF!H8sE}>bM%^x<as`nW*<KC5^d?Uu5MYA5!Wg-$anv@8QW}OJsXi}1W>ZS&9
zDG;ZLJ#;^(K03cou_ZbSqSLxN?x9wb&NYYr?0~wlfZ7}Dg$w-rhp^^{l1VDbB#A=a
z#yaOn--jkN7r>31D;Q8eqRMnSqU3ZkqH)rx_zDqi+`N3reG=<1u#y)bqSdJ`YUn<V
zh1JbWfc9w7MX%ASlPTScSf{1b1W+tr3F#f}WP?@-zm=f7v-CDLM)h=&KTuD|v?%C2
zV%jH1w5H;icEJ&iS;R}U52Z&mJc_Y%xQ^JTMl=x76&J+U;&H*X|4%`DfgTrJ{~s5`
z<-zS3KjhHd#t%7YjvsQ+Uf<KU(+#zq((Xokg7Y+OwBl044^r4dv6WOym)Hk#Biz33
zxA}onTa=;v#S5-f;R@$}{b-)0@IsfCDjo0}hzI!xdCNw>C71KJIQ<|$j_1X~{<pr|
zIhHxS#Gt<v6KAB+G4|E*@!!6aXTExIa#tV=qgzD$#zexz5x$B?U;Mx%199^AIf-4I
z`n6|R{6?l*<~hwAirHe;!p}3DAKk=t>}>RxvKjJMv@i9S><mwFR7dm^Izv^RUp$<k
z-qG`$FMIOy!~E51a1Boav>uXGoEc|^o|K*K%uVqViOr1sX-uwDD&aRF^dq9)(KDUW
d&t!nPv=~0rbk=<A@YGmWjSbPrw!G~w`!6ZL^LYRO

literal 0
HcmV?d00001

diff --git a/tests/acpi-test-data/q35/NFIT.dimmpxm b/tests/acpi-test-data/q35/NFIT.dimmpxm
new file mode 100644
index 0000000000000000000000000000000000000000..2bfc6c51f31c25a052803c494c933d4948fc0106
GIT binary patch
literal 224
zcmeZs^9*^wz`(%h@8s|75v<@85#a0x6k`O6f!H7#0xTF<7?{CKCLmdP`9s?0EhP?X
zoOz8Uw)fly3UNTya)1<ZG=NB;xeNvjAoU=?!oUim!15plDuC!_VF&=KYHMHw>O=<N
OCPEC15bKeJ39<o`))N5$

literal 0
HcmV?d00001

diff --git a/tests/acpi-test-data/q35/SRAT.dimmpxm b/tests/acpi-test-data/q35/SRAT.dimmpxm
new file mode 100644
index 0000000000000000000000000000000000000000..3b10a607d5bba6cebb97d9174c2a54a577bba9a8
GIT binary patch
literal 472
zcmWFzatyh_$iTq3-O1nCBUr&HBEUHqC<YW_0I@+d2*ZH@I-ijdRi23nmCwwK%xBbq
zn*?QW!3D6Z16l|MAK=n(22h+)1I}ZDDumG}?q<}03$x%?#|)KbV8gEtrVKxg<UW{t
kIAA*9HUR~Y+{Xd+5nLTROaoXQT$cb;-3ypBTm~or07$zG0RR91

literal 0
HcmV?d00001

diff --git a/tests/acpi-test-data/q35/SSDT.dimmpxm b/tests/acpi-test-data/q35/SSDT.dimmpxm
new file mode 100644
index 0000000000000000000000000000000000000000..8ba0e67cb72daa81a65da4906d37a5e0f4af1fd4
GIT binary patch
literal 685
zcmZXSKZw(C6vtnha!u17ByGVzlq1`X<~j)uk|vD}G-*lFBIF?dq}OXZ{P1oO5!yPO
zo*?wHiAX9L6?ehS)yc`t;lSNRa8Q3QhlA(x-Y@U_^4{nB<L5Y<`?dhU4BCCQ>qyo}
zGfb0y13>%kK*cOryZgS=_PteSm+Cg>cMWY@Q3r-B@3o-Ot68ej+a_kmRL0)I8W?@1
za+T+c^lU38j4L2`%L>+6%he6ZTQ*T(yIQX!*`1Li=|fAEbj7~2_*wFnwOqA(9ZTwK
zio5t#O0Oq#AYz>tvTwqT^{aF7>F3(5<j4N|U~@CwN#<2V&KthJe0Fd1sivNOF+RR)
zeTah1mAo#$DXZf4xwu|)AXQ(CgY?>?WDIA?B!IM>Od%6lCJzjmBN;hFG%`iDwD~Z3
zKI4nY$&9XfG6RUn<0vLEGLtd7I!0c;7?KAe&qC<c5go#Qd#E3Y1;Bie9W(@AbIf9f
zH#Rw(&VaKWSAm9EvUS5PbA8=$flM$F>_N+y9WhL8i@}cEbZ{B~9Wf*ra9CPBOY%xa
z*OHSUJPs*V$|WInR{*ab@KVl5iS!IZ<F-$ibA+l9f%vt|5TuC%{5usBoLT|quf7q|
zEgTlzkHh#V3L<aSv_`Vb`HE&UmmM<RYKN+OxylzB;=dQb7cTVHh0gw`vmCywDt!H2
F`U7Br!vO#Q

literal 0
HcmV?d00001

-- 
MST

^ permalink raw reply	[flat|nested] 69+ messages in thread

* [Qemu-devel] [PULL v2 21/50] Makefile: add target to print generated files
  2018-03-20  3:16 [Qemu-devel] [PULL v2 00/50] virtio, vhost, pci, pc: features, cleanups Michael S. Tsirkin
                   ` (19 preceding siblings ...)
  2018-03-20  3:17 ` [Qemu-devel] [PULL v2 20/50] test/acpi-test-data: add ACPI tables for dimmpxm test Michael S. Tsirkin
@ 2018-03-20  3:17 ` Michael S. Tsirkin
  2018-04-13  7:21   ` Markus Armbruster
  2018-03-20  3:17 ` [Qemu-devel] [PULL v2 22/50] migrate: Update ram_block_discard_range for shared Michael S. Tsirkin
                   ` (29 subsequent siblings)
  50 siblings, 1 reply; 69+ messages in thread
From: Michael S. Tsirkin @ 2018-03-20  3:17 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Maydell, Marc-André Lureau, Daniel P. Berrangé,
	Markus Armbruster, Eric Blake, Gerd Hoffmann

This is helpful for automatic code analysis.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 Makefile | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/Makefile b/Makefile
index 677a54b..f799390 100644
--- a/Makefile
+++ b/Makefile
@@ -1045,6 +1045,9 @@ endif
 include $(SRC_PATH)/tests/docker/Makefile.include
 include $(SRC_PATH)/tests/vm/Makefile.include
 
+printgen:
+	@echo $(GENERATED_FILES)
+
 .PHONY: help
 help:
 	@echo  'Generic targets:'
-- 
MST

^ permalink raw reply related	[flat|nested] 69+ messages in thread

* [Qemu-devel] [PULL v2 22/50] migrate: Update ram_block_discard_range for shared
  2018-03-20  3:16 [Qemu-devel] [PULL v2 00/50] virtio, vhost, pci, pc: features, cleanups Michael S. Tsirkin
                   ` (20 preceding siblings ...)
  2018-03-20  3:17 ` [Qemu-devel] [PULL v2 21/50] Makefile: add target to print generated files Michael S. Tsirkin
@ 2018-03-20  3:17 ` Michael S. Tsirkin
  2018-03-20  3:17 ` [Qemu-devel] [PULL v2 23/50] qemu_ram_block_host_offset Michael S. Tsirkin
                   ` (28 subsequent siblings)
  50 siblings, 0 replies; 69+ messages in thread
From: Michael S. Tsirkin @ 2018-03-20  3:17 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Maydell, Dr. David Alan Gilbert, Peter Xu, Paolo Bonzini,
	Peter Crosthwaite, Richard Henderson

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

The choice of call to discard a block is getting more complicated
for other cases.   We use fallocate PUNCH_HOLE in any file cases;
it works for both hugepage and for tmpfs.
We use the DONTNEED for non-hugepage cases either where they're
anonymous or where they're private.

Care should be taken when trying other backing files.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 exec.c       | 60 ++++++++++++++++++++++++++++++++++++++++++++++--------------
 trace-events |  3 ++-
 2 files changed, 48 insertions(+), 15 deletions(-)

diff --git a/exec.c b/exec.c
index a9181e6..34fdfd9 100644
--- a/exec.c
+++ b/exec.c
@@ -3721,6 +3721,7 @@ int ram_block_discard_range(RAMBlock *rb, uint64_t start, size_t length)
     }
 
     if ((start + length) <= rb->used_length) {
+        bool need_madvise, need_fallocate;
         uint8_t *host_endaddr = host_startaddr + length;
         if ((uintptr_t)host_endaddr & (rb->page_size - 1)) {
             error_report("ram_block_discard_range: Unaligned end address: %p",
@@ -3730,29 +3731,60 @@ int ram_block_discard_range(RAMBlock *rb, uint64_t start, size_t length)
 
         errno = ENOTSUP; /* If we are missing MADVISE etc */
 
-        if (rb->page_size == qemu_host_page_size) {
-#if defined(CONFIG_MADVISE)
-            /* Note: We need the madvise MADV_DONTNEED behaviour of definitely
-             * freeing the page.
-             */
-            ret = madvise(host_startaddr, length, MADV_DONTNEED);
-#endif
-        } else {
-            /* Huge page case  - unfortunately it can't do DONTNEED, but
-             * it can do the equivalent by FALLOC_FL_PUNCH_HOLE in the
-             * huge page file.
+        /* The logic here is messy;
+         *    madvise DONTNEED fails for hugepages
+         *    fallocate works on hugepages and shmem
+         */
+        need_madvise = (rb->page_size == qemu_host_page_size);
+        need_fallocate = rb->fd != -1;
+        if (need_fallocate) {
+            /* For a file, this causes the area of the file to be zero'd
+             * if read, and for hugetlbfs also causes it to be unmapped
+             * so a userfault will trigger.
              */
 #ifdef CONFIG_FALLOCATE_PUNCH_HOLE
             ret = fallocate(rb->fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
                             start, length);
+            if (ret) {
+                ret = -errno;
+                error_report("ram_block_discard_range: Failed to fallocate "
+                             "%s:%" PRIx64 " +%zx (%d)",
+                             rb->idstr, start, length, ret);
+                goto err;
+            }
+#else
+            ret = -ENOSYS;
+            error_report("ram_block_discard_range: fallocate not available/file"
+                         "%s:%" PRIx64 " +%zx (%d)",
+                         rb->idstr, start, length, ret);
+            goto err;
 #endif
         }
-        if (ret) {
-            ret = -errno;
-            error_report("ram_block_discard_range: Failed to discard range "
+        if (need_madvise) {
+            /* For normal RAM this causes it to be unmapped,
+             * for shared memory it causes the local mapping to disappear
+             * and to fall back on the file contents (which we just
+             * fallocate'd away).
+             */
+#if defined(CONFIG_MADVISE)
+            ret =  madvise(host_startaddr, length, MADV_DONTNEED);
+            if (ret) {
+                ret = -errno;
+                error_report("ram_block_discard_range: Failed to discard range "
+                             "%s:%" PRIx64 " +%zx (%d)",
+                             rb->idstr, start, length, ret);
+                goto err;
+            }
+#else
+            ret = -ENOSYS;
+            error_report("ram_block_discard_range: MADVISE not available"
                          "%s:%" PRIx64 " +%zx (%d)",
                          rb->idstr, start, length, ret);
+            goto err;
+#endif
         }
+        trace_ram_block_discard_range(rb->idstr, host_startaddr, length,
+                                      need_madvise, need_fallocate, ret);
     } else {
         error_report("ram_block_discard_range: Overrun block '%s' (%" PRIu64
                      "/%zx/" RAM_ADDR_FMT")",
diff --git a/trace-events b/trace-events
index 855b0ab..2c3e3d7 100644
--- a/trace-events
+++ b/trace-events
@@ -55,9 +55,10 @@ dma_complete(void *dbs, int ret, void *cb) "dbs=%p ret=%d cb=%p"
 dma_blk_cb(void *dbs, int ret) "dbs=%p ret=%d"
 dma_map_wait(void *dbs) "dbs=%p"
 
-#  # exec.c
+# exec.c
 find_ram_offset(uint64_t size, uint64_t offset) "size: 0x%" PRIx64 " @ 0x%" PRIx64
 find_ram_offset_loop(uint64_t size, uint64_t candidate, uint64_t offset, uint64_t next, uint64_t mingap) "trying size: 0x%" PRIx64 " @ 0x%" PRIx64 ", offset: 0x%" PRIx64" next: 0x%" PRIx64 " mingap: 0x%" PRIx64
+ram_block_discard_range(const char *rbname, void *hva, size_t length, bool need_madvise, bool need_fallocate, int ret) "%s@%p + 0x%zx: madvise: %d fallocate: %d ret: %d"
 
 # memory.c
 memory_region_ops_read(int cpu_index, void *mr, uint64_t addr, uint64_t value, unsigned size) "cpu %d mr %p addr 0x%"PRIx64" value 0x%"PRIx64" size %u"
-- 
MST

^ permalink raw reply related	[flat|nested] 69+ messages in thread

* [Qemu-devel] [PULL v2 23/50] qemu_ram_block_host_offset
  2018-03-20  3:16 [Qemu-devel] [PULL v2 00/50] virtio, vhost, pci, pc: features, cleanups Michael S. Tsirkin
                   ` (21 preceding siblings ...)
  2018-03-20  3:17 ` [Qemu-devel] [PULL v2 22/50] migrate: Update ram_block_discard_range for shared Michael S. Tsirkin
@ 2018-03-20  3:17 ` Michael S. Tsirkin
  2018-03-20  3:17 ` [Qemu-devel] [PULL v2 24/50] postcopy: use UFFDIO_ZEROPAGE only when available Michael S. Tsirkin
                   ` (27 subsequent siblings)
  50 siblings, 0 replies; 69+ messages in thread
From: Michael S. Tsirkin @ 2018-03-20  3:17 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Maydell, Dr. David Alan Gilbert, Peter Xu, Paolo Bonzini,
	Peter Crosthwaite, Richard Henderson

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Utility to give the offset of a host pointer within a RAMBlock
(assuming we already know it's in that RAMBlock)

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 include/exec/cpu-common.h |  1 +
 exec.c                    | 10 ++++++++++
 2 files changed, 11 insertions(+)

diff --git a/include/exec/cpu-common.h b/include/exec/cpu-common.h
index 74341b1..0d861a6 100644
--- a/include/exec/cpu-common.h
+++ b/include/exec/cpu-common.h
@@ -68,6 +68,7 @@ ram_addr_t qemu_ram_addr_from_host(void *ptr);
 RAMBlock *qemu_ram_block_by_name(const char *name);
 RAMBlock *qemu_ram_block_from_host(void *ptr, bool round_offset,
                                    ram_addr_t *offset);
+ram_addr_t qemu_ram_block_host_offset(RAMBlock *rb, void *host);
 void qemu_ram_set_idstr(RAMBlock *block, const char *name, DeviceState *dev);
 void qemu_ram_unset_idstr(RAMBlock *block);
 const char *qemu_ram_get_idstr(RAMBlock *rb);
diff --git a/exec.c b/exec.c
index 34fdfd9..2199b09 100644
--- a/exec.c
+++ b/exec.c
@@ -2297,6 +2297,16 @@ static void *qemu_ram_ptr_length(RAMBlock *ram_block, ram_addr_t addr,
     return ramblock_ptr(block, addr);
 }
 
+/* Return the offset of a hostpointer within a ramblock */
+ram_addr_t qemu_ram_block_host_offset(RAMBlock *rb, void *host)
+{
+    ram_addr_t res = (uint8_t *)host - (uint8_t *)rb->host;
+    assert((uintptr_t)host >= (uintptr_t)rb->host);
+    assert(res < rb->max_length);
+
+    return res;
+}
+
 /*
  * Translates a host ptr back to a RAMBlock, a ram_addr and an offset
  * in that RAMBlock.
-- 
MST

^ permalink raw reply related	[flat|nested] 69+ messages in thread

* [Qemu-devel] [PULL v2 24/50] postcopy: use UFFDIO_ZEROPAGE only when available
  2018-03-20  3:16 [Qemu-devel] [PULL v2 00/50] virtio, vhost, pci, pc: features, cleanups Michael S. Tsirkin
                   ` (22 preceding siblings ...)
  2018-03-20  3:17 ` [Qemu-devel] [PULL v2 23/50] qemu_ram_block_host_offset Michael S. Tsirkin
@ 2018-03-20  3:17 ` Michael S. Tsirkin
  2018-03-20  3:17 ` [Qemu-devel] [PULL v2 25/50] postcopy: Add notifier chain Michael S. Tsirkin
                   ` (26 subsequent siblings)
  50 siblings, 0 replies; 69+ messages in thread
From: Michael S. Tsirkin @ 2018-03-20  3:17 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Maydell, Dr. David Alan Gilbert, Peter Xu, Paolo Bonzini,
	Peter Crosthwaite, Richard Henderson, Juan Quintela

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Use a flag on the RAMBlock to state whether it has the
UFFDIO_ZEROPAGE capability, use it when it's available.

This allows the use of postcopy on tmpfs as well as hugepage
backed files.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 include/exec/cpu-common.h |  3 +++
 exec.c                    | 16 ++++++++++++++++
 migration/postcopy-ram.c  | 13 ++++++++++---
 3 files changed, 29 insertions(+), 3 deletions(-)

diff --git a/include/exec/cpu-common.h b/include/exec/cpu-common.h
index 0d861a6..24d335f 100644
--- a/include/exec/cpu-common.h
+++ b/include/exec/cpu-common.h
@@ -73,6 +73,9 @@ void qemu_ram_set_idstr(RAMBlock *block, const char *name, DeviceState *dev);
 void qemu_ram_unset_idstr(RAMBlock *block);
 const char *qemu_ram_get_idstr(RAMBlock *rb);
 bool qemu_ram_is_shared(RAMBlock *rb);
+bool qemu_ram_is_uf_zeroable(RAMBlock *rb);
+void qemu_ram_set_uf_zeroable(RAMBlock *rb);
+
 size_t qemu_ram_pagesize(RAMBlock *block);
 size_t qemu_ram_pagesize_largest(void);
 
diff --git a/exec.c b/exec.c
index 2199b09..0eb890d 100644
--- a/exec.c
+++ b/exec.c
@@ -99,6 +99,11 @@ static MemoryRegion io_mem_unassigned;
  */
 #define RAM_RESIZEABLE (1 << 2)
 
+/* UFFDIO_ZEROPAGE is available on this RAMBlock to atomically
+ * zero the page and wake waiting processes.
+ * (Set during postcopy)
+ */
+#define RAM_UF_ZEROPAGE (1 << 3)
 #endif
 
 #ifdef TARGET_PAGE_BITS_VARY
@@ -1767,6 +1772,17 @@ bool qemu_ram_is_shared(RAMBlock *rb)
     return rb->flags & RAM_SHARED;
 }
 
+/* Note: Only set at the start of postcopy */
+bool qemu_ram_is_uf_zeroable(RAMBlock *rb)
+{
+    return rb->flags & RAM_UF_ZEROPAGE;
+}
+
+void qemu_ram_set_uf_zeroable(RAMBlock *rb)
+{
+    rb->flags |= RAM_UF_ZEROPAGE;
+}
+
 /* Called with iothread lock held.  */
 void qemu_ram_set_idstr(RAMBlock *new_block, const char *name, DeviceState *dev)
 {
diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index 032abfb..a75b5d3 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -481,6 +481,10 @@ static int ram_block_enable_notify(const char *block_name, void *host_addr,
         error_report("%s userfault: Region doesn't support COPY", __func__);
         return -1;
     }
+    if (reg_struct.ioctls & ((__u64)1 << _UFFDIO_ZEROPAGE)) {
+        RAMBlock *rb = qemu_ram_block_by_name(block_name);
+        qemu_ram_set_uf_zeroable(rb);
+    }
 
     return 0;
 }
@@ -700,11 +704,14 @@ int postcopy_place_page(MigrationIncomingState *mis, void *host, void *from,
 int postcopy_place_page_zero(MigrationIncomingState *mis, void *host,
                              RAMBlock *rb)
 {
+    size_t pagesize = qemu_ram_pagesize(rb);
     trace_postcopy_place_page_zero(host);
 
-    if (qemu_ram_pagesize(rb) == getpagesize()) {
-        if (qemu_ufd_copy_ioctl(mis->userfault_fd, host, NULL, getpagesize(),
-                                rb)) {
+    /* Normal RAMBlocks can zero a page using UFFDIO_ZEROPAGE
+     * but it's not available for everything (e.g. hugetlbpages)
+     */
+    if (qemu_ram_is_uf_zeroable(rb)) {
+        if (qemu_ufd_copy_ioctl(mis->userfault_fd, host, NULL, pagesize, rb)) {
             int e = errno;
             error_report("%s: %s zero host: %p",
                          __func__, strerror(e), host);
-- 
MST

^ permalink raw reply related	[flat|nested] 69+ messages in thread

* [Qemu-devel] [PULL v2 25/50] postcopy: Add notifier chain
  2018-03-20  3:16 [Qemu-devel] [PULL v2 00/50] virtio, vhost, pci, pc: features, cleanups Michael S. Tsirkin
                   ` (23 preceding siblings ...)
  2018-03-20  3:17 ` [Qemu-devel] [PULL v2 24/50] postcopy: use UFFDIO_ZEROPAGE only when available Michael S. Tsirkin
@ 2018-03-20  3:17 ` Michael S. Tsirkin
  2018-03-20  3:17 ` [Qemu-devel] [PULL v2 27/50] vhost-user: Add 'VHOST_USER_POSTCOPY_ADVISE' message Michael S. Tsirkin
                   ` (25 subsequent siblings)
  50 siblings, 0 replies; 69+ messages in thread
From: Michael S. Tsirkin @ 2018-03-20  3:17 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Maydell, Dr. David Alan Gilbert, Peter Xu, Juan Quintela,
	Paolo Bonzini

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Add a notifier chain for postcopy with a 'reason' flag
and an opportunity for a notifier member to return an error.

Call it when enabling postcopy.

This will initially used to enable devices to declare they're unable
to postcopy and later to notify of devices of stages within postcopy.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 migration/postcopy-ram.h | 26 ++++++++++++++++++++++++++
 migration/postcopy-ram.c | 36 ++++++++++++++++++++++++++++++++++++
 vl.c                     |  2 ++
 3 files changed, 64 insertions(+)

diff --git a/migration/postcopy-ram.h b/migration/postcopy-ram.h
index 14f6cad..2e879bb 100644
--- a/migration/postcopy-ram.h
+++ b/migration/postcopy-ram.h
@@ -116,4 +116,30 @@ PostcopyState postcopy_state_set(PostcopyState new_state);
 
 void postcopy_fault_thread_notify(MigrationIncomingState *mis);
 
+/*
+ * To be called once at the start before any device initialisation
+ */
+void postcopy_infrastructure_init(void);
+
+/* Add a notifier to a list to be called when checking whether the devices
+ * can support postcopy.
+ * It's data is a *PostcopyNotifyData
+ * It should return 0 if OK, or a negative value on failure.
+ * On failure it must set the data->errp to an error.
+ *
+ */
+enum PostcopyNotifyReason {
+    POSTCOPY_NOTIFY_PROBE = 0,
+};
+
+struct PostcopyNotifyData {
+    enum PostcopyNotifyReason reason;
+    Error **errp;
+};
+
+void postcopy_add_notifier(NotifierWithReturn *nn);
+void postcopy_remove_notifier(NotifierWithReturn *n);
+/* Call the notifier list set by postcopy_add_start_notifier */
+int postcopy_notify(enum PostcopyNotifyReason reason, Error **errp);
+
 #endif
diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index a75b5d3..1089814 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -23,6 +23,8 @@
 #include "savevm.h"
 #include "postcopy-ram.h"
 #include "ram.h"
+#include "qapi/error.h"
+#include "qemu/notify.h"
 #include "sysemu/sysemu.h"
 #include "sysemu/balloon.h"
 #include "qemu/error-report.h"
@@ -45,6 +47,33 @@ struct PostcopyDiscardState {
     unsigned int nsentcmds;
 };
 
+static NotifierWithReturnList postcopy_notifier_list;
+
+void postcopy_infrastructure_init(void)
+{
+    notifier_with_return_list_init(&postcopy_notifier_list);
+}
+
+void postcopy_add_notifier(NotifierWithReturn *nn)
+{
+    notifier_with_return_list_add(&postcopy_notifier_list, nn);
+}
+
+void postcopy_remove_notifier(NotifierWithReturn *n)
+{
+    notifier_with_return_remove(n);
+}
+
+int postcopy_notify(enum PostcopyNotifyReason reason, Error **errp)
+{
+    struct PostcopyNotifyData pnd;
+    pnd.reason = reason;
+    pnd.errp = errp;
+
+    return notifier_with_return_list_notify(&postcopy_notifier_list,
+                                            &pnd);
+}
+
 /* Postcopy needs to detect accesses to pages that haven't yet been copied
  * across, and efficiently map new pages in, the techniques for doing this
  * are target OS specific.
@@ -215,6 +244,7 @@ bool postcopy_ram_supported_by_host(MigrationIncomingState *mis)
     struct uffdio_register reg_struct;
     struct uffdio_range range_struct;
     uint64_t feature_mask;
+    Error *local_err = NULL;
 
     if (qemu_target_page_size() > pagesize) {
         error_report("Target page size bigger than host page size");
@@ -228,6 +258,12 @@ bool postcopy_ram_supported_by_host(MigrationIncomingState *mis)
         goto out;
     }
 
+    /* Give devices a chance to object */
+    if (postcopy_notify(POSTCOPY_NOTIFY_PROBE, &local_err)) {
+        error_report_err(local_err);
+        goto out;
+    }
+
     /* Version and features check */
     if (!ufd_check_and_apply(ufd, mis)) {
         goto out;
diff --git a/vl.c b/vl.c
index 3ef04ce..0b15811 100644
--- a/vl.c
+++ b/vl.c
@@ -94,6 +94,7 @@ int main(int argc, char **argv)
 #include "audio/audio.h"
 #include "sysemu/cpus.h"
 #include "migration/colo.h"
+#include "migration/postcopy-ram.h"
 #include "sysemu/kvm.h"
 #include "sysemu/hax.h"
 #include "qapi/qobject-input-visitor.h"
@@ -3101,6 +3102,7 @@ int main(int argc, char **argv, char **envp)
     module_call_init(MODULE_INIT_OPTS);
 
     runstate_init();
+    postcopy_infrastructure_init();
 
     if (qcrypto_init(&err) < 0) {
         error_reportf_err(err, "cannot initialize crypto: ");
-- 
MST

^ permalink raw reply related	[flat|nested] 69+ messages in thread

* [Qemu-devel] [PULL v2 26/50] postcopy: Add vhost-user flag for postcopy and check it
  2018-03-20  3:16 [Qemu-devel] [PULL v2 00/50] virtio, vhost, pci, pc: features, cleanups Michael S. Tsirkin
                   ` (25 preceding siblings ...)
  2018-03-20  3:17 ` [Qemu-devel] [PULL v2 27/50] vhost-user: Add 'VHOST_USER_POSTCOPY_ADVISE' message Michael S. Tsirkin
@ 2018-03-20  3:17 ` Michael S. Tsirkin
  2018-03-20  3:17 ` [Qemu-devel] [PULL v2 28/50] libvhost-user: Support sending fds back to qemu Michael S. Tsirkin
                   ` (23 subsequent siblings)
  50 siblings, 0 replies; 69+ messages in thread
From: Michael S. Tsirkin @ 2018-03-20  3:17 UTC (permalink / raw)
  To: qemu-devel; +Cc: Peter Maydell, Dr. David Alan Gilbert, Peter Xu

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Add a vhost feature flag for postcopy support, and
use the postcopy notifier to check it before allowing postcopy.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 docs/interop/vhost-user.txt           | 10 +++++++++
 contrib/libvhost-user/libvhost-user.h |  2 ++
 hw/virtio/vhost-user.c                | 41 ++++++++++++++++++++++++++++++++++-
 3 files changed, 52 insertions(+), 1 deletion(-)

diff --git a/docs/interop/vhost-user.txt b/docs/interop/vhost-user.txt
index cb3a759..91a572d 100644
--- a/docs/interop/vhost-user.txt
+++ b/docs/interop/vhost-user.txt
@@ -290,6 +290,15 @@ Once the source has finished migration, rings will be stopped by
 the source. No further update must be done before rings are
 restarted.
 
+In postcopy migration the slave is started before all the memory has been
+received from the source host, and care must be taken to avoid accessing pages
+that have yet to be received.  The slave opens a 'userfault'-fd and registers
+the memory with it; this fd is then passed back over to the master.
+The master services requests on the userfaultfd for pages that are accessed
+and when the page is available it performs WAKE ioctl's on the userfaultfd
+to wake the stalled slave.  The client indicates support for this via the
+VHOST_USER_PROTOCOL_F_PAGEFAULT feature.
+
 Memory access
 -------------
 
@@ -369,6 +378,7 @@ Protocol features
 #define VHOST_USER_PROTOCOL_F_SLAVE_REQ      5
 #define VHOST_USER_PROTOCOL_F_CROSS_ENDIAN   6
 #define VHOST_USER_PROTOCOL_F_CRYPTO_SESSION 7
+#define VHOST_USER_PROTOCOL_F_PAGEFAULT      8
 
 Master message types
 --------------------
diff --git a/contrib/libvhost-user/libvhost-user.h b/contrib/libvhost-user/libvhost-user.h
index 18f95f6..96db29c 100644
--- a/contrib/libvhost-user/libvhost-user.h
+++ b/contrib/libvhost-user/libvhost-user.h
@@ -48,6 +48,8 @@ enum VhostUserProtocolFeature {
     VHOST_USER_PROTOCOL_F_NET_MTU = 4,
     VHOST_USER_PROTOCOL_F_SLAVE_REQ = 5,
     VHOST_USER_PROTOCOL_F_CROSS_ENDIAN = 6,
+    VHOST_USER_PROTOCOL_F_CRYPTO_SESSION = 7,
+    VHOST_USER_PROTOCOL_F_PAGEFAULT = 8,
 
     VHOST_USER_PROTOCOL_F_MAX
 };
diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index 41ff5cf..aab35c4 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -18,6 +18,8 @@
 #include "qemu/error-report.h"
 #include "qemu/sockets.h"
 #include "sysemu/cryptodev.h"
+#include "migration/migration.h"
+#include "migration/postcopy-ram.h"
 
 #include <sys/ioctl.h>
 #include <sys/socket.h>
@@ -41,7 +43,7 @@ enum VhostUserProtocolFeature {
     VHOST_USER_PROTOCOL_F_SLAVE_REQ = 5,
     VHOST_USER_PROTOCOL_F_CROSS_ENDIAN = 6,
     VHOST_USER_PROTOCOL_F_CRYPTO_SESSION = 7,
-
+    VHOST_USER_PROTOCOL_F_PAGEFAULT = 8,
     VHOST_USER_PROTOCOL_F_MAX
 };
 
@@ -164,8 +166,10 @@ static VhostUserMsg m __attribute__ ((unused));
 #define VHOST_USER_VERSION    (0x1)
 
 struct vhost_user {
+    struct vhost_dev *dev;
     CharBackend *chr;
     int slave_fd;
+    NotifierWithReturn postcopy_notifier;
 };
 
 static bool ioeventfd_enabled(void)
@@ -791,6 +795,33 @@ out:
     return ret;
 }
 
+static int vhost_user_postcopy_notifier(NotifierWithReturn *notifier,
+                                        void *opaque)
+{
+    struct PostcopyNotifyData *pnd = opaque;
+    struct vhost_user *u = container_of(notifier, struct vhost_user,
+                                         postcopy_notifier);
+    struct vhost_dev *dev = u->dev;
+
+    switch (pnd->reason) {
+    case POSTCOPY_NOTIFY_PROBE:
+        if (!virtio_has_feature(dev->protocol_features,
+                                VHOST_USER_PROTOCOL_F_PAGEFAULT)) {
+            /* TODO: Get the device name into this error somehow */
+            error_setg(pnd->errp,
+                       "vhost-user backend not capable of postcopy");
+            return -ENOENT;
+        }
+        break;
+
+    default:
+        /* We ignore notifications we don't know */
+        break;
+    }
+
+    return 0;
+}
+
 static int vhost_user_init(struct vhost_dev *dev, void *opaque)
 {
     uint64_t features, protocol_features;
@@ -802,6 +833,7 @@ static int vhost_user_init(struct vhost_dev *dev, void *opaque)
     u = g_new0(struct vhost_user, 1);
     u->chr = opaque;
     u->slave_fd = -1;
+    u->dev = dev;
     dev->opaque = u;
 
     err = vhost_user_get_features(dev, &features);
@@ -858,6 +890,9 @@ static int vhost_user_init(struct vhost_dev *dev, void *opaque)
         return err;
     }
 
+    u->postcopy_notifier.notify = vhost_user_postcopy_notifier;
+    postcopy_add_notifier(&u->postcopy_notifier);
+
     return 0;
 }
 
@@ -868,6 +903,10 @@ static int vhost_user_cleanup(struct vhost_dev *dev)
     assert(dev->vhost_ops->backend_type == VHOST_BACKEND_TYPE_USER);
 
     u = dev->opaque;
+    if (u->postcopy_notifier.notify) {
+        postcopy_remove_notifier(&u->postcopy_notifier);
+        u->postcopy_notifier.notify = NULL;
+    }
     if (u->slave_fd >= 0) {
         qemu_set_fd_handler(u->slave_fd, NULL, NULL, NULL);
         close(u->slave_fd);
-- 
MST

^ permalink raw reply related	[flat|nested] 69+ messages in thread

* [Qemu-devel] [PULL v2 27/50] vhost-user: Add 'VHOST_USER_POSTCOPY_ADVISE' message
  2018-03-20  3:16 [Qemu-devel] [PULL v2 00/50] virtio, vhost, pci, pc: features, cleanups Michael S. Tsirkin
                   ` (24 preceding siblings ...)
  2018-03-20  3:17 ` [Qemu-devel] [PULL v2 25/50] postcopy: Add notifier chain Michael S. Tsirkin
@ 2018-03-20  3:17 ` Michael S. Tsirkin
  2018-03-20  3:17 ` [Qemu-devel] [PULL v2 26/50] postcopy: Add vhost-user flag for postcopy and check it Michael S. Tsirkin
                   ` (24 subsequent siblings)
  50 siblings, 0 replies; 69+ messages in thread
From: Michael S. Tsirkin @ 2018-03-20  3:17 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Maydell, Dr. David Alan Gilbert, Marc-André Lureau,
	Juan Quintela

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Wire up a notifier to send a VHOST_USER_POSTCOPY_ADVISE
message on an incoming advise.

Later patches will fill in the behaviour/contents of the
message.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 docs/interop/vhost-user.txt           | 10 ++++++++
 contrib/libvhost-user/libvhost-user.h |  3 +++
 migration/postcopy-ram.h              |  1 +
 contrib/libvhost-user/libvhost-user.c | 11 ++++++++
 hw/virtio/vhost-user.c                | 48 +++++++++++++++++++++++++++++++++++
 migration/savevm.c                    |  6 +++++
 6 files changed, 79 insertions(+)

diff --git a/docs/interop/vhost-user.txt b/docs/interop/vhost-user.txt
index 91a572d..7854e50 100644
--- a/docs/interop/vhost-user.txt
+++ b/docs/interop/vhost-user.txt
@@ -699,6 +699,16 @@ Master message types
      feature has been successfully negotiated.
      It's a required feature for crypto devices.
 
+ * VHOST_USER_POSTCOPY_ADVISE
+      Id: 28
+      Master payload: N/A
+      Slave payload: userfault fd
+
+      When VHOST_USER_PROTOCOL_F_PAGEFAULT is supported, the
+      master advises slave that a migration with postcopy enabled is underway,
+      the slave must open a userfaultfd for later use.
+      Note that at this stage the migration is still in precopy mode.
+
 Slave message types
 -------------------
 
diff --git a/contrib/libvhost-user/libvhost-user.h b/contrib/libvhost-user/libvhost-user.h
index 96db29c..00d78a8 100644
--- a/contrib/libvhost-user/libvhost-user.h
+++ b/contrib/libvhost-user/libvhost-user.h
@@ -83,6 +83,9 @@ typedef enum VhostUserRequest {
     VHOST_USER_SET_VRING_ENDIAN = 23,
     VHOST_USER_GET_CONFIG = 24,
     VHOST_USER_SET_CONFIG = 25,
+    VHOST_USER_CREATE_CRYPTO_SESSION = 26,
+    VHOST_USER_CLOSE_CRYPTO_SESSION = 27,
+    VHOST_USER_POSTCOPY_ADVISE  = 28,
     VHOST_USER_MAX
 } VhostUserRequest;
 
diff --git a/migration/postcopy-ram.h b/migration/postcopy-ram.h
index 2e879bb..0421c98 100644
--- a/migration/postcopy-ram.h
+++ b/migration/postcopy-ram.h
@@ -130,6 +130,7 @@ void postcopy_infrastructure_init(void);
  */
 enum PostcopyNotifyReason {
     POSTCOPY_NOTIFY_PROBE = 0,
+    POSTCOPY_NOTIFY_INBOUND_ADVISE,
 };
 
 struct PostcopyNotifyData {
diff --git a/contrib/libvhost-user/libvhost-user.c b/contrib/libvhost-user/libvhost-user.c
index 2e358b5..37d4228 100644
--- a/contrib/libvhost-user/libvhost-user.c
+++ b/contrib/libvhost-user/libvhost-user.c
@@ -86,6 +86,7 @@ vu_request_to_string(unsigned int req)
         REQ(VHOST_USER_SET_VRING_ENDIAN),
         REQ(VHOST_USER_GET_CONFIG),
         REQ(VHOST_USER_SET_CONFIG),
+        REQ(VHOST_USER_POSTCOPY_ADVISE),
         REQ(VHOST_USER_MAX),
     };
 #undef REQ
@@ -857,6 +858,14 @@ vu_set_config(VuDev *dev, VhostUserMsg *vmsg)
 }
 
 static bool
+vu_set_postcopy_advise(VuDev *dev, VhostUserMsg *vmsg)
+{
+    /* TODO: Open ufd, pass it back in the request */
+    vmsg->size = 0;
+    return true; /* = send a reply */
+}
+
+static bool
 vu_process_message(VuDev *dev, VhostUserMsg *vmsg)
 {
     int do_reply = 0;
@@ -927,6 +936,8 @@ vu_process_message(VuDev *dev, VhostUserMsg *vmsg)
         return vu_set_config(dev, vmsg);
     case VHOST_USER_NONE:
         break;
+    case VHOST_USER_POSTCOPY_ADVISE:
+        return vu_set_postcopy_advise(dev, vmsg);
     default:
         vmsg_close_fds(vmsg);
         vu_panic(dev, "Unhandled request: %d", vmsg->request);
diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index aab35c4..ceb17b0 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -78,6 +78,7 @@ typedef enum VhostUserRequest {
     VHOST_USER_SET_CONFIG = 25,
     VHOST_USER_CREATE_CRYPTO_SESSION = 26,
     VHOST_USER_CLOSE_CRYPTO_SESSION = 27,
+    VHOST_USER_POSTCOPY_ADVISE  = 28,
     VHOST_USER_MAX
 } VhostUserRequest;
 
@@ -795,6 +796,50 @@ out:
     return ret;
 }
 
+/*
+ * Called at the start of an inbound postcopy on reception of the
+ * 'advise' command.
+ */
+static int vhost_user_postcopy_advise(struct vhost_dev *dev, Error **errp)
+{
+    struct vhost_user *u = dev->opaque;
+    CharBackend *chr = u->chr;
+    int ufd;
+    VhostUserMsg msg = {
+        .hdr.request = VHOST_USER_POSTCOPY_ADVISE,
+        .hdr.flags = VHOST_USER_VERSION,
+    };
+
+    if (vhost_user_write(dev, &msg, NULL, 0) < 0) {
+        error_setg(errp, "Failed to send postcopy_advise to vhost");
+        return -1;
+    }
+
+    if (vhost_user_read(dev, &msg) < 0) {
+        error_setg(errp, "Failed to get postcopy_advise reply from vhost");
+        return -1;
+    }
+
+    if (msg.hdr.request != VHOST_USER_POSTCOPY_ADVISE) {
+        error_setg(errp, "Unexpected msg type. Expected %d received %d",
+                     VHOST_USER_POSTCOPY_ADVISE, msg.hdr.request);
+        return -1;
+    }
+
+    if (msg.hdr.size) {
+        error_setg(errp, "Received bad msg size.");
+        return -1;
+    }
+    ufd = qemu_chr_fe_get_msgfd(chr);
+    if (ufd < 0) {
+        error_setg(errp, "%s: Failed to get ufd", __func__);
+        return -1;
+    }
+
+    /* TODO: register ufd with userfault thread */
+    return 0;
+}
+
 static int vhost_user_postcopy_notifier(NotifierWithReturn *notifier,
                                         void *opaque)
 {
@@ -814,6 +859,9 @@ static int vhost_user_postcopy_notifier(NotifierWithReturn *notifier,
         }
         break;
 
+    case POSTCOPY_NOTIFY_INBOUND_ADVISE:
+        return vhost_user_postcopy_advise(dev, pnd->errp);
+
     default:
         /* We ignore notifications we don't know */
         break;
diff --git a/migration/savevm.c b/migration/savevm.c
index 358c5b5..1f2bf12 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -1386,6 +1386,7 @@ static int loadvm_postcopy_handle_advise(MigrationIncomingState *mis,
 {
     PostcopyState ps = postcopy_state_set(POSTCOPY_INCOMING_ADVISE);
     uint64_t remote_pagesize_summary, local_pagesize_summary, remote_tps;
+    Error *local_err = NULL;
 
     trace_loadvm_postcopy_handle_advise();
     if (ps != POSTCOPY_INCOMING_NONE) {
@@ -1451,6 +1452,11 @@ static int loadvm_postcopy_handle_advise(MigrationIncomingState *mis,
         return -1;
     }
 
+    if (postcopy_notify(POSTCOPY_NOTIFY_INBOUND_ADVISE, &local_err)) {
+        error_report_err(local_err);
+        return -1;
+    }
+
     if (ram_postcopy_incoming_init(mis)) {
         return -1;
     }
-- 
MST

^ permalink raw reply related	[flat|nested] 69+ messages in thread

* [Qemu-devel] [PULL v2 28/50] libvhost-user: Support sending fds back to qemu
  2018-03-20  3:16 [Qemu-devel] [PULL v2 00/50] virtio, vhost, pci, pc: features, cleanups Michael S. Tsirkin
                   ` (26 preceding siblings ...)
  2018-03-20  3:17 ` [Qemu-devel] [PULL v2 26/50] postcopy: Add vhost-user flag for postcopy and check it Michael S. Tsirkin
@ 2018-03-20  3:17 ` Michael S. Tsirkin
  2018-03-20  3:17 ` [Qemu-devel] [PULL v2 29/50] libvhost-user: Open userfaultfd Michael S. Tsirkin
                   ` (22 subsequent siblings)
  50 siblings, 0 replies; 69+ messages in thread
From: Michael S. Tsirkin @ 2018-03-20  3:17 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Maydell, Dr. David Alan Gilbert, Marc-André Lureau,
	Maxime Coquelin, Yongji Xie

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Allow replies with fds (for postcopy)

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 contrib/libvhost-user/libvhost-user.c | 30 +++++++++++++++++++++++++++++-
 1 file changed, 29 insertions(+), 1 deletion(-)

diff --git a/contrib/libvhost-user/libvhost-user.c b/contrib/libvhost-user/libvhost-user.c
index 37d4228..ed9f314 100644
--- a/contrib/libvhost-user/libvhost-user.c
+++ b/contrib/libvhost-user/libvhost-user.c
@@ -246,6 +246,31 @@ vu_message_write(VuDev *dev, int conn_fd, VhostUserMsg *vmsg)
 {
     int rc;
     uint8_t *p = (uint8_t *)vmsg;
+    char control[CMSG_SPACE(VHOST_MEMORY_MAX_NREGIONS * sizeof(int))] = { };
+    struct iovec iov = {
+        .iov_base = (char *)vmsg,
+        .iov_len = VHOST_USER_HDR_SIZE,
+    };
+    struct msghdr msg = {
+        .msg_iov = &iov,
+        .msg_iovlen = 1,
+        .msg_control = control,
+    };
+    struct cmsghdr *cmsg;
+
+    memset(control, 0, sizeof(control));
+    assert(vmsg->fd_num <= VHOST_MEMORY_MAX_NREGIONS);
+    if (vmsg->fd_num > 0) {
+        size_t fdsize = vmsg->fd_num * sizeof(int);
+        msg.msg_controllen = CMSG_SPACE(fdsize);
+        cmsg = CMSG_FIRSTHDR(&msg);
+        cmsg->cmsg_len = CMSG_LEN(fdsize);
+        cmsg->cmsg_level = SOL_SOCKET;
+        cmsg->cmsg_type = SCM_RIGHTS;
+        memcpy(CMSG_DATA(cmsg), vmsg->fds, fdsize);
+    } else {
+        msg.msg_controllen = 0;
+    }
 
     /* Set the version in the flags when sending the reply */
     vmsg->flags &= ~VHOST_USER_VERSION_MASK;
@@ -253,7 +278,7 @@ vu_message_write(VuDev *dev, int conn_fd, VhostUserMsg *vmsg)
     vmsg->flags |= VHOST_USER_REPLY_MASK;
 
     do {
-        rc = write(conn_fd, p, VHOST_USER_HDR_SIZE);
+        rc = sendmsg(conn_fd, &msg, 0);
     } while (rc < 0 && (errno == EINTR || errno == EAGAIN));
 
     do {
@@ -346,6 +371,7 @@ vu_get_features_exec(VuDev *dev, VhostUserMsg *vmsg)
     }
 
     vmsg->size = sizeof(vmsg->payload.u64);
+    vmsg->fd_num = 0;
 
     DPRINT("Sending back to guest u64: 0x%016"PRIx64"\n", vmsg->payload.u64);
 
@@ -501,6 +527,7 @@ vu_set_log_base_exec(VuDev *dev, VhostUserMsg *vmsg)
     dev->log_size = log_mmap_size;
 
     vmsg->size = sizeof(vmsg->payload.u64);
+    vmsg->fd_num = 0;
 
     return true;
 }
@@ -759,6 +786,7 @@ vu_get_protocol_features_exec(VuDev *dev, VhostUserMsg *vmsg)
 
     vmsg->payload.u64 = features;
     vmsg->size = sizeof(vmsg->payload.u64);
+    vmsg->fd_num = 0;
 
     return true;
 }
-- 
MST

^ permalink raw reply related	[flat|nested] 69+ messages in thread

* [Qemu-devel] [PULL v2 29/50] libvhost-user: Open userfaultfd
  2018-03-20  3:16 [Qemu-devel] [PULL v2 00/50] virtio, vhost, pci, pc: features, cleanups Michael S. Tsirkin
                   ` (27 preceding siblings ...)
  2018-03-20  3:17 ` [Qemu-devel] [PULL v2 28/50] libvhost-user: Support sending fds back to qemu Michael S. Tsirkin
@ 2018-03-20  3:17 ` Michael S. Tsirkin
  2018-03-20  3:17 ` [Qemu-devel] [PULL v2 30/50] postcopy: Allow registering of fd handler Michael S. Tsirkin
                   ` (21 subsequent siblings)
  50 siblings, 0 replies; 69+ messages in thread
From: Michael S. Tsirkin @ 2018-03-20  3:17 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Maydell, Dr. David Alan Gilbert, Marc-André Lureau,
	Maxime Coquelin, Yongji Xie, Peter Xu

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Open a userfaultfd (on a postcopy_advise) and send it back in
the reply to the qemu for it to monitor.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 contrib/libvhost-user/libvhost-user.h |  3 +++
 contrib/libvhost-user/libvhost-user.c | 42 +++++++++++++++++++++++++++++++++--
 2 files changed, 43 insertions(+), 2 deletions(-)

diff --git a/contrib/libvhost-user/libvhost-user.h b/contrib/libvhost-user/libvhost-user.h
index 00d78a8..074b786 100644
--- a/contrib/libvhost-user/libvhost-user.h
+++ b/contrib/libvhost-user/libvhost-user.h
@@ -282,6 +282,9 @@ struct VuDev {
      * re-initialize */
     vu_panic_cb panic;
     const VuDevIface *iface;
+
+    /* Postcopy data */
+    int postcopy_ufd;
 };
 
 typedef struct VuVirtqElement {
diff --git a/contrib/libvhost-user/libvhost-user.c b/contrib/libvhost-user/libvhost-user.c
index ed9f314..9e31f47 100644
--- a/contrib/libvhost-user/libvhost-user.c
+++ b/contrib/libvhost-user/libvhost-user.c
@@ -26,9 +26,20 @@
 #include <sys/socket.h>
 #include <sys/eventfd.h>
 #include <sys/mman.h>
+#include "qemu/compiler.h"
+
+#if defined(__linux__)
+#include <sys/syscall.h>
+#include <fcntl.h>
+#include <sys/ioctl.h>
 #include <linux/vhost.h>
 
-#include "qemu/compiler.h"
+#ifdef __NR_userfaultfd
+#include <linux/userfaultfd.h>
+#endif
+
+#endif
+
 #include "qemu/atomic.h"
 
 #include "libvhost-user.h"
@@ -888,8 +899,35 @@ vu_set_config(VuDev *dev, VhostUserMsg *vmsg)
 static bool
 vu_set_postcopy_advise(VuDev *dev, VhostUserMsg *vmsg)
 {
-    /* TODO: Open ufd, pass it back in the request */
+    dev->postcopy_ufd = -1;
+#ifdef UFFDIO_API
+    struct uffdio_api api_struct;
+
+    dev->postcopy_ufd = syscall(__NR_userfaultfd, O_CLOEXEC | O_NONBLOCK);
     vmsg->size = 0;
+#endif
+
+    if (dev->postcopy_ufd == -1) {
+        vu_panic(dev, "Userfaultfd not available: %s", strerror(errno));
+        goto out;
+    }
+
+#ifdef UFFDIO_API
+    api_struct.api = UFFD_API;
+    api_struct.features = 0;
+    if (ioctl(dev->postcopy_ufd, UFFDIO_API, &api_struct)) {
+        vu_panic(dev, "Failed UFFDIO_API: %s", strerror(errno));
+        close(dev->postcopy_ufd);
+        dev->postcopy_ufd = -1;
+        goto out;
+    }
+    /* TODO: Stash feature flags somewhere */
+#endif
+
+out:
+    /* Return a ufd to the QEMU */
+    vmsg->fd_num = 1;
+    vmsg->fds[0] = dev->postcopy_ufd;
     return true; /* = send a reply */
 }
 
-- 
MST

^ permalink raw reply related	[flat|nested] 69+ messages in thread

* [Qemu-devel] [PULL v2 31/50] vhost+postcopy: Register shared ufd with postcopy
  2018-03-20  3:16 [Qemu-devel] [PULL v2 00/50] virtio, vhost, pci, pc: features, cleanups Michael S. Tsirkin
                   ` (29 preceding siblings ...)
  2018-03-20  3:17 ` [Qemu-devel] [PULL v2 30/50] postcopy: Allow registering of fd handler Michael S. Tsirkin
@ 2018-03-20  3:17 ` Michael S. Tsirkin
  2018-04-27 16:12   ` Peter Maydell
  2018-03-20  3:17 ` [Qemu-devel] [PULL v2 32/50] vhost+postcopy: Transmit 'listen' to slave Michael S. Tsirkin
                   ` (19 subsequent siblings)
  50 siblings, 1 reply; 69+ messages in thread
From: Michael S. Tsirkin @ 2018-03-20  3:17 UTC (permalink / raw)
  To: qemu-devel; +Cc: Peter Maydell, Dr. David Alan Gilbert, Marc-André Lureau

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Register the UFD that comes in as the response to the 'advise' method
with the postcopy code.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 hw/virtio/vhost-user.c | 20 +++++++++++++++++++-
 1 file changed, 19 insertions(+), 1 deletion(-)

diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index ceb17b0..5900583 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -171,6 +171,7 @@ struct vhost_user {
     CharBackend *chr;
     int slave_fd;
     NotifierWithReturn postcopy_notifier;
+    struct PostCopyFD  postcopy_fd;
 };
 
 static bool ioeventfd_enabled(void)
@@ -797,6 +798,17 @@ out:
 }
 
 /*
+ * Called back from the postcopy fault thread when a fault is received on our
+ * ufd.
+ * TODO: This is Linux specific
+ */
+static int vhost_user_postcopy_fault_handler(struct PostCopyFD *pcfd,
+                                             void *ufd)
+{
+    return 0;
+}
+
+/*
  * Called at the start of an inbound postcopy on reception of the
  * 'advise' command.
  */
@@ -835,8 +847,14 @@ static int vhost_user_postcopy_advise(struct vhost_dev *dev, Error **errp)
         error_setg(errp, "%s: Failed to get ufd", __func__);
         return -1;
     }
+    fcntl(ufd, F_SETFL, O_NONBLOCK);
 
-    /* TODO: register ufd with userfault thread */
+    /* register ufd with userfault thread */
+    u->postcopy_fd.fd = ufd;
+    u->postcopy_fd.data = dev;
+    u->postcopy_fd.handler = vhost_user_postcopy_fault_handler;
+    u->postcopy_fd.idstr = "vhost-user"; /* Need to find unique name */
+    postcopy_register_shared_ufd(&u->postcopy_fd);
     return 0;
 }
 
-- 
MST

^ permalink raw reply related	[flat|nested] 69+ messages in thread

* [Qemu-devel] [PULL v2 32/50] vhost+postcopy: Transmit 'listen' to slave
  2018-03-20  3:16 [Qemu-devel] [PULL v2 00/50] virtio, vhost, pci, pc: features, cleanups Michael S. Tsirkin
                   ` (30 preceding siblings ...)
  2018-03-20  3:17 ` [Qemu-devel] [PULL v2 31/50] vhost+postcopy: Register shared ufd with postcopy Michael S. Tsirkin
@ 2018-03-20  3:17 ` Michael S. Tsirkin
  2018-03-20  3:17 ` [Qemu-devel] [PULL v2 33/50] postcopy+vhost-user: Split set_mem_table for postcopy Michael S. Tsirkin
                   ` (18 subsequent siblings)
  50 siblings, 0 replies; 69+ messages in thread
From: Michael S. Tsirkin @ 2018-03-20  3:17 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Maydell, Dr. David Alan Gilbert, Marc-André Lureau,
	Peter Xu, Juan Quintela

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Notify the vhost-user slave on reception of the 'postcopy-listen'
event from the source.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 docs/interop/vhost-user.txt           | 11 +++++++++++
 contrib/libvhost-user/libvhost-user.h |  2 ++
 migration/postcopy-ram.h              |  1 +
 contrib/libvhost-user/libvhost-user.c | 19 +++++++++++++++++++
 hw/virtio/vhost-user.c                | 34 ++++++++++++++++++++++++++++++++++
 migration/savevm.c                    |  7 +++++++
 hw/virtio/trace-events                |  3 +++
 7 files changed, 77 insertions(+)

diff --git a/docs/interop/vhost-user.txt b/docs/interop/vhost-user.txt
index 7854e50..0d24203 100644
--- a/docs/interop/vhost-user.txt
+++ b/docs/interop/vhost-user.txt
@@ -709,6 +709,17 @@ Master message types
       the slave must open a userfaultfd for later use.
       Note that at this stage the migration is still in precopy mode.
 
+ * VHOST_USER_POSTCOPY_LISTEN
+      Id: 29
+      Master payload: N/A
+
+      Master advises slave that a transition to postcopy mode has happened.
+      The slave must ensure that shared memory is registered with userfaultfd
+      to cause faulting of non-present pages.
+
+      This is always sent sometime after a VHOST_USER_POSTCOPY_ADVISE, and
+      thus only when VHOST_USER_PROTOCOL_F_PAGEFAULT is supported.
+
 Slave message types
 -------------------
 
diff --git a/contrib/libvhost-user/libvhost-user.h b/contrib/libvhost-user/libvhost-user.h
index 074b786..ed505cf 100644
--- a/contrib/libvhost-user/libvhost-user.h
+++ b/contrib/libvhost-user/libvhost-user.h
@@ -86,6 +86,7 @@ typedef enum VhostUserRequest {
     VHOST_USER_CREATE_CRYPTO_SESSION = 26,
     VHOST_USER_CLOSE_CRYPTO_SESSION = 27,
     VHOST_USER_POSTCOPY_ADVISE  = 28,
+    VHOST_USER_POSTCOPY_LISTEN  = 29,
     VHOST_USER_MAX
 } VhostUserRequest;
 
@@ -285,6 +286,7 @@ struct VuDev {
 
     /* Postcopy data */
     int postcopy_ufd;
+    bool postcopy_listening;
 };
 
 typedef struct VuVirtqElement {
diff --git a/migration/postcopy-ram.h b/migration/postcopy-ram.h
index f21eef6..c8ced34 100644
--- a/migration/postcopy-ram.h
+++ b/migration/postcopy-ram.h
@@ -131,6 +131,7 @@ void postcopy_infrastructure_init(void);
 enum PostcopyNotifyReason {
     POSTCOPY_NOTIFY_PROBE = 0,
     POSTCOPY_NOTIFY_INBOUND_ADVISE,
+    POSTCOPY_NOTIFY_INBOUND_LISTEN,
 };
 
 struct PostcopyNotifyData {
diff --git a/contrib/libvhost-user/libvhost-user.c b/contrib/libvhost-user/libvhost-user.c
index 9e31f47..e53b195 100644
--- a/contrib/libvhost-user/libvhost-user.c
+++ b/contrib/libvhost-user/libvhost-user.c
@@ -98,6 +98,7 @@ vu_request_to_string(unsigned int req)
         REQ(VHOST_USER_GET_CONFIG),
         REQ(VHOST_USER_SET_CONFIG),
         REQ(VHOST_USER_POSTCOPY_ADVISE),
+        REQ(VHOST_USER_POSTCOPY_LISTEN),
         REQ(VHOST_USER_MAX),
     };
 #undef REQ
@@ -932,6 +933,22 @@ out:
 }
 
 static bool
+vu_set_postcopy_listen(VuDev *dev, VhostUserMsg *vmsg)
+{
+    vmsg->payload.u64 = -1;
+    vmsg->size = sizeof(vmsg->payload.u64);
+
+    if (dev->nregions) {
+        vu_panic(dev, "Regions already registered at postcopy-listen");
+        return true;
+    }
+    dev->postcopy_listening = true;
+
+    vmsg->flags = VHOST_USER_VERSION |  VHOST_USER_REPLY_MASK;
+    vmsg->payload.u64 = 0; /* Success */
+    return true;
+}
+static bool
 vu_process_message(VuDev *dev, VhostUserMsg *vmsg)
 {
     int do_reply = 0;
@@ -1004,6 +1021,8 @@ vu_process_message(VuDev *dev, VhostUserMsg *vmsg)
         break;
     case VHOST_USER_POSTCOPY_ADVISE:
         return vu_set_postcopy_advise(dev, vmsg);
+    case VHOST_USER_POSTCOPY_LISTEN:
+        return vu_set_postcopy_listen(dev, vmsg);
     default:
         vmsg_close_fds(vmsg);
         vu_panic(dev, "Unhandled request: %d", vmsg->request);
diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index 5900583..c3ab299 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -20,6 +20,7 @@
 #include "sysemu/cryptodev.h"
 #include "migration/migration.h"
 #include "migration/postcopy-ram.h"
+#include "trace.h"
 
 #include <sys/ioctl.h>
 #include <sys/socket.h>
@@ -79,6 +80,7 @@ typedef enum VhostUserRequest {
     VHOST_USER_CREATE_CRYPTO_SESSION = 26,
     VHOST_USER_CLOSE_CRYPTO_SESSION = 27,
     VHOST_USER_POSTCOPY_ADVISE  = 28,
+    VHOST_USER_POSTCOPY_LISTEN  = 29,
     VHOST_USER_MAX
 } VhostUserRequest;
 
@@ -172,6 +174,8 @@ struct vhost_user {
     int slave_fd;
     NotifierWithReturn postcopy_notifier;
     struct PostCopyFD  postcopy_fd;
+    /* True once we've entered postcopy_listen */
+    bool               postcopy_listen;
 };
 
 static bool ioeventfd_enabled(void)
@@ -858,6 +862,33 @@ static int vhost_user_postcopy_advise(struct vhost_dev *dev, Error **errp)
     return 0;
 }
 
+/*
+ * Called at the switch to postcopy on reception of the 'listen' command.
+ */
+static int vhost_user_postcopy_listen(struct vhost_dev *dev, Error **errp)
+{
+    struct vhost_user *u = dev->opaque;
+    int ret;
+    VhostUserMsg msg = {
+        .hdr.request = VHOST_USER_POSTCOPY_LISTEN,
+        .hdr.flags = VHOST_USER_VERSION | VHOST_USER_NEED_REPLY_MASK,
+    };
+    u->postcopy_listen = true;
+    trace_vhost_user_postcopy_listen();
+    if (vhost_user_write(dev, &msg, NULL, 0) < 0) {
+        error_setg(errp, "Failed to send postcopy_listen to vhost");
+        return -1;
+    }
+
+    ret = process_message_reply(dev, &msg);
+    if (ret) {
+        error_setg(errp, "Failed to receive reply to postcopy_listen");
+        return ret;
+    }
+
+    return 0;
+}
+
 static int vhost_user_postcopy_notifier(NotifierWithReturn *notifier,
                                         void *opaque)
 {
@@ -880,6 +911,9 @@ static int vhost_user_postcopy_notifier(NotifierWithReturn *notifier,
     case POSTCOPY_NOTIFY_INBOUND_ADVISE:
         return vhost_user_postcopy_advise(dev, pnd->errp);
 
+    case POSTCOPY_NOTIFY_INBOUND_LISTEN:
+        return vhost_user_postcopy_listen(dev, pnd->errp);
+
     default:
         /* We ignore notifications we don't know */
         break;
diff --git a/migration/savevm.c b/migration/savevm.c
index 1f2bf12..305c3ce 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -1618,6 +1618,8 @@ static int loadvm_postcopy_handle_listen(MigrationIncomingState *mis)
 {
     PostcopyState ps = postcopy_state_set(POSTCOPY_INCOMING_LISTENING);
     trace_loadvm_postcopy_handle_listen();
+    Error *local_err = NULL;
+
     if (ps != POSTCOPY_INCOMING_ADVISE && ps != POSTCOPY_INCOMING_DISCARD) {
         error_report("CMD_POSTCOPY_LISTEN in wrong postcopy state (%d)", ps);
         return -1;
@@ -1643,6 +1645,11 @@ static int loadvm_postcopy_handle_listen(MigrationIncomingState *mis)
         }
     }
 
+    if (postcopy_notify(POSTCOPY_NOTIFY_INBOUND_LISTEN, &local_err)) {
+        error_report_err(local_err);
+        return -1;
+    }
+
     if (mis->have_listen_thread) {
         error_report("CMD_POSTCOPY_RAM_LISTEN already has a listen thread");
         return -1;
diff --git a/hw/virtio/trace-events b/hw/virtio/trace-events
index 742ff0f..06ec03d 100644
--- a/hw/virtio/trace-events
+++ b/hw/virtio/trace-events
@@ -6,6 +6,9 @@ vhost_region_add_section(const char *name, uint64_t gpa, uint64_t size, uint64_t
 vhost_region_add_section_abut(const char *name, uint64_t new_size) "%s: 0x%"PRIx64
 vhost_section(const char *name, int r) "%s:%d"
 
+# hw/virtio/vhost-user.c
+vhost_user_postcopy_listen(void) ""
+
 # hw/virtio/virtio.c
 virtqueue_alloc_element(void *elem, size_t sz, unsigned in_num, unsigned out_num) "elem %p size %zd in_num %u out_num %u"
 virtqueue_fill(void *vq, const void *elem, unsigned int len, unsigned int idx) "vq %p elem %p len %u idx %u"
-- 
MST

^ permalink raw reply related	[flat|nested] 69+ messages in thread

* [Qemu-devel] [PULL v2 30/50] postcopy: Allow registering of fd handler
  2018-03-20  3:16 [Qemu-devel] [PULL v2 00/50] virtio, vhost, pci, pc: features, cleanups Michael S. Tsirkin
                   ` (28 preceding siblings ...)
  2018-03-20  3:17 ` [Qemu-devel] [PULL v2 29/50] libvhost-user: Open userfaultfd Michael S. Tsirkin
@ 2018-03-20  3:17 ` Michael S. Tsirkin
  2018-03-20  3:17 ` [Qemu-devel] [PULL v2 31/50] vhost+postcopy: Register shared ufd with postcopy Michael S. Tsirkin
                   ` (20 subsequent siblings)
  50 siblings, 0 replies; 69+ messages in thread
From: Michael S. Tsirkin @ 2018-03-20  3:17 UTC (permalink / raw)
  To: qemu-devel; +Cc: Peter Maydell, Dr. David Alan Gilbert, Peter Xu, Juan Quintela

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Allow other userfaultfd's to be registered into the fault thread
so that handlers for shared memory can get responses.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 migration/migration.h    |   2 +
 migration/postcopy-ram.h |  21 +++++
 migration/migration.c    |   6 ++
 migration/postcopy-ram.c | 209 +++++++++++++++++++++++++++++++++++------------
 migration/trace-events   |   2 +
 5 files changed, 187 insertions(+), 53 deletions(-)

diff --git a/migration/migration.h b/migration/migration.h
index 08c5d2d..d02a759 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -51,6 +51,8 @@ struct MigrationIncomingState {
     QemuMutex rp_mutex;    /* We send replies from multiple threads */
     void     *postcopy_tmp_page;
     void     *postcopy_tmp_zero_page;
+    /* PostCopyFD's for external userfaultfds & handlers of shared memory */
+    GArray   *postcopy_remote_fds;
 
     QEMUBH *bh;
 
diff --git a/migration/postcopy-ram.h b/migration/postcopy-ram.h
index 0421c98..f21eef6 100644
--- a/migration/postcopy-ram.h
+++ b/migration/postcopy-ram.h
@@ -143,4 +143,25 @@ void postcopy_remove_notifier(NotifierWithReturn *n);
 /* Call the notifier list set by postcopy_add_start_notifier */
 int postcopy_notify(enum PostcopyNotifyReason reason, Error **errp);
 
+struct PostCopyFD;
+
+/* ufd is a pointer to the struct uffd_msg *TODO: more Portable! */
+typedef int (*pcfdhandler)(struct PostCopyFD *pcfd, void *ufd);
+
+struct PostCopyFD {
+    int fd;
+    /* Data to pass to handler */
+    void *data;
+    /* Handler to be called whenever we get a poll event */
+    pcfdhandler handler;
+    /* A string to use in error messages */
+    const char *idstr;
+};
+
+/* Register a userfaultfd owned by an external process for
+ * shared memory.
+ */
+void postcopy_register_shared_ufd(struct PostCopyFD *pcfd);
+void postcopy_unregister_shared_ufd(struct PostCopyFD *pcfd);
+
 #endif
diff --git a/migration/migration.c b/migration/migration.c
index 6a4780e..1f22f46 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -155,6 +155,8 @@ MigrationIncomingState *migration_incoming_get_current(void)
     if (!once) {
         mis_current.state = MIGRATION_STATUS_NONE;
         memset(&mis_current, 0, sizeof(MigrationIncomingState));
+        mis_current.postcopy_remote_fds = g_array_new(FALSE, TRUE,
+                                                   sizeof(struct PostCopyFD));
         qemu_mutex_init(&mis_current.rp_mutex);
         qemu_event_init(&mis_current.main_thread_load_event, false);
         once = true;
@@ -177,6 +179,10 @@ void migration_incoming_state_destroy(void)
         qemu_fclose(mis->from_src_file);
         mis->from_src_file = NULL;
     }
+    if (mis->postcopy_remote_fds) {
+        g_array_free(mis->postcopy_remote_fds, TRUE);
+        mis->postcopy_remote_fds = NULL;
+    }
 
     qemu_event_reset(&mis->main_thread_load_event);
 }
diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index 1089814..6ce1577 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -533,29 +533,44 @@ static void *postcopy_ram_fault_thread(void *opaque)
     MigrationIncomingState *mis = opaque;
     struct uffd_msg msg;
     int ret;
+    size_t index;
     RAMBlock *rb = NULL;
     RAMBlock *last_rb = NULL; /* last RAMBlock we sent part of */
 
     trace_postcopy_ram_fault_thread_entry();
     qemu_sem_post(&mis->fault_thread_sem);
 
+    struct pollfd *pfd;
+    size_t pfd_len = 2 + mis->postcopy_remote_fds->len;
+
+    pfd = g_new0(struct pollfd, pfd_len);
+
+    pfd[0].fd = mis->userfault_fd;
+    pfd[0].events = POLLIN;
+    pfd[1].fd = mis->userfault_event_fd;
+    pfd[1].events = POLLIN; /* Waiting for eventfd to go positive */
+    trace_postcopy_ram_fault_thread_fds_core(pfd[0].fd, pfd[1].fd);
+    for (index = 0; index < mis->postcopy_remote_fds->len; index++) {
+        struct PostCopyFD *pcfd = &g_array_index(mis->postcopy_remote_fds,
+                                                 struct PostCopyFD, index);
+        pfd[2 + index].fd = pcfd->fd;
+        pfd[2 + index].events = POLLIN;
+        trace_postcopy_ram_fault_thread_fds_extra(2 + index, pcfd->idstr,
+                                                  pcfd->fd);
+    }
+
     while (true) {
         ram_addr_t rb_offset;
-        struct pollfd pfd[2];
+        int poll_result;
 
         /*
          * We're mainly waiting for the kernel to give us a faulting HVA,
          * however we can be told to quit via userfault_quit_fd which is
          * an eventfd
          */
-        pfd[0].fd = mis->userfault_fd;
-        pfd[0].events = POLLIN;
-        pfd[0].revents = 0;
-        pfd[1].fd = mis->userfault_event_fd;
-        pfd[1].events = POLLIN; /* Waiting for eventfd to go positive */
-        pfd[1].revents = 0;
-
-        if (poll(pfd, 2, -1 /* Wait forever */) == -1) {
+
+        poll_result = poll(pfd, pfd_len, -1 /* Wait forever */);
+        if (poll_result == -1) {
             error_report("%s: userfault poll: %s", __func__, strerror(errno));
             break;
         }
@@ -575,57 +590,117 @@ static void *postcopy_ram_fault_thread(void *opaque)
             }
         }
 
-        ret = read(mis->userfault_fd, &msg, sizeof(msg));
-        if (ret != sizeof(msg)) {
-            if (errno == EAGAIN) {
-                /*
-                 * if a wake up happens on the other thread just after
-                 * the poll, there is nothing to read.
-                 */
-                continue;
+        if (pfd[0].revents) {
+            poll_result--;
+            ret = read(mis->userfault_fd, &msg, sizeof(msg));
+            if (ret != sizeof(msg)) {
+                if (errno == EAGAIN) {
+                    /*
+                     * if a wake up happens on the other thread just after
+                     * the poll, there is nothing to read.
+                     */
+                    continue;
+                }
+                if (ret < 0) {
+                    error_report("%s: Failed to read full userfault "
+                                 "message: %s",
+                                 __func__, strerror(errno));
+                    break;
+                } else {
+                    error_report("%s: Read %d bytes from userfaultfd "
+                                 "expected %zd",
+                                 __func__, ret, sizeof(msg));
+                    break; /* Lost alignment, don't know what we'd read next */
+                }
             }
-            if (ret < 0) {
-                error_report("%s: Failed to read full userfault message: %s",
-                             __func__, strerror(errno));
-                break;
-            } else {
-                error_report("%s: Read %d bytes from userfaultfd expected %zd",
-                             __func__, ret, sizeof(msg));
-                break; /* Lost alignment, don't know what we'd read next */
+            if (msg.event != UFFD_EVENT_PAGEFAULT) {
+                error_report("%s: Read unexpected event %ud from userfaultfd",
+                             __func__, msg.event);
+                continue; /* It's not a page fault, shouldn't happen */
             }
-        }
-        if (msg.event != UFFD_EVENT_PAGEFAULT) {
-            error_report("%s: Read unexpected event %ud from userfaultfd",
-                         __func__, msg.event);
-            continue; /* It's not a page fault, shouldn't happen */
-        }
 
-        rb = qemu_ram_block_from_host(
-                 (void *)(uintptr_t)msg.arg.pagefault.address,
-                 true, &rb_offset);
-        if (!rb) {
-            error_report("postcopy_ram_fault_thread: Fault outside guest: %"
-                         PRIx64, (uint64_t)msg.arg.pagefault.address);
-            break;
-        }
+            rb = qemu_ram_block_from_host(
+                     (void *)(uintptr_t)msg.arg.pagefault.address,
+                     true, &rb_offset);
+            if (!rb) {
+                error_report("postcopy_ram_fault_thread: Fault outside guest: %"
+                             PRIx64, (uint64_t)msg.arg.pagefault.address);
+                break;
+            }
 
-        rb_offset &= ~(qemu_ram_pagesize(rb) - 1);
-        trace_postcopy_ram_fault_thread_request(msg.arg.pagefault.address,
+            rb_offset &= ~(qemu_ram_pagesize(rb) - 1);
+            trace_postcopy_ram_fault_thread_request(msg.arg.pagefault.address,
                                                 qemu_ram_get_idstr(rb),
                                                 rb_offset);
+            /*
+             * Send the request to the source - we want to request one
+             * of our host page sizes (which is >= TPS)
+             */
+            if (rb != last_rb) {
+                last_rb = rb;
+                migrate_send_rp_req_pages(mis, qemu_ram_get_idstr(rb),
+                                         rb_offset, qemu_ram_pagesize(rb));
+            } else {
+                /* Save some space */
+                migrate_send_rp_req_pages(mis, NULL,
+                                         rb_offset, qemu_ram_pagesize(rb));
+            }
+        }
 
-        /*
-         * Send the request to the source - we want to request one
-         * of our host page sizes (which is >= TPS)
-         */
-        if (rb != last_rb) {
-            last_rb = rb;
-            migrate_send_rp_req_pages(mis, qemu_ram_get_idstr(rb),
-                                     rb_offset, qemu_ram_pagesize(rb));
-        } else {
-            /* Save some space */
-            migrate_send_rp_req_pages(mis, NULL,
-                                     rb_offset, qemu_ram_pagesize(rb));
+        /* Now handle any requests from external processes on shared memory */
+        /* TODO: May need to handle devices deregistering during postcopy */
+        for (index = 2; index < pfd_len && poll_result; index++) {
+            if (pfd[index].revents) {
+                struct PostCopyFD *pcfd =
+                    &g_array_index(mis->postcopy_remote_fds,
+                                   struct PostCopyFD, index - 2);
+
+                poll_result--;
+                if (pfd[index].revents & POLLERR) {
+                    error_report("%s: POLLERR on poll %zd fd=%d",
+                                 __func__, index, pcfd->fd);
+                    pfd[index].events = 0;
+                    continue;
+                }
+
+                ret = read(pcfd->fd, &msg, sizeof(msg));
+                if (ret != sizeof(msg)) {
+                    if (errno == EAGAIN) {
+                        /*
+                         * if a wake up happens on the other thread just after
+                         * the poll, there is nothing to read.
+                         */
+                        continue;
+                    }
+                    if (ret < 0) {
+                        error_report("%s: Failed to read full userfault "
+                                     "message: %s (shared) revents=%d",
+                                     __func__, strerror(errno),
+                                     pfd[index].revents);
+                        /*TODO: Could just disable this sharer */
+                        break;
+                    } else {
+                        error_report("%s: Read %d bytes from userfaultfd "
+                                     "expected %zd (shared)",
+                                     __func__, ret, sizeof(msg));
+                        /*TODO: Could just disable this sharer */
+                        break; /*Lost alignment,don't know what we'd read next*/
+                    }
+                }
+                if (msg.event != UFFD_EVENT_PAGEFAULT) {
+                    error_report("%s: Read unexpected event %ud "
+                                 "from userfaultfd (shared)",
+                                 __func__, msg.event);
+                    continue; /* It's not a page fault, shouldn't happen */
+                }
+                /* Call the device handler registered with us */
+                ret = pcfd->handler(pcfd, &msg);
+                if (ret) {
+                    error_report("%s: Failed to resolve shared fault on %zd/%s",
+                                 __func__, index, pcfd->idstr);
+                    /* TODO: Fail? Disable this sharer? */
+                }
+            }
         }
     }
     trace_postcopy_ram_fault_thread_exit();
@@ -970,3 +1045,31 @@ PostcopyState postcopy_state_set(PostcopyState new_state)
 {
     return atomic_xchg(&incoming_postcopy_state, new_state);
 }
+
+/* Register a handler for external shared memory postcopy
+ * called on the destination.
+ */
+void postcopy_register_shared_ufd(struct PostCopyFD *pcfd)
+{
+    MigrationIncomingState *mis = migration_incoming_get_current();
+
+    mis->postcopy_remote_fds = g_array_append_val(mis->postcopy_remote_fds,
+                                                  *pcfd);
+}
+
+/* Unregister a handler for external shared memory postcopy
+ */
+void postcopy_unregister_shared_ufd(struct PostCopyFD *pcfd)
+{
+    guint i;
+    MigrationIncomingState *mis = migration_incoming_get_current();
+    GArray *pcrfds = mis->postcopy_remote_fds;
+
+    for (i = 0; i < pcrfds->len; i++) {
+        struct PostCopyFD *cur = &g_array_index(pcrfds, struct PostCopyFD, i);
+        if (cur->fd == pcfd->fd) {
+            mis->postcopy_remote_fds = g_array_remove_index(pcrfds, i);
+            return;
+        }
+    }
+}
diff --git a/migration/trace-events b/migration/trace-events
index 93961de..1e617ad 100644
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -190,6 +190,8 @@ postcopy_place_page_zero(void *host_addr) "host=%p"
 postcopy_ram_enable_notify(void) ""
 postcopy_ram_fault_thread_entry(void) ""
 postcopy_ram_fault_thread_exit(void) ""
+postcopy_ram_fault_thread_fds_core(int baseufd, int quitfd) "ufd: %d quitfd: %d"
+postcopy_ram_fault_thread_fds_extra(size_t index, const char *name, int fd) "%zd/%s: %d"
 postcopy_ram_fault_thread_quit(void) ""
 postcopy_ram_fault_thread_request(uint64_t hostaddr, const char *ramblock, size_t offset) "Request for HVA=0x%" PRIx64 " rb=%s offset=0x%zx"
 postcopy_ram_incoming_cleanup_closeuf(void) ""
-- 
MST

^ permalink raw reply related	[flat|nested] 69+ messages in thread

* [Qemu-devel] [PULL v2 33/50] postcopy+vhost-user: Split set_mem_table for postcopy
  2018-03-20  3:16 [Qemu-devel] [PULL v2 00/50] virtio, vhost, pci, pc: features, cleanups Michael S. Tsirkin
                   ` (31 preceding siblings ...)
  2018-03-20  3:17 ` [Qemu-devel] [PULL v2 32/50] vhost+postcopy: Transmit 'listen' to slave Michael S. Tsirkin
@ 2018-03-20  3:17 ` Michael S. Tsirkin
  2018-03-20  3:17 ` [Qemu-devel] [PULL v2 34/50] migration/ram: ramblock_recv_bitmap_test_byte_offset Michael S. Tsirkin
                   ` (17 subsequent siblings)
  50 siblings, 0 replies; 69+ messages in thread
From: Michael S. Tsirkin @ 2018-03-20  3:17 UTC (permalink / raw)
  To: qemu-devel; +Cc: Peter Maydell, Dr. David Alan Gilbert, Marc-André Lureau

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Split the set_mem_table routines in both qemu and libvhost-user
because the postcopy versions are going to be quite different
once changes in the later patches are added. However, this patch
doesn't produce any functional change, just the split.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 contrib/libvhost-user/libvhost-user.c | 53 ++++++++++++++++++++++++
 hw/virtio/vhost-user.c                | 77 ++++++++++++++++++++++++++++++++++-
 2 files changed, 128 insertions(+), 2 deletions(-)

diff --git a/contrib/libvhost-user/libvhost-user.c b/contrib/libvhost-user/libvhost-user.c
index e53b195..b2de8ed 100644
--- a/contrib/libvhost-user/libvhost-user.c
+++ b/contrib/libvhost-user/libvhost-user.c
@@ -449,6 +449,55 @@ vu_reset_device_exec(VuDev *dev, VhostUserMsg *vmsg)
 }
 
 static bool
+vu_set_mem_table_exec_postcopy(VuDev *dev, VhostUserMsg *vmsg)
+{
+    int i;
+    VhostUserMemory *memory = &vmsg->payload.memory;
+    dev->nregions = memory->nregions;
+    /* TODO: Postcopy specific code */
+    DPRINT("Nregions: %d\n", memory->nregions);
+    for (i = 0; i < dev->nregions; i++) {
+        void *mmap_addr;
+        VhostUserMemoryRegion *msg_region = &memory->regions[i];
+        VuDevRegion *dev_region = &dev->regions[i];
+
+        DPRINT("Region %d\n", i);
+        DPRINT("    guest_phys_addr: 0x%016"PRIx64"\n",
+               msg_region->guest_phys_addr);
+        DPRINT("    memory_size:     0x%016"PRIx64"\n",
+               msg_region->memory_size);
+        DPRINT("    userspace_addr   0x%016"PRIx64"\n",
+               msg_region->userspace_addr);
+        DPRINT("    mmap_offset      0x%016"PRIx64"\n",
+               msg_region->mmap_offset);
+
+        dev_region->gpa = msg_region->guest_phys_addr;
+        dev_region->size = msg_region->memory_size;
+        dev_region->qva = msg_region->userspace_addr;
+        dev_region->mmap_offset = msg_region->mmap_offset;
+
+        /* We don't use offset argument of mmap() since the
+         * mapped address has to be page aligned, and we use huge
+         * pages.  */
+        mmap_addr = mmap(0, dev_region->size + dev_region->mmap_offset,
+                         PROT_READ | PROT_WRITE, MAP_SHARED,
+                         vmsg->fds[i], 0);
+
+        if (mmap_addr == MAP_FAILED) {
+            vu_panic(dev, "region mmap error: %s", strerror(errno));
+        } else {
+            dev_region->mmap_addr = (uint64_t)(uintptr_t)mmap_addr;
+            DPRINT("    mmap_addr:       0x%016"PRIx64"\n",
+                   dev_region->mmap_addr);
+        }
+
+        close(vmsg->fds[i]);
+    }
+
+    return false;
+}
+
+static bool
 vu_set_mem_table_exec(VuDev *dev, VhostUserMsg *vmsg)
 {
     int i;
@@ -464,6 +513,10 @@ vu_set_mem_table_exec(VuDev *dev, VhostUserMsg *vmsg)
     }
     dev->nregions = memory->nregions;
 
+    if (dev->postcopy_listening) {
+        return vu_set_mem_table_exec_postcopy(dev, vmsg);
+    }
+
     DPRINT("Nregions: %d\n", memory->nregions);
     for (i = 0; i < dev->nregions; i++) {
         void *mmap_addr;
diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index c3ab299..b6757eb 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -340,15 +340,86 @@ static int vhost_user_set_log_base(struct vhost_dev *dev, uint64_t base,
     return 0;
 }
 
+static int vhost_user_set_mem_table_postcopy(struct vhost_dev *dev,
+                                             struct vhost_memory *mem)
+{
+    int fds[VHOST_MEMORY_MAX_NREGIONS];
+    int i, fd;
+    size_t fd_num = 0;
+    bool reply_supported = virtio_has_feature(dev->protocol_features,
+                                              VHOST_USER_PROTOCOL_F_REPLY_ACK);
+    /* TODO: Add actual postcopy differences */
+    VhostUserMsg msg = {
+        .hdr.request = VHOST_USER_SET_MEM_TABLE,
+        .hdr.flags = VHOST_USER_VERSION,
+    };
+
+    if (reply_supported) {
+        msg.hdr.flags |= VHOST_USER_NEED_REPLY_MASK;
+    }
+
+    for (i = 0; i < dev->mem->nregions; ++i) {
+        struct vhost_memory_region *reg = dev->mem->regions + i;
+        ram_addr_t offset;
+        MemoryRegion *mr;
+
+        assert((uintptr_t)reg->userspace_addr == reg->userspace_addr);
+        mr = memory_region_from_host((void *)(uintptr_t)reg->userspace_addr,
+                                     &offset);
+        fd = memory_region_get_fd(mr);
+        if (fd > 0) {
+            msg.payload.memory.regions[fd_num].userspace_addr =
+                reg->userspace_addr;
+            msg.payload.memory.regions[fd_num].memory_size  = reg->memory_size;
+            msg.payload.memory.regions[fd_num].guest_phys_addr =
+                reg->guest_phys_addr;
+            msg.payload.memory.regions[fd_num].mmap_offset = offset;
+            assert(fd_num < VHOST_MEMORY_MAX_NREGIONS);
+            fds[fd_num++] = fd;
+        }
+    }
+
+    msg.payload.memory.nregions = fd_num;
+
+    if (!fd_num) {
+        error_report("Failed initializing vhost-user memory map, "
+                     "consider using -object memory-backend-file share=on");
+        return -1;
+    }
+
+    msg.hdr.size = sizeof(msg.payload.memory.nregions);
+    msg.hdr.size += sizeof(msg.payload.memory.padding);
+    msg.hdr.size += fd_num * sizeof(VhostUserMemoryRegion);
+
+    if (vhost_user_write(dev, &msg, fds, fd_num) < 0) {
+        return -1;
+    }
+
+    if (reply_supported) {
+        return process_message_reply(dev, &msg);
+    }
+
+    return 0;
+}
+
 static int vhost_user_set_mem_table(struct vhost_dev *dev,
                                     struct vhost_memory *mem)
 {
+    struct vhost_user *u = dev->opaque;
     int fds[VHOST_MEMORY_MAX_NREGIONS];
     int i, fd;
     size_t fd_num = 0;
+    bool do_postcopy = u->postcopy_listen && u->postcopy_fd.handler;
     bool reply_supported = virtio_has_feature(dev->protocol_features,
                                               VHOST_USER_PROTOCOL_F_REPLY_ACK);
 
+    if (do_postcopy) {
+        /* Postcopy has enough differences that it's best done in it's own
+         * version
+         */
+        return vhost_user_set_mem_table_postcopy(dev, mem);
+    }
+
     VhostUserMsg msg = {
         .hdr.request = VHOST_USER_SET_MEM_TABLE,
         .hdr.flags = VHOST_USER_VERSION,
@@ -372,9 +443,11 @@ static int vhost_user_set_mem_table(struct vhost_dev *dev,
                 error_report("Failed preparing vhost-user memory table msg");
                 return -1;
             }
-            msg.payload.memory.regions[fd_num].userspace_addr = reg->userspace_addr;
+            msg.payload.memory.regions[fd_num].userspace_addr =
+                reg->userspace_addr;
             msg.payload.memory.regions[fd_num].memory_size  = reg->memory_size;
-            msg.payload.memory.regions[fd_num].guest_phys_addr = reg->guest_phys_addr;
+            msg.payload.memory.regions[fd_num].guest_phys_addr =
+                reg->guest_phys_addr;
             msg.payload.memory.regions[fd_num].mmap_offset = offset;
             fds[fd_num++] = fd;
         }
-- 
MST

^ permalink raw reply related	[flat|nested] 69+ messages in thread

* [Qemu-devel] [PULL v2 34/50] migration/ram: ramblock_recv_bitmap_test_byte_offset
  2018-03-20  3:16 [Qemu-devel] [PULL v2 00/50] virtio, vhost, pci, pc: features, cleanups Michael S. Tsirkin
                   ` (32 preceding siblings ...)
  2018-03-20  3:17 ` [Qemu-devel] [PULL v2 33/50] postcopy+vhost-user: Split set_mem_table for postcopy Michael S. Tsirkin
@ 2018-03-20  3:17 ` Michael S. Tsirkin
  2018-03-20  3:17 ` [Qemu-devel] [PULL v2 35/50] libvhost-user+postcopy: Register new regions with the ufd Michael S. Tsirkin
                   ` (16 subsequent siblings)
  50 siblings, 0 replies; 69+ messages in thread
From: Michael S. Tsirkin @ 2018-03-20  3:17 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Maydell, Dr. David Alan Gilbert, Peter Xu,
	Marc-André Lureau, Juan Quintela

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Utility for testing the map when you already know the offset
in the RAMBlock.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 migration/ram.h | 1 +
 migration/ram.c | 5 +++++
 2 files changed, 6 insertions(+)

diff --git a/migration/ram.h b/migration/ram.h
index 53f0021..5030be1 100644
--- a/migration/ram.h
+++ b/migration/ram.h
@@ -60,6 +60,7 @@ int ram_postcopy_incoming_init(MigrationIncomingState *mis);
 void ram_handle_compressed(void *host, uint8_t ch, uint64_t size);
 
 int ramblock_recv_bitmap_test(RAMBlock *rb, void *host_addr);
+bool ramblock_recv_bitmap_test_byte_offset(RAMBlock *rb, uint64_t byte_offset);
 void ramblock_recv_bitmap_set(RAMBlock *rb, void *host_addr);
 void ramblock_recv_bitmap_set_range(RAMBlock *rb, void *host_addr, size_t nr);
 
diff --git a/migration/ram.c b/migration/ram.c
index 7266351..6ce7770 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -169,6 +169,11 @@ int ramblock_recv_bitmap_test(RAMBlock *rb, void *host_addr)
                     rb->receivedmap);
 }
 
+bool ramblock_recv_bitmap_test_byte_offset(RAMBlock *rb, uint64_t byte_offset)
+{
+    return test_bit(byte_offset >> TARGET_PAGE_BITS, rb->receivedmap);
+}
+
 void ramblock_recv_bitmap_set(RAMBlock *rb, void *host_addr)
 {
     set_bit_atomic(ramblock_recv_bitmap_offset(host_addr, rb), rb->receivedmap);
-- 
MST

^ permalink raw reply related	[flat|nested] 69+ messages in thread

* [Qemu-devel] [PULL v2 35/50] libvhost-user+postcopy: Register new regions with the ufd
  2018-03-20  3:16 [Qemu-devel] [PULL v2 00/50] virtio, vhost, pci, pc: features, cleanups Michael S. Tsirkin
                   ` (33 preceding siblings ...)
  2018-03-20  3:17 ` [Qemu-devel] [PULL v2 34/50] migration/ram: ramblock_recv_bitmap_test_byte_offset Michael S. Tsirkin
@ 2018-03-20  3:17 ` Michael S. Tsirkin
  2018-03-20  3:17 ` [Qemu-devel] [PULL v2 37/50] vhost+postcopy: Stash RAMBlock and offset Michael S. Tsirkin
                   ` (15 subsequent siblings)
  50 siblings, 0 replies; 69+ messages in thread
From: Michael S. Tsirkin @ 2018-03-20  3:17 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Maydell, Dr. David Alan Gilbert, Marc-André Lureau,
	Maxime Coquelin, Yongji Xie

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

When new regions are sent to the client using SET_MEM_TABLE, register
them with the userfaultfd.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 contrib/libvhost-user/libvhost-user.c | 34 ++++++++++++++++++++++++++++++++++
 1 file changed, 34 insertions(+)

diff --git a/contrib/libvhost-user/libvhost-user.c b/contrib/libvhost-user/libvhost-user.c
index b2de8ed..7c8cd587 100644
--- a/contrib/libvhost-user/libvhost-user.c
+++ b/contrib/libvhost-user/libvhost-user.c
@@ -494,6 +494,40 @@ vu_set_mem_table_exec_postcopy(VuDev *dev, VhostUserMsg *vmsg)
         close(vmsg->fds[i]);
     }
 
+    /* TODO: Get address back to QEMU */
+    for (i = 0; i < dev->nregions; i++) {
+        VuDevRegion *dev_region = &dev->regions[i];
+#ifdef UFFDIO_REGISTER
+        /* We should already have an open ufd. Mark each memory
+         * range as ufd.
+         * Note: Do we need any madvises? Well it's not been accessed
+         * yet, still probably need no THP to be safe, discard to be safe?
+         */
+        struct uffdio_register reg_struct;
+        reg_struct.range.start = (uintptr_t)dev_region->mmap_addr;
+        reg_struct.range.len = dev_region->size + dev_region->mmap_offset;
+        reg_struct.mode = UFFDIO_REGISTER_MODE_MISSING;
+
+        if (ioctl(dev->postcopy_ufd, UFFDIO_REGISTER, &reg_struct)) {
+            vu_panic(dev, "%s: Failed to userfault region %d "
+                          "@%p + size:%zx offset: %zx: (ufd=%d)%s\n",
+                     __func__, i,
+                     dev_region->mmap_addr,
+                     dev_region->size, dev_region->mmap_offset,
+                     dev->postcopy_ufd, strerror(errno));
+            return false;
+        }
+        if (!(reg_struct.ioctls & ((__u64)1 << _UFFDIO_COPY))) {
+            vu_panic(dev, "%s Region (%d) doesn't support COPY",
+                     __func__, i);
+            return false;
+        }
+        DPRINT("%s: region %d: Registered userfault for %llx + %llx\n",
+                __func__, i, reg_struct.range.start, reg_struct.range.len);
+        /* TODO: Stash 'zero' support flags somewhere */
+#endif
+    }
+
     return false;
 }
 
-- 
MST

^ permalink raw reply related	[flat|nested] 69+ messages in thread

* [Qemu-devel] [PULL v2 37/50] vhost+postcopy: Stash RAMBlock and offset
  2018-03-20  3:16 [Qemu-devel] [PULL v2 00/50] virtio, vhost, pci, pc: features, cleanups Michael S. Tsirkin
                   ` (34 preceding siblings ...)
  2018-03-20  3:17 ` [Qemu-devel] [PULL v2 35/50] libvhost-user+postcopy: Register new regions with the ufd Michael S. Tsirkin
@ 2018-03-20  3:17 ` Michael S. Tsirkin
  2018-03-20  3:17 ` [Qemu-devel] [PULL v2 36/50] vhost+postcopy: Send address back to qemu Michael S. Tsirkin
                   ` (14 subsequent siblings)
  50 siblings, 0 replies; 69+ messages in thread
From: Michael S. Tsirkin @ 2018-03-20  3:17 UTC (permalink / raw)
  To: qemu-devel; +Cc: Peter Maydell, Dr. David Alan Gilbert, Marc-André Lureau

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Stash the RAMBlock and offset for later use looking up
addresses.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 hw/virtio/vhost-user.c | 34 ++++++++++++++++++++++++++++++++++
 hw/virtio/trace-events |  1 +
 2 files changed, 35 insertions(+)

diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index 1603d70..b47de62 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -175,6 +175,15 @@ struct vhost_user {
     NotifierWithReturn postcopy_notifier;
     struct PostCopyFD  postcopy_fd;
     uint64_t           postcopy_client_bases[VHOST_MEMORY_MAX_NREGIONS];
+    /* Length of the region_rb and region_rb_offset arrays */
+    size_t             region_rb_len;
+    /* RAMBlock associated with a given region */
+    RAMBlock         **region_rb;
+    /* The offset from the start of the RAMBlock to the start of the
+     * vhost region.
+     */
+    ram_addr_t        *region_rb_offset;
+
     /* True once we've entered postcopy_listen */
     bool               postcopy_listen;
 };
@@ -362,6 +371,17 @@ static int vhost_user_set_mem_table_postcopy(struct vhost_dev *dev,
         msg.hdr.flags |= VHOST_USER_NEED_REPLY_MASK;
     }
 
+    if (u->region_rb_len < dev->mem->nregions) {
+        u->region_rb = g_renew(RAMBlock*, u->region_rb, dev->mem->nregions);
+        u->region_rb_offset = g_renew(ram_addr_t, u->region_rb_offset,
+                                      dev->mem->nregions);
+        memset(&(u->region_rb[u->region_rb_len]), '\0',
+               sizeof(RAMBlock *) * (dev->mem->nregions - u->region_rb_len));
+        memset(&(u->region_rb_offset[u->region_rb_len]), '\0',
+               sizeof(ram_addr_t) * (dev->mem->nregions - u->region_rb_len));
+        u->region_rb_len = dev->mem->nregions;
+    }
+
     for (i = 0; i < dev->mem->nregions; ++i) {
         struct vhost_memory_region *reg = dev->mem->regions + i;
         ram_addr_t offset;
@@ -372,6 +392,12 @@ static int vhost_user_set_mem_table_postcopy(struct vhost_dev *dev,
                                      &offset);
         fd = memory_region_get_fd(mr);
         if (fd > 0) {
+            trace_vhost_user_set_mem_table_withfd(fd_num, mr->name,
+                                                  reg->memory_size,
+                                                  reg->guest_phys_addr,
+                                                  reg->userspace_addr, offset);
+            u->region_rb_offset[i] = offset;
+            u->region_rb[i] = mr->ram_block;
             msg.payload.memory.regions[fd_num].userspace_addr =
                 reg->userspace_addr;
             msg.payload.memory.regions[fd_num].memory_size  = reg->memory_size;
@@ -380,6 +406,9 @@ static int vhost_user_set_mem_table_postcopy(struct vhost_dev *dev,
             msg.payload.memory.regions[fd_num].mmap_offset = offset;
             assert(fd_num < VHOST_MEMORY_MAX_NREGIONS);
             fds[fd_num++] = fd;
+        } else {
+            u->region_rb_offset[i] = 0;
+            u->region_rb[i] = NULL;
         }
     }
 
@@ -1148,6 +1177,11 @@ static int vhost_user_cleanup(struct vhost_dev *dev)
         close(u->slave_fd);
         u->slave_fd = -1;
     }
+    g_free(u->region_rb);
+    u->region_rb = NULL;
+    g_free(u->region_rb_offset);
+    u->region_rb_offset = NULL;
+    u->region_rb_len = 0;
     g_free(u);
     dev->opaque = 0;
 
diff --git a/hw/virtio/trace-events b/hw/virtio/trace-events
index 05d18ad..d7e9e10 100644
--- a/hw/virtio/trace-events
+++ b/hw/virtio/trace-events
@@ -9,6 +9,7 @@ vhost_section(const char *name, int r) "%s:%d"
 # hw/virtio/vhost-user.c
 vhost_user_postcopy_listen(void) ""
 vhost_user_set_mem_table_postcopy(uint64_t client_addr, uint64_t qhva, int reply_i, int region_i) "client:0x%"PRIx64" for hva: 0x%"PRIx64" reply %d region %d"
+vhost_user_set_mem_table_withfd(int index, const char *name, uint64_t memory_size, uint64_t guest_phys_addr, uint64_t userspace_addr, uint64_t offset) "%d:%s: size:0x%"PRIx64" GPA:0x%"PRIx64" QVA/userspace:0x%"PRIx64" RB offset:0x%"PRIx64
 
 # hw/virtio/virtio.c
 virtqueue_alloc_element(void *elem, size_t sz, unsigned in_num, unsigned out_num) "elem %p size %zd in_num %u out_num %u"
-- 
MST

^ permalink raw reply related	[flat|nested] 69+ messages in thread

* [Qemu-devel] [PULL v2 36/50] vhost+postcopy: Send address back to qemu
  2018-03-20  3:16 [Qemu-devel] [PULL v2 00/50] virtio, vhost, pci, pc: features, cleanups Michael S. Tsirkin
                   ` (35 preceding siblings ...)
  2018-03-20  3:17 ` [Qemu-devel] [PULL v2 37/50] vhost+postcopy: Stash RAMBlock and offset Michael S. Tsirkin
@ 2018-03-20  3:17 ` Michael S. Tsirkin
  2018-03-20  3:17 ` [Qemu-devel] [PULL v2 39/50] vhost+postcopy: Resolve client address Michael S. Tsirkin
                   ` (13 subsequent siblings)
  50 siblings, 0 replies; 69+ messages in thread
From: Michael S. Tsirkin @ 2018-03-20  3:17 UTC (permalink / raw)
  To: qemu-devel; +Cc: Peter Maydell, Dr. David Alan Gilbert, Marc-André Lureau

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

We need a better way, but at the moment we need the address of the
mappings sent back to qemu so it can interpret the messages on the
userfaultfd it reads.

This is done as a 3 stage set:
   QEMU -> client
      set_mem_table

   mmap stuff, get addresses

   client -> qemu
       here are the addresses

   qemu -> client
       OK - now you can use them

That ensures that qemu has registered the new addresses in it's
userfault code before the client starts accessing them.

Note: We don't ask for the default 'ack' reply since we've got our own.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 docs/interop/vhost-user.txt           |  9 +++++
 contrib/libvhost-user/libvhost-user.c | 24 ++++++++++++-
 hw/virtio/vhost-user.c                | 67 +++++++++++++++++++++++++++++++++--
 hw/virtio/trace-events                |  1 +
 4 files changed, 98 insertions(+), 3 deletions(-)

diff --git a/docs/interop/vhost-user.txt b/docs/interop/vhost-user.txt
index 0d24203..e295ef1 100644
--- a/docs/interop/vhost-user.txt
+++ b/docs/interop/vhost-user.txt
@@ -455,12 +455,21 @@ Master message types
       Id: 5
       Equivalent ioctl: VHOST_SET_MEM_TABLE
       Master payload: memory regions description
+      Slave payload: (postcopy only) memory regions description
 
       Sets the memory map regions on the slave so it can translate the vring
       addresses. In the ancillary data there is an array of file descriptors
       for each memory mapped region. The size and ordering of the fds matches
       the number and ordering of memory regions.
 
+      When VHOST_USER_POSTCOPY_LISTEN has been received, SET_MEM_TABLE replies with
+      the bases of the memory mapped regions to the master.  The slave must
+      have mmap'd the regions but not yet accessed them and should not yet generate
+      a userfault event. Note NEED_REPLY_MASK is not set in this case.
+      QEMU will then reply back to the list of mappings with an empty
+      VHOST_USER_SET_MEM_TABLE as an acknowledgment; only upon reception of this
+      message may the guest start accessing the memory and generating faults.
+
  * VHOST_USER_SET_LOG_BASE
 
       Id: 6
diff --git a/contrib/libvhost-user/libvhost-user.c b/contrib/libvhost-user/libvhost-user.c
index 7c8cd587..6314549 100644
--- a/contrib/libvhost-user/libvhost-user.c
+++ b/contrib/libvhost-user/libvhost-user.c
@@ -491,10 +491,32 @@ vu_set_mem_table_exec_postcopy(VuDev *dev, VhostUserMsg *vmsg)
                    dev_region->mmap_addr);
         }
 
+        /* Return the address to QEMU so that it can translate the ufd
+         * fault addresses back.
+         */
+        msg_region->userspace_addr = (uintptr_t)(mmap_addr +
+                                                 dev_region->mmap_offset);
         close(vmsg->fds[i]);
     }
 
-    /* TODO: Get address back to QEMU */
+    /* Send the message back to qemu with the addresses filled in */
+    vmsg->fd_num = 0;
+    if (!vu_message_write(dev, dev->sock, vmsg)) {
+        vu_panic(dev, "failed to respond to set-mem-table for postcopy");
+        return false;
+    }
+
+    /* Wait for QEMU to confirm that it's registered the handler for the
+     * faults.
+     */
+    if (!vu_message_read(dev, dev->sock, vmsg) ||
+        vmsg->size != sizeof(vmsg->payload.u64) ||
+        vmsg->payload.u64 != 0) {
+        vu_panic(dev, "failed to receive valid ack for postcopy set-mem-table");
+        return false;
+    }
+
+    /* OK, now we can go and register the memory and generate faults */
     for (i = 0; i < dev->nregions; i++) {
         VuDevRegion *dev_region = &dev->regions[i];
 #ifdef UFFDIO_REGISTER
diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index b6757eb..1603d70 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -174,6 +174,7 @@ struct vhost_user {
     int slave_fd;
     NotifierWithReturn postcopy_notifier;
     struct PostCopyFD  postcopy_fd;
+    uint64_t           postcopy_client_bases[VHOST_MEMORY_MAX_NREGIONS];
     /* True once we've entered postcopy_listen */
     bool               postcopy_listen;
 };
@@ -343,12 +344,15 @@ static int vhost_user_set_log_base(struct vhost_dev *dev, uint64_t base,
 static int vhost_user_set_mem_table_postcopy(struct vhost_dev *dev,
                                              struct vhost_memory *mem)
 {
+    struct vhost_user *u = dev->opaque;
     int fds[VHOST_MEMORY_MAX_NREGIONS];
     int i, fd;
     size_t fd_num = 0;
     bool reply_supported = virtio_has_feature(dev->protocol_features,
                                               VHOST_USER_PROTOCOL_F_REPLY_ACK);
-    /* TODO: Add actual postcopy differences */
+    VhostUserMsg msg_reply;
+    int region_i, msg_i;
+
     VhostUserMsg msg = {
         .hdr.request = VHOST_USER_SET_MEM_TABLE,
         .hdr.flags = VHOST_USER_VERSION,
@@ -395,6 +399,64 @@ static int vhost_user_set_mem_table_postcopy(struct vhost_dev *dev,
         return -1;
     }
 
+    if (vhost_user_read(dev, &msg_reply) < 0) {
+        return -1;
+    }
+
+    if (msg_reply.hdr.request != VHOST_USER_SET_MEM_TABLE) {
+        error_report("%s: Received unexpected msg type."
+                     "Expected %d received %d", __func__,
+                     VHOST_USER_SET_MEM_TABLE, msg_reply.hdr.request);
+        return -1;
+    }
+    /* We're using the same structure, just reusing one of the
+     * fields, so it should be the same size.
+     */
+    if (msg_reply.hdr.size != msg.hdr.size) {
+        error_report("%s: Unexpected size for postcopy reply "
+                     "%d vs %d", __func__, msg_reply.hdr.size, msg.hdr.size);
+        return -1;
+    }
+
+    memset(u->postcopy_client_bases, 0,
+           sizeof(uint64_t) * VHOST_MEMORY_MAX_NREGIONS);
+
+    /* They're in the same order as the regions that were sent
+     * but some of the regions were skipped (above) if they
+     * didn't have fd's
+    */
+    for (msg_i = 0, region_i = 0;
+         region_i < dev->mem->nregions;
+        region_i++) {
+        if (msg_i < fd_num &&
+            msg_reply.payload.memory.regions[msg_i].guest_phys_addr ==
+            dev->mem->regions[region_i].guest_phys_addr) {
+            u->postcopy_client_bases[region_i] =
+                msg_reply.payload.memory.regions[msg_i].userspace_addr;
+            trace_vhost_user_set_mem_table_postcopy(
+                msg_reply.payload.memory.regions[msg_i].userspace_addr,
+                msg.payload.memory.regions[msg_i].userspace_addr,
+                msg_i, region_i);
+            msg_i++;
+        }
+    }
+    if (msg_i != fd_num) {
+        error_report("%s: postcopy reply not fully consumed "
+                     "%d vs %zd",
+                     __func__, msg_i, fd_num);
+        return -1;
+    }
+    /* Now we've registered this with the postcopy code, we ack to the client,
+     * because now we're in the position to be able to deal with any faults
+     * it generates.
+     */
+    /* TODO: Use this for failure cases as well with a bad value */
+    msg.hdr.size = sizeof(msg.payload.u64);
+    msg.payload.u64 = 0; /* OK */
+    if (vhost_user_write(dev, &msg, NULL, 0) < 0) {
+        return -1;
+    }
+
     if (reply_supported) {
         return process_message_reply(dev, &msg);
     }
@@ -411,7 +473,8 @@ static int vhost_user_set_mem_table(struct vhost_dev *dev,
     size_t fd_num = 0;
     bool do_postcopy = u->postcopy_listen && u->postcopy_fd.handler;
     bool reply_supported = virtio_has_feature(dev->protocol_features,
-                                              VHOST_USER_PROTOCOL_F_REPLY_ACK);
+                                          VHOST_USER_PROTOCOL_F_REPLY_ACK) &&
+                                          !do_postcopy;
 
     if (do_postcopy) {
         /* Postcopy has enough differences that it's best done in it's own
diff --git a/hw/virtio/trace-events b/hw/virtio/trace-events
index 06ec03d..05d18ad 100644
--- a/hw/virtio/trace-events
+++ b/hw/virtio/trace-events
@@ -8,6 +8,7 @@ vhost_section(const char *name, int r) "%s:%d"
 
 # hw/virtio/vhost-user.c
 vhost_user_postcopy_listen(void) ""
+vhost_user_set_mem_table_postcopy(uint64_t client_addr, uint64_t qhva, int reply_i, int region_i) "client:0x%"PRIx64" for hva: 0x%"PRIx64" reply %d region %d"
 
 # hw/virtio/virtio.c
 virtqueue_alloc_element(void *elem, size_t sz, unsigned in_num, unsigned out_num) "elem %p size %zd in_num %u out_num %u"
-- 
MST

^ permalink raw reply related	[flat|nested] 69+ messages in thread

* [Qemu-devel] [PULL v2 38/50] vhost+postcopy: Helper to send requests to source for shared pages
  2018-03-20  3:16 [Qemu-devel] [PULL v2 00/50] virtio, vhost, pci, pc: features, cleanups Michael S. Tsirkin
                   ` (37 preceding siblings ...)
  2018-03-20  3:17 ` [Qemu-devel] [PULL v2 39/50] vhost+postcopy: Resolve client address Michael S. Tsirkin
@ 2018-03-20  3:17 ` Michael S. Tsirkin
  2018-03-20  3:17 ` [Qemu-devel] [PULL v2 41/50] postcopy: postcopy_notify_shared_wake Michael S. Tsirkin
                   ` (11 subsequent siblings)
  50 siblings, 0 replies; 69+ messages in thread
From: Michael S. Tsirkin @ 2018-03-20  3:17 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Maydell, Dr. David Alan Gilbert, Marc-André Lureau,
	Juan Quintela

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Provide a helper to be used by shared waker functions to request
shared pages from the source.
The last_rb pointer is moved into the incoming state since this
helper can update it as well as the main fault thread function.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 migration/migration.h    |  2 ++
 migration/postcopy-ram.h |  3 +++
 migration/postcopy-ram.c | 32 +++++++++++++++++++++++++++++---
 migration/trace-events   |  2 ++
 4 files changed, 36 insertions(+), 3 deletions(-)

diff --git a/migration/migration.h b/migration/migration.h
index d02a759..83dc36b 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -49,6 +49,8 @@ struct MigrationIncomingState {
     int       userfault_event_fd;
     QEMUFile *to_src_file;
     QemuMutex rp_mutex;    /* We send replies from multiple threads */
+    /* RAMBlock of last request sent to source */
+    RAMBlock *last_rb;
     void     *postcopy_tmp_page;
     void     *postcopy_tmp_zero_page;
     /* PostCopyFD's for external userfaultfds & handlers of shared memory */
diff --git a/migration/postcopy-ram.h b/migration/postcopy-ram.h
index c8ced34..d7afab0 100644
--- a/migration/postcopy-ram.h
+++ b/migration/postcopy-ram.h
@@ -164,5 +164,8 @@ struct PostCopyFD {
  */
 void postcopy_register_shared_ufd(struct PostCopyFD *pcfd);
 void postcopy_unregister_shared_ufd(struct PostCopyFD *pcfd);
+/* Callback from shared fault handlers to ask for a page */
+int postcopy_request_shared_page(struct PostCopyFD *pcfd, RAMBlock *rb,
+                                 uint64_t client_addr, uint64_t offset);
 
 #endif
diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index 6ce1577..146bc09 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -526,6 +526,32 @@ static int ram_block_enable_notify(const char *block_name, void *host_addr,
 }
 
 /*
+ * Callback from shared fault handlers to ask for a page,
+ * the page must be specified by a RAMBlock and an offset in that rb
+ * Note: Only for use by shared fault handlers (in fault thread)
+ */
+int postcopy_request_shared_page(struct PostCopyFD *pcfd, RAMBlock *rb,
+                                 uint64_t client_addr, uint64_t rb_offset)
+{
+    size_t pagesize = qemu_ram_pagesize(rb);
+    uint64_t aligned_rbo = rb_offset & ~(pagesize - 1);
+    MigrationIncomingState *mis = migration_incoming_get_current();
+
+    trace_postcopy_request_shared_page(pcfd->idstr, qemu_ram_get_idstr(rb),
+                                       rb_offset);
+    /* TODO: Check bitmap to see if we already have the page */
+    if (rb != mis->last_rb) {
+        mis->last_rb = rb;
+        migrate_send_rp_req_pages(mis, qemu_ram_get_idstr(rb),
+                                  aligned_rbo, pagesize);
+    } else {
+        /* Save some space */
+        migrate_send_rp_req_pages(mis, NULL, aligned_rbo, pagesize);
+    }
+    return 0;
+}
+
+/*
  * Handle faults detected by the USERFAULT markings
  */
 static void *postcopy_ram_fault_thread(void *opaque)
@@ -535,9 +561,9 @@ static void *postcopy_ram_fault_thread(void *opaque)
     int ret;
     size_t index;
     RAMBlock *rb = NULL;
-    RAMBlock *last_rb = NULL; /* last RAMBlock we sent part of */
 
     trace_postcopy_ram_fault_thread_entry();
+    mis->last_rb = NULL; /* last RAMBlock we sent part of */
     qemu_sem_post(&mis->fault_thread_sem);
 
     struct pollfd *pfd;
@@ -636,8 +662,8 @@ static void *postcopy_ram_fault_thread(void *opaque)
              * Send the request to the source - we want to request one
              * of our host page sizes (which is >= TPS)
              */
-            if (rb != last_rb) {
-                last_rb = rb;
+            if (rb != mis->last_rb) {
+                mis->last_rb = rb;
                 migrate_send_rp_req_pages(mis, qemu_ram_get_idstr(rb),
                                          rb_offset, qemu_ram_pagesize(rb));
             } else {
diff --git a/migration/trace-events b/migration/trace-events
index 1e617ad..7c910b5 100644
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -198,6 +198,8 @@ postcopy_ram_incoming_cleanup_closeuf(void) ""
 postcopy_ram_incoming_cleanup_entry(void) ""
 postcopy_ram_incoming_cleanup_exit(void) ""
 postcopy_ram_incoming_cleanup_join(void) ""
+postcopy_request_shared_page(const char *sharer, const char *rb, uint64_t rb_offset) "for %s in %s offset 0x%"PRIx64
+
 save_xbzrle_page_skipping(void) ""
 save_xbzrle_page_overflow(void) ""
 ram_save_iterate_big_wait(uint64_t milliconds, int iterations) "big wait: %" PRIu64 " milliseconds, %d iterations"
-- 
MST

^ permalink raw reply related	[flat|nested] 69+ messages in thread

* [Qemu-devel] [PULL v2 39/50] vhost+postcopy: Resolve client address
  2018-03-20  3:16 [Qemu-devel] [PULL v2 00/50] virtio, vhost, pci, pc: features, cleanups Michael S. Tsirkin
                   ` (36 preceding siblings ...)
  2018-03-20  3:17 ` [Qemu-devel] [PULL v2 36/50] vhost+postcopy: Send address back to qemu Michael S. Tsirkin
@ 2018-03-20  3:17 ` Michael S. Tsirkin
  2018-03-20  3:17 ` [Qemu-devel] [PULL v2 38/50] vhost+postcopy: Helper to send requests to source for shared pages Michael S. Tsirkin
                   ` (12 subsequent siblings)
  50 siblings, 0 replies; 69+ messages in thread
From: Michael S. Tsirkin @ 2018-03-20  3:17 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Maydell, Dr. David Alan Gilbert, Peter Xu, Marc-André Lureau

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Resolve fault addresses read off the clients UFD into RAMBlock
and offset, and call back to the postcopy code to ask for the page.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 hw/virtio/vhost-user.c | 31 ++++++++++++++++++++++++++++++-
 hw/virtio/trace-events |  3 +++
 2 files changed, 33 insertions(+), 1 deletion(-)

diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index b47de62..6dee1b5 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -26,6 +26,7 @@
 #include <sys/socket.h>
 #include <sys/un.h>
 #include <linux/vhost.h>
+#include <linux/userfaultfd.h>
 
 #define VHOST_MEMORY_MAX_NREGIONS    8
 #define VHOST_USER_F_PROTOCOL_FEATURES 30
@@ -974,7 +975,35 @@ out:
 static int vhost_user_postcopy_fault_handler(struct PostCopyFD *pcfd,
                                              void *ufd)
 {
-    return 0;
+    struct vhost_dev *dev = pcfd->data;
+    struct vhost_user *u = dev->opaque;
+    struct uffd_msg *msg = ufd;
+    uint64_t faultaddr = msg->arg.pagefault.address;
+    RAMBlock *rb = NULL;
+    uint64_t rb_offset;
+    int i;
+
+    trace_vhost_user_postcopy_fault_handler(pcfd->idstr, faultaddr,
+                                            dev->mem->nregions);
+    for (i = 0; i < MIN(dev->mem->nregions, u->region_rb_len); i++) {
+        trace_vhost_user_postcopy_fault_handler_loop(i,
+                u->postcopy_client_bases[i], dev->mem->regions[i].memory_size);
+        if (faultaddr >= u->postcopy_client_bases[i]) {
+            /* Ofset of the fault address in the vhost region */
+            uint64_t region_offset = faultaddr - u->postcopy_client_bases[i];
+            if (region_offset < dev->mem->regions[i].memory_size) {
+                rb_offset = region_offset + u->region_rb_offset[i];
+                trace_vhost_user_postcopy_fault_handler_found(i,
+                        region_offset, rb_offset);
+                rb = u->region_rb[i];
+                return postcopy_request_shared_page(pcfd, rb, faultaddr,
+                                                    rb_offset);
+            }
+        }
+    }
+    error_report("%s: Failed to find region for fault %" PRIx64,
+                 __func__, faultaddr);
+    return -1;
 }
 
 /*
diff --git a/hw/virtio/trace-events b/hw/virtio/trace-events
index d7e9e10..3afd12c 100644
--- a/hw/virtio/trace-events
+++ b/hw/virtio/trace-events
@@ -7,6 +7,9 @@ vhost_region_add_section_abut(const char *name, uint64_t new_size) "%s: 0x%"PRIx
 vhost_section(const char *name, int r) "%s:%d"
 
 # hw/virtio/vhost-user.c
+vhost_user_postcopy_fault_handler(const char *name, uint64_t fault_address, int nregions) "%s: @0x%"PRIx64" nregions:%d"
+vhost_user_postcopy_fault_handler_loop(int i, uint64_t client_base, uint64_t size) "%d: client 0x%"PRIx64" +0x%"PRIx64
+vhost_user_postcopy_fault_handler_found(int i, uint64_t region_offset, uint64_t rb_offset) "%d: region_offset: 0x%"PRIx64" rb_offset:0x%"PRIx64
 vhost_user_postcopy_listen(void) ""
 vhost_user_set_mem_table_postcopy(uint64_t client_addr, uint64_t qhva, int reply_i, int region_i) "client:0x%"PRIx64" for hva: 0x%"PRIx64" reply %d region %d"
 vhost_user_set_mem_table_withfd(int index, const char *name, uint64_t memory_size, uint64_t guest_phys_addr, uint64_t userspace_addr, uint64_t offset) "%d:%s: size:0x%"PRIx64" GPA:0x%"PRIx64" QVA/userspace:0x%"PRIx64" RB offset:0x%"PRIx64
-- 
MST

^ permalink raw reply related	[flat|nested] 69+ messages in thread

* [Qemu-devel] [PULL v2 40/50] postcopy: helper for waking shared
  2018-03-20  3:16 [Qemu-devel] [PULL v2 00/50] virtio, vhost, pci, pc: features, cleanups Michael S. Tsirkin
                   ` (39 preceding siblings ...)
  2018-03-20  3:17 ` [Qemu-devel] [PULL v2 41/50] postcopy: postcopy_notify_shared_wake Michael S. Tsirkin
@ 2018-03-20  3:17 ` Michael S. Tsirkin
  2018-03-20  3:17 ` [Qemu-devel] [PULL v2 44/50] libvhost-user: mprotect & madvises for postcopy Michael S. Tsirkin
                   ` (9 subsequent siblings)
  50 siblings, 0 replies; 69+ messages in thread
From: Michael S. Tsirkin @ 2018-03-20  3:17 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Maydell, Dr. David Alan Gilbert, Marc-André Lureau,
	Juan Quintela

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Provide a helper to send a 'wake' request on a userfaultfd for
a shared process.
The address in the clients address space is specified together
with the RAMBlock it was resolved to.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 migration/postcopy-ram.h | 10 ++++++++++
 migration/postcopy-ram.c | 26 ++++++++++++++++++++++++++
 migration/trace-events   |  1 +
 3 files changed, 37 insertions(+)

diff --git a/migration/postcopy-ram.h b/migration/postcopy-ram.h
index d7afab0..fcd53b8 100644
--- a/migration/postcopy-ram.h
+++ b/migration/postcopy-ram.h
@@ -164,6 +164,16 @@ struct PostCopyFD {
  */
 void postcopy_register_shared_ufd(struct PostCopyFD *pcfd);
 void postcopy_unregister_shared_ufd(struct PostCopyFD *pcfd);
+/* postcopy_wake_shared: Notify a client ufd that a page is available
+ *
+ * Returns 0 on success
+ *
+ * @pcfd: Structure with fd, handler and name as above
+ * @client_addr: Address in the client program, not QEMU
+ * @rb: The RAMBlock the page is in
+ */
+int postcopy_wake_shared(struct PostCopyFD *pcfd, uint64_t client_addr,
+                         RAMBlock *rb);
 /* Callback from shared fault handlers to ask for a page */
 int postcopy_request_shared_page(struct PostCopyFD *pcfd, RAMBlock *rb,
                                  uint64_t client_addr, uint64_t offset);
diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index 146bc09..11cb096 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -525,6 +525,25 @@ static int ram_block_enable_notify(const char *block_name, void *host_addr,
     return 0;
 }
 
+int postcopy_wake_shared(struct PostCopyFD *pcfd,
+                         uint64_t client_addr,
+                         RAMBlock *rb)
+{
+    size_t pagesize = qemu_ram_pagesize(rb);
+    struct uffdio_range range;
+    int ret;
+    trace_postcopy_wake_shared(client_addr, qemu_ram_get_idstr(rb));
+    range.start = client_addr & ~(pagesize - 1);
+    range.len = pagesize;
+    ret = ioctl(pcfd->fd, UFFDIO_WAKE, &range);
+    if (ret) {
+        error_report("%s: Failed to wake: %zx in %s (%s)",
+                     __func__, (size_t)client_addr, qemu_ram_get_idstr(rb),
+                     strerror(errno));
+    }
+    return ret;
+}
+
 /*
  * Callback from shared fault handlers to ask for a page,
  * the page must be specified by a RAMBlock and an offset in that rb
@@ -954,6 +973,13 @@ void *postcopy_get_tmp_page(MigrationIncomingState *mis)
     return NULL;
 }
 
+int postcopy_wake_shared(struct PostCopyFD *pcfd,
+                         uint64_t client_addr,
+                         RAMBlock *rb)
+{
+    assert(0);
+    return -1;
+}
 #endif
 
 /* ------------------------------------------------------------------------- */
diff --git a/migration/trace-events b/migration/trace-events
index 7c910b5..b0acaaa 100644
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -199,6 +199,7 @@ postcopy_ram_incoming_cleanup_entry(void) ""
 postcopy_ram_incoming_cleanup_exit(void) ""
 postcopy_ram_incoming_cleanup_join(void) ""
 postcopy_request_shared_page(const char *sharer, const char *rb, uint64_t rb_offset) "for %s in %s offset 0x%"PRIx64
+postcopy_wake_shared(uint64_t client_addr, const char *rb) "at 0x%"PRIx64" in %s"
 
 save_xbzrle_page_skipping(void) ""
 save_xbzrle_page_overflow(void) ""
-- 
MST

^ permalink raw reply related	[flat|nested] 69+ messages in thread

* [Qemu-devel] [PULL v2 41/50] postcopy: postcopy_notify_shared_wake
  2018-03-20  3:16 [Qemu-devel] [PULL v2 00/50] virtio, vhost, pci, pc: features, cleanups Michael S. Tsirkin
                   ` (38 preceding siblings ...)
  2018-03-20  3:17 ` [Qemu-devel] [PULL v2 38/50] vhost+postcopy: Helper to send requests to source for shared pages Michael S. Tsirkin
@ 2018-03-20  3:17 ` Michael S. Tsirkin
  2018-03-20  3:17 ` [Qemu-devel] [PULL v2 40/50] postcopy: helper for waking shared Michael S. Tsirkin
                   ` (10 subsequent siblings)
  50 siblings, 0 replies; 69+ messages in thread
From: Michael S. Tsirkin @ 2018-03-20  3:17 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Maydell, Dr. David Alan Gilbert, Peter Xu,
	Marc-André Lureau, Juan Quintela

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Add a hook to allow a client userfaultfd to be 'woken'
when a page arrives, and a walker that calls that
hook for relevant clients given a RAMBlock and offset.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 migration/postcopy-ram.h | 10 ++++++++++
 migration/postcopy-ram.c | 16 ++++++++++++++++
 2 files changed, 26 insertions(+)

diff --git a/migration/postcopy-ram.h b/migration/postcopy-ram.h
index fcd53b8..2c73d77 100644
--- a/migration/postcopy-ram.h
+++ b/migration/postcopy-ram.h
@@ -148,6 +148,10 @@ struct PostCopyFD;
 
 /* ufd is a pointer to the struct uffd_msg *TODO: more Portable! */
 typedef int (*pcfdhandler)(struct PostCopyFD *pcfd, void *ufd);
+/* Notification to wake, either on place or on reception of
+ * a fault on something that's already arrived (race)
+ */
+typedef int (*pcfdwake)(struct PostCopyFD *pcfd, RAMBlock *rb, uint64_t offset);
 
 struct PostCopyFD {
     int fd;
@@ -155,6 +159,8 @@ struct PostCopyFD {
     void *data;
     /* Handler to be called whenever we get a poll event */
     pcfdhandler handler;
+    /* Notification to wake shared client */
+    pcfdwake waker;
     /* A string to use in error messages */
     const char *idstr;
 };
@@ -164,6 +170,10 @@ struct PostCopyFD {
  */
 void postcopy_register_shared_ufd(struct PostCopyFD *pcfd);
 void postcopy_unregister_shared_ufd(struct PostCopyFD *pcfd);
+/* Call each of the shared 'waker's registerd telling them of
+ * availability of a block.
+ */
+int postcopy_notify_shared_wake(RAMBlock *rb, uint64_t offset);
 /* postcopy_wake_shared: Notify a client ufd that a page is available
  *
  * Returns 0 on success
diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index 11cb096..d14a051 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -827,6 +827,22 @@ static int qemu_ufd_copy_ioctl(int userfault_fd, void *host_addr,
     return ret;
 }
 
+int postcopy_notify_shared_wake(RAMBlock *rb, uint64_t offset)
+{
+    int i;
+    MigrationIncomingState *mis = migration_incoming_get_current();
+    GArray *pcrfds = mis->postcopy_remote_fds;
+
+    for (i = 0; i < pcrfds->len; i++) {
+        struct PostCopyFD *cur = &g_array_index(pcrfds, struct PostCopyFD, i);
+        int ret = cur->waker(cur, rb, offset);
+        if (ret) {
+            return ret;
+        }
+    }
+    return 0;
+}
+
 /*
  * Place a host page (from) at (host) atomically
  * returns 0 on success
-- 
MST

^ permalink raw reply related	[flat|nested] 69+ messages in thread

* [Qemu-devel] [PULL v2 43/50] vhost+postcopy: Call wakeups
  2018-03-20  3:16 [Qemu-devel] [PULL v2 00/50] virtio, vhost, pci, pc: features, cleanups Michael S. Tsirkin
                   ` (42 preceding siblings ...)
  2018-03-20  3:17 ` [Qemu-devel] [PULL v2 42/50] vhost+postcopy: Add vhost waker Michael S. Tsirkin
@ 2018-03-20  3:17 ` Michael S. Tsirkin
  2018-03-20  3:17 ` [Qemu-devel] [PULL v2 46/50] vhost+postcopy: Wire up POSTCOPY_END notify Michael S. Tsirkin
                   ` (6 subsequent siblings)
  50 siblings, 0 replies; 69+ messages in thread
From: Michael S. Tsirkin @ 2018-03-20  3:17 UTC (permalink / raw)
  To: qemu-devel; +Cc: Peter Maydell, Dr. David Alan Gilbert, Juan Quintela

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Cause the vhost-user client to be woken up whenever:
  a) We place a page in postcopy mode
  b) We get a fault and the page has already been received

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 migration/postcopy-ram.c | 14 ++++++++++----
 migration/trace-events   |  1 +
 2 files changed, 11 insertions(+), 4 deletions(-)

diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index d14a051..7754913 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -558,7 +558,11 @@ int postcopy_request_shared_page(struct PostCopyFD *pcfd, RAMBlock *rb,
 
     trace_postcopy_request_shared_page(pcfd->idstr, qemu_ram_get_idstr(rb),
                                        rb_offset);
-    /* TODO: Check bitmap to see if we already have the page */
+    if (ramblock_recv_bitmap_test_byte_offset(rb, aligned_rbo)) {
+        trace_postcopy_request_shared_page_present(pcfd->idstr,
+                                        qemu_ram_get_idstr(rb), rb_offset);
+        return postcopy_wake_shared(pcfd, client_addr, rb);
+    }
     if (rb != mis->last_rb) {
         mis->last_rb = rb;
         migrate_send_rp_req_pages(mis, qemu_ram_get_idstr(rb),
@@ -866,7 +870,8 @@ int postcopy_place_page(MigrationIncomingState *mis, void *host, void *from,
     }
 
     trace_postcopy_place_page(host);
-    return 0;
+    return postcopy_notify_shared_wake(rb,
+                                       qemu_ram_block_host_offset(rb, host));
 }
 
 /*
@@ -890,6 +895,9 @@ int postcopy_place_page_zero(MigrationIncomingState *mis, void *host,
 
             return -e;
         }
+        return postcopy_notify_shared_wake(rb,
+                                           qemu_ram_block_host_offset(rb,
+                                                                      host));
     } else {
         /* The kernel can't use UFFDIO_ZEROPAGE for hugepages */
         if (!mis->postcopy_tmp_zero_page) {
@@ -909,8 +917,6 @@ int postcopy_place_page_zero(MigrationIncomingState *mis, void *host,
         return postcopy_place_page(mis, host, mis->postcopy_tmp_zero_page,
                                    rb);
     }
-
-    return 0;
 }
 
 /*
diff --git a/migration/trace-events b/migration/trace-events
index b0acaaa..1e353a3 100644
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -199,6 +199,7 @@ postcopy_ram_incoming_cleanup_entry(void) ""
 postcopy_ram_incoming_cleanup_exit(void) ""
 postcopy_ram_incoming_cleanup_join(void) ""
 postcopy_request_shared_page(const char *sharer, const char *rb, uint64_t rb_offset) "for %s in %s offset 0x%"PRIx64
+postcopy_request_shared_page_present(const char *sharer, const char *rb, uint64_t rb_offset) "%s already %s offset 0x%"PRIx64
 postcopy_wake_shared(uint64_t client_addr, const char *rb) "at 0x%"PRIx64" in %s"
 
 save_xbzrle_page_skipping(void) ""
-- 
MST

^ permalink raw reply related	[flat|nested] 69+ messages in thread

* [Qemu-devel] [PULL v2 44/50] libvhost-user: mprotect & madvises for postcopy
  2018-03-20  3:16 [Qemu-devel] [PULL v2 00/50] virtio, vhost, pci, pc: features, cleanups Michael S. Tsirkin
                   ` (40 preceding siblings ...)
  2018-03-20  3:17 ` [Qemu-devel] [PULL v2 40/50] postcopy: helper for waking shared Michael S. Tsirkin
@ 2018-03-20  3:17 ` Michael S. Tsirkin
  2018-03-20  3:17 ` [Qemu-devel] [PULL v2 42/50] vhost+postcopy: Add vhost waker Michael S. Tsirkin
                   ` (8 subsequent siblings)
  50 siblings, 0 replies; 69+ messages in thread
From: Michael S. Tsirkin @ 2018-03-20  3:17 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Maydell, Dr. David Alan Gilbert, Marc-André Lureau,
	Maxime Coquelin, Peter Xu

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Clear the area and turn off THP.
PROT_NONE the area until after we've userfault advised it
to catch any unexpected changes.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 contrib/libvhost-user/libvhost-user.c | 47 +++++++++++++++++++++++++++++++----
 1 file changed, 42 insertions(+), 5 deletions(-)

diff --git a/contrib/libvhost-user/libvhost-user.c b/contrib/libvhost-user/libvhost-user.c
index 6314549..5feed52 100644
--- a/contrib/libvhost-user/libvhost-user.c
+++ b/contrib/libvhost-user/libvhost-user.c
@@ -454,7 +454,7 @@ vu_set_mem_table_exec_postcopy(VuDev *dev, VhostUserMsg *vmsg)
     int i;
     VhostUserMemory *memory = &vmsg->payload.memory;
     dev->nregions = memory->nregions;
-    /* TODO: Postcopy specific code */
+
     DPRINT("Nregions: %d\n", memory->nregions);
     for (i = 0; i < dev->nregions; i++) {
         void *mmap_addr;
@@ -478,9 +478,12 @@ vu_set_mem_table_exec_postcopy(VuDev *dev, VhostUserMsg *vmsg)
 
         /* We don't use offset argument of mmap() since the
          * mapped address has to be page aligned, and we use huge
-         * pages.  */
+         * pages.
+         * In postcopy we're using PROT_NONE here to catch anyone
+         * accessing it before we userfault
+         */
         mmap_addr = mmap(0, dev_region->size + dev_region->mmap_offset,
-                         PROT_READ | PROT_WRITE, MAP_SHARED,
+                         PROT_NONE, MAP_SHARED,
                          vmsg->fds[i], 0);
 
         if (mmap_addr == MAP_FAILED) {
@@ -519,12 +522,38 @@ vu_set_mem_table_exec_postcopy(VuDev *dev, VhostUserMsg *vmsg)
     /* OK, now we can go and register the memory and generate faults */
     for (i = 0; i < dev->nregions; i++) {
         VuDevRegion *dev_region = &dev->regions[i];
+        int ret;
 #ifdef UFFDIO_REGISTER
         /* We should already have an open ufd. Mark each memory
          * range as ufd.
-         * Note: Do we need any madvises? Well it's not been accessed
-         * yet, still probably need no THP to be safe, discard to be safe?
+         * Discard any mapping we have here; note I can't use MADV_REMOVE
+         * or fallocate to make the hole since I don't want to lose
+         * data that's already arrived in the shared process.
+         * TODO: How to do hugepage
          */
+        ret = madvise((void *)dev_region->mmap_addr,
+                      dev_region->size + dev_region->mmap_offset,
+                      MADV_DONTNEED);
+        if (ret) {
+            fprintf(stderr,
+                    "%s: Failed to madvise(DONTNEED) region %d: %s\n",
+                    __func__, i, strerror(errno));
+        }
+        /* Turn off transparent hugepages so we dont get lose wakeups
+         * in neighbouring pages.
+         * TODO: Turn this backon later.
+         */
+        ret = madvise((void *)dev_region->mmap_addr,
+                      dev_region->size + dev_region->mmap_offset,
+                      MADV_NOHUGEPAGE);
+        if (ret) {
+            /* Note: This can happen legally on kernels that are configured
+             * without madvise'able hugepages
+             */
+            fprintf(stderr,
+                    "%s: Failed to madvise(NOHUGEPAGE) region %d: %s\n",
+                    __func__, i, strerror(errno));
+        }
         struct uffdio_register reg_struct;
         reg_struct.range.start = (uintptr_t)dev_region->mmap_addr;
         reg_struct.range.len = dev_region->size + dev_region->mmap_offset;
@@ -546,6 +575,14 @@ vu_set_mem_table_exec_postcopy(VuDev *dev, VhostUserMsg *vmsg)
         }
         DPRINT("%s: region %d: Registered userfault for %llx + %llx\n",
                 __func__, i, reg_struct.range.start, reg_struct.range.len);
+        /* Now it's registered we can let the client at it */
+        if (mprotect((void *)dev_region->mmap_addr,
+                     dev_region->size + dev_region->mmap_offset,
+                     PROT_READ | PROT_WRITE)) {
+            vu_panic(dev, "failed to mprotect region %d for postcopy (%s)",
+                     i, strerror(errno));
+            return false;
+        }
         /* TODO: Stash 'zero' support flags somewhere */
 #endif
     }
-- 
MST

^ permalink raw reply related	[flat|nested] 69+ messages in thread

* [Qemu-devel] [PULL v2 42/50] vhost+postcopy: Add vhost waker
  2018-03-20  3:16 [Qemu-devel] [PULL v2 00/50] virtio, vhost, pci, pc: features, cleanups Michael S. Tsirkin
                   ` (41 preceding siblings ...)
  2018-03-20  3:17 ` [Qemu-devel] [PULL v2 44/50] libvhost-user: mprotect & madvises for postcopy Michael S. Tsirkin
@ 2018-03-20  3:17 ` Michael S. Tsirkin
  2018-03-20  3:17 ` [Qemu-devel] [PULL v2 43/50] vhost+postcopy: Call wakeups Michael S. Tsirkin
                   ` (7 subsequent siblings)
  50 siblings, 0 replies; 69+ messages in thread
From: Michael S. Tsirkin @ 2018-03-20  3:17 UTC (permalink / raw)
  To: qemu-devel; +Cc: Peter Maydell, Dr. David Alan Gilbert, Marc-André Lureau

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Register a waker function in vhost-user code to be notified when
pages arrive or requests to previously mapped pages get requested.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 hw/virtio/vhost-user.c | 30 ++++++++++++++++++++++++++++++
 hw/virtio/trace-events |  3 +++
 2 files changed, 33 insertions(+)

diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index 6dee1b5..a785aef 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -1006,6 +1006,35 @@ static int vhost_user_postcopy_fault_handler(struct PostCopyFD *pcfd,
     return -1;
 }
 
+static int vhost_user_postcopy_waker(struct PostCopyFD *pcfd, RAMBlock *rb,
+                                     uint64_t offset)
+{
+    struct vhost_dev *dev = pcfd->data;
+    struct vhost_user *u = dev->opaque;
+    int i;
+
+    trace_vhost_user_postcopy_waker(qemu_ram_get_idstr(rb), offset);
+
+    if (!u) {
+        return 0;
+    }
+    /* Translate the offset into an address in the clients address space */
+    for (i = 0; i < MIN(dev->mem->nregions, u->region_rb_len); i++) {
+        if (u->region_rb[i] == rb &&
+            offset >= u->region_rb_offset[i] &&
+            offset < (u->region_rb_offset[i] +
+                      dev->mem->regions[i].memory_size)) {
+            uint64_t client_addr = (offset - u->region_rb_offset[i]) +
+                                   u->postcopy_client_bases[i];
+            trace_vhost_user_postcopy_waker_found(client_addr);
+            return postcopy_wake_shared(pcfd, client_addr, rb);
+        }
+    }
+
+    trace_vhost_user_postcopy_waker_nomatch(qemu_ram_get_idstr(rb), offset);
+    return 0;
+}
+
 /*
  * Called at the start of an inbound postcopy on reception of the
  * 'advise' command.
@@ -1051,6 +1080,7 @@ static int vhost_user_postcopy_advise(struct vhost_dev *dev, Error **errp)
     u->postcopy_fd.fd = ufd;
     u->postcopy_fd.data = dev;
     u->postcopy_fd.handler = vhost_user_postcopy_fault_handler;
+    u->postcopy_fd.waker = vhost_user_postcopy_waker;
     u->postcopy_fd.idstr = "vhost-user"; /* Need to find unique name */
     postcopy_register_shared_ufd(&u->postcopy_fd);
     return 0;
diff --git a/hw/virtio/trace-events b/hw/virtio/trace-events
index 3afd12c..fe5e0ff 100644
--- a/hw/virtio/trace-events
+++ b/hw/virtio/trace-events
@@ -13,6 +13,9 @@ vhost_user_postcopy_fault_handler_found(int i, uint64_t region_offset, uint64_t
 vhost_user_postcopy_listen(void) ""
 vhost_user_set_mem_table_postcopy(uint64_t client_addr, uint64_t qhva, int reply_i, int region_i) "client:0x%"PRIx64" for hva: 0x%"PRIx64" reply %d region %d"
 vhost_user_set_mem_table_withfd(int index, const char *name, uint64_t memory_size, uint64_t guest_phys_addr, uint64_t userspace_addr, uint64_t offset) "%d:%s: size:0x%"PRIx64" GPA:0x%"PRIx64" QVA/userspace:0x%"PRIx64" RB offset:0x%"PRIx64
+vhost_user_postcopy_waker(const char *rb, uint64_t rb_offset) "%s + 0x%"PRIx64
+vhost_user_postcopy_waker_found(uint64_t client_addr) "0x%"PRIx64
+vhost_user_postcopy_waker_nomatch(const char *rb, uint64_t rb_offset) "%s + 0x%"PRIx64
 
 # hw/virtio/virtio.c
 virtqueue_alloc_element(void *elem, size_t sz, unsigned in_num, unsigned out_num) "elem %p size %zd in_num %u out_num %u"
-- 
MST

^ permalink raw reply related	[flat|nested] 69+ messages in thread

* [Qemu-devel] [PULL v2 46/50] vhost+postcopy: Wire up POSTCOPY_END notify
  2018-03-20  3:16 [Qemu-devel] [PULL v2 00/50] virtio, vhost, pci, pc: features, cleanups Michael S. Tsirkin
                   ` (43 preceding siblings ...)
  2018-03-20  3:17 ` [Qemu-devel] [PULL v2 43/50] vhost+postcopy: Call wakeups Michael S. Tsirkin
@ 2018-03-20  3:17 ` Michael S. Tsirkin
  2018-03-20  3:17 ` [Qemu-devel] [PULL v2 45/50] vhost-user: Add VHOST_USER_POSTCOPY_END message Michael S. Tsirkin
                   ` (5 subsequent siblings)
  50 siblings, 0 replies; 69+ messages in thread
From: Michael S. Tsirkin @ 2018-03-20  3:17 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Maydell, Dr. David Alan Gilbert, Marc-André Lureau,
	Juan Quintela

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Wire up a call to VHOST_USER_POSTCOPY_END message to the vhost clients
right before we ask the listener thread to shutdown.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 migration/postcopy-ram.h |  1 +
 hw/virtio/vhost-user.c   | 34 ++++++++++++++++++++++++++++++++++
 migration/postcopy-ram.c |  7 +++++++
 hw/virtio/trace-events   |  2 ++
 4 files changed, 44 insertions(+)

diff --git a/migration/postcopy-ram.h b/migration/postcopy-ram.h
index 2c73d77..d900d9c 100644
--- a/migration/postcopy-ram.h
+++ b/migration/postcopy-ram.h
@@ -132,6 +132,7 @@ enum PostcopyNotifyReason {
     POSTCOPY_NOTIFY_PROBE = 0,
     POSTCOPY_NOTIFY_INBOUND_ADVISE,
     POSTCOPY_NOTIFY_INBOUND_LISTEN,
+    POSTCOPY_NOTIFY_INBOUND_END,
 };
 
 struct PostcopyNotifyData {
diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index 230f2f9..44aea5c 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -1114,6 +1114,37 @@ static int vhost_user_postcopy_listen(struct vhost_dev *dev, Error **errp)
     return 0;
 }
 
+/*
+ * Called at the end of postcopy
+ */
+static int vhost_user_postcopy_end(struct vhost_dev *dev, Error **errp)
+{
+    VhostUserMsg msg = {
+        .hdr.request = VHOST_USER_POSTCOPY_END,
+        .hdr.flags = VHOST_USER_VERSION | VHOST_USER_NEED_REPLY_MASK,
+    };
+    int ret;
+    struct vhost_user *u = dev->opaque;
+
+    trace_vhost_user_postcopy_end_entry();
+    if (vhost_user_write(dev, &msg, NULL, 0) < 0) {
+        error_setg(errp, "Failed to send postcopy_end to vhost");
+        return -1;
+    }
+
+    ret = process_message_reply(dev, &msg);
+    if (ret) {
+        error_setg(errp, "Failed to receive reply to postcopy_end");
+        return ret;
+    }
+    postcopy_unregister_shared_ufd(&u->postcopy_fd);
+    u->postcopy_fd.handler = NULL;
+
+    trace_vhost_user_postcopy_end_exit();
+
+    return 0;
+}
+
 static int vhost_user_postcopy_notifier(NotifierWithReturn *notifier,
                                         void *opaque)
 {
@@ -1139,6 +1170,9 @@ static int vhost_user_postcopy_notifier(NotifierWithReturn *notifier,
     case POSTCOPY_NOTIFY_INBOUND_LISTEN:
         return vhost_user_postcopy_listen(dev, pnd->errp);
 
+    case POSTCOPY_NOTIFY_INBOUND_END:
+        return vhost_user_postcopy_end(dev, pnd->errp);
+
     default:
         /* We ignore notifications we don't know */
         break;
diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index 7754913..9cdee0f 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -413,6 +413,13 @@ int postcopy_ram_incoming_cleanup(MigrationIncomingState *mis)
     trace_postcopy_ram_incoming_cleanup_entry();
 
     if (mis->have_fault_thread) {
+        Error *local_err = NULL;
+
+        if (postcopy_notify(POSTCOPY_NOTIFY_INBOUND_END, &local_err)) {
+            error_report_err(local_err);
+            return -1;
+        }
+
         if (qemu_ram_foreach_block(cleanup_range, mis)) {
             return -1;
         }
diff --git a/hw/virtio/trace-events b/hw/virtio/trace-events
index fe5e0ff..857c495 100644
--- a/hw/virtio/trace-events
+++ b/hw/virtio/trace-events
@@ -7,6 +7,8 @@ vhost_region_add_section_abut(const char *name, uint64_t new_size) "%s: 0x%"PRIx
 vhost_section(const char *name, int r) "%s:%d"
 
 # hw/virtio/vhost-user.c
+vhost_user_postcopy_end_entry(void) ""
+vhost_user_postcopy_end_exit(void) ""
 vhost_user_postcopy_fault_handler(const char *name, uint64_t fault_address, int nregions) "%s: @0x%"PRIx64" nregions:%d"
 vhost_user_postcopy_fault_handler_loop(int i, uint64_t client_base, uint64_t size) "%d: client 0x%"PRIx64" +0x%"PRIx64
 vhost_user_postcopy_fault_handler_found(int i, uint64_t region_offset, uint64_t rb_offset) "%d: region_offset: 0x%"PRIx64" rb_offset:0x%"PRIx64
-- 
MST

^ permalink raw reply related	[flat|nested] 69+ messages in thread

* [Qemu-devel] [PULL v2 45/50] vhost-user: Add VHOST_USER_POSTCOPY_END message
  2018-03-20  3:16 [Qemu-devel] [PULL v2 00/50] virtio, vhost, pci, pc: features, cleanups Michael S. Tsirkin
                   ` (44 preceding siblings ...)
  2018-03-20  3:17 ` [Qemu-devel] [PULL v2 46/50] vhost+postcopy: Wire up POSTCOPY_END notify Michael S. Tsirkin
@ 2018-03-20  3:17 ` Michael S. Tsirkin
  2018-03-20  3:18 ` [Qemu-devel] [PULL v2 47/50] vhost: Huge page align and merge Michael S. Tsirkin
                   ` (4 subsequent siblings)
  50 siblings, 0 replies; 69+ messages in thread
From: Michael S. Tsirkin @ 2018-03-20  3:17 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Maydell, Dr. David Alan Gilbert, Peter Xu, Marc-André Lureau

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

This message is sent just before the end of postcopy to get the
client to stop using userfault since we wont respond to any more
requests.  It should close userfaultfd so that any other pages
get mapped to the backing file automatically by the kernel, since
at this point we know we've received everything.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 docs/interop/vhost-user.txt           | 12 ++++++++++++
 contrib/libvhost-user/libvhost-user.h |  1 +
 contrib/libvhost-user/libvhost-user.c | 23 +++++++++++++++++++++++
 hw/virtio/vhost-user.c                |  1 +
 4 files changed, 37 insertions(+)

diff --git a/docs/interop/vhost-user.txt b/docs/interop/vhost-user.txt
index e295ef1..c058c40 100644
--- a/docs/interop/vhost-user.txt
+++ b/docs/interop/vhost-user.txt
@@ -729,6 +729,18 @@ Master message types
       This is always sent sometime after a VHOST_USER_POSTCOPY_ADVISE, and
       thus only when VHOST_USER_PROTOCOL_F_PAGEFAULT is supported.
 
+ * VHOST_USER_POSTCOPY_END
+      Id: 30
+      Slave payload: u64
+
+      Master advises that postcopy migration has now completed.  The
+      slave must disable the userfaultfd. The response is an acknowledgement
+      only.
+      When VHOST_USER_PROTOCOL_F_PAGEFAULT is supported, this message
+      is sent at the end of the migration, after VHOST_USER_POSTCOPY_LISTEN
+      was previously sent.
+      The value returned is an error indication; 0 is success.
+
 Slave message types
 -------------------
 
diff --git a/contrib/libvhost-user/libvhost-user.h b/contrib/libvhost-user/libvhost-user.h
index ed505cf..79f7a53 100644
--- a/contrib/libvhost-user/libvhost-user.h
+++ b/contrib/libvhost-user/libvhost-user.h
@@ -87,6 +87,7 @@ typedef enum VhostUserRequest {
     VHOST_USER_CLOSE_CRYPTO_SESSION = 27,
     VHOST_USER_POSTCOPY_ADVISE  = 28,
     VHOST_USER_POSTCOPY_LISTEN  = 29,
+    VHOST_USER_POSTCOPY_END     = 30,
     VHOST_USER_MAX
 } VhostUserRequest;
 
diff --git a/contrib/libvhost-user/libvhost-user.c b/contrib/libvhost-user/libvhost-user.c
index 5feed52..504ff5e 100644
--- a/contrib/libvhost-user/libvhost-user.c
+++ b/contrib/libvhost-user/libvhost-user.c
@@ -99,6 +99,7 @@ vu_request_to_string(unsigned int req)
         REQ(VHOST_USER_SET_CONFIG),
         REQ(VHOST_USER_POSTCOPY_ADVISE),
         REQ(VHOST_USER_POSTCOPY_LISTEN),
+        REQ(VHOST_USER_POSTCOPY_END),
         REQ(VHOST_USER_MAX),
     };
 #undef REQ
@@ -1094,6 +1095,26 @@ vu_set_postcopy_listen(VuDev *dev, VhostUserMsg *vmsg)
     vmsg->payload.u64 = 0; /* Success */
     return true;
 }
+
+static bool
+vu_set_postcopy_end(VuDev *dev, VhostUserMsg *vmsg)
+{
+    DPRINT("%s: Entry\n", __func__);
+    dev->postcopy_listening = false;
+    if (dev->postcopy_ufd > 0) {
+        close(dev->postcopy_ufd);
+        dev->postcopy_ufd = -1;
+        DPRINT("%s: Done close\n", __func__);
+    }
+
+    vmsg->fd_num = 0;
+    vmsg->payload.u64 = 0;
+    vmsg->size = sizeof(vmsg->payload.u64);
+    vmsg->flags = VHOST_USER_VERSION |  VHOST_USER_REPLY_MASK;
+    DPRINT("%s: exit\n", __func__);
+    return true;
+}
+
 static bool
 vu_process_message(VuDev *dev, VhostUserMsg *vmsg)
 {
@@ -1169,6 +1190,8 @@ vu_process_message(VuDev *dev, VhostUserMsg *vmsg)
         return vu_set_postcopy_advise(dev, vmsg);
     case VHOST_USER_POSTCOPY_LISTEN:
         return vu_set_postcopy_listen(dev, vmsg);
+    case VHOST_USER_POSTCOPY_END:
+        return vu_set_postcopy_end(dev, vmsg);
     default:
         vmsg_close_fds(vmsg);
         vu_panic(dev, "Unhandled request: %d", vmsg->request);
diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index a785aef..230f2f9 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -82,6 +82,7 @@ typedef enum VhostUserRequest {
     VHOST_USER_CLOSE_CRYPTO_SESSION = 27,
     VHOST_USER_POSTCOPY_ADVISE  = 28,
     VHOST_USER_POSTCOPY_LISTEN  = 29,
+    VHOST_USER_POSTCOPY_END     = 30,
     VHOST_USER_MAX
 } VhostUserRequest;
 
-- 
MST

^ permalink raw reply related	[flat|nested] 69+ messages in thread

* [Qemu-devel] [PULL v2 48/50] postcopy: Allow shared memory
  2018-03-20  3:16 [Qemu-devel] [PULL v2 00/50] virtio, vhost, pci, pc: features, cleanups Michael S. Tsirkin
                   ` (46 preceding siblings ...)
  2018-03-20  3:18 ` [Qemu-devel] [PULL v2 47/50] vhost: Huge page align and merge Michael S. Tsirkin
@ 2018-03-20  3:18 ` Michael S. Tsirkin
  2018-03-20  3:18 ` [Qemu-devel] [PULL v2 50/50] postcopy shared docs Michael S. Tsirkin
                   ` (2 subsequent siblings)
  50 siblings, 0 replies; 69+ messages in thread
From: Michael S. Tsirkin @ 2018-03-20  3:18 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Maydell, Dr. David Alan Gilbert, Marc-André Lureau,
	Juan Quintela

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Now that we have the mechanisms in here, allow shared memory in a
postcopy.

Note that QEMU can't tell who all the users of shared regions are
and thus can't tell whether all the users of the shared regions
have appropriate support for postcopy.  Those devices that explicitly
support shared memory (e.g. vhost-user) must check, but it doesn't
stop weirder configurations causing problems.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 migration/postcopy-ram.c | 6 ------
 1 file changed, 6 deletions(-)

diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index 9cdee0f..188c2ca 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -215,12 +215,6 @@ static int test_ramblock_postcopiable(const char *block_name, void *host_addr,
     RAMBlock *rb = qemu_ram_block_by_name(block_name);
     size_t pagesize = qemu_ram_pagesize(rb);
 
-    if (qemu_ram_is_shared(rb)) {
-        error_report("Postcopy on shared RAM (%s) is not yet supported",
-                     block_name);
-        return 1;
-    }
-
     if (length % pagesize) {
         error_report("Postcopy requires RAM blocks to be a page size multiple,"
                      " block %s is 0x" RAM_ADDR_FMT " bytes with a "
-- 
MST

^ permalink raw reply related	[flat|nested] 69+ messages in thread

* [Qemu-devel] [PULL v2 47/50] vhost: Huge page align and merge
  2018-03-20  3:16 [Qemu-devel] [PULL v2 00/50] virtio, vhost, pci, pc: features, cleanups Michael S. Tsirkin
                   ` (45 preceding siblings ...)
  2018-03-20  3:17 ` [Qemu-devel] [PULL v2 45/50] vhost-user: Add VHOST_USER_POSTCOPY_END message Michael S. Tsirkin
@ 2018-03-20  3:18 ` Michael S. Tsirkin
  2018-03-20  3:18 ` [Qemu-devel] [PULL v2 48/50] postcopy: Allow shared memory Michael S. Tsirkin
                   ` (3 subsequent siblings)
  50 siblings, 0 replies; 69+ messages in thread
From: Michael S. Tsirkin @ 2018-03-20  3:18 UTC (permalink / raw)
  To: qemu-devel; +Cc: Peter Maydell, Dr. David Alan Gilbert

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Align RAMBlocks to page size alignment, and adjust the merging code
to deal with partial overlap due to that alignment.

This is needed for postcopy so that we can place/fetch whole hugepages
when under userfault.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 hw/virtio/vhost.c      | 66 ++++++++++++++++++++++++++++++++++++++++++--------
 hw/virtio/trace-events |  3 ++-
 2 files changed, 58 insertions(+), 11 deletions(-)

diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
index d8d0ef9..250f886 100644
--- a/hw/virtio/vhost.c
+++ b/hw/virtio/vhost.c
@@ -522,10 +522,28 @@ static void vhost_region_add_section(struct vhost_dev *dev,
     uint64_t mrs_gpa = section->offset_within_address_space;
     uintptr_t mrs_host = (uintptr_t)memory_region_get_ram_ptr(section->mr) +
                          section->offset_within_region;
+    RAMBlock *mrs_rb = section->mr->ram_block;
+    size_t mrs_page = qemu_ram_pagesize(mrs_rb);
 
     trace_vhost_region_add_section(section->mr->name, mrs_gpa, mrs_size,
                                    mrs_host);
 
+    /* Round the section to it's page size */
+    /* First align the start down to a page boundary */
+    uint64_t alignage = mrs_host & (mrs_page - 1);
+    if (alignage) {
+        mrs_host -= alignage;
+        mrs_size += alignage;
+        mrs_gpa  -= alignage;
+    }
+    /* Now align the size up to a page boundary */
+    alignage = mrs_size & (mrs_page - 1);
+    if (alignage) {
+        mrs_size += mrs_page - alignage;
+    }
+    trace_vhost_region_add_section_aligned(section->mr->name, mrs_gpa, mrs_size,
+                                           mrs_host);
+
     if (dev->n_tmp_sections) {
         /* Since we already have at least one section, lets see if
          * this extends it; since we're scanning in order, we only
@@ -542,18 +560,46 @@ static void vhost_region_add_section(struct vhost_dev *dev,
                         prev_sec->offset_within_region;
         uint64_t prev_host_end   = range_get_last(prev_host_start, prev_size);
 
-        if (prev_gpa_end + 1 == mrs_gpa &&
-            prev_host_end + 1 == mrs_host &&
-            section->mr == prev_sec->mr &&
-            (!dev->vhost_ops->vhost_backend_can_merge ||
-                dev->vhost_ops->vhost_backend_can_merge(dev,
+        if (mrs_gpa <= (prev_gpa_end + 1)) {
+            /* OK, looks like overlapping/intersecting - it's possible that
+             * the rounding to page sizes has made them overlap, but they should
+             * match up in the same RAMBlock if they do.
+             */
+            if (mrs_gpa < prev_gpa_start) {
+                error_report("%s:Section rounded to %"PRIx64
+                             " prior to previous %"PRIx64,
+                             __func__, mrs_gpa, prev_gpa_start);
+                /* A way to cleanly fail here would be better */
+                return;
+            }
+            /* Offset from the start of the previous GPA to this GPA */
+            size_t offset = mrs_gpa - prev_gpa_start;
+
+            if (prev_host_start + offset == mrs_host &&
+                section->mr == prev_sec->mr &&
+                (!dev->vhost_ops->vhost_backend_can_merge ||
+                 dev->vhost_ops->vhost_backend_can_merge(dev,
                     mrs_host, mrs_size,
                     prev_host_start, prev_size))) {
-            /* The two sections abut */
-            need_add = false;
-            prev_sec->size = int128_add(prev_sec->size, section->size);
-            trace_vhost_region_add_section_abut(section->mr->name,
-                                                mrs_size + prev_size);
+                uint64_t max_end = MAX(prev_host_end, mrs_host + mrs_size);
+                need_add = false;
+                prev_sec->offset_within_address_space =
+                    MIN(prev_gpa_start, mrs_gpa);
+                prev_sec->offset_within_region =
+                    MIN(prev_host_start, mrs_host) -
+                    (uintptr_t)memory_region_get_ram_ptr(prev_sec->mr);
+                prev_sec->size = int128_make64(max_end - MIN(prev_host_start,
+                                               mrs_host));
+                trace_vhost_region_add_section_merge(section->mr->name,
+                                        int128_get64(prev_sec->size),
+                                        prev_sec->offset_within_address_space,
+                                        prev_sec->offset_within_region);
+            } else {
+                error_report("%s: Overlapping but not coherent sections "
+                             "at %"PRIx64,
+                             __func__, mrs_gpa);
+                return;
+            }
         }
     }
 
diff --git a/hw/virtio/trace-events b/hw/virtio/trace-events
index 857c495..1422ff0 100644
--- a/hw/virtio/trace-events
+++ b/hw/virtio/trace-events
@@ -3,7 +3,8 @@
 # hw/virtio/vhost.c
 vhost_commit(bool started, bool changed) "Started: %d Changed: %d"
 vhost_region_add_section(const char *name, uint64_t gpa, uint64_t size, uint64_t host) "%s: 0x%"PRIx64"+0x%"PRIx64" @ 0x%"PRIx64
-vhost_region_add_section_abut(const char *name, uint64_t new_size) "%s: 0x%"PRIx64
+vhost_region_add_section_merge(const char *name, uint64_t new_size, uint64_t gpa, uint64_t owr) "%s: size: 0x%"PRIx64 " gpa: 0x%"PRIx64 " owr: 0x%"PRIx64
+vhost_region_add_section_aligned(const char *name, uint64_t gpa, uint64_t size, uint64_t host) "%s: 0x%"PRIx64"+0x%"PRIx64" @ 0x%"PRIx64
 vhost_section(const char *name, int r) "%s:%d"
 
 # hw/virtio/vhost-user.c
-- 
MST

^ permalink raw reply related	[flat|nested] 69+ messages in thread

* [Qemu-devel] [PULL v2 50/50] postcopy shared docs
  2018-03-20  3:16 [Qemu-devel] [PULL v2 00/50] virtio, vhost, pci, pc: features, cleanups Michael S. Tsirkin
                   ` (47 preceding siblings ...)
  2018-03-20  3:18 ` [Qemu-devel] [PULL v2 48/50] postcopy: Allow shared memory Michael S. Tsirkin
@ 2018-03-20  3:18 ` Michael S. Tsirkin
  2018-03-20  3:18 ` [Qemu-devel] [PULL v2 49/50] libvhost-user: Claim support for postcopy Michael S. Tsirkin
  2018-03-20 14:18 ` [Qemu-devel] [PULL v2 00/50] virtio, vhost, pci, pc: features, cleanups Peter Maydell
  50 siblings, 0 replies; 69+ messages in thread
From: Michael S. Tsirkin @ 2018-03-20  3:18 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Maydell, Dr. David Alan Gilbert, Marc-André Lureau,
	Juan Quintela, Peter Xu, Daniel P. Berrangé

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Add some notes to the migration documentation for shared memory
postcopy.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 docs/devel/migration.rst | 41 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 41 insertions(+)

diff --git a/docs/devel/migration.rst b/docs/devel/migration.rst
index 9d1b765..e32b087 100644
--- a/docs/devel/migration.rst
+++ b/docs/devel/migration.rst
@@ -577,3 +577,44 @@ Postcopy now works with hugetlbfs backed memory:
      hugepages works well, however 1GB hugepages are likely to be problematic
      since it takes ~1 second to transfer a 1GB hugepage across a 10Gbps link,
      and until the full page is transferred the destination thread is blocked.
+
+Postcopy with shared memory
+---------------------------
+
+Postcopy migration with shared memory needs explicit support from the other
+processes that share memory and from QEMU. There are restrictions on the type of
+memory that userfault can support shared.
+
+The Linux kernel userfault support works on `/dev/shm` memory and on `hugetlbfs`
+(although the kernel doesn't provide an equivalent to `madvise(MADV_DONTNEED)`
+for hugetlbfs which may be a problem in some configurations).
+
+The vhost-user code in QEMU supports clients that have Postcopy support,
+and the `vhost-user-bridge` (in `tests/`) and the DPDK package have changes
+to support postcopy.
+
+The client needs to open a userfaultfd and register the areas
+of memory that it maps with userfault.  The client must then pass the
+userfaultfd back to QEMU together with a mapping table that allows
+fault addresses in the clients address space to be converted back to
+RAMBlock/offsets.  The client's userfaultfd is added to the postcopy
+fault-thread and page requests are made on behalf of the client by QEMU.
+QEMU performs 'wake' operations on the client's userfaultfd to allow it
+to continue after a page has arrived.
+
+.. note::
+  There are two future improvements that would be nice:
+    a) Some way to make QEMU ignorant of the addresses in the clients
+       address space
+    b) Avoiding the need for QEMU to perform ufd-wake calls after the
+       pages have arrived
+
+Retro-fitting postcopy to existing clients is possible:
+  a) A mechanism is needed for the registration with userfault as above,
+     and the registration needs to be coordinated with the phases of
+     postcopy.  In vhost-user extra messages are added to the existing
+     control channel.
+  b) Any thread that can block due to guest memory accesses must be
+     identified and the implication understood; for example if the
+     guest memory access is made while holding a lock then all other
+     threads waiting for that lock will also be blocked.
-- 
MST

^ permalink raw reply related	[flat|nested] 69+ messages in thread

* [Qemu-devel] [PULL v2 49/50] libvhost-user: Claim support for postcopy
  2018-03-20  3:16 [Qemu-devel] [PULL v2 00/50] virtio, vhost, pci, pc: features, cleanups Michael S. Tsirkin
                   ` (48 preceding siblings ...)
  2018-03-20  3:18 ` [Qemu-devel] [PULL v2 50/50] postcopy shared docs Michael S. Tsirkin
@ 2018-03-20  3:18 ` Michael S. Tsirkin
  2018-03-20 14:18 ` [Qemu-devel] [PULL v2 00/50] virtio, vhost, pci, pc: features, cleanups Peter Maydell
  50 siblings, 0 replies; 69+ messages in thread
From: Michael S. Tsirkin @ 2018-03-20  3:18 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Maydell, Dr. David Alan Gilbert, Marc-André Lureau,
	Maxime Coquelin, Peter Xu

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Tell QEMU we understand the protocol features needed for postcopy.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 contrib/libvhost-user/libvhost-user.c | 33 +++++++++++++++++++++++++++++++++
 1 file changed, 33 insertions(+)

diff --git a/contrib/libvhost-user/libvhost-user.c b/contrib/libvhost-user/libvhost-user.c
index 504ff5e..beeed0c 100644
--- a/contrib/libvhost-user/libvhost-user.c
+++ b/contrib/libvhost-user/libvhost-user.c
@@ -185,6 +185,35 @@ vmsg_close_fds(VhostUserMsg *vmsg)
     }
 }
 
+/* A test to see if we have userfault available */
+static bool
+have_userfault(void)
+{
+#if defined(__linux__) && defined(__NR_userfaultfd) &&\
+        defined(UFFD_FEATURE_MISSING_SHMEM) &&\
+        defined(UFFD_FEATURE_MISSING_HUGETLBFS)
+    /* Now test the kernel we're running on really has the features */
+    int ufd = syscall(__NR_userfaultfd, O_CLOEXEC | O_NONBLOCK);
+    struct uffdio_api api_struct;
+    if (ufd < 0) {
+        return false;
+    }
+
+    api_struct.api = UFFD_API;
+    api_struct.features = UFFD_FEATURE_MISSING_SHMEM |
+                          UFFD_FEATURE_MISSING_HUGETLBFS;
+    if (ioctl(ufd, UFFDIO_API, &api_struct)) {
+        close(ufd);
+        return false;
+    }
+    close(ufd);
+    return true;
+
+#else
+    return false;
+#endif
+}
+
 static bool
 vu_message_read(VuDev *dev, int conn_fd, VhostUserMsg *vmsg)
 {
@@ -939,6 +968,10 @@ vu_get_protocol_features_exec(VuDev *dev, VhostUserMsg *vmsg)
     uint64_t features = 1ULL << VHOST_USER_PROTOCOL_F_LOG_SHMFD |
                         1ULL << VHOST_USER_PROTOCOL_F_SLAVE_REQ;
 
+    if (have_userfault()) {
+        features |= 1ULL << VHOST_USER_PROTOCOL_F_PAGEFAULT;
+    }
+
     if (dev->iface->get_protocol_features) {
         features |= dev->iface->get_protocol_features(dev);
     }
-- 
MST

^ permalink raw reply related	[flat|nested] 69+ messages in thread

* Re: [Qemu-devel] [PULL v2 00/50] virtio, vhost, pci, pc: features, cleanups
  2018-03-20  3:16 [Qemu-devel] [PULL v2 00/50] virtio, vhost, pci, pc: features, cleanups Michael S. Tsirkin
                   ` (49 preceding siblings ...)
  2018-03-20  3:18 ` [Qemu-devel] [PULL v2 49/50] libvhost-user: Claim support for postcopy Michael S. Tsirkin
@ 2018-03-20 14:18 ` Peter Maydell
  2018-03-20 14:37   ` Michael S. Tsirkin
  2018-03-20 15:05   ` Michael S. Tsirkin
  50 siblings, 2 replies; 69+ messages in thread
From: Peter Maydell @ 2018-03-20 14:18 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: QEMU Developers

On 20 March 2018 at 03:16, Michael S. Tsirkin <mst@redhat.com> wrote:
> Changes from v1:
> - dropped include change for one generated file - proposed a tree-wide refactoring
> - dropped vhost used slot refactoring due to alignment issues found by clang
> - added vhost-user post-copy support
>
> The following changes since commit 026aaf47c02b79036feb830206cfebb2a726510d:
>
>   Merge remote-tracking branch 'remotes/ehabkost/tags/python-next-pull-request' into staging (2018-03-13 16:26:44 +0000)
>
> are available in the git repository at:
>
>   git://git.kernel.org/pub/scm/virt/kvm/mst/qemu.git tags/for_upstream
>
> for you to fetch changes up to a466e2cdb09d7b1262e24bae8cc47a51550d3af3:
>
>   postcopy shared docs (2018-03-20 05:03:30 +0200)
>
> ----------------------------------------------------------------
> virtio,vhost,pci,pc: features, cleanups
>
> SRAT tables for DIMM devices
> new virtio net flags for speed/duplex
> post-copy migration support in vhost
> cleanups in pci
>
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
>

Link failure on aarch64 host:
  LINK    arm-softmmu/qemu-system-arm
hw/virtio/vhost-user.o: In function `vhost_user_postcopy_fault_handler':
/home/pm215/qemu/hw/virtio/vhost-user.c:1000: undefined reference to
`postcopy_request_shared_page'

I think this host is taking the "don't have userfaultfd" #ifdef
path in postcopy-ram.c, which doesn't define a stub
for the postcopy_request_shared_page function, but there's
no equivalent guard in vhost-user.c

thanks
-- PMM

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [Qemu-devel] [PULL v2 00/50] virtio, vhost, pci, pc: features, cleanups
  2018-03-20 14:18 ` [Qemu-devel] [PULL v2 00/50] virtio, vhost, pci, pc: features, cleanups Peter Maydell
@ 2018-03-20 14:37   ` Michael S. Tsirkin
  2018-03-20 15:05   ` Michael S. Tsirkin
  1 sibling, 0 replies; 69+ messages in thread
From: Michael S. Tsirkin @ 2018-03-20 14:37 UTC (permalink / raw)
  To: Peter Maydell; +Cc: QEMU Developers

On Tue, Mar 20, 2018 at 02:18:33PM +0000, Peter Maydell wrote:
> On 20 March 2018 at 03:16, Michael S. Tsirkin <mst@redhat.com> wrote:
> > Changes from v1:
> > - dropped include change for one generated file - proposed a tree-wide refactoring
> > - dropped vhost used slot refactoring due to alignment issues found by clang
> > - added vhost-user post-copy support
> >
> > The following changes since commit 026aaf47c02b79036feb830206cfebb2a726510d:
> >
> >   Merge remote-tracking branch 'remotes/ehabkost/tags/python-next-pull-request' into staging (2018-03-13 16:26:44 +0000)
> >
> > are available in the git repository at:
> >
> >   git://git.kernel.org/pub/scm/virt/kvm/mst/qemu.git tags/for_upstream
> >
> > for you to fetch changes up to a466e2cdb09d7b1262e24bae8cc47a51550d3af3:
> >
> >   postcopy shared docs (2018-03-20 05:03:30 +0200)
> >
> > ----------------------------------------------------------------
> > virtio,vhost,pci,pc: features, cleanups
> >
> > SRAT tables for DIMM devices
> > new virtio net flags for speed/duplex
> > post-copy migration support in vhost
> > cleanups in pci
> >
> > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> >
> 
> Link failure on aarch64 host:
>   LINK    arm-softmmu/qemu-system-arm
> hw/virtio/vhost-user.o: In function `vhost_user_postcopy_fault_handler':
> /home/pm215/qemu/hw/virtio/vhost-user.c:1000: undefined reference to
> `postcopy_request_shared_page'
> 
> I think this host is taking the "don't have userfaultfd" #ifdef
> path in postcopy-ram.c, which doesn't define a stub
> for the postcopy_request_shared_page function, but there's
> no equivalent guard in vhost-user.c
> 
> thanks
> -- PMM

Thanks for debugging this!

-- 
MST

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [Qemu-devel] [PULL v2 00/50] virtio, vhost, pci, pc: features, cleanups
  2018-03-20 14:18 ` [Qemu-devel] [PULL v2 00/50] virtio, vhost, pci, pc: features, cleanups Peter Maydell
  2018-03-20 14:37   ` Michael S. Tsirkin
@ 2018-03-20 15:05   ` Michael S. Tsirkin
  2018-03-20 15:41     ` Peter Maydell
  1 sibling, 1 reply; 69+ messages in thread
From: Michael S. Tsirkin @ 2018-03-20 15:05 UTC (permalink / raw)
  To: Peter Maydell; +Cc: QEMU Developers

On Tue, Mar 20, 2018 at 02:18:33PM +0000, Peter Maydell wrote:
> On 20 March 2018 at 03:16, Michael S. Tsirkin <mst@redhat.com> wrote:
> > Changes from v1:
> > - dropped include change for one generated file - proposed a tree-wide refactoring
> > - dropped vhost used slot refactoring due to alignment issues found by clang
> > - added vhost-user post-copy support
> >
> > The following changes since commit 026aaf47c02b79036feb830206cfebb2a726510d:
> >
> >   Merge remote-tracking branch 'remotes/ehabkost/tags/python-next-pull-request' into staging (2018-03-13 16:26:44 +0000)
> >
> > are available in the git repository at:
> >
> >   git://git.kernel.org/pub/scm/virt/kvm/mst/qemu.git tags/for_upstream
> >
> > for you to fetch changes up to a466e2cdb09d7b1262e24bae8cc47a51550d3af3:
> >
> >   postcopy shared docs (2018-03-20 05:03:30 +0200)
> >
> > ----------------------------------------------------------------
> > virtio,vhost,pci,pc: features, cleanups
> >
> > SRAT tables for DIMM devices
> > new virtio net flags for speed/duplex
> > post-copy migration support in vhost
> > cleanups in pci
> >
> > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> >
> 
> Link failure on aarch64 host:
>   LINK    arm-softmmu/qemu-system-arm
> hw/virtio/vhost-user.o: In function `vhost_user_postcopy_fault_handler':
> /home/pm215/qemu/hw/virtio/vhost-user.c:1000: undefined reference to
> `postcopy_request_shared_page'
> 
> I think this host is taking the "don't have userfaultfd" #ifdef
> path in postcopy-ram.c, which doesn't define a stub
> for the postcopy_request_shared_page function, but there's
> no equivalent guard in vhost-user.c
> 
> thanks
> -- PMM

I'm curious why does it take that path though.
Here's the rule

#if defined(__linux__) && defined(__NR_userfaultfd) && defined(CONFIG_EVENTFD)

If it's Linux we'll pick our own copy of the header
and so __NR_userfaultfd is defined.

CONFIG_EVENTFD is not set?

Actually vhost-user does not work without CONFIG_EVENTFD either.
Maybe we should skip building it in that configuration.

-- 
MST

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [Qemu-devel] [PULL v2 00/50] virtio, vhost, pci, pc: features, cleanups
  2018-03-20 15:05   ` Michael S. Tsirkin
@ 2018-03-20 15:41     ` Peter Maydell
  2018-03-20 15:51       ` Michael S. Tsirkin
  0 siblings, 1 reply; 69+ messages in thread
From: Peter Maydell @ 2018-03-20 15:41 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: QEMU Developers

On 20 March 2018 at 15:05, Michael S. Tsirkin <mst@redhat.com> wrote:
> I'm curious why does it take that path though.
> Here's the rule
>
> #if defined(__linux__) && defined(__NR_userfaultfd) && defined(CONFIG_EVENTFD)
>
> If it's Linux we'll pick our own copy of the header
> and so __NR_userfaultfd is defined.
>
> CONFIG_EVENTFD is not set?

CONFIG_EVENTFD is set. __NR_userfaultfd is not. It isn't
defined in the system headers, and it's not defined in
our linux-headers/asm-arm64/. Only x86, s390, powerpc
and arm (32-bit) define it. I suspect the build would also
fail on MIPS if I had a mips host in the build set.

thanks
-- PMM

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [Qemu-devel] [PULL v2 00/50] virtio, vhost, pci, pc: features, cleanups
  2018-03-20 15:41     ` Peter Maydell
@ 2018-03-20 15:51       ` Michael S. Tsirkin
  2018-03-20 15:54         ` Peter Maydell
  0 siblings, 1 reply; 69+ messages in thread
From: Michael S. Tsirkin @ 2018-03-20 15:51 UTC (permalink / raw)
  To: Peter Maydell; +Cc: QEMU Developers

On Tue, Mar 20, 2018 at 03:41:54PM +0000, Peter Maydell wrote:
> On 20 March 2018 at 15:05, Michael S. Tsirkin <mst@redhat.com> wrote:
> > I'm curious why does it take that path though.
> > Here's the rule
> >
> > #if defined(__linux__) && defined(__NR_userfaultfd) && defined(CONFIG_EVENTFD)
> >
> > If it's Linux we'll pick our own copy of the header
> > and so __NR_userfaultfd is defined.
> >
> > CONFIG_EVENTFD is not set?
> 
> CONFIG_EVENTFD is set. __NR_userfaultfd is not. It isn't
> defined in the system headers, and it's not defined in
> our linux-headers/asm-arm64/. Only x86, s390, powerpc
> and arm (32-bit) define it. I suspect the build would also
> fail on MIPS if I had a mips host in the build set.
> 
> thanks
> -- PMM

Let's update headers for arm and mips then?

-- 
MST

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [Qemu-devel] [PULL v2 00/50] virtio, vhost, pci, pc: features, cleanups
  2018-03-20 15:51       ` Michael S. Tsirkin
@ 2018-03-20 15:54         ` Peter Maydell
  2018-03-20 16:02           ` Michael S. Tsirkin
  0 siblings, 1 reply; 69+ messages in thread
From: Peter Maydell @ 2018-03-20 15:54 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: QEMU Developers

On 20 March 2018 at 15:51, Michael S. Tsirkin <mst@redhat.com> wrote:
> On Tue, Mar 20, 2018 at 03:41:54PM +0000, Peter Maydell wrote:
>> CONFIG_EVENTFD is set. __NR_userfaultfd is not. It isn't
>> defined in the system headers, and it's not defined in
>> our linux-headers/asm-arm64/. Only x86, s390, powerpc
>> and arm (32-bit) define it. I suspect the build would also
>> fail on MIPS if I had a mips host in the build set.

> Let's update headers for arm and mips then?

Shouldn't that happen automatically?

thanks
-- PMM

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [Qemu-devel] [PULL v2 00/50] virtio, vhost, pci, pc: features, cleanups
  2018-03-20 15:54         ` Peter Maydell
@ 2018-03-20 16:02           ` Michael S. Tsirkin
  2018-03-20 17:18             ` Peter Maydell
  0 siblings, 1 reply; 69+ messages in thread
From: Michael S. Tsirkin @ 2018-03-20 16:02 UTC (permalink / raw)
  To: Peter Maydell; +Cc: QEMU Developers

On Tue, Mar 20, 2018 at 03:54:47PM +0000, Peter Maydell wrote:
> On 20 March 2018 at 15:51, Michael S. Tsirkin <mst@redhat.com> wrote:
> > On Tue, Mar 20, 2018 at 03:41:54PM +0000, Peter Maydell wrote:
> >> CONFIG_EVENTFD is set. __NR_userfaultfd is not. It isn't
> >> defined in the system headers, and it's not defined in
> >> our linux-headers/asm-arm64/. Only x86, s390, powerpc
> >> and arm (32-bit) define it. I suspect the build would also
> >> fail on MIPS if I had a mips host in the build set.
> 
> > Let's update headers for arm and mips then?
> 
> Shouldn't that happen automatically?
> 
> thanks
> -- PMM

And apparently it does for arm:
linux-headers/asm-arm/unistd-common.h has __NR_userfaultfd.

What's the story for arm64 and mips?


-- 
MST

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [Qemu-devel] [PULL v2 00/50] virtio, vhost, pci, pc: features, cleanups
  2018-03-20 16:02           ` Michael S. Tsirkin
@ 2018-03-20 17:18             ` Peter Maydell
  2018-03-20 17:22               ` Michael S. Tsirkin
  0 siblings, 1 reply; 69+ messages in thread
From: Peter Maydell @ 2018-03-20 17:18 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: QEMU Developers

On 20 March 2018 at 16:02, Michael S. Tsirkin <mst@redhat.com> wrote:
> On Tue, Mar 20, 2018 at 03:54:47PM +0000, Peter Maydell wrote:
>> > Let's update headers for arm and mips then?
>>
>> Shouldn't that happen automatically?

> And apparently it does for arm:
> linux-headers/asm-arm/unistd-common.h has __NR_userfaultfd.
>
> What's the story for arm64 and mips?

arm64 uses the generic syscall numbering (as should pretty
much any new architecture port for Linux). That means its
unistd.h just #includes the asm-generic one. QEMU's script
update-linux-headers.sh isn't syncing asm-generic/unistd.h,
though. That means that the #include in linux-headers/asm-arm64/unistd.h
picks up whatever the host system's asm-generic/unistd.h
is. In this instance the build system had a version of that
header that predated __NR_userfaultfd.

For mips, update-linux-headers.sh has it on a blacklist:
    # Blacklist architectures which have KVM headers but are actually dead
    if [ "$arch" = "ia64" -o "$arch" = "mips" ]; then
        continue
    fi

and has done since 1842bdfdbac2ec46 when we started syncing
unistd.h. That means that any updates to linux-headers/mips
would need to be done manually, but in fact we have not done
any of those, so we still have 2015's headers, which predate
__NR_userfaultfd.

So we should:
 (1) make update-headers.sh sync asm-generic/unistd.h
  -- looks like this will also require us to sync
     bitsperlong.h for all archs and the asm-generic copy

 (2) reinvestigate whatever the "extra header inclusion"
     issues are with mips so we can have the update script
     properly sync the mips headers too

Incidentally we can drop the "blacklist ia64" code, because
kernels these days don't have KVM headers for ia64 and
so the generic "skip archs with no KVM support" code will
make us skip ia64.

PS: migration/postcopy-ram.c isn't KVM-specific, so it's
a little odd of it to be relying on header files that we
only copy for KVM-supporting host architectures. That
means you need to cope with __NR_userfaultfd not being
defined anyway, in case you're on a host which doesn't
support KVM and we've ended up falling back to the system
includes.

thanks
-- PMM

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [Qemu-devel] [PULL v2 00/50] virtio, vhost, pci, pc: features, cleanups
  2018-03-20 17:18             ` Peter Maydell
@ 2018-03-20 17:22               ` Michael S. Tsirkin
  0 siblings, 0 replies; 69+ messages in thread
From: Michael S. Tsirkin @ 2018-03-20 17:22 UTC (permalink / raw)
  To: Peter Maydell; +Cc: QEMU Developers

On Tue, Mar 20, 2018 at 05:18:22PM +0000, Peter Maydell wrote:
> On 20 March 2018 at 16:02, Michael S. Tsirkin <mst@redhat.com> wrote:
> > On Tue, Mar 20, 2018 at 03:54:47PM +0000, Peter Maydell wrote:
> >> > Let's update headers for arm and mips then?
> >>
> >> Shouldn't that happen automatically?
> 
> > And apparently it does for arm:
> > linux-headers/asm-arm/unistd-common.h has __NR_userfaultfd.
> >
> > What's the story for arm64 and mips?
> 
> arm64 uses the generic syscall numbering (as should pretty
> much any new architecture port for Linux). That means its
> unistd.h just #includes the asm-generic one. QEMU's script
> update-linux-headers.sh isn't syncing asm-generic/unistd.h,
> though. That means that the #include in linux-headers/asm-arm64/unistd.h
> picks up whatever the host system's asm-generic/unistd.h
> is. In this instance the build system had a version of that
> header that predated __NR_userfaultfd.
> 
> For mips, update-linux-headers.sh has it on a blacklist:
>     # Blacklist architectures which have KVM headers but are actually dead
>     if [ "$arch" = "ia64" -o "$arch" = "mips" ]; then
>         continue
>     fi
> 
> and has done since 1842bdfdbac2ec46 when we started syncing
> unistd.h. That means that any updates to linux-headers/mips
> would need to be done manually, but in fact we have not done
> any of those, so we still have 2015's headers, which predate
> __NR_userfaultfd.
> 
> So we should:
>  (1) make update-headers.sh sync asm-generic/unistd.h
>   -- looks like this will also require us to sync
>      bitsperlong.h for all archs and the asm-generic copy
> 
>  (2) reinvestigate whatever the "extra header inclusion"
>      issues are with mips so we can have the update script
>      properly sync the mips headers too
> 
> Incidentally we can drop the "blacklist ia64" code, because
> kernels these days don't have KVM headers for ia64 and
> so the generic "skip archs with no KVM support" code will
> make us skip ia64.
> PS: migration/postcopy-ram.c isn't KVM-specific, so it's
> a little odd of it to be relying on header files that we
> only copy for KVM-supporting host architectures. That
> means you need to cope with __NR_userfaultfd not being
> defined anyway, in case you're on a host which doesn't
> support KVM and we've ended up falling back to the system
> includes.
> 
> thanks
> -- PMM

I sent some patches to try to clean all that up.  I kept ia64
blacklisted from kvm for now as the patchset is already large,
but I limited the effect to kvm only. Would be
easy to drop that test as a patch on top.

-- 
MST

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [Qemu-devel] [PULL v2 21/50] Makefile: add target to print generated files
  2018-03-20  3:17 ` [Qemu-devel] [PULL v2 21/50] Makefile: add target to print generated files Michael S. Tsirkin
@ 2018-04-13  7:21   ` Markus Armbruster
  2018-04-13 10:04     ` Marc-André Lureau
  0 siblings, 1 reply; 69+ messages in thread
From: Markus Armbruster @ 2018-04-13  7:21 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: qemu-devel, Peter Maydell, Gerd Hoffmann, Marc-André Lureau

"Michael S. Tsirkin" <mst@redhat.com> writes:

> This is helpful for automatic code analysis.

Out of curiosity: how?

> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> ---
>  Makefile | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/Makefile b/Makefile
> index 677a54b..f799390 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -1045,6 +1045,9 @@ endif
>  include $(SRC_PATH)/tests/docker/Makefile.include
>  include $(SRC_PATH)/tests/vm/Makefile.include
>  
> +printgen:
> +	@echo $(GENERATED_FILES)
> +
>  .PHONY: help
>  help:
>  	@echo  'Generic targets:'

I tried to answer my question myself by looking up the thread where this
patch was posted for review.  I can't find any.  Am I blind?

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [Qemu-devel] [PULL v2 21/50] Makefile: add target to print generated files
  2018-04-13  7:21   ` Markus Armbruster
@ 2018-04-13 10:04     ` Marc-André Lureau
  2018-04-13 12:51       ` Michael S. Tsirkin
  0 siblings, 1 reply; 69+ messages in thread
From: Marc-André Lureau @ 2018-04-13 10:04 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: Michael S. Tsirkin, qemu-devel, Peter Maydell, Gerd Hoffmann

Hi

On Fri, Apr 13, 2018 at 9:21 AM, Markus Armbruster <armbru@redhat.com> wrote:
> "Michael S. Tsirkin" <mst@redhat.com> writes:
>
>> This is helpful for automatic code analysis.
>
> Out of curiosity: how?
>
>> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
>> ---
>>  Makefile | 3 +++
>>  1 file changed, 3 insertions(+)
>>
>> diff --git a/Makefile b/Makefile
>> index 677a54b..f799390 100644
>> --- a/Makefile
>> +++ b/Makefile
>> @@ -1045,6 +1045,9 @@ endif
>>  include $(SRC_PATH)/tests/docker/Makefile.include
>>  include $(SRC_PATH)/tests/vm/Makefile.include
>>
>> +printgen:
>> +     @echo $(GENERATED_FILES)
>> +
>>  .PHONY: help
>>  help:
>>       @echo  'Generic targets:'
>
> I tried to answer my question myself by looking up the thread where this
> patch was posted for review.  I can't find any.  Am I blind?

And you could also "make print-GENERATED_FILES" already

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [Qemu-devel] [PULL v2 21/50] Makefile: add target to print generated files
  2018-04-13 10:04     ` Marc-André Lureau
@ 2018-04-13 12:51       ` Michael S. Tsirkin
  2018-05-04  5:44         ` Markus Armbruster
  0 siblings, 1 reply; 69+ messages in thread
From: Michael S. Tsirkin @ 2018-04-13 12:51 UTC (permalink / raw)
  To: Marc-André Lureau
  Cc: Markus Armbruster, qemu-devel, Peter Maydell, Gerd Hoffmann

On Fri, Apr 13, 2018 at 12:04:20PM +0200, Marc-André Lureau wrote:
> Hi
> 
> On Fri, Apr 13, 2018 at 9:21 AM, Markus Armbruster <armbru@redhat.com> wrote:
> > "Michael S. Tsirkin" <mst@redhat.com> writes:
> >
> >> This is helpful for automatic code analysis.
> >
> > Out of curiosity: how?
> >
> >> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> >> ---
> >>  Makefile | 3 +++
> >>  1 file changed, 3 insertions(+)
> >>
> >> diff --git a/Makefile b/Makefile
> >> index 677a54b..f799390 100644
> >> --- a/Makefile
> >> +++ b/Makefile
> >> @@ -1045,6 +1045,9 @@ endif
> >>  include $(SRC_PATH)/tests/docker/Makefile.include
> >>  include $(SRC_PATH)/tests/vm/Makefile.include
> >>
> >> +printgen:
> >> +     @echo $(GENERATED_FILES)
> >> +
> >>  .PHONY: help
> >>  help:
> >>       @echo  'Generic targets:'
> >
> > I tried to answer my question myself by looking up the thread where this
> > patch was posted for review.  I can't find any.  Am I blind?

That was a development patch that sneaked in. Sorry. I'll revert.

> 
> And you could also "make print-GENERATED_FILES" already

Nice tip, thanks for that.

-- 
MST

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [Qemu-devel] [PULL v2 31/50] vhost+postcopy: Register shared ufd with postcopy
  2018-03-20  3:17 ` [Qemu-devel] [PULL v2 31/50] vhost+postcopy: Register shared ufd with postcopy Michael S. Tsirkin
@ 2018-04-27 16:12   ` Peter Maydell
  2018-05-02 10:58     ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 69+ messages in thread
From: Peter Maydell @ 2018-04-27 16:12 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: QEMU Developers, Dr. David Alan Gilbert, Marc-André Lureau

On 20 March 2018 at 03:17, Michael S. Tsirkin <mst@redhat.com> wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>
> Register the UFD that comes in as the response to the 'advise' method
> with the postcopy code.
>
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
> Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> ---
>  hw/virtio/vhost-user.c | 20 +++++++++++++++++++-
>  1 file changed, 19 insertions(+), 1 deletion(-)

> @@ -835,8 +847,14 @@ static int vhost_user_postcopy_advise(struct vhost_dev *dev, Error **errp)
>          error_setg(errp, "%s: Failed to get ufd", __func__);
>          return -1;
>      }
> +    fcntl(ufd, F_SETFL, O_NONBLOCK);

Hi; this would probably be more neatly done with
       qemu_set_nonblock(ufd);
unless you really wanted to clear the other fd flags.
Among other things, it avoids Coverity producing a complaint
that we didn't check the fcntl return value (though we seem
to assume it can't fail in general, hence qemu_set_nonblock()
returning NULL.) -- CID1390601, which I've marked as false-positive.

> -    /* TODO: register ufd with userfault thread */
> +    /* register ufd with userfault thread */
> +    u->postcopy_fd.fd = ufd;
> +    u->postcopy_fd.data = dev;
> +    u->postcopy_fd.handler = vhost_user_postcopy_fault_handler;
> +    u->postcopy_fd.idstr = "vhost-user"; /* Need to find unique name */
> +    postcopy_register_shared_ufd(&u->postcopy_fd);
>      return 0;
>  }

thanks
-- PMM

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [Qemu-devel] [PULL v2 31/50] vhost+postcopy: Register shared ufd with postcopy
  2018-04-27 16:12   ` Peter Maydell
@ 2018-05-02 10:58     ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 69+ messages in thread
From: Dr. David Alan Gilbert @ 2018-05-02 10:58 UTC (permalink / raw)
  To: Peter Maydell; +Cc: Michael S. Tsirkin, QEMU Developers, Marc-André Lureau

* Peter Maydell (peter.maydell@linaro.org) wrote:
> On 20 March 2018 at 03:17, Michael S. Tsirkin <mst@redhat.com> wrote:
> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> >
> > Register the UFD that comes in as the response to the 'advise' method
> > with the postcopy code.
> >
> > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
> > Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
> > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> > ---
> >  hw/virtio/vhost-user.c | 20 +++++++++++++++++++-
> >  1 file changed, 19 insertions(+), 1 deletion(-)
> 
> > @@ -835,8 +847,14 @@ static int vhost_user_postcopy_advise(struct vhost_dev *dev, Error **errp)
> >          error_setg(errp, "%s: Failed to get ufd", __func__);
> >          return -1;
> >      }
> > +    fcntl(ufd, F_SETFL, O_NONBLOCK);
> 
> Hi; this would probably be more neatly done with
>        qemu_set_nonblock(ufd);
> unless you really wanted to clear the other fd flags.
> Among other things, it avoids Coverity producing a complaint
> that we didn't check the fcntl return value (though we seem
> to assume it can't fail in general, hence qemu_set_nonblock()
> returning NULL.) -- CID1390601, which I've marked as false-positive.

Fix posted.  To be honest, I probably hadn't realised/forgot that
this would nuke all the other flags.  I bet some of the others uses
are the same, and may be losing important flags like noexec.

Dave

> > -    /* TODO: register ufd with userfault thread */
> > +    /* register ufd with userfault thread */
> > +    u->postcopy_fd.fd = ufd;
> > +    u->postcopy_fd.data = dev;
> > +    u->postcopy_fd.handler = vhost_user_postcopy_fault_handler;
> > +    u->postcopy_fd.idstr = "vhost-user"; /* Need to find unique name */
> > +    postcopy_register_shared_ufd(&u->postcopy_fd);
> >      return 0;
> >  }
> 
> thanks
> -- PMM
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [Qemu-devel] [PULL v2 21/50] Makefile: add target to print generated files
  2018-04-13 12:51       ` Michael S. Tsirkin
@ 2018-05-04  5:44         ` Markus Armbruster
  0 siblings, 0 replies; 69+ messages in thread
From: Markus Armbruster @ 2018-05-04  5:44 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Marc-André Lureau, Peter Maydell, Gerd Hoffmann, qemu-devel

"Michael S. Tsirkin" <mst@redhat.com> writes:

> On Fri, Apr 13, 2018 at 12:04:20PM +0200, Marc-André Lureau wrote:
>> Hi
>> 
>> On Fri, Apr 13, 2018 at 9:21 AM, Markus Armbruster <armbru@redhat.com> wrote:
>> > "Michael S. Tsirkin" <mst@redhat.com> writes:
>> >
>> >> This is helpful for automatic code analysis.
>> >
>> > Out of curiosity: how?
>> >
>> >> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
>> >> ---
>> >>  Makefile | 3 +++
>> >>  1 file changed, 3 insertions(+)
>> >>
>> >> diff --git a/Makefile b/Makefile
>> >> index 677a54b..f799390 100644
>> >> --- a/Makefile
>> >> +++ b/Makefile
>> >> @@ -1045,6 +1045,9 @@ endif
>> >>  include $(SRC_PATH)/tests/docker/Makefile.include
>> >>  include $(SRC_PATH)/tests/vm/Makefile.include
>> >>
>> >> +printgen:
>> >> +     @echo $(GENERATED_FILES)
>> >> +
>> >>  .PHONY: help
>> >>  help:
>> >>       @echo  'Generic targets:'
>> >
>> > I tried to answer my question myself by looking up the thread where this
>> > patch was posted for review.  I can't find any.  Am I blind?
>
> That was a development patch that sneaked in. Sorry. I'll revert.

Since you haven't, I took the liberty to post the revert myself.

>> And you could also "make print-GENERATED_FILES" already
>
> Nice tip, thanks for that.

^ permalink raw reply	[flat|nested] 69+ messages in thread

end of thread, other threads:[~2018-05-04  5:44 UTC | newest]

Thread overview: 69+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-03-20  3:16 [Qemu-devel] [PULL v2 00/50] virtio, vhost, pci, pc: features, cleanups Michael S. Tsirkin
2018-03-20  3:16 ` [Qemu-devel] [PULL v2 01/50] scripts/update-linux-headers: add ethtool.h and update to 4.16.0-rc4 Michael S. Tsirkin
2018-03-20  3:16   ` [virtio-dev] " Michael S. Tsirkin
2018-03-20  3:17 ` [Qemu-devel] [PULL v2 02/50] virtio-net: use 64-bit values for feature flags Michael S. Tsirkin
2018-03-20  3:17   ` [virtio-dev] " Michael S. Tsirkin
2018-03-20  3:17 ` [Qemu-devel] [PULL v2 03/50] virtio-net: add linkspeed and duplex settings to virtio-net Michael S. Tsirkin
2018-03-20  3:17   ` [virtio-dev] " Michael S. Tsirkin
2018-03-20  3:17 ` [Qemu-devel] [PULL v2 04/50] acpi: remove unused acpi-dsdt.aml Michael S. Tsirkin
2018-03-20  3:17 ` [Qemu-devel] [PULL v2 05/50] pc: replace pm object initialization with one-liner in acpi_get_pm_info() Michael S. Tsirkin
2018-03-20  3:17 ` [Qemu-devel] [PULL v2 06/50] acpi: reuse AcpiGenericAddress instead of Acpi20GenericAddress Michael S. Tsirkin
2018-03-20  3:17 ` [Qemu-devel] [PULL v2 07/50] acpi: add build_append_gas() helper for Generic Address Structure Michael S. Tsirkin
2018-03-20  3:17 ` [Qemu-devel] [PULL v2 08/50] acpi: move ACPI_PORT_SMI_CMD define to header it belongs to Michael S. Tsirkin
2018-03-20  3:17 ` [Qemu-devel] [PULL v2 09/50] pc: acpi: isolate FADT specific data into AcpiFadtData structure Michael S. Tsirkin
2018-03-20  3:17 ` [Qemu-devel] [PULL v2 10/50] pc: acpi: use build_append_foo() API to construct FADT Michael S. Tsirkin
2018-03-20  3:17 ` [Qemu-devel] [PULL v2 11/50] acpi: move build_fadt() from i386 specific to generic ACPI source Michael S. Tsirkin
2018-03-20  3:17 ` [Qemu-devel] [PULL v2 12/50] virt_arm: acpi: reuse common build_fadt() Michael S. Tsirkin
2018-03-20  3:17 ` [Qemu-devel] [PULL v2 13/50] tests: acpi: don't read all fields in test_acpi_fadt_table() Michael S. Tsirkin
2018-03-20  3:17 ` [Qemu-devel] [PULL v2 14/50] standard-headers: update virtio_net.h Michael S. Tsirkin
2018-03-20  3:17 ` [Qemu-devel] [PULL v2 15/50] hw/pci: remove obsolete PCIDevice->init() Michael S. Tsirkin
2018-03-20  3:17 ` [Qemu-devel] [PULL v2 16/50] pc-dimm: make qmp_pc_dimm_device_list() sort devices by address Michael S. Tsirkin
2018-03-20  3:17 ` [Qemu-devel] [PULL v2 17/50] qmp: distinguish PC-DIMM and NVDIMM in MemoryDeviceInfoList Michael S. Tsirkin
2018-03-20  3:17 ` [Qemu-devel] [PULL v2 18/50] hw/acpi-build: build SRAT memory affinity structures for DIMM devices Michael S. Tsirkin
2018-03-20  3:17 ` [Qemu-devel] [PULL v2 19/50] tests/bios-tables-test: add test cases for DIMM proximity Michael S. Tsirkin
2018-03-20  3:17 ` [Qemu-devel] [PULL v2 20/50] test/acpi-test-data: add ACPI tables for dimmpxm test Michael S. Tsirkin
2018-03-20  3:17 ` [Qemu-devel] [PULL v2 21/50] Makefile: add target to print generated files Michael S. Tsirkin
2018-04-13  7:21   ` Markus Armbruster
2018-04-13 10:04     ` Marc-André Lureau
2018-04-13 12:51       ` Michael S. Tsirkin
2018-05-04  5:44         ` Markus Armbruster
2018-03-20  3:17 ` [Qemu-devel] [PULL v2 22/50] migrate: Update ram_block_discard_range for shared Michael S. Tsirkin
2018-03-20  3:17 ` [Qemu-devel] [PULL v2 23/50] qemu_ram_block_host_offset Michael S. Tsirkin
2018-03-20  3:17 ` [Qemu-devel] [PULL v2 24/50] postcopy: use UFFDIO_ZEROPAGE only when available Michael S. Tsirkin
2018-03-20  3:17 ` [Qemu-devel] [PULL v2 25/50] postcopy: Add notifier chain Michael S. Tsirkin
2018-03-20  3:17 ` [Qemu-devel] [PULL v2 27/50] vhost-user: Add 'VHOST_USER_POSTCOPY_ADVISE' message Michael S. Tsirkin
2018-03-20  3:17 ` [Qemu-devel] [PULL v2 26/50] postcopy: Add vhost-user flag for postcopy and check it Michael S. Tsirkin
2018-03-20  3:17 ` [Qemu-devel] [PULL v2 28/50] libvhost-user: Support sending fds back to qemu Michael S. Tsirkin
2018-03-20  3:17 ` [Qemu-devel] [PULL v2 29/50] libvhost-user: Open userfaultfd Michael S. Tsirkin
2018-03-20  3:17 ` [Qemu-devel] [PULL v2 30/50] postcopy: Allow registering of fd handler Michael S. Tsirkin
2018-03-20  3:17 ` [Qemu-devel] [PULL v2 31/50] vhost+postcopy: Register shared ufd with postcopy Michael S. Tsirkin
2018-04-27 16:12   ` Peter Maydell
2018-05-02 10:58     ` Dr. David Alan Gilbert
2018-03-20  3:17 ` [Qemu-devel] [PULL v2 32/50] vhost+postcopy: Transmit 'listen' to slave Michael S. Tsirkin
2018-03-20  3:17 ` [Qemu-devel] [PULL v2 33/50] postcopy+vhost-user: Split set_mem_table for postcopy Michael S. Tsirkin
2018-03-20  3:17 ` [Qemu-devel] [PULL v2 34/50] migration/ram: ramblock_recv_bitmap_test_byte_offset Michael S. Tsirkin
2018-03-20  3:17 ` [Qemu-devel] [PULL v2 35/50] libvhost-user+postcopy: Register new regions with the ufd Michael S. Tsirkin
2018-03-20  3:17 ` [Qemu-devel] [PULL v2 37/50] vhost+postcopy: Stash RAMBlock and offset Michael S. Tsirkin
2018-03-20  3:17 ` [Qemu-devel] [PULL v2 36/50] vhost+postcopy: Send address back to qemu Michael S. Tsirkin
2018-03-20  3:17 ` [Qemu-devel] [PULL v2 39/50] vhost+postcopy: Resolve client address Michael S. Tsirkin
2018-03-20  3:17 ` [Qemu-devel] [PULL v2 38/50] vhost+postcopy: Helper to send requests to source for shared pages Michael S. Tsirkin
2018-03-20  3:17 ` [Qemu-devel] [PULL v2 41/50] postcopy: postcopy_notify_shared_wake Michael S. Tsirkin
2018-03-20  3:17 ` [Qemu-devel] [PULL v2 40/50] postcopy: helper for waking shared Michael S. Tsirkin
2018-03-20  3:17 ` [Qemu-devel] [PULL v2 44/50] libvhost-user: mprotect & madvises for postcopy Michael S. Tsirkin
2018-03-20  3:17 ` [Qemu-devel] [PULL v2 42/50] vhost+postcopy: Add vhost waker Michael S. Tsirkin
2018-03-20  3:17 ` [Qemu-devel] [PULL v2 43/50] vhost+postcopy: Call wakeups Michael S. Tsirkin
2018-03-20  3:17 ` [Qemu-devel] [PULL v2 46/50] vhost+postcopy: Wire up POSTCOPY_END notify Michael S. Tsirkin
2018-03-20  3:17 ` [Qemu-devel] [PULL v2 45/50] vhost-user: Add VHOST_USER_POSTCOPY_END message Michael S. Tsirkin
2018-03-20  3:18 ` [Qemu-devel] [PULL v2 47/50] vhost: Huge page align and merge Michael S. Tsirkin
2018-03-20  3:18 ` [Qemu-devel] [PULL v2 48/50] postcopy: Allow shared memory Michael S. Tsirkin
2018-03-20  3:18 ` [Qemu-devel] [PULL v2 50/50] postcopy shared docs Michael S. Tsirkin
2018-03-20  3:18 ` [Qemu-devel] [PULL v2 49/50] libvhost-user: Claim support for postcopy Michael S. Tsirkin
2018-03-20 14:18 ` [Qemu-devel] [PULL v2 00/50] virtio, vhost, pci, pc: features, cleanups Peter Maydell
2018-03-20 14:37   ` Michael S. Tsirkin
2018-03-20 15:05   ` Michael S. Tsirkin
2018-03-20 15:41     ` Peter Maydell
2018-03-20 15:51       ` Michael S. Tsirkin
2018-03-20 15:54         ` Peter Maydell
2018-03-20 16:02           ` Michael S. Tsirkin
2018-03-20 17:18             ` Peter Maydell
2018-03-20 17:22               ` Michael S. Tsirkin

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.