Add pcie dvsec extended capability id along with helper macros to retrieve information from the headers. https://members.pcisig.com/wg/PCI-SIG/document/12335 Signed-off-by: David E. Box <david.e.box@linux.intel.com> --- include/uapi/linux/pci_regs.h | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/include/uapi/linux/pci_regs.h b/include/uapi/linux/pci_regs.h index f9701410d3b5..c96f08d1e711 100644 --- a/include/uapi/linux/pci_regs.h +++ b/include/uapi/linux/pci_regs.h @@ -720,6 +720,7 @@ #define PCI_EXT_CAP_ID_DPC 0x1D /* Downstream Port Containment */ #define PCI_EXT_CAP_ID_L1SS 0x1E /* L1 PM Substates */ #define PCI_EXT_CAP_ID_PTM 0x1F /* Precision Time Measurement */ +#define PCI_EXT_CAP_ID_DVSEC 0x23 /* Desinated Vendor-Specific */ #define PCI_EXT_CAP_ID_DLF 0x25 /* Data Link Feature */ #define PCI_EXT_CAP_ID_PL_16GT 0x26 /* Physical Layer 16.0 GT/s */ #define PCI_EXT_CAP_ID_MAX PCI_EXT_CAP_ID_PL_16GT @@ -1062,6 +1063,10 @@ #define PCI_L1SS_CTL1_LTR_L12_TH_SCALE 0xe0000000 /* LTR_L1.2_THRESHOLD_Scale */ #define PCI_L1SS_CTL2 0x0c /* Control 2 Register */ +/* Designated Vendor-Specific (DVSEC, PCI_EXT_CAP_ID_DVSEC) */ +#define PCI_DVSEC_HEADER1 0x4 /* Vendor-Specific Header1 */ +#define PCI_DVSEC_HEADER2 0x8 /* Vendor-Specific Header2 */ + /* Data Link Feature */ #define PCI_DLF_CAP 0x04 /* Capabilities Register */ #define PCI_DLF_EXCHANGE_ENABLE 0x80000000 /* Data Link Feature Exchange Enable */ -- 2.20.1
Intel Platform Monitoring Technology (PMT) is an architecture for enumerating and accessing hardware monitoring facilities. PMT supports multiple types of monitoring capabilities. Capabilities are discovered using PCIe DVSEC with the Intel VID. Each capability is discovered as a separate DVSEC instance in a device's config space. This driver uses MFD to manage the creation of platform devices for each type so that they may be controlled by their own drivers (to be introduced). Support is included for the 3 current capability types, Telemetry, Watcher, and Crashlog. The features are available on new Intel platforms starting from Tiger Lake for which support is added. Tiger Lake however will not support Watcher and Crashlog even though the capabilities appear on the device. So add a quirk facility and use it to disable them. Signed-off-by: David E. Box <david.e.box@linux.intel.com> Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com> --- MAINTAINERS | 5 ++ drivers/mfd/Kconfig | 10 +++ drivers/mfd/Makefile | 1 + drivers/mfd/intel_pmt.c | 174 ++++++++++++++++++++++++++++++++++++ include/linux/intel-dvsec.h | 44 +++++++++ 5 files changed, 234 insertions(+) create mode 100644 drivers/mfd/intel_pmt.c create mode 100644 include/linux/intel-dvsec.h diff --git a/MAINTAINERS b/MAINTAINERS index e64e5db31497..bacf7ecd4d21 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -8783,6 +8783,11 @@ S: Maintained F: arch/x86/include/asm/intel_telemetry.h F: drivers/platform/x86/intel_telemetry* +INTEL PMT DRIVER +M: "David E. Box" <david.e.box@linux.intel.com> +S: Maintained +F: drivers/mfd/intel_pmt.c + INTEL UNCORE FREQUENCY CONTROL M: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> L: platform-driver-x86@vger.kernel.org diff --git a/drivers/mfd/Kconfig b/drivers/mfd/Kconfig index 0a59249198d3..c673031acdf1 100644 --- a/drivers/mfd/Kconfig +++ b/drivers/mfd/Kconfig @@ -632,6 +632,16 @@ config MFD_INTEL_MSIC Passage) chip. This chip embeds audio, battery, GPIO, etc. devices used in Intel Medfield platforms. +config MFD_INTEL_PMT + tristate "Intel Platform Monitoring Technology support" + depends on PCI + select MFD_CORE + help + The Intel Platform Monitoring Technology (PMT) is an interface that + provides access to hardware monitor registers. This driver supports + Telemetry, Watcher, and Crashlog PTM capabilities/devices for + platforms starting from Tiger Lake. + config MFD_IPAQ_MICRO bool "Atmel Micro ASIC (iPAQ h3100/h3600/h3700) Support" depends on SA1100_H3100 || SA1100_H3600 diff --git a/drivers/mfd/Makefile b/drivers/mfd/Makefile index f935d10cbf0f..0041f673faa1 100644 --- a/drivers/mfd/Makefile +++ b/drivers/mfd/Makefile @@ -212,6 +212,7 @@ obj-$(CONFIG_MFD_INTEL_LPSS) += intel-lpss.o obj-$(CONFIG_MFD_INTEL_LPSS_PCI) += intel-lpss-pci.o obj-$(CONFIG_MFD_INTEL_LPSS_ACPI) += intel-lpss-acpi.o obj-$(CONFIG_MFD_INTEL_MSIC) += intel_msic.o +obj-$(CONFIG_MFD_INTEL_PMT) += intel_pmt.o obj-$(CONFIG_MFD_PALMAS) += palmas.o obj-$(CONFIG_MFD_VIPERBOARD) += viperboard.o obj-$(CONFIG_MFD_RC5T583) += rc5t583.o rc5t583-irq.o diff --git a/drivers/mfd/intel_pmt.c b/drivers/mfd/intel_pmt.c new file mode 100644 index 000000000000..c48a2b82ca99 --- /dev/null +++ b/drivers/mfd/intel_pmt.c @@ -0,0 +1,174 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Intel Platform Monitoring Technology MFD driver + * + * Copyright (c) 2020, Intel Corporation. + * All Rights Reserved. + * + * Authors: David E. Box <david.e.box@linux.intel.com> + */ + +#include <linux/kernel.h> +#include <linux/module.h> +#include <linux/pci.h> +#include <linux/platform_device.h> +#include <linux/pm.h> +#include <linux/pm_runtime.h> +#include <linux/mfd/core.h> +#include <linux/intel-dvsec.h> + +#define TELEM_DEV_NAME "pmt_telemetry" +#define WATCHER_DEV_NAME "pmt_watcher" +#define CRASHLOG_DEV_NAME "pmt_crashlog" + +static const struct pmt_platform_info tgl_info = { + .quirks = PMT_QUIRK_NO_WATCHER | PMT_QUIRK_NO_CRASHLOG, +}; + +static int +pmt_add_dev(struct pci_dev *pdev, struct intel_dvsec_header *header, + struct pmt_platform_info *info) +{ + struct mfd_cell *cell, *tmp; + const char *name; + int i; + + switch (header->id) { + case DVSEC_INTEL_ID_TELEM: + name = TELEM_DEV_NAME; + break; + case DVSEC_INTEL_ID_WATCHER: + if (info->quirks && PMT_QUIRK_NO_WATCHER) { + dev_info(&pdev->dev, "Watcher not supported\n"); + return 0; + } + name = WATCHER_DEV_NAME; + break; + case DVSEC_INTEL_ID_CRASHLOG: + if (info->quirks && PMT_QUIRK_NO_WATCHER) { + dev_info(&pdev->dev, "Crashlog not supported\n"); + return 0; + } + name = CRASHLOG_DEV_NAME; + break; + default: + return -EINVAL; + } + + cell = devm_kcalloc(&pdev->dev, header->num_entries, + sizeof(*cell), GFP_KERNEL); + if (!cell) + return -ENOMEM; + + /* Create a platform device for each entry. */ + for (i = 0, tmp = cell; i < header->num_entries; i++, tmp++) { + struct resource *res; + + res = devm_kzalloc(&pdev->dev, sizeof(*res), GFP_KERNEL); + if (!res) + return -ENOMEM; + + tmp->name = name; + + res->start = pdev->resource[header->tbir].start + + header->offset + + (i * (INTEL_DVSEC_ENTRY_SIZE << 2)); + res->end = res->start + (header->entry_size << 2) - 1; + res->flags = IORESOURCE_MEM; + + tmp->resources = res; + tmp->num_resources = 1; + tmp->platform_data = header; + tmp->pdata_size = sizeof(*header); + + } + + return devm_mfd_add_devices(&pdev->dev, PLATFORM_DEVID_AUTO, cell, + header->num_entries, NULL, 0, NULL); +} + +static int +pmt_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id) +{ + u16 vid; + u32 table; + int ret, pos = 0, last_pos = 0; + struct pmt_platform_info *info; + struct intel_dvsec_header header; + + ret = pcim_enable_device(pdev); + if (ret) + return ret; + + info = devm_kmemdup(&pdev->dev, (void *)id->driver_data, sizeof(*info), + GFP_KERNEL); + + if (!info) + return -ENOMEM; + + while ((pos = pci_find_next_ext_capability(pdev, pos, PCI_EXT_CAP_ID_DVSEC))) { + pci_read_config_word(pdev, pos + PCI_DVSEC_HEADER1, &vid); + if (vid != PCI_VENDOR_ID_INTEL) + continue; + + pci_read_config_word(pdev, pos + PCI_DVSEC_HEADER2, + &header.id); + + pci_read_config_byte(pdev, pos + INTEL_DVSEC_ENTRIES, + &header.num_entries); + + pci_read_config_byte(pdev, pos + INTEL_DVSEC_SIZE, + &header.entry_size); + + if (!header.num_entries || !header.entry_size) + return -EINVAL; + + pci_read_config_dword(pdev, pos + INTEL_DVSEC_TABLE, + &table); + + header.tbir = INTEL_DVSEC_TABLE_BAR(table); + header.offset = INTEL_DVSEC_TABLE_OFFSET(table); + ret = pmt_add_dev(pdev, &header, info); + if (ret) + dev_warn(&pdev->dev, + "Failed to add devices for DVSEC id %d\n", + header.id); + last_pos = pos; + } + + if (!last_pos) { + dev_err(&pdev->dev, "No supported PMT capabilities found.\n"); + return -ENODEV; + } + + pm_runtime_put(&pdev->dev); + pm_runtime_allow(&pdev->dev); + + return 0; +} + +static void pmt_pci_remove(struct pci_dev *pdev) +{ + pm_runtime_forbid(&pdev->dev); + pm_runtime_get_sync(&pdev->dev); +} + +static const struct pci_device_id pmt_pci_ids[] = { + /* TGL */ + { PCI_VDEVICE(INTEL, 0x9a0d), (kernel_ulong_t)&tgl_info }, + { } +}; +MODULE_DEVICE_TABLE(pci, pmt_pci_ids); + +static struct pci_driver pmt_pci_driver = { + .name = "intel-pmt", + .id_table = pmt_pci_ids, + .probe = pmt_pci_probe, + .remove = pmt_pci_remove, +}; + +module_pci_driver(pmt_pci_driver); + +MODULE_AUTHOR("David E. Box <david.e.box@linux.intel.com>"); +MODULE_DESCRIPTION("Intel Platform Monitoring Technology MFD driver"); +MODULE_LICENSE("GPL v2"); diff --git a/include/linux/intel-dvsec.h b/include/linux/intel-dvsec.h new file mode 100644 index 000000000000..94f606bf8eae --- /dev/null +++ b/include/linux/intel-dvsec.h @@ -0,0 +1,44 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef INTEL_DVSEC_H +#define INTEL_DVSEC_H + +#include <linux/types.h> + +#define DVSEC_INTEL_ID_TELEM 2 +#define DVSEC_INTEL_ID_WATCHER 3 +#define DVSEC_INTEL_ID_CRASHLOG 4 + +/* Intel DVSEC capability vendor space offsets */ +#define INTEL_DVSEC_ENTRIES 0xA +#define INTEL_DVSEC_SIZE 0xB +#define INTEL_DVSEC_TABLE 0xC +#define INTEL_DVSEC_TABLE_BAR(x) ((x) & GENMASK(2, 0)) +#define INTEL_DVSEC_TABLE_OFFSET(x) ((x) >> 3) + +#define INTEL_DVSEC_ENTRY_SIZE 4 + +/* DVSEC header */ +struct intel_dvsec_header { + u16 length; + u16 id; + u8 num_entries; + u8 entry_size; + u8 entry_max; + u8 tbir; + u32 offset; +}; + +enum pmt_quirks { + /* Watcher capability not supported */ + PMT_QUIRK_NO_WATCHER = (1 << 0), + + /* Crashlog capability not supported */ + PMT_QUIRK_NO_CRASHLOG = (1 << 1), +}; + +struct pmt_platform_info { + unsigned long quirks; + struct intel_dvsec_header **capabilities; +}; + +#endif -- 2.20.1
PMT Telemetry is a capability of the Intel Platform Monitoring Technology. The Telemetry capability provides access to device telemetry metrics that provide hardware performance data to users from continuous, memory mapped, read-only register spaces. Register mappings are not provided by the driver. Instead, a GUID is read from a header for each endpoint. The GUID identifies the device and is to be used with an XML, provided by the vendor, to discover the available set of metrics and their register mapping. This allows firmware updates to modify the register space without needing to update the driver every time with new mappings. Firmware writes a new GUID in this case to specify the new mapping. Software tools with access to the associated XML file can then interpret the changes. This module manages access to all PMT Telemetry endpoints on a system, regardless of the device exporting them. It creates an intel_pmt_telem class to manage the list. For each endpoint, sysfs files provide GUID and size information as well as a pointer to the parent device the telemetry comes from. Software may discover the association between endpoints and devices by iterating through the list in sysfs, or by looking for the existence of the class folder under the device of interest. A device node of the same name allows software to then map the telemetry space for direct access. Signed-off-by: David E. Box <david.e.box@linux.intel.com> Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com> --- .../ABI/testing/sysfs-class-intel_pmt_telem | 46 +++ MAINTAINERS | 1 + drivers/platform/x86/Kconfig | 10 + drivers/platform/x86/Makefile | 1 + drivers/platform/x86/intel_pmt_telem.c | 356 ++++++++++++++++++ drivers/platform/x86/intel_pmt_telem.h | 20 + 6 files changed, 434 insertions(+) create mode 100644 Documentation/ABI/testing/sysfs-class-intel_pmt_telem create mode 100644 drivers/platform/x86/intel_pmt_telem.c create mode 100644 drivers/platform/x86/intel_pmt_telem.h diff --git a/Documentation/ABI/testing/sysfs-class-intel_pmt_telem b/Documentation/ABI/testing/sysfs-class-intel_pmt_telem new file mode 100644 index 000000000000..cdd9a16b31f3 --- /dev/null +++ b/Documentation/ABI/testing/sysfs-class-intel_pmt_telem @@ -0,0 +1,46 @@ +What: /sys/class/intel_pmt_telem/ +Date: April 2020 +KernelVersion: 5.8 +Contact: David Box <david.e.box@linux.intel.com> +Description: + The intel_pmt_telem/ class directory contains information for + devices that expose hardware telemetry using Intel Platform + Monitoring Technology (PMT) + +What: /sys/class/intel_pmt_telem/telemX +Date: April 2020 +KernelVersion: 5.8 +Contact: David Box <david.e.box@linux.intel.com> +Description: + The telemX directory contains files describing an instance of a + PMT telemetry device that exposes hardware telemetry. Each + telemX device has an associated /dev/telemX node. This node can + be opened and mapped to access the telemetry space of the + device. The register layout of the telemetry space is + determined from an XML file of specific guid for the corresponding + parent device. + +What: /sys/class/intel_pmt_telem/telemX/guid +Date: April 2020 +KernelVersion: 5.8 +Contact: David Box <david.e.box@linux.intel.com> +Description: + (RO) The guid for this telemetry device. The guid identifies + the version of the XML file for the parent device that should + be used to determine the register layout. + +What: /sys/class/intel_pmt_telem/telemX/size +Date: April 2020 +KernelVersion: 5.8 +Contact: David Box <david.e.box@linux.intel.com> +Description: + (RO) The size of telemetry region in bytes that corresponds to + the mapping size for the /dev/telemX device node. + +What: /sys/class/intel_pmt_telem/telemX/offset +Date: April 2020 +KernelVersion: 5.8 +Contact: David Box <david.e.box@linux.intel.com> +Description: + (RO) The offset of telemetry region in bytes that corresponds to + the mapping for the /dev/telemX device node. diff --git a/MAINTAINERS b/MAINTAINERS index bacf7ecd4d21..c49a9d3a28d2 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -8787,6 +8787,7 @@ INTEL PMT DRIVER M: "David E. Box" <david.e.box@linux.intel.com> S: Maintained F: drivers/mfd/intel_pmt.c +F: drivers/platform/x86/intel_pmt_* INTEL UNCORE FREQUENCY CONTROL M: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> diff --git a/drivers/platform/x86/Kconfig b/drivers/platform/x86/Kconfig index 0ad7ad8cf8e1..dd734eb66e74 100644 --- a/drivers/platform/x86/Kconfig +++ b/drivers/platform/x86/Kconfig @@ -1368,6 +1368,16 @@ config INTEL_TELEMETRY directly via debugfs files. Various tools may use this interface for SoC state monitoring. +config INTEL_PMT_TELEM + tristate "Intel PMT telemetry driver" + help + The Intel Platform Monitory Technology (PMT) Telemetry driver provides + access to hardware telemetry metrics on devices that support the + feature. + + For more information, see + <file:Documentation/ABI/testing/sysfs-class-intel_pmt_telem> + endif # X86_PLATFORM_DEVICES config PMC_ATOM diff --git a/drivers/platform/x86/Makefile b/drivers/platform/x86/Makefile index 53408d965874..f37e000ef8cb 100644 --- a/drivers/platform/x86/Makefile +++ b/drivers/platform/x86/Makefile @@ -146,3 +146,4 @@ obj-$(CONFIG_INTEL_TELEMETRY) += intel_telemetry_core.o \ intel_telemetry_pltdrv.o \ intel_telemetry_debugfs.o obj-$(CONFIG_PMC_ATOM) += pmc_atom.o +obj-$(CONFIG_INTEL_PMT_TELEM) += intel_pmt_telem.o diff --git a/drivers/platform/x86/intel_pmt_telem.c b/drivers/platform/x86/intel_pmt_telem.c new file mode 100644 index 000000000000..ae6f867f53fa --- /dev/null +++ b/drivers/platform/x86/intel_pmt_telem.c @@ -0,0 +1,356 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Intel Platform Monitory Technology Telemetry driver + * + * Copyright (c) 2020, Intel Corporation. + * All Rights Reserved. + * + * Author: "David E. Box" <david.e.box@linux.intel.com> + */ + +#include <linux/cdev.h> +#include <linux/intel-dvsec.h> +#include <linux/io-64-nonatomic-lo-hi.h> +#include <linux/kernel.h> +#include <linux/module.h> +#include <linux/pci.h> +#include <linux/platform_device.h> +#include <linux/slab.h> +#include <linux/uaccess.h> +#include <linux/xarray.h> + +#include "intel_pmt_telem.h" + +/* platform device name to bind to driver */ +#define TELEM_DRV_NAME "pmt_telemetry" + +/* Telemetry access types */ +#define TELEM_ACCESS_FUTURE 1 +#define TELEM_ACCESS_BARID 2 +#define TELEM_ACCESS_LOCAL 3 + +#define TELEM_GUID_OFFSET 0x4 +#define TELEM_BASE_OFFSET 0x8 +#define TELEM_TBIR_MASK 0x7 +#define TELEM_ACCESS(v) ((v) & GENMASK(3, 0)) +#define TELEM_TYPE(v) (((v) & GENMASK(7, 4)) >> 4) +/* size is in bytes */ +#define TELEM_SIZE(v) (((v) & GENMASK(27, 12)) >> 10) + +#define TELEM_XA_START 1 +#define TELEM_XA_MAX INT_MAX +#define TELEM_XA_LIMIT XA_LIMIT(TELEM_XA_START, TELEM_XA_MAX) + +static DEFINE_XARRAY_ALLOC(telem_array); + +struct pmt_telem_priv { + struct device *dev; + struct intel_dvsec_header *dvsec; + struct telem_header header; + unsigned long base_addr; + void __iomem *disc_table; + struct cdev cdev; + dev_t devt; + int devid; +}; + +/* + * devfs + */ +static int pmt_telem_open(struct inode *inode, struct file *filp) +{ + struct pmt_telem_priv *priv; + struct pci_driver *pci_drv; + struct pci_dev *pci_dev; + + if (!capable(CAP_SYS_ADMIN)) + return -EPERM; + + priv = container_of(inode->i_cdev, struct pmt_telem_priv, cdev); + pci_dev = to_pci_dev(priv->dev->parent); + + pci_drv = pci_dev_driver(pci_dev); + if (!pci_drv) + return -ENODEV; + + filp->private_data = priv; + get_device(&pci_dev->dev); + + if (!try_module_get(pci_drv->driver.owner)) { + put_device(&pci_dev->dev); + return -ENODEV; + } + + return 0; +} + +static int pmt_telem_release(struct inode *inode, struct file *filp) +{ + struct pmt_telem_priv *priv = filp->private_data; + struct pci_dev *pci_dev = to_pci_dev(priv->dev->parent); + struct pci_driver *pci_drv = pci_dev_driver(pci_dev); + + put_device(&pci_dev->dev); + module_put(pci_drv->driver.owner); + + return 0; +} + +static int pmt_telem_mmap(struct file *filp, struct vm_area_struct *vma) +{ + struct pmt_telem_priv *priv = filp->private_data; + unsigned long vsize = vma->vm_end - vma->vm_start; + unsigned long phys = priv->base_addr; + unsigned long pfn = PFN_DOWN(phys); + unsigned long psize; + + psize = (PFN_UP(priv->base_addr + priv->header.size) - pfn) * PAGE_SIZE; + if (vsize > psize) { + dev_err(priv->dev, "Requested mmap size is too large\n"); + return -EINVAL; + } + + if ((vma->vm_flags & VM_WRITE) || (vma->vm_flags & VM_MAYWRITE)) + return -EPERM; + + vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot); + + if (io_remap_pfn_range(vma, vma->vm_start, pfn, vsize, + vma->vm_page_prot)) + return -EINVAL; + + return 0; +} + +static const struct file_operations pmt_telem_fops = { + .owner = THIS_MODULE, + .open = pmt_telem_open, + .mmap = pmt_telem_mmap, + .release = pmt_telem_release, +}; + +/* + * sysfs + */ +static ssize_t guid_show(struct device *dev, struct device_attribute *attr, + char *buf) +{ + struct pmt_telem_priv *priv = dev_get_drvdata(dev); + + return sprintf(buf, "0x%x\n", priv->header.guid); +} +static DEVICE_ATTR_RO(guid); + +static ssize_t size_show(struct device *dev, struct device_attribute *attr, + char *buf) +{ + struct pmt_telem_priv *priv = dev_get_drvdata(dev); + + /* Display buffer size in bytes */ + return sprintf(buf, "%u\n", priv->header.size); +} +static DEVICE_ATTR_RO(size); + +static ssize_t offset_show(struct device *dev, struct device_attribute *attr, + char *buf) +{ + struct pmt_telem_priv *priv = dev_get_drvdata(dev); + + /* Display buffer offset in bytes */ + return sprintf(buf, "%lu\n", offset_in_page(priv->base_addr)); +} +static DEVICE_ATTR_RO(offset); + +static struct attribute *pmt_telem_attrs[] = { + &dev_attr_guid.attr, + &dev_attr_size.attr, + &dev_attr_offset.attr, + NULL +}; +ATTRIBUTE_GROUPS(pmt_telem); + +struct class pmt_telem_class = { + .owner = THIS_MODULE, + .name = "intel_pmt_telem", + .dev_groups = pmt_telem_groups, +}; + +/* + * driver initialization + */ +static int pmt_telem_create_dev(struct pmt_telem_priv *priv) +{ + struct device *dev; + int ret; + + cdev_init(&priv->cdev, &pmt_telem_fops); + ret = cdev_add(&priv->cdev, priv->devt, 1); + if (ret) { + dev_err(priv->dev, "Could not add char dev\n"); + return ret; + } + + dev = device_create(&pmt_telem_class, priv->dev, priv->devt, + priv, "telem%d", priv->devid); + if (IS_ERR(dev)) { + dev_err(priv->dev, "Could not create device node\n"); + cdev_del(&priv->cdev); + } + + return PTR_ERR_OR_ZERO(dev); +} + +static void pmt_telem_populate_header(void __iomem *disc_offset, + struct telem_header *header) +{ + header->access_type = TELEM_ACCESS(readb(disc_offset)); + header->telem_type = TELEM_TYPE(readb(disc_offset)); + header->size = TELEM_SIZE(readl(disc_offset)); + header->guid = readl(disc_offset + TELEM_GUID_OFFSET); + header->base_offset = readl(disc_offset + TELEM_BASE_OFFSET); + + /* + * For non-local access types the lower 3 bits of base offset + * contains the index of the base address register where the + * telemetry can be found. + */ + header->tbir = header->base_offset & TELEM_TBIR_MASK; + header->base_offset ^= header->tbir; +} + +static int pmt_telem_probe(struct platform_device *pdev) +{ + struct pmt_telem_priv *priv; + struct pci_dev *parent; + int err; + + priv = devm_kzalloc(&pdev->dev, sizeof(*priv), GFP_KERNEL); + if (!priv) + return -ENOMEM; + + platform_set_drvdata(pdev, priv); + priv->dev = &pdev->dev; + parent = to_pci_dev(priv->dev->parent); + + /* TODO: replace with device properties??? */ + priv->dvsec = dev_get_platdata(&pdev->dev); + if (!priv->dvsec) { + dev_err(&pdev->dev, "Platform data not found\n"); + return -ENODEV; + } + + /* Remap and access the discovery table header */ + priv->disc_table = devm_platform_ioremap_resource(pdev, 0); + if (IS_ERR(priv->disc_table)) + return PTR_ERR(priv->disc_table); + + pmt_telem_populate_header(priv->disc_table, &priv->header); + + /* Local access and BARID only for now */ + switch (priv->header.access_type) { + case TELEM_ACCESS_LOCAL: + if (priv->header.tbir) { + dev_err(&pdev->dev, + "Unsupported BAR index %d for access type %d\n", + priv->header.tbir, priv->header.access_type); + return -EINVAL; + } + + fallthrough; + + case TELEM_ACCESS_BARID: + break; + default: + dev_err(&pdev->dev, "Unsupported access type %d\n", + priv->header.access_type); + return -EINVAL; + } + + priv->base_addr = pci_resource_start(parent, priv->header.tbir) + + priv->header.base_offset; + + err = alloc_chrdev_region(&priv->devt, 0, 1, TELEM_DRV_NAME); + if (err < 0) { + dev_err(&pdev->dev, + "PMT telemetry chrdev_region err: %d\n", err); + return err; + } + + err = xa_alloc(&telem_array, &priv->devid, priv, TELEM_XA_LIMIT, + GFP_KERNEL); + if (err < 0) + goto fail_xa_alloc; + + err = pmt_telem_create_dev(priv); + if (err < 0) + goto fail_create_dev; + + return 0; + +fail_create_dev: + xa_erase(&telem_array, priv->devid); +fail_xa_alloc: + unregister_chrdev_region(priv->devt, 1); + + return err; +} + +static int pmt_telem_remove(struct platform_device *pdev) +{ + struct pmt_telem_priv *priv = platform_get_drvdata(pdev); + + device_destroy(&pmt_telem_class, priv->devt); + cdev_del(&priv->cdev); + + xa_erase(&telem_array, priv->devid); + unregister_chrdev_region(priv->devt, 1); + + return 0; +} + +static const struct platform_device_id pmt_telem_table[] = { + { + .name = "pmt_telemetry", + }, { + /* sentinel */ + } +}; +MODULE_DEVICE_TABLE(platform, pmt_telem_table); + +static struct platform_driver pmt_telem_driver = { + .driver = { + .name = TELEM_DRV_NAME, + }, + .probe = pmt_telem_probe, + .remove = pmt_telem_remove, + .id_table = pmt_telem_table, +}; + +static int __init pmt_telem_init(void) +{ + int ret = class_register(&pmt_telem_class); + + if (ret) + return ret; + + ret = platform_driver_register(&pmt_telem_driver); + if (ret) + class_unregister(&pmt_telem_class); + + return ret; +} + +static void __exit pmt_telem_exit(void) +{ + platform_driver_unregister(&pmt_telem_driver); + class_unregister(&pmt_telem_class); + xa_destroy(&telem_array); +} + +module_init(pmt_telem_init); +module_exit(pmt_telem_exit); + +MODULE_AUTHOR("David E. Box <david.e.box@linux.intel.com>"); +MODULE_DESCRIPTION("Intel PMT Telemetry driver"); +MODULE_ALIAS("platform:" TELEM_DRV_NAME); +MODULE_LICENSE("GPL v2"); diff --git a/drivers/platform/x86/intel_pmt_telem.h b/drivers/platform/x86/intel_pmt_telem.h new file mode 100644 index 000000000000..3c6d1da3dc48 --- /dev/null +++ b/drivers/platform/x86/intel_pmt_telem.h @@ -0,0 +1,20 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef _INTEL_PMT_TELEM_H +#define _INTEL_PMT_TELEM_H + +#include <linux/intel-dvsec.h> + +/* Telemetry types */ +#define PMT_TELEM_TELEMETRY 0 +#define PMT_TELEM_CRASHLOG 1 + +struct telem_header { + u8 access_type; + u8 telem_type; + u16 size; + u32 guid; + u32 base_offset; + u8 tbir; +}; + +#endif -- 2.20.1
On 5/4/20 7:31 PM, David E. Box wrote: > diff --git a/drivers/mfd/Kconfig b/drivers/mfd/Kconfig > index 0a59249198d3..c673031acdf1 100644 > --- a/drivers/mfd/Kconfig > +++ b/drivers/mfd/Kconfig > @@ -632,6 +632,16 @@ config MFD_INTEL_MSIC > Passage) chip. This chip embeds audio, battery, GPIO, etc. > devices used in Intel Medfield platforms. > > +config MFD_INTEL_PMT > + tristate "Intel Platform Monitoring Technology support" > + depends on PCI > + select MFD_CORE > + help > + The Intel Platform Monitoring Technology (PMT) is an interface that > + provides access to hardware monitor registers. This driver supports > + Telemetry, Watcher, and Crashlog PTM capabilities/devices for What is PTM? > + platforms starting from Tiger Lake. > + > config MFD_IPAQ_MICRO > bool "Atmel Micro ASIC (iPAQ h3100/h3600/h3700) Support" > depends on SA1100_H3100 || SA1100_H3600 -- ~Randy
On Tue, May 5, 2020 at 4:32 AM David E. Box <david.e.box@linux.intel.com> wrote: > > Add pcie dvsec extended capability id along with helper macros to pcie -> PCIe dvsec -> DVSEC (but here I'm not sure, what's official abbreviation for this?) > retrieve information from the headers. > https://members.pcisig.com/wg/PCI-SIG/document/12335 Perhaps DocLink: ... (as a tag) > > Signed-off-by: David E. Box <david.e.box@linux.intel.com> > --- > include/uapi/linux/pci_regs.h | 5 +++++ > 1 file changed, 5 insertions(+) > > diff --git a/include/uapi/linux/pci_regs.h b/include/uapi/linux/pci_regs.h > index f9701410d3b5..c96f08d1e711 100644 > --- a/include/uapi/linux/pci_regs.h > +++ b/include/uapi/linux/pci_regs.h > @@ -720,6 +720,7 @@ > #define PCI_EXT_CAP_ID_DPC 0x1D /* Downstream Port Containment */ > #define PCI_EXT_CAP_ID_L1SS 0x1E /* L1 PM Substates */ > #define PCI_EXT_CAP_ID_PTM 0x1F /* Precision Time Measurement */ > +#define PCI_EXT_CAP_ID_DVSEC 0x23 /* Desinated Vendor-Specific */ > #define PCI_EXT_CAP_ID_DLF 0x25 /* Data Link Feature */ > #define PCI_EXT_CAP_ID_PL_16GT 0x26 /* Physical Layer 16.0 GT/s */ > #define PCI_EXT_CAP_ID_MAX PCI_EXT_CAP_ID_PL_16GT > @@ -1062,6 +1063,10 @@ > #define PCI_L1SS_CTL1_LTR_L12_TH_SCALE 0xe0000000 /* LTR_L1.2_THRESHOLD_Scale */ > #define PCI_L1SS_CTL2 0x0c /* Control 2 Register */ > > +/* Designated Vendor-Specific (DVSEC, PCI_EXT_CAP_ID_DVSEC) */ > +#define PCI_DVSEC_HEADER1 0x4 /* Vendor-Specific Header1 */ > +#define PCI_DVSEC_HEADER2 0x8 /* Vendor-Specific Header2 */ > + > /* Data Link Feature */ > #define PCI_DLF_CAP 0x04 /* Capabilities Register */ > #define PCI_DLF_EXCHANGE_ENABLE 0x80000000 /* Data Link Feature Exchange Enable */ > -- > 2.20.1 > -- With Best Regards, Andy Shevchenko
On Tue, May 5, 2020 at 5:32 AM David E. Box <david.e.box@linux.intel.com> wrote: > > Intel Platform Monitoring Technology (PMT) is an architecture for > enumerating and accessing hardware monitoring facilities. PMT supports > multiple types of monitoring capabilities. Capabilities are discovered > using PCIe DVSEC with the Intel VID. Each capability is discovered as a > separate DVSEC instance in a device's config space. This driver uses MFD to > manage the creation of platform devices for each type so that they may be > controlled by their own drivers (to be introduced). Support is included > for the 3 current capability types, Telemetry, Watcher, and Crashlog. The > features are available on new Intel platforms starting from Tiger Lake for > which support is added. Tiger Lake however will not support Watcher and > Crashlog even though the capabilities appear on the device. So add a quirk > facility and use it to disable them. ... > include/linux/intel-dvsec.h | 44 +++++++++ I guess it's no go for a such header, since we may end up with tons of a such. Perhaps simple pcie-dvsec.h ? ... > --- a/MAINTAINERS > +++ b/MAINTAINERS > @@ -8783,6 +8783,11 @@ S: Maintained > F: arch/x86/include/asm/intel_telemetry.h > F: drivers/platform/x86/intel_telemetry* > > +INTEL PMT DRIVER > +M: "David E. Box" <david.e.box@linux.intel.com> > +S: Maintained > +F: drivers/mfd/intel_pmt.c I believe you forgot to run parse-maintainers.pl --order --input=MAINTAINERS --output=MAINTAINERS ... > + info = devm_kmemdup(&pdev->dev, (void *)id->driver_data, sizeof(*info), > + GFP_KERNEL); > + Extra blank line. > + if (!info) > + return -ENOMEM; > + > + while ((pos = pci_find_next_ext_capability(pdev, pos, PCI_EXT_CAP_ID_DVSEC))) { > + pci_read_config_word(pdev, pos + PCI_DVSEC_HEADER1, &vid); > + if (vid != PCI_VENDOR_ID_INTEL) > + continue; Perhaps a candidate for for_each_vendor_cap() macro in pcie-dvsec.h. Or how is it done for the rest of capabilities? > + } ... > +static const struct pci_device_id pmt_pci_ids[] = { > + /* TGL */ > + { PCI_VDEVICE(INTEL, 0x9a0d), (kernel_ulong_t)&tgl_info }, PCI_DEVICE_DATA()? > + { } > +}; -- With Best Regards, Andy Shevchenko
On Tue, May 5, 2020 at 5:32 AM David E. Box <david.e.box@linux.intel.com> wrote: ... > Register mappings are not provided by the driver. Instead, a GUID is read > from a header for each endpoint. The GUID identifies the device and is to > be used with an XML, provided by the vendor, to discover the available set > of metrics and their register mapping. This allows firmware updates to > modify the register space without needing to update the driver every time > with new mappings. Firmware writes a new GUID in this case to specify the > new mapping. Software tools with access to the associated XML file can > then interpret the changes. Is old hardware going to support this in the future? (I have in mind Apollo Lake / Broxton) > This module manages access to all PMT Telemetry endpoints on a system, > regardless of the device exporting them. It creates an intel_pmt_telem Name is not the best we can come up with. Would anyone else use PMT? Would it be vendor-agnostic ABI? (For example, I know that MIPI standardizes tracing protocols, like STM, do we have any plans to standardize this one?) telem -> telemetry. > class to manage the list. For each endpoint, sysfs files provide GUID and > size information as well as a pointer to the parent device the telemetry > comes from. Software may discover the association between endpoints and > devices by iterating through the list in sysfs, or by looking for the > existence of the class folder under the device of interest. A device node > of the same name allows software to then map the telemetry space for direct > access. ... > + tristate "Intel PMT telemetry driver" I think user should understand what is it from the title (hint: spell PMT fully). ... > obj-$(CONFIG_PMC_ATOM) += pmc_atom.o > +obj-$(CONFIG_INTEL_PMT_TELEM) += intel_pmt_telem.o Keep this and Kconfig section in order with the other stuff. ... bits.h? > +#include <linux/cdev.h> > +#include <linux/intel-dvsec.h> > +#include <linux/io-64-nonatomic-lo-hi.h> > +#include <linux/kernel.h> > +#include <linux/module.h> > +#include <linux/pci.h> > +#include <linux/platform_device.h> > +#include <linux/slab.h> > +#include <linux/uaccess.h> > +#include <linux/xarray.h> ... > +/* platform device name to bind to driver */ > +#define TELEM_DRV_NAME "pmt_telemetry" Shouldn't be part of MFD header? ... > +#define TELEM_TBIR_MASK 0x7 GENMASK() ? > +struct pmt_telem_priv { > + struct device *dev; > + struct intel_dvsec_header *dvsec; > + struct telem_header header; > + unsigned long base_addr; > + void __iomem *disc_table; > + struct cdev cdev; > + dev_t devt; > + int devid; > +}; ... > + unsigned long phys = priv->base_addr; > + unsigned long pfn = PFN_DOWN(phys); > + unsigned long psize; > + > + psize = (PFN_UP(priv->base_addr + priv->header.size) - pfn) * PAGE_SIZE; > + if (vsize > psize) { > + dev_err(priv->dev, "Requested mmap size is too large\n"); > + return -EINVAL; > + } ... > +static ssize_t guid_show(struct device *dev, struct device_attribute *attr, > + char *buf) > +{ > + struct pmt_telem_priv *priv = dev_get_drvdata(dev); > + > + return sprintf(buf, "0x%x\n", priv->header.guid); > +} So, it's not a GUID but rather some custom number? Can we actually do a real GUID / UUID here? Because of TODO below I suppose it's not carved in stone (yet) and basically a protocol defined by firmware (which can be amended). ... > + /* TODO: replace with device properties??? */ So, please, fulfill. swnode I guess is what you are looking for. > + priv->dvsec = dev_get_platdata(&pdev->dev); > + if (!priv->dvsec) { > + dev_err(&pdev->dev, "Platform data not found\n"); > + return -ENODEV; > + } ... > + /* Local access and BARID only for now */ > + switch (priv->header.access_type) { > + case TELEM_ACCESS_LOCAL: > + if (priv->header.tbir) { > + dev_err(&pdev->dev, > + "Unsupported BAR index %d for access type %d\n", > + priv->header.tbir, priv->header.access_type); > + return -EINVAL; > + } > + fallthrough; What's the point? > + > + case TELEM_ACCESS_BARID: > + break; > + default: > + dev_err(&pdev->dev, "Unsupported access type %d\n", > + priv->header.access_type); > + return -EINVAL; > + } > + err = alloc_chrdev_region(&priv->devt, 0, 1, TELEM_DRV_NAME); err or ret? Be consistent in the module. > + if (err < 0) { ' < 0' Do we need it? > + dev_err(&pdev->dev, > + "PMT telemetry chrdev_region err: %d\n", err); > + return err; > + } ... > + err = pmt_telem_create_dev(priv); > + if (err < 0) ' < 0' Do we need it? > + goto fail_create_dev; > + > + return 0; > +} ... > +static const struct platform_device_id pmt_telem_table[] = { > + { > + .name = "pmt_telemetry", > + }, { > + /* sentinel */ > + } { .name = ... }, {} is enough. > +}; ... > +static int __init pmt_telem_init(void) > +{ > + int ret = class_register(&pmt_telem_class); > + > + if (ret) int ret; ret = ... if (ret) > + return ret; > + > + ret = platform_driver_register(&pmt_telem_driver); > + if (ret) > + class_unregister(&pmt_telem_class); > + > + return ret; > +} ... > +{ > +} > + Extra blank line. > +module_init(pmt_telem_init); > +module_exit(pmt_telem_exit); Better to attach to the respective functions. ... > +#include <linux/intel-dvsec.h> There is no user of this below, but types.h has users here. > +/* Telemetry types */ > +#define PMT_TELEM_TELEMETRY 0 > +#define PMT_TELEM_CRASHLOG 1 > + > +struct telem_header { > + u8 access_type; If it's part of hardware communication, shouldn't be rather __uXX types to show that this is part of protocol between software and hardware? > + u8 telem_type; > + u16 size; > + u32 guid; > + u32 base_offset; > + u8 tbir; > +}; -- With Best Regards, Andy Shevchenko
On Mon, 2020-05-04 at 19:53 -0700, Randy Dunlap wrote:
> On 5/4/20 7:31 PM, David E. Box wrote:
> > diff --git a/drivers/mfd/Kconfig b/drivers/mfd/Kconfig
> > index 0a59249198d3..c673031acdf1 100644
> > --- a/drivers/mfd/Kconfig
> > +++ b/drivers/mfd/Kconfig
> > @@ -632,6 +632,16 @@ config MFD_INTEL_MSIC
> > Passage) chip. This chip embeds audio, battery, GPIO, etc.
> > devices used in Intel Medfield platforms.
> >
> > +config MFD_INTEL_PMT
> > + tristate "Intel Platform Monitoring Technology support"
> > + depends on PCI
> > + select MFD_CORE
> > + help
> > + The Intel Platform Monitoring Technology (PMT) is an
> > interface that
> > + provides access to hardware monitor registers. This driver
> > supports
> > + Telemetry, Watcher, and Crashlog PTM capabilities/devices for
>
> What is PTM?
s/PTM/PMT
I have the fortune of working on another project involving PCI
Precision Time Management.
On Tue, 2020-05-05 at 11:49 +0300, Andy Shevchenko wrote: > On Tue, May 5, 2020 at 4:32 AM David E. Box < > david.e.box@linux.intel.com> wrote: > > Add pcie dvsec extended capability id along with helper macros to > > pcie -> PCIe > > dvsec -> DVSEC (but here I'm not sure, what's official abbreviation > for this?) Okay. DVSEC is used in the ECN. I'll spell it here out as well. > > > retrieve information from the headers. > > > > https://members.pcisig.com/wg/PCI-SIG/document/12335 > > Perhaps > > DocLink: ... > > (as a tag) Yes. Forgot to add this.
On Tue, 2020-05-05 at 12:02 +0300, Andy Shevchenko wrote: > On Tue, May 5, 2020 at 5:32 AM David E. Box < > david.e.box@linux.intel.com> wrote: > > Intel Platform Monitoring Technology (PMT) is an architecture for > > enumerating and accessing hardware monitoring facilities. PMT > > supports > > multiple types of monitoring capabilities. Capabilities are > > discovered > > using PCIe DVSEC with the Intel VID. Each capability is discovered > > as a > > separate DVSEC instance in a device's config space. This driver > > uses MFD to > > manage the creation of platform devices for each type so that they > > may be > > controlled by their own drivers (to be introduced). Support is > > included > > for the 3 current capability types, Telemetry, Watcher, and > > Crashlog. The > > features are available on new Intel platforms starting from Tiger > > Lake for > > which support is added. Tiger Lake however will not support Watcher > > and > > Crashlog even though the capabilities appear on the device. So add > > a quirk > > facility and use it to disable them. > > ... > > > include/linux/intel-dvsec.h | 44 +++++++++ > > I guess it's no go for a such header, since we may end up with tons > of > a such. Perhaps simple pcie-dvsec.h ? Too general. Nothing in here applies to all PCIE DVSEC capabilities. The file describes only the vendor defined space in a DVSEC region. > > ... > > > --- a/MAINTAINERS > > +++ b/MAINTAINERS > > @@ -8783,6 +8783,11 @@ S: Maintained > > F: arch/x86/include/asm/intel_telemetry.h > > F: drivers/platform/x86/intel_telemetry* > > > > +INTEL PMT DRIVER > > +M: "David E. Box" <david.e.box@linux.intel.com> > > +S: Maintained > > +F: drivers/mfd/intel_pmt.c > > I believe you forgot to run parse-maintainers.pl --order > --input=MAINTAINERS --output=MAINTAINERS > > ... > > > + info = devm_kmemdup(&pdev->dev, (void *)id->driver_data, > > sizeof(*info), > > + GFP_KERNEL); > > + > > Extra blank line. > > > + if (!info) > > + return -ENOMEM; > > + > > + while ((pos = pci_find_next_ext_capability(pdev, pos, > > PCI_EXT_CAP_ID_DVSEC))) { > > + pci_read_config_word(pdev, pos + PCI_DVSEC_HEADER1, > > &vid); > > + if (vid != PCI_VENDOR_ID_INTEL) > > + continue; > > Perhaps a candidate for for_each_vendor_cap() macro in pcie-dvsec.h. > Or how is it done for the rest of capabilities? > > > + } > > ... > > > +static const struct pci_device_id pmt_pci_ids[] = { > > + /* TGL */ > > + { PCI_VDEVICE(INTEL, 0x9a0d), (kernel_ulong_t)&tgl_info }, > > PCI_DEVICE_DATA()? Ack on the rest of the changes.
$ git log --oneline include/uapi/linux/pci_regs.h 202853595e53 PCI: pciehp: Disable in-band presence detect when possible ed22aaaede44 PCI: dwc: intel: PCIe RC controller driver bbdb2f5ecdf1 PCI: Add #defines for Enter Compliance, Transmit Margin c9c13ba428ef PCI: Add PCI_STD_NUM_BARS for the number of standard BARs 106feb2fdced PCI: pciehp: Remove pciehp_set_attention_status() 448d5a55759a PCI: Add #defines for some of PCIe spec r4.0 features de76cda215d5 PCI: Decode PCIe 32 GT/s link speed Yours could be: PCI: Add #defines for Designated Vendor-Specific Capability On Mon, May 04, 2020 at 06:32:04PM -0700, David E. Box wrote: > Add pcie dvsec extended capability id along with helper macros to > retrieve information from the headers. s/pcie/PCIe/ s/dvsec/DVSEC/ s/id/ID/ I don't see any helper macros in the patch. Well, OK, I guess the header offsets could be considered macros. > https://members.pcisig.com/wg/PCI-SIG/document/12335 This URL is for an ECN. DVSEC is included in PCIe r5.0, sec 7.9.6, so please cite that instead so the citation remains useful after the URL becomes stale and for people who have the spec but not access to the PCI-SIG website. > Signed-off-by: David E. Box <david.e.box@linux.intel.com> Acked-by: Bjorn Helgaas <bhelgaas@google.com> > --- > include/uapi/linux/pci_regs.h | 5 +++++ > 1 file changed, 5 insertions(+) > > diff --git a/include/uapi/linux/pci_regs.h b/include/uapi/linux/pci_regs.h > index f9701410d3b5..c96f08d1e711 100644 > --- a/include/uapi/linux/pci_regs.h > +++ b/include/uapi/linux/pci_regs.h > @@ -720,6 +720,7 @@ > #define PCI_EXT_CAP_ID_DPC 0x1D /* Downstream Port Containment */ > #define PCI_EXT_CAP_ID_L1SS 0x1E /* L1 PM Substates */ > #define PCI_EXT_CAP_ID_PTM 0x1F /* Precision Time Measurement */ > +#define PCI_EXT_CAP_ID_DVSEC 0x23 /* Desinated Vendor-Specific */ s/Desinated/Designated/ > #define PCI_EXT_CAP_ID_DLF 0x25 /* Data Link Feature */ > #define PCI_EXT_CAP_ID_PL_16GT 0x26 /* Physical Layer 16.0 GT/s */ > #define PCI_EXT_CAP_ID_MAX PCI_EXT_CAP_ID_PL_16GT > @@ -1062,6 +1063,10 @@ > #define PCI_L1SS_CTL1_LTR_L12_TH_SCALE 0xe0000000 /* LTR_L1.2_THRESHOLD_Scale */ > #define PCI_L1SS_CTL2 0x0c /* Control 2 Register */ > > +/* Designated Vendor-Specific (DVSEC, PCI_EXT_CAP_ID_DVSEC) */ > +#define PCI_DVSEC_HEADER1 0x4 /* Vendor-Specific Header1 */ > +#define PCI_DVSEC_HEADER2 0x8 /* Vendor-Specific Header2 */ > + > /* Data Link Feature */ > #define PCI_DLF_CAP 0x04 /* Capabilities Register */ > #define PCI_DLF_EXCHANGE_ENABLE 0x80000000 /* Data Link Feature Exchange Enable */ > -- > 2.20.1 >
On Tue, 2020-05-05 at 16:49 +0300, Andy Shevchenko wrote: > On Tue, May 5, 2020 at 5:32 AM David E. Box < > david.e.box@linux.intel.com> wrote: > > ... > > > Register mappings are not provided by the driver. Instead, a GUID > > is read > > from a header for each endpoint. The GUID identifies the device and > > is to > > be used with an XML, provided by the vendor, to discover the > > available set > > of metrics and their register mapping. This allows firmware > > updates to > > modify the register space without needing to update the driver > > every time > > with new mappings. Firmware writes a new GUID in this case to > > specify the > > new mapping. Software tools with access to the associated XML file > > can > > then interpret the changes. > > Is old hardware going to support this in the future? > (I have in mind Apollo Lake / Broxton) I don't know of any plans for this. > > > This module manages access to all PMT Telemetry endpoints on a > > system, > > regardless of the device exporting them. It creates an > > intel_pmt_telem > > Name is not the best we can come up with. Would anyone else use PMT? > Would it be vendor-agnostic ABI? > (For example, I know that MIPI standardizes tracing protocols, like > STM, do we have any plans to standardize this one?) Not at this time. The technology may be used as a feature on non-Intel devices, but it is Intel owned. Hence the use of DVSEC which allows hardware to enumerate and get driver support for IP from other vendors. > > telem -> telemetry. > > > class to manage the list. For each endpoint, sysfs files provide > > GUID and > > size information as well as a pointer to the parent device the > > telemetry > > comes from. Software may discover the association between endpoints > > and > > devices by iterating through the list in sysfs, or by looking for > > the > > existence of the class folder under the device of interest. A > > device node > > of the same name allows software to then map the telemetry space > > for direct > > access. > > ... > > > + tristate "Intel PMT telemetry driver" > > I think user should understand what is it from the title (hint: spell > PMT fully). > > ... > > > obj-$(CONFIG_PMC_ATOM) += pmc_atom.o > > +obj-$(CONFIG_INTEL_PMT_TELEM) += intel_pmt_telem.o > > Keep this and Kconfig section in order with the other stuff. > > ... > > bits.h? > > > +#include <linux/cdev.h> > > +#include <linux/intel-dvsec.h> > > +#include <linux/io-64-nonatomic-lo-hi.h> > > +#include <linux/kernel.h> > > +#include <linux/module.h> > > +#include <linux/pci.h> > > +#include <linux/platform_device.h> > > +#include <linux/slab.h> > > +#include <linux/uaccess.h> > > +#include <linux/xarray.h> > > ... > > > +/* platform device name to bind to driver */ > > +#define TELEM_DRV_NAME "pmt_telemetry" > > Shouldn't be part of MFD header? Can place in the dvsec header shared by MFD and drivers. > > ... > > > +#define TELEM_TBIR_MASK 0x7 > > GENMASK() ? > > > +struct pmt_telem_priv { > > + struct device *dev; > > + struct intel_dvsec_header *dvsec; > > + struct telem_header header; > > + unsigned long base_addr; > > + void __iomem *disc_table; > > + struct cdev cdev; > > + dev_t devt; > > + int devid; > > +}; > > ... > > > + unsigned long phys = priv->base_addr; > > + unsigned long pfn = PFN_DOWN(phys); > > + unsigned long psize; > > + > > + psize = (PFN_UP(priv->base_addr + priv->header.size) - pfn) > > * PAGE_SIZE; > > + if (vsize > psize) { > > + dev_err(priv->dev, "Requested mmap size is too > > large\n"); > > + return -EINVAL; > > + } > > ... > > > > +static ssize_t guid_show(struct device *dev, struct > > device_attribute *attr, > > + char *buf) > > +{ > > + struct pmt_telem_priv *priv = dev_get_drvdata(dev); > > + > > + return sprintf(buf, "0x%x\n", priv->header.guid); > > +} > > So, it's not a GUID but rather some custom number? Can we actually do > a real GUID / UUID here? I wish but this is the name it was called. We should have pushed back more on this. My concern now in calling the attribute something different is that it will not align with public documentation. ... > > > + /* Local access and BARID only for now */ > > + switch (priv->header.access_type) { > > + case TELEM_ACCESS_LOCAL: > > + if (priv->header.tbir) { > > + dev_err(&pdev->dev, > > + "Unsupported BAR index %d for > > access type %d\n", > > + priv->header.tbir, priv- > > >header.access_type); > > + return -EINVAL; > > + } > > + fallthrough; > > What's the point? The next case has the break. That case is only there to validate that it's not the default which would be an error. Will switch this to break though to make it explicit. Ack on everything else. Thanks.
Intel Platform Monitoring Technology (PMT) is an architecture for enumerating and accessing hardware monitoring capabilities on a device. With customers increasingly asking for hardware telemetry, engineers not only have to figure out how to measure and collect data, but also how to deliver it and make it discoverable. The latter may be through some device specific method requiring device specific tools to collect the data. This in turn requires customers to manage a suite of different tools in order to collect the differing assortment of monitoring data on their systems. Even when such information can be provided in kernel drivers, they may require constant maintenance to update register mappings as they change with firmware updates and new versions of hardware. PMT provides a solution for discovering and reading telemetry from a device through a hardware agnostic framework that allows for updates to systems without requiring patches to the kernel or software tools. PMT defines several capabilities to support collecting monitoring data from hardware. All are discoverable as separate instances of the PCIE Designated Vendor extended capability (DVSEC) with the Intel vendor code. The DVSEC ID field uniquely identifies the capability. Each DVSEC also provides a BAR offset to a header that defines capability-specific attributes, including GUID, feature type, offset and length, as well as configuration settings where applicable. The GUID uniquely identifies the register space of any monitor data exposed by the capability. The GUID is associated with an XML file from the vendor that describes the mapping of the register space along with properties of the monitor data. This allows vendors to perform firmware updates that can change the mapping (e.g. add new metrics) without requiring any changes to drivers or software tools. The new mapping is confirmed by an updated GUID, read from the hardware, which software uses with a new XML. The current capabilities defined by PMT are Telemetry, Watcher, and Crashlog. The Telemetry capability provides access to a continuous block of read only data. The Watcher capability provides access to hardware sampling and tracing features. Crashlog provides access to device crash dumps. While there is some relationship between capabilities (Watcher can be configured to sample from the Telemetry data set) each exists as stand alone features with no dependency on any other. The design therefore splits them into individual, capability specific drivers. MFD is used to create platform devices for each capability so that they may be managed by their own driver. The PMT architecture is (for the most part) agnostic to the type of device it can collect from. Devices nodes are consequently generic in naming, e.g. /dev/telem<n> and /dev/smplr<n>. Each capability driver creates a class to manage the list of devices supporting it. Software can determine which devices support a PMT feature by searching through each device node entry in the sysfs class folder. It can additionally determine if a particular device supports a PMT feature by checking for a PMT class folder in the device folder. This patch set provides support for the PMT framework, along with support for Telemetry on Tiger Lake. Changes from V1: - In the telemetry driver, set the device in device_create() to the parent pci device (the monitoring device) for clear association in sysfs. Was set before to the platform device created by the pci parent. - Move telem struct into driver and delete unneeded header file. - Start telem device numbering from 0 instead of 1. 1 was used due to anticipated changes, no longer needed. - Use helper macros suggested by Andy S. - Rename class to pmt_telemetry, spelling out full name - Move monitor device name defines to common header - Coding style, spelling, and Makefile/MAINTAINERS ordering fixes David E. Box (3): PCI: Add #defines for Designated Vendor-Specific Capability mfd: Intel Platform Monitoring Technology support platform/x86: Intel PMT Telemetry capability driver MAINTAINERS | 6 + drivers/mfd/Kconfig | 10 + drivers/mfd/Makefile | 1 + drivers/mfd/intel_pmt.c | 170 ++++++++++++ drivers/platform/x86/Kconfig | 10 + drivers/platform/x86/Makefile | 1 + drivers/platform/x86/intel_pmt_telem.c | 362 +++++++++++++++++++++++++ include/linux/intel-dvsec.h | 48 ++++ include/uapi/linux/pci_regs.h | 5 + 9 files changed, 613 insertions(+) create mode 100644 drivers/mfd/intel_pmt.c create mode 100644 drivers/platform/x86/intel_pmt_telem.c create mode 100644 include/linux/intel-dvsec.h -- 2.20.1
Add PCIe DVSEC extended capability ID and defines for the header offsets. Defined in PCIe r5.0, sec 7.9.6. Signed-off-by: David E. Box <david.e.box@linux.intel.com> Acked-by: Bjorn Helgaas <bhelgaas@google.com> --- include/uapi/linux/pci_regs.h | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/include/uapi/linux/pci_regs.h b/include/uapi/linux/pci_regs.h index f9701410d3b5..09daa9f07b6b 100644 --- a/include/uapi/linux/pci_regs.h +++ b/include/uapi/linux/pci_regs.h @@ -720,6 +720,7 @@ #define PCI_EXT_CAP_ID_DPC 0x1D /* Downstream Port Containment */ #define PCI_EXT_CAP_ID_L1SS 0x1E /* L1 PM Substates */ #define PCI_EXT_CAP_ID_PTM 0x1F /* Precision Time Measurement */ +#define PCI_EXT_CAP_ID_DVSEC 0x23 /* Designated Vendor-Specific */ #define PCI_EXT_CAP_ID_DLF 0x25 /* Data Link Feature */ #define PCI_EXT_CAP_ID_PL_16GT 0x26 /* Physical Layer 16.0 GT/s */ #define PCI_EXT_CAP_ID_MAX PCI_EXT_CAP_ID_PL_16GT @@ -1062,6 +1063,10 @@ #define PCI_L1SS_CTL1_LTR_L12_TH_SCALE 0xe0000000 /* LTR_L1.2_THRESHOLD_Scale */ #define PCI_L1SS_CTL2 0x0c /* Control 2 Register */ +/* Designated Vendor-Specific (DVSEC, PCI_EXT_CAP_ID_DVSEC) */ +#define PCI_DVSEC_HEADER1 0x4 /* Vendor-Specific Header1 */ +#define PCI_DVSEC_HEADER2 0x8 /* Vendor-Specific Header2 */ + /* Data Link Feature */ #define PCI_DLF_CAP 0x04 /* Capabilities Register */ #define PCI_DLF_EXCHANGE_ENABLE 0x80000000 /* Data Link Feature Exchange Enable */ -- 2.20.1
Intel Platform Monitoring Technology (PMT) is an architecture for enumerating and accessing hardware monitoring facilities. PMT supports multiple types of monitoring capabilities. This driver creates platform devices for each type so that they may be managed by capability specific drivers (to be introduced). Capabilities are discovered using PCIe DVSEC ids. Support is included for the 3 current capability types, Telemetry, Watcher, and Crashlog. The features are available on new Intel platforms starting from Tiger Lake for which support is added. Tiger Lake however will not support Watcher and Crashlog even though the capabilities appear on the device. So add a quirk facility and use it to disable them. Signed-off-by: David E. Box <david.e.box@linux.intel.com> Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com> --- MAINTAINERS | 5 ++ drivers/mfd/Kconfig | 10 +++ drivers/mfd/Makefile | 1 + drivers/mfd/intel_pmt.c | 170 ++++++++++++++++++++++++++++++++++++ include/linux/intel-dvsec.h | 48 ++++++++++ 5 files changed, 234 insertions(+) create mode 100644 drivers/mfd/intel_pmt.c create mode 100644 include/linux/intel-dvsec.h diff --git a/MAINTAINERS b/MAINTAINERS index e64e5db31497..367e49d27960 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -8733,6 +8733,11 @@ F: drivers/mfd/intel_soc_pmic* F: include/linux/mfd/intel_msic.h F: include/linux/mfd/intel_soc_pmic* +INTEL PMT DRIVER +M: "David E. Box" <david.e.box@linux.intel.com> +S: Maintained +F: drivers/mfd/intel_pmt.c + INTEL PRO/WIRELESS 2100, 2200BG, 2915ABG NETWORK CONNECTION SUPPORT M: Stanislav Yakovlev <stas.yakovlev@gmail.com> L: linux-wireless@vger.kernel.org diff --git a/drivers/mfd/Kconfig b/drivers/mfd/Kconfig index 0a59249198d3..8777ff99e633 100644 --- a/drivers/mfd/Kconfig +++ b/drivers/mfd/Kconfig @@ -632,6 +632,16 @@ config MFD_INTEL_MSIC Passage) chip. This chip embeds audio, battery, GPIO, etc. devices used in Intel Medfield platforms. +config MFD_INTEL_PMT + tristate "Intel Platform Monitoring Technology support" + depends on PCI + select MFD_CORE + help + The Intel Platform Monitoring Technology (PMT) is an interface that + provides access to hardware monitor registers. This driver supports + Telemetry, Watcher, and Crashlog PMT capabilities/devices for + platforms starting from Tiger Lake. + config MFD_IPAQ_MICRO bool "Atmel Micro ASIC (iPAQ h3100/h3600/h3700) Support" depends on SA1100_H3100 || SA1100_H3600 diff --git a/drivers/mfd/Makefile b/drivers/mfd/Makefile index f935d10cbf0f..0041f673faa1 100644 --- a/drivers/mfd/Makefile +++ b/drivers/mfd/Makefile @@ -212,6 +212,7 @@ obj-$(CONFIG_MFD_INTEL_LPSS) += intel-lpss.o obj-$(CONFIG_MFD_INTEL_LPSS_PCI) += intel-lpss-pci.o obj-$(CONFIG_MFD_INTEL_LPSS_ACPI) += intel-lpss-acpi.o obj-$(CONFIG_MFD_INTEL_MSIC) += intel_msic.o +obj-$(CONFIG_MFD_INTEL_PMT) += intel_pmt.o obj-$(CONFIG_MFD_PALMAS) += palmas.o obj-$(CONFIG_MFD_VIPERBOARD) += viperboard.o obj-$(CONFIG_MFD_RC5T583) += rc5t583.o rc5t583-irq.o diff --git a/drivers/mfd/intel_pmt.c b/drivers/mfd/intel_pmt.c new file mode 100644 index 000000000000..951128ec2afa --- /dev/null +++ b/drivers/mfd/intel_pmt.c @@ -0,0 +1,170 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Intel Platform Monitoring Technology MFD driver + * + * Copyright (c) 2020, Intel Corporation. + * All Rights Reserved. + * + * Authors: David E. Box <david.e.box@linux.intel.com> + */ + +#include <linux/kernel.h> +#include <linux/module.h> +#include <linux/pci.h> +#include <linux/platform_device.h> +#include <linux/pm.h> +#include <linux/pm_runtime.h> +#include <linux/mfd/core.h> +#include <linux/intel-dvsec.h> + +static const struct pmt_platform_info tgl_info = { + .quirks = PMT_QUIRK_NO_WATCHER | PMT_QUIRK_NO_CRASHLOG, +}; + +static int +pmt_add_dev(struct pci_dev *pdev, struct intel_dvsec_header *header, + struct pmt_platform_info *info) +{ + struct mfd_cell *cell, *tmp; + const char *name; + int i; + + switch (header->id) { + case DVSEC_INTEL_ID_TELEM: + name = TELEM_DEV_NAME; + break; + case DVSEC_INTEL_ID_WATCHER: + if (info->quirks && PMT_QUIRK_NO_WATCHER) { + dev_info(&pdev->dev, "Watcher not supported\n"); + return 0; + } + name = WATCHER_DEV_NAME; + break; + case DVSEC_INTEL_ID_CRASHLOG: + if (info->quirks && PMT_QUIRK_NO_WATCHER) { + dev_info(&pdev->dev, "Crashlog not supported\n"); + return 0; + } + name = CRASHLOG_DEV_NAME; + break; + default: + return -EINVAL; + } + + cell = devm_kcalloc(&pdev->dev, header->num_entries, + sizeof(*cell), GFP_KERNEL); + if (!cell) + return -ENOMEM; + + /* Create a platform device for each entry. */ + for (i = 0, tmp = cell; i < header->num_entries; i++, tmp++) { + struct resource *res; + + res = devm_kzalloc(&pdev->dev, sizeof(*res), GFP_KERNEL); + if (!res) + return -ENOMEM; + + tmp->name = name; + + res->start = pdev->resource[header->tbir].start + + header->offset + + (i * (INTEL_DVSEC_ENTRY_SIZE << 2)); + res->end = res->start + (header->entry_size << 2) - 1; + res->flags = IORESOURCE_MEM; + + tmp->resources = res; + tmp->num_resources = 1; + tmp->platform_data = header; + tmp->pdata_size = sizeof(*header); + + } + + return devm_mfd_add_devices(&pdev->dev, PLATFORM_DEVID_AUTO, cell, + header->num_entries, NULL, 0, NULL); +} + +static int +pmt_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id) +{ + u16 vid; + u32 table; + int ret, pos = 0, last_pos = 0; + struct pmt_platform_info *info; + struct intel_dvsec_header header; + + ret = pcim_enable_device(pdev); + if (ret) + return ret; + + info = devm_kmemdup(&pdev->dev, (void *)id->driver_data, sizeof(*info), + GFP_KERNEL); + if (!info) + return -ENOMEM; + + while ((pos = pci_find_next_ext_capability(pdev, pos, PCI_EXT_CAP_ID_DVSEC))) { + pci_read_config_word(pdev, pos + PCI_DVSEC_HEADER1, &vid); + if (vid != PCI_VENDOR_ID_INTEL) + continue; + + pci_read_config_word(pdev, pos + PCI_DVSEC_HEADER2, + &header.id); + + pci_read_config_byte(pdev, pos + INTEL_DVSEC_ENTRIES, + &header.num_entries); + + pci_read_config_byte(pdev, pos + INTEL_DVSEC_SIZE, + &header.entry_size); + + if (!header.num_entries || !header.entry_size) + return -EINVAL; + + pci_read_config_dword(pdev, pos + INTEL_DVSEC_TABLE, + &table); + + header.tbir = INTEL_DVSEC_TABLE_BAR(table); + header.offset = INTEL_DVSEC_TABLE_OFFSET(table); + ret = pmt_add_dev(pdev, &header, info); + if (ret) + dev_warn(&pdev->dev, + "Failed to add devices for DVSEC id %d\n", + header.id); + last_pos = pos; + } + + if (!last_pos) { + dev_err(&pdev->dev, "No supported PMT capabilities found.\n"); + return -ENODEV; + } + + pm_runtime_put(&pdev->dev); + pm_runtime_allow(&pdev->dev); + + return 0; +} + +static void pmt_pci_remove(struct pci_dev *pdev) +{ + pm_runtime_forbid(&pdev->dev); + pm_runtime_get_sync(&pdev->dev); +} + +#define PCI_DEVICE_ID_INTEL_PMT_TGL 0x9a0d + +static const struct pci_device_id pmt_pci_ids[] = { + { PCI_DEVICE_DATA(INTEL, PMT_TGL, &tgl_info) }, + { } +}; +MODULE_DEVICE_TABLE(pci, pmt_pci_ids); + +static struct pci_driver pmt_pci_driver = { + .name = "intel-pmt", + .id_table = pmt_pci_ids, + .probe = pmt_pci_probe, + .remove = pmt_pci_remove, +}; + +module_pci_driver(pmt_pci_driver); + +MODULE_AUTHOR("David E. Box <david.e.box@linux.intel.com>"); +MODULE_DESCRIPTION("Intel Platform Monitoring Technology MFD driver"); +MODULE_LICENSE("GPL v2"); diff --git a/include/linux/intel-dvsec.h b/include/linux/intel-dvsec.h new file mode 100644 index 000000000000..87bb67fd62f7 --- /dev/null +++ b/include/linux/intel-dvsec.h @@ -0,0 +1,48 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef INTEL_DVSEC_H +#define INTEL_DVSEC_H + +#include <linux/types.h> + +#define DVSEC_INTEL_ID_TELEM 2 +#define DVSEC_INTEL_ID_WATCHER 3 +#define DVSEC_INTEL_ID_CRASHLOG 4 + +#define TELEM_DEV_NAME "pmt_telemetry" +#define WATCHER_DEV_NAME "pmt_watcher" +#define CRASHLOG_DEV_NAME "pmt_crashlog" + +/* Intel DVSEC capability vendor space offsets */ +#define INTEL_DVSEC_ENTRIES 0xA +#define INTEL_DVSEC_SIZE 0xB +#define INTEL_DVSEC_TABLE 0xC +#define INTEL_DVSEC_TABLE_BAR(x) ((x) & GENMASK(2, 0)) +#define INTEL_DVSEC_TABLE_OFFSET(x) ((x) >> 3) + +#define INTEL_DVSEC_ENTRY_SIZE 4 + +/* DVSEC header */ +struct intel_dvsec_header { + u16 length; + u16 id; + u8 num_entries; + u8 entry_size; + u8 entry_max; + u8 tbir; + u32 offset; +}; + +enum pmt_quirks { + /* Watcher capability not supported */ + PMT_QUIRK_NO_WATCHER = (1 << 0), + + /* Crashlog capability not supported */ + PMT_QUIRK_NO_CRASHLOG = (1 << 1), +}; + +struct pmt_platform_info { + unsigned long quirks; + struct intel_dvsec_header **capabilities; +}; + +#endif -- 2.20.1
PMT Telemetry is a capability of the Intel Platform Monitoring Technology. The Telemetry capability provides access to device telemetry metrics that provide hardware performance data to users from continuous, memory mapped, read-only register spaces. Register mappings are not provided by the driver. Instead, a GUID is read from a header for each endpoint. The GUID identifies the device and is to be used with an XML, provided by the vendor, to discover the available set of metrics and their register mapping. This allows firmware updates to modify the register space without needing to update the driver every time with new mappings. Firmware writes a new GUID in this case to specify the new mapping. Software tools with access to the associated XML file can then interpret the changes. This module manages access to all PMT Telemetry endpoints on a system, regardless of the device exporting them. It creates a pmt_telemetry class to manage the list. For each endpoint, sysfs files provide GUID and size information as well as a pointer to the parent device the telemetry comes from. Software may discover the association between endpoints and devices by iterating through the list in sysfs, or by looking for the existence of the class folder under the device of interest. A device node of the same name allows software to then map the telemetry space for direct access. Signed-off-by: David E. Box <david.e.box@linux.intel.com> Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com> --- MAINTAINERS | 1 + drivers/platform/x86/Kconfig | 10 + drivers/platform/x86/Makefile | 1 + drivers/platform/x86/intel_pmt_telem.c | 362 +++++++++++++++++++++++++ 4 files changed, 374 insertions(+) create mode 100644 drivers/platform/x86/intel_pmt_telem.c diff --git a/MAINTAINERS b/MAINTAINERS index 367e49d27960..a2a12c1196c4 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -8737,6 +8737,7 @@ INTEL PMT DRIVER M: "David E. Box" <david.e.box@linux.intel.com> S: Maintained F: drivers/mfd/intel_pmt.c +F: drivers/platform/x86/intel_pmt_* INTEL PRO/WIRELESS 2100, 2200BG, 2915ABG NETWORK CONNECTION SUPPORT M: Stanislav Yakovlev <stas.yakovlev@gmail.com> diff --git a/drivers/platform/x86/Kconfig b/drivers/platform/x86/Kconfig index 0ad7ad8cf8e1..41f66da0e3f9 100644 --- a/drivers/platform/x86/Kconfig +++ b/drivers/platform/x86/Kconfig @@ -1368,6 +1368,16 @@ config INTEL_TELEMETRY directly via debugfs files. Various tools may use this interface for SoC state monitoring. +config INTEL_PMT_TELEM + tristate "Intel Platform Monitoring Technology (PMT) Telemetry driver" + help + The Intel Platform Monitory Technology (PMT) Telemetry driver provides + access to hardware telemetry metrics on devices that support the + feature. + + For more information, see + <file:Documentation/ABI/testing/sysfs-class-intel_pmt_telem> + endif # X86_PLATFORM_DEVICES config PMC_ATOM diff --git a/drivers/platform/x86/Makefile b/drivers/platform/x86/Makefile index 53408d965874..e5cd49e54745 100644 --- a/drivers/platform/x86/Makefile +++ b/drivers/platform/x86/Makefile @@ -139,6 +139,7 @@ obj-$(CONFIG_INTEL_MID_POWER_BUTTON) += intel_mid_powerbtn.o obj-$(CONFIG_INTEL_MRFLD_PWRBTN) += intel_mrfld_pwrbtn.o obj-$(CONFIG_INTEL_PMC_CORE) += intel_pmc_core.o intel_pmc_core_pltdrv.o obj-$(CONFIG_INTEL_PMC_IPC) += intel_pmc_ipc.o +obj-$(CONFIG_INTEL_PMT_TELEM) += intel_pmt_telem.o obj-$(CONFIG_INTEL_PUNIT_IPC) += intel_punit_ipc.o obj-$(CONFIG_INTEL_SCU_IPC) += intel_scu_ipc.o obj-$(CONFIG_INTEL_SCU_IPC_UTIL) += intel_scu_ipcutil.o diff --git a/drivers/platform/x86/intel_pmt_telem.c b/drivers/platform/x86/intel_pmt_telem.c new file mode 100644 index 000000000000..d5aac239bb35 --- /dev/null +++ b/drivers/platform/x86/intel_pmt_telem.c @@ -0,0 +1,362 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Intel Platform Monitory Technology Telemetry driver + * + * Copyright (c) 2020, Intel Corporation. + * All Rights Reserved. + * + * Author: "David E. Box" <david.e.box@linux.intel.com> + */ + +#include <linux/bits.h> +#include <linux/cdev.h> +#include <linux/intel-dvsec.h> +#include <linux/io-64-nonatomic-lo-hi.h> +#include <linux/kernel.h> +#include <linux/module.h> +#include <linux/pci.h> +#include <linux/platform_device.h> +#include <linux/slab.h> +#include <linux/types.h> +#include <linux/uaccess.h> +#include <linux/xarray.h> + +/* Telemetry access types */ +#define TELEM_ACCESS_FUTURE 1 +#define TELEM_ACCESS_BARID 2 +#define TELEM_ACCESS_LOCAL 3 + +#define TELEM_GUID_OFFSET 0x4 +#define TELEM_BASE_OFFSET 0x8 +#define TELEM_TBIR_MASK GENMASK(2, 0) +#define TELEM_ACCESS(v) ((v) & GENMASK(3, 0)) +#define TELEM_TYPE(v) (((v) & GENMASK(7, 4)) >> 4) +/* size is in bytes */ +#define TELEM_SIZE(v) (((v) & GENMASK(27, 12)) >> 10) + +#define TELEM_XA_START 0 +#define TELEM_XA_MAX INT_MAX +#define TELEM_XA_LIMIT XA_LIMIT(TELEM_XA_START, TELEM_XA_MAX) + +static DEFINE_XARRAY_ALLOC(telem_array); + +struct telem_header { + u8 access_type; + u8 telem_type; + u16 size; + u32 guid; + u32 base_offset; + u8 tbir; +}; + +struct pmt_telem_priv { + struct device *dev; + struct intel_dvsec_header *dvsec; + struct telem_header header; + unsigned long base_addr; + void __iomem *disc_table; + struct cdev cdev; + dev_t devt; + int devid; +}; + +/* + * devfs + */ +static int pmt_telem_open(struct inode *inode, struct file *filp) +{ + struct pmt_telem_priv *priv; + struct pci_driver *pci_drv; + struct pci_dev *pci_dev; + + if (!capable(CAP_SYS_ADMIN)) + return -EPERM; + + priv = container_of(inode->i_cdev, struct pmt_telem_priv, cdev); + pci_dev = to_pci_dev(priv->dev->parent); + + pci_drv = pci_dev_driver(pci_dev); + if (!pci_drv) + return -ENODEV; + + filp->private_data = priv; + get_device(&pci_dev->dev); + + if (!try_module_get(pci_drv->driver.owner)) { + put_device(&pci_dev->dev); + return -ENODEV; + } + + return 0; +} + +static int pmt_telem_release(struct inode *inode, struct file *filp) +{ + struct pmt_telem_priv *priv = filp->private_data; + struct pci_dev *pci_dev = to_pci_dev(priv->dev->parent); + struct pci_driver *pci_drv = pci_dev_driver(pci_dev); + + put_device(&pci_dev->dev); + module_put(pci_drv->driver.owner); + + return 0; +} + +static int pmt_telem_mmap(struct file *filp, struct vm_area_struct *vma) +{ + struct pmt_telem_priv *priv = filp->private_data; + unsigned long vsize = vma->vm_end - vma->vm_start; + unsigned long phys = priv->base_addr; + unsigned long pfn = PFN_DOWN(phys); + unsigned long psize; + + psize = (PFN_UP(priv->base_addr + priv->header.size) - pfn) * PAGE_SIZE; + if (vsize > psize) { + dev_err(priv->dev, "Requested mmap size is too large\n"); + return -EINVAL; + } + + if ((vma->vm_flags & VM_WRITE) || (vma->vm_flags & VM_MAYWRITE)) + return -EPERM; + + vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot); + + if (io_remap_pfn_range(vma, vma->vm_start, pfn, vsize, + vma->vm_page_prot)) + return -EINVAL; + + return 0; +} + +static const struct file_operations pmt_telem_fops = { + .owner = THIS_MODULE, + .open = pmt_telem_open, + .mmap = pmt_telem_mmap, + .release = pmt_telem_release, +}; + +/* + * sysfs + */ +static ssize_t guid_show(struct device *dev, struct device_attribute *attr, + char *buf) +{ + struct pmt_telem_priv *priv = dev_get_drvdata(dev); + + return sprintf(buf, "0x%x\n", priv->header.guid); +} +static DEVICE_ATTR_RO(guid); + +static ssize_t size_show(struct device *dev, struct device_attribute *attr, + char *buf) +{ + struct pmt_telem_priv *priv = dev_get_drvdata(dev); + + /* Display buffer size in bytes */ + return sprintf(buf, "%u\n", priv->header.size); +} +static DEVICE_ATTR_RO(size); + +static ssize_t offset_show(struct device *dev, struct device_attribute *attr, + char *buf) +{ + struct pmt_telem_priv *priv = dev_get_drvdata(dev); + + /* Display buffer offset in bytes */ + return sprintf(buf, "%lu\n", offset_in_page(priv->base_addr)); +} +static DEVICE_ATTR_RO(offset); + +static struct attribute *pmt_telem_attrs[] = { + &dev_attr_guid.attr, + &dev_attr_size.attr, + &dev_attr_offset.attr, + NULL +}; +ATTRIBUTE_GROUPS(pmt_telem); + +struct class pmt_telem_class = { + .owner = THIS_MODULE, + .name = "pmt_telemetry", + .dev_groups = pmt_telem_groups, +}; + +/* + * driver initialization + */ +static int pmt_telem_create_dev(struct pmt_telem_priv *priv) +{ + struct pci_dev *pci_dev; + struct device *dev; + int ret; + + cdev_init(&priv->cdev, &pmt_telem_fops); + ret = cdev_add(&priv->cdev, priv->devt, 1); + if (ret) { + dev_err(priv->dev, "Could not add char dev\n"); + return ret; + } + + pci_dev = to_pci_dev(priv->dev->parent); + dev = device_create(&pmt_telem_class, &pci_dev->dev, priv->devt, + priv, "telem%d", priv->devid); + if (IS_ERR(dev)) { + dev_err(priv->dev, "Could not create device node\n"); + cdev_del(&priv->cdev); + } + + return PTR_ERR_OR_ZERO(dev); +} + +static void pmt_telem_populate_header(void __iomem *disc_offset, + struct telem_header *header) +{ + header->access_type = TELEM_ACCESS(readb(disc_offset)); + header->telem_type = TELEM_TYPE(readb(disc_offset)); + header->size = TELEM_SIZE(readl(disc_offset)); + header->guid = readl(disc_offset + TELEM_GUID_OFFSET); + header->base_offset = readl(disc_offset + TELEM_BASE_OFFSET); + + /* + * For non-local access types the lower 3 bits of base offset + * contains the index of the base address register where the + * telemetry can be found. + */ + header->tbir = header->base_offset & TELEM_TBIR_MASK; + header->base_offset ^= header->tbir; +} + +static int pmt_telem_probe(struct platform_device *pdev) +{ + struct pmt_telem_priv *priv; + struct pci_dev *parent; + int ret; + + priv = devm_kzalloc(&pdev->dev, sizeof(*priv), GFP_KERNEL); + if (!priv) + return -ENOMEM; + + platform_set_drvdata(pdev, priv); + priv->dev = &pdev->dev; + parent = to_pci_dev(priv->dev->parent); + + priv->dvsec = dev_get_platdata(&pdev->dev); + if (!priv->dvsec) { + dev_err(&pdev->dev, "Platform data not found\n"); + return -ENODEV; + } + + /* Remap and access the discovery table header */ + priv->disc_table = devm_platform_ioremap_resource(pdev, 0); + if (IS_ERR(priv->disc_table)) + return PTR_ERR(priv->disc_table); + + pmt_telem_populate_header(priv->disc_table, &priv->header); + + /* Local access and BARID only for now */ + switch (priv->header.access_type) { + case TELEM_ACCESS_LOCAL: + if (priv->header.tbir) { + dev_err(&pdev->dev, + "Unsupported BAR index %d for access type %d\n", + priv->header.tbir, priv->header.access_type); + return -EINVAL; + } + break; + + case TELEM_ACCESS_BARID: + break; + + default: + dev_err(&pdev->dev, "Unsupported access type %d\n", + priv->header.access_type); + return -EINVAL; + } + + priv->base_addr = pci_resource_start(parent, priv->header.tbir) + + priv->header.base_offset; + + ret = alloc_chrdev_region(&priv->devt, 0, 1, TELEM_DEV_NAME); + if (ret) { + dev_err(&pdev->dev, + "PMT telemetry chrdev_region error: %d\n", ret); + return ret; + } + + ret = xa_alloc(&telem_array, &priv->devid, priv, TELEM_XA_LIMIT, + GFP_KERNEL); + if (ret) + goto fail_xa_alloc; + + ret = pmt_telem_create_dev(priv); + if (ret) + goto fail_create_dev; + + return 0; + +fail_create_dev: + xa_erase(&telem_array, priv->devid); +fail_xa_alloc: + unregister_chrdev_region(priv->devt, 1); + + return ret; +} + +static int pmt_telem_remove(struct platform_device *pdev) +{ + struct pmt_telem_priv *priv = platform_get_drvdata(pdev); + + device_destroy(&pmt_telem_class, priv->devt); + cdev_del(&priv->cdev); + + xa_erase(&telem_array, priv->devid); + unregister_chrdev_region(priv->devt, 1); + + return 0; +} + +static const struct platform_device_id pmt_telem_table[] = { + { + .name = TELEM_DEV_NAME, + }, + {} +}; +MODULE_DEVICE_TABLE(platform, pmt_telem_table); + +static struct platform_driver pmt_telem_driver = { + .driver = { + .name = TELEM_DEV_NAME, + }, + .probe = pmt_telem_probe, + .remove = pmt_telem_remove, + .id_table = pmt_telem_table, +}; + +static int __init pmt_telem_init(void) +{ + int ret; + + ret = class_register(&pmt_telem_class); + if (ret) + return ret; + + ret = platform_driver_register(&pmt_telem_driver); + if (ret) + class_unregister(&pmt_telem_class); + + return ret; +} +module_init(pmt_telem_init); + +static void __exit pmt_telem_exit(void) +{ + platform_driver_unregister(&pmt_telem_driver); + class_unregister(&pmt_telem_class); + xa_destroy(&telem_array); +} +module_exit(pmt_telem_exit); + +MODULE_AUTHOR("David E. Box <david.e.box@linux.intel.com>"); +MODULE_DESCRIPTION("Intel PMT Telemetry driver"); +MODULE_ALIAS("platform:" TELEM_DEV_NAME); +MODULE_LICENSE("GPL v2"); -- 2.20.1
On Tue, 2020-05-05 at 16:49 +0300, Andy Shevchenko wrote:
> ...
>
> > + /* TODO: replace with device properties??? */
>
> So, please, fulfill. swnode I guess is what you are looking for.
I kept the platform data in v2 because swnode properties doesn't look
like a good fit. We are only passing information that was read from the
pci device. It is not hard coded, platform specific data.
David
On Fri, May 8, 2020 at 5:18 AM David E. Box <david.e.box@linux.intel.com> wrote: > > Intel Platform Monitoring Technology (PMT) is an architecture for > enumerating and accessing hardware monitoring facilities. PMT supports > multiple types of monitoring capabilities. This driver creates platform > devices for each type so that they may be managed by capability specific > drivers (to be introduced). Capabilities are discovered using PCIe DVSEC > ids. Support is included for the 3 current capability types, Telemetry, > Watcher, and Crashlog. The features are available on new Intel platforms > starting from Tiger Lake for which support is added. Tiger Lake however > will not support Watcher and Crashlog even though the capabilities appear > on the device. So add a quirk facility and use it to disable them. Thank you for an update. Some nitpicks below. ... > + case DVSEC_INTEL_ID_TELEM: Is this from the spec? Or can we also spell TELEMETRY ? > + name = TELEM_DEV_NAME; Ditto for all occurrences. > + break; ... > + cell = devm_kcalloc(&pdev->dev, header->num_entries, > + sizeof(*cell), GFP_KERNEL); I think if you use temporary struct device *dev = &pdev->dev; you may squeeze this to one line and make others smaller as well. > + if (!cell) > + return -ENOMEM; ... > + res->start = pdev->resource[header->tbir].start + > + header->offset + > + (i * (INTEL_DVSEC_ENTRY_SIZE << 2)); Outer parentheses are redundant. And perhaps last two lines can be one. ... > +static int > +pmt_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id) > +{ > + u16 vid; > + u32 table; > + int ret, pos = 0, last_pos = 0; Redundant assignment of pos. > + while ((pos = pci_find_next_ext_capability(pdev, pos, PCI_EXT_CAP_ID_DVSEC))) { > + pci_read_config_word(pdev, pos + PCI_DVSEC_HEADER1, &vid); > + if (vid != PCI_VENDOR_ID_INTEL) > + continue; > + > + last_pos = pos; Can we simple use a boolean flag? > + } > + > + if (!last_pos) { > + dev_err(&pdev->dev, "No supported PMT capabilities found.\n"); > + return -ENODEV; > + } > +} ... > +}; > + Extra blank line. > +module_pci_driver(pmt_pci_driver); ... + bits.h since GENMASK() is in use. > +#include <linux/types.h> ... > +enum pmt_quirks { > + /* Watcher capability not supported */ > + PMT_QUIRK_NO_WATCHER = (1 << 0), BIT() ? > + > + /* Crashlog capability not supported */ > + PMT_QUIRK_NO_CRASHLOG = (1 << 1), BIT() ? > +}; -- With Best Regards, Andy Shevchenko
On Fri, May 8, 2020 at 5:18 AM David E. Box <david.e.box@linux.intel.com> wrote: > > PMT Telemetry is a capability of the Intel Platform Monitoring Technology. > The Telemetry capability provides access to device telemetry metrics that > provide hardware performance data to users from continuous, memory mapped, > read-only register spaces. > > Register mappings are not provided by the driver. Instead, a GUID is read > from a header for each endpoint. The GUID identifies the device and is to > be used with an XML, provided by the vendor, to discover the available set > of metrics and their register mapping. This allows firmware updates to > modify the register space without needing to update the driver every time > with new mappings. Firmware writes a new GUID in this case to specify the > new mapping. Software tools with access to the associated XML file can > then interpret the changes. > > This module manages access to all PMT Telemetry endpoints on a system, > regardless of the device exporting them. It creates a pmt_telemetry class > to manage the list. For each endpoint, sysfs files provide GUID and size > information as well as a pointer to the parent device the telemetry comes > from. Software may discover the association between endpoints and devices > by iterating through the list in sysfs, or by looking for the existence of ABI needs documentation. > the class folder under the device of interest. A device node of the same > name allows software to then map the telemetry space for direct access. ... > +config INTEL_PMT_TELEM TELEMETRY ... > +obj-$(CONFIG_INTEL_PMT_TELEM) += intel_pmt_telem.o telemetry (Inside the file it's fine to have telem) ... > + priv->dvsec = dev_get_platdata(&pdev->dev); > + if (!priv->dvsec) { > + dev_err(&pdev->dev, "Platform data not found\n"); > + return -ENODEV; > + } I don't see how is it being used? -- With Best Regards, Andy Shevchenko
On Fri, May 8, 2020 at 5:18 AM David E. Box <david.e.box@linux.intel.com> wrote: > > Intel Platform Monitoring Technology (PMT) is an architecture for > enumerating and accessing hardware monitoring capabilities on a device. > With customers increasingly asking for hardware telemetry, engineers not > only have to figure out how to measure and collect data, but also how to > deliver it and make it discoverable. The latter may be through some device > specific method requiring device specific tools to collect the data. This > in turn requires customers to manage a suite of different tools in order to > collect the differing assortment of monitoring data on their systems. Even > when such information can be provided in kernel drivers, they may require > constant maintenance to update register mappings as they change with > firmware updates and new versions of hardware. PMT provides a solution for > discovering and reading telemetry from a device through a hardware agnostic > framework that allows for updates to systems without requiring patches to > the kernel or software tools. > > PMT defines several capabilities to support collecting monitoring data from > hardware. All are discoverable as separate instances of the PCIE Designated > Vendor extended capability (DVSEC) with the Intel vendor code. The DVSEC ID > field uniquely identifies the capability. Each DVSEC also provides a BAR > offset to a header that defines capability-specific attributes, including > GUID, feature type, offset and length, as well as configuration settings > where applicable. The GUID uniquely identifies the register space of any > monitor data exposed by the capability. The GUID is associated with an XML > file from the vendor that describes the mapping of the register space along > with properties of the monitor data. This allows vendors to perform > firmware updates that can change the mapping (e.g. add new metrics) without > requiring any changes to drivers or software tools. The new mapping is > confirmed by an updated GUID, read from the hardware, which software uses > with a new XML. > > The current capabilities defined by PMT are Telemetry, Watcher, and > Crashlog. The Telemetry capability provides access to a continuous block > of read only data. The Watcher capability provides access to hardware > sampling and tracing features. Crashlog provides access to device crash > dumps. While there is some relationship between capabilities (Watcher can > be configured to sample from the Telemetry data set) each exists as stand > alone features with no dependency on any other. The design therefore splits > them into individual, capability specific drivers. MFD is used to create > platform devices for each capability so that they may be managed by their > own driver. The PMT architecture is (for the most part) agnostic to the > type of device it can collect from. Devices nodes are consequently generic > in naming, e.g. /dev/telem<n> and /dev/smplr<n>. Each capability driver > creates a class to manage the list of devices supporting it. Software can > determine which devices support a PMT feature by searching through each > device node entry in the sysfs class folder. It can additionally determine > if a particular device supports a PMT feature by checking for a PMT class > folder in the device folder. > > This patch set provides support for the PMT framework, along with support > for Telemetry on Tiger Lake. > Some nitpicks per individual patches, also you forgot to send the series to PDx86 mailing list and its maintainers (only me included). > Changes from V1: > > - In the telemetry driver, set the device in device_create() to > the parent pci device (the monitoring device) for clear > association in sysfs. Was set before to the platform device > created by the pci parent. > - Move telem struct into driver and delete unneeded header file. > - Start telem device numbering from 0 instead of 1. 1 was used > due to anticipated changes, no longer needed. > - Use helper macros suggested by Andy S. > - Rename class to pmt_telemetry, spelling out full name > - Move monitor device name defines to common header > - Coding style, spelling, and Makefile/MAINTAINERS ordering fixes > > David E. Box (3): > PCI: Add #defines for Designated Vendor-Specific Capability > mfd: Intel Platform Monitoring Technology support > platform/x86: Intel PMT Telemetry capability driver > > MAINTAINERS | 6 + > drivers/mfd/Kconfig | 10 + > drivers/mfd/Makefile | 1 + > drivers/mfd/intel_pmt.c | 170 ++++++++++++ > drivers/platform/x86/Kconfig | 10 + > drivers/platform/x86/Makefile | 1 + > drivers/platform/x86/intel_pmt_telem.c | 362 +++++++++++++++++++++++++ > include/linux/intel-dvsec.h | 48 ++++ > include/uapi/linux/pci_regs.h | 5 + > 9 files changed, 613 insertions(+) > create mode 100644 drivers/mfd/intel_pmt.c > create mode 100644 drivers/platform/x86/intel_pmt_telem.c > create mode 100644 include/linux/intel-dvsec.h > > -- > 2.20.1 > -- With Best Regards, Andy Shevchenko
On Fri, 2020-05-08 at 12:57 +0300, Andy Shevchenko wrote: > On Fri, May 8, 2020 at 5:18 AM David E. Box < > david.e.box@linux.intel.com> wrote: > > PMT Telemetry is a capability of the Intel Platform Monitoring > > Technology. > > The Telemetry capability provides access to device telemetry > > metrics that > > provide hardware performance data to users from continuous, memory > > mapped, > > read-only register spaces. > > > > Register mappings are not provided by the driver. Instead, a GUID > > is read > > from a header for each endpoint. The GUID identifies the device and > > is to > > be used with an XML, provided by the vendor, to discover the > > available set > > of metrics and their register mapping. This allows firmware > > updates to > > modify the register space without needing to update the driver > > every time > > with new mappings. Firmware writes a new GUID in this case to > > specify the > > new mapping. Software tools with access to the associated XML file > > can > > then interpret the changes. > > > > This module manages access to all PMT Telemetry endpoints on a > > system, > > regardless of the device exporting them. It creates a pmt_telemetry > > class > > to manage the list. For each endpoint, sysfs files provide GUID and > > size > > information as well as a pointer to the parent device the telemetry > > comes > > from. Software may discover the association between endpoints and > > devices > > by iterating through the list in sysfs, or by looking for the > > existence of > > ABI needs documentation. We will be releasing a Linux software spec for PMT. We are waiting on public release of the PMT spec. For this patch we did document the sysfs class ABI. > > > the class folder under the device of interest. A device node of > > the same > > name allows software to then map the telemetry space for direct > > access. > > ... > > > +config INTEL_PMT_TELEM > > TELEMETRY > > ... > > > +obj-$(CONFIG_INTEL_PMT_TELEM) += intel_pmt_telem.o > > telemetry > > (Inside the file it's fine to have telem) > > ... > > > + priv->dvsec = dev_get_platdata(&pdev->dev); > > + if (!priv->dvsec) { > > + dev_err(&pdev->dev, "Platform data not found\n"); > > + return -ENODEV; > > + } > > I don't see how is it being used? Good catch :). This was initially used to pass the DVSEC info from the pci device to the telemetry driver. But with changes all of the needed info is now read from the driver's memory resource. It was unnoticed that dvsec fields are no longer used. Will remove in next version. Okay on other comments. David
Intel Platform Monitoring Technology (PMT) is an architecture for enumerating and accessing hardware monitoring capabilities on a device. With customers increasingly asking for hardware telemetry, engineers not only have to figure out how to measure and collect data, but also how to deliver it and make it discoverable. The latter may be through some device specific method requiring device specific tools to collect the data. This in turn requires customers to manage a suite of different tools in order to collect the differing assortment of monitoring data on their systems. Even when such information can be provided in kernel drivers, they may require constant maintenance to update register mappings as they change with firmware updates and new versions of hardware. PMT provides a solution for discovering and reading telemetry from a device through a hardware agnostic framework that allows for updates to systems without requiring patches to the kernel or software tools. PMT defines several capabilities to support collecting monitoring data from hardware. All are discoverable as separate instances of the PCIE Designated Vendor extended capability (DVSEC) with the Intel vendor code. The DVSEC ID field uniquely identifies the capability. Each DVSEC also provides a BAR offset to a header that defines capability-specific attributes, including GUID, feature type, offset and length, as well as configuration settings where applicable. The GUID uniquely identifies the register space of any monitor data exposed by the capability. The GUID is associated with an XML file from the vendor that describes the mapping of the register space along with properties of the monitor data. This allows vendors to perform firmware updates that can change the mapping (e.g. add new metrics) without requiring any changes to drivers or software tools. The new mapping is confirmed by an updated GUID, read from the hardware, which software uses with a new XML. The current capabilities defined by PMT are Telemetry, Watcher, and Crashlog. The Telemetry capability provides access to a continuous block of read only data. The Watcher capability provides access to hardware sampling and tracing features. Crashlog provides access to device crash dumps. While there is some relationship between capabilities (Watcher can be configured to sample from the Telemetry data set) each exists as stand alone features with no dependency on any other. The design therefore splits them into individual, capability specific drivers. MFD is used to create platform devices for each capability so that they may be managed by their own driver. The PMT architecture is (for the most part) agnostic to the type of device it can collect from. Devices nodes are consequently generic in naming, e.g. /dev/telem<n> and /dev/smplr<n>. Each capability driver creates a class to manage the list of devices supporting it. Software can determine which devices support a PMT feature by searching through each device node entry in the sysfs class folder. It can additionally determine if a particular device supports a PMT feature by checking for a PMT class folder in the device folder. This patch set provides support for the PMT framework, along with support for Telemetry on Tiger Lake. Changes from V2: Please excuse this delayed V3 as we dealt with last minute hardware changes. - In order to handle certain HW bugs from the telemetry capability driver, create a single platform device per capability instead of a device per entry. Add the entry data as device resources and let the capability driver manage them as a set allowing for cleaner HW bug resolution. - Handle discovery table offset bug in intel_pmt.c - Handle overlapping regions in intel_pmt_telemetry.c - Add description of sysfs class to testing ABI. - Don't check size and count until confirming support for the PMT capability to avoid bailing out when we need to skip it. - Remove unneeded header file. Move code to the intel_pmt.c, the only place where it's needed. - Remove now unused platform data. - Add missing header files types.h, bits.h. - Rename file name and build options from telem to telemetry. - Code cleanup suggested by Andy S. - x86 mailing list added. Changes from V1: - In the telemetry driver, set the device in device_create() to the parent pci device (the monitoring device) for clear association in sysfs. Was set before to the platform device created by the pci parent. - Move telem struct into driver and delete unneeded header file. - Start telem device numbering from 0 instead of 1. 1 was used due to anticipated changes, no longer needed. - Use helper macros suggested by Andy S. - Rename class to pmt_telemetry, spelling out full name - Move monitor device name defines to common header - Coding style, spelling, and Makefile/MAINTAINERS ordering fixes David E. Box (3): PCI: Add defines for Designated Vendor-Specific Capability mfd: Intel Platform Monitoring Technology support platform/x86: Intel PMT Telemetry capability driver .../ABI/testing/sysfs-class-pmt_telemetry | 46 ++ MAINTAINERS | 6 + drivers/mfd/Kconfig | 10 + drivers/mfd/Makefile | 1 + drivers/mfd/intel_pmt.c | 218 +++++++++ drivers/platform/x86/Kconfig | 10 + drivers/platform/x86/Makefile | 1 + drivers/platform/x86/intel_pmt_telemetry.c | 454 ++++++++++++++++++ include/uapi/linux/pci_regs.h | 5 + 9 files changed, 751 insertions(+) create mode 100644 Documentation/ABI/testing/sysfs-class-pmt_telemetry create mode 100644 drivers/mfd/intel_pmt.c create mode 100644 drivers/platform/x86/intel_pmt_telemetry.c -- 2.20.1
Add PCIe DVSEC extended capability ID and defines for the header offsets. Defined in PCIe r5.0, sec 7.9.6. Signed-off-by: David E. Box <david.e.box@linux.intel.com> Acked-by: Bjorn Helgaas <bhelgaas@google.com> --- include/uapi/linux/pci_regs.h | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/include/uapi/linux/pci_regs.h b/include/uapi/linux/pci_regs.h index f9701410d3b5..09daa9f07b6b 100644 --- a/include/uapi/linux/pci_regs.h +++ b/include/uapi/linux/pci_regs.h @@ -720,6 +720,7 @@ #define PCI_EXT_CAP_ID_DPC 0x1D /* Downstream Port Containment */ #define PCI_EXT_CAP_ID_L1SS 0x1E /* L1 PM Substates */ #define PCI_EXT_CAP_ID_PTM 0x1F /* Precision Time Measurement */ +#define PCI_EXT_CAP_ID_DVSEC 0x23 /* Designated Vendor-Specific */ #define PCI_EXT_CAP_ID_DLF 0x25 /* Data Link Feature */ #define PCI_EXT_CAP_ID_PL_16GT 0x26 /* Physical Layer 16.0 GT/s */ #define PCI_EXT_CAP_ID_MAX PCI_EXT_CAP_ID_PL_16GT @@ -1062,6 +1063,10 @@ #define PCI_L1SS_CTL1_LTR_L12_TH_SCALE 0xe0000000 /* LTR_L1.2_THRESHOLD_Scale */ #define PCI_L1SS_CTL2 0x0c /* Control 2 Register */ +/* Designated Vendor-Specific (DVSEC, PCI_EXT_CAP_ID_DVSEC) */ +#define PCI_DVSEC_HEADER1 0x4 /* Vendor-Specific Header1 */ +#define PCI_DVSEC_HEADER2 0x8 /* Vendor-Specific Header2 */ + /* Data Link Feature */ #define PCI_DLF_CAP 0x04 /* Capabilities Register */ #define PCI_DLF_EXCHANGE_ENABLE 0x80000000 /* Data Link Feature Exchange Enable */ -- 2.20.1
Intel Platform Monitoring Technology (PMT) is an architecture for enumerating and accessing hardware monitoring facilities. PMT supports multiple types of monitoring capabilities. This driver creates platform devices for each type so that they may be managed by capability specific drivers (to be introduced). Capabilities are discovered using PCIe DVSEC ids. Support is included for the 3 current capability types, Telemetry, Watcher, and Crashlog. The features are available on new Intel platforms starting from Tiger Lake for which support is added. This patch also adds a quirk mechanism for several early hardware differences and bugs. For Tiger Lake, do not support Watcher and Crashlog capabilities since they will not be compatible with future product. Also, fix use a quirk to fix the discovery table offset. Signed-off-by: David E. Box <david.e.box@linux.intel.com> Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com> --- MAINTAINERS | 5 + drivers/mfd/Kconfig | 10 ++ drivers/mfd/Makefile | 1 + drivers/mfd/intel_pmt.c | 218 ++++++++++++++++++++++++++++++++++++++++ 4 files changed, 234 insertions(+) create mode 100644 drivers/mfd/intel_pmt.c diff --git a/MAINTAINERS b/MAINTAINERS index b4a43a9e7fbc..2e42bf0c41ab 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -8845,6 +8845,11 @@ F: drivers/mfd/intel_soc_pmic* F: include/linux/mfd/intel_msic.h F: include/linux/mfd/intel_soc_pmic* +INTEL PMT DRIVER +M: "David E. Box" <david.e.box@linux.intel.com> +S: Maintained +F: drivers/mfd/intel_pmt.c + INTEL PRO/WIRELESS 2100, 2200BG, 2915ABG NETWORK CONNECTION SUPPORT M: Stanislav Yakovlev <stas.yakovlev@gmail.com> L: linux-wireless@vger.kernel.org diff --git a/drivers/mfd/Kconfig b/drivers/mfd/Kconfig index a37d7d171382..1a62ce2c68d9 100644 --- a/drivers/mfd/Kconfig +++ b/drivers/mfd/Kconfig @@ -670,6 +670,16 @@ config MFD_INTEL_PMC_BXT Register and P-unit access. In addition this creates devices for iTCO watchdog and telemetry that are part of the PMC. +config MFD_INTEL_PMT + tristate "Intel Platform Monitoring Technology support" + depends on PCI + select MFD_CORE + help + The Intel Platform Monitoring Technology (PMT) is an interface that + provides access to hardware monitor registers. This driver supports + Telemetry, Watcher, and Crashlog PMT capabilities/devices for + platforms starting from Tiger Lake. + config MFD_IPAQ_MICRO bool "Atmel Micro ASIC (iPAQ h3100/h3600/h3700) Support" depends on SA1100_H3100 || SA1100_H3600 diff --git a/drivers/mfd/Makefile b/drivers/mfd/Makefile index 9367a92f795a..1961b4737985 100644 --- a/drivers/mfd/Makefile +++ b/drivers/mfd/Makefile @@ -216,6 +216,7 @@ obj-$(CONFIG_MFD_INTEL_LPSS_PCI) += intel-lpss-pci.o obj-$(CONFIG_MFD_INTEL_LPSS_ACPI) += intel-lpss-acpi.o obj-$(CONFIG_MFD_INTEL_MSIC) += intel_msic.o obj-$(CONFIG_MFD_INTEL_PMC_BXT) += intel_pmc_bxt.o +obj-$(CONFIG_MFD_INTEL_PMT) += intel_pmt.o obj-$(CONFIG_MFD_PALMAS) += palmas.o obj-$(CONFIG_MFD_VIPERBOARD) += viperboard.o obj-$(CONFIG_MFD_RC5T583) += rc5t583.o rc5t583-irq.o diff --git a/drivers/mfd/intel_pmt.c b/drivers/mfd/intel_pmt.c new file mode 100644 index 000000000000..0924eca25db0 --- /dev/null +++ b/drivers/mfd/intel_pmt.c @@ -0,0 +1,218 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Intel Platform Monitoring Technology MFD driver + * + * Copyright (c) 2020, Intel Corporation. + * All Rights Reserved. + * + * Authors: David E. Box <david.e.box@linux.intel.com> + */ + +#include <linux/bits.h> +#include <linux/kernel.h> +#include <linux/module.h> +#include <linux/pci.h> +#include <linux/platform_device.h> +#include <linux/pm.h> +#include <linux/pm_runtime.h> +#include <linux/mfd/core.h> +#include <linux/types.h> + +/* Intel DVSEC capability vendor space offsets */ +#define INTEL_DVSEC_ENTRIES 0xA +#define INTEL_DVSEC_SIZE 0xB +#define INTEL_DVSEC_TABLE 0xC +#define INTEL_DVSEC_TABLE_BAR(x) ((x) & GENMASK(2, 0)) +#define INTEL_DVSEC_TABLE_OFFSET(x) ((x) & GENMASK(31, 3)) +#define INTEL_DVSEC_ENTRY_SIZE 4 + +/* PMT capabilities */ +#define DVSEC_INTEL_ID_TELEMETRY 2 +#define DVSEC_INTEL_ID_WATCHER 3 +#define DVSEC_INTEL_ID_CRASHLOG 4 + +#define TELEMETRY_DEV_NAME "pmt_telemetry" +#define WATCHER_DEV_NAME "pmt_watcher" +#define CRASHLOG_DEV_NAME "pmt_crashlog" + +struct intel_dvsec_header { + u16 length; + u16 id; + u8 num_entries; + u8 entry_size; + u8 tbir; + u32 offset; +}; + +enum pmt_quirks { + /* Watcher capability not supported */ + PMT_QUIRK_NO_WATCHER = BIT(0), + + /* Crashlog capability not supported */ + PMT_QUIRK_NO_CRASHLOG = BIT(1), + + /* Use shift instead of mask to read discovery table offset */ + PMT_QUIRK_TABLE_SHIFT = BIT(2), +}; + +struct pmt_platform_info { + unsigned long quirks; +}; + +static const struct pmt_platform_info tgl_info = { + .quirks = PMT_QUIRK_NO_WATCHER | PMT_QUIRK_NO_CRASHLOG | + PMT_QUIRK_TABLE_SHIFT, +}; + +static const struct pmt_platform_info pmt_info = { +}; + +static int +pmt_add_dev(struct pci_dev *pdev, struct intel_dvsec_header *header, + struct pmt_platform_info *info) +{ + struct device *dev = &pdev->dev; + struct resource *res, *tmp; + struct mfd_cell *cell; + const char *name; + int count = header->num_entries; + int size = header->entry_size; + int i; + + switch (header->id) { + case DVSEC_INTEL_ID_TELEMETRY: + name = TELEMETRY_DEV_NAME; + break; + case DVSEC_INTEL_ID_WATCHER: + if (info->quirks & PMT_QUIRK_NO_WATCHER) { + dev_info(dev, "Watcher not supported\n"); + return 0; + } + name = WATCHER_DEV_NAME; + break; + case DVSEC_INTEL_ID_CRASHLOG: + if (info->quirks & PMT_QUIRK_NO_CRASHLOG) { + dev_info(dev, "Crashlog not supported\n"); + return 0; + } + name = CRASHLOG_DEV_NAME; + break; + default: + return -EINVAL; + } + + if (!header->num_entries || !header->entry_size) { + dev_warn(dev, "Invalid count or size for %s header\n", name); + return -EINVAL; + } + + cell = devm_kzalloc(dev, sizeof(*cell), GFP_KERNEL); + if (!cell) + return -ENOMEM; + + res = devm_kcalloc(dev, count, sizeof(*res), GFP_KERNEL); + if (!res) + return -ENOMEM; + + if (info->quirks & PMT_QUIRK_TABLE_SHIFT) + header->offset >>= 3; + + for (i = 0, tmp = res; i < count; i++, tmp++) { + tmp->start = pdev->resource[header->tbir].start + + header->offset + i * (size << 2); + tmp->end = tmp->start + (size << 2) - 1; + tmp->flags = IORESOURCE_MEM; + } + + cell->resources = res; + cell->num_resources = count; + cell->name = name; + + return devm_mfd_add_devices(dev, PLATFORM_DEVID_AUTO, cell, 1, NULL, 0, + NULL); +} + +static int +pmt_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id) +{ + struct intel_dvsec_header header; + struct pmt_platform_info *info; + bool found_devices = false; + int ret, pos = 0; + u32 table; + u16 vid; + + ret = pcim_enable_device(pdev); + if (ret) + return ret; + + info = devm_kmemdup(&pdev->dev, (void *)id->driver_data, sizeof(*info), + GFP_KERNEL); + if (!info) + return -ENOMEM; + + pos = pci_find_next_ext_capability(pdev, pos, PCI_EXT_CAP_ID_DVSEC); + while (pos) { + pci_read_config_word(pdev, pos + PCI_DVSEC_HEADER1, &vid); + if (vid != PCI_VENDOR_ID_INTEL) + continue; + + pci_read_config_word(pdev, pos + PCI_DVSEC_HEADER2, + &header.id); + pci_read_config_byte(pdev, pos + INTEL_DVSEC_ENTRIES, + &header.num_entries); + pci_read_config_byte(pdev, pos + INTEL_DVSEC_SIZE, + &header.entry_size); + pci_read_config_dword(pdev, pos + INTEL_DVSEC_TABLE, + &table); + + header.tbir = INTEL_DVSEC_TABLE_BAR(table); + header.offset = INTEL_DVSEC_TABLE_OFFSET(table); + + ret = pmt_add_dev(pdev, &header, info); + if (ret) + dev_warn(&pdev->dev, + "Failed to add devices for DVSEC id %d\n", + header.id); + found_devices = true; + + pos = pci_find_next_ext_capability(pdev, pos, + PCI_EXT_CAP_ID_DVSEC); + } + + if (!found_devices) { + dev_err(&pdev->dev, "No supported PMT capabilities found.\n"); + return -ENODEV; + } + + pm_runtime_put(&pdev->dev); + pm_runtime_allow(&pdev->dev); + + return 0; +} + +static void pmt_pci_remove(struct pci_dev *pdev) +{ + pm_runtime_forbid(&pdev->dev); + pm_runtime_get_sync(&pdev->dev); +} + +#define PCI_DEVICE_ID_INTEL_PMT_TGL 0x9a0d + +static const struct pci_device_id pmt_pci_ids[] = { + { PCI_DEVICE_DATA(INTEL, PMT_TGL, &tgl_info) }, + { } +}; +MODULE_DEVICE_TABLE(pci, pmt_pci_ids); + +static struct pci_driver pmt_pci_driver = { + .name = "intel-pmt", + .id_table = pmt_pci_ids, + .probe = pmt_pci_probe, + .remove = pmt_pci_remove, +}; +module_pci_driver(pmt_pci_driver); + +MODULE_AUTHOR("David E. Box <david.e.box@linux.intel.com>"); +MODULE_DESCRIPTION("Intel Platform Monitoring Technology MFD driver"); +MODULE_LICENSE("GPL v2"); -- 2.20.1
PMT Telemetry is a capability of the Intel Platform Monitoring Technology. The Telemetry capability provides access to device telemetry metrics that provide hardware performance data to users from continuous, memory mapped, read-only register spaces. Register mappings are not provided by the driver. Instead, a GUID is read from a header for each endpoint. The GUID identifies the device and is to be used with an XML, provided by the vendor, to discover the available set of metrics and their register mapping. This allows firmware updates to modify the register space without needing to update the driver every time with new mappings. Firmware writes a new GUID in this case to specify the new mapping. Software tools with access to the associated XML file can then interpret the changes. This module manages access to all PMT Telemetry endpoints on a system, independent of the device exporting them. It creates a pmt_telemetry class to manage the devices. For each telemetry endpoint, sysfs files provide GUID and size information as well as a pointer to the parent device the telemetry came from. Software may discover the association between endpoints and devices by iterating through the list in sysfs, or by looking for the existence of the class folder under the device of interest. A device node of the same name allows software to then map the telemetry space for direct access. This patch also creates an pci device id list for early telemetry hardware that requires workarounds for known issues. Signed-off-by: David E. Box <david.e.box@linux.intel.com> Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com> --- .../ABI/testing/sysfs-class-pmt_telemetry | 46 ++ MAINTAINERS | 1 + drivers/platform/x86/Kconfig | 10 + drivers/platform/x86/Makefile | 1 + drivers/platform/x86/intel_pmt_telemetry.c | 454 ++++++++++++++++++ 5 files changed, 512 insertions(+) create mode 100644 Documentation/ABI/testing/sysfs-class-pmt_telemetry create mode 100644 drivers/platform/x86/intel_pmt_telemetry.c diff --git a/Documentation/ABI/testing/sysfs-class-pmt_telemetry b/Documentation/ABI/testing/sysfs-class-pmt_telemetry new file mode 100644 index 000000000000..381924549ecb --- /dev/null +++ b/Documentation/ABI/testing/sysfs-class-pmt_telemetry @@ -0,0 +1,46 @@ +What: /sys/class/pmt_telemetry/ +Date: July 2020 +KernelVersion: 5.9 +Contact: David Box <david.e.box@linux.intel.com> +Description: + The pmt_telemetry/ class directory contains information for + devices that expose hardware telemetry using Intel Platform + Monitoring Technology (PMT) + +What: /sys/class/pmt_telemetry/telem<x> +Date: July 2020 +KernelVersion: 5.9 +Contact: David Box <david.e.box@linux.intel.com> +Description: + The telem<x> directory contains files describing an instance of + a PMT telemetry device that exposes hardware telemetry. Each + telem<x> directory has an associated /dev/telem<x> node. This + node may be opened and mapped to access the telemetry space of + the device. The register layout of the telemetry space is + determined from an XML file that matches the pci device id and + guid for the device. + +What: /sys/class/pmt_telemetry/telem<x>/guid +Date: July 2020 +KernelVersion: 5.9 +Contact: David Box <david.e.box@linux.intel.com> +Description: + (RO) The guid for this telemetry device. The guid identifies + the version of the XML file for the parent device that is to + be used to get the register layout. + +What: /sys/class/pmt_telemetry/telem<x>/size +Date: July 2020 +KernelVersion: 5.9 +Contact: David Box <david.e.box@linux.intel.com> +Description: + (RO) The size of telemetry region in bytes that corresponds to + the mapping size for the /dev/telem<x> device node. + +What: /sys/class/pmt_telemetry/telem<x>/offset +Date: July 2020 +KernelVersion: 5.9 +Contact: David Box <david.e.box@linux.intel.com> +Description: + (RO) The offset of telemetry region in bytes that corresponds to + the mapping for the /dev/telem<x> device node. diff --git a/MAINTAINERS b/MAINTAINERS index 2e42bf0c41ab..ebc145894abd 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -8849,6 +8849,7 @@ INTEL PMT DRIVER M: "David E. Box" <david.e.box@linux.intel.com> S: Maintained F: drivers/mfd/intel_pmt.c +F: drivers/platform/x86/intel_pmt_* INTEL PRO/WIRELESS 2100, 2200BG, 2915ABG NETWORK CONNECTION SUPPORT M: Stanislav Yakovlev <stas.yakovlev@gmail.com> diff --git a/drivers/platform/x86/Kconfig b/drivers/platform/x86/Kconfig index 0581a54cf562..5e1f7ce6e69f 100644 --- a/drivers/platform/x86/Kconfig +++ b/drivers/platform/x86/Kconfig @@ -1396,6 +1396,16 @@ config INTEL_TELEMETRY directly via debugfs files. Various tools may use this interface for SoC state monitoring. +config INTEL_PMT_TELEMETRY + tristate "Intel Platform Monitoring Technology (PMT) Telemetry driver" + help + The Intel Platform Monitory Technology (PMT) Telemetry driver provides + access to hardware telemetry metrics on devices that support the + feature. + + For more information, see + <file:Documentation/ABI/testing/sysfs-class-intel_pmt_telem> + endif # X86_PLATFORM_DEVICES config PMC_ATOM diff --git a/drivers/platform/x86/Makefile b/drivers/platform/x86/Makefile index 2b85852a1a87..95cd3d0be17f 100644 --- a/drivers/platform/x86/Makefile +++ b/drivers/platform/x86/Makefile @@ -139,6 +139,7 @@ obj-$(CONFIG_INTEL_MFLD_THERMAL) += intel_mid_thermal.o obj-$(CONFIG_INTEL_MID_POWER_BUTTON) += intel_mid_powerbtn.o obj-$(CONFIG_INTEL_MRFLD_PWRBTN) += intel_mrfld_pwrbtn.o obj-$(CONFIG_INTEL_PMC_CORE) += intel_pmc_core.o intel_pmc_core_pltdrv.o +obj-$(CONFIG_INTEL_PMT_TELEMETRY) += intel_pmt_telemetry.o obj-$(CONFIG_INTEL_PUNIT_IPC) += intel_punit_ipc.o obj-$(CONFIG_INTEL_SCU_IPC) += intel_scu_ipc.o obj-$(CONFIG_INTEL_SCU_PCI) += intel_scu_pcidrv.o diff --git a/drivers/platform/x86/intel_pmt_telemetry.c b/drivers/platform/x86/intel_pmt_telemetry.c new file mode 100644 index 000000000000..e1856fc8c209 --- /dev/null +++ b/drivers/platform/x86/intel_pmt_telemetry.c @@ -0,0 +1,454 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Intel Platform Monitory Technology Telemetry driver + * + * Copyright (c) 2020, Intel Corporation. + * All Rights Reserved. + * + * Author: "David E. Box" <david.e.box@linux.intel.com> + */ + +#include <linux/bits.h> +#include <linux/cdev.h> +#include <linux/io-64-nonatomic-lo-hi.h> +#include <linux/kernel.h> +#include <linux/module.h> +#include <linux/pci.h> +#include <linux/platform_device.h> +#include <linux/slab.h> +#include <linux/types.h> +#include <linux/uaccess.h> +#include <linux/xarray.h> + +#define TELEM_DEV_NAME "pmt_telemetry" + +/* Telemetry access types */ +#define TELEM_ACCESS_FUTURE 1 +#define TELEM_ACCESS_BARID 2 +#define TELEM_ACCESS_LOCAL 3 + +#define TELEM_GUID_OFFSET 0x4 +#define TELEM_BASE_OFFSET 0x8 +#define TELEM_TBIR_MASK GENMASK(2, 0) +#define TELEM_ACCESS(v) ((v) & GENMASK(3, 0)) +#define TELEM_TYPE(v) (((v) & GENMASK(7, 4)) >> 4) +/* size is in bytes */ +#define TELEM_SIZE(v) (((v) & GENMASK(27, 12)) >> 10) + +#define TELEM_XA_START 0 +#define TELEM_XA_MAX INT_MAX +#define TELEM_XA_LIMIT XA_LIMIT(TELEM_XA_START, TELEM_XA_MAX) + +/* Used by client hardware to identify a fixed telemetry entry*/ +#define TELEM_CLIENT_FIXED_BLOCK_GUID 0x10000000 + +static DEFINE_XARRAY_ALLOC(telem_array); + +struct pmt_telem_priv; + +struct telem_header { + u8 access_type; + u8 telem_type; + u16 size; + u32 guid; + u32 base_offset; + u8 tbir; +}; + +struct pmt_telem_entry { + struct pmt_telem_priv *priv; + struct telem_header header; + struct resource *header_res; + unsigned long base_addr; + void __iomem *disc_table; + struct cdev cdev; + dev_t devt; + int devid; +}; + +struct pmt_telem_priv { + struct pmt_telem_entry *entry; + int num_entries; + struct device *dev; +}; + +/* + * devfs + */ +static int pmt_telem_open(struct inode *inode, struct file *filp) +{ + struct pmt_telem_priv *priv; + struct pmt_telem_entry *entry; + struct pci_driver *pci_drv; + struct pci_dev *pci_dev; + + if (!capable(CAP_SYS_ADMIN)) + return -EPERM; + + entry = container_of(inode->i_cdev, struct pmt_telem_entry, cdev); + priv = entry->priv; + pci_dev = to_pci_dev(priv->dev->parent); + + pci_drv = pci_dev_driver(pci_dev); + if (!pci_drv) + return -ENODEV; + + filp->private_data = entry; + get_device(&pci_dev->dev); + + if (!try_module_get(pci_drv->driver.owner)) { + put_device(&pci_dev->dev); + return -ENODEV; + } + + return 0; +} + +static int pmt_telem_release(struct inode *inode, struct file *filp) +{ + struct pmt_telem_entry *entry = filp->private_data; + struct pci_dev *pci_dev = to_pci_dev(entry->priv->dev->parent); + struct pci_driver *pci_drv = pci_dev_driver(pci_dev); + + put_device(&pci_dev->dev); + module_put(pci_drv->driver.owner); + + return 0; +} + +static int pmt_telem_mmap(struct file *filp, struct vm_area_struct *vma) +{ + struct pmt_telem_entry *entry = filp->private_data; + struct pmt_telem_priv *priv; + unsigned long vsize = vma->vm_end - vma->vm_start; + unsigned long phys = entry->base_addr; + unsigned long pfn = PFN_DOWN(phys); + unsigned long psize; + + priv = entry->priv; + psize = (PFN_UP(entry->base_addr + entry->header.size) - pfn) * + PAGE_SIZE; + if (vsize > psize) { + dev_err(priv->dev, "Requested mmap size is too large\n"); + return -EINVAL; + } + + if ((vma->vm_flags & VM_WRITE) || (vma->vm_flags & VM_MAYWRITE)) + return -EPERM; + + vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot); + + if (io_remap_pfn_range(vma, vma->vm_start, pfn, vsize, + vma->vm_page_prot)) + return -EINVAL; + + return 0; +} + +static const struct file_operations pmt_telem_fops = { + .owner = THIS_MODULE, + .open = pmt_telem_open, + .mmap = pmt_telem_mmap, + .release = pmt_telem_release, +}; + +/* + * sysfs + */ +static ssize_t guid_show(struct device *dev, struct device_attribute *attr, + char *buf) +{ + struct pmt_telem_entry *entry = dev_get_drvdata(dev); + + return sprintf(buf, "0x%x\n", entry->header.guid); +} +static DEVICE_ATTR_RO(guid); + +static ssize_t size_show(struct device *dev, struct device_attribute *attr, + char *buf) +{ + struct pmt_telem_entry *entry = dev_get_drvdata(dev); + + /* Display buffer size in bytes */ + return sprintf(buf, "%u\n", entry->header.size); +} +static DEVICE_ATTR_RO(size); + +static ssize_t offset_show(struct device *dev, struct device_attribute *attr, + char *buf) +{ + struct pmt_telem_entry *entry = dev_get_drvdata(dev); + + /* Display buffer offset in bytes */ + return sprintf(buf, "%lu\n", offset_in_page(entry->base_addr)); +} +static DEVICE_ATTR_RO(offset); + +static struct attribute *pmt_telem_attrs[] = { + &dev_attr_guid.attr, + &dev_attr_size.attr, + &dev_attr_offset.attr, + NULL +}; +ATTRIBUTE_GROUPS(pmt_telem); + +struct class pmt_telem_class = { + .owner = THIS_MODULE, + .name = "pmt_telemetry", + .dev_groups = pmt_telem_groups, +}; + +/* + * driver initialization + */ +static const struct pci_device_id pmt_telem_early_client_pci_ids[] = { + { PCI_VDEVICE(INTEL, 0x9a0d) }, /* TGL */ + { } +}; + +static bool pmt_telem_is_early_client_hw(struct device *dev) +{ + struct pci_dev *parent; + + parent = to_pci_dev(dev->parent); + + return !!pci_match_id(pmt_telem_early_client_pci_ids, parent); +} + +static int pmt_telem_create_dev(struct pmt_telem_priv *priv, + struct pmt_telem_entry *entry) +{ + struct pci_dev *pci_dev; + struct device *dev; + int ret; + + cdev_init(&entry->cdev, &pmt_telem_fops); + ret = cdev_add(&entry->cdev, entry->devt, 1); + if (ret) { + dev_err(priv->dev, "Could not add char dev\n"); + return ret; + } + + pci_dev = to_pci_dev(priv->dev->parent); + dev = device_create(&pmt_telem_class, &pci_dev->dev, entry->devt, + entry, "telem%d", entry->devid); + if (IS_ERR(dev)) { + dev_err(priv->dev, "Could not create device node\n"); + cdev_del(&entry->cdev); + } + + return PTR_ERR_OR_ZERO(dev); +} + +static void pmt_telem_populate_header(void __iomem *disc_offset, + struct telem_header *header) +{ + header->access_type = TELEM_ACCESS(readb(disc_offset)); + header->telem_type = TELEM_TYPE(readb(disc_offset)); + header->size = TELEM_SIZE(readl(disc_offset)); + header->guid = readl(disc_offset + TELEM_GUID_OFFSET); + header->base_offset = readl(disc_offset + TELEM_BASE_OFFSET); + + /* + * For non-local access types the lower 3 bits of base offset + * contains the index of the base address register where the + * telemetry can be found. + */ + header->tbir = header->base_offset & TELEM_TBIR_MASK; + header->base_offset ^= header->tbir; +} + +static int pmt_telem_add_entry(struct pmt_telem_priv *priv, + struct pmt_telem_entry *entry) +{ + struct resource *res = entry->header_res; + struct pci_dev *pci_dev = to_pci_dev(priv->dev->parent); + int ret; + + pmt_telem_populate_header(entry->disc_table, &entry->header); + + /* Local access and BARID only for now */ + switch (entry->header.access_type) { + case TELEM_ACCESS_LOCAL: + if (entry->header.tbir) { + dev_err(priv->dev, + "Unsupported BAR index %d for access type %d\n", + entry->header.tbir, entry->header.access_type); + return -EINVAL; + } + + /* + * For access_type LOCAL, the base address is as follows: + * base address = header address + header length + base offset + */ + entry->base_addr = res->start + resource_size(res) + + entry->header.base_offset; + break; + + case TELEM_ACCESS_BARID: + entry->base_addr = pci_dev->resource[entry->header.tbir].start + + entry->header.base_offset; + break; + + default: + dev_err(priv->dev, "Unsupported access type %d\n", + entry->header.access_type); + return -EINVAL; + } + + ret = alloc_chrdev_region(&entry->devt, 0, 1, TELEM_DEV_NAME); + if (ret) { + dev_err(priv->dev, + "PMT telemetry chrdev_region error: %d\n", ret); + return ret; + } + + ret = xa_alloc(&telem_array, &entry->devid, entry, TELEM_XA_LIMIT, + GFP_KERNEL); + if (ret) + goto fail_xa_alloc; + + ret = pmt_telem_create_dev(priv, entry); + if (ret) + goto fail_create_dev; + + entry->priv = priv; + priv->num_entries++; + return 0; + +fail_create_dev: + xa_erase(&telem_array, entry->devid); +fail_xa_alloc: + unregister_chrdev_region(entry->devt, 1); + + return ret; +} + +static bool pmt_telem_region_overlaps(struct platform_device *pdev, + void __iomem *disc_table) +{ + u32 guid; + + guid = readl(disc_table + TELEM_GUID_OFFSET); + + return guid == TELEM_CLIENT_FIXED_BLOCK_GUID; +} + +static void pmt_telem_remove_entries(struct pmt_telem_priv *priv) +{ + int i; + + for (i = 0; i < priv->num_entries; i++) { + device_destroy(&pmt_telem_class, priv->entry[i].devt); + cdev_del(&priv->entry[i].cdev); + xa_erase(&telem_array, priv->entry[i].devid); + unregister_chrdev_region(priv->entry[i].devt, 1); + } +} + +static int pmt_telem_probe(struct platform_device *pdev) +{ + struct pmt_telem_priv *priv; + struct pmt_telem_entry *entry; + bool early_hw; + int i; + + priv = devm_kzalloc(&pdev->dev, sizeof(*priv), GFP_KERNEL); + if (!priv) + return -ENOMEM; + + platform_set_drvdata(pdev, priv); + priv->dev = &pdev->dev; + + priv->entry = devm_kcalloc(&pdev->dev, pdev->num_resources, + sizeof(struct pmt_telem_entry), GFP_KERNEL); + if (!priv->entry) + return -ENOMEM; + + if (pmt_telem_is_early_client_hw(&pdev->dev)) + early_hw = true; + + for (i = 0, entry = priv->entry; i < pdev->num_resources; + i++, entry++) { + int ret; + + entry->header_res = platform_get_resource(pdev, IORESOURCE_MEM, + i); + if (!entry->header_res) { + pmt_telem_remove_entries(priv); + return -ENODEV; + } + + entry->disc_table = devm_platform_ioremap_resource(pdev, i); + if (IS_ERR(entry->disc_table)) { + pmt_telem_remove_entries(priv); + return PTR_ERR(entry->disc_table); + } + + if (pmt_telem_region_overlaps(pdev, entry->disc_table) && + early_hw) + continue; + + ret = pmt_telem_add_entry(priv, entry); + if (ret) { + pmt_telem_remove_entries(priv); + return ret; + } + } + + return 0; +} + +static int pmt_telem_remove(struct platform_device *pdev) +{ + struct pmt_telem_priv *priv = platform_get_drvdata(pdev); + + pmt_telem_remove_entries(priv); + + return 0; +} + +static const struct platform_device_id pmt_telem_table[] = { + { + .name = "pmt_telemetry", + }, + {} +}; +MODULE_DEVICE_TABLE(platform, pmt_telem_table); + +static struct platform_driver pmt_telem_driver = { + .driver = { + .name = TELEM_DEV_NAME, + }, + .probe = pmt_telem_probe, + .remove = pmt_telem_remove, + .id_table = pmt_telem_table, +}; + +static int __init pmt_telem_init(void) +{ + int ret = class_register(&pmt_telem_class); + + if (ret) + return ret; + + ret = platform_driver_register(&pmt_telem_driver); + if (ret) + class_unregister(&pmt_telem_class); + + return ret; +} +module_init(pmt_telem_init); + +static void __exit pmt_telem_exit(void) +{ + platform_driver_unregister(&pmt_telem_driver); + class_unregister(&pmt_telem_class); + xa_destroy(&telem_array); +} +module_exit(pmt_telem_exit); + +MODULE_AUTHOR("David E. Box <david.e.box@linux.intel.com>"); +MODULE_DESCRIPTION("Intel PMT Telemetry driver"); +MODULE_ALIAS("platform:" TELEM_DEV_NAME); +MODULE_LICENSE("GPL v2"); -- 2.20.1
On Tue, Jul 14, 2020 at 9:22 AM David E. Box <david.e.box@linux.intel.com> wrote: > > Add PCIe DVSEC extended capability ID and defines for the header offsets. > Defined in PCIe r5.0, sec 7.9.6. > Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com> > Signed-off-by: David E. Box <david.e.box@linux.intel.com> > Acked-by: Bjorn Helgaas <bhelgaas@google.com> > --- > include/uapi/linux/pci_regs.h | 5 +++++ > 1 file changed, 5 insertions(+) > > diff --git a/include/uapi/linux/pci_regs.h b/include/uapi/linux/pci_regs.h > index f9701410d3b5..09daa9f07b6b 100644 > --- a/include/uapi/linux/pci_regs.h > +++ b/include/uapi/linux/pci_regs.h > @@ -720,6 +720,7 @@ > #define PCI_EXT_CAP_ID_DPC 0x1D /* Downstream Port Containment */ > #define PCI_EXT_CAP_ID_L1SS 0x1E /* L1 PM Substates */ > #define PCI_EXT_CAP_ID_PTM 0x1F /* Precision Time Measurement */ > +#define PCI_EXT_CAP_ID_DVSEC 0x23 /* Designated Vendor-Specific */ > #define PCI_EXT_CAP_ID_DLF 0x25 /* Data Link Feature */ > #define PCI_EXT_CAP_ID_PL_16GT 0x26 /* Physical Layer 16.0 GT/s */ > #define PCI_EXT_CAP_ID_MAX PCI_EXT_CAP_ID_PL_16GT > @@ -1062,6 +1063,10 @@ > #define PCI_L1SS_CTL1_LTR_L12_TH_SCALE 0xe0000000 /* LTR_L1.2_THRESHOLD_Scale */ > #define PCI_L1SS_CTL2 0x0c /* Control 2 Register */ > > +/* Designated Vendor-Specific (DVSEC, PCI_EXT_CAP_ID_DVSEC) */ > +#define PCI_DVSEC_HEADER1 0x4 /* Vendor-Specific Header1 */ > +#define PCI_DVSEC_HEADER2 0x8 /* Vendor-Specific Header2 */ > + > /* Data Link Feature */ > #define PCI_DLF_CAP 0x04 /* Capabilities Register */ > #define PCI_DLF_EXCHANGE_ENABLE 0x80000000 /* Data Link Feature Exchange Enable */ > -- > 2.20.1 > -- With Best Regards, Andy Shevchenko
On Tue, Jul 14, 2020 at 9:22 AM David E. Box <david.e.box@linux.intel.com> wrote: > > PMT Telemetry is a capability of the Intel Platform Monitoring Technology. > The Telemetry capability provides access to device telemetry metrics that > provide hardware performance data to users from continuous, memory mapped, > read-only register spaces. > > Register mappings are not provided by the driver. Instead, a GUID is read > from a header for each endpoint. The GUID identifies the device and is to > be used with an XML, provided by the vendor, to discover the available set > of metrics and their register mapping. This allows firmware updates to > modify the register space without needing to update the driver every time > with new mappings. Firmware writes a new GUID in this case to specify the > new mapping. Software tools with access to the associated XML file can > then interpret the changes. > > This module manages access to all PMT Telemetry endpoints on a system, > independent of the device exporting them. It creates a pmt_telemetry class > to manage the devices. For each telemetry endpoint, sysfs files provide > GUID and size information as well as a pointer to the parent device the > telemetry came from. Software may discover the association between > endpoints and devices by iterating through the list in sysfs, or by looking > for the existence of the class folder under the device of interest. A > device node of the same name allows software to then map the telemetry > space for direct access. > > This patch also creates an pci device id list for early telemetry hardware > that requires workarounds for known issues. Some more style issues, after addressing feel free to add Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com> > Signed-off-by: David E. Box <david.e.box@linux.intel.com> > Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com> Since you are submitting this the order of the above SoB chain is a bit strange. I think something like SoB: Alexander Co-developed-by: Alexander SoB: David is expected (same for patch 2). ... > +Contact: David Box <david.e.box@linux.intel.com> > +Description: > + The telem<x> directory contains files describing an instance of > + a PMT telemetry device that exposes hardware telemetry. Each > + telem<x> directory has an associated /dev/telem<x> node. This > + node may be opened and mapped to access the telemetry space of > + the device. The register layout of the telemetry space is > + determined from an XML file that matches the pci device id and PCI > + guid for the device. GUID Same for all code where it appears. ... > + psize = (PFN_UP(entry->base_addr + entry->header.size) - pfn) * > + PAGE_SIZE; I wouldn't mind having this on one line. ... > +static ssize_t guid_show(struct device *dev, struct device_attribute *attr, > + char *buf) Ditto. ... > +static ssize_t offset_show(struct device *dev, struct device_attribute *attr, > + char *buf) Ditto. ... > +static bool pmt_telem_is_early_client_hw(struct device *dev) > +{ > + struct pci_dev *parent; > + > + parent = to_pci_dev(dev->parent); Can be one line. > + return !!pci_match_id(pmt_telem_early_client_pci_ids, parent); > +} ... > + entry->header_res = platform_get_resource(pdev, IORESOURCE_MEM, > + i); One line, please. -- With Best Regards, Andy Shevchenko
Hi David, On 14.07.2020 9:23, David E. Box wrote: > PMT Telemetry is a capability of the Intel Platform Monitoring Technology. > The Telemetry capability provides access to device telemetry metrics that > provide hardware performance data to users from continuous, memory mapped, > read-only register spaces. > > Register mappings are not provided by the driver. Instead, a GUID is read > from a header for each endpoint. The GUID identifies the device and is to > be used with an XML, provided by the vendor, to discover the available set > of metrics and their register mapping. This allows firmware updates to > modify the register space without needing to update the driver every time > with new mappings. Firmware writes a new GUID in this case to specify the > new mapping. Software tools with access to the associated XML file can > then interpret the changes. > > This module manages access to all PMT Telemetry endpoints on a system, > independent of the device exporting them. It creates a pmt_telemetry class > to manage the devices. For each telemetry endpoint, sysfs files provide > GUID and size information as well as a pointer to the parent device the > telemetry came from. Software may discover the association between > endpoints and devices by iterating through the list in sysfs, or by looking > for the existence of the class folder under the device of interest. A > device node of the same name allows software to then map the telemetry > space for direct access. > > This patch also creates an pci device id list for early telemetry hardware > that requires workarounds for known issues. > > Signed-off-by: David E. Box <david.e.box@linux.intel.com> > Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com> > --- > .../ABI/testing/sysfs-class-pmt_telemetry | 46 ++ > MAINTAINERS | 1 + > drivers/platform/x86/Kconfig | 10 + > drivers/platform/x86/Makefile | 1 + > drivers/platform/x86/intel_pmt_telemetry.c | 454 ++++++++++++++++++ > 5 files changed, 512 insertions(+) > create mode 100644 Documentation/ABI/testing/sysfs-class-pmt_telemetry > create mode 100644 drivers/platform/x86/intel_pmt_telemetry.c > > diff --git a/Documentation/ABI/testing/sysfs-class-pmt_telemetry b/Documentation/ABI/testing/sysfs-class-pmt_telemetry > new file mode 100644 > index 000000000000..381924549ecb > --- /dev/null > +++ b/Documentation/ABI/testing/sysfs-class-pmt_telemetry > @@ -0,0 +1,46 @@ > +What: /sys/class/pmt_telemetry/ > +Date: July 2020 > +KernelVersion: 5.9 > +Contact: David Box <david.e.box@linux.intel.com> > +Description: > + The pmt_telemetry/ class directory contains information for > + devices that expose hardware telemetry using Intel Platform > + Monitoring Technology (PMT) > + > +What: /sys/class/pmt_telemetry/telem<x> > +Date: July 2020 > +KernelVersion: 5.9 > +Contact: David Box <david.e.box@linux.intel.com> > +Description: > + The telem<x> directory contains files describing an instance of > + a PMT telemetry device that exposes hardware telemetry. Each > + telem<x> directory has an associated /dev/telem<x> node. This > + node may be opened and mapped to access the telemetry space of > + the device. The register layout of the telemetry space is > + determined from an XML file that matches the pci device id and > + guid for the device. > + > +What: /sys/class/pmt_telemetry/telem<x>/guid > +Date: July 2020 > +KernelVersion: 5.9 > +Contact: David Box <david.e.box@linux.intel.com> > +Description: > + (RO) The guid for this telemetry device. The guid identifies > + the version of the XML file for the parent device that is to > + be used to get the register layout. > + > +What: /sys/class/pmt_telemetry/telem<x>/size > +Date: July 2020 > +KernelVersion: 5.9 > +Contact: David Box <david.e.box@linux.intel.com> > +Description: > + (RO) The size of telemetry region in bytes that corresponds to > + the mapping size for the /dev/telem<x> device node. > + > +What: /sys/class/pmt_telemetry/telem<x>/offset > +Date: July 2020 > +KernelVersion: 5.9 > +Contact: David Box <david.e.box@linux.intel.com> > +Description: > + (RO) The offset of telemetry region in bytes that corresponds to > + the mapping for the /dev/telem<x> device node. > diff --git a/MAINTAINERS b/MAINTAINERS > index 2e42bf0c41ab..ebc145894abd 100644 > --- a/MAINTAINERS > +++ b/MAINTAINERS > @@ -8849,6 +8849,7 @@ INTEL PMT DRIVER > M: "David E. Box" <david.e.box@linux.intel.com> > S: Maintained > F: drivers/mfd/intel_pmt.c > +F: drivers/platform/x86/intel_pmt_* > > INTEL PRO/WIRELESS 2100, 2200BG, 2915ABG NETWORK CONNECTION SUPPORT > M: Stanislav Yakovlev <stas.yakovlev@gmail.com> > diff --git a/drivers/platform/x86/Kconfig b/drivers/platform/x86/Kconfig > index 0581a54cf562..5e1f7ce6e69f 100644 > --- a/drivers/platform/x86/Kconfig > +++ b/drivers/platform/x86/Kconfig > @@ -1396,6 +1396,16 @@ config INTEL_TELEMETRY > directly via debugfs files. Various tools may use > this interface for SoC state monitoring. > > +config INTEL_PMT_TELEMETRY > + tristate "Intel Platform Monitoring Technology (PMT) Telemetry driver" > + help > + The Intel Platform Monitory Technology (PMT) Telemetry driver provides > + access to hardware telemetry metrics on devices that support the > + feature. > + > + For more information, see > + <file:Documentation/ABI/testing/sysfs-class-intel_pmt_telem> > + > endif # X86_PLATFORM_DEVICES > > config PMC_ATOM > diff --git a/drivers/platform/x86/Makefile b/drivers/platform/x86/Makefile > index 2b85852a1a87..95cd3d0be17f 100644 > --- a/drivers/platform/x86/Makefile > +++ b/drivers/platform/x86/Makefile > @@ -139,6 +139,7 @@ obj-$(CONFIG_INTEL_MFLD_THERMAL) += intel_mid_thermal.o > obj-$(CONFIG_INTEL_MID_POWER_BUTTON) += intel_mid_powerbtn.o > obj-$(CONFIG_INTEL_MRFLD_PWRBTN) += intel_mrfld_pwrbtn.o > obj-$(CONFIG_INTEL_PMC_CORE) += intel_pmc_core.o intel_pmc_core_pltdrv.o > +obj-$(CONFIG_INTEL_PMT_TELEMETRY) += intel_pmt_telemetry.o > obj-$(CONFIG_INTEL_PUNIT_IPC) += intel_punit_ipc.o > obj-$(CONFIG_INTEL_SCU_IPC) += intel_scu_ipc.o > obj-$(CONFIG_INTEL_SCU_PCI) += intel_scu_pcidrv.o > diff --git a/drivers/platform/x86/intel_pmt_telemetry.c b/drivers/platform/x86/intel_pmt_telemetry.c > new file mode 100644 > index 000000000000..e1856fc8c209 > --- /dev/null > +++ b/drivers/platform/x86/intel_pmt_telemetry.c > @@ -0,0 +1,454 @@ > +// SPDX-License-Identifier: GPL-2.0 > +/* > + * Intel Platform Monitory Technology Telemetry driver > + * > + * Copyright (c) 2020, Intel Corporation. > + * All Rights Reserved. > + * > + * Author: "David E. Box" <david.e.box@linux.intel.com> > + */ > + > +#include <linux/bits.h> > +#include <linux/cdev.h> > +#include <linux/io-64-nonatomic-lo-hi.h> > +#include <linux/kernel.h> > +#include <linux/module.h> > +#include <linux/pci.h> > +#include <linux/platform_device.h> > +#include <linux/slab.h> > +#include <linux/types.h> > +#include <linux/uaccess.h> > +#include <linux/xarray.h> > + > +#define TELEM_DEV_NAME "pmt_telemetry" > + > +/* Telemetry access types */ > +#define TELEM_ACCESS_FUTURE 1 > +#define TELEM_ACCESS_BARID 2 > +#define TELEM_ACCESS_LOCAL 3 > + > +#define TELEM_GUID_OFFSET 0x4 > +#define TELEM_BASE_OFFSET 0x8 > +#define TELEM_TBIR_MASK GENMASK(2, 0) > +#define TELEM_ACCESS(v) ((v) & GENMASK(3, 0)) > +#define TELEM_TYPE(v) (((v) & GENMASK(7, 4)) >> 4) > +/* size is in bytes */ > +#define TELEM_SIZE(v) (((v) & GENMASK(27, 12)) >> 10) > + > +#define TELEM_XA_START 0 > +#define TELEM_XA_MAX INT_MAX > +#define TELEM_XA_LIMIT XA_LIMIT(TELEM_XA_START, TELEM_XA_MAX) > + > +/* Used by client hardware to identify a fixed telemetry entry*/ > +#define TELEM_CLIENT_FIXED_BLOCK_GUID 0x10000000 > + > +static DEFINE_XARRAY_ALLOC(telem_array); > + > +struct pmt_telem_priv; > + > +struct telem_header { > + u8 access_type; > + u8 telem_type; > + u16 size; > + u32 guid; > + u32 base_offset; > + u8 tbir; > +}; > + > +struct pmt_telem_entry { > + struct pmt_telem_priv *priv; > + struct telem_header header; > + struct resource *header_res; > + unsigned long base_addr; > + void __iomem *disc_table; > + struct cdev cdev; > + dev_t devt; > + int devid; > +}; > + > +struct pmt_telem_priv { > + struct pmt_telem_entry *entry; > + int num_entries; > + struct device *dev; > +}; > + > +/* > + * devfs > + */ > +static int pmt_telem_open(struct inode *inode, struct file *filp) > +{ > + struct pmt_telem_priv *priv; > + struct pmt_telem_entry *entry; > + struct pci_driver *pci_drv; > + struct pci_dev *pci_dev; > + > + if (!capable(CAP_SYS_ADMIN)) Thanks for supplying these patches. Are there any reasons not to expose this feature to CAP_PERFMON privileged processes too that currently have access to performance monitoring features of the kernel without root/CAP_SYS_ADMIN credentials? This could be done by pefmon_capable() function call starting from v5.8+. Thanks, Alexei > + return -EPERM; > + > + entry = container_of(inode->i_cdev, struct pmt_telem_entry, cdev); > + priv = entry->priv; > + pci_dev = to_pci_dev(priv->dev->parent); > + > + pci_drv = pci_dev_driver(pci_dev); > + if (!pci_drv) > + return -ENODEV; > + > + filp->private_data = entry; > + get_device(&pci_dev->dev); > + > + if (!try_module_get(pci_drv->driver.owner)) { > + put_device(&pci_dev->dev); > + return -ENODEV; > + } > + > + return 0; > +} > + > +static int pmt_telem_release(struct inode *inode, struct file *filp) > +{ > + struct pmt_telem_entry *entry = filp->private_data; > + struct pci_dev *pci_dev = to_pci_dev(entry->priv->dev->parent); > + struct pci_driver *pci_drv = pci_dev_driver(pci_dev); > + > + put_device(&pci_dev->dev); > + module_put(pci_drv->driver.owner); > + > + return 0; > +} > + > +static int pmt_telem_mmap(struct file *filp, struct vm_area_struct *vma) > +{ > + struct pmt_telem_entry *entry = filp->private_data; > + struct pmt_telem_priv *priv; > + unsigned long vsize = vma->vm_end - vma->vm_start; > + unsigned long phys = entry->base_addr; > + unsigned long pfn = PFN_DOWN(phys); > + unsigned long psize; > + > + priv = entry->priv; > + psize = (PFN_UP(entry->base_addr + entry->header.size) - pfn) * > + PAGE_SIZE; > + if (vsize > psize) { > + dev_err(priv->dev, "Requested mmap size is too large\n"); > + return -EINVAL; > + } > + > + if ((vma->vm_flags & VM_WRITE) || (vma->vm_flags & VM_MAYWRITE)) > + return -EPERM; > + > + vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot); > + > + if (io_remap_pfn_range(vma, vma->vm_start, pfn, vsize, > + vma->vm_page_prot)) > + return -EINVAL; > + > + return 0; > +} > + > +static const struct file_operations pmt_telem_fops = { > + .owner = THIS_MODULE, > + .open = pmt_telem_open, > + .mmap = pmt_telem_mmap, > + .release = pmt_telem_release, > +}; > + > +/* > + * sysfs > + */ > +static ssize_t guid_show(struct device *dev, struct device_attribute *attr, > + char *buf) > +{ > + struct pmt_telem_entry *entry = dev_get_drvdata(dev); > + > + return sprintf(buf, "0x%x\n", entry->header.guid); > +} > +static DEVICE_ATTR_RO(guid); > + > +static ssize_t size_show(struct device *dev, struct device_attribute *attr, > + char *buf) > +{ > + struct pmt_telem_entry *entry = dev_get_drvdata(dev); > + > + /* Display buffer size in bytes */ > + return sprintf(buf, "%u\n", entry->header.size); > +} > +static DEVICE_ATTR_RO(size); > + > +static ssize_t offset_show(struct device *dev, struct device_attribute *attr, > + char *buf) > +{ > + struct pmt_telem_entry *entry = dev_get_drvdata(dev); > + > + /* Display buffer offset in bytes */ > + return sprintf(buf, "%lu\n", offset_in_page(entry->base_addr)); > +} > +static DEVICE_ATTR_RO(offset); > + > +static struct attribute *pmt_telem_attrs[] = { > + &dev_attr_guid.attr, > + &dev_attr_size.attr, > + &dev_attr_offset.attr, > + NULL > +}; > +ATTRIBUTE_GROUPS(pmt_telem); > + > +struct class pmt_telem_class = { > + .owner = THIS_MODULE, > + .name = "pmt_telemetry", > + .dev_groups = pmt_telem_groups, > +}; > + > +/* > + * driver initialization > + */ > +static const struct pci_device_id pmt_telem_early_client_pci_ids[] = { > + { PCI_VDEVICE(INTEL, 0x9a0d) }, /* TGL */ > + { } > +}; > + > +static bool pmt_telem_is_early_client_hw(struct device *dev) > +{ > + struct pci_dev *parent; > + > + parent = to_pci_dev(dev->parent); > + > + return !!pci_match_id(pmt_telem_early_client_pci_ids, parent); > +} > + > +static int pmt_telem_create_dev(struct pmt_telem_priv *priv, > + struct pmt_telem_entry *entry) > +{ > + struct pci_dev *pci_dev; > + struct device *dev; > + int ret; > + > + cdev_init(&entry->cdev, &pmt_telem_fops); > + ret = cdev_add(&entry->cdev, entry->devt, 1); > + if (ret) { > + dev_err(priv->dev, "Could not add char dev\n"); > + return ret; > + } > + > + pci_dev = to_pci_dev(priv->dev->parent); > + dev = device_create(&pmt_telem_class, &pci_dev->dev, entry->devt, > + entry, "telem%d", entry->devid); > + if (IS_ERR(dev)) { > + dev_err(priv->dev, "Could not create device node\n"); > + cdev_del(&entry->cdev); > + } > + > + return PTR_ERR_OR_ZERO(dev); > +} > + > +static void pmt_telem_populate_header(void __iomem *disc_offset, > + struct telem_header *header) > +{ > + header->access_type = TELEM_ACCESS(readb(disc_offset)); > + header->telem_type = TELEM_TYPE(readb(disc_offset)); > + header->size = TELEM_SIZE(readl(disc_offset)); > + header->guid = readl(disc_offset + TELEM_GUID_OFFSET); > + header->base_offset = readl(disc_offset + TELEM_BASE_OFFSET); > + > + /* > + * For non-local access types the lower 3 bits of base offset > + * contains the index of the base address register where the > + * telemetry can be found. > + */ > + header->tbir = header->base_offset & TELEM_TBIR_MASK; > + header->base_offset ^= header->tbir; > +} > + > +static int pmt_telem_add_entry(struct pmt_telem_priv *priv, > + struct pmt_telem_entry *entry) > +{ > + struct resource *res = entry->header_res; > + struct pci_dev *pci_dev = to_pci_dev(priv->dev->parent); > + int ret; > + > + pmt_telem_populate_header(entry->disc_table, &entry->header); > + > + /* Local access and BARID only for now */ > + switch (entry->header.access_type) { > + case TELEM_ACCESS_LOCAL: > + if (entry->header.tbir) { > + dev_err(priv->dev, > + "Unsupported BAR index %d for access type %d\n", > + entry->header.tbir, entry->header.access_type); > + return -EINVAL; > + } > + > + /* > + * For access_type LOCAL, the base address is as follows: > + * base address = header address + header length + base offset > + */ > + entry->base_addr = res->start + resource_size(res) + > + entry->header.base_offset; > + break; > + > + case TELEM_ACCESS_BARID: > + entry->base_addr = pci_dev->resource[entry->header.tbir].start + > + entry->header.base_offset; > + break; > + > + default: > + dev_err(priv->dev, "Unsupported access type %d\n", > + entry->header.access_type); > + return -EINVAL; > + } > + > + ret = alloc_chrdev_region(&entry->devt, 0, 1, TELEM_DEV_NAME); > + if (ret) { > + dev_err(priv->dev, > + "PMT telemetry chrdev_region error: %d\n", ret); > + return ret; > + } > + > + ret = xa_alloc(&telem_array, &entry->devid, entry, TELEM_XA_LIMIT, > + GFP_KERNEL); > + if (ret) > + goto fail_xa_alloc; > + > + ret = pmt_telem_create_dev(priv, entry); > + if (ret) > + goto fail_create_dev; > + > + entry->priv = priv; > + priv->num_entries++; > + return 0; > + > +fail_create_dev: > + xa_erase(&telem_array, entry->devid); > +fail_xa_alloc: > + unregister_chrdev_region(entry->devt, 1); > + > + return ret; > +} > + > +static bool pmt_telem_region_overlaps(struct platform_device *pdev, > + void __iomem *disc_table) > +{ > + u32 guid; > + > + guid = readl(disc_table + TELEM_GUID_OFFSET); > + > + return guid == TELEM_CLIENT_FIXED_BLOCK_GUID; > +} > + > +static void pmt_telem_remove_entries(struct pmt_telem_priv *priv) > +{ > + int i; > + > + for (i = 0; i < priv->num_entries; i++) { > + device_destroy(&pmt_telem_class, priv->entry[i].devt); > + cdev_del(&priv->entry[i].cdev); > + xa_erase(&telem_array, priv->entry[i].devid); > + unregister_chrdev_region(priv->entry[i].devt, 1); > + } > +} > + > +static int pmt_telem_probe(struct platform_device *pdev) > +{ > + struct pmt_telem_priv *priv; > + struct pmt_telem_entry *entry; > + bool early_hw; > + int i; > + > + priv = devm_kzalloc(&pdev->dev, sizeof(*priv), GFP_KERNEL); > + if (!priv) > + return -ENOMEM; > + > + platform_set_drvdata(pdev, priv); > + priv->dev = &pdev->dev; > + > + priv->entry = devm_kcalloc(&pdev->dev, pdev->num_resources, > + sizeof(struct pmt_telem_entry), GFP_KERNEL); > + if (!priv->entry) > + return -ENOMEM; > + > + if (pmt_telem_is_early_client_hw(&pdev->dev)) > + early_hw = true; > + > + for (i = 0, entry = priv->entry; i < pdev->num_resources; > + i++, entry++) { > + int ret; > + > + entry->header_res = platform_get_resource(pdev, IORESOURCE_MEM, > + i); > + if (!entry->header_res) { > + pmt_telem_remove_entries(priv); > + return -ENODEV; > + } > + > + entry->disc_table = devm_platform_ioremap_resource(pdev, i); > + if (IS_ERR(entry->disc_table)) { > + pmt_telem_remove_entries(priv); > + return PTR_ERR(entry->disc_table); > + } > + > + if (pmt_telem_region_overlaps(pdev, entry->disc_table) && > + early_hw) > + continue; > + > + ret = pmt_telem_add_entry(priv, entry); > + if (ret) { > + pmt_telem_remove_entries(priv); > + return ret; > + } > + } > + > + return 0; > +} > + > +static int pmt_telem_remove(struct platform_device *pdev) > +{ > + struct pmt_telem_priv *priv = platform_get_drvdata(pdev); > + > + pmt_telem_remove_entries(priv); > + > + return 0; > +} > + > +static const struct platform_device_id pmt_telem_table[] = { > + { > + .name = "pmt_telemetry", > + }, > + {} > +}; > +MODULE_DEVICE_TABLE(platform, pmt_telem_table); > + > +static struct platform_driver pmt_telem_driver = { > + .driver = { > + .name = TELEM_DEV_NAME, > + }, > + .probe = pmt_telem_probe, > + .remove = pmt_telem_remove, > + .id_table = pmt_telem_table, > +}; > + > +static int __init pmt_telem_init(void) > +{ > + int ret = class_register(&pmt_telem_class); > + > + if (ret) > + return ret; > + > + ret = platform_driver_register(&pmt_telem_driver); > + if (ret) > + class_unregister(&pmt_telem_class); > + > + return ret; > +} > +module_init(pmt_telem_init); > + > +static void __exit pmt_telem_exit(void) > +{ > + platform_driver_unregister(&pmt_telem_driver); > + class_unregister(&pmt_telem_class); > + xa_destroy(&telem_array); > +} > +module_exit(pmt_telem_exit); > + > +MODULE_AUTHOR("David E. Box <david.e.box@linux.intel.com>"); > +MODULE_DESCRIPTION("Intel PMT Telemetry driver"); > +MODULE_ALIAS("platform:" TELEM_DEV_NAME); > +MODULE_LICENSE("GPL v2"); >
On Wed, 2020-07-15 at 10:39 +0300, Alexey Budankov wrote: > Hi David, > > On 14.07.2020 9:23, David E. Box wrote: ... > > > > +static int pmt_telem_open(struct inode *inode, struct file *filp) > > +{ > > + struct pmt_telem_priv *priv; > > + struct pmt_telem_entry *entry; > > + struct pci_driver *pci_drv; > > + struct pci_dev *pci_dev; > > + > > + if (!capable(CAP_SYS_ADMIN)) > > Thanks for supplying these patches. > Are there any reasons not to expose this feature to CAP_PERFMON > privileged > processes too that currently have access to performance monitoring > features > of the kernel without root/CAP_SYS_ADMIN credentials? This could be > done by > pefmon_capable() function call starting from v5.8+. The new capability is well suited for this feature. I'll make the change. Thanks. David
On 7/13/20 11:23 PM, David E. Box wrote:
> Add PCIe DVSEC extended capability ID and defines for the header offsets.
> Defined in PCIe r5.0, sec 7.9.6.
>
> Signed-off-by: David E. Box <david.e.box@linux.intel.com>
> Acked-by: Bjorn Helgaas <bhelgaas@google.com>
> ---
> include/uapi/linux/pci_regs.h | 5 +++++
> 1 file changed, 5 insertions(+)
>
> diff --git a/include/uapi/linux/pci_regs.h b/include/uapi/linux/pci_regs.h
> index f9701410d3b5..09daa9f07b6b 100644
> --- a/include/uapi/linux/pci_regs.h
> +++ b/include/uapi/linux/pci_regs.h
> @@ -720,6 +720,7 @@
> +#define PCI_EXT_CAP_ID_DVSEC 0x23 /* Designated Vendor-Specific */
> @@ -1062,6 +1063,10 @@
> +/* Designated Vendor-Specific (DVSEC, PCI_EXT_CAP_ID_DVSEC) */
> +#define PCI_DVSEC_HEADER1 0x4 /* Vendor-Specific Header1 */
> +#define PCI_DVSEC_HEADER2 0x8 /* Vendor-Specific Header2 */
Just a little comment: It would make more sense to me to
s/DVSEC/DVSPEC/g.
But then I don't have the PCIe documentation.
--
~Randy
On 7/13/20 11:23 PM, David E. Box wrote:
> diff --git a/drivers/platform/x86/Kconfig b/drivers/platform/x86/Kconfig
> index 0581a54cf562..5e1f7ce6e69f 100644
> --- a/drivers/platform/x86/Kconfig
> +++ b/drivers/platform/x86/Kconfig
> @@ -1396,6 +1396,16 @@ config INTEL_TELEMETRY
> directly via debugfs files. Various tools may use
> this interface for SoC state monitoring.
>
> +config INTEL_PMT_TELEMETRY
> + tristate "Intel Platform Monitoring Technology (PMT) Telemetry driver"
> + help
> + The Intel Platform Monitory Technology (PMT) Telemetry driver provides
> + access to hardware telemetry metrics on devices that support the
> + feature.
> +
> + For more information, see
> + <file:Documentation/ABI/testing/sysfs-class-intel_pmt_telem>
> +
> endif # X86_PLATFORM_DEVICES
>
> config PMC_ATOM
The text under "help" should be indented with one tab + 2 spaces,
as is done in patch 2/3.
--
~Randy
On 16.07.2020 2:59, David E. Box wrote:
> On Wed, 2020-07-15 at 10:39 +0300, Alexey Budankov wrote:
>> Hi David,
>>
>> On 14.07.2020 9:23, David E. Box wrote:
>
> ...
>
>>>
>>> +static int pmt_telem_open(struct inode *inode, struct file *filp)
>>> +{
>>> + struct pmt_telem_priv *priv;
>>> + struct pmt_telem_entry *entry;
>>> + struct pci_driver *pci_drv;
>>> + struct pci_dev *pci_dev;
>>> +
>>> + if (!capable(CAP_SYS_ADMIN))
>>
>> Thanks for supplying these patches.
>> Are there any reasons not to expose this feature to CAP_PERFMON
>> privileged
>> processes too that currently have access to performance monitoring
>> features
>> of the kernel without root/CAP_SYS_ADMIN credentials? This could be
>> done by
>> pefmon_capable() function call starting from v5.8+.
>
> The new capability is well suited for this feature. I'll make the
> change. Thanks.
I appreciate your cooperation. Thanks!
Alexei
On Wed, Jul 15, 2020 at 07:55:11PM -0700, Randy Dunlap wrote:
> On 7/13/20 11:23 PM, David E. Box wrote:
> > Add PCIe DVSEC extended capability ID and defines for the header offsets.
> > Defined in PCIe r5.0, sec 7.9.6.
> >
> > Signed-off-by: David E. Box <david.e.box@linux.intel.com>
> > Acked-by: Bjorn Helgaas <bhelgaas@google.com>
> > ---
> > include/uapi/linux/pci_regs.h | 5 +++++
> > 1 file changed, 5 insertions(+)
> >
> > diff --git a/include/uapi/linux/pci_regs.h b/include/uapi/linux/pci_regs.h
> > index f9701410d3b5..09daa9f07b6b 100644
> > --- a/include/uapi/linux/pci_regs.h
> > +++ b/include/uapi/linux/pci_regs.h
> > @@ -720,6 +720,7 @@
> > +#define PCI_EXT_CAP_ID_DVSEC 0x23 /* Designated Vendor-Specific */
> > @@ -1062,6 +1063,10 @@
> > +/* Designated Vendor-Specific (DVSEC, PCI_EXT_CAP_ID_DVSEC) */
> > +#define PCI_DVSEC_HEADER1 0x4 /* Vendor-Specific Header1 */
> > +#define PCI_DVSEC_HEADER2 0x8 /* Vendor-Specific Header2 */
>
> Just a little comment: It would make more sense to me to
> s/DVSEC/DVSPEC/g.
Yeah, that is confusing, but "DVSEC" is the term used in the spec. I
think it stands for "Designated Vendor-Specific Extended Capability".
On 7/16/20 8:07 AM, Bjorn Helgaas wrote:
> On Wed, Jul 15, 2020 at 07:55:11PM -0700, Randy Dunlap wrote:
>> On 7/13/20 11:23 PM, David E. Box wrote:
>>> Add PCIe DVSEC extended capability ID and defines for the header offsets.
>>> Defined in PCIe r5.0, sec 7.9.6.
>>>
>>> Signed-off-by: David E. Box <david.e.box@linux.intel.com>
>>> Acked-by: Bjorn Helgaas <bhelgaas@google.com>
>>> ---
>>> include/uapi/linux/pci_regs.h | 5 +++++
>>> 1 file changed, 5 insertions(+)
>>>
>>> diff --git a/include/uapi/linux/pci_regs.h b/include/uapi/linux/pci_regs.h
>>> index f9701410d3b5..09daa9f07b6b 100644
>>> --- a/include/uapi/linux/pci_regs.h
>>> +++ b/include/uapi/linux/pci_regs.h
>>> @@ -720,6 +720,7 @@
>>> +#define PCI_EXT_CAP_ID_DVSEC 0x23 /* Designated Vendor-Specific */
>>> @@ -1062,6 +1063,10 @@
>>> +/* Designated Vendor-Specific (DVSEC, PCI_EXT_CAP_ID_DVSEC) */
>>> +#define PCI_DVSEC_HEADER1 0x4 /* Vendor-Specific Header1 */
>>> +#define PCI_DVSEC_HEADER2 0x8 /* Vendor-Specific Header2 */
>>
>> Just a little comment: It would make more sense to me to
>> s/DVSEC/DVSPEC/g.
>
> Yeah, that is confusing, but "DVSEC" is the term used in the spec. I
> think it stands for "Designated Vendor-Specific Extended Capability".
Right. I noticed that after I sent the email.
thanks.
--
~Randy
On 7/15/2020 7:55 PM, Randy Dunlap wrote:
> On 7/13/20 11:23 PM, David E. Box wrote:
>> Add PCIe DVSEC extended capability ID and defines for the header offsets.
>> Defined in PCIe r5.0, sec 7.9.6.
>>
>> Signed-off-by: David E. Box <david.e.box@linux.intel.com>
>> Acked-by: Bjorn Helgaas <bhelgaas@google.com>
>> ---
>> include/uapi/linux/pci_regs.h | 5 +++++
>> 1 file changed, 5 insertions(+)
>>
>> diff --git a/include/uapi/linux/pci_regs.h b/include/uapi/linux/pci_regs.h
>> index f9701410d3b5..09daa9f07b6b 100644
>> --- a/include/uapi/linux/pci_regs.h
>> +++ b/include/uapi/linux/pci_regs.h
>> @@ -720,6 +720,7 @@
>> +#define PCI_EXT_CAP_ID_DVSEC 0x23 /* Designated Vendor-Specific */
>> @@ -1062,6 +1063,10 @@
>> +/* Designated Vendor-Specific (DVSEC, PCI_EXT_CAP_ID_DVSEC) */
>> +#define PCI_DVSEC_HEADER1 0x4 /* Vendor-Specific Header1 */
>> +#define PCI_DVSEC_HEADER2 0x8 /* Vendor-Specific Header2 */
>
> Just a little comment: It would make more sense to me to
> s/DVSEC/DVSPEC/g.
>
> But then I don't have the PCIe documentation.
Arguably some of the confusion might be from the patch title. DVSEC is
acronym for Designated Vendor-Specific Extended Capability if I recall
correctly. It would probably be best to call that out since the extended
implies it lives in the config space accessible via the memory mapped
config.
On Thu, 2020-07-16 at 10:18 -0700, Alexander Duyck wrote: > > On 7/15/2020 7:55 PM, Randy Dunlap wrote: > > On 7/13/20 11:23 PM, David E. Box wrote: > > > Add PCIe DVSEC extended capability ID and defines for the header > > > offsets. > > > Defined in PCIe r5.0, sec 7.9.6. > > > > > > Signed-off-by: David E. Box <david.e.box@linux.intel.com> > > > Acked-by: Bjorn Helgaas <bhelgaas@google.com> > > > --- > > > include/uapi/linux/pci_regs.h | 5 +++++ > > > 1 file changed, 5 insertions(+) > > > > > > diff --git a/include/uapi/linux/pci_regs.h > > > b/include/uapi/linux/pci_regs.h > > > index f9701410d3b5..09daa9f07b6b 100644 > > > --- a/include/uapi/linux/pci_regs.h > > > +++ b/include/uapi/linux/pci_regs.h > > > @@ -720,6 +720,7 @@ > > > +#define PCI_EXT_CAP_ID_DVSEC 0x23 /* Designated Vendor- > > > Specific */ > > > @@ -1062,6 +1063,10 @@ > > > +/* Designated Vendor-Specific (DVSEC, PCI_EXT_CAP_ID_DVSEC) */ > > > +#define PCI_DVSEC_HEADER1 0x4 /* Vendor-Specific > > > Header1 */ > > > +#define PCI_DVSEC_HEADER2 0x8 /* Vendor-Specific > > > Header2 */ These comments I'll fix to say "Designated Vendor-Specific" > > > > Just a little comment: It would make more sense to me to > > s/DVSEC/DVSPEC/g. > > > > But then I don't have the PCIe documentation. > > Arguably some of the confusion might be from the patch title. DVSEC > is > acronym for Designated Vendor-Specific Extended Capability if I > recall > correctly. It would probably be best to call that out since the > extended > implies it lives in the config space accessible via the memory > mapped > config. I'll change the patch title as well, but agree DVSEC is better as it's consistent with the spec. Thanks David
Intel Platform Monitoring Technology (PMT) is an architecture for enumerating and accessing hardware monitoring capabilities on a device. With customers increasingly asking for hardware telemetry, engineers not only have to figure out how to measure and collect data, but also how to deliver it and make it discoverable. The latter may be through some device specific method requiring device specific tools to collect the data. This in turn requires customers to manage a suite of different tools in order to collect the differing assortment of monitoring data on their systems. Even when such information can be provided in kernel drivers, they may require constant maintenance to update register mappings as they change with firmware updates and new versions of hardware. PMT provides a solution for discovering and reading telemetry from a device through a hardware agnostic framework that allows for updates to systems without requiring patches to the kernel or software tools. PMT defines several capabilities to support collecting monitoring data from hardware. All are discoverable as separate instances of the PCIE Designated Vendor extended capability (DVSEC) with the Intel vendor code. The DVSEC ID field uniquely identifies the capability. Each DVSEC also provides a BAR offset to a header that defines capability-specific attributes, including GUID, feature type, offset and length, as well as configuration settings where applicable. The GUID uniquely identifies the register space of any monitor data exposed by the capability. The GUID is associated with an XML file from the vendor that describes the mapping of the register space along with properties of the monitor data. This allows vendors to perform firmware updates that can change the mapping (e.g. add new metrics) without requiring any changes to drivers or software tools. The new mapping is confirmed by an updated GUID, read from the hardware, which software uses with a new XML. The current capabilities defined by PMT are Telemetry, Watcher, and Crashlog. The Telemetry capability provides access to a continuous block of read only data. The Watcher capability provides access to hardware sampling and tracing features. Crashlog provides access to device crash dumps. While there is some relationship between capabilities (Watcher can be configured to sample from the Telemetry data set) each exists as stand alone features with no dependency on any other. The design therefore splits them into individual, capability specific drivers. MFD is used to create platform devices for each capability so that they may be managed by their own driver. The PMT architecture is (for the most part) agnostic to the type of device it can collect from. Devices nodes are consequently generic in naming, e.g. /dev/telem<n> and /dev/smplr<n>. Each capability driver creates a class to manage the list of devices supporting it. Software can determine which devices support a PMT feature by searching through each device node entry in the sysfs class folder. It can additionally determine if a particular device supports a PMT feature by checking for a PMT class folder in the device folder. This patch set provides support for the PMT framework, along with support for Telemetry on Tiger Lake. Changes from V3: - Write out full acronym for DVSEC in PCI patch commit message and add 'Designated' to comments - remove unused variable caught by kernel test robot <lkp@intel.com> - Add required Co-developed-by signoffs, noted by Andy - Allow access using new CAP_PERFMON capability as suggested by Alexey Bundankov - Fix spacing in Kconfig, noted by Randy - Other style changes and fixups suggested by Andy Changes from V2: - In order to handle certain HW bugs from the telemetry capability driver, create a single platform device per capability instead of a device per entry. Add the entry data as device resources and let the capability driver manage them as a set allowing for cleaner HW bug resolution. - Handle discovery table offset bug in intel_pmt.c - Handle overlapping regions in intel_pmt_telemetry.c - Add description of sysfs class to testing ABI. - Don't check size and count until confirming support for the PMT capability to avoid bailing out when we need to skip it. - Remove unneeded header file. Move code to the intel_pmt.c, the only place where it's needed. - Remove now unused platform data. - Add missing header files types.h, bits.h. - Rename file name and build options from telem to telemetry. - Code cleanup suggested by Andy S. - x86 mailing list added. Changes from V1: - In the telemetry driver, set the device in device_create() to the parent PCI device (the monitoring device) for clear association in sysfs. Was set before to the platform device created by the PCI parent. - Move telem struct into driver and delete unneeded header file. - Start telem device numbering from 0 instead of 1. 1 was used due to anticipated changes, no longer needed. - Use helper macros suggested by Andy S. - Rename class to pmt_telemetry, spelling out full name - Move monitor device name defines to common header - Coding style, spelling, and Makefile/MAINTAINERS ordering fixes David E. Box (3): PCI: Add defines for Designated Vendor-Specific Extended Capability mfd: Intel Platform Monitoring Technology support platform/x86: Intel PMT Telemetry capability driver .../ABI/testing/sysfs-class-pmt_telemetry | 46 ++ MAINTAINERS | 6 + drivers/mfd/Kconfig | 10 + drivers/mfd/Makefile | 1 + drivers/mfd/intel_pmt.c | 215 +++++++++ drivers/platform/x86/Kconfig | 10 + drivers/platform/x86/Makefile | 1 + drivers/platform/x86/intel_pmt_telemetry.c | 448 ++++++++++++++++++ include/uapi/linux/pci_regs.h | 5 + 9 files changed, 742 insertions(+) create mode 100644 Documentation/ABI/testing/sysfs-class-pmt_telemetry create mode 100644 drivers/mfd/intel_pmt.c create mode 100644 drivers/platform/x86/intel_pmt_telemetry.c -- 2.20.1
Add PCIe Designated Vendor-Specific Extended Capability (DVSEC) and defines for the header offsets. Defined in PCIe r5.0, sec 7.9.6. Signed-off-by: David E. Box <david.e.box@linux.intel.com> Acked-by: Bjorn Helgaas <bhelgaas@google.com> --- include/uapi/linux/pci_regs.h | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/include/uapi/linux/pci_regs.h b/include/uapi/linux/pci_regs.h index f9701410d3b5..beafeee39e44 100644 --- a/include/uapi/linux/pci_regs.h +++ b/include/uapi/linux/pci_regs.h @@ -720,6 +720,7 @@ #define PCI_EXT_CAP_ID_DPC 0x1D /* Downstream Port Containment */ #define PCI_EXT_CAP_ID_L1SS 0x1E /* L1 PM Substates */ #define PCI_EXT_CAP_ID_PTM 0x1F /* Precision Time Measurement */ +#define PCI_EXT_CAP_ID_DVSEC 0x23 /* Designated Vendor-Specific */ #define PCI_EXT_CAP_ID_DLF 0x25 /* Data Link Feature */ #define PCI_EXT_CAP_ID_PL_16GT 0x26 /* Physical Layer 16.0 GT/s */ #define PCI_EXT_CAP_ID_MAX PCI_EXT_CAP_ID_PL_16GT @@ -1062,6 +1063,10 @@ #define PCI_L1SS_CTL1_LTR_L12_TH_SCALE 0xe0000000 /* LTR_L1.2_THRESHOLD_Scale */ #define PCI_L1SS_CTL2 0x0c /* Control 2 Register */ +/* Designated Vendor-Specific (DVSEC, PCI_EXT_CAP_ID_DVSEC) */ +#define PCI_DVSEC_HEADER1 0x4 /* Designated Vendor-Specific Header1 */ +#define PCI_DVSEC_HEADER2 0x8 /* Designated Vendor-Specific Header2 */ + /* Data Link Feature */ #define PCI_DLF_CAP 0x04 /* Capabilities Register */ #define PCI_DLF_EXCHANGE_ENABLE 0x80000000 /* Data Link Feature Exchange Enable */ -- 2.20.1
Intel Platform Monitoring Technology (PMT) is an architecture for enumerating and accessing hardware monitoring facilities. PMT supports multiple types of monitoring capabilities. This driver creates platform devices for each type so that they may be managed by capability specific drivers (to be introduced). Capabilities are discovered using PCIe DVSEC ids. Support is included for the 3 current capability types, Telemetry, Watcher, and Crashlog. The features are available on new Intel platforms starting from Tiger Lake for which support is added. Also add a quirk mechanism for several early hardware differences and bugs. For Tiger Lake, do not support Watcher and Crashlog capabilities since they will not be compatible with future product. Also, fix use a quirk to fix the discovery table offset. Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com> Co-developed-by: Alexander Duyck <alexander.h.duyck@linux.intel.com> Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com> Signed-off-by: David E. Box <david.e.box@linux.intel.com> --- MAINTAINERS | 5 + drivers/mfd/Kconfig | 10 ++ drivers/mfd/Makefile | 1 + drivers/mfd/intel_pmt.c | 215 ++++++++++++++++++++++++++++++++++++++++ 4 files changed, 231 insertions(+) create mode 100644 drivers/mfd/intel_pmt.c diff --git a/MAINTAINERS b/MAINTAINERS index b4a43a9e7fbc..2e42bf0c41ab 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -8845,6 +8845,11 @@ F: drivers/mfd/intel_soc_pmic* F: include/linux/mfd/intel_msic.h F: include/linux/mfd/intel_soc_pmic* +INTEL PMT DRIVER +M: "David E. Box" <david.e.box@linux.intel.com> +S: Maintained +F: drivers/mfd/intel_pmt.c + INTEL PRO/WIRELESS 2100, 2200BG, 2915ABG NETWORK CONNECTION SUPPORT M: Stanislav Yakovlev <stas.yakovlev@gmail.com> L: linux-wireless@vger.kernel.org diff --git a/drivers/mfd/Kconfig b/drivers/mfd/Kconfig index a37d7d171382..1a62ce2c68d9 100644 --- a/drivers/mfd/Kconfig +++ b/drivers/mfd/Kconfig @@ -670,6 +670,16 @@ config MFD_INTEL_PMC_BXT Register and P-unit access. In addition this creates devices for iTCO watchdog and telemetry that are part of the PMC. +config MFD_INTEL_PMT + tristate "Intel Platform Monitoring Technology support" + depends on PCI + select MFD_CORE + help + The Intel Platform Monitoring Technology (PMT) is an interface that + provides access to hardware monitor registers. This driver supports + Telemetry, Watcher, and Crashlog PMT capabilities/devices for + platforms starting from Tiger Lake. + config MFD_IPAQ_MICRO bool "Atmel Micro ASIC (iPAQ h3100/h3600/h3700) Support" depends on SA1100_H3100 || SA1100_H3600 diff --git a/drivers/mfd/Makefile b/drivers/mfd/Makefile index 9367a92f795a..1961b4737985 100644 --- a/drivers/mfd/Makefile +++ b/drivers/mfd/Makefile @@ -216,6 +216,7 @@ obj-$(CONFIG_MFD_INTEL_LPSS_PCI) += intel-lpss-pci.o obj-$(CONFIG_MFD_INTEL_LPSS_ACPI) += intel-lpss-acpi.o obj-$(CONFIG_MFD_INTEL_MSIC) += intel_msic.o obj-$(CONFIG_MFD_INTEL_PMC_BXT) += intel_pmc_bxt.o +obj-$(CONFIG_MFD_INTEL_PMT) += intel_pmt.o obj-$(CONFIG_MFD_PALMAS) += palmas.o obj-$(CONFIG_MFD_VIPERBOARD) += viperboard.o obj-$(CONFIG_MFD_RC5T583) += rc5t583.o rc5t583-irq.o diff --git a/drivers/mfd/intel_pmt.c b/drivers/mfd/intel_pmt.c new file mode 100644 index 000000000000..6857eaf4ff86 --- /dev/null +++ b/drivers/mfd/intel_pmt.c @@ -0,0 +1,215 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Intel Platform Monitoring Technology MFD driver + * + * Copyright (c) 2020, Intel Corporation. + * All Rights Reserved. + * + * Authors: David E. Box <david.e.box@linux.intel.com> + */ + +#include <linux/bits.h> +#include <linux/kernel.h> +#include <linux/module.h> +#include <linux/pci.h> +#include <linux/platform_device.h> +#include <linux/pm.h> +#include <linux/pm_runtime.h> +#include <linux/mfd/core.h> +#include <linux/types.h> + +/* Intel DVSEC capability vendor space offsets */ +#define INTEL_DVSEC_ENTRIES 0xA +#define INTEL_DVSEC_SIZE 0xB +#define INTEL_DVSEC_TABLE 0xC +#define INTEL_DVSEC_TABLE_BAR(x) ((x) & GENMASK(2, 0)) +#define INTEL_DVSEC_TABLE_OFFSET(x) ((x) & GENMASK(31, 3)) +#define INTEL_DVSEC_ENTRY_SIZE 4 + +/* PMT capabilities */ +#define DVSEC_INTEL_ID_TELEMETRY 2 +#define DVSEC_INTEL_ID_WATCHER 3 +#define DVSEC_INTEL_ID_CRASHLOG 4 + +#define TELEMETRY_DEV_NAME "pmt_telemetry" +#define WATCHER_DEV_NAME "pmt_watcher" +#define CRASHLOG_DEV_NAME "pmt_crashlog" + +struct intel_dvsec_header { + u16 length; + u16 id; + u8 num_entries; + u8 entry_size; + u8 tbir; + u32 offset; +}; + +enum pmt_quirks { + /* Watcher capability not supported */ + PMT_QUIRK_NO_WATCHER = BIT(0), + + /* Crashlog capability not supported */ + PMT_QUIRK_NO_CRASHLOG = BIT(1), + + /* Use shift instead of mask to read discovery table offset */ + PMT_QUIRK_TABLE_SHIFT = BIT(2), +}; + +struct pmt_platform_info { + unsigned long quirks; +}; + +static const struct pmt_platform_info tgl_info = { + .quirks = PMT_QUIRK_NO_WATCHER | PMT_QUIRK_NO_CRASHLOG | + PMT_QUIRK_TABLE_SHIFT, +}; + +static int +pmt_add_dev(struct pci_dev *pdev, struct intel_dvsec_header *header, + struct pmt_platform_info *info) +{ + struct device *dev = &pdev->dev; + struct resource *res, *tmp; + struct mfd_cell *cell; + const char *name; + int count = header->num_entries; + int size = header->entry_size; + int i; + + switch (header->id) { + case DVSEC_INTEL_ID_TELEMETRY: + name = TELEMETRY_DEV_NAME; + break; + case DVSEC_INTEL_ID_WATCHER: + if (info->quirks & PMT_QUIRK_NO_WATCHER) { + dev_info(dev, "Watcher not supported\n"); + return 0; + } + name = WATCHER_DEV_NAME; + break; + case DVSEC_INTEL_ID_CRASHLOG: + if (info->quirks & PMT_QUIRK_NO_CRASHLOG) { + dev_info(dev, "Crashlog not supported\n"); + return 0; + } + name = CRASHLOG_DEV_NAME; + break; + default: + return -EINVAL; + } + + if (!header->num_entries || !header->entry_size) { + dev_warn(dev, "Invalid count or size for %s header\n", name); + return -EINVAL; + } + + cell = devm_kzalloc(dev, sizeof(*cell), GFP_KERNEL); + if (!cell) + return -ENOMEM; + + res = devm_kcalloc(dev, count, sizeof(*res), GFP_KERNEL); + if (!res) + return -ENOMEM; + + if (info->quirks & PMT_QUIRK_TABLE_SHIFT) + header->offset >>= 3; + + for (i = 0, tmp = res; i < count; i++, tmp++) { + tmp->start = pdev->resource[header->tbir].start + + header->offset + i * (size << 2); + tmp->end = tmp->start + (size << 2) - 1; + tmp->flags = IORESOURCE_MEM; + } + + cell->resources = res; + cell->num_resources = count; + cell->name = name; + + return devm_mfd_add_devices(dev, PLATFORM_DEVID_AUTO, cell, 1, NULL, 0, + NULL); +} + +static int +pmt_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id) +{ + struct intel_dvsec_header header; + struct pmt_platform_info *info; + bool found_devices = false; + int ret, pos = 0; + u32 table; + u16 vid; + + ret = pcim_enable_device(pdev); + if (ret) + return ret; + + info = devm_kmemdup(&pdev->dev, (void *)id->driver_data, sizeof(*info), + GFP_KERNEL); + if (!info) + return -ENOMEM; + + pos = pci_find_next_ext_capability(pdev, pos, PCI_EXT_CAP_ID_DVSEC); + while (pos) { + pci_read_config_word(pdev, pos + PCI_DVSEC_HEADER1, &vid); + if (vid != PCI_VENDOR_ID_INTEL) + continue; + + pci_read_config_word(pdev, pos + PCI_DVSEC_HEADER2, + &header.id); + pci_read_config_byte(pdev, pos + INTEL_DVSEC_ENTRIES, + &header.num_entries); + pci_read_config_byte(pdev, pos + INTEL_DVSEC_SIZE, + &header.entry_size); + pci_read_config_dword(pdev, pos + INTEL_DVSEC_TABLE, + &table); + + header.tbir = INTEL_DVSEC_TABLE_BAR(table); + header.offset = INTEL_DVSEC_TABLE_OFFSET(table); + + ret = pmt_add_dev(pdev, &header, info); + if (ret) + dev_warn(&pdev->dev, + "Failed to add devices for DVSEC id %d\n", + header.id); + found_devices = true; + + pos = pci_find_next_ext_capability(pdev, pos, + PCI_EXT_CAP_ID_DVSEC); + } + + if (!found_devices) { + dev_err(&pdev->dev, "No supported PMT capabilities found.\n"); + return -ENODEV; + } + + pm_runtime_put(&pdev->dev); + pm_runtime_allow(&pdev->dev); + + return 0; +} + +static void pmt_pci_remove(struct pci_dev *pdev) +{ + pm_runtime_forbid(&pdev->dev); + pm_runtime_get_sync(&pdev->dev); +} + +#define PCI_DEVICE_ID_INTEL_PMT_TGL 0x9a0d + +static const struct pci_device_id pmt_pci_ids[] = { + { PCI_DEVICE_DATA(INTEL, PMT_TGL, &tgl_info) }, + { } +}; +MODULE_DEVICE_TABLE(pci, pmt_pci_ids); + +static struct pci_driver pmt_pci_driver = { + .name = "intel-pmt", + .id_table = pmt_pci_ids, + .probe = pmt_pci_probe, + .remove = pmt_pci_remove, +}; +module_pci_driver(pmt_pci_driver); + +MODULE_AUTHOR("David E. Box <david.e.box@linux.intel.com>"); +MODULE_DESCRIPTION("Intel Platform Monitoring Technology MFD driver"); +MODULE_LICENSE("GPL v2"); -- 2.20.1
PMT Telemetry is a capability of the Intel Platform Monitoring Technology. The Telemetry capability provides access to device telemetry metrics that provide hardware performance data to users from continuous, memory mapped, read-only register spaces. Register mappings are not provided by the driver. Instead, a GUID is read from a header for each endpoint. The GUID identifies the device and is to be used with an XML, provided by the vendor, to discover the available set of metrics and their register mapping. This allows firmware updates to modify the register space without needing to update the driver every time with new mappings. Firmware writes a new GUID in this case to specify the new mapping. Software tools with access to the associated XML file can then interpret the changes. The module manages access to all PMT Telemetry endpoints on a system, independent of the device exporting them. It creates a pmt_telemetry class to manage the devices. For each telemetry endpoint, sysfs files provide GUID and size information as well as a pointer to the parent device the telemetry came from. Software may discover the association between endpoints and devices by iterating through the list in sysfs, or by looking for the existence of the class folder under the device of interest. A device node of the same name allows software to then map the telemetry space for direct access. Also create a PCI device id list for early telemetry hardware that require workarounds for known issues. Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com> Co-developed-by: Alexander Duyck <alexander.h.duyck@linux.intel.com> Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com> Signed-off-by: David E. Box <david.e.box@linux.intel.com> --- .../ABI/testing/sysfs-class-pmt_telemetry | 46 ++ MAINTAINERS | 1 + drivers/platform/x86/Kconfig | 10 + drivers/platform/x86/Makefile | 1 + drivers/platform/x86/intel_pmt_telemetry.c | 448 ++++++++++++++++++ 5 files changed, 506 insertions(+) create mode 100644 Documentation/ABI/testing/sysfs-class-pmt_telemetry create mode 100644 drivers/platform/x86/intel_pmt_telemetry.c diff --git a/Documentation/ABI/testing/sysfs-class-pmt_telemetry b/Documentation/ABI/testing/sysfs-class-pmt_telemetry new file mode 100644 index 000000000000..b0b096db9cae --- /dev/null +++ b/Documentation/ABI/testing/sysfs-class-pmt_telemetry @@ -0,0 +1,46 @@ +What: /sys/class/pmt_telemetry/ +Date: July 2020 +KernelVersion: 5.9 +Contact: David Box <david.e.box@linux.intel.com> +Description: + The pmt_telemetry/ class directory contains information for + devices that expose hardware telemetry using Intel Platform + Monitoring Technology (PMT) + +What: /sys/class/pmt_telemetry/telem<x> +Date: July 2020 +KernelVersion: 5.9 +Contact: David Box <david.e.box@linux.intel.com> +Description: + The telem<x> directory contains files describing an instance of + a PMT telemetry device that exposes hardware telemetry. Each + telem<x> directory has an associated /dev/telem<x> node. This + node may be opened and mapped to access the telemetry space of + the device. The register layout of the telemetry space is + determined from an XML file that matches the PCI device id and + GUID for the device. + +What: /sys/class/pmt_telemetry/telem<x>/guid +Date: July 2020 +KernelVersion: 5.9 +Contact: David Box <david.e.box@linux.intel.com> +Description: + (RO) The GUID for this telemetry device. The GUID identifies + the version of the XML file for the parent device that is to + be used to get the register layout. + +What: /sys/class/pmt_telemetry/telem<x>/size +Date: July 2020 +KernelVersion: 5.9 +Contact: David Box <david.e.box@linux.intel.com> +Description: + (RO) The size of telemetry region in bytes that corresponds to + the mapping size for the /dev/telem<x> device node. + +What: /sys/class/pmt_telemetry/telem<x>/offset +Date: July 2020 +KernelVersion: 5.9 +Contact: David Box <david.e.box@linux.intel.com> +Description: + (RO) The offset of telemetry region in bytes that corresponds to + the mapping for the /dev/telem<x> device node. diff --git a/MAINTAINERS b/MAINTAINERS index 2e42bf0c41ab..ebc145894abd 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -8849,6 +8849,7 @@ INTEL PMT DRIVER M: "David E. Box" <david.e.box@linux.intel.com> S: Maintained F: drivers/mfd/intel_pmt.c +F: drivers/platform/x86/intel_pmt_* INTEL PRO/WIRELESS 2100, 2200BG, 2915ABG NETWORK CONNECTION SUPPORT M: Stanislav Yakovlev <stas.yakovlev@gmail.com> diff --git a/drivers/platform/x86/Kconfig b/drivers/platform/x86/Kconfig index 0581a54cf562..8552b094d005 100644 --- a/drivers/platform/x86/Kconfig +++ b/drivers/platform/x86/Kconfig @@ -1339,6 +1339,16 @@ config INTEL_PMC_CORE - LTR Ignore - MPHY/PLL gating status (Sunrisepoint PCH only) +config INTEL_PMT_TELEMETRY + tristate "Intel Platform Monitoring Technology (PMT) Telemetry driver" + help + The Intel Platform Monitory Technology (PMT) Telemetry driver provides + access to hardware telemetry metrics on devices that support the + feature. + + For more information, see + <file:Documentation/ABI/testing/sysfs-class-intel_pmt_telem> + config INTEL_PUNIT_IPC tristate "Intel P-Unit IPC Driver" help diff --git a/drivers/platform/x86/Makefile b/drivers/platform/x86/Makefile index 2b85852a1a87..95cd3d0be17f 100644 --- a/drivers/platform/x86/Makefile +++ b/drivers/platform/x86/Makefile @@ -139,6 +139,7 @@ obj-$(CONFIG_INTEL_MFLD_THERMAL) += intel_mid_thermal.o obj-$(CONFIG_INTEL_MID_POWER_BUTTON) += intel_mid_powerbtn.o obj-$(CONFIG_INTEL_MRFLD_PWRBTN) += intel_mrfld_pwrbtn.o obj-$(CONFIG_INTEL_PMC_CORE) += intel_pmc_core.o intel_pmc_core_pltdrv.o +obj-$(CONFIG_INTEL_PMT_TELEMETRY) += intel_pmt_telemetry.o obj-$(CONFIG_INTEL_PUNIT_IPC) += intel_punit_ipc.o obj-$(CONFIG_INTEL_SCU_IPC) += intel_scu_ipc.o obj-$(CONFIG_INTEL_SCU_PCI) += intel_scu_pcidrv.o diff --git a/drivers/platform/x86/intel_pmt_telemetry.c b/drivers/platform/x86/intel_pmt_telemetry.c new file mode 100644 index 000000000000..544a84d72cf7 --- /dev/null +++ b/drivers/platform/x86/intel_pmt_telemetry.c @@ -0,0 +1,448 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Intel Platform Monitory Technology Telemetry driver + * + * Copyright (c) 2020, Intel Corporation. + * All Rights Reserved. + * + * Author: "David E. Box" <david.e.box@linux.intel.com> + */ + +#include <linux/bits.h> +#include <linux/cdev.h> +#include <linux/io-64-nonatomic-lo-hi.h> +#include <linux/kernel.h> +#include <linux/module.h> +#include <linux/pci.h> +#include <linux/platform_device.h> +#include <linux/slab.h> +#include <linux/types.h> +#include <linux/uaccess.h> +#include <linux/xarray.h> + +#define TELEM_DEV_NAME "pmt_telemetry" + +/* Telemetry access types */ +#define TELEM_ACCESS_FUTURE 1 +#define TELEM_ACCESS_BARID 2 +#define TELEM_ACCESS_LOCAL 3 + +#define TELEM_GUID_OFFSET 0x4 +#define TELEM_BASE_OFFSET 0x8 +#define TELEM_TBIR_MASK GENMASK(2, 0) +#define TELEM_ACCESS(v) ((v) & GENMASK(3, 0)) +#define TELEM_TYPE(v) (((v) & GENMASK(7, 4)) >> 4) +/* size is in bytes */ +#define TELEM_SIZE(v) (((v) & GENMASK(27, 12)) >> 10) + +#define TELEM_XA_START 0 +#define TELEM_XA_MAX INT_MAX +#define TELEM_XA_LIMIT XA_LIMIT(TELEM_XA_START, TELEM_XA_MAX) + +/* Used by client hardware to identify a fixed telemetry entry*/ +#define TELEM_CLIENT_FIXED_BLOCK_GUID 0x10000000 + +static DEFINE_XARRAY_ALLOC(telem_array); + +struct pmt_telem_priv; + +struct telem_header { + u8 access_type; + u8 telem_type; + u16 size; + u32 guid; + u32 base_offset; + u8 tbir; +}; + +struct pmt_telem_entry { + struct pmt_telem_priv *priv; + struct telem_header header; + struct resource *header_res; + unsigned long base_addr; + void __iomem *disc_table; + struct cdev cdev; + dev_t devt; + int devid; +}; + +struct pmt_telem_priv { + struct pmt_telem_entry *entry; + int num_entries; + struct device *dev; +}; + +/* + * devfs + */ +static int pmt_telem_open(struct inode *inode, struct file *filp) +{ + struct pmt_telem_priv *priv; + struct pmt_telem_entry *entry; + struct pci_driver *pci_drv; + struct pci_dev *pci_dev; + + if (!perfmon_capable()) + return -EPERM; + + entry = container_of(inode->i_cdev, struct pmt_telem_entry, cdev); + priv = entry->priv; + pci_dev = to_pci_dev(priv->dev->parent); + + pci_drv = pci_dev_driver(pci_dev); + if (!pci_drv) + return -ENODEV; + + filp->private_data = entry; + get_device(&pci_dev->dev); + + if (!try_module_get(pci_drv->driver.owner)) { + put_device(&pci_dev->dev); + return -ENODEV; + } + + return 0; +} + +static int pmt_telem_release(struct inode *inode, struct file *filp) +{ + struct pmt_telem_entry *entry = filp->private_data; + struct pci_dev *pci_dev = to_pci_dev(entry->priv->dev->parent); + struct pci_driver *pci_drv = pci_dev_driver(pci_dev); + + put_device(&pci_dev->dev); + module_put(pci_drv->driver.owner); + + return 0; +} + +static int pmt_telem_mmap(struct file *filp, struct vm_area_struct *vma) +{ + struct pmt_telem_entry *entry = filp->private_data; + struct pmt_telem_priv *priv; + unsigned long vsize = vma->vm_end - vma->vm_start; + unsigned long phys = entry->base_addr; + unsigned long pfn = PFN_DOWN(phys); + unsigned long psize; + + priv = entry->priv; + psize = (PFN_UP(entry->base_addr + entry->header.size) - pfn) * PAGE_SIZE; + if (vsize > psize) { + dev_err(priv->dev, "Requested mmap size is too large\n"); + return -EINVAL; + } + + if ((vma->vm_flags & VM_WRITE) || (vma->vm_flags & VM_MAYWRITE)) + return -EPERM; + + vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot); + + if (io_remap_pfn_range(vma, vma->vm_start, pfn, vsize, + vma->vm_page_prot)) + return -EINVAL; + + return 0; +} + +static const struct file_operations pmt_telem_fops = { + .owner = THIS_MODULE, + .open = pmt_telem_open, + .mmap = pmt_telem_mmap, + .release = pmt_telem_release, +}; + +/* + * sysfs + */ +static ssize_t guid_show(struct device *dev, struct device_attribute *attr, char *buf) +{ + struct pmt_telem_entry *entry = dev_get_drvdata(dev); + + return sprintf(buf, "0x%x\n", entry->header.guid); +} +static DEVICE_ATTR_RO(guid); + +static ssize_t size_show(struct device *dev, struct device_attribute *attr, + char *buf) +{ + struct pmt_telem_entry *entry = dev_get_drvdata(dev); + + /* Display buffer size in bytes */ + return sprintf(buf, "%u\n", entry->header.size); +} +static DEVICE_ATTR_RO(size); + +static ssize_t offset_show(struct device *dev, struct device_attribute *attr, char *buf) +{ + struct pmt_telem_entry *entry = dev_get_drvdata(dev); + + /* Display buffer offset in bytes */ + return sprintf(buf, "%lu\n", offset_in_page(entry->base_addr)); +} +static DEVICE_ATTR_RO(offset); + +static struct attribute *pmt_telem_attrs[] = { + &dev_attr_guid.attr, + &dev_attr_size.attr, + &dev_attr_offset.attr, + NULL +}; +ATTRIBUTE_GROUPS(pmt_telem); + +static struct class pmt_telem_class = { + .owner = THIS_MODULE, + .name = "pmt_telemetry", + .dev_groups = pmt_telem_groups, +}; + +/* + * driver initialization + */ +static const struct pci_device_id pmt_telem_early_client_pci_ids[] = { + { PCI_VDEVICE(INTEL, 0x9a0d) }, /* TGL */ + { } +}; + +static bool pmt_telem_is_early_client_hw(struct device *dev) +{ + struct pci_dev *parent = to_pci_dev(dev->parent); + + return !!pci_match_id(pmt_telem_early_client_pci_ids, parent); +} + +static int pmt_telem_create_dev(struct pmt_telem_priv *priv, + struct pmt_telem_entry *entry) +{ + struct pci_dev *pci_dev; + struct device *dev; + int ret; + + cdev_init(&entry->cdev, &pmt_telem_fops); + ret = cdev_add(&entry->cdev, entry->devt, 1); + if (ret) { + dev_err(priv->dev, "Could not add char dev\n"); + return ret; + } + + pci_dev = to_pci_dev(priv->dev->parent); + dev = device_create(&pmt_telem_class, &pci_dev->dev, entry->devt, + entry, "telem%d", entry->devid); + if (IS_ERR(dev)) { + dev_err(priv->dev, "Could not create device node\n"); + cdev_del(&entry->cdev); + } + + return PTR_ERR_OR_ZERO(dev); +} + +static void pmt_telem_populate_header(void __iomem *disc_offset, + struct telem_header *header) +{ + header->access_type = TELEM_ACCESS(readb(disc_offset)); + header->telem_type = TELEM_TYPE(readb(disc_offset)); + header->size = TELEM_SIZE(readl(disc_offset)); + header->guid = readl(disc_offset + TELEM_GUID_OFFSET); + header->base_offset = readl(disc_offset + TELEM_BASE_OFFSET); + + /* + * For non-local access types the lower 3 bits of base offset + * contains the index of the base address register where the + * telemetry can be found. + */ + header->tbir = header->base_offset & TELEM_TBIR_MASK; + header->base_offset ^= header->tbir; +} + +static int pmt_telem_add_entry(struct pmt_telem_priv *priv, + struct pmt_telem_entry *entry) +{ + struct resource *res = entry->header_res; + struct pci_dev *pci_dev = to_pci_dev(priv->dev->parent); + int ret; + + pmt_telem_populate_header(entry->disc_table, &entry->header); + + /* Local access and BARID only for now */ + switch (entry->header.access_type) { + case TELEM_ACCESS_LOCAL: + if (entry->header.tbir) { + dev_err(priv->dev, + "Unsupported BAR index %d for access type %d\n", + entry->header.tbir, entry->header.access_type); + return -EINVAL; + } + + /* + * For access_type LOCAL, the base address is as follows: + * base address = header address + header length + base offset + */ + entry->base_addr = res->start + resource_size(res) + + entry->header.base_offset; + break; + + case TELEM_ACCESS_BARID: + entry->base_addr = pci_dev->resource[entry->header.tbir].start + + entry->header.base_offset; + break; + + default: + dev_err(priv->dev, "Unsupported access type %d\n", + entry->header.access_type); + return -EINVAL; + } + + ret = alloc_chrdev_region(&entry->devt, 0, 1, TELEM_DEV_NAME); + if (ret) { + dev_err(priv->dev, + "PMT telemetry chrdev_region error: %d\n", ret); + return ret; + } + + ret = xa_alloc(&telem_array, &entry->devid, entry, TELEM_XA_LIMIT, + GFP_KERNEL); + if (ret) + goto fail_xa_alloc; + + ret = pmt_telem_create_dev(priv, entry); + if (ret) + goto fail_create_dev; + + entry->priv = priv; + priv->num_entries++; + return 0; + +fail_create_dev: + xa_erase(&telem_array, entry->devid); +fail_xa_alloc: + unregister_chrdev_region(entry->devt, 1); + + return ret; +} + +static bool pmt_telem_region_overlaps(struct platform_device *pdev, + void __iomem *disc_table) +{ + u32 guid; + + guid = readl(disc_table + TELEM_GUID_OFFSET); + + return guid == TELEM_CLIENT_FIXED_BLOCK_GUID; +} + +static void pmt_telem_remove_entries(struct pmt_telem_priv *priv) +{ + int i; + + for (i = 0; i < priv->num_entries; i++) { + device_destroy(&pmt_telem_class, priv->entry[i].devt); + cdev_del(&priv->entry[i].cdev); + xa_erase(&telem_array, priv->entry[i].devid); + unregister_chrdev_region(priv->entry[i].devt, 1); + } +} + +static int pmt_telem_probe(struct platform_device *pdev) +{ + struct pmt_telem_priv *priv; + struct pmt_telem_entry *entry; + bool early_hw; + int i; + + priv = devm_kzalloc(&pdev->dev, sizeof(*priv), GFP_KERNEL); + if (!priv) + return -ENOMEM; + + platform_set_drvdata(pdev, priv); + priv->dev = &pdev->dev; + + priv->entry = devm_kcalloc(&pdev->dev, pdev->num_resources, + sizeof(struct pmt_telem_entry), GFP_KERNEL); + if (!priv->entry) + return -ENOMEM; + + if (pmt_telem_is_early_client_hw(&pdev->dev)) + early_hw = true; + + for (i = 0, entry = priv->entry; i < pdev->num_resources; + i++, entry++) { + int ret; + + entry->header_res = platform_get_resource(pdev, IORESOURCE_MEM, i); + if (!entry->header_res) { + pmt_telem_remove_entries(priv); + return -ENODEV; + } + + entry->disc_table = devm_platform_ioremap_resource(pdev, i); + if (IS_ERR(entry->disc_table)) { + pmt_telem_remove_entries(priv); + return PTR_ERR(entry->disc_table); + } + + if (pmt_telem_region_overlaps(pdev, entry->disc_table) && + early_hw) + continue; + + ret = pmt_telem_add_entry(priv, entry); + if (ret) { + pmt_telem_remove_entries(priv); + return ret; + } + } + + return 0; +} + +static int pmt_telem_remove(struct platform_device *pdev) +{ + struct pmt_telem_priv *priv = platform_get_drvdata(pdev); + + pmt_telem_remove_entries(priv); + + return 0; +} + +static const struct platform_device_id pmt_telem_table[] = { + { + .name = "pmt_telemetry", + }, + {} +}; +MODULE_DEVICE_TABLE(platform, pmt_telem_table); + +static struct platform_driver pmt_telem_driver = { + .driver = { + .name = TELEM_DEV_NAME, + }, + .probe = pmt_telem_probe, + .remove = pmt_telem_remove, + .id_table = pmt_telem_table, +}; + +static int __init pmt_telem_init(void) +{ + int ret = class_register(&pmt_telem_class); + + if (ret) + return ret; + + ret = platform_driver_register(&pmt_telem_driver); + if (ret) + class_unregister(&pmt_telem_class); + + return ret; +} +module_init(pmt_telem_init); + +static void __exit pmt_telem_exit(void) +{ + platform_driver_unregister(&pmt_telem_driver); + class_unregister(&pmt_telem_class); + xa_destroy(&telem_array); +} +module_exit(pmt_telem_exit); + +MODULE_AUTHOR("David E. Box <david.e.box@linux.intel.com>"); +MODULE_DESCRIPTION("Intel PMT Telemetry driver"); +MODULE_ALIAS("platform:" TELEM_DEV_NAME); +MODULE_LICENSE("GPL v2"); -- 2.20.1
On Fri, Jul 17, 2020 at 10:05 PM David E. Box <david.e.box@linux.intel.com> wrote: > > Add PCIe Designated Vendor-Specific Extended Capability (DVSEC) and defines > for the header offsets. Defined in PCIe r5.0, sec 7.9.6. > FWIW, Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com> > Signed-off-by: David E. Box <david.e.box@linux.intel.com> > Acked-by: Bjorn Helgaas <bhelgaas@google.com> > --- > include/uapi/linux/pci_regs.h | 5 +++++ > 1 file changed, 5 insertions(+) > > diff --git a/include/uapi/linux/pci_regs.h b/include/uapi/linux/pci_regs.h > index f9701410d3b5..beafeee39e44 100644 > --- a/include/uapi/linux/pci_regs.h > +++ b/include/uapi/linux/pci_regs.h > @@ -720,6 +720,7 @@ > #define PCI_EXT_CAP_ID_DPC 0x1D /* Downstream Port Containment */ > #define PCI_EXT_CAP_ID_L1SS 0x1E /* L1 PM Substates */ > #define PCI_EXT_CAP_ID_PTM 0x1F /* Precision Time Measurement */ > +#define PCI_EXT_CAP_ID_DVSEC 0x23 /* Designated Vendor-Specific */ > #define PCI_EXT_CAP_ID_DLF 0x25 /* Data Link Feature */ > #define PCI_EXT_CAP_ID_PL_16GT 0x26 /* Physical Layer 16.0 GT/s */ > #define PCI_EXT_CAP_ID_MAX PCI_EXT_CAP_ID_PL_16GT > @@ -1062,6 +1063,10 @@ > #define PCI_L1SS_CTL1_LTR_L12_TH_SCALE 0xe0000000 /* LTR_L1.2_THRESHOLD_Scale */ > #define PCI_L1SS_CTL2 0x0c /* Control 2 Register */ > > +/* Designated Vendor-Specific (DVSEC, PCI_EXT_CAP_ID_DVSEC) */ > +#define PCI_DVSEC_HEADER1 0x4 /* Designated Vendor-Specific Header1 */ > +#define PCI_DVSEC_HEADER2 0x8 /* Designated Vendor-Specific Header2 */ > + > /* Data Link Feature */ > #define PCI_DLF_CAP 0x04 /* Capabilities Register */ > #define PCI_DLF_EXCHANGE_ENABLE 0x80000000 /* Data Link Feature Exchange Enable */ > -- > 2.20.1 > -- With Best Regards, Andy Shevchenko
On Fri, Jul 17, 2020 at 10:05 PM David E. Box <david.e.box@linux.intel.com> wrote: > > Intel Platform Monitoring Technology (PMT) is an architecture for > enumerating and accessing hardware monitoring capabilities on a device. > With customers increasingly asking for hardware telemetry, engineers not > only have to figure out how to measure and collect data, but also how to > deliver it and make it discoverable. The latter may be through some device > specific method requiring device specific tools to collect the data. This > in turn requires customers to manage a suite of different tools in order to > collect the differing assortment of monitoring data on their systems. Even > when such information can be provided in kernel drivers, they may require > constant maintenance to update register mappings as they change with > firmware updates and new versions of hardware. PMT provides a solution for > discovering and reading telemetry from a device through a hardware agnostic > framework that allows for updates to systems without requiring patches to > the kernel or software tools. > > PMT defines several capabilities to support collecting monitoring data from > hardware. All are discoverable as separate instances of the PCIE Designated > Vendor extended capability (DVSEC) with the Intel vendor code. The DVSEC ID > field uniquely identifies the capability. Each DVSEC also provides a BAR > offset to a header that defines capability-specific attributes, including > GUID, feature type, offset and length, as well as configuration settings > where applicable. The GUID uniquely identifies the register space of any > monitor data exposed by the capability. The GUID is associated with an XML > file from the vendor that describes the mapping of the register space along > with properties of the monitor data. This allows vendors to perform > firmware updates that can change the mapping (e.g. add new metrics) without > requiring any changes to drivers or software tools. The new mapping is > confirmed by an updated GUID, read from the hardware, which software uses > with a new XML. > > The current capabilities defined by PMT are Telemetry, Watcher, and > Crashlog. The Telemetry capability provides access to a continuous block > of read only data. The Watcher capability provides access to hardware > sampling and tracing features. Crashlog provides access to device crash > dumps. While there is some relationship between capabilities (Watcher can > be configured to sample from the Telemetry data set) each exists as stand > alone features with no dependency on any other. The design therefore splits > them into individual, capability specific drivers. MFD is used to create > platform devices for each capability so that they may be managed by their > own driver. The PMT architecture is (for the most part) agnostic to the > type of device it can collect from. Devices nodes are consequently generic > in naming, e.g. /dev/telem<n> and /dev/smplr<n>. Each capability driver > creates a class to manage the list of devices supporting it. Software can > determine which devices support a PMT feature by searching through each > device node entry in the sysfs class folder. It can additionally determine > if a particular device supports a PMT feature by checking for a PMT class > folder in the device folder. > > This patch set provides support for the PMT framework, along with support > for Telemetry on Tiger Lake. > I assume this goes thru MFD tree. > Changes from V3: > - Write out full acronym for DVSEC in PCI patch commit message and > add 'Designated' to comments > - remove unused variable caught by kernel test robot <lkp@intel.com> > - Add required Co-developed-by signoffs, noted by Andy > - Allow access using new CAP_PERFMON capability as suggested by > Alexey Bundankov > - Fix spacing in Kconfig, noted by Randy > - Other style changes and fixups suggested by Andy > > Changes from V2: > - In order to handle certain HW bugs from the telemetry capability > driver, create a single platform device per capability instead of > a device per entry. Add the entry data as device resources and > let the capability driver manage them as a set allowing for > cleaner HW bug resolution. > - Handle discovery table offset bug in intel_pmt.c > - Handle overlapping regions in intel_pmt_telemetry.c > - Add description of sysfs class to testing ABI. > - Don't check size and count until confirming support for the PMT > capability to avoid bailing out when we need to skip it. > - Remove unneeded header file. Move code to the intel_pmt.c, the > only place where it's needed. > - Remove now unused platform data. > - Add missing header files types.h, bits.h. > - Rename file name and build options from telem to telemetry. > - Code cleanup suggested by Andy S. > - x86 mailing list added. > > Changes from V1: > - In the telemetry driver, set the device in device_create() to > the parent PCI device (the monitoring device) for clear > association in sysfs. Was set before to the platform device > created by the PCI parent. > - Move telem struct into driver and delete unneeded header file. > - Start telem device numbering from 0 instead of 1. 1 was used > due to anticipated changes, no longer needed. > - Use helper macros suggested by Andy S. > - Rename class to pmt_telemetry, spelling out full name > - Move monitor device name defines to common header > - Coding style, spelling, and Makefile/MAINTAINERS ordering fixes > > David E. Box (3): > PCI: Add defines for Designated Vendor-Specific Extended Capability > mfd: Intel Platform Monitoring Technology support > platform/x86: Intel PMT Telemetry capability driver > > .../ABI/testing/sysfs-class-pmt_telemetry | 46 ++ > MAINTAINERS | 6 + > drivers/mfd/Kconfig | 10 + > drivers/mfd/Makefile | 1 + > drivers/mfd/intel_pmt.c | 215 +++++++++ > drivers/platform/x86/Kconfig | 10 + > drivers/platform/x86/Makefile | 1 + > drivers/platform/x86/intel_pmt_telemetry.c | 448 ++++++++++++++++++ > include/uapi/linux/pci_regs.h | 5 + > 9 files changed, 742 insertions(+) > create mode 100644 Documentation/ABI/testing/sysfs-class-pmt_telemetry > create mode 100644 drivers/mfd/intel_pmt.c > create mode 100644 drivers/platform/x86/intel_pmt_telemetry.c > > -- > 2.20.1 > -- With Best Regards, Andy Shevchenko
On Mon, 2020-07-27 at 13:23 +0300, Andy Shevchenko wrote:
> On Fri, Jul 17, 2020 at 10:05 PM David E. Box
> <david.e.box@linux.intel.com> wrote:
> > Intel Platform Monitoring Technology (PMT) is an architecture for
> > enumerating and accessing hardware monitoring capabilities on a
> > device.
> > With customers increasingly asking for hardware telemetry,
> > engineers not
> > only have to figure out how to measure and collect data, but also
> > how to
> > deliver it and make it discoverable. The latter may be through some
> > device
> > specific method requiring device specific tools to collect the
> > data. This
> > in turn requires customers to manage a suite of different tools in
> > order to
> > collect the differing assortment of monitoring data on their
> > systems. Even
> > when such information can be provided in kernel drivers, they may
> > require
> > constant maintenance to update register mappings as they change
> > with
> > firmware updates and new versions of hardware. PMT provides a
> > solution for
> > discovering and reading telemetry from a device through a hardware
> > agnostic
> > framework that allows for updates to systems without requiring
> > patches to
> > the kernel or software tools.
> >
> > PMT defines several capabilities to support collecting monitoring
> > data from
> > hardware. All are discoverable as separate instances of the PCIE
> > Designated
> > Vendor extended capability (DVSEC) with the Intel vendor code. The
> > DVSEC ID
> > field uniquely identifies the capability. Each DVSEC also provides
> > a BAR
> > offset to a header that defines capability-specific attributes,
> > including
> > GUID, feature type, offset and length, as well as configuration
> > settings
> > where applicable. The GUID uniquely identifies the register space
> > of any
> > monitor data exposed by the capability. The GUID is associated with
> > an XML
> > file from the vendor that describes the mapping of the register
> > space along
> > with properties of the monitor data. This allows vendors to perform
> > firmware updates that can change the mapping (e.g. add new metrics)
> > without
> > requiring any changes to drivers or software tools. The new mapping
> > is
> > confirmed by an updated GUID, read from the hardware, which
> > software uses
> > with a new XML.
> >
> > The current capabilities defined by PMT are Telemetry, Watcher, and
> > Crashlog. The Telemetry capability provides access to a continuous
> > block
> > of read only data. The Watcher capability provides access to
> > hardware
> > sampling and tracing features. Crashlog provides access to device
> > crash
> > dumps. While there is some relationship between capabilities
> > (Watcher can
> > be configured to sample from the Telemetry data set) each exists as
> > stand
> > alone features with no dependency on any other. The design
> > therefore splits
> > them into individual, capability specific drivers. MFD is used to
> > create
> > platform devices for each capability so that they may be managed by
> > their
> > own driver. The PMT architecture is (for the most part) agnostic to
> > the
> > type of device it can collect from. Devices nodes are consequently
> > generic
> > in naming, e.g. /dev/telem<n> and /dev/smplr<n>. Each capability
> > driver
> > creates a class to manage the list of devices supporting
> > it. Software can
> > determine which devices support a PMT feature by searching through
> > each
> > device node entry in the sysfs class folder. It can additionally
> > determine
> > if a particular device supports a PMT feature by checking for a PMT
> > class
> > folder in the device folder.
> >
> > This patch set provides support for the PMT framework, along with
> > support
> > for Telemetry on Tiger Lake.
> >
>
> I assume this goes thru MFD tree.
Yes, looking for pull by MFD. Thanks Andy.
On Fri, 17 Jul 2020, David E. Box wrote: > Intel Platform Monitoring Technology (PMT) is an architecture for > enumerating and accessing hardware monitoring facilities. PMT supports > multiple types of monitoring capabilities. This driver creates platform > devices for each type so that they may be managed by capability specific > drivers (to be introduced). Capabilities are discovered using PCIe DVSEC > ids. Support is included for the 3 current capability types, Telemetry, > Watcher, and Crashlog. The features are available on new Intel platforms > starting from Tiger Lake for which support is added. > > Also add a quirk mechanism for several early hardware differences and bugs. > For Tiger Lake, do not support Watcher and Crashlog capabilities since they > will not be compatible with future product. Also, fix use a quirk to fix > the discovery table offset. > > Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com> > Co-developed-by: Alexander Duyck <alexander.h.duyck@linux.intel.com> > Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com> > Signed-off-by: David E. Box <david.e.box@linux.intel.com> This should be in chronological order. > --- > MAINTAINERS | 5 + > drivers/mfd/Kconfig | 10 ++ > drivers/mfd/Makefile | 1 + > drivers/mfd/intel_pmt.c | 215 ++++++++++++++++++++++++++++++++++++++++ > 4 files changed, 231 insertions(+) > create mode 100644 drivers/mfd/intel_pmt.c > > diff --git a/MAINTAINERS b/MAINTAINERS > index b4a43a9e7fbc..2e42bf0c41ab 100644 > --- a/MAINTAINERS > +++ b/MAINTAINERS > @@ -8845,6 +8845,11 @@ F: drivers/mfd/intel_soc_pmic* > F: include/linux/mfd/intel_msic.h > F: include/linux/mfd/intel_soc_pmic* > > +INTEL PMT DRIVER > +M: "David E. Box" <david.e.box@linux.intel.com> > +S: Maintained > +F: drivers/mfd/intel_pmt.c > + > INTEL PRO/WIRELESS 2100, 2200BG, 2915ABG NETWORK CONNECTION SUPPORT > M: Stanislav Yakovlev <stas.yakovlev@gmail.com> > L: linux-wireless@vger.kernel.org > diff --git a/drivers/mfd/Kconfig b/drivers/mfd/Kconfig > index a37d7d171382..1a62ce2c68d9 100644 > --- a/drivers/mfd/Kconfig > +++ b/drivers/mfd/Kconfig > @@ -670,6 +670,16 @@ config MFD_INTEL_PMC_BXT > Register and P-unit access. In addition this creates devices > for iTCO watchdog and telemetry that are part of the PMC. > > +config MFD_INTEL_PMT > + tristate "Intel Platform Monitoring Technology support" Nit: "Intel Platform Monitoring Technology (PMT) support" > + depends on PCI > + select MFD_CORE > + help > + The Intel Platform Monitoring Technology (PMT) is an interface that > + provides access to hardware monitor registers. This driver supports > + Telemetry, Watcher, and Crashlog PMT capabilities/devices for > + platforms starting from Tiger Lake. > + > config MFD_IPAQ_MICRO > bool "Atmel Micro ASIC (iPAQ h3100/h3600/h3700) Support" > depends on SA1100_H3100 || SA1100_H3600 > diff --git a/drivers/mfd/Makefile b/drivers/mfd/Makefile > index 9367a92f795a..1961b4737985 100644 > --- a/drivers/mfd/Makefile > +++ b/drivers/mfd/Makefile > @@ -216,6 +216,7 @@ obj-$(CONFIG_MFD_INTEL_LPSS_PCI) += intel-lpss-pci.o > obj-$(CONFIG_MFD_INTEL_LPSS_ACPI) += intel-lpss-acpi.o > obj-$(CONFIG_MFD_INTEL_MSIC) += intel_msic.o > obj-$(CONFIG_MFD_INTEL_PMC_BXT) += intel_pmc_bxt.o > +obj-$(CONFIG_MFD_INTEL_PMT) += intel_pmt.o > obj-$(CONFIG_MFD_PALMAS) += palmas.o > obj-$(CONFIG_MFD_VIPERBOARD) += viperboard.o > obj-$(CONFIG_MFD_RC5T583) += rc5t583.o rc5t583-irq.o > diff --git a/drivers/mfd/intel_pmt.c b/drivers/mfd/intel_pmt.c > new file mode 100644 > index 000000000000..6857eaf4ff86 > --- /dev/null > +++ b/drivers/mfd/intel_pmt.c > @@ -0,0 +1,215 @@ > +// SPDX-License-Identifier: GPL-2.0 > +/* > + * Intel Platform Monitoring Technology MFD driver s/MFD/(PMT)/ > + * Copyright (c) 2020, Intel Corporation. > + * All Rights Reserved. > + * > + * Authors: David E. Box <david.e.box@linux.intel.com> Looks odd to use a plural for a single author. > + */ > + > +#include <linux/bits.h> > +#include <linux/kernel.h> > +#include <linux/module.h> > +#include <linux/pci.h> > +#include <linux/platform_device.h> > +#include <linux/pm.h> > +#include <linux/pm_runtime.h> > +#include <linux/mfd/core.h> > +#include <linux/types.h> Alphabetical please. > +/* Intel DVSEC capability vendor space offsets */ > +#define INTEL_DVSEC_ENTRIES 0xA > +#define INTEL_DVSEC_SIZE 0xB > +#define INTEL_DVSEC_TABLE 0xC > +#define INTEL_DVSEC_TABLE_BAR(x) ((x) & GENMASK(2, 0)) > +#define INTEL_DVSEC_TABLE_OFFSET(x) ((x) & GENMASK(31, 3)) > +#define INTEL_DVSEC_ENTRY_SIZE 4 > + > +/* PMT capabilities */ > +#define DVSEC_INTEL_ID_TELEMETRY 2 > +#define DVSEC_INTEL_ID_WATCHER 3 > +#define DVSEC_INTEL_ID_CRASHLOG 4 > + > +#define TELEMETRY_DEV_NAME "pmt_telemetry" > +#define WATCHER_DEV_NAME "pmt_watcher" > +#define CRASHLOG_DEV_NAME "pmt_crashlog" Please don't define names of things. It makes grepping a pain, at the very least. Just use the 'raw' string in-place. > +struct intel_dvsec_header { > + u16 length; > + u16 id; > + u8 num_entries; > + u8 entry_size; > + u8 tbir; > + u32 offset; > +}; > + > +enum pmt_quirks { > + /* Watcher capability not supported */ > + PMT_QUIRK_NO_WATCHER = BIT(0), > + > + /* Crashlog capability not supported */ > + PMT_QUIRK_NO_CRASHLOG = BIT(1), > + > + /* Use shift instead of mask to read discovery table offset */ > + PMT_QUIRK_TABLE_SHIFT = BIT(2), > +}; > + > +struct pmt_platform_info { > + unsigned long quirks; > +}; > + > +static const struct pmt_platform_info tgl_info = { > + .quirks = PMT_QUIRK_NO_WATCHER | PMT_QUIRK_NO_CRASHLOG | > + PMT_QUIRK_TABLE_SHIFT, > +}; > + > +static int > +pmt_add_dev(struct pci_dev *pdev, struct intel_dvsec_header *header, > + struct pmt_platform_info *info) My personal preference is to a) only break when you have to and b) to align with the '('. Perhaps point b) is satisfied and it's just the patch format that's shifting the tab though? > +{ > + struct device *dev = &pdev->dev; > + struct resource *res, *tmp; > + struct mfd_cell *cell; > + const char *name; > + int count = header->num_entries; > + int size = header->entry_size; > + int i; > + > + switch (header->id) { > + case DVSEC_INTEL_ID_TELEMETRY: > + name = TELEMETRY_DEV_NAME; > + break; > + case DVSEC_INTEL_ID_WATCHER: > + if (info->quirks & PMT_QUIRK_NO_WATCHER) { > + dev_info(dev, "Watcher not supported\n"); > + return 0; > + } > + name = WATCHER_DEV_NAME; > + break; > + case DVSEC_INTEL_ID_CRASHLOG: > + if (info->quirks & PMT_QUIRK_NO_CRASHLOG) { > + dev_info(dev, "Crashlog not supported\n"); > + return 0; > + } > + name = CRASHLOG_DEV_NAME; > + break; > + default: > + return -EINVAL; Doesn't deserve an error message? > + } > + > + if (!header->num_entries || !header->entry_size) { > + dev_warn(dev, "Invalid count or size for %s header\n", name); > + return -EINVAL; If you're returning an error, this should be dev_err(). Even if you only handle it as a warning at the call site. > + } > + > + cell = devm_kzalloc(dev, sizeof(*cell), GFP_KERNEL); > + if (!cell) > + return -ENOMEM; > + > + res = devm_kcalloc(dev, count, sizeof(*res), GFP_KERNEL); > + if (!res) > + return -ENOMEM; > + > + if (info->quirks & PMT_QUIRK_TABLE_SHIFT) > + header->offset >>= 3; > + > + for (i = 0, tmp = res; i < count; i++, tmp++) { > + tmp->start = pdev->resource[header->tbir].start + > + header->offset + i * (size << 2); Deserves a comment I think. > + tmp->end = tmp->start + (size << 2) - 1; > + tmp->flags = IORESOURCE_MEM; > + } > + > + cell->resources = res; > + cell->num_resources = count; > + cell->name = name; > + > + return devm_mfd_add_devices(dev, PLATFORM_DEVID_AUTO, cell, 1, NULL, 0, > + NULL); > +} > + > +static int > +pmt_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id) > +{ > + struct intel_dvsec_header header; > + struct pmt_platform_info *info; > + bool found_devices = false; > + int ret, pos = 0; > + u32 table; > + u16 vid; > + > + ret = pcim_enable_device(pdev); > + if (ret) > + return ret; > + > + info = devm_kmemdup(&pdev->dev, (void *)id->driver_data, sizeof(*info), > + GFP_KERNEL); > + if (!info) > + return -ENOMEM; > + > + pos = pci_find_next_ext_capability(pdev, pos, PCI_EXT_CAP_ID_DVSEC); > + while (pos) { If you do: do { int pos; pos = pci_find_next_ext_capability(pdev, pos, PCI_EXT_CAP_ID_DVSEC); if (!pos) break; Then you can invoke pci_find_next_ext_capability() once, no? > + pci_read_config_word(pdev, pos + PCI_DVSEC_HEADER1, &vid); > + if (vid != PCI_VENDOR_ID_INTEL) > + continue; > + > + pci_read_config_word(pdev, pos + PCI_DVSEC_HEADER2, > + &header.id); > + pci_read_config_byte(pdev, pos + INTEL_DVSEC_ENTRIES, > + &header.num_entries); > + pci_read_config_byte(pdev, pos + INTEL_DVSEC_SIZE, > + &header.entry_size); > + pci_read_config_dword(pdev, pos + INTEL_DVSEC_TABLE, > + &table); > + > + header.tbir = INTEL_DVSEC_TABLE_BAR(table); > + header.offset = INTEL_DVSEC_TABLE_OFFSET(table); > + > + ret = pmt_add_dev(pdev, &header, info); > + if (ret) > + dev_warn(&pdev->dev, > + "Failed to add devices for DVSEC id %d\n", "device", so not all devices, right? > + header.id); Don't you want to continue here? Else you're going to set found_devices for a failed device. > + found_devices = true; > + > + pos = pci_find_next_ext_capability(pdev, pos, > + PCI_EXT_CAP_ID_DVSEC); > + } > + > + if (!found_devices) { > + dev_err(&pdev->dev, "No supported PMT capabilities found.\n"); > + return -ENODEV; > + } > + > + pm_runtime_put(&pdev->dev); > + pm_runtime_allow(&pdev->dev); > + > + return 0; > +} > + > +static void pmt_pci_remove(struct pci_dev *pdev) > +{ > + pm_runtime_forbid(&pdev->dev); > + pm_runtime_get_sync(&pdev->dev); > +} > + > +#define PCI_DEVICE_ID_INTEL_PMT_TGL 0x9a0d What's this for? If this is PCI_DEVICE_DATA magic, it would be worth tying it to the struct i.e. remove the empty line between it and the table below. > +static const struct pci_device_id pmt_pci_ids[] = { > + { PCI_DEVICE_DATA(INTEL, PMT_TGL, &tgl_info) }, > + { } > +}; > +MODULE_DEVICE_TABLE(pci, pmt_pci_ids); > + > +static struct pci_driver pmt_pci_driver = { > + .name = "intel-pmt", > + .id_table = pmt_pci_ids, > + .probe = pmt_pci_probe, > + .remove = pmt_pci_remove, > +}; > +module_pci_driver(pmt_pci_driver); > + > +MODULE_AUTHOR("David E. Box <david.e.box@linux.intel.com>"); > +MODULE_DESCRIPTION("Intel Platform Monitoring Technology MFD driver"); s/MFD/(PMT)/ > +MODULE_LICENSE("GPL v2"); -- Lee Jones [李琼斯] Senior Technical Lead - Developer Services Linaro.org │ Open source software for Arm SoCs Follow Linaro: Facebook | Twitter | Blog
Hi Lee,
Thanks for this thorough review. Ack on all the comments with
particular thanks for spoting the missing continue.
David
On Tue, 2020-07-28 at 08:58 +0100, Lee Jones wrote:
> On Fri, 17 Jul 2020, David E. Box wrote:
>
> > Intel Platform Monitoring Technology (PMT) is an architecture for
> > enumerating and accessing hardware monitoring facilities. PMT
> > supports
> > multiple types of monitoring capabilities. This driver creates
> > platform
> > devices for each type so that they may be managed by capability
> > specific
> > drivers (to be introduced). Capabilities are discovered using PCIe
> > DVSEC
> > ids. Support is included for the 3 current capability types,
> > Telemetry,
> > Watcher, and Crashlog. The features are available on new Intel
> > platforms
> > starting from Tiger Lake for which support is added.
> >
> > Also add a quirk mechanism for several early hardware differences
> > and bugs.
> > For Tiger Lake, do not support Watcher and Crashlog capabilities
> > since they
> > will not be compatible with future product. Also, fix use a quirk
> > to fix
> > the discovery table offset.
> >
> > Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
> > Co-developed-by: Alexander Duyck <alexander.h.duyck@linux.intel.com
> > >
> > Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
> > Signed-off-by: David E. Box <david.e.box@linux.intel.com>
>
> This should be in chronological order.
>
> > ---
> > MAINTAINERS | 5 +
> > drivers/mfd/Kconfig | 10 ++
> > drivers/mfd/Makefile | 1 +
> > drivers/mfd/intel_pmt.c | 215
> > ++++++++++++++++++++++++++++++++++++++++
> > 4 files changed, 231 insertions(+)
> > create mode 100644 drivers/mfd/intel_pmt.c
> >
> > diff --git a/MAINTAINERS b/MAINTAINERS
> > index b4a43a9e7fbc..2e42bf0c41ab 100644
> > --- a/MAINTAINERS
> > +++ b/MAINTAINERS
> > @@ -8845,6 +8845,11 @@ F: drivers/mfd/intel_soc_pmic*
> > F: include/linux/mfd/intel_msic.h
> > F: include/linux/mfd/intel_soc_pmic*
> >
> > +INTEL PMT DRIVER
> > +M: "David E. Box" <david.e.box@linux.intel.com>
> > +S: Maintained
> > +F: drivers/mfd/intel_pmt.c
> > +
> > INTEL PRO/WIRELESS 2100, 2200BG, 2915ABG NETWORK CONNECTION
> > SUPPORT
> > M: Stanislav Yakovlev <stas.yakovlev@gmail.com>
> > L: linux-wireless@vger.kernel.org
> > diff --git a/drivers/mfd/Kconfig b/drivers/mfd/Kconfig
> > index a37d7d171382..1a62ce2c68d9 100644
> > --- a/drivers/mfd/Kconfig
> > +++ b/drivers/mfd/Kconfig
> > @@ -670,6 +670,16 @@ config MFD_INTEL_PMC_BXT
> > Register and P-unit access. In addition this creates devices
> > for iTCO watchdog and telemetry that are part of the PMC.
> >
> > +config MFD_INTEL_PMT
> > + tristate "Intel Platform Monitoring Technology support"
>
> Nit: "Intel Platform Monitoring Technology (PMT) support"
>
> > + depends on PCI
> > + select MFD_CORE
> > + help
> > + The Intel Platform Monitoring Technology (PMT) is an
> > interface that
> > + provides access to hardware monitor registers. This driver
> > supports
> > + Telemetry, Watcher, and Crashlog PMT capabilities/devices for
> > + platforms starting from Tiger Lake.
> > +
> > config MFD_IPAQ_MICRO
> > bool "Atmel Micro ASIC (iPAQ h3100/h3600/h3700) Support"
> > depends on SA1100_H3100 || SA1100_H3600
> > diff --git a/drivers/mfd/Makefile b/drivers/mfd/Makefile
> > index 9367a92f795a..1961b4737985 100644
> > --- a/drivers/mfd/Makefile
> > +++ b/drivers/mfd/Makefile
> > @@ -216,6 +216,7 @@ obj-$(CONFIG_MFD_INTEL_LPSS_PCI) +=
> > intel-lpss-pci.o
> > obj-$(CONFIG_MFD_INTEL_LPSS_ACPI) += intel-lpss-acpi.o
> > obj-$(CONFIG_MFD_INTEL_MSIC) += intel_msic.o
> > obj-$(CONFIG_MFD_INTEL_PMC_BXT) += intel_pmc_bxt.o
> > +obj-$(CONFIG_MFD_INTEL_PMT) += intel_pmt.o
> > obj-$(CONFIG_MFD_PALMAS) += palmas.o
> > obj-$(CONFIG_MFD_VIPERBOARD) += viperboard.o
> > obj-$(CONFIG_MFD_RC5T583) += rc5t583.o rc5t583-irq.o
> > diff --git a/drivers/mfd/intel_pmt.c b/drivers/mfd/intel_pmt.c
> > new file mode 100644
> > index 000000000000..6857eaf4ff86
> > --- /dev/null
> > +++ b/drivers/mfd/intel_pmt.c
> > @@ -0,0 +1,215 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +/*
> > + * Intel Platform Monitoring Technology MFD driver
>
> s/MFD/(PMT)/
>
> > + * Copyright (c) 2020, Intel Corporation.
> > + * All Rights Reserved.
> > + *
> > + * Authors: David E. Box <david.e.box@linux.intel.com>
>
> Looks odd to use a plural for a single author.
>
> > + */
> > +
> > +#include <linux/bits.h>
> > +#include <linux/kernel.h>
> > +#include <linux/module.h>
> > +#include <linux/pci.h>
> > +#include <linux/platform_device.h>
> > +#include <linux/pm.h>
> > +#include <linux/pm_runtime.h>
> > +#include <linux/mfd/core.h>
> > +#include <linux/types.h>
>
> Alphabetical please.
>
> > +/* Intel DVSEC capability vendor space offsets */
> > +#define INTEL_DVSEC_ENTRIES 0xA
> > +#define INTEL_DVSEC_SIZE 0xB
> > +#define INTEL_DVSEC_TABLE 0xC
> > +#define INTEL_DVSEC_TABLE_BAR(x) ((x) & GENMASK(2, 0))
> > +#define INTEL_DVSEC_TABLE_OFFSET(x) ((x) & GENMASK(31, 3))
> > +#define INTEL_DVSEC_ENTRY_SIZE 4
> > +
> > +/* PMT capabilities */
> > +#define DVSEC_INTEL_ID_TELEMETRY 2
> > +#define DVSEC_INTEL_ID_WATCHER 3
> > +#define DVSEC_INTEL_ID_CRASHLOG 4
> > +
> > +#define TELEMETRY_DEV_NAME "pmt_telemetry"
> > +#define WATCHER_DEV_NAME "pmt_watcher"
> > +#define CRASHLOG_DEV_NAME "pmt_crashlog"
>
> Please don't define names of things. It makes grepping a pain, at
> the
> very least. Just use the 'raw' string in-place.
>
> > +struct intel_dvsec_header {
> > + u16 length;
> > + u16 id;
> > + u8 num_entries;
> > + u8 entry_size;
> > + u8 tbir;
> > + u32 offset;
> > +};
> > +
> > +enum pmt_quirks {
> > + /* Watcher capability not supported */
> > + PMT_QUIRK_NO_WATCHER = BIT(0),
> > +
> > + /* Crashlog capability not supported */
> > + PMT_QUIRK_NO_CRASHLOG = BIT(1),
> > +
> > + /* Use shift instead of mask to read discovery table offset */
> > + PMT_QUIRK_TABLE_SHIFT = BIT(2),
> > +};
> > +
> > +struct pmt_platform_info {
> > + unsigned long quirks;
> > +};
> > +
> > +static const struct pmt_platform_info tgl_info = {
> > + .quirks = PMT_QUIRK_NO_WATCHER | PMT_QUIRK_NO_CRASHLOG |
> > + PMT_QUIRK_TABLE_SHIFT,
> > +};
> > +
> > +static int
> > +pmt_add_dev(struct pci_dev *pdev, struct intel_dvsec_header
> > *header,
> > + struct pmt_platform_info *info)
>
> My personal preference is to a) only break when you have to and b) to
> align with the '('. Perhaps point b) is satisfied and it's just the
> patch format that's shifting the tab though?
>
> > +{
> > + struct device *dev = &pdev->dev;
> > + struct resource *res, *tmp;
> > + struct mfd_cell *cell;
> > + const char *name;
> > + int count = header->num_entries;
> > + int size = header->entry_size;
> > + int i;
> > +
> > + switch (header->id) {
> > + case DVSEC_INTEL_ID_TELEMETRY:
> > + name = TELEMETRY_DEV_NAME;
> > + break;
> > + case DVSEC_INTEL_ID_WATCHER:
> > + if (info->quirks & PMT_QUIRK_NO_WATCHER) {
> > + dev_info(dev, "Watcher not supported\n");
> > + return 0;
> > + }
> > + name = WATCHER_DEV_NAME;
> > + break;
> > + case DVSEC_INTEL_ID_CRASHLOG:
> > + if (info->quirks & PMT_QUIRK_NO_CRASHLOG) {
> > + dev_info(dev, "Crashlog not supported\n");
> > + return 0;
> > + }
> > + name = CRASHLOG_DEV_NAME;
> > + break;
> > + default:
> > + return -EINVAL;
>
> Doesn't deserve an error message?
>
> > + }
> > +
> > + if (!header->num_entries || !header->entry_size) {
> > + dev_warn(dev, "Invalid count or size for %s header\n",
> > name);
> > + return -EINVAL;
>
> If you're returning an error, this should be dev_err().
>
> Even if you only handle it as a warning at the call site.
>
> > + }
> > +
> > + cell = devm_kzalloc(dev, sizeof(*cell), GFP_KERNEL);
> > + if (!cell)
> > + return -ENOMEM;
> > +
> > + res = devm_kcalloc(dev, count, sizeof(*res), GFP_KERNEL);
> > + if (!res)
> > + return -ENOMEM;
> > +
> > + if (info->quirks & PMT_QUIRK_TABLE_SHIFT)
> > + header->offset >>= 3;
> > +
> > + for (i = 0, tmp = res; i < count; i++, tmp++) {
> > + tmp->start = pdev->resource[header->tbir].start +
> > + header->offset + i * (size << 2);
>
> Deserves a comment I think.
>
> > + tmp->end = tmp->start + (size << 2) - 1;
> > + tmp->flags = IORESOURCE_MEM;
> > + }
> > +
> > + cell->resources = res;
> > + cell->num_resources = count;
> > + cell->name = name;
> > +
> > + return devm_mfd_add_devices(dev, PLATFORM_DEVID_AUTO, cell, 1,
> > NULL, 0,
> > + NULL);
> > +}
> > +
> > +static int
> > +pmt_pci_probe(struct pci_dev *pdev, const struct pci_device_id
> > *id)
> > +{
> > + struct intel_dvsec_header header;
> > + struct pmt_platform_info *info;
> > + bool found_devices = false;
> > + int ret, pos = 0;
> > + u32 table;
> > + u16 vid;
> > +
> > + ret = pcim_enable_device(pdev);
> > + if (ret)
> > + return ret;
> > +
> > + info = devm_kmemdup(&pdev->dev, (void *)id->driver_data,
> > sizeof(*info),
> > + GFP_KERNEL);
> > + if (!info)
> > + return -ENOMEM;
> > +
> > + pos = pci_find_next_ext_capability(pdev, pos,
> > PCI_EXT_CAP_ID_DVSEC);
> > + while (pos) {
>
> If you do:
>
> do {
> int pos;
>
> pos = pci_find_next_ext_capability(pdev, pos,
> PCI_EXT_CAP_ID_DVSEC);
> if (!pos)
> break;
>
> Then you can invoke pci_find_next_ext_capability() once, no?
>
> > + pci_read_config_word(pdev, pos + PCI_DVSEC_HEADER1,
> > &vid);
> > + if (vid != PCI_VENDOR_ID_INTEL)
> > + continue;
> > +
> > + pci_read_config_word(pdev, pos + PCI_DVSEC_HEADER2,
> > + &header.id);
> > + pci_read_config_byte(pdev, pos + INTEL_DVSEC_ENTRIES,
> > + &header.num_entries);
> > + pci_read_config_byte(pdev, pos + INTEL_DVSEC_SIZE,
> > + &header.entry_size);
> > + pci_read_config_dword(pdev, pos + INTEL_DVSEC_TABLE,
> > + &table);
> > +
> > + header.tbir = INTEL_DVSEC_TABLE_BAR(table);
> > + header.offset = INTEL_DVSEC_TABLE_OFFSET(table);
> > +
> > + ret = pmt_add_dev(pdev, &header, info);
> > + if (ret)
> > + dev_warn(&pdev->dev,
> > + "Failed to add devices for DVSEC id
> > %d\n",
>
> "device", so not all devices, right?
>
> > + header.id);
>
> Don't you want to continue here?
>
> Else you're going to set found_devices for a failed device.
>
> > + found_devices = true;
> > +
> > + pos = pci_find_next_ext_capability(pdev, pos,
> > + PCI_EXT_CAP_ID_DVSEC
> > );
> > + }
> > +
> > + if (!found_devices) {
> > + dev_err(&pdev->dev, "No supported PMT capabilities
> > found.\n");
> > + return -ENODEV;
> > + }
> > +
> > + pm_runtime_put(&pdev->dev);
> > + pm_runtime_allow(&pdev->dev);
> > +
> > + return 0;
> > +}
> > +
> > +static void pmt_pci_remove(struct pci_dev *pdev)
> > +{
> > + pm_runtime_forbid(&pdev->dev);
> > + pm_runtime_get_sync(&pdev->dev);
> > +}
> > +
> > +#define PCI_DEVICE_ID_INTEL_PMT_TGL 0x9a0d
>
> What's this for?
>
> If this is PCI_DEVICE_DATA magic, it would be worth tying it to the
> struct i.e. remove the empty line between it and the table below.
>
> > +static const struct pci_device_id pmt_pci_ids[] = {
> > + { PCI_DEVICE_DATA(INTEL, PMT_TGL, &tgl_info) },
> > + { }
> > +};
> > +MODULE_DEVICE_TABLE(pci, pmt_pci_ids);
> > +
> > +static struct pci_driver pmt_pci_driver = {
> > + .name = "intel-pmt",
> > + .id_table = pmt_pci_ids,
> > + .probe = pmt_pci_probe,
> > + .remove = pmt_pci_remove,
> > +};
> > +module_pci_driver(pmt_pci_driver);
> > +
> > +MODULE_AUTHOR("David E. Box <david.e.box@linux.intel.com>");
> > +MODULE_DESCRIPTION("Intel Platform Monitoring Technology MFD
> > driver");
>
> s/MFD/(PMT)/
>
> > +MODULE_LICENSE("GPL v2");
Intel Platform Monitoring Technology (PMT) is an architecture for enumerating and accessing hardware monitoring capabilities on a device. With customers increasingly asking for hardware telemetry, engineers not only have to figure out how to measure and collect data, but also how to deliver it and make it discoverable. The latter may be through some device specific method requiring device specific tools to collect the data. This in turn requires customers to manage a suite of different tools in order to collect the differing assortment of monitoring data on their systems. Even when such information can be provided in kernel drivers, they may require constant maintenance to update register mappings as they change with firmware updates and new versions of hardware. PMT provides a solution for discovering and reading telemetry from a device through a hardware agnostic framework that allows for updates to systems without requiring patches to the kernel or software tools. PMT defines several capabilities to support collecting monitoring data from hardware. All are discoverable as separate instances of the PCIE Designated Vendor extended capability (DVSEC) with the Intel vendor code. The DVSEC ID field uniquely identifies the capability. Each DVSEC also provides a BAR offset to a header that defines capability-specific attributes, including GUID, feature type, offset and length, as well as configuration settings where applicable. The GUID uniquely identifies the register space of any monitor data exposed by the capability. The GUID is associated with an XML file from the vendor that describes the mapping of the register space along with properties of the monitor data. This allows vendors to perform firmware updates that can change the mapping (e.g. add new metrics) without requiring any changes to drivers or software tools. The new mapping is confirmed by an updated GUID, read from the hardware, which software uses with a new XML. The current capabilities defined by PMT are Telemetry, Watcher, and Crashlog. The Telemetry capability provides access to a continuous block of read only data. The Watcher capability provides access to hardware sampling and tracing features. Crashlog provides access to device crash dumps. While there is some relationship between capabilities (Watcher can be configured to sample from the Telemetry data set) each exists as stand alone features with no dependency on any other. The design therefore splits them into individual, capability specific drivers. MFD is used to create platform devices for each capability so that they may be managed by their own driver. The PMT architecture is (for the most part) agnostic to the type of device it can collect from. Devices nodes are consequently generic in naming, e.g. /dev/telem<n> and /dev/smplr<n>. Each capability driver creates a class to manage the list of devices supporting it. Software can determine which devices support a PMT feature by searching through each device node entry in the sysfs class folder. It can additionally determine if a particular device supports a PMT feature by checking for a PMT class folder in the device folder. This patch set provides support for the PMT framework, along with support for Telemetry on Tiger Lake. Changes from V4: - Replace MFD with PMT in driver title - Fix commit tags in chronological order - Fix includes in alphabetical order - Use 'raw' string instead of defines for device names - Add an error message when returning an error code for unrecognized capability id - Use dev_err instead of dev_warn for messages when returning an error - Change while loop to call pci_find_next_ext_capability once - Add missing continue in while loop - Keep PCI platform defines using PCI_DEVICE_DATA magic tied to the pci_device_id table - Comment and kernel message cleanup Changes from V3: - Write out full acronym for DVSEC in PCI patch commit message and add 'Designated' to comments - remove unused variable caught by kernel test robot <lkp@intel.com> - Add required Co-developed-by signoffs, noted by Andy - Allow access using new CAP_PERFMON capability as suggested by Alexey Bundankov - Fix spacing in Kconfig, noted by Randy - Other style changes and fixups suggested by Andy Changes from V2: - In order to handle certain HW bugs from the telemetry capability driver, create a single platform device per capability instead of a device per entry. Add the entry data as device resources and let the capability driver manage them as a set allowing for cleaner HW bug resolution. - Handle discovery table offset bug in intel_pmt.c - Handle overlapping regions in intel_pmt_telemetry.c - Add description of sysfs class to testing ABI. - Don't check size and count until confirming support for the PMT capability to avoid bailing out when we need to skip it. - Remove unneeded header file. Move code to the intel_pmt.c, the only place where it's needed. - Remove now unused platform data. - Add missing header files types.h, bits.h. - Rename file name and build options from telem to telemetry. - Code cleanup suggested by Andy S. - x86 mailing list added. Changes from V1: - In the telemetry driver, set the device in device_create() to the parent PCI device (the monitoring device) for clear association in sysfs. Was set before to the platform device created by the PCI parent. - Move telem struct into driver and delete unneeded header file. - Start telem device numbering from 0 instead of 1. 1 was used due to anticipated changes, no longer needed. - Use helper macros suggested by Andy S. - Rename class to pmt_telemetry, spelling out full name - Move monitor device name defines to common header - Coding style, spelling, and Makefile/MAINTAINERS ordering fixes David E. Box (3): PCI: Add defines for Designated Vendor-Specific Extended Capability mfd: Intel Platform Monitoring Technology support platform/x86: Intel PMT Telemetry capability driver .../ABI/testing/sysfs-class-pmt_telemetry | 46 ++ MAINTAINERS | 6 + drivers/mfd/Kconfig | 10 + drivers/mfd/Makefile | 1 + drivers/mfd/intel_pmt.c | 220 +++++++++ drivers/platform/x86/Kconfig | 10 + drivers/platform/x86/Makefile | 1 + drivers/platform/x86/intel_pmt_telemetry.c | 448 ++++++++++++++++++ include/uapi/linux/pci_regs.h | 5 + 9 files changed, 747 insertions(+) create mode 100644 Documentation/ABI/testing/sysfs-class-pmt_telemetry create mode 100644 drivers/mfd/intel_pmt.c create mode 100644 drivers/platform/x86/intel_pmt_telemetry.c -- 2.20.1
Add PCIe Designated Vendor-Specific Extended Capability (DVSEC) and defines for the header offsets. Defined in PCIe r5.0, sec 7.9.6. Signed-off-by: David E. Box <david.e.box@linux.intel.com> Acked-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com> --- include/uapi/linux/pci_regs.h | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/include/uapi/linux/pci_regs.h b/include/uapi/linux/pci_regs.h index f9701410d3b5..beafeee39e44 100644 --- a/include/uapi/linux/pci_regs.h +++ b/include/uapi/linux/pci_regs.h @@ -720,6 +720,7 @@ #define PCI_EXT_CAP_ID_DPC 0x1D /* Downstream Port Containment */ #define PCI_EXT_CAP_ID_L1SS 0x1E /* L1 PM Substates */ #define PCI_EXT_CAP_ID_PTM 0x1F /* Precision Time Measurement */ +#define PCI_EXT_CAP_ID_DVSEC 0x23 /* Designated Vendor-Specific */ #define PCI_EXT_CAP_ID_DLF 0x25 /* Data Link Feature */ #define PCI_EXT_CAP_ID_PL_16GT 0x26 /* Physical Layer 16.0 GT/s */ #define PCI_EXT_CAP_ID_MAX PCI_EXT_CAP_ID_PL_16GT @@ -1062,6 +1063,10 @@ #define PCI_L1SS_CTL1_LTR_L12_TH_SCALE 0xe0000000 /* LTR_L1.2_THRESHOLD_Scale */ #define PCI_L1SS_CTL2 0x0c /* Control 2 Register */ +/* Designated Vendor-Specific (DVSEC, PCI_EXT_CAP_ID_DVSEC) */ +#define PCI_DVSEC_HEADER1 0x4 /* Designated Vendor-Specific Header1 */ +#define PCI_DVSEC_HEADER2 0x8 /* Designated Vendor-Specific Header2 */ + /* Data Link Feature */ #define PCI_DLF_CAP 0x04 /* Capabilities Register */ #define PCI_DLF_EXCHANGE_ENABLE 0x80000000 /* Data Link Feature Exchange Enable */ -- 2.20.1
Intel Platform Monitoring Technology (PMT) is an architecture for enumerating and accessing hardware monitoring facilities. PMT supports multiple types of monitoring capabilities. This driver creates platform devices for each type so that they may be managed by capability specific drivers (to be introduced). Capabilities are discovered using PCIe DVSEC ids. Support is included for the 3 current capability types, Telemetry, Watcher, and Crashlog. The features are available on new Intel platforms starting from Tiger Lake for which support is added. Also add a quirk mechanism for several early hardware differences and bugs. For Tiger Lake, do not support Watcher and Crashlog capabilities since they will not be compatible with future product. Also, fix use a quirk to fix the discovery table offset. Co-developed-by: Alexander Duyck <alexander.h.duyck@linux.intel.com> Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com> Signed-off-by: David E. Box <david.e.box@linux.intel.com> Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com> --- MAINTAINERS | 5 + drivers/mfd/Kconfig | 10 ++ drivers/mfd/Makefile | 1 + drivers/mfd/intel_pmt.c | 220 ++++++++++++++++++++++++++++++++++++++++ 4 files changed, 236 insertions(+) create mode 100644 drivers/mfd/intel_pmt.c diff --git a/MAINTAINERS b/MAINTAINERS index f0569cf304ca..b69429c70330 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -8845,6 +8845,11 @@ F: drivers/mfd/intel_soc_pmic* F: include/linux/mfd/intel_msic.h F: include/linux/mfd/intel_soc_pmic* +INTEL PMT DRIVER +M: "David E. Box" <david.e.box@linux.intel.com> +S: Maintained +F: drivers/mfd/intel_pmt.c + INTEL PRO/WIRELESS 2100, 2200BG, 2915ABG NETWORK CONNECTION SUPPORT M: Stanislav Yakovlev <stas.yakovlev@gmail.com> L: linux-wireless@vger.kernel.org diff --git a/drivers/mfd/Kconfig b/drivers/mfd/Kconfig index a37d7d171382..5dd05f1b8ce5 100644 --- a/drivers/mfd/Kconfig +++ b/drivers/mfd/Kconfig @@ -670,6 +670,16 @@ config MFD_INTEL_PMC_BXT Register and P-unit access. In addition this creates devices for iTCO watchdog and telemetry that are part of the PMC. +config MFD_INTEL_PMT + tristate "Intel Platform Monitoring Technology (PMT) support" + depends on PCI + select MFD_CORE + help + The Intel Platform Monitoring Technology (PMT) is an interface that + provides access to hardware monitor registers. This driver supports + Telemetry, Watcher, and Crashlog PMT capabilities/devices for + platforms starting from Tiger Lake. + config MFD_IPAQ_MICRO bool "Atmel Micro ASIC (iPAQ h3100/h3600/h3700) Support" depends on SA1100_H3100 || SA1100_H3600 diff --git a/drivers/mfd/Makefile b/drivers/mfd/Makefile index 9367a92f795a..1961b4737985 100644 --- a/drivers/mfd/Makefile +++ b/drivers/mfd/Makefile @@ -216,6 +216,7 @@ obj-$(CONFIG_MFD_INTEL_LPSS_PCI) += intel-lpss-pci.o obj-$(CONFIG_MFD_INTEL_LPSS_ACPI) += intel-lpss-acpi.o obj-$(CONFIG_MFD_INTEL_MSIC) += intel_msic.o obj-$(CONFIG_MFD_INTEL_PMC_BXT) += intel_pmc_bxt.o +obj-$(CONFIG_MFD_INTEL_PMT) += intel_pmt.o obj-$(CONFIG_MFD_PALMAS) += palmas.o obj-$(CONFIG_MFD_VIPERBOARD) += viperboard.o obj-$(CONFIG_MFD_RC5T583) += rc5t583.o rc5t583-irq.o diff --git a/drivers/mfd/intel_pmt.c b/drivers/mfd/intel_pmt.c new file mode 100644 index 000000000000..0e572b105101 --- /dev/null +++ b/drivers/mfd/intel_pmt.c @@ -0,0 +1,220 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Intel Platform Monitoring Technology PMT driver + * + * Copyright (c) 2020, Intel Corporation. + * All Rights Reserved. + * + * Author: David E. Box <david.e.box@linux.intel.com> + */ + +#include <linux/bits.h> +#include <linux/kernel.h> +#include <linux/mfd/core.h> +#include <linux/module.h> +#include <linux/pci.h> +#include <linux/platform_device.h> +#include <linux/pm.h> +#include <linux/pm_runtime.h> +#include <linux/types.h> + +/* Intel DVSEC capability vendor space offsets */ +#define INTEL_DVSEC_ENTRIES 0xA +#define INTEL_DVSEC_SIZE 0xB +#define INTEL_DVSEC_TABLE 0xC +#define INTEL_DVSEC_TABLE_BAR(x) ((x) & GENMASK(2, 0)) +#define INTEL_DVSEC_TABLE_OFFSET(x) ((x) & GENMASK(31, 3)) +#define INTEL_DVSEC_ENTRY_SIZE 4 + +/* PMT capabilities */ +#define DVSEC_INTEL_ID_TELEMETRY 2 +#define DVSEC_INTEL_ID_WATCHER 3 +#define DVSEC_INTEL_ID_CRASHLOG 4 + +struct intel_dvsec_header { + u16 length; + u16 id; + u8 num_entries; + u8 entry_size; + u8 tbir; + u32 offset; +}; + +enum pmt_quirks { + /* Watcher capability not supported */ + PMT_QUIRK_NO_WATCHER = BIT(0), + + /* Crashlog capability not supported */ + PMT_QUIRK_NO_CRASHLOG = BIT(1), + + /* Use shift instead of mask to read discovery table offset */ + PMT_QUIRK_TABLE_SHIFT = BIT(2), +}; + +struct pmt_platform_info { + unsigned long quirks; +}; + +static const struct pmt_platform_info tgl_info = { + .quirks = PMT_QUIRK_NO_WATCHER | PMT_QUIRK_NO_CRASHLOG | + PMT_QUIRK_TABLE_SHIFT, +}; + +static int pmt_add_dev(struct pci_dev *pdev, struct intel_dvsec_header *header, + struct pmt_platform_info *info) +{ + struct device *dev = &pdev->dev; + struct resource *res, *tmp; + struct mfd_cell *cell; + const char *name; + int count = header->num_entries; + int size = header->entry_size; + int id = header->id; + int i; + + switch (id) { + case DVSEC_INTEL_ID_TELEMETRY: + name = "pmt_telemetry"; + break; + case DVSEC_INTEL_ID_WATCHER: + if (info->quirks & PMT_QUIRK_NO_WATCHER) { + dev_info(dev, "Watcher not supported\n"); + return 0; + } + name = "pmt_watcher"; + break; + case DVSEC_INTEL_ID_CRASHLOG: + if (info->quirks & PMT_QUIRK_NO_CRASHLOG) { + dev_info(dev, "Crashlog not supported\n"); + return 0; + } + name = "pmt_crashlog"; + break; + default: + dev_err(dev, "Unrecognized PMT capability: %d\n", id); + return -EINVAL; + } + + if (!header->num_entries || !header->entry_size) { + dev_err(dev, "Invalid count or size for %s header\n", name); + return -EINVAL; + } + + cell = devm_kzalloc(dev, sizeof(*cell), GFP_KERNEL); + if (!cell) + return -ENOMEM; + + res = devm_kcalloc(dev, count, sizeof(*res), GFP_KERNEL); + if (!res) + return -ENOMEM; + + if (info->quirks & PMT_QUIRK_TABLE_SHIFT) + header->offset >>= 3; + + /* + * The PMT DVSEC contains the starting offset and count for a block of + * discovery tables, each providing access to monitoring facilities for + * a section of the device. Create a resource list of these tables to + * provide to the driver. + */ + for (i = 0, tmp = res; i < count; i++, tmp++) { + tmp->start = pdev->resource[header->tbir].start + + header->offset + i * (size << 2); + tmp->end = tmp->start + (size << 2) - 1; + tmp->flags = IORESOURCE_MEM; + } + + cell->resources = res; + cell->num_resources = count; + cell->name = name; + + return devm_mfd_add_devices(dev, PLATFORM_DEVID_AUTO, cell, 1, NULL, 0, + NULL); +} + +static int pmt_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id) +{ + struct pmt_platform_info *info; + bool found_devices = false; + int ret, pos = 0; + + ret = pcim_enable_device(pdev); + if (ret) + return ret; + + info = devm_kmemdup(&pdev->dev, (void *)id->driver_data, sizeof(*info), + GFP_KERNEL); + if (!info) + return -ENOMEM; + + do { + struct intel_dvsec_header header; + u32 table; + u16 vid; + + pos = pci_find_next_ext_capability(pdev, pos, PCI_EXT_CAP_ID_DVSEC); + if (!pos) + break; + + pci_read_config_word(pdev, pos + PCI_DVSEC_HEADER1, &vid); + if (vid != PCI_VENDOR_ID_INTEL) + continue; + + pci_read_config_word(pdev, pos + PCI_DVSEC_HEADER2, + &header.id); + pci_read_config_byte(pdev, pos + INTEL_DVSEC_ENTRIES, + &header.num_entries); + pci_read_config_byte(pdev, pos + INTEL_DVSEC_SIZE, + &header.entry_size); + pci_read_config_dword(pdev, pos + INTEL_DVSEC_TABLE, + &table); + + header.tbir = INTEL_DVSEC_TABLE_BAR(table); + header.offset = INTEL_DVSEC_TABLE_OFFSET(table); + + ret = pmt_add_dev(pdev, &header, info); + if (ret) { + dev_warn(&pdev->dev, + "Failed to add device for DVSEC id %d\n", + header.id); + continue; + } + + found_devices = true; + } while (true); + + if (!found_devices) { + dev_err(&pdev->dev, "No supported PMT capabilities found.\n"); + return -ENODEV; + } + + pm_runtime_put(&pdev->dev); + pm_runtime_allow(&pdev->dev); + + return 0; +} + +static void pmt_pci_remove(struct pci_dev *pdev) +{ + pm_runtime_forbid(&pdev->dev); + pm_runtime_get_sync(&pdev->dev); +} + +#define PCI_DEVICE_ID_INTEL_PMT_TGL 0x9a0d +static const struct pci_device_id pmt_pci_ids[] = { + { PCI_DEVICE_DATA(INTEL, PMT_TGL, &tgl_info) }, + { } +}; +MODULE_DEVICE_TABLE(pci, pmt_pci_ids); + +static struct pci_driver pmt_pci_driver = { + .name = "intel-pmt", + .id_table = pmt_pci_ids, + .probe = pmt_pci_probe, + .remove = pmt_pci_remove, +}; +module_pci_driver(pmt_pci_driver); + +MODULE_AUTHOR("David E. Box <david.e.box@linux.intel.com>"); +MODULE_DESCRIPTION("Intel Platform Monitoring Technology PMT driver"); +MODULE_LICENSE("GPL v2"); -- 2.20.1
PMT Telemetry is a capability of the Intel Platform Monitoring Technology. The Telemetry capability provides access to device telemetry metrics that provide hardware performance data to users from continuous, memory mapped, read-only register spaces. Register mappings are not provided by the driver. Instead, a GUID is read from a header for each endpoint. The GUID identifies the device and is to be used with an XML, provided by the vendor, to discover the available set of metrics and their register mapping. This allows firmware updates to modify the register space without needing to update the driver every time with new mappings. Firmware writes a new GUID in this case to specify the new mapping. Software tools with access to the associated XML file can then interpret the changes. The module manages access to all PMT Telemetry endpoints on a system, independent of the device exporting them. It creates a pmt_telemetry class to manage the devices. For each telemetry endpoint, sysfs files provide GUID and size information as well as a pointer to the parent device the telemetry came from. Software may discover the association between endpoints and devices by iterating through the list in sysfs, or by looking for the existence of the class folder under the device of interest. A device node of the same name allows software to then map the telemetry space for direct access. Also create a PCI device id list for early telemetry hardware that require workarounds for known issues. Co-developed-by: Alexander Duyck <alexander.h.duyck@linux.intel.com> Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com> Signed-off-by: David E. Box <david.e.box@linux.intel.com> Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com> --- .../ABI/testing/sysfs-class-pmt_telemetry | 46 ++ MAINTAINERS | 1 + drivers/platform/x86/Kconfig | 10 + drivers/platform/x86/Makefile | 1 + drivers/platform/x86/intel_pmt_telemetry.c | 448 ++++++++++++++++++ 5 files changed, 506 insertions(+) create mode 100644 Documentation/ABI/testing/sysfs-class-pmt_telemetry create mode 100644 drivers/platform/x86/intel_pmt_telemetry.c diff --git a/Documentation/ABI/testing/sysfs-class-pmt_telemetry b/Documentation/ABI/testing/sysfs-class-pmt_telemetry new file mode 100644 index 000000000000..b0b096db9cae --- /dev/null +++ b/Documentation/ABI/testing/sysfs-class-pmt_telemetry @@ -0,0 +1,46 @@ +What: /sys/class/pmt_telemetry/ +Date: July 2020 +KernelVersion: 5.9 +Contact: David Box <david.e.box@linux.intel.com> +Description: + The pmt_telemetry/ class directory contains information for + devices that expose hardware telemetry using Intel Platform + Monitoring Technology (PMT) + +What: /sys/class/pmt_telemetry/telem<x> +Date: July 2020 +KernelVersion: 5.9 +Contact: David Box <david.e.box@linux.intel.com> +Description: + The telem<x> directory contains files describing an instance of + a PMT telemetry device that exposes hardware telemetry. Each + telem<x> directory has an associated /dev/telem<x> node. This + node may be opened and mapped to access the telemetry space of + the device. The register layout of the telemetry space is + determined from an XML file that matches the PCI device id and + GUID for the device. + +What: /sys/class/pmt_telemetry/telem<x>/guid +Date: July 2020 +KernelVersion: 5.9 +Contact: David Box <david.e.box@linux.intel.com> +Description: + (RO) The GUID for this telemetry device. The GUID identifies + the version of the XML file for the parent device that is to + be used to get the register layout. + +What: /sys/class/pmt_telemetry/telem<x>/size +Date: July 2020 +KernelVersion: 5.9 +Contact: David Box <david.e.box@linux.intel.com> +Description: + (RO) The size of telemetry region in bytes that corresponds to + the mapping size for the /dev/telem<x> device node. + +What: /sys/class/pmt_telemetry/telem<x>/offset +Date: July 2020 +KernelVersion: 5.9 +Contact: David Box <david.e.box@linux.intel.com> +Description: + (RO) The offset of telemetry region in bytes that corresponds to + the mapping for the /dev/telem<x> device node. diff --git a/MAINTAINERS b/MAINTAINERS index b69429c70330..40794cc721af 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -8849,6 +8849,7 @@ INTEL PMT DRIVER M: "David E. Box" <david.e.box@linux.intel.com> S: Maintained F: drivers/mfd/intel_pmt.c +F: drivers/platform/x86/intel_pmt_* INTEL PRO/WIRELESS 2100, 2200BG, 2915ABG NETWORK CONNECTION SUPPORT M: Stanislav Yakovlev <stas.yakovlev@gmail.com> diff --git a/drivers/platform/x86/Kconfig b/drivers/platform/x86/Kconfig index 0581a54cf562..8552b094d005 100644 --- a/drivers/platform/x86/Kconfig +++ b/drivers/platform/x86/Kconfig @@ -1339,6 +1339,16 @@ config INTEL_PMC_CORE - LTR Ignore - MPHY/PLL gating status (Sunrisepoint PCH only) +config INTEL_PMT_TELEMETRY + tristate "Intel Platform Monitoring Technology (PMT) Telemetry driver" + help + The Intel Platform Monitory Technology (PMT) Telemetry driver provides + access to hardware telemetry metrics on devices that support the + feature. + + For more information, see + <file:Documentation/ABI/testing/sysfs-class-intel_pmt_telem> + config INTEL_PUNIT_IPC tristate "Intel P-Unit IPC Driver" help diff --git a/drivers/platform/x86/Makefile b/drivers/platform/x86/Makefile index 2b85852a1a87..95cd3d0be17f 100644 --- a/drivers/platform/x86/Makefile +++ b/drivers/platform/x86/Makefile @@ -139,6 +139,7 @@ obj-$(CONFIG_INTEL_MFLD_THERMAL) += intel_mid_thermal.o obj-$(CONFIG_INTEL_MID_POWER_BUTTON) += intel_mid_powerbtn.o obj-$(CONFIG_INTEL_MRFLD_PWRBTN) += intel_mrfld_pwrbtn.o obj-$(CONFIG_INTEL_PMC_CORE) += intel_pmc_core.o intel_pmc_core_pltdrv.o +obj-$(CONFIG_INTEL_PMT_TELEMETRY) += intel_pmt_telemetry.o obj-$(CONFIG_INTEL_PUNIT_IPC) += intel_punit_ipc.o obj-$(CONFIG_INTEL_SCU_IPC) += intel_scu_ipc.o obj-$(CONFIG_INTEL_SCU_PCI) += intel_scu_pcidrv.o diff --git a/drivers/platform/x86/intel_pmt_telemetry.c b/drivers/platform/x86/intel_pmt_telemetry.c new file mode 100644 index 000000000000..17f814ece30a --- /dev/null +++ b/drivers/platform/x86/intel_pmt_telemetry.c @@ -0,0 +1,448 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Intel Platform Monitory Technology Telemetry driver + * + * Copyright (c) 2020, Intel Corporation. + * All Rights Reserved. + * + * Author: "David E. Box" <david.e.box@linux.intel.com> + */ + +#include <linux/bits.h> +#include <linux/cdev.h> +#include <linux/io-64-nonatomic-lo-hi.h> +#include <linux/kernel.h> +#include <linux/module.h> +#include <linux/pci.h> +#include <linux/platform_device.h> +#include <linux/slab.h> +#include <linux/types.h> +#include <linux/uaccess.h> +#include <linux/xarray.h> + +#define TELEM_DEV_NAME "pmt_telemetry" + +/* Telemetry access types */ +#define TELEM_ACCESS_FUTURE 1 +#define TELEM_ACCESS_BARID 2 +#define TELEM_ACCESS_LOCAL 3 + +#define TELEM_GUID_OFFSET 0x4 +#define TELEM_BASE_OFFSET 0x8 +#define TELEM_TBIR_MASK GENMASK(2, 0) +#define TELEM_ACCESS(v) ((v) & GENMASK(3, 0)) +#define TELEM_TYPE(v) (((v) & GENMASK(7, 4)) >> 4) +/* size is in bytes */ +#define TELEM_SIZE(v) (((v) & GENMASK(27, 12)) >> 10) + +#define TELEM_XA_START 0 +#define TELEM_XA_MAX INT_MAX +#define TELEM_XA_LIMIT XA_LIMIT(TELEM_XA_START, TELEM_XA_MAX) + +/* Used by client hardware to identify a fixed telemetry entry*/ +#define TELEM_CLIENT_FIXED_BLOCK_GUID 0x10000000 + +static DEFINE_XARRAY_ALLOC(telem_array); + +struct pmt_telem_priv; + +struct telem_header { + u8 access_type; + u8 telem_type; + u16 size; + u32 guid; + u32 base_offset; + u8 tbir; +}; + +struct pmt_telem_entry { + struct pmt_telem_priv *priv; + struct telem_header header; + struct resource *header_res; + unsigned long base_addr; + void __iomem *disc_table; + struct cdev cdev; + dev_t devt; + int devid; +}; + +struct pmt_telem_priv { + struct pmt_telem_entry *entry; + int num_entries; + struct device *dev; +}; + +/* + * devfs + */ +static int pmt_telem_open(struct inode *inode, struct file *filp) +{ + struct pmt_telem_priv *priv; + struct pmt_telem_entry *entry; + struct pci_driver *pci_drv; + struct pci_dev *pci_dev; + + if (!perfmon_capable()) + return -EPERM; + + entry = container_of(inode->i_cdev, struct pmt_telem_entry, cdev); + priv = entry->priv; + pci_dev = to_pci_dev(priv->dev->parent); + + pci_drv = pci_dev_driver(pci_dev); + if (!pci_drv) + return -ENODEV; + + filp->private_data = entry; + get_device(&pci_dev->dev); + + if (!try_module_get(pci_drv->driver.owner)) { + put_device(&pci_dev->dev); + return -ENODEV; + } + + return 0; +} + +static int pmt_telem_release(struct inode *inode, struct file *filp) +{ + struct pmt_telem_entry *entry = filp->private_data; + struct pci_dev *pci_dev = to_pci_dev(entry->priv->dev->parent); + struct pci_driver *pci_drv = pci_dev_driver(pci_dev); + + put_device(&pci_dev->dev); + module_put(pci_drv->driver.owner); + + return 0; +} + +static int pmt_telem_mmap(struct file *filp, struct vm_area_struct *vma) +{ + struct pmt_telem_entry *entry = filp->private_data; + struct pmt_telem_priv *priv; + unsigned long vsize = vma->vm_end - vma->vm_start; + unsigned long phys = entry->base_addr; + unsigned long pfn = PFN_DOWN(phys); + unsigned long psize; + + priv = entry->priv; + psize = (PFN_UP(entry->base_addr + entry->header.size) - pfn) * PAGE_SIZE; + if (vsize > psize) { + dev_err(priv->dev, "Requested mmap size is too large\n"); + return -EINVAL; + } + + if ((vma->vm_flags & VM_WRITE) || (vma->vm_flags & VM_MAYWRITE)) + return -EPERM; + + vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot); + + if (io_remap_pfn_range(vma, vma->vm_start, pfn, vsize, + vma->vm_page_prot)) + return -EINVAL; + + return 0; +} + +static const struct file_operations pmt_telem_fops = { + .owner = THIS_MODULE, + .open = pmt_telem_open, + .mmap = pmt_telem_mmap, + .release = pmt_telem_release, +}; + +/* + * sysfs + */ +static ssize_t guid_show(struct device *dev, struct device_attribute *attr, char *buf) +{ + struct pmt_telem_entry *entry = dev_get_drvdata(dev); + + return sprintf(buf, "0x%x\n", entry->header.guid); +} +static DEVICE_ATTR_RO(guid); + +static ssize_t size_show(struct device *dev, struct device_attribute *attr, + char *buf) +{ + struct pmt_telem_entry *entry = dev_get_drvdata(dev); + + /* Display buffer size in bytes */ + return sprintf(buf, "%u\n", entry->header.size); +} +static DEVICE_ATTR_RO(size); + +static ssize_t offset_show(struct device *dev, struct device_attribute *attr, char *buf) +{ + struct pmt_telem_entry *entry = dev_get_drvdata(dev); + + /* Display buffer offset in bytes */ + return sprintf(buf, "%lu\n", offset_in_page(entry->base_addr)); +} +static DEVICE_ATTR_RO(offset); + +static struct attribute *pmt_telem_attrs[] = { + &dev_attr_guid.attr, + &dev_attr_size.attr, + &dev_attr_offset.attr, + NULL +}; +ATTRIBUTE_GROUPS(pmt_telem); + +static struct class pmt_telem_class = { + .owner = THIS_MODULE, + .name = "pmt_telemetry", + .dev_groups = pmt_telem_groups, +}; + +/* + * driver initialization + */ +static const struct pci_device_id pmt_telem_early_client_pci_ids[] = { + { PCI_VDEVICE(INTEL, 0x9a0d) }, /* TGL */ + { } +}; + +static bool pmt_telem_is_early_client_hw(struct device *dev) +{ + struct pci_dev *parent = to_pci_dev(dev->parent); + + return !!pci_match_id(pmt_telem_early_client_pci_ids, parent); +} + +static int pmt_telem_create_dev(struct pmt_telem_priv *priv, + struct pmt_telem_entry *entry) +{ + struct pci_dev *pci_dev; + struct device *dev; + int ret; + + cdev_init(&entry->cdev, &pmt_telem_fops); + ret = cdev_add(&entry->cdev, entry->devt, 1); + if (ret) { + dev_err(priv->dev, "Could not add char dev\n"); + return ret; + } + + pci_dev = to_pci_dev(priv->dev->parent); + dev = device_create(&pmt_telem_class, &pci_dev->dev, entry->devt, + entry, "telem%d", entry->devid); + if (IS_ERR(dev)) { + dev_err(priv->dev, "Could not create device node\n"); + cdev_del(&entry->cdev); + } + + return PTR_ERR_OR_ZERO(dev); +} + +static void pmt_telem_populate_header(void __iomem *disc_offset, + struct telem_header *header) +{ + header->access_type = TELEM_ACCESS(readb(disc_offset)); + header->telem_type = TELEM_TYPE(readb(disc_offset)); + header->size = TELEM_SIZE(readl(disc_offset)); + header->guid = readl(disc_offset + TELEM_GUID_OFFSET); + header->base_offset = readl(disc_offset + TELEM_BASE_OFFSET); + + /* + * For non-local access types the lower 3 bits of base offset + * contains the index of the base address register where the + * telemetry can be found. + */ + header->tbir = header->base_offset & TELEM_TBIR_MASK; + header->base_offset ^= header->tbir; +} + +static int pmt_telem_add_entry(struct pmt_telem_priv *priv, + struct pmt_telem_entry *entry) +{ + struct resource *res = entry->header_res; + struct pci_dev *pci_dev = to_pci_dev(priv->dev->parent); + int ret; + + pmt_telem_populate_header(entry->disc_table, &entry->header); + + /* Local access and BARID only for now */ + switch (entry->header.access_type) { + case TELEM_ACCESS_LOCAL: + if (entry->header.tbir) { + dev_err(priv->dev, + "Unsupported BAR index %d for access type %d\n", + entry->header.tbir, entry->header.access_type); + return -EINVAL; + } + + /* + * For access_type LOCAL, the base address is as follows: + * base address = header address + header length + base offset + */ + entry->base_addr = res->start + resource_size(res) + + entry->header.base_offset; + break; + + case TELEM_ACCESS_BARID: + entry->base_addr = pci_dev->resource[entry->header.tbir].start + + entry->header.base_offset; + break; + + default: + dev_err(priv->dev, "Unsupported access type %d\n", + entry->header.access_type); + return -EINVAL; + } + + ret = alloc_chrdev_region(&entry->devt, 0, 1, TELEM_DEV_NAME); + if (ret) { + dev_err(priv->dev, + "PMT telemetry chrdev_region error: %d\n", ret); + return ret; + } + + ret = xa_alloc(&telem_array, &entry->devid, entry, TELEM_XA_LIMIT, + GFP_KERNEL); + if (ret) + goto fail_xa_alloc; + + ret = pmt_telem_create_dev(priv, entry); + if (ret) + goto fail_create_dev; + + entry->priv = priv; + priv->num_entries++; + return 0; + +fail_create_dev: + xa_erase(&telem_array, entry->devid); +fail_xa_alloc: + unregister_chrdev_region(entry->devt, 1); + + return ret; +} + +static bool pmt_telem_region_overlaps(struct platform_device *pdev, + void __iomem *disc_table) +{ + u32 guid; + + guid = readl(disc_table + TELEM_GUID_OFFSET); + + return guid == TELEM_CLIENT_FIXED_BLOCK_GUID; +} + +static void pmt_telem_remove_entries(struct pmt_telem_priv *priv) +{ + int i; + + for (i = 0; i < priv->num_entries; i++) { + device_destroy(&pmt_telem_class, priv->entry[i].devt); + cdev_del(&priv->entry[i].cdev); + xa_erase(&telem_array, priv->entry[i].devid); + unregister_chrdev_region(priv->entry[i].devt, 1); + } +} + +static int pmt_telem_probe(struct platform_device *pdev) +{ + struct pmt_telem_priv *priv; + struct pmt_telem_entry *entry; + bool early_hw; + int i; + + priv = devm_kzalloc(&pdev->dev, sizeof(*priv), GFP_KERNEL); + if (!priv) + return -ENOMEM; + + platform_set_drvdata(pdev, priv); + priv->dev = &pdev->dev; + + priv->entry = devm_kcalloc(&pdev->dev, pdev->num_resources, + sizeof(struct pmt_telem_entry), GFP_KERNEL); + if (!priv->entry) + return -ENOMEM; + + if (pmt_telem_is_early_client_hw(&pdev->dev)) + early_hw = true; + + for (i = 0, entry = priv->entry; i < pdev->num_resources; + i++, entry++) { + int ret; + + entry->header_res = platform_get_resource(pdev, IORESOURCE_MEM, i); + if (!entry->header_res) { + pmt_telem_remove_entries(priv); + return -ENODEV; + } + + entry->disc_table = devm_platform_ioremap_resource(pdev, i); + if (IS_ERR(entry->disc_table)) { + pmt_telem_remove_entries(priv); + return PTR_ERR(entry->disc_table); + } + + if (pmt_telem_region_overlaps(pdev, entry->disc_table) && + early_hw) + continue; + + ret = pmt_telem_add_entry(priv, entry); + if (ret) { + pmt_telem_remove_entries(priv); + return ret; + } + } + + return 0; +} + +static int pmt_telem_remove(struct platform_device *pdev) +{ + struct pmt_telem_priv *priv = platform_get_drvdata(pdev); + + pmt_telem_remove_entries(priv); + + return 0; +} + +static const struct platform_device_id pmt_telem_table[] = { + { + .name = "pmt_telemetry", + }, + {} +}; +MODULE_DEVICE_TABLE(platform, pmt_telem_table); + +static struct platform_driver pmt_telem_driver = { + .driver = { + .name = TELEM_DEV_NAME, + }, + .probe = pmt_telem_probe, + .remove = pmt_telem_remove, + .id_table = pmt_telem_table, +}; + +static int __init pmt_telem_init(void) +{ + int ret = class_register(&pmt_telem_class); + + if (ret) + return ret; + + ret = platform_driver_register(&pmt_telem_driver); + if (ret) + class_unregister(&pmt_telem_class); + + return ret; +} +module_init(pmt_telem_init); + +static void __exit pmt_telem_exit(void) +{ + platform_driver_unregister(&pmt_telem_driver); + class_unregister(&pmt_telem_class); + xa_destroy(&telem_array); +} +module_exit(pmt_telem_exit); + +MODULE_AUTHOR("David E. Box <david.e.box@linux.intel.com>"); +MODULE_DESCRIPTION("Intel PMT Telemetry driver"); +MODULE_ALIAS("platform:" TELEM_DEV_NAME); +MODULE_LICENSE("GPL v2"); -- 2.20.1
[-- Attachment #1: Type: text/plain, Size: 491 bytes --] at 12:58 AM, Lee Jones <lee.jones@linaro.org> wrote: > If you do: > > do { > int pos; > > pos = pci_find_next_ext_capability(pdev, pos, PCI_EXT_CAP_ID_DVSEC); > if (!pos) > break; > > Then you can invoke pci_find_next_ext_capability() once, no? Part of your suggestion here won't work, because pos needs to be initialized to 0 the first time. As such it needs to be declared and initialized outside the loop. Other than that it may be ok. -- Mark Rustad, MRustad@gmail.com [-- Attachment #2: Message signed with OpenPGP --] [-- Type: application/pgp-signature, Size: 873 bytes --]
On Wed, 2020-07-29 at 15:59 -0700, Mark D Rustad wrote:
> at 12:58 AM, Lee Jones <lee.jones@linaro.org> wrote:
>
> > If you do:
> >
> > do {
> > int pos;
> >
> > pos = pci_find_next_ext_capability(pdev, pos,
> > PCI_EXT_CAP_ID_DVSEC);
> > if (!pos)
> > break;
> >
> > Then you can invoke pci_find_next_ext_capability() once, no?
>
> Part of your suggestion here won't work, because pos needs to be
> initialized to 0 the first time. As such it needs to be declared
> and
> initialized outside the loop. Other than that it may be ok.
Already done in V5. Thanks.
David
On Wed, 29 Jul 2020, Mark D Rustad wrote:
> at 12:58 AM, Lee Jones <lee.jones@linaro.org> wrote:
>
> > If you do:
> >
> > do {
> > int pos;
> >
> > pos = pci_find_next_ext_capability(pdev, pos, PCI_EXT_CAP_ID_DVSEC);
> > if (!pos)
> > break;
> >
> > Then you can invoke pci_find_next_ext_capability() once, no?
>
> Part of your suggestion here won't work, because pos needs to be initialized
> to 0 the first time. As such it needs to be declared and initialized outside
> the loop. Other than that it may be ok.
Right. It was just an example I quickly hacked out.
Feel free to move the variable, or make it static, etc.
--
Lee Jones [李琼斯]
Senior Technical Lead - Developer Services
Linaro.org │ Open source software for Arm SoCs
Follow Linaro: Facebook | Twitter | Blog
Friendly ping.
On Wed, 2020-07-29 at 14:37 -0700, David E. Box wrote:
> Intel Platform Monitoring Technology (PMT) is an architecture for
> enumerating and accessing hardware monitoring capabilities on a
> device.
> With customers increasingly asking for hardware telemetry, engineers
> not
> only have to figure out how to measure and collect data, but also how
> to
> deliver it and make it discoverable. The latter may be through some
> device
> specific method requiring device specific tools to collect the data.
> This
> in turn requires customers to manage a suite of different tools in
> order to
> collect the differing assortment of monitoring data on their
> systems. Even
> when such information can be provided in kernel drivers, they may
> require
> constant maintenance to update register mappings as they change with
> firmware updates and new versions of hardware. PMT provides a
> solution for
> discovering and reading telemetry from a device through a hardware
> agnostic
> framework that allows for updates to systems without requiring
> patches to
> the kernel or software tools.
>
> PMT defines several capabilities to support collecting monitoring
> data from
> hardware. All are discoverable as separate instances of the PCIE
> Designated
> Vendor extended capability (DVSEC) with the Intel vendor code. The
> DVSEC ID
> field uniquely identifies the capability. Each DVSEC also provides a
> BAR
> offset to a header that defines capability-specific attributes,
> including
> GUID, feature type, offset and length, as well as configuration
> settings
> where applicable. The GUID uniquely identifies the register space of
> any
> monitor data exposed by the capability. The GUID is associated with
> an XML
> file from the vendor that describes the mapping of the register space
> along
> with properties of the monitor data. This allows vendors to perform
> firmware updates that can change the mapping (e.g. add new metrics)
> without
> requiring any changes to drivers or software tools. The new mapping
> is
> confirmed by an updated GUID, read from the hardware, which software
> uses
> with a new XML.
>
> The current capabilities defined by PMT are Telemetry, Watcher, and
> Crashlog. The Telemetry capability provides access to a continuous
> block
> of read only data. The Watcher capability provides access to hardware
> sampling and tracing features. Crashlog provides access to device
> crash
> dumps. While there is some relationship between capabilities
> (Watcher can
> be configured to sample from the Telemetry data set) each exists as
> stand
> alone features with no dependency on any other. The design therefore
> splits
> them into individual, capability specific drivers. MFD is used to
> create
> platform devices for each capability so that they may be managed by
> their
> own driver. The PMT architecture is (for the most part) agnostic to
> the
> type of device it can collect from. Devices nodes are consequently
> generic
> in naming, e.g. /dev/telem<n> and /dev/smplr<n>. Each capability
> driver
> creates a class to manage the list of devices supporting
> it. Software can
> determine which devices support a PMT feature by searching through
> each
> device node entry in the sysfs class folder. It can additionally
> determine
> if a particular device supports a PMT feature by checking for a PMT
> class
> folder in the device folder.
>
> This patch set provides support for the PMT framework, along with
> support
> for Telemetry on Tiger Lake.
>
> Changes from V4:
> - Replace MFD with PMT in driver title
> - Fix commit tags in chronological order
> - Fix includes in alphabetical order
> - Use 'raw' string instead of defines for device names
> - Add an error message when returning an error code for
> unrecognized capability id
> - Use dev_err instead of dev_warn for messages when returning
> an error
> - Change while loop to call pci_find_next_ext_capability once
> - Add missing continue in while loop
> - Keep PCI platform defines using PCI_DEVICE_DATA magic tied to
> the pci_device_id table
> - Comment and kernel message cleanup
>
> Changes from V3:
> - Write out full acronym for DVSEC in PCI patch commit message
> and
> add 'Designated' to comments
> - remove unused variable caught by kernel test robot <
> lkp@intel.com>
> - Add required Co-developed-by signoffs, noted by Andy
> - Allow access using new CAP_PERFMON capability as suggested by
> Alexey Bundankov
> - Fix spacing in Kconfig, noted by Randy
> - Other style changes and fixups suggested by Andy
>
> Changes from V2:
> - In order to handle certain HW bugs from the telemetry
> capability
> driver, create a single platform device per capability
> instead of
> a device per entry. Add the entry data as device resources
> and
> let the capability driver manage them as a set allowing for
> cleaner HW bug resolution.
> - Handle discovery table offset bug in intel_pmt.c
> - Handle overlapping regions in intel_pmt_telemetry.c
> - Add description of sysfs class to testing ABI.
> - Don't check size and count until confirming support for the
> PMT
> capability to avoid bailing out when we need to skip it.
> - Remove unneeded header file. Move code to the intel_pmt.c,
> the
> only place where it's needed.
> - Remove now unused platform data.
> - Add missing header files types.h, bits.h.
> - Rename file name and build options from telem to telemetry.
> - Code cleanup suggested by Andy S.
> - x86 mailing list added.
>
> Changes from V1:
> - In the telemetry driver, set the device in device_create() to
> the parent PCI device (the monitoring device) for clear
> association in sysfs. Was set before to the platform device
> created by the PCI parent.
> - Move telem struct into driver and delete unneeded header
> file.
> - Start telem device numbering from 0 instead of 1. 1 was used
> due to anticipated changes, no longer needed.
> - Use helper macros suggested by Andy S.
> - Rename class to pmt_telemetry, spelling out full name
> - Move monitor device name defines to common header
> - Coding style, spelling, and Makefile/MAINTAINERS ordering
> fixes
>
> David E. Box (3):
> PCI: Add defines for Designated Vendor-Specific Extended Capability
> mfd: Intel Platform Monitoring Technology support
> platform/x86: Intel PMT Telemetry capability driver
>
> .../ABI/testing/sysfs-class-pmt_telemetry | 46 ++
> MAINTAINERS | 6 +
> drivers/mfd/Kconfig | 10 +
> drivers/mfd/Makefile | 1 +
> drivers/mfd/intel_pmt.c | 220 +++++++++
> drivers/platform/x86/Kconfig | 10 +
> drivers/platform/x86/Makefile | 1 +
> drivers/platform/x86/intel_pmt_telemetry.c | 448
> ++++++++++++++++++
> include/uapi/linux/pci_regs.h | 5 +
> 9 files changed, 747 insertions(+)
> create mode 100644 Documentation/ABI/testing/sysfs-class-
> pmt_telemetry
> create mode 100644 drivers/mfd/intel_pmt.c
> create mode 100644 drivers/platform/x86/intel_pmt_telemetry.c
>
MCTP and PLDM are the latest in Platform management Technology. Sw
application and drivers can be implemented on the PCIe platform.
Previously I spent some time on this.
On Mon, Aug 10, 2020 at 7:49 PM David E. Box
<david.e.box@linux.intel.com> wrote:
>
> Friendly ping.
>
> On Wed, 2020-07-29 at 14:37 -0700, David E. Box wrote:
> > Intel Platform Monitoring Technology (PMT) is an architecture for
> > enumerating and accessing hardware monitoring capabilities on a
> > device.
> > With customers increasingly asking for hardware telemetry, engineers
> > not
> > only have to figure out how to measure and collect data, but also how
> > to
> > deliver it and make it discoverable. The latter may be through some
> > device
> > specific method requiring device specific tools to collect the data.
> > This
> > in turn requires customers to manage a suite of different tools in
> > order to
> > collect the differing assortment of monitoring data on their
> > systems. Even
> > when such information can be provided in kernel drivers, they may
> > require
> > constant maintenance to update register mappings as they change with
> > firmware updates and new versions of hardware. PMT provides a
> > solution for
> > discovering and reading telemetry from a device through a hardware
> > agnostic
> > framework that allows for updates to systems without requiring
> > patches to
> > the kernel or software tools.
> >
> > PMT defines several capabilities to support collecting monitoring
> > data from
> > hardware. All are discoverable as separate instances of the PCIE
> > Designated
> > Vendor extended capability (DVSEC) with the Intel vendor code. The
> > DVSEC ID
> > field uniquely identifies the capability. Each DVSEC also provides a
> > BAR
> > offset to a header that defines capability-specific attributes,
> > including
> > GUID, feature type, offset and length, as well as configuration
> > settings
> > where applicable. The GUID uniquely identifies the register space of
> > any
> > monitor data exposed by the capability. The GUID is associated with
> > an XML
> > file from the vendor that describes the mapping of the register space
> > along
> > with properties of the monitor data. This allows vendors to perform
> > firmware updates that can change the mapping (e.g. add new metrics)
> > without
> > requiring any changes to drivers or software tools. The new mapping
> > is
> > confirmed by an updated GUID, read from the hardware, which software
> > uses
> > with a new XML.
> >
> > The current capabilities defined by PMT are Telemetry, Watcher, and
> > Crashlog. The Telemetry capability provides access to a continuous
> > block
> > of read only data. The Watcher capability provides access to hardware
> > sampling and tracing features. Crashlog provides access to device
> > crash
> > dumps. While there is some relationship between capabilities
> > (Watcher can
> > be configured to sample from the Telemetry data set) each exists as
> > stand
> > alone features with no dependency on any other. The design therefore
> > splits
> > them into individual, capability specific drivers. MFD is used to
> > create
> > platform devices for each capability so that they may be managed by
> > their
> > own driver. The PMT architecture is (for the most part) agnostic to
> > the
> > type of device it can collect from. Devices nodes are consequently
> > generic
> > in naming, e.g. /dev/telem<n> and /dev/smplr<n>. Each capability
> > driver
> > creates a class to manage the list of devices supporting
> > it. Software can
> > determine which devices support a PMT feature by searching through
> > each
> > device node entry in the sysfs class folder. It can additionally
> > determine
> > if a particular device supports a PMT feature by checking for a PMT
> > class
> > folder in the device folder.
> >
> > This patch set provides support for the PMT framework, along with
> > support
> > for Telemetry on Tiger Lake.
> >
> > Changes from V4:
> > - Replace MFD with PMT in driver title
> > - Fix commit tags in chronological order
> > - Fix includes in alphabetical order
> > - Use 'raw' string instead of defines for device names
> > - Add an error message when returning an error code for
> > unrecognized capability id
> > - Use dev_err instead of dev_warn for messages when returning
> > an error
> > - Change while loop to call pci_find_next_ext_capability once
> > - Add missing continue in while loop
> > - Keep PCI platform defines using PCI_DEVICE_DATA magic tied to
> > the pci_device_id table
> > - Comment and kernel message cleanup
> >
> > Changes from V3:
> > - Write out full acronym for DVSEC in PCI patch commit message
> > and
> > add 'Designated' to comments
> > - remove unused variable caught by kernel test robot <
> > lkp@intel.com>
> > - Add required Co-developed-by signoffs, noted by Andy
> > - Allow access using new CAP_PERFMON capability as suggested by
> > Alexey Bundankov
> > - Fix spacing in Kconfig, noted by Randy
> > - Other style changes and fixups suggested by Andy
> >
> > Changes from V2:
> > - In order to handle certain HW bugs from the telemetry
> > capability
> > driver, create a single platform device per capability
> > instead of
> > a device per entry. Add the entry data as device resources
> > and
> > let the capability driver manage them as a set allowing for
> > cleaner HW bug resolution.
> > - Handle discovery table offset bug in intel_pmt.c
> > - Handle overlapping regions in intel_pmt_telemetry.c
> > - Add description of sysfs class to testing ABI.
> > - Don't check size and count until confirming support for the
> > PMT
> > capability to avoid bailing out when we need to skip it.
> > - Remove unneeded header file. Move code to the intel_pmt.c,
> > the
> > only place where it's needed.
> > - Remove now unused platform data.
> > - Add missing header files types.h, bits.h.
> > - Rename file name and build options from telem to telemetry.
> > - Code cleanup suggested by Andy S.
> > - x86 mailing list added.
> >
> > Changes from V1:
> > - In the telemetry driver, set the device in device_create() to
> > the parent PCI device (the monitoring device) for clear
> > association in sysfs. Was set before to the platform device
> > created by the PCI parent.
> > - Move telem struct into driver and delete unneeded header
> > file.
> > - Start telem device numbering from 0 instead of 1. 1 was used
> > due to anticipated changes, no longer needed.
> > - Use helper macros suggested by Andy S.
> > - Rename class to pmt_telemetry, spelling out full name
> > - Move monitor device name defines to common header
> > - Coding style, spelling, and Makefile/MAINTAINERS ordering
> > fixes
> >
> > David E. Box (3):
> > PCI: Add defines for Designated Vendor-Specific Extended Capability
> > mfd: Intel Platform Monitoring Technology support
> > platform/x86: Intel PMT Telemetry capability driver
> >
> > .../ABI/testing/sysfs-class-pmt_telemetry | 46 ++
> > MAINTAINERS | 6 +
> > drivers/mfd/Kconfig | 10 +
> > drivers/mfd/Makefile | 1 +
> > drivers/mfd/intel_pmt.c | 220 +++++++++
> > drivers/platform/x86/Kconfig | 10 +
> > drivers/platform/x86/Makefile | 1 +
> > drivers/platform/x86/intel_pmt_telemetry.c | 448
> > ++++++++++++++++++
> > include/uapi/linux/pci_regs.h | 5 +
> > 9 files changed, 747 insertions(+)
> > create mode 100644 Documentation/ABI/testing/sysfs-class-
> > pmt_telemetry
> > create mode 100644 drivers/mfd/intel_pmt.c
> > create mode 100644 drivers/platform/x86/intel_pmt_telemetry.c
> >
>
On Mon, 10 Aug 2020, David E. Box wrote:
> Friendly ping.
Don't do that. Sending contentless pings is seldom helpful.
If you think your set has been dropped please just send a [RESEND].
This is probably worth doing anyway, since you've sent v2, v3, v4 and
now v5 has reply-tos of one another. The thread has become quite
messy as a result.
Also please take the time to identify where we are with respect to the
current release cycle. The merge-window is open presently. Meaning
that most maintainers are busy, either sending out pull-requests or
ramping up for the next cycle (or just taking a quick breather).
--
Lee Jones [李琼斯]
Senior Technical Lead - Developer Services
Linaro.org │ Open source software for Arm SoCs
Follow Linaro: Facebook | Twitter | Blog
On Tue, 2020-08-11 at 09:04 +0100, Lee Jones wrote:
> On Mon, 10 Aug 2020, David E. Box wrote:
>
> > Friendly ping.
>
> Don't do that. Sending contentless pings is seldom helpful.
>
> If you think your set has been dropped please just send a [RESEND].
>
> This is probably worth doing anyway, since you've sent v2, v3, v4 and
> now v5 has reply-tos of one another. The thread has become quite
> messy as a result.
>
> Also please take the time to identify where we are with respect to
> the
> current release cycle. The merge-window is open presently. Meaning
> that most maintainers are busy, either sending out pull-requests or
> ramping up for the next cycle (or just taking a quick breather).
>
No problem. I'll resend v5 in a new thread when rc1 is tagged. Thanks.