All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/4] MSR: MSR: MSR Whitelist and Batch Introduction
@ 2016-02-26  0:02 Marty McFadden
  2016-02-26  0:02 ` [PATCH 1/4] MSR: Prep for separating msr.c into three files Marty McFadden
                   ` (4 more replies)
  0 siblings, 5 replies; 29+ messages in thread
From: Marty McFadden @ 2016-02-26  0:02 UTC (permalink / raw)
  Cc: ak, andriy.shevchenko, bp, bp, brgerst, dan.j.williams, dyoung,
	hpa, linux, linux-kernel, luto, mcfadden8, mingo, pavel, tglx,
	viro, x86, yu.c.chen


This patch addresses the following two problems:
  1. The current msr module grants all-or-nothing access to MSRs,
     thus making user-level runtime performance adjustments 
     problematic, particularly for power-constrained HPC systems.

  2. The current msr module requires a separate system call and the
     acquisition of the preemption lock for each individual MSR access. 
     This overhead degrades performance of runtime tools that would
     ideally sample multiple MSRs at high frequencies.
    
Problem #1 is addressed by introducing a whitelist policy for MSR access.
A batch mechanism is added to address problem #2.

[PATCH 1/4] MSR: Prep for separating msr.c into three files
[PATCH 2/4] MSR: Prep for separating msr.c into three files
    Prepares for providing the overall MSR kernel module as a set of
    three implementation files: msr_entry.c, msr_whitelist.c, and
    msr_batch.c.

    Additonal changes made besides renaming to stop scripts/checkpatch.pl
    complaining.

[PATCH 3/4] MSR: msr Whitelist Implementation
    Allows the administrator to configure a set of bit masks for MSRs
    where access is permitted.

    Whitelist Administration:
        To configure whitelist (as root):
            cat whitelistfile > /dev/cpu/msr_whitelist

            This operation will cause the previous whitelist to be
            replaced by the specified whitelist.

        To enumerate current whitelist (as root):
            cat < /dev/cpu/msr_whitelist

        To remove whitelist (as root):
            echo > /dev/cpu/msr_whitelist

        Security model:
            If user has CAP_SYS_RAWIO privileges, they will enjoy full
            access to MSRs like they do today.

            Otherwise, if the user is able to open the /dev/cpu/*/msr
            file, they will have access to MSR operations as follows:

                If the write mask exists for a particular MSR, then 
                rdmsr access to that MSR access is granted.

                If the write mask is set to all ones (0xffffffffffffffff),
                then the user may perform a "raw" wrmsr operation with all
                64 bits being overwritten to that MSR.
 
                If the write mask is not 0xffffffffffffffff, then a rdmsr
                will be performed first and only the bits set in the write
                mask will be affected in the MSR.

[PATCH 4/4] MSR: msr Batch processing feature
    Provides a new ioctl interface through /dev/cpu/msr_batch.  Each
    element in the batch list is subject to the whitelist security model
    described above.

    This implementation will cause an Inter Processor Interrupt to be sent
    to each destination processor and will wait until all processors have
    finished processing their respective batch of MSR operations before
    returning.

    Implementation Note: A separate "error" field is maintained per MSR
    operation in order to maintain reentrancy into the IPI callback
    function.

Signed-off-by: Marty McFadden <mcfadden8@llnl.gov>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH 1/4] MSR: Prep for separating msr.c into three files
  2016-02-26  0:02 [PATCH 0/4] MSR: MSR: MSR Whitelist and Batch Introduction Marty McFadden
@ 2016-02-26  0:02 ` Marty McFadden
  2016-02-26  2:56   ` kbuild test robot
  2016-02-26 10:08   ` Andy Shevchenko
  2016-02-26  0:02 ` [PATCH 2/4] " Marty McFadden
                   ` (3 subsequent siblings)
  4 siblings, 2 replies; 29+ messages in thread
From: Marty McFadden @ 2016-02-26  0:02 UTC (permalink / raw)
  Cc: ak, andriy.shevchenko, bp, bp, brgerst, dan.j.williams, dyoung,
	hpa, linux, linux-kernel, luto, mcfadden8, mingo, pavel, tglx,
	viro, x86, yu.c.chen

DEPENDENCY: Compilation depends upon next commit.  This commit only
            does the rename of msr.c --> msr_entry.c.
            Separated in to two commits so that "git am" would work
            without complaint.

    No functional change (yet).  Just preparation for splitting the
    MSR kernel module into three separate implementation files:

      1) msr_entry.c - Original msr driver and entry (now)
      2) msr_whitelist.c - MSR Whitelist implementation (patch 2)
      3) msr_batch.c - MSR Batch implementation (patch 3)

Signed-off-by: Marty McFadden <mcfadden8@llnl.gov>
---
 arch/x86/kernel/msr.c       |  284 -------------------------------------------
 arch/x86/kernel/msr_entry.c |  284 +++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 284 insertions(+), 284 deletions(-)
 delete mode 100644 arch/x86/kernel/msr.c
 create mode 100644 arch/x86/kernel/msr_entry.c

diff --git a/arch/x86/kernel/msr.c b/arch/x86/kernel/msr.c
deleted file mode 100644
index 64f9616..0000000
--- a/arch/x86/kernel/msr.c
+++ /dev/null
@@ -1,284 +0,0 @@
-/* ----------------------------------------------------------------------- *
- *
- *   Copyright 2000-2008 H. Peter Anvin - All Rights Reserved
- *   Copyright 2009 Intel Corporation; author: H. Peter Anvin
- *
- *   This program is free software; you can redistribute it and/or modify
- *   it under the terms of the GNU General Public License as published by
- *   the Free Software Foundation, Inc., 675 Mass Ave, Cambridge MA 02139,
- *   USA; either version 2 of the License, or (at your option) any later
- *   version; incorporated herein by reference.
- *
- * ----------------------------------------------------------------------- */
-
-/*
- * x86 MSR access device
- *
- * This device is accessed by lseek() to the appropriate register number
- * and then read/write in chunks of 8 bytes.  A larger size means multiple
- * reads or writes of the same register.
- *
- * This driver uses /dev/cpu/%d/msr where %d is the minor number, and on
- * an SMP box will direct the access to CPU %d.
- */
-
-#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
-
-#include <linux/module.h>
-
-#include <linux/types.h>
-#include <linux/errno.h>
-#include <linux/fcntl.h>
-#include <linux/init.h>
-#include <linux/poll.h>
-#include <linux/smp.h>
-#include <linux/major.h>
-#include <linux/fs.h>
-#include <linux/device.h>
-#include <linux/cpu.h>
-#include <linux/notifier.h>
-#include <linux/uaccess.h>
-#include <linux/gfp.h>
-
-#include <asm/processor.h>
-#include <asm/msr.h>
-
-static struct class *msr_class;
-
-static ssize_t msr_read(struct file *file, char __user *buf,
-			size_t count, loff_t *ppos)
-{
-	u32 __user *tmp = (u32 __user *) buf;
-	u32 data[2];
-	u32 reg = *ppos;
-	int cpu = iminor(file_inode(file));
-	int err = 0;
-	ssize_t bytes = 0;
-
-	if (count % 8)
-		return -EINVAL;	/* Invalid chunk size */
-
-	for (; count; count -= 8) {
-		err = rdmsr_safe_on_cpu(cpu, reg, &data[0], &data[1]);
-		if (err)
-			break;
-		if (copy_to_user(tmp, &data, 8)) {
-			err = -EFAULT;
-			break;
-		}
-		tmp += 2;
-		bytes += 8;
-	}
-
-	return bytes ? bytes : err;
-}
-
-static ssize_t msr_write(struct file *file, const char __user *buf,
-			 size_t count, loff_t *ppos)
-{
-	const u32 __user *tmp = (const u32 __user *)buf;
-	u32 data[2];
-	u32 reg = *ppos;
-	int cpu = iminor(file_inode(file));
-	int err = 0;
-	ssize_t bytes = 0;
-
-	if (count % 8)
-		return -EINVAL;	/* Invalid chunk size */
-
-	for (; count; count -= 8) {
-		if (copy_from_user(&data, tmp, 8)) {
-			err = -EFAULT;
-			break;
-		}
-		err = wrmsr_safe_on_cpu(cpu, reg, data[0], data[1]);
-		if (err)
-			break;
-		tmp += 2;
-		bytes += 8;
-	}
-
-	return bytes ? bytes : err;
-}
-
-static long msr_ioctl(struct file *file, unsigned int ioc, unsigned long arg)
-{
-	u32 __user *uregs = (u32 __user *)arg;
-	u32 regs[8];
-	int cpu = iminor(file_inode(file));
-	int err;
-
-	switch (ioc) {
-	case X86_IOC_RDMSR_REGS:
-		if (!(file->f_mode & FMODE_READ)) {
-			err = -EBADF;
-			break;
-		}
-		if (copy_from_user(&regs, uregs, sizeof regs)) {
-			err = -EFAULT;
-			break;
-		}
-		err = rdmsr_safe_regs_on_cpu(cpu, regs);
-		if (err)
-			break;
-		if (copy_to_user(uregs, &regs, sizeof regs))
-			err = -EFAULT;
-		break;
-
-	case X86_IOC_WRMSR_REGS:
-		if (!(file->f_mode & FMODE_WRITE)) {
-			err = -EBADF;
-			break;
-		}
-		if (copy_from_user(&regs, uregs, sizeof regs)) {
-			err = -EFAULT;
-			break;
-		}
-		err = wrmsr_safe_regs_on_cpu(cpu, regs);
-		if (err)
-			break;
-		if (copy_to_user(uregs, &regs, sizeof regs))
-			err = -EFAULT;
-		break;
-
-	default:
-		err = -ENOTTY;
-		break;
-	}
-
-	return err;
-}
-
-static int msr_open(struct inode *inode, struct file *file)
-{
-	unsigned int cpu = iminor(file_inode(file));
-	struct cpuinfo_x86 *c;
-
-	if (!capable(CAP_SYS_RAWIO))
-		return -EPERM;
-
-	if (cpu >= nr_cpu_ids || !cpu_online(cpu))
-		return -ENXIO;	/* No such CPU */
-
-	c = &cpu_data(cpu);
-	if (!cpu_has(c, X86_FEATURE_MSR))
-		return -EIO;	/* MSR not supported */
-
-	return 0;
-}
-
-/*
- * File operations we support
- */
-static const struct file_operations msr_fops = {
-	.owner = THIS_MODULE,
-	.llseek = no_seek_end_llseek,
-	.read = msr_read,
-	.write = msr_write,
-	.open = msr_open,
-	.unlocked_ioctl = msr_ioctl,
-	.compat_ioctl = msr_ioctl,
-};
-
-static int msr_device_create(int cpu)
-{
-	struct device *dev;
-
-	dev = device_create(msr_class, NULL, MKDEV(MSR_MAJOR, cpu), NULL,
-			    "msr%d", cpu);
-	return PTR_ERR_OR_ZERO(dev);
-}
-
-static void msr_device_destroy(int cpu)
-{
-	device_destroy(msr_class, MKDEV(MSR_MAJOR, cpu));
-}
-
-static int msr_class_cpu_callback(struct notifier_block *nfb,
-				  unsigned long action, void *hcpu)
-{
-	unsigned int cpu = (unsigned long)hcpu;
-	int err = 0;
-
-	switch (action) {
-	case CPU_UP_PREPARE:
-		err = msr_device_create(cpu);
-		break;
-	case CPU_UP_CANCELED:
-	case CPU_UP_CANCELED_FROZEN:
-	case CPU_DEAD:
-		msr_device_destroy(cpu);
-		break;
-	}
-	return notifier_from_errno(err);
-}
-
-static struct notifier_block __refdata msr_class_cpu_notifier = {
-	.notifier_call = msr_class_cpu_callback,
-};
-
-static char *msr_devnode(struct device *dev, umode_t *mode)
-{
-	return kasprintf(GFP_KERNEL, "cpu/%u/msr", MINOR(dev->devt));
-}
-
-static int __init msr_init(void)
-{
-	int i, err = 0;
-	i = 0;
-
-	if (__register_chrdev(MSR_MAJOR, 0, NR_CPUS, "cpu/msr", &msr_fops)) {
-		pr_err("unable to get major %d for msr\n", MSR_MAJOR);
-		err = -EBUSY;
-		goto out;
-	}
-	msr_class = class_create(THIS_MODULE, "msr");
-	if (IS_ERR(msr_class)) {
-		err = PTR_ERR(msr_class);
-		goto out_chrdev;
-	}
-	msr_class->devnode = msr_devnode;
-
-	cpu_notifier_register_begin();
-	for_each_online_cpu(i) {
-		err = msr_device_create(i);
-		if (err != 0)
-			goto out_class;
-	}
-	__register_hotcpu_notifier(&msr_class_cpu_notifier);
-	cpu_notifier_register_done();
-
-	err = 0;
-	goto out;
-
-out_class:
-	i = 0;
-	for_each_online_cpu(i)
-		msr_device_destroy(i);
-	cpu_notifier_register_done();
-	class_destroy(msr_class);
-out_chrdev:
-	__unregister_chrdev(MSR_MAJOR, 0, NR_CPUS, "cpu/msr");
-out:
-	return err;
-}
-
-static void __exit msr_exit(void)
-{
-	int cpu = 0;
-
-	cpu_notifier_register_begin();
-	for_each_online_cpu(cpu)
-		msr_device_destroy(cpu);
-	class_destroy(msr_class);
-	__unregister_chrdev(MSR_MAJOR, 0, NR_CPUS, "cpu/msr");
-	__unregister_hotcpu_notifier(&msr_class_cpu_notifier);
-	cpu_notifier_register_done();
-}
-
-module_init(msr_init);
-module_exit(msr_exit)
-
-MODULE_AUTHOR("H. Peter Anvin <hpa@zytor.com>");
-MODULE_DESCRIPTION("x86 generic MSR driver");
-MODULE_LICENSE("GPL");
diff --git a/arch/x86/kernel/msr_entry.c b/arch/x86/kernel/msr_entry.c
new file mode 100644
index 0000000..64f9616
--- /dev/null
+++ b/arch/x86/kernel/msr_entry.c
@@ -0,0 +1,284 @@
+/* ----------------------------------------------------------------------- *
+ *
+ *   Copyright 2000-2008 H. Peter Anvin - All Rights Reserved
+ *   Copyright 2009 Intel Corporation; author: H. Peter Anvin
+ *
+ *   This program is free software; you can redistribute it and/or modify
+ *   it under the terms of the GNU General Public License as published by
+ *   the Free Software Foundation, Inc., 675 Mass Ave, Cambridge MA 02139,
+ *   USA; either version 2 of the License, or (at your option) any later
+ *   version; incorporated herein by reference.
+ *
+ * ----------------------------------------------------------------------- */
+
+/*
+ * x86 MSR access device
+ *
+ * This device is accessed by lseek() to the appropriate register number
+ * and then read/write in chunks of 8 bytes.  A larger size means multiple
+ * reads or writes of the same register.
+ *
+ * This driver uses /dev/cpu/%d/msr where %d is the minor number, and on
+ * an SMP box will direct the access to CPU %d.
+ */
+
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+#include <linux/module.h>
+
+#include <linux/types.h>
+#include <linux/errno.h>
+#include <linux/fcntl.h>
+#include <linux/init.h>
+#include <linux/poll.h>
+#include <linux/smp.h>
+#include <linux/major.h>
+#include <linux/fs.h>
+#include <linux/device.h>
+#include <linux/cpu.h>
+#include <linux/notifier.h>
+#include <linux/uaccess.h>
+#include <linux/gfp.h>
+
+#include <asm/processor.h>
+#include <asm/msr.h>
+
+static struct class *msr_class;
+
+static ssize_t msr_read(struct file *file, char __user *buf,
+			size_t count, loff_t *ppos)
+{
+	u32 __user *tmp = (u32 __user *) buf;
+	u32 data[2];
+	u32 reg = *ppos;
+	int cpu = iminor(file_inode(file));
+	int err = 0;
+	ssize_t bytes = 0;
+
+	if (count % 8)
+		return -EINVAL;	/* Invalid chunk size */
+
+	for (; count; count -= 8) {
+		err = rdmsr_safe_on_cpu(cpu, reg, &data[0], &data[1]);
+		if (err)
+			break;
+		if (copy_to_user(tmp, &data, 8)) {
+			err = -EFAULT;
+			break;
+		}
+		tmp += 2;
+		bytes += 8;
+	}
+
+	return bytes ? bytes : err;
+}
+
+static ssize_t msr_write(struct file *file, const char __user *buf,
+			 size_t count, loff_t *ppos)
+{
+	const u32 __user *tmp = (const u32 __user *)buf;
+	u32 data[2];
+	u32 reg = *ppos;
+	int cpu = iminor(file_inode(file));
+	int err = 0;
+	ssize_t bytes = 0;
+
+	if (count % 8)
+		return -EINVAL;	/* Invalid chunk size */
+
+	for (; count; count -= 8) {
+		if (copy_from_user(&data, tmp, 8)) {
+			err = -EFAULT;
+			break;
+		}
+		err = wrmsr_safe_on_cpu(cpu, reg, data[0], data[1]);
+		if (err)
+			break;
+		tmp += 2;
+		bytes += 8;
+	}
+
+	return bytes ? bytes : err;
+}
+
+static long msr_ioctl(struct file *file, unsigned int ioc, unsigned long arg)
+{
+	u32 __user *uregs = (u32 __user *)arg;
+	u32 regs[8];
+	int cpu = iminor(file_inode(file));
+	int err;
+
+	switch (ioc) {
+	case X86_IOC_RDMSR_REGS:
+		if (!(file->f_mode & FMODE_READ)) {
+			err = -EBADF;
+			break;
+		}
+		if (copy_from_user(&regs, uregs, sizeof regs)) {
+			err = -EFAULT;
+			break;
+		}
+		err = rdmsr_safe_regs_on_cpu(cpu, regs);
+		if (err)
+			break;
+		if (copy_to_user(uregs, &regs, sizeof regs))
+			err = -EFAULT;
+		break;
+
+	case X86_IOC_WRMSR_REGS:
+		if (!(file->f_mode & FMODE_WRITE)) {
+			err = -EBADF;
+			break;
+		}
+		if (copy_from_user(&regs, uregs, sizeof regs)) {
+			err = -EFAULT;
+			break;
+		}
+		err = wrmsr_safe_regs_on_cpu(cpu, regs);
+		if (err)
+			break;
+		if (copy_to_user(uregs, &regs, sizeof regs))
+			err = -EFAULT;
+		break;
+
+	default:
+		err = -ENOTTY;
+		break;
+	}
+
+	return err;
+}
+
+static int msr_open(struct inode *inode, struct file *file)
+{
+	unsigned int cpu = iminor(file_inode(file));
+	struct cpuinfo_x86 *c;
+
+	if (!capable(CAP_SYS_RAWIO))
+		return -EPERM;
+
+	if (cpu >= nr_cpu_ids || !cpu_online(cpu))
+		return -ENXIO;	/* No such CPU */
+
+	c = &cpu_data(cpu);
+	if (!cpu_has(c, X86_FEATURE_MSR))
+		return -EIO;	/* MSR not supported */
+
+	return 0;
+}
+
+/*
+ * File operations we support
+ */
+static const struct file_operations msr_fops = {
+	.owner = THIS_MODULE,
+	.llseek = no_seek_end_llseek,
+	.read = msr_read,
+	.write = msr_write,
+	.open = msr_open,
+	.unlocked_ioctl = msr_ioctl,
+	.compat_ioctl = msr_ioctl,
+};
+
+static int msr_device_create(int cpu)
+{
+	struct device *dev;
+
+	dev = device_create(msr_class, NULL, MKDEV(MSR_MAJOR, cpu), NULL,
+			    "msr%d", cpu);
+	return PTR_ERR_OR_ZERO(dev);
+}
+
+static void msr_device_destroy(int cpu)
+{
+	device_destroy(msr_class, MKDEV(MSR_MAJOR, cpu));
+}
+
+static int msr_class_cpu_callback(struct notifier_block *nfb,
+				  unsigned long action, void *hcpu)
+{
+	unsigned int cpu = (unsigned long)hcpu;
+	int err = 0;
+
+	switch (action) {
+	case CPU_UP_PREPARE:
+		err = msr_device_create(cpu);
+		break;
+	case CPU_UP_CANCELED:
+	case CPU_UP_CANCELED_FROZEN:
+	case CPU_DEAD:
+		msr_device_destroy(cpu);
+		break;
+	}
+	return notifier_from_errno(err);
+}
+
+static struct notifier_block __refdata msr_class_cpu_notifier = {
+	.notifier_call = msr_class_cpu_callback,
+};
+
+static char *msr_devnode(struct device *dev, umode_t *mode)
+{
+	return kasprintf(GFP_KERNEL, "cpu/%u/msr", MINOR(dev->devt));
+}
+
+static int __init msr_init(void)
+{
+	int i, err = 0;
+	i = 0;
+
+	if (__register_chrdev(MSR_MAJOR, 0, NR_CPUS, "cpu/msr", &msr_fops)) {
+		pr_err("unable to get major %d for msr\n", MSR_MAJOR);
+		err = -EBUSY;
+		goto out;
+	}
+	msr_class = class_create(THIS_MODULE, "msr");
+	if (IS_ERR(msr_class)) {
+		err = PTR_ERR(msr_class);
+		goto out_chrdev;
+	}
+	msr_class->devnode = msr_devnode;
+
+	cpu_notifier_register_begin();
+	for_each_online_cpu(i) {
+		err = msr_device_create(i);
+		if (err != 0)
+			goto out_class;
+	}
+	__register_hotcpu_notifier(&msr_class_cpu_notifier);
+	cpu_notifier_register_done();
+
+	err = 0;
+	goto out;
+
+out_class:
+	i = 0;
+	for_each_online_cpu(i)
+		msr_device_destroy(i);
+	cpu_notifier_register_done();
+	class_destroy(msr_class);
+out_chrdev:
+	__unregister_chrdev(MSR_MAJOR, 0, NR_CPUS, "cpu/msr");
+out:
+	return err;
+}
+
+static void __exit msr_exit(void)
+{
+	int cpu = 0;
+
+	cpu_notifier_register_begin();
+	for_each_online_cpu(cpu)
+		msr_device_destroy(cpu);
+	class_destroy(msr_class);
+	__unregister_chrdev(MSR_MAJOR, 0, NR_CPUS, "cpu/msr");
+	__unregister_hotcpu_notifier(&msr_class_cpu_notifier);
+	cpu_notifier_register_done();
+}
+
+module_init(msr_init);
+module_exit(msr_exit)
+
+MODULE_AUTHOR("H. Peter Anvin <hpa@zytor.com>");
+MODULE_DESCRIPTION("x86 generic MSR driver");
+MODULE_LICENSE("GPL");
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 2/4] MSR: Prep for separating msr.c into three files
  2016-02-26  0:02 [PATCH 0/4] MSR: MSR: MSR Whitelist and Batch Introduction Marty McFadden
  2016-02-26  0:02 ` [PATCH 1/4] MSR: Prep for separating msr.c into three files Marty McFadden
@ 2016-02-26  0:02 ` Marty McFadden
  2016-02-26  0:02 ` [PATCH 3/4] MSR: msr Whitelist Implementation Marty McFadden
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 29+ messages in thread
From: Marty McFadden @ 2016-02-26  0:02 UTC (permalink / raw)
  Cc: ak, andriy.shevchenko, bp, bp, brgerst, dan.j.williams, dyoung,
	hpa, linux, linux-kernel, luto, mcfadden8, mingo, pavel, tglx,
	viro, x86, yu.c.chen

No functional change (yet).  Just updated Makefile to successfully
build msr_entry.c instead of msr.c.  This is in preparation for
splitting the MSR kernel module into three separate implementation
files:

  1) msr_entry.c - Original msr driver and entry (now)
  2) msr_whitelist.c - MSR Whitelist implementation (patch 2)
  3) msr_batch.c - MSR Batch implementation (patch 3)

Signed-off-by: Marty McFadden <mcfadden8@llnl.gov>
---
 arch/x86/kernel/Makefile |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index b1b78ff..7e96dff 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -54,6 +54,7 @@ obj-$(CONFIG_STACKTRACE)	+= stacktrace.o
 obj-y				+= cpu/
 obj-y				+= acpi/
 obj-y				+= reboot.o
+msr-y				+= msr_entry.o
 obj-$(CONFIG_X86_MSR)		+= msr.o
 obj-$(CONFIG_X86_CPUID)		+= cpuid.o
 obj-$(CONFIG_PCI)		+= early-quirks.o
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 3/4] MSR: msr Whitelist Implementation
  2016-02-26  0:02 [PATCH 0/4] MSR: MSR: MSR Whitelist and Batch Introduction Marty McFadden
  2016-02-26  0:02 ` [PATCH 1/4] MSR: Prep for separating msr.c into three files Marty McFadden
  2016-02-26  0:02 ` [PATCH 2/4] " Marty McFadden
@ 2016-02-26  0:02 ` Marty McFadden
  2016-02-26  1:05   ` [PATCH] MSR: fix badzero.cocci warnings kbuild test robot
  2016-02-26  1:05   ` [PATCH 3/4] MSR: msr Whitelist Implementation kbuild test robot
  2016-02-26  0:02 ` [PATCH 4/4] MSR: msr Batch processing feature Marty McFadden
  2016-02-26  7:37 ` [PATCH 0/4] MSR: MSR: MSR Whitelist and Batch Introduction Ingo Molnar
  4 siblings, 2 replies; 29+ messages in thread
From: Marty McFadden @ 2016-02-26  0:02 UTC (permalink / raw)
  Cc: ak, andriy.shevchenko, bp, bp, brgerst, dan.j.williams, dyoung,
	hpa, linux, linux-kernel, luto, mcfadden8, mingo, pavel, tglx,
	viro, x86, yu.c.chen

Allows the administrator to configure a set of bit masks for MSRs
where access is permitted.

Whitelist Administration:
  To configure whitelist (as root):
    cat whitelistfile > /dev/cpu/msr_whitelist

  This operation will cause the previous whitelist to be replaced by
  the specified whitelist.

  To enumerate current whitelist (as root):
    cat < /dev/cpu/msr_whitelist

  To remove whitelist (as root):
    echo > /dev/cpu/msr_whitelist

Security model:
  If user has CAP_SYS_RAWIO privileges, they will enjoy full
  access to MSRs like they do today.

  Otherwise, if the user is able to open the /dev/cpu/*/msr
  file, they will have access to MSR operations as follows:

    If the write mask exists for a particular MSR, then
    rdmsr access to that MSR access is granted.

    If the write mask is set to all ones (0xffffffffffffffff),
    then the user may perform a "raw" wrmsr operation with all
    64 bits being overwritten to that MSR.

    If the write mask is not 0xffffffffffffffff, then a rdmsr
    will be performed first and only the bits set in the write
    mask will be affected in the MSR.

Signed-off-by: Marty McFadden <mcfadden8@llnl.gov>
---
 arch/x86/kernel/Makefile        |    2 +-
 arch/x86/kernel/msr_entry.c     |   59 ++++++-
 arch/x86/kernel/msr_whitelist.c |  344 +++++++++++++++++++++++++++++++++++++++
 arch/x86/kernel/msr_whitelist.h |   38 +++++
 4 files changed, 438 insertions(+), 5 deletions(-)
 create mode 100644 arch/x86/kernel/msr_whitelist.c
 create mode 100644 arch/x86/kernel/msr_whitelist.h

diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 7e96dff..7ceed68 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -54,7 +54,7 @@ obj-$(CONFIG_STACKTRACE)	+= stacktrace.o
 obj-y				+= cpu/
 obj-y				+= acpi/
 obj-y				+= reboot.o
-msr-y				+= msr_entry.o
+msr-y				+= msr_entry.o msr_whitelist.o
 obj-$(CONFIG_X86_MSR)		+= msr.o
 obj-$(CONFIG_X86_CPUID)		+= cpuid.o
 obj-$(CONFIG_PCI)		+= early-quirks.o
diff --git a/arch/x86/kernel/msr_entry.c b/arch/x86/kernel/msr_entry.c
index 64f9616..216cd87 100644
--- a/arch/x86/kernel/msr_entry.c
+++ b/arch/x86/kernel/msr_entry.c
@@ -29,6 +29,7 @@
 #include <linux/types.h>
 #include <linux/errno.h>
 #include <linux/fcntl.h>
+#include <linux/slab.h>
 #include <linux/init.h>
 #include <linux/poll.h>
 #include <linux/smp.h>
@@ -42,8 +43,12 @@
 
 #include <asm/processor.h>
 #include <asm/msr.h>
+#include "msr_whitelist.h"
 
 static struct class *msr_class;
+struct msr_session_info {
+	int rawio_allowed;
+};
 
 static ssize_t msr_read(struct file *file, char __user *buf,
 			size_t count, loff_t *ppos)
@@ -54,10 +59,14 @@ static ssize_t msr_read(struct file *file, char __user *buf,
 	int cpu = iminor(file_inode(file));
 	int err = 0;
 	ssize_t bytes = 0;
+	struct msr_session_info *myinfo = file->private_data;
 
 	if (count % 8)
 		return -EINVAL;	/* Invalid chunk size */
 
+	if (!myinfo->rawio_allowed && !msr_whitelist_maskexists(reg))
+		return -EACCES;
+
 	for (; count; count -= 8) {
 		err = rdmsr_safe_on_cpu(cpu, reg, &data[0], &data[1]);
 		if (err)
@@ -77,20 +86,41 @@ static ssize_t msr_write(struct file *file, const char __user *buf,
 			 size_t count, loff_t *ppos)
 {
 	const u32 __user *tmp = (const u32 __user *)buf;
+	u32 curdata[2];
 	u32 data[2];
 	u32 reg = *ppos;
+	u64 mask;
 	int cpu = iminor(file_inode(file));
 	int err = 0;
 	ssize_t bytes = 0;
+	struct msr_session_info *myinfo = file->private_data;
 
 	if (count % 8)
 		return -EINVAL;	/* Invalid chunk size */
 
+	mask = myinfo->rawio_allowed ? 0xffffffffffffffff :
+						msr_whitelist_writemask(reg);
+
+	if (!myinfo->rawio_allowed && mask == 0)
+		return -EACCES;
+
 	for (; count; count -= 8) {
 		if (copy_from_user(&data, tmp, 8)) {
 			err = -EFAULT;
 			break;
 		}
+
+		if (mask != 0xffffffffffffffff) {
+			err = rdmsr_safe_on_cpu(cpu, reg,
+						&curdata[0], &curdata[1]);
+			if (err)
+				break;
+
+			*(u64 *)&curdata[0] &= ~mask;
+			*(u64 *)&data[0] &= mask;
+			*(u64 *)&data[0] |= *(u64 *)&curdata[0];
+		}
+
 		err = wrmsr_safe_on_cpu(cpu, reg, data[0], data[1]);
 		if (err)
 			break;
@@ -153,9 +183,7 @@ static int msr_open(struct inode *inode, struct file *file)
 {
 	unsigned int cpu = iminor(file_inode(file));
 	struct cpuinfo_x86 *c;
-
-	if (!capable(CAP_SYS_RAWIO))
-		return -EPERM;
+	struct msr_session_info *myinfo;
 
 	if (cpu >= nr_cpu_ids || !cpu_online(cpu))
 		return -ENXIO;	/* No such CPU */
@@ -164,6 +192,20 @@ static int msr_open(struct inode *inode, struct file *file)
 	if (!cpu_has(c, X86_FEATURE_MSR))
 		return -EIO;	/* MSR not supported */
 
+	myinfo = kmalloc(sizeof(*myinfo), GFP_KERNEL);
+	if (!myinfo)
+		return -ENOMEM;
+
+	myinfo->rawio_allowed = capable(CAP_SYS_RAWIO);
+	file->private_data = myinfo;
+
+	return 0;
+}
+
+static int msr_close(struct inode *inode, struct file *file)
+{
+	kfree(file->private_data);
+	file->private_data = 0;
 	return 0;
 }
 
@@ -178,6 +220,7 @@ static const struct file_operations msr_fops = {
 	.open = msr_open,
 	.unlocked_ioctl = msr_ioctl,
 	.compat_ioctl = msr_ioctl,
+	.release = msr_close
 };
 
 static int msr_device_create(int cpu)
@@ -227,10 +270,15 @@ static int __init msr_init(void)
 	int i, err = 0;
 	i = 0;
 
+	err = msr_whitelist_init();
+	if (err != 0) {
+		pr_err("failed to initialize whitelist for msr\n");
+		goto out;
+	}
 	if (__register_chrdev(MSR_MAJOR, 0, NR_CPUS, "cpu/msr", &msr_fops)) {
 		pr_err("unable to get major %d for msr\n", MSR_MAJOR);
 		err = -EBUSY;
-		goto out;
+		goto out_wlist;
 	}
 	msr_class = class_create(THIS_MODULE, "msr");
 	if (IS_ERR(msr_class)) {
@@ -259,6 +307,8 @@ out_class:
 	class_destroy(msr_class);
 out_chrdev:
 	__unregister_chrdev(MSR_MAJOR, 0, NR_CPUS, "cpu/msr");
+out_wlist:
+	msr_whitelist_cleanup();
 out:
 	return err;
 }
@@ -274,6 +324,7 @@ static void __exit msr_exit(void)
 	__unregister_chrdev(MSR_MAJOR, 0, NR_CPUS, "cpu/msr");
 	__unregister_hotcpu_notifier(&msr_class_cpu_notifier);
 	cpu_notifier_register_done();
+	msr_whitelist_cleanup();
 }
 
 module_init(msr_init);
diff --git a/arch/x86/kernel/msr_whitelist.c b/arch/x86/kernel/msr_whitelist.c
new file mode 100644
index 0000000..7d8affc
--- /dev/null
+++ b/arch/x86/kernel/msr_whitelist.c
@@ -0,0 +1,344 @@
+/*
+ * MSR Whitelist implementation
+ */
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+#include <linux/kernel.h>
+#include <linux/hashtable.h>
+#include <linux/mutex.h>
+#include <linux/fs.h>
+#include <linux/slab.h>
+#include <linux/uaccess.h>
+#include <linux/ctype.h>
+#include <linux/device.h>
+
+#define MAX_WLIST_BSIZE ((128 * 1024) + 1) /* "+1" for null character */
+
+struct whitelist_entry {
+	u64 wmask;	/* Bits that may be written */
+	u64 msr;	/* Address of msr (used as hash key) */
+	u64 *msrdata;	/* ptr to original msr contents of writable bits */
+	struct hlist_node hlist;
+};
+
+static void delete_whitelist(void);
+static int create_whitelist(int nentries);
+static struct whitelist_entry *find_in_whitelist(u64 msr);
+static void add_to_whitelist(struct whitelist_entry *entry);
+static int parse_next_whitelist_entry(char *inbuf, char **nextinbuf,
+						struct whitelist_entry *entry);
+static ssize_t read_whitelist(struct file *file, char __user *buf,
+						size_t count, loff_t *ppos);
+static int majordev;
+static struct class *cdev_class;
+static char cdev_created;
+static char cdev_registered;
+static char cdev_class_created;
+
+static DEFINE_HASHTABLE(whitelist_hash, 6);
+static DEFINE_MUTEX(whitelist_mutex);
+static struct whitelist_entry *whitelist;
+static int whitelist_numentries;
+
+int msr_whitelist_maskexists(loff_t reg)
+{
+	struct whitelist_entry *entry;
+
+	mutex_lock(&whitelist_mutex);
+	entry = find_in_whitelist((u64)reg);
+	mutex_unlock(&whitelist_mutex);
+
+	return entry != NULL;
+}
+
+u64 msr_whitelist_writemask(loff_t reg)
+{
+	struct whitelist_entry *entry;
+
+	mutex_lock(&whitelist_mutex);
+	entry = find_in_whitelist((u64)reg);
+	mutex_unlock(&whitelist_mutex);
+
+	return entry ? entry->wmask : 0;
+}
+
+static int open_whitelist(struct inode *inode, struct file *file)
+{
+	return 0;
+}
+
+/*
+ * After copying data from user space, we make two passes through it.
+ * The first pass is to ensure that the input file is valid. If the file is
+ * valid, we will then delete the current white list and then perform the
+ * second pass to actually create the new white list.
+ */
+static ssize_t write_whitelist(struct file *file, const char __user *buf,
+						size_t count, loff_t *ppos)
+{
+	int err = 0;
+	const u32 __user *tmp = (const u32 __user *)buf;
+	char *s;
+	int res;
+	int num_entries;
+	struct whitelist_entry *entry;
+	char *kbuf;
+
+	if (count <= 2) {
+		mutex_lock(&whitelist_mutex);
+		delete_whitelist();
+		hash_init(whitelist_hash);
+		mutex_unlock(&whitelist_mutex);
+		return count;
+	}
+
+	if (count+1 > MAX_WLIST_BSIZE) {
+		pr_err("write_whitelist: buffer of %zu bytes too large\n",
+		    count);
+		return -EINVAL;
+	}
+
+	kbuf = kzalloc(count+1, GFP_KERNEL);
+	if (!kbuf)
+		return -ENOMEM;
+
+	if (copy_from_user(kbuf, tmp, count)) {
+		err = -EFAULT;
+		goto out_freebuffer;
+	}
+
+	/* Pass 1: */
+	for (num_entries = 0, s = kbuf, res = 1; res > 0; ) {
+		res = parse_next_whitelist_entry(s, &s, 0);
+		if (res < 0) {
+			err = res;
+			goto out_freebuffer;
+		}
+
+		if (res)
+			num_entries++;
+	}
+
+	/* Pass 2: */
+	mutex_lock(&whitelist_mutex);
+	res = create_whitelist(num_entries);
+	if (res < 0) {
+		err = res;
+		goto out_releasemutex;
+	}
+
+	for (entry = whitelist, s = kbuf, res = 1; res > 0; entry++) {
+		res = parse_next_whitelist_entry(s, &s, entry);
+		if (res < 0) {
+			pr_alert("write_whitelist: Table corrupted\n");
+			delete_whitelist();
+			err = res; /* This should not happen! */
+			goto out_releasemutex;
+		}
+
+		if (res) {
+			if (find_in_whitelist(entry->msr)) {
+				pr_err("write_whitelist: Duplicate: %llx\n",
+							 entry->msr);
+				err = -EINVAL;
+				delete_whitelist();
+				goto out_releasemutex;
+			}
+			add_to_whitelist(entry);
+		}
+	}
+
+out_releasemutex:
+	mutex_unlock(&whitelist_mutex);
+out_freebuffer:
+	kfree(kbuf);
+	return err ? err : count;
+}
+
+static ssize_t read_whitelist(struct file *file, char __user *buf,
+						size_t count, loff_t *ppos)
+{
+	loff_t idx = *ppos;
+	u32 __user *tmp = (u32 __user *) buf;
+	char kbuf[160];
+	int len;
+	struct whitelist_entry e;
+
+	mutex_lock(&whitelist_mutex);
+	*ppos = 0;
+
+	if (idx >= whitelist_numentries || idx < 0) {
+		mutex_unlock(&whitelist_mutex);
+		return 0;
+	}
+
+	e = whitelist[idx];
+	mutex_unlock(&whitelist_mutex);
+
+	len = sprintf(kbuf,
+		"MSR: %08llx Write Mask: %016llx\n", e.msr, e.wmask);
+
+	if (len > count)
+		return -EFAULT;
+
+	if (copy_to_user(tmp, kbuf, len))
+		return -EFAULT;
+
+	*ppos = idx+1;
+	return len;
+}
+
+static const struct file_operations fops = {
+	.owner = THIS_MODULE,
+	.read = read_whitelist,
+	.write = write_whitelist,
+	.open = open_whitelist
+};
+
+static void delete_whitelist(void)
+{
+	if (whitelist == 0)
+		return;
+
+	if (whitelist->msrdata != 0)
+		kfree(whitelist->msrdata);
+
+	kfree(whitelist);
+	whitelist = 0;
+	whitelist_numentries = 0;
+}
+
+static int create_whitelist(int nentries)
+{
+	hash_init(whitelist_hash);
+	delete_whitelist();
+	whitelist_numentries = nentries;
+	whitelist = kcalloc(nentries, sizeof(*whitelist), GFP_KERNEL);
+
+	if (!whitelist)
+		return -ENOMEM;
+	return 0;
+}
+
+static struct whitelist_entry *find_in_whitelist(u64 msr)
+{
+	struct whitelist_entry *entry = 0;
+
+	if (whitelist) {
+		hash_for_each_possible(whitelist_hash, entry, hlist, msr)
+			if (entry && entry->msr == msr)
+				return entry;
+	}
+	return 0;
+}
+
+static void add_to_whitelist(struct whitelist_entry *entry)
+{
+	hash_add(whitelist_hash, &entry->hlist, entry->msr);
+}
+
+static int parse_next_whitelist_entry(char *inbuf, char **nextinbuf,
+						struct whitelist_entry *entry)
+{
+	char *s = skip_spaces(inbuf);
+	int i;
+	u64 data[2];
+
+	while (*s == '#') { /* Skip remaining portion of line */
+		for (s = s + 1; *s && *s != '\n'; s++)
+			;
+		s = skip_spaces(s);
+	}
+
+	if (*s == 0)
+		return 0; /* This means we are done with the input buffer */
+
+	for (i = 0; i < 2; i++) {/* we should have the first of 3 #s now */
+		char *s2;
+		int err;
+		char tmp;
+
+		s2 = s = skip_spaces(s);
+		while (!isspace(*s) && *s)
+			s++;
+
+		if (*s == 0) {
+			pr_err("parse_next_whitelist_entry: Premature EOF");
+			return -EINVAL;
+		}
+
+		tmp = *s;
+		*s = 0; /* Null-terminate this portion of string */
+		err = kstrtoull(s2, 0, &data[i]);
+		if (err)
+			return err;
+		*s++ = tmp;
+	}
+
+	if (entry) {
+		entry->msr = data[0];
+		entry->wmask = data[1];
+	}
+
+	*nextinbuf = s; /* Return where we left off to caller */
+	return *nextinbuf - inbuf;
+}
+
+static char *msr_whitelist_nodename(struct device *dev, umode_t *mode)
+{
+	return kasprintf(GFP_KERNEL, "cpu/msr_whitelist");
+}
+
+void msr_whitelist_cleanup(void)
+{
+	delete_whitelist();
+
+	if (cdev_created) {
+		cdev_created = 0;
+		device_destroy(cdev_class, MKDEV(majordev, 0));
+	}
+
+	if (cdev_class_created) {
+		cdev_class_created = 0;
+		class_destroy(cdev_class);
+	}
+
+	if (cdev_registered) {
+		cdev_registered = 0;
+		unregister_chrdev(majordev, "cpu/msr_whitelist");
+	}
+}
+
+int msr_whitelist_init(void)
+{
+	int err;
+	struct device *dev;
+
+	majordev = register_chrdev(0, "cpu/msr_whitelist", &fops);
+	if (majordev < 0) {
+		pr_err("msr_whitelist_init: unable to register chrdev\n");
+		msr_whitelist_cleanup();
+		return -EBUSY;
+	}
+	cdev_registered = 1;
+
+	cdev_class = class_create(THIS_MODULE, "msr_whitelist");
+	if (IS_ERR(cdev_class)) {
+		err = PTR_ERR(cdev_class);
+		msr_whitelist_cleanup();
+		return err;
+	}
+	cdev_class_created = 1;
+
+	cdev_class->devnode = msr_whitelist_nodename;
+
+	dev = device_create(cdev_class, NULL, MKDEV(majordev, 0),
+						NULL, "msr_whitelist");
+	if (IS_ERR(dev)) {
+		err = PTR_ERR(dev);
+		msr_whitelist_cleanup();
+		return err;
+	}
+	cdev_created = 1;
+	return 0;
+}
diff --git a/arch/x86/kernel/msr_whitelist.h b/arch/x86/kernel/msr_whitelist.h
new file mode 100644
index 0000000..529b4af
--- /dev/null
+++ b/arch/x86/kernel/msr_whitelist.h
@@ -0,0 +1,38 @@
+/*
+ * Internal declarations for x86 MSR whitelist implementation functions.
+ *
+ * Copyright (c) 2015, Lawrence Livermore National Security, LLC.
+ * Produced at the Lawrence Livermore National Laboratory
+ * All rights reserved.
+ *
+ * This copyrighted material is made available to anyone wishing to use,
+ * modify, copy, or redistribute it subject to the terms and conditions
+ * of the GNU General Public License v.2.
+ *
+ * Thank you to everyone who has contributed and helped with this project:
+ *
+ * Kathleen Shoga
+ * Peter Bailey
+ * Trent D'Hooge
+ * Jim Foraker
+ * David Lowenthal
+ * Tapasya Patki
+ * Barry Rountree
+ * Marty McFadden
+ *
+ * Special thanks to Kendrick Shaw at Case Western Reserve University for
+ * his initial suggestion to explore MSRs.
+ *
+ * Latest Updates from: Marty McFadden, mcfadden8@llnl.gov
+ */
+#ifndef _ARCH_X68_KERNEL_MSR_WHITELIST_H
+#define _ARCH_X68_KERNEL_MSR_WHITELIST_H 1
+
+#include <linux/types.h>
+
+int msr_whitelist_init(void);
+int msr_whitelist_cleanup(void);
+int msr_whitelist_maskexists(loff_t reg);
+u64 msr_whitelist_writemask(loff_t reg);
+
+#endif /* _ARCH_X68_KERNEL_MSR_WHITELIST_H */
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 4/4] MSR: msr Batch processing feature
  2016-02-26  0:02 [PATCH 0/4] MSR: MSR: MSR Whitelist and Batch Introduction Marty McFadden
                   ` (2 preceding siblings ...)
  2016-02-26  0:02 ` [PATCH 3/4] MSR: msr Whitelist Implementation Marty McFadden
@ 2016-02-26  0:02 ` Marty McFadden
  2016-02-26  2:48   ` Andy Lutomirski
  2016-03-03 17:21   ` One Thousand Gnomes
  2016-02-26  7:37 ` [PATCH 0/4] MSR: MSR: MSR Whitelist and Batch Introduction Ingo Molnar
  4 siblings, 2 replies; 29+ messages in thread
From: Marty McFadden @ 2016-02-26  0:02 UTC (permalink / raw)
  Cc: ak, andriy.shevchenko, bp, bp, brgerst, dan.j.williams, dyoung,
	hpa, linux, linux-kernel, luto, mcfadden8, mingo, pavel, tglx,
	viro, x86, yu.c.chen

Provides a new ioctl interface through /dev/cpu/msr_batch.

This implementation will cause an Inter Processor Interrupt to be sent
to each destination processor and will wait until all processors have
finished processing their respective batch of MSR operations before
returning.

Implementation Note: A separate "error" field is maintained per MSR
operation in order to maintain reentrancy into the IPI callback
function.

Signed-off-by: Marty McFadden <mcfadden8@llnl.gov>
---
 arch/x86/include/asm/msr.h      |   32 ++++++
 arch/x86/include/uapi/asm/msr.h |   15 +++
 arch/x86/kernel/Makefile        |    2 +-
 arch/x86/kernel/msr_batch.c     |  234 +++++++++++++++++++++++++++++++++++++++
 arch/x86/kernel/msr_batch.h     |    7 +
 arch/x86/kernel/msr_entry.c     |   11 ++-
 arch/x86/lib/msr-smp.c          |   51 +++++++++
 7 files changed, 350 insertions(+), 2 deletions(-)
 create mode 100644 arch/x86/kernel/msr_batch.c
 create mode 100644 arch/x86/kernel/msr_batch.h

diff --git a/arch/x86/include/asm/msr.h b/arch/x86/include/asm/msr.h
index 93fb7c1..9e43c83 100644
--- a/arch/x86/include/asm/msr.h
+++ b/arch/x86/include/asm/msr.h
@@ -299,6 +299,7 @@ int rdmsrl_safe_on_cpu(unsigned int cpu, u32 msr_no, u64 *q);
 int wrmsrl_safe_on_cpu(unsigned int cpu, u32 msr_no, u64 q);
 int rdmsr_safe_regs_on_cpu(unsigned int cpu, u32 regs[8]);
 int wrmsr_safe_regs_on_cpu(unsigned int cpu, u32 regs[8]);
+int msr_safe_batch(struct msr_batch_array *oa);
 #else  /*  CONFIG_SMP  */
 static inline int rdmsr_on_cpu(unsigned int cpu, u32 msr_no, u32 *l, u32 *h)
 {
@@ -355,6 +356,37 @@ static inline int wrmsr_safe_regs_on_cpu(unsigned int cpu, u32 regs[8])
 {
 	return wrmsr_safe_regs(regs);
 }
+static inline int msr_safe_batch(struct msr_batch_array *oa)
+{
+	struct msr_batch_op *op;
+	int result = 0;
+	u32 *dp;
+	u64 oldmsr;
+	u64 newmsr;
+
+	for (op = oa->ops; op < oa->ops + oa->numops; ++op) {
+		op->err = 0;
+		dp = (u32 *)&oldmsr;
+		if (rdmsr_safe(op->msr, &dp[0], &dp[1])) {
+			op->err = -EIO;
+			result = op->err;
+			continue;
+		}
+		if (op->isrdmsr) {
+			op->msrdata = oldmsr;
+			continue;
+		}
+
+		newmsr = op->msrdata & op->wmask;
+		newmsr |= (oldmsr & ~op->wmask);
+		dp = (u32 *)&newmsr;
+		if (wrmsr_safe(op->msr, dp[0], dp[1])) {
+			op->err = -EIO;
+			result = op->err;
+		}
+	}
+	return result;
+}
 #endif  /* CONFIG_SMP */
 #endif /* __ASSEMBLY__ */
 #endif /* _ASM_X86_MSR_H */
diff --git a/arch/x86/include/uapi/asm/msr.h b/arch/x86/include/uapi/asm/msr.h
index c41f4fe..b678914 100644
--- a/arch/x86/include/uapi/asm/msr.h
+++ b/arch/x86/include/uapi/asm/msr.h
@@ -6,8 +6,23 @@
 #include <linux/types.h>
 #include <linux/ioctl.h>
 
+struct msr_batch_op {
+	__u16 cpu;		/* In: CPU to execute {rd/wr}msr ins. */
+	__u16 isrdmsr;		/* In: 0=wrmsr, non-zero=rdmsr */
+	__s32 err;		/* Out: set if error occurred with this op */
+	__u32 msr;		/* In: MSR Address to perform op */
+	__u64 msrdata;		/* In/Out: Input/Result to/from operation */
+	__u64 wmask;		/* Out: Write mask applied to wrmsr */
+};
+
+struct msr_batch_array {
+	__u32 numops;			/* In: # of operations in ops array */
+	struct msr_batch_op *ops;	/* In: Array[numops] of operations */
+};
+
 #define X86_IOC_RDMSR_REGS	_IOWR('c', 0xA0, __u32[8])
 #define X86_IOC_WRMSR_REGS	_IOWR('c', 0xA1, __u32[8])
+#define X86_IOC_MSR_BATCH	_IOWR('c', 0xA2, struct msr_batch_array)
 
 #endif /* __ASSEMBLY__ */
 #endif /* _UAPI_ASM_X86_MSR_H */
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 7ceed68..e340384 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -54,7 +54,7 @@ obj-$(CONFIG_STACKTRACE)	+= stacktrace.o
 obj-y				+= cpu/
 obj-y				+= acpi/
 obj-y				+= reboot.o
-msr-y				+= msr_entry.o msr_whitelist.o
+msr-y				+= msr_entry.o msr_whitelist.o msr_batch.o
 obj-$(CONFIG_X86_MSR)		+= msr.o
 obj-$(CONFIG_X86_CPUID)		+= cpuid.o
 obj-$(CONFIG_PCI)		+= early-quirks.o
diff --git a/arch/x86/kernel/msr_batch.c b/arch/x86/kernel/msr_batch.c
new file mode 100644
index 0000000..21b253c
--- /dev/null
+++ b/arch/x86/kernel/msr_batch.c
@@ -0,0 +1,234 @@
+/*
+ * x86 MSR batch access device
+ *
+ * This device is accessed by ioctl() to submit a batch of MSR requests
+ * which may be used instead of or in addition to the lseek()/write()/read()
+ * mechanism provided by msr_safe.c
+ *
+ * This driver uses /dev/cpu/msr_batch as its device file.
+ */
+
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+#include <linux/kernel.h>
+#include <linux/fs.h>
+#include <linux/slab.h>
+#include <linux/device.h>
+#include <linux/cpu.h>
+#include <linux/uaccess.h>
+#include <linux/mutex.h>
+#include <linux/kobject.h>
+#include <linux/sysfs.h>
+#include <linux/module.h>
+#include <asm/msr.h>
+#include "msr_whitelist.h"
+#include "msr_batch.h"
+
+static int majordev;
+static struct class *cdev_class;
+static char cdev_created;
+static char cdev_registered;
+static char cdev_class_created;
+
+struct msrbatch_session_info {
+	int rawio_allowed;
+};
+
+static int msrbatch_open(struct inode *inode, struct file *file)
+{
+	unsigned int cpu;
+	struct cpuinfo_x86 *c;
+	struct msrbatch_session_info *myinfo;
+
+	cpu = iminor(file->f_path.dentry->d_inode);
+	if (cpu >= nr_cpu_ids || !cpu_online(cpu)) {
+		pr_err("cpu #%u does not exist\n", cpu);
+		return -ENXIO; /* No such CPU */
+	}
+
+	c = &cpu_data(cpu);
+	if (!cpu_has(c, X86_FEATURE_MSR)) {
+		pr_err("cpu #%u does not have MSR feature.\n", cpu);
+		return -EIO;	/* MSR not supported */
+	}
+
+	myinfo = kmalloc(sizeof(*myinfo), GFP_KERNEL);
+	if (!myinfo)
+		return -ENOMEM;
+
+	myinfo->rawio_allowed = capable(CAP_SYS_RAWIO);
+	file->private_data = myinfo;
+
+	return 0;
+}
+
+static int msrbatch_close(struct inode *inode, struct file *file)
+{
+	kfree(file->private_data);
+	file->private_data = 0;
+	return 0;
+}
+
+static int msrbatch_apply_whitelist(struct msr_batch_array *oa,
+				struct msrbatch_session_info *myinfo)
+{
+	struct msr_batch_op *op;
+	int err = 0;
+
+	for (op = oa->ops; op < oa->ops + oa->numops; ++op) {
+		op->err = 0;
+		if (myinfo->rawio_allowed) {
+			op->wmask = 0xffffffffffffffff;
+			continue;
+		}
+
+		if (!msr_whitelist_maskexists(op->msr)) {
+			pr_err("No whitelist entry for MSR %x\n", op->msr);
+			op->err = err = -EACCES;
+		} else {
+			op->wmask = msr_whitelist_writemask(op->msr);
+			/*
+			 * Check for read-only case
+			 */
+			if (op->wmask == 0 && !op->isrdmsr) {
+				if (!myinfo->rawio_allowed) {
+					pr_err("MSR %x is read-only\n",
+								op->msr);
+					op->err = err = -EACCES;
+				}
+			}
+		}
+	}
+	return err;
+}
+
+static long msrbatch_ioctl(struct file *f, unsigned int ioc, unsigned long arg)
+{
+	int err = 0;
+	struct msr_batch_array __user *uoa;
+	struct msr_batch_op __user *uops;
+	struct msr_batch_array koa;
+	struct msrbatch_session_info *myinfo = f->private_data;
+
+	if (ioc != X86_IOC_MSR_BATCH) {
+		pr_err("Invalid ioctl op %u\n", ioc);
+		return -ENOTTY;
+	}
+
+	if (!(f->f_mode & FMODE_READ)) {
+		pr_err("File not open for reading\n");
+		return -EBADF;
+	}
+
+	uoa = (struct msr_batch_array *)arg;
+
+	if (copy_from_user(&koa, uoa, sizeof(koa))) {
+		pr_err("Copy of batch array descriptor failed\n");
+		return -EFAULT;
+	}
+
+	if (koa.numops <= 0) {
+		pr_err("Invalid # of ops %d\n", koa.numops);
+		return -EINVAL;
+	}
+
+	uops = koa.ops;
+
+	koa.ops = kmalloc_array(koa.numops, sizeof(*koa.ops), GFP_KERNEL);
+	if (!koa.ops)
+		return -ENOMEM;
+
+	if (copy_from_user(koa.ops, uops, koa.numops * sizeof(*koa.ops))) {
+		pr_err("Copy of batch array failed\n");
+		err = -EFAULT;
+		goto bundle_alloc;
+	}
+
+	err = msrbatch_apply_whitelist(&koa, myinfo);
+	if (err) {
+		pr_err("Failed to apply whitelist %d\n", err);
+		goto copyout_and_return;
+	}
+
+	err = msr_safe_batch(&koa);
+	if (err != 0) {
+		pr_err("msr_safe_batch failed: %d\n", err);
+		goto copyout_and_return;
+	}
+
+copyout_and_return:
+	if (copy_to_user(uops, koa.ops, koa.numops * sizeof(*uops))) {
+		pr_err("copy batch data back to user failed\n");
+		if (!err)
+			err = -EFAULT;
+	}
+bundle_alloc:
+	kfree(koa.ops);
+
+	return err;
+}
+
+static const struct file_operations fops = {
+	.owner = THIS_MODULE,
+	.open = msrbatch_open,
+	.unlocked_ioctl = msrbatch_ioctl,
+	.compat_ioctl = msrbatch_ioctl,
+	.release = msrbatch_close
+};
+
+void msrbatch_cleanup(void)
+{
+	if (cdev_created) {
+		cdev_created = 0;
+		device_destroy(cdev_class, MKDEV(majordev, 0));
+	}
+
+	if (cdev_class_created) {
+		cdev_class_created = 0;
+		class_destroy(cdev_class);
+	}
+
+	if (cdev_registered) {
+		cdev_registered = 0;
+		unregister_chrdev(majordev, "cpu/msr_batch");
+	}
+}
+
+static char *msrbatch_nodename(struct device *dev, umode_t *mode)
+{
+	return kasprintf(GFP_KERNEL, "cpu/msr_batch");
+}
+
+int msrbatch_init(void)
+{
+	int err;
+	struct device *dev;
+
+	majordev = register_chrdev(0, "cpu/msr_batch", &fops);
+	if (majordev < 0) {
+		pr_err("msrbatch_init: unable to register chrdev\n");
+		msrbatch_cleanup();
+		return -EBUSY;
+	}
+	cdev_registered = 1;
+
+	cdev_class = class_create(THIS_MODULE, "msr_batch");
+	if (IS_ERR(cdev_class)) {
+		err = PTR_ERR(cdev_class);
+		msrbatch_cleanup();
+		return err;
+	}
+	cdev_class_created = 1;
+
+	cdev_class->devnode = msrbatch_nodename;
+
+	dev = device_create(cdev_class, NULL, MKDEV(majordev, 0),
+						NULL, "msr_batch");
+	if (IS_ERR(dev)) {
+		err = PTR_ERR(dev);
+		msrbatch_cleanup();
+		return err;
+	}
+	cdev_created = 1;
+	return 0;
+}
diff --git a/arch/x86/kernel/msr_batch.h b/arch/x86/kernel/msr_batch.h
new file mode 100644
index 0000000..da224c3
--- /dev/null
+++ b/arch/x86/kernel/msr_batch.h
@@ -0,0 +1,7 @@
+#ifndef MSR_BATCH_INC
+#define MSR_BATCH_INC 1
+#include <asm/msr.h>
+
+extern void msrbatch_cleanup(void);
+extern int msrbatch_init(void);
+#endif /* MSR_BATCH_INC */
diff --git a/arch/x86/kernel/msr_entry.c b/arch/x86/kernel/msr_entry.c
index 216cd87..957a89e 100644
--- a/arch/x86/kernel/msr_entry.c
+++ b/arch/x86/kernel/msr_entry.c
@@ -44,6 +44,7 @@
 #include <asm/processor.h>
 #include <asm/msr.h>
 #include "msr_whitelist.h"
+#include "msr_batch.h"
 
 static struct class *msr_class;
 struct msr_session_info {
@@ -270,10 +271,15 @@ static int __init msr_init(void)
 	int i, err = 0;
 	i = 0;
 
+	err = msrbatch_init();
+	if (err != 0) {
+		pr_err("failed to initialize msrbatch\n");
+		goto out;
+	}
 	err = msr_whitelist_init();
 	if (err != 0) {
 		pr_err("failed to initialize whitelist for msr\n");
-		goto out;
+		goto out_batch;
 	}
 	if (__register_chrdev(MSR_MAJOR, 0, NR_CPUS, "cpu/msr", &msr_fops)) {
 		pr_err("unable to get major %d for msr\n", MSR_MAJOR);
@@ -309,6 +315,8 @@ out_chrdev:
 	__unregister_chrdev(MSR_MAJOR, 0, NR_CPUS, "cpu/msr");
 out_wlist:
 	msr_whitelist_cleanup();
+out_batch:
+	msrbatch_cleanup();
 out:
 	return err;
 }
@@ -325,6 +333,7 @@ static void __exit msr_exit(void)
 	__unregister_hotcpu_notifier(&msr_class_cpu_notifier);
 	cpu_notifier_register_done();
 	msr_whitelist_cleanup();
+	msrbatch_cleanup();
 }
 
 module_init(msr_init);
diff --git a/arch/x86/lib/msr-smp.c b/arch/x86/lib/msr-smp.c
index 518532e..98a3f19 100644
--- a/arch/x86/lib/msr-smp.c
+++ b/arch/x86/lib/msr-smp.c
@@ -264,3 +264,54 @@ int wrmsr_safe_regs_on_cpu(unsigned int cpu, u32 *regs)
 	return err ? err : rv.err;
 }
 EXPORT_SYMBOL(wrmsr_safe_regs_on_cpu);
+
+static void __msr_safe_batch(void *info)
+{
+	struct msr_batch_array *oa = info;
+	struct msr_batch_op *op;
+	int this_cpu = smp_processor_id();
+	u32 *dp;
+	u64 oldmsr;
+	u64 newmsr;
+
+	for (op = oa->ops; op < oa->ops + oa->numops; ++op) {
+		if (op->cpu != this_cpu)
+			continue;
+
+		op->err = 0;
+		dp = (u32 *)&oldmsr;
+		if (rdmsr_safe(op->msr, &dp[0], &dp[1])) {
+			op->err = -EIO;
+			continue;
+		}
+		if (op->isrdmsr) {
+			op->msrdata = oldmsr;
+			continue;
+		}
+
+		newmsr = op->msrdata & op->wmask;
+		newmsr |= (oldmsr & ~op->wmask);
+		dp = (u32 *)&newmsr;
+		if (wrmsr_safe(op->msr, dp[0], dp[1]))
+			op->err = -EIO;
+	}
+}
+
+int msr_safe_batch(struct msr_batch_array *oa)
+{
+	struct cpumask cpus_to_run_on;
+	struct msr_batch_op *op;
+
+	cpumask_clear(&cpus_to_run_on);
+	for (op = oa->ops; op < oa->ops + oa->numops; ++op)
+		cpumask_set_cpu(op->cpu, &cpus_to_run_on);
+
+	on_each_cpu_mask(&cpus_to_run_on, __msr_safe_batch, oa, 1);
+
+	for (op = oa->ops; op < oa->ops + oa->numops; ++op)
+		if (op->err)
+			return op->err;
+
+	return 0;
+}
+EXPORT_SYMBOL(msr_safe_batch);
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH] MSR: fix badzero.cocci warnings
  2016-02-26  0:02 ` [PATCH 3/4] MSR: msr Whitelist Implementation Marty McFadden
@ 2016-02-26  1:05   ` kbuild test robot
  2016-02-26  1:05   ` [PATCH 3/4] MSR: msr Whitelist Implementation kbuild test robot
  1 sibling, 0 replies; 29+ messages in thread
From: kbuild test robot @ 2016-02-26  1:05 UTC (permalink / raw)
  To: Marty McFadden
  Cc: kbuild-all, ak, andriy.shevchenko, bp, bp, brgerst,
	dan.j.williams, dyoung, hpa, linux, linux-kernel, luto,
	mcfadden8, mingo, pavel, tglx, viro, x86, yu.c.chen

arch/x86/kernel/msr_whitelist.c:200:18-19: WARNING comparing pointer to 0
arch/x86/kernel/msr_whitelist.c:203:27-28: WARNING comparing pointer to 0

 Compare pointer-typed values to NULL rather than 0

Semantic patch information:
 This makes an effort to choose between !x and x == NULL.  !x is used
 if it has previously been used with the function used to initialize x.
 This relies on type information.  More type information can be obtained
 using the option -all_includes and the option -I to specify an
 include path.

Generated by: scripts/coccinelle/null/badzero.cocci

CC: Marty McFadden <mcfadden8@llnl.gov>
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
---

 msr_whitelist.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

--- a/arch/x86/kernel/msr_whitelist.c
+++ b/arch/x86/kernel/msr_whitelist.c
@@ -197,10 +197,10 @@ static const struct file_operations fops
 
 static void delete_whitelist(void)
 {
-	if (whitelist == 0)
+	if (whitelist == NULL)
 		return;
 
-	if (whitelist->msrdata != 0)
+	if (whitelist->msrdata != NULL)
 		kfree(whitelist->msrdata);
 
 	kfree(whitelist);

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 3/4] MSR: msr Whitelist Implementation
  2016-02-26  0:02 ` [PATCH 3/4] MSR: msr Whitelist Implementation Marty McFadden
  2016-02-26  1:05   ` [PATCH] MSR: fix badzero.cocci warnings kbuild test robot
@ 2016-02-26  1:05   ` kbuild test robot
  1 sibling, 0 replies; 29+ messages in thread
From: kbuild test robot @ 2016-02-26  1:05 UTC (permalink / raw)
  To: Marty McFadden
  Cc: kbuild-all, ak, andriy.shevchenko, bp, bp, brgerst,
	dan.j.williams, dyoung, hpa, linux, linux-kernel, luto,
	mcfadden8, mingo, pavel, tglx, viro, x86, yu.c.chen

Hi Marty,

[auto build test WARNING on tip/x86/core]
[also build test WARNING on v4.5-rc5 next-20160225]
[if your patch is applied to the wrong git tree, please drop us a note to help improving the system]

url:    https://github.com/0day-ci/linux/commits/Marty-McFadden/MSR-MSR-MSR-Whitelist-and-Batch-Introduction/20160226-081635


coccinelle warnings: (new ones prefixed by >>)

>> arch/x86/kernel/msr_whitelist.c:200:18-19: WARNING comparing pointer to 0
   arch/x86/kernel/msr_whitelist.c:203:27-28: WARNING comparing pointer to 0

Please review and possibly fold the followup patch.

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 4/4] MSR: msr Batch processing feature
  2016-02-26  0:02 ` [PATCH 4/4] MSR: msr Batch processing feature Marty McFadden
@ 2016-02-26  2:48   ` Andy Lutomirski
  2016-03-03 17:21   ` One Thousand Gnomes
  1 sibling, 0 replies; 29+ messages in thread
From: Andy Lutomirski @ 2016-02-26  2:48 UTC (permalink / raw)
  To: Marty McFadden
  Cc: Andi Kleen, Andy Shevchenko, Borislav Petkov, Borislav Petkov,
	Brian Gerst, Dan Williams, Dave Young, H. Peter Anvin,
	George Spelvin, linux-kernel, Andrew Lutomirski, Ingo Molnar,
	Pavel Machek, Thomas Gleixner, Al Viro, X86 ML, Chen Yu

On Thu, Feb 25, 2016 at 4:02 PM, Marty McFadden <mcfadden8@llnl.gov> wrote:
> Provides a new ioctl interface through /dev/cpu/msr_batch.
>
> This implementation will cause an Inter Processor Interrupt to be sent
> to each destination processor and will wait until all processors have
> finished processing their respective batch of MSR operations before
> returning.

This comes up periodicially, and it needs to be benchmarked against
the sane way to do it in userspace:

For each CPU: sched_setaffinity to bind to that CPU, and then a bunch
of conventional MSR accesses.  Optionally do all the CPUs at once
using lots of threads.

Please do this benchmark on a kernel that is *not* using nohz_full.
The nohz_full implementaion in current kernels has abysmal syscall
performance, but that's improving rapidly.

--Andy

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 1/4] MSR: Prep for separating msr.c into three files
  2016-02-26  0:02 ` [PATCH 1/4] MSR: Prep for separating msr.c into three files Marty McFadden
@ 2016-02-26  2:56   ` kbuild test robot
  2016-02-26 10:08   ` Andy Shevchenko
  1 sibling, 0 replies; 29+ messages in thread
From: kbuild test robot @ 2016-02-26  2:56 UTC (permalink / raw)
  To: Marty McFadden
  Cc: kbuild-all, ak, andriy.shevchenko, bp, bp, brgerst,
	dan.j.williams, dyoung, hpa, linux, linux-kernel, luto,
	mcfadden8, mingo, pavel, tglx, viro, x86, yu.c.chen

[-- Attachment #1: Type: text/plain, Size: 1034 bytes --]

Hi Marty,

[auto build test ERROR on tip/x86/core]
[also build test ERROR on v4.5-rc5 next-20160225]
[if your patch is applied to the wrong git tree, please drop us a note to help improving the system]

url:    https://github.com/0day-ci/linux/commits/Marty-McFadden/MSR-MSR-MSR-Whitelist-and-Batch-Introduction/20160226-081635
config: x86_64-randconfig-s0-02260944 (attached as .config)
reproduce:
        # save the attached .config to linux build tree
        make ARCH=x86_64 

Note: the linux-review/Marty-McFadden/MSR-MSR-MSR-Whitelist-and-Batch-Introduction/20160226-081635 HEAD b22b3d52ac438a7130196d7bec9ad8af70073ede builds fine.
      It only hurts bisectibility.

All errors (new ones prefixed by >>):

>> make[3]: *** No rule to make target 'arch/x86/kernel/msr.c', needed by 'arch/x86/kernel/msr.o'.
   make[3]: Target '__build' not remade because of errors.

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/octet-stream, Size: 21833 bytes --]

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 0/4] MSR: MSR: MSR Whitelist and Batch Introduction
  2016-02-26  0:02 [PATCH 0/4] MSR: MSR: MSR Whitelist and Batch Introduction Marty McFadden
                   ` (3 preceding siblings ...)
  2016-02-26  0:02 ` [PATCH 4/4] MSR: msr Batch processing feature Marty McFadden
@ 2016-02-26  7:37 ` Ingo Molnar
  2016-02-28 18:54   ` Mcfadden, Marty Jay
  4 siblings, 1 reply; 29+ messages in thread
From: Ingo Molnar @ 2016-02-26  7:37 UTC (permalink / raw)
  To: Marty McFadden
  Cc: ak, andriy.shevchenko, bp, bp, brgerst, dan.j.williams, dyoung,
	hpa, linux, linux-kernel, luto, mingo, pavel, tglx, viro, x86,
	yu.c.chen, Peter Zijlstra, Arnaldo Carvalho de Melo, Jiri Olsa


* Marty McFadden <mcfadden8@llnl.gov> wrote:

> 
> This patch addresses the following two problems:
>   1. The current msr module grants all-or-nothing access to MSRs,
>      thus making user-level runtime performance adjustments 
>      problematic, particularly for power-constrained HPC systems.
> 
>   2. The current msr module requires a separate system call and the
>      acquisition of the preemption lock for each individual MSR access. 
>      This overhead degrades performance of runtime tools that would
>      ideally sample multiple MSRs at high frequencies.

No, we really don't want to touch the old MSR code - it's a very opaque API with 
various deep limitations.

What I'd like to see instead is to use a modern system monitoring interface - and 
in fact that already happened in the last kernel release, we added the perf MSR 
access methods via:

 commit b7b7c7821d932ba18ef6c8eafc8536066b4c2ef4
 Author: Andy Lutomirski <luto@kernel.org>
 Date:   Mon Jul 20 11:49:06 2015 -0400

    perf/x86: Add an MSR PMU driver
    
    This patch adds an MSR PMU to support free running MSR counters. Such
    as time and freq related counters includes TSC, IA32_APERF, IA32_MPERF
    and IA32_PPERF, but also SMI_COUNT.
    
    The events are exposed in sysfs for use by perf stat and other tools.
    The files are under /sys/devices/msr/events/

see arch/x86/cpu/perf/msr.c, or arch/x86/events/msr.c in the latest perf tree:

  git git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git perf/core

For example with the perf ABIs 'batch access' of a group of MSRs is easy: a group 
of events can be read or sampled at once. It can be done in a system-wide, per 
task or per task hierarchy fashion, with cgroup management as well - it's a modern 
API.

Right now the MSR PMU code is only at its first version, with only these few MSRs 
exposed:

enum perf_msr_id {
        PERF_MSR_TSC                    = 0,
        PERF_MSR_APERF                  = 1,
        PERF_MSR_MPERF                  = 2,
        PERF_MSR_PPERF                  = 3,
        PERF_MSR_SMI                    = 4,

        PERF_MSR_EVENT_MAX,
};

but that can (and should) be expanded and more features can be added.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 1/4] MSR: Prep for separating msr.c into three files
  2016-02-26  0:02 ` [PATCH 1/4] MSR: Prep for separating msr.c into three files Marty McFadden
  2016-02-26  2:56   ` kbuild test robot
@ 2016-02-26 10:08   ` Andy Shevchenko
  1 sibling, 0 replies; 29+ messages in thread
From: Andy Shevchenko @ 2016-02-26 10:08 UTC (permalink / raw)
  To: Marty McFadden
  Cc: ak, bp, bp, brgerst, dan.j.williams, dyoung, hpa, linux,
	linux-kernel, luto, mingo, pavel, tglx, viro, x86, yu.c.chen

On Thu, 2016-02-25 at 16:02 -0800, Marty McFadden wrote:
>  arch/x86/kernel/msr.c       |  284 -------------------------------
> ------------
>  arch/x86/kernel/msr_entry.c |  284
> +++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 284 insertions(+), 284 deletions(-)

Just hint to the future use of Git. Please, use -C -M whenever you move
file with or without slight changes).

-- 
Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Intel Finland Oy

^ permalink raw reply	[flat|nested] 29+ messages in thread

* RE: [PATCH 0/4] MSR: MSR: MSR Whitelist and Batch Introduction
  2016-02-26  7:37 ` [PATCH 0/4] MSR: MSR: MSR Whitelist and Batch Introduction Ingo Molnar
@ 2016-02-28 18:54   ` Mcfadden, Marty Jay
  2016-02-28 19:01     ` Borislav Petkov
  0 siblings, 1 reply; 29+ messages in thread
From: Mcfadden, Marty Jay @ 2016-02-28 18:54 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: ak, andriy.shevchenko, bp, bp, brgerst, dan.j.williams, dyoung,
	hpa, linux, linux-kernel, luto, mingo, pavel, tglx, viro, x86,
	yu.c.chen, Peter Zijlstra, Arnaldo Carvalho de Melo, Jiri Olsa

> * Ingo Molnar [mailto:mingo.kernel.org@gmail.com] wrote:
> 
> No, we really don't want to touch the old MSR code - it's a very opaque API with
> various deep limitations.
> 
> What I'd like to see instead is to use a modern system monitoring interface - and
> in fact that already happened in the last kernel release, we added the perf MSR
> access methods via:
> 
>  commit b7b7c7821d932ba18ef6c8eafc8536066b4c2ef4
>  Author: Andy Lutomirski <luto@kernel.org>
>  Date:   Mon Jul 20 11:49:06 2015 -0400
> 
>     perf/x86: Add an MSR PMU driver
> 
>     This patch adds an MSR PMU to support free running MSR counters. Such
>     as time and freq related counters includes TSC, IA32_APERF, IA32_MPERF
>     and IA32_PPERF, but also SMI_COUNT.
> 
>     The events are exposed in sysfs for use by perf stat and other tools.
>     The files are under /sys/devices/msr/events/
> 

Thank you Ingo, 

Our use case for MSR access is different.  In addition to being able to
access free running MSR counters, we also need to monitor (read) and
adjust (write) MSRs that may modify running system configurations.
One example set of MSRs that we need to be able to access are
associated with RAPL.

Further, system administrators need the ability to grant/deny MSR read
and/or write access at bit-level granularity for some of the MSRs in order
to maintain imposed security policies for their respective deployments.
The proposed whitelist approach allows for system administrators to set
a bit mask for the bits in each register where access is to be granted.

The cgroup management does not provide this level of granularity.
Instead it allows for an administrator to give a mode of access to either
all MSRs or none of them.

Thanks again,

Marty

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 0/4] MSR: MSR: MSR Whitelist and Batch Introduction
  2016-02-28 18:54   ` Mcfadden, Marty Jay
@ 2016-02-28 19:01     ` Borislav Petkov
  2016-02-29  2:55       ` Mcfadden, Marty Jay
  0 siblings, 1 reply; 29+ messages in thread
From: Borislav Petkov @ 2016-02-28 19:01 UTC (permalink / raw)
  To: Mcfadden, Marty Jay
  Cc: Ingo Molnar, ak, andriy.shevchenko, bp, brgerst, dan.j.williams,
	dyoung, hpa, linux, linux-kernel, luto, mingo, pavel, tglx, viro,
	x86, yu.c.chen, Peter Zijlstra, Arnaldo Carvalho de Melo,
	Jiri Olsa

On Sun, Feb 28, 2016 at 06:54:47PM +0000, Mcfadden, Marty Jay wrote:
> Further, system administrators need the ability to grant/deny MSR read
> and/or write access at bit-level granularity for some of the MSRs in order
> to maintain imposed security policies for their respective deployments.

Can we have some concrete examples for that please?

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* RE: [PATCH 0/4] MSR: MSR: MSR Whitelist and Batch Introduction
  2016-02-28 19:01     ` Borislav Petkov
@ 2016-02-29  2:55       ` Mcfadden, Marty Jay
  2016-02-29 14:58         ` Borislav Petkov
  2016-02-29 17:17         ` Andy Lutomirski
  0 siblings, 2 replies; 29+ messages in thread
From: Mcfadden, Marty Jay @ 2016-02-29  2:55 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Ingo Molnar, ak, andriy.shevchenko, bp, brgerst, dan.j.williams,
	dyoung, hpa, linux, linux-kernel, luto, mingo, pavel, tglx, viro,
	x86, yu.c.chen, Peter Zijlstra, Arnaldo Carvalho de Melo,
	Jiri Olsa

> On Sun, Feb 28, 2016, Borislav Petkov <bp@alien8.de> wrote: 
> 
> Can we have some concrete examples for that please?
> 

Our environment allows users to have exclusive access to some
number of compute nodes for a limited time.  Bit-level control of
MSRs is required when a user might gain root or, more commonly,
interfere with subsequent jobs run by other users.

The canonical examples for bitwise control are
MSR_PKG_POWER_LIMIT and MSR_DRAM_POWER_LIMIT.  We
want to provider user space control over power bounds, but if
the lock bit is set the power bound cannot be changed without
rebooting.  As setting very low power bounds can slow
performance by a factor of 4x or worse, leaving the lock bit
writable allows a crude denial-of-service attack.

A second use case for bitwise control is IA32_MISC_ENABLE.  This
MSR controls a wide variety of processor functionality, some of
which is benign ("Performance Energy Bias Hint") and some that
might not be ("Automatic Thermal Control Circuit Enable").  Rather
than do a formal security review of the dozen features controlled
by this MSR, we'd like to take the simpler step of allowing writes
to only what we know is safe.  Note that bit "Enhanced Intel
SpeedStep Technology Select Lock" is a lock bit.

Thanks,

Marty McFadden

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 0/4] MSR: MSR: MSR Whitelist and Batch Introduction
  2016-02-29  2:55       ` Mcfadden, Marty Jay
@ 2016-02-29 14:58         ` Borislav Petkov
  2016-02-29 16:31           ` Henrique de Moraes Holschuh
  2016-02-29 17:53           ` George Spelvin
  2016-02-29 17:17         ` Andy Lutomirski
  1 sibling, 2 replies; 29+ messages in thread
From: Borislav Petkov @ 2016-02-29 14:58 UTC (permalink / raw)
  To: Mcfadden, Marty Jay
  Cc: Ingo Molnar, ak, andriy.shevchenko, brgerst, dan.j.williams,
	dyoung, hpa, linux, linux-kernel, luto, mingo, pavel, tglx, viro,
	x86, yu.c.chen, Peter Zijlstra, Arnaldo Carvalho de Melo,
	Jiri Olsa

On Mon, Feb 29, 2016 at 02:55:43AM +0000, Mcfadden, Marty Jay wrote:
> Our environment allows users to have exclusive access to some
> number of compute nodes for a limited time.  Bit-level control of
> MSRs is required when a user might gain root or, more commonly,
> interfere with subsequent jobs run by other users.
> 
> The canonical examples for bitwise control are
> MSR_PKG_POWER_LIMIT and MSR_DRAM_POWER_LIMIT.  We
> want to provider user space control over power bounds, but if
> the lock bit is set the power bound cannot be changed without
> rebooting.  As setting very low power bounds can slow
> performance by a factor of 4x or worse, leaving the lock bit
> writable allows a crude denial-of-service attack.

And this all can't be done with the RAPL driver
drivers/powercap/intel_rapl.c?

> A second use case for bitwise control is IA32_MISC_ENABLE.  This
> MSR controls a wide variety of processor functionality, some of
> which is benign ("Performance Energy Bias Hint")

So SDM has IA32_ENERGY_PERF_BIAS (0x1b0) listed for that. I.e., the bits
[3:0] with which you can select between 0 - highest perf and 15 max
energy saving.

That's implemented in x86_energy_perf_policy(8).

And the control bit is in MSR_MISC_PWR_MGMT(0x1aa)[1].

> and some that might not be ("Automatic Thermal Control Circuit
> Enable").
>
> Rather than do a formal security review of the dozen features
> controlled by this MSR, we'd like to take the simpler step of allowing
> writes to only what we know is safe.

If people want to shoot themselves in the foot, I don't think you can
stop them.

Ok, so here's what I think:

What you want to achieve can be implemented - if it hasn't happened
already - in the already supplied drivers (rapl, x86_energy_perf_policy,
etc).

Then, instead of filtering access to MSRs, we should not allow any
non-OS controlled access to them. That means, not shipping msr.ko at
all. Why?

You simply don't want to allow uncontrolled access to MSRs from
userspace. Even to root. It is simply too dangerous to poke at them.

What should be done, instead, is implement all functionality you need in
the respective drivers with proper error and input sanity-checking done
by the OS. Also, OS has other agents poking at them so it should be the
arbiter controlling access and so on.

IMNSVHO.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 0/4] MSR: MSR: MSR Whitelist and Batch Introduction
  2016-02-29 14:58         ` Borislav Petkov
@ 2016-02-29 16:31           ` Henrique de Moraes Holschuh
  2016-02-29 17:22             ` Borislav Petkov
  2016-02-29 17:53           ` George Spelvin
  1 sibling, 1 reply; 29+ messages in thread
From: Henrique de Moraes Holschuh @ 2016-02-29 16:31 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Mcfadden, Marty Jay, Ingo Molnar, ak, andriy.shevchenko, brgerst,
	dan.j.williams, dyoung, hpa, linux, linux-kernel, luto, mingo,
	pavel, tglx, viro, x86, yu.c.chen, Peter Zijlstra,
	Arnaldo Carvalho de Melo, Jiri Olsa

On Mon, 29 Feb 2016, Borislav Petkov wrote:
> What you want to achieve can be implemented - if it hasn't happened
> already - in the already supplied drivers (rapl, x86_energy_perf_policy,
> etc).
> 
> Then, instead of filtering access to MSRs, we should not allow any
> non-OS controlled access to them. That means, not shipping msr.ko at
> all. Why?
> 
> You simply don't want to allow uncontrolled access to MSRs from
> userspace. Even to root. It is simply too dangerous to poke at them.

Fully agreed.  That said, it would be nice to have a patchset (intended
mostly for stable/LTS backporting) that restricts MSR access to a *static*
whitelist that allows only the minimal access required by the current crop
of general use utilities like powertop, turbostat, etc.

We do need to deprecate MSR.ko, but IMHO it would be worth it to take the
time to make MSR.ko safer anyway *and backport that to older kernels*
because that is going to increase security on a large number of systems that
can't just disable MSR.ko without losing functionality right now.

Some kconfig control for what is included in the static whitelist (with
appropriate defaults) could make the deprecation period mostly painless both
for general use distros (that *do* enable MSR.ko) and for HPC deployments...

IMHO it would be acceptable to have a more complicated scenario where there
is a static "master whitelist", which can be made more strict (i.e. remove
access rights) through a soft whitelist managed by userspace (with
appropriate capability checks), if there is a real need for that level of
control on something we want to deprecate.

-- 
  "One disk to rule them all, One disk to find them. One disk to bring
  them all and in the darkness grind them. In the Land of Redmond
  where the shadows lie." -- The Silicon Valley Tarot
  Henrique Holschuh

^ permalink raw reply	[flat|nested] 29+ messages in thread

* RE: [PATCH 0/4] MSR: MSR: MSR Whitelist and Batch Introduction
  2016-02-29  2:55       ` Mcfadden, Marty Jay
  2016-02-29 14:58         ` Borislav Petkov
@ 2016-02-29 17:17         ` Andy Lutomirski
  1 sibling, 0 replies; 29+ messages in thread
From: Andy Lutomirski @ 2016-02-29 17:17 UTC (permalink / raw)
  To: Marty McFadden
  Cc: Ingo Molnar, Brian Gerst, x86, Arnaldo Carvalho de Melo,
	Borislav Petkov, dan.j.williams, Ingo Molnar, dyoung, Jiri Olsa,
	yu.c.chen, ak, viro, tglx, hpa, pavel, andriy.shevchenko,
	Peter Zijlstra, linux, linux-kernel, bp

On Feb 28, 2016 6:55 PM, "Mcfadden, Marty Jay" <mcfadden8@llnl.gov> wrote:
>
> > On Sun, Feb 28, 2016, Borislav Petkov <bp@alien8.de> wrote:
> >
> > Can we have some concrete examples for that please?
> >
>
> Our environment allows users to have exclusive access to some
> number of compute nodes for a limited time.  Bit-level control of
> MSRs is required when a user might gain root or, more commonly,
> interfere with subsequent jobs run by other users.
>
> The canonical examples for bitwise control are
> MSR_PKG_POWER_LIMIT and MSR_DRAM_POWER_LIMIT.  We
> want to provider user space control over power bounds, but if
> the lock bit is set the power bound cannot be changed without
> rebooting.  As setting very low power bounds can slow
> performance by a factor of 4x or worse, leaving the lock bit
> writable allows a crude denial-of-service attack.
>
> A second use case for bitwise control is IA32_MISC_ENABLE.  This
> MSR controls a wide variety of processor functionality, some of
> which is benign ("Performance Energy Bias Hint") and some that
> might not be ("Automatic Thermal Control Circuit Enable").  Rather
> than do a formal security review of the dozen features controlled
> by this MSR, we'd like to take the simpler step of allowing writes
> to only what we know is safe.  Note that bit "Enhanced Intel
> SpeedStep Technology Select Lock" is a lock bit.

ISTM you should either write a kernel driver that exposes a real
interface for these controls or a userspace daemon that offers this as
a service.

>
> Thanks,
>
> Marty McFadden

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 0/4] MSR: MSR: MSR Whitelist and Batch Introduction
  2016-02-29 16:31           ` Henrique de Moraes Holschuh
@ 2016-02-29 17:22             ` Borislav Petkov
  0 siblings, 0 replies; 29+ messages in thread
From: Borislav Petkov @ 2016-02-29 17:22 UTC (permalink / raw)
  To: Henrique de Moraes Holschuh
  Cc: Mcfadden, Marty Jay, Ingo Molnar, ak, andriy.shevchenko, brgerst,
	dan.j.williams, dyoung, hpa, linux, linux-kernel, luto, mingo,
	pavel, tglx, viro, x86, yu.c.chen, Peter Zijlstra,
	Arnaldo Carvalho de Melo, Jiri Olsa

On Mon, Feb 29, 2016 at 01:31:44PM -0300, Henrique de Moraes Holschuh wrote:
> Fully agreed.  That said, it would be nice to have a patchset (intended
> mostly for stable/LTS backporting) that restricts MSR access to a *static*
> whitelist that allows only the minimal access required by the current crop
> of general use utilities like powertop, turbostat, etc.

As I said before, if people want to shoot themselves in the foot, we
can't stop them.

However, we should start converting those tools away from msr.ko and
towards proper OS interfaces instead. And not a short list it is (this
only in the kernel repo) ... :-\

$ git grep -lE "cpu/.*/msr" | uniq
Documentation/devices.txt
arch/x86/Kconfig
arch/x86/kernel/msr.c
tools/power/cpupower/debug/i386/centrino-decode.c
tools/power/cpupower/debug/i386/powernow-k8-decode.c
tools/power/cpupower/man/cpupower-monitor.1
tools/power/cpupower/utils/cpupower.c
tools/power/cpupower/utils/helpers/msr.c
tools/power/x86/turbostat/turbostat.8
tools/power/x86/turbostat/turbostat.c
tools/power/x86/x86_energy_perf_policy/x86_energy_perf_policy.8
tools/power/x86/x86_energy_perf_policy/x86_energy_perf_policy.c
tools/testing/selftests/intel_pstate/aperf.c
tools/testing/selftests/intel_pstate/msr.c

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 0/4] MSR: MSR: MSR Whitelist and Batch Introduction
  2016-02-29 14:58         ` Borislav Petkov
  2016-02-29 16:31           ` Henrique de Moraes Holschuh
@ 2016-02-29 17:53           ` George Spelvin
  2016-02-29 18:20             ` Borislav Petkov
  1 sibling, 1 reply; 29+ messages in thread
From: George Spelvin @ 2016-02-29 17:53 UTC (permalink / raw)
  To: bp, mcfadden8
  Cc: a.p.zijlstra, acme, ak, andriy.shevchenko, brgerst,
	dan.j.williams, dyoung, hpa, jolsa, linux-kernel, linux, luto,
	mingo, mingo, pavel, tglx, viro, x86, yu.c.chen

Borislav Petkov <bp@alien8.de> wrote:
> What should be done, instead, is implement all functionality you need in
> the respective drivers with proper error and input sanity-checking done
> by the OS. Also, OS has other agents poking at them so it should be the
> arbiter controlling access and so on.
> 
> IMNSVHO.

I worry that this is this too ambitious a goal.  Who is volunteering
to actually do this?

It takes quite a while to find a good OS-level abstraction (remember
wakelocks?), and MSRs are the CPU architect's equivalent of ioctls.
So they're a bit of a mess, and there will keep being new ones.

I agree with you about anything that's going to see widespread use, but
for specialized (apparently mostly HPC) use where the application really
is heavily optimized for specific CPU models, perhaps dangerous-but-simple
is good enough?

The proposed interface is simple and imposes very little maintenance
burden on the kernel.  My main objection is that it's yet another
special-case permission system.  Are we *sure* we'll never want to have
to classes of users with different access rights?

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 0/4] MSR: MSR: MSR Whitelist and Batch Introduction
  2016-02-29 17:53           ` George Spelvin
@ 2016-02-29 18:20             ` Borislav Petkov
  2016-02-29 22:35               ` Mcfadden, Marty Jay
  0 siblings, 1 reply; 29+ messages in thread
From: Borislav Petkov @ 2016-02-29 18:20 UTC (permalink / raw)
  To: George Spelvin
  Cc: mcfadden8, a.p.zijlstra, acme, ak, andriy.shevchenko, brgerst,
	dan.j.williams, dyoung, hpa, jolsa, linux-kernel, luto, mingo,
	mingo, pavel, tglx, viro, x86, yu.c.chen

On Mon, Feb 29, 2016 at 12:53:18PM -0500, George Spelvin wrote:
> I worry that this is this too ambitious a goal.  Who is volunteering
> to actually do this?

>From a quick look, the stuff in the examples was already in the rapl
driver.

> It takes quite a while to find a good OS-level abstraction (remember
> wakelocks?), and MSRs are the CPU architect's equivalent of ioctls.
> So they're a bit of a mess, and there will keep being new ones.

And yet you end up needing only a handful in most cases.

> I agree with you about anything that's going to see widespread use, but
> for specialized (apparently mostly HPC) use where the application really
> is heavily optimized for specific CPU models, perhaps dangerous-but-simple
> is good enough?

If it is that specialized, then it doesn't belong upstream.

> The proposed interface is simple and imposes very little maintenance
> burden on the kernel.  My main objection is that it's yet another
> special-case permission system.  Are we *sure* we'll never want to have
> to classes of users with different access rights?

The proposed interface is the wrong thing to do. There's no need to talk
about how simple and less of a burden it is.

The burden comes when people start complaining about strange issues and
we go and have to get a full MSR dump at the time the explosion happens
because some userspace tool went nuts and scribbled all over them. No
one wants to be on the receiving end of a bug report like this.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* RE: [PATCH 0/4] MSR: MSR: MSR Whitelist and Batch Introduction
  2016-02-29 18:20             ` Borislav Petkov
@ 2016-02-29 22:35               ` Mcfadden, Marty Jay
  2016-02-29 23:41                 ` Borislav Petkov
  2016-03-01  8:02                 ` Thomas Gleixner
  0 siblings, 2 replies; 29+ messages in thread
From: Mcfadden, Marty Jay @ 2016-02-29 22:35 UTC (permalink / raw)
  To: Borislav Petkov, George Spelvin
  Cc: a.p.zijlstra, acme, ak, andriy.shevchenko, brgerst,
	dan.j.williams, dyoung, hpa, jolsa, linux-kernel, luto, mingo,
	mingo, pavel, tglx, viro, x86, yu.c.chen

> From: Borislav Petkov [mailto:bp@alien8.de]
> Sent: Monday, February 29, 2016 10:21 AM
> To: George Spelvin <linux@horizon.com>
> 
> From a quick look, the stuff in the examples was already in the rapl driver.
>

The examples provided were to address why bit-level access granularity
was needed.  They were not intended to be, nor are they, a comprehensive list
of scenarios.
 
> If it is that specialized, then it doesn't belong upstream.
>

What is being proposed is to make msr.ko better with a whitelist which
does not allow access to any MSRs be default unless you are root or
are a user with the appropriate capability (which is how it works today).

System Administrators can then selectively add entries to the whitelist
to allow userspace applications access to specified bits of specific MSRs
of interest.  This can all be done on emerging hardware before new
updates or interfaces are added to the kernel.

Keeping msr.ko around and adding a whitelist mechanism will provide a
safe means for developers to experiment with MSR sets in their to
squeeze out performance and power and possibly inform future features
of modules like the rapl driver.  

Further, keeping msr.ko provides a means for software developers to work
with hardware architects on emerging hardware technologies where new
MSRs have been added or updated.

Removing the ability to access newly added or updated MSRs without having
new kernel code would negatively impact these types of development activities
being utilized today.

> The burden comes when people start complaining about strange issues and we
> go and have to get a full MSR dump at the time the explosion happens because
> some userspace tool went nuts and scribbled all over them. No one wants to be
> on the receiving end of a bug report like this.
> 

This is precisely why the whitelist approach is being proposed.  The current version
of msr.ko will gladly allow userspace tools with capabilities set to scribble all over
them.  With whitelists, system admins can turn off capabilities for the apps and
limit access to a very small subset of bits of a small subset of MSRs.

Marty

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 0/4] MSR: MSR: MSR Whitelist and Batch Introduction
  2016-02-29 22:35               ` Mcfadden, Marty Jay
@ 2016-02-29 23:41                 ` Borislav Petkov
  2016-03-01 19:01                   ` Rountree, Barry L.
  2016-03-01  8:02                 ` Thomas Gleixner
  1 sibling, 1 reply; 29+ messages in thread
From: Borislav Petkov @ 2016-02-29 23:41 UTC (permalink / raw)
  To: Mcfadden, Marty Jay
  Cc: George Spelvin, a.p.zijlstra, acme, ak, andriy.shevchenko,
	brgerst, dan.j.williams, dyoung, hpa, jolsa, linux-kernel, luto,
	mingo, mingo, pavel, tglx, viro, x86, yu.c.chen

On Mon, Feb 29, 2016 at 10:35:12PM +0000, Mcfadden, Marty Jay wrote:
> The examples provided were to address why bit-level access granularity
> was needed.  They were not intended to be, nor are they, a comprehensive list
> of scenarios.

So which scenarios cannot be satisfied by proper implementation of
userspace APIs (sysfs, procfs, etc) and direct access to MSRs is the
only way?

> What is being proposed is to make msr.ko better with a whitelist which
> does not allow access to any MSRs be default unless you are root or
> are a user with the appropriate capability (which is how it works
> today).

That doesn't protect you from mistakenly configuring the wrong mask, as
root. A well-defined userspace interface does.

Also, as I hinted at before, you have cases where other drivers
and modules access the same MSRs as msr.ko and for that you need
synchronization. Which would then need to be added. But we're going
to be stuck with this wrong interface already and we then would be
*extending* that wrong interface. No thanks.

> System Administrators can then selectively add entries to the whitelist
> to allow userspace applications access to specified bits of specific MSRs
> of interest.  This can all be done on emerging hardware before new
> updates or interfaces are added to the kernel.

Emerging hardware uses specialized hw bringup software - not the
upstream kernel. Or it is a heavily patched beyond recognition ancient
upstream kernel. And to that kernel people can do whatever they want.

> Keeping msr.ko around and adding a whitelist mechanism will provide a
> safe means for developers to experiment with MSR sets in their to
> squeeze out performance and power and possibly inform future features
> of modules like the rapl driver.

This can be done with msr.ko now too.

So no one is removing msr.ko. I'm saying extending it is the wrong
approach. msr.ko is a simple debugging module. Nothing more. Everything
else which is going to be user-visible and widely used, needs to be safe
and clean. Exposing MSRs directly is not. It might be useful for some
niche HPC use cases but is not nearly clean, safe and generic enough for
a userspace API which we will have to support forever.

And the fact that a bunch of tools are already using it is wrong. They
shouldn't. They should use proper, safe user APIs and not poke at MSRs
directly.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* RE: [PATCH 0/4] MSR: MSR: MSR Whitelist and Batch Introduction
  2016-02-29 22:35               ` Mcfadden, Marty Jay
  2016-02-29 23:41                 ` Borislav Petkov
@ 2016-03-01  8:02                 ` Thomas Gleixner
  2016-03-01 18:29                   ` Rountree, Barry L.
  1 sibling, 1 reply; 29+ messages in thread
From: Thomas Gleixner @ 2016-03-01  8:02 UTC (permalink / raw)
  To: Mcfadden, Marty Jay
  Cc: Borislav Petkov, George Spelvin, a.p.zijlstra, acme, ak,
	andriy.shevchenko, brgerst, dan.j.williams, dyoung, hpa, jolsa,
	linux-kernel, luto, mingo, mingo, pavel, viro, x86, yu.c.chen

On Mon, 29 Feb 2016, Mcfadden, Marty Jay wrote:
> This is precisely why the whitelist approach is being proposed.  The current
> version of msr.ko will gladly allow userspace tools with capabilities set to
> scribble all over them.  With whitelists, system admins can turn off
> capabilities for the apps and limit access to a very small subset of bits of
> a small subset of MSRs.

We don't need any of this, really.

Developers who play with experimental or emerging technologies hardly need
that whitelist stuff. They better know what they are doing.

For normal production use we want proper interfaces/drivers for the
technologies which should be made accessible to applications. msr.ko should
not be available on any production machine at all.

Your whitelist filter is just a sloppy hack to foster bad practices. Your
arguments about emerging technologies etc. are just trying to lull us into
accepting your wonderful hackery, so that you can continue to use MSRs in
production environments while pretending that you provide a reasonable amount
of security and sanity around it. Nice try, but it doesn't work...

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 0/4] MSR: MSR: MSR Whitelist and Batch Introduction
  2016-03-01  8:02                 ` Thomas Gleixner
@ 2016-03-01 18:29                   ` Rountree, Barry L.
  2016-03-01 18:38                     ` Borislav Petkov
  0 siblings, 1 reply; 29+ messages in thread
From: Rountree, Barry L. @ 2016-03-01 18:29 UTC (permalink / raw)
  To: linux-kernel
  Cc: Borislav Petkov, George Spelvin, a.p.zijlstra, acme, ak,
	andriy.shevchenko, brgerst, dan.j.williams, dyoung, hpa, jolsa,
	Mcfadden, Marty Jay, luto, mingo, mingo, pavel, viro, x86,
	yu.c.chen, Thomas Gleixner, Travis Gummels, Eastep, Jonathan M,
	len.brown, prarit



On 3/1/16, 12:02 AM, "Thomas Gleixner" <tglx@linutronix.de> wrote:


>
>Your whitelist filter is just a sloppy hack to foster bad practices. Your
>arguments about emerging technologies etc. are just trying to lull us into
>accepting your wonderful hackery, so that you can continue to use MSRs in
>production environments while pretending that you provide a reasonable
>amount
>of security and sanity around it. Nice try, but it doesn't work...

Here's how kernel development is done is secure facilities.

Any new code going into the kernel gets an extensive (and expensive)
security
review.  It has to be right the first time.  We don't have the luxury of
iterating back and forth several times between the testbed machines and
the 
production floor.

Because of this we keep as much functionality as possible outside of the
kernel.  Most of the time, all we really need the kernel to do is read and
write MSRs (quickly).

And because kernel-driver-per-MSR-group leads to lots of new kernel
drivers,
we like to refactor when we can.  Most new functionality can be captured
by having a single driver that just reads and writes the MSRs we need.

So now we have one kernel module to review instead of dozens.  When we
want to move a new processor feature to the production floor, the security
review is focused on what the MSR does.  The kernel code itself is
unchanged.
If the new MSR is deemed safe enough, the only thing that needs to change
on 
production machines is a new entry in the whitelist.  And if need be, that
whitelist can extend to individual bits.

You're absolutely right that developers don't need this functionality.
Users do.  msr-safe allows users to get access to benign MSRs that either
don't have kernel interfaces yet or whose kernel interfaces don't meet
their specialized needs.  And we can provide this access on production
machines without risking inadvertent or malicious scribbling on non-benign
MSRs.  And we can do this with one simple, straightforward kernel module.

If you have a better way of solving this problem, I'd love to hear it.

>
>Thanks,
>
>        tglx

Barry Rountree
Center for Applied Scientific Computing
Lawrence Livermore National Laboratory

>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 0/4] MSR: MSR: MSR Whitelist and Batch Introduction
  2016-03-01 18:29                   ` Rountree, Barry L.
@ 2016-03-01 18:38                     ` Borislav Petkov
  0 siblings, 0 replies; 29+ messages in thread
From: Borislav Petkov @ 2016-03-01 18:38 UTC (permalink / raw)
  To: Rountree, Barry L.
  Cc: linux-kernel, George Spelvin, a.p.zijlstra, acme, ak,
	andriy.shevchenko, brgerst, dan.j.williams, dyoung, hpa, jolsa,
	Mcfadden, Marty Jay, luto, mingo, mingo, pavel, viro, x86,
	yu.c.chen, Thomas Gleixner, Travis Gummels, Eastep, Jonathan M,
	len.brown, prarit

On Tue, Mar 01, 2016 at 06:29:59PM +0000, Rountree, Barry L. wrote:

> ... And we can provide this access on production machines without
> risking inadvertent or malicious scribbling on non-benign MSRs. And we
> can do this with one simple, straightforward kernel module.

So you can simply keep that module in your repo and apply it each time
before doing a security review. It is not meant for upstream use anyway.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 0/4] MSR: MSR: MSR Whitelist and Batch Introduction
  2016-02-29 23:41                 ` Borislav Petkov
@ 2016-03-01 19:01                   ` Rountree, Barry L.
  2016-03-03  0:40                     ` Andy Lutomirski
  0 siblings, 1 reply; 29+ messages in thread
From: Rountree, Barry L. @ 2016-03-01 19:01 UTC (permalink / raw)
  To: linux-kernel
  Cc: George Spelvin, a.p.zijlstra, acme, ak, andriy.shevchenko,
	brgerst, dan.j.williams, dyoung, hpa, jolsa, luto, Mcfadden,
	Marty Jay, mingo, mingo, pavel, tglx, viro, x86, yu.c.chen,
	Borislav Petkov, Eastep, Jonathan M, Prarit Bhargava, len.brown,
	Travis Gummels



On 2/29/16, 3:41 PM, "Borislav Petkov" <bp@alien8.de> wrote:

>On Mon, Feb 29, 2016 at 10:35:12PM +0000, Mcfadden, Marty Jay wrote:
>> The examples provided were to address why bit-level access granularity
>> was needed.  They were not intended to be, nor are they, a
>>comprehensive list
>> of scenarios.
>
>So which scenarios cannot be satisfied by proper implementation of
>userspace APIs (sysfs, procfs, etc) and direct access to MSRs is the
>only way?

All of them can *eventually* be implemented with userspace APIs.  But that
process can take years.  We had an early version of msr-safe allowing
whitelisted RAPL access in 2012.  I don't think a RAPL driver hit
linux mainline until 2015.

One way of getting more resources into "proper implementations" is by
showing how usefulness of the userspace + msr-safe implementation.

>
>> What is being proposed is to make msr.ko better with a whitelist which
>> does not allow access to any MSRs be default unless you are root or
>> are a user with the appropriate capability (which is how it works
>> today).
>
>That doesn't protect you from mistakenly configuring the wrong mask, as
>root. A well-defined userspace interface does.

No userspace interface will protect you from mistaken configurations as
root.  You can mitigate the mistakes by making the defaults sane (the
default here is no userspace access at all) and by making inadvertent
errors unlikely (it's unlikely that root will edit the whitelist by
mistake; if it happens it still needs to be echoed into the whilelist
file; any invalid line invalidates the entire whitelist and puts the
system back into no-userspace-access mode).


>
>Also, as I hinted at before, you have cases where other drivers
>and modules access the same MSRs as msr.ko and for that you need
>synchronization. Which would then need to be added. But we're going
>to be stuck with this wrong interface already and we then would be
>*extending* that wrong interface. No thanks.

Synchronization wasn't provided by msr.ko and that doesn't appear to
have caused any problems.  If synchronization is a problem, it's easy
enough to remove the offending MSR from the whitelist.

>
>> System Administrators can then selectively add entries to the whitelist
>> to allow userspace applications access to specified bits of specific
>>MSRs
>> of interest.  This can all be done on emerging hardware before new
>> updates or interfaces are added to the kernel.
>
>Emerging hardware uses specialized hw bringup software - not the
>upstream kernel. Or it is a heavily patched beyond recognition ancient
>upstream kernel. And to that kernel people can do whatever they want.

Absolutely.  The problem we're trying to solve is that msr.ko shouldn't
be used on production machines.  However, our users want the capabilities
provided by new MSRs well before they show up in mainline.  msr-safe is
the best solution we've come up with.  We do far less kernel patching
(which our security folks really appreciate), users can't inadvertently
trash the machine (which our sysadmins appreciate) and our users get
access to new processor functionality on production machines.

>
>> Keeping msr.ko around and adding a whitelist mechanism will provide a
>> safe means for developers to experiment with MSR sets in their to
>> squeeze out performance and power and possibly inform future features
>> of modules like the rapl driver.
>
>This can be done with msr.ko now too.

The security model for msr.ko is all-or-nothing.  That's not a sane choice
for production environments.

Barry Rountree
Center for Applied Scientific Computing
Lawrence Livermore National Laboratory

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 0/4] MSR: MSR: MSR Whitelist and Batch Introduction
  2016-03-01 19:01                   ` Rountree, Barry L.
@ 2016-03-03  0:40                     ` Andy Lutomirski
  0 siblings, 0 replies; 29+ messages in thread
From: Andy Lutomirski @ 2016-03-03  0:40 UTC (permalink / raw)
  To: Rountree, Barry L.
  Cc: Ingo Molnar, Brian Gerst, x86, Borislav Petkov, len.brown,
	dan.j.williams, George Spelvin, mingo, dyoung, acme, yu.c.chen,
	ak, Eastep, Jonathan M, viro, jolsa, tglx, hpa, pavel, Mcfadden,
	Marty Jay, andriy.shevchenko, a.p.zijlstra, Travis Gummels,
	linux-kernel, Prarit Bhargava

On Mar 1, 2016 11:01 AM, "Rountree, Barry L." <rountree4@llnl.gov> wrote:
>
>
>
> On 2/29/16, 3:41 PM, "Borislav Petkov" <bp@alien8.de> wrote:
>
> >On Mon, Feb 29, 2016 at 10:35:12PM +0000, Mcfadden, Marty Jay wrote:
> >> The examples provided were to address why bit-level access granularity
> >> was needed.  They were not intended to be, nor are they, a
> >>comprehensive list
> >> of scenarios.
> >
> >So which scenarios cannot be satisfied by proper implementation of
> >userspace APIs (sysfs, procfs, etc) and direct access to MSRs is the
> >only way?
>
> All of them can *eventually* be implemented with userspace APIs.  But that
> process can take years.  We had an early version of msr-safe allowing
> whitelisted RAPL access in 2012.  I don't think a RAPL driver hit
> linux mainline until 2015.
>
> One way of getting more resources into "proper implementations" is by
> showing how usefulness of the userspace + msr-safe implementation.

How about a userspace + msr.ko implementation with no kernel changes
whatsoever?  We're not talking about a lot of code here.

>
> >
> >> What is being proposed is to make msr.ko better with a whitelist which
> >> does not allow access to any MSRs be default unless you are root or
> >> are a user with the appropriate capability (which is how it works
> >> today).
> >
> >That doesn't protect you from mistakenly configuring the wrong mask, as
> >root. A well-defined userspace interface does.
>
> No userspace interface will protect you from mistaken configurations as
> root.  You can mitigate the mistakes by making the defaults sane (the
> default here is no userspace access at all) and by making inadvertent
> errors unlikely (it's unlikely that root will edit the whitelist by
> mistake; if it happens it still needs to be echoed into the whilelist
> file; any invalid line invalidates the entire whitelist and puts the
> system back into no-userspace-access mode).

Then configure the existing code correctly.

Step 1: Only root gets access to /dev/msr

Step 2: Your semi-privileged users get access to a user daemon or CUSE
device or whatever that implements your security policy

Step 3: Remember not to complain upstream if your MSR write breaks things.

Which reminds me: we should probably create a rather limited (maybe
even empty) list of MSRs that may be written via /dev/msr without
tainting the kernel.


> >
> >> System Administrators can then selectively add entries to the whitelist
> >> to allow userspace applications access to specified bits of specific
> >>MSRs
> >> of interest.  This can all be done on emerging hardware before new
> >> updates or interfaces are added to the kernel.
> >
> >Emerging hardware uses specialized hw bringup software - not the
> >upstream kernel. Or it is a heavily patched beyond recognition ancient
> >upstream kernel. And to that kernel people can do whatever they want.
>
> Absolutely.  The problem we're trying to solve is that msr.ko shouldn't
> be used on production machines.  However, our users want the capabilities
> provided by new MSRs well before they show up in mainline.  msr-safe is
> the best solution we've come up with.  We do far less kernel patching
> (which our security folks really appreciate), users can't inadvertently
> trash the machine (which our sysadmins appreciate) and our users get
> access to new processor functionality on production machines.

Why exactly can't msr.ko be used in production?  Remember, you're
already claiming that you believe that certain MSRs are okay to poke
behind the kernel's back.

--Andy

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 4/4] MSR: msr Batch processing feature
  2016-02-26  0:02 ` [PATCH 4/4] MSR: msr Batch processing feature Marty McFadden
  2016-02-26  2:48   ` Andy Lutomirski
@ 2016-03-03 17:21   ` One Thousand Gnomes
  2016-03-03 23:09     ` Mcfadden, Marty Jay
  1 sibling, 1 reply; 29+ messages in thread
From: One Thousand Gnomes @ 2016-03-03 17:21 UTC (permalink / raw)
  To: Marty McFadden
  Cc: ak, andriy.shevchenko, bp, bp, brgerst, dan.j.williams, dyoung,
	hpa, linux, linux-kernel, luto, mingo, pavel, tglx, viro, x86,
	yu.c.chen

> +	myinfo->rawio_allowed = capable(CAP_SYS_RAWIO);
> +	file->private_data = myinfo;

That strikes me as a very bad idea btw. If your opener was privileged and
leaks the handle via exec or anything else to another process that
process inherits the powers which means it can own the system trivially.

(Texbook example was the early 4BSD socket bug where any root created
socket could be used to reconfigure networking which meant rsh could be
used to do wonderful things. Then faithfully duplicated in Solaris 2.x
*after* BSD fixed it ;) )

> +static long msrbatch_ioctl(struct file *f, unsigned int ioc, unsigned long arg)
> +{
> +	int err = 0;
> +	struct msr_batch_array __user *uoa;
> +	struct msr_batch_op __user *uops;
> +	struct msr_batch_array koa;
> +	struct msrbatch_session_info *myinfo = f->private_data;
> +
> +	if (ioc != X86_IOC_MSR_BATCH) {
> +		pr_err("Invalid ioctl op %u\n", ioc);

So a user can fill your log because you have lots of pr_err() calls etc

> +	err = msrbatch_apply_whitelist(&koa, myinfo);

Two threads doing this at once will break if they issue overlapping
requests with and/or (plus whatever carnage if you clash with any other
kernel used MSR)

Alan

^ permalink raw reply	[flat|nested] 29+ messages in thread

* RE: [PATCH 4/4] MSR: msr Batch processing feature
  2016-03-03 17:21   ` One Thousand Gnomes
@ 2016-03-03 23:09     ` Mcfadden, Marty Jay
  0 siblings, 0 replies; 29+ messages in thread
From: Mcfadden, Marty Jay @ 2016-03-03 23:09 UTC (permalink / raw)
  To: One Thousand Gnomes
  Cc: ak, andriy.shevchenko, bp, bp, brgerst, dan.j.williams, dyoung,
	hpa, linux, linux-kernel, luto, mingo, pavel, tglx, viro, x86,
	yu.c.chen

> From: One Thousand Gnomes [mailto:gnomes@lxorguk.ukuu.org.uk]
> Sent: Thursday, March 03, 2016 9:21 AM
> 
> That strikes me as a very bad idea btw. If your opener was privileged and leaks
> the handle via exec or anything else to another process that process inherits the
> powers which means it can own the system trivially.
> 

Agreed, definitely a bad idea on my part.  I will revert these particular changes.

> 
> So a user can fill your log because you have lots of pr_err() calls etc

Thanks, I will scrub out the unnecessary pr_err() calls.

> > +	err = msrbatch_apply_whitelist(&koa, myinfo);
> 
> Two threads doing this at once will break if they issue overlapping
> requests with and/or (plus whatever carnage if you clash with 
> any other kernel used MSR)

Good point.  I'll add protection around the whitelist.

Thank you for your review and comments,

Marty
 

^ permalink raw reply	[flat|nested] 29+ messages in thread

end of thread, other threads:[~2016-03-03 23:09 UTC | newest]

Thread overview: 29+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-02-26  0:02 [PATCH 0/4] MSR: MSR: MSR Whitelist and Batch Introduction Marty McFadden
2016-02-26  0:02 ` [PATCH 1/4] MSR: Prep for separating msr.c into three files Marty McFadden
2016-02-26  2:56   ` kbuild test robot
2016-02-26 10:08   ` Andy Shevchenko
2016-02-26  0:02 ` [PATCH 2/4] " Marty McFadden
2016-02-26  0:02 ` [PATCH 3/4] MSR: msr Whitelist Implementation Marty McFadden
2016-02-26  1:05   ` [PATCH] MSR: fix badzero.cocci warnings kbuild test robot
2016-02-26  1:05   ` [PATCH 3/4] MSR: msr Whitelist Implementation kbuild test robot
2016-02-26  0:02 ` [PATCH 4/4] MSR: msr Batch processing feature Marty McFadden
2016-02-26  2:48   ` Andy Lutomirski
2016-03-03 17:21   ` One Thousand Gnomes
2016-03-03 23:09     ` Mcfadden, Marty Jay
2016-02-26  7:37 ` [PATCH 0/4] MSR: MSR: MSR Whitelist and Batch Introduction Ingo Molnar
2016-02-28 18:54   ` Mcfadden, Marty Jay
2016-02-28 19:01     ` Borislav Petkov
2016-02-29  2:55       ` Mcfadden, Marty Jay
2016-02-29 14:58         ` Borislav Petkov
2016-02-29 16:31           ` Henrique de Moraes Holschuh
2016-02-29 17:22             ` Borislav Petkov
2016-02-29 17:53           ` George Spelvin
2016-02-29 18:20             ` Borislav Petkov
2016-02-29 22:35               ` Mcfadden, Marty Jay
2016-02-29 23:41                 ` Borislav Petkov
2016-03-01 19:01                   ` Rountree, Barry L.
2016-03-03  0:40                     ` Andy Lutomirski
2016-03-01  8:02                 ` Thomas Gleixner
2016-03-01 18:29                   ` Rountree, Barry L.
2016-03-01 18:38                     ` Borislav Petkov
2016-02-29 17:17         ` Andy Lutomirski

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.