* [PATCH v2] /dev/mem: Revoke mappings when a driver claims the region
@ 2020-05-19 7:03 Dan Williams
2020-05-19 12:11 ` Greg KH
2020-05-19 18:46 ` Matthew Wilcox
0 siblings, 2 replies; 8+ messages in thread
From: Dan Williams @ 2020-05-19 7:03 UTC (permalink / raw)
To: gregkh
Cc: Arnd Bergmann, Ingo Molnar, Kees Cook, Russell King,
Andrew Morton, akpm, linux-kernel, linux-mm
Close the hole of holding a mapping over kernel driver takeover event of
a given address range.
Commit 90a545e98126 ("restrict /dev/mem to idle io memory ranges")
introduced CONFIG_IO_STRICT_DEVMEM with the goal of protecting the
kernel against scenarios where a /dev/mem user tramples memory that a
kernel driver owns. However, this protection only prevents *new* read(),
write() and mmap() requests. Established mappings prior to the driver
calling request_mem_region() are left alone.
Especially with persistent memory, and the core kernel metadata that is
stored there, there are plentiful scenarios for a /dev/mem user to
violate the expectations of the driver and cause amplified damage.
Teach request_mem_region() to find and shoot down active /dev/mem
mappings that it believes it has successfully claimed for the exclusive
use of the driver. Effectively a driver call to request_mem_region()
becomes a hole-punch on the /dev/mem device.
The typical usage of unmap_mapping_range() is part of
truncate_pagecache() to punch a hole in a file, but in this case the
implementation is only doing the "first half" of a hole punch. Namely it
is just evacuating current established mappings of the "hole", and it
relies on the fact that /dev/mem establishes mappings in terms of
absolute physical address offsets. Once existing mmap users are
invalidated they can attempt to re-establish the mapping, or attempt to
continue issuing read(2) / write(2) to the invalidated extent, but they
will then be subject to the CONFIG_IO_STRICT_DEVMEM checking that can
block those subsequent accesses.
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Fixes: 90a545e98126 ("restrict /dev/mem to idle io memory ranges")
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
Changes since v1 [1]:
- updated the changelog to describe the usage of unmap_mapping_range().
No other logic changes:
[1]: http://lore.kernel.org/r/158662721802.1893045.12301414116114602646.stgit@dwillia2-desk3.amr.corp.intel.com
Greg, Andrew,
I have a regression test for this case now. This was found by an
intermittent data corruption scenario on pmem from a test tool using
/dev/mem.
drivers/char/mem.c | 104 +++++++++++++++++++++++++++++++++++++++++++-
include/linux/ioport.h | 6 +++
include/uapi/linux/magic.h | 1
kernel/resource.c | 5 ++
4 files changed, 114 insertions(+), 2 deletions(-)
diff --git a/drivers/char/mem.c b/drivers/char/mem.c
index 43dd0891ca1e..3a3b7b37c5d2 100644
--- a/drivers/char/mem.c
+++ b/drivers/char/mem.c
@@ -31,11 +31,15 @@
#include <linux/uio.h>
#include <linux/uaccess.h>
#include <linux/security.h>
+#include <linux/pseudo_fs.h>
+#include <uapi/linux/magic.h>
+#include <linux/mount.h>
#ifdef CONFIG_IA64
# include <linux/efi.h>
#endif
+#define DEVMEM_MINOR 1
#define DEVPORT_MINOR 4
static inline unsigned long size_inside_page(unsigned long start,
@@ -805,12 +809,66 @@ static loff_t memory_lseek(struct file *file, loff_t offset, int orig)
return ret;
}
+static struct inode *devmem_inode;
+
+#ifdef CONFIG_IO_STRICT_DEVMEM
+void revoke_devmem(struct resource *res)
+{
+ struct inode *inode = READ_ONCE(devmem_inode);
+
+ /*
+ * Check that the initialization has completed. Losing the race
+ * is ok because it means drivers are claiming resources before
+ * the fs_initcall level of init and prevent /dev/mem from
+ * establishing mappings.
+ */
+ smp_rmb();
+ if (!inode)
+ return;
+
+ /*
+ * The expectation is that the driver has successfully marked
+ * the resource busy by this point, so devmem_is_allowed()
+ * should start returning false, however for performance this
+ * does not iterate the entire resource range.
+ */
+ if (devmem_is_allowed(PHYS_PFN(res->start)) &&
+ devmem_is_allowed(PHYS_PFN(res->end))) {
+ /*
+ * *cringe* iomem=relaxed says "go ahead, what's the
+ * worst that can happen?"
+ */
+ return;
+ }
+
+ unmap_mapping_range(inode->i_mapping, res->start, resource_size(res), 1);
+}
+#endif
+
static int open_port(struct inode *inode, struct file *filp)
{
+ int rc;
+
if (!capable(CAP_SYS_RAWIO))
return -EPERM;
- return security_locked_down(LOCKDOWN_DEV_MEM);
+ rc = security_locked_down(LOCKDOWN_DEV_MEM);
+ if (rc)
+ return rc;
+
+ if (iminor(inode) != DEVMEM_MINOR)
+ return 0;
+
+ /*
+ * Use a unified address space to have a single point to manage
+ * revocations when drivers want to take over a /dev/mem mapped
+ * range.
+ */
+ inode->i_mapping = devmem_inode->i_mapping;
+ inode->i_mapping->host = devmem_inode;
+ filp->f_mapping = inode->i_mapping;
+
+ return 0;
}
#define zero_lseek null_lseek
@@ -885,7 +943,7 @@ static const struct memdev {
fmode_t fmode;
} devlist[] = {
#ifdef CONFIG_DEVMEM
- [1] = { "mem", 0, &mem_fops, FMODE_UNSIGNED_OFFSET },
+ [DEVMEM_MINOR] = { "mem", 0, &mem_fops, FMODE_UNSIGNED_OFFSET },
#endif
#ifdef CONFIG_DEVKMEM
[2] = { "kmem", 0, &kmem_fops, FMODE_UNSIGNED_OFFSET },
@@ -939,6 +997,46 @@ static char *mem_devnode(struct device *dev, umode_t *mode)
static struct class *mem_class;
+static int devmem_fs_init_fs_context(struct fs_context *fc)
+{
+ return init_pseudo(fc, DEVMEM_MAGIC) ? 0 : -ENOMEM;
+}
+
+static struct file_system_type devmem_fs_type = {
+ .name = "devmem",
+ .owner = THIS_MODULE,
+ .init_fs_context = devmem_fs_init_fs_context,
+ .kill_sb = kill_anon_super,
+};
+
+static int devmem_init_inode(void)
+{
+ static struct vfsmount *devmem_vfs_mount;
+ static int devmem_fs_cnt;
+ struct inode *inode;
+ int rc;
+
+ rc = simple_pin_fs(&devmem_fs_type, &devmem_vfs_mount, &devmem_fs_cnt);
+ if (rc < 0) {
+ pr_err("Cannot mount /dev/mem pseudo filesystem: %d\n", rc);
+ return rc;
+ }
+
+ inode = alloc_anon_inode(devmem_vfs_mount->mnt_sb);
+ if (IS_ERR(inode)) {
+ rc = PTR_ERR(inode);
+ pr_err("Cannot allocate inode for /dev/mem: %d\n", rc);
+ simple_release_fs(&devmem_vfs_mount, &devmem_fs_cnt);
+ return rc;
+ }
+
+ /* publish /dev/mem initialized */
+ WRITE_ONCE(devmem_inode, inode);
+ smp_wmb();
+
+ return 0;
+}
+
static int __init chr_dev_init(void)
{
int minor;
@@ -960,6 +1058,8 @@ static int __init chr_dev_init(void)
*/
if ((minor == DEVPORT_MINOR) && !arch_has_dev_port())
continue;
+ if ((minor == DEVMEM_MINOR) && devmem_init_inode() != 0)
+ continue;
device_create(mem_class, NULL, MKDEV(MEM_MAJOR, minor),
NULL, devlist[minor].name);
diff --git a/include/linux/ioport.h b/include/linux/ioport.h
index a9b9170b5dd2..6c3eca90cbc4 100644
--- a/include/linux/ioport.h
+++ b/include/linux/ioport.h
@@ -301,5 +301,11 @@ struct resource *devm_request_free_mem_region(struct device *dev,
struct resource *request_free_mem_region(struct resource *base,
unsigned long size, const char *name);
+#ifdef CONFIG_IO_STRICT_DEVMEM
+void revoke_devmem(struct resource *res);
+#else
+static inline void revoke_devmem(struct resource *res) { };
+#endif
+
#endif /* __ASSEMBLY__ */
#endif /* _LINUX_IOPORT_H */
diff --git a/include/uapi/linux/magic.h b/include/uapi/linux/magic.h
index d78064007b17..f3956fc11de6 100644
--- a/include/uapi/linux/magic.h
+++ b/include/uapi/linux/magic.h
@@ -94,6 +94,7 @@
#define BALLOON_KVM_MAGIC 0x13661366
#define ZSMALLOC_MAGIC 0x58295829
#define DMA_BUF_MAGIC 0x444d4142 /* "DMAB" */
+#define DEVMEM_MAGIC 0x454d444d /* "DMEM" */
#define Z3FOLD_MAGIC 0x33
#define PPC_CMM_MAGIC 0xc7571590
diff --git a/kernel/resource.c b/kernel/resource.c
index 76036a41143b..841737bbda9e 100644
--- a/kernel/resource.c
+++ b/kernel/resource.c
@@ -1126,6 +1126,7 @@ struct resource * __request_region(struct resource *parent,
{
DECLARE_WAITQUEUE(wait, current);
struct resource *res = alloc_resource(GFP_KERNEL);
+ struct resource *orig_parent = parent;
if (!res)
return NULL;
@@ -1176,6 +1177,10 @@ struct resource * __request_region(struct resource *parent,
break;
}
write_unlock(&resource_lock);
+
+ if (res && orig_parent == &iomem_resource)
+ revoke_devmem(res);
+
return res;
}
EXPORT_SYMBOL(__request_region);
^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH v2] /dev/mem: Revoke mappings when a driver claims the region
2020-05-19 7:03 [PATCH v2] /dev/mem: Revoke mappings when a driver claims the region Dan Williams
@ 2020-05-19 12:11 ` Greg KH
2020-05-19 18:27 ` Dan Williams
2020-05-19 18:46 ` Matthew Wilcox
1 sibling, 1 reply; 8+ messages in thread
From: Greg KH @ 2020-05-19 12:11 UTC (permalink / raw)
To: Dan Williams
Cc: Arnd Bergmann, Ingo Molnar, Kees Cook, Russell King,
Andrew Morton, linux-kernel, linux-mm
On Tue, May 19, 2020 at 12:03:06AM -0700, Dan Williams wrote:
> Close the hole of holding a mapping over kernel driver takeover event of
> a given address range.
>
> Commit 90a545e98126 ("restrict /dev/mem to idle io memory ranges")
> introduced CONFIG_IO_STRICT_DEVMEM with the goal of protecting the
> kernel against scenarios where a /dev/mem user tramples memory that a
> kernel driver owns. However, this protection only prevents *new* read(),
> write() and mmap() requests. Established mappings prior to the driver
> calling request_mem_region() are left alone.
>
> Especially with persistent memory, and the core kernel metadata that is
> stored there, there are plentiful scenarios for a /dev/mem user to
> violate the expectations of the driver and cause amplified damage.
>
> Teach request_mem_region() to find and shoot down active /dev/mem
> mappings that it believes it has successfully claimed for the exclusive
> use of the driver. Effectively a driver call to request_mem_region()
> becomes a hole-punch on the /dev/mem device.
>
> The typical usage of unmap_mapping_range() is part of
> truncate_pagecache() to punch a hole in a file, but in this case the
> implementation is only doing the "first half" of a hole punch. Namely it
> is just evacuating current established mappings of the "hole", and it
> relies on the fact that /dev/mem establishes mappings in terms of
> absolute physical address offsets. Once existing mmap users are
> invalidated they can attempt to re-establish the mapping, or attempt to
> continue issuing read(2) / write(2) to the invalidated extent, but they
> will then be subject to the CONFIG_IO_STRICT_DEVMEM checking that can
> block those subsequent accesses.
>
> Cc: Arnd Bergmann <arnd@arndb.de>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Kees Cook <keescook@chromium.org>
> Cc: Russell King <linux@arm.linux.org.uk>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Fixes: 90a545e98126 ("restrict /dev/mem to idle io memory ranges")
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> ---
> Changes since v1 [1]:
>
> - updated the changelog to describe the usage of unmap_mapping_range().
> No other logic changes:
>
> [1]: http://lore.kernel.org/r/158662721802.1893045.12301414116114602646.stgit@dwillia2-desk3.amr.corp.intel.com
>
> Greg, Andrew,
>
> I have a regression test for this case now. This was found by an
> intermittent data corruption scenario on pmem from a test tool using
> /dev/mem.
Ick, why are test tools messing around in /dev/mem :)
Anyway, this seems sane to me, want me to take it through my tree?
thanks,
greg k-h
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH v2] /dev/mem: Revoke mappings when a driver claims the region
2020-05-19 12:11 ` Greg KH
@ 2020-05-19 18:27 ` Dan Williams
0 siblings, 0 replies; 8+ messages in thread
From: Dan Williams @ 2020-05-19 18:27 UTC (permalink / raw)
To: Greg KH
Cc: Arnd Bergmann, Ingo Molnar, Kees Cook, Russell King,
Andrew Morton, Linux Kernel Mailing List, Linux MM
On Tue, May 19, 2020 at 5:11 AM Greg KH <gregkh@linuxfoundation.org> wrote:
>
> On Tue, May 19, 2020 at 12:03:06AM -0700, Dan Williams wrote:
> > Close the hole of holding a mapping over kernel driver takeover event of
> > a given address range.
> >
> > Commit 90a545e98126 ("restrict /dev/mem to idle io memory ranges")
> > introduced CONFIG_IO_STRICT_DEVMEM with the goal of protecting the
> > kernel against scenarios where a /dev/mem user tramples memory that a
> > kernel driver owns. However, this protection only prevents *new* read(),
> > write() and mmap() requests. Established mappings prior to the driver
> > calling request_mem_region() are left alone.
> >
> > Especially with persistent memory, and the core kernel metadata that is
> > stored there, there are plentiful scenarios for a /dev/mem user to
> > violate the expectations of the driver and cause amplified damage.
> >
> > Teach request_mem_region() to find and shoot down active /dev/mem
> > mappings that it believes it has successfully claimed for the exclusive
> > use of the driver. Effectively a driver call to request_mem_region()
> > becomes a hole-punch on the /dev/mem device.
> >
> > The typical usage of unmap_mapping_range() is part of
> > truncate_pagecache() to punch a hole in a file, but in this case the
> > implementation is only doing the "first half" of a hole punch. Namely it
> > is just evacuating current established mappings of the "hole", and it
> > relies on the fact that /dev/mem establishes mappings in terms of
> > absolute physical address offsets. Once existing mmap users are
> > invalidated they can attempt to re-establish the mapping, or attempt to
> > continue issuing read(2) / write(2) to the invalidated extent, but they
> > will then be subject to the CONFIG_IO_STRICT_DEVMEM checking that can
> > block those subsequent accesses.
> >
> > Cc: Arnd Bergmann <arnd@arndb.de>
> > Cc: Ingo Molnar <mingo@redhat.com>
> > Cc: Kees Cook <keescook@chromium.org>
> > Cc: Russell King <linux@arm.linux.org.uk>
> > Cc: Andrew Morton <akpm@linux-foundation.org>
> > Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> > Fixes: 90a545e98126 ("restrict /dev/mem to idle io memory ranges")
> > Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> > ---
> > Changes since v1 [1]:
> >
> > - updated the changelog to describe the usage of unmap_mapping_range().
> > No other logic changes:
> >
> > [1]: http://lore.kernel.org/r/158662721802.1893045.12301414116114602646.stgit@dwillia2-desk3.amr.corp.intel.com
> >
> > Greg, Andrew,
> >
> > I have a regression test for this case now. This was found by an
> > intermittent data corruption scenario on pmem from a test tool using
> > /dev/mem.
>
> Ick, why are test tools messing around in /dev/mem :)
Yeah, I'm all for useful tools, just not at the expense of kernel integrity.
> Anyway, this seems sane to me, want me to take it through my tree?
Yes please, seems to belong with the driver core.
Thanks!
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH v2] /dev/mem: Revoke mappings when a driver claims the region
@ 2020-05-19 18:27 ` Dan Williams
0 siblings, 0 replies; 8+ messages in thread
From: Dan Williams @ 2020-05-19 18:27 UTC (permalink / raw)
To: Greg KH
Cc: Arnd Bergmann, Ingo Molnar, Kees Cook, Russell King,
Andrew Morton, Linux Kernel Mailing List, Linux MM
On Tue, May 19, 2020 at 5:11 AM Greg KH <gregkh@linuxfoundation.org> wrote:
>
> On Tue, May 19, 2020 at 12:03:06AM -0700, Dan Williams wrote:
> > Close the hole of holding a mapping over kernel driver takeover event of
> > a given address range.
> >
> > Commit 90a545e98126 ("restrict /dev/mem to idle io memory ranges")
> > introduced CONFIG_IO_STRICT_DEVMEM with the goal of protecting the
> > kernel against scenarios where a /dev/mem user tramples memory that a
> > kernel driver owns. However, this protection only prevents *new* read(),
> > write() and mmap() requests. Established mappings prior to the driver
> > calling request_mem_region() are left alone.
> >
> > Especially with persistent memory, and the core kernel metadata that is
> > stored there, there are plentiful scenarios for a /dev/mem user to
> > violate the expectations of the driver and cause amplified damage.
> >
> > Teach request_mem_region() to find and shoot down active /dev/mem
> > mappings that it believes it has successfully claimed for the exclusive
> > use of the driver. Effectively a driver call to request_mem_region()
> > becomes a hole-punch on the /dev/mem device.
> >
> > The typical usage of unmap_mapping_range() is part of
> > truncate_pagecache() to punch a hole in a file, but in this case the
> > implementation is only doing the "first half" of a hole punch. Namely it
> > is just evacuating current established mappings of the "hole", and it
> > relies on the fact that /dev/mem establishes mappings in terms of
> > absolute physical address offsets. Once existing mmap users are
> > invalidated they can attempt to re-establish the mapping, or attempt to
> > continue issuing read(2) / write(2) to the invalidated extent, but they
> > will then be subject to the CONFIG_IO_STRICT_DEVMEM checking that can
> > block those subsequent accesses.
> >
> > Cc: Arnd Bergmann <arnd@arndb.de>
> > Cc: Ingo Molnar <mingo@redhat.com>
> > Cc: Kees Cook <keescook@chromium.org>
> > Cc: Russell King <linux@arm.linux.org.uk>
> > Cc: Andrew Morton <akpm@linux-foundation.org>
> > Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> > Fixes: 90a545e98126 ("restrict /dev/mem to idle io memory ranges")
> > Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> > ---
> > Changes since v1 [1]:
> >
> > - updated the changelog to describe the usage of unmap_mapping_range().
> > No other logic changes:
> >
> > [1]: http://lore.kernel.org/r/158662721802.1893045.12301414116114602646.stgit@dwillia2-desk3.amr.corp.intel.com
> >
> > Greg, Andrew,
> >
> > I have a regression test for this case now. This was found by an
> > intermittent data corruption scenario on pmem from a test tool using
> > /dev/mem.
>
> Ick, why are test tools messing around in /dev/mem :)
Yeah, I'm all for useful tools, just not at the expense of kernel integrity.
> Anyway, this seems sane to me, want me to take it through my tree?
Yes please, seems to belong with the driver core.
Thanks!
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH v2] /dev/mem: Revoke mappings when a driver claims the region
2020-05-19 7:03 [PATCH v2] /dev/mem: Revoke mappings when a driver claims the region Dan Williams
2020-05-19 12:11 ` Greg KH
@ 2020-05-19 18:46 ` Matthew Wilcox
2020-05-19 19:36 ` Dan Williams
1 sibling, 1 reply; 8+ messages in thread
From: Matthew Wilcox @ 2020-05-19 18:46 UTC (permalink / raw)
To: Dan Williams
Cc: gregkh, Arnd Bergmann, Ingo Molnar, Kees Cook, Russell King,
Andrew Morton, linux-kernel, linux-mm
On Tue, May 19, 2020 at 12:03:06AM -0700, Dan Williams wrote:
> +void revoke_devmem(struct resource *res)
> +{
> + struct inode *inode = READ_ONCE(devmem_inode);
> +
> + /*
> + * Check that the initialization has completed. Losing the race
> + * is ok because it means drivers are claiming resources before
> + * the fs_initcall level of init and prevent /dev/mem from
> + * establishing mappings.
> + */
> + smp_rmb();
> + if (!inode)
> + return;
Which wmb() is this pairing with?
> +static int devmem_init_inode(void)
> +{
> + static struct vfsmount *devmem_vfs_mount;
> + static int devmem_fs_cnt;
> + struct inode *inode;
> + int rc;
> +
> + rc = simple_pin_fs(&devmem_fs_type, &devmem_vfs_mount, &devmem_fs_cnt);
> + if (rc < 0) {
> + pr_err("Cannot mount /dev/mem pseudo filesystem: %d\n", rc);
> + return rc;
> + }
> +
> + inode = alloc_anon_inode(devmem_vfs_mount->mnt_sb);
> + if (IS_ERR(inode)) {
> + rc = PTR_ERR(inode);
> + pr_err("Cannot allocate inode for /dev/mem: %d\n", rc);
> + simple_release_fs(&devmem_vfs_mount, &devmem_fs_cnt);
> + return rc;
> + }
> +
> + /* publish /dev/mem initialized */
> + WRITE_ONCE(devmem_inode, inode);
> + smp_wmb();
> +
> + return 0;
... is that this one? I don't see what it's guarding against. Surely if
it's needed to ensure that the writes to 'inode' have happened before
the write of the inode pointer, the smp_wmb() needs to be before the
WRITE_ONCE, not after it?
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH v2] /dev/mem: Revoke mappings when a driver claims the region
2020-05-19 18:46 ` Matthew Wilcox
@ 2020-05-19 19:36 ` Dan Williams
0 siblings, 0 replies; 8+ messages in thread
From: Dan Williams @ 2020-05-19 19:36 UTC (permalink / raw)
To: Matthew Wilcox
Cc: Greg KH, Arnd Bergmann, Ingo Molnar, Kees Cook, Russell King,
Andrew Morton, Linux Kernel Mailing List, Linux MM
On Tue, May 19, 2020 at 11:46 AM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Tue, May 19, 2020 at 12:03:06AM -0700, Dan Williams wrote:
> > +void revoke_devmem(struct resource *res)
> > +{
> > + struct inode *inode = READ_ONCE(devmem_inode);
> > +
> > + /*
> > + * Check that the initialization has completed. Losing the race
> > + * is ok because it means drivers are claiming resources before
> > + * the fs_initcall level of init and prevent /dev/mem from
> > + * establishing mappings.
> > + */
> > + smp_rmb();
> > + if (!inode)
> > + return;
>
> Which wmb() is this pairing with?
>
> > +static int devmem_init_inode(void)
> > +{
> > + static struct vfsmount *devmem_vfs_mount;
> > + static int devmem_fs_cnt;
> > + struct inode *inode;
> > + int rc;
> > +
> > + rc = simple_pin_fs(&devmem_fs_type, &devmem_vfs_mount, &devmem_fs_cnt);
> > + if (rc < 0) {
> > + pr_err("Cannot mount /dev/mem pseudo filesystem: %d\n", rc);
> > + return rc;
> > + }
> > +
> > + inode = alloc_anon_inode(devmem_vfs_mount->mnt_sb);
> > + if (IS_ERR(inode)) {
> > + rc = PTR_ERR(inode);
> > + pr_err("Cannot allocate inode for /dev/mem: %d\n", rc);
> > + simple_release_fs(&devmem_vfs_mount, &devmem_fs_cnt);
> > + return rc;
> > + }
> > +
> > + /* publish /dev/mem initialized */
> > + WRITE_ONCE(devmem_inode, inode);
> > + smp_wmb();
> > +
> > + return 0;
>
> ... is that this one? I don't see what it's guarding against. Surely if
> it's needed to ensure that the writes to 'inode' have happened before
> the write of the inode pointer, the smp_wmb() needs to be before the
> WRITE_ONCE, not after it?
Whoops, yes. Thanks for the catch.
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH v2] /dev/mem: Revoke mappings when a driver claims the region
@ 2020-05-19 19:36 ` Dan Williams
0 siblings, 0 replies; 8+ messages in thread
From: Dan Williams @ 2020-05-19 19:36 UTC (permalink / raw)
To: Matthew Wilcox
Cc: Greg KH, Arnd Bergmann, Ingo Molnar, Kees Cook, Russell King,
Andrew Morton, Linux Kernel Mailing List, Linux MM
On Tue, May 19, 2020 at 11:46 AM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Tue, May 19, 2020 at 12:03:06AM -0700, Dan Williams wrote:
> > +void revoke_devmem(struct resource *res)
> > +{
> > + struct inode *inode = READ_ONCE(devmem_inode);
> > +
> > + /*
> > + * Check that the initialization has completed. Losing the race
> > + * is ok because it means drivers are claiming resources before
> > + * the fs_initcall level of init and prevent /dev/mem from
> > + * establishing mappings.
> > + */
> > + smp_rmb();
> > + if (!inode)
> > + return;
>
> Which wmb() is this pairing with?
>
> > +static int devmem_init_inode(void)
> > +{
> > + static struct vfsmount *devmem_vfs_mount;
> > + static int devmem_fs_cnt;
> > + struct inode *inode;
> > + int rc;
> > +
> > + rc = simple_pin_fs(&devmem_fs_type, &devmem_vfs_mount, &devmem_fs_cnt);
> > + if (rc < 0) {
> > + pr_err("Cannot mount /dev/mem pseudo filesystem: %d\n", rc);
> > + return rc;
> > + }
> > +
> > + inode = alloc_anon_inode(devmem_vfs_mount->mnt_sb);
> > + if (IS_ERR(inode)) {
> > + rc = PTR_ERR(inode);
> > + pr_err("Cannot allocate inode for /dev/mem: %d\n", rc);
> > + simple_release_fs(&devmem_vfs_mount, &devmem_fs_cnt);
> > + return rc;
> > + }
> > +
> > + /* publish /dev/mem initialized */
> > + WRITE_ONCE(devmem_inode, inode);
> > + smp_wmb();
> > +
> > + return 0;
>
> ... is that this one? I don't see what it's guarding against. Surely if
> it's needed to ensure that the writes to 'inode' have happened before
> the write of the inode pointer, the smp_wmb() needs to be before the
> WRITE_ONCE, not after it?
Whoops, yes. Thanks for the catch.
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH v2] /dev/mem: Revoke mappings when a driver claims the region
2020-05-19 18:27 ` Dan Williams
(?)
@ 2020-05-20 5:44 ` Greg KH
-1 siblings, 0 replies; 8+ messages in thread
From: Greg KH @ 2020-05-20 5:44 UTC (permalink / raw)
To: Dan Williams
Cc: Arnd Bergmann, Ingo Molnar, Kees Cook, Russell King,
Andrew Morton, Linux Kernel Mailing List, Linux MM
On Tue, May 19, 2020 at 11:27:02AM -0700, Dan Williams wrote:
> On Tue, May 19, 2020 at 5:11 AM Greg KH <gregkh@linuxfoundation.org> wrote:
> >
> > On Tue, May 19, 2020 at 12:03:06AM -0700, Dan Williams wrote:
> > > Close the hole of holding a mapping over kernel driver takeover event of
> > > a given address range.
> > >
> > > Commit 90a545e98126 ("restrict /dev/mem to idle io memory ranges")
> > > introduced CONFIG_IO_STRICT_DEVMEM with the goal of protecting the
> > > kernel against scenarios where a /dev/mem user tramples memory that a
> > > kernel driver owns. However, this protection only prevents *new* read(),
> > > write() and mmap() requests. Established mappings prior to the driver
> > > calling request_mem_region() are left alone.
> > >
> > > Especially with persistent memory, and the core kernel metadata that is
> > > stored there, there are plentiful scenarios for a /dev/mem user to
> > > violate the expectations of the driver and cause amplified damage.
> > >
> > > Teach request_mem_region() to find and shoot down active /dev/mem
> > > mappings that it believes it has successfully claimed for the exclusive
> > > use of the driver. Effectively a driver call to request_mem_region()
> > > becomes a hole-punch on the /dev/mem device.
> > >
> > > The typical usage of unmap_mapping_range() is part of
> > > truncate_pagecache() to punch a hole in a file, but in this case the
> > > implementation is only doing the "first half" of a hole punch. Namely it
> > > is just evacuating current established mappings of the "hole", and it
> > > relies on the fact that /dev/mem establishes mappings in terms of
> > > absolute physical address offsets. Once existing mmap users are
> > > invalidated they can attempt to re-establish the mapping, or attempt to
> > > continue issuing read(2) / write(2) to the invalidated extent, but they
> > > will then be subject to the CONFIG_IO_STRICT_DEVMEM checking that can
> > > block those subsequent accesses.
> > >
> > > Cc: Arnd Bergmann <arnd@arndb.de>
> > > Cc: Ingo Molnar <mingo@redhat.com>
> > > Cc: Kees Cook <keescook@chromium.org>
> > > Cc: Russell King <linux@arm.linux.org.uk>
> > > Cc: Andrew Morton <akpm@linux-foundation.org>
> > > Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> > > Fixes: 90a545e98126 ("restrict /dev/mem to idle io memory ranges")
> > > Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> > > ---
> > > Changes since v1 [1]:
> > >
> > > - updated the changelog to describe the usage of unmap_mapping_range().
> > > No other logic changes:
> > >
> > > [1]: http://lore.kernel.org/r/158662721802.1893045.12301414116114602646.stgit@dwillia2-desk3.amr.corp.intel.com
> > >
> > > Greg, Andrew,
> > >
> > > I have a regression test for this case now. This was found by an
> > > intermittent data corruption scenario on pmem from a test tool using
> > > /dev/mem.
> >
> > Ick, why are test tools messing around in /dev/mem :)
>
> Yeah, I'm all for useful tools, just not at the expense of kernel integrity.
>
> > Anyway, this seems sane to me, want me to take it through my tree?
>
> Yes please, seems to belong with the driver core.
Ok, will wait for a v3 to handle the issue that was just found in
review.
thanks,
greg k-h
^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2020-05-20 5:44 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-05-19 7:03 [PATCH v2] /dev/mem: Revoke mappings when a driver claims the region Dan Williams
2020-05-19 12:11 ` Greg KH
2020-05-19 18:27 ` Dan Williams
2020-05-19 18:27 ` Dan Williams
2020-05-20 5:44 ` Greg KH
2020-05-19 18:46 ` Matthew Wilcox
2020-05-19 19:36 ` Dan Williams
2020-05-19 19:36 ` Dan Williams
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.