All of lore.kernel.org
 help / color / mirror / Atom feed
* CVE-2014-8159 kernel: infiniband: uverbs: unprotected physical memory access
@ 2015-03-18 17:39 Shachar Raindel
       [not found] ` <AM3PR05MB0935AABF569F15EA846B8E72DC000-LOZWmgKjnYgQouBfZGh8ttqRiQSDpxhJvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
  2015-04-02 10:04   ` Yann Droneaud
  0 siblings, 2 replies; 45+ messages in thread
From: Shachar Raindel @ 2015-03-18 17:39 UTC (permalink / raw)
  To: oss-security-ZwoEplunGu1jrUoiu81ncdBPR1lH4CV8,
	<linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>
	(linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org)
  Cc: stable-u79uwXL29TY76Z2rM5mHXA

Hi,

It was found that the Linux kernel's InfiniBand/RDMA subsystem did not properly sanitize input parameters while registering memory regions from user space via the (u)verbs API. A local user with access to a /dev/infiniband/uverbsX device could use this flaw to crash the system or, potentially, escalate their privileges on the system.

The issue has been assigned CVE-2014-8159.

The issue exists in the InfiniBand/RDMA/iWARP drivers since Linux Kernel version 2.6.13.

Mellanox OFED 2.4-1.0.4 fixes the issue. Available from:
http://www.mellanox.com/page/products_dyn?product_family=26&mtag=linux_sw_drivers 

RedHat errata: https://access.redhat.com/security/cve/CVE-2014-8159
Canonical errata: http://people.canonical.com/~ubuntu-security/cve/2014/CVE-2014-8159.html
Novell (Suse) bug tracking: https://bugzilla.novell.com/show_bug.cgi?id=914742


The following patch fixes the issue:

--------------- 8< ------------------------------

>From d4d68430d4a12c569e28b4f4468284ea22111186 Mon Sep 17 00:00:00 2001
From: Shachar Raindel <raindel-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Date: Sun, 04 Jan 2015 18:30:32 +0200
Subject: [PATCH] IB/core: Prevent integer overflow in ib_umem_get address arithmetic

Properly verify that the resulting page aligned end address is larger
than both the start address and the length of the memory area
requested.

Both the start and length arguments for ib_umem_get are controlled by
the user. A misbehaving user can provide values which will cause an
integer overflow when calculating the page aligned end address.

This overflow can cause also miscalculation of the number of pages
mapped, and additional logic issues.

Signed-off-by: Shachar Raindel <raindel-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Jack Morgenstein <jackm-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Or Gerlitz <ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---

diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
index aec7a6a..8c014b5 100644
--- a/drivers/infiniband/core/umem.c
+++ b/drivers/infiniband/core/umem.c
@@ -99,6 +99,14 @@
 	if (dmasync)
 		dma_set_attr(DMA_ATTR_WRITE_BARRIER, &attrs);
 
+	/*
+	 * If the combination of the addr and size requested for this memory
+	 * region causes an integer overflow, return error.
+	 */
+	if ((PAGE_ALIGN(addr + size) <= size) ||
+	    (PAGE_ALIGN(addr + size) <= addr))
+		return ERR_PTR(-EINVAL);
+
 	if (!can_do_mlock())
 		return ERR_PTR(-EPERM);

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* Re: CVE-2014-8159 kernel: infiniband: uverbs: unprotected physical memory access
       [not found] ` <AM3PR05MB0935AABF569F15EA846B8E72DC000-LOZWmgKjnYgQouBfZGh8ttqRiQSDpxhJvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
@ 2015-04-01 17:28   ` Roland Dreier
  2015-04-02  7:52     ` Shachar Raindel
  0 siblings, 1 reply; 45+ messages in thread
From: Roland Dreier @ 2015-04-01 17:28 UTC (permalink / raw)
  To: Shachar Raindel
  Cc: oss-security-ZwoEplunGu1jrUoiu81ncdBPR1lH4CV8,
	<linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>
	(linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org),
	stable-u79uwXL29TY76Z2rM5mHXA

On Wed, Mar 18, 2015 at 10:39 AM, Shachar Raindel <raindel-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> wrote:
> Date: Sun, 04 Jan 2015 18:30:32 +0200
> Subject: [PATCH] IB/core: Prevent integer overflow in ib_umem_get address arithmetic

Just so we're clear, this bug has been known since January 4, and it's
getting sent upstream now?

I assume we want it in 4.0 and -stable?
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 45+ messages in thread

* RE: CVE-2014-8159 kernel: infiniband: uverbs: unprotected physical memory access
  2015-04-01 17:28   ` Roland Dreier
@ 2015-04-02  7:52     ` Shachar Raindel
  2015-04-02 16:32       ` Roland Dreier
  0 siblings, 1 reply; 45+ messages in thread
From: Shachar Raindel @ 2015-04-02  7:52 UTC (permalink / raw)
  To: Roland Dreier
  Cc: oss-security,
	<linux-rdma@vger.kernel.org> (linux-rdma@vger.kernel.org),
	stable



> -----Original Message-----
> From: Roland Dreier [mailto:roland@purestorage.com]
> Sent: Wednesday, April 01, 2015 8:28 PM
> To: Shachar Raindel
> Cc: oss-security@lists.openwall.com; <linux-rdma@vger.kernel.org>
> (linux-rdma@vger.kernel.org); stable@vger.kernel.org
> Subject: Re: CVE-2014-8159 kernel: infiniband: uverbs: unprotected
> physical memory access
> 
> On Wed, Mar 18, 2015 at 10:39 AM, Shachar Raindel <raindel@mellanox.com>
> wrote:
> > Date: Sun, 04 Jan 2015 18:30:32 +0200
> > Subject: [PATCH] IB/core: Prevent integer overflow in ib_umem_get
> address arithmetic
> 
> Just so we're clear, this bug has been known since January 4, and it's
> getting sent upstream now?
> 

This bug has been known since January, yes. To reduce the window of
vulnerability, we privately reported it to major distributions first,
aligning everyone on the same release date. Such synchronization takes
time. Given that the bug was not, to the best of our knowledge,
exploited in the wild, we believed that 3 months of embargo is not
exuberated. This is a common practice in the security industry, called
"responsible disclosure."

Following the kernel  security bugs policy [1], we reported it to 
the kernel security contacts few days before making the issue public.
Few days after issue became public, we published a clear report to all
of the relevant mailing lists.

> I assume we want it in 4.0 and -stable?

Yes. Should be applied ASAP to all kernels that contain uverbs.
We put some effort to make sure the patch will apply cleanly on all
historical versions of the kernel, so backporting should be simple.


Thanks,
--Shachar

[1] https://www.kernel.org/doc/Documentation/SecurityBugs

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: CVE-2014-8159 kernel: infiniband: uverbs: unprotected physical memory access
  2015-03-18 17:39 CVE-2014-8159 kernel: infiniband: uverbs: unprotected physical memory access Shachar Raindel
@ 2015-04-02 10:04   ` Yann Droneaud
  2015-04-02 10:04   ` Yann Droneaud
  1 sibling, 0 replies; 45+ messages in thread
From: Yann Droneaud @ 2015-04-02 10:04 UTC (permalink / raw)
  To: Shachar Raindel
  Cc: oss-security,
	<linux-rdma@vger.kernel.org> (linux-rdma@vger.kernel.org),
	linux-kernel, stable

Hi,

Le mercredi 18 mars 2015 à 17:39 +0000, Shachar Raindel a écrit :
> Hi,
> 
> It was found that the Linux kernel's InfiniBand/RDMA subsystem did not
> properly sanitize input parameters while registering memory regions
> from user space via the (u)verbs API. A local user with access to
> a /dev/infiniband/uverbsX device could use this flaw to crash the
> system or, potentially, escalate their privileges on the system.
> 
> The issue has been assigned CVE-2014-8159.
> 
> The issue exists in the InfiniBand/RDMA/iWARP drivers since Linux
> Kernel version 2.6.13.
> 
> Mellanox OFED 2.4-1.0.4 fixes the issue. Available from:
> http://www.mellanox.com/page/products_dyn?product_family=26&mtag=linux_sw_drivers 
> 
> RedHat errata: https://access.redhat.com/security/cve/CVE-2014-8159
> Canonical errata: http://people.canonical.com/~ubuntu-security/cve/2014/CVE-2014-8159.html
> Novell (Suse) bug tracking: https://bugzilla.novell.com/show_bug.cgi?id=914742
> 
> 
> The following patch fixes the issue:
> 
> --------------- 8< ------------------------------
> 
> From d4d68430d4a12c569e28b4f4468284ea22111186 Mon Sep 17 00:00:00 2001
> From: Shachar Raindel <raindel@mellanox.com>
> Date: Sun, 04 Jan 2015 18:30:32 +0200
> Subject: [PATCH] IB/core: Prevent integer overflow in ib_umem_get address arithmetic
> 
> Properly verify that the resulting page aligned end address is larger
> than both the start address and the length of the memory area
> requested.
> 
> Both the start and length arguments for ib_umem_get are controlled by
> the user. A misbehaving user can provide values which will cause an
> integer overflow when calculating the page aligned end address.
> 
> This overflow can cause also miscalculation of the number of pages
> mapped, and additional logic issues.
> 
> Signed-off-by: Shachar Raindel <raindel@mellanox.com>
> Signed-off-by: Jack Morgenstein <jackm@mellanox.com>
> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
> ---
> 
> diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
> index aec7a6a..8c014b5 100644
> --- a/drivers/infiniband/core/umem.c
> +++ b/drivers/infiniband/core/umem.c
> @@ -99,6 +99,14 @@
>  	if (dmasync)
>  		dma_set_attr(DMA_ATTR_WRITE_BARRIER, &attrs);
>  
> +	/*
> +	 * If the combination of the addr and size requested for this memory
> +	 * region causes an integer overflow, return error.
> +	 */
> +	if ((PAGE_ALIGN(addr + size) <= size) ||
> +	    (PAGE_ALIGN(addr + size) <= addr))
> +		return ERR_PTR(-EINVAL);
> +

Can access_ok() be used here ?

         if (!access_ok(writable ? VERIFY_WRITE : VERIFY_READ,
                        addr, size))
                  return ERR_PTR(-EINVAL);


>  	if (!can_do_mlock())
>  		return ERR_PTR(-EPERM);
> 

Regards.

-- 
Yann Droneaud
OPTEYA

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: CVE-2014-8159 kernel: infiniband: uverbs: unprotected physical memory access
@ 2015-04-02 10:04   ` Yann Droneaud
  0 siblings, 0 replies; 45+ messages in thread
From: Yann Droneaud @ 2015-04-02 10:04 UTC (permalink / raw)
  To: Shachar Raindel
  Cc: oss-security,
	<linux-rdma@vger.kernel.org> (linux-rdma@vger.kernel.org),
	linux-kernel, stable

Hi,

Le mercredi 18 mars 2015 à 17:39 +0000, Shachar Raindel a écrit :
> Hi,
> 
> It was found that the Linux kernel's InfiniBand/RDMA subsystem did not
> properly sanitize input parameters while registering memory regions
> from user space via the (u)verbs API. A local user with access to
> a /dev/infiniband/uverbsX device could use this flaw to crash the
> system or, potentially, escalate their privileges on the system.
> 
> The issue has been assigned CVE-2014-8159.
> 
> The issue exists in the InfiniBand/RDMA/iWARP drivers since Linux
> Kernel version 2.6.13.
> 
> Mellanox OFED 2.4-1.0.4 fixes the issue. Available from:
> http://www.mellanox.com/page/products_dyn?product_family=26&mtag=linux_sw_drivers 
> 
> RedHat errata: https://access.redhat.com/security/cve/CVE-2014-8159
> Canonical errata: http://people.canonical.com/~ubuntu-security/cve/2014/CVE-2014-8159.html
> Novell (Suse) bug tracking: https://bugzilla.novell.com/show_bug.cgi?id=914742
> 
> 
> The following patch fixes the issue:
> 
> --------------- 8< ------------------------------
> 
> From d4d68430d4a12c569e28b4f4468284ea22111186 Mon Sep 17 00:00:00 2001
> From: Shachar Raindel <raindel@mellanox.com>
> Date: Sun, 04 Jan 2015 18:30:32 +0200
> Subject: [PATCH] IB/core: Prevent integer overflow in ib_umem_get address arithmetic
> 
> Properly verify that the resulting page aligned end address is larger
> than both the start address and the length of the memory area
> requested.
> 
> Both the start and length arguments for ib_umem_get are controlled by
> the user. A misbehaving user can provide values which will cause an
> integer overflow when calculating the page aligned end address.
> 
> This overflow can cause also miscalculation of the number of pages
> mapped, and additional logic issues.
> 
> Signed-off-by: Shachar Raindel <raindel@mellanox.com>
> Signed-off-by: Jack Morgenstein <jackm@mellanox.com>
> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
> ---
> 
> diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
> index aec7a6a..8c014b5 100644
> --- a/drivers/infiniband/core/umem.c
> +++ b/drivers/infiniband/core/umem.c
> @@ -99,6 +99,14 @@
>  	if (dmasync)
>  		dma_set_attr(DMA_ATTR_WRITE_BARRIER, &attrs);
>  
> +	/*
> +	 * If the combination of the addr and size requested for this memory
> +	 * region causes an integer overflow, return error.
> +	 */
> +	if ((PAGE_ALIGN(addr + size) <= size) ||
> +	    (PAGE_ALIGN(addr + size) <= addr))
> +		return ERR_PTR(-EINVAL);
> +

Can access_ok() be used here ?

         if (!access_ok(writable ? VERIFY_WRITE : VERIFY_READ,
                        addr, size))
                  return ERR_PTR(-EINVAL);


>  	if (!can_do_mlock())
>  		return ERR_PTR(-EPERM);
> 

Regards.

-- 
Yann Droneaud
OPTEYA



^ permalink raw reply	[flat|nested] 45+ messages in thread

* RE: CVE-2014-8159 kernel: infiniband: uverbs: unprotected physical memory access
  2015-04-02 10:04   ` Yann Droneaud
  (?)
@ 2015-04-02 10:52       ` Shachar Raindel
  -1 siblings, 0 replies; 45+ messages in thread
From: Shachar Raindel @ 2015-04-02 10:52 UTC (permalink / raw)
  To: Yann Droneaud
  Cc: oss-security-ZwoEplunGu1jrUoiu81ncdBPR1lH4CV8,
	<linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>
	(linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org),
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	stable-u79uwXL29TY76Z2rM5mHXA

Hi,

> -----Original Message-----
> From: Yann Droneaud [mailto:ydroneaud@opteya.com]
> Sent: Thursday, April 02, 2015 1:05 PM
> To: Shachar Raindel
> Cc: oss-security@lists.openwall.com; <linux-rdma@vger.kernel.org>
> (linux-rdma@vger.kernel.org); linux-kernel@vger.kernel.org;
> stable@vger.kernel.org
> Subject: Re: CVE-2014-8159 kernel: infiniband: uverbs: unprotected
> physical memory access
> 
> Hi,
> 
> Le mercredi 18 mars 2015 à 17:39 +0000, Shachar Raindel a écrit :
> > Hi,
> >

<snipped long e-mail>
 
> > +	/*
> > +	 * If the combination of the addr and size requested for this
> memory
> > +	 * region causes an integer overflow, return error.
> > +	 */
> > +	if ((PAGE_ALIGN(addr + size) <= size) ||
> > +	    (PAGE_ALIGN(addr + size) <= addr))
> > +		return ERR_PTR(-EINVAL);
> > +
> 
> Can access_ok() be used here ?
> 
>          if (!access_ok(writable ? VERIFY_WRITE : VERIFY_READ,
>                         addr, size))
>                   return ERR_PTR(-EINVAL);
> 

No, this will break the current ODP semantics.

ODP allows the user to register memory that is not accessible yet.
This is a critical design feature, as it allows avoiding holding
a registration cache. Adding this check will break the behavior,
forcing memory to be all accessible when registering an ODP MR.

Thanks,
--Shachar

^ permalink raw reply	[flat|nested] 45+ messages in thread

* RE: CVE-2014-8159 kernel: infiniband: uverbs: unprotected physical memory access
@ 2015-04-02 10:52       ` Shachar Raindel
  0 siblings, 0 replies; 45+ messages in thread
From: Shachar Raindel @ 2015-04-02 10:52 UTC (permalink / raw)
  To: Yann Droneaud
  Cc: oss-security,
	<linux-rdma@vger.kernel.org> (linux-rdma@vger.kernel.org),
	linux-kernel, stable

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset="utf-8", Size: 1481 bytes --]

Hi,

> -----Original Message-----
> From: Yann Droneaud [mailto:ydroneaud@opteya.com]
> Sent: Thursday, April 02, 2015 1:05 PM
> To: Shachar Raindel
> Cc: oss-security@lists.openwall.com; <linux-rdma@vger.kernel.org>
> (linux-rdma@vger.kernel.org); linux-kernel@vger.kernel.org;
> stable@vger.kernel.org
> Subject: Re: CVE-2014-8159 kernel: infiniband: uverbs: unprotected
> physical memory access
> 
> Hi,
> 
> Le mercredi 18 mars 2015 à 17:39 +0000, Shachar Raindel a écrit :
> > Hi,
> >

<snipped long e-mail>
 
> > +	/*
> > +	 * If the combination of the addr and size requested for this
> memory
> > +	 * region causes an integer overflow, return error.
> > +	 */
> > +	if ((PAGE_ALIGN(addr + size) <= size) ||
> > +	    (PAGE_ALIGN(addr + size) <= addr))
> > +		return ERR_PTR(-EINVAL);
> > +
> 
> Can access_ok() be used here ?
> 
>          if (!access_ok(writable ? VERIFY_WRITE : VERIFY_READ,
>                         addr, size))
>                   return ERR_PTR(-EINVAL);
> 

No, this will break the current ODP semantics.

ODP allows the user to register memory that is not accessible yet.
This is a critical design feature, as it allows avoiding holding
a registration cache. Adding this check will break the behavior,
forcing memory to be all accessible when registering an ODP MR.

Thanks,
--Shachar
ÿôèº{.nÇ+‰·Ÿ®‰­†+%ŠËÿ±éݶ\x17¥Šwÿº{.nÇ+‰·¥Š{±þG«éÿŠ{ayº\x1dʇڙë,j\a­¢f£¢·hšïêÿ‘êçz_è®\x03(­éšŽŠÝ¢j"ú\x1a¶^[m§ÿÿ¾\a«þG«éÿ¢¸?™¨è­Ú&£ø§~á¶iO•æ¬z·švØ^\x14\x04\x1a¶^[m§ÿÿÃ\fÿ¶ìÿ¢¸?–I¥

^ permalink raw reply	[flat|nested] 45+ messages in thread

* RE: CVE-2014-8159 kernel: infiniband: uverbs: unprotected physical memory access
@ 2015-04-02 10:52       ` Shachar Raindel
  0 siblings, 0 replies; 45+ messages in thread
From: Shachar Raindel @ 2015-04-02 10:52 UTC (permalink / raw)
  To: Yann Droneaud
  Cc: oss-security,
	<linux-rdma@vger.kernel.org> (linux-rdma@vger.kernel.org),
	linux-kernel, stable

Hi,

> -----Original Message-----
> From: Yann Droneaud [mailto:ydroneaud@opteya.com]
> Sent: Thursday, April 02, 2015 1:05 PM
> To: Shachar Raindel
> Cc: oss-security@lists.openwall.com; <linux-rdma@vger.kernel.org>
> (linux-rdma@vger.kernel.org); linux-kernel@vger.kernel.org;
> stable@vger.kernel.org
> Subject: Re: CVE-2014-8159 kernel: infiniband: uverbs: unprotected
> physical memory access
> 
> Hi,
> 
> Le mercredi 18 mars 2015 à 17:39 +0000, Shachar Raindel a écrit :
> > Hi,
> >

<snipped long e-mail>
 
> > +	/*
> > +	 * If the combination of the addr and size requested for this
> memory
> > +	 * region causes an integer overflow, return error.
> > +	 */
> > +	if ((PAGE_ALIGN(addr + size) <= size) ||
> > +	    (PAGE_ALIGN(addr + size) <= addr))
> > +		return ERR_PTR(-EINVAL);
> > +
> 
> Can access_ok() be used here ?
> 
>          if (!access_ok(writable ? VERIFY_WRITE : VERIFY_READ,
>                         addr, size))
>                   return ERR_PTR(-EINVAL);
> 

No, this will break the current ODP semantics.

ODP allows the user to register memory that is not accessible yet.
This is a critical design feature, as it allows avoiding holding
a registration cache. Adding this check will break the behavior,
forcing memory to be all accessible when registering an ODP MR.

Thanks,
--Shachar

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: CVE-2014-8159 kernel: infiniband: uverbs: unprotected physical memory access
  2015-04-02 10:52       ` Shachar Raindel
  (?)
@ 2015-04-02 13:30           ` Yann Droneaud
  -1 siblings, 0 replies; 45+ messages in thread
From: Yann Droneaud @ 2015-04-02 13:30 UTC (permalink / raw)
  To: Shachar Raindel, Haggai Eran, Sagi Grimberg
  Cc: oss-security-ZwoEplunGu1jrUoiu81ncdBPR1lH4CV8,
	<linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>
	(linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org),
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	stable-u79uwXL29TY76Z2rM5mHXA

Hi,

Le jeudi 02 avril 2015 à 10:52 +0000, Shachar Raindel a écrit :
> > -----Original Message-----
> > From: Yann Droneaud [mailto:ydroneaud-RlY5vtjFyJ3QT0dZR+AlfA@public.gmane.org]
> > Sent: Thursday, April 02, 2015 1:05 PM
> > Le mercredi 18 mars 2015 à 17:39 +0000, Shachar Raindel a écrit :

> > > +	/*
> > > +	 * If the combination of the addr and size requested for this
> > memory
> > > +	 * region causes an integer overflow, return error.
> > > +	 */
> > > +	if ((PAGE_ALIGN(addr + size) <= size) ||
> > > +	    (PAGE_ALIGN(addr + size) <= addr))
> > > +		return ERR_PTR(-EINVAL);
> > > +
> > 
> > Can access_ok() be used here ?
> > 
> >          if (!access_ok(writable ? VERIFY_WRITE : VERIFY_READ,
> >                         addr, size))
> >                   return ERR_PTR(-EINVAL);
> > 
> 
> No, this will break the current ODP semantics.
> 
> ODP allows the user to register memory that is not accessible yet.
> This is a critical design feature, as it allows avoiding holding
> a registration cache. Adding this check will break the behavior,
> forcing memory to be all accessible when registering an ODP MR.
> 

Where's the check for the range being in userspace memory space,
especially for the ODP case ?

For non ODP case (eg. plain old behavior), does get_user_pages()
ensure the requested pages fit in userspace region on all 
architectures ? I think so.

In ODP case, I'm not sure such check is ever done ?
(Aside, does it take special mesure to protect shared mapping from
being read and/or *written* ?)

Regards.

-- 
Yann Droneaud
OPTEYA


--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: CVE-2014-8159 kernel: infiniband: uverbs: unprotected physical memory access
@ 2015-04-02 13:30           ` Yann Droneaud
  0 siblings, 0 replies; 45+ messages in thread
From: Yann Droneaud @ 2015-04-02 13:30 UTC (permalink / raw)
  To: Shachar Raindel, Haggai Eran, Sagi Grimberg
  Cc: oss-security,
	<linux-rdma@vger.kernel.org> (linux-rdma@vger.kernel.org),
	linux-kernel, stable

Hi,

Le jeudi 02 avril 2015 à 10:52 +0000, Shachar Raindel a écrit :
> > -----Original Message-----
> > From: Yann Droneaud [mailto:ydroneaud@opteya.com]
> > Sent: Thursday, April 02, 2015 1:05 PM
> > Le mercredi 18 mars 2015 à 17:39 +0000, Shachar Raindel a écrit :

> > > +	/*
> > > +	 * If the combination of the addr and size requested for this
> > memory
> > > +	 * region causes an integer overflow, return error.
> > > +	 */
> > > +	if ((PAGE_ALIGN(addr + size) <= size) ||
> > > +	    (PAGE_ALIGN(addr + size) <= addr))
> > > +		return ERR_PTR(-EINVAL);
> > > +
> > 
> > Can access_ok() be used here ?
> > 
> >          if (!access_ok(writable ? VERIFY_WRITE : VERIFY_READ,
> >                         addr, size))
> >                   return ERR_PTR(-EINVAL);
> > 
> 
> No, this will break the current ODP semantics.
> 
> ODP allows the user to register memory that is not accessible yet.
> This is a critical design feature, as it allows avoiding holding
> a registration cache. Adding this check will break the behavior,
> forcing memory to be all accessible when registering an ODP MR.
> 

Where's the check for the range being in userspace memory space,
especially for the ODP case ?

For non ODP case (eg. plain old behavior), does get_user_pages()
ensure the requested pages fit in userspace region on all 
architectures ? I think so.

In ODP case, I'm not sure such check is ever done ?
(Aside, does it take special mesure to protect shared mapping from
being read and/or *written* ?)

Regards.

-- 
Yann Droneaud
OPTEYA



^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: CVE-2014-8159 kernel: infiniband: uverbs: unprotected physical memory access
@ 2015-04-02 13:30           ` Yann Droneaud
  0 siblings, 0 replies; 45+ messages in thread
From: Yann Droneaud @ 2015-04-02 13:30 UTC (permalink / raw)
  To: Shachar Raindel, Haggai Eran, Sagi Grimberg
  Cc: oss-security,
	<linux-rdma@vger.kernel.org> (linux-rdma@vger.kernel.org),
	linux-kernel, stable

Hi,

Le jeudi 02 avril 2015 à 10:52 +0000, Shachar Raindel a écrit :
> > -----Original Message-----
> > From: Yann Droneaud [mailto:ydroneaud@opteya.com]
> > Sent: Thursday, April 02, 2015 1:05 PM
> > Le mercredi 18 mars 2015 à 17:39 +0000, Shachar Raindel a écrit :

> > > +	/*
> > > +	 * If the combination of the addr and size requested for this
> > memory
> > > +	 * region causes an integer overflow, return error.
> > > +	 */
> > > +	if ((PAGE_ALIGN(addr + size) <= size) ||
> > > +	    (PAGE_ALIGN(addr + size) <= addr))
> > > +		return ERR_PTR(-EINVAL);
> > > +
> > 
> > Can access_ok() be used here ?
> > 
> >          if (!access_ok(writable ? VERIFY_WRITE : VERIFY_READ,
> >                         addr, size))
> >                   return ERR_PTR(-EINVAL);
> > 
> 
> No, this will break the current ODP semantics.
> 
> ODP allows the user to register memory that is not accessible yet.
> This is a critical design feature, as it allows avoiding holding
> a registration cache. Adding this check will break the behavior,
> forcing memory to be all accessible when registering an ODP MR.
> 

Where's the check for the range being in userspace memory space,
especially for the ODP case ?

For non ODP case (eg. plain old behavior), does get_user_pages()
ensure the requested pages fit in userspace region on all 
architectures ? I think so.

In ODP case, I'm not sure such check is ever done ?
(Aside, does it take special mesure to protect shared mapping from
being read and/or *written* ?)

Regards.

-- 
Yann Droneaud
OPTEYA



^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: CVE-2014-8159 kernel: infiniband: uverbs: unprotected physical memory access
  2015-04-02 10:52       ` Shachar Raindel
  (?)
@ 2015-04-02 15:15         ` Yann Droneaud
  -1 siblings, 0 replies; 45+ messages in thread
From: Yann Droneaud @ 2015-04-02 15:15 UTC (permalink / raw)
  To: Shachar Raindel
  Cc: oss-security,
	<linux-rdma@vger.kernel.org> (linux-rdma@vger.kernel.org),
	linux-kernel, stable

Hi,
Le jeudi 02 avril 2015 à 10:52 +0000, Shachar Raindel a écrit :
> > -----Original Message-----
> > From: Yann Droneaud [mailto:ydroneaud@opteya.com]
> > Sent: Thursday, April 02, 2015 1:05 PM
> > Le mercredi 18 mars 2015 à 17:39 +0000, Shachar Raindel a écrit :
...
> > > +	/*
> > > +	 * If the combination of the addr and size requested for this
> > memory
> > > +	 * region causes an integer overflow, return error.
> > > +	 */
> > > +	if ((PAGE_ALIGN(addr + size) <= size) ||
> > > +	    (PAGE_ALIGN(addr + size) <= addr))
> > > +		return ERR_PTR(-EINVAL);
> > > +
> > 
> > Can access_ok() be used here ?
> > 
> >          if (!access_ok(writable ? VERIFY_WRITE : VERIFY_READ,
> >                         addr, size))
> >                   return ERR_PTR(-EINVAL);
> > 
> 
> No, this will break the current ODP semantics.
> 
> ODP allows the user to register memory that is not accessible yet.
> This is a critical design feature, as it allows avoiding holding
> a registration cache. Adding this check will break the behavior,
> forcing memory to be all accessible when registering an ODP MR.
> 

Failed to notice previously, but since this would break ODP, and ODP is 
only available starting v3.19-rc1, my proposed fix might be applicable 
for older kernel (if not better).

From 2a3beaeb4b35d968f127cb59cfda2f12dbd87e4b Mon Sep 17 00:00:00 2001
From: Yann Droneaud <ydroneaud@opteya.com>
Date: Thu, 2 Apr 2015 17:01:05 +0200
Subject: [RFC PATCH] IB/core: reject invalid userspace memory range registration

Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
---
 drivers/infiniband/core/umem.c | 23 +++++++++++++++--------
 1 file changed, 15 insertions(+), 8 deletions(-)

diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
index df0c4f605a21..6758b4ac56eb 100644
--- a/drivers/infiniband/core/umem.c
+++ b/drivers/infiniband/core/umem.c
@@ -90,6 +90,7 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr,
 	DEFINE_DMA_ATTRS(attrs);
 	struct scatterlist *sg, *sg_list_start;
 	int need_release = 0;
+	bool writable;
 
 	if (dmasync)
 		dma_set_attr(DMA_ATTR_WRITE_BARRIER, &attrs);
@@ -97,6 +98,19 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr,
 	if (!can_do_mlock())
 		return ERR_PTR(-EPERM);
 
+	/*
+	 * We ask for writable memory if any access flags other than
+	 * "remote read" are set.  "Local write" and "remote write"
+	 * obviously require write access.  "Remote atomic" can do
+	 * things like fetch and add, which will modify memory, and
+	 * "MW bind" can change permissions by binding a window.
+	 */
+	writable  = !!(access & ~IB_ACCESS_REMOTE_READ);
+
+	if (!access_ok(writable ? VERIFY_WRITE : VERIFY_READ,
+		       (void __user *)addr, size))
+		return ERR_PTR(-EFAULT);
+
 	umem = kzalloc(sizeof *umem, GFP_KERNEL);
 	if (!umem)
 		return ERR_PTR(-ENOMEM);
@@ -106,14 +120,7 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr,
 	umem->offset    = addr & ~PAGE_MASK;
 	umem->page_size = PAGE_SIZE;
 	umem->pid       = get_task_pid(current, PIDTYPE_PID);
-	/*
-	 * We ask for writable memory if any access flags other than
-	 * "remote read" are set.  "Local write" and "remote write"
-	 * obviously require write access.  "Remote atomic" can do
-	 * things like fetch and add, which will modify memory, and
-	 * "MW bind" can change permissions by binding a window.
-	 */
-	umem->writable  = !!(access & ~IB_ACCESS_REMOTE_READ);
+	umem->writable  = writable;
 
 	/* We assume the memory is from hugetlb until proved otherwise */
 	umem->hugetlb   = 1;
-- 
2.1.0

Regards.

-- 
Yann Droneaud
OPTEYA

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* Re: CVE-2014-8159 kernel: infiniband: uverbs: unprotected physical memory access
@ 2015-04-02 15:15         ` Yann Droneaud
  0 siblings, 0 replies; 45+ messages in thread
From: Yann Droneaud @ 2015-04-02 15:15 UTC (permalink / raw)
  To: Shachar Raindel
  Cc: oss-security,
	<linux-rdma@vger.kernel.org> (linux-rdma@vger.kernel.org),
	linux-kernel, stable

Hi,
Le jeudi 02 avril 2015 à 10:52 +0000, Shachar Raindel a écrit :
> > -----Original Message-----
> > From: Yann Droneaud [mailto:ydroneaud@opteya.com]
> > Sent: Thursday, April 02, 2015 1:05 PM
> > Le mercredi 18 mars 2015 à 17:39 +0000, Shachar Raindel a écrit :
...
> > > +	/*
> > > +	 * If the combination of the addr and size requested for this
> > memory
> > > +	 * region causes an integer overflow, return error.
> > > +	 */
> > > +	if ((PAGE_ALIGN(addr + size) <= size) ||
> > > +	    (PAGE_ALIGN(addr + size) <= addr))
> > > +		return ERR_PTR(-EINVAL);
> > > +
> > 
> > Can access_ok() be used here ?
> > 
> >          if (!access_ok(writable ? VERIFY_WRITE : VERIFY_READ,
> >                         addr, size))
> >                   return ERR_PTR(-EINVAL);
> > 
> 
> No, this will break the current ODP semantics.
> 
> ODP allows the user to register memory that is not accessible yet.
> This is a critical design feature, as it allows avoiding holding
> a registration cache. Adding this check will break the behavior,
> forcing memory to be all accessible when registering an ODP MR.
> 

Failed to notice previously, but since this would break ODP, and ODP is 
only available starting v3.19-rc1, my proposed fix might be applicable 
for older kernel (if not better).

>From 2a3beaeb4b35d968f127cb59cfda2f12dbd87e4b Mon Sep 17 00:00:00 2001
From: Yann Droneaud <ydroneaud@opteya.com>
Date: Thu, 2 Apr 2015 17:01:05 +0200
Subject: [RFC PATCH] IB/core: reject invalid userspace memory range registration

Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
---
 drivers/infiniband/core/umem.c | 23 +++++++++++++++--------
 1 file changed, 15 insertions(+), 8 deletions(-)

diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
index df0c4f605a21..6758b4ac56eb 100644
--- a/drivers/infiniband/core/umem.c
+++ b/drivers/infiniband/core/umem.c
@@ -90,6 +90,7 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr,
 	DEFINE_DMA_ATTRS(attrs);
 	struct scatterlist *sg, *sg_list_start;
 	int need_release = 0;
+	bool writable;
 
 	if (dmasync)
 		dma_set_attr(DMA_ATTR_WRITE_BARRIER, &attrs);
@@ -97,6 +98,19 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr,
 	if (!can_do_mlock())
 		return ERR_PTR(-EPERM);
 
+	/*
+	 * We ask for writable memory if any access flags other than
+	 * "remote read" are set.  "Local write" and "remote write"
+	 * obviously require write access.  "Remote atomic" can do
+	 * things like fetch and add, which will modify memory, and
+	 * "MW bind" can change permissions by binding a window.
+	 */
+	writable  = !!(access & ~IB_ACCESS_REMOTE_READ);
+
+	if (!access_ok(writable ? VERIFY_WRITE : VERIFY_READ,
+		       (void __user *)addr, size))
+		return ERR_PTR(-EFAULT);
+
 	umem = kzalloc(sizeof *umem, GFP_KERNEL);
 	if (!umem)
 		return ERR_PTR(-ENOMEM);
@@ -106,14 +120,7 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr,
 	umem->offset    = addr & ~PAGE_MASK;
 	umem->page_size = PAGE_SIZE;
 	umem->pid       = get_task_pid(current, PIDTYPE_PID);
-	/*
-	 * We ask for writable memory if any access flags other than
-	 * "remote read" are set.  "Local write" and "remote write"
-	 * obviously require write access.  "Remote atomic" can do
-	 * things like fetch and add, which will modify memory, and
-	 * "MW bind" can change permissions by binding a window.
-	 */
-	umem->writable  = !!(access & ~IB_ACCESS_REMOTE_READ);
+	umem->writable  = writable;
 
 	/* We assume the memory is from hugetlb until proved otherwise */
 	umem->hugetlb   = 1;
-- 
2.1.0

Regards.

-- 
Yann Droneaud
OPTEYA



^ permalink raw reply related	[flat|nested] 45+ messages in thread

* Re: CVE-2014-8159 kernel: infiniband: uverbs: unprotected physical memory access
@ 2015-04-02 15:15         ` Yann Droneaud
  0 siblings, 0 replies; 45+ messages in thread
From: Yann Droneaud @ 2015-04-02 15:15 UTC (permalink / raw)
  To: Shachar Raindel
  Cc: oss-security,
	<linux-rdma@vger.kernel.org> (linux-rdma@vger.kernel.org),
	linux-kernel, stable

Hi,
Le jeudi 02 avril 2015 à 10:52 +0000, Shachar Raindel a écrit :
> > -----Original Message-----
> > From: Yann Droneaud [mailto:ydroneaud@opteya.com]
> > Sent: Thursday, April 02, 2015 1:05 PM
> > Le mercredi 18 mars 2015 à 17:39 +0000, Shachar Raindel a écrit :
...
> > > +	/*
> > > +	 * If the combination of the addr and size requested for this
> > memory
> > > +	 * region causes an integer overflow, return error.
> > > +	 */
> > > +	if ((PAGE_ALIGN(addr + size) <= size) ||
> > > +	    (PAGE_ALIGN(addr + size) <= addr))
> > > +		return ERR_PTR(-EINVAL);
> > > +
> > 
> > Can access_ok() be used here ?
> > 
> >          if (!access_ok(writable ? VERIFY_WRITE : VERIFY_READ,
> >                         addr, size))
> >                   return ERR_PTR(-EINVAL);
> > 
> 
> No, this will break the current ODP semantics.
> 
> ODP allows the user to register memory that is not accessible yet.
> This is a critical design feature, as it allows avoiding holding
> a registration cache. Adding this check will break the behavior,
> forcing memory to be all accessible when registering an ODP MR.
> 

Failed to notice previously, but since this would break ODP, and ODP is 
only available starting v3.19-rc1, my proposed fix might be applicable 
for older kernel (if not better).

>From 2a3beaeb4b35d968f127cb59cfda2f12dbd87e4b Mon Sep 17 00:00:00 2001
From: Yann Droneaud <ydroneaud@opteya.com>
Date: Thu, 2 Apr 2015 17:01:05 +0200
Subject: [RFC PATCH] IB/core: reject invalid userspace memory range registration

Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
---
 drivers/infiniband/core/umem.c | 23 +++++++++++++++--------
 1 file changed, 15 insertions(+), 8 deletions(-)

diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
index df0c4f605a21..6758b4ac56eb 100644
--- a/drivers/infiniband/core/umem.c
+++ b/drivers/infiniband/core/umem.c
@@ -90,6 +90,7 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr,
 	DEFINE_DMA_ATTRS(attrs);
 	struct scatterlist *sg, *sg_list_start;
 	int need_release = 0;
+	bool writable;
 
 	if (dmasync)
 		dma_set_attr(DMA_ATTR_WRITE_BARRIER, &attrs);
@@ -97,6 +98,19 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr,
 	if (!can_do_mlock())
 		return ERR_PTR(-EPERM);
 
+	/*
+	 * We ask for writable memory if any access flags other than
+	 * "remote read" are set.  "Local write" and "remote write"
+	 * obviously require write access.  "Remote atomic" can do
+	 * things like fetch and add, which will modify memory, and
+	 * "MW bind" can change permissions by binding a window.
+	 */
+	writable  = !!(access & ~IB_ACCESS_REMOTE_READ);
+
+	if (!access_ok(writable ? VERIFY_WRITE : VERIFY_READ,
+		       (void __user *)addr, size))
+		return ERR_PTR(-EFAULT);
+
 	umem = kzalloc(sizeof *umem, GFP_KERNEL);
 	if (!umem)
 		return ERR_PTR(-ENOMEM);
@@ -106,14 +120,7 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr,
 	umem->offset    = addr & ~PAGE_MASK;
 	umem->page_size = PAGE_SIZE;
 	umem->pid       = get_task_pid(current, PIDTYPE_PID);
-	/*
-	 * We ask for writable memory if any access flags other than
-	 * "remote read" are set.  "Local write" and "remote write"
-	 * obviously require write access.  "Remote atomic" can do
-	 * things like fetch and add, which will modify memory, and
-	 * "MW bind" can change permissions by binding a window.
-	 */
-	umem->writable  = !!(access & ~IB_ACCESS_REMOTE_READ);
+	umem->writable  = writable;
 
 	/* We assume the memory is from hugetlb until proved otherwise */
 	umem->hugetlb   = 1;
-- 
2.1.0

Regards.

-- 
Yann Droneaud
OPTEYA



^ permalink raw reply related	[flat|nested] 45+ messages in thread

* Re: CVE-2014-8159 kernel: infiniband: uverbs: unprotected physical memory access
  2015-04-02 13:30           ` Yann Droneaud
@ 2015-04-02 15:18             ` Haggai Eran
  -1 siblings, 0 replies; 45+ messages in thread
From: Haggai Eran @ 2015-04-02 15:18 UTC (permalink / raw)
  To: Yann Droneaud, Shachar Raindel, Sagi Grimberg
  Cc: oss-security,
	<linux-rdma@vger.kernel.org> (linux-rdma@vger.kernel.org),
	linux-kernel, stable

On 02/04/2015 16:30, Yann Droneaud wrote:
> Hi,
> 
> Le jeudi 02 avril 2015 à 10:52 +0000, Shachar Raindel a écrit :
>>> -----Original Message-----
>>> From: Yann Droneaud [mailto:ydroneaud@opteya.com]
>>> Sent: Thursday, April 02, 2015 1:05 PM
>>> Le mercredi 18 mars 2015 à 17:39 +0000, Shachar Raindel a écrit :
> 
>>>> +	/*
>>>> +	 * If the combination of the addr and size requested for this
>>> memory
>>>> +	 * region causes an integer overflow, return error.
>>>> +	 */
>>>> +	if ((PAGE_ALIGN(addr + size) <= size) ||
>>>> +	    (PAGE_ALIGN(addr + size) <= addr))
>>>> +		return ERR_PTR(-EINVAL);
>>>> +
>>>
>>> Can access_ok() be used here ?
>>>
>>>          if (!access_ok(writable ? VERIFY_WRITE : VERIFY_READ,
>>>                         addr, size))
>>>                   return ERR_PTR(-EINVAL);
>>>
>>
>> No, this will break the current ODP semantics.
>>
>> ODP allows the user to register memory that is not accessible yet.
>> This is a critical design feature, as it allows avoiding holding
>> a registration cache. Adding this check will break the behavior,
>> forcing memory to be all accessible when registering an ODP MR.
>>
> 
> Where's the check for the range being in userspace memory space,
> especially for the ODP case ?
> 
> For non ODP case (eg. plain old behavior), does get_user_pages()
> ensure the requested pages fit in userspace region on all 
> architectures ? I think so.

Yes, get_user_pages will return a smaller amount of pages than requested
if it encounters an unmapped region (or a region without write
permissions for write requests). If this happens, the loop in
ib_umem_get calls get_user_pages again with the next set of pages, and
this time if it the first page still cannot be mapped an error is returned.

> 
> In ODP case, I'm not sure such check is ever done ?

In ODP, we also call get_user_pages, but only when a page fault occurs
(see ib_umem_odp_map_dma_pages()). This allows the user to pre-register
a memory region that contains unmapped virtual space, and then mmap
different files into that area without needing to re-register.

> (Aside, does it take special mesure to protect shared mapping from
> being read and/or *written* ?)

I'm not sure I understand the question. Shared mappings that the process
is allowed to read or write are also allowed for the HCA (specifically,
to local and remote operations the same process performs using the HCA),
provided the application has registered their virtual address space as a
memory region.

Regards,
Haggai

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: CVE-2014-8159 kernel: infiniband: uverbs: unprotected physical memory access
@ 2015-04-02 15:18             ` Haggai Eran
  0 siblings, 0 replies; 45+ messages in thread
From: Haggai Eran @ 2015-04-02 15:18 UTC (permalink / raw)
  To: Yann Droneaud, Shachar Raindel, Sagi Grimberg
  Cc: oss-security,
	<linux-rdma@vger.kernel.org> (linux-rdma@vger.kernel.org),
	linux-kernel, stable

On 02/04/2015 16:30, Yann Droneaud wrote:
> Hi,
> 
> Le jeudi 02 avril 2015 à 10:52 +0000, Shachar Raindel a écrit :
>>> -----Original Message-----
>>> From: Yann Droneaud [mailto:ydroneaud@opteya.com]
>>> Sent: Thursday, April 02, 2015 1:05 PM
>>> Le mercredi 18 mars 2015 à 17:39 +0000, Shachar Raindel a écrit :
> 
>>>> +	/*
>>>> +	 * If the combination of the addr and size requested for this
>>> memory
>>>> +	 * region causes an integer overflow, return error.
>>>> +	 */
>>>> +	if ((PAGE_ALIGN(addr + size) <= size) ||
>>>> +	    (PAGE_ALIGN(addr + size) <= addr))
>>>> +		return ERR_PTR(-EINVAL);
>>>> +
>>>
>>> Can access_ok() be used here ?
>>>
>>>          if (!access_ok(writable ? VERIFY_WRITE : VERIFY_READ,
>>>                         addr, size))
>>>                   return ERR_PTR(-EINVAL);
>>>
>>
>> No, this will break the current ODP semantics.
>>
>> ODP allows the user to register memory that is not accessible yet.
>> This is a critical design feature, as it allows avoiding holding
>> a registration cache. Adding this check will break the behavior,
>> forcing memory to be all accessible when registering an ODP MR.
>>
> 
> Where's the check for the range being in userspace memory space,
> especially for the ODP case ?
> 
> For non ODP case (eg. plain old behavior), does get_user_pages()
> ensure the requested pages fit in userspace region on all 
> architectures ? I think so.

Yes, get_user_pages will return a smaller amount of pages than requested
if it encounters an unmapped region (or a region without write
permissions for write requests). If this happens, the loop in
ib_umem_get calls get_user_pages again with the next set of pages, and
this time if it the first page still cannot be mapped an error is returned.

> 
> In ODP case, I'm not sure such check is ever done ?

In ODP, we also call get_user_pages, but only when a page fault occurs
(see ib_umem_odp_map_dma_pages()). This allows the user to pre-register
a memory region that contains unmapped virtual space, and then mmap
different files into that area without needing to re-register.

> (Aside, does it take special mesure to protect shared mapping from
> being read and/or *written* ?)

I'm not sure I understand the question. Shared mappings that the process
is allowed to read or write are also allowed for the HCA (specifically,
to local and remote operations the same process performs using the HCA),
provided the application has registered their virtual address space as a
memory region.

Regards,
Haggai

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: CVE-2014-8159 kernel: infiniband: uverbs: unprotected physical memory access
  2015-04-02  7:52     ` Shachar Raindel
@ 2015-04-02 16:32       ` Roland Dreier
  2015-04-02 16:39         ` Shachar Raindel
  0 siblings, 1 reply; 45+ messages in thread
From: Roland Dreier @ 2015-04-02 16:32 UTC (permalink / raw)
  To: Shachar Raindel
  Cc: oss-security,
	<linux-rdma@vger.kernel.org> (linux-rdma@vger.kernel.org),
	stable

On Thu, Apr 2, 2015 at 12:52 AM, Shachar Raindel <raindel@mellanox.com> wrote:
> This is a common practice in the security industry, called
> "responsible disclosure."
>
> Following the kernel  security bugs policy [1], we reported it to
> the kernel security contacts few days before making the issue public.
> Few days after issue became public, we published a clear report to all
> of the relevant mailing lists.

Isn't the point of responsible disclosure to delay disclosure until a
fix is in place?  What's the point of sending a notification to the
kernel security team if you're going to disclose publicly before the
upstream kernel is fixed?

 - R.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* RE: CVE-2014-8159 kernel: infiniband: uverbs: unprotected physical memory access
  2015-04-02 15:15         ` Yann Droneaud
@ 2015-04-02 16:34           ` Shachar Raindel
  -1 siblings, 0 replies; 45+ messages in thread
From: Shachar Raindel @ 2015-04-02 16:34 UTC (permalink / raw)
  To: Yann Droneaud
  Cc: oss-security,
	<linux-rdma@vger.kernel.org> (linux-rdma@vger.kernel.org),
	linux-kernel, stable

Hi,

> -----Original Message-----
> From: Yann Droneaud [mailto:ydroneaud@opteya.com]
> Sent: Thursday, April 02, 2015 6:16 PM
> To: Shachar Raindel
> Cc: oss-security@lists.openwall.com; <linux-rdma@vger.kernel.org>
> (linux-rdma@vger.kernel.org); linux-kernel@vger.kernel.org;
> stable@vger.kernel.org
> Subject: Re: CVE-2014-8159 kernel: infiniband: uverbs: unprotected
> physical memory access
> 
> Hi,
> Le jeudi 02 avril 2015 à 10:52 +0000, Shachar Raindel a écrit :
> > > -----Original Message-----
> > > From: Yann Droneaud [mailto:ydroneaud@opteya.com]
> > > Sent: Thursday, April 02, 2015 1:05 PM
> > > Le mercredi 18 mars 2015 à 17:39 +0000, Shachar Raindel a écrit :
> ...
> > > > +	/*
> > > > +	 * If the combination of the addr and size requested for this
> > > memory
> > > > +	 * region causes an integer overflow, return error.
> > > > +	 */
> > > > +	if ((PAGE_ALIGN(addr + size) <= size) ||
> > > > +	    (PAGE_ALIGN(addr + size) <= addr))
> > > > +		return ERR_PTR(-EINVAL);
> > > > +
> > >
> > > Can access_ok() be used here ?
> > >
> > >          if (!access_ok(writable ? VERIFY_WRITE : VERIFY_READ,
> > >                         addr, size))
> > >                   return ERR_PTR(-EINVAL);
> > >
> >
> > No, this will break the current ODP semantics.
> >
> > ODP allows the user to register memory that is not accessible yet.
> > This is a critical design feature, as it allows avoiding holding
> > a registration cache. Adding this check will break the behavior,
> > forcing memory to be all accessible when registering an ODP MR.
> >
> 
> Failed to notice previously, but since this would break ODP, and ODP is
> only available starting v3.19-rc1, my proposed fix might be applicable
> for older kernel (if not better).
> 

Can you explain how this proposed fix is better than the existing patch?
Why do we want to push to the stable tree a patch that is not in the
upstream? There is an existing, tested, patch that is going to the tip
of the development. It even applies cleanly on every kernel version around.

Thanks,
--Shachar

^ permalink raw reply	[flat|nested] 45+ messages in thread

* RE: CVE-2014-8159 kernel: infiniband: uverbs: unprotected physical memory access
@ 2015-04-02 16:34           ` Shachar Raindel
  0 siblings, 0 replies; 45+ messages in thread
From: Shachar Raindel @ 2015-04-02 16:34 UTC (permalink / raw)
  To: Yann Droneaud
  Cc: oss-security,
	<linux-rdma@vger.kernel.org> (linux-rdma@vger.kernel.org),
	linux-kernel, stable

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset="utf-8", Size: 2223 bytes --]

Hi,

> -----Original Message-----
> From: Yann Droneaud [mailto:ydroneaud@opteya.com]
> Sent: Thursday, April 02, 2015 6:16 PM
> To: Shachar Raindel
> Cc: oss-security@lists.openwall.com; <linux-rdma@vger.kernel.org>
> (linux-rdma@vger.kernel.org); linux-kernel@vger.kernel.org;
> stable@vger.kernel.org
> Subject: Re: CVE-2014-8159 kernel: infiniband: uverbs: unprotected
> physical memory access
> 
> Hi,
> Le jeudi 02 avril 2015 à 10:52 +0000, Shachar Raindel a écrit :
> > > -----Original Message-----
> > > From: Yann Droneaud [mailto:ydroneaud@opteya.com]
> > > Sent: Thursday, April 02, 2015 1:05 PM
> > > Le mercredi 18 mars 2015 à 17:39 +0000, Shachar Raindel a écrit :
> ...
> > > > +	/*
> > > > +	 * If the combination of the addr and size requested for this
> > > memory
> > > > +	 * region causes an integer overflow, return error.
> > > > +	 */
> > > > +	if ((PAGE_ALIGN(addr + size) <= size) ||
> > > > +	    (PAGE_ALIGN(addr + size) <= addr))
> > > > +		return ERR_PTR(-EINVAL);
> > > > +
> > >
> > > Can access_ok() be used here ?
> > >
> > >          if (!access_ok(writable ? VERIFY_WRITE : VERIFY_READ,
> > >                         addr, size))
> > >                   return ERR_PTR(-EINVAL);
> > >
> >
> > No, this will break the current ODP semantics.
> >
> > ODP allows the user to register memory that is not accessible yet.
> > This is a critical design feature, as it allows avoiding holding
> > a registration cache. Adding this check will break the behavior,
> > forcing memory to be all accessible when registering an ODP MR.
> >
> 
> Failed to notice previously, but since this would break ODP, and ODP is
> only available starting v3.19-rc1, my proposed fix might be applicable
> for older kernel (if not better).
> 

Can you explain how this proposed fix is better than the existing patch?
Why do we want to push to the stable tree a patch that is not in the
upstream? There is an existing, tested, patch that is going to the tip
of the development. It even applies cleanly on every kernel version around.

Thanks,
--Shachar
ÿôèº{.nÇ+‰·Ÿ®‰­†+%ŠËÿ±éݶ\x17¥Šwÿº{.nÇ+‰·¥Š{±þG«éÿŠ{ayº\x1dʇڙë,j\a­¢f£¢·hšïêÿ‘êçz_è®\x03(­éšŽŠÝ¢j"ú\x1a¶^[m§ÿÿ¾\a«þG«éÿ¢¸?™¨è­Ú&£ø§~á¶iO•æ¬z·švØ^\x14\x04\x1a¶^[m§ÿÿÃ\fÿ¶ìÿ¢¸?–I¥

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: CVE-2014-8159 kernel: infiniband: uverbs: unprotected physical memory access
  2015-04-02 15:18             ` Haggai Eran
@ 2015-04-02 16:35               ` Yann Droneaud
  -1 siblings, 0 replies; 45+ messages in thread
From: Yann Droneaud @ 2015-04-02 16:35 UTC (permalink / raw)
  To: Haggai Eran
  Cc: Shachar Raindel, Sagi Grimberg, oss-security,
	<linux-rdma@vger.kernel.org> (linux-rdma@vger.kernel.org),
	linux-kernel, stable

Hi Haggai,

Le jeudi 02 avril 2015 à 18:18 +0300, Haggai Eran a écrit :
> On 02/04/2015 16:30, Yann Droneaud wrote:
> > Hi,
> > 
> > Le jeudi 02 avril 2015 à 10:52 +0000, Shachar Raindel a écrit :
> >>> -----Original Message-----
> >>> From: Yann Droneaud [mailto:ydroneaud@opteya.com]
> >>> Sent: Thursday, April 02, 2015 1:05 PM
> >>> Le mercredi 18 mars 2015 à 17:39 +0000, Shachar Raindel a écrit :
> > 
> >>>> +	/*
> >>>> +	 * If the combination of the addr and size requested for this
> >>> memory
> >>>> +	 * region causes an integer overflow, return error.
> >>>> +	 */
> >>>> +	if ((PAGE_ALIGN(addr + size) <= size) ||
> >>>> +	    (PAGE_ALIGN(addr + size) <= addr))
> >>>> +		return ERR_PTR(-EINVAL);
> >>>> +
> >>>
> >>> Can access_ok() be used here ?
> >>>
> >>>          if (!access_ok(writable ? VERIFY_WRITE : VERIFY_READ,
> >>>                         addr, size))
> >>>                   return ERR_PTR(-EINVAL);
> >>>
> >>
> >> No, this will break the current ODP semantics.
> >>
> >> ODP allows the user to register memory that is not accessible yet.
> >> This is a critical design feature, as it allows avoiding holding
> >> a registration cache. Adding this check will break the behavior,
> >> forcing memory to be all accessible when registering an ODP MR.
> >>
> > 
> > Where's the check for the range being in userspace memory space,
> > especially for the ODP case ?
> > 
> > For non ODP case (eg. plain old behavior), does get_user_pages()
> > ensure the requested pages fit in userspace region on all 
> > architectures ? I think so.
> 
> Yes, get_user_pages will return a smaller amount of pages than requested
> if it encounters an unmapped region (or a region without write
> permissions for write requests). If this happens, the loop in
> ib_umem_get calls get_user_pages again with the next set of pages, and
> this time if it the first page still cannot be mapped an error is returned.
> 
> > 
> > In ODP case, I'm not sure such check is ever done ?
> 
> In ODP, we also call get_user_pages, but only when a page fault occurs
> (see ib_umem_odp_map_dma_pages()). This allows the user to pre-register
> a memory region that contains unmapped virtual space, and then mmap
> different files into that area without needing to re-register.
> 

OK, thanks for the description.

> > (Aside, does it take special mesure to protect shared mapping from
> > being read and/or *written* ?)
> 
> I'm not sure I understand the question. Shared mappings that the process
> is allowed to read or write are also allowed for the HCA (specifically,
> to local and remote operations the same process performs using the HCA),
> provided the application has registered their virtual address space as a
> memory region.
> 

I was refering to description of get_user_pages():

http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/mm/gup.c?id=v4.0-rc6#n765

 * @force:	whether to force access even when user mapping is currently
 *		protected (but never forces write access to shared mapping).

But since ib_umem_odp_map_dma_pages() use get_user_pages() with force
argument set to 0, it's OK.

Another related question: as the large memory range could be registered 
by user space with ibv_reg_mr(pd, base, size, IB_ACCESS_ON_DEMAND), 
what's prevent the kernel to map a file as the result of mmap(0, ...)
in this  region, making it available remotely through IBV_WR_RDMA_READ /
IBV_WR_RDMA_WRITE ?

Again, thanks for the information I was missing to understand how ODP is
checking the memory ranges.

Regards.

-- 
Yann Droneaud
OPTEYA

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: CVE-2014-8159 kernel: infiniband: uverbs: unprotected physical memory access
@ 2015-04-02 16:35               ` Yann Droneaud
  0 siblings, 0 replies; 45+ messages in thread
From: Yann Droneaud @ 2015-04-02 16:35 UTC (permalink / raw)
  To: Haggai Eran
  Cc: Shachar Raindel, Sagi Grimberg, oss-security,
	<linux-rdma@vger.kernel.org> (linux-rdma@vger.kernel.org),
	linux-kernel, stable

Hi Haggai,

Le jeudi 02 avril 2015 à 18:18 +0300, Haggai Eran a écrit :
> On 02/04/2015 16:30, Yann Droneaud wrote:
> > Hi,
> > 
> > Le jeudi 02 avril 2015 à 10:52 +0000, Shachar Raindel a écrit :
> >>> -----Original Message-----
> >>> From: Yann Droneaud [mailto:ydroneaud@opteya.com]
> >>> Sent: Thursday, April 02, 2015 1:05 PM
> >>> Le mercredi 18 mars 2015 à 17:39 +0000, Shachar Raindel a écrit :
> > 
> >>>> +	/*
> >>>> +	 * If the combination of the addr and size requested for this
> >>> memory
> >>>> +	 * region causes an integer overflow, return error.
> >>>> +	 */
> >>>> +	if ((PAGE_ALIGN(addr + size) <= size) ||
> >>>> +	    (PAGE_ALIGN(addr + size) <= addr))
> >>>> +		return ERR_PTR(-EINVAL);
> >>>> +
> >>>
> >>> Can access_ok() be used here ?
> >>>
> >>>          if (!access_ok(writable ? VERIFY_WRITE : VERIFY_READ,
> >>>                         addr, size))
> >>>                   return ERR_PTR(-EINVAL);
> >>>
> >>
> >> No, this will break the current ODP semantics.
> >>
> >> ODP allows the user to register memory that is not accessible yet.
> >> This is a critical design feature, as it allows avoiding holding
> >> a registration cache. Adding this check will break the behavior,
> >> forcing memory to be all accessible when registering an ODP MR.
> >>
> > 
> > Where's the check for the range being in userspace memory space,
> > especially for the ODP case ?
> > 
> > For non ODP case (eg. plain old behavior), does get_user_pages()
> > ensure the requested pages fit in userspace region on all 
> > architectures ? I think so.
> 
> Yes, get_user_pages will return a smaller amount of pages than requested
> if it encounters an unmapped region (or a region without write
> permissions for write requests). If this happens, the loop in
> ib_umem_get calls get_user_pages again with the next set of pages, and
> this time if it the first page still cannot be mapped an error is returned.
> 
> > 
> > In ODP case, I'm not sure such check is ever done ?
> 
> In ODP, we also call get_user_pages, but only when a page fault occurs
> (see ib_umem_odp_map_dma_pages()). This allows the user to pre-register
> a memory region that contains unmapped virtual space, and then mmap
> different files into that area without needing to re-register.
> 

OK, thanks for the description.

> > (Aside, does it take special mesure to protect shared mapping from
> > being read and/or *written* ?)
> 
> I'm not sure I understand the question. Shared mappings that the process
> is allowed to read or write are also allowed for the HCA (specifically,
> to local and remote operations the same process performs using the HCA),
> provided the application has registered their virtual address space as a
> memory region.
> 

I was refering to description of get_user_pages():

http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/mm/gup.c?id=v4.0-rc6#n765

 * @force:	whether to force access even when user mapping is currently
 *		protected (but never forces write access to shared mapping).

But since ib_umem_odp_map_dma_pages() use get_user_pages() with force
argument set to 0, it's OK.

Another related question: as the large memory range could be registered 
by user space with ibv_reg_mr(pd, base, size, IB_ACCESS_ON_DEMAND), 
what's prevent the kernel to map a file as the result of mmap(0, ...)
in this  region, making it available remotely through IBV_WR_RDMA_READ /
IBV_WR_RDMA_WRITE ?

Again, thanks for the information I was missing to understand how ODP is
checking the memory ranges.

Regards.

-- 
Yann Droneaud
OPTEYA



^ permalink raw reply	[flat|nested] 45+ messages in thread

* RE: CVE-2014-8159 kernel: infiniband: uverbs: unprotected physical memory access
  2015-04-02 16:32       ` Roland Dreier
@ 2015-04-02 16:39         ` Shachar Raindel
  0 siblings, 0 replies; 45+ messages in thread
From: Shachar Raindel @ 2015-04-02 16:39 UTC (permalink / raw)
  To: Roland Dreier
  Cc: oss-security,
	<linux-rdma@vger.kernel.org> (linux-rdma@vger.kernel.org),
	stable



> -----Original Message-----
> From: Roland Dreier [mailto:roland@purestorage.com]
> Sent: Thursday, April 02, 2015 7:33 PM
> To: Shachar Raindel
> Cc: oss-security@lists.openwall.com; <linux-rdma@vger.kernel.org>
> (linux-rdma@vger.kernel.org); stable@vger.kernel.org
> Subject: Re: CVE-2014-8159 kernel: infiniband: uverbs: unprotected
> physical memory access
> 
> On Thu, Apr 2, 2015 at 12:52 AM, Shachar Raindel <raindel@mellanox.com>
> wrote:
> > This is a common practice in the security industry, called
> > "responsible disclosure."
> >
> > Following the kernel  security bugs policy [1], we reported it to
> > the kernel security contacts few days before making the issue public.
> > Few days after issue became public, we published a clear report to all
> > of the relevant mailing lists.
> 
> Isn't the point of responsible disclosure to delay disclosure until a
> fix is in place?  What's the point of sending a notification to the
> kernel security team if you're going to disclose publicly before the
> upstream kernel is fixed?
> 

We delayed the disclosure until most major Linux vendors released a fix for
the issue, give or take in synchronization.

The Linux security contact list only guarantee secrecy for 7 days. We
therefore contacted them only close to the date at which fixes were going to
be released, to follow their expectations for period of time between contact
and public disclosure.

Thanks,
--Shachar

^ permalink raw reply	[flat|nested] 45+ messages in thread

* RE: CVE-2014-8159 kernel: infiniband: uverbs: unprotected physical memory access
  2015-04-02 16:35               ` Yann Droneaud
@ 2015-04-02 16:44                 ` Shachar Raindel
  -1 siblings, 0 replies; 45+ messages in thread
From: Shachar Raindel @ 2015-04-02 16:44 UTC (permalink / raw)
  To: Yann Droneaud, Haggai Eran
  Cc: Sagi Grimberg, oss-security,
	<linux-rdma@vger.kernel.org> (linux-rdma@vger.kernel.org),
	linux-kernel, stable



> -----Original Message-----
> From: Yann Droneaud [mailto:ydroneaud@opteya.com]
> Sent: Thursday, April 02, 2015 7:35 PM
> To: Haggai Eran
> Cc: Shachar Raindel; Sagi Grimberg; oss-security@lists.openwall.com;
> <linux-rdma@vger.kernel.org> (linux-rdma@vger.kernel.org); linux-
> kernel@vger.kernel.org; stable@vger.kernel.org
> Subject: Re: CVE-2014-8159 kernel: infiniband: uverbs: unprotected
> physical memory access
> 
> Hi Haggai,
> 
> Le jeudi 02 avril 2015 à 18:18 +0300, Haggai Eran a écrit :
> > On 02/04/2015 16:30, Yann Droneaud wrote:
> > > Hi,
> > >
> > > Le jeudi 02 avril 2015 à 10:52 +0000, Shachar Raindel a écrit :
> > >>> -----Original Message-----
> > >>> From: Yann Droneaud [mailto:ydroneaud@opteya.com]
> > >>> Sent: Thursday, April 02, 2015 1:05 PM
> > >>> Le mercredi 18 mars 2015 à 17:39 +0000, Shachar Raindel a écrit :
> > >
> > >>>> +	/*
> > >>>> +	 * If the combination of the addr and size requested for this
> > >>> memory
> > >>>> +	 * region causes an integer overflow, return error.
> > >>>> +	 */
> > >>>> +	if ((PAGE_ALIGN(addr + size) <= size) ||
> > >>>> +	    (PAGE_ALIGN(addr + size) <= addr))
> > >>>> +		return ERR_PTR(-EINVAL);
> > >>>> +
> > >>>
> > >>> Can access_ok() be used here ?
> > >>>
> > >>>          if (!access_ok(writable ? VERIFY_WRITE : VERIFY_READ,
> > >>>                         addr, size))
> > >>>                   return ERR_PTR(-EINVAL);
> > >>>
> > >>
> > >> No, this will break the current ODP semantics.
> > >>
> > >> ODP allows the user to register memory that is not accessible yet.
> > >> This is a critical design feature, as it allows avoiding holding
> > >> a registration cache. Adding this check will break the behavior,
> > >> forcing memory to be all accessible when registering an ODP MR.
> > >>
> > >
> > > Where's the check for the range being in userspace memory space,
> > > especially for the ODP case ?
> > >
> > > For non ODP case (eg. plain old behavior), does get_user_pages()
> > > ensure the requested pages fit in userspace region on all
> > > architectures ? I think so.
> >
> > Yes, get_user_pages will return a smaller amount of pages than
> requested
> > if it encounters an unmapped region (or a region without write
> > permissions for write requests). If this happens, the loop in
> > ib_umem_get calls get_user_pages again with the next set of pages, and
> > this time if it the first page still cannot be mapped an error is
> returned.
> >
> > >
> > > In ODP case, I'm not sure such check is ever done ?
> >
> > In ODP, we also call get_user_pages, but only when a page fault occurs
> > (see ib_umem_odp_map_dma_pages()). This allows the user to pre-
> register
> > a memory region that contains unmapped virtual space, and then mmap
> > different files into that area without needing to re-register.
> >
> 
> OK, thanks for the description.
> 
> > > (Aside, does it take special mesure to protect shared mapping from
> > > being read and/or *written* ?)
> >
> > I'm not sure I understand the question. Shared mappings that the
> process
> > is allowed to read or write are also allowed for the HCA
> (specifically,
> > to local and remote operations the same process performs using the
> HCA),
> > provided the application has registered their virtual address space as
> a
> > memory region.
> >
> 
> I was refering to description of get_user_pages():
> 
> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/mm/g
> up.c?id=v4.0-rc6#n765
> 
>  * @force:	whether to force access even when user mapping is currently
>  *		protected (but never forces write access to shared mapping).
> 
> But since ib_umem_odp_map_dma_pages() use get_user_pages() with force
> argument set to 0, it's OK.
> 
> Another related question: as the large memory range could be registered
> by user space with ibv_reg_mr(pd, base, size, IB_ACCESS_ON_DEMAND),
> what's prevent the kernel to map a file as the result of mmap(0, ...)
> in this  region, making it available remotely through IBV_WR_RDMA_READ /
> IBV_WR_RDMA_WRITE ?
> 

This is not a bug. This is a feature.

Exposing a file through RDMA, using ODP, can be done exactly like this.
Given that the application explicitly requested this behavior, I don't
see why it is a problem. Actually, some of our tests use such flows.
The mmu notifiers mechanism allow us to do this safely. When the page is
written back to disk, it is removed from the ODP mapping. When it is
accessed by the HCA, it is brought back to RAM.


Thanks,
--Shachar

^ permalink raw reply	[flat|nested] 45+ messages in thread

* RE: CVE-2014-8159 kernel: infiniband: uverbs: unprotected physical memory access
@ 2015-04-02 16:44                 ` Shachar Raindel
  0 siblings, 0 replies; 45+ messages in thread
From: Shachar Raindel @ 2015-04-02 16:44 UTC (permalink / raw)
  To: Yann Droneaud, Haggai Eran
  Cc: Sagi Grimberg, oss-security,
	<linux-rdma@vger.kernel.org> (linux-rdma@vger.kernel.org),
	linux-kernel, stable

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset="utf-8", Size: 4656 bytes --]



> -----Original Message-----
> From: Yann Droneaud [mailto:ydroneaud@opteya.com]
> Sent: Thursday, April 02, 2015 7:35 PM
> To: Haggai Eran
> Cc: Shachar Raindel; Sagi Grimberg; oss-security@lists.openwall.com;
> <linux-rdma@vger.kernel.org> (linux-rdma@vger.kernel.org); linux-
> kernel@vger.kernel.org; stable@vger.kernel.org
> Subject: Re: CVE-2014-8159 kernel: infiniband: uverbs: unprotected
> physical memory access
> 
> Hi Haggai,
> 
> Le jeudi 02 avril 2015 à 18:18 +0300, Haggai Eran a écrit :
> > On 02/04/2015 16:30, Yann Droneaud wrote:
> > > Hi,
> > >
> > > Le jeudi 02 avril 2015 à 10:52 +0000, Shachar Raindel a écrit :
> > >>> -----Original Message-----
> > >>> From: Yann Droneaud [mailto:ydroneaud@opteya.com]
> > >>> Sent: Thursday, April 02, 2015 1:05 PM
> > >>> Le mercredi 18 mars 2015 à 17:39 +0000, Shachar Raindel a écrit :
> > >
> > >>>> +	/*
> > >>>> +	 * If the combination of the addr and size requested for this
> > >>> memory
> > >>>> +	 * region causes an integer overflow, return error.
> > >>>> +	 */
> > >>>> +	if ((PAGE_ALIGN(addr + size) <= size) ||
> > >>>> +	    (PAGE_ALIGN(addr + size) <= addr))
> > >>>> +		return ERR_PTR(-EINVAL);
> > >>>> +
> > >>>
> > >>> Can access_ok() be used here ?
> > >>>
> > >>>          if (!access_ok(writable ? VERIFY_WRITE : VERIFY_READ,
> > >>>                         addr, size))
> > >>>                   return ERR_PTR(-EINVAL);
> > >>>
> > >>
> > >> No, this will break the current ODP semantics.
> > >>
> > >> ODP allows the user to register memory that is not accessible yet.
> > >> This is a critical design feature, as it allows avoiding holding
> > >> a registration cache. Adding this check will break the behavior,
> > >> forcing memory to be all accessible when registering an ODP MR.
> > >>
> > >
> > > Where's the check for the range being in userspace memory space,
> > > especially for the ODP case ?
> > >
> > > For non ODP case (eg. plain old behavior), does get_user_pages()
> > > ensure the requested pages fit in userspace region on all
> > > architectures ? I think so.
> >
> > Yes, get_user_pages will return a smaller amount of pages than
> requested
> > if it encounters an unmapped region (or a region without write
> > permissions for write requests). If this happens, the loop in
> > ib_umem_get calls get_user_pages again with the next set of pages, and
> > this time if it the first page still cannot be mapped an error is
> returned.
> >
> > >
> > > In ODP case, I'm not sure such check is ever done ?
> >
> > In ODP, we also call get_user_pages, but only when a page fault occurs
> > (see ib_umem_odp_map_dma_pages()). This allows the user to pre-
> register
> > a memory region that contains unmapped virtual space, and then mmap
> > different files into that area without needing to re-register.
> >
> 
> OK, thanks for the description.
> 
> > > (Aside, does it take special mesure to protect shared mapping from
> > > being read and/or *written* ?)
> >
> > I'm not sure I understand the question. Shared mappings that the
> process
> > is allowed to read or write are also allowed for the HCA
> (specifically,
> > to local and remote operations the same process performs using the
> HCA),
> > provided the application has registered their virtual address space as
> a
> > memory region.
> >
> 
> I was refering to description of get_user_pages():
> 
> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/mm/g
> up.c?id=v4.0-rc6#n765
> 
>  * @force:	whether to force access even when user mapping is currently
>  *		protected (but never forces write access to shared mapping).
> 
> But since ib_umem_odp_map_dma_pages() use get_user_pages() with force
> argument set to 0, it's OK.
> 
> Another related question: as the large memory range could be registered
> by user space with ibv_reg_mr(pd, base, size, IB_ACCESS_ON_DEMAND),
> what's prevent the kernel to map a file as the result of mmap(0, ...)
> in this  region, making it available remotely through IBV_WR_RDMA_READ /
> IBV_WR_RDMA_WRITE ?
> 

This is not a bug. This is a feature.

Exposing a file through RDMA, using ODP, can be done exactly like this.
Given that the application explicitly requested this behavior, I don't
see why it is a problem. Actually, some of our tests use such flows.
The mmu notifiers mechanism allow us to do this safely. When the page is
written back to disk, it is removed from the ODP mapping. When it is
accessed by the HCA, it is brought back to RAM.


Thanks,
--Shachar
ÿôèº{.nÇ+‰·Ÿ®‰­†+%ŠËÿ±éݶ\x17¥Šwÿº{.nÇ+‰·¥Š{±þG«éÿŠ{ayº\x1dʇڙë,j\a­¢f£¢·hšïêÿ‘êçz_è®\x03(­éšŽŠÝ¢j"ú\x1a¶^[m§ÿÿ¾\a«þG«éÿ¢¸?™¨è­Ú&£ø§~á¶iO•æ¬z·švØ^\x14\x04\x1a¶^[m§ÿÿÃ\fÿ¶ìÿ¢¸?–I¥

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: CVE-2014-8159 kernel: infiniband: uverbs: unprotected physical memory access
  2015-04-02 16:44                 ` Shachar Raindel
  (?)
@ 2015-04-02 18:12                     ` Haggai Eran
  -1 siblings, 0 replies; 45+ messages in thread
From: Haggai Eran @ 2015-04-02 18:12 UTC (permalink / raw)
  To: Shachar Raindel, Yann Droneaud
  Cc: Sagi Grimberg, oss-security-ZwoEplunGu1jrUoiu81ncdBPR1lH4CV8,
	<linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>
	(linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org),
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	stable-u79uwXL29TY76Z2rM5mHXA


On Thursday, April 2, 2015 7:44 PM, Shachar Raindel wrote:
>> -----Original Message-----
>> From: Yann Droneaud [mailto:ydroneaud-RlY5vtjFyJ3QT0dZR+AlfA@public.gmane.org]
>> Sent: Thursday, April 02, 2015 7:35 PM
>> To: Haggai Eran
>> Cc: Shachar Raindel; Sagi Grimberg; oss-security-ZwoEplunGu1jrUoiu81ncdBPR1lH4CV8@public.gmane.org;
>> <linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org> (linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org); linux-
>> kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; stable-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
>> Subject: Re: CVE-2014-8159 kernel: infiniband: uverbs: unprotected
>> physical memory access
>>
>> Hi Haggai,
>>
>> Le jeudi 02 avril 2015 à 18:18 +0300, Haggai Eran a écrit :
>> > On 02/04/2015 16:30, Yann Droneaud wrote:
>> >> Hi,
>> >>
>> >> Le jeudi 02 avril 2015 à 10:52 +0000, Shachar Raindel a écrit :
>> >>>> -----Original Message-----
>> >>>> From: Yann Droneaud [mailto:ydroneaud-RlY5vtjFyJ3QT0dZR+AlfA@public.gmane.org]
>> >>>> Sent: Thursday, April 02, 2015 1:05 PM
>> >>>> Le mercredi 18 mars 2015 à 17:39 +0000, Shachar Raindel a écrit :
>> >>
>> >>>>> +      /*
>> >>>>> +       * If the combination of the addr and size requested for this
>> >>>> memory
>> >>>>> +       * region causes an integer overflow, return error.
>> >>>>> +       */
>> >>>>> +      if ((PAGE_ALIGN(addr + size) <= size) ||
>> >>>>> +          (PAGE_ALIGN(addr + size) <= addr))
>> >>>>> +              return ERR_PTR(-EINVAL);
>> >>>>> +
>> >>>>
>> >>>> Can access_ok() be used here ?
>> >>>>
>> >>>>          if (!access_ok(writable ? VERIFY_WRITE : VERIFY_READ,
>> >>>>                         addr, size))
>> >>>>                   return ERR_PTR(-EINVAL);
>> >>>>
>> >>>
>> >>> No, this will break the current ODP semantics.
>> >>>
>> >>> ODP allows the user to register memory that is not accessible yet.
>> >>> This is a critical design feature, as it allows avoiding holding
>> >>> a registration cache. Adding this check will break the behavior,
>> >>> forcing memory to be all accessible when registering an ODP MR.
>> >>>
>> >>
>> >> Where's the check for the range being in userspace memory space,
>> >> especially for the ODP case ?
>> >>
>> >> For non ODP case (eg. plain old behavior), does get_user_pages()
>> >> ensure the requested pages fit in userspace region on all
>> >> architectures ? I think so.
>> >
>> > Yes, get_user_pages will return a smaller amount of pages than
>> requested
>> > if it encounters an unmapped region (or a region without write
>> > permissions for write requests). If this happens, the loop in
>> > ib_umem_get calls get_user_pages again with the next set of pages, and
>> > this time if it the first page still cannot be mapped an error is
>> returned.
>> >
>> >>
>> >> In ODP case, I'm not sure such check is ever done ?
>> >
>> > In ODP, we also call get_user_pages, but only when a page fault occurs
>> > (see ib_umem_odp_map_dma_pages()). This allows the user to pre-
>> register
>> > a memory region that contains unmapped virtual space, and then mmap
>> > different files into that area without needing to re-register.
>> >
>>
>> OK, thanks for the description.
>>
>> ...
>>
>> Another related question: as the large memory range could be registered
>> by user space with ibv_reg_mr(pd, base, size, IB_ACCESS_ON_DEMAND),
>> what's prevent the kernel to map a file as the result of mmap(0, ...)
>> in this  region, making it available remotely through IBV_WR_RDMA_READ /
>> IBV_WR_RDMA_WRITE ?
>>
> 
> This is not a bug. This is a feature.
> 
> Exposing a file through RDMA, using ODP, can be done exactly like this.
> Given that the application explicitly requested this behavior, I don't
> see why it is a problem. Actually, some of our tests use such flows.
> The mmu notifiers mechanism allow us to do this safely. When the page is
> written back to disk, it is removed from the ODP mapping. When it is
> accessed by the HCA, it is brought back to RAM.
> 

I want to add that we would like to see users registering a very large memory region (perhaps the entire process address space) for local access, and then enabling remote access only to specific regions using memory windows. However, this isn't supported yet by our driver. Still, there are valid cases where you would still want the results of an mmap(0,...) call to be remotely accessible, in cases where there is enough trust between the local process and the remote process. It may help a middleware communication library register a large portion of the address space in advance, and still work with random pointers given to it by another application module.

Regards,
Haggai--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: CVE-2014-8159 kernel: infiniband: uverbs: unprotected physical memory access
@ 2015-04-02 18:12                     ` Haggai Eran
  0 siblings, 0 replies; 45+ messages in thread
From: Haggai Eran @ 2015-04-02 18:12 UTC (permalink / raw)
  To: Shachar Raindel, Yann Droneaud
  Cc: Sagi Grimberg, oss-security,
	<linux-rdma@vger.kernel.org> (linux-rdma@vger.kernel.org),
	linux-kernel, stable


On Thursday, April 2, 2015 7:44 PM, Shachar Raindel wrote:
>> -----Original Message-----
>> From: Yann Droneaud [mailto:ydroneaud@opteya.com]
>> Sent: Thursday, April 02, 2015 7:35 PM
>> To: Haggai Eran
>> Cc: Shachar Raindel; Sagi Grimberg; oss-security@lists.openwall.com;
>> <linux-rdma@vger.kernel.org> (linux-rdma@vger.kernel.org); linux-
>> kernel@vger.kernel.org; stable@vger.kernel.org
>> Subject: Re: CVE-2014-8159 kernel: infiniband: uverbs: unprotected
>> physical memory access
>>
>> Hi Haggai,
>>
>> Le jeudi 02 avril 2015 à 18:18 +0300, Haggai Eran a écrit :
>> > On 02/04/2015 16:30, Yann Droneaud wrote:
>> >> Hi,
>> >>
>> >> Le jeudi 02 avril 2015 à 10:52 +0000, Shachar Raindel a écrit :
>> >>>> -----Original Message-----
>> >>>> From: Yann Droneaud [mailto:ydroneaud@opteya.com]
>> >>>> Sent: Thursday, April 02, 2015 1:05 PM
>> >>>> Le mercredi 18 mars 2015 à 17:39 +0000, Shachar Raindel a écrit :
>> >>
>> >>>>> +      /*
>> >>>>> +       * If the combination of the addr and size requested for this
>> >>>> memory
>> >>>>> +       * region causes an integer overflow, return error.
>> >>>>> +       */
>> >>>>> +      if ((PAGE_ALIGN(addr + size) <= size) ||
>> >>>>> +          (PAGE_ALIGN(addr + size) <= addr))
>> >>>>> +              return ERR_PTR(-EINVAL);
>> >>>>> +
>> >>>>
>> >>>> Can access_ok() be used here ?
>> >>>>
>> >>>>          if (!access_ok(writable ? VERIFY_WRITE : VERIFY_READ,
>> >>>>                         addr, size))
>> >>>>                   return ERR_PTR(-EINVAL);
>> >>>>
>> >>>
>> >>> No, this will break the current ODP semantics.
>> >>>
>> >>> ODP allows the user to register memory that is not accessible yet.
>> >>> This is a critical design feature, as it allows avoiding holding
>> >>> a registration cache. Adding this check will break the behavior,
>> >>> forcing memory to be all accessible when registering an ODP MR.
>> >>>
>> >>
>> >> Where's the check for the range being in userspace memory space,
>> >> especially for the ODP case ?
>> >>
>> >> For non ODP case (eg. plain old behavior), does get_user_pages()
>> >> ensure the requested pages fit in userspace region on all
>> >> architectures ? I think so.
>> >
>> > Yes, get_user_pages will return a smaller amount of pages than
>> requested
>> > if it encounters an unmapped region (or a region without write
>> > permissions for write requests). If this happens, the loop in
>> > ib_umem_get calls get_user_pages again with the next set of pages, and
>> > this time if it the first page still cannot be mapped an error is
>> returned.
>> >
>> >>
>> >> In ODP case, I'm not sure such check is ever done ?
>> >
>> > In ODP, we also call get_user_pages, but only when a page fault occurs
>> > (see ib_umem_odp_map_dma_pages()). This allows the user to pre-
>> register
>> > a memory region that contains unmapped virtual space, and then mmap
>> > different files into that area without needing to re-register.
>> >
>>
>> OK, thanks for the description.
>>
>> ...
>>
>> Another related question: as the large memory range could be registered
>> by user space with ibv_reg_mr(pd, base, size, IB_ACCESS_ON_DEMAND),
>> what's prevent the kernel to map a file as the result of mmap(0, ...)
>> in this  region, making it available remotely through IBV_WR_RDMA_READ /
>> IBV_WR_RDMA_WRITE ?
>>
> 
> This is not a bug. This is a feature.
> 
> Exposing a file through RDMA, using ODP, can be done exactly like this.
> Given that the application explicitly requested this behavior, I don't
> see why it is a problem. Actually, some of our tests use such flows.
> The mmu notifiers mechanism allow us to do this safely. When the page is
> written back to disk, it is removed from the ODP mapping. When it is
> accessed by the HCA, it is brought back to RAM.
> 

I want to add that we would like to see users registering a very large memory region (perhaps the entire process address space) for local access, and then enabling remote access only to specific regions using memory windows. However, this isn't supported yet by our driver. Still, there are valid cases where you would still want the results of an mmap(0,...) call to be remotely accessible, in cases where there is enough trust between the local process and the remote process. It may help a middleware communication library register a large portion of the address space in advance, and still work with random pointers given to it by another application module.

Regards,
Haggai

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: CVE-2014-8159 kernel: infiniband: uverbs: unprotected physical memory access
@ 2015-04-02 18:12                     ` Haggai Eran
  0 siblings, 0 replies; 45+ messages in thread
From: Haggai Eran @ 2015-04-02 18:12 UTC (permalink / raw)
  To: Shachar Raindel, Yann Droneaud
  Cc: Sagi Grimberg, oss-security,
	<linux-rdma@vger.kernel.org> (linux-rdma@vger.kernel.org),
	linux-kernel, stable


On Thursday, April 2, 2015 7:44 PM, Shachar Raindel wrote:
>> -----Original Message-----
>> From: Yann Droneaud [mailto:ydroneaud@opteya.com]
>> Sent: Thursday, April 02, 2015 7:35 PM
>> To: Haggai Eran
>> Cc: Shachar Raindel; Sagi Grimberg; oss-security@lists.openwall.com;
>> <linux-rdma@vger.kernel.org> (linux-rdma@vger.kernel.org); linux-
>> kernel@vger.kernel.org; stable@vger.kernel.org
>> Subject: Re: CVE-2014-8159 kernel: infiniband: uverbs: unprotected
>> physical memory access
>>
>> Hi Haggai,
>>
>> Le jeudi 02 avril 2015 � 18:18 +0300, Haggai Eran a �crit :
>> > On 02/04/2015 16:30, Yann Droneaud wrote:
>> >> Hi,
>> >>
>> >> Le jeudi 02 avril 2015 � 10:52 +0000, Shachar Raindel a �crit :
>> >>>> -----Original Message-----
>> >>>> From: Yann Droneaud [mailto:ydroneaud@opteya.com]
>> >>>> Sent: Thursday, April 02, 2015 1:05 PM
>> >>>> Le mercredi 18 mars 2015 � 17:39 +0000, Shachar Raindel a �crit :
>> >>
>> >>>>> +      /*
>> >>>>> +       * If the combination of the addr and size requested for this
>> >>>> memory
>> >>>>> +       * region causes an integer overflow, return error.
>> >>>>> +       */
>> >>>>> +      if ((PAGE_ALIGN(addr + size) <= size) ||
>> >>>>> +          (PAGE_ALIGN(addr + size) <= addr))
>> >>>>> +              return ERR_PTR(-EINVAL);
>> >>>>> +
>> >>>>
>> >>>> Can access_ok() be used here ?
>> >>>>
>> >>>>          if (!access_ok(writable ? VERIFY_WRITE : VERIFY_READ,
>> >>>>                         addr, size))
>> >>>>                   return ERR_PTR(-EINVAL);
>> >>>>
>> >>>
>> >>> No, this will break the current ODP semantics.
>> >>>
>> >>> ODP allows the user to register memory that is not accessible yet.
>> >>> This is a critical design feature, as it allows avoiding holding
>> >>> a registration cache. Adding this check will break the behavior,
>> >>> forcing memory to be all accessible when registering an ODP MR.
>> >>>
>> >>
>> >> Where's the check for the range being in userspace memory space,
>> >> especially for the ODP case ?
>> >>
>> >> For non ODP case (eg. plain old behavior), does get_user_pages()
>> >> ensure the requested pages fit in userspace region on all
>> >> architectures ? I think so.
>> >
>> > Yes, get_user_pages will return a smaller amount of pages than
>> requested
>> > if it encounters an unmapped region (or a region without write
>> > permissions for write requests). If this happens, the loop in
>> > ib_umem_get calls get_user_pages again with the next set of pages, and
>> > this time if it the first page still cannot be mapped an error is
>> returned.
>> >
>> >>
>> >> In ODP case, I'm not sure such check is ever done ?
>> >
>> > In ODP, we also call get_user_pages, but only when a page fault occurs
>> > (see ib_umem_odp_map_dma_pages()). This allows the user to pre-
>> register
>> > a memory region that contains unmapped virtual space, and then mmap
>> > different files into that area without needing to re-register.
>> >
>>
>> OK, thanks for the description.
>>
>> ...
>>
>> Another related question: as the large memory range could be registered
>> by user space with ibv_reg_mr(pd, base, size, IB_ACCESS_ON_DEMAND),
>> what's prevent the kernel to map a file as the result of mmap(0, ...)
>> in this  region, making it available remotely through IBV_WR_RDMA_READ /
>> IBV_WR_RDMA_WRITE ?
>>
> 
> This is not a bug. This is a feature.
> 
> Exposing a file through RDMA, using ODP, can be done exactly like this.
> Given that the application explicitly requested this behavior, I don't
> see why it is a problem. Actually, some of our tests use such flows.
> The mmu notifiers mechanism allow us to do this safely. When the page is
> written back to disk, it is removed from the ODP mapping. When it is
> accessed by the HCA, it is brought back to RAM.
> 

I want to add that we would like to see users registering a very large memory region (perhaps the entire process address space) for local access, and then enabling remote access only to specific regions using memory windows. However, this isn't supported yet by our driver. Still, there are valid cases where you would still want the results of an mmap(0,...) call to be remotely accessible, in cases where there is enough trust between the local process and the remote process. It may help a middleware communication library register a large portion of the address space in advance, and still work with random pointers given to it by another application module.

Regards,
Haggai

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: CVE-2014-8159 kernel: infiniband: uverbs: unprotected physical memory access
  2015-04-02 16:44                 ` Shachar Raindel
@ 2015-04-02 20:40                   ` Yann Droneaud
  -1 siblings, 0 replies; 45+ messages in thread
From: Yann Droneaud @ 2015-04-02 20:40 UTC (permalink / raw)
  To: Shachar Raindel
  Cc: Haggai Eran, Sagi Grimberg, oss-security,
	<linux-rdma@vger.kernel.org> (linux-rdma@vger.kernel.org),
	linux-kernel, stable

Hi,

Le jeudi 02 avril 2015 à 16:44 +0000, Shachar Raindel a écrit :
> > -----Original Message-----
> > From: Yann Droneaud [mailto:ydroneaud@opteya.com]
> > Sent: Thursday, April 02, 2015 7:35 PM

> > Another related question: as the large memory range could be registered
> > by user space with ibv_reg_mr(pd, base, size, IB_ACCESS_ON_DEMAND),
> > what's prevent the kernel to map a file as the result of mmap(0, ...)
> > in this  region, making it available remotely through IBV_WR_RDMA_READ /
> > IBV_WR_RDMA_WRITE ?
> > 
> 
> This is not a bug. This is a feature.
> 
> Exposing a file through RDMA, using ODP, can be done exactly like this.
> Given that the application explicitly requested this behavior, I don't
> see why it is a problem. 

If the application cannot choose what will end up in the region it has
registered, it's an issue !

What might happen if one library in a program call mmap(0, size, ...) to
load a file storing a secret (a private key), and that file ends up 
being mapped in an registered but otherwise free region (afaict, the 
kernel is allowed to do it) ?
What might happen if one library in a program call call mmap(0, 
size, ..., MAP_ANONYMOUS,...) to allocate memory, call mlock(), then
write in this location a secret (a passphrase), and that area ends up
in the memory region registered for on demand paging ?

The application haven't choose to disclose these confidential piece of 
information, but they are available for reading/writing by remote
through RDMA given it knows the rkey of the memory region (which is a 
32bits value).

I hope I'm missing something, because I'm not feeling confident such
behavor is a feature.


> Actually, some of our tests use such flows.
> The mmu notifiers mechanism allow us to do this safely. When the page is
> written back to disk, it is removed from the ODP mapping. When it is
> accessed by the HCA, it is brought back to RAM.
> 

I'm not discussing about the benefit of On Demand Paging and why it's a
very good feature to expose files through RDMA.

I'm trying to understand how the application can choose what is exposed
through RDMA if it registers a very large memory region for later use 
(but do not actually explicitly map something there yet): what's the
consequences ?

   void *start = sbrk(0);
   size_t size = ULONG_MAX - (unsigned long)start;

   ibv_reg_mr(pd, start, size, IB_ACCESS_ON_DEMAND)


Regards.

-- 
Yann Droneaud
OPTEYA

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: CVE-2014-8159 kernel: infiniband: uverbs: unprotected physical memory access
@ 2015-04-02 20:40                   ` Yann Droneaud
  0 siblings, 0 replies; 45+ messages in thread
From: Yann Droneaud @ 2015-04-02 20:40 UTC (permalink / raw)
  To: Shachar Raindel
  Cc: Haggai Eran, Sagi Grimberg, oss-security,
	<linux-rdma@vger.kernel.org> (linux-rdma@vger.kernel.org),
	linux-kernel, stable

Hi,

Le jeudi 02 avril 2015 à 16:44 +0000, Shachar Raindel a écrit :
> > -----Original Message-----
> > From: Yann Droneaud [mailto:ydroneaud@opteya.com]
> > Sent: Thursday, April 02, 2015 7:35 PM

> > Another related question: as the large memory range could be registered
> > by user space with ibv_reg_mr(pd, base, size, IB_ACCESS_ON_DEMAND),
> > what's prevent the kernel to map a file as the result of mmap(0, ...)
> > in this  region, making it available remotely through IBV_WR_RDMA_READ /
> > IBV_WR_RDMA_WRITE ?
> > 
> 
> This is not a bug. This is a feature.
> 
> Exposing a file through RDMA, using ODP, can be done exactly like this.
> Given that the application explicitly requested this behavior, I don't
> see why it is a problem. 

If the application cannot choose what will end up in the region it has
registered, it's an issue !

What might happen if one library in a program call mmap(0, size, ...) to
load a file storing a secret (a private key), and that file ends up 
being mapped in an registered but otherwise free region (afaict, the 
kernel is allowed to do it) ?
What might happen if one library in a program call call mmap(0, 
size, ..., MAP_ANONYMOUS,...) to allocate memory, call mlock(), then
write in this location a secret (a passphrase), and that area ends up
in the memory region registered for on demand paging ?

The application haven't choose to disclose these confidential piece of 
information, but they are available for reading/writing by remote
through RDMA given it knows the rkey of the memory region (which is a 
32bits value).

I hope I'm missing something, because I'm not feeling confident such
behavor is a feature.


> Actually, some of our tests use such flows.
> The mmu notifiers mechanism allow us to do this safely. When the page is
> written back to disk, it is removed from the ODP mapping. When it is
> accessed by the HCA, it is brought back to RAM.
> 

I'm not discussing about the benefit of On Demand Paging and why it's a
very good feature to expose files through RDMA.

I'm trying to understand how the application can choose what is exposed
through RDMA if it registers a very large memory region for later use 
(but do not actually explicitly map something there yet): what's the
consequences ?

   void *start = sbrk(0);
   size_t size = ULONG_MAX - (unsigned long)start;

   ibv_reg_mr(pd, start, size, IB_ACCESS_ON_DEMAND)


Regards.

-- 
Yann Droneaud
OPTEYA



^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: CVE-2014-8159 kernel: infiniband: uverbs: unprotected physical memory access
  2015-04-02 20:40                   ` Yann Droneaud
  (?)
@ 2015-04-03  8:39                       ` Haggai Eran
  -1 siblings, 0 replies; 45+ messages in thread
From: Haggai Eran @ 2015-04-03  8:39 UTC (permalink / raw)
  To: Yann Droneaud, Shachar Raindel
  Cc: Sagi Grimberg,
	<linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>
	(linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org),
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	stable-u79uwXL29TY76Z2rM5mHXA

On Thursday, April 2, 2015 11:40 PM, Yann Droneaud <ydroneaud-RlY5vtjFyJ1hl2p70BpVqQ@public.gmane.orgm> wrote:
> Le jeudi 02 avril 2015 à 16:44 +0000, Shachar Raindel a écrit :
>> > -----Original Message-----
>> > From: Yann Droneaud [mailto:ydroneaud-RlY5vtjFyJ3QT0dZR+AlfA@public.gmane.org]
>> > Sent: Thursday, April 02, 2015 7:35 PM
> 
>> > Another related question: as the large memory range could be registered
>> > by user space with ibv_reg_mr(pd, base, size, IB_ACCESS_ON_DEMAND),
>> > what's prevent the kernel to map a file as the result of mmap(0, ...)
>> > in this  region, making it available remotely through IBV_WR_RDMA_READ /
>> > IBV_WR_RDMA_WRITE ?
>> >
>>
>> This is not a bug. This is a feature.
>>
>> Exposing a file through RDMA, using ODP, can be done exactly like this.
>> Given that the application explicitly requested this behavior, I don't
>> see why it is a problem.
> 
> If the application cannot choose what will end up in the region it has
> registered, it's an issue !
> 
> What might happen if one library in a program call mmap(0, size, ...) to
> load a file storing a secret (a private key), and that file ends up
> being mapped in an registered but otherwise free region (afaict, the
> kernel is allowed to do it) ?
> What might happen if one library in a program call call mmap(0,
> size, ..., MAP_ANONYMOUS,...) to allocate memory, call mlock(), then
> write in this location a secret (a passphrase), and that area ends up
> in the memory region registered for on demand paging ?
> 
> The application haven't choose to disclose these confidential piece of
> information, but they are available for reading/writing by remote
> through RDMA given it knows the rkey of the memory region (which is a
> 32bits value).
> 
> I hope I'm missing something, because I'm not feeling confident such
> behavor is a feature.

What we are aiming for is the possibility to register the entire process' address 
space for RDMA operations (if the process chooses to use this feature).
This is similar to multiple threads accessing the same address space. I'm sure 
you wouldn't be complaining about the ability of one thread to access the secret 
passphrase mmapped by another thread in your example.

> I'm trying to understand how the application can choose what is exposed
> through RDMA if it registers a very large memory region for later use
> (but do not actually explicitly map something there yet): what's the
> consequences ?
> 
>    void *start = sbrk(0);
>    size_t size = ULONG_MAX - (unsigned long)start;
> 
>    ibv_reg_mr(pd, start, size, IB_ACCESS_ON_DEMAND)

The consequences are exactly as you wrote. Just as giving a non-ODP rkey 
to a remote node allows the node to access the registered memory behind that 
rkey, giving an ODP rkey to a remote node allows that node to access the 
virtual address space behind that rkey.

Regards,
Haggai--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: CVE-2014-8159 kernel: infiniband: uverbs: unprotected physical memory access
@ 2015-04-03  8:39                       ` Haggai Eran
  0 siblings, 0 replies; 45+ messages in thread
From: Haggai Eran @ 2015-04-03  8:39 UTC (permalink / raw)
  To: Yann Droneaud, Shachar Raindel
  Cc: Sagi Grimberg,
	<linux-rdma@vger.kernel.org> (linux-rdma@vger.kernel.org),
	linux-kernel, stable

On Thursday, April 2, 2015 11:40 PM, Yann Droneaud <ydroneaud@opteya.com> wrote:
> Le jeudi 02 avril 2015 à 16:44 +0000, Shachar Raindel a écrit :
>> > -----Original Message-----
>> > From: Yann Droneaud [mailto:ydroneaud@opteya.com]
>> > Sent: Thursday, April 02, 2015 7:35 PM
> 
>> > Another related question: as the large memory range could be registered
>> > by user space with ibv_reg_mr(pd, base, size, IB_ACCESS_ON_DEMAND),
>> > what's prevent the kernel to map a file as the result of mmap(0, ...)
>> > in this  region, making it available remotely through IBV_WR_RDMA_READ /
>> > IBV_WR_RDMA_WRITE ?
>> >
>>
>> This is not a bug. This is a feature.
>>
>> Exposing a file through RDMA, using ODP, can be done exactly like this.
>> Given that the application explicitly requested this behavior, I don't
>> see why it is a problem.
> 
> If the application cannot choose what will end up in the region it has
> registered, it's an issue !
> 
> What might happen if one library in a program call mmap(0, size, ...) to
> load a file storing a secret (a private key), and that file ends up
> being mapped in an registered but otherwise free region (afaict, the
> kernel is allowed to do it) ?
> What might happen if one library in a program call call mmap(0,
> size, ..., MAP_ANONYMOUS,...) to allocate memory, call mlock(), then
> write in this location a secret (a passphrase), and that area ends up
> in the memory region registered for on demand paging ?
> 
> The application haven't choose to disclose these confidential piece of
> information, but they are available for reading/writing by remote
> through RDMA given it knows the rkey of the memory region (which is a
> 32bits value).
> 
> I hope I'm missing something, because I'm not feeling confident such
> behavor is a feature.

What we are aiming for is the possibility to register the entire process' address 
space for RDMA operations (if the process chooses to use this feature).
This is similar to multiple threads accessing the same address space. I'm sure 
you wouldn't be complaining about the ability of one thread to access the secret 
passphrase mmapped by another thread in your example.

> I'm trying to understand how the application can choose what is exposed
> through RDMA if it registers a very large memory region for later use
> (but do not actually explicitly map something there yet): what's the
> consequences ?
> 
>    void *start = sbrk(0);
>    size_t size = ULONG_MAX - (unsigned long)start;
> 
>    ibv_reg_mr(pd, start, size, IB_ACCESS_ON_DEMAND)

The consequences are exactly as you wrote. Just as giving a non-ODP rkey 
to a remote node allows the node to access the registered memory behind that 
rkey, giving an ODP rkey to a remote node allows that node to access the 
virtual address space behind that rkey.

Regards,
Haggai

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: CVE-2014-8159 kernel: infiniband: uverbs: unprotected physical memory access
@ 2015-04-03  8:39                       ` Haggai Eran
  0 siblings, 0 replies; 45+ messages in thread
From: Haggai Eran @ 2015-04-03  8:39 UTC (permalink / raw)
  To: Yann Droneaud, Shachar Raindel
  Cc: Sagi Grimberg,
	<linux-rdma@vger.kernel.org> (linux-rdma@vger.kernel.org),
	linux-kernel, stable

On Thursday, April 2, 2015 11:40 PM, Yann Droneaud <ydroneaud@opteya.com> wrote:
> Le jeudi 02 avril 2015 � 16:44 +0000, Shachar Raindel a �crit :
>> > -----Original Message-----
>> > From: Yann Droneaud [mailto:ydroneaud@opteya.com]
>> > Sent: Thursday, April 02, 2015 7:35 PM
> 
>> > Another related question: as the large memory range could be registered
>> > by user space with ibv_reg_mr(pd, base, size, IB_ACCESS_ON_DEMAND),
>> > what's prevent the kernel to map a file as the result of mmap(0, ...)
>> > in this  region, making it available remotely through IBV_WR_RDMA_READ /
>> > IBV_WR_RDMA_WRITE ?
>> >
>>
>> This is not a bug. This is a feature.
>>
>> Exposing a file through RDMA, using ODP, can be done exactly like this.
>> Given that the application explicitly requested this behavior, I don't
>> see why it is a problem.
> 
> If the application cannot choose what will end up in the region it has
> registered, it's an issue !
> 
> What might happen if one library in a program call mmap(0, size, ...) to
> load a file storing a secret (a private key), and that file ends up
> being mapped in an registered but otherwise free region (afaict, the
> kernel is allowed to do it) ?
> What might happen if one library in a program call call mmap(0,
> size, ..., MAP_ANONYMOUS,...) to allocate memory, call mlock(), then
> write in this location a secret (a passphrase), and that area ends up
> in the memory region registered for on demand paging ?
> 
> The application haven't choose to disclose these confidential piece of
> information, but they are available for reading/writing by remote
> through RDMA given it knows the rkey of the memory region (which is a
> 32bits value).
> 
> I hope I'm missing something, because I'm not feeling confident such
> behavor is a feature.

What we are aiming for is the possibility to register the entire process' address 
space for RDMA operations (if the process chooses to use this feature).
This is similar to multiple threads accessing the same address space. I'm sure 
you wouldn't be complaining about the ability of one thread to access the secret 
passphrase mmapped by another thread in your example.

> I'm trying to understand how the application can choose what is exposed
> through RDMA if it registers a very large memory region for later use
> (but do not actually explicitly map something there yet): what's the
> consequences ?
> 
>    void *start = sbrk(0);
>    size_t size = ULONG_MAX - (unsigned long)start;
> 
>    ibv_reg_mr(pd, start, size, IB_ACCESS_ON_DEMAND)

The consequences are exactly as you wrote. Just as giving a non-ODP rkey 
to a remote node allows the node to access the registered memory behind that 
rkey, giving an ODP rkey to a remote node allows that node to access the 
virtual address space behind that rkey.

Regards,
Haggai

^ permalink raw reply	[flat|nested] 45+ messages in thread

* RE: [oss-security] RE: CVE-2014-8159 kernel: infiniband: uverbs: unprotected physical memory access
       [not found]                 ` <20150402174401.GA24285@nautica>
@ 2015-04-03 11:49                   ` Shachar Raindel
       [not found]                     ` <AM3PR05MB0935F9F1011E30F11C2A08BDDCF10-LOZWmgKjnYgQouBfZGh8ttqRiQSDpxhJvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
  0 siblings, 1 reply; 45+ messages in thread
From: Shachar Raindel @ 2015-04-03 11:49 UTC (permalink / raw)
  To: Dominique Martinet,
	<linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>
	(linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org)

Hi Dominique,

> -----Original Message-----
> From: Dominique Martinet [mailto:dominique.martinet@cea.fr]
> Sent: Thursday, April 02, 2015 8:44 PM
> To: Shachar Raindel
> Subject: Re: [oss-security] RE: CVE-2014-8159 kernel: infiniband:
> uverbs: unprotected physical memory access
> 
> Hi,
> 
> Shachar Raindel wrote on Thu, Apr 02, 2015:
> > From: Yann Droneaud [mailto:ydroneaud@opteya.com]
> > > Another related question: as the large memory range could be
> registered
> > > by user space with ibv_reg_mr(pd, base, size, IB_ACCESS_ON_DEMAND),
> > > what's prevent the kernel to map a file as the result of mmap(0,
> ...)
> > > in this  region, making it available remotely through
> IBV_WR_RDMA_READ /
> > > IBV_WR_RDMA_WRITE ?
> > >
> >
> > This is not a bug. This is a feature.
> >
> > Exposing a file through RDMA, using ODP, can be done exactly like
> this.
> > Given that the application explicitly requested this behavior, I don't
> > see why it is a problem. Actually, some of our tests use such flows.
> > The mmu notifiers mechanism allow us to do this safely. When the page
> is
> > written back to disk, it is removed from the ODP mapping. When it is
> > accessed by the HCA, it is brought back to RAM.
> 
> Forgive the private reply, but I've actually tried that a while ago and

No problem, better than e-mailing the entire world on RDMA specific topics.
As this discussion is relevant to the Linux RDMA users, I'm adding back
the linux-rdma mailing list.

> couldn't get it to work - ibv_reg_mr would return EINVAL on an address
> obtained by mmap.

Were you mmaping a normal disk file, or was the mmap targeting an MMIO of
another hardware device? mmap of a normal disk file should work also with
normal memory registration, assuming you are providing the proper length.

mmap of the MMIO area of another hardware device (i.e. interfacing an FPGA,
NVRAM, or similar things) requires some code changes on both sides. The
current kernel code in the infiniband side is using get_user_pages, which
does not support MMIO pages. The proposed PeerDirect patches [1] allows peer
device to declare ownership of virtual address ranges, and enable such
registration. However, these patches are have not yet been merged upstream.

> 
> Conceptually as well I'm not sure how it's supposed to work, mmap should
> only actually issue reads when memory access issues page faults (give or
> take suitable readahead logic), but I don't think direct memory access
> from the IB/RDMA adapter would issue such page faults ?

You are correct. RDMA adapters without ODP support do not issue page faults.
Instead, during memory registration, the ib_umem code calls get_user_pages,
which ensures all relevant pages are in memory, and pins them as needed.

> Likewise on writes, would need the kernel to notice memory has been
> written and pages are dirty/needs flushing.
> 

Similarly, when deregistering a writable memory region, the kernel driver
marks all pages as dirty before unpinning them. You can see the code doing
this in [2].

> So, what am I missing? I'd wager it's that "ODP" you're talking about,
> do you have any documentation I could skim through ?
> 

Liran Liss gave a presentation about ODP at OFA [3]. The technology is
available for ConnectIB devices using the most recent firmware and kernel
versions above 3.19.

Thanks,
--Shachar

[1] http://www.spinics.net/lists/linux-rdma/msg21770.html
[2] http://lxr.free-electrons.com/source/drivers/infiniband/core/umem.c#L62
[3] https://www.openfabrics.org/images/Workshops_2014/DevWorkshop/presos/Tuesday/pdf/09.30_2014_OFA_Workshop_ODP_update_final.pdf and https://www.youtube.com/watch?v=KbrlsXQbHCw



^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: CVE-2014-8159 kernel: infiniband: uverbs: unprotected physical memory access
  2015-04-03  8:39                       ` Haggai Eran
@ 2015-04-03 11:49                         ` Yann Droneaud
  -1 siblings, 0 replies; 45+ messages in thread
From: Yann Droneaud @ 2015-04-03 11:49 UTC (permalink / raw)
  To: Haggai Eran
  Cc: Shachar Raindel, Sagi Grimberg,
	<linux-rdma@vger.kernel.org> (linux-rdma@vger.kernel.org),
	linux-kernel, stable

Hi,

Le vendredi 03 avril 2015 à 08:39 +0000, Haggai Eran a écrit :
> On Thursday, April 2, 2015 11:40 PM, Yann Droneaud <ydroneaud@opteya.com> wrote:
> > Le jeudi 02 avril 2015 à 16:44 +0000, Shachar Raindel a écrit :
> >> > -----Original Message-----
> >> > From: Yann Droneaud [mailto:ydroneaud@opteya.com]
> >> > Sent: Thursday, April 02, 2015 7:35 PM
> > 
> >> > Another related question: as the large memory range could be registered
> >> > by user space with ibv_reg_mr(pd, base, size, IB_ACCESS_ON_DEMAND),
> >> > what's prevent the kernel to map a file as the result of mmap(0, ...)
> >> > in this  region, making it available remotely through IBV_WR_RDMA_READ /
> >> > IBV_WR_RDMA_WRITE ?
> >> >
> >>
> >> This is not a bug. This is a feature.
> >>
> >> Exposing a file through RDMA, using ODP, can be done exactly like this.
> >> Given that the application explicitly requested this behavior, I don't
> >> see why it is a problem.
> > 
> > If the application cannot choose what will end up in the region it has
> > registered, it's an issue !
> > 
> > What might happen if one library in a program call mmap(0, size, ...) to
> > load a file storing a secret (a private key), and that file ends up
> > being mapped in an registered but otherwise free region (afaict, the
> > kernel is allowed to do it) ?
> > What might happen if one library in a program call call mmap(0,
> > size, ..., MAP_ANONYMOUS,...) to allocate memory, call mlock(), then
> > write in this location a secret (a passphrase), and that area ends up
> > in the memory region registered for on demand paging ?
> > 
> > The application haven't choose to disclose these confidential piece of
> > information, but they are available for reading/writing by remote
> > through RDMA given it knows the rkey of the memory region (which is a
> > 32bits value).
> > 
> > I hope I'm missing something, because I'm not feeling confident such
> > behavor is a feature.
> 
> What we are aiming for is the possibility to register the entire process' address 
> space for RDMA operations (if the process chooses to use this feature).
> This is similar to multiple threads accessing the same address space. I'm sure 
> you wouldn't be complaining about the ability of one thread to access the secret 
> passphrase mmapped by another thread in your example.
> 
> > I'm trying to understand how the application can choose what is exposed
> > through RDMA if it registers a very large memory region for later use
> > (but do not actually explicitly map something there yet): what's the
> > consequences ?
> > 
> >    void *start = sbrk(0);
> >    size_t size = ULONG_MAX - (unsigned long)start;
> > 
> >    ibv_reg_mr(pd, start, size, IB_ACCESS_ON_DEMAND)
> 
> The consequences are exactly as you wrote. Just as giving a non-ODP rkey 
> to a remote node allows the node to access the registered memory behind that 
> rkey, giving an ODP rkey to a remote node allows that node to access the 
> virtual address space behind that rkey.
> 

There's a difference: it's impossible to give a valid non-ODP rkey that
point to a memory region not already mapped (backed by a file for 
example), so the application *choose* the content of the memory to be
made accessible remotely before making it accessible.

As I understand the last explanation regarding ODP, at creation time,
an ODP rkey can point to a free, unused, unallocated memory portion.
At this point the kernel can happily map anything the application
(and its libraries) want to map at a (almost) *random* address that
could be in (or partially in) the ODP memory region.

And I have a problem with such random behavior. Allowing this is seems
dangerous and should be done with care.

I believe the application must kept the control of what's end up in its 
ODP registered memory region.

Especially for multi thread program: imagine one thread creating a large
memory region for its future purposes, then send the rkey to a remote 
peer and wait for some work to be done.
In the mean time another call mmap(0, ...) to map a file at a kernel 
chosen address, and that address happen to be in the memory region 
registered by the other thread:

1) the first thread is amputated from a portion of memory it was 
willing to use;
2) the data used by the second thread is accessible to the remote 
peer(s) while not expected.

Speculatively registering memory seems dangerous for any use case I
could think of.

Regards.

-- 
Yann Droneaud
OPTEYA

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: CVE-2014-8159 kernel: infiniband: uverbs: unprotected physical memory access
@ 2015-04-03 11:49                         ` Yann Droneaud
  0 siblings, 0 replies; 45+ messages in thread
From: Yann Droneaud @ 2015-04-03 11:49 UTC (permalink / raw)
  To: Haggai Eran
  Cc: Shachar Raindel, Sagi Grimberg,
	<linux-rdma@vger.kernel.org> (linux-rdma@vger.kernel.org),
	linux-kernel, stable

Hi,

Le vendredi 03 avril 2015 à 08:39 +0000, Haggai Eran a écrit :
> On Thursday, April 2, 2015 11:40 PM, Yann Droneaud <ydroneaud@opteya.com> wrote:
> > Le jeudi 02 avril 2015 à 16:44 +0000, Shachar Raindel a écrit :
> >> > -----Original Message-----
> >> > From: Yann Droneaud [mailto:ydroneaud@opteya.com]
> >> > Sent: Thursday, April 02, 2015 7:35 PM
> > 
> >> > Another related question: as the large memory range could be registered
> >> > by user space with ibv_reg_mr(pd, base, size, IB_ACCESS_ON_DEMAND),
> >> > what's prevent the kernel to map a file as the result of mmap(0, ...)
> >> > in this  region, making it available remotely through IBV_WR_RDMA_READ /
> >> > IBV_WR_RDMA_WRITE ?
> >> >
> >>
> >> This is not a bug. This is a feature.
> >>
> >> Exposing a file through RDMA, using ODP, can be done exactly like this.
> >> Given that the application explicitly requested this behavior, I don't
> >> see why it is a problem.
> > 
> > If the application cannot choose what will end up in the region it has
> > registered, it's an issue !
> > 
> > What might happen if one library in a program call mmap(0, size, ...) to
> > load a file storing a secret (a private key), and that file ends up
> > being mapped in an registered but otherwise free region (afaict, the
> > kernel is allowed to do it) ?
> > What might happen if one library in a program call call mmap(0,
> > size, ..., MAP_ANONYMOUS,...) to allocate memory, call mlock(), then
> > write in this location a secret (a passphrase), and that area ends up
> > in the memory region registered for on demand paging ?
> > 
> > The application haven't choose to disclose these confidential piece of
> > information, but they are available for reading/writing by remote
> > through RDMA given it knows the rkey of the memory region (which is a
> > 32bits value).
> > 
> > I hope I'm missing something, because I'm not feeling confident such
> > behavor is a feature.
> 
> What we are aiming for is the possibility to register the entire process' address 
> space for RDMA operations (if the process chooses to use this feature).
> This is similar to multiple threads accessing the same address space. I'm sure 
> you wouldn't be complaining about the ability of one thread to access the secret 
> passphrase mmapped by another thread in your example.
> 
> > I'm trying to understand how the application can choose what is exposed
> > through RDMA if it registers a very large memory region for later use
> > (but do not actually explicitly map something there yet): what's the
> > consequences ?
> > 
> >    void *start = sbrk(0);
> >    size_t size = ULONG_MAX - (unsigned long)start;
> > 
> >    ibv_reg_mr(pd, start, size, IB_ACCESS_ON_DEMAND)
> 
> The consequences are exactly as you wrote. Just as giving a non-ODP rkey 
> to a remote node allows the node to access the registered memory behind that 
> rkey, giving an ODP rkey to a remote node allows that node to access the 
> virtual address space behind that rkey.
> 

There's a difference: it's impossible to give a valid non-ODP rkey that
point to a memory region not already mapped (backed by a file for 
example), so the application *choose* the content of the memory to be
made accessible remotely before making it accessible.

As I understand the last explanation regarding ODP, at creation time,
an ODP rkey can point to a free, unused, unallocated memory portion.
At this point the kernel can happily map anything the application
(and its libraries) want to map at a (almost) *random* address that
could be in (or partially in) the ODP memory region.

And I have a problem with such random behavior. Allowing this is seems
dangerous and should be done with care.

I believe the application must kept the control of what's end up in its 
ODP registered memory region.

Especially for multi thread program: imagine one thread creating a large
memory region for its future purposes, then send the rkey to a remote 
peer and wait for some work to be done.
In the mean time another call mmap(0, ...) to map a file at a kernel 
chosen address, and that address happen to be in the memory region 
registered by the other thread:

1) the first thread is amputated from a portion of memory it was 
willing to use;
2) the data used by the second thread is accessible to the remote 
peer(s) while not expected.

Speculatively registering memory seems dangerous for any use case I
could think of.

Regards.

-- 
Yann Droneaud
OPTEYA



^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [oss-security] RE: CVE-2014-8159 kernel: infiniband: uverbs: unprotected physical memory access
       [not found]                     ` <AM3PR05MB0935F9F1011E30F11C2A08BDDCF10-LOZWmgKjnYgQouBfZGh8ttqRiQSDpxhJvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
@ 2015-04-03 12:43                       ` Dominique Martinet
  0 siblings, 0 replies; 45+ messages in thread
From: Dominique Martinet @ 2015-04-03 12:43 UTC (permalink / raw)
  To: Shachar Raindel
  Cc: <linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>
	(linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org)

Hi,

Shachar Raindel wrote on Fri, Apr 03, 2015 at 11:49:13AM +0000:
> > couldn't get it to work - ibv_reg_mr would return EINVAL on an address
> > obtained by mmap.
> 
> Were you mmaping a normal disk file, or was the mmap targeting an MMIO of
> another hardware device? mmap of a normal disk file should work also with
> normal memory registration, assuming you are providing the proper length.

On a proper file.
It was a year or two ago, I actually tried again now and it does seem
to work alright even on older kernels... I wonder what I did wrong back
then!

> mmap of the MMIO area of another hardware device (i.e. interfacing an FPGA,
> NVRAM, or similar things) requires some code changes on both sides. The
> current kernel code in the infiniband side is using get_user_pages, which
> does not support MMIO pages. The proposed PeerDirect patches [1] allows peer
> device to declare ownership of virtual address ranges, and enable such
> registration. However, these patches are have not yet been merged upstream.

Interesting, I don't need this right now but it's good to know.

> > Conceptually as well I'm not sure how it's supposed to work, mmap should
> > only actually issue reads when memory access issues page faults (give or
> > take suitable readahead logic), but I don't think direct memory access
> > from the IB/RDMA adapter would issue such page faults ?
> 
> You are correct. RDMA adapters without ODP support do not issue page faults.
> Instead, during memory registration, the ib_umem code calls get_user_pages,
> which ensures all relevant pages are in memory, and pins them as needed.
> 
> > Likewise on writes, would need the kernel to notice memory has been
> > written and pages are dirty/needs flushing.
> > 
> 
> Similarly, when deregistering a writable memory region, the kernel driver
> marks all pages as dirty before unpinning them. You can see the code doing
> this in [2].

Ok this makes sense, thanks for clearing it up.

> Liran Liss gave a presentation about ODP at OFA [3]. The technology is
> available for ConnectIB devices using the most recent firmware and kernel
> versions above 3.19.

I don't have any recent kernel around, but I'll give it a shot next
week.
(I'm working on a userspace file server, nfs-ganesha, so being able to
mmap a file and use it directly for I/O is very promising for me!)


> [1] http://www.spinics.net/lists/linux-rdma/msg21770.html
> [2] http://lxr.free-electrons.com/source/drivers/infiniband/core/umem.c#L62
> [3] https://www.openfabrics.org/images/Workshops_2014/DevWorkshop/presos/Tuesday/pdf/09.30_2014_OFA_Workshop_ODP_update_final.pdf and https://www.youtube.com/watch?v=KbrlsXQbHCw


Thanks for the detailed reply, I'll dig a bit further and come back
straight to the list if need to.

-- 
Dominique Martinet
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: CVE-2014-8159 kernel: infiniband: uverbs: unprotected physical memory access
  2015-04-02 16:34           ` Shachar Raindel
  (?)
@ 2015-04-08 12:19               ` Yann Droneaud
  -1 siblings, 0 replies; 45+ messages in thread
From: Yann Droneaud @ 2015-04-08 12:19 UTC (permalink / raw)
  To: Shachar Raindel
  Cc: oss-security-ZwoEplunGu1jrUoiu81ncdBPR1lH4CV8,
	<linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>
	(linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org),
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	stable-u79uwXL29TY76Z2rM5mHXA

Hi,

Le jeudi 02 avril 2015 à 16:34 +0000, Shachar Raindel a écrit :
> > -----Original Message-----
> > From: Yann Droneaud [mailto:ydroneaud-RlY5vtjFyJ3QT0dZR+AlfA@public.gmane.org]
> > Sent: Thursday, April 02, 2015 6:16 PM
> > Le jeudi 02 avril 2015 à 10:52 +0000, Shachar Raindel a écrit :
> > > > -----Original Message-----
> > > > From: Yann Droneaud [mailto:ydroneaud-RlY5vtjFyJ3QT0dZR+AlfA@public.gmane.org]
> > > > Sent: Thursday, April 02, 2015 1:05 PM
> > > > Le mercredi 18 mars 2015 à 17:39 +0000, Shachar Raindel a écrit :
> > ...
> > > > > +	/*
> > > > > +	 * If the combination of the addr and size requested for this
> > > > memory
> > > > > +	 * region causes an integer overflow, return error.
> > > > > +	 */
> > > > > +	if ((PAGE_ALIGN(addr + size) <= size) ||
> > > > > +	    (PAGE_ALIGN(addr + size) <= addr))
> > > > > +		return ERR_PTR(-EINVAL);
> > > > > +
> > > >
> > > > Can access_ok() be used here ?
> > > >
> > > >          if (!access_ok(writable ? VERIFY_WRITE : VERIFY_READ,
> > > >                         addr, size))
> > > >                   return ERR_PTR(-EINVAL);
> > > >
> > >
> > > No, this will break the current ODP semantics.
> > >
> > > ODP allows the user to register memory that is not accessible yet.
> > > This is a critical design feature, as it allows avoiding holding
> > > a registration cache. Adding this check will break the behavior,
> > > forcing memory to be all accessible when registering an ODP MR.
> > >
> > 
> > Failed to notice previously, but since this would break ODP, and ODP is
> > only available starting v3.19-rc1, my proposed fix might be applicable
> > for older kernel (if not better).
> > 
> 
> Can you explain how this proposed fix is better than the existing patch?
> Why do we want to push to the stable tree a patch that is not in the
> upstream? There is an existing, tested, patch that is going to the tip
> of the development. It even applies cleanly on every kernel version around.
> 

access_ok() check for overflow *and* that the region is the memory range
for the current process. The later check is not done in your proposed 
fix (but it should not be needed as get_user_pages() will be called 
to validate the whole region for non-ODP memory registration).

Anyway, AFAIK access_ok() won't check for address being not NULL and
size not being 0, and I've noticed your proposed fix also ensure address
is not equal to NULL and, more important, that size is not equal to 0:
before v3.15-rc1 and commit eeb8461e36c9 ("IB: Refactor umem to use
linear SG table"), calling ib_umem_get() with size equal to 0 would 
succeed with any arbitrary address ... who knows what might happen in 
the lowlevel drivers (aka. providers) if they got an umem for a 0-sized
memory region.
This part of the changes was not detailled in your commit message: it's
an issue not related to overflow which is addressed by your patch.

So I agree my proposed patch is no better than yours: I've missed the
0-sized memory region issue and didn't take care of NULL address.

Regards.

-- 
Yann Droneaud
OPTEYA


--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: CVE-2014-8159 kernel: infiniband: uverbs: unprotected physical memory access
@ 2015-04-08 12:19               ` Yann Droneaud
  0 siblings, 0 replies; 45+ messages in thread
From: Yann Droneaud @ 2015-04-08 12:19 UTC (permalink / raw)
  To: Shachar Raindel
  Cc: oss-security,
	<linux-rdma@vger.kernel.org> (linux-rdma@vger.kernel.org),
	linux-kernel, stable

Hi,

Le jeudi 02 avril 2015 à 16:34 +0000, Shachar Raindel a écrit :
> > -----Original Message-----
> > From: Yann Droneaud [mailto:ydroneaud@opteya.com]
> > Sent: Thursday, April 02, 2015 6:16 PM
> > Le jeudi 02 avril 2015 à 10:52 +0000, Shachar Raindel a écrit :
> > > > -----Original Message-----
> > > > From: Yann Droneaud [mailto:ydroneaud@opteya.com]
> > > > Sent: Thursday, April 02, 2015 1:05 PM
> > > > Le mercredi 18 mars 2015 à 17:39 +0000, Shachar Raindel a écrit :
> > ...
> > > > > +	/*
> > > > > +	 * If the combination of the addr and size requested for this
> > > > memory
> > > > > +	 * region causes an integer overflow, return error.
> > > > > +	 */
> > > > > +	if ((PAGE_ALIGN(addr + size) <= size) ||
> > > > > +	    (PAGE_ALIGN(addr + size) <= addr))
> > > > > +		return ERR_PTR(-EINVAL);
> > > > > +
> > > >
> > > > Can access_ok() be used here ?
> > > >
> > > >          if (!access_ok(writable ? VERIFY_WRITE : VERIFY_READ,
> > > >                         addr, size))
> > > >                   return ERR_PTR(-EINVAL);
> > > >
> > >
> > > No, this will break the current ODP semantics.
> > >
> > > ODP allows the user to register memory that is not accessible yet.
> > > This is a critical design feature, as it allows avoiding holding
> > > a registration cache. Adding this check will break the behavior,
> > > forcing memory to be all accessible when registering an ODP MR.
> > >
> > 
> > Failed to notice previously, but since this would break ODP, and ODP is
> > only available starting v3.19-rc1, my proposed fix might be applicable
> > for older kernel (if not better).
> > 
> 
> Can you explain how this proposed fix is better than the existing patch?
> Why do we want to push to the stable tree a patch that is not in the
> upstream? There is an existing, tested, patch that is going to the tip
> of the development. It even applies cleanly on every kernel version around.
> 

access_ok() check for overflow *and* that the region is the memory range
for the current process. The later check is not done in your proposed 
fix (but it should not be needed as get_user_pages() will be called 
to validate the whole region for non-ODP memory registration).

Anyway, AFAIK access_ok() won't check for address being not NULL and
size not being 0, and I've noticed your proposed fix also ensure address
is not equal to NULL and, more important, that size is not equal to 0:
before v3.15-rc1 and commit eeb8461e36c9 ("IB: Refactor umem to use
linear SG table"), calling ib_umem_get() with size equal to 0 would 
succeed with any arbitrary address ... who knows what might happen in 
the lowlevel drivers (aka. providers) if they got an umem for a 0-sized
memory region.
This part of the changes was not detailled in your commit message: it's
an issue not related to overflow which is addressed by your patch.

So I agree my proposed patch is no better than yours: I've missed the
0-sized memory region issue and didn't take care of NULL address.

Regards.

-- 
Yann Droneaud
OPTEYA



^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: CVE-2014-8159 kernel: infiniband: uverbs: unprotected physical memory access
@ 2015-04-08 12:19               ` Yann Droneaud
  0 siblings, 0 replies; 45+ messages in thread
From: Yann Droneaud @ 2015-04-08 12:19 UTC (permalink / raw)
  To: Shachar Raindel
  Cc: oss-security,
	<linux-rdma@vger.kernel.org> (linux-rdma@vger.kernel.org),
	linux-kernel, stable

Hi,

Le jeudi 02 avril 2015 à 16:34 +0000, Shachar Raindel a écrit :
> > -----Original Message-----
> > From: Yann Droneaud [mailto:ydroneaud@opteya.com]
> > Sent: Thursday, April 02, 2015 6:16 PM
> > Le jeudi 02 avril 2015 à 10:52 +0000, Shachar Raindel a écrit :
> > > > -----Original Message-----
> > > > From: Yann Droneaud [mailto:ydroneaud@opteya.com]
> > > > Sent: Thursday, April 02, 2015 1:05 PM
> > > > Le mercredi 18 mars 2015 à 17:39 +0000, Shachar Raindel a écrit :
> > ...
> > > > > +	/*
> > > > > +	 * If the combination of the addr and size requested for this
> > > > memory
> > > > > +	 * region causes an integer overflow, return error.
> > > > > +	 */
> > > > > +	if ((PAGE_ALIGN(addr + size) <= size) ||
> > > > > +	    (PAGE_ALIGN(addr + size) <= addr))
> > > > > +		return ERR_PTR(-EINVAL);
> > > > > +
> > > >
> > > > Can access_ok() be used here ?
> > > >
> > > >          if (!access_ok(writable ? VERIFY_WRITE : VERIFY_READ,
> > > >                         addr, size))
> > > >                   return ERR_PTR(-EINVAL);
> > > >
> > >
> > > No, this will break the current ODP semantics.
> > >
> > > ODP allows the user to register memory that is not accessible yet.
> > > This is a critical design feature, as it allows avoiding holding
> > > a registration cache. Adding this check will break the behavior,
> > > forcing memory to be all accessible when registering an ODP MR.
> > >
> > 
> > Failed to notice previously, but since this would break ODP, and ODP is
> > only available starting v3.19-rc1, my proposed fix might be applicable
> > for older kernel (if not better).
> > 
> 
> Can you explain how this proposed fix is better than the existing patch?
> Why do we want to push to the stable tree a patch that is not in the
> upstream? There is an existing, tested, patch that is going to the tip
> of the development. It even applies cleanly on every kernel version around.
> 

access_ok() check for overflow *and* that the region is the memory range
for the current process. The later check is not done in your proposed 
fix (but it should not be needed as get_user_pages() will be called 
to validate the whole region for non-ODP memory registration).

Anyway, AFAIK access_ok() won't check for address being not NULL and
size not being 0, and I've noticed your proposed fix also ensure address
is not equal to NULL and, more important, that size is not equal to 0:
before v3.15-rc1 and commit eeb8461e36c9 ("IB: Refactor umem to use
linear SG table"), calling ib_umem_get() with size equal to 0 would 
succeed with any arbitrary address ... who knows what might happen in 
the lowlevel drivers (aka. providers) if they got an umem for a 0-sized
memory region.
This part of the changes was not detailled in your commit message: it's
an issue not related to overflow which is addressed by your patch.

So I agree my proposed patch is no better than yours: I've missed the
0-sized memory region issue and didn't take care of NULL address.

Regards.

-- 
Yann Droneaud
OPTEYA



^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: CVE-2014-8159 kernel: infiniband: uverbs: unprotected physical memory access
  2015-04-08 12:19               ` Yann Droneaud
@ 2015-04-08 12:44                 ` Yann Droneaud
  -1 siblings, 0 replies; 45+ messages in thread
From: Yann Droneaud @ 2015-04-08 12:44 UTC (permalink / raw)
  To: Shachar Raindel
  Cc: oss-security,
	<linux-rdma@vger.kernel.org> (linux-rdma@vger.kernel.org),
	linux-kernel, stable

Hi,

Le mercredi 08 avril 2015 à 14:19 +0200, Yann Droneaud a écrit :
> Le jeudi 02 avril 2015 à 16:34 +0000, Shachar Raindel a écrit :
> > > -----Original Message-----
> > > From: Yann Droneaud [mailto:ydroneaud@opteya.com]
> > > Sent: Thursday, April 02, 2015 6:16 PM
> > > Le jeudi 02 avril 2015 à 10:52 +0000, Shachar Raindel a écrit :
> > > > > -----Original Message-----
> > > > > From: Yann Droneaud [mailto:ydroneaud@opteya.com]
> > > > > Sent: Thursday, April 02, 2015 1:05 PM
> > > > > Le mercredi 18 mars 2015 à 17:39 +0000, Shachar Raindel a écrit :
> > > ...
> > > > > > +	/*
> > > > > > +	 * If the combination of the addr and size requested for this
> > > > > memory
> > > > > > +	 * region causes an integer overflow, return error.
> > > > > > +	 */
> > > > > > +	if ((PAGE_ALIGN(addr + size) <= size) ||
> > > > > > +	    (PAGE_ALIGN(addr + size) <= addr))
> > > > > > +		return ERR_PTR(-EINVAL);
> > > > > > +
> > > > >
> > > > > Can access_ok() be used here ?
> > > > >
> > > > >          if (!access_ok(writable ? VERIFY_WRITE : VERIFY_READ,
> > > > >                         addr, size))
> > > > >                   return ERR_PTR(-EINVAL);
> > > > >
> > > >
> > > > No, this will break the current ODP semantics.
> > > >
> > > > ODP allows the user to register memory that is not accessible yet.
> > > > This is a critical design feature, as it allows avoiding holding
> > > > a registration cache. Adding this check will break the behavior,
> > > > forcing memory to be all accessible when registering an ODP MR.
> > > >
> > > 
> > > Failed to notice previously, but since this would break ODP, and ODP is
> > > only available starting v3.19-rc1, my proposed fix might be applicable
> > > for older kernel (if not better).
> > > 
> > 
> > Can you explain how this proposed fix is better than the existing patch?
> > Why do we want to push to the stable tree a patch that is not in the
> > upstream? There is an existing, tested, patch that is going to the tip
> > of the development. It even applies cleanly on every kernel version around.
> > 
> 
> access_ok() check for overflow *and* that the region is the memory range
> for the current process. The later check is not done in your proposed 
> fix (but it should not be needed as get_user_pages() will be called 
> to validate the whole region for non-ODP memory registration).
> 
> Anyway, AFAIK access_ok() won't check for address being not NULL and
> size not being 0, and I've noticed your proposed fix also ensure address
> is not equal to NULL and, more important, that size is not equal to 0

It only check address not being 0 if size is already PAGE_SIZE aligned,
and it only check size not being 0 if address is already PAGE_SIZE
aligned.

> before v3.15-rc1 and commit eeb8461e36c9 ("IB: Refactor umem to use
> linear SG table"), calling ib_umem_get() with size equal to 0 would 
> succeed with any arbitrary address ... who knows what might happen in 
> the lowlevel drivers (aka. providers) if they got an umem for a 0-sized
> memory region.
> This part of the changes was not detailled in your commit message: it's
> an issue not related to overflow which is addressed by your patch.
> 
> So I agree my proposed patch is no better than yours: I've missed the
> 0-sized memory region issue and didn't take care of NULL address.
> 

Regards.

-- 
Yann Droneaud
OPTEYA

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: CVE-2014-8159 kernel: infiniband: uverbs: unprotected physical memory access
@ 2015-04-08 12:44                 ` Yann Droneaud
  0 siblings, 0 replies; 45+ messages in thread
From: Yann Droneaud @ 2015-04-08 12:44 UTC (permalink / raw)
  To: Shachar Raindel
  Cc: oss-security,
	<linux-rdma@vger.kernel.org> (linux-rdma@vger.kernel.org),
	linux-kernel, stable

Hi,

Le mercredi 08 avril 2015 à 14:19 +0200, Yann Droneaud a écrit :
> Le jeudi 02 avril 2015 à 16:34 +0000, Shachar Raindel a écrit :
> > > -----Original Message-----
> > > From: Yann Droneaud [mailto:ydroneaud@opteya.com]
> > > Sent: Thursday, April 02, 2015 6:16 PM
> > > Le jeudi 02 avril 2015 à 10:52 +0000, Shachar Raindel a écrit :
> > > > > -----Original Message-----
> > > > > From: Yann Droneaud [mailto:ydroneaud@opteya.com]
> > > > > Sent: Thursday, April 02, 2015 1:05 PM
> > > > > Le mercredi 18 mars 2015 à 17:39 +0000, Shachar Raindel a écrit :
> > > ...
> > > > > > +	/*
> > > > > > +	 * If the combination of the addr and size requested for this
> > > > > memory
> > > > > > +	 * region causes an integer overflow, return error.
> > > > > > +	 */
> > > > > > +	if ((PAGE_ALIGN(addr + size) <= size) ||
> > > > > > +	    (PAGE_ALIGN(addr + size) <= addr))
> > > > > > +		return ERR_PTR(-EINVAL);
> > > > > > +
> > > > >
> > > > > Can access_ok() be used here ?
> > > > >
> > > > >          if (!access_ok(writable ? VERIFY_WRITE : VERIFY_READ,
> > > > >                         addr, size))
> > > > >                   return ERR_PTR(-EINVAL);
> > > > >
> > > >
> > > > No, this will break the current ODP semantics.
> > > >
> > > > ODP allows the user to register memory that is not accessible yet.
> > > > This is a critical design feature, as it allows avoiding holding
> > > > a registration cache. Adding this check will break the behavior,
> > > > forcing memory to be all accessible when registering an ODP MR.
> > > >
> > > 
> > > Failed to notice previously, but since this would break ODP, and ODP is
> > > only available starting v3.19-rc1, my proposed fix might be applicable
> > > for older kernel (if not better).
> > > 
> > 
> > Can you explain how this proposed fix is better than the existing patch?
> > Why do we want to push to the stable tree a patch that is not in the
> > upstream? There is an existing, tested, patch that is going to the tip
> > of the development. It even applies cleanly on every kernel version around.
> > 
> 
> access_ok() check for overflow *and* that the region is the memory range
> for the current process. The later check is not done in your proposed 
> fix (but it should not be needed as get_user_pages() will be called 
> to validate the whole region for non-ODP memory registration).
> 
> Anyway, AFAIK access_ok() won't check for address being not NULL and
> size not being 0, and I've noticed your proposed fix also ensure address
> is not equal to NULL and, more important, that size is not equal to 0

It only check address not being 0 if size is already PAGE_SIZE aligned,
and it only check size not being 0 if address is already PAGE_SIZE
aligned.

> before v3.15-rc1 and commit eeb8461e36c9 ("IB: Refactor umem to use
> linear SG table"), calling ib_umem_get() with size equal to 0 would 
> succeed with any arbitrary address ... who knows what might happen in 
> the lowlevel drivers (aka. providers) if they got an umem for a 0-sized
> memory region.
> This part of the changes was not detailled in your commit message: it's
> an issue not related to overflow which is addressed by your patch.
> 
> So I agree my proposed patch is no better than yours: I've missed the
> 0-sized memory region issue and didn't take care of NULL address.
> 

Regards.

-- 
Yann Droneaud
OPTEYA



^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: CVE-2014-8159 kernel: infiniband: uverbs: unprotected physical memory access
  2015-04-02 18:12                     ` Haggai Eran
@ 2015-04-13 13:29                         ` Yann Droneaud
  -1 siblings, 0 replies; 45+ messages in thread
From: Yann Droneaud @ 2015-04-13 13:29 UTC (permalink / raw)
  To: Haggai Eran
  Cc: Shachar Raindel, Sagi Grimberg,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA

Hi,

Le jeudi 02 avril 2015 à 18:12 +0000, Haggai Eran a écrit :
> On Thursday, April 2, 2015 7:44 PM, Shachar Raindel wrote:
> >> -----Original Message-----
> >> From: Yann Droneaud [mailto:ydroneaud-RlY5vtjFyJ3QT0dZR+AlfA@public.gmane.org]
> >> Le jeudi 02 avril 2015 à 18:18 +0300, Haggai Eran a écrit :
> >> > On 02/04/2015 16:30, Yann Droneaud wrote:
> >> >> Le jeudi 02 avril 2015 à 10:52 +0000, Shachar Raindel a écrit :
> >> >>>> -----Original Message-----
> >> >>>> From: Yann Droneaud [mailto:ydroneaud-RlY5vtjFyJ3QT0dZR+AlfA@public.gmane.org]
> >> >>>> Sent: Thursday, April 02, 2015 1:05 PM
> >> >>>> Le mercredi 18 mars 2015 à 17:39 +0000, Shachar Raindel a écrit :
> >> >>
> >> >>>>> +      /*
> >> >>>>> +       * If the combination of the addr and size requested for this
> >> >>>> memory
> >> >>>>> +       * region causes an integer overflow, return error.
> >> >>>>> +       */
> >> >>>>> +      if ((PAGE_ALIGN(addr + size) <= size) ||
> >> >>>>> +          (PAGE_ALIGN(addr + size) <= addr))
> >> >>>>> +              return ERR_PTR(-EINVAL);
> >> >>>>> +
> >> >>>>
> >> >>>> Can access_ok() be used here ?
> >> >>>>
> >> >>>>          if (!access_ok(writable ? VERIFY_WRITE : VERIFY_READ,
> >> >>>>                         addr, size))
> >> >>>>                   return ERR_PTR(-EINVAL);
> >> >>>>
> >> >>>
> >> >>> No, this will break the current ODP semantics.
> >> >>>
> >> >>> ODP allows the user to register memory that is not accessible yet.
> >> >>> This is a critical design feature, as it allows avoiding holding
> >> >>> a registration cache. Adding this check will break the behavior,
> >> >>> forcing memory to be all accessible when registering an ODP MR.
> >> >>>
> >> >>
> >> >> Where's the check for the range being in userspace memory space,
> >> >> especially for the ODP case ?
> >> >>
> >> >> For non ODP case (eg. plain old behavior), does get_user_pages()
> >> >> ensure the requested pages fit in userspace region on all
> >> >> architectures ? I think so.
> >> >
> >> > Yes, get_user_pages will return a smaller amount of pages than
> >> requested
> >> > if it encounters an unmapped region (or a region without write
> >> > permissions for write requests). If this happens, the loop in
> >> > ib_umem_get calls get_user_pages again with the next set of pages, and
> >> > this time if it the first page still cannot be mapped an error is
> >> returned.
> >> >
> >> >>
> >> >> In ODP case, I'm not sure such check is ever done ?
> >> >
> >> > In ODP, we also call get_user_pages, but only when a page fault occurs
> >> > (see ib_umem_odp_map_dma_pages()). This allows the user to pre-
> >> register
> >> > a memory region that contains unmapped virtual space, and then mmap
> >> > different files into that area without needing to re-register.
> >> >
> >>
> >> OK, thanks for the description.
> >>
> >> ...
> >>
> >> Another related question: as the large memory range could be registered
> >> by user space with ibv_reg_mr(pd, base, size, IB_ACCESS_ON_DEMAND),
> >> what's prevent the kernel to map a file as the result of mmap(0, ...)
> >> in this  region, making it available remotely through IBV_WR_RDMA_READ /
> >> IBV_WR_RDMA_WRITE ?
> >>
> > 
> > This is not a bug. This is a feature.
> > 
> > Exposing a file through RDMA, using ODP, can be done exactly like this.
> > Given that the application explicitly requested this behavior, I don't
> > see why it is a problem. Actually, some of our tests use such flows.
> > The mmu notifiers mechanism allow us to do this safely. When the page is
> > written back to disk, it is removed from the ODP mapping. When it is
> > accessed by the HCA, it is brought back to RAM.
> > 
> 
> I want to add that we would like to see users registering a very large
> memory region (perhaps the entire process address space) for local
> access, and then enabling remote access only to specific regions using
> memory windows. However, this isn't supported yet by our driver.

In such scheme, the registration must still be handled "manually":
one has to create a memory window to get a rkey to be exchanged with
a peer, so why one would want to register such a large memory region
(the whole process space) ?

I guess creating the memory window is faster than registering memory
region. I'd rather say this is not an excuse to register a larger 
memory region (up to the whole process space, current and future) as it 
sounds like a surprising optimisation: let the HCA known too many
pages just to be sure it already knows some when the process want to 
use them. It seems it would become difficult to handle if there's too
many processes.

I would prefer creating memory region becoming costless (through ODP :).

>  Still, there are valid cases where you would still want the results
> of an mmap(0,...) call to be remotely accessible, in cases where there
> is enough trust between the local process and the remote process.

mmap(0, ...., fd) let the kernel choose where to put the file in 
process virtual memory space, so it may, may not, partially, end up in 
an ODP pre registered memory region for a range unallocated/unused yet.

I don't think one want such to happen.

>  It may help a middleware communication library register a large
> portion of the address space in advance, and still work with random
> pointers given to it by another application module.
> 

But as said in the beginnig of your message, the middleware would have
bind a memory window before posting work request / exposing rkey for
the "random pointers".

So I fail to understand how could be used ODP when it comes to 
registering a memory region not yet backed by something.

Regards.

-- 
Yann Droneaud
OPTEYA


--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: CVE-2014-8159 kernel: infiniband: uverbs: unprotected physical memory access
@ 2015-04-13 13:29                         ` Yann Droneaud
  0 siblings, 0 replies; 45+ messages in thread
From: Yann Droneaud @ 2015-04-13 13:29 UTC (permalink / raw)
  To: Haggai Eran; +Cc: Shachar Raindel, Sagi Grimberg, linux-rdma, linux-kernel

Hi,

Le jeudi 02 avril 2015 à 18:12 +0000, Haggai Eran a écrit :
> On Thursday, April 2, 2015 7:44 PM, Shachar Raindel wrote:
> >> -----Original Message-----
> >> From: Yann Droneaud [mailto:ydroneaud@opteya.com]
> >> Le jeudi 02 avril 2015 à 18:18 +0300, Haggai Eran a écrit :
> >> > On 02/04/2015 16:30, Yann Droneaud wrote:
> >> >> Le jeudi 02 avril 2015 à 10:52 +0000, Shachar Raindel a écrit :
> >> >>>> -----Original Message-----
> >> >>>> From: Yann Droneaud [mailto:ydroneaud@opteya.com]
> >> >>>> Sent: Thursday, April 02, 2015 1:05 PM
> >> >>>> Le mercredi 18 mars 2015 à 17:39 +0000, Shachar Raindel a écrit :
> >> >>
> >> >>>>> +      /*
> >> >>>>> +       * If the combination of the addr and size requested for this
> >> >>>> memory
> >> >>>>> +       * region causes an integer overflow, return error.
> >> >>>>> +       */
> >> >>>>> +      if ((PAGE_ALIGN(addr + size) <= size) ||
> >> >>>>> +          (PAGE_ALIGN(addr + size) <= addr))
> >> >>>>> +              return ERR_PTR(-EINVAL);
> >> >>>>> +
> >> >>>>
> >> >>>> Can access_ok() be used here ?
> >> >>>>
> >> >>>>          if (!access_ok(writable ? VERIFY_WRITE : VERIFY_READ,
> >> >>>>                         addr, size))
> >> >>>>                   return ERR_PTR(-EINVAL);
> >> >>>>
> >> >>>
> >> >>> No, this will break the current ODP semantics.
> >> >>>
> >> >>> ODP allows the user to register memory that is not accessible yet.
> >> >>> This is a critical design feature, as it allows avoiding holding
> >> >>> a registration cache. Adding this check will break the behavior,
> >> >>> forcing memory to be all accessible when registering an ODP MR.
> >> >>>
> >> >>
> >> >> Where's the check for the range being in userspace memory space,
> >> >> especially for the ODP case ?
> >> >>
> >> >> For non ODP case (eg. plain old behavior), does get_user_pages()
> >> >> ensure the requested pages fit in userspace region on all
> >> >> architectures ? I think so.
> >> >
> >> > Yes, get_user_pages will return a smaller amount of pages than
> >> requested
> >> > if it encounters an unmapped region (or a region without write
> >> > permissions for write requests). If this happens, the loop in
> >> > ib_umem_get calls get_user_pages again with the next set of pages, and
> >> > this time if it the first page still cannot be mapped an error is
> >> returned.
> >> >
> >> >>
> >> >> In ODP case, I'm not sure such check is ever done ?
> >> >
> >> > In ODP, we also call get_user_pages, but only when a page fault occurs
> >> > (see ib_umem_odp_map_dma_pages()). This allows the user to pre-
> >> register
> >> > a memory region that contains unmapped virtual space, and then mmap
> >> > different files into that area without needing to re-register.
> >> >
> >>
> >> OK, thanks for the description.
> >>
> >> ...
> >>
> >> Another related question: as the large memory range could be registered
> >> by user space with ibv_reg_mr(pd, base, size, IB_ACCESS_ON_DEMAND),
> >> what's prevent the kernel to map a file as the result of mmap(0, ...)
> >> in this  region, making it available remotely through IBV_WR_RDMA_READ /
> >> IBV_WR_RDMA_WRITE ?
> >>
> > 
> > This is not a bug. This is a feature.
> > 
> > Exposing a file through RDMA, using ODP, can be done exactly like this.
> > Given that the application explicitly requested this behavior, I don't
> > see why it is a problem. Actually, some of our tests use such flows.
> > The mmu notifiers mechanism allow us to do this safely. When the page is
> > written back to disk, it is removed from the ODP mapping. When it is
> > accessed by the HCA, it is brought back to RAM.
> > 
> 
> I want to add that we would like to see users registering a very large
> memory region (perhaps the entire process address space) for local
> access, and then enabling remote access only to specific regions using
> memory windows. However, this isn't supported yet by our driver.

In such scheme, the registration must still be handled "manually":
one has to create a memory window to get a rkey to be exchanged with
a peer, so why one would want to register such a large memory region
(the whole process space) ?

I guess creating the memory window is faster than registering memory
region. I'd rather say this is not an excuse to register a larger 
memory region (up to the whole process space, current and future) as it 
sounds like a surprising optimisation: let the HCA known too many
pages just to be sure it already knows some when the process want to 
use them. It seems it would become difficult to handle if there's too
many processes.

I would prefer creating memory region becoming costless (through ODP :).

>  Still, there are valid cases where you would still want the results
> of an mmap(0,...) call to be remotely accessible, in cases where there
> is enough trust between the local process and the remote process.

mmap(0, ...., fd) let the kernel choose where to put the file in 
process virtual memory space, so it may, may not, partially, end up in 
an ODP pre registered memory region for a range unallocated/unused yet.

I don't think one want such to happen.

>  It may help a middleware communication library register a large
> portion of the address space in advance, and still work with random
> pointers given to it by another application module.
> 

But as said in the beginnig of your message, the middleware would have
bind a memory window before posting work request / exposing rkey for
the "random pointers".

So I fail to understand how could be used ODP when it comes to 
registering a memory region not yet backed by something.

Regards.

-- 
Yann Droneaud
OPTEYA



^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: CVE-2014-8159 kernel: infiniband: uverbs: unprotected physical memory access
  2015-04-13 13:29                         ` Yann Droneaud
@ 2015-04-14  8:11                             ` Haggai Eran
  -1 siblings, 0 replies; 45+ messages in thread
From: Haggai Eran @ 2015-04-14  8:11 UTC (permalink / raw)
  To: Yann Droneaud
  Cc: Shachar Raindel, Sagi Grimberg,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA

On 13/04/2015 16:29, Yann Droneaud wrote:
> Le jeudi 02 avril 2015 à 18:12 +0000, Haggai Eran a écrit :
...
>>
>> I want to add that we would like to see users registering a very large
>> memory region (perhaps the entire process address space) for local
>> access, and then enabling remote access only to specific regions using
>> memory windows. However, this isn't supported yet by our driver.
> 
> In such scheme, the registration must still be handled "manually":
> one has to create a memory window to get a rkey to be exchanged with
> a peer, so why one would want to register such a large memory region
> (the whole process space) ?
> 
> I guess creating the memory window is faster than registering memory
> region. 
Right.

It takes time to create and fill the hardware's page tables. Using
memory windows allows you to reuse the work done previously, while still
having a more granular control over the RDMA permissions. The larger MR
can be created with only local permissions, and the memory window can
add specific remote permissions to a smaller range. The memory window
utilizes the page tables created for the memory region.

> I'd rather say this is not an excuse to register a larger
> memory region (up to the whole process space, current and future) as it 
> sounds like a surprising optimisation: let the HCA known too many
> pages just to be sure it already knows some when the process want to 
> use them. It seems it would become difficult to handle if there's too
> many processes.
Are you worried about pinning too many pages? That is an issue we want
to solve with ODP :)

> 
> I would prefer creating memory region becoming costless (through ODP :).
I agree :)

> 
>>  Still, there are valid cases where you would still want the results
>> of an mmap(0,...) call to be remotely accessible, in cases where there
>> is enough trust between the local process and the remote process.
> 
> mmap(0, ...., fd) let the kernel choose where to put the file in 
> process virtual memory space, so it may, may not, partially, end up in 
> an ODP pre registered memory region for a range unallocated/unused yet.
> 
> I don't think one want such to happen.
I think that in some cases the benefit of allowing this outweigh the
risks. This is why it is an opt-in feature.

> 
>>  It may help a middleware communication library register a large
>> portion of the address space in advance, and still work with random
>> pointers given to it by another application module.
>>
> 
> But as said in the beginnig of your message, the middleware would have
> bind a memory window before posting work request / exposing rkey for
> the "random pointers".
> 
> So I fail to understand how could be used ODP when it comes to 
> registering a memory region not yet backed by something.

In this scenario, the middleware would first register the full address
space as an ODP memory region with local permissions only. When it wants
to provide remote access to some buffer, it would bind a memory window
over the ODP MR. This is possible with multiple processes because it
uses the virtual memory system without pinning. It won't cause random
mmap regions to be mapped for RDMA without the specific intent of the
application.

However, we currently don't have support for memory windows over ODP
MRs. Even if we did, there is some performance penalty due to binding
and invalidating memory windows. Some applications will still need full
process address space access for RDMA.

Regards,
Haggai
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: CVE-2014-8159 kernel: infiniband: uverbs: unprotected physical memory access
@ 2015-04-14  8:11                             ` Haggai Eran
  0 siblings, 0 replies; 45+ messages in thread
From: Haggai Eran @ 2015-04-14  8:11 UTC (permalink / raw)
  To: Yann Droneaud; +Cc: Shachar Raindel, Sagi Grimberg, linux-rdma, linux-kernel

On 13/04/2015 16:29, Yann Droneaud wrote:
> Le jeudi 02 avril 2015 à 18:12 +0000, Haggai Eran a écrit :
...
>>
>> I want to add that we would like to see users registering a very large
>> memory region (perhaps the entire process address space) for local
>> access, and then enabling remote access only to specific regions using
>> memory windows. However, this isn't supported yet by our driver.
> 
> In such scheme, the registration must still be handled "manually":
> one has to create a memory window to get a rkey to be exchanged with
> a peer, so why one would want to register such a large memory region
> (the whole process space) ?
> 
> I guess creating the memory window is faster than registering memory
> region. 
Right.

It takes time to create and fill the hardware's page tables. Using
memory windows allows you to reuse the work done previously, while still
having a more granular control over the RDMA permissions. The larger MR
can be created with only local permissions, and the memory window can
add specific remote permissions to a smaller range. The memory window
utilizes the page tables created for the memory region.

> I'd rather say this is not an excuse to register a larger
> memory region (up to the whole process space, current and future) as it 
> sounds like a surprising optimisation: let the HCA known too many
> pages just to be sure it already knows some when the process want to 
> use them. It seems it would become difficult to handle if there's too
> many processes.
Are you worried about pinning too many pages? That is an issue we want
to solve with ODP :)

> 
> I would prefer creating memory region becoming costless (through ODP :).
I agree :)

> 
>>  Still, there are valid cases where you would still want the results
>> of an mmap(0,...) call to be remotely accessible, in cases where there
>> is enough trust between the local process and the remote process.
> 
> mmap(0, ...., fd) let the kernel choose where to put the file in 
> process virtual memory space, so it may, may not, partially, end up in 
> an ODP pre registered memory region for a range unallocated/unused yet.
> 
> I don't think one want such to happen.
I think that in some cases the benefit of allowing this outweigh the
risks. This is why it is an opt-in feature.

> 
>>  It may help a middleware communication library register a large
>> portion of the address space in advance, and still work with random
>> pointers given to it by another application module.
>>
> 
> But as said in the beginnig of your message, the middleware would have
> bind a memory window before posting work request / exposing rkey for
> the "random pointers".
> 
> So I fail to understand how could be used ODP when it comes to 
> registering a memory region not yet backed by something.

In this scenario, the middleware would first register the full address
space as an ODP memory region with local permissions only. When it wants
to provide remote access to some buffer, it would bind a memory window
over the ODP MR. This is possible with multiple processes because it
uses the virtual memory system without pinning. It won't cause random
mmap regions to be mapped for RDMA without the specific intent of the
application.

However, we currently don't have support for memory windows over ODP
MRs. Even if we did, there is some performance penalty due to binding
and invalidating memory windows. Some applications will still need full
process address space access for RDMA.

Regards,
Haggai

^ permalink raw reply	[flat|nested] 45+ messages in thread

end of thread, other threads:[~2015-04-14  8:12 UTC | newest]

Thread overview: 45+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-03-18 17:39 CVE-2014-8159 kernel: infiniband: uverbs: unprotected physical memory access Shachar Raindel
     [not found] ` <AM3PR05MB0935AABF569F15EA846B8E72DC000-LOZWmgKjnYgQouBfZGh8ttqRiQSDpxhJvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
2015-04-01 17:28   ` Roland Dreier
2015-04-02  7:52     ` Shachar Raindel
2015-04-02 16:32       ` Roland Dreier
2015-04-02 16:39         ` Shachar Raindel
2015-04-02 10:04 ` Yann Droneaud
2015-04-02 10:04   ` Yann Droneaud
     [not found]   ` <1427969085.17020.5.camel-RlY5vtjFyJ3QT0dZR+AlfA@public.gmane.org>
2015-04-02 10:52     ` Shachar Raindel
2015-04-02 10:52       ` Shachar Raindel
2015-04-02 10:52       ` Shachar Raindel
     [not found]       ` <AM3PR05MB0935AA4898B4B519D2DAA3C4DCF20-LOZWmgKjnYgQouBfZGh8ttqRiQSDpxhJvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
2015-04-02 13:30         ` Yann Droneaud
2015-04-02 13:30           ` Yann Droneaud
2015-04-02 13:30           ` Yann Droneaud
2015-04-02 15:18           ` Haggai Eran
2015-04-02 15:18             ` Haggai Eran
2015-04-02 16:35             ` Yann Droneaud
2015-04-02 16:35               ` Yann Droneaud
2015-04-02 16:44               ` Shachar Raindel
2015-04-02 16:44                 ` Shachar Raindel
     [not found]                 ` <AM2PR05MB0929FB71C5A4DE92A7709F92DCF20-Wc3DjHnhGicijy4iGQu0rtqRiQSDpxhJvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
2015-04-02 18:12                   ` Haggai Eran
2015-04-02 18:12                     ` Haggai Eran
2015-04-02 18:12                     ` Haggai Eran
     [not found]                     ` <1427998401240.52348-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2015-04-13 13:29                       ` Yann Droneaud
2015-04-13 13:29                         ` Yann Droneaud
     [not found]                         ` <1428931781.22575.232.camel-RlY5vtjFyJ3QT0dZR+AlfA@public.gmane.org>
2015-04-14  8:11                           ` Haggai Eran
2015-04-14  8:11                             ` Haggai Eran
2015-04-02 20:40                 ` Yann Droneaud
2015-04-02 20:40                   ` Yann Droneaud
     [not found]                   ` <1428007208.22575.104.camel-RlY5vtjFyJ3QT0dZR+AlfA@public.gmane.org>
2015-04-03  8:39                     ` Haggai Eran
2015-04-03  8:39                       ` Haggai Eran
2015-04-03  8:39                       ` Haggai Eran
2015-04-03 11:49                       ` Yann Droneaud
2015-04-03 11:49                         ` Yann Droneaud
     [not found]                 ` <20150402174401.GA24285@nautica>
2015-04-03 11:49                   ` [oss-security] " Shachar Raindel
     [not found]                     ` <AM3PR05MB0935F9F1011E30F11C2A08BDDCF10-LOZWmgKjnYgQouBfZGh8ttqRiQSDpxhJvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
2015-04-03 12:43                       ` Dominique Martinet
2015-04-02 15:15       ` Yann Droneaud
2015-04-02 15:15         ` Yann Droneaud
2015-04-02 15:15         ` Yann Droneaud
2015-04-02 16:34         ` Shachar Raindel
2015-04-02 16:34           ` Shachar Raindel
     [not found]           ` <AM2PR05MB0929EDC60BBE5DAAAD4AB1B4DCF20-Wc3DjHnhGicijy4iGQu0rtqRiQSDpxhJvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
2015-04-08 12:19             ` Yann Droneaud
2015-04-08 12:19               ` Yann Droneaud
2015-04-08 12:19               ` Yann Droneaud
2015-04-08 12:44               ` Yann Droneaud
2015-04-08 12:44                 ` Yann Droneaud

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.