* [net-next][PATCH v2 0/2] rds: handle unsupported rdma request to fs dax memory
@ 2019-04-29 23:37 Santosh Shilimkar
  2019-04-29 23:37 ` [net-next][PATCH v2 1/2] " Santosh Shilimkar
  2019-04-29 23:37 ` [net-next][PATCH v2 2/2] rds: add sysctl for rds support of On-Demand-Paging Santosh Shilimkar
  0 siblings, 2 replies; 28+ messages in thread
From: Santosh Shilimkar @ 2019-04-29 23:37 UTC (permalink / raw)
  To: netdev, davem; +Cc: santosh.shilimkar

RDS doesn't support RDMA on memory apertures that require On Demand
Paging (ODP), such as FS DAX memory. User applications can try to use
RDS to perform RDMA over such memories and since it doesn't report any
failure, it can lead to unexpected issues like memory corruption when
a couple of out of sync file system operations like ftruncate etc. are
performed.

The patch adds a check so that such an attempt to RDMA to/from memory
apertures requiring ODP will fail. A sysctl is added to indicate
whether RDMA on ODP memory is supported.


Hans Westgaard Ry (1):
  rds: handle unsupported rdma request to fs dax memory

Santosh Shilimkar (1):
  rds: add sysctl for rds support of On-Demand-Paging

 net/rds/ib.h        | 1 +
 net/rds/ib_sysctl.c | 8 ++++++++
 net/rds/rdma.c      | 5 +++--
 3 files changed, 12 insertions(+), 2 deletions(-)

-- 
1.9.1



* [net-next][PATCH v2 1/2] rds: handle unsupported rdma request to fs dax memory
  2019-04-29 23:37 [net-next][PATCH v2 0/2] rds: handle unsupported rdma request to fs dax memory Santosh Shilimkar
@ 2019-04-29 23:37 ` Santosh Shilimkar
  2019-05-01  7:44   ` Leon Romanovsky
  2019-05-10 12:54   ` Jason Gunthorpe
  2019-04-29 23:37 ` [net-next][PATCH v2 2/2] rds: add sysctl for rds support of On-Demand-Paging Santosh Shilimkar
  1 sibling, 2 replies; 28+ messages in thread
From: Santosh Shilimkar @ 2019-04-29 23:37 UTC (permalink / raw)
  To: netdev, davem; +Cc: santosh.shilimkar

From: Hans Westgaard Ry <hans.westgaard.ry@oracle.com>

RDS doesn't support RDMA on memory apertures that require On Demand
Paging (ODP), such as FS DAX memory. User applications can try to use
RDS to perform RDMA over such memories and since it doesn't report any
failure, it can lead to unexpected issues like memory corruption when
a couple of out of sync file system operations like ftruncate etc. are
performed.

The patch adds a check so that such an attempt to RDMA to/from memory
apertures requiring ODP will fail.

Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com>
Reviewed-tested-by: Zhu Yanjun <yanjun.zhu@oracle.com>
Signed-off-by: Hans Westgaard Ry <hans.westgaard.ry@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
---
 net/rds/rdma.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/net/rds/rdma.c b/net/rds/rdma.c
index 182ab84..e0a6b72 100644
--- a/net/rds/rdma.c
+++ b/net/rds/rdma.c
@@ -158,8 +158,9 @@ static int rds_pin_pages(unsigned long user_addr, unsigned int nr_pages,
 {
 	int ret;
 
-	ret = get_user_pages_fast(user_addr, nr_pages, write, pages);
-
+	/* get_user_pages return -EOPNOTSUPP for fs_dax memory */
+	ret = get_user_pages_longterm(user_addr, nr_pages,
+				      write, pages, NULL);
 	if (ret >= 0 && ret < nr_pages) {
 		while (ret--)
 			put_page(pages[ret]);
-- 
1.9.1
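
Editorial illustration, not part of the patch: a minimal userspace model of
the error handling in rds_pin_pages() after this change. The names fake_page,
pin_fn and the pin_* stubs are hypothetical stand-ins for struct page and
get_user_pages_longterm(), and the -EFAULT returned for a partial pin is an
assumption about the lines that the hunk above does not show.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for struct page: only a reference count. */
struct fake_page {
	int refs;
};

/*
 * Hypothetical pin callback standing in for get_user_pages_longterm():
 * returns the number of pages pinned, or a negative errno.
 */
typedef int (*pin_fn)(struct fake_page **pages, unsigned int nr_pages);

static void fake_put_page(struct fake_page *page)
{
	page->refs--;
}

/* Models rds_pin_pages(): all pages get pinned, or none stay pinned. */
static int model_pin_pages(pin_fn pin, struct fake_page **pages,
			   unsigned int nr_pages)
{
	int ret;

	assert(pin != NULL && pages != NULL);

	ret = pin(pages, nr_pages);
	if (ret >= 0 && (unsigned int)ret < nr_pages) {
		/* Partial pin: drop the references already taken. */
		while (ret--)
			fake_put_page(pages[ret]);
		ret = -EFAULT;	/* assumed, not visible in the hunk */
	}

	/* A negative errno such as -EOPNOTSUPP is passed through as-is. */
	return ret;
}

/* Stub: behaves like an FS DAX mapping that refuses a long-term pin. */
static int pin_fsdax(struct fake_page **pages, unsigned int nr_pages)
{
	(void)pages;
	(void)nr_pages;
	return -EOPNOTSUPP;
}

/* Stub: pins only the first half of the requested pages. */
static int pin_half(struct fake_page **pages, unsigned int nr_pages)
{
	unsigned int i, n = nr_pages / 2;

	for (i = 0; i < n; i++)
		pages[i]->refs++;
	return (int)n;
}

/* Stub: pins every requested page. */
static int pin_everything(struct fake_page **pages, unsigned int nr_pages)
{
	unsigned int i;

	for (i = 0; i < nr_pages; i++)
		pages[i]->refs++;
	return (int)nr_pages;
}
```

In this model a caller sees -EOPNOTSUPP for the FS DAX stub and never holds
a partially pinned range, which is the failure behaviour the commit message
describes.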



* [net-next][PATCH v2 2/2] rds: add sysctl for rds support of On-Demand-Paging
  2019-04-29 23:37 [net-next][PATCH v2 0/2] rds: handle unsupported rdma request to fs dax memory Santosh Shilimkar
  2019-04-29 23:37 ` [net-next][PATCH v2 1/2] " Santosh Shilimkar
@ 2019-04-29 23:37 ` Santosh Shilimkar
  2019-05-01  7:45   ` Leon Romanovsky
  2019-05-10 13:02   ` Jason Gunthorpe
  1 sibling, 2 replies; 28+ messages in thread
From: Santosh Shilimkar @ 2019-04-29 23:37 UTC (permalink / raw)
  To: netdev, davem; +Cc: santosh.shilimkar

RDS doesn't support RDMA on memory apertures that require On Demand
Paging (ODP), such as FS DAX memory. A sysctl is added to indicate
whether RDMA requiring ODP is supported.

Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com>
Reviewed-tested-by: Zhu Yanjun <yanjun.zhu@oracle.com>
Signed-off-by: Hans Westgaard Ry <hans.westgaard.ry@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
---
 net/rds/ib.h        | 1 +
 net/rds/ib_sysctl.c | 8 ++++++++
 2 files changed, 9 insertions(+)

diff --git a/net/rds/ib.h b/net/rds/ib.h
index 67a715b..80e11ef 100644
--- a/net/rds/ib.h
+++ b/net/rds/ib.h
@@ -457,5 +457,6 @@ unsigned int rds_ib_stats_info_copy(struct rds_info_iterator *iter,
 extern unsigned long rds_ib_sysctl_max_unsig_bytes;
 extern unsigned long rds_ib_sysctl_max_recv_allocation;
 extern unsigned int rds_ib_sysctl_flow_control;
+extern unsigned int rds_ib_sysctl_odp_support;
 
 #endif
diff --git a/net/rds/ib_sysctl.c b/net/rds/ib_sysctl.c
index e4e41b3..7cc02cd 100644
--- a/net/rds/ib_sysctl.c
+++ b/net/rds/ib_sysctl.c
@@ -60,6 +60,7 @@
  * will cause credits to be added before protocol negotiation.
  */
 unsigned int rds_ib_sysctl_flow_control = 0;
+unsigned int rds_ib_sysctl_odp_support;
 
 static struct ctl_table rds_ib_sysctl_table[] = {
 	{
@@ -103,6 +104,13 @@
 		.mode		= 0644,
 		.proc_handler	= proc_dointvec,
 	},
+	{
+		.procname       = "odp_support",
+		.data           = &rds_ib_sysctl_odp_support,
+		.maxlen         = sizeof(rds_ib_sysctl_odp_support),
+		.mode           = 0444,
+		.proc_handler   = proc_dointvec,
+	},
 	{ }
 };
 
-- 
1.9.1



* Re: [net-next][PATCH v2 1/2] rds: handle unsupported rdma request to fs dax memory
  2019-04-29 23:37 ` [net-next][PATCH v2 1/2] " Santosh Shilimkar
@ 2019-05-01  7:44   ` Leon Romanovsky
  2019-05-01 17:54     ` Santosh Shilimkar
  2019-05-10 12:54   ` Jason Gunthorpe
  1 sibling, 1 reply; 28+ messages in thread
From: Leon Romanovsky @ 2019-05-01  7:44 UTC (permalink / raw)
  To: Santosh Shilimkar; +Cc: netdev, davem

On Mon, Apr 29, 2019 at 04:37:19PM -0700, Santosh Shilimkar wrote:
> From: Hans Westgaard Ry <hans.westgaard.ry@oracle.com>
>
> RDS doesn't support RDMA on memory apertures that require On Demand
> Paging (ODP), such as FS DAX memory. User applications can try to use
> RDS to perform RDMA over such memories and since it doesn't report any
> failure, it can lead to unexpected issues like memory corruption when
> a couple of out of sync file system operations like ftruncate etc. are
> performed.
>
> The patch adds a check so that such an attempt to RDMA to/from memory
> apertures requiring ODP will fail.
>
> Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com>
> Reviewed-tested-by: Zhu Yanjun <yanjun.zhu@oracle.com>
> Signed-off-by: Hans Westgaard Ry <hans.westgaard.ry@oracle.com>
> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
> ---
>  net/rds/rdma.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/net/rds/rdma.c b/net/rds/rdma.c
> index 182ab84..e0a6b72 100644
> --- a/net/rds/rdma.c
> +++ b/net/rds/rdma.c
> @@ -158,8 +158,9 @@ static int rds_pin_pages(unsigned long user_addr, unsigned int nr_pages,
>  {
>  	int ret;
>
> -	ret = get_user_pages_fast(user_addr, nr_pages, write, pages);
> -
> +	/* get_user_pages return -EOPNOTSUPP for fs_dax memory */
> +	ret = get_user_pages_longterm(user_addr, nr_pages,
> +				      write, pages, NULL);

I'm not an RDS expert, but from what I see in net/rds/rdma.c and this code,
you tried to mimic ib_umem_get() without its protection, checks and native
ODP, FS and DAX support.

The real way to solve your ODP problem will require extending
ib_umem_get() to work for kernel ULPs too and using it instead of
get_user_pages(). We are working on that and it is in internal review now.

This applies if there is IB code underneath your RDS code. If there is
no such layer, you shouldn't return the IB_DEVICE_ON_DEMAND_PAGING
capability to user space, and you should return EINVAL for every attempt
to create such an ODP MR.

Thanks

>  	if (ret >= 0 && ret < nr_pages) {
>  		while (ret--)
>  			put_page(pages[ret]);
> --
> 1.9.1
>


* Re: [net-next][PATCH v2 2/2] rds: add sysctl for rds support of On-Demand-Paging
  2019-04-29 23:37 ` [net-next][PATCH v2 2/2] rds: add sysctl for rds support of On-Demand-Paging Santosh Shilimkar
@ 2019-05-01  7:45   ` Leon Romanovsky
  2019-05-01 17:54     ` Santosh Shilimkar
  2019-05-10 13:02   ` Jason Gunthorpe
  1 sibling, 1 reply; 28+ messages in thread
From: Leon Romanovsky @ 2019-05-01  7:45 UTC (permalink / raw)
  To: Santosh Shilimkar; +Cc: netdev, davem

On Mon, Apr 29, 2019 at 04:37:20PM -0700, Santosh Shilimkar wrote:
> RDS doesn't support RDMA on memory apertures that require On Demand
> Paging (ODP), such as FS DAX memory. A sysctl is added to indicate
> whether RDMA requiring ODP is supported.
>
> Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com>
> Reviewed-tested-by: Zhu Yanjun <yanjun.zhu@oracle.com>
> Signed-off-by: Hans Westgaard Ry <hans.westgaard.ry@oracle.com>
> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
> ---
>  net/rds/ib.h        | 1 +
>  net/rds/ib_sysctl.c | 8 ++++++++
>  2 files changed, 9 insertions(+)

This sysctl is not needed at all

>
> diff --git a/net/rds/ib.h b/net/rds/ib.h
> index 67a715b..80e11ef 100644
> --- a/net/rds/ib.h
> +++ b/net/rds/ib.h
> @@ -457,5 +457,6 @@ unsigned int rds_ib_stats_info_copy(struct rds_info_iterator *iter,
>  extern unsigned long rds_ib_sysctl_max_unsig_bytes;
>  extern unsigned long rds_ib_sysctl_max_recv_allocation;
>  extern unsigned int rds_ib_sysctl_flow_control;
> +extern unsigned int rds_ib_sysctl_odp_support;
>
>  #endif
> diff --git a/net/rds/ib_sysctl.c b/net/rds/ib_sysctl.c
> index e4e41b3..7cc02cd 100644
> --- a/net/rds/ib_sysctl.c
> +++ b/net/rds/ib_sysctl.c
> @@ -60,6 +60,7 @@
>   * will cause credits to be added before protocol negotiation.
>   */
>  unsigned int rds_ib_sysctl_flow_control = 0;
> +unsigned int rds_ib_sysctl_odp_support;
>
>  static struct ctl_table rds_ib_sysctl_table[] = {
>  	{
> @@ -103,6 +104,13 @@
>  		.mode		= 0644,
>  		.proc_handler	= proc_dointvec,
>  	},
> +	{
> +		.procname       = "odp_support",
> +		.data           = &rds_ib_sysctl_odp_support,
> +		.maxlen         = sizeof(rds_ib_sysctl_odp_support),
> +		.mode           = 0444,
> +		.proc_handler   = proc_dointvec,
> +	},
>  	{ }
>  };
>
> --
> 1.9.1
>


* Re: [net-next][PATCH v2 1/2] rds: handle unsupported rdma request to fs dax memory
  2019-05-01  7:44   ` Leon Romanovsky
@ 2019-05-01 17:54     ` Santosh Shilimkar
  2019-05-02  6:21       ` Leon Romanovsky
  0 siblings, 1 reply; 28+ messages in thread
From: Santosh Shilimkar @ 2019-05-01 17:54 UTC (permalink / raw)
  To: Leon Romanovsky; +Cc: netdev, davem

On 5/1/2019 12:44 AM, Leon Romanovsky wrote:
> On Mon, Apr 29, 2019 at 04:37:19PM -0700, Santosh Shilimkar wrote:
>> From: Hans Westgaard Ry <hans.westgaard.ry@oracle.com>
>>
>> RDS doesn't support RDMA on memory apertures that require On Demand
>> Paging (ODP), such as FS DAX memory. User applications can try to use
>> RDS to perform RDMA over such memories and since it doesn't report any
>> failure, it can lead to unexpected issues like memory corruption when
>> a couple of out of sync file system operations like ftruncate etc. are
>> performed.
>>
>> The patch adds a check so that such an attempt to RDMA to/from memory
>> apertures requiring ODP will fail.
>>
>> Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com>
>> Reviewed-tested-by: Zhu Yanjun <yanjun.zhu@oracle.com>
>> Signed-off-by: Hans Westgaard Ry <hans.westgaard.ry@oracle.com>
>> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
>> ---
>>   net/rds/rdma.c | 5 +++--
>>   1 file changed, 3 insertions(+), 2 deletions(-)
>>
>> diff --git a/net/rds/rdma.c b/net/rds/rdma.c
>> index 182ab84..e0a6b72 100644
>> --- a/net/rds/rdma.c
>> +++ b/net/rds/rdma.c
>> @@ -158,8 +158,9 @@ static int rds_pin_pages(unsigned long user_addr, unsigned int nr_pages,
>>   {
>>   	int ret;
>>
>> -	ret = get_user_pages_fast(user_addr, nr_pages, write, pages);
>> -
>> +	/* get_user_pages return -EOPNOTSUPP for fs_dax memory */
>> +	ret = get_user_pages_longterm(user_addr, nr_pages,
>> +				      write, pages, NULL);
> 
> I'm not an RDS expert, but from what I see in net/rds/rdma.c and this code,
> you tried to mimic ib_umem_get() without its protection, checks and native
> ODP, FS and DAX support.
>
> The real way to solve your ODP problem will require extending
> ib_umem_get() to work for kernel ULPs too and using it instead of
> get_user_pages(). We are working on that and it is in internal review now.
>
Yes, I am aware of it. For FS_DAX-like memory, get_user_pages_longterm()
fails, and the memory is then registered as an ODP region using
ib_reg_user_mr(). This work is not ready yet, and without the above check
one can do RDMA on FS DAX memory with Fast Reg or FMR memory
registration, which is not safe, hence the need to fail the operation.

Once the support is added to RDS, this code path will make that
registration go through.

Hope it clarifies.

Regards,
Santosh



* Re: [net-next][PATCH v2 2/2] rds: add sysctl for rds support of On-Demand-Paging
  2019-05-01  7:45   ` Leon Romanovsky
@ 2019-05-01 17:54     ` Santosh Shilimkar
  2019-05-02  6:18       ` Leon Romanovsky
  0 siblings, 1 reply; 28+ messages in thread
From: Santosh Shilimkar @ 2019-05-01 17:54 UTC (permalink / raw)
  To: Leon Romanovsky; +Cc: netdev, davem

On 5/1/2019 12:45 AM, Leon Romanovsky wrote:
> On Mon, Apr 29, 2019 at 04:37:20PM -0700, Santosh Shilimkar wrote:
>> RDS doesn't support RDMA on memory apertures that require On Demand
>> Paging (ODP), such as FS DAX memory. A sysctl is added to indicate
>> whether RDMA requiring ODP is supported.
>>
>> Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com>
>> Reviewed-tested-by: Zhu Yanjun <yanjun.zhu@oracle.com>
>> Signed-off-by: Hans Westgaard Ry <hans.westgaard.ry@oracle.com>
>> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
>> ---
>>   net/rds/ib.h        | 1 +
>>   net/rds/ib_sysctl.c | 8 ++++++++
>>   2 files changed, 9 insertions(+)
> 
> This sysctl is not needed at all
> 
It's needed for the application to check for support of the ODP
feature, which is in progress. Failing the RDS_GET_MR was just one path,
and we also support inline MR registration along with the message request.

Basically, the application runs on different kernel versions, and to be
portable it will check whether the underlying RDS supports ODP and only
then use RDMA. If not, it will fall back to buffer copy mode. Hope
it clarifies.


Regards,
Santosh


* Re: [net-next][PATCH v2 2/2] rds: add sysctl for rds support of On-Demand-Paging
  2019-05-01 17:54     ` Santosh Shilimkar
@ 2019-05-02  6:18       ` Leon Romanovsky
  2019-05-02 17:59         ` Santosh Shilimkar
  0 siblings, 1 reply; 28+ messages in thread
From: Leon Romanovsky @ 2019-05-02  6:18 UTC (permalink / raw)
  To: Santosh Shilimkar; +Cc: netdev, davem

On Wed, May 01, 2019 at 10:54:50AM -0700, Santosh Shilimkar wrote:
> On 5/1/2019 12:45 AM, Leon Romanovsky wrote:
> > On Mon, Apr 29, 2019 at 04:37:20PM -0700, Santosh Shilimkar wrote:
> > > RDS doesn't support RDMA on memory apertures that require On Demand
> > > Paging (ODP), such as FS DAX memory. A sysctl is added to indicate
> > > whether RDMA requiring ODP is supported.
> > >
> > > Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com>
> > > Reviewed-tested-by: Zhu Yanjun <yanjun.zhu@oracle.com>
> > > Signed-off-by: Hans Westgaard Ry <hans.westgaard.ry@oracle.com>
> > > Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
> > > ---
> > >   net/rds/ib.h        | 1 +
> > >   net/rds/ib_sysctl.c | 8 ++++++++
> > >   2 files changed, 9 insertions(+)
> >
> > This sysctl is not needed at all
> >
> It's needed for the application to check for support of the ODP
> feature, which is in progress. Failing the RDS_GET_MR was just one path,
> and we also support inline MR registration along with the message request.
>
> Basically, the application runs on different kernel versions, and to be
> portable it will check whether the underlying RDS supports ODP and only
> then use RDMA. If not, it will fall back to buffer copy mode. Hope
> it clarifies.

Using an ODP sysctl to determine whether to use RDMA looks like a very
problematic approach. How will old applications work in such a case
without knowledge of the sysctl?
How will new applications distinguish the case where ODP is not
supported but RDMA works?

Thanks

>
>
> Regards,
> Santosh


* Re: [net-next][PATCH v2 1/2] rds: handle unsupported rdma request to fs dax memory
  2019-05-01 17:54     ` Santosh Shilimkar
@ 2019-05-02  6:21       ` Leon Romanovsky
  2019-05-02 17:52         ` Santosh Shilimkar
  0 siblings, 1 reply; 28+ messages in thread
From: Leon Romanovsky @ 2019-05-02  6:21 UTC (permalink / raw)
  To: Santosh Shilimkar; +Cc: netdev, davem

On Wed, May 01, 2019 at 10:54:00AM -0700, Santosh Shilimkar wrote:
> On 5/1/2019 12:44 AM, Leon Romanovsky wrote:
> > On Mon, Apr 29, 2019 at 04:37:19PM -0700, Santosh Shilimkar wrote:
> > > From: Hans Westgaard Ry <hans.westgaard.ry@oracle.com>
> > >
> > > RDS doesn't support RDMA on memory apertures that require On Demand
> > > Paging (ODP), such as FS DAX memory. User applications can try to use
> > > RDS to perform RDMA over such memories and since it doesn't report any
> > > failure, it can lead to unexpected issues like memory corruption when
> > > a couple of out of sync file system operations like ftruncate etc. are
> > > performed.
> > >
> > > The patch adds a check so that such an attempt to RDMA to/from memory
> > > apertures requiring ODP will fail.
> > >
> > > Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com>
> > > Reviewed-tested-by: Zhu Yanjun <yanjun.zhu@oracle.com>
> > > Signed-off-by: Hans Westgaard Ry <hans.westgaard.ry@oracle.com>
> > > Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
> > > ---
> > >   net/rds/rdma.c | 5 +++--
> > >   1 file changed, 3 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/net/rds/rdma.c b/net/rds/rdma.c
> > > index 182ab84..e0a6b72 100644
> > > --- a/net/rds/rdma.c
> > > +++ b/net/rds/rdma.c
> > > @@ -158,8 +158,9 @@ static int rds_pin_pages(unsigned long user_addr, unsigned int nr_pages,
> > >   {
> > >   	int ret;
> > >
> > > -	ret = get_user_pages_fast(user_addr, nr_pages, write, pages);
> > > -
> > > +	/* get_user_pages return -EOPNOTSUPP for fs_dax memory */
> > > +	ret = get_user_pages_longterm(user_addr, nr_pages,
> > > +				      write, pages, NULL);
> >
> > I'm not an RDS expert, but from what I see in net/rds/rdma.c and this code,
> > you tried to mimic ib_umem_get() without its protection, checks and native
> > ODP, FS and DAX support.
> >
> > The real way to solve your ODP problem will require extending
> > ib_umem_get() to work for kernel ULPs too and using it instead of
> > get_user_pages(). We are working on that and it is in internal review now.
> >
> Yes, I am aware of it. For FS_DAX-like memory, get_user_pages_longterm()
> fails, and the memory is then registered as an ODP region using
> ib_reg_user_mr(). This work is not ready yet, and without the above check
> one can do RDMA on FS DAX memory with Fast Reg or FMR memory
> registration, which is not safe, hence the need to fail the operation.
>
> Once the support is added to RDS, this code path will make that
> registration go through.
>
> Hope it clarifies.

Only partially. Why don't you check whether the user asked for ODP through
the verbs interface and return EOPNOTSUPP in that case?

It will ensure that once your code will support ODP properly written
applications will work with/without ODP natively.

Thanks

>
> Regards,
> Santosh
>


* Re: [net-next][PATCH v2 1/2] rds: handle unsupported rdma request to fs dax memory
  2019-05-02  6:21       ` Leon Romanovsky
@ 2019-05-02 17:52         ` Santosh Shilimkar
  2019-05-05  6:28           ` Leon Romanovsky
  0 siblings, 1 reply; 28+ messages in thread
From: Santosh Shilimkar @ 2019-05-02 17:52 UTC (permalink / raw)
  To: Leon Romanovsky; +Cc: netdev, davem, Moni Shoua

On 5/1/2019 11:21 PM, Leon Romanovsky wrote:
> On Wed, May 01, 2019 at 10:54:00AM -0700, Santosh Shilimkar wrote:
>> On 5/1/2019 12:44 AM, Leon Romanovsky wrote:
>>> On Mon, Apr 29, 2019 at 04:37:19PM -0700, Santosh Shilimkar wrote:
>>>> From: Hans Westgaard Ry <hans.westgaard.ry@oracle.com>
>>>>
>>>> RDS doesn't support RDMA on memory apertures that require On Demand
>>>> Paging (ODP), such as FS DAX memory. User applications can try to use
>>>> RDS to perform RDMA over such memories and since it doesn't report any
>>>> failure, it can lead to unexpected issues like memory corruption when
>>>> a couple of out of sync file system operations like ftruncate etc. are
>>>> performed.
>>>>
>>>> The patch adds a check so that such an attempt to RDMA to/from memory
>>>> apertures requiring ODP will fail.
>>>>
>>>> Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com>
>>>> Reviewed-tested-by: Zhu Yanjun <yanjun.zhu@oracle.com>
>>>> Signed-off-by: Hans Westgaard Ry <hans.westgaard.ry@oracle.com>
>>>> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
>>>> ---
>>>>    net/rds/rdma.c | 5 +++--
>>>>    1 file changed, 3 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/net/rds/rdma.c b/net/rds/rdma.c
>>>> index 182ab84..e0a6b72 100644
>>>> --- a/net/rds/rdma.c
>>>> +++ b/net/rds/rdma.c
>>>> @@ -158,8 +158,9 @@ static int rds_pin_pages(unsigned long user_addr, unsigned int nr_pages,
>>>>    {
>>>>    	int ret;
>>>>
>>>> -	ret = get_user_pages_fast(user_addr, nr_pages, write, pages);
>>>> -
>>>> +	/* get_user_pages return -EOPNOTSUPP for fs_dax memory */
>>>> +	ret = get_user_pages_longterm(user_addr, nr_pages,
>>>> +				      write, pages, NULL);
>>>
>>> I'm not an RDS expert, but from what I see in net/rds/rdma.c and this code,
>>> you tried to mimic ib_umem_get() without its protection, checks and native
>>> ODP, FS and DAX support.
>>>
>>> The real way to solve your ODP problem will require extending
>>> ib_umem_get() to work for kernel ULPs too and using it instead of
>>> get_user_pages(). We are working on that and it is in internal review now.
>>>
>> Yes, I am aware of it. For FS_DAX-like memory, get_user_pages_longterm()
>> fails, and the memory is then registered as an ODP region using
>> ib_reg_user_mr(). This work is not ready yet, and without the above check
>> one can do RDMA on FS DAX memory with Fast Reg or FMR memory
>> registration, which is not safe, hence the need to fail the operation.
>>
>> Once the support is added to RDS, this code path will make that
>> registration go through.
>>
>> Hope it clarifies.
> 
> Only partially. Why don't you check whether the user asked for ODP through
> the verbs interface and return EOPNOTSUPP in that case?
>
I think you are mixing two separate things. ODP is just one way of
supporting RDMA on FS DAX memory. Tomorrow, some other mechanism
can be used as well. RDS is just using the built-in kernel mm API
(get_user_pages_longterm) to find out whether it's FS DAX memory.
The current code will make RDS get_mr fail if an RDS application issues
a memory registration request on FS DAX memory, and in future, when
support gets added, it will do the ODP registration and return
the key.

> It will ensure that once your code will support ODP properly written
> applications will work with/without ODP natively.
> 
The application shouldn't care whether the RDS ULP internally uses ODP
or some other mechanism to support RDMA on FS DAX memory.
This makes it transparent to the RDS application.

Regards,
Santosh


* Re: [net-next][PATCH v2 2/2] rds: add sysctl for rds support of On-Demand-Paging
  2019-05-02  6:18       ` Leon Romanovsky
@ 2019-05-02 17:59         ` Santosh Shilimkar
  2019-05-05  6:22           ` Leon Romanovsky
  0 siblings, 1 reply; 28+ messages in thread
From: Santosh Shilimkar @ 2019-05-02 17:59 UTC (permalink / raw)
  To: Leon Romanovsky; +Cc: netdev, davem, Moni Shoua



On 5/1/2019 11:18 PM, Leon Romanovsky wrote:
> On Wed, May 01, 2019 at 10:54:50AM -0700, Santosh Shilimkar wrote:
>> On 5/1/2019 12:45 AM, Leon Romanovsky wrote:
>>> On Mon, Apr 29, 2019 at 04:37:20PM -0700, Santosh Shilimkar wrote:
>>>> RDS doesn't support RDMA on memory apertures that require On Demand
>>>> Paging (ODP), such as FS DAX memory. A sysctl is added to indicate
>>>> whether RDMA requiring ODP is supported.
>>>>
>>>> Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com>
>>>> Reviewed-tested-by: Zhu Yanjun <yanjun.zhu@oracle.com>
>>>> Signed-off-by: Hans Westgaard Ry <hans.westgaard.ry@oracle.com>
>>>> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
>>>> ---
>>>>    net/rds/ib.h        | 1 +
>>>>    net/rds/ib_sysctl.c | 8 ++++++++
>>>>    2 files changed, 9 insertions(+)
>>>
>>> This sysctl is not needed at all
>>>
>> It's needed for the application to check for support of the ODP
>> feature, which is in progress. Failing the RDS_GET_MR was just one path,
>> and we also support inline MR registration along with the message request.
>>
>> Basically, the application runs on different kernel versions, and to be
>> portable it will check whether the underlying RDS supports ODP and only
>> then use RDMA. If not, it will fall back to buffer copy mode. Hope
>> it clarifies.
> 
> Using an ODP sysctl to determine whether to use RDMA looks like a very
> problematic approach. How will old applications work in such a case
> without knowledge of the sysctl?
> How will new applications distinguish the case where ODP is not
> supported but RDMA works?
> 
Actually this is not an ODP sysctl; it really indicates whether RDS supports
RDMA on fs_dax memory or not. I had a different name for the sysctl, but
it got changed in internal review.

Ignoring the name of the sysctl, here is the application logic.
- If the fs_dax sysctl path doesn't exist, no RDMA on FS DAX memory (this
will cover all the older kernels, which don't have this patch).
- If the fs_dax sysctl path exists and its value is 0, no RDMA on FS
DAX. This will cover kernels which have this patch but don't have
actual support for ODP-based registration.
- If the fs_dax sysctl path exists and its value is 1, RDMA can be
issued on FS DAX memory. This sysctl will be updated to value 1
once the support gets added.

Hope it clarifies better now.

Regards,
Santosh
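
Editorial illustration, not part of the thread: a minimal sketch of the
three-way application check listed in the message above. It assumes the
sysctl is exposed as /proc/sys/net/rds/ib/odp_support; that path is an
assumption, since the patch only shows the procname "odp_support". The enum
and function names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Assumed location of the sysctl; only the procname is given in the patch. */
#define RDS_ODP_SYSCTL "/proc/sys/net/rds/ib/odp_support"

/* The three cases from the application logic in the message above. */
enum fsdax_rdma {
	FSDAX_RDMA_NO_SYSCTL,	/* path missing: kernel predates the patch */
	FSDAX_RDMA_UNSUPPORTED,	/* path exists, value is 0 */
	FSDAX_RDMA_SUPPORTED,	/* path exists, value is 1 */
};

/* Read the sysctl file at 'path' and classify it. */
static enum fsdax_rdma probe_fsdax_rdma(const char *path)
{
	FILE *f;
	int val = 0;
	int n;

	assert(path != NULL);

	f = fopen(path, "r");
	if (!f)
		return FSDAX_RDMA_NO_SYSCTL;

	n = fscanf(f, "%d", &val);
	fclose(f);

	/* Anything other than a readable "1" is treated as unsupported. */
	if (n != 1 || val != 1)
		return FSDAX_RDMA_UNSUPPORTED;
	return FSDAX_RDMA_SUPPORTED;
}

/*
 * Returns nonzero only when the kernel explicitly advertises support;
 * in both other cases the application would stay in buffer copy mode.
 */
static int may_rdma_on_fsdax(const char *path)
{
	return probe_fsdax_rdma(path) == FSDAX_RDMA_SUPPORTED;
}
```

An application would call may_rdma_on_fsdax(RDS_ODP_SYSCTL) once at startup.
Note that this check is purely advisory: it does not stop an application
that ignores the sysctl from attempting the registration anyway.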


* Re: [net-next][PATCH v2 2/2] rds: add sysctl for rds support of On-Demand-Paging
  2019-05-02 17:59         ` Santosh Shilimkar
@ 2019-05-05  6:22           ` Leon Romanovsky
  2019-05-06 16:37             ` Santosh Shilimkar
  0 siblings, 1 reply; 28+ messages in thread
From: Leon Romanovsky @ 2019-05-05  6:22 UTC (permalink / raw)
  To: Santosh Shilimkar; +Cc: netdev, davem, Moni Shoua

On Thu, May 02, 2019 at 10:59:58AM -0700, Santosh Shilimkar wrote:
>
>
> On 5/1/2019 11:18 PM, Leon Romanovsky wrote:
> > On Wed, May 01, 2019 at 10:54:50AM -0700, Santosh Shilimkar wrote:
> > > On 5/1/2019 12:45 AM, Leon Romanovsky wrote:
> > > > On Mon, Apr 29, 2019 at 04:37:20PM -0700, Santosh Shilimkar wrote:
> > > > > RDS doesn't support RDMA on memory apertures that require On Demand
> > > > > Paging (ODP), such as FS DAX memory. A sysctl is added to indicate
> > > > > whether RDMA requiring ODP is supported.
> > > > >
> > > > > Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com>
> > > > > Reviewed-tested-by: Zhu Yanjun <yanjun.zhu@oracle.com>
> > > > > Signed-off-by: Hans Westgaard Ry <hans.westgaard.ry@oracle.com>
> > > > > Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
> > > > > ---
> > > > >    net/rds/ib.h        | 1 +
> > > > >    net/rds/ib_sysctl.c | 8 ++++++++
> > > > >    2 files changed, 9 insertions(+)
> > > >
> > > > This sysctl is not needed at all
> > > >
> > > It's needed for the application to check for support of the ODP
> > > feature, which is in progress. Failing the RDS_GET_MR was just one path,
> > > and we also support inline MR registration along with the message request.
> > >
> > > Basically, the application runs on different kernel versions, and to be
> > > portable it will check whether the underlying RDS supports ODP and only
> > > then use RDMA. If not, it will fall back to buffer copy mode. Hope
> > > it clarifies.
> >
> > Using an ODP sysctl to determine whether to use RDMA looks like a very
> > problematic approach. How will old applications work in such a case
> > without knowledge of the sysctl?
> > How will new applications distinguish the case where ODP is not
> > supported but RDMA works?
> >
> Actually this is not an ODP sysctl; it really indicates whether RDS supports
> RDMA on fs_dax memory or not. I had a different name for the sysctl, but
> it got changed in internal review.
>
> Ignoring the name of the sysctl, here is the application logic.
> - If the fs_dax sysctl path doesn't exist, no RDMA on FS DAX memory (this
> will cover all the older kernels, which don't have this patch).
> - If the fs_dax sysctl path exists and its value is 0, no RDMA on FS
> DAX. This will cover kernels which have this patch but don't have
> actual support for ODP-based registration.
> - If the fs_dax sysctl path exists and its value is 1, RDMA can be
> issued on FS DAX memory. This sysctl will be updated to value 1
> once the support gets added.
>
> Hope it clarifies better now.

Santosh,

Thanks for the explanation. I have one more question.

If I'm the author of a hostile application and write code that disregards
the new sysctl, will any combination of kernel and application cause a
kernel panic? If not, we don't really need to expose this information;
if yes, this sysctl is not enough.

Thanks

>
> Regards,
> Santosh


* Re: [net-next][PATCH v2 1/2] rds: handle unsupported rdma request to fs dax memory
  2019-05-02 17:52         ` Santosh Shilimkar
@ 2019-05-05  6:28           ` Leon Romanovsky
  2019-05-06 16:39             ` Santosh Shilimkar
  0 siblings, 1 reply; 28+ messages in thread
From: Leon Romanovsky @ 2019-05-05  6:28 UTC (permalink / raw)
  To: Santosh Shilimkar; +Cc: netdev, davem, Moni Shoua

On Thu, May 02, 2019 at 10:52:23AM -0700, Santosh Shilimkar wrote:
> On 5/1/2019 11:21 PM, Leon Romanovsky wrote:
> > On Wed, May 01, 2019 at 10:54:00AM -0700, Santosh Shilimkar wrote:
> > > On 5/1/2019 12:44 AM, Leon Romanovsky wrote:
> > > > On Mon, Apr 29, 2019 at 04:37:19PM -0700, Santosh Shilimkar wrote:
> > > > > From: Hans Westgaard Ry <hans.westgaard.ry@oracle.com>
> > > > >
> > > > > RDS doesn't support RDMA on memory apertures that require On Demand
> > > > > Paging (ODP), such as FS DAX memory. User applications can try to use
> > > > > RDS to perform RDMA over such memories and since it doesn't report any
> > > > > failure, it can lead to unexpected issues like memory corruption when
> > > > > a couple of out of sync file system operations like ftruncate etc. are
> > > > > performed.
> > > > >
> > > > > The patch adds a check so that such an attempt to RDMA to/from memory
> > > > > apertures requiring ODP will fail.
> > > > >
> > > > > Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com>
> > > > > Reviewed-tested-by: Zhu Yanjun <yanjun.zhu@oracle.com>
> > > > > Signed-off-by: Hans Westgaard Ry <hans.westgaard.ry@oracle.com>
> > > > > Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
> > > > > ---
> > > > >    net/rds/rdma.c | 5 +++--
> > > > >    1 file changed, 3 insertions(+), 2 deletions(-)
> > > > >
> > > > > diff --git a/net/rds/rdma.c b/net/rds/rdma.c
> > > > > index 182ab84..e0a6b72 100644
> > > > > --- a/net/rds/rdma.c
> > > > > +++ b/net/rds/rdma.c
> > > > > @@ -158,8 +158,9 @@ static int rds_pin_pages(unsigned long user_addr, unsigned int nr_pages,
> > > > >    {
> > > > >    	int ret;
> > > > >
> > > > > -	ret = get_user_pages_fast(user_addr, nr_pages, write, pages);
> > > > > -
> > > > > +	/* get_user_pages return -EOPNOTSUPP for fs_dax memory */
> > > > > +	ret = get_user_pages_longterm(user_addr, nr_pages,
> > > > > +				      write, pages, NULL);
> > > >
> > > > I'm not RDS expert, but from what I see in net/rds/rdma.c and this code,
> > > > you tried to mimic ib_umem_get() without protection, checks and native
> > > > ODP, FS and DAX supports.
> > > >
> > > > The real way to solve your ODP problem will require to extend
> > > > ib_umem_get() to work for kernel ULPs too and use it instead of
> > > > get_user_pages(). We are working on that and it is in internal review now.
> > > >
> > > Yes am aware of it. For FS_DAX like memory,  get_user_pages_longterm()
> > > fails and then using ib_reg_user_mr() the memory is registered as
> > > ODP regsion. This work is not ready yet and without above check,
> > > one can do RDMA on FS DAX memory with Fast Reg or FMR memory
> > > registration which is not safe and hence need to fail the operation.
> > >
> > > Once the support is added to RDS, this code path will make that
> > > registration go through.
> > >
> > > Hope it clarifies.
> >
> > Only partial, why don't you check if user asked ODP through verbs
> > interface and return EOPNOTSUPP in such case?
> >
> I think you are mixing two separate things. ODP is just one way of
> supporting RDMA on FS DAX memory. Tomorrow, some other mechanism
> can be used as well. RDS is just using inbuilt kernel mm API
> to find out if its FS DAX memory(get_user_pages_longterm).
> Current code will make RDS get_mr fail if RDS application issues
> memory registration request on FS DAX memory and in future when
> support gets added, it will do the ODP registration and return
> the key.

But we are talking about kernel code only, right?
Future support will be added if it exists.

>
> > It will ensure that once your code will support ODP properly written
> > applications will work with/without ODP natively.
> >
> Application shouldn't care if RDS ULP internally uses ODP
> or some other mechanism to support RDMA on FS DAX memory.
> This makes it transparent it to RDS application.

ODP checks need to be internal to the kernel; users won't see those
checks.

Thanks

>
> Regards,
> Santosh

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [net-next][PATCH v2 2/2] rds: add sysctl for rds support of On-Demand-Paging
  2019-05-05  6:22           ` Leon Romanovsky
@ 2019-05-06 16:37             ` Santosh Shilimkar
  0 siblings, 0 replies; 28+ messages in thread
From: Santosh Shilimkar @ 2019-05-06 16:37 UTC (permalink / raw)
  To: Leon Romanovsky; +Cc: netdev, davem, Moni Shoua

On 5/4/2019 11:22 PM, Leon Romanovsky wrote:
> On Thu, May 02, 2019 at 10:59:58AM -0700, Santosh Shilimkar wrote:
>>
>>
>> On 5/1/2019 11:18 PM, Leon Romanovsky wrote:
>>> On Wed, May 01, 2019 at 10:54:50AM -0700, Santosh Shilimkar wrote:
>>>> On 5/1/2019 12:45 AM, Leon Romanovsky wrote:
>>>>> On Mon, Apr 29, 2019 at 04:37:20PM -0700, Santosh Shilimkar wrote:
>>>>>> RDS doesn't support RDMA on memory apertures that require On Demand
>>>>>> Paging (ODP), such as FS DAX memory. A sysctl is added to indicate
>>>>>> whether RDMA requiring ODP is supported.
>>>>>>
>>>>>> Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com>
>>>>>> Reviewed-tested-by: Zhu Yanjun <yanjun.zhu@oracle.com>
>>>>>> Signed-off-by: Hans Westgaard Ry <hans.westgaard.ry@oracle.com>
>>>>>> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
>>>>>> ---
>>>>>>     net/rds/ib.h        | 1 +
>>>>>>     net/rds/ib_sysctl.c | 8 ++++++++
>>>>>>     2 files changed, 9 insertions(+)
>>>>>
>>>>> This sysctl is not needed at all
>>>>>
>>>> Its needed for application to check the support of the ODP support
>>>> feature which in progress. Failing the RDS_GET_MR was just one path
>>>> and we also support inline MR registration along with message request.
>>>>
>>>> Basically application runs on different kernel versions and to be
>>>> portable, it will check if underneath RDS support ODP and then only
>>>> use RDMA. If not it will fallback to buffer copy mode. Hope
>>>> it clarifies.
>>>
>>> Using ODP sysctl to determine if to use RDMA or not, looks like very
>>> problematic approach. How old applications will work in such case
>>> without knowledge of such sysctl?
>>> How new applications will distinguish between ODP is not supported, but
>>> RDMA works?
>>>
>> Actually this is not ODP sysctl but really whether RDS supports
>> RDMA on fs_dax memory or not. I had different name for sysctl but
>> in internal review it got changed.
>>
>> Ignoring the name of the sysctl, here is the application logic.
>> - If fs_dax sysctl path doesn't exist, no RDMA on FS DAX memory(this
>> will cover all the older kernels, which doesn't have this patch)
>> - If fs_dax sysctl path exist and its value is 0, no RDMA on FS
>> DAX. This will cover kernels which this patch but don't have
>> actual support for ODP based registration.
>> - If fs_dax sysctl path exist and its value is 1, RDMA can be
>> issued on FS DAX memory. This sysctl will be updated to value 1
>> once the support gets added.
>>
>> Hope it clarifies better now.
> 
> Santosh,
> 
> Thanks for explanation, I have one more question,
> 
> If I'm author of hostile application and write code to disregard that
> new sysctl, will any of combinations of kernel/application cause to
> kernel panic? If not, we don't really need to expose this information,
> if yes, this sysctl is not enough.
> 
It won't panic. That's why the other patch also makes the call fail when
an application tries to register FS DAX memory with RDS.
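
A minimal sketch of the three-way application logic quoted above, written
from the application side. The path /proc/sys/net/rds/ib/odp_support is an
assumption, derived from the "odp_support" procname in patch 2/2 and the
directory where the other RDS IB sysctls live; the enum and function names
are hypothetical and do not come from any RDS header.

```c
#include <stdio.h>

/* Hypothetical names, for illustration only. */
enum rds_fsdax_rdma {
	RDS_FSDAX_RDMA_UNKNOWN = -1,	/* sysctl absent or unreadable: older kernel */
	RDS_FSDAX_RDMA_NO = 0,		/* sysctl present, value 0 */
	RDS_FSDAX_RDMA_YES = 1,		/* sysctl present, value 1 */
};

/*
 * Read the sysctl file at 'path' and map it onto the three cases.
 * The caller passes the path so the (assumed) location is not
 * hard-coded here.
 */
static enum rds_fsdax_rdma rds_probe_fsdax_rdma(const char *path)
{
	FILE *f = fopen(path, "r");
	int val = 0;
	int n;

	if (!f)
		return RDS_FSDAX_RDMA_UNKNOWN;

	n = fscanf(f, "%d", &val);
	fclose(f);

	if (n != 1)
		return RDS_FSDAX_RDMA_UNKNOWN;

	return val == 1 ? RDS_FSDAX_RDMA_YES : RDS_FSDAX_RDMA_NO;
}
```

Both UNKNOWN and NO mean the application should not issue RDMA on FS DAX
buffers; keeping them distinct only helps with logging.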



^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [net-next][PATCH v2 1/2] rds: handle unsupported rdma request to fs dax memory
  2019-05-05  6:28           ` Leon Romanovsky
@ 2019-05-06 16:39             ` Santosh Shilimkar
  0 siblings, 0 replies; 28+ messages in thread
From: Santosh Shilimkar @ 2019-05-06 16:39 UTC (permalink / raw)
  To: Leon Romanovsky; +Cc: netdev, davem, Moni Shoua



On 5/4/2019 11:28 PM, Leon Romanovsky wrote:
> On Thu, May 02, 2019 at 10:52:23AM -0700, Santosh Shilimkar wrote:
>> On 5/1/2019 11:21 PM, Leon Romanovsky wrote:
>>> On Wed, May 01, 2019 at 10:54:00AM -0700, Santosh Shilimkar wrote:
>>>> On 5/1/2019 12:44 AM, Leon Romanovsky wrote:
>>>>> On Mon, Apr 29, 2019 at 04:37:19PM -0700, Santosh Shilimkar wrote:
>>>>>> From: Hans Westgaard Ry <hans.westgaard.ry@oracle.com>
>>>>>>
>>>>>> RDS doesn't support RDMA on memory apertures that require On Demand
>>>>>> Paging (ODP), such as FS DAX memory. User applications can try to use
>>>>>> RDS to perform RDMA over such memories and since it doesn't report any
>>>>>> failure, it can lead to unexpected issues like memory corruption when
>>>>>> a couple of out of sync file system operations like ftruncate etc. are
>>>>>> performed.
>>>>>>
>>>>>> The patch adds a check so that such an attempt to RDMA to/from memory
>>>>>> apertures requiring ODP will fail.
>>>>>>
>>>>>> Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com>
>>>>>> Reviewed-tested-by: Zhu Yanjun <yanjun.zhu@oracle.com>
>>>>>> Signed-off-by: Hans Westgaard Ry <hans.westgaard.ry@oracle.com>
>>>>>> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
>>>>>> ---
>>>>>>     net/rds/rdma.c | 5 +++--
>>>>>>     1 file changed, 3 insertions(+), 2 deletions(-)
>>>>>>
>>>>>> diff --git a/net/rds/rdma.c b/net/rds/rdma.c
>>>>>> index 182ab84..e0a6b72 100644
>>>>>> --- a/net/rds/rdma.c
>>>>>> +++ b/net/rds/rdma.c
>>>>>> @@ -158,8 +158,9 @@ static int rds_pin_pages(unsigned long user_addr, unsigned int nr_pages,
>>>>>>     {
>>>>>>     	int ret;
>>>>>>
>>>>>> -	ret = get_user_pages_fast(user_addr, nr_pages, write, pages);
>>>>>> -
>>>>>> +	/* get_user_pages return -EOPNOTSUPP for fs_dax memory */
>>>>>> +	ret = get_user_pages_longterm(user_addr, nr_pages,
>>>>>> +				      write, pages, NULL);
>>>>>
>>>>> I'm not RDS expert, but from what I see in net/rds/rdma.c and this code,
>>>>> you tried to mimic ib_umem_get() without protection, checks and native
>>>>> ODP, FS and DAX supports.
>>>>>
>>>>> The real way to solve your ODP problem will require to extend
>>>>> ib_umem_get() to work for kernel ULPs too and use it instead of
>>>>> get_user_pages(). We are working on that and it is in internal review now.
>>>>>
>>>> Yes am aware of it. For FS_DAX like memory,  get_user_pages_longterm()
>>>> fails and then using ib_reg_user_mr() the memory is registered as
>>>> ODP regsion. This work is not ready yet and without above check,
>>>> one can do RDMA on FS DAX memory with Fast Reg or FMR memory
>>>> registration which is not safe and hence need to fail the operation.
>>>>
>>>> Once the support is added to RDS, this code path will make that
>>>> registration go through.
>>>>
>>>> Hope it clarifies.
>>>
>>> Only partial, why don't you check if user asked ODP through verbs
>>> interface and return EOPNOTSUPP in such case?
>>>
>> I think you are mixing two separate things. ODP is just one way of
>> supporting RDMA on FS DAX memory. Tomorrow, some other mechanism
>> can be used as well. RDS is just using inbuilt kernel mm API
>> to find out if its FS DAX memory(get_user_pages_longterm).
>> Current code will make RDS get_mr fail if RDS application issues
>> memory registration request on FS DAX memory and in future when
>> support gets added, it will do the ODP registration and return
>> the key.
> 
> But we are talking about kernel code only, right?
> Future support will be added if it exists.
> 
Yes, kernel code only.

>>
>>> It will ensure that once your code will support ODP properly written
>>> applications will work with/without ODP natively.
>>>
>> Application shouldn't care if RDS ULP internally uses ODP
>> or some other mechanism to support RDMA on FS DAX memory.
>> This makes it transparent it to RDS application.
> 
> ODP checks need to be internal to kernel, user won't see those ODP
> checks.
> 
Correct. The check is within RDS kernel module.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [net-next][PATCH v2 1/2] rds: handle unsupported rdma request to fs dax memory
  2019-04-29 23:37 ` [net-next][PATCH v2 1/2] " Santosh Shilimkar
  2019-05-01  7:44   ` Leon Romanovsky
@ 2019-05-10 12:54   ` Jason Gunthorpe
  2019-05-10 16:11     ` Santosh Shilimkar
  1 sibling, 1 reply; 28+ messages in thread
From: Jason Gunthorpe @ 2019-05-10 12:54 UTC (permalink / raw)
  To: Santosh Shilimkar; +Cc: netdev, davem

On Mon, Apr 29, 2019 at 04:37:19PM -0700, Santosh Shilimkar wrote:
> From: Hans Westgaard Ry <hans.westgaard.ry@oracle.com>
> 
> RDS doesn't support RDMA on memory apertures that require On Demand
> Paging (ODP), such as FS DAX memory. User applications can try to use
> RDS to perform RDMA over such memories and since it doesn't report any
> failure, it can lead to unexpected issues like memory corruption when
> a couple of out of sync file system operations like ftruncate etc. are
> performed.

This comment doesn't make any sense..

> The patch adds a check so that such an attempt to RDMA to/from memory
> apertures requiring ODP will fail.
> 
> Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com>
> Reviewed-tested-by: Zhu Yanjun <yanjun.zhu@oracle.com>
> Signed-off-by: Hans Westgaard Ry <hans.westgaard.ry@oracle.com>
> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
>  net/rds/rdma.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/net/rds/rdma.c b/net/rds/rdma.c
> index 182ab84..e0a6b72 100644
> +++ b/net/rds/rdma.c
> @@ -158,8 +158,9 @@ static int rds_pin_pages(unsigned long user_addr, unsigned int nr_pages,
>  {
>  	int ret;
>  
> -	ret = get_user_pages_fast(user_addr, nr_pages, write, pages);
> -
> +	/* get_user_pages return -EOPNOTSUPP for fs_dax memory */
> +	ret = get_user_pages_longterm(user_addr, nr_pages,
> +				      write, pages, NULL);

GUP is supposed to fully work on DAX filesystems.

You only need to switch to the long term version if the duration of
the GUP is under control of user space - ie it may last forever.

Short duration pins in the kernel do not need long term. 

At a minimum the commit message needs re-writing to properly explain
the motivation here.

Jason

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [net-next][PATCH v2 2/2] rds: add sysctl for rds support of On-Demand-Paging
  2019-04-29 23:37 ` [net-next][PATCH v2 2/2] rds: add sysctl for rds support of On-Demand-Paging Santosh Shilimkar
  2019-05-01  7:45   ` Leon Romanovsky
@ 2019-05-10 13:02   ` Jason Gunthorpe
  2019-05-10 16:13     ` Santosh Shilimkar
  1 sibling, 1 reply; 28+ messages in thread
From: Jason Gunthorpe @ 2019-05-10 13:02 UTC (permalink / raw)
  To: Santosh Shilimkar; +Cc: netdev, davem

On Mon, Apr 29, 2019 at 04:37:20PM -0700, Santosh Shilimkar wrote:
> RDS doesn't support RDMA on memory apertures that require On Demand
> Paging (ODP), such as FS DAX memory. A sysctl is added to indicate
> whether RDMA requiring ODP is supported.
> 
> Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com>
> Reviewed-tested-by: Zhu Yanjun <yanjun.zhu@oracle.com>
> Signed-off-by: Hans Westgaard Ry <hans.westgaard.ry@oracle.com>
> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
>  net/rds/ib.h        | 1 +
>  net/rds/ib_sysctl.c | 8 ++++++++
>  2 files changed, 9 insertions(+)
> 
> diff --git a/net/rds/ib.h b/net/rds/ib.h
> index 67a715b..80e11ef 100644
> +++ b/net/rds/ib.h
> @@ -457,5 +457,6 @@ unsigned int rds_ib_stats_info_copy(struct rds_info_iterator *iter,
>  extern unsigned long rds_ib_sysctl_max_unsig_bytes;
>  extern unsigned long rds_ib_sysctl_max_recv_allocation;
>  extern unsigned int rds_ib_sysctl_flow_control;
> +extern unsigned int rds_ib_sysctl_odp_support;
>  
>  #endif
> diff --git a/net/rds/ib_sysctl.c b/net/rds/ib_sysctl.c
> index e4e41b3..7cc02cd 100644
> +++ b/net/rds/ib_sysctl.c
> @@ -60,6 +60,7 @@
>   * will cause credits to be added before protocol negotiation.
>   */
>  unsigned int rds_ib_sysctl_flow_control = 0;
> +unsigned int rds_ib_sysctl_odp_support;
>  
>  static struct ctl_table rds_ib_sysctl_table[] = {
>  	{
> @@ -103,6 +104,13 @@
>  		.mode		= 0644,
>  		.proc_handler	= proc_dointvec,
>  	},
> +	{
> +		.procname       = "odp_support",
> +		.data           = &rds_ib_sysctl_odp_support,
> +		.maxlen         = sizeof(rds_ib_sysctl_odp_support),
> +		.mode           = 0444,
> +		.proc_handler   = proc_dointvec,
> +	},
>  	{ }
>  };

Using a read-only sysctl as a capability negotiation scheme seems
horrible to me.

Jason  

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [net-next][PATCH v2 1/2] rds: handle unsupported rdma request to fs dax memory
  2019-05-10 12:54   ` Jason Gunthorpe
@ 2019-05-10 16:11     ` Santosh Shilimkar
  2019-05-10 17:55       ` Jason Gunthorpe
  0 siblings, 1 reply; 28+ messages in thread
From: Santosh Shilimkar @ 2019-05-10 16:11 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: netdev, davem, Hans Westgaard Ry

On 5/10/2019 5:54 AM, Jason Gunthorpe wrote:
> On Mon, Apr 29, 2019 at 04:37:19PM -0700, Santosh Shilimkar wrote:
>> From: Hans Westgaard Ry <hans.westgaard.ry@oracle.com>
>>
>> RDS doesn't support RDMA on memory apertures that require On Demand
>> Paging (ODP), such as FS DAX memory. User applications can try to use
>> RDS to perform RDMA over such memories and since it doesn't report any
>> failure, it can lead to unexpected issues like memory corruption when
>> a couple of out of sync file system operations like ftruncate etc. are
>> performed.
> 
> This comment doesn't make any sense..
>
>> The patch adds a check so that such an attempt to RDMA to/from memory
>> apertures requiring ODP will fail.
>>
>> Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com>
>> Reviewed-tested-by: Zhu Yanjun <yanjun.zhu@oracle.com>
>> Signed-off-by: Hans Westgaard Ry <hans.westgaard.ry@oracle.com>
>> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
>>   net/rds/rdma.c | 5 +++--
>>   1 file changed, 3 insertions(+), 2 deletions(-)
>>
>> diff --git a/net/rds/rdma.c b/net/rds/rdma.c
>> index 182ab84..e0a6b72 100644
>> +++ b/net/rds/rdma.c
>> @@ -158,8 +158,9 @@ static int rds_pin_pages(unsigned long user_addr, unsigned int nr_pages,
>>   {
>>   	int ret;
>>   
>> -	ret = get_user_pages_fast(user_addr, nr_pages, write, pages);
>> -
>> +	/* get_user_pages return -EOPNOTSUPP for fs_dax memory */
>> +	ret = get_user_pages_longterm(user_addr, nr_pages,
>> +				      write, pages, NULL);
> 
> GUP is supposed to fully work on DAX filesystems.
>
The code comment above has a typo. It should have read:
get_user_pages_longterm returns -EOPNOTSUPP.

> You only need to switch to the long term version if the duration of
> the GUP is under control of user space - ie it may last forever.
>
> Short duration pins in the kernel do not need long term.
>
That's true, but the intention here is to use the long-term version
because it checks for FS DAX memory. Instead of calling an accessor
directly to check for a DAX memory region, the long-term version of
the API is used.

> At a minimum the commit message needs re-writing to properly explain
> the motivation here.
> 
The commit message does try to describe the motivation, mostly in terms
of the problems that arise when the call is not made to fail. The typo
in the code comment was misleading.

Regards,
Santosh

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [net-next][PATCH v2 2/2] rds: add sysctl for rds support of On-Demand-Paging
  2019-05-10 13:02   ` Jason Gunthorpe
@ 2019-05-10 16:13     ` Santosh Shilimkar
  0 siblings, 0 replies; 28+ messages in thread
From: Santosh Shilimkar @ 2019-05-10 16:13 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: netdev, davem



On 5/10/2019 6:02 AM, Jason Gunthorpe wrote:
> On Mon, Apr 29, 2019 at 04:37:20PM -0700, Santosh Shilimkar wrote:
>> RDS doesn't support RDMA on memory apertures that require On Demand
>> Paging (ODP), such as FS DAX memory. A sysctl is added to indicate
>> whether RDMA requiring ODP is supported.
>>
>> Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com>
>> Reviewed-tested-by: Zhu Yanjun <yanjun.zhu@oracle.com>
>> Signed-off-by: Hans Westgaard Ry <hans.westgaard.ry@oracle.com>
>> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
>>   net/rds/ib.h        | 1 +
>>   net/rds/ib_sysctl.c | 8 ++++++++
>>   2 files changed, 9 insertions(+)
>>
>> diff --git a/net/rds/ib.h b/net/rds/ib.h
>> index 67a715b..80e11ef 100644
>> +++ b/net/rds/ib.h
>> @@ -457,5 +457,6 @@ unsigned int rds_ib_stats_info_copy(struct rds_info_iterator *iter,
>>   extern unsigned long rds_ib_sysctl_max_unsig_bytes;
>>   extern unsigned long rds_ib_sysctl_max_recv_allocation;
>>   extern unsigned int rds_ib_sysctl_flow_control;
>> +extern unsigned int rds_ib_sysctl_odp_support;
>>   
>>   #endif
>> diff --git a/net/rds/ib_sysctl.c b/net/rds/ib_sysctl.c
>> index e4e41b3..7cc02cd 100644
>> +++ b/net/rds/ib_sysctl.c
>> @@ -60,6 +60,7 @@
>>    * will cause credits to be added before protocol negotiation.
>>    */
>>   unsigned int rds_ib_sysctl_flow_control = 0;
>> +unsigned int rds_ib_sysctl_odp_support;
>>   
>>   static struct ctl_table rds_ib_sysctl_table[] = {
>>   	{
>> @@ -103,6 +104,13 @@
>>   		.mode		= 0644,
>>   		.proc_handler	= proc_dointvec,
>>   	},
>> +	{
>> +		.procname       = "odp_support",
>> +		.data           = &rds_ib_sysctl_odp_support,
>> +		.maxlen         = sizeof(rds_ib_sysctl_odp_support),
>> +		.mode           = 0444,
>> +		.proc_handler   = proc_dointvec,
>> +	},
>>   	{ }
>>   };
> 
> using a read-only sysctl as a capability negotiation scheme seems
> horrible to me
>
Do you have a suggestion? I was thinking of adding a socket option but
didn't pursue it further.

Regards,
Santosh


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [net-next][PATCH v2 1/2] rds: handle unsupported rdma request to fs dax memory
  2019-05-10 16:11     ` Santosh Shilimkar
@ 2019-05-10 17:55       ` Jason Gunthorpe
  2019-05-10 18:02         ` santosh.shilimkar
  0 siblings, 1 reply; 28+ messages in thread
From: Jason Gunthorpe @ 2019-05-10 17:55 UTC (permalink / raw)
  To: Santosh Shilimkar; +Cc: netdev, davem, Hans Westgaard Ry

On Fri, May 10, 2019 at 09:11:24AM -0700, Santosh Shilimkar wrote:
> On 5/10/2019 5:54 AM, Jason Gunthorpe wrote:
> > On Mon, Apr 29, 2019 at 04:37:19PM -0700, Santosh Shilimkar wrote:
> > > From: Hans Westgaard Ry <hans.westgaard.ry@oracle.com>
> > > 
> > > RDS doesn't support RDMA on memory apertures that require On Demand
> > > Paging (ODP), such as FS DAX memory. User applications can try to use
> > > RDS to perform RDMA over such memories and since it doesn't report any
> > > failure, it can lead to unexpected issues like memory corruption when
> > > a couple of out of sync file system operations like ftruncate etc. are
> > > performed.
> > 
> > This comment doesn't make any sense..
> > 
> > > The patch adds a check so that such an attempt to RDMA to/from memory
> > > apertures requiring ODP will fail.
> > > 
> > > Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com>
> > > Reviewed-tested-by: Zhu Yanjun <yanjun.zhu@oracle.com>
> > > Signed-off-by: Hans Westgaard Ry <hans.westgaard.ry@oracle.com>
> > > Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
> > >   net/rds/rdma.c | 5 +++--
> > >   1 file changed, 3 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/net/rds/rdma.c b/net/rds/rdma.c
> > > index 182ab84..e0a6b72 100644
> > > +++ b/net/rds/rdma.c
> > > @@ -158,8 +158,9 @@ static int rds_pin_pages(unsigned long user_addr, unsigned int nr_pages,
> > >   {
> > >   	int ret;
> > > -	ret = get_user_pages_fast(user_addr, nr_pages, write, pages);
> > > -
> > > +	/* get_user_pages return -EOPNOTSUPP for fs_dax memory */
> > > +	ret = get_user_pages_longterm(user_addr, nr_pages,
> > > +				      write, pages, NULL);
> > 
> > GUP is supposed to fully work on DAX filesystems.
> > 
> Above comment has typo. Should have been
> get_user_pages_longterm return -EOPNOTSUPP.
> 
> > You only need to switch to the long term version if the duration of
> > the GUP is under control of user space - ie it may last forever.
> > 
> > Short duration pins in the kernel do not need long term.
> > 
> Thats true but the intention here is to use the long term version
> which does check for the FS DAX memory. Instead of calling direct
> accessor to check DAX memory region, longterm version of the API
> is used
> 
> > At a minimum the commit message needs re-writing to properly explain
> > the motivation here.
> > 
> Commit is actually trying to describe the motivation describing more of
> issues of not making the call fail. The code comment typo was
> misleading.

Every single sentence in the commit message is wrong

Jason

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [net-next][PATCH v2 1/2] rds: handle unsupported rdma request to fs dax memory
  2019-05-10 17:55       ` Jason Gunthorpe
@ 2019-05-10 18:02         ` santosh.shilimkar
  2019-05-10 18:07           ` Jason Gunthorpe
  0 siblings, 1 reply; 28+ messages in thread
From: santosh.shilimkar @ 2019-05-10 18:02 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: netdev, davem, Hans Westgaard Ry



On 5/10/19 10:55 AM, Jason Gunthorpe wrote:
> On Fri, May 10, 2019 at 09:11:24AM -0700, Santosh Shilimkar wrote:
>> On 5/10/2019 5:54 AM, Jason Gunthorpe wrote:
>>> On Mon, Apr 29, 2019 at 04:37:19PM -0700, Santosh Shilimkar wrote:
>>>> From: Hans Westgaard Ry <hans.westgaard.ry@oracle.com>
>>>>
>>>> RDS doesn't support RDMA on memory apertures that require On Demand
>>>> Paging (ODP), such as FS DAX memory. User applications can try to use
>>>> RDS to perform RDMA over such memories and since it doesn't report any
>>>> failure, it can lead to unexpected issues like memory corruption when
>>>> a couple of out of sync file system operations like ftruncate etc. are
>>>> performed.
>>>
>>> This comment doesn't make any sense..
>>>
>>>> The patch adds a check so that such an attempt to RDMA to/from memory
>>>> apertures requiring ODP will fail.
>>>>
>>>> Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com>
>>>> Reviewed-tested-by: Zhu Yanjun <yanjun.zhu@oracle.com>
>>>> Signed-off-by: Hans Westgaard Ry <hans.westgaard.ry@oracle.com>
>>>> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
>>>>    net/rds/rdma.c | 5 +++--
>>>>    1 file changed, 3 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/net/rds/rdma.c b/net/rds/rdma.c
>>>> index 182ab84..e0a6b72 100644
>>>> +++ b/net/rds/rdma.c
>>>> @@ -158,8 +158,9 @@ static int rds_pin_pages(unsigned long user_addr, unsigned int nr_pages,
>>>>    {
>>>>    	int ret;
>>>> -	ret = get_user_pages_fast(user_addr, nr_pages, write, pages);
>>>> -
>>>> +	/* get_user_pages return -EOPNOTSUPP for fs_dax memory */
>>>> +	ret = get_user_pages_longterm(user_addr, nr_pages,
>>>> +				      write, pages, NULL);
>>>
>>> GUP is supposed to fully work on DAX filesystems.
>>>
>> Above comment has typo. Should have been
>> get_user_pages_longterm return -EOPNOTSUPP.
>>
>>> You only need to switch to the long term version if the duration of
>>> the GUP is under control of user space - ie it may last forever.
>>>
>>> Short duration pins in the kernel do not need long term.
>>>
>> Thats true but the intention here is to use the long term version
>> which does check for the FS DAX memory. Instead of calling direct
>> accessor to check DAX memory region, longterm version of the API
>> is used
>>
>>> At a minimum the commit message needs re-writing to properly explain
>>> the motivation here.
>>>
>> Commit is actually trying to describe the motivation describing more of
>> issues of not making the call fail. The code comment typo was
>> misleading.
> 
> Every single sentence in the commit message is wrong
> 
I will rewrite the commit message, but can you please comment on the
other questions above? The long-term GUP variant was used to detect
whether the memory is fs_dax, which could be misleading since RDS MRs
are short-lived. Do you want us to use an accessor instead to check
whether it is FS DAX memory?

Regards,
Santosh

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [net-next][PATCH v2 1/2] rds: handle unsupported rdma request to fs dax memory
  2019-05-10 18:02         ` santosh.shilimkar
@ 2019-05-10 18:07           ` Jason Gunthorpe
  2019-05-10 18:58             ` santosh.shilimkar
  0 siblings, 1 reply; 28+ messages in thread
From: Jason Gunthorpe @ 2019-05-10 18:07 UTC (permalink / raw)
  To: santosh.shilimkar; +Cc: netdev, davem, Hans Westgaard Ry

On Fri, May 10, 2019 at 11:02:35AM -0700, santosh.shilimkar@oracle.com wrote:
> 
> 
> On 5/10/19 10:55 AM, Jason Gunthorpe wrote:
> > On Fri, May 10, 2019 at 09:11:24AM -0700, Santosh Shilimkar wrote:
> > > On 5/10/2019 5:54 AM, Jason Gunthorpe wrote:
> > > > On Mon, Apr 29, 2019 at 04:37:19PM -0700, Santosh Shilimkar wrote:
> > > > > From: Hans Westgaard Ry <hans.westgaard.ry@oracle.com>
> > > > > 
> > > > > RDS doesn't support RDMA on memory apertures that require On Demand
> > > > > Paging (ODP), such as FS DAX memory. User applications can try to use
> > > > > RDS to perform RDMA over such memories and since it doesn't report any
> > > > > failure, it can lead to unexpected issues like memory corruption when
> > > > > a couple of out of sync file system operations like ftruncate etc. are
> > > > > performed.
> > > > 
> > > > This comment doesn't make any sense..
> > > > 
> > > > > The patch adds a check so that such an attempt to RDMA to/from memory
> > > > > apertures requiring ODP will fail.
> > > > > 
> > > > > Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com>
> > > > > Reviewed-tested-by: Zhu Yanjun <yanjun.zhu@oracle.com>
> > > > > Signed-off-by: Hans Westgaard Ry <hans.westgaard.ry@oracle.com>
> > > > > Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
> > > > >    net/rds/rdma.c | 5 +++--
> > > > >    1 file changed, 3 insertions(+), 2 deletions(-)
> > > > > 
> > > > > diff --git a/net/rds/rdma.c b/net/rds/rdma.c
> > > > > index 182ab84..e0a6b72 100644
> > > > > +++ b/net/rds/rdma.c
> > > > > @@ -158,8 +158,9 @@ static int rds_pin_pages(unsigned long user_addr, unsigned int nr_pages,
> > > > >    {
> > > > >    	int ret;
> > > > > -	ret = get_user_pages_fast(user_addr, nr_pages, write, pages);
> > > > > -
> > > > > +	/* get_user_pages return -EOPNOTSUPP for fs_dax memory */
> > > > > +	ret = get_user_pages_longterm(user_addr, nr_pages,
> > > > > +				      write, pages, NULL);
> > > > 
> > > > GUP is supposed to fully work on DAX filesystems.
> > > > 
> > > Above comment has typo. Should have been
> > > get_user_pages_longterm return -EOPNOTSUPP.
> > > 
> > > > You only need to switch to the long term version if the duration of
> > > > the GUP is under control of user space - ie it may last forever.
> > > > 
> > > > Short duration pins in the kernel do not need long term.
> > > > 
> > > Thats true but the intention here is to use the long term version
> > > which does check for the FS DAX memory. Instead of calling direct
> > > accessor to check DAX memory region, longterm version of the API
> > > is used
> > > 
> > > > At a minimum the commit message needs re-writing to properly explain
> > > > the motivation here.
> > > > 
> > > Commit is actually trying to describe the motivation describing more of
> > > issues of not making the call fail. The code comment typo was
> > > misleading.
> > 
> > Every single sentence in the commit message is wrong
> > 
> I will rewrite commit message but can you please comment on other
> questions above. GUP long term was used to detect whether its
> fs_dax memory which could be misleading since the RDS MRs are
> short lived. Do you want us to use accessor instead to check
> if its FS DAX memory?

Why would you need to detect FS DAX memory? GUP users are not supposed
to care.

GUP is supposed to work just 'fine' - other than the usual bugs we
have with GUP and any FS backed memory.

Jason

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [net-next][PATCH v2 1/2] rds: handle unsupported rdma request to fs dax memory
  2019-05-10 18:07           ` Jason Gunthorpe
@ 2019-05-10 18:58             ` santosh.shilimkar
  2019-05-10 19:20               ` Jason Gunthorpe
  0 siblings, 1 reply; 28+ messages in thread
From: santosh.shilimkar @ 2019-05-10 18:58 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: netdev, davem, Hans Westgaard Ry

On 5/10/19 11:07 AM, Jason Gunthorpe wrote:
> On Fri, May 10, 2019 at 11:02:35AM -0700, santosh.shilimkar@oracle.com wrote:
>>
>>
>> On 5/10/19 10:55 AM, Jason Gunthorpe wrote:
>>> On Fri, May 10, 2019 at 09:11:24AM -0700, Santosh Shilimkar wrote:
>>>> On 5/10/2019 5:54 AM, Jason Gunthorpe wrote:
>>>>> On Mon, Apr 29, 2019 at 04:37:19PM -0700, Santosh Shilimkar wrote:
>>>>>> From: Hans Westgaard Ry <hans.westgaard.ry@oracle.com>
>>>>>>
>>>>>> RDS doesn't support RDMA on memory apertures that require On Demand
>>>>>> Paging (ODP), such as FS DAX memory. User applications can try to use
>>>>>> RDS to perform RDMA over such memories and since it doesn't report any
>>>>>> failure, it can lead to unexpected issues like memory corruption when
>>>>>> a couple of out of sync file system operations like ftruncate etc. are
>>>>>> performed.
>>>>>
>>>>> This comment doesn't make any sense..
>>>>>
>>>>>> The patch adds a check so that such an attempt to RDMA to/from memory
>>>>>> apertures requiring ODP will fail.
>>>>>>
>>>>>> Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com>
>>>>>> Reviewed-tested-by: Zhu Yanjun <yanjun.zhu@oracle.com>
>>>>>> Signed-off-by: Hans Westgaard Ry <hans.westgaard.ry@oracle.com>
>>>>>> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
>>>>>>     net/rds/rdma.c | 5 +++--
>>>>>>     1 file changed, 3 insertions(+), 2 deletions(-)
>>>>>>
[...]

> 
> Why would you need to detect FS DAX memory? GUP users are not supposed
> to care.
> 
> GUP is supposed to work just 'fine' - other than the usual bugs we
> have with GUP and any FS backed memory.
> 
I am not saying there is any issue with GUP. Let me try to explain the
issue first. You are aware of the various discussions about doing DMA
or RDMA on FS DAX memory, e.g. [1] [2] [3].

One of the proposals for doing RDMA safely on FS DAX memory is/was ODP.
Since it is hooked into mm, it can block file system operations
like ftruncate on the mmapped file handle while IO (RDMA) is ongoing.

Currently RDS doesn't have support for ODP MR registration,
and hence we don't want user applications to do RDMA using
fastreg/fmr on FS DAX memory, which isn't safe. So the intention
was to make RDS_GET_MR fail if the user-provided memory area is
FS DAX and the RDS kernel module doesn't support ODP.

We have systems equipped with both regular DRAM and PMEM
DIMMs. So RDS needs to find out what kind of memory the user is
passing to register for RDMA. If it's regular DRAM, it will
continue as it does now and return the key to the application, and
if it's FS DAX memory, it is supposed to fail the call. GUP longterm
was used since it checks for FS DAX memory and
reports -EOPNOTSUPP for it. Using that error
code, the patch was making the RDS get_mr call fail.
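
The intended behaviour described above can be sketched in plain
userspace C. This is only an illustration under stated assumptions, not
the RDS code: sketch_mem_kind, sketch_get_mr_check() and the
odp_supported parameter are hypothetical names. Only the -EOPNOTSUPP
error code is taken from the description above.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical classification of the memory a user passes for registration. */
enum sketch_mem_kind {
	SKETCH_MEM_DRAM,	/* regular DRAM */
	SKETCH_MEM_FS_DAX,	/* file mapping backed by FS DAX on PMEM */
};

/*
 * Hypothetical model of the check: registration proceeds for DRAM, and
 * for FS DAX only when ODP is supported. Otherwise the call fails with
 * -EOPNOTSUPP, the error the text says GUP longterm reports for FS DAX.
 */
int sketch_get_mr_check(enum sketch_mem_kind kind, bool odp_supported)
{
	if (kind == SKETCH_MEM_FS_DAX && !odp_supported)
		return -EOPNOTSUPP;
	return 0;
}
```

With this model, a DRAM buffer always gets its key, while an FS DAX
buffer is refused for as long as odp_supported is false.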

In short, until ODP support is added to RDS, we want the RDMA
request to fail for FS DAX memory.

Hope the above clarifies it.

Regards,
Santosh

[1] https://lwn.net/Articles/737273/
[2] https://lkml.org/lkml/2019/2/5/570
[3] https://lists.01.org/pipermail/linux-nvdimm/2018-January/013935.html



* Re: [net-next][PATCH v2 1/2] rds: handle unsupported rdma request to fs dax memory
  2019-05-10 18:58             ` santosh.shilimkar
@ 2019-05-10 19:20               ` Jason Gunthorpe
  2019-05-10 19:38                 ` Santosh Shilimkar
  0 siblings, 1 reply; 28+ messages in thread
From: Jason Gunthorpe @ 2019-05-10 19:20 UTC (permalink / raw)
  To: santosh.shilimkar; +Cc: netdev, davem, Hans Westgaard Ry

On Fri, May 10, 2019 at 11:58:42AM -0700, santosh.shilimkar@oracle.com wrote:
> On 5/10/19 11:07 AM, Jason Gunthorpe wrote:
> > On Fri, May 10, 2019 at 11:02:35AM -0700, santosh.shilimkar@oracle.com wrote:
> > > 
> > > 
> > > On 5/10/19 10:55 AM, Jason Gunthorpe wrote:
> > > > On Fri, May 10, 2019 at 09:11:24AM -0700, Santosh Shilimkar wrote:
> > > > > On 5/10/2019 5:54 AM, Jason Gunthorpe wrote:
> > > > > > On Mon, Apr 29, 2019 at 04:37:19PM -0700, Santosh Shilimkar wrote:
> > > > > > > From: Hans Westgaard Ry <hans.westgaard.ry@oracle.com>
> > > > > > > 
> > > > > > > RDS doesn't support RDMA on memory apertures that require On Demand
> > > > > > > Paging (ODP), such as FS DAX memory. User applications can try to use
> > > > > > > RDS to perform RDMA over such memories and since it doesn't report any
> > > > > > > failure, it can lead to unexpected issues like memory corruption when
> > > > > > > a couple of out of sync file system operations like ftruncate etc. are
> > > > > > > performed.
> > > > > > 
> > > > > > This comment doesn't make any sense..
> > > > > > 
> > > > > > > The patch adds a check so that such an attempt to RDMA to/from memory
> > > > > > > apertures requiring ODP will fail.
> > > > > > > 
> > > > > > > Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com>
> > > > > > > Reviewed-tested-by: Zhu Yanjun <yanjun.zhu@oracle.com>
> > > > > > > Signed-off-by: Hans Westgaard Ry <hans.westgaard.ry@oracle.com>
> > > > > > > Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
> > > > > > >     net/rds/rdma.c | 5 +++--
> > > > > > >     1 file changed, 3 insertions(+), 2 deletions(-)
> > > > > > > 
> [...]
> 
> > 
> > Why would you need to detect FS DAX memory? GUP users are not supposed
> > to care.
> > 
> > GUP is supposed to work just 'fine' - other than the usual bugs we
> > have with GUP and any FS backed memory.
> > 
> Am not saying there is any issue with GUP. Let me try to explain the
> issue first. You are aware of various discussions about doing DMA
> or RDMA on FS DAX memory. e.g [1] [2] [3]
> 
> One of the proposal to do safely RDMA on FS DAX memory is/was ODP

It is not about safety. ODP is required in all places that would have
used gup_longterm because ODP avoids the gup_longterm entirely.

> Currently RDS doesn't have support for ODP MR registration
> and hence we don't want user application to do RDMA using
> fastreg/fmr on FS DAX memory which isn't safe.

No, it is safe.

The only issue is you need to determine if this use of GUP is longterm
or short term. Longterm means userspace is in control of how long the
GUP lasts, short term means the kernel is in control.

ie posting a fastreg, sending the data, then un-GUP'ing on completion
is a short term GUP and it is fine on any type of memory.

So if it is a long term pin then it needs to be corrected and the only
thing the comment needs to explain is that it is a long term pin.
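
The distinction above can be written down as a small userspace C
sketch. All names here (sketch_pin_owner, sketch_pin_is_longterm(),
sketch_pin_allowed()) are hypothetical and exist only for illustration.
The rule that only a long-term pin on FS DAX memory is refused is one
reading of this thread, not a statement of what the kernel implements.

```c
#include <stdbool.h>

/* Hypothetical: who decides when the pinned pages are released. */
enum sketch_pin_owner {
	SKETCH_PIN_KERNEL,	/* e.g. un-GUP on send completion */
	SKETCH_PIN_USERSPACE,	/* released only when userspace asks */
};

/* A pin is long term exactly when userspace controls its lifetime. */
bool sketch_pin_is_longterm(enum sketch_pin_owner owner)
{
	return owner == SKETCH_PIN_USERSPACE;
}

/*
 * Short-term pins are treated as fine on any type of memory. Only the
 * combination of a long-term pin and FS DAX memory is refused.
 */
bool sketch_pin_allowed(enum sketch_pin_owner owner, bool is_fs_dax)
{
	return !(sketch_pin_is_longterm(owner) && is_fs_dax);
}
```

Note that the memory type never matters for a kernel-controlled pin in
this model; it only comes into play once the pin is long term.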

Jason


* Re: [net-next][PATCH v2 1/2] rds: handle unsupported rdma request to fs dax memory
  2019-05-10 19:20               ` Jason Gunthorpe
@ 2019-05-10 19:38                 ` Santosh Shilimkar
  2019-05-10 19:47                   ` Jason Gunthorpe
  0 siblings, 1 reply; 28+ messages in thread
From: Santosh Shilimkar @ 2019-05-10 19:38 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: netdev, davem, Hans Westgaard Ry

On 5/10/2019 12:20 PM, Jason Gunthorpe wrote:
> On Fri, May 10, 2019 at 11:58:42AM -0700, santosh.shilimkar@oracle.com wrote:
>> On 5/10/19 11:07 AM, Jason Gunthorpe wrote:
>>> On Fri, May 10, 2019 at 11:02:35AM -0700, santosh.shilimkar@oracle.com wrote:

[...]

>>> Why would you need to detect FS DAX memory? GUP users are not supposed
>>> to care.
>>>
>>> GUP is supposed to work just 'fine' - other than the usual bugs we
>>> have with GUP and any FS backed memory.
>>>
>> Am not saying there is any issue with GUP. Let me try to explain the
>> issue first. You are aware of various discussions about doing DMA
>> or RDMA on FS DAX memory. e.g [1] [2] [3]
>>
>> One of the proposal to do safely RDMA on FS DAX memory is/was ODP
> 
> It is not about safety. ODP is required in all places that would have
> used gup_longterm because ODP avoids th gup_longterm entirely.
> 
>> Currently RDS doesn't have support for ODP MR registration
>> and hence we don't want user application to do RDMA using
>> fastreg/fmr on FS DAX memory which isn't safe.
> 
> No, it is safe.
> 
> The only issue is you need to determine if this use of GUP is longterm
> or short term. Longterm means userspace is in control of how long the
> GUP lasts, short term means the kernel is in control.
> 
> ie posting a fastreg, sending the data, then un-GUP'ing on completion
> is a short term GUP and it is fine on any type of memory.
> 
> So if it is a long term pin then it needs to be corrected and the only
> thing the comment needs to explain is that it is a long term pin.
> 
Thanks for the clarification. At least the distinction is clear to me
now. Yes, the key can be valid long term, until the remote RDMA IO is
issued and finished. After that the user can issue an invalidate/free
of the key, or specify a flag upfront to free/invalidate the key on
remote IO completion.

Will update the commit message accordingly. Can you please also
comment on the question on 2/2?

regards,
Santosh


* Re: [net-next][PATCH v2 1/2] rds: handle unsupported rdma request to fs dax memory
  2019-05-10 19:38                 ` Santosh Shilimkar
@ 2019-05-10 19:47                   ` Jason Gunthorpe
  2019-05-10 20:12                     ` Santosh Shilimkar
  0 siblings, 1 reply; 28+ messages in thread
From: Jason Gunthorpe @ 2019-05-10 19:47 UTC (permalink / raw)
  To: Santosh Shilimkar; +Cc: netdev, davem, Hans Westgaard Ry

On Fri, May 10, 2019 at 12:38:31PM -0700, Santosh Shilimkar wrote:
> On 5/10/2019 12:20 PM, Jason Gunthorpe wrote:
> > On Fri, May 10, 2019 at 11:58:42AM -0700, santosh.shilimkar@oracle.com wrote:
> > > On 5/10/19 11:07 AM, Jason Gunthorpe wrote:
> > > > On Fri, May 10, 2019 at 11:02:35AM -0700, santosh.shilimkar@oracle.com wrote:
> 
> [...]
> 
> > > > Why would you need to detect FS DAX memory? GUP users are not supposed
> > > > to care.
> > > > 
> > > > GUP is supposed to work just 'fine' - other than the usual bugs we
> > > > have with GUP and any FS backed memory.
> > > > 
> > > Am not saying there is any issue with GUP. Let me try to explain the
> > > issue first. You are aware of various discussions about doing DMA
> > > or RDMA on FS DAX memory. e.g [1] [2] [3]
> > > 
> > > One of the proposal to do safely RDMA on FS DAX memory is/was ODP
> > 
> > It is not about safety. ODP is required in all places that would have
> > used gup_longterm because ODP avoids th gup_longterm entirely.
> > 
> > > Currently RDS doesn't have support for ODP MR registration
> > > and hence we don't want user application to do RDMA using
> > > fastreg/fmr on FS DAX memory which isn't safe.
> > 
> > No, it is safe.
> > 
> > The only issue is you need to determine if this use of GUP is longterm
> > or short term. Longterm means userspace is in control of how long the
> > GUP lasts, short term means the kernel is in control.
> > 
> > ie posting a fastreg, sending the data, then un-GUP'ing on completion
> > is a short term GUP and it is fine on any type of memory.
> > 
> > So if it is a long term pin then it needs to be corrected and the only
> > thing the comment needs to explain is that it is a long term pin.
> > 
> Thanks for clarification. At least the distinction is clear to me now. Yes
> the key can be valid for long term till the remote RDMA IO is issued and
> finished. After that user can issue an invalidate/free key or
> upfront specify a flag to free/invalidate the key on remote IO
> completion.

Again, the test is if *userspace* controls this. So if userspace is
the thing that does the invalidate/free then it is long term. Sounds
like if it specifies the free/invalidate flag then it is short term.

At this point you'd probably be better to keep both options.
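
Keeping both options could look roughly like the following userspace C
sketch, where the registration flags select the kind of pin. The flag
SKETCH_MR_FREE_ON_COMPLETION and the other names are hypothetical
stand-ins, not the real RDS flag definitions.

```c
#include <stdint.h>

/* Hypothetical flag bit: free/invalidate the key on remote IO completion. */
#define SKETCH_MR_FREE_ON_COMPLETION	(UINT32_C(1) << 0)

enum sketch_gup_mode {
	SKETCH_GUP_SHORT_TERM,	/* kernel tears the MR down on completion */
	SKETCH_GUP_LONG_TERM,	/* userspace issues the invalidate/free */
};

/* Choose the pin type from the flags userspace passed at registration. */
enum sketch_gup_mode sketch_gup_mode_from_flags(uint32_t flags)
{
	if (flags & SKETCH_MR_FREE_ON_COMPLETION)
		return SKETCH_GUP_SHORT_TERM;
	return SKETCH_GUP_LONG_TERM;
}
```

Under this assumption only registrations without the flag would need
the long-term treatment.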

> Will update the commit message accordingly. Can you please also
> comment on question on 2/2 ?

I have no advice on how to do compatibility knobs in netdev - only
that sysctl does not seem appropriate.
 
Jason


* Re: [net-next][PATCH v2 1/2] rds: handle unsupported rdma request to fs dax memory
  2019-05-10 19:47                   ` Jason Gunthorpe
@ 2019-05-10 20:12                     ` Santosh Shilimkar
  2019-05-10 20:39                       ` Jason Gunthorpe
  0 siblings, 1 reply; 28+ messages in thread
From: Santosh Shilimkar @ 2019-05-10 20:12 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: netdev, davem, Hans Westgaard Ry

On 5/10/2019 12:47 PM, Jason Gunthorpe wrote:
> On Fri, May 10, 2019 at 12:38:31PM -0700, Santosh Shilimkar wrote:
>> On 5/10/2019 12:20 PM, Jason Gunthorpe wrote:
>>> On Fri, May 10, 2019 at 11:58:42AM -0700, santosh.shilimkar@oracle.com wrote:
>>>> On 5/10/19 11:07 AM, Jason Gunthorpe wrote:
>>>>> On Fri, May 10, 2019 at 11:02:35AM -0700, santosh.shilimkar@oracle.com wrote:
>>
>> [...]
>>
>>>>> Why would you need to detect FS DAX memory? GUP users are not supposed
>>>>> to care.
>>>>>
>>>>> GUP is supposed to work just 'fine' - other than the usual bugs we
>>>>> have with GUP and any FS backed memory.
>>>>>
>>>> Am not saying there is any issue with GUP. Let me try to explain the
>>>> issue first. You are aware of various discussions about doing DMA
>>>> or RDMA on FS DAX memory. e.g [1] [2] [3]
>>>>
>>>> One of the proposal to do safely RDMA on FS DAX memory is/was ODP
>>>
>>> It is not about safety. ODP is required in all places that would have
>>> used gup_longterm because ODP avoids th gup_longterm entirely.
>>>
>>>> Currently RDS doesn't have support for ODP MR registration
>>>> and hence we don't want user application to do RDMA using
>>>> fastreg/fmr on FS DAX memory which isn't safe.
>>>
>>> No, it is safe.
>>>
>>> The only issue is you need to determine if this use of GUP is longterm
>>> or short term. Longterm means userspace is in control of how long the
>>> GUP lasts, short term means the kernel is in control.
>>>
>>> ie posting a fastreg, sending the data, then un-GUP'ing on completion
>>> is a short term GUP and it is fine on any type of memory.
>>>
>>> So if it is a long term pin then it needs to be corrected and the only
>>> thing the comment needs to explain is that it is a long term pin.
>>>
>> Thanks for clarification. At least the distinction is clear to me now. Yes
>> the key can be valid for long term till the remote RDMA IO is issued and
>> finished. After that user can issue an invalidate/free key or
>> upfront specify a flag to free/invalidate the key on remote IO
>> completion.
> 
> Again, the test is if *userspace* controls this. So if userspace is
> the thing that does the invalidate/free then it is long term. Sounds
> like if it specifies the free/invalidate flag then it short term.
> 
> At this point you'd probably be better to keep both options.
>
That's possible using the provided flag state, but I am still not sure
whether it's guaranteed to be safe when marked as short term, even with
the flag which tells the kernel to invalidate/free the MR on remote IO
completion. Until the remote server finishes the IO, there is
still a window where userspace on the local server can
modify the file mappings. Say the registered file handle was
ftruncated to zero by another process and the backing memory
was allocated by some other process as part of fallocate.
Now the mapping on the HCA is invalid from the local userspace
perspective, but since the key is valid, the remote can still do RDMA
to that region.

How do we avoid such an issue without GUP_longterm?

>> Will update the commit message accordingly. Can you please also
>> comment on question on 2/2 ?
> 
> I have no advice on how to do compatability knobs in netdev - only
> that sysctl does not seem appropriate.
>   
OK. Let me think about an alternative.

Regards,
Santosh


* Re: [net-next][PATCH v2 1/2] rds: handle unsupported rdma request to fs dax memory
  2019-05-10 20:12                     ` Santosh Shilimkar
@ 2019-05-10 20:39                       ` Jason Gunthorpe
  0 siblings, 0 replies; 28+ messages in thread
From: Jason Gunthorpe @ 2019-05-10 20:39 UTC (permalink / raw)
  To: Santosh Shilimkar; +Cc: netdev, davem, Hans Westgaard Ry

On Fri, May 10, 2019 at 01:12:49PM -0700, Santosh Shilimkar wrote:
> On 5/10/2019 12:47 PM, Jason Gunthorpe wrote:
> > On Fri, May 10, 2019 at 12:38:31PM -0700, Santosh Shilimkar wrote:
> > > On 5/10/2019 12:20 PM, Jason Gunthorpe wrote:
> > > > On Fri, May 10, 2019 at 11:58:42AM -0700, santosh.shilimkar@oracle.com wrote:
> > > > > On 5/10/19 11:07 AM, Jason Gunthorpe wrote:
> > > > > > On Fri, May 10, 2019 at 11:02:35AM -0700, santosh.shilimkar@oracle.com wrote:
> > > 
> > > [...]
> > > 
> > > > > > Why would you need to detect FS DAX memory? GUP users are not supposed
> > > > > > to care.
> > > > > > 
> > > > > > GUP is supposed to work just 'fine' - other than the usual bugs we
> > > > > > have with GUP and any FS backed memory.
> > > > > > 
> > > > > Am not saying there is any issue with GUP. Let me try to explain the
> > > > > issue first. You are aware of various discussions about doing DMA
> > > > > or RDMA on FS DAX memory. e.g [1] [2] [3]
> > > > > 
> > > > > One of the proposal to do safely RDMA on FS DAX memory is/was ODP
> > > > 
> > > > It is not about safety. ODP is required in all places that would have
> > > > used gup_longterm because ODP avoids th gup_longterm entirely.
> > > > 
> > > > > Currently RDS doesn't have support for ODP MR registration
> > > > > and hence we don't want user application to do RDMA using
> > > > > fastreg/fmr on FS DAX memory which isn't safe.
> > > > 
> > > > No, it is safe.
> > > > 
> > > > The only issue is you need to determine if this use of GUP is longterm
> > > > or short term. Longterm means userspace is in control of how long the
> > > > GUP lasts, short term means the kernel is in control.
> > > > 
> > > > ie posting a fastreg, sending the data, then un-GUP'ing on completion
> > > > is a short term GUP and it is fine on any type of memory.
> > > > 
> > > > So if it is a long term pin then it needs to be corrected and the only
> > > > thing the comment needs to explain is that it is a long term pin.
> > > > 
> > > Thanks for clarification. At least the distinction is clear to me now. Yes
> > > the key can be valid for long term till the remote RDMA IO is issued and
> > > finished. After that user can issue an invalidate/free key or
> > > upfront specify a flag to free/invalidate the key on remote IO
> > > completion.
> > 
> > Again, the test is if *userspace* controls this. So if userspace is
> > the thing that does the invalidate/free then it is long term. Sounds
> > like if it specifies the free/invalidate flag then it short term.
> > 
> > At this point you'd probably be better to keep both options.
> > 
> Thats possible using the provided flag state but I am still not sure
> whether its guaranteed to be safe when marked as short term even with
> flag which tells kernel to invalidate/free the MR on remote IO
> completion. Till the remote server finishes the IO, 

This is fine.

> there is still a window where userspace on local server can modify
> the file mappings. Registered file handle say was ftuncated to zero
> by another process and the backing memory was allocated by some
> other process as part of fallocate.  

The FS is supposed to maintain sane semantics across GUP - fallocate
should block until GUP is done. This is normal.

> How do we avoid such an issue without GUP_longterm ?

You don't, there is no problem using GUP for short term - ie a time
frame entirely under control of the kernel.

Jason


end of thread, other threads:[~2019-05-10 20:39 UTC | newest]

Thread overview: 28+ messages
2019-04-29 23:37 [net-next][PATCH v2 0/2] rds: handle unsupported rdma request to fs dax memory Santosh Shilimkar
2019-04-29 23:37 ` [net-next][PATCH v2 1/2] " Santosh Shilimkar
2019-05-01  7:44   ` Leon Romanovsky
2019-05-01 17:54     ` Santosh Shilimkar
2019-05-02  6:21       ` Leon Romanovsky
2019-05-02 17:52         ` Santosh Shilimkar
2019-05-05  6:28           ` Leon Romanovsky
2019-05-06 16:39             ` Santosh Shilimkar
2019-05-10 12:54   ` Jason Gunthorpe
2019-05-10 16:11     ` Santosh Shilimkar
2019-05-10 17:55       ` Jason Gunthorpe
2019-05-10 18:02         ` santosh.shilimkar
2019-05-10 18:07           ` Jason Gunthorpe
2019-05-10 18:58             ` santosh.shilimkar
2019-05-10 19:20               ` Jason Gunthorpe
2019-05-10 19:38                 ` Santosh Shilimkar
2019-05-10 19:47                   ` Jason Gunthorpe
2019-05-10 20:12                     ` Santosh Shilimkar
2019-05-10 20:39                       ` Jason Gunthorpe
2019-04-29 23:37 ` [net-next][PATCH v2 2/2] rds: add sysctl for rds support of On-Demand-Paging Santosh Shilimkar
2019-05-01  7:45   ` Leon Romanovsky
2019-05-01 17:54     ` Santosh Shilimkar
2019-05-02  6:18       ` Leon Romanovsky
2019-05-02 17:59         ` Santosh Shilimkar
2019-05-05  6:22           ` Leon Romanovsky
2019-05-06 16:37             ` Santosh Shilimkar
2019-05-10 13:02   ` Jason Gunthorpe
2019-05-10 16:13     ` Santosh Shilimkar
