linux-kernel.vger.kernel.org archive mirror
From: Ira Weiny <ira.weiny@intel.com>
To: Dan Williams <dan.j.williams@intel.com>
Cc: Doug Ledford <dledford@redhat.com>,
	Jason Gunthorpe <jgg@ziepe.ca>,
	Dave Chinner <david@fromorbit.com>,
	Christopher Lameter <cl@linux.com>,
	Matthew Wilcox <willy@infradead.org>, Jan Kara <jack@suse.cz>,
	lsf-pc@lists.linux-foundation.org,
	linux-rdma <linux-rdma@vger.kernel.org>,
	Linux MM <linux-mm@kvack.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	John Hubbard <jhubbard@nvidia.com>,
	Jerome Glisse <jglisse@redhat.com>,
	Michal Hocko <mhocko@kernel.org>
Subject: Re: [LSF/MM TOPIC] Discuss least bad options for resolving longterm-GUP usage by RDMA
Date: Thu, 7 Feb 2019 09:23:53 -0800
Message-ID: <20190207172352.GC29531@iweiny-DESK2.sc.intel.com>
In-Reply-To: <CAPcyv4hPmwXv6xGpyWGs-zx3xswAnzF0HGX6Kx3t=LSysDRZog@mail.gmail.com>

On Wed, Feb 06, 2019 at 07:13:16PM -0800, Dan Williams wrote:
> On Wed, Feb 6, 2019 at 6:42 PM Doug Ledford <dledford@redhat.com> wrote:
> >
> > On Wed, 2019-02-06 at 14:44 -0800, Dan Williams wrote:
> > > On Wed, Feb 6, 2019 at 2:25 PM Doug Ledford <dledford@redhat.com> wrote:
> > > > Can someone give me a real world scenario that someone is *actually*
> > > > asking for with this?
> > >
> > > I'll point to this example. At the 6:35 mark Kodi talks about the
> > > Oracle use case for DAX + RDMA.
> > >
> > > https://youtu.be/ywKPPIE8JfQ?t=395
> >
> > I watched this, and I see that Oracle is all sorts of excited that their
> > storage machines can scale out, and they can access the storage and it
> > has basically no CPU load on the storage server while performing
> > millions of queries.  What I didn't hear in there is why DAX has to be
> > in the picture, or why Oracle couldn't do the same thing with a simple
> > memory region exported directly to the RDMA subsystem, or why reflink or
> > any of the other features you talk about are needed.  So, while these
> > things may legitimately be needed, this video did not tell me about
> > how/why they are needed, just that RDMA is really, *really* cool for
> > their use case and gets them 0% CPU utilization on their storage
> > servers.  I didn't watch the whole thing though.  Do they get into that
> > later on?  Do they get to that level of technical discussion, or is this
> > all higher level?
> 
> They don't. The point of sharing that video was illustrating that RDMA
> to persistent memory use case. That 0% cpu utilization is because the
> RDMA target is not page-cache / anonymous on the storage box it's
> directly to a file offset in DAX / persistent memory. A solution to
> truncate lets that use case use more than just Device-DAX or ODP
> capable adapters. That said, I need to let Ira jump in here because
> saying layout leases solves the problem is not true, it's just the
> start of potentially solving the problem. It's not clear to me what
> the long tail of work looks like once the filesystem raises a
> notification to the RDMA target process.

This is exactly the problem which has been touched on by others throughout this
thread.

1) To fully support leases on all hardware we will have to allow for RDMA
   processes to be killed when they don't respond to the lease revocation.

   a) If the process has done something bad (like truncate or hole punch) then
      the idea that "they get what they deserve" may be ok.

   b) However, if this is because of some underlying file system maintenance,
      this is, as Jason says, unreasonable.  It would be much better to tell
      the application "you can't do this".

2) Fully responding to a lease revocation involves a number of kernel changes
   in the RDMA stack but, more importantly, modifying every user space RDMA
   application to respond to a message from a channel they may not even be
   listening to.

I think this is where Jason is getting very concerned.  When you
combine 1b and 2 you end up with a "non production" worthy solution.
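
To make the interaction of 1 and 2 concrete, here is a toy model in Python.  It
is only a sketch under stated assumptions: it does not use any kernel or RDMA
API, and every name in it (App, revoke_lease, listens_for_revocation) is
hypothetical.  It assumes a revocation is delivered as a notification and that
any holder which has not released its registration afterwards is killed.

```python
from dataclasses import dataclass


@dataclass
class App:
    """Hypothetical user space RDMA application holding a long-term pin."""
    name: str
    listens_for_revocation: bool  # assumption: has the app been modified (point 2)?
    registered: bool = True       # still holds its memory registration
    alive: bool = True

    def on_revoke(self):
        # Only an application that listens on the notification channel
        # ever sees the message and tears down its registration.
        if self.listens_for_revocation:
            self.registered = False


def revoke_lease(apps):
    """Notify every holder, then kill any that still holds a registration."""
    outcome = {}
    for app in apps:
        app.on_revoke()
        if app.registered:
            # Point 1: the only remaining option is to kill the process.
            app.alive = False
            app.registered = False
            outcome[app.name] = "killed"
        else:
            outcome[app.name] = "released"
    return outcome


apps = [App("updated", True), App("legacy", False)]
print(revoke_lease(apps))
```

In this model the unmodified application is killed no matter why the lease
was revoked, which is the 1b case: it did nothing wrong and still loses.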

NOTE: This is somewhat true of ODP hardware as well, since applications register
each individual RDMA memory region as either ODP or not.  So out of the box not
all applications would work automatically.

Ira


