Linux-RDMA Archive on lore.kernel.org
From: Jason Gunthorpe <jgg@mellanox.com>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Doug Ledford <dledford@redhat.com>,
	"linux-rdma@vger.kernel.org" <linux-rdma@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: [GIT PULL] Please pull RDMA subsystem changes
Date: Sun, 28 Apr 2019 23:49:40 +0000
Message-ID: <20190428234935.GA15233@mellanox.com>
In-Reply-To: <CAHk-=wj4ay=jy6wuN4d9p9v+O32i0aH9SMfu39VKP-Ai7hKp=g@mail.gmail.com>

On Sun, Apr 28, 2019 at 09:59:56AM -0700, Linus Torvalds wrote:
> On Sun, Apr 28, 2019 at 4:52 AM Jason Gunthorpe <jgg@mellanox.com> wrote:
> >
> > Nothing particularly special here. There is a small merge conflict
> > with Andrea's mm_still_valid patches which is resolved as below:
> 
> I still don't understand *why* you play the crazy VM games to begin with.
> 
> What's wrong with just returning SIGBUS? Why does that
> rdma_umap_fault() not just look like this one-liner:
> 
>         return VM_FAULT_SIGBUS;
> 
> without the crazy parts? Nobody ever explained why you'd want to have
> that silly "let's turn it into a bogus anonymous mapping".

There was a big thread where I went over the use case with Andrea, but
I guess that was private.

It is for high availability: we have situations where the hardware
can fault and needs some kind of destructive recovery, for instance a
firmware reboot or a VM migration.

In these designs there may be multiple cards in the system and the
userspace application could be using more than one of them. Just
because one card crashed, we can't send SIGBUS and kill the
application; that breaks the HA design.

So the kernel turns the BAR VMA into a 'dummy' and sends an async
notification to the application. Userspace's use of the BAR memory is
all 'write only', so it doesn't really care. When it sees the async
notification it safely cleans up the userspace side of things.
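
To make the 'dummy' idea concrete, here is a rough userspace analogy. It
is NOT the kernel code: the real work happens in the RDMA fault path,
while this sketch just swaps a mapping in place with mmap(MAP_FIXED). A
temporary file stands in for the BAR, and the helper names
(map_fake_bar, make_dummy, demo_dummy_mapping) are made up for
illustration.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: map one page of a temp file as a stand-in "BAR". */
static unsigned char *map_fake_bar(FILE **backing, size_t *len)
{
	long page = sysconf(_SC_PAGESIZE);
	FILE *f;
	void *p;

	assert(backing && len);
	if (page <= 0)
		return NULL;
	f = tmpfile();
	if (!f)
		return NULL;
	if (ftruncate(fileno(f), page) != 0) {
		fclose(f);
		return NULL;
	}
	p = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE, MAP_SHARED,
		 fileno(f), 0);
	if (p == MAP_FAILED) {
		fclose(f);
		return NULL;
	}
	*backing = f;
	*len = (size_t)page;
	return p;
}

/* Hypothetical helper: replace the mapping, at the same address, with a
 * private anonymous page. Later stores land there instead of the "BAR". */
static int make_dummy(void *addr, size_t len)
{
	void *p = mmap(addr, len, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);

	return p == addr ? 0 : -1;
}

/* Returns 0 if the store made after the swap neither faulted nor reached
 * the backing file, 1 if it did reach the file, -1 on setup failure. */
int demo_dummy_mapping(void)
{
	FILE *f;
	size_t len;
	unsigned char on_device = 0;
	volatile unsigned char *bar = map_fake_bar(&f, &len);

	if (!bar)
		return -1;
	bar[0] = 0xAA;			/* reaches the "device" (the file) */
	if (make_dummy((void *)bar, len) != 0)
		return -1;
	bar[0] = 0x55;			/* harmless: lands in the dummy page */
	if (pread(fileno(f), &on_device, 1, 0) != 1)
		return -1;
	munmap((void *)bar, len);
	fclose(f);
	return on_device == 0xAA ? 0 : 1;
}
```

Calling demo_dummy_mapping() should return 0 on Linux: the second store
does not fault and does not reach the file. This only works out because
the mapping is used write-only; a reader would see zeroes after the swap.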

A more modern example of where this gets used is on VM systems
using SR-IOV pass-through of a raw RDMA device. When it is time to
migrate the VM, the hypervisor causes the SR-IOV instance to fault
and be removed from the guest kernel, then migrates and attaches a new
RDMA SR-IOV instance. The userspace is expected to see the failure,
maintain state, then recover onto the new device.

The only alternative that has come up would be to delay the kernel
side until the application cleans up and deletes the VMA, but people
generally don't like this as it degrades the recovery time and has the
usual problems with blocking the kernel on userspace.

When this was created I'm not sure people explored more creative ideas
like trying to handle/ignore the SIGBUS in userspace - unfortunately
it has been so long now that we are probably stuck doing this as part
of the UAPI.
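
For what it's worth, handling the SIGBUS in userspace could look
roughly like the sketch below. This is an illustration under
assumptions, not something any RDMA library is known to do: a truncated
temporary file stands in for the vanished BAR, and on_sigbus and
try_store_after_loss are hypothetical names.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static sigjmp_buf recover_point;

/* Jump back to the guarded store instead of letting SIGBUS kill us. */
static void on_sigbus(int sig)
{
	(void)sig;
	siglongjmp(recover_point, 1);
}

/* Returns 1 if the store raised SIGBUS and the process survived it,
 * 0 if the store completed normally, -1 on setup failure. */
int try_store_after_loss(void)
{
	long page = sysconf(_SC_PAGESIZE);
	struct sigaction sa, old;
	volatile unsigned char *bar;
	FILE *f;
	void *p;
	int fd, result;

	if (page <= 0)
		return -1;
	f = tmpfile();
	if (!f)
		return -1;
	fd = fileno(f);
	if (ftruncate(fd, page) != 0) {
		fclose(f);
		return -1;
	}
	p = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (p == MAP_FAILED) {
		fclose(f);
		return -1;
	}
	bar = p;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = on_sigbus;
	sigemptyset(&sa.sa_mask);
	assert(sigaction(SIGBUS, &sa, &old) == 0);

	/* Simulate losing the device: the mapped page no longer has backing. */
	if (ftruncate(fd, 0) != 0) {
		result = -1;
	} else if (sigsetjmp(recover_point, 1) == 0) {
		bar[0] = 0x55;		/* expected to raise SIGBUS */
		result = 0;
	} else {
		result = 1;		/* arrived here from on_sigbus() */
	}

	sigaction(SIGBUS, &old, NULL);
	munmap(p, (size_t)page);
	fclose(f);
	return result;
}
```

On Linux try_store_after_loss() should return 1. The catch is that
every store to the mapping needs a guard like this, and a process-wide
SIGBUS handler has to coexist with whatever else the application does
with signals.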

I've been trying to make it less crufty over the last year based on
remarks from yourself and Andrea, but I'm still stuck with this basic
requirement that the VMA shouldn't fault or touch the BAR after the
hardware is released by the kernel.

Thanks,
Jason
