From: "Michael S. Tsirkin" <mst@redhat.com>
To: "Xin, Xiaohui" <xiaohui.xin@intel.com>
Cc: "netdev@vger.kernel.org" <netdev@vger.kernel.org>,
	"kvm@vger.kernel.org" <kvm@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"mingo@elte.hu" <mingo@elte.hu>,
	"davem@davemloft.net" <davem@davemloft.net>,
	"herbert@gondor.hengli.com.au" <herbert@gondor.hengli.com.au>,
	"jdike@linux.intel.com" <jdike@linux.intel.com>
Subject: Re: [RFC PATCH v9 12/16] Add mp(mediate passthru) device.
Date: Wed, 15 Sep 2010 13:28:11 +0200	[thread overview]
Message-ID: <20100915112811.GB29267@redhat.com> (raw)
In-Reply-To: <F2E9EB7348B8264F86B6AB8151CE2D792B8C815365@shsmsx502.ccr.corp.intel.com>

On Wed, Sep 15, 2010 at 11:13:44AM +0800, Xin, Xiaohui wrote:
> >From: Michael S. Tsirkin [mailto:mst@redhat.com]
> >Sent: Sunday, September 12, 2010 9:37 PM
> >To: Xin, Xiaohui
> >Cc: netdev@vger.kernel.org; kvm@vger.kernel.org; linux-kernel@vger.kernel.org;
> >mingo@elte.hu; davem@davemloft.net; herbert@gondor.hengli.com.au;
> >jdike@linux.intel.com
> >Subject: Re: [RFC PATCH v9 12/16] Add mp(mediate passthru) device.
> >
> >On Sat, Sep 11, 2010 at 03:41:14PM +0800, Xin, Xiaohui wrote:
> >> >>Playing with rlimit on data path, transparently to the application in this way
> >> >>looks strange to me, I suspect this has unexpected security implications.
> >> >>Further, applications may have other uses for locked memory
> >> >>besides mpassthru - you should not just take it because it's there.
> >> >>
> >> >>Can we have an ioctl that lets userspace configure how much
> >> >>memory to lock? This ioctl will decrement the rlimit and store
> >> >>the data in the device structure so we can do accounting
> >> >>internally. Put it back on close or on another ioctl.
> >> >Yes, we can decrement the rlimit once, in the ioctl, to keep it
> >> >off the data path.
> >> >
> >> >>Need to be careful for when this operation gets called
> >> >>again with 0 or another small value while we have locked memory -
> >> >>maybe just fail with EBUSY?  or wait until it gets unlocked?
> >> >>Maybe 0 can be special-cased and deactivate zero-copy?
> >> >>
> >>
> >> How about we don't add a new ioctl, but just check the rlimit
> >> in the MPASSTHRU_BINDDEV ioctl? If we find that the mp device
> >> would exceed the rlimit, we fail the bind ioctl, and zero copy
> >> is then unavailable.
> >
> >Yes, and not just check, but decrement as well.
> >I think we should give userspace control over
> >how much memory we can lock and subtract from the rlimit.
> >It's OK to add this as a parameter to MPASSTHRU_BINDDEV.
> >Then increment the rlimit back on unbind and on close?
> >
> >This opens up an interesting condition: process 1
> >calls bind, process 2 calls unbind or close.
> >This will increment rlimit for process 2.
> >Not sure how to fix this properly.
> >
> Neither can I. Can we do any synchronous operations on the rlimit?
> I rather doubt it.
>  
> >--
> >MST
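The problem with charging at bind time and crediting back at unbind/close
can be modeled in a few lines of userspace C. This is a hypothetical
sketch, not the mp driver: struct proc, struct mp_dev, mp_bind() and
mp_close() are illustrative names, and each process's RLIMIT_MEMLOCK
budget is modeled as a plain page counter. It shows the credit landing on
whichever process happens to close the fd.

```c
#include <errno.h>

/* Hypothetical model: a process's remaining RLIMIT_MEMLOCK budget, in pages. */
struct proc {
	unsigned long memlock_pages;
};

/* Hypothetical model of the mp device: remembers what was charged at bind. */
struct mp_dev {
	unsigned long charged_pages;
};

/* Bind: subtract the requested amount from the calling process's budget. */
static int mp_bind(struct mp_dev *dev, struct proc *caller, unsigned long pages)
{
	if (dev->charged_pages)
		return -EBUSY;		/* already bound */
	if (pages > caller->memlock_pages)
		return -ENOMEM;		/* would exceed the caller's rlimit */
	caller->memlock_pages -= pages;
	dev->charged_pages = pages;
	return 0;
}

/* Unbind/close: credit the charge back to whoever is calling, which is
 * not necessarily the process that was charged at bind time. */
static void mp_close(struct mp_dev *dev, struct proc *caller)
{
	caller->memlock_pages += dev->charged_pages;
	dev->charged_pages = 0;
}
```

If process 1 binds 128 pages out of a 256-page budget and process 2
(which received the fd) closes it, process 1 stays at 128 pages while
process 2 grows to 384, which is exactly the condition described above.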

Here's what InfiniBand does: userspace passes the amount of memory it
wants you to lock in an ioctl, and the driver verifies that either the
caller has CAP_IPC_LOCK or this amount does not exceed the current
RLIMIT_MEMLOCK. (This must be done in the ioctl, not on open, as we
likely want the fd to be passed around between processes.) Do not
decrement the rlimit, though. Use the stored amount to bound subsequent
operations, and be careful if it can be changed while operations are in
progress.
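A minimal sketch of the InfiniBand-style scheme, again as a userspace
model rather than driver code. struct mp_lock_state, mp_set_lock_limit(),
mp_charge() and mp_uncharge() are hypothetical names; the caller's
RLIMIT_MEMLOCK value and CAP_IPC_LOCK status are passed in as parameters
instead of being read from the task (in the kernel they would come from
something like rlimit(RLIMIT_MEMLOCK) and capable(CAP_IPC_LOCK)), and the
error codes are illustrative. The rlimit is checked but never decremented,
and lowering the limit below what is currently pinned fails with -EBUSY,
one of the options raised earlier in the thread.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical per-fd state for the mp device. */
struct mp_lock_state {
	unsigned long limit_pages;	/* amount requested via the ioctl */
	unsigned long locked_pages;	/* pages pinned by in-flight operations */
};

/* Models the ioctl: accept the requested amount if the caller has
 * CAP_IPC_LOCK or it fits within RLIMIT_MEMLOCK. The rlimit itself is
 * left untouched, so nothing needs to be credited back on close. */
static int mp_set_lock_limit(struct mp_lock_state *st, unsigned long req_pages,
			     unsigned long rlimit_pages, bool cap_ipc_lock)
{
	if (!cap_ipc_lock && req_pages > rlimit_pages)
		return -ENOMEM;
	/* Changing the limit while operations are in flight: refuse to
	 * shrink it below what is already pinned. */
	if (req_pages < st->locked_pages)
		return -EBUSY;
	st->limit_pages = req_pages;
	return 0;
}

/* Data path: charge pages against the stored limit.
 * Invariant locked_pages <= limit_pages keeps the subtraction safe. */
static int mp_charge(struct mp_lock_state *st, unsigned long pages)
{
	if (pages > st->limit_pages - st->locked_pages)
		return -ENOMEM;
	st->locked_pages += pages;
	return 0;
}

/* Data path: release pages once the buffers are unpinned. */
static void mp_uncharge(struct mp_lock_state *st, unsigned long pages)
{
	st->locked_pages -= pages;
}
```

Because the limit lives in per-fd state rather than in any process's
rlimit, passing the fd to another process does not move accounting
around, so the bind/close mismatch above does not arise.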

This does mean that the effective amount of memory that userspace can
lock is doubled (the process can still mlock() up to the rlimit itself,
and the driver can pin up to the same amount again), but at least it is
not unlimited, and we sidestep all the other issues, such as userspace
running out of lockable memory simply by virtue of using the driver.

-- 
MST


Thread overview: 64+ messages
2010-08-06  9:23 [RFC PATCH v9 00/16] Provide a zero-copy method on KVM virtio-net xiaohui.xin
2010-08-06  9:23 ` xiaohui.xin
2010-08-06  9:23 ` [RFC PATCH v9 01/16] Add a new structure for skb buffer from external xiaohui.xin
2010-08-06  9:23   ` xiaohui.xin
2010-08-06  9:23   ` [RFC PATCH v9 02/16] Add a new struct for device to manipulate external buffer xiaohui.xin
2010-08-06  9:23     ` xiaohui.xin
2010-08-06  9:23     ` [RFC PATCH v9 03/16] Add a ndo_mp_port_prep func to net_device_ops xiaohui.xin
2010-08-06  9:23       ` xiaohui.xin
2010-08-06  9:23       ` [RFC PATCH v9 04/16] Add a function make external buffer owner to query capability xiaohui.xin
2010-08-06  9:23         ` xiaohui.xin
2010-08-06  9:23         ` [RFC PATCH v9 05/16] Add a function to indicate if device use external buffer xiaohui.xin
2010-08-06  9:23           ` xiaohui.xin
2010-08-06  9:23           ` [RFC PATCH v9 06/16] Use callback to deal with skb_release_data() specially xiaohui.xin
2010-08-06  9:23             ` xiaohui.xin
2010-08-06  9:23             ` [RFC PATCH v9 07/16] Modify netdev_alloc_page() to get external buffer xiaohui.xin
2010-08-06  9:23               ` xiaohui.xin
2010-08-06  9:23               ` [RFC PATCH v9 08/16] Modify netdev_free_page() to release " xiaohui.xin
2010-08-06  9:23                 ` xiaohui.xin
2010-08-06  9:23                 ` [RFC PATCH v9 09/16] Don't do skb recycle, if device use " xiaohui.xin
2010-08-06  9:23                   ` xiaohui.xin
2010-08-06  9:23                   ` [RFC PATCH v9 10/16] Add a hook to intercept external buffers from NIC driver xiaohui.xin
2010-08-06  9:23                     ` xiaohui.xin
2010-08-06  9:23                     ` [RFC PATCH v9 11/16] Add header file for mp device xiaohui.xin
2010-08-06  9:23                       ` xiaohui.xin
2010-08-06  9:23                       ` [RFC PATCH v9 13/16] Add a kconfig entry and make entry " xiaohui.xin
2010-08-06  9:23                         ` xiaohui.xin
2010-08-06  9:23                         ` [RFC PATCH v9 12/16] Add mp(mediate passthru) device xiaohui.xin
2010-08-06  9:23                           ` xiaohui.xin
2010-08-06  9:23                           ` [RFC PATCH v9 14/16] Provides multiple submits and asynchronous notifications xiaohui.xin
2010-08-06  9:23                             ` xiaohui.xin
2010-08-06  9:23                             ` [RFC PATCH v9 15/16] An example how to modifiy NIC driver to use napi_gro_frags() interface xiaohui.xin
2010-08-06  9:23                               ` xiaohui.xin
2010-08-06  9:23                               ` [RFC PATCH v9 16/16] An example how to alloc user buffer based on " xiaohui.xin
2010-08-06  9:23                                 ` xiaohui.xin
2010-09-06 11:11                           ` [RFC PATCH v9 12/16] Add mp(mediate passthru) device Michael S. Tsirkin
2010-09-10 13:40                             ` Xin, Xiaohui
2010-09-11  7:41                               ` Xin, Xiaohui
2010-09-12 13:37                                 ` Michael S. Tsirkin
2010-09-15  3:13                                   ` Xin, Xiaohui
2010-09-15 11:28                                     ` Michael S. Tsirkin [this message]
2010-09-17  3:16                                       ` Xin, Xiaohui
2010-09-20  8:08                                       ` xiaohui.xin
2010-09-20 11:36                                         ` Michael S. Tsirkin
2010-09-21  1:39                                           ` Xin, Xiaohui
2010-09-21 13:14                                             ` Michael S. Tsirkin
2010-09-22 11:41                                               ` Xin, Xiaohui
2010-09-22 11:55                                                 ` Michael S. Tsirkin
2010-09-23 12:56                                                   ` Xin, Xiaohui
2010-09-26 11:50                                                     ` Michael S. Tsirkin
2010-09-27  0:42                                                       ` Xin, Xiaohui
2010-09-11  9:42                               ` Xin, Xiaohui
2010-08-11  1:23 ` [RFC PATCH v9 00/16] Provide a zero-copy method on KVM virtio-net Shirley Ma
2010-08-11  1:23   ` Shirley Ma
2010-08-11  1:43   ` Shirley Ma
2010-08-11  1:43     ` Shirley Ma
2010-08-11  6:01     ` Shirley Ma
2010-08-11  6:01       ` Shirley Ma
2010-08-11  6:55       ` Shirley Ma
2010-08-11  6:55         ` Shirley Ma
2010-09-03 10:52         ` Michael S. Tsirkin
2010-09-13 18:48           ` Shirley Ma
2010-09-13 21:35           ` Shirley Ma
2010-09-03 10:14   ` Michael S. Tsirkin
2010-09-03 20:29     ` Sridhar Samudrala