From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752914Ab0IOLea (ORCPT ); Wed, 15 Sep 2010 07:34:30 -0400
Received: from mx1.redhat.com ([209.132.183.28]:31220 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750844Ab0IOLe2 (ORCPT ); Wed, 15 Sep 2010 07:34:28 -0400
Date: Wed, 15 Sep 2010 13:28:11 +0200
From: "Michael S. Tsirkin"
To: "Xin, Xiaohui"
Cc: "netdev@vger.kernel.org" , "kvm@vger.kernel.org" , "linux-kernel@vger.kernel.org" , "mingo@elte.hu" , "davem@davemloft.net" , "herbert@gondor.hengli.com.au" , "jdike@linux.intel.com"
Subject: Re: [RFC PATCH v9 12/16] Add mp(mediate passthru) device.
Message-ID: <20100915112811.GB29267@redhat.com>
References: <1281086624-5765-10-git-send-email-xiaohui.xin@intel.com> <1281086624-5765-11-git-send-email-xiaohui.xin@intel.com> <1281086624-5765-12-git-send-email-xiaohui.xin@intel.com> <1281086624-5765-13-git-send-email-xiaohui.xin@intel.com> <1281086624-5765-14-git-send-email-xiaohui.xin@intel.com> <20100906111126.GA15608@redhat.com> <20100912133703.GJ22982@redhat.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To:
User-Agent: Mutt/1.5.20 (2009-12-10)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Sep 15, 2010 at 11:13:44AM +0800, Xin, Xiaohui wrote:
> >From: Michael S. Tsirkin [mailto:mst@redhat.com]
> >Sent: Sunday, September 12, 2010 9:37 PM
> >To: Xin, Xiaohui
> >Cc: netdev@vger.kernel.org; kvm@vger.kernel.org; linux-kernel@vger.kernel.org;
> >mingo@elte.hu; davem@davemloft.net; herbert@gondor.hengli.com.au;
> >jdike@linux.intel.com
> >Subject: Re: [RFC PATCH v9 12/16] Add mp(mediate passthru) device.
> >
> >On Sat, Sep 11, 2010 at 03:41:14PM +0800, Xin, Xiaohui wrote:
> >> >>Playing with the rlimit on the data path, transparently to the application, in this way
> >> >>looks strange to me; I suspect this has unexpected security implications.
> >> >>Further, applications may have other uses for locked memory
> >> >>besides mpassthru - you should not just take it because it's there.
> >> >>
> >> >>Can we have an ioctl that lets userspace configure how much
> >> >>memory to lock? This ioctl will decrement the rlimit and store
> >> >>the data in the device structure so we can do accounting
> >> >>internally. Put it back on close or on another ioctl.
> >> >Yes, we can decrement the rlimit once in the ioctl to keep it
> >> >off the data path.
> >> >
> >> >>Need to be careful when this operation gets called
> >> >>again with 0 or another small value while we have locked memory -
> >> >>maybe just fail with EBUSY? Or wait until it gets unlocked?
> >> >>Maybe 0 can be special-cased to deactivate zero-copy?
> >> >>
> >>
> >> How about we don't add a new ioctl, but just check the rlimit
> >> in the MPASSTHRU_BINDDEV ioctl? If we find the mp device would
> >> exceed the rlimit, then we fail the bind ioctl, and thus can't do
> >> zero copy any more.
> >
> >Yes, and not just check, but decrement as well.
> >I think we should give userspace control over
> >how much memory we can lock and subtract from the rlimit.
> >It's OK to add this as a parameter to MPASSTHRU_BINDDEV.
> >Then increment the rlimit back on unbind and on close?
> >
> >This opens up an interesting condition: process 1
> >calls bind, process 2 calls unbind or close.
> >This will increment the rlimit for process 2.
> >Not sure how to fix this properly.
> >
> I can't either. Can we do any synchronous operations on the rlimit?
> I rather doubt it.
>
> >--
> >MST

Here's what InfiniBand does: simply pass the amount of memory userspace
wants you to lock in an ioctl, and verify that either you have
CAP_IPC_LOCK or this number does not exceed the current rlimit (this
must be done in the ioctl, not on open, as we likely want the fd to be
passed around between processes), but do not decrement the rlimit.
Enforce this amount on subsequent operations. Be careful if it can be
changed while operations are in progress.

This does mean that the effective amount of memory that userspace can
lock is doubled, but at least it is not unlimited, and we sidestep all
the other issues, such as userspace running out of lockable memory
simply by virtue of using the driver.

-- 
MST
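A minimal userspace model of the ioctl-time scheme described above might look like the following. It is a sketch under stated assumptions, not code from the mp patch or from InfiniBand: struct mp_lock_acct, mp_set_lock_limit(), mp_pin_pages() and mp_unpin_pages() are hypothetical names; the CAP_IPC_LOCK check and the RLIMIT_MEMLOCK value are passed in as plain arguments rather than queried from the kernel; and returning -ENOMEM or -EBUSY is an assumed choice of error codes. The requested amount is checked once at ioctl time, the rlimit is only read and never decremented, the stored amount is enforced on the pin path, and the limit may not shrink below what is already pinned (the EBUSY option raised earlier in the thread).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical per-device accounting state (illustrative names only,
 * not taken from the mp patch or the InfiniBand code).
 */
struct mp_lock_acct {
	unsigned long limit_pages;  /* amount userspace asked to lock via the ioctl */
	unsigned long locked_pages; /* pages currently pinned by the device */
};

/*
 * Ioctl-time check: accept the requested amount if the caller has
 * CAP_IPC_LOCK or the amount fits under its RLIMIT_MEMLOCK. The rlimit
 * is only read, never decremented. Both inputs are plain arguments in
 * this model; a kernel version would query them itself.
 */
int mp_set_lock_limit(struct mp_lock_acct *acct, unsigned long req_pages,
		      bool has_cap_ipc_lock, unsigned long rlimit_pages)
{
	assert(acct != NULL);

	if (!has_cap_ipc_lock && req_pages > rlimit_pages)
		return -ENOMEM; /* assumed error code */

	/* The limit may be changed while pages are pinned: refuse to
	 * shrink it below what is already locked. */
	if (req_pages < acct->locked_pages)
		return -EBUSY;

	acct->limit_pages = req_pages;
	return 0;
}

/* Data path: pin only within the limit stored by the ioctl. */
int mp_pin_pages(struct mp_lock_acct *acct, unsigned long npages)
{
	/* Invariant: locked_pages <= limit_pages, so this cannot underflow. */
	if (npages > acct->limit_pages - acct->locked_pages)
		return -ENOMEM;

	acct->locked_pages += npages;
	return 0;
}

/* Release pages previously pinned with mp_pin_pages(). */
void mp_unpin_pages(struct mp_lock_acct *acct, unsigned long npages)
{
	assert(npages <= acct->locked_pages);
	acct->locked_pages -= npages;
}
```

Because the rlimit is never modified, it no longer matters which process later unbinds or closes the fd, which avoids the process 1 / process 2 mismatch discussed above. The cost is the one already noted: a process can lock up to its rlimit through mlock() and again through the device.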