From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from eggs.gnu.org ([208.118.235.92]:33557)
	by lists.gnu.org with esmtp (Exim 4.71) (envelope-from )
	id 1URccg-0003aJ-RP for qemu-devel@nongnu.org;
	Mon, 15 Apr 2013 02:10:43 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from ) id 1URcce-00027X-9B for qemu-devel@nongnu.org;
	Mon, 15 Apr 2013 02:10:38 -0400
Received: from mx1.redhat.com ([209.132.183.28]:32493)
	by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from )
	id 1URcce-00027N-1G for qemu-devel@nongnu.org;
	Mon, 15 Apr 2013 02:10:36 -0400
Date: Mon, 15 Apr 2013 09:10:29 +0300
From: "Michael S. Tsirkin"
Message-ID: <20130415061028.GB16107@redhat.com>
References: <20130411191533.GA25515@redhat.com>
	<51671DFF.80904@linux.vnet.ibm.com>
	<20130412104802.GA23467@redhat.com>
	<5168105C.5040605@linux.vnet.ibm.com>
	<20130414082827.GA1548@redhat.com>
	<516ABDB8.1090100@linux.vnet.ibm.com>
	<20130414185116.GE7165@redhat.com>
	<516B06E0.9040804@linux.vnet.ibm.com>
	<20130414211647.GG7165@redhat.com>
	<516B538C.5060008@linux.vnet.ibm.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <516B538C.5060008@linux.vnet.ibm.com>
Subject: Re: [Qemu-devel] [RFC PATCH RDMA support v5: 03/12] comprehensive protocol documentation
To: "Michael R. Hines"
Cc: aliguori@us.ibm.com, qemu-devel@nongnu.org, owasserm@redhat.com,
	abali@us.ibm.com, mrhines@us.ibm.com, gokul@us.ibm.com, Paolo Bonzini

On Sun, Apr 14, 2013 at 09:10:36PM -0400, Michael R. Hines wrote:
> On 04/14/2013 05:16 PM, Michael S. Tsirkin wrote:
> >On Sun, Apr 14, 2013 at 03:43:28PM -0400, Michael R. Hines wrote:
> >>On 04/14/2013 02:51 PM, Michael S. Tsirkin wrote:
> >>>On Sun, Apr 14, 2013 at 10:31:20AM -0400, Michael R. Hines wrote:
> >>>>On 04/14/2013 04:28 AM, Michael S. Tsirkin wrote:
> >>>>>On Fri, Apr 12, 2013 at 09:47:08AM -0400, Michael R.
Hines wrote:
> >>>>>>Second, as I've explained, I strongly, strongly disagree with unregistering
> >>>>>>memory for all of the aforementioned reasons - workloads do not
> >>>>>>operate in such a manner that they can tolerate memory to be
> >>>>>>pulled out from underneath them at such fine-grained time scales
> >>>>>>in the *middle* of a relocation and I will not commit to writing a solution
> >>>>>>for a problem that doesn't exist.
> >>>>>Exactly the same thing happens with swap, doesn't it?
> >>>>>You are saying workloads simply cannot tolerate swap.
> >>>>>
> >>>>>>If you can prove (through some kind of analysis) that workloads
> >>>>>>would benefit from this kind of fine-grained memory overcommit
> >>>>>>by having cgroups swap out memory to disk underneath them
> >>>>>>without their permission, I would happily reconsider my position.
> >>>>>>
> >>>>>>- Michael
> >>>>>This has nothing to do with cgroups directly, it's just a way to
> >>>>>demonstrate you have a bug.
> >>>>>
> >>>>If your datacenter or your cloud or your product does not want to
> >>>>tolerate page registration, then don't use RDMA!
> >>>>
> >>>>The bottom line is: RDMA is useless without page registration. Without
> >>>>it, the performance of it will be crippled. If you define that as a bug,
> >>>>then so be it.
> >>>>
> >>>>- Michael
> >>>No one cares if you do page registration or not. ulimit -l 10g is the
> >>>problem. You should limit the amount of locked memory.
> >>>Lots of good research went into making RDMA go fast with limited locked
> >>>memory, with some success. Search for "registration cache" for example.
> >>>
> >>Patches using such a cache would be welcome.
> >>
> >>- Michael
> >>
> >And when someone writes them one day, we'll have to carry the old code
> >around for interoperability as well. Not pretty. To avoid that, you
> >need to explicitly say in the documentation that it's experimental and
> >unsupported.
> >
>
> That's what protocols are for.
>
> As I've already said, I've incorporated this into the design of the protocol
> already.
>
> The protocol already has a field called "repeat" which allows a user to
> request multiple chunk registrations at the same time.
> If you insist, I can add a capability / command to the protocol
> called "unregister chunk",
> but I'm not volunteering to implement that command as I don't have any data
> showing it to be of any value.

The value would be being able to run your code in qemu as an unprivileged
user.

> That would insulate the protocol against any such future
> "registration cache" design.
>
> - Michael
>

It won't. If it's unimplemented it won't be of any use since now
your code does not implement the protocol fully.

-- 
MST
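
For readers unfamiliar with the "registration cache" idea raised in this
thread, here is a rough sketch of the general technique: keep only a bounded
set of chunks registered (pinned), and unregister the least recently used
ones when a new registration would exceed the locked-memory budget (the kind
of limit that ulimit -l imposes). This is not the code from the patch series
and makes no real RDMA calls. The class name, the method names, the LRU
eviction policy and the fixed chunk size are all assumptions made for
illustration; registration and unregistration are simulated by bookkeeping
only.

```python
from collections import OrderedDict


class RegistrationCache:
    """Toy LRU cache of 'registered' (pinned) chunks under a byte budget.

    Hypothetical illustration only: no RDMA hardware or library is touched,
    and register/unregister are modelled as counters.
    """

    def __init__(self, limit_bytes, chunk_bytes):
        if chunk_bytes <= 0 or limit_bytes < chunk_bytes:
            raise ValueError("budget must hold at least one chunk")
        self.limit_bytes = limit_bytes    # locked-memory budget
        self.chunk_bytes = chunk_bytes    # assumed fixed chunk size
        self.pinned = OrderedDict()       # chunk id -> True, oldest first
        self.registrations = 0            # simulated register calls
        self.unregistrations = 0          # simulated unregister calls

    def pinned_bytes(self):
        """Bytes currently counted against the locked-memory budget."""
        return len(self.pinned) * self.chunk_bytes

    def acquire(self, chunk_id):
        """Ensure chunk_id is registered; return True on a cache hit."""
        if chunk_id in self.pinned:
            self.pinned.move_to_end(chunk_id)   # mark most recently used
            return True
        # Evict least recently used chunks until the new one fits.
        while self.pinned_bytes() + self.chunk_bytes > self.limit_bytes:
            self.pinned.popitem(last=False)
            self.unregistrations += 1
        self.pinned[chunk_id] = True
        self.registrations += 1
        return False


if __name__ == "__main__":
    MiB = 1 << 20
    cache = RegistrationCache(limit_bytes=4 * MiB, chunk_bytes=1 * MiB)
    for chunk in [0, 1, 2, 3, 0, 4, 0, 5]:
        cache.acquire(chunk)
    print(cache.pinned_bytes() // MiB, cache.registrations,
          cache.unregistrations)
```

In this model the pinned total never exceeds the budget, at the cost of
extra register/unregister operations whenever the working set is larger
than the budget. An eviction here corresponds loosely to the "unregister
chunk" command discussed above, which a real implementation would need on
both sides of the protocol.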