From: Stefan Hajnoczi
Date: Fri, 25 Nov 2016 11:35:08 +0000
Subject: Re: [Qemu-devel] [PATCH v7 RFC] block/vxhs: Initial commit to add Veritas HyperScale VxHS block device support
To: Ketan Nilangekar
Cc: Paolo Bonzini, ashish mittal, "Daniel P. Berrange", Jeff Cody, qemu-devel, Kevin Wolf, Markus Armbruster, Fam Zheng, Ashish Mittal, Abhijit Dey, Buddhi Madhav, "Venkatesha M.G.", Nitin Jerath, Gaurav Bhandarkar, Abhishek Kane, Ketan Mahajan, Niranjan Pendharkar
Message-ID: <20161125113508.GC4939@stefanha-x1.localdomain>
In-Reply-To: <3B08C602-0033-4FCD-AE83-F0962322A7F7@veritas.com>
References: <20161118133611.GC5371@redhat.com> <40265568-6388-e302-0bbf-a08a6746a686@redhat.com> <5DCF5E88-0BC6-443B-B557-9A72D32A4D49@veritas.com> <20161124111135.GC9117@stefanha-x1.localdomain> <20161124160856.GB13535@stefanha-x1.localdomain> <3B08C602-0033-4FCD-AE83-F0962322A7F7@veritas.com>

On Fri, Nov 25, 2016 at 08:27:26AM +0000, Ketan Nilangekar wrote:
> On 11/24/16, 9:38 PM, "Stefan Hajnoczi" wrote:
> On Thu, Nov 24, 2016 at 11:31:14AM +0000, Ketan Nilangekar wrote:
> > On 11/24/16, 4:41 PM, "Stefan Hajnoczi" wrote:
> > On Thu, Nov 24, 2016 at 05:44:37AM +0000, Ketan Nilangekar wrote:
> > > On 11/24/16, 4:07 AM, "Paolo Bonzini" wrote:
> > > > On 23/11/2016 23:09, ashish mittal wrote:
> > > > > On the topic of protocol security -
> > > > >
> > > > > Would it be enough for the first patch to implement only authentication and not encryption?
> > > >
> > > > Yes, of course. However, as we introduce more and more QEMU-specific characteristics to a protocol that is already QEMU-specific (it doesn't do failover, etc.), I am still not sure of the actual benefit of using libqnio versus having an NBD server or FUSE driver.
> > > >
> > > > You have already mentioned performance, but the design has changed so much that I think one of two things has to change: either failover moves back to QEMU and there is no (closed source) translator running on the node, or the translator needs to speak a well-known and already-supported protocol.
> > >
> > > IMO the design has not changed; the implementation has changed significantly. I would propose that we keep the resiliency/failover code out of the QEMU driver and implement it entirely in libqnio, as planned, in a subsequent revision. The VxHS server does not need to understand/handle failover at all.
> > >
> > > Today libqnio gives us significantly better performance than any NBD/FUSE implementation. We know because we have prototyped with both. Significant improvements to libqnio are also in the pipeline which will use cross memory attach calls to further boost performance. Of course a big reason for the performance is also the HyperScale storage backend, but we believe this method of IO tapping/redirecting can be leveraged by other solutions as well.
> >
> > By "cross memory attach" do you mean process_vm_readv(2)/process_vm_writev(2)?
> >
> > Ketan> Yes.
> >
> > That puts us back to square one in terms of security. You have (untrusted) QEMU + (untrusted) libqnio directly accessing the memory of another process on the same machine. That process is therefore also untrusted and may only process data for one guest so that guests stay isolated from each other.
> >
> > Ketan> Understood, but this will be no worse than the current network-based communication between qnio and the vxhs server. And although we have questions around QEMU trust/vulnerability issues, we are looking to implement a basic authentication scheme between libqnio and the vxhs server.
>
> This is incorrect.
>
> Cross memory attach is equivalent to ptrace(2) (i.e. debugger) access. It means process A reads/writes directly from/to process B's memory. Both processes must have the same uid/gid. There is no trust boundary between them.
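To make the trust issue concrete, here is a rough sketch of what a cross memory attach read looks like. The pid, remote address, and buffer size are made up; only process_vm_readv(2) itself is real:

    #define _GNU_SOURCE
    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/types.h>
    #include <sys/uio.h>

    int main(int argc, char **argv)
    {
        if (argc != 3) {
            fprintf(stderr, "usage: %s <target-pid> <remote-addr-hex>\n", argv[0]);
            return 1;
        }

        /* The "server" side: given a pid and an address inside that process,
         * copy 4 KiB of its memory into a local buffer. */
        pid_t target = (pid_t)strtol(argv[1], NULL, 10);
        void *remote_addr = (void *)(uintptr_t)strtoull(argv[2], NULL, 16);

        char buf[4096];
        struct iovec local  = { .iov_base = buf,         .iov_len = sizeof(buf) };
        struct iovec remote = { .iov_base = remote_addr, .iov_len = sizeof(buf) };

        /* Succeeds only if the caller passes the ptrace access check against
         * the target: same uid/gid, or CAP_SYS_PTRACE (e.g. running as root). */
        ssize_t n = process_vm_readv(target, &local, 1, &remote, 1, 0);
        if (n < 0) {
            perror("process_vm_readv");
            return 1;
        }

        printf("read %zd bytes from pid %ld\n", n, (long)target);
        return 0;
    }

The kernel gates the call with the same access check a debugger would face, so the two processes end up in one trust domain unless the caller has CAP_SYS_PTRACE.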
> Ketan> Not if the vxhs server is running as root and initiating the cross memory attach, which is also why we are proposing a basic authentication mechanism between QEMU and the vxhs server. But anyway, the cross memory attach is for a near-future implementation.
>
> Network communication does not require both processes to have the same uid/gid. If you want multiple QEMU processes talking to a single server there must be a trust boundary between client and server. The server can validate the input from the client and reject undesired operations.
>
> Ketan> This is what we are trying to propose. With the addition of authentication between QEMU and the vxhs server, we should be able to achieve this. The question is, would that be acceptable?
>
> Hope this makes sense now.
>
> Two architectures that implement the QEMU trust model correctly are:
>
> 1. Cross memory attach: each QEMU process has a dedicated vxhs server process to prevent guests from attacking each other. This is where I said you might as well put the code inside QEMU since there is no isolation anyway. From what you've said it sounds like the vxhs server needs a host-wide view and is responsible for all guests running on the host, so I guess we have to rule out this architecture.
>
> 2. Network communication: one vxhs server process and multiple guests. Here you might as well use NBD or iSCSI because it already exists and the vxhs driver doesn't add any unique functionality over existing protocols.
>
> Ketan> NBD does not give us the performance we are trying to achieve. Besides, NBD does not have any authentication support.

NBD over TCP supports TLS with X.509 certificate authentication. I think Daniel Berrange mentioned that.

NBD over AF_UNIX does not need authentication because it relies on file permissions for access control. Each guest should have its own UNIX domain socket that it connects to. That socket can only see the exports that have been assigned to the guest.
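To make that concrete, a per-guest setup could look roughly like the following with a reasonably recent qemu-nbd; the paths, export name, and uid are placeholders:

    # One qemu-nbd instance per guest.  The socket is owned by that guest's
    # QEMU uid, so no other guest can even connect to it.
    qemu-nbd -k /run/vxhs/guest1.sock -x guest1-disk -f raw /var/lib/vxhs/guest1.img
    chown qemu-guest1:qemu-guest1 /run/vxhs/guest1.sock
    chmod 0600 /run/vxhs/guest1.sock

    # The guest's QEMU connects over the UNIX domain socket and sees only
    # the export assigned to it.
    qemu-system-x86_64 ... \
        -drive file=nbd+unix:///guest1-disk?socket=/run/vxhs/guest1.sock,format=raw,if=virtio

The same pattern scales to one socket per guest with no authentication code in QEMU or the server; the kernel's permission check on connect(2) does the isolation.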
> There is a hybrid 2.a approach which uses both 1 & 2, but I'd keep that for a later discussion.

Please discuss it now so everyone gets on the same page. I think there is a big gap and we need to communicate so that progress can be made.

> > There's an easier way to get even better performance: get rid of libqnio and the external process. Move the code from the external process into QEMU to eliminate the process_vm_readv(2)/process_vm_writev(2) and context switching.
> >
> > Can you remind me why there needs to be an external process?
> >
> > Ketan> Apart from virtualizing the available direct-attached storage on the compute, the vxhs storage backend (the external process) provides features such as storage QoS, resiliency, efficient use of direct-attached storage, automatic storage recovery points (snapshots), etc. Implementing this in QEMU is not practical and is not the purpose of proposing this driver.
>
> This sounds similar to what QEMU and Linux (file systems, LVM, RAID, etc.) already do. It brings to mind a third architecture:
>
> 3. A Linux driver or file system. Then QEMU opens a raw block device. This is what the Ceph rbd block driver in Linux does. This architecture has a kernel-userspace boundary, so vxhs does not have to trust QEMU.
>
> I suggest Architecture #2. You'll be able to deploy on existing systems because QEMU already supports NBD and iSCSI. Use the time you gain from switching to this architecture on benchmarking and optimizing NBD or iSCSI so performance is closer to your goal.
>
> Ketan> We have made a choice to go with the QEMU driver approach after serious evaluation of most if not all standard IO tapping mechanisms, including NFS, NBD and FUSE. None of these has been able to deliver the performance that we have set ourselves to achieve. Hence the effort to propose this new IO tap, which we believe will provide an alternative to the existing mechanisms and hopefully benefit the community.

I thought the VxHS block driver was another network block driver like GlusterFS or Sheepdog, but you are actually proposing a new local I/O tap with the goal of better performance.

Please share fio(1) or other standard benchmark configuration files and performance results.

The NBD and libqnio wire protocols have comparable performance characteristics. There is no magic that should give either one a fundamental edge over the other. Am I missing something?

The main performance difference is probably that libqnio opens 8 simultaneous connections, but that is not unique to the wire protocol. What happens when you run 8 simultaneous NBD TCP connections?

Stefan
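P.S. On the benchmark request above: a fio job along these lines, run once against an NBD export and once against a VxHS device, would make the numbers directly comparable. The filename, run time, and job count are placeholders:

    [global]
    ioengine=libaio
    direct=1
    time_based=1
    runtime=60
    ramp_time=5

    [randread-4k]
    ; the virtio disk inside the guest, backed by NBD or VxHS on the host
    filename=/dev/vdb
    rw=randread
    bs=4k
    iodepth=32
    ; roughly mirrors the 8 simultaneous connections discussed above
    numjobs=8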