From mboxrd@z Thu Jan 1 00:00:00 1970 From: Nitesh Narayan Lal Subject: Re: [PULL] vhost: cleanups and fixes Date: Thu, 14 Jun 2018 11:01:59 -0400 Message-ID: <0f18063c-c76b-4728-5145-810f069988ea@redhat.com> References: <20180611192353-mutt-send-email-mst@kernel.org> <20180612042600-mutt-send-email-mst@kernel.org> <5B1FA8EF.4030409@intel.com> Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha256; protocol="application/pgp-signature"; boundary="C0WFAU9gndFtW7QPGmxTNU31IlbEQU5Zk" Cc: KVM list , virtualization , Network Development , Linux Kernel Mailing List , Andrew Morton , Bjorn Andersson To: Wei Wang , Linus Torvalds , "Michael S. Tsirkin" Return-path: In-Reply-To: <5B1FA8EF.4030409@intel.com> Sender: linux-kernel-owner@vger.kernel.org List-Id: netdev.vger.kernel.org This is an OpenPGP/MIME signed message (RFC 4880 and 3156) --C0WFAU9gndFtW7QPGmxTNU31IlbEQU5Zk Content-Type: multipart/mixed; boundary="sgSEkqYiSh3LZ7fwVPXMo9SKALwufKrUn"; protected-headers="v1" From: Nitesh Narayan Lal To: Wei Wang , Linus Torvalds , "Michael S. Tsirkin" Cc: KVM list , virtualization , Network Development , Linux Kernel Mailing List , Andrew Morton , Bjorn Andersson Message-ID: <0f18063c-c76b-4728-5145-810f069988ea@redhat.com> Subject: Re: [PULL] vhost: cleanups and fixes References: <20180611192353-mutt-send-email-mst@kernel.org> <20180612042600-mutt-send-email-mst@kernel.org> <5B1FA8EF.4030409@intel.com> In-Reply-To: <5B1FA8EF.4030409@intel.com> --sgSEkqYiSh3LZ7fwVPXMo9SKALwufKrUn Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable Content-Language: en-US Hi Wei, On 06/12/2018 07:05 AM, Wei Wang wrote: > On 06/12/2018 09:59 AM, Linus Torvalds wrote: >> On Mon, Jun 11, 2018 at 6:36 PM Michael S. Tsirkin >> wrote: >>> Maybe it will help to have GFP_NONE which will make any allocation >>> fail if attempted. Linus, would this address your comment? >> It would definitely have helped me initially overlook that call chain.= >> >> But then when I started looking at the whole dma_map_page() thing, it >> just raised my hackles again. >> >> I would seriously suggest having a much simpler version for the "no >> allocation, no dma mapping" case, so that it's *obvious* that that >> never happens. >> >> So instead of having virtio_balloon_send_free_pages() call a really >> generic complex chain of functions that in _some_ cases can do memory >> allocation, why isn't there a short-circuited "vitruque_add_datum()" >> that is guaranteed to never do anything like that? >> >> Honestly, I look at "add_one_sg()" and it really doesn't make me >> happy. It looks hacky as hell. If I read the code right, you're really= >> trying to just queue up a simple tuple of , except you encode= >> it as a page pointer in order to play games with the SG logic, and >> then you hmap that to the ring, except in this case it's all a fake >> ring that just adds the cpu-physical address instead. >> >> And to figuer that out, it's like five layers of indirection through >> different helper functions that *can* do more generic things but in >> this case don't. >> >> And you do all of this from a core VM callback function with some >> _really_ core VM locks held. >> >> That makes no sense to me. >> >> How about this: >> >> =C2=A0 - get rid of all that code >> >> =C2=A0 - make the core VM callback save the "these are the free memory= >> regions" in a fixed and limited array. One that DOES JUST THAT. No >> crazy "SG IO dma-mapping function crap". Just a plain array of a fixed= >> size, pre-allocated for that virtio instance. >> >> =C2=A0 - make it obvious that what you do in that sequence is ten >> instructions and no allocations ("Look ma, I wrote a value to an array= >> and incremented the array idex, and I'M DONE") >> >> =C2=A0 - then in that workqueue entry that you start *anyway*, you emp= ty the >> array and do all the crazy virtio stuff. >> >> In fact, while at it, just simplify the VM interface too. Instead of >> traversing a random number of buddy lists, just trraverse *one* - the >> top-level one. Are you seriously ever going to shrink or mark >> read-only anythin *but* something big enough to be in the maximum >> order? >> >> MAX_ORDER is what, 11? So we're talking 8MB blocks. Do you *really* >> want the balloon code to work on smaller things, particularly since >> the whole interface is fundamentally racy and opportunistic to begin >> with? > > OK, I will implement a new version based on the suggestions. Thanks. I have been working on a similar series [1] that is more generic, which solves the problem of giving unused memory back to the host and could be used to solve the migration problem as well. Can you take a look and see if you can use my series in some way? [1] https://www.spinics.net/lists/kvm/msg170113.html > > Best, > Wei > --=20 Regards Nitesh --sgSEkqYiSh3LZ7fwVPXMo9SKALwufKrUn-- --C0WFAU9gndFtW7QPGmxTNU31IlbEQU5Zk Content-Type: application/pgp-signature; name="signature.asc" Content-Description: OpenPGP digital signature Content-Disposition: attachment; filename="signature.asc" -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQIcBAEBCAAGBQJbIoNnAAoJEKOGQNwGMqM5Dy4P/Ra5IFpF8SVy4Q9fWv/o5efH jGrm45/6BhRpOF73lsHROIUxncelHLW9TKWOCmSQ6kgC84OKAdo/1O5gNbWvVz5W 4IvPTGVEDAyAFj4AUlkkPMPbRapCRpbzPCuH+KaIVb4QDn8uHU+alSPLn+v1ZjPd zraiY7nJ4muqNlKraOrStNUnZ2Hny1ZPSha/IDWLWFk9S/pPB1pP1nghmcGQI3Tp UkJjx3uz1DDZ/fIlnkO93GVzZtYhK5NNQzgvc0b1E7fK0+FnHq7AmBQfG5ZokvSE u6icEZl2muRpgWz2Tzewycn28yR8P5/5EVnDkoti0lkET009AAAb/ylcmvtR6BHh ZWcGl1EK23268rN6RzYa4Bj/9cipbDNMjOu5zJX4/3/7d5t9qaWmIiM7HGXFhKLI Ly0YgK9uLngKlTT3B74w0PgX0+MXT6rxn1aPNZi9ddI68iHaynS/xwKWi9HkN86i 8ZhqwkkY5Gxa62R4ed+0LfNap6DgsiOisusHIi4eSfKbvhFFIgVN1/dOrNDe1dTq uJC/CrOjztJA9x469IL0Js6nhFDrrZMLzH5j6vQgJw/ifR3Ee6memY0ycf9kQrwj xHFpINvR477cbc3dhbDvCH1sOo/JwJJkvViqNFZKtgGjosLZC7ATWN8eucAQqnZT lSPpv+uFdTnKdSexYr4h =H2xU -----END PGP SIGNATURE----- --C0WFAU9gndFtW7QPGmxTNU31IlbEQU5Zk--