Date: Wed, 16 Jul 2014 14:31:53 +0200
From: Paolo Bonzini <pbonzini@redhat.com>
To: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: aarcange@redhat.com, yamahata@private.email.ne.jp,
 lilei@linux.vnet.ibm.com, qemu-devel@nongnu.org, quintela@redhat.com
Subject: Re: [Qemu-devel] [PATCH 07/46] Return path: Open a return path
 on QEMUFile for sockets
Message-ID: <53C670B9.7030008@redhat.com>
In-Reply-To: <20140716115233.GH2514@work-vm>

On 16/07/2014 13:52, Dr. David Alan Gilbert wrote:
> * Paolo Bonzini (pbonzini@redhat.com) wrote:
>> On 16/07/2014 11:37, Dr. David Alan Gilbert wrote:
>>>>>
>>>>> +
>>>>> +    /* If it's already open, return it */
>>>>> +    if (qfs->file->return_path) {
>>>>> +        return qfs->file->return_path;
>>>>
>>>> Wouldn't this leave a dangling file descriptor if you call
>>>> socket_dup_return_path twice, and then close the original QEMUFile?
>>>
>>> Hmm - how?
>>
>> The problem is that there is no reference count on QEMUFile, so if
>> you do
>>
>>     f1 = open_return_path(f0);
>>     f2 = open_return_path(f0);
>>     /* now f1 == f2 */
>>     qemu_fclose(f1);
>>     /* now f2 is dangling */
>
> I think from the way I'm using it, I can remove the optimisation, but
> I do need to check.
>
> I'm not too sure what your worry is about 'f2' in this case; I guess
> the caller needs to know that it should only close the return path
> once - is that your worry?

Yes.  The API is not well encapsulated; a "random" caller of
open_return_path does not know (and cannot know) whether it should
close the returned file or not.
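(A minimal standalone sketch of the reference-counting idea described
above; RPFile, open_return_path and rp_close here are illustrative
stand-ins, not the real QEMUFile API:

    #include <stdio.h>
    #include <stdlib.h>

    /* Toy stand-in for QEMUFile; only the refcounting is shown. */
    typedef struct RPFile RPFile;
    struct RPFile {
        int refs;               /* references held on this return path */
        RPFile *return_path;    /* lazily created, shared by callers */
    };

    static RPFile *open_return_path(RPFile *f)
    {
        if (!f->return_path) {
            f->return_path = calloc(1, sizeof(RPFile));
        }
        f->return_path->refs++;    /* every caller takes a reference */
        return f->return_path;
    }

    static void rp_close(RPFile *rp)
    {
        /* Only the last close releases the object; a full version
         * would also clear the owner's return_path pointer. */
        if (--rp->refs == 0) {
            free(rp);
        }
    }

    int main(void)
    {
        RPFile f0 = { 0 };
        RPFile *f1 = open_return_path(&f0);
        RPFile *f2 = open_return_path(&f0);  /* f1 == f2, refs == 2 */
        rp_close(f1);                        /* f2 is still valid */
        printf("refs left: %d\n", f2->refs); /* prints 1 */
        rp_close(f2);                        /* now actually freed */
        return 0;
    }

With this shape every caller pairs one open with one close, and no
caller needs to know how many other users of the return path exist.)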
> I'm more nervous about dropping that one, because the current scheme
> does provide a clean way of finding the forward path if you've got the
> reverse (although I don't think I make use of it).

If it isn't used, why keep it?

>>> Source side
>>>     Forward path - written by migration thread
>>>        : It's OK for this to be blocking, but we end up with it
>>>        being non-blocking, and modify the socket code to emulate
>>>        blocking.
>>
>> This likely has a performance impact though.  The first migration
>> thread code drop from Juan already improved throughput a lot, even if
>> it kept the iothread all the time and only converted from nonblocking
>> writes to blocking.
>
> Can you give some more reasoning as to why you think this will hit the
> performance so much; I thought the output buffers were quite big
> anyway.

I don't really know, it's

>>>     Return path - opened by main thread, read by fd_handler on main
>>>     thread
>>>        : Must be non-blocking so as not to block the main thread
>>>        while waiting for a partially sent command.
>>
>> Why can't you handle this in the migration thread (or a new postcopy
>> thread on the source side)?  Then it can stay blocking.
>
> Handling it within the migration thread would make it much more
> complicated (which would be bad since it's already complex enough);

Ok.  I'm not sure why it is more complicated, since migration is
essentially two-phase - one phase where the source drives it and one
where the source just waits for requests - but I'll trust you on
this. :)

>>> Destination side
>>>     Forward path - read by main thread
>>
>> This must be nonblocking so that the monitor keeps responding.
>
> Interesting, I suspect none of the code in there is set up for that at
> the moment, so how does that work during migration at the moment?

It sure is. :)  On the destination side, migration is done in a
coroutine (see process_incoming_migration), so it's all transparent.
Only socket_get_buffer has to know about this:

    len = qemu_recv(s->fd, buf, size, 0);
    if (len != -1) {
        break;
    }
    if (socket_error() == EAGAIN) {
        yield_until_fd_readable(s->fd);
    } else if (socket_error() != EINTR) {
        break;
    }

If the socket is put in blocking mode, recv will never return EAGAIN,
so this code only runs when the socket is nonblocking.

> Actually, I see I missed something here; this should be:
>
> Destination side
>     Forward path - read by main thread, and listener thread (see the
>     separate mail that described that listener thread)
>
> and that means that once postcopy is going (and the listener thread
> started) it can't block the monitor.

Ok, so the listener thread can do socket_set_block(qemu_get_fd(file))
once it gets its hands on the QEMUFile.

>>>     Return path - opened by main thread, written by main thread AND
>>>     postcopy thread (protected by rp_mutex)
>>
>> When does the main thread need to write?
>
> Not much; the only things the main thread currently responds to are
> the ReqAck (ping-like) requests; those are turning out to be very
> useful during debug.  I've also got the ability for the destination to
> send a migration result back to the source, which seems useful to be
> able to 'fail' early.

Why can't this be done in the listener thread?  (Thus transforming it
into a more general postcopy migration thread; later we could even
change incoming migration from a coroutine to a thread.)

>> If it doesn't need that, you can just switch to blocking when you
>> process the listen command (i.e. when the postcopy thread starts).
>
> Why don't I just do it anyway? Prior to postcopy starting we're in the
> same situation as we're in with precopy today, so we can already get
> main-thread blocking.

See above for the explanation.

Paolo
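(A footnote on the rp_mutex point above: a minimal sketch of
serializing the two return-path writers, assuming pthreads and a plain
socket fd; rp_send_command and rp_fd are illustrative names, not taken
from the patch series:

    #include <pthread.h>
    #include <unistd.h>

    static pthread_mutex_t rp_mutex = PTHREAD_MUTEX_INITIALIZER;
    static int rp_fd;   /* return-path socket, assumed already open */

    /* Called from both the main thread and the postcopy/listener
     * thread; the mutex is held across the whole command so partial
     * writes from the two writers can never interleave on the wire. */
    static void rp_send_command(const void *buf, size_t len)
    {
        const char *p = buf;

        pthread_mutex_lock(&rp_mutex);
        while (len > 0) {
            ssize_t n = write(rp_fd, p, len);
            if (n < 0) {
                break;   /* a real version would retry EINTR/EAGAIN */
            }
            p += n;
            len -= (size_t)n;
        }
        pthread_mutex_unlock(&rp_mutex);
    }

This is also where the blocking-versus-nonblocking choice shows up: on
a blocking fd the loop usually finishes in one write(), while on a
nonblocking fd a partially sent command keeps the lock held while the
writer waits for the socket to drain.)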