From: Paolo Bonzini
Date: Mon, 6 Nov 2017 17:35:43 +0100
Subject: Re: [Qemu-devel] [RFC PATCH 17/26] replay: push replay_mutex_lock up the call tree
To: Alex Bennée
Cc: Pavel Dovgalyuk, 'Pavel Dovgalyuk', qemu-devel@nongnu.org, kwolf@redhat.com, peter.maydell@linaro.org, boost.lists@gmail.com, quintela@redhat.com, jasowang@redhat.com, mst@redhat.com, zuban32s@gmail.com, maria.klimushenkova@ispras.ru, kraxel@redhat.com
In-Reply-To: <87zi7zqshq.fsf@linaro.org>

On 06/11/2017 17:30, Alex Bennée wrote:
> Previously the synchronisation of the main thread and the vCPU thread
> was ensured by the holding of the BQL. However the trend has been to
> reduce the time the BQL was held across the system including under TCG
> system emulation.
> As it is important that batches of events are kept
> in sequence (e.g. expiring timers and checkpoints in the main thread
> while instruction checkpoints are written by the vCPU thread) we need
> another lock to keep things in lock-step. This role is now handled by
> the replay_mutex_lock. It used to be held only for each event being
> written but now it is held for a whole execution period. This results
> in a deterministic ping-pong between the two main threads.

I would remove the last two sentences (which might belong in a commit
message, but not in documentation).

> As the BQL is now a finer grained lock than the replay_lock it is
> almost certainly a bug taking the replay_mutex_lock while the BQL is
> held. This is enforced by an assert. While the unlocks are usually in
> the reverse order it is not necessary and therefor you can drop the
> replay_lock while holding the BQL rather than doing any more
> unlock/unlock/lock sequences.

As the BQL is now a finer-grained lock than the replay_lock, it is almost
certainly a bug, and a source of deadlocks, to take the replay_mutex_lock
while the BQL is held. This is enforced by an assert. While the unlocks
are usually in the reverse order, this is not necessary; you can drop the
replay_lock while holding the BQL, without doing a more complicated
unlock_iothread/replay_unlock/lock_iothread sequence.

Paolo
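To make the ordering rule in the proposed wording concrete, here is a minimal C sketch. It is an illustration under stated assumptions, not QEMU code: `bql_mutex`, `replay_mutex`, `bql_lock()`, `replay_lock()`, `run_period()` and the per-thread ownership flags are hypothetical stand-ins built on plain pthread mutexes, rather than QEMU's own locking functions. The replay mutex is treated as the outer (coarser) lock and the BQL as the inner one.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the BQL and the replay mutex; these are
 * plain pthread mutexes, not QEMU's actual locking API. */
static pthread_mutex_t bql_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t replay_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Per-thread ownership flags, used only to check the lock ordering. */
static _Thread_local bool bql_held;
static _Thread_local bool replay_held;

static void bql_lock(void)
{
    pthread_mutex_lock(&bql_mutex);
    bql_held = true;
}

static void bql_unlock(void)
{
    bql_held = false;
    pthread_mutex_unlock(&bql_mutex);
}

static void replay_lock(void)
{
    /* The replay mutex is the coarser, outer lock. Taking it while the
     * finer-grained BQL is held inverts the order and can deadlock. */
    assert(!bql_held);
    pthread_mutex_lock(&replay_mutex);
    replay_held = true;
}

static void replay_unlock(void)
{
    replay_held = false;
    pthread_mutex_unlock(&replay_mutex);
}

/* One hypothetical execution period; returns the number of events handled. */
static int run_period(int events)
{
    int done = 0;

    replay_lock();          /* outer lock first */
    bql_lock();             /* then the inner lock */
    for (int i = 0; i < events; i++) {
        done++;             /* placeholder for recording/replaying an event */
    }

    /* Unlocks need not be in reverse order: the replay lock can be
     * dropped while the BQL is still held, with no
     * bql_unlock/replay_unlock/bql_lock sequence. */
    replay_unlock();
    /* ... work that needs only the BQL ... */
    bql_unlock();
    return done;
}
```

The `assert(!bql_held)` in `replay_lock()` plays the role of the assert mentioned above: calling `bql_lock(); replay_lock();` on one thread trips it instead of risking an ABBA deadlock against a thread that follows the replay-then-BQL order.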