From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:57727) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1fuK1g-0006H8-Gk for qemu-devel@nongnu.org; Mon, 27 Aug 2018 12:06:06 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1fuK1c-00035w-B4 for qemu-devel@nongnu.org; Mon, 27 Aug 2018 12:05:58 -0400 References: <20180629124052.331406-1-dplotnikov@virtuozzo.com> <4eca96cf-5520-fb82-2e23-afff3ba35077@virtuozzo.com> <7da20590-58d7-4ef0-92b9-fcb955f28c9b@virtuozzo.com> <20180813163048.GO4323@localhost.localdomain> <4dcf31f4-b2e0-e3cf-47ac-a9e745a9e2d8@virtuozzo.com> From: John Snow Message-ID: Date: Mon, 27 Aug 2018 12:05:37 -0400 MIME-Version: 1.0 In-Reply-To: <4dcf31f4-b2e0-e3cf-47ac-a9e745a9e2d8@virtuozzo.com> Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: quoted-printable Subject: Re: [Qemu-devel] [Qemu-block] [PATCH v0 0/2] Postponed actions List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Denis Plotnikov , Kevin Wolf Cc: famz@redhat.com, qemu-block@nongnu.org, qemu-devel@nongnu.org, qemu-stable@nongnu.org, stefanha@redhat.com, mreitz@redhat.com On 08/27/2018 03:05 AM, Denis Plotnikov wrote: > PING! PING! >=20 Sorry, Kevin and Stefan are both on PTO right now, I think. I can't promise I have the time to look soon, but you at least deserve an answer for the radio silence the last week. --js > On 14.08.2018 10:08, Denis Plotnikov wrote: >> >> >> On 13.08.2018 19:30, Kevin Wolf wrote: >>> Am 13.08.2018 um 10:32 hat Denis Plotnikov geschrieben: >>>> Ping ping! >>>> >>>> On 16.07.2018 21:59, John Snow wrote: >>>>> >>>>> >>>>> On 07/16/2018 11:01 AM, Denis Plotnikov wrote: >>>>>> Ping! >>>>>> >>>>> >>>>> I never saw a reply to Stefan's question on July 2nd, did you reply >>>>> off-list? >>>>> >>>>> --js >>>> Yes, I did. I talked to Stefan why the patch set appeared. >>> >>> The rest of us still don't know the answer. I had the same question. >>> >>> Kevin >> Yes, that's my fault. I should have post it earlier. >> >> I reviewed the problem once again and come up with the following >> explanation. >> Indeed, if the global lock has been taken by the main thread the vCPU >> threads won't be able to execute mmio ide. >> But, if the main thread will release the lock then nothing will preven= t >> vCPU threads form execution what they want, e.g writing to the block >> device. >> >> In case of running the mirroring it is possible. Let's take a look >> at the following snippet of mirror_run. This is a part the mirroring >> completion part. >> >> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= bdrv_drained_begin(bs); >> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= cnt =3D bdrv_get_dirty_count(s->dirty_bitmap); >> =C2=A0>>>>>>=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 if (cnt > 0 || mirror_flush= (s) < 0) { >> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0 bdrv_drained_end(bs); >> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0 continue; >> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= } >> >> (X) >>>>=C2=A0=C2=A0=C2=A0 assert(QLIST_EMPTY(&bs->tracked_requests)); >> >> mirror_flush here can yield the current coroutine so nothing more can >> be executed. >> We could end up with the situation when the main loop have to revolve >> to poll for another timer/bh to process. While revolving it releases >> the global lock. If the global lock is waited for by a vCPU (any >> other) thread, the waiting thread will get the lock and make what it >> intends. >> >> This is something that I can observe: >> >> mirror_flush yields coroutine, the main thread revolves and locks >> because a vCPU was waiting for the lock. Now the vCPU thread owns the >> lock and the main thread waits for the lock releasing. >> The vCPU thread does cmd_write_dma and releases the lock. Then, the ma= in >> thread gets the lock and continues to run eventually proceeding with >> the coroutine yeiled. >> If the vCPU requests aren't completed by the moment we will assert at >> (X). If the vCPU requests are completed we won't even notice that we >> had some writes while in the drained section. >> >> Denis >>> >>>>>> On 29.06.2018 15:40, Denis Plotnikov wrote: >>>>>>> There are cases when a request to a block driver state shouldn't >>>>>>> have >>>>>>> appeared producing dangerous race conditions. >>>>>>> This misbehaviour is usually happens with storage devices emulate= d >>>>>>> without eventfd for guest to host notifications like IDE. >>>>>>> >>>>>>> The issue arises when the context is in the "drained" section >>>>>>> and doesn't expect the request to come, but request comes from th= e >>>>>>> device not using iothread and which context is processed by the m= ain >>>>>>> loop. >>>>>>> >>>>>>> The main loop apart of the iothread event loop isn't blocked by t= he >>>>>>> "drained" section. >>>>>>> The request coming and processing while in "drained" section can >>>>>>> spoil >>>>>>> the >>>>>>> block driver state consistency. >>>>>>> >>>>>>> This behavior can be observed in the following KVM-based case: >>>>>>> >>>>>>> 1. Setup a VM with an IDE disk. >>>>>>> 2. Inside a VM start a disk writing load for the IDE device >>>>>>> =C2=A0 =C2=A0=C2=A0 e.g: dd if=3D of=3D bs=3DX count=3D= Y oflag=3Ddirect >>>>>>> 3. On the host create a mirroring block job for the IDE device >>>>>>> =C2=A0 =C2=A0=C2=A0 e.g: drive_mirror >>>>>>> 4. On the host finish the block job >>>>>>> =C2=A0 =C2=A0=C2=A0 e.g: block_job_complete >>>>>>> =C2=A0 =C2=A0 Having done the 4th action, you could get an assert= : >>>>>>> assert(QLIST_EMPTY(&bs->tracked_requests)) from mirror_run. >>>>>>> On my setup, the assert is 1/3 reproducible. >>>>>>> >>>>>>> The patch series introduces the mechanism to postpone the request= s >>>>>>> until the BDS leaves "drained" section for the devices not using >>>>>>> iothreads. >>>>>>> Also, it modifies the asynchronous block backend infrastructure >>>>>>> to use >>>>>>> that mechanism to release the assert bug for IDE devices. >>>>>>> >>>>>>> Denis Plotnikov (2): >>>>>>> =C2=A0 =C2=A0=C2=A0 async: add infrastructure for postponed actio= ns >>>>>>> =C2=A0 =C2=A0=C2=A0 block: postpone the coroutine executing if th= e BDS's is drained >>>>>>> >>>>>>> =C2=A0 =C2=A0 block/block-backend.c | 58 >>>>>>> ++++++++++++++++++++++++++++++--------- >>>>>>> =C2=A0 =C2=A0 include/block/aio.h=C2=A0=C2=A0 | 63 >>>>>>> +++++++++++++++++++++++++++++++++++++++++++ >>>>>>> =C2=A0 =C2=A0 util/async.c=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0 | 33 +++++++++++++++++++++++ >>>>>>> =C2=A0 =C2=A0 3 files changed, 142 insertions(+), 12 deletions(-) >>>>>>> >>>>>> >>>> >>>> --=C2=A0 >>>> Best, >>>> Denis >> >=20 --=20 =E2=80=94js