Date: Wed, 17 Jun 2015 17:57:16 +0100
From: "Dr. David Alan Gilbert"
Message-ID: <20150617165716.GM2122@work-vm>
In-Reply-To: <878uborigh.fsf@linaro.org>
Subject: Re: [Qemu-devel] RFC Multi-threaded TCG design document
To: Alex Bennée
Cc: mttcg@greensocs.com, peter.maydell@linaro.org, mark.burton@greensocs.com,
    qemu-devel@nongnu.org, agraf@suse.de, guillaume.delbergue@greensocs.com,
    pbonzini@redhat.com, fred.konrad@greensocs.com

* Alex Bennée (alex.bennee@linaro.org) wrote:
> Hi,
>
> Shared Data Structures
> ======================
>
> Global TCG State
> ----------------
>
> We need to protect the entire code generation cycle, including any
> post-generation patching of the translated code. This also implies a
> shared translation buffer which contains code running on all cores.
> Any execution path that comes to the main run loop will need to hold
> a mutex for code generation. This also includes times when we need to
> flush code or jumps from the tb_cache.
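The "one mutex around the whole generate/patch/flush cycle" idea can be
pictured with a single lock protecting all mutation of the shared
buffer. A minimal pthreads sketch, where the names (`tb_lock`,
`tb_gen_code`, `tb_flush`) are purely illustrative placeholders, not
QEMU's real API:

```c
/* Sketch: serialising code generation with one lock.
 * All names here are illustrative, not QEMU's actual API. */
#include <pthread.h>

static pthread_mutex_t tb_lock = PTHREAD_MUTEX_INITIALIZER;
static int tb_count;            /* stands in for shared translation-buffer state */

/* Generate (or patch) a translation block; every mutation of the
 * shared buffer happens with tb_lock held. */
int tb_gen_code(int guest_pc)
{
    pthread_mutex_lock(&tb_lock);
    int id = tb_count++;        /* stands in for emitting host code */
    pthread_mutex_unlock(&tb_lock);
    return id;
}

/* Flushing the cache takes the same lock, so it cannot run
 * concurrently with generation or patching. */
void tb_flush(void)
{
    pthread_mutex_lock(&tb_lock);
    tb_count = 0;
    pthread_mutex_unlock(&tb_lock);
}
```

Any thread reaching the run loop with work to do takes the same lock,
so generation, patching and flushing are mutually exclusive.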
>
> DESIGN REQUIREMENT: Add locking around all code generation, patching
> and jump cache modification

I don't think that you require a shared translation buffer between
cores to do this - although it *might* be the easiest way. You could
have a per-core translation buffer; the only requirement is that most
invalidation operations happen on all the buffers (although that might
depend on the emulated architecture). With a per-core translation
buffer, each core could generate new translations without locking the
other cores as long as no one is doing invalidations.

> Memory maps and TLBs
> --------------------
>
> The memory handling code is fairly critical to the speed of memory
> access in the emulated system.
>
> - Memory regions (dividing up access to PIO, MMIO and RAM)
> - Dirty page tracking (for code gen, migration and display)
> - Virtual TLB (for translating guest address->real address)
>
> There is both a fast path walked by the generated code and a slow
> path when resolution is required. When the TLB tables are updated we
> need to ensure they are done in a safe way by bringing all executing
> threads to a halt before making the modifications.
>
> DESIGN REQUIREMENTS:
>
> - TLB Flush All/Page
>   - can be across-CPUs
>   - will need all other CPUs brought to a halt
> - TLB Update (update a CPUTLBEntry, via tlb_set_page_with_attrs)
>   - This is a per-CPU table - by definition it can't race
>   - updated by its own thread when the slow path is forced
>
> Emulated hardware state
> -----------------------
>
> Currently the hardware emulation has no protection against multiple
> accesses. However, guest systems accessing emulated hardware should
> be carrying out their own locking to prevent multiple CPUs confusing
> the hardware. Of course there is no guarantee that there couldn't be
> a broken guest that doesn't lock, so you could get racing accesses to
> the hardware.
>
> There is the class of paravirtualized hardware (VIRTIO) that works in
> a purely MMIO mode, often setting flags directly in guest memory as a
> result of a guest-triggered transaction.
>
> DESIGN REQUIREMENTS:
>
> - Access to IO Memory should be serialised by an IOMem mutex
> - The mutex should be recursive (i.e. allowing the holding thread to
>   relock it)
>
> IO Subsystem
> ------------
>
> The I/O subsystem is heavily used by KVM and has seen a lot of
> improvements to offload I/O tasks to dedicated IOThreads. There
> should be no additional locking required once we reach the Block
> Driver.
>
> DESIGN REQUIREMENTS:
>
> - The dataplane should continue to be protected by the iothread locks

Watch out for where DMA invalidates the translated code.

Dave

>
>
> References
> ==========
>
> [1] https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/plain/Documentation/memory-barriers.txt
> [2] http://thread.gmane.org/gmane.comp.emulators.qemu/334561
> [3] http://thread.gmane.org/gmane.comp.emulators.qemu/335297
>
>
> --
> Alex Bennée

--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK