From: Alex Bennée
Subject: Re: [Qemu-devel] [RFC 02/10] softmmu_llsc_template.h: Move to multi-threading
Date: Tue, 14 Jun 2016 09:37:47 +0100
Message-ID: <87bn34go84.fsf@linaro.org>
In-reply-to: <87ziqtf2ac.fsf@linaro.org>
To: Sergey Fedorov
Cc: Alvise Rigo, mttcg@listserver.greensocs.com, qemu-devel@nongnu.org,
    jani.kokkonen@huawei.com, claudio.fontana@huawei.com,
    tech@virtualopensystems.com, fred.konrad@greensocs.com,
    pbonzini@redhat.com, rth@twiddle.net, cota@braap.org,
    peter.maydell@linaro.org

Alex Bennée writes:

> Sergey Fedorov writes:
>
>> On 26/05/16 19:35, Alvise Rigo wrote:
>>> Using tcg_exclusive_{lock,unlock}(), make the emulation of
>>> LoadLink/StoreConditional thread safe.
>>>
>>> During an LL access, this lock protects the load access itself, the
>>> update of the exclusive history and the update of the VCPU's protected
>>> range. In an SC access, the lock protects the store access itself, the
>>> possible reset of other VCPUs' protected range and the reset of the
>>> exclusive context of the calling VCPU.
>>>
>>> The lock is also taken when a normal store happens to access an
>>> exclusive page to reset other VCPUs' protected range in case of
>>> collision.
>>
>> I think the key problem here is that the load in the LL helper can race
>> with a concurrent regular fast-path store. It's probably easier to
>> annotate the source here:
>>
>>  1 WORD_TYPE helper_ldlink_name(CPUArchState *env, target_ulong addr,
>>  2                              TCGMemOpIdx oi, uintptr_t retaddr)
>>  3 {
>>  4     WORD_TYPE ret;
>>  5     int index;
>>  6     CPUState *this_cpu = ENV_GET_CPU(env);
>>  7     CPUClass *cc = CPU_GET_CLASS(this_cpu);
>>  8     hwaddr hw_addr;
>>  9     unsigned mmu_idx = get_mmuidx(oi);
>>
>> 10     index = (addr >> TARGET_PAGE_BITS) & (CPU_TLB_SIZE - 1);
>>
>> 11     tcg_exclusive_lock();
>>
>> 12     /* Use the proper load helper from cpu_ldst.h */
>> 13     ret = helper_ld(env, addr, oi, retaddr);
>>
>> 14     /* hw_addr = hwaddr of the page (i.e. section->mr->ram_addr + xlat)
>> 15      * plus the offset (i.e. addr & ~TARGET_PAGE_MASK) */
>> 16     hw_addr = (env->iotlb[mmu_idx][index].addr & TARGET_PAGE_MASK) + addr;
>> 17     if (likely(!(env->tlb_table[mmu_idx][index].addr_read & TLB_MMIO))) {
>> 18         /* If all the vCPUs have the EXCL bit set for this page there is no need
>> 19          * to request any flush. */
>> 20         if (cpu_physical_memory_set_excl(hw_addr)) {
>> 21             CPUState *cpu;
>>
>> 22             excl_history_put_addr(hw_addr);
>> 23             CPU_FOREACH(cpu) {
>> 24                 if (this_cpu != cpu) {
>> 25                     tlb_flush_other(this_cpu, cpu, 1);
>> 26                 }
>> 27             }
>> 28         }
>> 29         /* For this vCPU, just update the TLB entry, no need to flush. */
>> 30         env->tlb_table[mmu_idx][index].addr_write |= TLB_EXCL;
>> 31     } else {
>> 32         /* Set a pending exclusive access in the MemoryRegion */
>> 33         MemoryRegion *mr = iotlb_to_region(this_cpu,
>> 34                                            env->iotlb[mmu_idx][index].addr,
>> 35                                            env->iotlb[mmu_idx][index].attrs);
>> 36         mr->pending_excl_access = true;
>> 37     }
>>
>> 38     cc->cpu_set_excl_protected_range(this_cpu, hw_addr, DATA_SIZE);
>>
>> 39     tcg_exclusive_unlock();
>>
>> 40     /* From now on we are in LL/SC context */
>> 41     this_cpu->ll_sc_context = true;
>>
>> 42     return ret;
>> 43 }
>>
>>
>> The exclusive lock at line 11 doesn't help if a concurrent fast-path
>> store at this address occurs after we have finished the load at line 13
>> but before the TLB is flushed as a result of line 25. If we reorder the
>> load to happen after the TLB flush request we still must be sure that
>> the flush is complete before we can do the load safely.
>
> I think this can be fixed using async_safe_run_on_cpu and tweaking the
> ldlink helper.
>
> * Change the helper_ldlink call
>   - pass it offset-of(cpu->reg[n]) so it can store the result of the load
>   - maybe pass it the next PC (unless there is some other way to know it)
>
> The vCPU runs until the ldlink instruction occurs and jumps to the helper.
>
> * Once in the helper_ldlink
>   - queue up an async helper function with the info of the offset
>   - cpu_loop_exit_restore(with next PC)
>
> The vCPU that issued the ldlink exits immediately and waits until all
> vCPUs are out of generated code.
>
> * Once in the helper_ldlink async helper
>   - Everything at this point is quiescent, no vCPU activity
>   - Flush all TLBs/set flags
>   - Do the load from memory, store directly into cpu->reg[n]
>
> The key thing is that once we are committed to the load in the async
> helper nothing else can get in the way. Any stores before we are in the
> helper happen as normal; once we exit the async helper all potentially
> conflicting stores will take the slow path.
>
> There is a little messing about in knowing the next PC, which is simple
> in the ARM case but gets a bit more complicated for architectures that
> have deferred jump slots. I haven't looked into this nit yet.
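
To make that concrete, here is a very rough and untested sketch of the
queued-work idea. The async_safe_run_on_cpu() signature, the extra
reg_off/next_pc arguments and the LLEntry bookkeeping are assumptions
for illustration only, not code that exists in the tree:

  /* Illustrative only: assumes an async_safe_run_on_cpu(cpu, fn, data)
   * API and a translator that passes reg_off/next_pc to the helper. */
  typedef struct LLEntry {
      target_ulong addr;  /* guest address of the LL */
      TCGMemOpIdx oi;
      size_t reg_off;     /* offset of the destination reg in CPUArchState */
  } LLEntry;

  /* Runs while every vCPU is out of generated code, so nothing can race
   * with the flush + load below. */
  static void helper_ldlink_async(CPUState *this_cpu, void *data)
  {
      LLEntry *ll = data;
      CPUArchState *env = this_cpu->env_ptr;
      CPUState *cpu;

      CPU_FOREACH(cpu) {        /* push the other vCPUs onto the slow path */
          if (cpu != this_cpu) {
              tlb_flush(cpu, 1);
          }
      }

      /* Do the load and store it directly into the destination register */
      *(WORD_TYPE *)((char *)env + ll->reg_off) =
          helper_ld(env, ll->addr, ll->oi, 0);

      g_free(ll);
  }

  WORD_TYPE helper_ldlink_name(CPUArchState *env, target_ulong addr,
                               TCGMemOpIdx oi, uintptr_t retaddr,
                               size_t reg_off, target_ulong next_pc)
  {
      CPUState *cpu = ENV_GET_CPU(env);
      LLEntry *ll = g_new(LLEntry, 1);

      ll->addr = addr;
      ll->oi = oi;
      ll->reg_off = reg_off;

      /* Queue the real work and leave the loop. Execution needs to resume
       * at next_pc (the fiddly bit mentioned above); a plain restore from
       * retaddr would re-execute the ldlink. */
      async_safe_run_on_cpu(cpu, helper_ldlink_async, ll);
      cpu_loop_exit_restore(cpu, retaddr);
  }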

Thinking on it further, the messing about with offsets and PCs could be
solved just by having a simple flag:

  - First run, no flag set, queue safe work, restart loop at PC
  - Second run, flag set, do load, clear flag

As long as the flag is per-vCPU I think it's safe from races.
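
Something along these lines (again only a sketch: the per-vCPU ll_queued
flag and the do_ldlink_flush() callback are made-up names, and the
async_safe_run_on_cpu() signature is assumed as above):

  WORD_TYPE helper_ldlink_name(CPUArchState *env, target_ulong addr,
                               TCGMemOpIdx oi, uintptr_t retaddr)
  {
      CPUState *cpu = ENV_GET_CPU(env);

      if (!cpu->ll_queued) {
          /* First run: queue the safe work (TLB flushes, exclusive bits)
           * and restart the loop at this same PC so we come straight back
           * into this helper. */
          cpu->ll_queued = true;
          async_safe_run_on_cpu(cpu, do_ldlink_flush, NULL);
          cpu_loop_exit_restore(cpu, retaddr);
      }

      /* Second run: the flush work ran with everything quiescent, so the
       * load itself is now safe. */
      cpu->ll_queued = false;
      return helper_ld(env, addr, oi, retaddr);
  }

No extra arguments from the translator and no need to know the next PC,
at the cost of executing the ldlink instruction twice.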

>
>>
>>>
>>> Moreover, adapt target-arm to also cope with the new multi-threaded
>>> execution.
>>>
>>> Signed-off-by: Alvise Rigo
>>> ---
>>>  softmmu_llsc_template.h | 11 +++++++++--
>>>  softmmu_template.h      |  6 ++++++
>>>  target-arm/op_helper.c  |  6 ++++++
>>>  3 files changed, 21 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/softmmu_llsc_template.h b/softmmu_llsc_template.h
>>> index 2c4a494..d3810c0 100644
>>> --- a/softmmu_llsc_template.h
>>> +++ b/softmmu_llsc_template.h
>>> @@ -62,11 +62,13 @@ WORD_TYPE helper_ldlink_name(CPUArchState *env, target_ulong addr,
>>>      hwaddr hw_addr;
>>>      unsigned mmu_idx = get_mmuidx(oi);
>>>
>>> +    index = (addr >> TARGET_PAGE_BITS) & (CPU_TLB_SIZE - 1);
>>> +
>>> +    tcg_exclusive_lock();
>>> +
>>>      /* Use the proper load helper from cpu_ldst.h */
>>>      ret = helper_ld(env, addr, oi, retaddr);
>>>
>>> -    index = (addr >> TARGET_PAGE_BITS) & (CPU_TLB_SIZE - 1);
>>> -
>>>      /* hw_addr = hwaddr of the page (i.e. section->mr->ram_addr + xlat)
>>>       * plus the offset (i.e. addr & ~TARGET_PAGE_MASK) */
>>>      hw_addr = (env->iotlb[mmu_idx][index].addr & TARGET_PAGE_MASK) + addr;
>>> @@ -95,6 +97,8 @@ WORD_TYPE helper_ldlink_name(CPUArchState *env, target_ulong addr,
>>>
>>>      cc->cpu_set_excl_protected_range(this_cpu, hw_addr, DATA_SIZE);
>>>
>>> +    tcg_exclusive_unlock();
>>> +
>>>      /* From now on we are in LL/SC context */
>>>      this_cpu->ll_sc_context = true;
>>>
>>> @@ -114,6 +118,8 @@ WORD_TYPE helper_stcond_name(CPUArchState *env, target_ulong addr,
>>>       * access as one made by the store conditional wrapper. If the store
>>>       * conditional does not succeed, the value will be set to 0.*/
>>>      cpu->excl_succeeded = true;
>>> +
>>> +    tcg_exclusive_lock();
>>>      helper_st(env, addr, val, oi, retaddr);
>>>
>>>      if (cpu->excl_succeeded) {
>>> @@ -123,6 +129,7 @@ WORD_TYPE helper_stcond_name(CPUArchState *env, target_ulong addr,
>>>
>>>      /* Unset LL/SC context */
>>>      cc->cpu_reset_excl_context(cpu);
>>> +    tcg_exclusive_unlock();
>>>
>>>      return ret;
>>>  }
>>> diff --git a/softmmu_template.h b/softmmu_template.h
>>> index 76fe37e..9363a7b 100644
>>> --- a/softmmu_template.h
>>> +++ b/softmmu_template.h
>>> @@ -537,11 +537,16 @@ static inline void smmu_helper(do_excl_store)(CPUArchState *env,
>>>          }
>>>      }
>>>
>>> +    /* Take the lock in case we are not coming from a SC */
>>> +    tcg_exclusive_lock();
>>> +
>>>      smmu_helper(do_ram_store)(env, little_endian, val, addr, oi,
>>>                                get_mmuidx(oi), index, retaddr);
>>>
>>>      reset_other_cpus_colliding_ll_addr(hw_addr, DATA_SIZE);
>>>
>>> +    tcg_exclusive_unlock();
>>> +
>>>      return;
>>>  }
>>>
>>> @@ -572,6 +577,7 @@ void helper_le_st_name(CPUArchState *env, target_ulong addr, DATA_TYPE val,
>>>      /* Handle an IO access or exclusive access. */
>>>      if (unlikely(tlb_addr & ~TARGET_PAGE_MASK)) {
>>>          if (tlb_addr & TLB_EXCL) {
>>> +
>>>              smmu_helper(do_excl_store)(env, true, val, addr, oi, index,
>>>                                         retaddr);
>>>              return;
>>> diff --git a/target-arm/op_helper.c b/target-arm/op_helper.c
>>> index e22afc5..19ea52d 100644
>>> --- a/target-arm/op_helper.c
>>> +++ b/target-arm/op_helper.c
>>> @@ -35,7 +35,9 @@ static void raise_exception(CPUARMState *env, uint32_t excp,
>>>      cs->exception_index = excp;
>>>      env->exception.syndrome = syndrome;
>>>      env->exception.target_el = target_el;
>>> +    tcg_exclusive_lock();
>>>      cc->cpu_reset_excl_context(cs);
>>> +    tcg_exclusive_unlock();
>>>      cpu_loop_exit(cs);
>>>  }
>>>
>>> @@ -58,7 +60,9 @@ void HELPER(atomic_clear)(CPUARMState *env)
>>>      CPUState *cs = ENV_GET_CPU(env);
>>>      CPUClass *cc = CPU_GET_CLASS(cs);
>>>
>>> +    tcg_exclusive_lock();
>>>      cc->cpu_reset_excl_context(cs);
>>> +    tcg_exclusive_unlock();
>>>  }
>>>
>>>  uint32_t HELPER(neon_tbl)(CPUARMState *env, uint32_t ireg, uint32_t def,
>>> @@ -874,7 +878,9 @@ void HELPER(exception_return)(CPUARMState *env)
>>>
>>>      aarch64_save_sp(env, cur_el);
>>>
>>> +    tcg_exclusive_lock();
>>>      cc->cpu_reset_excl_context(cs);
>>> +    tcg_exclusive_unlock();
>>>
>>>      /* We must squash the PSTATE.SS bit to zero unless both of the
>>>       * following hold:

--
Alex Bennée