* [PULL 0/5] tcg patch queue @ 2020-09-03 21:40 Richard Henderson 2020-09-03 21:40 ` [PULL 1/5] cputlb: Make store_helper less fragile to compiler optimizations Richard Henderson ` (5 more replies) 0 siblings, 6 replies; 30+ messages in thread From: Richard Henderson @ 2020-09-03 21:40 UTC (permalink / raw) To: qemu-devel; +Cc: peter.maydell The following changes since commit 3dd23a4fb8fd72d2220a90a809f213999ffe7f3a: Merge remote-tracking branch 'remotes/legoater/tags/pull-aspeed-20200901' into staging (2020-09-03 14:12:48 +0100) are available in the Git repository at: https://github.com/rth7680/qemu.git tags/pull-tcg-20200903 for you to fetch changes up to fe4b0b5bfa96c38ad1cad0689a86cca9f307e353: tcg: Implement 256-bit dup for tcg_gen_gvec_dup_mem (2020-09-03 13:13:58 -0700) ---------------------------------------------------------------- Improve inlining in cputlb.c. Fix vector abs fallback. Only set parallel_cpus for SMP. Add vector dupm for 256-bit elements. ---------------------------------------------------------------- Richard Henderson (4): cputlb: Make store_helper less fragile to compiler optimizations softmmu/cpus: Only set parallel_cpus for SMP tcg: Eliminate one store for in-place 128-bit dup_mem tcg: Implement 256-bit dup for tcg_gen_gvec_dup_mem Stephen Long (1): tcg: Fix tcg gen for vectorized absolute value accel/tcg/cputlb.c | 138 ++++++++++++++++++++++++++++++----------------------- softmmu/cpus.c | 11 ++++- tcg/tcg-op-gvec.c | 61 ++++++++++++++++++++--- 3 files changed, 143 insertions(+), 67 deletions(-) ^ permalink raw reply [flat|nested] 30+ messages in thread
* [PULL 1/5] cputlb: Make store_helper less fragile to compiler optimizations 2020-09-03 21:40 [PULL 0/5] tcg patch queue Richard Henderson @ 2020-09-03 21:40 ` Richard Henderson 2020-09-03 21:40 ` [PULL 2/5] tcg: Fix tcg gen for vectorized absolute value Richard Henderson ` (4 subsequent siblings) 5 siblings, 0 replies; 30+ messages in thread From: Richard Henderson @ 2020-09-03 21:40 UTC (permalink / raw) To: qemu-devel; +Cc: peter.maydell, Alex Bennée, Shu-Chun Weng This has no functional change. The current function structure is: inline QEMU_ALWAYSINLINE store_memop() { switch () { ... default: qemu_build_not_reached(); } } inline QEMU_ALWAYSINLINE store_helper() { ... if (span_two_pages_or_io) { ... helper_ret_stb_mmu(); } store_memop(); } helper_ret_stb_mmu() { store_helper(); } Whereas GCC will generate an error at compile-time when an always_inline function is not inlined, Clang does not. Nor does Clang prioritize the inlining of always_inline functions. Both of these are arguably bugs. Both `store_memop` and `store_helper` need to be inlined and allow constant propogations to eliminate the `qemu_build_not_reached` call. However, if the compiler instead chooses to inline helper_ret_stb_mmu into store_helper, then store_helper is now self-recursive and the compiler is no longer able to propagate the constant in the same way. This does not produce at current QEMU head, but was reproducible at v4.2.0 with `clang-10 -O2 -fexperimental-new-pass-manager`. The inline recursion problem can be fixed solely by marking helper_ret_stb_mmu as noinline, so the compiler does not make an incorrect decision about which functions to inline. In addition, extract store_helper_unaligned as a noinline subroutine that can be shared by all of the helpers. This saves about 6k code size in an optimized x86_64 build. Reported-by: Shu-Chun Weng <scw@google.com> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> --- accel/tcg/cputlb.c | 138 ++++++++++++++++++++++++++------------------- 1 file changed, 79 insertions(+), 59 deletions(-) diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c index 2d48281942..6489abbf8c 100644 --- a/accel/tcg/cputlb.c +++ b/accel/tcg/cputlb.c @@ -2009,6 +2009,80 @@ store_memop(void *haddr, uint64_t val, MemOp op) } } +static void __attribute__((noinline)) +store_helper_unaligned(CPUArchState *env, target_ulong addr, uint64_t val, + uintptr_t retaddr, size_t size, uintptr_t mmu_idx, + bool big_endian) +{ + const size_t tlb_off = offsetof(CPUTLBEntry, addr_write); + uintptr_t index, index2; + CPUTLBEntry *entry, *entry2; + target_ulong page2, tlb_addr, tlb_addr2; + TCGMemOpIdx oi; + size_t size2; + int i; + + /* + * Ensure the second page is in the TLB. Note that the first page + * is already guaranteed to be filled, and that the second page + * cannot evict the first. + */ + page2 = (addr + size) & TARGET_PAGE_MASK; + size2 = (addr + size) & ~TARGET_PAGE_MASK; + index2 = tlb_index(env, mmu_idx, page2); + entry2 = tlb_entry(env, mmu_idx, page2); + + tlb_addr2 = tlb_addr_write(entry2); + if (!tlb_hit_page(tlb_addr2, page2)) { + if (!victim_tlb_hit(env, mmu_idx, index2, tlb_off, page2)) { + tlb_fill(env_cpu(env), page2, size2, MMU_DATA_STORE, + mmu_idx, retaddr); + index2 = tlb_index(env, mmu_idx, page2); + entry2 = tlb_entry(env, mmu_idx, page2); + } + tlb_addr2 = tlb_addr_write(entry2); + } + + index = tlb_index(env, mmu_idx, addr); + entry = tlb_entry(env, mmu_idx, addr); + tlb_addr = tlb_addr_write(entry); + + /* + * Handle watchpoints. Since this may trap, all checks + * must happen before any store. + */ + if (unlikely(tlb_addr & TLB_WATCHPOINT)) { + cpu_check_watchpoint(env_cpu(env), addr, size - size2, + env_tlb(env)->d[mmu_idx].iotlb[index].attrs, + BP_MEM_WRITE, retaddr); + } + if (unlikely(tlb_addr2 & TLB_WATCHPOINT)) { + cpu_check_watchpoint(env_cpu(env), page2, size2, + env_tlb(env)->d[mmu_idx].iotlb[index2].attrs, + BP_MEM_WRITE, retaddr); + } + + /* + * XXX: not efficient, but simple. + * This loop must go in the forward direction to avoid issues + * with self-modifying code in Windows 64-bit. + */ + oi = make_memop_idx(MO_UB, mmu_idx); + if (big_endian) { + for (i = 0; i < size; ++i) { + /* Big-endian extract. */ + uint8_t val8 = val >> (((size - 1) * 8) - (i * 8)); + helper_ret_stb_mmu(env, addr + i, val8, oi, retaddr); + } + } else { + for (i = 0; i < size; ++i) { + /* Little-endian extract. */ + uint8_t val8 = val >> (i * 8); + helper_ret_stb_mmu(env, addr + i, val8, oi, retaddr); + } + } +} + static inline void QEMU_ALWAYS_INLINE store_helper(CPUArchState *env, target_ulong addr, uint64_t val, TCGMemOpIdx oi, uintptr_t retaddr, MemOp op) @@ -2097,64 +2171,9 @@ store_helper(CPUArchState *env, target_ulong addr, uint64_t val, if (size > 1 && unlikely((addr & ~TARGET_PAGE_MASK) + size - 1 >= TARGET_PAGE_SIZE)) { - int i; - uintptr_t index2; - CPUTLBEntry *entry2; - target_ulong page2, tlb_addr2; - size_t size2; - do_unaligned_access: - /* - * Ensure the second page is in the TLB. Note that the first page - * is already guaranteed to be filled, and that the second page - * cannot evict the first. - */ - page2 = (addr + size) & TARGET_PAGE_MASK; - size2 = (addr + size) & ~TARGET_PAGE_MASK; - index2 = tlb_index(env, mmu_idx, page2); - entry2 = tlb_entry(env, mmu_idx, page2); - tlb_addr2 = tlb_addr_write(entry2); - if (!tlb_hit_page(tlb_addr2, page2)) { - if (!victim_tlb_hit(env, mmu_idx, index2, tlb_off, page2)) { - tlb_fill(env_cpu(env), page2, size2, MMU_DATA_STORE, - mmu_idx, retaddr); - index2 = tlb_index(env, mmu_idx, page2); - entry2 = tlb_entry(env, mmu_idx, page2); - } - tlb_addr2 = tlb_addr_write(entry2); - } - - /* - * Handle watchpoints. Since this may trap, all checks - * must happen before any store. - */ - if (unlikely(tlb_addr & TLB_WATCHPOINT)) { - cpu_check_watchpoint(env_cpu(env), addr, size - size2, - env_tlb(env)->d[mmu_idx].iotlb[index].attrs, - BP_MEM_WRITE, retaddr); - } - if (unlikely(tlb_addr2 & TLB_WATCHPOINT)) { - cpu_check_watchpoint(env_cpu(env), page2, size2, - env_tlb(env)->d[mmu_idx].iotlb[index2].attrs, - BP_MEM_WRITE, retaddr); - } - - /* - * XXX: not efficient, but simple. - * This loop must go in the forward direction to avoid issues - * with self-modifying code in Windows 64-bit. - */ - for (i = 0; i < size; ++i) { - uint8_t val8; - if (memop_big_endian(op)) { - /* Big-endian extract. */ - val8 = val >> (((size - 1) * 8) - (i * 8)); - } else { - /* Little-endian extract. */ - val8 = val >> (i * 8); - } - helper_ret_stb_mmu(env, addr + i, val8, oi, retaddr); - } + store_helper_unaligned(env, addr, val, retaddr, size, + mmu_idx, memop_big_endian(op)); return; } @@ -2162,8 +2181,9 @@ store_helper(CPUArchState *env, target_ulong addr, uint64_t val, store_memop(haddr, val, op); } -void helper_ret_stb_mmu(CPUArchState *env, target_ulong addr, uint8_t val, - TCGMemOpIdx oi, uintptr_t retaddr) +void __attribute__((noinline)) +helper_ret_stb_mmu(CPUArchState *env, target_ulong addr, uint8_t val, + TCGMemOpIdx oi, uintptr_t retaddr) { store_helper(env, addr, val, oi, retaddr, MO_UB); } -- 2.25.1 ^ permalink raw reply related [flat|nested] 30+ messages in thread
* [PULL 2/5] tcg: Fix tcg gen for vectorized absolute value 2020-09-03 21:40 [PULL 0/5] tcg patch queue Richard Henderson 2020-09-03 21:40 ` [PULL 1/5] cputlb: Make store_helper less fragile to compiler optimizations Richard Henderson @ 2020-09-03 21:40 ` Richard Henderson 2020-09-03 21:40 ` [PULL 3/5] softmmu/cpus: Only set parallel_cpus for SMP Richard Henderson ` (3 subsequent siblings) 5 siblings, 0 replies; 30+ messages in thread From: Richard Henderson @ 2020-09-03 21:40 UTC (permalink / raw) To: qemu-devel; +Cc: peter.maydell, Stephen Long From: Stephen Long <steplong@quicinc.com> The fallback inline expansion for vectorized absolute value, when the host doesn't support such an insn was flawed. E.g. when a vector of bytes has all elements negative, mask will be 0xffff_ffff_ffff_ffff. Subtracting mask only adds 1 to the low element instead of all elements becase -mask is 1 and not 0x0101_0101_0101_0101. Signed-off-by: Stephen Long <steplong@quicinc.com> Message-Id: <20200813161818.190-1-steplong@quicinc.com> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> --- tcg/tcg-op-gvec.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/tcg/tcg-op-gvec.c b/tcg/tcg-op-gvec.c index 3707c0effb..793d4ba64c 100644 --- a/tcg/tcg-op-gvec.c +++ b/tcg/tcg-op-gvec.c @@ -2264,12 +2264,13 @@ static void gen_absv_mask(TCGv_i64 d, TCGv_i64 b, unsigned vece) tcg_gen_muli_i64(t, t, (1 << nbit) - 1); /* - * Invert (via xor -1) and add one (via sub -1). + * Invert (via xor -1) and add one. * Because of the ordering the msb is cleared, * so we never have carry into the next element. */ tcg_gen_xor_i64(d, b, t); - tcg_gen_sub_i64(d, d, t); + tcg_gen_andi_i64(t, t, dup_const(vece, 1)); + tcg_gen_add_i64(d, d, t); tcg_temp_free_i64(t); } -- 2.25.1 ^ permalink raw reply related [flat|nested] 30+ messages in thread
* [PULL 3/5] softmmu/cpus: Only set parallel_cpus for SMP 2020-09-03 21:40 [PULL 0/5] tcg patch queue Richard Henderson 2020-09-03 21:40 ` [PULL 1/5] cputlb: Make store_helper less fragile to compiler optimizations Richard Henderson 2020-09-03 21:40 ` [PULL 2/5] tcg: Fix tcg gen for vectorized absolute value Richard Henderson @ 2020-09-03 21:40 ` Richard Henderson 2020-09-07 10:05 ` Claudio Fontana 2020-09-03 21:41 ` [PULL 4/5] tcg: Eliminate one store for in-place 128-bit dup_mem Richard Henderson ` (2 subsequent siblings) 5 siblings, 1 reply; 30+ messages in thread From: Richard Henderson @ 2020-09-03 21:40 UTC (permalink / raw) To: qemu-devel; +Cc: peter.maydell, Philippe Mathieu-Daudé Do not set parallel_cpus if there is only one cpu instantiated. This will allow tcg to use serial code to implement atomics. Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> --- softmmu/cpus.c | 11 ++++++++++- 1 file changed, 10 insertions(+), 1 deletion(-) diff --git a/softmmu/cpus.c b/softmmu/cpus.c index a802e899ab..e3b98065c9 100644 --- a/softmmu/cpus.c +++ b/softmmu/cpus.c @@ -1895,6 +1895,16 @@ static void qemu_tcg_init_vcpu(CPUState *cpu) if (!tcg_region_inited) { tcg_region_inited = 1; tcg_region_init(); + /* + * If MTTCG, and we will create multiple cpus, + * then we will have cpus running in parallel. + */ + if (qemu_tcg_mttcg_enabled()) { + MachineState *ms = MACHINE(qdev_get_machine()); + if (ms->smp.max_cpus > 1) { + parallel_cpus = true; + } + } } if (qemu_tcg_mttcg_enabled() || !single_tcg_cpu_thread) { @@ -1904,7 +1914,6 @@ static void qemu_tcg_init_vcpu(CPUState *cpu) if (qemu_tcg_mttcg_enabled()) { /* create a thread per vCPU with TCG (MTTCG) */ - parallel_cpus = true; snprintf(thread_name, VCPU_THREAD_NAME_SIZE, "CPU %d/TCG", cpu->cpu_index); -- 2.25.1 ^ permalink raw reply related [flat|nested] 30+ messages in thread
* Re: [PULL 3/5] softmmu/cpus: Only set parallel_cpus for SMP 2020-09-03 21:40 ` [PULL 3/5] softmmu/cpus: Only set parallel_cpus for SMP Richard Henderson @ 2020-09-07 10:05 ` Claudio Fontana 2020-09-07 10:20 ` Philippe Mathieu-Daudé 0 siblings, 1 reply; 30+ messages in thread From: Claudio Fontana @ 2020-09-07 10:05 UTC (permalink / raw) To: Richard Henderson, qemu-devel Cc: peter.maydell, Philippe Mathieu-Daudé, Paolo Bonzini Hi Richard, currently rebasing on top of this one, just a question, why is the patch not directly using "current_machine"? Is using MACHINE(qdev_get_machine()) preferrable here? Thanks, Claudio On 9/3/20 11:40 PM, Richard Henderson wrote: > Do not set parallel_cpus if there is only one cpu instantiated. > This will allow tcg to use serial code to implement atomics. > > Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> > Signed-off-by: Richard Henderson <richard.henderson@linaro.org> > --- > softmmu/cpus.c | 11 ++++++++++- > 1 file changed, 10 insertions(+), 1 deletion(-) > > diff --git a/softmmu/cpus.c b/softmmu/cpus.c > index a802e899ab..e3b98065c9 100644 > --- a/softmmu/cpus.c > +++ b/softmmu/cpus.c > @@ -1895,6 +1895,16 @@ static void qemu_tcg_init_vcpu(CPUState *cpu) > if (!tcg_region_inited) { > tcg_region_inited = 1; > tcg_region_init(); > + /* > + * If MTTCG, and we will create multiple cpus, > + * then we will have cpus running in parallel. > + */ > + if (qemu_tcg_mttcg_enabled()) { > + MachineState *ms = MACHINE(qdev_get_machine()); MachineState *ms = current_machine; ? > + if (ms->smp.max_cpus > 1) { > + parallel_cpus = true; > + } > + } > } > > if (qemu_tcg_mttcg_enabled() || !single_tcg_cpu_thread) { > @@ -1904,7 +1914,6 @@ static void qemu_tcg_init_vcpu(CPUState *cpu) > > if (qemu_tcg_mttcg_enabled()) { > /* create a thread per vCPU with TCG (MTTCG) */ > - parallel_cpus = true; > snprintf(thread_name, VCPU_THREAD_NAME_SIZE, "CPU %d/TCG", > cpu->cpu_index); > > ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PULL 3/5] softmmu/cpus: Only set parallel_cpus for SMP 2020-09-07 10:05 ` Claudio Fontana @ 2020-09-07 10:20 ` Philippe Mathieu-Daudé 2020-09-07 16:30 ` Claudio Fontana 0 siblings, 1 reply; 30+ messages in thread From: Philippe Mathieu-Daudé @ 2020-09-07 10:20 UTC (permalink / raw) To: Claudio Fontana, Richard Henderson, qemu-devel Cc: peter.maydell, Paolo Bonzini On 9/7/20 12:05 PM, Claudio Fontana wrote: > Hi Richard, > > currently rebasing on top of this one, > just a question, why is the patch not directly using "current_machine"? For user mode? > > Is using MACHINE(qdev_get_machine()) preferrable here? > > Thanks, > > Claudio > > On 9/3/20 11:40 PM, Richard Henderson wrote: >> Do not set parallel_cpus if there is only one cpu instantiated. >> This will allow tcg to use serial code to implement atomics. >> >> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> >> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> >> --- >> softmmu/cpus.c | 11 ++++++++++- >> 1 file changed, 10 insertions(+), 1 deletion(-) >> >> diff --git a/softmmu/cpus.c b/softmmu/cpus.c >> index a802e899ab..e3b98065c9 100644 >> --- a/softmmu/cpus.c >> +++ b/softmmu/cpus.c >> @@ -1895,6 +1895,16 @@ static void qemu_tcg_init_vcpu(CPUState *cpu) >> if (!tcg_region_inited) { >> tcg_region_inited = 1; >> tcg_region_init(); >> + /* >> + * If MTTCG, and we will create multiple cpus, >> + * then we will have cpus running in parallel. >> + */ >> + if (qemu_tcg_mttcg_enabled()) { >> + MachineState *ms = MACHINE(qdev_get_machine()); > > MachineState *ms = current_machine; > ? > > >> + if (ms->smp.max_cpus > 1) { >> + parallel_cpus = true; >> + } >> + } >> } >> >> if (qemu_tcg_mttcg_enabled() || !single_tcg_cpu_thread) { >> @@ -1904,7 +1914,6 @@ static void qemu_tcg_init_vcpu(CPUState *cpu) >> >> if (qemu_tcg_mttcg_enabled()) { >> /* create a thread per vCPU with TCG (MTTCG) */ >> - parallel_cpus = true; >> snprintf(thread_name, VCPU_THREAD_NAME_SIZE, "CPU %d/TCG", >> cpu->cpu_index); >> >> > > ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PULL 3/5] softmmu/cpus: Only set parallel_cpus for SMP 2020-09-07 10:20 ` Philippe Mathieu-Daudé @ 2020-09-07 16:30 ` Claudio Fontana 2020-09-07 16:49 ` Philippe Mathieu-Daudé 0 siblings, 1 reply; 30+ messages in thread From: Claudio Fontana @ 2020-09-07 16:30 UTC (permalink / raw) To: Philippe Mathieu-Daudé Cc: peter.maydell, Richard Henderson, qemu-devel, Paolo Bonzini On 9/7/20 12:20 PM, Philippe Mathieu-Daudé wrote: > On 9/7/20 12:05 PM, Claudio Fontana wrote: >> Hi Richard, >> >> currently rebasing on top of this one, >> just a question, why is the patch not directly using "current_machine"? > > For user mode? In user mode I'd not expect softmmu/cpus.c to be built at all... Ciao, Claudio > >> >> Is using MACHINE(qdev_get_machine()) preferrable here? >> >> Thanks, >> >> Claudio >> >> On 9/3/20 11:40 PM, Richard Henderson wrote: >>> Do not set parallel_cpus if there is only one cpu instantiated. >>> This will allow tcg to use serial code to implement atomics. >>> >>> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> >>> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> >>> --- >>> softmmu/cpus.c | 11 ++++++++++- >>> 1 file changed, 10 insertions(+), 1 deletion(-) >>> >>> diff --git a/softmmu/cpus.c b/softmmu/cpus.c >>> index a802e899ab..e3b98065c9 100644 >>> --- a/softmmu/cpus.c >>> +++ b/softmmu/cpus.c >>> @@ -1895,6 +1895,16 @@ static void qemu_tcg_init_vcpu(CPUState *cpu) >>> if (!tcg_region_inited) { >>> tcg_region_inited = 1; >>> tcg_region_init(); >>> + /* >>> + * If MTTCG, and we will create multiple cpus, >>> + * then we will have cpus running in parallel. >>> + */ >>> + if (qemu_tcg_mttcg_enabled()) { >>> + MachineState *ms = MACHINE(qdev_get_machine()); >> >> MachineState *ms = current_machine; >> ? >> >> >>> + if (ms->smp.max_cpus > 1) { >>> + parallel_cpus = true; >>> + } >>> + } >>> } >>> >>> if (qemu_tcg_mttcg_enabled() || !single_tcg_cpu_thread) { >>> @@ -1904,7 +1914,6 @@ static void qemu_tcg_init_vcpu(CPUState *cpu) >>> >>> if (qemu_tcg_mttcg_enabled()) { >>> /* create a thread per vCPU with TCG (MTTCG) */ >>> - parallel_cpus = true; >>> snprintf(thread_name, VCPU_THREAD_NAME_SIZE, "CPU %d/TCG", >>> cpu->cpu_index); >>> >>> >> >> ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PULL 3/5] softmmu/cpus: Only set parallel_cpus for SMP 2020-09-07 16:30 ` Claudio Fontana @ 2020-09-07 16:49 ` Philippe Mathieu-Daudé 2020-09-08 7:09 ` Claudio Fontana 0 siblings, 1 reply; 30+ messages in thread From: Philippe Mathieu-Daudé @ 2020-09-07 16:49 UTC (permalink / raw) To: Claudio Fontana Cc: peter.maydell, Richard Henderson, qemu-devel, Paolo Bonzini On 9/7/20 6:30 PM, Claudio Fontana wrote: > On 9/7/20 12:20 PM, Philippe Mathieu-Daudé wrote: >> On 9/7/20 12:05 PM, Claudio Fontana wrote: >>> Hi Richard, >>> >>> currently rebasing on top of this one, >>> just a question, why is the patch not directly using "current_machine"? >> >> For user mode? > > In user mode I'd not expect softmmu/cpus.c to be built at all... Which is why :) current_machine is NULL in user-mode. > > Ciao, > > Claudio > >> >>> >>> Is using MACHINE(qdev_get_machine()) preferrable here? >>> >>> Thanks, >>> >>> Claudio >>> >>> On 9/3/20 11:40 PM, Richard Henderson wrote: >>>> Do not set parallel_cpus if there is only one cpu instantiated. >>>> This will allow tcg to use serial code to implement atomics. >>>> >>>> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> >>>> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> >>>> --- >>>> softmmu/cpus.c | 11 ++++++++++- >>>> 1 file changed, 10 insertions(+), 1 deletion(-) >>>> >>>> diff --git a/softmmu/cpus.c b/softmmu/cpus.c >>>> index a802e899ab..e3b98065c9 100644 >>>> --- a/softmmu/cpus.c >>>> +++ b/softmmu/cpus.c >>>> @@ -1895,6 +1895,16 @@ static void qemu_tcg_init_vcpu(CPUState *cpu) >>>> if (!tcg_region_inited) { >>>> tcg_region_inited = 1; >>>> tcg_region_init(); >>>> + /* >>>> + * If MTTCG, and we will create multiple cpus, >>>> + * then we will have cpus running in parallel. >>>> + */ >>>> + if (qemu_tcg_mttcg_enabled()) { >>>> + MachineState *ms = MACHINE(qdev_get_machine()); >>> >>> MachineState *ms = current_machine; >>> ? >>> >>> >>>> + if (ms->smp.max_cpus > 1) { >>>> + parallel_cpus = true; >>>> + } >>>> + } >>>> } >>>> >>>> if (qemu_tcg_mttcg_enabled() || !single_tcg_cpu_thread) { >>>> @@ -1904,7 +1914,6 @@ static void qemu_tcg_init_vcpu(CPUState *cpu) >>>> >>>> if (qemu_tcg_mttcg_enabled()) { >>>> /* create a thread per vCPU with TCG (MTTCG) */ >>>> - parallel_cpus = true; >>>> snprintf(thread_name, VCPU_THREAD_NAME_SIZE, "CPU %d/TCG", >>>> cpu->cpu_index); >>>> >>>> >>> >>> > > ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PULL 3/5] softmmu/cpus: Only set parallel_cpus for SMP 2020-09-07 16:49 ` Philippe Mathieu-Daudé @ 2020-09-08 7:09 ` Claudio Fontana 2020-09-08 7:56 ` Philippe Mathieu-Daudé 0 siblings, 1 reply; 30+ messages in thread From: Claudio Fontana @ 2020-09-08 7:09 UTC (permalink / raw) To: Philippe Mathieu-Daudé Cc: peter.maydell, Richard Henderson, qemu-devel, Paolo Bonzini On 9/7/20 6:49 PM, Philippe Mathieu-Daudé wrote: > On 9/7/20 6:30 PM, Claudio Fontana wrote: >> On 9/7/20 12:20 PM, Philippe Mathieu-Daudé wrote: >>> On 9/7/20 12:05 PM, Claudio Fontana wrote: >>>> Hi Richard, >>>> >>>> currently rebasing on top of this one, >>>> just a question, why is the patch not directly using "current_machine"? >>> >>> For user mode? >> >> In user mode I'd not expect softmmu/cpus.c to be built at all... > > Which is why :) current_machine is NULL in user-mode. Ciao Philippe, then why does the patch change softmmu/cpus.c in a way that accounts for user mode? The function that the patch changes is never called in user mode. The patch could instead use current_machine without any concern of it being NULL, it will always be set in vl.c . Ciao, Claudio > >> >> Ciao, >> >> Claudio >> >>> >>>> >>>> Is using MACHINE(qdev_get_machine()) preferrable here? >>>> >>>> Thanks, >>>> >>>> Claudio >>>> >>>> On 9/3/20 11:40 PM, Richard Henderson wrote: >>>>> Do not set parallel_cpus if there is only one cpu instantiated. >>>>> This will allow tcg to use serial code to implement atomics. >>>>> >>>>> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> >>>>> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> >>>>> --- >>>>> softmmu/cpus.c | 11 ++++++++++- >>>>> 1 file changed, 10 insertions(+), 1 deletion(-) >>>>> >>>>> diff --git a/softmmu/cpus.c b/softmmu/cpus.c >>>>> index a802e899ab..e3b98065c9 100644 >>>>> --- a/softmmu/cpus.c >>>>> +++ b/softmmu/cpus.c >>>>> @@ -1895,6 +1895,16 @@ static void qemu_tcg_init_vcpu(CPUState *cpu) >>>>> if (!tcg_region_inited) { >>>>> tcg_region_inited = 1; >>>>> tcg_region_init(); >>>>> + /* >>>>> + * If MTTCG, and we will create multiple cpus, >>>>> + * then we will have cpus running in parallel. >>>>> + */ >>>>> + if (qemu_tcg_mttcg_enabled()) { >>>>> + MachineState *ms = MACHINE(qdev_get_machine()); >>>> >>>> MachineState *ms = current_machine; >>>> ? >>>> >>>> >>>>> + if (ms->smp.max_cpus > 1) { >>>>> + parallel_cpus = true; >>>>> + } >>>>> + } >>>>> } >>>>> >>>>> if (qemu_tcg_mttcg_enabled() || !single_tcg_cpu_thread) { >>>>> @@ -1904,7 +1914,6 @@ static void qemu_tcg_init_vcpu(CPUState *cpu) >>>>> >>>>> if (qemu_tcg_mttcg_enabled()) { >>>>> /* create a thread per vCPU with TCG (MTTCG) */ >>>>> - parallel_cpus = true; >>>>> snprintf(thread_name, VCPU_THREAD_NAME_SIZE, "CPU %d/TCG", >>>>> cpu->cpu_index); >>>>> >>>>> >>>> >>>> >> >> ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PULL 3/5] softmmu/cpus: Only set parallel_cpus for SMP 2020-09-08 7:09 ` Claudio Fontana @ 2020-09-08 7:56 ` Philippe Mathieu-Daudé 2020-09-10 8:28 ` Claudio Fontana 0 siblings, 1 reply; 30+ messages in thread From: Philippe Mathieu-Daudé @ 2020-09-08 7:56 UTC (permalink / raw) To: Claudio Fontana Cc: peter.maydell, Richard Henderson, qemu-devel, Paolo Bonzini +Laurent On 9/8/20 9:09 AM, Claudio Fontana wrote: > On 9/7/20 6:49 PM, Philippe Mathieu-Daudé wrote: >> On 9/7/20 6:30 PM, Claudio Fontana wrote: >>> On 9/7/20 12:20 PM, Philippe Mathieu-Daudé wrote: >>>> On 9/7/20 12:05 PM, Claudio Fontana wrote: >>>>> Hi Richard, >>>>> >>>>> currently rebasing on top of this one, >>>>> just a question, why is the patch not directly using "current_machine"? >>>> >>>> For user mode? >>> >>> In user mode I'd not expect softmmu/cpus.c to be built at all... >> >> Which is why :) current_machine is NULL in user-mode. > > Ciao Philippe, > > then why does the patch change softmmu/cpus.c in a way that accounts for user mode? > > The function that the patch changes is never called in user mode. > > The patch could instead use current_machine without any concern of it being NULL, it will always be set in vl.c . Better send a patch to prove your point :) I have bad remember about the last time I tried to understand/modify that part, because IIRC the user-mode creates some fake machine to satisfy various QEMU generic code assumptions. Sincerely I now prefer stay away from this; too many unmerged patches there. > > Ciao, > > Claudio > >> >>> >>> Ciao, >>> >>> Claudio >>> >>>> >>>>> >>>>> Is using MACHINE(qdev_get_machine()) preferrable here? >>>>> >>>>> Thanks, >>>>> >>>>> Claudio >>>>> >>>>> On 9/3/20 11:40 PM, Richard Henderson wrote: >>>>>> Do not set parallel_cpus if there is only one cpu instantiated. >>>>>> This will allow tcg to use serial code to implement atomics. >>>>>> >>>>>> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> >>>>>> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> >>>>>> --- >>>>>> softmmu/cpus.c | 11 ++++++++++- >>>>>> 1 file changed, 10 insertions(+), 1 deletion(-) >>>>>> >>>>>> diff --git a/softmmu/cpus.c b/softmmu/cpus.c >>>>>> index a802e899ab..e3b98065c9 100644 >>>>>> --- a/softmmu/cpus.c >>>>>> +++ b/softmmu/cpus.c >>>>>> @@ -1895,6 +1895,16 @@ static void qemu_tcg_init_vcpu(CPUState *cpu) >>>>>> if (!tcg_region_inited) { >>>>>> tcg_region_inited = 1; >>>>>> tcg_region_init(); >>>>>> + /* >>>>>> + * If MTTCG, and we will create multiple cpus, >>>>>> + * then we will have cpus running in parallel. >>>>>> + */ >>>>>> + if (qemu_tcg_mttcg_enabled()) { >>>>>> + MachineState *ms = MACHINE(qdev_get_machine()); >>>>> >>>>> MachineState *ms = current_machine; >>>>> ? >>>>> >>>>> >>>>>> + if (ms->smp.max_cpus > 1) { >>>>>> + parallel_cpus = true; >>>>>> + } >>>>>> + } >>>>>> } >>>>>> >>>>>> if (qemu_tcg_mttcg_enabled() || !single_tcg_cpu_thread) { >>>>>> @@ -1904,7 +1914,6 @@ static void qemu_tcg_init_vcpu(CPUState *cpu) >>>>>> >>>>>> if (qemu_tcg_mttcg_enabled()) { >>>>>> /* create a thread per vCPU with TCG (MTTCG) */ >>>>>> - parallel_cpus = true; >>>>>> snprintf(thread_name, VCPU_THREAD_NAME_SIZE, "CPU %d/TCG", >>>>>> cpu->cpu_index); >>>>>> >>>>>> >>>>> >>>>> >>> >>> > > ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PULL 3/5] softmmu/cpus: Only set parallel_cpus for SMP 2020-09-08 7:56 ` Philippe Mathieu-Daudé @ 2020-09-10 8:28 ` Claudio Fontana 2020-09-10 10:32 ` Philippe Mathieu-Daudé 0 siblings, 1 reply; 30+ messages in thread From: Claudio Fontana @ 2020-09-10 8:28 UTC (permalink / raw) To: Philippe Mathieu-Daudé Cc: Laurent Vivier, peter.maydell, Richard Henderson, qemu-devel, Paolo Bonzini Hi Philippe, On 9/8/20 9:56 AM, Philippe Mathieu-Daudé wrote: > +Laurent > > On 9/8/20 9:09 AM, Claudio Fontana wrote: >> On 9/7/20 6:49 PM, Philippe Mathieu-Daudé wrote: >>> On 9/7/20 6:30 PM, Claudio Fontana wrote: >>>> On 9/7/20 12:20 PM, Philippe Mathieu-Daudé wrote: >>>>> On 9/7/20 12:05 PM, Claudio Fontana wrote: >>>>>> Hi Richard, >>>>>> >>>>>> currently rebasing on top of this one, >>>>>> just a question, why is the patch not directly using "current_machine"? >>>>> >>>>> For user mode? >>>> >>>> In user mode I'd not expect softmmu/cpus.c to be built at all... >>> >>> Which is why :) current_machine is NULL in user-mode. >> >> Ciao Philippe, >> >> then why does the patch change softmmu/cpus.c in a way that accounts for user mode? >> >> The function that the patch changes is never called in user mode. >> >> The patch could instead use current_machine without any concern of it being NULL, it will always be set in vl.c . > > Better send a patch to prove your point :) Yes, I am already testing without problems the cpus-refactoring series with this applied: commit 53f6db772f1522025650441102b16be17773bdc9 Author: Claudio Fontana <cfontana@suse.de> Date: Tue Sep 8 10:59:07 2020 +0200 accel/tcg: use current_machine as it is always set for softmmu current_machine is always set before accelerators are initialized, so use that instead of MACHINE(qdev_get_machine()). Signed-off-by: Claudio Fontana <cfontana@suse.de> diff --git a/accel/tcg/tcg-cpus.c b/accel/tcg/tcg-cpus.c index ec7158b55e..05af1168a2 100644 --- a/accel/tcg/tcg-cpus.c +++ b/accel/tcg/tcg-cpus.c @@ -484,7 +484,7 @@ static void tcg_start_vcpu_thread(CPUState *cpu) * then we will have cpus running in parallel. */ if (qemu_tcg_mttcg_enabled()) { - MachineState *ms = MACHINE(qdev_get_machine()); + MachineState *ms = current_machine; if (ms->smp.max_cpus > 1) { parallel_cpus = true; } > > I have bad remember about the last time I tried to understand/modify > that part, because IIRC the user-mode creates some fake machine to > satisfy various QEMU generic code assumptions. Sincerely I now prefer > stay away from this; too many unmerged patches there. linux-user/main.c and bsd-user/main.c seem to use cpu_create() to create the cpus, then they have their own cpu_loop(), they do not use any of the cpus.c code. By contrast, softmmu/vl.c initializes current_machine in qemu_init(), even before accelerator is chosen and configured. So there is no chance currently that current_machine is not set correctly when the accelerator vcpu startup code in is called. Ciao, CLaudio > >> >> Ciao, >> >> Claudio >> >>> >>>> >>>> Ciao, >>>> >>>> Claudio >>>> >>>>> >>>>>> >>>>>> Is using MACHINE(qdev_get_machine()) preferrable here? >>>>>> >>>>>> Thanks, >>>>>> >>>>>> Claudio >>>>>> >>>>>> On 9/3/20 11:40 PM, Richard Henderson wrote: >>>>>>> Do not set parallel_cpus if there is only one cpu instantiated. >>>>>>> This will allow tcg to use serial code to implement atomics. >>>>>>> >>>>>>> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> >>>>>>> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> >>>>>>> --- >>>>>>> softmmu/cpus.c | 11 ++++++++++- >>>>>>> 1 file changed, 10 insertions(+), 1 deletion(-) >>>>>>> >>>>>>> diff --git a/softmmu/cpus.c b/softmmu/cpus.c >>>>>>> index a802e899ab..e3b98065c9 100644 >>>>>>> --- a/softmmu/cpus.c >>>>>>> +++ b/softmmu/cpus.c >>>>>>> @@ -1895,6 +1895,16 @@ static void qemu_tcg_init_vcpu(CPUState *cpu) >>>>>>> if (!tcg_region_inited) { >>>>>>> tcg_region_inited = 1; >>>>>>> tcg_region_init(); >>>>>>> + /* >>>>>>> + * If MTTCG, and we will create multiple cpus, >>>>>>> + * then we will have cpus running in parallel. >>>>>>> + */ >>>>>>> + if (qemu_tcg_mttcg_enabled()) { >>>>>>> + MachineState *ms = MACHINE(qdev_get_machine()); >>>>>> >>>>>> MachineState *ms = current_machine; >>>>>> ? >>>>>> >>>>>> >>>>>>> + if (ms->smp.max_cpus > 1) { >>>>>>> + parallel_cpus = true; >>>>>>> + } >>>>>>> + } >>>>>>> } >>>>>>> >>>>>>> if (qemu_tcg_mttcg_enabled() || !single_tcg_cpu_thread) { >>>>>>> @@ -1904,7 +1914,6 @@ static void qemu_tcg_init_vcpu(CPUState *cpu) >>>>>>> >>>>>>> if (qemu_tcg_mttcg_enabled()) { >>>>>>> /* create a thread per vCPU with TCG (MTTCG) */ >>>>>>> - parallel_cpus = true; >>>>>>> snprintf(thread_name, VCPU_THREAD_NAME_SIZE, "CPU %d/TCG", >>>>>>> cpu->cpu_index); >>>>>>> >>>>>>> >>>>>> >>>>>> >>>> >>>> >> >> ^ permalink raw reply related [flat|nested] 30+ messages in thread
* Re: [PULL 3/5] softmmu/cpus: Only set parallel_cpus for SMP 2020-09-10 8:28 ` Claudio Fontana @ 2020-09-10 10:32 ` Philippe Mathieu-Daudé 0 siblings, 0 replies; 30+ messages in thread From: Philippe Mathieu-Daudé @ 2020-09-10 10:32 UTC (permalink / raw) To: Claudio Fontana Cc: Laurent Vivier, peter.maydell, Richard Henderson, qemu-devel, Paolo Bonzini On 9/10/20 10:28 AM, Claudio Fontana wrote: > Hi Philippe, > > On 9/8/20 9:56 AM, Philippe Mathieu-Daudé wrote: >> +Laurent >> >> On 9/8/20 9:09 AM, Claudio Fontana wrote: >>> On 9/7/20 6:49 PM, Philippe Mathieu-Daudé wrote: >>>> On 9/7/20 6:30 PM, Claudio Fontana wrote: >>>>> On 9/7/20 12:20 PM, Philippe Mathieu-Daudé wrote: >>>>>> On 9/7/20 12:05 PM, Claudio Fontana wrote: >>>>>>> Hi Richard, >>>>>>> >>>>>>> currently rebasing on top of this one, >>>>>>> just a question, why is the patch not directly using "current_machine"? >>>>>> >>>>>> For user mode? >>>>> >>>>> In user mode I'd not expect softmmu/cpus.c to be built at all... >>>> >>>> Which is why :) current_machine is NULL in user-mode. >>> >>> Ciao Philippe, >>> >>> then why does the patch change softmmu/cpus.c in a way that accounts for user mode? >>> >>> The function that the patch changes is never called in user mode. >>> >>> The patch could instead use current_machine without any concern of it being NULL, it will always be set in vl.c . >> >> Better send a patch to prove your point :) > > Yes, I am already testing without problems the cpus-refactoring series with this applied: > > commit 53f6db772f1522025650441102b16be17773bdc9 > Author: Claudio Fontana <cfontana@suse.de> > Date: Tue Sep 8 10:59:07 2020 +0200 > > accel/tcg: use current_machine as it is always set for softmmu > > current_machine is always set before accelerators are initialized, > so use that instead of MACHINE(qdev_get_machine()). > > Signed-off-by: Claudio Fontana <cfontana@suse.de> > > diff --git a/accel/tcg/tcg-cpus.c b/accel/tcg/tcg-cpus.c > index ec7158b55e..05af1168a2 100644 > --- a/accel/tcg/tcg-cpus.c > +++ b/accel/tcg/tcg-cpus.c > @@ -484,7 +484,7 @@ static void tcg_start_vcpu_thread(CPUState *cpu) > * then we will have cpus running in parallel. > */ > if (qemu_tcg_mttcg_enabled()) { > - MachineState *ms = MACHINE(qdev_get_machine()); > + MachineState *ms = current_machine; > if (ms->smp.max_cpus > 1) { > parallel_cpus = true; > } > > > >> >> I have bad remember about the last time I tried to understand/modify >> that part, because IIRC the user-mode creates some fake machine to >> satisfy various QEMU generic code assumptions. Sincerely I now prefer >> stay away from this; too many unmerged patches there. > > > linux-user/main.c and bsd-user/main.c seem to use cpu_create() to create the cpus, > then they have their own cpu_loop(), they do not use any of the cpus.c code. > > By contrast, softmmu/vl.c initializes current_machine in qemu_init(), > even before accelerator is chosen and configured. > > So there is no chance currently that current_machine is not set correctly when > the accelerator vcpu startup code in is called. Thanks for checking and correcting me! We are good then :) > > Ciao, > > CLaudio ^ permalink raw reply [flat|nested] 30+ messages in thread
* [PULL 4/5] tcg: Eliminate one store for in-place 128-bit dup_mem 2020-09-03 21:40 [PULL 0/5] tcg patch queue Richard Henderson ` (2 preceding siblings ...) 2020-09-03 21:40 ` [PULL 3/5] softmmu/cpus: Only set parallel_cpus for SMP Richard Henderson @ 2020-09-03 21:41 ` Richard Henderson 2020-09-03 21:41 ` [PULL 5/5] tcg: Implement 256-bit dup for tcg_gen_gvec_dup_mem Richard Henderson 2020-09-06 13:07 ` [PULL 0/5] tcg patch queue Peter Maydell 5 siblings, 0 replies; 30+ messages in thread From: Richard Henderson @ 2020-09-03 21:41 UTC (permalink / raw) To: qemu-devel; +Cc: peter.maydell, Philippe Mathieu-Daudé Do not store back to the exact memory from which we just loaded. Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> --- tcg/tcg-op-gvec.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/tcg/tcg-op-gvec.c b/tcg/tcg-op-gvec.c index 793d4ba64c..fcc25b04e6 100644 --- a/tcg/tcg-op-gvec.c +++ b/tcg/tcg-op-gvec.c @@ -1581,7 +1581,7 @@ void tcg_gen_gvec_dup_mem(unsigned vece, uint32_t dofs, uint32_t aofs, TCGv_vec in = tcg_temp_new_vec(TCG_TYPE_V128); tcg_gen_ld_vec(in, cpu_env, aofs); - for (i = 0; i < oprsz; i += 16) { + for (i = (aofs == dofs) * 16; i < oprsz; i += 16) { tcg_gen_st_vec(in, cpu_env, dofs + i); } tcg_temp_free_vec(in); @@ -1591,7 +1591,7 @@ void tcg_gen_gvec_dup_mem(unsigned vece, uint32_t dofs, uint32_t aofs, tcg_gen_ld_i64(in0, cpu_env, aofs); tcg_gen_ld_i64(in1, cpu_env, aofs + 8); - for (i = 0; i < oprsz; i += 16) { + for (i = (aofs == dofs) * 16; i < oprsz; i += 16) { tcg_gen_st_i64(in0, cpu_env, dofs + i); tcg_gen_st_i64(in1, cpu_env, dofs + i + 8); } -- 2.25.1 ^ permalink raw reply related [flat|nested] 30+ messages in thread
* [PULL 5/5] tcg: Implement 256-bit dup for tcg_gen_gvec_dup_mem 2020-09-03 21:40 [PULL 0/5] tcg patch queue Richard Henderson ` (3 preceding siblings ...) 2020-09-03 21:41 ` [PULL 4/5] tcg: Eliminate one store for in-place 128-bit dup_mem Richard Henderson @ 2020-09-03 21:41 ` Richard Henderson 2020-09-06 13:07 ` [PULL 0/5] tcg patch queue Peter Maydell 5 siblings, 0 replies; 30+ messages in thread From: Richard Henderson @ 2020-09-03 21:41 UTC (permalink / raw) To: qemu-devel; +Cc: peter.maydell, Philippe Mathieu-Daudé We already support duplication of 128-bit blocks. This extends that support to 256-bit blocks. This will be needed by SVE2. Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> --- tcg/tcg-op-gvec.c | 52 ++++++++++++++++++++++++++++++++++++++++++++--- 1 file changed, 49 insertions(+), 3 deletions(-) diff --git a/tcg/tcg-op-gvec.c b/tcg/tcg-op-gvec.c index fcc25b04e6..7ebd9e8298 100644 --- a/tcg/tcg-op-gvec.c +++ b/tcg/tcg-op-gvec.c @@ -1570,12 +1570,10 @@ void tcg_gen_gvec_dup_mem(unsigned vece, uint32_t dofs, uint32_t aofs, do_dup(vece, dofs, oprsz, maxsz, NULL, in, 0); tcg_temp_free_i64(in); } - } else { + } else if (vece == 4) { /* 128-bit duplicate. */ - /* ??? Dup to 256-bit vector. */ int i; - tcg_debug_assert(vece == 4); tcg_debug_assert(oprsz >= 16); if (TCG_TARGET_HAS_v128) { TCGv_vec in = tcg_temp_new_vec(TCG_TYPE_V128); @@ -1601,6 +1599,54 @@ void tcg_gen_gvec_dup_mem(unsigned vece, uint32_t dofs, uint32_t aofs, if (oprsz < maxsz) { expand_clr(dofs + oprsz, maxsz - oprsz); } + } else if (vece == 5) { + /* 256-bit duplicate. */ + int i; + + tcg_debug_assert(oprsz >= 32); + tcg_debug_assert(oprsz % 32 == 0); + if (TCG_TARGET_HAS_v256) { + TCGv_vec in = tcg_temp_new_vec(TCG_TYPE_V256); + + tcg_gen_ld_vec(in, cpu_env, aofs); + for (i = (aofs == dofs) * 32; i < oprsz; i += 32) { + tcg_gen_st_vec(in, cpu_env, dofs + i); + } + tcg_temp_free_vec(in); + } else if (TCG_TARGET_HAS_v128) { + TCGv_vec in0 = tcg_temp_new_vec(TCG_TYPE_V128); + TCGv_vec in1 = tcg_temp_new_vec(TCG_TYPE_V128); + + tcg_gen_ld_vec(in0, cpu_env, aofs); + tcg_gen_ld_vec(in1, cpu_env, aofs + 16); + for (i = (aofs == dofs) * 32; i < oprsz; i += 32) { + tcg_gen_st_vec(in0, cpu_env, dofs + i); + tcg_gen_st_vec(in1, cpu_env, dofs + i + 16); + } + tcg_temp_free_vec(in0); + tcg_temp_free_vec(in1); + } else { + TCGv_i64 in[4]; + int j; + + for (j = 0; j < 4; ++j) { + in[j] = tcg_temp_new_i64(); + tcg_gen_ld_i64(in[j], cpu_env, aofs + j * 8); + } + for (i = (aofs == dofs) * 32; i < oprsz; i += 32) { + for (j = 0; j < 4; ++j) { + tcg_gen_st_i64(in[j], cpu_env, dofs + i + j * 8); + } + } + for (j = 0; j < 4; ++j) { + tcg_temp_free_i64(in[j]); + } + } + if (oprsz < maxsz) { + expand_clr(dofs + oprsz, maxsz - oprsz); + } + } else { + g_assert_not_reached(); } } -- 2.25.1 ^ permalink raw reply related [flat|nested] 30+ messages in thread
* Re: [PULL 0/5] tcg patch queue 2020-09-03 21:40 [PULL 0/5] tcg patch queue Richard Henderson ` (4 preceding siblings ...) 2020-09-03 21:41 ` [PULL 5/5] tcg: Implement 256-bit dup for tcg_gen_gvec_dup_mem Richard Henderson @ 2020-09-06 13:07 ` Peter Maydell 5 siblings, 0 replies; 30+ messages in thread From: Peter Maydell @ 2020-09-06 13:07 UTC (permalink / raw) To: Richard Henderson; +Cc: QEMU Developers On Thu, 3 Sep 2020 at 22:41, Richard Henderson <richard.henderson@linaro.org> wrote: > > The following changes since commit 3dd23a4fb8fd72d2220a90a809f213999ffe7f3a: > > Merge remote-tracking branch 'remotes/legoater/tags/pull-aspeed-20200901' into staging (2020-09-03 14:12:48 +0100) > > are available in the Git repository at: > > https://github.com/rth7680/qemu.git tags/pull-tcg-20200903 > > for you to fetch changes up to fe4b0b5bfa96c38ad1cad0689a86cca9f307e353: > > tcg: Implement 256-bit dup for tcg_gen_gvec_dup_mem (2020-09-03 13:13:58 -0700) > > ---------------------------------------------------------------- > Improve inlining in cputlb.c. > Fix vector abs fallback. > Only set parallel_cpus for SMP. > Add vector dupm for 256-bit elements. > > ---------------------------------------------------------------- Applied, thanks. Please update the changelog at https://wiki.qemu.org/ChangeLog/5.2 for any user-visible changes. -- PMM ^ permalink raw reply [flat|nested] 30+ messages in thread
* [PULL 0/5] tcg patch queue @ 2023-01-16 22:36 Richard Henderson 2023-01-17 15:47 ` Peter Maydell 2023-01-20 9:41 ` Thomas Huth 0 siblings, 2 replies; 30+ messages in thread From: Richard Henderson @ 2023-01-16 22:36 UTC (permalink / raw) To: qemu-devel; +Cc: peter.maydell The following changes since commit fb7e7990342e59cf67dbd895c1a1e3fb1741df7a: tests/qtest/qom-test: Do not print tested properties by default (2023-01-16 15:00:57 +0000) are available in the Git repository at: https://gitlab.com/rth7680/qemu.git tags/pull-tcg-20230116 for you to fetch changes up to 61710a7e23a63546da0071ea32adb96476fa5d07: accel/tcg: Split out cpu_exec_{setjmp,loop} (2023-01-16 10:14:12 -1000) ---------------------------------------------------------------- - Reorg cpu_tb_exec around setjmp. - Use __attribute__((target)) for buffer_is_zero. - Add perfmap and jitdump for perf support. ---------------------------------------------------------------- Ilya Leoshkevich (3): linux-user: Clean up when exiting due to a signal accel/tcg: Add debuginfo support tcg: add perfmap and jitdump Richard Henderson (2): util/bufferiszero: Use __attribute__((target)) for avx2/avx512 accel/tcg: Split out cpu_exec_{setjmp,loop} docs/devel/tcg.rst | 23 +++ meson.build | 16 +- accel/tcg/debuginfo.h | 77 ++++++++++ accel/tcg/perf.h | 49 ++++++ accel/tcg/cpu-exec.c | 111 +++++++------- accel/tcg/debuginfo.c | 96 ++++++++++++ accel/tcg/perf.c | 375 ++++++++++++++++++++++++++++++++++++++++++++++ accel/tcg/translate-all.c | 7 + hw/core/loader.c | 5 + linux-user/elfload.c | 3 + linux-user/exit.c | 2 + linux-user/main.c | 15 ++ linux-user/signal.c | 8 +- softmmu/vl.c | 11 ++ tcg/tcg.c | 2 + util/bufferiszero.c | 41 +---- accel/tcg/meson.build | 2 + linux-user/meson.build | 1 + qemu-options.hx | 20 +++ 19 files changed, 763 insertions(+), 101 deletions(-) create mode 100644 accel/tcg/debuginfo.h create mode 100644 accel/tcg/perf.h create mode 100644 accel/tcg/debuginfo.c create mode 100644 accel/tcg/perf.c ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PULL 0/5] tcg patch queue 2023-01-16 22:36 Richard Henderson @ 2023-01-17 15:47 ` Peter Maydell 2023-01-20 9:41 ` Thomas Huth 1 sibling, 0 replies; 30+ messages in thread From: Peter Maydell @ 2023-01-17 15:47 UTC (permalink / raw) To: Richard Henderson; +Cc: qemu-devel On Mon, 16 Jan 2023 at 22:36, Richard Henderson <richard.henderson@linaro.org> wrote: > > The following changes since commit fb7e7990342e59cf67dbd895c1a1e3fb1741df7a: > > tests/qtest/qom-test: Do not print tested properties by default (2023-01-16 15:00:57 +0000) > > are available in the Git repository at: > > https://gitlab.com/rth7680/qemu.git tags/pull-tcg-20230116 > > for you to fetch changes up to 61710a7e23a63546da0071ea32adb96476fa5d07: > > accel/tcg: Split out cpu_exec_{setjmp,loop} (2023-01-16 10:14:12 -1000) > > ---------------------------------------------------------------- > - Reorg cpu_tb_exec around setjmp. > - Use __attribute__((target)) for buffer_is_zero. > - Add perfmap and jitdump for perf support. > > ---------------------------------------------------------------- Applied, thanks. Please update the changelog at https://wiki.qemu.org/ChangeLog/8.0 for any user-visible changes. -- PMM ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PULL 0/5] tcg patch queue 2023-01-16 22:36 Richard Henderson 2023-01-17 15:47 ` Peter Maydell @ 2023-01-20 9:41 ` Thomas Huth 2023-01-20 10:50 ` Alex Bennée ` (2 more replies) 1 sibling, 3 replies; 30+ messages in thread From: Thomas Huth @ 2023-01-20 9:41 UTC (permalink / raw) To: Richard Henderson, qemu-devel, Ilya Leoshkevich; +Cc: peter.maydell On 16/01/2023 23.36, Richard Henderson wrote: > The following changes since commit fb7e7990342e59cf67dbd895c1a1e3fb1741df7a: > > tests/qtest/qom-test: Do not print tested properties by default (2023-01-16 15:00:57 +0000) > > are available in the Git repository at: > > https://gitlab.com/rth7680/qemu.git tags/pull-tcg-20230116 > > for you to fetch changes up to 61710a7e23a63546da0071ea32adb96476fa5d07: > > accel/tcg: Split out cpu_exec_{setjmp,loop} (2023-01-16 10:14:12 -1000) > > ---------------------------------------------------------------- > - Reorg cpu_tb_exec around setjmp. > - Use __attribute__((target)) for buffer_is_zero. > - Add perfmap and jitdump for perf support. > > ---------------------------------------------------------------- > Ilya Leoshkevich (3): > linux-user: Clean up when exiting due to a signal > accel/tcg: Add debuginfo support > tcg: add perfmap and jitdump > > Richard Henderson (2): > util/bufferiszero: Use __attribute__((target)) for avx2/avx512 > accel/tcg: Split out cpu_exec_{setjmp,loop} Hi Richard, hi Ilya, with the recent QEMU master branch (commit 701ed34), I'm now seeing failures in Travis: https://app.travis-ci.com/github/huth/qemu/jobs/593786529#L14411 Everything was still fine a couple of days ago (commit fb7e799): https://app.travis-ci.com/github/huth/qemu/builds/259755664 ... so it seems this is likely related to this pull request. Could you please have a look? Thanks, Thomas ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PULL 0/5] tcg patch queue 2023-01-20 9:41 ` Thomas Huth @ 2023-01-20 10:50 ` Alex Bennée 2023-01-20 10:53 ` Ilya Leoshkevich 2023-01-21 6:07 ` Richard Henderson 2 siblings, 0 replies; 30+ messages in thread From: Alex Bennée @ 2023-01-20 10:50 UTC (permalink / raw) To: Thomas Huth Cc: Richard Henderson, Ilya Leoshkevich, peter.maydell, qemu-devel Thomas Huth <thuth@redhat.com> writes: > On 16/01/2023 23.36, Richard Henderson wrote: >> The following changes since commit fb7e7990342e59cf67dbd895c1a1e3fb1741df7a: >> tests/qtest/qom-test: Do not print tested properties by default >> (2023-01-16 15:00:57 +0000) >> are available in the Git repository at: >> https://gitlab.com/rth7680/qemu.git tags/pull-tcg-20230116 >> for you to fetch changes up to >> 61710a7e23a63546da0071ea32adb96476fa5d07: >> accel/tcg: Split out cpu_exec_{setjmp,loop} (2023-01-16 10:14:12 >> -1000) >> ---------------------------------------------------------------- >> - Reorg cpu_tb_exec around setjmp. >> - Use __attribute__((target)) for buffer_is_zero. >> - Add perfmap and jitdump for perf support. >> ---------------------------------------------------------------- >> Ilya Leoshkevich (3): >> linux-user: Clean up when exiting due to a signal >> accel/tcg: Add debuginfo support >> tcg: add perfmap and jitdump >> Richard Henderson (2): >> util/bufferiszero: Use __attribute__((target)) for avx2/avx512 >> accel/tcg: Split out cpu_exec_{setjmp,loop} > > Hi Richard, hi Ilya, > > with the recent QEMU master branch (commit 701ed34), I'm now seeing > failures in Travis: > > https://app.travis-ci.com/github/huth/qemu/jobs/593786529#L14411 > > Everything was still fine a couple of days ago (commit fb7e799): > > https://app.travis-ci.com/github/huth/qemu/builds/259755664 > > ... so it seems this is likely related to this pull request. Could you > please have a look? Hmm maybe the code motion has revealed another form of the compiler bug. I guess these bugs don't die, they just refract. > > Thanks, > Thomas -- Alex Bennée Virtualisation Tech Lead @ Linaro ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PULL 0/5] tcg patch queue 2023-01-20 9:41 ` Thomas Huth 2023-01-20 10:50 ` Alex Bennée @ 2023-01-20 10:53 ` Ilya Leoshkevich 2023-01-20 12:51 ` Thomas Huth 2023-01-21 6:07 ` Richard Henderson 2 siblings, 1 reply; 30+ messages in thread From: Ilya Leoshkevich @ 2023-01-20 10:53 UTC (permalink / raw) To: Thomas Huth, Richard Henderson, Alex Bennée, qemu-devel Cc: peter.maydell On Fri, 2023-01-20 at 10:41 +0100, Thomas Huth wrote: > On 16/01/2023 23.36, Richard Henderson wrote: > > The following changes since commit > > fb7e7990342e59cf67dbd895c1a1e3fb1741df7a: > > > > tests/qtest/qom-test: Do not print tested properties by default > > (2023-01-16 15:00:57 +0000) > > > > are available in the Git repository at: > > > > https://gitlab.com/rth7680/qemu.git tags/pull-tcg-20230116 > > > > for you to fetch changes up to > > 61710a7e23a63546da0071ea32adb96476fa5d07: > > > > accel/tcg: Split out cpu_exec_{setjmp,loop} (2023-01-16 10:14:12 > > -1000) > > > > ---------------------------------------------------------------- > > - Reorg cpu_tb_exec around setjmp. > > - Use __attribute__((target)) for buffer_is_zero. > > - Add perfmap and jitdump for perf support. > > > > ---------------------------------------------------------------- > > Ilya Leoshkevich (3): > > linux-user: Clean up when exiting due to a signal > > accel/tcg: Add debuginfo support > > tcg: add perfmap and jitdump > > > > Richard Henderson (2): > > util/bufferiszero: Use __attribute__((target)) for > > avx2/avx512 > > accel/tcg: Split out cpu_exec_{setjmp,loop} > > Hi Richard, hi Ilya, > > with the recent QEMU master branch (commit 701ed34), I'm now seeing > failures > in Travis: > > https://app.travis-ci.com/github/huth/qemu/jobs/593786529#L14411 > > Everything was still fine a couple of days ago (commit fb7e799): > > https://app.travis-ci.com/github/huth/qemu/builds/259755664 > > ... so it seems this is likely related to this pull request. Could > you > please have a look? > > Thanks, > Thomas > I would expect this to be (temporarily) fixed by [1], but we probably don't set GITLAB_CI in Travis. Would it make sense to set it? It looks as if this variable is currently used only to skip certain tests. If not, then maybe split it into QEMU_CI, GITLAB_CI and TRAVIS_CI? https://lists.gnu.org/archive/html/qemu-devel/2023-01/msg04438.html ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PULL 0/5] tcg patch queue 2023-01-20 10:53 ` Ilya Leoshkevich @ 2023-01-20 12:51 ` Thomas Huth 2023-01-20 16:49 ` Alex Bennée 0 siblings, 1 reply; 30+ messages in thread From: Thomas Huth @ 2023-01-20 12:51 UTC (permalink / raw) To: Ilya Leoshkevich, Richard Henderson, Alex Bennée, qemu-devel Cc: peter.maydell On 20/01/2023 11.53, Ilya Leoshkevich wrote: > On Fri, 2023-01-20 at 10:41 +0100, Thomas Huth wrote: >> On 16/01/2023 23.36, Richard Henderson wrote: >>> The following changes since commit >>> fb7e7990342e59cf67dbd895c1a1e3fb1741df7a: >>> >>> tests/qtest/qom-test: Do not print tested properties by default >>> (2023-01-16 15:00:57 +0000) >>> >>> are available in the Git repository at: >>> >>> https://gitlab.com/rth7680/qemu.git tags/pull-tcg-20230116 >>> >>> for you to fetch changes up to >>> 61710a7e23a63546da0071ea32adb96476fa5d07: >>> >>> accel/tcg: Split out cpu_exec_{setjmp,loop} (2023-01-16 10:14:12 >>> -1000) >>> >>> ---------------------------------------------------------------- >>> - Reorg cpu_tb_exec around setjmp. >>> - Use __attribute__((target)) for buffer_is_zero. >>> - Add perfmap and jitdump for perf support. >>> >>> ---------------------------------------------------------------- >>> Ilya Leoshkevich (3): >>> linux-user: Clean up when exiting due to a signal >>> accel/tcg: Add debuginfo support >>> tcg: add perfmap and jitdump >>> >>> Richard Henderson (2): >>> util/bufferiszero: Use __attribute__((target)) for >>> avx2/avx512 >>> accel/tcg: Split out cpu_exec_{setjmp,loop} >> >> Hi Richard, hi Ilya, >> >> with the recent QEMU master branch (commit 701ed34), I'm now seeing >> failures >> in Travis: >> >> https://app.travis-ci.com/github/huth/qemu/jobs/593786529#L14411 >> >> Everything was still fine a couple of days ago (commit fb7e799): >> >> https://app.travis-ci.com/github/huth/qemu/builds/259755664 >> >> ... so it seems this is likely related to this pull request. Could >> you >> please have a look? >> >> Thanks, >> Thomas >> > > I would expect this to be (temporarily) fixed by [1], but we probably > don't set GITLAB_CI in Travis. Would it make sense to set it? It looks > as if this variable is currently used only to skip certain tests. > > If not, then maybe split it into QEMU_CI, GITLAB_CI and TRAVIS_CI? > > https://lists.gnu.org/archive/html/qemu-devel/2023-01/msg04438.html Ah, ok, so this test has issues in gitlab, too! For Travis, I think we should either check the CI or TRAVIS environment variables: https://docs.travis-ci.com/user/environment-variables/#default-environment-variables Thomas ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PULL 0/5] tcg patch queue 2023-01-20 12:51 ` Thomas Huth @ 2023-01-20 16:49 ` Alex Bennée 0 siblings, 0 replies; 30+ messages in thread From: Alex Bennée @ 2023-01-20 16:49 UTC (permalink / raw) To: Thomas Huth Cc: Ilya Leoshkevich, Richard Henderson, qemu-devel, peter.maydell Thomas Huth <thuth@redhat.com> writes: > On 20/01/2023 11.53, Ilya Leoshkevich wrote: >> On Fri, 2023-01-20 at 10:41 +0100, Thomas Huth wrote: >>> On 16/01/2023 23.36, Richard Henderson wrote: >>>> The following changes since commit >>>> fb7e7990342e59cf67dbd895c1a1e3fb1741df7a: >>>> >>>> tests/qtest/qom-test: Do not print tested properties by default >>>> (2023-01-16 15:00:57 +0000) >>>> >>>> are available in the Git repository at: >>>> >>>> https://gitlab.com/rth7680/qemu.git tags/pull-tcg-20230116 >>>> >>>> for you to fetch changes up to >>>> 61710a7e23a63546da0071ea32adb96476fa5d07: >>>> >>>> accel/tcg: Split out cpu_exec_{setjmp,loop} (2023-01-16 10:14:12 >>>> -1000) >>>> >>>> ---------------------------------------------------------------- >>>> - Reorg cpu_tb_exec around setjmp. >>>> - Use __attribute__((target)) for buffer_is_zero. >>>> - Add perfmap and jitdump for perf support. >>>> >>>> ---------------------------------------------------------------- >>>> Ilya Leoshkevich (3): >>>> linux-user: Clean up when exiting due to a signal >>>> accel/tcg: Add debuginfo support >>>> tcg: add perfmap and jitdump >>>> >>>> Richard Henderson (2): >>>> util/bufferiszero: Use __attribute__((target)) for >>>> avx2/avx512 >>>> accel/tcg: Split out cpu_exec_{setjmp,loop} >>> >>> Hi Richard, hi Ilya, >>> >>> with the recent QEMU master branch (commit 701ed34), I'm now seeing >>> failures >>> in Travis: >>> >>> https://app.travis-ci.com/github/huth/qemu/jobs/593786529#L14411 >>> >>> Everything was still fine a couple of days ago (commit fb7e799): >>> >>> https://app.travis-ci.com/github/huth/qemu/builds/259755664 >>> >>> ... so it seems this is likely related to this pull request. Could >>> you >>> please have a look? >>> >>> Thanks, >>> Thomas >>> >> I would expect this to be (temporarily) fixed by [1], but we >> probably >> don't set GITLAB_CI in Travis. Would it make sense to set it? It looks >> as if this variable is currently used only to skip certain tests. >> If not, then maybe split it into QEMU_CI, GITLAB_CI and TRAVIS_CI? >> https://lists.gnu.org/archive/html/qemu-devel/2023-01/msg04438.html > > Ah, ok, so this test has issues in gitlab, too! *sigh* yeah the test is flaky but this is a subtly different failure mode. All the gitlab failures I saw where the test triggering the abort rather than the assert catch we have here. > > For Travis, I think we should either check the CI or TRAVIS > environment variables: > > > https://docs.travis-ci.com/user/environment-variables/#default-environment-variables > > Thomas -- Alex Bennée Virtualisation Tech Lead @ Linaro ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PULL 0/5] tcg patch queue 2023-01-20 9:41 ` Thomas Huth 2023-01-20 10:50 ` Alex Bennée 2023-01-20 10:53 ` Ilya Leoshkevich @ 2023-01-21 6:07 ` Richard Henderson 2 siblings, 0 replies; 30+ messages in thread From: Richard Henderson @ 2023-01-21 6:07 UTC (permalink / raw) To: Thomas Huth, qemu-devel, Ilya Leoshkevich; +Cc: peter.maydell On 1/19/23 23:41, Thomas Huth wrote: > On 16/01/2023 23.36, Richard Henderson wrote: >> The following changes since commit fb7e7990342e59cf67dbd895c1a1e3fb1741df7a: >> >> tests/qtest/qom-test: Do not print tested properties by default (2023-01-16 15:00:57 >> +0000) >> >> are available in the Git repository at: >> >> https://gitlab.com/rth7680/qemu.git tags/pull-tcg-20230116 >> >> for you to fetch changes up to 61710a7e23a63546da0071ea32adb96476fa5d07: >> >> accel/tcg: Split out cpu_exec_{setjmp,loop} (2023-01-16 10:14:12 -1000) >> >> ---------------------------------------------------------------- >> - Reorg cpu_tb_exec around setjmp. >> - Use __attribute__((target)) for buffer_is_zero. >> - Add perfmap and jitdump for perf support. >> >> ---------------------------------------------------------------- >> Ilya Leoshkevich (3): >> linux-user: Clean up when exiting due to a signal >> accel/tcg: Add debuginfo support >> tcg: add perfmap and jitdump >> >> Richard Henderson (2): >> util/bufferiszero: Use __attribute__((target)) for avx2/avx512 >> accel/tcg: Split out cpu_exec_{setjmp,loop} > > Hi Richard, hi Ilya, > > with the recent QEMU master branch (commit 701ed34), I'm now seeing failures in Travis: > > https://app.travis-ci.com/github/huth/qemu/jobs/593786529#L14411 > > Everything was still fine a couple of days ago (commit fb7e799): > > https://app.travis-ci.com/github/huth/qemu/builds/259755664 > > ... so it seems this is likely related to this pull request. Could you please have a look? Thankfully our s390x.ci.qemu.org has the same version gcc installed, and I was able to reproduce this. But only once -- it's irregular and very low frequency. The code generated by gcc is correct and easy to inspect, since cpu_exec_setjmp is now quite small: 00000000000f3250 <cpu_exec_setjmp.isra.0>: f3250: eb 6f f0 30 00 24 stmg %r6,%r15,48(%r15) f3256: a7 39 00 00 lghi %r3,0 f325a: e3 f0 ff 58 ff 71 lay %r15,-168(%r15) // Save cpu to stack+160. f3260: e3 20 f0 a0 00 24 stg %r2,160(%r15) f3266: 41 20 20 f0 la %r2,240(%r2) f326a: c0 e5 ff fb 10 eb brasl %r14,55440 <__sigsetjmp@plt> f3270: ec 26 00 0d 00 7e cijne %r2,0,f328a <cpu_exec_setjmp.isra.0+0x3a> // Reload cpu for cpu_exec_loop(). f3276: e3 20 f0 a0 00 04 lg %r2,160(%r15) f327c: c0 e5 ff ff fb ee brasl %r14,f2a58 <cpu_exec_loop.isra.0> f3282: eb 6f f0 d8 00 04 lmg %r6,%r15,216(%r15) f3288: 07 fe br %r14 // Load tls pointer and current_cpu address. f328a: b2 4f 00 10 ear %r1,%a0 f328e: c0 20 00 0a 35 9d larl %r2,239dc8 <current_cpu@@Base+0x239dc8> f3294: eb 11 00 20 00 0d sllg %r1,%r1,32 f329a: e3 20 20 00 00 04 lg %r2,0(%r2) f32a0: b2 4f 00 11 ear %r1,%a1 // Reload cpu for comparison f32a4: e3 30 f0 a0 00 04 lg %r3,160(%r15) // cpu == current_cpu f32aa: e3 32 10 00 00 20 cg %r3,0(%r2,%r1) f32b0: a7 84 00 12 je f32d4 <cpu_exec_setjmp.isra.0+0x84> ... The only way I can imagine that this comparison fails is if we have corrupted the stack in some way. I have not been able to induce failure under any sort of debugging, and I can't imagine where irregular corruption would have come from. r~ r~ ^ permalink raw reply [flat|nested] 30+ messages in thread
* [PULL 0/5] tcg patch queue @ 2023-01-16 22:36 Richard Henderson 0 siblings, 0 replies; 30+ messages in thread From: Richard Henderson @ 2023-01-16 22:36 UTC (permalink / raw) To: qemu-devel; +Cc: peter.maydell The following changes since commit fb7e7990342e59cf67dbd895c1a1e3fb1741df7a: tests/qtest/qom-test: Do not print tested properties by default (2023-01-16 15:00:57 +0000) are available in the Git repository at: https://gitlab.com/rth7680/qemu.git tags/pull-tcg-20230116 for you to fetch changes up to 61710a7e23a63546da0071ea32adb96476fa5d07: accel/tcg: Split out cpu_exec_{setjmp,loop} (2023-01-16 10:14:12 -1000) ---------------------------------------------------------------- - Reorg cpu_tb_exec around setjmp. - Use __attribute__((target)) for buffer_is_zero. - Add perfmap and jitdump for perf support. ---------------------------------------------------------------- Ilya Leoshkevich (3): linux-user: Clean up when exiting due to a signal accel/tcg: Add debuginfo support tcg: add perfmap and jitdump Richard Henderson (2): util/bufferiszero: Use __attribute__((target)) for avx2/avx512 accel/tcg: Split out cpu_exec_{setjmp,loop} docs/devel/tcg.rst | 23 +++ meson.build | 16 +- accel/tcg/debuginfo.h | 77 ++++++++++ accel/tcg/perf.h | 49 ++++++ accel/tcg/cpu-exec.c | 111 +++++++------- accel/tcg/debuginfo.c | 96 ++++++++++++ accel/tcg/perf.c | 375 ++++++++++++++++++++++++++++++++++++++++++++++ accel/tcg/translate-all.c | 7 + hw/core/loader.c | 5 + linux-user/elfload.c | 3 + linux-user/exit.c | 2 + linux-user/main.c | 15 ++ linux-user/signal.c | 8 +- softmmu/vl.c | 11 ++ tcg/tcg.c | 2 + util/bufferiszero.c | 41 +---- accel/tcg/meson.build | 2 + linux-user/meson.build | 1 + qemu-options.hx | 20 +++ 19 files changed, 763 insertions(+), 101 deletions(-) create mode 100644 accel/tcg/debuginfo.h create mode 100644 accel/tcg/perf.h create mode 100644 accel/tcg/debuginfo.c create mode 100644 accel/tcg/perf.c ^ permalink raw reply [flat|nested] 30+ messages in thread
* [PULL 0/5] tcg patch queue @ 2021-05-01 18:51 Richard Henderson 2021-05-01 19:27 ` no-reply 2021-05-02 13:32 ` Peter Maydell 0 siblings, 2 replies; 30+ messages in thread From: Richard Henderson @ 2021-05-01 18:51 UTC (permalink / raw) To: qemu-devel; +Cc: peter.maydell The following changes since commit 8f860d2633baf9c2b6261f703f86e394c6bc22ca: Merge remote-tracking branch 'remotes/armbru/tags/pull-qapi-2021-04-30' into staging (2021-04-30 16:02:00 +0100) are available in the Git repository at: https://gitlab.com/rth7680/qemu.git tags/pull-tcg-20210501 for you to fetch changes up to af93ccacc772019298be4c3e47251cdaa60d0c21: decodetree: Extend argument set syntax to allow types (2021-05-01 11:45:35 -0700) ---------------------------------------------------------------- Include cleanups. Decodetree enhancements for power10. ---------------------------------------------------------------- Luis Fernando Fujita Pires (1): decodetree: Add support for 64-bit instructions Philippe Mathieu-Daudé (1): exec: Remove accel/tcg/ from include paths Richard Henderson (3): decodetree: Introduce whex and whexC helpers decodetree: More use of f-strings decodetree: Extend argument set syntax to allow types docs/devel/decodetree.rst | 11 ++- meson.build | 1 - include/exec/helper-gen.h | 4 +- include/exec/helper-proto.h | 4 +- include/exec/helper-tcg.h | 4 +- tests/decode/succ_argset_type1.decode | 1 + scripts/decodetree.py | 172 +++++++++++++++++++--------------- 7 files changed, 112 insertions(+), 85 deletions(-) create mode 100644 tests/decode/succ_argset_type1.decode ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PULL 0/5] tcg patch queue 2021-05-01 18:51 Richard Henderson @ 2021-05-01 19:27 ` no-reply 2021-05-02 13:32 ` Peter Maydell 1 sibling, 0 replies; 30+ messages in thread From: no-reply @ 2021-05-01 19:27 UTC (permalink / raw) To: richard.henderson; +Cc: peter.maydell, qemu-devel Patchew URL: https://patchew.org/QEMU/20210501185116.1338875-1-richard.henderson@linaro.org/ Hi, This series seems to have some coding style problems. See output below for more information: Type: series Message-id: 20210501185116.1338875-1-richard.henderson@linaro.org Subject: [PULL 0/5] tcg patch queue === TEST SCRIPT BEGIN === #!/bin/bash git rev-parse base > /dev/null || exit 0 git config --local diff.renamelimit 0 git config --local diff.renames True git config --local diff.algorithm histogram ./scripts/checkpatch.pl --mailback base.. === TEST SCRIPT END === Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384 From https://github.com/patchew-project/qemu * [new tag] patchew/20210501185116.1338875-1-richard.henderson@linaro.org -> patchew/20210501185116.1338875-1-richard.henderson@linaro.org Switched to a new branch 'test' 3f52d0d decodetree: Extend argument set syntax to allow types 2f170a4 decodetree: Add support for 64-bit instructions 6567eed decodetree: More use of f-strings 95caca8 decodetree: Introduce whex and whexC helpers b9a64b1 exec: Remove accel/tcg/ from include paths === OUTPUT BEGIN === 1/5 Checking commit b9a64b13ad89 (exec: Remove accel/tcg/ from include paths) 2/5 Checking commit 95caca818e08 (decodetree: Introduce whex and whexC helpers) ERROR: line over 90 characters #52: FILE: scripts/decodetree.py:495: + output(ind, f'if ((insn & {whexC(innermask)}) == {whexC(innerbits)}) {{\n') WARNING: line over 80 characters #53: FILE: scripts/decodetree.py:496: + output(ind, f' /* {str_match_bits(p.fixedbits, p.fixedmask)} */\n') total: 1 errors, 1 warnings, 136 lines checked Patch 2/5 has style problems, please review. If any of these errors are false positives report them to the maintainer, see CHECKPATCH in MAINTAINERS. 3/5 Checking commit 6567eed04461 (decodetree: More use of f-strings) 4/5 Checking commit 2f170a408195 (decodetree: Add support for 64-bit instructions) WARNING: line over 80 characters #75: FILE: scripts/decodetree.py:236: + ret = f'deposit{bitop_width}({ret}, {pos}, {bitop_width - pos}, {ext})' total: 0 errors, 1 warnings, 63 lines checked Patch 4/5 has style problems, please review. If any of these errors are false positives report them to the maintainer, see CHECKPATCH in MAINTAINERS. 5/5 Checking commit 3f52d0d96c42 (decodetree: Extend argument set syntax to allow types) Use of uninitialized value $acpi_testexpected in string eq at ./scripts/checkpatch.pl line 1529. WARNING: added, moved or deleted file(s), does MAINTAINERS need updating? #156: new file mode 100644 total: 0 errors, 1 warnings, 121 lines checked Patch 5/5 has style problems, please review. If any of these errors are false positives report them to the maintainer, see CHECKPATCH in MAINTAINERS. === OUTPUT END === Test command exited with code: 1 The full log is available at http://patchew.org/logs/20210501185116.1338875-1-richard.henderson@linaro.org/testing.checkpatch/?type=message. --- Email generated automatically by Patchew [https://patchew.org/]. Please send your feedback to patchew-devel@redhat.com ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PULL 0/5] tcg patch queue 2021-05-01 18:51 Richard Henderson 2021-05-01 19:27 ` no-reply @ 2021-05-02 13:32 ` Peter Maydell 1 sibling, 0 replies; 30+ messages in thread From: Peter Maydell @ 2021-05-02 13:32 UTC (permalink / raw) To: Richard Henderson; +Cc: QEMU Developers On Sat, 1 May 2021 at 19:51, Richard Henderson <richard.henderson@linaro.org> wrote: > > The following changes since commit 8f860d2633baf9c2b6261f703f86e394c6bc22ca: > > Merge remote-tracking branch 'remotes/armbru/tags/pull-qapi-2021-04-30' into staging (2021-04-30 16:02:00 +0100) > > are available in the Git repository at: > > https://gitlab.com/rth7680/qemu.git tags/pull-tcg-20210501 > > for you to fetch changes up to af93ccacc772019298be4c3e47251cdaa60d0c21: > > decodetree: Extend argument set syntax to allow types (2021-05-01 11:45:35 -0700) > > ---------------------------------------------------------------- > Include cleanups. > Decodetree enhancements for power10. Applied, thanks. Please update the changelog at https://wiki.qemu.org/ChangeLog/6.1 for any user-visible changes. -- PMM ^ permalink raw reply [flat|nested] 30+ messages in thread
* [PULL 0/5] tcg patch queue @ 2020-03-17 19:00 Richard Henderson 2020-03-17 23:34 ` no-reply 2020-03-19 10:17 ` Peter Maydell 0 siblings, 2 replies; 30+ messages in thread From: Richard Henderson @ 2020-03-17 19:00 UTC (permalink / raw) To: qemu-devel; +Cc: peter.maydell The following changes since commit 40c67636f67c2a89745f2e698522fe917326a952: Merge remote-tracking branch 'remotes/kraxel/tags/usb-20200317-pull-request' into staging (2020-03-17 14:00:56 +0000) are available in the Git repository at: https://github.com/rth7680/qemu.git tags/pull-tcg-20200317 for you to fetch changes up to 0270bd503e3699b7202200a2d693ad1feb57473f: tcg: Remove tcg-runtime-gvec.c DO_CMP0 (2020-03-17 08:41:07 -0700) ---------------------------------------------------------------- Fix tcg/i386 bug vs sari_vec. Fix tcg-runtime-gvec.c vs i386 without avx. ---------------------------------------------------------------- Richard Henderson (5): tcg/i386: Bound shift count expanding sari_vec tcg: Remove CONFIG_VECTOR16 tcg: Tidy tcg-runtime-gvec.c types tcg: Tidy tcg-runtime-gvec.c DUP* tcg: Remove tcg-runtime-gvec.c DO_CMP0 configure | 56 -------- accel/tcg/tcg-runtime-gvec.c | 298 +++++++++++++++++-------------------------- tcg/i386/tcg-target.inc.c | 9 +- 3 files changed, 122 insertions(+), 241 deletions(-) ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PULL 0/5] tcg patch queue 2020-03-17 19:00 Richard Henderson @ 2020-03-17 23:34 ` no-reply 2020-03-19 10:17 ` Peter Maydell 1 sibling, 0 replies; 30+ messages in thread From: no-reply @ 2020-03-17 23:34 UTC (permalink / raw) To: richard.henderson; +Cc: peter.maydell, qemu-devel Patchew URL: https://patchew.org/QEMU/20200317190013.25036-1-richard.henderson@linaro.org/ Hi, This series seems to have some coding style problems. See output below for more information: Subject: [PULL 0/5] tcg patch queue Message-id: 20200317190013.25036-1-richard.henderson@linaro.org Type: series === TEST SCRIPT BEGIN === #!/bin/bash git rev-parse base > /dev/null || exit 0 git config --local diff.renamelimit 0 git config --local diff.renames True git config --local diff.algorithm histogram ./scripts/checkpatch.pl --mailback base.. === TEST SCRIPT END === Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384 Switched to a new branch 'test' 83eaadd tcg: Remove tcg-runtime-gvec.c DO_CMP0 e0008a5 tcg: Tidy tcg-runtime-gvec.c DUP* 8e7d6d3 tcg: Tidy tcg-runtime-gvec.c types 44bd3c5 tcg: Remove CONFIG_VECTOR16 f410c29 tcg/i386: Bound shift count expanding sari_vec === OUTPUT BEGIN === 1/5 Checking commit f410c296b774 (tcg/i386: Bound shift count expanding sari_vec) 2/5 Checking commit 44bd3c5fbbdb (tcg: Remove CONFIG_VECTOR16) 3/5 Checking commit 8e7d6d39c529 (tcg: Tidy tcg-runtime-gvec.c types) ERROR: spaces required around that '&' (ctx:WxO) #442: FILE: accel/tcg/tcg-runtime-gvec.c:510: + *(uint64_t *)(d + i) = *(uint64_t *)(a + i) &~ *(uint64_t *)(b + i); ^ ERROR: space prohibited after that '~' (ctx:OxW) #442: FILE: accel/tcg/tcg-runtime-gvec.c:510: + *(uint64_t *)(d + i) = *(uint64_t *)(a + i) &~ *(uint64_t *)(b + i); ^ ERROR: spaces required around that '|' (ctx:WxO) #453: FILE: accel/tcg/tcg-runtime-gvec.c:521: + *(uint64_t *)(d + i) = *(uint64_t *)(a + i) |~ *(uint64_t *)(b + i); ^ ERROR: space prohibited after that '~' (ctx:OxW) #453: FILE: accel/tcg/tcg-runtime-gvec.c:521: + *(uint64_t *)(d + i) = *(uint64_t *)(a + i) |~ *(uint64_t *)(b + i); ^ ERROR: spaces required around that '==' (ctx:WxB) #677: FILE: accel/tcg/tcg-runtime-gvec.c:897: + DO_CMP1(gvec_eq##SZ, uint##SZ##_t, ==) \ ^ ERROR: spaces required around that '!=' (ctx:WxB) #678: FILE: accel/tcg/tcg-runtime-gvec.c:898: + DO_CMP1(gvec_ne##SZ, uint##SZ##_t, !=) \ ^ ERROR: spaces required around that '<' (ctx:WxB) #679: FILE: accel/tcg/tcg-runtime-gvec.c:899: + DO_CMP1(gvec_lt##SZ, int##SZ##_t, <) \ ^ ERROR: spaces required around that '<=' (ctx:WxB) #680: FILE: accel/tcg/tcg-runtime-gvec.c:900: + DO_CMP1(gvec_le##SZ, int##SZ##_t, <=) \ ^ ERROR: spaces required around that '<' (ctx:WxB) #681: FILE: accel/tcg/tcg-runtime-gvec.c:901: + DO_CMP1(gvec_ltu##SZ, uint##SZ##_t, <) \ ^ ERROR: spaces required around that '<=' (ctx:WxB) #682: FILE: accel/tcg/tcg-runtime-gvec.c:902: + DO_CMP1(gvec_leu##SZ, uint##SZ##_t, <=) ^ total: 10 errors, 0 warnings, 630 lines checked Patch 3/5 has style problems, please review. If any of these errors are false positives report them to the maintainer, see CHECKPATCH in MAINTAINERS. 4/5 Checking commit e0008a500fbb (tcg: Tidy tcg-runtime-gvec.c DUP*) 5/5 Checking commit 83eaadd6af23 (tcg: Remove tcg-runtime-gvec.c DO_CMP0) ERROR: spaces required around that '*' (ctx:WxV) #30: FILE: accel/tcg/tcg-runtime-gvec.c:869: + *(TYPE *)(d + i) = -(*(TYPE *)(a + i) OP *(TYPE *)(b + i)); \ ^ total: 1 errors, 0 warnings, 23 lines checked Patch 5/5 has style problems, please review. If any of these errors are false positives report them to the maintainer, see CHECKPATCH in MAINTAINERS. === OUTPUT END === Test command exited with code: 1 The full log is available at http://patchew.org/logs/20200317190013.25036-1-richard.henderson@linaro.org/testing.checkpatch/?type=message. --- Email generated automatically by Patchew [https://patchew.org/]. Please send your feedback to patchew-devel@redhat.com ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PULL 0/5] tcg patch queue 2020-03-17 19:00 Richard Henderson 2020-03-17 23:34 ` no-reply @ 2020-03-19 10:17 ` Peter Maydell 1 sibling, 0 replies; 30+ messages in thread From: Peter Maydell @ 2020-03-19 10:17 UTC (permalink / raw) To: Richard Henderson; +Cc: QEMU Developers On Tue, 17 Mar 2020 at 19:00, Richard Henderson <richard.henderson@linaro.org> wrote: > > The following changes since commit 40c67636f67c2a89745f2e698522fe917326a952: > > Merge remote-tracking branch 'remotes/kraxel/tags/usb-20200317-pull-request' into staging (2020-03-17 14:00:56 +0000) > > are available in the Git repository at: > > https://github.com/rth7680/qemu.git tags/pull-tcg-20200317 > > for you to fetch changes up to 0270bd503e3699b7202200a2d693ad1feb57473f: > > tcg: Remove tcg-runtime-gvec.c DO_CMP0 (2020-03-17 08:41:07 -0700) > > ---------------------------------------------------------------- > Fix tcg/i386 bug vs sari_vec. > Fix tcg-runtime-gvec.c vs i386 without avx. > > ---------------------------------------------------------------- Applied, thanks. Please update the changelog at https://wiki.qemu.org/ChangeLog/5.0 for any user-visible changes. -- PMM ^ permalink raw reply [flat|nested] 30+ messages in thread
end of thread, other threads:[~2023-01-21 6:08 UTC | newest] Thread overview: 30+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2020-09-03 21:40 [PULL 0/5] tcg patch queue Richard Henderson 2020-09-03 21:40 ` [PULL 1/5] cputlb: Make store_helper less fragile to compiler optimizations Richard Henderson 2020-09-03 21:40 ` [PULL 2/5] tcg: Fix tcg gen for vectorized absolute value Richard Henderson 2020-09-03 21:40 ` [PULL 3/5] softmmu/cpus: Only set parallel_cpus for SMP Richard Henderson 2020-09-07 10:05 ` Claudio Fontana 2020-09-07 10:20 ` Philippe Mathieu-Daudé 2020-09-07 16:30 ` Claudio Fontana 2020-09-07 16:49 ` Philippe Mathieu-Daudé 2020-09-08 7:09 ` Claudio Fontana 2020-09-08 7:56 ` Philippe Mathieu-Daudé 2020-09-10 8:28 ` Claudio Fontana 2020-09-10 10:32 ` Philippe Mathieu-Daudé 2020-09-03 21:41 ` [PULL 4/5] tcg: Eliminate one store for in-place 128-bit dup_mem Richard Henderson 2020-09-03 21:41 ` [PULL 5/5] tcg: Implement 256-bit dup for tcg_gen_gvec_dup_mem Richard Henderson 2020-09-06 13:07 ` [PULL 0/5] tcg patch queue Peter Maydell -- strict thread matches above, loose matches on Subject: below -- 2023-01-16 22:36 Richard Henderson 2023-01-17 15:47 ` Peter Maydell 2023-01-20 9:41 ` Thomas Huth 2023-01-20 10:50 ` Alex Bennée 2023-01-20 10:53 ` Ilya Leoshkevich 2023-01-20 12:51 ` Thomas Huth 2023-01-20 16:49 ` Alex Bennée 2023-01-21 6:07 ` Richard Henderson 2023-01-16 22:36 Richard Henderson 2021-05-01 18:51 Richard Henderson 2021-05-01 19:27 ` no-reply 2021-05-02 13:32 ` Peter Maydell 2020-03-17 19:00 Richard Henderson 2020-03-17 23:34 ` no-reply 2020-03-19 10:17 ` Peter Maydell
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.