* Re: [Qemu-devel] [PATCH v2 00/45] tcg: support for multiple TCG contexts
@ 2017-07-19  0:22 jiang.biao2
  2017-07-19  1:08 ` Richard Henderson
  0 siblings, 1 reply; 4+ messages in thread
From: jiang.biao2 @ 2017-07-19  0:22 UTC
  To: cota; +Cc: qemu-devel, rth

Hi,

Seeing your work on multiple TCG contexts, it seems to be connected in
some way with the MTTCG feature, but I cannot figure out how they are
connected in detail.

Could you please help with the following questions:

 1. What is the relationship between your patches and the MTTCG feature
    described in https://lwn.net/Articles/697265/?

 2. What is the current status of the development of the MTTCG feature?

 3. Are there any problems with running multithreaded programs in
    linux-user QEMU mode? Would the situation be improved by the MTTCG
    feature?

    We need to use linux-user mode QEMU to run a multithreaded
    application, but there seem to be many problems.

Thanks a lot.


* Re: [Qemu-devel] [PATCH v2 00/45] tcg: support for multiple TCG contexts
  2017-07-19  0:22 [Qemu-devel] [PATCH v2 00/45] tcg: support for multiple TCG contexts jiang.biao2
@ 2017-07-19  1:08 ` Richard Henderson
  2017-07-19  9:31   ` Alex Bennée
  0 siblings, 1 reply; 4+ messages in thread
From: Richard Henderson @ 2017-07-19  1:08 UTC
  To: jiang.biao2, cota; +Cc: qemu-devel, Alex Bennée

On 07/18/2017 02:22 PM, jiang.biao2@zte.com.cn wrote:
> Seeing your work on multiple TCG, it seems that it has some kind of connection 
> with the  MTTCG feature,
> 
> but I do not figure out how they are connected in detail.
> 
> Could you pls help to confirm the following questions:
> 
>  1.
> 
>     what is the relationship between your patches and the MTTCG feature
>     mentioned by https://lwn.net/Articles/697265/?


The current MTTCG feature is in QEMU mainline.  It allows parallel execution of 
translated code in system mode.  It does *not* allow parallel translation 
-- all translation is done with tb_lock held.

Note that we *always* have parallel execution in user mode.  However, this can 
and does lead to problems.  See below.

This patch set allows parallel translation in system mode.  This is shown to 
improve the overall throughput.  It does *not* allow parallel translation in 
user mode, for two reasons.  Firstly, user mode already shares more 
translations between threads (because it is running a single executable), and 
so the translation routines are not high in the profile.  Secondly, there are 
additional locking problems due to the fact that we have no bound on the 
number of user threads.
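
To make the distinction between parallel execution and serialized
translation concrete, here is a minimal sketch, not QEMU code.  All names
(DemoTB, demo_tb_find, demo_tb_lock) are hypothetical, the lock is a toy
spinlock, and the cache is a simple array indexed by a small guest PC.
Lookups of already translated blocks take no lock; only a miss enters the
single global lock, which is the part a parallel-translation design would
remove.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAX_PC 64u   /* toy limit: one cache slot per guest PC */

/* Hypothetical stand-in for a translated block. */
typedef struct {
    uint64_t guest_pc;
} DemoTB;

static DemoTB demo_tb_pool[DEMO_MAX_PC];
static _Atomic(DemoTB *) demo_tb_cache[DEMO_MAX_PC];
static atomic_flag demo_tb_lock = ATOMIC_FLAG_INIT;
static unsigned demo_translations;   /* protected by demo_tb_lock */

static void demo_lock(void)
{
    while (atomic_flag_test_and_set_explicit(&demo_tb_lock,
                                             memory_order_acquire)) {
        /* spin: only one thread may translate at a time */
    }
}

static void demo_unlock(void)
{
    atomic_flag_clear_explicit(&demo_tb_lock, memory_order_release);
}

/* Find the block for pc, translating it under the lock on a miss. */
DemoTB *demo_tb_find(uint64_t pc)
{
    assert(pc < DEMO_MAX_PC);

    /* Fast path: execution threads look up existing blocks lock-free. */
    DemoTB *tb = atomic_load_explicit(&demo_tb_cache[pc],
                                      memory_order_acquire);
    if (tb) {
        return tb;
    }

    /* Slow path: all translation is serialized by one global lock. */
    demo_lock();
    tb = atomic_load_explicit(&demo_tb_cache[pc], memory_order_relaxed);
    if (!tb) {                        /* re-check: another thread may have won */
        tb = &demo_tb_pool[pc];
        tb->guest_pc = pc;            /* "translate" the block */
        demo_translations++;
        atomic_store_explicit(&demo_tb_cache[pc], tb, memory_order_release);
    }
    demo_unlock();
    return tb;
}

/* Number of translations performed so far. */
unsigned demo_translation_count(void)
{
    demo_lock();
    unsigned n = demo_translations;
    demo_unlock();
    return n;
}
```

Calling demo_tb_find() twice with the same PC returns the same block and
translates only once; the second call never touches the lock.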


>  2.
> 
>     What is the current status of the development of the MTTCG feature?

MTTCG has only been enabled on a few targets: alpha, arm, ppc64.
Look for "mttcg=yes" in configure.

In order for MTTCG to be enabled for a target (guest), the target must be
adjusted so that
(1) all atomic instructions are implemented with atomic tcg operations, and
(2) TCG_GUEST_DEFAULT_MO is defined to indicate any barriers implied by
     normal memory operations in the target architecture.
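
As an illustration of requirement (1), the following sketch contrasts two
ways of emulating a guest compare-and-swap on the host.  It is not QEMU
code; the function names are hypothetical and C11 atomics stand in for
"atomic tcg operations".  The first version is only correct when a single
thread runs guest code; the second stays correct when several vCPU
threads execute in parallel.

```c
#include <stdatomic.h>
#include <stdint.h>

/*
 * Racy emulation: another thread can store to *addr between the load
 * and the store, so the guest's atomicity guarantee is lost.
 */
uint32_t demo_cmpxchg_racy(uint32_t *addr, uint32_t expected,
                           uint32_t desired)
{
    uint32_t old = *addr;
    if (old == expected) {
        *addr = desired;
    }
    return old;
}

/*
 * Atomic emulation: the compare and the store happen as one indivisible
 * host operation.  Returns the value that was in memory beforehand.
 */
uint32_t demo_cmpxchg_atomic(_Atomic uint32_t *addr, uint32_t expected,
                             uint32_t desired)
{
    uint32_t old = expected;
    /* On failure, 'old' is overwritten with the current memory value. */
    atomic_compare_exchange_strong(addr, &old, desired);
    return old;
}
```

Both functions return the same result when called from one thread; the
difference only shows up under concurrent access.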

For target/mips, neither of these things is complete.

MTTCG has only been enabled on one host: i386.
Look for TCG_TARGET_DEFAULT_MO in tcg/*/tcg-target.h.

In order for MTTCG to be enabled, the target memory order must not be stronger 
than the host memory order.  Since i386 has a very strong host memory order, it 
is easy for it to emulate any guest.  When the host has a weak memory order, we 
need to add the additional barriers that are implied by the target.  This is 
work that has not been done.
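
The rule above can be sketched as a bitmask comparison.  This is an
illustrative model only: the DEMO_MO_* names and values are assumptions
made for this example, with each bit meaning "an earlier access of the
first kind is ordered before a later access of the second kind".  The
orderings the guest expects but the host does not provide are exactly
the ones that would need explicit barriers.

```c
#include <stdbool.h>

/* Illustrative ordering bits (names and values are assumptions). */
enum {
    DEMO_MO_LD_LD = 0x1,   /* load  before later load  */
    DEMO_MO_ST_LD = 0x2,   /* store before later load  */
    DEMO_MO_LD_ST = 0x4,   /* load  before later store */
    DEMO_MO_ST_ST = 0x8,   /* store before later store */
    DEMO_MO_ALL   = 0xf
};

/* Orderings the guest relies on that the host does not guarantee. */
unsigned demo_missing_orderings(unsigned guest_mo, unsigned host_mo)
{
    return guest_mo & ~host_mo & DEMO_MO_ALL;
}

/* True if guest code can run on the host without extra barriers. */
bool demo_host_can_run_guest(unsigned guest_mo, unsigned host_mo)
{
    return demo_missing_orderings(guest_mo, host_mo) == 0;
}
```

For example, a strongly ordered host modelled as DEMO_MO_ALL &
~DEMO_MO_ST_LD can run a guest whose mask is 0 (no implied ordering)
without any added barriers, whereas a host with mask 0 running that
strongly ordered guest would be missing the load-load, load-store and
store-store orderings.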

I am not sure why we have not already added this definition to all of the other 
tcg hosts.  I think this is just an oversight, since almost everyone uses x86_64 
Linux as the host for testing tcg.  However, since all of the supported targets 
have weak memory orders, we ought to be able to support them with any host.


>  3.
> 
>     Is there any problem with the multithread programme running with linux-user
>     qemu mode? would the situation be improved with  the MTTCG feature?
> 
>     We need to use linux-user mode qemu to run multithread app, but there seems
>     to be many problem.

For user mode, we should still follow the rules for MTTCG, but we do not. 
Instead we take it on faith that the rules have been followed and execute the 
code in parallel anyway.  This faith is often misplaced, and it does mean that 
unsupported targets execute user mode code incorrectly.


r~


* Re: [Qemu-devel] [PATCH v2 00/45] tcg: support for multiple TCG contexts
  2017-07-19  1:08 ` Richard Henderson
@ 2017-07-19  9:31   ` Alex Bennée
  0 siblings, 0 replies; 4+ messages in thread
From: Alex Bennée @ 2017-07-19  9:31 UTC
  To: Richard Henderson; +Cc: jiang.biao2, cota, qemu-devel


Richard Henderson <rth@twiddle.net> writes:

> On 07/18/2017 02:22 PM, jiang.biao2@zte.com.cn wrote:
>> Seeing your work on multiple TCG, it seems that it has some kind of
>> connection with the  MTTCG feature,
>>
>> but I do not figure out how they are connected in detail.
>>
>> Could you pls help to confirm the following questions:
>>
>>  1.
>>
>>     what is the relationship between your patches and the MTTCG feature
>>     mentioned by https://lwn.net/Articles/697265/?
>
>
> The current MTTCG feature is in QEMU mainline.  It allows parallel
> execution of translated code in system mode.  It does *not* allow
> parallel translation -- all translation is done with tb_lock held.
>
> Note that we *always* have parallel execution in user mode.  However,
> this can and does lead to problems.  See below.
>
> This patch set allows parallel translation in system mode.  This is
> shown to improve the overall throughput.  It does *not* allow parallel
> translation in user mode.  Firstly because user mode already shares
> more translations between threads (because it is running a single
> executable), and so the translation routines are not high in the
> profile.  Secondly because there are additional locking problems due
> to the fact that we have no bound on the number of user threads.
>
>
>>  2.
>>
>>     What is the current status of the development of the MTTCG feature?
>
> MTTCG has only been enabled on a few targets: alpha, arm, ppc64.
> Look for "mttcg=yes" in configure.
>
> In order for MTTCG to be enabled, the target must be adjusted so that
> (1) all atomic instructions are implemented with atomic tcg operations,
> (2) define TCG_GUEST_DEFAULT_MO to indicate any barriers implied by
>     normal memory operations by the target architecture.
>
> For target/mips, neither of these things are complete.
>
> MTTCG has only been enabled on one host: i386.
> Look for TCG_TARGET_DEFAULT_MO in tcg/*/tcg-target.h.
>
> In order for MTTCG to be enabled, the target memory order must not be
> stronger than the host memory order.  Since i386 has a very strong
> host memory order, it is easy for it to emulate any guest.  When the
> host has a weak memory order, we need to add the additional barriers
> that are implied by the target.  This is work that has not been done.
>
> I am not sure why we have not already added this definition to all of
> the other tcg hosts.  I think this is just oversight, since almost
> everyone uses x86_64 linux as the host for testing tcg.  However,
> since all of the supported targets have weak memory orders we ought to
> be able to support them with any host.

Yeah, the MO definitions can be added to the other guests/backends. I was
hoping it would be done by those who have a better understanding of each
guest's micro-architecture, so as to avoid any silly mistakes. As you say,
I think they will all mostly be 0 anyway ;-)

>>  3.
>>
>>     Is there any problem with the multithread programme running with linux-user
>>     qemu mode? would the situation be improved with  the MTTCG feature?
>>
>>     We need to use linux-user mode qemu to run multithread app, but there seems
>>     to be many problem.
>
> For user mode, we should still follow the rules for MTTCG, but we do
> not. Instead we take it on faith that they have been and execute the
> code in parallel anyway.  This faith is often misplaced and it does
> mean that unsupported targets execute user mode code incorrectly.

Certainly for user-mode, once the appropriate changes are made to atomic
and barrier support instructions, stability should improve (modulo
unsupported/buggy syscall translation).

System mode emulation usually requires a bit more work: system-wide
instructions, usually ones which trigger interrupts and TLB flushes, need
care to make sure they are done in a safe way.

--
Alex Bennée


* [Qemu-devel] [PATCH v2 00/45] tcg: support for multiple TCG contexts
@ 2017-07-16 20:03 Emilio G. Cota
  0 siblings, 0 replies; 4+ messages in thread
From: Emilio G. Cota @ 2017-07-16 20:03 UTC
  To: qemu-devel; +Cc: Richard Henderson

v1:
  https://lists.gnu.org/archive/html/qemu-devel/2017-07/msg02059.html

Thanks all for your comments on v1.

This v2 patchset applies on top of stefanha's tracing tree (9212a18e371):
  https://github.com/stefanha/qemu/tree/tracing
That tree has some changes (per-vcpu TCG tracing) that would conflict
with many of the patches in this series. So I based the series on that
tree.

To ease review/testing, you can pull this series from:
  https://github.com/cota/qemu/tree/multi-tcg-v2

Note: patches 1 and 2 are already on master, but not yet on stefanha's tree.
So I'm leaving them here.

Note: I cannot even compile-test _WIN32 bits, help appreciated! See
patches 40-41.

Changes from v1:
- Added R-b tags
- Added comments to the commit logs about the atomic_set/read thing.
- Renamed have_tb_lock to acquired_tb_lock in tb_find
- Merged tb->invalid into tb->cflags
  - Cleaned up the checking of the tb->invalid field
- Consolidated TB lookups into a common tb_lookup__cpu_state function
  - Removed addr argument from lookup_tb_ptr
- Defined CF_PARALLEL, and used it for hashing. Incorporated Richard's
  feedback on the previous patch, including:
  - Removed use of parallel_cpus from target/*
  - Removed use of parallel_cpus from tcg/*
  - Moved down the exclusive region in cpu_exec_step_atomic
    - Brought cpu_exec_step into cpu_exec_step_atomic
- Defined and used DEBUG_*_GATE in translate-all
  - Introduced TB_PAGE_ADDR_FMT
- Defined struct tb_tc to bring together tb->tc_{ptr,search,size}
  - Used the struct for g_tree comparisons
  - The struct now has a 4-byte hole, but really given the added
    tb->trace_vcpu_dstate field (a u32) we probably can just live
    with it.
- Renamed tb_free to tb_remove
- Use size_t everywhere when counting TBs and code size
- Moved tci_regs to tcg_qemu_tb_exec's stack
- Defined tcg_init_ctx and made tcg_ctx a pointer
- Switched to dynamic allocation of TCG optimizer globals
  - Folded them into TCGContext
- Introduced an array of *tcg_ctx's (instead of a list) to keep track
  of TCGContexts.
- Wrapped a macro with do..while(0) in the TCGProf patch to please checkpatch
- Moved qemu_real_host_page_size/mask to osdep
  - Introduced qemu_mprotect_rwx/none in osdep
    - Used these helpers instead of local inlines in translate-all.c
- TCG regions:
  - tcg_region_init takes a desired number of regions, not a desired
    region size.
      - TCG region sizes are a multiple of the host's page size
  - Add a guard page at the end of each region
    - Do not allocate a guard page when allocating code_gen_buffer
  - switched tcg_region_alloc to positive logic (return true on error)
  - Document non-trivial functions (N.B. some doc added in the region
    patch, but quite a bit more is added in the "multiple TCG context"
    patch)
  - Simplified initialization: child TCG threads just have to call
    tcg_register_thread(). All other initialization is done by the
    parent thread.
  - Changed the place at which we call tcg_region_init in softmmu,
    so that we can check whether mttcg is enabled when deciding
    how many regions to have.
    - Use 1 region when !mttcg.
- Dropped the "do not hold tb_lock" patch for now; the patchset is
  already too long, and to do a good job there takes more than just
  one patch. I have already started working on that though, based
  on the feedback from v1.
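
The region layout described in the "TCG regions" items above (a requested
number of regions, sizes that are a multiple of the host page size, one
guard page at the end of each region) can be sketched with plain
arithmetic.  This is a hypothetical illustration, not the code from the
series: the names are invented, the buffer is assumed to start on a page
boundary, and the real series may split the buffer differently.

```c
#include <stddef.h>

/* Hypothetical description of how a code buffer is split into regions. */
typedef struct {
    size_t stride;   /* bytes from one region start to the next */
    size_t usable;   /* bytes of each region available for code */
    size_t count;    /* number of regions */
} DemoRegionLayout;

/*
 * Split buf_size bytes into n_regions equal regions of whole pages,
 * reserving the last page of each region as a guard page.
 * Returns 0 on success, -1 if the request cannot be satisfied.
 */
int demo_region_layout(size_t buf_size, size_t page_size, size_t n_regions,
                       DemoRegionLayout *out)
{
    if (page_size == 0 || n_regions == 0) {
        return -1;
    }
    size_t pages_per_region = (buf_size / page_size) / n_regions;
    if (pages_per_region < 2) {
        return -1;   /* need at least one usable page plus the guard page */
    }
    out->stride = pages_per_region * page_size;
    out->usable = out->stride - page_size;
    out->count = n_regions;
    return 0;
}

/* Offset of region i from the start of the buffer. */
size_t demo_region_offset(const DemoRegionLayout *l, size_t i)
{
    return i * l->stride;
}

/* Offset of the guard page that terminates region i. */
size_t demo_guard_offset(const DemoRegionLayout *l, size_t i)
{
    return demo_region_offset(l, i) + l->usable;
}
```

With a 1 MiB buffer, 4 KiB pages and 8 regions, each region spans 32
pages (131072 bytes), of which 31 pages (126976 bytes) are usable.  In a
real implementation the guard pages would then be made inaccessible, for
instance with a helper along the lines of the qemu_mprotect_none
mentioned above.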

Thanks,

		Emilio

