Re: [QUESTION] tcg: Is concurrent storing and code translation of the same code page considered as racing in MTTCG?

From: Richard Henderson @ 2021-01-31 23:01 UTC
  To: Liren Wei, qemu-devel; +Cc: Paolo Bonzini, Alex Bennée, Peter Maydell

On 1/31/21 1:38 AM, Liren Wei wrote:
> However, similar to the situation described in:
> https://lists.nongnu.org/archive/html/qemu-devel/2018-02/msg02529.html
> 
> When we have 2 vCPUs with one of them writing to the code page while
> the other just translated some code within that same page, the following
> situation might happen:
> 
>    vCPU thread 1 - writing      vCPU thread 2 - translating
>    -----------------------      -----------------------
>    TLB check -> slow path
>      notdirty_write()
>        set dirty flag
>      write to RAM
>                                 tb_gen_code()
>                                   tb_page_add()
>                                     tlb_protect_code()
> 
>    TLB check -> fast path
>                                       set TLB_NOTDIRTY
>      write to RAM
>                                 executing unmodified code for this time
>                                 and maybe also for the next time, never
>                                 re-translate modified TBs.
> 
> 
> My question is:
>   Should the situation described above be considered a bug, or
>   intended behavior for QEMU (so that it is the programmer's fault
>   for not flushing the icache after modifying a shared code page)?

Yes, this is a bug, because we are trying to support e.g. x86 which does not
require an icache flush.
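
As a rough illustration of why the quoted interleaving leaves a stale TB
behind, here is a toy, single-threaded replay of it in C. This is not QEMU
code: struct toy_page and every toy_* function are made up for the sketch, the
"TLB" is a single flag, and the first slow-path store from the diagram is
omitted for brevity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model only: one guest code page and at most one cached translation. */
struct toy_page {
    uint8_t ram;         /* the guest code byte */
    bool    notdirty;    /* stands in for TLB_NOTDIRTY on this page */
    bool    tb_valid;    /* a translation of this page is cached */
    uint8_t tb_snapshot; /* the byte that translation was built from */
};

/* vCPU 1, step 1: the TLB check. Returns true when the fast path is taken. */
static bool toy_store_check(const struct toy_page *p)
{
    return !p->notdirty;
}

/* vCPU 1, step 2: the write itself. Only the slow path invalidates. */
static void toy_store_commit(struct toy_page *p, bool fast, uint8_t val)
{
    if (!fast) {
        p->tb_valid = false;
        p->notdirty = false;
    }
    p->ram = val;
}

/* vCPU 2, step 1: read the guest bytes and build the translation. */
static void toy_translate_read(struct toy_page *p)
{
    p->tb_snapshot = p->ram;
    p->tb_valid = true;
}

/* vCPU 2, step 2: protect the page, after the bytes were already read. */
static void toy_translate_protect(struct toy_page *p)
{
    assert(p->tb_valid); /* in this ordering, protection follows translation */
    p->notdirty = true;
}

/* Replays the reported interleaving; true means a stale translation survives. */
static bool toy_replay_race(void)
{
    struct toy_page p = { .ram = 0x90 };

    toy_translate_read(&p);            /* vCPU 2: tb_gen_code() reads bytes */
    bool fast = toy_store_check(&p);   /* vCPU 1: TLB check -> fast path    */
    toy_translate_protect(&p);         /* vCPU 2: set TLB_NOTDIRTY          */
    toy_store_commit(&p, fast, 0xcc);  /* vCPU 1: write to RAM              */

    return p.tb_valid && p.tb_snapshot != p.ram;
}
```

In this model toy_replay_race() reports a stale translation, because the store
passed its check before the page was protected and so never took the
invalidating slow path.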

I think the page lock, the TLB_NOTDIRTY setting, and a possible sync on the
setting all need to happen before the bytes are read during translation.
Otherwise we don't catch the case above, nor do we catch this one:

	CPU1			CPU2
	------------------	--------------------------
	TLB check -> fast
				tb_gen_code() -> all of it
	  write to ram
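
A toy sketch of the proposed ordering (set the flag, sync, and only then read
the bytes) might look like the following. It is a single-threaded model under
loud assumptions: struct model_page and the model_* functions are hypothetical,
and "sync" is modeled simply as letting any store that already passed the TLB
check land before translation reads the page. What the sync would actually be
in QEMU is left open here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model only: one guest code page, one possible in-flight store. */
struct model_page {
    uint8_t ram;          /* the guest code byte */
    bool    notdirty;     /* stands in for TLB_NOTDIRTY */
    bool    tb_valid;     /* a translation of this page is cached */
    uint8_t tb_snapshot;  /* the byte that translation was built from */
    bool    inflight;     /* a store passed the check but has not written yet */
    uint8_t inflight_val;
};

/* CPU1, step 1: TLB check. Returns true if the fast path was taken. */
static bool model_store_begin(struct model_page *p, uint8_t val)
{
    if (p->notdirty) {
        /* Slow path: invalidate the translation, unprotect, write. */
        p->tb_valid = false;
        p->notdirty = false;
        p->ram = val;
        return false;
    }
    p->inflight = true;
    p->inflight_val = val;
    return true;
}

/* CPU1, step 2: the in-flight fast-path store lands in RAM. */
static void model_store_finish(struct model_page *p)
{
    if (p->inflight) {
        p->ram = p->inflight_val;
        p->inflight = false;
    }
}

/* CPU2: proposed order, protect first, sync, and only then read the bytes. */
static void model_translate(struct model_page *p)
{
    p->notdirty = true;       /* lock + set TLB_NOTDIRTY */
    model_store_finish(p);    /* "sync": drain stores already past the check */
    assert(!p->inflight);
    p->tb_snapshot = p->ram;  /* bytes are read only after the sync */
    p->tb_valid = true;
}

static bool model_tb_is_stale(const struct model_page *p)
{
    return p->tb_valid && p->tb_snapshot != p->ram;
}

/* The second diagram: check on CPU1, whole tb_gen_code() on CPU2, then write. */
static bool model_store_first_is_stale(void)
{
    struct model_page p = { .ram = 0x90 };
    (void)model_store_begin(&p, 0xcc);
    model_translate(&p);
    model_store_finish(&p);
    return model_tb_is_stale(&p);
}

/* The other order: translation completes before the store's TLB check. */
static bool model_translate_first_is_stale(void)
{
    struct model_page p = { .ram = 0x90 };
    model_translate(&p);
    (void)model_store_begin(&p, 0xcc);
    model_store_finish(&p);
    return model_tb_is_stale(&p);
}
```

In this model neither order leaves a stale translation: either the store is
drained before the bytes are read, or it sees the flag and takes the slow path.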

Also because of x86 (and other architectures in which a single instruction can
span a page boundary), I think this lock+set+sync sequence needs to happen on
demand, in something called from the set of functions defined in
include/exec/translator.h.
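
To make "on demand" concrete, here is a minimal sketch in which a byte-fetch
helper protects each page the first time translation reads from it, so an
instruction that straddles a page boundary ends up protecting both pages. The
names (struct demo_mem, demo_translator_ldub, the 16-byte page size) are
invented for the example and only loosely modeled on the load helpers in
translator.h. The lock and sync steps are reduced to a comment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_PAGE_SIZE 16u  /* tiny page size, for illustration only */
#define DEMO_NPAGES    4u

/* Toy guest memory with one "notdirty" flag per page. */
struct demo_mem {
    uint8_t ram[DEMO_PAGE_SIZE * DEMO_NPAGES];
    bool    notdirty[DEMO_NPAGES];
};

/*
 * Hypothetical stand-in for a translator load helper: make sure the page
 * is protected before the byte is read, whichever page the byte is on.
 */
static uint8_t demo_translator_ldub(struct demo_mem *m, size_t addr)
{
    assert(addr < sizeof(m->ram));
    size_t page = addr / DEMO_PAGE_SIZE;

    if (!m->notdirty[page]) {
        /* The real lock + set TLB_NOTDIRTY + sync sequence would go here. */
        m->notdirty[page] = true;
    }
    return m->ram[addr];
}

/*
 * Fetch a 'len'-byte instruction at 'addr' from fresh memory and return
 * how many pages ended up protected as a side effect.
 */
static unsigned demo_pages_protected_by_fetch(size_t addr, size_t len)
{
    struct demo_mem m = { { 0 }, { false } };
    unsigned count = 0;

    for (size_t i = 0; i < len; i++) {
        (void)demo_translator_ldub(&m, addr + i);
    }
    for (size_t pg = 0; pg < DEMO_NPAGES; pg++) {
        count += m.notdirty[pg] ? 1u : 0u;
    }
    return count;
}
```

With these toy sizes, a 4-byte fetch at address 14 touches pages 0 and 1 and
protects both, while the same fetch at address 0 protects only page 0.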

That also means that any target/cpu/ which has not been converted to use that
interface remains broken, and should be converted or deprecated.

Are you planning to work on this?

r~
