All of lore.kernel.org
 help / color / mirror / Atom feed
* Re: [QUESTION] tcg: Is concurrent storing and code translation of the same code page considered as racing in MTTCG?
       [not found] <60169742.1c69fb81.90ae8.cdc6SMTPIN_ADDED_BROKEN@mx.google.com>
@ 2021-01-31 23:01 ` Richard Henderson
  2021-02-01 16:59   ` Liren Wei
       [not found]   ` <60183365.1c69fb81.8afce.3d7bSMTPIN_ADDED_BROKEN@mx.google.com>
  0 siblings, 2 replies; 3+ messages in thread
From: Richard Henderson @ 2021-01-31 23:01 UTC (permalink / raw)
  To: Liren Wei, qemu-devel; +Cc: Paolo Bonzini, Alex Bennée, Peter Maydell

On 1/31/21 1:38 AM, Liren Wei wrote:
> However, similar to the situation described in:
> https://lists.nongnu.org/archive/html/qemu-devel/2018-02/msg02529.html
> 
> When we have 2 vCPUs with one of them writing to the code page while
> the other just translated some code within that same page, the following
> situation might happen:
> 
>    vCPU thread 1 - writing      vCPU thread 2 - translating
>    -----------------------      -----------------------
>    TLB check -> slow path
>      notdirty_write()
>        set dirty flag
>      write to RAM
>                                 tb_gen_code()
>                                   tb_page_add()
>                                     tlb_protect_code()
> 
>    TLB check -> fast path
>                                       set TLB_NOTDIRTY
>      write to RAM
> executing unmodified code for this time
>                                 and maybe also for the next time, never
>                                 re-translate modified TBs.
> 
> 
> My question is:
>   Should the situation described above be considered as a bug or,
>   an intended behavior for QEMU (, so it's the programmer's fault
>   for not flushing the icache after modifying shared code page)?

Yes, this is a bug, because we are trying to support e.g. x86 which does not
require an icache flush.

I think the page lock, the TLB_NOTDIRTY setting, and a possible sync on the
setting, needs to happen before the bytes are read during translation.
Otherwise we don't catch the case above, nor do we catch

	CPU1			CPU2
	------------------	--------------------------
	TLB check -> fast
				tb_gen_code() -> all of it
	  write to ram

Also because of x86 (and other architectures in which a single instruction can
span a page boundary), I think this lock+set+sync sequence needs to happen on
demand in something called from the function set defined in
include/exec/translator.h

That also means that any target/cpu/ which has not been converted to use that
interface remains broken, and should be converted or deprecated.

Are you planning to work on this?


r~


^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [QUESTION] tcg: Is concurrent storing and code translation of the same code page considered as racing in MTTCG?
  2021-01-31 23:01 ` [QUESTION] tcg: Is concurrent storing and code translation of the same code page considered as racing in MTTCG? Richard Henderson
@ 2021-02-01 16:59   ` Liren Wei
       [not found]   ` <60183365.1c69fb81.8afce.3d7bSMTPIN_ADDED_BROKEN@mx.google.com>
  1 sibling, 0 replies; 3+ messages in thread
From: Liren Wei @ 2021-02-01 16:59 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel

On 2/1/21 7:01 AM, Richard Henderson wrote:
 > Yes, this is a bug, because we are trying to support e.g. x86 which 
does not
 > require an icache flush.

That is too bad :(

I know nothing about the modern hardware, it's really hard to imagine what
is done in CPU to maintain the coherence when this kind of cross-modifying
scenario happens.

 > I think the page lock, the TLB_NOTDIRTY setting, and a possible sync 
on the
 > setting, needs to happen before the bytes are read during translation.
 > Otherwise we don't catch the case above, nor do we catch
 >
 >     CPU1                  CPU2
 >     ------------------    --------------------------
 >     TLB check -> fast
 >                           tb_gen_code() -> all of it
 >       write to ram
 >
 > Also because of x86 (and other architectures in which a single 
instruction can
 > span a page boundary), I think this lock+set+sync sequence needs to 
happen on
 > demand in something called from the function set defined in
 > include/exec/translator.h
 >
 > That also means that any target/cpu/ which has not been converted to 
use that
 > interface remains broken, and should be converted or deprecated.

I failed to figure out what do you mean by lock+set+sync, in particular:
   - What is the use of the page lock here (Is this the lock of PageDesc?)
   - Is the "possible sync" means some kind of wait so that TLB_NOTDIRTY is
     definitely able to catch further "write to ram"?

 > Are you planning to work on this?

No, sorry for that.. Neither do I see myself qualified enough to do this 
job,
nor do I have enough time for it. But I did considered the following:

Since "TLB check" and "fast path write to ram" are separate steps, it seems
to me that CPU1 can always (in the extreme case) enter the fast path before
CPU2 starts doing translation, and then write to already-translated code
of CPU2 without informing it.

Therefore maybe we can mark the RAM backing page in QEMU's page table as
non-writable at an early stage in tb_gen_code() using the ability of the
underlying OS, register a signal handler to intercept the first "write 
to ram"
happened, restore the page to be writable, and eventually inform the
translating thread to do something about it. (e.g. queue_work_on_cpu() and
cpu_exit() the translating vCPU so that it has chance to invalidate the TB
after possibly running that TB for several times)

But all these sounds very intrusive to the existing code base, and I'm not
sure whether it make sense...

Thanks
Liren Wei




^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [QUESTION] tcg: Is concurrent storing and code translation of the same code page considered as racing in MTTCG?
       [not found]   ` <60183365.1c69fb81.8afce.3d7bSMTPIN_ADDED_BROKEN@mx.google.com>
@ 2021-02-01 18:27     ` Richard Henderson
  0 siblings, 0 replies; 3+ messages in thread
From: Richard Henderson @ 2021-02-01 18:27 UTC (permalink / raw)
  To: Liren Wei, qemu-devel

On 2/1/21 6:59 AM, Liren Wei wrote:
> On 2/1/21 7:01 AM, Richard Henderson wrote:
>> Yes, this is a bug, because we are trying to support e.g. x86 which does not
>> require an icache flush.
> 
> That is too bad :(
> 
> I know nothing about the modern hardware, it's really hard to imagine what
> is done in CPU to maintain the coherence when this kind of cross-modifying
> scenario happens.

Indeed.  Complicated cache shootdown stuff, no doubt.  It also helps that the
cpu is decoding one instruction at a time, whereas qemu is decoding many
instructions.

>> I think the page lock, the TLB_NOTDIRTY setting, and a possible sync on the
>> setting, needs to happen before the bytes are read during translation.
>> Otherwise we don't catch the case above, nor do we catch
>>
>>     CPU1                  CPU2
>>     ------------------    --------------------------
>>     TLB check -> fast
>>                           tb_gen_code() -> all of it
>>       write to ram
>>
>> Also because of x86 (and other architectures in which a single instruction can
>> span a page boundary), I think this lock+set+sync sequence needs to happen on
>> demand in something called from the function set defined in
>> include/exec/translator.h
>>
>> That also means that any target/cpu/ which has not been converted to use that
>> interface remains broken, and should be converted or deprecated.
> 
> I failed to figure out what do you mean by lock+set+sync,

I used "lock+set+sync" as shorthand for the sequence I described in the first
paragraph above.

> in particular:
>   - What is the use of the page lock here (Is this the lock of PageDesc?)

Yes, this would be the PageDesc lock.  It would prevent TLB_NOTDIRTY from being
removed by the slow-path write until any translation is complete.

>   - Is the "possible sync" means some kind of wait so that TLB_NOTDIRTY is
>     definitely able to catch further "write to ram"?

That's what the sync is for -- to make sure that no other cpu can have seen a
dirty page and not be finished with the write.

> Therefore maybe we can mark the RAM backing page in QEMU's page table as
> non-writable at an early stage in tb_gen_code() using the ability of the
> underlying OS...

It's very expensive to change page tables.  It's also complicated to capture
and re-start with signal handlers.  We do that sort of thing for
qemu-linux-user, where there *is* no slow path and we have no other option.

I think it would be much easier to reason with locks.


r~



^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2021-02-01 18:28 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <60169742.1c69fb81.90ae8.cdc6SMTPIN_ADDED_BROKEN@mx.google.com>
2021-01-31 23:01 ` [QUESTION] tcg: Is concurrent storing and code translation of the same code page considered as racing in MTTCG? Richard Henderson
2021-02-01 16:59   ` Liren Wei
     [not found]   ` <60183365.1c69fb81.8afce.3d7bSMTPIN_ADDED_BROKEN@mx.google.com>
2021-02-01 18:27     ` Richard Henderson

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.