* [Qemu-devel] Making cputlb.c operations safe for MTTCG
@ 2016-07-26 12:09 Alex Bennée
  2016-07-26 14:03 ` [Qemu-devel] [PATCH] cputlb: make tlb_reset_dirty " Alex Bennée
  2016-08-01 14:54 ` [Qemu-devel] Making cputlb.c operations " Paolo Bonzini
  0 siblings, 2 replies; 14+ messages in thread
From: Alex Bennée @ 2016-07-26 12:09 UTC (permalink / raw)
  To: MTTCG Devel, QEMU Developers, Paolo Bonzini, Richard Henderson,
	Sergey Fedorov, Emilio G. Cota, Frederic Konrad, a.rigo

Hi,

While I've been re-spinning the base patches, I've brought forward some
of the async work for cputlb done in the ARM enabling set. Thanks to
Sergey's consolidation work we have a robust mechanism for halting all
vCPUs to get work done if we need to. The cputlb changes are actually
independent of any architecture-specific fixes, so it makes sense to
fix them all in the base set. This works well for the various
tlb_flush-style operations.
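
(For reference, the deferral pattern looks roughly like this -- just a
sketch, the wrapper name is made up and the helpers in the actual
series may differ slightly:)

    /* Run the flush on the target vCPU's own thread instead of
     * touching its TLB from a foreign thread.
     */
    static void do_tlb_flush_work(void *opaque)   /* hypothetical */
    {
        CPUState *cpu = opaque;
        tlb_flush(cpu, 1);  /* safe: runs in cpu's own context */
    }

    CPU_FOREACH(cpu) {
        if (cpu != current_cpu) {
            async_run_on_cpu(cpu, do_tlb_flush_work, cpu);
        }
    }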

Going through cputlb, though, I have come across one use case where
deferring the work until later seems like a potential bottleneck and
also introduces a potential race.

When we do code generation, we use tlb_protect_code() to set the region
as DIRTY_MEMORY_CODE and update the SoftMMU TLB flags to force the slow
path if anything tries to write to areas which have generated code
blocks associated with them. This operation is intrinsically cross-vCPU,
as any vCPU writing to the code needs to be trapped:

    static void tlb_reset_dirty_range_all(ram_addr_t start, ram_addr_t length)
    {
        CPUState *cpu;
        ram_addr_t start1;
        RAMBlock *block;
        ram_addr_t end;

        end = TARGET_PAGE_ALIGN(start + length);
        start &= TARGET_PAGE_MASK;

        rcu_read_lock();
        block = qemu_get_ram_block(start);
        assert(block == qemu_get_ram_block(end - 1));
        start1 = (uintptr_t)ramblock_ptr(block, start - block->offset);
        CPU_FOREACH(cpu) {
            tlb_reset_dirty(cpu, start1, length);
        }
        rcu_read_unlock();
    }

If we defer the updating of the other vCPUs until later we introduce a
potential race which, while I'm sure it would be tricky to hit, could
result in, for example, a guest probe not being picked up if it were
placed just after code generation.

As the eventual operation is the setting of a flag I'm wondering if we
can simply use atomic primitives to ensure we don't corrupt the lookup
address when setting the TLB_NOTDIRTY flag?

Of course the TLB structure itself covers a number of values but AFAICT
erroneously setting TLB_NOTDIRTY on an entry that gets updated to a new
address wouldn't cause a problem except triggering an additional
slow-path write. If we are careful about the filling of the TLB entries
can we be sure we are always safe?
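
Roughly what I have in mind for the flag update itself (an untested
sketch using the existing atomic_* helpers):

    /* Install TLB_NOTDIRTY only if the entry still holds the address
     * we originally read; if the owning vCPU has refilled the entry
     * in the meantime, leave the fresh entry alone.
     */
    uintptr_t orig = atomic_read(&tlb_entry->addr_write);
    atomic_cmpxchg(&tlb_entry->addr_write, orig, orig | TLB_NOTDIRTY);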

I hope to have some patches to show by the end of the week.

--
Alex Bennée

* [Qemu-devel] [PATCH] cputlb: make tlb_reset_dirty safe for MTTCG
  2016-07-26 12:09 [Qemu-devel] Making cputlb.c operations safe for MTTCG Alex Bennée
@ 2016-07-26 14:03 ` Alex Bennée
  2016-08-01 14:54 ` [Qemu-devel] Making cputlb.c operations " Paolo Bonzini
  1 sibling, 0 replies; 14+ messages in thread
From: Alex Bennée @ 2016-07-26 14:03 UTC (permalink / raw)
  To: MTTCG Devel, QEMU Developers, Paolo Bonzini, Richard Henderson,
	Sergey Fedorov, Emilio G. Cota, Frederic Konrad, a.rigo


The main use case for tlb_reset_dirty is to set the TLB_NOTDIRTY flags
in TLB entries to force the slow-path on writes. This is used to mark
page ranges containing code which has been translated so it can be
invalidated if written to. To do this safely we need to ensure that the
TLB entries in question are updated for all vCPUs before we attempt to
run the code; otherwise a race could be introduced.

To achieve this we atomically set the flag in tlb_reset_dirty_range and
take care when setting it when the TLB entry is filled.

The helper function is made static as it isn't used outside of cputlb.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
---
 cputlb.c              | 57 ++++++++++++++++++++++++++++++++++++---------------
 include/exec/cputlb.h |  2 --
 2 files changed, 40 insertions(+), 19 deletions(-)

diff --git a/cputlb.c b/cputlb.c
index e0d5bdd..e7b6a08 100644
--- a/cputlb.c
+++ b/cputlb.c
@@ -275,32 +275,52 @@ void tlb_unprotect_code(ram_addr_t ram_addr)
     cpu_physical_memory_set_dirty_flag(ram_addr, DIRTY_MEMORY_CODE);
 }

-static bool tlb_is_dirty_ram(CPUTLBEntry *tlbe)
-{
-    return (tlbe->addr_write & (TLB_INVALID_MASK|TLB_MMIO|TLB_NOTDIRTY)) == 0;
-}

-void tlb_reset_dirty_range(CPUTLBEntry *tlb_entry, uintptr_t start,
+/*
+ * Dirty write flag handling
+ *
+ * When the TCG code writes to a location it looks up the address in
+ * the TLB and uses that data to compute the final address. If any of
+ * the lower bits of the address are set then the slow path is forced.
+ * There are a number of reasons to do this but for normal RAM the
+ * most usual is detecting writes to code regions which may invalidate
+ * generated code.
+ *
+ * Because we want other vCPUs to respond to changes straight away we
+ * update the te->addr_write field atomically. If the TLB entry has
+ * been changed by the vCPU in the meantime we skip the update.
+ */
+
+static void tlb_reset_dirty_range(CPUTLBEntry *tlb_entry, uintptr_t start,
                            uintptr_t length)
 {
-    uintptr_t addr;
+    /* paired with atomic_mb_set in tlb_set_page_with_attrs */
+    uintptr_t orig_addr = atomic_mb_read(&tlb_entry->addr_write);
+    uintptr_t addr = orig_addr;

-    if (tlb_is_dirty_ram(tlb_entry)) {
-        addr = (tlb_entry->addr_write & TARGET_PAGE_MASK) + tlb_entry->addend;
+    if ((addr & (TLB_INVALID_MASK|TLB_MMIO|TLB_NOTDIRTY)) == 0) {
+        addr &= TARGET_PAGE_MASK;
+        addr += atomic_read(&tlb_entry->addend);
         if ((addr - start) < length) {
-            tlb_entry->addr_write |= TLB_NOTDIRTY;
+            uintptr_t notdirty_addr = orig_addr | TLB_NOTDIRTY;
+            if (!atomic_bool_cmpxchg(&tlb_entry->addr_write, orig_addr, notdirty_addr)) {
+                fprintf(stderr, "%s: raced setting the flag\n", __func__);
+            }
         }
     }
 }

+/* This is a cross vCPU call (i.e. another vCPU resetting the flags of
+ * the target vCPU). As such care needs to be taken that we don't
+ * dangerously race with another vCPU update. The only thing actually
+ * updated is the target TLB entry ->addr_write flags.
+ */
 void tlb_reset_dirty(CPUState *cpu, ram_addr_t start1, ram_addr_t length)
 {
     CPUArchState *env;

     int mmu_idx;

-    assert_cpu_is_self(cpu);
-
     env = cpu->env_ptr;
     for (mmu_idx = 0; mmu_idx < NB_MMU_MODES; mmu_idx++) {
         unsigned int i;
@@ -386,7 +406,7 @@ void tlb_set_page_with_attrs(CPUState *cpu, target_ulong vaddr,
     MemoryRegionSection *section;
     unsigned int index;
     target_ulong address;
-    target_ulong code_address;
+    target_ulong code_address, write_address;
     uintptr_t addend;
     CPUTLBEntry *te;
     hwaddr iotlb, xlat, sz;
@@ -443,21 +463,24 @@ void tlb_set_page_with_attrs(CPUState *cpu, target_ulong vaddr,
     } else {
         te->addr_code = -1;
     }
+
+    write_address = -1;
     if (prot & PAGE_WRITE) {
         if ((memory_region_is_ram(section->mr) && section->readonly)
             || memory_region_is_romd(section->mr)) {
             /* Write access calls the I/O callback.  */
-            te->addr_write = address | TLB_MMIO;
+            write_address = address | TLB_MMIO;
         } else if (memory_region_is_ram(section->mr)
                    && cpu_physical_memory_is_clean(
                         memory_region_get_ram_addr(section->mr) + xlat)) {
-            te->addr_write = address | TLB_NOTDIRTY;
+            write_address = address | TLB_NOTDIRTY;
         } else {
-            te->addr_write = address;
+            write_address = address;
         }
-    } else {
-        te->addr_write = -1;
     }
+
+    /* Pairs with flag setting in tlb_reset_dirty_range */
+    atomic_mb_set(&te->addr_write, write_address);
 }

 /* Add a new TLB entry, but without specifying the memory
diff --git a/include/exec/cputlb.h b/include/exec/cputlb.h
index d454c00..3f94178 100644
--- a/include/exec/cputlb.h
+++ b/include/exec/cputlb.h
@@ -23,8 +23,6 @@
 /* cputlb.c */
 void tlb_protect_code(ram_addr_t ram_addr);
 void tlb_unprotect_code(ram_addr_t ram_addr);
-void tlb_reset_dirty_range(CPUTLBEntry *tlb_entry, uintptr_t start,
-                           uintptr_t length);
 extern int tlb_flush_count;

 #endif
--
2.7.4

* Re: [Qemu-devel] Making cputlb.c operations safe for MTTCG
  2016-07-26 12:09 [Qemu-devel] Making cputlb.c operations safe for MTTCG Alex Bennée
  2016-07-26 14:03 ` [Qemu-devel] [PATCH] cputlb: make tlb_reset_dirty " Alex Bennée
@ 2016-08-01 14:54 ` Paolo Bonzini
  2016-08-02  6:37   ` Alex Bennée
  1 sibling, 1 reply; 14+ messages in thread
From: Paolo Bonzini @ 2016-08-01 14:54 UTC (permalink / raw)
  To: Alex Bennée, MTTCG Devel, QEMU Developers,
	Richard Henderson, Sergey Fedorov, Emilio G. Cota,
	Frederic Konrad, a.rigo



On 26/07/2016 14:09, Alex Bennée wrote:
> 
> As the eventual operation is the setting of a flag I'm wondering if we
> can simply use atomic primitives to ensure we don't corrupt the lookup
> address when setting the TLB_NOTDIRTY flag?

In theory tlb_reset_dirty and tlb_set_dirty1 can use atomic_* on
tlb_entry->addr_write, but careful because:

- you need to order reads and writes to tlb_entry->addr_write and
tlb_entry->addend properly

- addr_write cannot be written atomically for 32-bit host/64-bit target.
 Probably you can use something like

    union {
        target_ulong addr_write;
#if TARGET_LONG_BITS == 32
        struct { uint32_t lo_and_flags; } addr_write_w;
#elif defined HOST_WORDS_BIGENDIAN
        struct { uint32_t hi, lo_and_flags; } addr_write_w;
#else
        struct { uint32_t lo_and_flags, hi; } addr_write_w;
#endif
    };

IIRC "foreign" accesses only set TLB_NOTDIRTY, so they can use a cmpxchg
on lo_and_flags (worst case you end up with an unnecessary call to
notdirty_mem_write).
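
E.g. something like this (untested sketch, using the union above):

    /* The flag bits live in the low word, so a 32-bit cmpxchg is
     * enough; losing the race merely leaves a spurious TLB_NOTDIRTY
     * and costs one extra notdirty_mem_write call.
     */
    uint32_t old = atomic_read(&tlb_entry->addr_write_w.lo_and_flags);
    atomic_cmpxchg(&tlb_entry->addr_write_w.lo_and_flags,
                   old, old | TLB_NOTDIRTY);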

- When removing TLB_NOTDIRTY from a TLB entry
(notdirty_mem_write/tlb_unprotect_code), as well as filling in a TLB
entry without TLB_NOTDIRTY (tlb_set_page_with_attrs) you need to protect
from a concurrent tb_alloc_page and hence take the tb_lock.

In particular:

- in notdirty_mem_write, care must be put in the ordering of
tb_invalidate_phys_page_fast (which itself calls tlb_unprotect_code and
takes the tb_lock in tb_invalidate_phys_page_range) and tlb_set_dirty.
At least it seems to me that the call to tb_invalidate_phys_page_fast
should be after the write, but that's not all.  Perhaps merge this part
of notdirty_mem_write:

    /* Set both VGA and migration bits for simplicity and to remove
     * the notdirty callback faster.
     */
    cpu_physical_memory_set_dirty_range(ram_addr, size,
                                        DIRTY_CLIENTS_NOCODE);
    /* we remove the notdirty callback only if the code has been
       flushed */
    if (!cpu_physical_memory_is_clean(ram_addr)) {
        tlb_set_dirty(current_cpu, current_cpu->mem_io_vaddr);
    }

into tlb_unprotect_code?!?  Or perhaps do tlb_set_dirty _first_, and
then add back the callback if cpu_physical_memory_is_clean(ram_addr) is
true.  I haven't put much thought into it.

- tlb_set_page_with_attrs is also hard-ish to get right, but perhaps the
same idea of adding the callback last would work:

    /* First set addr_write so that concurrent tlb_reset_dirty_range
     * finds a match.
     */
    te->addr_write = address;
    if (memory_region_is_ram(section->mr)) {
        if (cpu_physical_memory_is_clean(
                     memory_region_get_ram_addr(section->mr) + xlat)) {
            te->addr_write = address | TLB_NOTDIRTY;
        }
    }

Paolo

> Of course the TLB structure itself covers a number of values but AFAICT
> erroneously setting TLB_NOTDIRTY on an entry that gets updated to a new
> address wouldn't cause a problem except triggering an additional
> slow-path write. If we are careful about the filling of the TLB entries
> can we be sure we are always safe?

* Re: [Qemu-devel] Making cputlb.c operations safe for MTTCG
  2016-08-01 14:54 ` [Qemu-devel] Making cputlb.c operations " Paolo Bonzini
@ 2016-08-02  6:37   ` Alex Bennée
  2016-08-02 10:26     ` Paolo Bonzini
                       ` (2 more replies)
  0 siblings, 3 replies; 14+ messages in thread
From: Alex Bennée @ 2016-08-02  6:37 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: MTTCG Devel, QEMU Developers, Richard Henderson, Sergey Fedorov,
	Emilio G. Cota, Frederic Konrad, a.rigo


Paolo Bonzini <pbonzini@redhat.com> writes:

> On 26/07/2016 14:09, Alex Bennée wrote:
>>
>> As the eventual operation is the setting of a flag I'm wondering if we
>> can simply use atomic primitives to ensure we don't corrupt the lookup
>> address when setting the TLB_NOTDIRTY flag?
>
> In theory tlb_reset_dirty and tlb_set_dirty1 can use atomic_* on
> tlb_entry->addr_write, but careful because:
>
> - you need to order reads and writes to tlb_entry->addr_write and
> tlb_entry->addend properly
>
> - addr_write cannot be written atomically for 32-bit host/64-bit target.
>  Probably you can use something like
>
>     union {
>         target_ulong addr_write;
> #if TARGET_LONG_BITS == 32
>         struct { uint32_t lo_and_flags; } addr_write_w;
> #elif defined HOST_WORDS_BIGENDIAN
>         struct { uint32_t hi, lo_and_flags; } addr_write_w;
> #else
>         struct { uint32_t lo_and_flags, hi; } addr_write_w;
> #endif
>     };


This will work but I wonder if it is time to call it a day for 64-on-32
support? I mean all this can be worked around but I wonder if it is
worth the effort if no one actually uses this combination.

I might prepare a patch for the next dev cycle to promote discussion.

>
> IIRC "foreign" accesses only set TLB_NOTDIRTY, so they can use a cmpxchg
> on lo_and_flags (worst case you end up with an unnecessary call to
> notdirty_mem_write).

Yeah, I used a cmpxchg in the RFC patch. AFAICT you wouldn't see a
change to addend that didn't involve addr_write changing. So as long as
addr_write is always the last thing written, things should be fine.
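
I.e. on the fill side, something like this (sketch only):

    /* Fill everything else in the entry first... */
    te->addend = addend - vaddr;
    te->addr_read = ...;
    te->addr_code = ...;
    smp_wmb();
    /* ...then publish addr_write last, so a concurrent
     * tlb_reset_dirty_range can never match an addr_write that
     * belongs to a stale addend.
     */
    atomic_set(&te->addr_write, write_address);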

>
> - When removing TLB_NOTDIRTY from a TLB entry
> (notdirty_mem_write/tlb_unprotect_code), as well as filling in a TLB
> entry without TLB_NOTDIRTY (tlb_set_page_with_attrs) you need to protect
> from a concurrent tb_alloc_page and hence take the tb_lock.

The TLB entry in tlb_set_page_with_attrs is also only ever filled by
the vCPU in question, so I just ensured the final addr_write was
calculated and written in one go at the end, after all the other fields
had been updated.

>
> In particular:
>
> - in notdirty_mem_write, care must be put in the ordering of
> tb_invalidate_phys_page_fast (which itself calls tlb_unprotect_code and
> takes the tb_lock in tb_invalidate_phys_page_range) and tlb_set_dirty.
> At least it seems to me that the call to tb_invalidate_phys_page_fast
> should be after the write, but that's not all.  Perhaps merge this part
> of notdirty_mem_write:
>
>     /* Set both VGA and migration bits for simplicity and to remove
>      * the notdirty callback faster.
>      */
>     cpu_physical_memory_set_dirty_range(ram_addr, size,
>                                         DIRTY_CLIENTS_NOCODE);
>     /* we remove the notdirty callback only if the code has been
>        flushed */
>     if (!cpu_physical_memory_is_clean(ram_addr)) {
>         tlb_set_dirty(current_cpu, current_cpu->mem_io_vaddr);
>     }
>
> into tlb_unprotect_code?!?  Or perhaps do tlb_set_dirty _first_, and
> then add back the callback if cpu_physical_memory_is_clean(ram_addr) is
> true.  I haven't put much thought into it.

I'll have a closer look at these bits.

>
> - tlb_set_page_with_attrs is also hard-ish to get right, but perhaps the
> same idea of adding the callback last would work:
>
>     /* First set addr_write so that concurrent tlb_reset_dirty_range
>      * finds a match.
>      */
>     te->addr_write = address;
>     if (memory_region_is_ram(section->mr)) {
>         if (cpu_physical_memory_is_clean(
>                      memory_region_get_ram_addr(section->mr) + xlat)) {
>             te->addr_write = address | TLB_NOTDIRTY;
>         }
>     }

Why not just write addr_write completely at the end?

>
> Paolo
>
>> Of course the TLB structure itself covers a number of values but AFAICT
>> erroneously setting TLB_NOTDIRTY on an entry that gets updated to a new
>> address wouldn't cause a problem except triggering an additional
>> slow-path write. If we are careful about the filling of the TLB entries
>> can we be sure we are always safe?


--
Alex Bennée

* Re: [Qemu-devel] Making cputlb.c operations safe for MTTCG
  2016-08-02  6:37   ` Alex Bennée
@ 2016-08-02 10:26     ` Paolo Bonzini
  2016-08-03 17:25     ` Richard Henderson
  2016-09-27 16:16     ` Paolo Bonzini
  2 siblings, 0 replies; 14+ messages in thread
From: Paolo Bonzini @ 2016-08-02 10:26 UTC (permalink / raw)
  To: Alex Bennée
  Cc: MTTCG Devel, QEMU Developers, Richard Henderson, Sergey Fedorov,
	Emilio G. Cota, Frederic Konrad, a.rigo


> > - tlb_set_page_with_attrs is also hard-ish to get right, but perhaps the
> > same idea of adding the callback last would work:
> >
> >     /* First set addr_write so that concurrent tlb_reset_dirty_range
> >      * finds a match.
> >      */
> >     te->addr_write = address;
> >     if (memory_region_is_ram(section->mr)) {
> >         if (cpu_physical_memory_is_clean(
> >                      memory_region_get_ram_addr(section->mr) + xlat)) {
> >             te->addr_write = address | TLB_NOTDIRTY;
> >         }
> >     }
> 
> Why not just write addr_write completely at the end?

I suspect that another thread can come around and make the page "not dirty",
while the CPU that "owns" the TLB has decided that the page is dirty and does
not set the bit.  So you have

    CPU thread                               other thread
    -----------------------------            -----------------------------------
    compute addr_write
                                             page is now clean
    write addr_write without TLB_NOTDIRTY

By the way, the same happens in victim_tlb_hit.  You have to recompute
TLB_NOTDIRTY after setting *tlb, something like:

    tmptlb = *vtlb;
    *vtlb = *tlb;

    if (tmptlb.addr_write & (TLB_INVALID_MASK|TLB_MMIO)) {
        *tlb = tmptlb;
    } else {
        /* First write TLB entry without TLB_NOTDIRTY.  */
        tmptlb.addr_write &= ~TLB_NOTDIRTY;
        *tlb = tmptlb;
        smp_wmb();
        /* Then set NOTDIRTY in tlb->addr_write if necessary.
         * IIRC the IOTLB lets you get back to the ram_addr, and
         * pass it to cpu_physical_memory_is_clean.
         */
        if (cpu_physical_memory_is_clean(tlb_get_ram_addr(&tmptlb))) {
            tlb->addr_write = tmptlb.addr_write | TLB_NOTDIRTY;
        }
    }

The silver lining is that tlb_set_dirty now does not need anymore to check
the victim TLB.

Paolo

* Re: [Qemu-devel] Making cputlb.c operations safe for MTTCG
  2016-08-02  6:37   ` Alex Bennée
  2016-08-02 10:26     ` Paolo Bonzini
@ 2016-08-03 17:25     ` Richard Henderson
  2016-08-03 20:56       ` Paolo Bonzini
  2016-09-27 16:16     ` Paolo Bonzini
  2 siblings, 1 reply; 14+ messages in thread
From: Richard Henderson @ 2016-08-03 17:25 UTC (permalink / raw)
  To: Alex Bennée, Paolo Bonzini
  Cc: MTTCG Devel, QEMU Developers, Sergey Fedorov, Emilio G. Cota,
	Frederic Konrad, a.rigo

On 08/02/2016 12:07 PM, Alex Bennée wrote:
> This will work but I wonder if it is time to call it a day for 64-on-32
> support? I mean all this can be worked around but I wonder if it is
> worth the effort if no one actually uses this combination.

I've been meaning to bring up exactly this question during the 2.8 cycle.

Given the changes we want to make with target atomic operations, I think it's a 
complete waste of time to continue to support 64-on-32.

Indeed, it's half-tempting to drop 32-bit host support entirely.  But while 
i686, arm, and mips hosts continue to function for simple guests, it is perhaps 
a step too far to drop it all (yet).

> I might prepare a patch for the next dev cycle to promote discussion.

Please do.


r~

* Re: [Qemu-devel] Making cputlb.c operations safe for MTTCG
  2016-08-03 17:25     ` Richard Henderson
@ 2016-08-03 20:56       ` Paolo Bonzini
  0 siblings, 0 replies; 14+ messages in thread
From: Paolo Bonzini @ 2016-08-03 20:56 UTC (permalink / raw)
  To: Richard Henderson
  Cc: Alex Bennée, MTTCG Devel, QEMU Developers, Sergey Fedorov,
	Emilio G. Cota, Frederic Konrad, a.rigo


> On 08/02/2016 12:07 PM, Alex Bennée wrote:
> > This will work but I wonder if it is time to call it a day for 64-on-32
> > support? I mean all this can be worked around but I wonder if it is
> > worth the effort if no one actually uses this combination.
> 
> I've been meaning to bring up exactly this question during the 2.8 cycle.
> 
> Given the changes we want to make with target atomic operations, I think it's
> a complete waste of time to continue to support 64-on-32.

The only interesting 32-bit host IMO is arm, and it does support double-word
atomics on ARMv7+ (only through ldrexd/strexd if < Cortex-A15).  MIPS doesn't,
and PPC (POWER8) has 128-bit atomics in 64-bit mode, but lacks 64-bit atomics
in 32-bit mode.

FWIW, OpenRISC also lacks double-word (64-bit in 32-bit processors, 128-bit
in 64-bit processors) atomics.  Sad.

So if we want to keep 64-on-32 emulation, it might still make sense to restrict
it to ARMv7+ and x86 hosts.  But otherwise it might be indeed time to say goodbye...

Paolo

* Re: [Qemu-devel] Making cputlb.c operations safe for MTTCG
  2016-08-02  6:37   ` Alex Bennée
  2016-08-02 10:26     ` Paolo Bonzini
  2016-08-03 17:25     ` Richard Henderson
@ 2016-09-27 16:16     ` Paolo Bonzini
  2016-09-27 22:15       ` Alex Bennée
  2016-09-27 22:29       ` Emilio G. Cota
  2 siblings, 2 replies; 14+ messages in thread
From: Paolo Bonzini @ 2016-09-27 16:16 UTC (permalink / raw)
  To: Alex Bennée
  Cc: MTTCG Devel, QEMU Developers, a.rigo, Emilio G. Cota,
	Sergey Fedorov, Richard Henderson, Frederic Konrad



On 02/08/2016 08:37, Alex Bennée wrote:
>> - in notdirty_mem_write, care must be put in the ordering of
>> tb_invalidate_phys_page_fast (which itself calls tlb_unprotect_code and
>> takes the tb_lock in tb_invalidate_phys_page_range) and tlb_set_dirty.
>> At least it seems to me that the call to tb_invalidate_phys_page_fast
>> should be after the write, but that's not all.  Perhaps merge this part
>> of notdirty_mem_write:

I looked at it again and you are already doing the right thing in patch 19.
It's possible to simplify it a bit though like this:

diff --git a/exec.c b/exec.c
index c8389f9..7850c39 100644
--- a/exec.c
+++ b/exec.c
@@ -1944,9 +1944,6 @@ ram_addr_t qemu_ram_addr_from_host(void *ptr)
 static void notdirty_mem_write(void *opaque, hwaddr ram_addr,
                                uint64_t val, unsigned size)
 {
-    if (!cpu_physical_memory_get_dirty_flag(ram_addr, DIRTY_MEMORY_CODE)) {
-        tb_invalidate_phys_page_fast(ram_addr, size);
-    }
     switch (size) {
     case 1:
         stb_p(qemu_map_ram_ptr(NULL, ram_addr), val);
@@ -1960,11 +1957,19 @@ static void notdirty_mem_write(void *opaque, hwaddr ram_addr,
      */
     cpu_physical_memory_set_dirty_range(ram_addr, size,
                                         DIRTY_CLIENTS_NOCODE);
+    tb_lock();
+    if (!cpu_physical_memory_get_dirty_flag(ram_addr, DIRTY_MEMORY_CODE)) {
+        /* tb_invalidate_phys_page_range will call tlb_unprotect_code
+         * once the last TB in this page is gone.
+         */
+        tb_invalidate_phys_page_fast(ram_addr, size);
+    }
     /* we remove the notdirty callback only if the code has been
        flushed */
     if (!cpu_physical_memory_is_clean(ram_addr)) {
         tlb_set_dirty(current_cpu, current_cpu->mem_io_vaddr);
     }
+    tb_unlock();
 }
 
 static bool notdirty_mem_accepts(void *opaque, hwaddr addr,


Anyhow, the next step is to merge either cmpxchg-based atomics
or iothread-free single-threaded TCG.  Either will do. :)

I think that even iothread-free single-threaded TCG requires this
TLB stuff, because the iothread's address_space_write (and hence
invalidate_and_set_dirty) can race against the TCG thread's
code generation.

Thanks,

Paolo

* Re: [Qemu-devel] Making cputlb.c operations safe for MTTCG
  2016-09-27 16:16     ` Paolo Bonzini
@ 2016-09-27 22:15       ` Alex Bennée
  2016-09-27 22:29       ` Emilio G. Cota
  1 sibling, 0 replies; 14+ messages in thread
From: Alex Bennée @ 2016-09-27 22:15 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: MTTCG Devel, QEMU Developers, a.rigo, Emilio G. Cota,
	Sergey Fedorov, Richard Henderson, Frederic Konrad


Paolo Bonzini <pbonzini@redhat.com> writes:

> On 02/08/2016 08:37, Alex Bennée wrote:
>>> - in notdirty_mem_write, care must be put in the ordering of
>>> tb_invalidate_phys_page_fast (which itself calls tlb_unprotect_code and
>>> takes the tb_lock in tb_invalidate_phys_page_range) and tlb_set_dirty.
>>> At least it seems to me that the call to tb_invalidate_phys_page_fast
>>> should be after the write, but that's not all.  Perhaps merge this part
>>> of notdirty_mem_write:
>
> I looked at it again and you are already doing the right thing in patch 19.
> It's possible to simplify it a bit though like this:
>
> diff --git a/exec.c b/exec.c
> index c8389f9..7850c39 100644
> --- a/exec.c
> +++ b/exec.c
> @@ -1944,9 +1944,6 @@ ram_addr_t qemu_ram_addr_from_host(void *ptr)
>  static void notdirty_mem_write(void *opaque, hwaddr ram_addr,
>                                 uint64_t val, unsigned size)
>  {
> -    if (!cpu_physical_memory_get_dirty_flag(ram_addr, DIRTY_MEMORY_CODE)) {
> -        tb_invalidate_phys_page_fast(ram_addr, size);
> -    }
>      switch (size) {
>      case 1:
>          stb_p(qemu_map_ram_ptr(NULL, ram_addr), val);
> @@ -1960,11 +1957,19 @@ static void notdirty_mem_write(void *opaque, hwaddr ram_addr,
>       */
>      cpu_physical_memory_set_dirty_range(ram_addr, size,
>                                          DIRTY_CLIENTS_NOCODE);
> +    tb_lock();
> +    if (!cpu_physical_memory_get_dirty_flag(ram_addr, DIRTY_MEMORY_CODE)) {
> +        /* tb_invalidate_phys_page_range will call tlb_unprotect_code
> +         * once the last TB in this page is gone.
> +         */
> +        tb_invalidate_phys_page_fast(ram_addr, size);
> +    }
>      /* we remove the notdirty callback only if the code has been
>         flushed */
>      if (!cpu_physical_memory_is_clean(ram_addr)) {
>          tlb_set_dirty(current_cpu, current_cpu->mem_io_vaddr);
>      }
> +    tb_unlock();
>  }
>
>  static bool notdirty_mem_accepts(void *opaque, hwaddr addr,
>
>
> Anyhow, the next step is to merge either cmpxchg-based atomics
> or iothread-free single-threaded TCG.  Either will do. :)

By iothread-free single-threaded TCG you mean dropping the need to grab
the BQL when we start the TCG thread and making the BQL purely an
on-demand/when needed thing?

The cmpxchg stuff is looking good to me - I still have to do a pass over
rth's patch set since he re-based it on the async safe work. In fact it
will be even better once your updated PULL req is in ;-)

> I think that even iothread-free single-threaded TCG requires this
> TLB stuff, because the iothread's address_space_write (and hence
> invalidate_and_set_dirty) can race against the TCG thread's
> code generation.

Yes.

>
> Thanks,
>
> Paolo


--
Alex Bennée

* Re: [Qemu-devel] Making cputlb.c operations safe for MTTCG
  2016-09-27 16:16     ` Paolo Bonzini
  2016-09-27 22:15       ` Alex Bennée
@ 2016-09-27 22:29       ` Emilio G. Cota
  2016-09-27 23:04         ` Alex Bennée
  2016-09-27 23:05         ` Richard Henderson
  1 sibling, 2 replies; 14+ messages in thread
From: Emilio G. Cota @ 2016-09-27 22:29 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Alex Bennée, MTTCG Devel, QEMU Developers, a.rigo,
	Sergey Fedorov, Richard Henderson, Frederic Konrad

On Tue, Sep 27, 2016 at 18:16:45 +0200, Paolo Bonzini wrote:
> Anyhow, the next step is to merge either cmpxchg-based atomics
> or iothread-free single-threaded TCG.  Either will do. :)
> 
> I think that even iothread-free single-threaded TCG requires this
> TLB stuff, because the iothread's address_space_write (and hence
> invalidate_and_set_dirty) can race against the TCG thread's
> code generation.

What's a quick-and-dirty way to disable the fast-path TLB lookups?
Alex: you told me the monitor has an option for this, but I can't
find it. I'm looking for something that'd go in tcg/i386 to simply
bypass the fast path.

Forcing the slow TLB lookup would be an easy way to then implement
a per-TLB seqlock. I think TLB corruption might explain the crashes I
see when booting Ubuntu in a many-core guest (running on a many-core
host).
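
(The read side of such a seqlock could look something like this -- a
sketch assuming the helpers in include/qemu/seqlock.h; the per-env
tlb_lock field is made up:)

    unsigned start;
    CPUTLBEntry snapshot;

    do {
        start = seqlock_read_begin(&env->tlb_lock);  /* hypothetical */
        snapshot = env->tlb_table[mmu_idx][index];
    } while (seqlock_read_retry(&env->tlb_lock, start));
    /* operate on the consistent copy in "snapshot" */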

Thanks,

		Emilio

* Re: [Qemu-devel] Making cputlb.c operations safe for MTTCG
  2016-09-27 22:29       ` Emilio G. Cota
@ 2016-09-27 23:04         ` Alex Bennée
  2016-09-27 23:05         ` Richard Henderson
  1 sibling, 0 replies; 14+ messages in thread
From: Alex Bennée @ 2016-09-27 23:04 UTC (permalink / raw)
  To: Emilio G. Cota
  Cc: Paolo Bonzini, MTTCG Devel, QEMU Developers, a.rigo,
	Sergey Fedorov, Richard Henderson, Frederic Konrad


Emilio G. Cota <cota@braap.org> writes:

> On Tue, Sep 27, 2016 at 18:16:45 +0200, Paolo Bonzini wrote:
>> Anyhow, the next step is to merge either cmpxchg-based atomics
>> or iothread-free single-threaded TCG.  Either will do. :)
>>
>> I think that even iothread-free single-threaded TCG requires this
>> TLB stuff, because the iothread's address_space_write (and hence
>> invalidate_and_set_dirty) can race against the TCG thread's
>> code generation.
>
> What's a quick-and-dirty way to disable the fast-path TLB lookups?
> Alex: you told me the monitor has an option for this, but I can't
> find it. I'm looking for something that'd go in tcg/i386 to simply
> bypass the fast path.

Hack up tlb_set_page_with_attrs() to always set one of the TLB_FOO bits
(you might want to invent a new one, as the others do have meanings).
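
Something along these lines (sketch only; TLB_FORCE_SLOW and its bit
position are made up):

    /* exec/cpu-all.h: a new flag bit; it must stay within the low
     * TARGET_PAGE_BITS bits so the fast-path compare still fails.
     */
    #define TLB_FORCE_SLOW  (1 << 2)  /* hypothetical bit choice */

    /* tlb_set_page_with_attrs(): taint every entry so every access
     * falls through to the slow-path helpers.
     */
    address |= TLB_FORCE_SLOW;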

>
> Forcing the slow TLB lookup would be an easy way to then implement
> a per-TLB seqlock. I think TLB corruption might explain the crashes I
> see when booting Ubuntu in a many-core guest (running on a many-core
> host).

TLB corruption is suspected but I've never come up with a clean test
case to force it. I find heavy compiles in a system image can do it but
my SMC torture test never crashes.

>
> Thanks,
>
> 		Emilio


--
Alex Bennée

* Re: [Qemu-devel] Making cputlb.c operations safe for MTTCG
  2016-09-27 22:29       ` Emilio G. Cota
  2016-09-27 23:04         ` Alex Bennée
@ 2016-09-27 23:05         ` Richard Henderson
  2016-09-27 23:32           ` Alex Bennée
  1 sibling, 1 reply; 14+ messages in thread
From: Richard Henderson @ 2016-09-27 23:05 UTC (permalink / raw)
  To: Emilio G. Cota, Paolo Bonzini
  Cc: Alex Bennée, MTTCG Devel, QEMU Developers, a.rigo,
	Sergey Fedorov, Frederic Konrad

On 09/27/2016 03:29 PM, Emilio G. Cota wrote:
> What's a quick-and-dirty way to disable the fast-path TLB lookups?
> Alex: you told me the monitor has an option for this, but I can't
> find it. I'm looking for something that'd go in tcg/i386 to simply
> bypass the fast path.

There is no easy way.  If you need that, you'd have to significantly modify the 
tcg backend.


r~

* Re: [Qemu-devel] Making cputlb.c operations safe for MTTCG
  2016-09-27 23:05         ` Richard Henderson
@ 2016-09-27 23:32           ` Alex Bennée
  2016-09-28  0:34             ` Richard Henderson
  0 siblings, 1 reply; 14+ messages in thread
From: Alex Bennée @ 2016-09-27 23:32 UTC (permalink / raw)
  To: Richard Henderson
  Cc: Emilio G. Cota, Paolo Bonzini, MTTCG Devel, QEMU Developers,
	a.rigo, Sergey Fedorov, Frederic Konrad


Richard Henderson <rth@twiddle.net> writes:

> On 09/27/2016 03:29 PM, Emilio G. Cota wrote:
>> What's a quick-and-dirty way to disable the fast-path TLB lookups?
>> Alex: you told me the monitor has an option for this, but I can't
>> find it. I'm looking for something that'd go in tcg/i386 to simply
>> bypass the fast path.
>
> There is no easy way.  If you need that, you'd have to significantly modify the 
> tcg backend.

Surely all the backends force the slow path when any of the
TLB_FLAGS_MASK bits are set? Unless adding an extra bit is going to run
out of spare bits on some backends?

>
>
> r~


-- 
Alex Bennée

* Re: [Qemu-devel] Making cputlb.c operations safe for MTTCG
  2016-09-27 23:32           ` Alex Bennée
@ 2016-09-28  0:34             ` Richard Henderson
  0 siblings, 0 replies; 14+ messages in thread
From: Richard Henderson @ 2016-09-28  0:34 UTC (permalink / raw)
  To: Alex Bennée
  Cc: Emilio G. Cota, Paolo Bonzini, MTTCG Devel, QEMU Developers,
	a.rigo, Sergey Fedorov, Frederic Konrad

On 09/27/2016 04:32 PM, Alex Bennée wrote:
>
> Richard Henderson <rth@twiddle.net> writes:
>
>> On 09/27/2016 03:29 PM, Emilio G. Cota wrote:
>>> What's a quick-and-dirty way to disable the fast-path TLB lookups?
>>> Alex: you told me the monitor has an option for this, but I can't
>>> find it. I'm looking for something that'd go in tcg/i386 to simply
>>> bypass the fast path.
>>
>> There is no easy way.  If you need that, you'd have to significantly modify the
>> tcg backend.
>
> Surely all the backends force the slow path when any of the
> TLB_FLAGS_MASK bits are set? Unless adding an extra bit is going to run
> out of spare bits on some backends?

You could do that, yes.  You also need to adjust softmmu_template.h to match.
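
For example, in the load helper, something like this (sketch only,
reusing the hypothetical TLB_FORCE_SLOW bit from earlier):

    /* Any flag bit already forces us off the fast path; strip the
     * debug bit and fall through to a plain RAM access, leaving the
     * real flags on their existing slow-path handling.
     */
    if (unlikely(tlb_addr & ~TARGET_PAGE_MASK)) {
        if (tlb_addr & TLB_FORCE_SLOW) {
            tlb_addr &= ~TLB_FORCE_SLOW;
            /* continue as a normal RAM access below */
        } else {
            /* existing I/O / notdirty handling */
        }
    }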


r~
