Re: [Qemu-devel] qemu-system-ppc video artifacts since "tcg: drop global lock during TCG code execution"

All of lore.kernel.org
 help / color / mirror / Atom feed

* Re: [Qemu-devel] qemu-system-ppc video artifacts since "tcg: drop global lock during TCG code execution"
       [not found] <36e41adf-b0b3-3efa-51c4-f1a70cd05b98@ilande.co.uk>
@ 2017-03-13 21:28 ` Mark Cave-Ayland
       [not found] ` <87wpbsp49a.fsf@linaro.org>
  1 sibling, 0 replies; 25+ messages in thread
From: Mark Cave-Ayland @ 2017-03-13 21:28 UTC (permalink / raw)
  To: jan.kiszka, fred.konrad, cota, alex.bennee, rth, bobby.prani,
	qemu-devel, qemu-ppc

On 13/03/17 21:25, Mark Cave-Ayland wrote:

> I've recently noticed some video artifacts appearing in the form of
> horizontal lines whilst testing OpenBIOS boot on some qemu-system-ppc
> images (see https://www.ilande.co.uk/tmp/qemu/macos9-stripe.png for an
> example) which I've finally bisected down to this commit.
> 
> The horizontal lines tend to appear at different positions with
> different lengths, although it seems to be more common that a complete
> scanline is affected. Reproducing the issue is fairly easy with a MacOS
> 9.2.1 ISO and the command line below:
> 
> ./qemu-system-ppc -cdrom MacOS921.iso -boot d -m 512 -M mac99
> 
> Could it be that this patch is still missing a lock somewhere which
> causes these artifacts to appear?

(Apologies - resending as forgot to add CCs for qemu-devel and qemu-ppc)


ATB,

Mark.

^ permalink raw reply	[flat|nested] 25+ messages in thread

[parent not found: <87wpbsp49a.fsf@linaro.org>]

* Re: [Qemu-devel] qemu-system-ppc video artifacts since "tcg: drop global lock during TCG code execution"
       [not found] ` <87wpbsp49a.fsf@linaro.org>
@ 2017-03-14 11:12   ` Mark Cave-Ayland
  2017-03-14 13:56     ` [Qemu-devel] [Qemu-ppc] " BALATON Zoltan
  0 siblings, 1 reply; 25+ messages in thread
From: Mark Cave-Ayland @ 2017-03-14 11:12 UTC (permalink / raw)
  To: Alex Bennée
  Cc: jan.kiszka, fred.konrad, cota, rth, bobby.prani, qemu-devel, qemu-ppc

On 14/03/17 10:00, Alex Bennée wrote:

> Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk> writes:
> 
>> I've recently noticed some video artifacts appearing in the form of
>> horizontal lines whilst testing OpenBIOS boot on some qemu-system-ppc
>> images (see https://www.ilande.co.uk/tmp/qemu/macos9-stripe.png for an
>> example) which I've finally bisected down to this commit.
> 
> Interesting. Is the video hardware in this case a simple framebuffer or
> is there some sort of GPU involved?

For the mac99 machine it's just the normal QEMU PCI std-vga card, so
nothing special. Having run tests across SPARC which uses its own
framebuffer, there have been no obvious artifacts that I've spotted
there to date.

>> The horizontal lines tend to appear at different positions with
>> different lengths, although it seems to be more common that a complete
>> scanline is affected. Reproducing the issue is fairly easy with a MacOS
>> 9.2.1 ISO and the command line below:
>>
>> ./qemu-system-ppc -cdrom MacOS921.iso -boot d -m 512 -M mac99
>>
>> Could it be that this patch is still missing a lock somewhere which
>> causes these artifacts to appear?
> 
> It depends. If the hardware is accessed via MMIO then the BQL is taken
> automatically (unless the area is explicitly marked as doing its own
> locking see: mr->global_locking).
> 
> Is the artefact temporary or permanent? Could it be a change in
> synchronisation between the emulator updating the framebuffer and
> whatever backend updating its display?

AFAICT they are temporary in that they appear randomly on the display,
but the next time that particular area of the display is updated by the
guest then they tend to either change/disappear.

MacOS 9 is interesting in that it's effectively an emulator within an
emulator, i.e. 68K code running on a PPC emulator and so does tend to
stress QEMU in interesting places. Given that I've also seen the
occasional issue with Darwin/OS X it seems the problem isn't just
constrained to OS 9, however it seems to occur a lot more often there.


ATB,

Mark.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Qemu-devel] [Qemu-ppc] qemu-system-ppc video artifacts since "tcg: drop global lock during TCG code execution"
  2017-03-14 11:12   ` Mark Cave-Ayland
@ 2017-03-14 13:56     ` BALATON Zoltan
  2017-03-14 15:02       ` luigi burdo
  2017-03-14 15:41       ` Alex Bennée
  0 siblings, 2 replies; 25+ messages in thread
From: BALATON Zoltan @ 2017-03-14 13:56 UTC (permalink / raw)
  To: Mark Cave-Ayland
  Cc: Alex Bennée, jan.kiszka, qemu-devel, cota, qemu-ppc,
	bobby.prani, fred.konrad, rth

On Tue, 14 Mar 2017, Mark Cave-Ayland wrote:
> On 14/03/17 10:00, Alex Bennée wrote:
>> Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk> writes:
>>
>>> I've recently noticed some video artifacts appearing in the form of
>>> horizontal lines whilst testing OpenBIOS boot on some qemu-system-ppc
>>> images (see https://www.ilande.co.uk/tmp/qemu/macos9-stripe.png for an
>>> example) which I've finally bisected down to this commit.
>>
>> Interesting. Is the video hardware in this case a simple framebuffer or
>> is there some sort of GPU involved?
>
> For the mac99 machine it's just the normal QEMU PCI std-vga card, so
> nothing special. Having run tests across SPARC which uses its own
> framebuffer, there have been no obvious artifacts that I've spotted
> there to date.
>
>>> The horizontal lines tend to appear at different positions with
>>> different lengths, although it seems to be more common that a complete
>>> scanline is affected. Reproducing the issue is fairly easy with a MacOS
>>> 9.2.1 ISO and the command line below:
>>>
>>> ./qemu-system-ppc -cdrom MacOS921.iso -boot d -m 512 -M mac99
>>>
>>> Could it be that this patch is still missing a lock somewhere which
>>> causes these artifacts to appear?
>>
>> It depends. If the hardware is accessed via MMIO then the BQL is taken
>> automatically (unless the area is explicitly marked as doing its own
>> locking see: mr->global_locking).
>>
>> Is the artefact temporary or permanent? Could it be a change in
>> synchronisation between the emulator updating the framebuffer and
>> whatever backend updating its display?
>
> AFAICT they are temporary in that they appear randomly on the display,
> but the next time that particular area of the display is updated by the
> guest then they tend to either change/disappear.

I've also noticed artifacts when testing the SM501 changes with MorphOS on 
recent QEMU so it may not be related to just the std vga. It's shown as 
bands (larger number of adjacent scanlines) not updating when the client 
first wrote to the framebuffer but then they disappeared during the next 
refresh. I'm guessing it may be related to dirty checking of the 
framebuffer memory which is used to decide when a scan line needs update?

Regards,
BALATON Zoltan

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Qemu-devel] [Qemu-ppc] qemu-system-ppc video artifacts since "tcg: drop global lock during TCG code execution"
  2017-03-14 13:56     ` [Qemu-devel] [Qemu-ppc] " BALATON Zoltan
@ 2017-03-14 15:02       ` luigi burdo
  2017-03-14 15:41       ` Alex Bennée
  1 sibling, 0 replies; 25+ messages in thread
From: luigi burdo @ 2017-03-14 15:02 UTC (permalink / raw)
  To: BALATON Zoltan, Mark Cave-Ayland
  Cc: jan.kiszka, qemu-devel, cota, qemu-ppc, bobby.prani, rth,
	Alex Bennée, fred.konrad

framebuffer memory which is used to decide when a scan line needs update?

Hi Balton i have this artifacts since 2.20 on Linux PPC.
are more evident if the sdl driver is used, much less on GTK but still present there too plus GTK have issue with pointer if gl is enabled (reported in past in bugzilla)
I found it in STD when kvm is enabled on ppc32, virtio-vga gave this issue much less . (qemu-system-PPC64 have virtio enabled),
The strange thing is the std dont gave this issue on x86 emulation via tcg...

I hope one day we will have virtio on ppc32 too.

Luigi

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Qemu-devel] [Qemu-ppc] qemu-system-ppc video artifacts since "tcg: drop global lock during TCG code execution"
  2017-03-14 13:56     ` [Qemu-devel] [Qemu-ppc] " BALATON Zoltan
  2017-03-14 15:02       ` luigi burdo
@ 2017-03-14 15:41       ` Alex Bennée
  2017-03-14 15:52         ` Mark Cave-Ayland
  1 sibling, 1 reply; 25+ messages in thread
From: Alex Bennée @ 2017-03-14 15:41 UTC (permalink / raw)
  To: BALATON Zoltan
  Cc: Mark Cave-Ayland, jan.kiszka, qemu-devel, cota, qemu-ppc,
	bobby.prani, fred.konrad, rth


BALATON Zoltan <balaton@eik.bme.hu> writes:

> On Tue, 14 Mar 2017, Mark Cave-Ayland wrote:
>> On 14/03/17 10:00, Alex Bennée wrote:
>>> Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk> writes:
>>>
>>>> I've recently noticed some video artifacts appearing in the form of
>>>> horizontal lines whilst testing OpenBIOS boot on some qemu-system-ppc
>>>> images (see https://www.ilande.co.uk/tmp/qemu/macos9-stripe.png for an
>>>> example) which I've finally bisected down to this commit.
>>>
>>> Interesting. Is the video hardware in this case a simple framebuffer or
>>> is there some sort of GPU involved?
>>
>> For the mac99 machine it's just the normal QEMU PCI std-vga card, so
>> nothing special. Having run tests across SPARC which uses its own
>> framebuffer, there have been no obvious artifacts that I've spotted
>> there to date.
>>
>>>> The horizontal lines tend to appear at different positions with
>>>> different lengths, although it seems to be more common that a complete
>>>> scanline is affected. Reproducing the issue is fairly easy with a MacOS
>>>> 9.2.1 ISO and the command line below:
>>>>
>>>> ./qemu-system-ppc -cdrom MacOS921.iso -boot d -m 512 -M mac99
>>>>
>>>> Could it be that this patch is still missing a lock somewhere which
>>>> causes these artifacts to appear?
>>>
>>> It depends. If the hardware is accessed via MMIO then the BQL is taken
>>> automatically (unless the area is explicitly marked as doing its own
>>> locking see: mr->global_locking).
>>>
>>> Is the artefact temporary or permanent? Could it be a change in
>>> synchronisation between the emulator updating the framebuffer and
>>> whatever backend updating its display?
>>
>> AFAICT they are temporary in that they appear randomly on the display,
>> but the next time that particular area of the display is updated by the
>> guest then they tend to either change/disappear.
>
> I've also noticed artifacts when testing the SM501 changes with
> MorphOS on recent QEMU so it may not be related to just the std vga.
> It's shown as bands (larger number of adjacent scanlines) not updating
> when the client first wrote to the framebuffer but then they
> disappeared during the next refresh. I'm guessing it may be related to
> dirty checking of the framebuffer memory which is used to decide when
> a scan line needs update?

Interesting. I guess if the corrupted scan lines are in blocks of
PAGE_SIZE that might make sense.

Commit (b0706b716769494f321a0d2bfd9fa9893992f995) made
tlb_reset_dirty_range update the SoftMMU addr_write entry atomically.
Now I don't see how that could race in single threaded TCG mode but it
could explain things.

I notice that tlb_set_dirty/tlb_set_dirty1 should maybe do the same. The
currently assume they are only called from the CPU's context. If you
enable #define DEBUG_TLB in cputlb.c does the assert fire?

>
> Regards,
> BALATON Zoltan


--
Alex Bennée

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Qemu-devel] [Qemu-ppc] qemu-system-ppc video artifacts since "tcg: drop global lock during TCG code execution"
  2017-03-14 15:41       ` Alex Bennée
@ 2017-03-14 15:52         ` Mark Cave-Ayland
  2017-03-14 16:48           ` Alex Bennée
  2017-03-15 14:16           ` Alex Bennée
  0 siblings, 2 replies; 25+ messages in thread
From: Mark Cave-Ayland @ 2017-03-14 15:52 UTC (permalink / raw)
  To: Alex Bennée, BALATON Zoltan
  Cc: jan.kiszka, qemu-devel, cota, qemu-ppc, bobby.prani, fred.konrad, rth

On 14/03/17 15:41, Alex Bennée wrote:

> BALATON Zoltan <balaton@eik.bme.hu> writes:
> 
>> On Tue, 14 Mar 2017, Mark Cave-Ayland wrote:
>>> On 14/03/17 10:00, Alex Bennée wrote:
>>>> Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk> writes:
>>>>
>>>>> I've recently noticed some video artifacts appearing in the form of
>>>>> horizontal lines whilst testing OpenBIOS boot on some qemu-system-ppc
>>>>> images (see https://www.ilande.co.uk/tmp/qemu/macos9-stripe.png for an
>>>>> example) which I've finally bisected down to this commit.
>>>>
>>>> Interesting. Is the video hardware in this case a simple framebuffer or
>>>> is there some sort of GPU involved?
>>>
>>> For the mac99 machine it's just the normal QEMU PCI std-vga card, so
>>> nothing special. Having run tests across SPARC which uses its own
>>> framebuffer, there have been no obvious artifacts that I've spotted
>>> there to date.
>>>
>>>>> The horizontal lines tend to appear at different positions with
>>>>> different lengths, although it seems to be more common that a complete
>>>>> scanline is affected. Reproducing the issue is fairly easy with a MacOS
>>>>> 9.2.1 ISO and the command line below:
>>>>>
>>>>> ./qemu-system-ppc -cdrom MacOS921.iso -boot d -m 512 -M mac99
>>>>>
>>>>> Could it be that this patch is still missing a lock somewhere which
>>>>> causes these artifacts to appear?
>>>>
>>>> It depends. If the hardware is accessed via MMIO then the BQL is taken
>>>> automatically (unless the area is explicitly marked as doing its own
>>>> locking see: mr->global_locking).
>>>>
>>>> Is the artefact temporary or permanent? Could it be a change in
>>>> synchronisation between the emulator updating the framebuffer and
>>>> whatever backend updating its display?
>>>
>>> AFAICT they are temporary in that they appear randomly on the display,
>>> but the next time that particular area of the display is updated by the
>>> guest then they tend to either change/disappear.
>>
>> I've also noticed artifacts when testing the SM501 changes with
>> MorphOS on recent QEMU so it may not be related to just the std vga.
>> It's shown as bands (larger number of adjacent scanlines) not updating
>> when the client first wrote to the framebuffer but then they
>> disappeared during the next refresh. I'm guessing it may be related to
>> dirty checking of the framebuffer memory which is used to decide when
>> a scan line needs update?
> 
> Interesting. I guess if the corrupted scan lines are in blocks of
> PAGE_SIZE that might make sense.
> 
> Commit (b0706b716769494f321a0d2bfd9fa9893992f995) made
> tlb_reset_dirty_range update the SoftMMU addr_write entry atomically.
> Now I don't see how that could race in single threaded TCG mode but it
> could explain things.
> 
> I notice that tlb_set_dirty/tlb_set_dirty1 should maybe do the same. The
> currently assume they are only called from the CPU's context. If you
> enable #define DEBUG_TLB in cputlb.c does the assert fire?

Not in a quick local test here. However this does ring a bell - one of
Ben's PPC optimisations was to batch TLB flushes so they only occur at
certain times - see commit cd0c6f473532bfaf20a095bc90a18e45162981b5 for
more detail.

Could it be that these underlying assumptions have now changed?


ATB,

Mark.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Qemu-devel] [Qemu-ppc] qemu-system-ppc video artifacts since "tcg: drop global lock during TCG code execution"
  2017-03-14 15:52         ` Mark Cave-Ayland
@ 2017-03-14 16:48           ` Alex Bennée
  2017-03-14 17:34             ` BALATON Zoltan
  2017-03-15 14:16           ` Alex Bennée
  1 sibling, 1 reply; 25+ messages in thread
From: Alex Bennée @ 2017-03-14 16:48 UTC (permalink / raw)
  To: Mark Cave-Ayland
  Cc: BALATON Zoltan, jan.kiszka, qemu-devel, cota, qemu-ppc,
	bobby.prani, fred.konrad, rth


Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk> writes:

> On 14/03/17 15:41, Alex Bennée wrote:
>
>> BALATON Zoltan <balaton@eik.bme.hu> writes:
>>
>>> On Tue, 14 Mar 2017, Mark Cave-Ayland wrote:
>>>> On 14/03/17 10:00, Alex Bennée wrote:
>>>>> Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk> writes:
>>>>>
>>>>>> I've recently noticed some video artifacts appearing in the form of
>>>>>> horizontal lines whilst testing OpenBIOS boot on some qemu-system-ppc
>>>>>> images (see https://www.ilande.co.uk/tmp/qemu/macos9-stripe.png for an
>>>>>> example) which I've finally bisected down to this commit.
>>>>>
>>>>> Interesting. Is the video hardware in this case a simple framebuffer or
>>>>> is there some sort of GPU involved?
>>>>
>>>> For the mac99 machine it's just the normal QEMU PCI std-vga card, so
>>>> nothing special. Having run tests across SPARC which uses its own
>>>> framebuffer, there have been no obvious artifacts that I've spotted
>>>> there to date.
>>>>
>>>>>> The horizontal lines tend to appear at different positions with
>>>>>> different lengths, although it seems to be more common that a complete
>>>>>> scanline is affected. Reproducing the issue is fairly easy with a MacOS
>>>>>> 9.2.1 ISO and the command line below:
>>>>>>
>>>>>> ./qemu-system-ppc -cdrom MacOS921.iso -boot d -m 512 -M mac99
>>>>>>
>>>>>> Could it be that this patch is still missing a lock somewhere which
>>>>>> causes these artifacts to appear?
>>>>>
>>>>> It depends. If the hardware is accessed via MMIO then the BQL is taken
>>>>> automatically (unless the area is explicitly marked as doing its own
>>>>> locking see: mr->global_locking).
>>>>>
>>>>> Is the artefact temporary or permanent? Could it be a change in
>>>>> synchronisation between the emulator updating the framebuffer and
>>>>> whatever backend updating its display?
>>>>
>>>> AFAICT they are temporary in that they appear randomly on the display,
>>>> but the next time that particular area of the display is updated by the
>>>> guest then they tend to either change/disappear.
>>>
>>> I've also noticed artifacts when testing the SM501 changes with
>>> MorphOS on recent QEMU so it may not be related to just the std vga.
>>> It's shown as bands (larger number of adjacent scanlines) not updating
>>> when the client first wrote to the framebuffer but then they
>>> disappeared during the next refresh. I'm guessing it may be related to
>>> dirty checking of the framebuffer memory which is used to decide when
>>> a scan line needs update?
>>
>> Interesting. I guess if the corrupted scan lines are in blocks of
>> PAGE_SIZE that might make sense.
>>
>> Commit (b0706b716769494f321a0d2bfd9fa9893992f995) made
>> tlb_reset_dirty_range update the SoftMMU addr_write entry atomically.
>> Now I don't see how that could race in single threaded TCG mode but it
>> could explain things.
>>
>> I notice that tlb_set_dirty/tlb_set_dirty1 should maybe do the same. The
>> currently assume they are only called from the CPU's context. If you
>> enable #define DEBUG_TLB in cputlb.c does the assert fire?
>
> Not in a quick local test here.

I've just realised that assert will never fire because all vCPUS in
single threaded mode share the same thread id:

cpus.c/qemu_tcg_init_vcpu:

        /* For non-MTTCG cases we share the thread */
        cpu->thread = single_tcg_cpu_thread;
        cpu->halt_cond = single_tcg_halt_cond;

> However this does ring a bell - one of
> Ben's PPC optimisations was to batch TLB flushes so they only occur at
> certain times - see commit cd0c6f473532bfaf20a095bc90a18e45162981b5 for
> more detail.
>
> Could it be that these underlying assumptions have now changed?

Hmm I'm not sure.

So from a single-threaded -smp guest case there should be no difference
in behaviour. However cross-vCPU flushes are queued up using the async
work queue and are dealt with in the target vCPU context. In the single
threaded case it shouldn't matter as this work will get executed as soon
as round-robin scheduler gets to it:

cpus.c/qemu_tcg_rr_cpu_thread_fn:
  while (cpu && !cpu->queued_work_first && !cpu->exit_request) {

When converting a target to MTTCG its certainly something that has to
have attention paid to it. For example some cross-vCPU tlb flushes need
to be complete from the source vCPU point of view. In this case you call
the tlb_flush_*_synced() variants and exit the execution loop. This
ensures all vCPUs have completed flushes before we continue. See
a67cf2772733e for what I did on ARM. However this shouldn't affect
anything in the single-threaded world.

However delaying tlb_flushes() could certainly expose/hide stuff that is
accessing the dirty mechanism. tlb_flush itself now takes the tb_lock() to
avoid racing with the TB invalidation logic. The act of the flush will
certainly wipe all existing SoftMMU entries and force a re-load on each
memory access.

So is the dirty status of memory being read from outside a vCPU
execution context?

--
Alex Bennée

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Qemu-devel] [Qemu-ppc] qemu-system-ppc video artifacts since "tcg: drop global lock during TCG code execution"
  2017-03-14 16:48           ` Alex Bennée
@ 2017-03-14 17:34             ` BALATON Zoltan
  2017-03-14 17:53               ` luigi burdo
                                 ` (2 more replies)
  0 siblings, 3 replies; 25+ messages in thread
From: BALATON Zoltan @ 2017-03-14 17:34 UTC (permalink / raw)
  To: Alex Bennée
  Cc: Mark Cave-Ayland, jan.kiszka, qemu-devel, cota, qemu-ppc,
	bobby.prani, fred.konrad, rth

On Tue, 14 Mar 2017, Alex Bennée wrote:
> So from a single-threaded -smp guest case there should be no difference
> in behaviour. However cross-vCPU flushes are queued up using the async
> work queue and are dealt with in the target vCPU context. In the single
> threaded case it shouldn't matter as this work will get executed as soon
> as round-robin scheduler gets to it:
>
> cpus.c/qemu_tcg_rr_cpu_thread_fn:
>  while (cpu && !cpu->queued_work_first && !cpu->exit_request) {
>
> When converting a target to MTTCG its certainly something that has to
> have attention paid to it. For example some cross-vCPU tlb flushes need
> to be complete from the source vCPU point of view. In this case you call
> the tlb_flush_*_synced() variants and exit the execution loop. This
> ensures all vCPUs have completed flushes before we continue. See
> a67cf2772733e for what I did on ARM. However this shouldn't affect
> anything in the single-threaded world.

I think we have a single CPU and thread for these ppc machines here so I'm 
not sure how this could be relevant.

> However delaying tlb_flushes() could certainly expose/hide stuff that is
> accessing the dirty mechanism. tlb_flush itself now takes the tb_lock() to
> avoid racing with the TB invalidation logic. The act of the flush will
> certainly wipe all existing SoftMMU entries and force a re-load on each
> memory access.
>
> So is the dirty status of memory being read from outside a vCPU
> execution context?

Like from the display controller models that use memory_region_get_dirty() 
to check if the frambuffer needs to be updated? But all display adaptors 
seem to do this and the problem was only seem on ppc so it may be related 
to something ppc specific.

Regards,
BALATON Zoltan

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Qemu-devel] [Qemu-ppc] qemu-system-ppc video artifacts since "tcg: drop global lock during TCG code execution"
  2017-03-14 17:34             ` BALATON Zoltan
@ 2017-03-14 17:53               ` luigi burdo
  2017-03-15 11:14               ` Alex Bennée
  2017-03-16  6:39               ` Paolo Bonzini
  2 siblings, 0 replies; 25+ messages in thread
From: luigi burdo @ 2017-03-14 17:53 UTC (permalink / raw)
  To: BALATON Zoltan, Alex Bennée
  Cc: jan.kiszka, qemu-devel, cota, qemu-ppc, bobby.prani, rth, fred.konrad

Zoltan,

i tested on ppcel  debian there isnt issue reported . i think is only a BE issue.

Luigi

Like from the display controller models that use memory_region_get_dirty()
to check if the frambuffer needs to be updated? But all display adaptors
seem to do this and the problem was only seem on ppc so it may be related
to something ppc specific.

Regards,
BALATON Zoltan

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Qemu-devel] [Qemu-ppc] qemu-system-ppc video artifacts since "tcg: drop global lock during TCG code execution"
  2017-03-14 17:34             ` BALATON Zoltan
  2017-03-14 17:53               ` luigi burdo
@ 2017-03-15 11:14               ` Alex Bennée
  2017-03-15 13:26                 ` Mark Cave-Ayland
  2017-03-16  6:39               ` Paolo Bonzini
  2 siblings, 1 reply; 25+ messages in thread
From: Alex Bennée @ 2017-03-15 11:14 UTC (permalink / raw)
  To: BALATON Zoltan
  Cc: Mark Cave-Ayland, jan.kiszka, qemu-devel, cota, qemu-ppc,
	bobby.prani, fred.konrad, rth


BALATON Zoltan <balaton@eik.bme.hu> writes:

> On Tue, 14 Mar 2017, Alex Bennée wrote:
>> So from a single-threaded -smp guest case there should be no difference
>> in behaviour.
<snip>
>>However this shouldn't affect
>> anything in the single-threaded world.
>
> I think we have a single CPU and thread for these ppc machines here so
> I'm not sure how this could be relevant.
>
>> However delaying tlb_flushes() could certainly expose/hide stuff that is
>> accessing the dirty mechanism. tlb_flush itself now takes the tb_lock() to
>> avoid racing with the TB invalidation logic. The act of the flush will
>> certainly wipe all existing SoftMMU entries and force a re-load on each
>> memory access.
>>
>> So is the dirty status of memory being read from outside a vCPU
>> execution context?
>
> Like from the display controller models that use
> memory_region_get_dirty() to check if the frambuffer needs to be
> updated? But all display adaptors seem to do this and the problem was
> only seem on ppc so it may be related to something ppc specific.

So this accesses the memory_region API which is under RCU control.
AFAIUI this should mean the dirty status may be read late (e.g. next
update) but should never be incorrect (e.g. miss a dirtying operation).

--
Alex Bennée

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Qemu-devel] [Qemu-ppc] qemu-system-ppc video artifacts since "tcg: drop global lock during TCG code execution"
  2017-03-15 11:14               ` Alex Bennée
@ 2017-03-15 13:26                 ` Mark Cave-Ayland
  2017-03-15 14:19                   ` Alex Bennée
  0 siblings, 1 reply; 25+ messages in thread
From: Mark Cave-Ayland @ 2017-03-15 13:26 UTC (permalink / raw)
  To: Alex Bennée, BALATON Zoltan
  Cc: jan.kiszka, qemu-devel, cota, qemu-ppc, bobby.prani, rth, fred.konrad

On 15/03/17 11:14, Alex Bennée wrote:

> BALATON Zoltan <balaton@eik.bme.hu> writes:
> 
>> On Tue, 14 Mar 2017, Alex Bennée wrote:
>>> So from a single-threaded -smp guest case there should be no difference
>>> in behaviour.
> <snip>
>>> However this shouldn't affect
>>> anything in the single-threaded world.
>>
>> I think we have a single CPU and thread for these ppc machines here so
>> I'm not sure how this could be relevant.
>>
>>> However delaying tlb_flushes() could certainly expose/hide stuff that is
>>> accessing the dirty mechanism. tlb_flush itself now takes the tb_lock() to
>>> avoid racing with the TB invalidation logic. The act of the flush will
>>> certainly wipe all existing SoftMMU entries and force a re-load on each
>>> memory access.
>>>
>>> So is the dirty status of memory being read from outside a vCPU
>>> execution context?
>>
>> Like from the display controller models that use
>> memory_region_get_dirty() to check if the frambuffer needs to be
>> updated? But all display adaptors seem to do this and the problem was
>> only seem on ppc so it may be related to something ppc specific.
> 
> So this accesses the memory_region API which is under RCU control.
> AFAIUI this should mean the dirty status may be read late (e.g. next
> update) but should never be incorrect (e.g. miss a dirtying operation).

AFAICT check_tlb_flush() gets passed a global parameter which if set
true invalidates the TLB across all CPU TLBs rather than just the local
CPU TLB - but then in this case we're only running with a single CPU so
I can't see how this is relevant.

Have you been able to reproduce the artifacts locally at all? I'm
wondering if once the icount fixup patches are in, it might be easier to
debug if enabling icount causes the artifacts to appear in a
deterministic manner.


ATB,

Mark.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Qemu-devel] [Qemu-ppc] qemu-system-ppc video artifacts since "tcg: drop global lock during TCG code execution"
  2017-03-15 13:26                 ` Mark Cave-Ayland
@ 2017-03-15 14:19                   ` Alex Bennée
  0 siblings, 0 replies; 25+ messages in thread
From: Alex Bennée @ 2017-03-15 14:19 UTC (permalink / raw)
  To: Mark Cave-Ayland
  Cc: BALATON Zoltan, jan.kiszka, qemu-devel, cota, qemu-ppc,
	bobby.prani, rth, fred.konrad


Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk> writes:

> On 15/03/17 11:14, Alex Bennée wrote:
>
>> BALATON Zoltan <balaton@eik.bme.hu> writes:
>>
>>> On Tue, 14 Mar 2017, Alex Bennée wrote:
>>>> So from a single-threaded -smp guest case there should be no difference
>>>> in behaviour.
>> <snip>
>>>> However this shouldn't affect
>>>> anything in the single-threaded world.
>>>
>>> I think we have a single CPU and thread for these ppc machines here so
>>> I'm not sure how this could be relevant.
>>>
>>>> However delaying tlb_flushes() could certainly expose/hide stuff that is
>>>> accessing the dirty mechanism. tlb_flush itself now takes the tb_lock() to
>>>> avoid racing with the TB invalidation logic. The act of the flush will
>>>> certainly wipe all existing SoftMMU entries and force a re-load on each
>>>> memory access.
>>>>
>>>> So is the dirty status of memory being read from outside a vCPU
>>>> execution context?
>>>
>>> Like from the display controller models that use
>>> memory_region_get_dirty() to check if the frambuffer needs to be
>>> updated? But all display adaptors seem to do this and the problem was
>>> only seem on ppc so it may be related to something ppc specific.
>>
>> So this accesses the memory_region API which is under RCU control.
>> AFAIUI this should mean the dirty status may be read late (e.g. next
>> update) but should never be incorrect (e.g. miss a dirtying operation).
>
> AFAICT check_tlb_flush() gets passed a global parameter which if set
> true invalidates the TLB across all CPU TLBs rather than just the local
> CPU TLB - but then in this case we're only running with a single CPU so
> I can't see how this is relevant.

Not quite. tlb_flush used to take a global flag but it ignored it. It
has been removed in recent updates to the cputlb API ;-)

> Have you been able to reproduce the artifacts locally at all? I'm
> wondering if once the icount fixup patches are in, it might be easier to
> debug if enabling icount causes the artifacts to appear in a
> deterministic manner.

I have, and I can make them go away as well by forcing a full update.
See my other longer email for a description of what I think is
happening. Now I just need a neat and upstreamable fix ;-)

--
Alex Bennée

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Qemu-devel] [Qemu-ppc] qemu-system-ppc video artifacts since "tcg: drop global lock during TCG code execution"
  2017-03-14 17:34             ` BALATON Zoltan
  2017-03-14 17:53               ` luigi burdo
  2017-03-15 11:14               ` Alex Bennée
@ 2017-03-16  6:39               ` Paolo Bonzini
  2017-03-16  7:51                 ` Alex Bennée
  2 siblings, 1 reply; 25+ messages in thread
From: Paolo Bonzini @ 2017-03-16  6:39 UTC (permalink / raw)
  To: BALATON Zoltan, Alex Bennée
  Cc: jan.kiszka, Mark Cave-Ayland, qemu-devel, cota, qemu-ppc,
	bobby.prani, rth, fred.konrad



On 14/03/2017 18:34, BALATON Zoltan wrote:
> Like from the display controller models that use
> memory_region_get_dirty() to check if the frambuffer needs to be
> updated? But all display adaptors seem to do this and the problem was
> only seem on ppc so it may be related to something ppc specific.

You need to use test_and_clear_dirty instead of get_dirty/reset_dirty.
Or alternatively you need to reset immediately after get_dirty.  At
least cg3.c is doing

	read dirty bitmap
	read VRAM
	clear dirty bitmap

which has a race.

Paolo

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Qemu-devel] [Qemu-ppc] qemu-system-ppc video artifacts since "tcg: drop global lock during TCG code execution"
  2017-03-16  6:39               ` Paolo Bonzini
@ 2017-03-16  7:51                 ` Alex Bennée
  2017-03-16  8:34                   ` Paolo Bonzini
  0 siblings, 1 reply; 25+ messages in thread
From: Alex Bennée @ 2017-03-16  7:51 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: BALATON Zoltan, jan.kiszka, Mark Cave-Ayland, qemu-devel, cota,
	qemu-ppc, bobby.prani, rth, fred.konrad


Paolo Bonzini <pbonzini@redhat.com> writes:

> On 14/03/2017 18:34, BALATON Zoltan wrote:
>> Like from the display controller models that use
>> memory_region_get_dirty() to check if the frambuffer needs to be
>> updated? But all display adaptors seem to do this and the problem was
>> only seem on ppc so it may be related to something ppc specific.
>
> You need to use test_and_clear_dirty instead of get_dirty/reset_dirty.
> Or alternatively you need to reset immediately after get_dirty.  At
> least cg3.c is doing
>
> 	read dirty bitmap
> 	read VRAM
> 	clear dirty bitmap
>
> which has a race.

Are you saying this is also racy also in the KVM case or just that TCG
doesn't currently sync up with the current dirty bitmap mechanism?

AIUI the memory regions are under RCU so you always get a consistent
view (with updates after you take a copy going to the next iteration).
What I think needs doing is hooking into the ->log-sync mechanism to
reset SoftMMU TLB entries so the dirty detection carries on for the next
sync point?

--
Alex Bennée

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Qemu-devel] [Qemu-ppc] qemu-system-ppc video artifacts since "tcg: drop global lock during TCG code execution"
  2017-03-16  7:51                 ` Alex Bennée
@ 2017-03-16  8:34                   ` Paolo Bonzini
  2017-03-16 15:34                     ` Gerd Hoffmann
  0 siblings, 1 reply; 25+ messages in thread
From: Paolo Bonzini @ 2017-03-16  8:34 UTC (permalink / raw)
  To: Alex Bennée
  Cc: BALATON Zoltan, jan.kiszka, Mark Cave-Ayland, qemu-devel, cota,
	qemu-ppc, bobby.prani, rth, fred.konrad



On 16/03/2017 08:51, Alex Bennée wrote:
> 
> Paolo Bonzini <pbonzini@redhat.com> writes:
> 
>> On 14/03/2017 18:34, BALATON Zoltan wrote:
>>> Like from the display controller models that use
>>> memory_region_get_dirty() to check if the frambuffer needs to be
>>> updated? But all display adaptors seem to do this and the problem was
>>> only seem on ppc so it may be related to something ppc specific.
>>
>> You need to use test_and_clear_dirty instead of get_dirty/reset_dirty.
>> Or alternatively you need to reset immediately after get_dirty.  At
>> least cg3.c is doing
>>
>> 	read dirty bitmap
>> 	read VRAM
>> 	clear dirty bitmap
>>
>> which has a race.
> 
> Are you saying this is also racy also in the KVM case or just that TCG
> doesn't currently sync up with the current dirty bitmap mechanism?

It's okay for KVM because the dirty bitmap is copied from KVM by the
device itself, before updating the screen (with
memory_region_sync_dirty_bitmap).  For TCG, on the other hand, there is
full concurrency between the CPU that sets the bits and the device that
clears them.

> AIUI the memory regions are under RCU so you always get a consistent
> view (with updates after you take a copy going to the next iteration).

No, RCU only protects against resizes of the bitmap.  The bitmap is not
copied on every access (of course :)).

> What I think needs doing is hooking into the ->log-sync mechanism to
> reset SoftMMU TLB entries so the dirty detection carries on for the next
> sync point?

It's much simpler than that, just clear the dirty bitmap bit before
reading the memory.

Paolo

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Qemu-devel] [Qemu-ppc] qemu-system-ppc video artifacts since "tcg: drop global lock during TCG code execution"
  2017-03-16  8:34                   ` Paolo Bonzini
@ 2017-03-16 15:34                     ` Gerd Hoffmann
  2017-03-16 17:00                       ` Paolo Bonzini
  0 siblings, 1 reply; 25+ messages in thread
From: Gerd Hoffmann @ 2017-03-16 15:34 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Alex Bennée, jan.kiszka, Mark Cave-Ayland, qemu-devel, cota,
	qemu-ppc, bobby.prani, fred.konrad, rth

  Hi,

> >> least cg3.c is doing
> >>
> >> 	read dirty bitmap
> >> 	read VRAM
> >> 	clear dirty bitmap
> >>
> >> which has a race.

> It's much simpler than that, just clear the dirty bitmap bit before
> reading the memory.

Well, not *that* simple.  vga checks the dirty bitmap with scanline
granularity, like that:

  foreach (scanline) {
     if (get_dirty(scanline))
        update_scanline()
  }
  reset_dirty(framebuffer)

I suspect simply transforming that to

  foreach (scanline) {
     if (test_and_clear_dirty(scanline))
       update_scanline()
  }

is not going to fly due to page tracking working with page granularity.
With two subsequent scanlines within one page the second scanline will
never be updated because updating first clears the dirty bit of the
page ...

Looping twice over all scanlines, with the first loop just figuring
which scanlines are modified, then clear dirty bits, then update in a
second loop should work I think.  It'll duplicate a bunch of code
though, because in reality the loop isn't just three lines because of
doublescan, interlave and other funky stuff coming from CGA
compatibility.

Given that probably pretty much every display adapter is affected I'd
tend to take Alex patch for 2.9, then sort the mess in the 2.10 devel
cycle and revert the patch when done.

cheers,
  Gerd

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Qemu-devel] [Qemu-ppc] qemu-system-ppc video artifacts since "tcg: drop global lock during TCG code execution"
  2017-03-16 15:34                     ` Gerd Hoffmann
@ 2017-03-16 17:00                       ` Paolo Bonzini
  2017-03-28 13:40                         ` Gerd Hoffmann
  0 siblings, 1 reply; 25+ messages in thread
From: Paolo Bonzini @ 2017-03-16 17:00 UTC (permalink / raw)
  To: Gerd Hoffmann
  Cc: jan.kiszka, Mark Cave-Ayland, qemu-devel, cota, qemu-ppc,
	bobby.prani, rth, Alex Bennée, fred.konrad



On 16/03/2017 16:34, Gerd Hoffmann wrote:
> Well, not *that* simple.  vga checks the dirty bitmap with scanline
> granularity, like that:
> 
>   foreach (scanline) {
>      if (get_dirty(scanline))
>         update_scanline()
>   }
>   reset_dirty(framebuffer)
> 
> I suspect simply transforming that to
> 
>   foreach (scanline) {
>      if (test_and_clear_dirty(scanline))
>        update_scanline()
>   }
> 
> is not going to fly due to page tracking working with page granularity.
> With two subsequent scanlines within one page the second scanline will
> never be updated because updating first clears the dirty bit of the
> page ...
> 
> Looping twice over all scanlines, with the first loop just figuring
> which scanlines are modified, then clear dirty bits, then update in a
> second loop should work I think.  It'll duplicate a bunch of code
> though, because in reality the loop isn't just three lines because of
> doublescan, interlave and other funky stuff coming from CGA
> compatibility.
> 
> Given that probably pretty much every display adapter is affected I'd
> tend to take Alex patch for 2.9, then sort the mess in the 2.10 devel
> cycle and revert the patch when done.

You're right; an alternative is to copy the dirty bitmap to a local one
and clear the global one (the dirty bitmap for an 8 MB full HD frame
buffer is just 256 bytes).  With the right API to abstract the job, it
should be relatively easy to fix all adapters.

Paolo

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Qemu-devel] [Qemu-ppc] qemu-system-ppc video artifacts since "tcg: drop global lock during TCG code execution"
  2017-03-16 17:00                       ` Paolo Bonzini
@ 2017-03-28 13:40                         ` Gerd Hoffmann
  2017-03-28 14:13                           ` Paolo Bonzini
  0 siblings, 1 reply; 25+ messages in thread
From: Gerd Hoffmann @ 2017-03-28 13:40 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: jan.kiszka, Mark Cave-Ayland, qemu-devel, cota, qemu-ppc,
	bobby.prani, fred.konrad, Alex Bennée, rth

  Hi,

> > Well, not *that* simple.  vga checks the dirty bitmap with scanline
> > granularity, like that:
> > 
> >   foreach (scanline) {
> >      if (get_dirty(scanline))
> >         update_scanline()
> >   }
> >   reset_dirty(framebuffer)
> > 
> > I suspect simply transforming that to
> > 
> >   foreach (scanline) {
> >      if (test_and_clear_dirty(scanline))
> >        update_scanline()
> >   }
> > 
> > is not going to fly due to page tracking working with page granularity.
> > With two subsequent scanlines within one page the second scanline will
> > never be updated because updating first clears the dirty bit of the
> > page ...

> You're right; an alternative is to copy the dirty bitmap to a local one
> and clear the global one (the dirty bitmap for an 8 MB full HD frame
> buffer is just 256 bytes).  With the right API to abstract the job, it
> should be relatively easy to fix all adapters.

Started looking into this.  So I guess you are thinking about a variant
of memory_region_test_and_clear_dirty and/or
cpu_physical_memory_test_and_clear_dirty
which returns a bitmap instead of a bool?  Plus some helper function
which use the returned bitmap to figure whenever a specific scanline has
been touched or not?

cheers,
  Gerd

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Qemu-devel] [Qemu-ppc] qemu-system-ppc video artifacts since "tcg: drop global lock during TCG code execution"
  2017-03-28 13:40                         ` Gerd Hoffmann
@ 2017-03-28 14:13                           ` Paolo Bonzini
  0 siblings, 0 replies; 25+ messages in thread
From: Paolo Bonzini @ 2017-03-28 14:13 UTC (permalink / raw)
  To: Gerd Hoffmann
  Cc: jan.kiszka, Mark Cave-Ayland, qemu-devel, cota, qemu-ppc,
	bobby.prani, fred.konrad, Alex Bennée, rth



On 28/03/2017 15:40, Gerd Hoffmann wrote:
>   Hi,
> 
>>> Well, not *that* simple.  vga checks the dirty bitmap with scanline
>>> granularity, like that:
>>>
>>>   foreach (scanline) {
>>>      if (get_dirty(scanline))
>>>         update_scanline()
>>>   }
>>>   reset_dirty(framebuffer)
>>>
>>> I suspect simply transforming that to
>>>
>>>   foreach (scanline) {
>>>      if (test_and_clear_dirty(scanline))
>>>        update_scanline()
>>>   }
>>>
>>> is not going to fly due to page tracking working with page granularity.
>>> With two subsequent scanlines within one page the second scanline will
>>> never be updated because updating first clears the dirty bit of the
>>> page ...
> 
>> You're right; an alternative is to copy the dirty bitmap to a local one
>> and clear the global one (the dirty bitmap for an 8 MB full HD frame
>> buffer is just 256 bytes).  With the right API to abstract the job, it
>> should be relatively easy to fix all adapters.
> 
> Started looking into this.  So I guess you are thinking about a variant
> of memory_region_test_and_clear_dirty and/or
> cpu_physical_memory_test_and_clear_dirty
> which returns a bitmap instead of a bool?  Plus some helper function
> which use the returned bitmap to figure whenever a specific scanline has
> been touched or not?

Yes, exactly.

Paolo

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Qemu-devel] [Qemu-ppc] qemu-system-ppc video artifacts since "tcg: drop global lock during TCG code execution"
  2017-03-14 15:52         ` Mark Cave-Ayland
  2017-03-14 16:48           ` Alex Bennée
@ 2017-03-15 14:16           ` Alex Bennée
  2017-03-15 15:25             ` Gerd Hoffmann
  1 sibling, 1 reply; 25+ messages in thread
From: Alex Bennée @ 2017-03-15 14:16 UTC (permalink / raw)
  To: Mark Cave-Ayland
  Cc: BALATON Zoltan, jan.kiszka, qemu-devel, cota, qemu-ppc,
	bobby.prani, fred.konrad, rth, Gerd Hoffmann

Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk> writes:

> On 14/03/17 15:41, Alex Bennée wrote:
>
>> BALATON Zoltan <balaton@eik.bme.hu> writes:
>>
>>> On Tue, 14 Mar 2017, Mark Cave-Ayland wrote:
>>>> On 14/03/17 10:00, Alex Bennée wrote:
>>>>> Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk> writes:
>>>>>
>>>>>> I've recently noticed some video artifacts appearing in the form of
>>>>>> horizontal lines whilst testing OpenBIOS boot on some qemu-system-ppc
>>>>>> images (see https://www.ilande.co.uk/tmp/qemu/macos9-stripe.png for an
>>>>>> example) which I've finally bisected down to this commit.
<snip>
>
> Could it be that these underlying assumptions have now changed?

So there is an assumption that has changed. It used to be the TCG code
and the gui_update code could not run at the same time as they both
needed to take the BQL.

When drop global lock patch went in this was reversed. The BQL is no
longer held while running the TCG until it needs it.

It certainly needs it when it is doing HW emulation and for MMIO mapped
registers this is what happens. Every time we write to a MMIO register
the SoftMMU helper routines follow the slow path, through the memory API
(which takes a lock) and into the hw emulation code. However for
framebuffers our behaviour is slightly different.

Instead of having MMIO register spaces we use the dirty tracking
mechanism. Here regions are marked as for dirty tracking and when the
SoftMMU helper first comes to this bit of memory it will follow the slow
path and mark region as visited. Once done this bit is cleared and all
future writes to that page are written directly from the translated
code. This no longer has an implicit synchronisation from the BQL so
there is now a race and you can have memory being updated which might
miss this flagging.

I've sort of confirmed that by hacking full_update = 1 in
vga_update_display and the artefacts go away.

Note that KVM has some similar hacks to avoid trapping all writes to
video memory with its coalesced mmio mechanism however I'm not familiar
with all the details.

Now there are mechanisms we can use to ensure there are no races happen
and return to the situation that the display is only updated when the
TCG cores are not running. We simply queue up update function to run as
async safe work which will guarantee the TCG vCPUs will not be running
while stuff is updated. I'm a little hesitant to do that globally as it
may well slow down KVM by introducing an unneeded pause.

This also involves making a tweak into the ui/console.c code
(gui_update) so I'm CC'ing Gerd who might have an insight on how to do
this neatly.

--
Alex Bennée

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Qemu-devel] [Qemu-ppc] qemu-system-ppc video artifacts since "tcg: drop global lock during TCG code execution"
  2017-03-15 14:16           ` Alex Bennée
@ 2017-03-15 15:25             ` Gerd Hoffmann
  2017-03-15 16:20               ` Alex Bennée
  0 siblings, 1 reply; 25+ messages in thread
From: Gerd Hoffmann @ 2017-03-15 15:25 UTC (permalink / raw)
  To: Alex Bennée
  Cc: Mark Cave-Ayland, BALATON Zoltan, jan.kiszka, qemu-devel, cota,
	qemu-ppc, bobby.prani, fred.konrad, rth

  Hi,

> Instead of having MMIO register spaces we use the dirty tracking
> mechanism. Here regions are marked as for dirty tracking and when the
> SoftMMU helper first comes to this bit of memory it will follow the slow
> path and mark region as visited.

visited?  Do you mean dirty bit is set for the page?  Or is this
something else?

> Once done this bit is cleared and all
> future writes to that page are written directly from the translated
> code. This no longer has an implicit synchronisation from the BQL so
> there is now a race and you can have memory being updated which might
> miss this flagging.

Not fully following what you are trying to say.  But pages being updated
without dirty bit getting set (if clear) certainly is a problem for the
vga emulation (and live migration too).

> Note that KVM has some similar hacks to avoid trapping all writes to
> video memory with its coalesced mmio mechanism however I'm not familiar
> with all the details.

Normal linear framebuffer access doesn't use this.

> Now there are mechanisms we can use to ensure there are no races happen
> and return to the situation that the display is only updated when the
> TCG cores are not running.

tcg and display updates running in parallel isn't a problem, we have
that with kvm anyway.  Dirty bit handling must be correct though.

With kvm at the start of each display update vga fetches the dirty
bitmap from the kernel (memory_region_sync_dirty_bitmap).  Then it goes
use memory_region_get_dirty to figure which pages have been touched.

When memory_region_sync_dirty_bitmap is called the kernel will clear the
memory bitmap of the region and also map all pages read-only.  Next
guest update will pagefault and the kernel can set the dirty bit for the
page (maybe there is a more efficient way with EPT available).

I suspect the memory_region_sync_dirty_bitmap call on tcg should reset
the fast path optimization, so the slow path can update the dirty bits
correctly.

cheers,
  Gerd

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Qemu-devel] [Qemu-ppc] qemu-system-ppc video artifacts since "tcg: drop global lock during TCG code execution"
  2017-03-15 15:25             ` Gerd Hoffmann
@ 2017-03-15 16:20               ` Alex Bennée
  2017-03-16  7:35                 ` Gerd Hoffmann
  0 siblings, 1 reply; 25+ messages in thread
From: Alex Bennée @ 2017-03-15 16:20 UTC (permalink / raw)
  To: Gerd Hoffmann
  Cc: Mark Cave-Ayland, BALATON Zoltan, jan.kiszka, qemu-devel, cota,
	qemu-ppc, bobby.prani, fred.konrad, rth


Gerd Hoffmann <kraxel@redhat.com> writes:

>   Hi,
>
>> Instead of having MMIO register spaces we use the dirty tracking
>> mechanism. Here regions are marked as for dirty tracking and when the
>> SoftMMU helper first comes to this bit of memory it will follow the slow
>> path and mark region as visited.
>
> visited?  Do you mean dirty bit is set for the page?  Or is this
> something else?

Yes the dirty page bit (or the clearing of the TLB_NOTDIRTY bit from the
SoftMMU entry).

>
>> Once done this bit is cleared and all
>> future writes to that page are written directly from the translated
>> code. This no longer has an implicit synchronisation from the BQL so
>> there is now a race and you can have memory being updated which might
>> miss this flagging.
>
> Not fully following what you are trying to say.  But pages being updated
> without dirty bit getting set (if clear) certainly is a problem for the
> vga emulation (and live migration too).
>
>> Note that KVM has some similar hacks to avoid trapping all writes to
>> video memory with its coalesced mmio mechanism however I'm not familiar
>> with all the details.
>
> Normal linear framebuffer access doesn't use this.

Ahh OK - as I said I wasn't super familiar with what coalesced mmio was
trying to achieve. I assume it is trying to avoid trapping on every
single MMIO access?

>
>> Now there are mechanisms we can use to ensure there are no races happen
>> and return to the situation that the display is only updated when the
>> TCG cores are not running.
>
> tcg and display updates running in parallel isn't a problem, we have
> that with kvm anyway.  Dirty bit handling must be correct though.
>
> With kvm at the start of each display update vga fetches the dirty
> bitmap from the kernel (memory_region_sync_dirty_bitmap).  Then it goes
> use memory_region_get_dirty to figure which pages have been touched.
>
> When memory_region_sync_dirty_bitmap is called the kernel will clear the
> memory bitmap of the region and also map all pages read-only.  Next
> guest update will pagefault and the kernel can set the dirty bit for the
> page (maybe there is a more efficient way with EPT available).
>
> I suspect the memory_region_sync_dirty_bitmap call on tcg should reset
> the fast path optimization, so the slow path can update the dirty bits
> correctly.

Sure. And for the low level cputlb implementation we can clear those
bits atomically. However when the memory region is synced we also need
to flush any entries in the TLB that have already been hit and cleared
TLB_NOTDIRTY to we can trigger the slow path again. This is tricky from
outside of a vCPU context because we can't just queue the work and exit
the vCPU run loop to reach a known CPU state.

The RFC PATCH I sent earlier solves this by ensuring the update runs in
a quiescent period (i.e. when the vCPUs are not running) but it is
sub-optimal as it means the display code has to have a basic knowledge
of vCPUs and when they run.

--
Alex Bennée

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Qemu-devel] [Qemu-ppc] qemu-system-ppc video artifacts since "tcg: drop global lock during TCG code execution"
  2017-03-15 16:20               ` Alex Bennée
@ 2017-03-16  7:35                 ` Gerd Hoffmann
  2017-03-16  7:56                   ` Alex Bennée
  2017-03-16 14:22                   ` Paolo Bonzini
  0 siblings, 2 replies; 25+ messages in thread
From: Gerd Hoffmann @ 2017-03-16  7:35 UTC (permalink / raw)
  To: Alex Bennée
  Cc: Mark Cave-Ayland, BALATON Zoltan, jan.kiszka, qemu-devel, cota,
	qemu-ppc, bobby.prani, fred.konrad, rth

  Hi,

> >> Note that KVM has some similar hacks to avoid trapping all writes to
> >> video memory with its coalesced mmio mechanism however I'm not familiar
> >> with all the details.
> >
> > Normal linear framebuffer access doesn't use this.
> 
> Ahh OK - as I said I wasn't super familiar with what coalesced mmio was
> trying to achieve. I assume it is trying to avoid trapping on every
> single MMIO access?

You can't avoid the trap, but you can avoid notifying userspace about
each and every single trap and batch things.

But vga framebuffer is normal ram in most video modes, it doesn't trap
(except on first access with dirty bit clear, for dirty bit tracking).  

IIRC some of the historic video modes which are pretty much unused these
days (planar 4bit) trap on each access and perform better with coalesced
mmio.  Also nics which use mmio instead of dma for package xfer.

> > I suspect the memory_region_sync_dirty_bitmap call on tcg should reset
> > the fast path optimization, so the slow path can update the dirty bits
> > correctly.
> 
> Sure. And for the low level cputlb implementation we can clear those
> bits atomically. However when the memory region is synced we also need
> to flush any entries in the TLB that have already been hit and cleared
> TLB_NOTDIRTY to we can trigger the slow path again. This is tricky from
> outside of a vCPU context because we can't just queue the work and exit
> the vCPU run loop to reach a known CPU state.

Hmm, ok.

> The RFC PATCH I sent earlier solves this by ensuring the update runs in
> a quiescent period (i.e. when the vCPUs are not running) but it is
> sub-optimal as it means the display code has to have a basic knowledge
> of vCPUs and when they run.

Still the best option for 2.9 I guess ...

cheers,
  Gerd

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Qemu-devel] [Qemu-ppc] qemu-system-ppc video artifacts since "tcg: drop global lock during TCG code execution"
  2017-03-16  7:35                 ` Gerd Hoffmann
@ 2017-03-16  7:56                   ` Alex Bennée
  2017-03-16 14:22                   ` Paolo Bonzini
  1 sibling, 0 replies; 25+ messages in thread
From: Alex Bennée @ 2017-03-16  7:56 UTC (permalink / raw)
  To: Gerd Hoffmann
  Cc: Mark Cave-Ayland, BALATON Zoltan, jan.kiszka, qemu-devel, cota,
	qemu-ppc, bobby.prani, fred.konrad, rth


Gerd Hoffmann <kraxel@redhat.com> writes:

>   Hi,
>
>> >> Note that KVM has some similar hacks to avoid trapping all writes to
>> >> video memory with its coalesced mmio mechanism however I'm not familiar
>> >> with all the details.
>> >
>> > Normal linear framebuffer access doesn't use this.
>>
>> Ahh OK - as I said I wasn't super familiar with what coalesced mmio was
>> trying to achieve. I assume it is trying to avoid trapping on every
>> single MMIO access?
>
> You can't avoid the trap, but you can avoid notifying userspace about
> each and every single trap and batch things.
>
> But vga framebuffer is normal ram in most video modes, it doesn't trap
> (except on first access with dirty bit clear, for dirty bit tracking).
>
> IIRC some of the historic video modes which are pretty much unused these
> days (planar 4bit) trap on each access and perform better with coalesced
> mmio.  Also nics which use mmio instead of dma for package xfer.
>
>> > I suspect the memory_region_sync_dirty_bitmap call on tcg should reset
>> > the fast path optimization, so the slow path can update the dirty bits
>> > correctly.
>>
>> Sure. And for the low level cputlb implementation we can clear those
>> bits atomically. However when the memory region is synced we also need
>> to flush any entries in the TLB that have already been hit and cleared
>> TLB_NOTDIRTY to we can trigger the slow path again. This is tricky from
>> outside of a vCPU context because we can't just queue the work and exit
>> the vCPU run loop to reach a known CPU state.
>
> Hmm, ok.
>
>> The RFC PATCH I sent earlier solves this by ensuring the update runs in
>> a quiescent period (i.e. when the vCPUs are not running) but it is
>> sub-optimal as it means the display code has to have a basic knowledge
>> of vCPUs and when they run.
>
> Still the best option for 2.9 I guess ...

I think so although Paolo has pointed out potential issues with the
other display adaptors.

As this dirty tracking is also used for live migration I also need to
check if the BQL change has broken that for TCG guests or that it just
never worked anyway. If it is the later then that gives a bit more
breathing room for plumbing in the full fix to keep the memory_regions
view of the world consistent with the SoftMMU TLB.

>
> cheers,
>   Gerd


--
Alex Bennée

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Qemu-devel] [Qemu-ppc] qemu-system-ppc video artifacts since "tcg: drop global lock during TCG code execution"
  2017-03-16  7:35                 ` Gerd Hoffmann
  2017-03-16  7:56                   ` Alex Bennée
@ 2017-03-16 14:22                   ` Paolo Bonzini
  1 sibling, 0 replies; 25+ messages in thread
From: Paolo Bonzini @ 2017-03-16 14:22 UTC (permalink / raw)
  To: Gerd Hoffmann, Alex Bennée
  Cc: jan.kiszka, Mark Cave-Ayland, qemu-devel, cota, qemu-ppc,
	bobby.prani, rth, fred.konrad



On 16/03/2017 08:35, Gerd Hoffmann wrote:
>   Hi,
> 
>>>> Note that KVM has some similar hacks to avoid trapping all writes to
>>>> video memory with its coalesced mmio mechanism however I'm not familiar
>>>> with all the details.
>>>
>>> Normal linear framebuffer access doesn't use this.
>>
>> Ahh OK - as I said I wasn't super familiar with what coalesced mmio was
>> trying to achieve. I assume it is trying to avoid trapping on every
>> single MMIO access?
> 
> You can't avoid the trap, but you can avoid notifying userspace about
> each and every single trap and batch things.
> 
> But vga framebuffer is normal ram in most video modes, it doesn't trap
> (except on first access with dirty bit clear, for dirty bit tracking).  
> 
> IIRC some of the historic video modes which are pretty much unused these
> days (planar 4bit) trap on each access and perform better with coalesced
> mmio.  Also nics which use mmio instead of dma for package xfer.
> 
>>> I suspect the memory_region_sync_dirty_bitmap call on tcg should reset
>>> the fast path optimization, so the slow path can update the dirty bits
>>> correctly.
>>
>> Sure. And for the low level cputlb implementation we can clear those
>> bits atomically. However when the memory region is synced we also need
>> to flush any entries in the TLB that have already been hit and cleared
>> TLB_NOTDIRTY to we can trigger the slow path again. This is tricky from
>> outside of a vCPU context because we can't just queue the work and exit
>> the vCPU run loop to reach a known CPU state.
> 
> Hmm, ok.
> 
>> The RFC PATCH I sent earlier solves this by ensuring the update runs in
>> a quiescent period (i.e. when the vCPUs are not running) but it is
>> sub-optimal as it means the display code has to have a basic knowledge
>> of vCPUs and when they run.
> 
> Still the best option for 2.9 I guess ...

For both VGA and SM501, it should be the issue I pointed out in
https://lists.gnu.org/archive/html/qemu-devel/2017-03/msg03197.html.
The dirty bitmap has been thread-safe for a long time now.

Paolo

^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2017-03-28 14:14 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <36e41adf-b0b3-3efa-51c4-f1a70cd05b98@ilande.co.uk>
2017-03-13 21:28 ` [Qemu-devel] qemu-system-ppc video artifacts since "tcg: drop global lock during TCG code execution" Mark Cave-Ayland
     [not found] ` <87wpbsp49a.fsf@linaro.org>
2017-03-14 11:12   ` Mark Cave-Ayland
2017-03-14 13:56     ` [Qemu-devel] [Qemu-ppc] " BALATON Zoltan
2017-03-14 15:02       ` luigi burdo
2017-03-14 15:41       ` Alex Bennée
2017-03-14 15:52         ` Mark Cave-Ayland
2017-03-14 16:48           ` Alex Bennée
2017-03-14 17:34             ` BALATON Zoltan
2017-03-14 17:53               ` luigi burdo
2017-03-15 11:14               ` Alex Bennée
2017-03-15 13:26                 ` Mark Cave-Ayland
2017-03-15 14:19                   ` Alex Bennée
2017-03-16  6:39               ` Paolo Bonzini
2017-03-16  7:51                 ` Alex Bennée
2017-03-16  8:34                   ` Paolo Bonzini
2017-03-16 15:34                     ` Gerd Hoffmann
2017-03-16 17:00                       ` Paolo Bonzini
2017-03-28 13:40                         ` Gerd Hoffmann
2017-03-28 14:13                           ` Paolo Bonzini
2017-03-15 14:16           ` Alex Bennée
2017-03-15 15:25             ` Gerd Hoffmann
2017-03-15 16:20               ` Alex Bennée
2017-03-16  7:35                 ` Gerd Hoffmann
2017-03-16  7:56                   ` Alex Bennée
2017-03-16 14:22                   ` Paolo Bonzini

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.