* Re: [Qemu-devel] qemu-system-ppc video artifacts since "tcg: drop global lock during TCG code execution" [not found] <36e41adf-b0b3-3efa-51c4-f1a70cd05b98@ilande.co.uk> @ 2017-03-13 21:28 ` Mark Cave-Ayland [not found] ` <87wpbsp49a.fsf@linaro.org> 1 sibling, 0 replies; 25+ messages in thread From: Mark Cave-Ayland @ 2017-03-13 21:28 UTC (permalink / raw) To: jan.kiszka, fred.konrad, cota, alex.bennee, rth, bobby.prani, qemu-devel, qemu-ppc On 13/03/17 21:25, Mark Cave-Ayland wrote: > I've recently noticed some video artifacts appearing in the form of > horizontal lines whilst testing OpenBIOS boot on some qemu-system-ppc > images (see https://www.ilande.co.uk/tmp/qemu/macos9-stripe.png for an > example) which I've finally bisected down to this commit. > > The horizontal lines tend to appear at different positions with > different lengths, although it seems to be more common that a complete > scanline is affected. Reproducing the issue is fairly easy with a MacOS > 9.2.1 ISO and the command line below: > > ./qemu-system-ppc -cdrom MacOS921.iso -boot d -m 512 -M mac99 > > Could it be that this patch is still missing a lock somewhere which > causes these artifacts to appear? (Apologies - resending as forgot to add CCs for qemu-devel and qemu-ppc) ATB, Mark. ^ permalink raw reply [flat|nested] 25+ messages in thread
[parent not found: <87wpbsp49a.fsf@linaro.org>]
* Re: [Qemu-devel] qemu-system-ppc video artifacts since "tcg: drop global lock during TCG code execution" [not found] ` <87wpbsp49a.fsf@linaro.org> @ 2017-03-14 11:12 ` Mark Cave-Ayland 2017-03-14 13:56 ` [Qemu-devel] [Qemu-ppc] " BALATON Zoltan 0 siblings, 1 reply; 25+ messages in thread From: Mark Cave-Ayland @ 2017-03-14 11:12 UTC (permalink / raw) To: Alex Bennée Cc: jan.kiszka, fred.konrad, cota, rth, bobby.prani, qemu-devel, qemu-ppc On 14/03/17 10:00, Alex Bennée wrote: > Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk> writes: > >> I've recently noticed some video artifacts appearing in the form of >> horizontal lines whilst testing OpenBIOS boot on some qemu-system-ppc >> images (see https://www.ilande.co.uk/tmp/qemu/macos9-stripe.png for an >> example) which I've finally bisected down to this commit. > > Interesting. Is the video hardware in this case a simple framebuffer or > is there some sort of GPU involved? For the mac99 machine it's just the normal QEMU PCI std-vga card, so nothing special. Having run tests across SPARC which uses its own framebuffer, there have been no obvious artifacts that I've spotted there to date. >> The horizontal lines tend to appear at different positions with >> different lengths, although it seems to be more common that a complete >> scanline is affected. Reproducing the issue is fairly easy with a MacOS >> 9.2.1 ISO and the command line below: >> >> ./qemu-system-ppc -cdrom MacOS921.iso -boot d -m 512 -M mac99 >> >> Could it be that this patch is still missing a lock somewhere which >> causes these artifacts to appear? > > It depends. If the hardware is accessed via MMIO then the BQL is taken > automatically (unless the area is explicitly marked as doing its own > locking see: mr->global_locking). > > Is the artefact temporary or permanent? Could it be a change in > synchronisation between the emulator updating the framebuffer and > whatever backend updating its display? AFAICT they are temporary in that they appear randomly on the display, but the next time that particular area of the display is updated by the guest then they tend to either change/disappear. MacOS 9 is interesting in that it's effectively an emulator within an emulator, i.e. 68K code running on a PPC emulator and so does tend to stress QEMU in interesting places. Given that I've also seen the occasional issue with Darwin/OS X it seems the problem isn't just constrained to OS 9, however it seems to occur a lot more often there. ATB, Mark. ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [Qemu-devel] [Qemu-ppc] qemu-system-ppc video artifacts since "tcg: drop global lock during TCG code execution" 2017-03-14 11:12 ` Mark Cave-Ayland @ 2017-03-14 13:56 ` BALATON Zoltan 2017-03-14 15:02 ` luigi burdo 2017-03-14 15:41 ` Alex Bennée 0 siblings, 2 replies; 25+ messages in thread From: BALATON Zoltan @ 2017-03-14 13:56 UTC (permalink / raw) To: Mark Cave-Ayland Cc: Alex Bennée, jan.kiszka, qemu-devel, cota, qemu-ppc, bobby.prani, fred.konrad, rth On Tue, 14 Mar 2017, Mark Cave-Ayland wrote: > On 14/03/17 10:00, Alex Bennée wrote: >> Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk> writes: >> >>> I've recently noticed some video artifacts appearing in the form of >>> horizontal lines whilst testing OpenBIOS boot on some qemu-system-ppc >>> images (see https://www.ilande.co.uk/tmp/qemu/macos9-stripe.png for an >>> example) which I've finally bisected down to this commit. >> >> Interesting. Is the video hardware in this case a simple framebuffer or >> is there some sort of GPU involved? > > For the mac99 machine it's just the normal QEMU PCI std-vga card, so > nothing special. Having run tests across SPARC which uses its own > framebuffer, there have been no obvious artifacts that I've spotted > there to date. > >>> The horizontal lines tend to appear at different positions with >>> different lengths, although it seems to be more common that a complete >>> scanline is affected. Reproducing the issue is fairly easy with a MacOS >>> 9.2.1 ISO and the command line below: >>> >>> ./qemu-system-ppc -cdrom MacOS921.iso -boot d -m 512 -M mac99 >>> >>> Could it be that this patch is still missing a lock somewhere which >>> causes these artifacts to appear? >> >> It depends. If the hardware is accessed via MMIO then the BQL is taken >> automatically (unless the area is explicitly marked as doing its own >> locking see: mr->global_locking). >> >> Is the artefact temporary or permanent? Could it be a change in >> synchronisation between the emulator updating the framebuffer and >> whatever backend updating its display? > > AFAICT they are temporary in that they appear randomly on the display, > but the next time that particular area of the display is updated by the > guest then they tend to either change/disappear. I've also noticed artifacts when testing the SM501 changes with MorphOS on recent QEMU so it may not be related to just the std vga. It's shown as bands (larger number of adjacent scanlines) not updating when the client first wrote to the framebuffer but then they disappeared during the next refresh. I'm guessing it may be related to dirty checking of the framebuffer memory which is used to decide when a scan line needs update? Regards, BALATON Zoltan ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [Qemu-devel] [Qemu-ppc] qemu-system-ppc video artifacts since "tcg: drop global lock during TCG code execution" 2017-03-14 13:56 ` [Qemu-devel] [Qemu-ppc] " BALATON Zoltan @ 2017-03-14 15:02 ` luigi burdo 2017-03-14 15:41 ` Alex Bennée 1 sibling, 0 replies; 25+ messages in thread From: luigi burdo @ 2017-03-14 15:02 UTC (permalink / raw) To: BALATON Zoltan, Mark Cave-Ayland Cc: jan.kiszka, qemu-devel, cota, qemu-ppc, bobby.prani, rth, Alex Bennée, fred.konrad framebuffer memory which is used to decide when a scan line needs update? Hi Balton i have this artifacts since 2.20 on Linux PPC. are more evident if the sdl driver is used, much less on GTK but still present there too plus GTK have issue with pointer if gl is enabled (reported in past in bugzilla) I found it in STD when kvm is enabled on ppc32, virtio-vga gave this issue much less . (qemu-system-PPC64 have virtio enabled), The strange thing is the std dont gave this issue on x86 emulation via tcg... I hope one day we will have virtio on ppc32 too. Luigi ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [Qemu-devel] [Qemu-ppc] qemu-system-ppc video artifacts since "tcg: drop global lock during TCG code execution" 2017-03-14 13:56 ` [Qemu-devel] [Qemu-ppc] " BALATON Zoltan 2017-03-14 15:02 ` luigi burdo @ 2017-03-14 15:41 ` Alex Bennée 2017-03-14 15:52 ` Mark Cave-Ayland 1 sibling, 1 reply; 25+ messages in thread From: Alex Bennée @ 2017-03-14 15:41 UTC (permalink / raw) To: BALATON Zoltan Cc: Mark Cave-Ayland, jan.kiszka, qemu-devel, cota, qemu-ppc, bobby.prani, fred.konrad, rth BALATON Zoltan <balaton@eik.bme.hu> writes: > On Tue, 14 Mar 2017, Mark Cave-Ayland wrote: >> On 14/03/17 10:00, Alex Bennée wrote: >>> Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk> writes: >>> >>>> I've recently noticed some video artifacts appearing in the form of >>>> horizontal lines whilst testing OpenBIOS boot on some qemu-system-ppc >>>> images (see https://www.ilande.co.uk/tmp/qemu/macos9-stripe.png for an >>>> example) which I've finally bisected down to this commit. >>> >>> Interesting. Is the video hardware in this case a simple framebuffer or >>> is there some sort of GPU involved? >> >> For the mac99 machine it's just the normal QEMU PCI std-vga card, so >> nothing special. Having run tests across SPARC which uses its own >> framebuffer, there have been no obvious artifacts that I've spotted >> there to date. >> >>>> The horizontal lines tend to appear at different positions with >>>> different lengths, although it seems to be more common that a complete >>>> scanline is affected. Reproducing the issue is fairly easy with a MacOS >>>> 9.2.1 ISO and the command line below: >>>> >>>> ./qemu-system-ppc -cdrom MacOS921.iso -boot d -m 512 -M mac99 >>>> >>>> Could it be that this patch is still missing a lock somewhere which >>>> causes these artifacts to appear? >>> >>> It depends. If the hardware is accessed via MMIO then the BQL is taken >>> automatically (unless the area is explicitly marked as doing its own >>> locking see: mr->global_locking). >>> >>> Is the artefact temporary or permanent? Could it be a change in >>> synchronisation between the emulator updating the framebuffer and >>> whatever backend updating its display? >> >> AFAICT they are temporary in that they appear randomly on the display, >> but the next time that particular area of the display is updated by the >> guest then they tend to either change/disappear. > > I've also noticed artifacts when testing the SM501 changes with > MorphOS on recent QEMU so it may not be related to just the std vga. > It's shown as bands (larger number of adjacent scanlines) not updating > when the client first wrote to the framebuffer but then they > disappeared during the next refresh. I'm guessing it may be related to > dirty checking of the framebuffer memory which is used to decide when > a scan line needs update? Interesting. I guess if the corrupted scan lines are in blocks of PAGE_SIZE that might make sense. Commit (b0706b716769494f321a0d2bfd9fa9893992f995) made tlb_reset_dirty_range update the SoftMMU addr_write entry atomically. Now I don't see how that could race in single threaded TCG mode but it could explain things. I notice that tlb_set_dirty/tlb_set_dirty1 should maybe do the same. The currently assume they are only called from the CPU's context. If you enable #define DEBUG_TLB in cputlb.c does the assert fire? > > Regards, > BALATON Zoltan -- Alex Bennée ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [Qemu-devel] [Qemu-ppc] qemu-system-ppc video artifacts since "tcg: drop global lock during TCG code execution" 2017-03-14 15:41 ` Alex Bennée @ 2017-03-14 15:52 ` Mark Cave-Ayland 2017-03-14 16:48 ` Alex Bennée 2017-03-15 14:16 ` Alex Bennée 0 siblings, 2 replies; 25+ messages in thread From: Mark Cave-Ayland @ 2017-03-14 15:52 UTC (permalink / raw) To: Alex Bennée, BALATON Zoltan Cc: jan.kiszka, qemu-devel, cota, qemu-ppc, bobby.prani, fred.konrad, rth On 14/03/17 15:41, Alex Bennée wrote: > BALATON Zoltan <balaton@eik.bme.hu> writes: > >> On Tue, 14 Mar 2017, Mark Cave-Ayland wrote: >>> On 14/03/17 10:00, Alex Bennée wrote: >>>> Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk> writes: >>>> >>>>> I've recently noticed some video artifacts appearing in the form of >>>>> horizontal lines whilst testing OpenBIOS boot on some qemu-system-ppc >>>>> images (see https://www.ilande.co.uk/tmp/qemu/macos9-stripe.png for an >>>>> example) which I've finally bisected down to this commit. >>>> >>>> Interesting. Is the video hardware in this case a simple framebuffer or >>>> is there some sort of GPU involved? >>> >>> For the mac99 machine it's just the normal QEMU PCI std-vga card, so >>> nothing special. Having run tests across SPARC which uses its own >>> framebuffer, there have been no obvious artifacts that I've spotted >>> there to date. >>> >>>>> The horizontal lines tend to appear at different positions with >>>>> different lengths, although it seems to be more common that a complete >>>>> scanline is affected. Reproducing the issue is fairly easy with a MacOS >>>>> 9.2.1 ISO and the command line below: >>>>> >>>>> ./qemu-system-ppc -cdrom MacOS921.iso -boot d -m 512 -M mac99 >>>>> >>>>> Could it be that this patch is still missing a lock somewhere which >>>>> causes these artifacts to appear? >>>> >>>> It depends. If the hardware is accessed via MMIO then the BQL is taken >>>> automatically (unless the area is explicitly marked as doing its own >>>> locking see: mr->global_locking). >>>> >>>> Is the artefact temporary or permanent? Could it be a change in >>>> synchronisation between the emulator updating the framebuffer and >>>> whatever backend updating its display? >>> >>> AFAICT they are temporary in that they appear randomly on the display, >>> but the next time that particular area of the display is updated by the >>> guest then they tend to either change/disappear. >> >> I've also noticed artifacts when testing the SM501 changes with >> MorphOS on recent QEMU so it may not be related to just the std vga. >> It's shown as bands (larger number of adjacent scanlines) not updating >> when the client first wrote to the framebuffer but then they >> disappeared during the next refresh. I'm guessing it may be related to >> dirty checking of the framebuffer memory which is used to decide when >> a scan line needs update? > > Interesting. I guess if the corrupted scan lines are in blocks of > PAGE_SIZE that might make sense. > > Commit (b0706b716769494f321a0d2bfd9fa9893992f995) made > tlb_reset_dirty_range update the SoftMMU addr_write entry atomically. > Now I don't see how that could race in single threaded TCG mode but it > could explain things. > > I notice that tlb_set_dirty/tlb_set_dirty1 should maybe do the same. The > currently assume they are only called from the CPU's context. If you > enable #define DEBUG_TLB in cputlb.c does the assert fire? Not in a quick local test here. However this does ring a bell - one of Ben's PPC optimisations was to batch TLB flushes so they only occur at certain times - see commit cd0c6f473532bfaf20a095bc90a18e45162981b5 for more detail. Could it be that these underlying assumptions have now changed? ATB, Mark. ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [Qemu-devel] [Qemu-ppc] qemu-system-ppc video artifacts since "tcg: drop global lock during TCG code execution" 2017-03-14 15:52 ` Mark Cave-Ayland @ 2017-03-14 16:48 ` Alex Bennée 2017-03-14 17:34 ` BALATON Zoltan 2017-03-15 14:16 ` Alex Bennée 1 sibling, 1 reply; 25+ messages in thread From: Alex Bennée @ 2017-03-14 16:48 UTC (permalink / raw) To: Mark Cave-Ayland Cc: BALATON Zoltan, jan.kiszka, qemu-devel, cota, qemu-ppc, bobby.prani, fred.konrad, rth Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk> writes: > On 14/03/17 15:41, Alex Bennée wrote: > >> BALATON Zoltan <balaton@eik.bme.hu> writes: >> >>> On Tue, 14 Mar 2017, Mark Cave-Ayland wrote: >>>> On 14/03/17 10:00, Alex Bennée wrote: >>>>> Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk> writes: >>>>> >>>>>> I've recently noticed some video artifacts appearing in the form of >>>>>> horizontal lines whilst testing OpenBIOS boot on some qemu-system-ppc >>>>>> images (see https://www.ilande.co.uk/tmp/qemu/macos9-stripe.png for an >>>>>> example) which I've finally bisected down to this commit. >>>>> >>>>> Interesting. Is the video hardware in this case a simple framebuffer or >>>>> is there some sort of GPU involved? >>>> >>>> For the mac99 machine it's just the normal QEMU PCI std-vga card, so >>>> nothing special. Having run tests across SPARC which uses its own >>>> framebuffer, there have been no obvious artifacts that I've spotted >>>> there to date. >>>> >>>>>> The horizontal lines tend to appear at different positions with >>>>>> different lengths, although it seems to be more common that a complete >>>>>> scanline is affected. Reproducing the issue is fairly easy with a MacOS >>>>>> 9.2.1 ISO and the command line below: >>>>>> >>>>>> ./qemu-system-ppc -cdrom MacOS921.iso -boot d -m 512 -M mac99 >>>>>> >>>>>> Could it be that this patch is still missing a lock somewhere which >>>>>> causes these artifacts to appear? >>>>> >>>>> It depends. If the hardware is accessed via MMIO then the BQL is taken >>>>> automatically (unless the area is explicitly marked as doing its own >>>>> locking see: mr->global_locking). >>>>> >>>>> Is the artefact temporary or permanent? Could it be a change in >>>>> synchronisation between the emulator updating the framebuffer and >>>>> whatever backend updating its display? >>>> >>>> AFAICT they are temporary in that they appear randomly on the display, >>>> but the next time that particular area of the display is updated by the >>>> guest then they tend to either change/disappear. >>> >>> I've also noticed artifacts when testing the SM501 changes with >>> MorphOS on recent QEMU so it may not be related to just the std vga. >>> It's shown as bands (larger number of adjacent scanlines) not updating >>> when the client first wrote to the framebuffer but then they >>> disappeared during the next refresh. I'm guessing it may be related to >>> dirty checking of the framebuffer memory which is used to decide when >>> a scan line needs update? >> >> Interesting. I guess if the corrupted scan lines are in blocks of >> PAGE_SIZE that might make sense. >> >> Commit (b0706b716769494f321a0d2bfd9fa9893992f995) made >> tlb_reset_dirty_range update the SoftMMU addr_write entry atomically. >> Now I don't see how that could race in single threaded TCG mode but it >> could explain things. >> >> I notice that tlb_set_dirty/tlb_set_dirty1 should maybe do the same. The >> currently assume they are only called from the CPU's context. If you >> enable #define DEBUG_TLB in cputlb.c does the assert fire? > > Not in a quick local test here. I've just realised that assert will never fire because all vCPUS in single threaded mode share the same thread id: cpus.c/qemu_tcg_init_vcpu: /* For non-MTTCG cases we share the thread */ cpu->thread = single_tcg_cpu_thread; cpu->halt_cond = single_tcg_halt_cond; > However this does ring a bell - one of > Ben's PPC optimisations was to batch TLB flushes so they only occur at > certain times - see commit cd0c6f473532bfaf20a095bc90a18e45162981b5 for > more detail. > > Could it be that these underlying assumptions have now changed? Hmm I'm not sure. So from a single-threaded -smp guest case there should be no difference in behaviour. However cross-vCPU flushes are queued up using the async work queue and are dealt with in the target vCPU context. In the single threaded case it shouldn't matter as this work will get executed as soon as round-robin scheduler gets to it: cpus.c/qemu_tcg_rr_cpu_thread_fn: while (cpu && !cpu->queued_work_first && !cpu->exit_request) { When converting a target to MTTCG its certainly something that has to have attention paid to it. For example some cross-vCPU tlb flushes need to be complete from the source vCPU point of view. In this case you call the tlb_flush_*_synced() variants and exit the execution loop. This ensures all vCPUs have completed flushes before we continue. See a67cf2772733e for what I did on ARM. However this shouldn't affect anything in the single-threaded world. However delaying tlb_flushes() could certainly expose/hide stuff that is accessing the dirty mechanism. tlb_flush itself now takes the tb_lock() to avoid racing with the TB invalidation logic. The act of the flush will certainly wipe all existing SoftMMU entries and force a re-load on each memory access. So is the dirty status of memory being read from outside a vCPU execution context? -- Alex Bennée ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [Qemu-devel] [Qemu-ppc] qemu-system-ppc video artifacts since "tcg: drop global lock during TCG code execution" 2017-03-14 16:48 ` Alex Bennée @ 2017-03-14 17:34 ` BALATON Zoltan 2017-03-14 17:53 ` luigi burdo ` (2 more replies) 0 siblings, 3 replies; 25+ messages in thread From: BALATON Zoltan @ 2017-03-14 17:34 UTC (permalink / raw) To: Alex Bennée Cc: Mark Cave-Ayland, jan.kiszka, qemu-devel, cota, qemu-ppc, bobby.prani, fred.konrad, rth On Tue, 14 Mar 2017, Alex Bennée wrote: > So from a single-threaded -smp guest case there should be no difference > in behaviour. However cross-vCPU flushes are queued up using the async > work queue and are dealt with in the target vCPU context. In the single > threaded case it shouldn't matter as this work will get executed as soon > as round-robin scheduler gets to it: > > cpus.c/qemu_tcg_rr_cpu_thread_fn: > while (cpu && !cpu->queued_work_first && !cpu->exit_request) { > > When converting a target to MTTCG its certainly something that has to > have attention paid to it. For example some cross-vCPU tlb flushes need > to be complete from the source vCPU point of view. In this case you call > the tlb_flush_*_synced() variants and exit the execution loop. This > ensures all vCPUs have completed flushes before we continue. See > a67cf2772733e for what I did on ARM. However this shouldn't affect > anything in the single-threaded world. I think we have a single CPU and thread for these ppc machines here so I'm not sure how this could be relevant. > However delaying tlb_flushes() could certainly expose/hide stuff that is > accessing the dirty mechanism. tlb_flush itself now takes the tb_lock() to > avoid racing with the TB invalidation logic. The act of the flush will > certainly wipe all existing SoftMMU entries and force a re-load on each > memory access. > > So is the dirty status of memory being read from outside a vCPU > execution context? Like from the display controller models that use memory_region_get_dirty() to check if the frambuffer needs to be updated? But all display adaptors seem to do this and the problem was only seem on ppc so it may be related to something ppc specific. Regards, BALATON Zoltan ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [Qemu-devel] [Qemu-ppc] qemu-system-ppc video artifacts since "tcg: drop global lock during TCG code execution" 2017-03-14 17:34 ` BALATON Zoltan @ 2017-03-14 17:53 ` luigi burdo 2017-03-15 11:14 ` Alex Bennée 2017-03-16 6:39 ` Paolo Bonzini 2 siblings, 0 replies; 25+ messages in thread From: luigi burdo @ 2017-03-14 17:53 UTC (permalink / raw) To: BALATON Zoltan, Alex Bennée Cc: jan.kiszka, qemu-devel, cota, qemu-ppc, bobby.prani, rth, fred.konrad Zoltan, i tested on ppcel debian there isnt issue reported . i think is only a BE issue. Luigi Like from the display controller models that use memory_region_get_dirty() to check if the frambuffer needs to be updated? But all display adaptors seem to do this and the problem was only seem on ppc so it may be related to something ppc specific. Regards, BALATON Zoltan ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [Qemu-devel] [Qemu-ppc] qemu-system-ppc video artifacts since "tcg: drop global lock during TCG code execution" 2017-03-14 17:34 ` BALATON Zoltan 2017-03-14 17:53 ` luigi burdo @ 2017-03-15 11:14 ` Alex Bennée 2017-03-15 13:26 ` Mark Cave-Ayland 2017-03-16 6:39 ` Paolo Bonzini 2 siblings, 1 reply; 25+ messages in thread From: Alex Bennée @ 2017-03-15 11:14 UTC (permalink / raw) To: BALATON Zoltan Cc: Mark Cave-Ayland, jan.kiszka, qemu-devel, cota, qemu-ppc, bobby.prani, fred.konrad, rth BALATON Zoltan <balaton@eik.bme.hu> writes: > On Tue, 14 Mar 2017, Alex Bennée wrote: >> So from a single-threaded -smp guest case there should be no difference >> in behaviour. <snip> >>However this shouldn't affect >> anything in the single-threaded world. > > I think we have a single CPU and thread for these ppc machines here so > I'm not sure how this could be relevant. > >> However delaying tlb_flushes() could certainly expose/hide stuff that is >> accessing the dirty mechanism. tlb_flush itself now takes the tb_lock() to >> avoid racing with the TB invalidation logic. The act of the flush will >> certainly wipe all existing SoftMMU entries and force a re-load on each >> memory access. >> >> So is the dirty status of memory being read from outside a vCPU >> execution context? > > Like from the display controller models that use > memory_region_get_dirty() to check if the frambuffer needs to be > updated? But all display adaptors seem to do this and the problem was > only seem on ppc so it may be related to something ppc specific. So this accesses the memory_region API which is under RCU control. AFAIUI this should mean the dirty status may be read late (e.g. next update) but should never be incorrect (e.g. miss a dirtying operation). -- Alex Bennée ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [Qemu-devel] [Qemu-ppc] qemu-system-ppc video artifacts since "tcg: drop global lock during TCG code execution" 2017-03-15 11:14 ` Alex Bennée @ 2017-03-15 13:26 ` Mark Cave-Ayland 2017-03-15 14:19 ` Alex Bennée 0 siblings, 1 reply; 25+ messages in thread From: Mark Cave-Ayland @ 2017-03-15 13:26 UTC (permalink / raw) To: Alex Bennée, BALATON Zoltan Cc: jan.kiszka, qemu-devel, cota, qemu-ppc, bobby.prani, rth, fred.konrad On 15/03/17 11:14, Alex Bennée wrote: > BALATON Zoltan <balaton@eik.bme.hu> writes: > >> On Tue, 14 Mar 2017, Alex Bennée wrote: >>> So from a single-threaded -smp guest case there should be no difference >>> in behaviour. > <snip> >>> However this shouldn't affect >>> anything in the single-threaded world. >> >> I think we have a single CPU and thread for these ppc machines here so >> I'm not sure how this could be relevant. >> >>> However delaying tlb_flushes() could certainly expose/hide stuff that is >>> accessing the dirty mechanism. tlb_flush itself now takes the tb_lock() to >>> avoid racing with the TB invalidation logic. The act of the flush will >>> certainly wipe all existing SoftMMU entries and force a re-load on each >>> memory access. >>> >>> So is the dirty status of memory being read from outside a vCPU >>> execution context? >> >> Like from the display controller models that use >> memory_region_get_dirty() to check if the frambuffer needs to be >> updated? But all display adaptors seem to do this and the problem was >> only seem on ppc so it may be related to something ppc specific. > > So this accesses the memory_region API which is under RCU control. > AFAIUI this should mean the dirty status may be read late (e.g. next > update) but should never be incorrect (e.g. miss a dirtying operation). AFAICT check_tlb_flush() gets passed a global parameter which if set true invalidates the TLB across all CPU TLBs rather than just the local CPU TLB - but then in this case we're only running with a single CPU so I can't see how this is relevant. Have you been able to reproduce the artifacts locally at all? I'm wondering if once the icount fixup patches are in, it might be easier to debug if enabling icount causes the artifacts to appear in a deterministic manner. ATB, Mark. ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [Qemu-devel] [Qemu-ppc] qemu-system-ppc video artifacts since "tcg: drop global lock during TCG code execution" 2017-03-15 13:26 ` Mark Cave-Ayland @ 2017-03-15 14:19 ` Alex Bennée 0 siblings, 0 replies; 25+ messages in thread From: Alex Bennée @ 2017-03-15 14:19 UTC (permalink / raw) To: Mark Cave-Ayland Cc: BALATON Zoltan, jan.kiszka, qemu-devel, cota, qemu-ppc, bobby.prani, rth, fred.konrad Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk> writes: > On 15/03/17 11:14, Alex Bennée wrote: > >> BALATON Zoltan <balaton@eik.bme.hu> writes: >> >>> On Tue, 14 Mar 2017, Alex Bennée wrote: >>>> So from a single-threaded -smp guest case there should be no difference >>>> in behaviour. >> <snip> >>>> However this shouldn't affect >>>> anything in the single-threaded world. >>> >>> I think we have a single CPU and thread for these ppc machines here so >>> I'm not sure how this could be relevant. >>> >>>> However delaying tlb_flushes() could certainly expose/hide stuff that is >>>> accessing the dirty mechanism. tlb_flush itself now takes the tb_lock() to >>>> avoid racing with the TB invalidation logic. The act of the flush will >>>> certainly wipe all existing SoftMMU entries and force a re-load on each >>>> memory access. >>>> >>>> So is the dirty status of memory being read from outside a vCPU >>>> execution context? >>> >>> Like from the display controller models that use >>> memory_region_get_dirty() to check if the frambuffer needs to be >>> updated? But all display adaptors seem to do this and the problem was >>> only seem on ppc so it may be related to something ppc specific. >> >> So this accesses the memory_region API which is under RCU control. >> AFAIUI this should mean the dirty status may be read late (e.g. next >> update) but should never be incorrect (e.g. miss a dirtying operation). > > AFAICT check_tlb_flush() gets passed a global parameter which if set > true invalidates the TLB across all CPU TLBs rather than just the local > CPU TLB - but then in this case we're only running with a single CPU so > I can't see how this is relevant. Not quite. tlb_flush used to take a global flag but it ignored it. It has been removed in recent updates to the cputlb API ;-) > Have you been able to reproduce the artifacts locally at all? I'm > wondering if once the icount fixup patches are in, it might be easier to > debug if enabling icount causes the artifacts to appear in a > deterministic manner. I have, and I can make them go away as well by forcing a full update. See my other longer email for a description of what I think is happening. Now I just need a neat and upstreamable fix ;-) -- Alex Bennée ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [Qemu-devel] [Qemu-ppc] qemu-system-ppc video artifacts since "tcg: drop global lock during TCG code execution" 2017-03-14 17:34 ` BALATON Zoltan 2017-03-14 17:53 ` luigi burdo 2017-03-15 11:14 ` Alex Bennée @ 2017-03-16 6:39 ` Paolo Bonzini 2017-03-16 7:51 ` Alex Bennée 2 siblings, 1 reply; 25+ messages in thread From: Paolo Bonzini @ 2017-03-16 6:39 UTC (permalink / raw) To: BALATON Zoltan, Alex Bennée Cc: jan.kiszka, Mark Cave-Ayland, qemu-devel, cota, qemu-ppc, bobby.prani, rth, fred.konrad On 14/03/2017 18:34, BALATON Zoltan wrote: > Like from the display controller models that use > memory_region_get_dirty() to check if the frambuffer needs to be > updated? But all display adaptors seem to do this and the problem was > only seem on ppc so it may be related to something ppc specific. You need to use test_and_clear_dirty instead of get_dirty/reset_dirty. Or alternatively you need to reset immediately after get_dirty. At least cg3.c is doing read dirty bitmap read VRAM clear dirty bitmap which has a race. Paolo ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [Qemu-devel] [Qemu-ppc] qemu-system-ppc video artifacts since "tcg: drop global lock during TCG code execution" 2017-03-16 6:39 ` Paolo Bonzini @ 2017-03-16 7:51 ` Alex Bennée 2017-03-16 8:34 ` Paolo Bonzini 0 siblings, 1 reply; 25+ messages in thread From: Alex Bennée @ 2017-03-16 7:51 UTC (permalink / raw) To: Paolo Bonzini Cc: BALATON Zoltan, jan.kiszka, Mark Cave-Ayland, qemu-devel, cota, qemu-ppc, bobby.prani, rth, fred.konrad Paolo Bonzini <pbonzini@redhat.com> writes: > On 14/03/2017 18:34, BALATON Zoltan wrote: >> Like from the display controller models that use >> memory_region_get_dirty() to check if the frambuffer needs to be >> updated? But all display adaptors seem to do this and the problem was >> only seem on ppc so it may be related to something ppc specific. > > You need to use test_and_clear_dirty instead of get_dirty/reset_dirty. > Or alternatively you need to reset immediately after get_dirty. At > least cg3.c is doing > > read dirty bitmap > read VRAM > clear dirty bitmap > > which has a race. Are you saying this is also racy also in the KVM case or just that TCG doesn't currently sync up with the current dirty bitmap mechanism? AIUI the memory regions are under RCU so you always get a consistent view (with updates after you take a copy going to the next iteration). What I think needs doing is hooking into the ->log-sync mechanism to reset SoftMMU TLB entries so the dirty detection carries on for the next sync point? -- Alex Bennée ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [Qemu-devel] [Qemu-ppc] qemu-system-ppc video artifacts since "tcg: drop global lock during TCG code execution" 2017-03-16 7:51 ` Alex Bennée @ 2017-03-16 8:34 ` Paolo Bonzini 2017-03-16 15:34 ` Gerd Hoffmann 0 siblings, 1 reply; 25+ messages in thread From: Paolo Bonzini @ 2017-03-16 8:34 UTC (permalink / raw) To: Alex Bennée Cc: BALATON Zoltan, jan.kiszka, Mark Cave-Ayland, qemu-devel, cota, qemu-ppc, bobby.prani, rth, fred.konrad On 16/03/2017 08:51, Alex Bennée wrote: > > Paolo Bonzini <pbonzini@redhat.com> writes: > >> On 14/03/2017 18:34, BALATON Zoltan wrote: >>> Like from the display controller models that use >>> memory_region_get_dirty() to check if the frambuffer needs to be >>> updated? But all display adaptors seem to do this and the problem was >>> only seem on ppc so it may be related to something ppc specific. >> >> You need to use test_and_clear_dirty instead of get_dirty/reset_dirty. >> Or alternatively you need to reset immediately after get_dirty. At >> least cg3.c is doing >> >> read dirty bitmap >> read VRAM >> clear dirty bitmap >> >> which has a race. > > Are you saying this is also racy also in the KVM case or just that TCG > doesn't currently sync up with the current dirty bitmap mechanism? It's okay for KVM because the dirty bitmap is copied from KVM by the device itself, before updating the screen (with memory_region_sync_dirty_bitmap). For TCG, on the other hand, there is full concurrency between the CPU that sets the bits and the device that clears them. > AIUI the memory regions are under RCU so you always get a consistent > view (with updates after you take a copy going to the next iteration). No, RCU only protects against resizes of the bitmap. The bitmap is not copied on every access (of course :)). > What I think needs doing is hooking into the ->log-sync mechanism to > reset SoftMMU TLB entries so the dirty detection carries on for the next > sync point? It's much simpler than that, just clear the dirty bitmap bit before reading the memory. Paolo ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [Qemu-devel] [Qemu-ppc] qemu-system-ppc video artifacts since "tcg: drop global lock during TCG code execution" 2017-03-16 8:34 ` Paolo Bonzini @ 2017-03-16 15:34 ` Gerd Hoffmann 2017-03-16 17:00 ` Paolo Bonzini 0 siblings, 1 reply; 25+ messages in thread From: Gerd Hoffmann @ 2017-03-16 15:34 UTC (permalink / raw) To: Paolo Bonzini Cc: Alex Bennée, jan.kiszka, Mark Cave-Ayland, qemu-devel, cota, qemu-ppc, bobby.prani, fred.konrad, rth Hi, > >> least cg3.c is doing > >> > >> read dirty bitmap > >> read VRAM > >> clear dirty bitmap > >> > >> which has a race. > It's much simpler than that, just clear the dirty bitmap bit before > reading the memory. Well, not *that* simple. vga checks the dirty bitmap with scanline granularity, like that: foreach (scanline) { if (get_dirty(scanline)) update_scanline() } reset_dirty(framebuffer) I suspect simply transforming that to foreach (scanline) { if (test_and_clear_dirty(scanline)) update_scanline() } is not going to fly due to page tracking working with page granularity. With two subsequent scanlines within one page the second scanline will never be updated because updating first clears the dirty bit of the page ... Looping twice over all scanlines, with the first loop just figuring which scanlines are modified, then clear dirty bits, then update in a second loop should work I think. It'll duplicate a bunch of code though, because in reality the loop isn't just three lines because of doublescan, interlave and other funky stuff coming from CGA compatibility. Given that probably pretty much every display adapter is affected I'd tend to take Alex patch for 2.9, then sort the mess in the 2.10 devel cycle and revert the patch when done. cheers, Gerd ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [Qemu-devel] [Qemu-ppc] qemu-system-ppc video artifacts since "tcg: drop global lock during TCG code execution" 2017-03-16 15:34 ` Gerd Hoffmann @ 2017-03-16 17:00 ` Paolo Bonzini 2017-03-28 13:40 ` Gerd Hoffmann 0 siblings, 1 reply; 25+ messages in thread From: Paolo Bonzini @ 2017-03-16 17:00 UTC (permalink / raw) To: Gerd Hoffmann Cc: jan.kiszka, Mark Cave-Ayland, qemu-devel, cota, qemu-ppc, bobby.prani, rth, Alex Bennée, fred.konrad On 16/03/2017 16:34, Gerd Hoffmann wrote: > Well, not *that* simple. vga checks the dirty bitmap with scanline > granularity, like that: > > foreach (scanline) { > if (get_dirty(scanline)) > update_scanline() > } > reset_dirty(framebuffer) > > I suspect simply transforming that to > > foreach (scanline) { > if (test_and_clear_dirty(scanline)) > update_scanline() > } > > is not going to fly due to page tracking working with page granularity. > With two subsequent scanlines within one page the second scanline will > never be updated because updating first clears the dirty bit of the > page ... > > Looping twice over all scanlines, with the first loop just figuring > which scanlines are modified, then clear dirty bits, then update in a > second loop should work I think. It'll duplicate a bunch of code > though, because in reality the loop isn't just three lines because of > doublescan, interlave and other funky stuff coming from CGA > compatibility. > > Given that probably pretty much every display adapter is affected I'd > tend to take Alex patch for 2.9, then sort the mess in the 2.10 devel > cycle and revert the patch when done. You're right; an alternative is to copy the dirty bitmap to a local one and clear the global one (the dirty bitmap for an 8 MB full HD frame buffer is just 256 bytes). With the right API to abstract the job, it should be relatively easy to fix all adapters. Paolo ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [Qemu-devel] [Qemu-ppc] qemu-system-ppc video artifacts since "tcg: drop global lock during TCG code execution" 2017-03-16 17:00 ` Paolo Bonzini @ 2017-03-28 13:40 ` Gerd Hoffmann 2017-03-28 14:13 ` Paolo Bonzini 0 siblings, 1 reply; 25+ messages in thread From: Gerd Hoffmann @ 2017-03-28 13:40 UTC (permalink / raw) To: Paolo Bonzini Cc: jan.kiszka, Mark Cave-Ayland, qemu-devel, cota, qemu-ppc, bobby.prani, fred.konrad, Alex Bennée, rth Hi, > > Well, not *that* simple. vga checks the dirty bitmap with scanline > > granularity, like that: > > > > foreach (scanline) { > > if (get_dirty(scanline)) > > update_scanline() > > } > > reset_dirty(framebuffer) > > > > I suspect simply transforming that to > > > > foreach (scanline) { > > if (test_and_clear_dirty(scanline)) > > update_scanline() > > } > > > > is not going to fly due to page tracking working with page granularity. > > With two subsequent scanlines within one page the second scanline will > > never be updated because updating first clears the dirty bit of the > > page ... > You're right; an alternative is to copy the dirty bitmap to a local one > and clear the global one (the dirty bitmap for an 8 MB full HD frame > buffer is just 256 bytes). With the right API to abstract the job, it > should be relatively easy to fix all adapters. Started looking into this. So I guess you are thinking about a variant of memory_region_test_and_clear_dirty and/or cpu_physical_memory_test_and_clear_dirty which returns a bitmap instead of a bool? Plus some helper function which use the returned bitmap to figure whenever a specific scanline has been touched or not? cheers, Gerd ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [Qemu-devel] [Qemu-ppc] qemu-system-ppc video artifacts since "tcg: drop global lock during TCG code execution" 2017-03-28 13:40 ` Gerd Hoffmann @ 2017-03-28 14:13 ` Paolo Bonzini 0 siblings, 0 replies; 25+ messages in thread From: Paolo Bonzini @ 2017-03-28 14:13 UTC (permalink / raw) To: Gerd Hoffmann Cc: jan.kiszka, Mark Cave-Ayland, qemu-devel, cota, qemu-ppc, bobby.prani, fred.konrad, Alex Bennée, rth On 28/03/2017 15:40, Gerd Hoffmann wrote: > Hi, > >>> Well, not *that* simple. vga checks the dirty bitmap with scanline >>> granularity, like that: >>> >>> foreach (scanline) { >>> if (get_dirty(scanline)) >>> update_scanline() >>> } >>> reset_dirty(framebuffer) >>> >>> I suspect simply transforming that to >>> >>> foreach (scanline) { >>> if (test_and_clear_dirty(scanline)) >>> update_scanline() >>> } >>> >>> is not going to fly due to page tracking working with page granularity. >>> With two subsequent scanlines within one page the second scanline will >>> never be updated because updating first clears the dirty bit of the >>> page ... > >> You're right; an alternative is to copy the dirty bitmap to a local one >> and clear the global one (the dirty bitmap for an 8 MB full HD frame >> buffer is just 256 bytes). With the right API to abstract the job, it >> should be relatively easy to fix all adapters. > > Started looking into this. So I guess you are thinking about a variant > of memory_region_test_and_clear_dirty and/or > cpu_physical_memory_test_and_clear_dirty > which returns a bitmap instead of a bool? Plus some helper function > which use the returned bitmap to figure whenever a specific scanline has > been touched or not? Yes, exactly. Paolo ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [Qemu-devel] [Qemu-ppc] qemu-system-ppc video artifacts since "tcg: drop global lock during TCG code execution" 2017-03-14 15:52 ` Mark Cave-Ayland 2017-03-14 16:48 ` Alex Bennée @ 2017-03-15 14:16 ` Alex Bennée 2017-03-15 15:25 ` Gerd Hoffmann 1 sibling, 1 reply; 25+ messages in thread From: Alex Bennée @ 2017-03-15 14:16 UTC (permalink / raw) To: Mark Cave-Ayland Cc: BALATON Zoltan, jan.kiszka, qemu-devel, cota, qemu-ppc, bobby.prani, fred.konrad, rth, Gerd Hoffmann Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk> writes: > On 14/03/17 15:41, Alex Bennée wrote: > >> BALATON Zoltan <balaton@eik.bme.hu> writes: >> >>> On Tue, 14 Mar 2017, Mark Cave-Ayland wrote: >>>> On 14/03/17 10:00, Alex Bennée wrote: >>>>> Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk> writes: >>>>> >>>>>> I've recently noticed some video artifacts appearing in the form of >>>>>> horizontal lines whilst testing OpenBIOS boot on some qemu-system-ppc >>>>>> images (see https://www.ilande.co.uk/tmp/qemu/macos9-stripe.png for an >>>>>> example) which I've finally bisected down to this commit. <snip> > > Could it be that these underlying assumptions have now changed? So there is an assumption that has changed. It used to be the TCG code and the gui_update code could not run at the same time as they both needed to take the BQL. When drop global lock patch went in this was reversed. The BQL is no longer held while running the TCG until it needs it. It certainly needs it when it is doing HW emulation and for MMIO mapped registers this is what happens. Every time we write to a MMIO register the SoftMMU helper routines follow the slow path, through the memory API (which takes a lock) and into the hw emulation code. However for framebuffers our behaviour is slightly different. Instead of having MMIO register spaces we use the dirty tracking mechanism. Here regions are marked as for dirty tracking and when the SoftMMU helper first comes to this bit of memory it will follow the slow path and mark region as visited. Once done this bit is cleared and all future writes to that page are written directly from the translated code. This no longer has an implicit synchronisation from the BQL so there is now a race and you can have memory being updated which might miss this flagging. I've sort of confirmed that by hacking full_update = 1 in vga_update_display and the artefacts go away. Note that KVM has some similar hacks to avoid trapping all writes to video memory with its coalesced mmio mechanism however I'm not familiar with all the details. Now there are mechanisms we can use to ensure there are no races happen and return to the situation that the display is only updated when the TCG cores are not running. We simply queue up update function to run as async safe work which will guarantee the TCG vCPUs will not be running while stuff is updated. I'm a little hesitant to do that globally as it may well slow down KVM by introducing an unneeded pause. This also involves making a tweak into the ui/console.c code (gui_update) so I'm CC'ing Gerd who might have an insight on how to do this neatly. -- Alex Bennée ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [Qemu-devel] [Qemu-ppc] qemu-system-ppc video artifacts since "tcg: drop global lock during TCG code execution" 2017-03-15 14:16 ` Alex Bennée @ 2017-03-15 15:25 ` Gerd Hoffmann 2017-03-15 16:20 ` Alex Bennée 0 siblings, 1 reply; 25+ messages in thread From: Gerd Hoffmann @ 2017-03-15 15:25 UTC (permalink / raw) To: Alex Bennée Cc: Mark Cave-Ayland, BALATON Zoltan, jan.kiszka, qemu-devel, cota, qemu-ppc, bobby.prani, fred.konrad, rth Hi, > Instead of having MMIO register spaces we use the dirty tracking > mechanism. Here regions are marked as for dirty tracking and when the > SoftMMU helper first comes to this bit of memory it will follow the slow > path and mark region as visited. visited? Do you mean dirty bit is set for the page? Or is this something else? > Once done this bit is cleared and all > future writes to that page are written directly from the translated > code. This no longer has an implicit synchronisation from the BQL so > there is now a race and you can have memory being updated which might > miss this flagging. Not fully following what you are trying to say. But pages being updated without dirty bit getting set (if clear) certainly is a problem for the vga emulation (and live migration too). > Note that KVM has some similar hacks to avoid trapping all writes to > video memory with its coalesced mmio mechanism however I'm not familiar > with all the details. Normal linear framebuffer access doesn't use this. > Now there are mechanisms we can use to ensure there are no races happen > and return to the situation that the display is only updated when the > TCG cores are not running. tcg and display updates running in parallel isn't a problem, we have that with kvm anyway. Dirty bit handling must be correct though. With kvm at the start of each display update vga fetches the dirty bitmap from the kernel (memory_region_sync_dirty_bitmap). Then it goes use memory_region_get_dirty to figure which pages have been touched. When memory_region_sync_dirty_bitmap is called the kernel will clear the memory bitmap of the region and also map all pages read-only. Next guest update will pagefault and the kernel can set the dirty bit for the page (maybe there is a more efficient way with EPT available). I suspect the memory_region_sync_dirty_bitmap call on tcg should reset the fast path optimization, so the slow path can update the dirty bits correctly. cheers, Gerd ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [Qemu-devel] [Qemu-ppc] qemu-system-ppc video artifacts since "tcg: drop global lock during TCG code execution" 2017-03-15 15:25 ` Gerd Hoffmann @ 2017-03-15 16:20 ` Alex Bennée 2017-03-16 7:35 ` Gerd Hoffmann 0 siblings, 1 reply; 25+ messages in thread From: Alex Bennée @ 2017-03-15 16:20 UTC (permalink / raw) To: Gerd Hoffmann Cc: Mark Cave-Ayland, BALATON Zoltan, jan.kiszka, qemu-devel, cota, qemu-ppc, bobby.prani, fred.konrad, rth Gerd Hoffmann <kraxel@redhat.com> writes: > Hi, > >> Instead of having MMIO register spaces we use the dirty tracking >> mechanism. Here regions are marked as for dirty tracking and when the >> SoftMMU helper first comes to this bit of memory it will follow the slow >> path and mark region as visited. > > visited? Do you mean dirty bit is set for the page? Or is this > something else? Yes the dirty page bit (or the clearing of the TLB_NOTDIRTY bit from the SoftMMU entry). > >> Once done this bit is cleared and all >> future writes to that page are written directly from the translated >> code. This no longer has an implicit synchronisation from the BQL so >> there is now a race and you can have memory being updated which might >> miss this flagging. > > Not fully following what you are trying to say. But pages being updated > without dirty bit getting set (if clear) certainly is a problem for the > vga emulation (and live migration too). > >> Note that KVM has some similar hacks to avoid trapping all writes to >> video memory with its coalesced mmio mechanism however I'm not familiar >> with all the details. > > Normal linear framebuffer access doesn't use this. Ahh OK - as I said I wasn't super familiar with what coalesced mmio was trying to achieve. I assume it is trying to avoid trapping on every single MMIO access? > >> Now there are mechanisms we can use to ensure there are no races happen >> and return to the situation that the display is only updated when the >> TCG cores are not running. > > tcg and display updates running in parallel isn't a problem, we have > that with kvm anyway. Dirty bit handling must be correct though. > > With kvm at the start of each display update vga fetches the dirty > bitmap from the kernel (memory_region_sync_dirty_bitmap). Then it goes > use memory_region_get_dirty to figure which pages have been touched. > > When memory_region_sync_dirty_bitmap is called the kernel will clear the > memory bitmap of the region and also map all pages read-only. Next > guest update will pagefault and the kernel can set the dirty bit for the > page (maybe there is a more efficient way with EPT available). > > I suspect the memory_region_sync_dirty_bitmap call on tcg should reset > the fast path optimization, so the slow path can update the dirty bits > correctly. Sure. And for the low level cputlb implementation we can clear those bits atomically. However when the memory region is synced we also need to flush any entries in the TLB that have already been hit and cleared TLB_NOTDIRTY to we can trigger the slow path again. This is tricky from outside of a vCPU context because we can't just queue the work and exit the vCPU run loop to reach a known CPU state. The RFC PATCH I sent earlier solves this by ensuring the update runs in a quiescent period (i.e. when the vCPUs are not running) but it is sub-optimal as it means the display code has to have a basic knowledge of vCPUs and when they run. -- Alex Bennée ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [Qemu-devel] [Qemu-ppc] qemu-system-ppc video artifacts since "tcg: drop global lock during TCG code execution" 2017-03-15 16:20 ` Alex Bennée @ 2017-03-16 7:35 ` Gerd Hoffmann 2017-03-16 7:56 ` Alex Bennée 2017-03-16 14:22 ` Paolo Bonzini 0 siblings, 2 replies; 25+ messages in thread From: Gerd Hoffmann @ 2017-03-16 7:35 UTC (permalink / raw) To: Alex Bennée Cc: Mark Cave-Ayland, BALATON Zoltan, jan.kiszka, qemu-devel, cota, qemu-ppc, bobby.prani, fred.konrad, rth Hi, > >> Note that KVM has some similar hacks to avoid trapping all writes to > >> video memory with its coalesced mmio mechanism however I'm not familiar > >> with all the details. > > > > Normal linear framebuffer access doesn't use this. > > Ahh OK - as I said I wasn't super familiar with what coalesced mmio was > trying to achieve. I assume it is trying to avoid trapping on every > single MMIO access? You can't avoid the trap, but you can avoid notifying userspace about each and every single trap and batch things. But vga framebuffer is normal ram in most video modes, it doesn't trap (except on first access with dirty bit clear, for dirty bit tracking). IIRC some of the historic video modes which are pretty much unused these days (planar 4bit) trap on each access and perform better with coalesced mmio. Also nics which use mmio instead of dma for package xfer. > > I suspect the memory_region_sync_dirty_bitmap call on tcg should reset > > the fast path optimization, so the slow path can update the dirty bits > > correctly. > > Sure. And for the low level cputlb implementation we can clear those > bits atomically. However when the memory region is synced we also need > to flush any entries in the TLB that have already been hit and cleared > TLB_NOTDIRTY to we can trigger the slow path again. This is tricky from > outside of a vCPU context because we can't just queue the work and exit > the vCPU run loop to reach a known CPU state. Hmm, ok. > The RFC PATCH I sent earlier solves this by ensuring the update runs in > a quiescent period (i.e. when the vCPUs are not running) but it is > sub-optimal as it means the display code has to have a basic knowledge > of vCPUs and when they run. Still the best option for 2.9 I guess ... cheers, Gerd ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [Qemu-devel] [Qemu-ppc] qemu-system-ppc video artifacts since "tcg: drop global lock during TCG code execution" 2017-03-16 7:35 ` Gerd Hoffmann @ 2017-03-16 7:56 ` Alex Bennée 2017-03-16 14:22 ` Paolo Bonzini 1 sibling, 0 replies; 25+ messages in thread From: Alex Bennée @ 2017-03-16 7:56 UTC (permalink / raw) To: Gerd Hoffmann Cc: Mark Cave-Ayland, BALATON Zoltan, jan.kiszka, qemu-devel, cota, qemu-ppc, bobby.prani, fred.konrad, rth Gerd Hoffmann <kraxel@redhat.com> writes: > Hi, > >> >> Note that KVM has some similar hacks to avoid trapping all writes to >> >> video memory with its coalesced mmio mechanism however I'm not familiar >> >> with all the details. >> > >> > Normal linear framebuffer access doesn't use this. >> >> Ahh OK - as I said I wasn't super familiar with what coalesced mmio was >> trying to achieve. I assume it is trying to avoid trapping on every >> single MMIO access? > > You can't avoid the trap, but you can avoid notifying userspace about > each and every single trap and batch things. > > But vga framebuffer is normal ram in most video modes, it doesn't trap > (except on first access with dirty bit clear, for dirty bit tracking). > > IIRC some of the historic video modes which are pretty much unused these > days (planar 4bit) trap on each access and perform better with coalesced > mmio. Also nics which use mmio instead of dma for package xfer. > >> > I suspect the memory_region_sync_dirty_bitmap call on tcg should reset >> > the fast path optimization, so the slow path can update the dirty bits >> > correctly. >> >> Sure. And for the low level cputlb implementation we can clear those >> bits atomically. However when the memory region is synced we also need >> to flush any entries in the TLB that have already been hit and cleared >> TLB_NOTDIRTY to we can trigger the slow path again. This is tricky from >> outside of a vCPU context because we can't just queue the work and exit >> the vCPU run loop to reach a known CPU state. > > Hmm, ok. > >> The RFC PATCH I sent earlier solves this by ensuring the update runs in >> a quiescent period (i.e. when the vCPUs are not running) but it is >> sub-optimal as it means the display code has to have a basic knowledge >> of vCPUs and when they run. > > Still the best option for 2.9 I guess ... I think so although Paolo has pointed out potential issues with the other display adaptors. As this dirty tracking is also used for live migration I also need to check if the BQL change has broken that for TCG guests or that it just never worked anyway. If it is the later then that gives a bit more breathing room for plumbing in the full fix to keep the memory_regions view of the world consistent with the SoftMMU TLB. > > cheers, > Gerd -- Alex Bennée ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [Qemu-devel] [Qemu-ppc] qemu-system-ppc video artifacts since "tcg: drop global lock during TCG code execution" 2017-03-16 7:35 ` Gerd Hoffmann 2017-03-16 7:56 ` Alex Bennée @ 2017-03-16 14:22 ` Paolo Bonzini 1 sibling, 0 replies; 25+ messages in thread From: Paolo Bonzini @ 2017-03-16 14:22 UTC (permalink / raw) To: Gerd Hoffmann, Alex Bennée Cc: jan.kiszka, Mark Cave-Ayland, qemu-devel, cota, qemu-ppc, bobby.prani, rth, fred.konrad On 16/03/2017 08:35, Gerd Hoffmann wrote: > Hi, > >>>> Note that KVM has some similar hacks to avoid trapping all writes to >>>> video memory with its coalesced mmio mechanism however I'm not familiar >>>> with all the details. >>> >>> Normal linear framebuffer access doesn't use this. >> >> Ahh OK - as I said I wasn't super familiar with what coalesced mmio was >> trying to achieve. I assume it is trying to avoid trapping on every >> single MMIO access? > > You can't avoid the trap, but you can avoid notifying userspace about > each and every single trap and batch things. > > But vga framebuffer is normal ram in most video modes, it doesn't trap > (except on first access with dirty bit clear, for dirty bit tracking). > > IIRC some of the historic video modes which are pretty much unused these > days (planar 4bit) trap on each access and perform better with coalesced > mmio. Also nics which use mmio instead of dma for package xfer. > >>> I suspect the memory_region_sync_dirty_bitmap call on tcg should reset >>> the fast path optimization, so the slow path can update the dirty bits >>> correctly. >> >> Sure. And for the low level cputlb implementation we can clear those >> bits atomically. However when the memory region is synced we also need >> to flush any entries in the TLB that have already been hit and cleared >> TLB_NOTDIRTY to we can trigger the slow path again. This is tricky from >> outside of a vCPU context because we can't just queue the work and exit >> the vCPU run loop to reach a known CPU state. > > Hmm, ok. > >> The RFC PATCH I sent earlier solves this by ensuring the update runs in >> a quiescent period (i.e. when the vCPUs are not running) but it is >> sub-optimal as it means the display code has to have a basic knowledge >> of vCPUs and when they run. > > Still the best option for 2.9 I guess ... For both VGA and SM501, it should be the issue I pointed out in https://lists.gnu.org/archive/html/qemu-devel/2017-03/msg03197.html. The dirty bitmap has been thread-safe for a long time now. Paolo ^ permalink raw reply [flat|nested] 25+ messages in thread
end of thread, other threads:[~2017-03-28 14:14 UTC | newest] Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- [not found] <36e41adf-b0b3-3efa-51c4-f1a70cd05b98@ilande.co.uk> 2017-03-13 21:28 ` [Qemu-devel] qemu-system-ppc video artifacts since "tcg: drop global lock during TCG code execution" Mark Cave-Ayland [not found] ` <87wpbsp49a.fsf@linaro.org> 2017-03-14 11:12 ` Mark Cave-Ayland 2017-03-14 13:56 ` [Qemu-devel] [Qemu-ppc] " BALATON Zoltan 2017-03-14 15:02 ` luigi burdo 2017-03-14 15:41 ` Alex Bennée 2017-03-14 15:52 ` Mark Cave-Ayland 2017-03-14 16:48 ` Alex Bennée 2017-03-14 17:34 ` BALATON Zoltan 2017-03-14 17:53 ` luigi burdo 2017-03-15 11:14 ` Alex Bennée 2017-03-15 13:26 ` Mark Cave-Ayland 2017-03-15 14:19 ` Alex Bennée 2017-03-16 6:39 ` Paolo Bonzini 2017-03-16 7:51 ` Alex Bennée 2017-03-16 8:34 ` Paolo Bonzini 2017-03-16 15:34 ` Gerd Hoffmann 2017-03-16 17:00 ` Paolo Bonzini 2017-03-28 13:40 ` Gerd Hoffmann 2017-03-28 14:13 ` Paolo Bonzini 2017-03-15 14:16 ` Alex Bennée 2017-03-15 15:25 ` Gerd Hoffmann 2017-03-15 16:20 ` Alex Bennée 2017-03-16 7:35 ` Gerd Hoffmann 2017-03-16 7:56 ` Alex Bennée 2017-03-16 14:22 ` Paolo Bonzini
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.