qemu-devel.nongnu.org archive mirror
* GICv3 for MTTCG
@ 2021-05-11 17:51 Andrey Shinkevich
  2021-05-11 19:53 ` Richard Henderson
                   ` (2 more replies)
  0 siblings, 3 replies; 15+ messages in thread
From: Andrey Shinkevich @ 2021-05-11 17:51 UTC (permalink / raw)
  To: qemu-devel
  Cc: peter.maydell, drjones, richard.henderson, qemu-arm,
	Chengen (William, FixNet),
	alex.bennee

Dear colleagues,

I am looking for ways to accelerate MTTCG for an ARM guest on an x86-64 host.
The maximum number of CPUs for MTTCG with GICv2 is limited to 8:

include/hw/intc/arm_gic_common.h:#define GIC_NCPU 8

Version 3 of the Generic Interrupt Controller (GICv3) is not
supported in QEMU, for some reason unknown to me. It would allow
raising the CPU limit and improving MTTCG performance on a
multi-core host.
My idea is to implement the Interrupt Translation Service (ITS)
for use by MTTCG on the ARM architecture.

Do you find that idea useful and feasible?
If yes, how much time would you estimate for one developer to
complete such a project?
If no, what are the reasons for not implementing GICv3 for MTTCG in QEMU?

Best regards,
Andrey Shinkevich


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: GICv3 for MTTCG
  2021-05-11 17:51 GICv3 for MTTCG Andrey Shinkevich
@ 2021-05-11 19:53 ` Richard Henderson
  2021-05-12  1:44 ` Zenghui Yu
  2021-05-12 15:26 ` Alex Bennée
  2 siblings, 0 replies; 15+ messages in thread
From: Richard Henderson @ 2021-05-11 19:53 UTC (permalink / raw)
  To: Andrey Shinkevich, qemu-devel
  Cc: peter.maydell, drjones, qemu-arm, alex.bennee, Chengen (William, FixNet)

On 5/11/21 12:51 PM, Andrey Shinkevich wrote:
> The version 3 of the Generic Interrupt Controller (GICv3) is not
> supported in QEMU for some reason unknown to me.

It is supported.  You have to enable it like so:

   -M virt,gic-version=3


r~



* Re: GICv3 for MTTCG
  2021-05-11 17:51 GICv3 for MTTCG Andrey Shinkevich
  2021-05-11 19:53 ` Richard Henderson
@ 2021-05-12  1:44 ` Zenghui Yu
  2021-05-12 15:26 ` Alex Bennée
  2 siblings, 0 replies; 15+ messages in thread
From: Zenghui Yu @ 2021-05-12  1:44 UTC (permalink / raw)
  To: Andrey Shinkevich, qemu-devel
  Cc: peter.maydell, drjones, shashi.mallela, richard.henderson,
	qemu-arm, Chengen (William, FixNet),
	wanghaibin.wang

[+Shashi]

On 2021/5/12 1:51, Andrey Shinkevich wrote:
> Dear colleagues,
> 
> I am looking for ways to accelerate MTTCG for an ARM guest on an x86-64 host.
> The maximum number of CPUs for MTTCG with GICv2 is limited to 8:
> 
> include/hw/intc/arm_gic_common.h:#define GIC_NCPU 8
> 
> Version 3 of the Generic Interrupt Controller (GICv3) is not
> supported in QEMU, for some reason unknown to me. It would allow
> raising the CPU limit and improving MTTCG performance on a
> multi-core host.
> My idea is to implement the Interrupt Translation Service (ITS)
> for use by MTTCG on the ARM architecture.
> 
> Do you find that idea useful and feasible?
> If yes, how much time would you estimate for one developer to
> complete such a project?
> If no, what are the reasons for not implementing GICv3 for MTTCG in QEMU?

Are you looking for something like that [*]? I think it has been on the
list for a while.

[*] https://lists.gnu.org/archive/html/qemu-arm/2021-04/msg00944.html


Zenghui



* Re: GICv3 for MTTCG
  2021-05-11 17:51 GICv3 for MTTCG Andrey Shinkevich
  2021-05-11 19:53 ` Richard Henderson
  2021-05-12  1:44 ` Zenghui Yu
@ 2021-05-12 15:26 ` Alex Bennée
  2021-05-13 16:35   ` Andrey Shinkevich
  2 siblings, 1 reply; 15+ messages in thread
From: Alex Bennée @ 2021-05-12 15:26 UTC (permalink / raw)
  To: Andrey Shinkevich
  Cc: peter.maydell, drjones, Cota, richard.henderson, qemu-devel,
	qemu-arm, Chengen (William,  FixNet)


Andrey Shinkevich <andrey.shinkevich@huawei.com> writes:

> Dear colleagues,
>
> I am looking for ways to accelerate MTTCG for an ARM guest on an x86-64 host.
> The maximum number of CPUs for MTTCG with GICv2 is limited to 8:
>
> include/hw/intc/arm_gic_common.h:#define GIC_NCPU 8
>
> Version 3 of the Generic Interrupt Controller (GICv3) is not
> supported in QEMU, for some reason unknown to me. It would allow
> raising the CPU limit and improving MTTCG performance on a
> multi-core host.

It is supported, you just need to select it.

> My idea is to implement the Interrupt Translation Service (ITS)
> for use by MTTCG on the ARM architecture.

There is some work to support ITS under TCG already posted:

  Subject: [PATCH v3 0/8] GICv3 LPI and ITS feature implementation
  Date: Thu, 29 Apr 2021 19:41:53 -0400
  Message-Id: <20210429234201.125565-1-shashi.mallela@linaro.org>

Please do review and test.

> Do you find that idea useful and feasible?
> If yes, how much time would you estimate for one developer to
> complete such a project?
> If no, what are the reasons for not implementing GICv3 for MTTCG in QEMU?

As far as MTTCG performance is concerned, there is a degree of
diminishing returns to be expected, as the synchronisation cost between
threads will eventually outweigh the gains of additional threads.

There are a number of parts that could improve this performance. The
first would be picking up the BQL reduction series from your FutureWei
colleagues, who worked on the problem when they were Linaro assignees:

  Subject: [PATCH v2 0/7] accel/tcg: remove implied BQL from cpu_handle_interrupt/exception path
  Date: Wed, 19 Aug 2020 14:28:49 -0400
  Message-Id: <20200819182856.4893-1-robert.foley@linaro.org>

There was also a longer series moving towards per-CPU locks:

  Subject: [PATCH v10 00/73] per-CPU locks
  Date: Wed, 17 Jun 2020 17:01:18 -0400
  Message-Id: <20200617210231.4393-1-robert.foley@linaro.org>

I believe the initial measurements showed that the BQL cost started to
edge up with GIC interactions. We did discuss approaches to this, and I
think one idea was to use non-BQL locking for the GIC. You would need to
revert:

  Subject: [PATCH-for-5.2] exec: Remove MemoryRegion::global_locking field
  Date: Thu,  6 Aug 2020 17:07:26 +0200
  Message-Id: <20200806150726.962-1-philmd@redhat.com>

and then implement finer-grained locking in the GIC emulation
itself. However, I think the BQL and per-CPU lock series are
lower-hanging fruit to tackle first.

>
> Best regards,
> Andrey Shinkevich


-- 
Alex Bennée



* Re: GICv3 for MTTCG
  2021-05-12 15:26 ` Alex Bennée
@ 2021-05-13 16:35   ` Andrey Shinkevich
  2021-05-13 16:45     ` Shashi Mallela
  2021-05-13 17:19     ` Alex Bennée
  0 siblings, 2 replies; 15+ messages in thread
From: Andrey Shinkevich @ 2021-05-13 16:35 UTC (permalink / raw)
  To: Alex Bennée
  Cc: peter.maydell, drjones, Cota, shashi.mallela, richard.henderson,
	qemu-devel, qemu-arm, Chengen (William,  FixNet),
	yuzenghui, Wanghaibin (D)

Dear colleagues,

Thank you all very much for your responses. Let me reply with one message.

I configured QEMU for an AArch64 guest:
$ ./configure --target-list=aarch64-softmmu

When I start QEMU with GICv3 on an x86 host:
qemu-system-aarch64 -machine virt-6.0,accel=tcg,gic-version=3

QEMU reports this error from hw/pci/msix.c:
error_setg(errp, "MSI-X is not supported by interrupt controller");

Presumably, the variable 'msi_nonbroken' should be initialized in
hw/intc/arm_gicv3_its_common.c:gicv3_its_init_mmio().

I guess it currently works only with KVM acceleration rather than with TCG.

The error persists after applying the series:
https://lists.gnu.org/archive/html/qemu-arm/2021-04/msg00944.html
"GICv3 LPI and ITS feature implementation"
(special thanks for referring me to that)

Could you please clarify and suggest how that error can be fixed?
Does MSI-X support need to be implemented separately on top of GICv3?

Once successful, I would like to test QEMU with the maximum number of
cores to get the best MTTCG performance.
We will probably get only a modest performance improvement with the
BQL series applied, won't we? I will test that as well.

Best regards,
Andrey Shinkevich






* Re: GICv3 for MTTCG
  2021-05-13 16:35   ` Andrey Shinkevich
@ 2021-05-13 16:45     ` Shashi Mallela
  2021-05-13 18:29       ` Andrey Shinkevich
  2021-06-17 16:43       ` Andrey Shinkevich
  2021-05-13 17:19     ` Alex Bennée
  1 sibling, 2 replies; 15+ messages in thread
From: Shashi Mallela @ 2021-05-13 16:45 UTC (permalink / raw)
  To: Andrey Shinkevich
  Cc: peter.maydell, drjones, Cota, richard.henderson, qemu-devel,
	qemu-arm, Chengen (William, FixNet), yuzenghui, Wanghaibin (D),
	Alex Bennée


Hi Andrey,

To clarify, the patch series
> https://lists.gnu.org/archive/html/qemu-arm/2021-04/msg00944.html
> "GICv3 LPI and ITS feature implementation"
>

applies to the virt machine from version 6.1 onwards, i.e. the ITS TCG
functionality is not available with the virt-6.0 machine being tried here.

Thanks
Shashi



* Re: GICv3 for MTTCG
  2021-05-13 16:35   ` Andrey Shinkevich
  2021-05-13 16:45     ` Shashi Mallela
@ 2021-05-13 17:19     ` Alex Bennée
  2021-05-13 18:33       ` Andrey Shinkevich
  2021-05-14  5:21       ` Andrey Shinkevich
  1 sibling, 2 replies; 15+ messages in thread
From: Alex Bennée @ 2021-05-13 17:19 UTC (permalink / raw)
  To: Andrey Shinkevich
  Cc: peter.maydell, drjones, Cota, shashi.mallela, richard.henderson,
	qemu-devel, qemu-arm, Chengen (William,  FixNet),
	yuzenghui, Wanghaibin (D)


Andrey Shinkevich <andrey.shinkevich@huawei.com> writes:

> Dear colleagues,
>
> Thank you all very much for your responses. Let me reply with one message.
>
> I configured QEMU for an AArch64 guest:
> $ ./configure --target-list=aarch64-softmmu
>
> When I start QEMU with GICv3 on an x86 host:
> qemu-system-aarch64 -machine virt-6.0,accel=tcg,gic-version=3

Hmm, are you sure you are running your built QEMU? For me the following
works fine:

  ./aarch64-softmmu/qemu-system-aarch64 -machine virt-6.0,gic-version=3,accel=tcg -cpu max -serial mon:stdio -nic user,model=virtio-net-pci,hostfwd=tcp::2222-:22 -device virtio-scsi-pci -device scsi-hd,drive=hd0 -blockdev driver=raw,node-name=hd0,discard=unmap,file.driver=host_device,file.filename=/dev/zvol/hackpool-0/debian-buster-arm64 -kernel
~/lsrc/linux.git/builds/arm64.nopreempt/arch/arm64/boot/Image -append "console=ttyAMA0 root=/dev/sda2" -display none -m 8G,maxmem=8G -smp 12




-- 
Alex Bennée



* Re: GICv3 for MTTCG
  2021-05-13 16:45     ` Shashi Mallela
@ 2021-05-13 18:29       ` Andrey Shinkevich
  2021-06-17 16:43       ` Andrey Shinkevich
  1 sibling, 0 replies; 15+ messages in thread
From: Andrey Shinkevich @ 2021-05-13 18:29 UTC (permalink / raw)
  To: Shashi Mallela
  Cc: peter.maydell, drjones, Cota, richard.henderson, qemu-devel,
	qemu-arm, Chengen (William, FixNet), yuzenghui, Wanghaibin (D),
	Alex Bennée

Hi Shashi,

Thank you very much for letting me know.
I changed the virt machine to version 6.1 and the error disappeared.
But the guest OS experiences severe delays while booting and
starting. The delays take minutes, mostly here:

#0  0x00007f1d0932554d in __lll_lock_wait () at /lib64/libpthread.so.0
#1  0x00007f1d09320e9b in _L_lock_883 () at /lib64/libpthread.so.0
#2  0x00007f1d09320d68 in pthread_mutex_lock () at /lib64/libpthread.so.0
#3  0x0000560bf51637b3 in qemu_mutex_lock_impl (mutex=0x560bf5e05820 
<qemu_global_mutex>, file=0x560bf56db84b "../util/main-loop.c", 
line=252) at ../util/qemu-thread-posix.c:79
#4  0x0000560bf4d65403 in qemu_mutex_lock_iothread_impl 
(file=0x560bf56db84b "../util/main-loop.c", line=252) at 
../softmmu/cpus.c:491
#5  0x0000560bf516faa5 in os_host_main_loop_wait (timeout=2367975) at 
../util/main-loop.c:252
#6  0x0000560bf516fbb0 in main_loop_wait (nonblocking=0) at 
../util/main-loop.c:530
#7  0x0000560bf4ddc186 in qemu_main_loop () at ../softmmu/runstate.c:725
#8  0x0000560bf473ae42 in main (argc=63, argv=0x7ffc5920eba8, 
envp=0x7ffc5920eda8) at ../softmmu/main.c:50

and here:

#0  0x00007f1d0903cd8f in ppoll () at /lib64/libc.so.6
#1  0x0000560bf512e2d0 in qemu_poll_ns (fds=0x560bf70f12b0, nfds=5, 
timeout=350259000000) at ../util/qemu-timer.c:348
#2  0x0000560bf516fa8c in os_host_main_loop_wait (timeout=350259000000) 
at ../util/main-loop.c:249
#3  0x0000560bf516fbb0 in main_loop_wait (nonblocking=0) at 
../util/main-loop.c:530
#4  0x0000560bf4ddc186 in qemu_main_loop () at ../softmmu/runstate.c:725
#5  0x0000560bf473ae42 in main (argc=63, argv=0x7ffc5920eba8, 
envp=0x7ffc5920eda8) at ../softmmu/main.c:50

Eventually, the guest hangs at the second backtrace above.

Best regards,
Andrey






* Re: GICv3 for MTTCG
  2021-05-13 17:19     ` Alex Bennée
@ 2021-05-13 18:33       ` Andrey Shinkevich
  2021-05-14  5:21       ` Andrey Shinkevich
  1 sibling, 0 replies; 15+ messages in thread
From: Andrey Shinkevich @ 2021-05-13 18:33 UTC (permalink / raw)
  To: Alex Bennée
  Cc: peter.maydell, drjones, Cota, shashi.mallela, richard.henderson,
	qemu-devel, qemu-arm, Chengen (William,  FixNet),
	yuzenghui, Wanghaibin (D)

I built QEMU from the source files downloaded from
https://github.com/qemu/qemu
latest commit 3e9f48bcdabe57f8
I have applied the series "GICv3 LPI and ITS feature implementation".

When I tried to start QEMU with the '-kernel' option earlier, the boot
loader failed to locate the rootfs disk by its correct ID. Specifying
'root=/dev/sda2' didn't help either.
So I used virt-manager successfully, which runs QEMU with the
following arguments:

/usr/local/bin/qemu-system-aarch64 -name 
guest=EulerOS-2.8-Rich,debug-threads=on -S -object 
secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-33-EulerOS-2.8-Rich/master-key.aes 
-machine virt-6.1,accel=tcg,usb=off,dump-guest-core=off,gic-version=3 
-cpu max -drive 
file=/usr/share/AAVMF/AAVMF_CODE.fd,if=pflash,format=raw,unit=0,readonly=on 
-drive 
file=/var/lib/libvirt/qemu/nvram/EulerOS-2.8-Rich_VARS.fd,if=pflash,format=raw,unit=1 
-m 4096 -smp 8,sockets=8,cores=1,threads=1 -uuid 
c95e0e92-011b-449a-8e3f-b5f0938aaaa7 -display none -no-user-config 
-nodefaults -chardev socket,id=charmonitor,fd=26,server,nowait -mon 
chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown 
-boot strict=on -device 
pcie-root-port,port=0x8,chassis=1,id=pci.1,bus=pcie.0,multifunction=on,addr=0x1 
-device 
pcie-root-port,port=0x9,chassis=2,id=pci.2,bus=pcie.0,addr=0x1.0x1 
-device 
pcie-root-port,port=0xa,chassis=3,id=pci.3,bus=pcie.0,addr=0x1.0x2 
-device 
pcie-root-port,port=0xb,chassis=4,id=pci.4,bus=pcie.0,addr=0x1.0x3 
-device qemu-xhci,p2=8,p3=8,id=usb,bus=pci.2,addr=0x0 -device 
virtio-scsi-pci,id=scsi0,bus=pci.3,addr=0x0 -drive 
file=/var/lib/libvirt/images/EulerOS-2.8-Rich.qcow2,format=qcow2,if=none,id=drive-scsi0-0-0-0 
-device 
scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=1 
-drive if=none,id=drive-scsi0-0-0-1,readonly=on -device 
scsi-cd,bus=scsi0.0,channel=0,scsi-id=0,lun=1,drive=drive-scsi0-0-0-1,id=scsi0-0-0-1 
-netdev tap,fd=28,id=hostnet0 -device 
virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:f9:e0:69,bus=pci.1,addr=0x0 
-chardev pty,id=charserial0 -serial chardev:charserial0 -msg timestamp=on

Best regards,
Andrey.


On 5/13/21 8:20 PM, Alex Bennée wrote:
> 
> Andrey Shinkevich <andrey.shinkevich@huawei.com> writes:
> 
>> Dear colleagues,
>>
>> Thank you all very much for your responses. Let me reply with one message.
>>
>> I configured QEMU for AARCH64 guest:
>> $ ./configure --target-list=aarch64-softmmu
>>
>> When I start QEMU with GICv3 on an x86 host:
>> qemu-system-aarch64 -machine virt-6.0,accel=tcg,gic-version=3
> 
> Hmm are you sure you are running your built QEMU? For me the following
> works fine:
> 
>    ./aarch64-softmmu/qemu-system-aarch64 -machine virt-6.0,gic-version=3,accel=tcg -cpu max -serial mon:stdio -nic user,model=virtio-net-pci,hostfwd=tcp::2222-:22 -device virtio-scsi-pci -device scsi-hd,drive=hd0 -blockdev driver=raw,node-name=hd0,discard=unmap,file.driver=host_device,file.filename=/dev/zvol/hackpool-0/debian-buster-arm64 -kernel
> ~/lsrc/linux.git/builds/arm64.nopreempt/arch/arm64/boot/Image -append "console=ttyAMA0 root=/dev/sda2" -display none -m 8G,maxmem=8G -smp 12
> 
> 
>>
>> QEMU reports this error from hw/pci/msix.c:
>> error_setg(errp, "MSI-X is not supported by interrupt controller");
>>
>> Probably, the variable 'msi_nonbroken' would be initialized in
>> hw/intc/arm_gicv3_its_common.c:
>> gicv3_its_init_mmio(..)
>>
>> I guess that it works with KVM acceleration only rather than with TCG.
>>
>> The error persists after applying the series:
>> https://lists.gnu.org/archive/html/qemu-arm/2021-04/msg00944.html
>> "GICv3 LPI and ITS feature implementation"
>> (special thanks for referring me to that)
>>
>> Please clarify and advise how that error can be fixed.
>> Should MSI-X support be implemented separately on top of GICv3?
>>
>> When successful, I would like to test QEMU for a maximum number of cores
>> to get the best MTTCG performance.
>> Probably, we will get just some percentage of performance enhancement
>> with the BQL series applied, won't we? I will test it as well.
>>
>> Best regards,
>> Andrey Shinkevich
>>
>>
>> On 5/12/21 6:43 PM, Alex Bennée wrote:
>>>
>>> Andrey Shinkevich <andrey.shinkevich@huawei.com> writes:
>>>
>>>> Dear colleagues,
>>>>
>>>> I am looking for ways to accelerate the MTTCG for ARM guest on x86-64 host.
>>>> The maximum number of CPUs for MTTCG that uses GICv2 is limited by 8:
>>>>
>>>> include/hw/intc/arm_gic_common.h:#define GIC_NCPU 8
>>>>
>>>> The version 3 of the Generic Interrupt Controller (GICv3) is not
>>>> supported in QEMU for some reason unknown to me. It would allow to
>>>> increase the limit of CPUs and accelerate the MTTCG performance on a
>>>> multiple core hypervisor.
>>>
>>> It is supported, you just need to select it.
>>>
>>>> I have got an idea to implement the Interrupt Translation Service (ITS)
>>>> for using by MTTCG for ARM architecture.
>>>
>>> There is some work to support ITS under TCG already posted:
>>>
>>>     Subject: [PATCH v3 0/8] GICv3 LPI and ITS feature implementation
>>>     Date: Thu, 29 Apr 2021 19:41:53 -0400
>>>     Message-Id: <20210429234201.125565-1-shashi.mallela@linaro.org>
>>>
>>> please do review and test.
>>>
>>>> Do you find that idea useful and feasible?
>>>> If yes, how much time do you estimate for such a project to complete by
>>>> one developer?
>>>> If no, what are reasons for not implementing GICv3 for MTTCG in QEMU?
>>>
>>> As far as MTTCG performance is concerned there is a degree of
>>> diminishing returns to be expected as the synchronisation cost between
>>> threads will eventually outweigh the gains of additional threads.
>>>
>>> There are a number of parts that could improve this performance. The
>>> first would be picking up the BQL reduction series from your FutureWei
>>> colleagues who worked on the problem when they were Linaro assignees:
>>>
>>>     Subject: [PATCH v2 0/7] accel/tcg: remove implied BQL from cpu_handle_interrupt/exception path
>>>     Date: Wed, 19 Aug 2020 14:28:49 -0400
>>>     Message-Id: <20200819182856.4893-1-robert.foley@linaro.org>
>>>
>>> There was also a longer series moving towards per-CPU locks:
>>>
>>>     Subject: [PATCH v10 00/73] per-CPU locks
>>>     Date: Wed, 17 Jun 2020 17:01:18 -0400
>>>     Message-Id: <20200617210231.4393-1-robert.foley@linaro.org>
>>>
>>> I believe the initial measurements showed that the BQL cost started to
>>> edge up with GIC interactions. We did discuss approaches for this and I
>>> think one idea was to use non-BQL locking for the GIC. You would need to
>>> revert:
>>>
>>>     Subject: [PATCH-for-5.2] exec: Remove MemoryRegion::global_locking field
>>>     Date: Thu,  6 Aug 2020 17:07:26 +0200
>>>     Message-Id: <20200806150726.962-1-philmd@redhat.com>
>>>
>>> and then implement a more fine tuned locking in the GIC emulation
>>> itself. However I think the BQL and per-CPU locks are lower hanging
>>> fruit to tackle first.
>>>
>>>>
>>>> Best regards,
>>>> Andrey Shinkevich
>>>
>>>
> 
> 



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: GICv3 for MTTCG
  2021-05-13 17:19     ` Alex Bennée
  2021-05-13 18:33       ` Andrey Shinkevich
@ 2021-05-14  5:21       ` Andrey Shinkevich
  1 sibling, 0 replies; 15+ messages in thread
From: Andrey Shinkevich @ 2021-05-14  5:21 UTC (permalink / raw)
  To: Alex Bennée
  Cc: peter.maydell, drjones, Cota, shashi.mallela, richard.henderson,
	qemu-devel, qemu-arm, Chengen (William,  FixNet),
	yuzenghui, Wanghaibin (D)

On 5/13/21 8:20 PM, Alex Bennée wrote:
> 
> Andrey Shinkevich <andrey.shinkevich@huawei.com> writes:
> 
>> Dear colleagues,
>>
>> Thank you all very much for your responses. Let me reply with one message.
>>
>> I configured QEMU for AARCH64 guest:
>> $ ./configure --target-list=aarch64-softmmu
>>
>> When I start QEMU with GICv3 on an x86 host:
>> qemu-system-aarch64 -machine virt-6.0,accel=tcg,gic-version=3
> 
> Hmm are you sure you are running your built QEMU? For me the following
> works fine:

There is no doubt I am running my own build of QEMU: I am debugging it 
and stepping through the run with gdb.

> 
>    ./aarch64-softmmu/qemu-system-aarch64 -machine virt-6.0,gic-version=3,accel=tcg -cpu max -serial mon:stdio -nic user,model=virtio-net-pci,hostfwd=tcp::2222-:22 -device virtio-scsi-pci -device scsi-hd,drive=hd0 -blockdev driver=raw,node-name=hd0,discard=unmap,file.driver=host_device,file.filename=/dev/zvol/hackpool-0/debian-buster-arm64 -kernel
> ~/lsrc/linux.git/builds/arm64.nopreempt/arch/arm64/boot/Image -append "console=ttyAMA0 root=/dev/sda2" -display none -m 8G,maxmem=8G -smp 12
> 
> 

Which source tree are you building your QEMU from? If it is anything 
other than github.com/qemu/qemu, would you please send me the link?
I pulled the latest commit 3e9f48bcdabe57f8f and applied ONLY the 
series "[PATCH v3 0/8] GICv3 LPI and ITS feature implementation". 
Did you do the same?

I have NOT applied the series "[PATCH v2 0/7] accel/tcg: remove implied 
BQL from cpu_handle_interrupt/exception path" yet because it is old and 
applying it manually takes more time (I will do it later). Could that be 
the reason my guest hangs on locks at startup?
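For reference, a posted series can usually be applied without manual editing by saving it from the list archive as an mbox and feeding it to git am. A minimal sketch of that workflow, using a scratch repository and a stand-in patch rather than the real series:

```shell
# git-am workflow sketch: a scratch repo and a stand-in patch are used
# here; with a real series, the mbox saved from the archive would be
# passed to "git am" instead.
set -e
tmp=$(mktemp -d)
cd "$tmp"
git init -q demo && cd demo
git -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "base commit"
echo hello > file.txt && git add file.txt
git -c user.name=demo -c user.email=demo@example.com \
    commit -q -m "add file"
git format-patch -q -1 -o ../series   # stand-in for the downloaded mbox
git reset -q --hard HEAD~1            # back to the base commit
git -c user.name=demo -c user.email=demo@example.com \
    am ../series/*.patch              # re-apply the series
git log --oneline -1
```

With a real series the `../series/*.patch` argument is simply the saved mbox file.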

Andrey

>>
>> QEMU reports this error from hw/pci/msix.c:
>> error_setg(errp, "MSI-X is not supported by interrupt controller");
>>
>> Probably, the variable 'msi_nonbroken' would be initialized in
>> hw/intc/arm_gicv3_its_common.c:
>> gicv3_its_init_mmio(..)
>>
>> I guess that it works with KVM acceleration only rather than with TCG.
>>
>> The error persists after applying the series:
>> https://lists.gnu.org/archive/html/qemu-arm/2021-04/msg00944.html
>> "GICv3 LPI and ITS feature implementation"
>> (special thanks for referring me to that)
>>
>> Please clarify and advise how that error can be fixed.
>> Should MSI-X support be implemented separately on top of GICv3?
>>
>> When successful, I would like to test QEMU for a maximum number of cores
>> to get the best MTTCG performance.
>> Probably, we will get just some percentage of performance enhancement
>> with the BQL series applied, won't we? I will test it as well.
>>
>> Best regards,
>> Andrey Shinkevich
>>
>>
>> On 5/12/21 6:43 PM, Alex Bennée wrote:
>>>
>>> Andrey Shinkevich <andrey.shinkevich@huawei.com> writes:
>>>
>>>> Dear colleagues,
>>>>
>>>> I am looking for ways to accelerate the MTTCG for ARM guest on x86-64 host.
>>>> The maximum number of CPUs for MTTCG that uses GICv2 is limited by 8:
>>>>
>>>> include/hw/intc/arm_gic_common.h:#define GIC_NCPU 8
>>>>
>>>> The version 3 of the Generic Interrupt Controller (GICv3) is not
>>>> supported in QEMU for some reason unknown to me. It would allow to
>>>> increase the limit of CPUs and accelerate the MTTCG performance on a
>>>> multiple core hypervisor.
>>>
>>> It is supported, you just need to select it.
>>>
>>>> I have got an idea to implement the Interrupt Translation Service (ITS)
>>>> for using by MTTCG for ARM architecture.
>>>
>>> There is some work to support ITS under TCG already posted:
>>>
>>>     Subject: [PATCH v3 0/8] GICv3 LPI and ITS feature implementation
>>>     Date: Thu, 29 Apr 2021 19:41:53 -0400
>>>     Message-Id: <20210429234201.125565-1-shashi.mallela@linaro.org>
>>>
>>> please do review and test.
>>>
>>>> Do you find that idea useful and feasible?
>>>> If yes, how much time do you estimate for such a project to complete by
>>>> one developer?
>>>> If no, what are reasons for not implementing GICv3 for MTTCG in QEMU?
>>>
>>> As far as MTTCG performance is concerned there is a degree of
>>> diminishing returns to be expected as the synchronisation cost between
>>> threads will eventually outweigh the gains of additional threads.
>>>
>>> There are a number of parts that could improve this performance. The
>>> first would be picking up the BQL reduction series from your FutureWei
>>> colleagues who worked on the problem when they were Linaro assignees:
>>>
>>>     Subject: [PATCH v2 0/7] accel/tcg: remove implied BQL from cpu_handle_interrupt/exception path
>>>     Date: Wed, 19 Aug 2020 14:28:49 -0400
>>>     Message-Id: <20200819182856.4893-1-robert.foley@linaro.org>
>>>
>>> There was also a longer series moving towards per-CPU locks:
>>>
>>>     Subject: [PATCH v10 00/73] per-CPU locks
>>>     Date: Wed, 17 Jun 2020 17:01:18 -0400
>>>     Message-Id: <20200617210231.4393-1-robert.foley@linaro.org>
>>>
>>> I believe the initial measurements showed that the BQL cost started to
>>> edge up with GIC interactions. We did discuss approaches for this and I
>>> think one idea was to use non-BQL locking for the GIC. You would need to
>>> revert:
>>>
>>>     Subject: [PATCH-for-5.2] exec: Remove MemoryRegion::global_locking field
>>>     Date: Thu,  6 Aug 2020 17:07:26 +0200
>>>     Message-Id: <20200806150726.962-1-philmd@redhat.com>
>>>
>>> and then implement a more fine tuned locking in the GIC emulation
>>> itself. However I think the BQL and per-CPU locks are lower hanging
>>> fruit to tackle first.
>>>
>>>>
>>>> Best regards,
>>>> Andrey Shinkevich
>>>
>>>
> 
> 



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: GICv3 for MTTCG
  2021-05-13 16:45     ` Shashi Mallela
  2021-05-13 18:29       ` Andrey Shinkevich
@ 2021-06-17 16:43       ` Andrey Shinkevich
  2021-06-17 17:44         ` shashi.mallela
  2021-06-18 13:15         ` Alex Bennée
  1 sibling, 2 replies; 15+ messages in thread
From: Andrey Shinkevich @ 2021-06-17 16:43 UTC (permalink / raw)
  To: Shashi Mallela
  Cc: peter.maydell, drjones, Cota, richard.henderson, qemu-devel,
	qemu-arm, Chengen (William, FixNet), yuzenghui, Wanghaibin (D),
	Alex Bennée

Dear Shashi,

I have applied version 4 of the series "GICv3 LPI and ITS feature 
implementation" on top of commit 3e9f48b, as before (GCC 7.5 is 
unavailable in the YUM repository for CentOS 7.9).

The guest OS still hangs at startup when QEMU is configured with 4 or 
more vCPUs; with 1 to 3 vCPUs the guest boots and runs fine and MTTCG 
works properly:

Welcome to EulerOS 2.0 ... (Initramfs)!

…

[  OK  ] Mounted Kernel Configuration File System.

[  OK  ] Started udev Coldplug all Devices.

[  OK  ] Reached target System Initialization.

[  OK  ] Reached target Basic System.



IT HANGS HERE
  (with 4 or more vCPUs)!!!


[  OK  ] Found device /dev/mapper/euleros-root.

[  OK  ] Reached target Initrd Root Device.

[  OK  ] Started dracut initqueue hook.

          Starting File System Check on /dev/mapper/euleros-root...

[  OK  ] Reached target Remote File Systems (Pre).

[  OK  ] Reached target Remote File Systems.

[  OK  ] Started File System Check on /dev/mapper/euleros-root.

          Mounting /sysroot...

[  OK  ] Mounted /sysroot.

…


The backtrace of the QEMU threads looks like a deadlock in MTTCG, 
doesn't it?

Thread 7 (Thread 0x7f476e489700 (LWP 24967)):

#0  0x00007f477c2bbd19 in syscall () at /lib64/libc.so.6

#1  0x000055747d41a270 in qemu_event_wait (val=<optimized out>, 
f=<optimized out>) at /home/andy/git/qemu/include/qemu/futex.h:29

#2  0x000055747d41a270 in qemu_event_wait (ev=ev@entry=0x55747e051c28 
<rcu_call_ready_event>) at ../util/qemu-thread-posix.c:460

#3  0x000055747d444d78 in call_rcu_thread (opaque=opaque@entry=0x0) at 
../util/rcu.c:258

#4  0x000055747d419406 in qemu_thread_start (args=<optimized out>) at 
../util/qemu-thread-posix.c:521

#5  0x00007f477c598ea5 in start_thread () at /lib64/libpthread.so.0

#6  0x00007f477c2c19fd in clone () at /lib64/libc.so.6



Thread 6 (Thread 0x7f472ce42700 (LWP 24970)):

#0  0x00007f477c2b6ccd in poll () at /lib64/libc.so.6

#1  0x00007f47805c137c in g_main_context_iterate.isra.19 () at 
/lib64/libglib-2.0.so.0

#2  0x00007f47805c16ca in g_main_loop_run () at /lib64/libglib-2.0.so.0

#3  0x000055747d29b071 in iothread_run 
(opaque=opaque@entry=0x55747f85f280) at ../iothread.c:80

#4  0x000055747d419406 in qemu_thread_start (args=<optimized out>) at 
../util/qemu-thread-posix.c:521

#5  0x00007f477c598ea5 in start_thread () at /lib64/libpthread.so.0

#6  0x00007f477c2c19fd in clone () at /lib64/libc.so.6



Thread 5 (Thread 0x7f461f9ff700 (LWP 24971)):

#0  0x00007f477c59ca35 in pthread_cond_wait@@GLIBC_2.3.2 () at 
/lib64/libpthread.so.0

#1  0x000055747d419b1d in qemu_cond_wait_impl (cond=0x55747f916670, 
mutex=0x55747e04dc00 <qemu_global_mutex>, file=0x55747d5dbe5c 
"../softmmu/cpus.c", line=417) at ../util/qemu-thread-posix.c:174

#2  0x000055747d20ae36 in qemu_wait_io_event 
(cpu=cpu@entry=0x55747f8b7920) at ../softmmu/cpus.c:417

#3  0x000055747d18d6a1 in mttcg_cpu_thread_fn 
(arg=arg@entry=0x55747f8b7920) at ../accel/tcg/tcg-accel-ops-mttcg.c:98

#4  0x000055747d419406 in qemu_thread_start (args=<optimized out>) at 
../util/qemu-thread-posix.c:521

#5  0x00007f477c598ea5 in start_thread () at /lib64/libpthread.so.0

#6  0x00007f477c2c19fd in clone () at /lib64/libc.so.6



Thread 4 (Thread 0x7f461f1fe700 (LWP 24972)):

#0  0x00007f477c59ca35 in pthread_cond_wait@@GLIBC_2.3.2 () at 
/lib64/libpthread.so.0

#1  0x000055747d419b1d in qemu_cond_wait_impl (cond=0x55747f9897e0, 
mutex=0x55747e04dc00 <qemu_global_mutex>, file=0x55747d5dbe5c 
"../softmmu/cpus.c", line=417) at ../util/qemu-thread-posix.c:174

#2  0x000055747d20ae36 in qemu_wait_io_event 
(cpu=cpu@entry=0x55747f924bc0) at ../softmmu/cpus.c:417

#3  0x000055747d18d6a1 in mttcg_cpu_thread_fn 
(arg=arg@entry=0x55747f924bc0) at ../accel/tcg/tcg-accel-ops-mttcg.c:98

#4  0x000055747d419406 in qemu_thread_start (args=<optimized out>) at 
../util/qemu-thread-posix.c:521

#5  0x00007f477c598ea5 in start_thread () at /lib64/libpthread.so.0

#6  0x00007f477c2c19fd in clone () at /lib64/libc.so.6



Thread 3 (Thread 0x7f461e9fd700 (LWP 24973)):

#0  0x00007f477c59ca35 in pthread_cond_wait@@GLIBC_2.3.2 () at 
/lib64/libpthread.so.0

#1  0x000055747d419b1d in qemu_cond_wait_impl (cond=0x55747f9f5b40, 
mutex=0x55747e04dc00 <qemu_global_mutex>, file=0x55747d5dbe5c 
"../softmmu/cpus.c", line=417) at ../util/qemu-thread-posix.c:174

#2  0x000055747d20ae36 in qemu_wait_io_event 
(cpu=cpu@entry=0x55747f990ba0) at ../softmmu/cpus.c:417

#3  0x000055747d18d6a1 in mttcg_cpu_thread_fn 
(arg=arg@entry=0x55747f990ba0) at ../accel/tcg/tcg-accel-ops-mttcg.c:98

#4  0x000055747d419406 in qemu_thread_start (args=<optimized out>) at 
../util/qemu-thread-posix.c:521

#5  0x00007f477c598ea5 in start_thread () at /lib64/libpthread.so.0

#6  0x00007f477c2c19fd in clone () at /lib64/libc.so.6



Thread 2 (Thread 0x7f461e1fc700 (LWP 24974)):

#0  0x00007f477c59ca35 in pthread_cond_wait@@GLIBC_2.3.2 () at 
/lib64/libpthread.so.0

---Type <return> to continue, or q <return> to quit---

#1  0x000055747d419b1d in qemu_cond_wait_impl (cond=0x55747fa626c0, 
mutex=0x55747e04dc00 <qemu_global_mutex>, file=0x55747d5dbe5c 
"../softmmu/cpus.c", line=417) at ../util/qemu-thread-posix.c:174

#2  0x000055747d20ae36 in qemu_wait_io_event 
(cpu=cpu@entry=0x55747f9fcf00) at ../softmmu/cpus.c:417

#3  0x000055747d18d6a1 in mttcg_cpu_thread_fn 
(arg=arg@entry=0x55747f9fcf00) at ../accel/tcg/tcg-accel-ops-mttcg.c:98

#4  0x000055747d419406 in qemu_thread_start (args=<optimized out>) at 
../util/qemu-thread-posix.c:521

#5  0x00007f477c598ea5 in start_thread () at /lib64/libpthread.so.0

#6  0x00007f477c2c19fd in clone () at /lib64/libc.so.6



Thread 1 (Thread 0x7f4781db4d00 (LWP 24957)):

#0  0x00007f477c2b6d8f in ppoll () at /lib64/libc.so.6

#1  0x000055747d431439 in qemu_poll_ns (__ss=0x0, 
__timeout=0x7ffcc3188330, __nfds=<optimized out>, __fds=<optimized out>) 
at /usr/include/bits/poll2.h:77

#2  0x000055747d431439 in qemu_poll_ns (fds=<optimized out>, 
nfds=<optimized out>, timeout=timeout@entry=3792947) at 
../util/qemu-timer.c:348

#3  0x000055747d4466ce in main_loop_wait (timeout=<optimized out>) at 
../util/main-loop.c:249

#4  0x000055747d4466ce in main_loop_wait 
(nonblocking=nonblocking@entry=0) at ../util/main-loop.c:530

#5  0x000055747d2695c7 in qemu_main_loop () at ../softmmu/runstate.c:725

#6  0x000055747ccc1bde in main (argc=<optimized out>, argv=<optimized 
out>, envp=<optimized out>) at ../softmmu/main.c:50

(gdb)
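One hedged observation about the dump above: every vCPU thread is parked in qemu_wait_io_event on the BQL condition variable while the main loop keeps polling, which is also what an idle or interrupt-starved guest looks like, so the pattern may point at a lost interrupt rather than threads holding locks in opposite order. A small sketch for triaging such a dump (the file name and the excerpt are stand-ins for the real gdb output):

```shell
# Count how many vCPU threads in a saved "thread apply all bt" dump are
# parked on the BQL condvar in qemu_wait_io_event; btdump.txt below is a
# stand-in excerpt of the real gdb output.
set -e
cat > btdump.txt <<'EOF'
Thread 5 (Thread 0x7f461f9ff700 (LWP 24971)):
#0  pthread_cond_wait@@GLIBC_2.3.2 ()
#2  qemu_wait_io_event (cpu=...) at ../softmmu/cpus.c:417
Thread 1 (Thread 0x7f4781db4d00 (LWP 24957)):
#0  ppoll ()
#3  main_loop_wait (timeout=...) at ../util/main-loop.c:249
EOF
awk '/^Thread /{t=$2}
     /qemu_wait_io_event/{n++; print "thread", t, "blocked in qemu_wait_io_event"}
     END{print n+0, "vCPU thread(s) waiting for work"}' btdump.txt
```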


I run QEMU with virt-manager as this:

qemu      7311     1 70 19:15 ?        00:00:05 
/usr/local/bin/qemu-system-aarch64 -name 
guest=EulerOS-2.8-Rich,debug-threads=on -S -object 
secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-95-EulerOS-2.8-Rich/master-key.aes 
-machine virt-6.1,accel=tcg,usb=off,dump-guest-core=off,gic-version=3 
-cpu max -drive 
file=/usr/share/AAVMF/AAVMF_CODE.fd,if=pflash,format=raw,unit=0,readonly=on 
-drive 
file=/var/lib/libvirt/qemu/nvram/EulerOS-2.8-Rich_VARS.fd,if=pflash,format=raw,unit=1 
-m 4096 -smp 4,sockets=4,cores=1,threads=1 -uuid 
c95e0e92-011b-449a-8e3f-b5f0938aaaa7 -display none -no-user-config 
-nodefaults -chardev socket,id=charmonitor,fd=26,server,nowait -mon 
chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown 
-boot strict=on -device 
pcie-root-port,port=0x8,chassis=1,id=pci.1,bus=pcie.0,multifunction=on,addr=0x1 
-device 
pcie-root-port,port=0x9,chassis=2,id=pci.2,bus=pcie.0,addr=0x1.0x1 
-device 
pcie-root-port,port=0xa,chassis=3,id=pci.3,bus=pcie.0,addr=0x1.0x2 
-device 
pcie-root-port,port=0xb,chassis=4,id=pci.4,bus=pcie.0,addr=0x1.0x3 
-device qemu-xhci,p2=8,p3=8,id=usb,bus=pci.2,addr=0x0 -device 
virtio-scsi-pci,id=scsi0,bus=pci.3,addr=0x0 -drive 
file=/var/lib/libvirt/images/EulerOS-2.8-Rich.qcow2,format=qcow2,if=none,id=drive-scsi0-0-0-0 
-device 
scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=1 
-drive if=none,id=drive-scsi0-0-0-1,readonly=on -device 
scsi-cd,bus=scsi0.0,channel=0,scsi-id=0,lun=1,drive=drive-scsi0-0-0-1,id=scsi0-0-0-1 
-netdev tap,fd=28,id=hostnet0 -device 
virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:f9:e0:69,bus=pci.1,addr=0x0 
-chardev pty,id=charserial0 -serial chardev:charserial0 -msg timestamp=on
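To separate the hang from the libvirt-generated configuration, a stripped-down invocation along these lines might be worth trying; it keeps the same machine, GIC, and vCPU settings, while the disk image path is a placeholder:

```shell
# Hypothetical minimal reproducer outside virt-manager; guest.qcow2 is a
# placeholder for the real disk image.
qemu-system-aarch64 \
  -machine virt-6.1,accel=tcg,gic-version=3 \
  -cpu max -smp 4 -m 4096 \
  -device virtio-scsi-pci -device scsi-hd,drive=hd0 \
  -blockdev driver=qcow2,node-name=hd0,file.driver=file,file.filename=guest.qcow2 \
  -serial mon:stdio -display none
```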

The issue is reproducible and persistent.
1. Do you think that applying the series results in a deadlock in 
MTTCG, or could there be another reason?
2. Which part of the QEMU source code should I investigate to locate the issue?

Best regards,
Andrey Shinkevich


On 5/13/21 7:45 PM, Shashi Mallela wrote:
> Hi Andrey,
> 
> To clarify, the patch series
> 
>     https://lists.gnu.org/archive/html/qemu-arm/2021-04/msg00944.html
>     "GICv3 LPI and ITS feature implementation"
> 
> is applicable to the virt machine from version 6.1 onwards, i.e. the 
> ITS TCG functionality is not available with the version 6.0 machine 
> being tried here.
> 
> Thanks
> Shashi
> 
> On May 13 2021, at 12:35 pm, Andrey Shinkevich 
> <andrey.shinkevich@huawei.com> wrote:
> 
>     Dear colleagues,
> 
>     Thank you all very much for your responses. Let me reply with one
>     message.
> 
>     I configured QEMU for AARCH64 guest:
>     $ ./configure --target-list=aarch64-softmmu
> 
>     When I start QEMU with GICv3 on an x86 host:
>     qemu-system-aarch64 -machine virt-6.0,accel=tcg,gic-version=3
> 
>     QEMU reports this error from hw/pci/msix.c:
>     error_setg(errp, "MSI-X is not supported by interrupt controller");
> 
>     Probably, the variable 'msi_nonbroken' would be initialized in
>     hw/intc/arm_gicv3_its_common.c:
>     gicv3_its_init_mmio(..)
> 
>     I guess that it works with KVM acceleration only rather than with TCG.
> 
>     The error persists after applying the series:
>     https://lists.gnu.org/archive/html/qemu-arm/2021-04/msg00944.html
>     "GICv3 LPI and ITS feature implementation"
>     (special thanks for referring me to that)
> 
>     Please clarify and advise how that error can be fixed.
>     Should MSI-X support be implemented separately on top of GICv3?
> 
>     When successful, I would like to test QEMU for a maximum number of cores
>     to get the best MTTCG performance.
>     Probably, we will get just some percentage of performance enhancement
>     with the BQL series applied, won't we? I will test it as well.
> 
>     Best regards,
>     Andrey Shinkevich
> 
> 
>     On 5/12/21 6:43 PM, Alex Bennée wrote:
>      >
>      > Andrey Shinkevich <andrey.shinkevich@huawei.com> writes:
>      >
>      >> Dear colleagues,
>      >>
>      >> I am looking for ways to accelerate the MTTCG for ARM guest on
>     x86-64 host.
>      >> The maximum number of CPUs for MTTCG that uses GICv2 is limited
>     by 8:
>      >>
>      >> include/hw/intc/arm_gic_common.h:#define GIC_NCPU 8
>      >>
>      >> The version 3 of the Generic Interrupt Controller (GICv3) is not
>      >> supported in QEMU for some reason unknown to me. It would allow to
>      >> increase the limit of CPUs and accelerate the MTTCG performance on a
>      >> multiple core hypervisor.
>      >
>      > It is supported, you just need to select it.
>      >
>      >> I have got an idea to implement the Interrupt Translation
>     Service (ITS)
>      >> for using by MTTCG for ARM architecture.
>      >
>      > There is some work to support ITS under TCG already posted:
>      >
>      > Subject: [PATCH v3 0/8] GICv3 LPI and ITS feature implementation
>      > Date: Thu, 29 Apr 2021 19:41:53 -0400
>      > Message-Id: <20210429234201.125565-1-shashi.mallela@linaro.org>
>      >
>      > please do review and test.
>      >
>      >> Do you find that idea useful and feasible?
>      >> If yes, how much time do you estimate for such a project to
>     complete by
>      >> one developer?
>      >> If no, what are reasons for not implementing GICv3 for MTTCG in
>     QEMU?
>      >
>      > As far as MTTCG performance is concerned there is a degree of
>      > diminishing returns to be expected as the synchronisation cost
>     between
>      > threads will eventually outweigh the gains of additional threads.
>      >
>      > There are a number of parts that could improve this performance. The
>      > first would be picking up the BQL reduction series from your
>     FutureWei
>      > colleagues who worked on the problem when they were Linaro assignees:
>      >
>      > Subject: [PATCH v2 0/7] accel/tcg: remove implied BQL from
>     cpu_handle_interrupt/exception path
>      > Date: Wed, 19 Aug 2020 14:28:49 -0400
>      > Message-Id: <20200819182856.4893-1-robert.foley@linaro.org>
>      >
>      > There was also a longer series moving towards per-CPU locks:
>      >
>      > Subject: [PATCH v10 00/73] per-CPU locks
>      > Date: Wed, 17 Jun 2020 17:01:18 -0400
>      > Message-Id: <20200617210231.4393-1-robert.foley@linaro.org>
>      >
>      > I believe the initial measurements showed that the BQL cost
>     started to
>      > edge up with GIC interactions. We did discuss approaches for this
>     and I
>      > think one idea was to use non-BQL locking for the GIC. You would need to
>      > revert:
>      >
>      > Subject: [PATCH-for-5.2] exec: Remove
>     MemoryRegion::global_locking field
>      > Date: Thu, 6 Aug 2020 17:07:26 +0200
>      > Message-Id: <20200806150726.962-1-philmd@redhat.com>
>      >
>      > and then implement a more fine tuned locking in the GIC emulation
>      > itself. However I think the BQL and per-CPU locks are lower hanging
>      > fruit to tackle first.
>      >
>      >>
>      >> Best regards,
>      >> Andrey Shinkevich
>      >
>      >
> 
> Sent from Mailspring



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: GICv3 for MTTCG
  2021-06-17 16:43       ` Andrey Shinkevich
@ 2021-06-17 17:44         ` shashi.mallela
  2021-06-17 18:55           ` Andrey Shinkevich
  2021-06-18 13:15         ` Alex Bennée
  1 sibling, 1 reply; 15+ messages in thread
From: shashi.mallela @ 2021-06-17 17:44 UTC (permalink / raw)
  To: Andrey Shinkevich
  Cc: peter.maydell, drjones, Cota, richard.henderson, qemu-devel,
	qemu-arm, Chengen (William,  FixNet), yuzenghui, Wanghaibin (D),
	Alex Bennée

Hi Andrey,

The issue doesn't seem related to the ITS patchset, as the
implementation has no changes around MTTCG or vCPU configuration.

If this patchset is not applied (i.e. with only commit 3e9f48b), do you
still see the hang?

Thanks
Shashi


On Thu, 2021-06-17 at 16:43 +0000, Andrey Shinkevich wrote:
> Dear Shashi,
> 
> I have applied the version 4 of the series "GICv3 LPI and ITS
> feature 
> implementation" right after the commit 3e9f48b as before (because
> the 
> GCCv7.5 is unavailable in the YUM repository for CentOS-7.9).
> 
> The guest OS still hangs at its start when QEMU is configured with 4
> or 
> more vCPUs (with 1 to 3 vCPUs the guest starts and runs OK and the
> MTTCG 
> works properly):
> 
> Welcome to EulerOS 2.0 ... (Initramfs)!
> 
> …
> 
> [  OK  ] Mounted Kernel Configuration File System.
> 
> [  OK  ] Started udev Coldplug all Devices.
> 
> [  OK  ] Reached target System Initialization.
> 
> [  OK  ] Reached target Basic System.
> 
> 
> 
> IT HANGS HERE
>   (with 4 or more vCPUs)!!!
> 
> 
> [  OK  ] Found device /dev/mapper/euleros-root.
> 
> [  OK  ] Reached target Initrd Root Device.
> 
> [  OK  ] Started dracut initqueue hook.
> 
>           Starting File System Check on /dev/mapper/euleros-root...
> 
> [  OK  ] Reached target Remote File Systems (Pre).
> 
> [  OK  ] Reached target Remote File Systems.
> 
> [  OK  ] Started File System Check on /dev/mapper/euleros-root.
> 
>           Mounting /sysroot...
> 
> [  OK  ] Mounted /sysroot.
> 
> …
> 
> 
> The back trace of threads in QEMU looks like a dead lock in MTTCG, 
> doesn't it?
> 
> [full thread backtrace snipped; identical to the dump above]
> 
> 
> I run QEMU with virt-manager as this:
> 
> qemu      7311     1 70 19:15 ?        00:00:05 
> /usr/local/bin/qemu-system-aarch64 -name 
> guest=EulerOS-2.8-Rich,debug-threads=on -S -object 
> secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-95-EulerOS-2.8-Rich/master-key.aes 
> -machine virt-6.1,accel=tcg,usb=off,dump-guest-core=off,gic-version=3 
> -cpu max -drive 
> file=/usr/share/AAVMF/AAVMF_CODE.fd,if=pflash,format=raw,unit=0,readonly=on 
> -drive 
> file=/var/lib/libvirt/qemu/nvram/EulerOS-2.8-Rich_VARS.fd,if=pflash,format=raw,unit=1 
> -m 4096 -smp 4,sockets=4,cores=1,threads=1 -uuid 
> c95e0e92-011b-449a-8e3f-b5f0938aaaa7 -display none -no-user-config 
> -nodefaults -chardev socket,id=charmonitor,fd=26,server,nowait -mon 
> chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown 
> -boot strict=on -device 
> pcie-root-port,port=0x8,chassis=1,id=pci.1,bus=pcie.0,multifunction=on,addr=0x1 
> -device 
> pcie-root-port,port=0x9,chassis=2,id=pci.2,bus=pcie.0,addr=0x1.0x1 
> -device 
> pcie-root-port,port=0xa,chassis=3,id=pci.3,bus=pcie.0,addr=0x1.0x2 
> -device 
> pcie-root-port,port=0xb,chassis=4,id=pci.4,bus=pcie.0,addr=0x1.0x3 
> -device qemu-xhci,p2=8,p3=8,id=usb,bus=pci.2,addr=0x0 -device 
> virtio-scsi-pci,id=scsi0,bus=pci.3,addr=0x0 -drive 
> file=/var/lib/libvirt/images/EulerOS-2.8-Rich.qcow2,format=qcow2,if=none,id=drive-scsi0-0-0-0 
> -device 
> scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=1 
> -drive if=none,id=drive-scsi0-0-0-1,readonly=on -device 
> scsi-cd,bus=scsi0.0,channel=0,scsi-id=0,lun=1,drive=drive-scsi0-0-0-1,id=scsi0-0-0-1 
> -netdev tap,fd=28,id=hostnet0 -device 
> virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:f9:e0:69,bus=pci.1,addr=0x0 
> -chardev pty,id=charserial0 -serial chardev:charserial0 -msg timestamp=on
> 
> The issue is reproducible and persists.
> 1. Do you think that applying the series results in the deadlock in 
> MTTCG? Or might there be another reason?
> 2. Which piece of QEMU source code should I investigate to locate the
> issue?
> 
> Best regards,
> Andrey Shinkevich
> 
> 
> On 5/13/21 7:45 PM, Shashi Mallela wrote:
> > Hi Andrey,
> > 
> > To clarify, the patch series
> > 
> >     
> > https://lists.gnu.org/archive/html/qemu-arm/2021-04/msg00944.html
> >     "GICv3 LPI and ITS feature implementation"
> > 
> > is applicable for virt machine 6.1 onwards, i.e. ITS TCG 
> > functionality is 
> > not available for version 6.0, which is being tried
> > here.
> > 
> > Thanks
> > Shashi
> > 
> > On May 13 2021, at 12:35 pm, Andrey Shinkevich 
> > <andrey.shinkevich@huawei.com> wrote:
> > 
> >     Dear colleagues,
> > 
> >     Thank you all very much for your responses. Let me reply with
> > one
> >     message.
> > 
> >     I configured QEMU for AARCH64 guest:
> >     $ ./configure --target-list=aarch64-softmmu
> > 
> >     When I start QEMU with GICv3 on an x86 host:
> >     qemu-system-aarch64 -machine virt-6.0,accel=tcg,gic-version=3
> > 
> >     QEMU reports this error from hw/pci/msix.c:
> >     error_setg(errp, "MSI-X is not supported by interrupt
> > controller");
> > 
> >     Probably, the variable 'msi_nonbroken' would be initialized in
> >     hw/intc/arm_gicv3_its_common.c:
> >     gicv3_its_init_mmio(..)
> > 
> >     I guess that it works with KVM acceleration only rather than
> > with TCG.
> > 
> >     The error persists after applying the series:
> >     
> > https://lists.gnu.org/archive/html/qemu-arm/2021-04/msg00944.html
> >     "GICv3 LPI and ITS feature implementation"
> >     (special thanks for referring me to that)
> > 
> >     Please, make me clear and advise ideas how that error can be
> > fixed?
> >     Should the MSI-X support be implemented with GICv3 extra?
> > 
> >     When successful, I would like to test QEMU for a maximum number
> > of cores
> >     to get the best MTTCG performance.
> >     Probably, we will get just some percentage of performance
> > enhancement
> >     with the BQL series applied, won't we? I will test it as well.
> > 
> >     Best regards,
> >     Andrey Shinkevich
> > 
> > 
> >     On 5/12/21 6:43 PM, Alex Bennée wrote:
> >      >
> >      > Andrey Shinkevich <andrey.shinkevich@huawei.com> writes:
> >      >
> >      >> Dear colleagues,
> >      >>
> >      >> I am looking for ways to accelerate the MTTCG for ARM guest
> > on
> >     x86-64 host.
> >      >> The maximum number of CPUs for MTTCG that uses GICv2 is
> > limited
> >     by 8:
> >      >>
> >      >> include/hw/intc/arm_gic_common.h:#define GIC_NCPU 8
> >      >>
> >      >> The version 3 of the Generic Interrupt Controller (GICv3)
> > is not
> >      >> supported in QEMU for some reason unknown to me. It would
> > allow to
> >      >> increase the limit of CPUs and accelerate the MTTCG
> > performance on a
> >      >> multiple core hypervisor.
> >      >
> >      > It is supported, you just need to select it.
> >      >
> >      >> I have got an idea to implement the Interrupt Translation
> >     Service (ITS)
> >      >> for using by MTTCG for ARM architecture.
> >      >
> >      > There is some work to support ITS under TCG already posted:
> >      >
> >      > Subject: [PATCH v3 0/8] GICv3 LPI and ITS feature
> > implementation
> >      > Date: Thu, 29 Apr 2021 19:41:53 -0400
> >      > Message-Id: <
> > 20210429234201.125565-1-shashi.mallela@linaro.org>
> >      >
> >      > please do review and test.
> >      >
> >      >> Do you find that idea useful and feasible?
> >      >> If yes, how much time do you estimate for such a project to
> >     complete by
> >      >> one developer?
> >      >> If no, what are reasons for not implementing GICv3 for
> > MTTCG in
> >     QEMU?
> >      >
> >      > As far as MTTCG performance is concerned there is a degree
> > of
> >      > diminishing returns to be expected as the synchronisation
> > cost
> >     between
> >      > threads will eventually outweigh the gains of additional
> > threads.
> >      >
> >      > There are a number of parts that could improve this
> > performance. The
> >      > first would be picking up the BQL reduction series from your
> >     FutureWei
> >      > colleges who worked on the problem when they were Linaro
> > assignees:
> >      >
> >      > Subject: [PATCH v2 0/7] accel/tcg: remove implied BQL from
> >     cpu_handle_interrupt/exception path
> >      > Date: Wed, 19 Aug 2020 14:28:49 -0400
> >      > Message-Id: <20200819182856.4893-1-robert.foley@linaro.org>
> >      >
> >      > There was also a longer series moving towards per-CPU locks:
> >      >
> >      > Subject: [PATCH v10 00/73] per-CPU locks
> >      > Date: Wed, 17 Jun 2020 17:01:18 -0400
> >      > Message-Id: <20200617210231.4393-1-robert.foley@linaro.org>
> >      >
> >      > I believe the initial measurements showed that the BQL cost
> >     started to
> >      > edge up with GIC interactions. We did discuss approaches for
> > this
> >     and I
> >      > think one idea was use non-BQL locking for the GIC. You
> > would need to
> >      > revert:
> >      >
> >      > Subject: [PATCH-for-5.2] exec: Remove
> >     MemoryRegion::global_locking field
> >      > Date: Thu, 6 Aug 2020 17:07:26 +0200
> >      > Message-Id: <20200806150726.962-1-philmd@redhat.com>
> >      >
> >      > and then implement a more fine tuned locking in the GIC
> > emulation
> >      > itself. However I think the BQL and per-CPU locks are lower
> > hanging
> >      > fruit to tackle first.
> >      >
> >      >>
> >      >> Best regards,
> >      >> Andrey Shinkevich
> >      >
> >      >
> > 
> > Sent from Mailspring



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: GICv3 for MTTCG
  2021-06-17 17:44         ` shashi.mallela
@ 2021-06-17 18:55           ` Andrey Shinkevich
  0 siblings, 0 replies; 15+ messages in thread
From: Andrey Shinkevich @ 2021-06-17 18:55 UTC (permalink / raw)
  To: shashi.mallela
  Cc: peter.maydell, drjones, Cota, richard.henderson, qemu-devel,
	qemu-arm, Chengen (William, FixNet), yuzenghui, Wanghaibin (D),
	Alex Bennée

On 6/17/21 8:44 PM, shashi.mallela@linaro.org wrote:
> Hi Andrey,
> 
> The issue doesn't seem related to the ITS patchset, as the implementation has
> no changes around MTTCG or vCPU configurations.
> 
> If this patchset were not applied (with only commit 3e9f48b), do you
> still see the hang issue?

No, I don't. Even with the patchset applied, setting 'gic-version=2' lets 
the guest run normally.
With 'gic-version=3' and '-smp 3' (1, 2 or 3 vCPUs), the guest starts 
and runs OK as well.
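For reference, the hang configuration boils down to roughly the following invocation. This is a condensed sketch, not the exact reproducer: it is distilled from the full libvirt-generated command line quoted elsewhere in this thread, with the firmware and image paths taken from that command line and the PCI/USB topology omitted.

```shell
# Condensed reproducer sketch (paths from the libvirt command line in
# this thread; most devices omitted for brevity):
qemu-system-aarch64 \
    -machine virt-6.1,accel=tcg,gic-version=3 \
    -cpu max -m 4096 \
    -smp 4,sockets=4,cores=1,threads=1 \
    -drive file=/usr/share/AAVMF/AAVMF_CODE.fd,if=pflash,format=raw,unit=0,readonly=on \
    -drive file=/var/lib/libvirt/images/EulerOS-2.8-Rich.qcow2,format=qcow2,if=none,id=hd0 \
    -device virtio-scsi-pci,id=scsi0 \
    -device scsi-hd,drive=hd0,bus=scsi0.0 \
    -display none -serial mon:stdio
# With -smp 3 (or with gic-version=2), the same guest boots normally.
```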

Andrey

> 
> Thanks
> Shashi
> 
> 
> On Thu, 2021-06-17 at 16:43 +0000, Andrey Shinkevich wrote:
>> Dear Shashi,
>>
>> I have applied the version 4 of the series "GICv3 LPI and ITS
>> feature
>> implementation" right after the commit 3e9f48b as before (because
>> the
>> GCCv7.5 is unavailable in the YUM repository for CentOS-7.9).
>>
>> The guest OS still hangs at its start when QEMU is configured with 4
>> or
>> more vCPUs (with 1 to 3 vCPUs the guest starts and runs OK and the
>> MTTCG
>> works properly):
>>
>> Welcome to EulerOS 2.0 ... (Initramfs)!
>>
>> …
>>
>> [  OK  ] Mounted Kernel Configuration File System.
>>
>> [  OK  ] Started udev Coldplug all Devices.
>>
>> [  OK  ] Reached target System Initialization.
>>
>> [  OK  ] Reached target Basic System.
>>
>>
>>
>> IT HANGS HERE
>>    (with 4 or more vCPUs)!!!
>>
>>
>> [  OK  ] Found device /dev/mapper/euleros-root.
>>
>> [  OK  ] Reached target Initrd Root Device.
>>
>> [  OK  ] Started dracut initqueue hook.
>>
>>            Starting File System Check on /dev/mapper/euleros-root...
>>
>> [  OK  ] Reached target Remote File Systems (Pre).
>>
>> [  OK  ] Reached target Remote File Systems.
>>
>> [  OK  ] Started File System Check on /dev/mapper/euleros-root.
>>
>>            Mounting /sysroot...
>>
>> [  OK  ] Mounted /sysroot.
>>
>> …
>>
>>
>> The backtrace of threads in QEMU looks like a deadlock in MTTCG,
>> doesn't it?
>>
>> Thread 7 (Thread 0x7f476e489700 (LWP 24967)):
>>
...
>>
>> Thread 5 (Thread 0x7f461f9ff700 (LWP 24971)):
>>
...
>>
>> Thread 4 (Thread 0x7f461f1fe700 (LWP 24972)):
>>
...
>>
>> Thread 3 (Thread 0x7f461e9fd700 (LWP 24973)...
>>
>> Thread 2 (Thread 0x7f461e1fc700 (LWP 24974)):
>>
>> #0  0x00007f477c59ca35 in pthread_cond_wait@@GLIBC_2.3.2 () at
>> /lib64/libpthread.so.0
>>
>> #1  0x000055747d419b1d in qemu_cond_wait_impl (cond=0x55747fa626c0,
>> mutex=0x55747e04dc00 <qemu_global_mutex>, file=0x55747d5dbe5c
>> "../softmmu/cpus.c", line=417) at ../util/qemu-thread-posix.c:174
>>
>> #2  0x000055747d20ae36 in qemu_wait_io_event
>> (cpu=cpu@entry=0x55747f9fcf00) at ../softmmu/cpus.c:417
>>
>> #3  0x000055747d18d6a1 in mttcg_cpu_thread_fn
>> (arg=arg@entry=0x55747f9fcf00) at ../accel/tcg/tcg-accel-ops-mttcg.c:98
>>
>> #4  0x000055747d419406 in qemu_thread_start (args=<optimized out>)
>> at
>> ../util/qemu-thread-posix.c:521
>>
>> #5  0x00007f477c598ea5 in start_thread () at /lib64/libpthread.so.0
>>
>> #6  0x00007f477c2c19fd in clone () at /lib64/libc.so.6
>>
>>
>> Thread 1 (Thread 0x7f4781db4d00 (LWP 24957)):
>>
...
>>
>> (gdb)
>>
>>
>> I run QEMU with virt-manager as this:
>>
>> qemu      7311     1 70 19:15 ?        00:00:05
>> /usr/local/bin/qemu-system-aarch64 -name
>> guest=EulerOS-2.8-Rich,debug-threads=on -S -object
>> secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-95-EulerOS-2.8-Rich/master-key.aes
>> -machine virt-6.1,accel=tcg,usb=off,dump-guest-core=off,gic-version=3
>> -cpu max -drive
>> file=/usr/share/AAVMF/AAVMF_CODE.fd,if=pflash,format=raw,unit=0,readonly=on
>> -drive
>> file=/var/lib/libvirt/qemu/nvram/EulerOS-2.8-Rich_VARS.fd,if=pflash,format=raw,unit=1
>> -m 4096 -smp 4,sockets=4,cores=1,threads=1
...
>>
>> The issue is reproducible and persists.
>> 1. Do you think that applying the series results in the deadlock in
>> MTTCG? Or might there be another reason?
>> 2. Which piece of QEMU source code should I investigate to locate the
>> issue?
>>
>> Best regards,
>> Andrey Shinkevich
>>
...
> 
> 



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: GICv3 for MTTCG
  2021-06-17 16:43       ` Andrey Shinkevich
  2021-06-17 17:44         ` shashi.mallela
@ 2021-06-18 13:15         ` Alex Bennée
  2021-06-18 15:18           ` Andrey Shinkevich
  1 sibling, 1 reply; 15+ messages in thread
From: Alex Bennée @ 2021-06-18 13:15 UTC (permalink / raw)
  To: Andrey Shinkevich
  Cc: peter.maydell, drjones, Cota, Shashi Mallela, richard.henderson,
	qemu-devel, qemu-arm, Chengen (William, FixNet),
	yuzenghui, Wanghaibin (D)


Andrey Shinkevich <andrey.shinkevich@huawei.com> writes:

> Dear Shashi,
>
> I have applied the version 4 of the series "GICv3 LPI and ITS feature 
> implementation" right after the commit 3e9f48b as before (because the 
> GCCv7.5 is unavailable in the YUM repository for CentOS-7.9).
>
> The guest OS still hangs at its start when QEMU is configured with 4 or 
> more vCPUs (with 1 to 3 vCPUs the guest starts and runs OK and the MTTCG 
> works properly):

Does QEMU itself hang? If you attach gdb to QEMU and do:

  thread apply all bt

that should dump the backtrace for all threads. Could you post the backtrace?

>
> Welcome to EulerOS 2.0 ... (Initramfs)!
>
> …
>
> [  OK  ] Mounted Kernel Configuration File System.
>
> [  OK  ] Started udev Coldplug all Devices.
>
> [  OK  ] Reached target System Initialization.
>
> [  OK  ] Reached target Basic System.
>
>
>
> IT HANGS HERE
>   (with 4 or more vCPUs)!!!
>
>
> [  OK  ] Found device /dev/mapper/euleros-root.
>
> [  OK  ] Reached target Initrd Root Device.
>
> [  OK  ] Started dracut initqueue hook.
>
>           Starting File System Check on /dev/mapper/euleros-root...
>
> [  OK  ] Reached target Remote File Systems (Pre).
>
> [  OK  ] Reached target Remote File Systems.
>
> [  OK  ] Started File System Check on /dev/mapper/euleros-root.
>
>           Mounting /sysroot...
>
> [  OK  ] Mounted /sysroot.
>
> …
>
>
> The backtrace of threads in QEMU looks like a deadlock in MTTCG, 
> doesn't it?
>
> Thread 7 (Thread 0x7f476e489700 (LWP 24967)):
>
> #0  0x00007f477c2bbd19 in syscall () at /lib64/libc.so.6
>
> #1  0x000055747d41a270 in qemu_event_wait (val=<optimized out>, 
> f=<optimized out>) at /home/andy/git/qemu/include/qemu/futex.h:29
>
> #2  0x000055747d41a270 in qemu_event_wait (ev=ev@entry=0x55747e051c28 
> <rcu_call_ready_event>) at ../util/qemu-thread-posix.c:460
>
> #3  0x000055747d444d78 in call_rcu_thread (opaque=opaque@entry=0x0) at 
> ../util/rcu.c:258
>
> #4  0x000055747d419406 in qemu_thread_start (args=<optimized out>) at 
> ../util/qemu-thread-posix.c:521
>
> #5  0x00007f477c598ea5 in start_thread () at /lib64/libpthread.so.0
>
> #6  0x00007f477c2c19fd in clone () at /lib64/libc.so.6
>
>
>
> Thread 6 (Thread 0x7f472ce42700 (LWP 24970)):
>
> #0  0x00007f477c2b6ccd in poll () at /lib64/libc.so.6
>
> #1  0x00007f47805c137c in g_main_context_iterate.isra.19 () at 
> /lib64/libglib-2.0.so.0
>
> #2  0x00007f47805c16ca in g_main_loop_run () at /lib64/libglib-2.0.so.0
>
> #3  0x000055747d29b071 in iothread_run 
> (opaque=opaque@entry=0x55747f85f280) at ../iothread.c:80
>
> #4  0x000055747d419406 in qemu_thread_start (args=<optimized out>) at 
> ../util/qemu-thread-posix.c:521
>
> #5  0x00007f477c598ea5 in start_thread () at /lib64/libpthread.so.0
>
> #6  0x00007f477c2c19fd in clone () at /lib64/libc.so.6
>
>
>
> Thread 5 (Thread 0x7f461f9ff700 (LWP 24971)):
>
> #0  0x00007f477c59ca35 in pthread_cond_wait@@GLIBC_2.3.2 () at 
> /lib64/libpthread.so.0
>
> #1  0x000055747d419b1d in qemu_cond_wait_impl (cond=0x55747f916670, 
> mutex=0x55747e04dc00 <qemu_global_mutex>, file=0x55747d5dbe5c 
> "../softmmu/cpus.c", line=417) at ../util/qemu-thread-posix.c:174
>
> #2  0x000055747d20ae36 in qemu_wait_io_event 
> (cpu=cpu@entry=0x55747f8b7920) at ../softmmu/cpus.c:417
>
> #3  0x000055747d18d6a1 in mttcg_cpu_thread_fn 
> (arg=arg@entry=0x55747f8b7920) at ../accel/tcg/tcg-accel-ops-mttcg.c:98
>
> #4  0x000055747d419406 in qemu_thread_start (args=<optimized out>) at 
> ../util/qemu-thread-posix.c:521
>
> #5  0x00007f477c598ea5 in start_thread () at /lib64/libpthread.so.0
>
> #6  0x00007f477c2c19fd in clone () at /lib64/libc.so.6
>
>
>
> Thread 4 (Thread 0x7f461f1fe700 (LWP 24972)):
>
> #0  0x00007f477c59ca35 in pthread_cond_wait@@GLIBC_2.3.2 () at 
> /lib64/libpthread.so.0
>
> #1  0x000055747d419b1d in qemu_cond_wait_impl (cond=0x55747f9897e0, 
> mutex=0x55747e04dc00 <qemu_global_mutex>, file=0x55747d5dbe5c 
> "../softmmu/cpus.c", line=417) at ../util/qemu-thread-posix.c:174
>
> #2  0x000055747d20ae36 in qemu_wait_io_event 
> (cpu=cpu@entry=0x55747f924bc0) at ../softmmu/cpus.c:417
>
> #3  0x000055747d18d6a1 in mttcg_cpu_thread_fn 
> (arg=arg@entry=0x55747f924bc0) at ../accel/tcg/tcg-accel-ops-mttcg.c:98
>
> #4  0x000055747d419406 in qemu_thread_start (args=<optimized out>) at 
> ../util/qemu-thread-posix.c:521
>
> #5  0x00007f477c598ea5 in start_thread () at /lib64/libpthread.so.0
>
> #6  0x00007f477c2c19fd in clone () at /lib64/libc.so.6
>
>
>
> Thread 3 (Thread 0x7f461e9fd700 (LWP 24973)):
>
> #0  0x00007f477c59ca35 in pthread_cond_wait@@GLIBC_2.3.2 () at 
> /lib64/libpthread.so.0
>
> #1  0x000055747d419b1d in qemu_cond_wait_impl (cond=0x55747f9f5b40, 
> mutex=0x55747e04dc00 <qemu_global_mutex>, file=0x55747d5dbe5c 
> "../softmmu/cpus.c", line=417) at ../util/qemu-thread-posix.c:174
>
> #2  0x000055747d20ae36 in qemu_wait_io_event 
> (cpu=cpu@entry=0x55747f990ba0) at ../softmmu/cpus.c:417
>
> #3  0x000055747d18d6a1 in mttcg_cpu_thread_fn 
> (arg=arg@entry=0x55747f990ba0) at ../accel/tcg/tcg-accel-ops-mttcg.c:98
>
> #4  0x000055747d419406 in qemu_thread_start (args=<optimized out>) at 
> ../util/qemu-thread-posix.c:521
>
> #5  0x00007f477c598ea5 in start_thread () at /lib64/libpthread.so.0
>
> #6  0x00007f477c2c19fd in clone () at /lib64/libc.so.6
>
>
>
> Thread 2 (Thread 0x7f461e1fc700 (LWP 24974)):
>
> #0  0x00007f477c59ca35 in pthread_cond_wait@@GLIBC_2.3.2 () at 
> /lib64/libpthread.so.0
>
> #1  0x000055747d419b1d in qemu_cond_wait_impl (cond=0x55747fa626c0, 
> mutex=0x55747e04dc00 <qemu_global_mutex>, file=0x55747d5dbe5c 
> "../softmmu/cpus.c", line=417) at ../util/qemu-thread-posix.c:174
>
> #2  0x000055747d20ae36 in qemu_wait_io_event 
> (cpu=cpu@entry=0x55747f9fcf00) at ../softmmu/cpus.c:417
>
> #3  0x000055747d18d6a1 in mttcg_cpu_thread_fn 
> (arg=arg@entry=0x55747f9fcf00) at ../accel/tcg/tcg-accel-ops-mttcg.c:98
>
> #4  0x000055747d419406 in qemu_thread_start (args=<optimized out>) at 
> ../util/qemu-thread-posix.c:521
>
> #5  0x00007f477c598ea5 in start_thread () at /lib64/libpthread.so.0
>
> #6  0x00007f477c2c19fd in clone () at /lib64/libc.so.6
>
>
>
> Thread 1 (Thread 0x7f4781db4d00 (LWP 24957)):
>
> #0  0x00007f477c2b6d8f in ppoll () at /lib64/libc.so.6
>
> #1  0x000055747d431439 in qemu_poll_ns (__ss=0x0, 
> __timeout=0x7ffcc3188330, __nfds=<optimized out>, __fds=<optimized out>) 
> at /usr/include/bits/poll2.h:77
>
> #2  0x000055747d431439 in qemu_poll_ns (fds=<optimized out>, 
> nfds=<optimized out>, timeout=timeout@entry=3792947) at 
> ../util/qemu-timer.c:348
>
> #3  0x000055747d4466ce in main_loop_wait (timeout=<optimized out>) at 
> ../util/main-loop.c:249
>
> #4  0x000055747d4466ce in main_loop_wait 
> (nonblocking=nonblocking@entry=0) at ../util/main-loop.c:530
>
> #5  0x000055747d2695c7 in qemu_main_loop () at ../softmmu/runstate.c:725
>
> #6  0x000055747ccc1bde in main (argc=<optimized out>, argv=<optimized 
> out>, envp=<optimized out>) at ../softmmu/main.c:50
>
> (gdb)
>
>
> I run QEMU with virt-manager as this:
>
> qemu      7311     1 70 19:15 ?        00:00:05 
> /usr/local/bin/qemu-system-aarch64 -name 
> guest=EulerOS-2.8-Rich,debug-threads=on -S -object 
> secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-95-EulerOS-2.8-Rich/master-key.aes 
> -machine virt-6.1,accel=tcg,usb=off,dump-guest-core=off,gic-version=3 
> -cpu max -drive 
> file=/usr/share/AAVMF/AAVMF_CODE.fd,if=pflash,format=raw,unit=0,readonly=on 
> -drive 
> file=/var/lib/libvirt/qemu/nvram/EulerOS-2.8-Rich_VARS.fd,if=pflash,format=raw,unit=1 
> -m 4096 -smp 4,sockets=4,cores=1,threads=1 -uuid 
> c95e0e92-011b-449a-8e3f-b5f0938aaaa7 -display none -no-user-config 
> -nodefaults -chardev socket,id=charmonitor,fd=26,server,nowait -mon 
> chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown 
> -boot strict=on -device 
> pcie-root-port,port=0x8,chassis=1,id=pci.1,bus=pcie.0,multifunction=on,addr=0x1 
> -device 
> pcie-root-port,port=0x9,chassis=2,id=pci.2,bus=pcie.0,addr=0x1.0x1 
> -device 
> pcie-root-port,port=0xa,chassis=3,id=pci.3,bus=pcie.0,addr=0x1.0x2 
> -device 
> pcie-root-port,port=0xb,chassis=4,id=pci.4,bus=pcie.0,addr=0x1.0x3 
> -device qemu-xhci,p2=8,p3=8,id=usb,bus=pci.2,addr=0x0 -device 
> virtio-scsi-pci,id=scsi0,bus=pci.3,addr=0x0 -drive 
> file=/var/lib/libvirt/images/EulerOS-2.8-Rich.qcow2,format=qcow2,if=none,id=drive-scsi0-0-0-0 
> -device 
> scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=1 
> -drive if=none,id=drive-scsi0-0-0-1,readonly=on -device 
> scsi-cd,bus=scsi0.0,channel=0,scsi-id=0,lun=1,drive=drive-scsi0-0-0-1,id=scsi0-0-0-1 
> -netdev tap,fd=28,id=hostnet0 -device 
> virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:f9:e0:69,bus=pci.1,addr=0x0 
> -chardev pty,id=charserial0 -serial chardev:charserial0 -msg timestamp=on
>
> The issue is reproducible and persists.
> 1. Do you think that applying the series results in the deadlock in 
> MTTCG? Or might there be another reason?
> 2. Which piece of QEMU source code should I investigate to locate the issue?
>
> Best regards,
> Andrey Shinkevich
>
>
> On 5/13/21 7:45 PM, Shashi Mallela wrote:
>> Hi Andrey,
>> 
>> To clarify, the patch series
>> 
>>     https://lists.gnu.org/archive/html/qemu-arm/2021-04/msg00944.html
>>     "GICv3 LPI and ITS feature implementation"
>> 
>> is applicable for virt machine 6.1 onwards, i.e. ITS TCG functionality is 
>> not available for version 6.0, which is being tried
>> here.
>> 
>> Thanks
>> Shashi
>> 
>> On May 13 2021, at 12:35 pm, Andrey Shinkevich 
>> <andrey.shinkevich@huawei.com> wrote:
>> 
>>     Dear colleagues,
>> 
>>     Thank you all very much for your responses. Let me reply with one
>>     message.
>> 
>>     I configured QEMU for AARCH64 guest:
>>     $ ./configure --target-list=aarch64-softmmu
>> 
>>     When I start QEMU with GICv3 on an x86 host:
>>     qemu-system-aarch64 -machine virt-6.0,accel=tcg,gic-version=3
>> 
>>     QEMU reports this error from hw/pci/msix.c:
>>     error_setg(errp, "MSI-X is not supported by interrupt controller");
>> 
>>     Probably, the variable 'msi_nonbroken' would be initialized in
>>     hw/intc/arm_gicv3_its_common.c:
>>     gicv3_its_init_mmio(..)
>> 
>>     I guess that it works with KVM acceleration only rather than with TCG.
>> 
>>     The error persists after applying the series:
>>     https://lists.gnu.org/archive/html/qemu-arm/2021-04/msg00944.html
>>     "GICv3 LPI and ITS feature implementation"
>>     (special thanks for referring me to that)
>> 
>>     Please, make me clear and advise ideas how that error can be fixed?
>>     Should the MSI-X support be implemented with GICv3 extra?
>> 
>>     When successful, I would like to test QEMU for a maximum number of cores
>>     to get the best MTTCG performance.
>>     Probably, we will get just some percentage of performance enhancement
>>     with the BQL series applied, won't we? I will test it as well.
>> 
>>     Best regards,
>>     Andrey Shinkevich
>> 
>> 
>>     On 5/12/21 6:43 PM, Alex Bennée wrote:
>>      >
>>      > Andrey Shinkevich <andrey.shinkevich@huawei.com> writes:
>>      >
>>      >> Dear colleagues,
>>      >>
>>      >> I am looking for ways to accelerate the MTTCG for ARM guest on
>>     x86-64 host.
>>      >> The maximum number of CPUs for MTTCG that uses GICv2 is limited
>>     by 8:
>>      >>
>>      >> include/hw/intc/arm_gic_common.h:#define GIC_NCPU 8
>>      >>
>>      >> The version 3 of the Generic Interrupt Controller (GICv3) is not
>>      >> supported in QEMU for some reason unknown to me. It would allow to
>>      >> increase the limit of CPUs and accelerate the MTTCG performance on a
>>      >> multiple core hypervisor.
>>      >
>>      > It is supported, you just need to select it.
>>      >
>>      >> I have got an idea to implement the Interrupt Translation
>>     Service (ITS)
>>      >> for using by MTTCG for ARM architecture.
>>      >
>>      > There is some work to support ITS under TCG already posted:
>>      >
>>      > Subject: [PATCH v3 0/8] GICv3 LPI and ITS feature implementation
>>      > Date: Thu, 29 Apr 2021 19:41:53 -0400
>>      > Message-Id: <20210429234201.125565-1-shashi.mallela@linaro.org>
>>      >
>>      > please do review and test.
>>      >
>>      >> Do you find that idea useful and feasible?
>>      >> If yes, how much time do you estimate for such a project to
>>     complete by
>>      >> one developer?
>>      >> If no, what are reasons for not implementing GICv3 for MTTCG in
>>     QEMU?
>>      >
>>      > As far as MTTCG performance is concerned there is a degree of
>>      > diminishing returns to be expected as the synchronisation cost
>>     between
>>      > threads will eventually outweigh the gains of additional threads.
>>      >
>>      > There are a number of parts that could improve this performance. The
>>      > first would be picking up the BQL reduction series from your
>>     FutureWei
>>      > colleges who worked on the problem when they were Linaro assignees:
>>      >
>>      > Subject: [PATCH v2 0/7] accel/tcg: remove implied BQL from
>>     cpu_handle_interrupt/exception path
>>      > Date: Wed, 19 Aug 2020 14:28:49 -0400
>>      > Message-Id: <20200819182856.4893-1-robert.foley@linaro.org>
>>      >
>>      > There was also a longer series moving towards per-CPU locks:
>>      >
>>      > Subject: [PATCH v10 00/73] per-CPU locks
>>      > Date: Wed, 17 Jun 2020 17:01:18 -0400
>>      > Message-Id: <20200617210231.4393-1-robert.foley@linaro.org>
>>      >
>>      > I believe the initial measurements showed that the BQL cost
>>     started to
>>      > edge up with GIC interactions. We did discuss approaches for this
>>     and I
>>      > think one idea was use non-BQL locking for the GIC. You would need to
>>      > revert:
>>      >
>>      > Subject: [PATCH-for-5.2] exec: Remove
>>     MemoryRegion::global_locking field
>>      > Date: Thu, 6 Aug 2020 17:07:26 +0200
>>      > Message-Id: <20200806150726.962-1-philmd@redhat.com>
>>      >
>>      > and then implement a more fine tuned locking in the GIC emulation
>>      > itself. However I think the BQL and per-CPU locks are lower hanging
>>      > fruit to tackle first.
>>      >
>>      >>
>>      >> Best regards,
>>      >> Andrey Shinkevich
>>      >
>>      >
>> 
>> Sent from Mailspring


-- 
Alex Bennée


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: GICv3 for MTTCG
  2021-06-18 13:15         ` Alex Bennée
@ 2021-06-18 15:18           ` Andrey Shinkevich
  0 siblings, 0 replies; 15+ messages in thread
From: Andrey Shinkevich @ 2021-06-18 15:18 UTC (permalink / raw)
  To: Alex Bennée
  Cc: peter.maydell, drjones, Cota, Shashi Mallela, richard.henderson,
	qemu-devel, qemu-arm, Chengen (William, FixNet),
	yuzenghui, Wanghaibin (D)

On 6/18/21 4:17 PM, Alex Bennée wrote:
> 
> Andrey Shinkevich <andrey.shinkevich@huawei.com> writes:
> 
>> Dear Shashi,
>>
>> I have applied the version 4 of the series "GICv3 LPI and ITS feature
>> implementation" right after the commit 3e9f48b as before (because the
>> GCCv7.5 is unavailable in the YUM repository for CentOS-7.9).
>>
>> The guest OS still hangs at its start when QEMU is configured with 4 or
>> more vCPUs (with 1 to 3 vCPUs the guest starts and runs OK and the MTTCG
>> works properly):
> 
> Does QEMU itself hang? If you attach gdb to QEMU and do:
> 
>    thread apply all bt
> 
> that should dump the backtrace for all threads. Could you post the backtrace?
> 
Thank you, Alex, for your response.

Yes, it is QEMU itself that hangs.
The dump from the gdb command 'thr a a bt' (thread apply all bt) is below:


Thread 7 (Thread 0x7f476e489700 (LWP 24967)):

#0  0x00007f477c2bbd19 in syscall () at /lib64/libc.so.6

#1  0x000055747d41a270 in qemu_event_wait (val=<optimized out>,
f=<optimized out>) at /home/andy/git/qemu/include/qemu/futex.h:29

#2  0x000055747d41a270 in qemu_event_wait (ev=ev@entry=0x55747e051c28
<rcu_call_ready_event>) at ../util/qemu-thread-posix.c:460

#3  0x000055747d444d78 in call_rcu_thread (opaque=opaque@entry=0x0) at
../util/rcu.c:258

#4  0x000055747d419406 in qemu_thread_start (args=<optimized out>) at
../util/qemu-thread-posix.c:521

#5  0x00007f477c598ea5 in start_thread () at /lib64/libpthread.so.0

#6  0x00007f477c2c19fd in clone () at /lib64/libc.so.6



Thread 6 (Thread 0x7f472ce42700 (LWP 24970)):

#0  0x00007f477c2b6ccd in poll () at /lib64/libc.so.6

#1  0x00007f47805c137c in g_main_context_iterate.isra.19 () at
/lib64/libglib-2.0.so.0

#2  0x00007f47805c16ca in g_main_loop_run () at /lib64/libglib-2.0.so.0

#3  0x000055747d29b071 in iothread_run
(opaque=opaque@entry=0x55747f85f280) at ../iothread.c:80

#4  0x000055747d419406 in qemu_thread_start (args=<optimized out>) at
../util/qemu-thread-posix.c:521

#5  0x00007f477c598ea5 in start_thread () at /lib64/libpthread.so.0

#6  0x00007f477c2c19fd in clone () at /lib64/libc.so.6



Thread 5 (Thread 0x7f461f9ff700 (LWP 24971)):

#0  0x00007f477c59ca35 in pthread_cond_wait@@GLIBC_2.3.2 () at
/lib64/libpthread.so.0

#1  0x000055747d419b1d in qemu_cond_wait_impl (cond=0x55747f916670,
mutex=0x55747e04dc00 <qemu_global_mutex>, file=0x55747d5dbe5c
"../softmmu/cpus.c", line=417) at ../util/qemu-thread-posix.c:174

#2  0x000055747d20ae36 in qemu_wait_io_event
(cpu=cpu@entry=0x55747f8b7920) at ../softmmu/cpus.c:417

#3  0x000055747d18d6a1 in mttcg_cpu_thread_fn
(arg=arg@entry=0x55747f8b7920) at ../accel/tcg/tcg-accel-ops-mttcg.c:98

#4  0x000055747d419406 in qemu_thread_start (args=<optimized out>) at
../util/qemu-thread-posix.c:521

#5  0x00007f477c598ea5 in start_thread () at /lib64/libpthread.so.0

#6  0x00007f477c2c19fd in clone () at /lib64/libc.so.6



Thread 4 (Thread 0x7f461f1fe700 (LWP 24972)):

#0  0x00007f477c59ca35 in pthread_cond_wait@@GLIBC_2.3.2 () at
/lib64/libpthread.so.0

#1  0x000055747d419b1d in qemu_cond_wait_impl (cond=0x55747f9897e0,
mutex=0x55747e04dc00 <qemu_global_mutex>, file=0x55747d5dbe5c
"../softmmu/cpus.c", line=417) at ../util/qemu-thread-posix.c:174

#2  0x000055747d20ae36 in qemu_wait_io_event
(cpu=cpu@entry=0x55747f924bc0) at ../softmmu/cpus.c:417

#3  0x000055747d18d6a1 in mttcg_cpu_thread_fn
(arg=arg@entry=0x55747f924bc0) at ../accel/tcg/tcg-accel-ops-mttcg.c:98

#4  0x000055747d419406 in qemu_thread_start (args=<optimized out>) at
../util/qemu-thread-posix.c:521

#5  0x00007f477c598ea5 in start_thread () at /lib64/libpthread.so.0

#6  0x00007f477c2c19fd in clone () at /lib64/libc.so.6



Thread 3 (Thread 0x7f461e9fd700 (LWP 24973)):

#0  0x00007f477c59ca35 in pthread_cond_wait@@GLIBC_2.3.2 () at
/lib64/libpthread.so.0

#1  0x000055747d419b1d in qemu_cond_wait_impl (cond=0x55747f9f5b40,
mutex=0x55747e04dc00 <qemu_global_mutex>, file=0x55747d5dbe5c
"../softmmu/cpus.c", line=417) at ../util/qemu-thread-posix.c:174

#2  0x000055747d20ae36 in qemu_wait_io_event
(cpu=cpu@entry=0x55747f990ba0) at ../softmmu/cpus.c:417

#3  0x000055747d18d6a1 in mttcg_cpu_thread_fn
(arg=arg@entry=0x55747f990ba0) at ../accel/tcg/tcg-accel-ops-mttcg.c:98

#4  0x000055747d419406 in qemu_thread_start (args=<optimized out>) at
../util/qemu-thread-posix.c:521

#5  0x00007f477c598ea5 in start_thread () at /lib64/libpthread.so.0

#6  0x00007f477c2c19fd in clone () at /lib64/libc.so.6



Thread 2 (Thread 0x7f461e1fc700 (LWP 24974)):
#0  0x00007f477c59ca35 in pthread_cond_wait@@GLIBC_2.3.2 () at /lib64/libpthread.so.0
#1  0x000055747d419b1d in qemu_cond_wait_impl (cond=0x55747fa626c0, mutex=0x55747e04dc00 <qemu_global_mutex>, file=0x55747d5dbe5c "../softmmu/cpus.c", line=417) at ../util/qemu-thread-posix.c:174
#2  0x000055747d20ae36 in qemu_wait_io_event (cpu=cpu@entry=0x55747f9fcf00) at ../softmmu/cpus.c:417
#3  0x000055747d18d6a1 in mttcg_cpu_thread_fn (arg=arg@entry=0x55747f9fcf00) at ../accel/tcg/tcg-accel-ops-mttcg.c:98
#4  0x000055747d419406 in qemu_thread_start (args=<optimized out>) at ../util/qemu-thread-posix.c:521
#5  0x00007f477c598ea5 in start_thread () at /lib64/libpthread.so.0
#6  0x00007f477c2c19fd in clone () at /lib64/libc.so.6

Thread 1 (Thread 0x7f4781db4d00 (LWP 24957)):
#0  0x00007f477c2b6d8f in ppoll () at /lib64/libc.so.6
#1  0x000055747d431439 in qemu_poll_ns (__ss=0x0, __timeout=0x7ffcc3188330, __nfds=<optimized out>, __fds=<optimized out>) at /usr/include/bits/poll2.h:77
#2  0x000055747d431439 in qemu_poll_ns (fds=<optimized out>, nfds=<optimized out>, timeout=timeout@entry=3792947) at ../util/qemu-timer.c:348
#3  0x000055747d4466ce in main_loop_wait (timeout=<optimized out>) at ../util/main-loop.c:249
#4  0x000055747d4466ce in main_loop_wait (nonblocking=nonblocking@entry=0) at ../util/main-loop.c:530
#5  0x000055747d2695c7 in qemu_main_loop () at ../softmmu/runstate.c:725
#6  0x000055747ccc1bde in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at ../softmmu/main.c:50

(gdb)
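For readers unfamiliar with the pattern in these stacks: every vCPU thread is parked in pthread_cond_wait inside qemu_wait_io_event, each on its own condition variable but all sharing qemu_global_mutex, while the main thread polls in the event loop. The wait/wake discipline involved can be sketched in miniature as follows (a generic Python analogue for illustration only — the names lock, cond, work_ready, and vcpu_thread are invented here and are not QEMU's actual implementation):

```python
import threading

lock = threading.Lock()           # stands in for qemu_global_mutex
cond = threading.Condition(lock)  # stands in for a per-vCPU halt condition
work_ready = False
results = []

def vcpu_thread(cpu_id):
    # Analogue of a vCPU sleeping in qemu_wait_io_event(): the condvar
    # atomically releases the global lock while waiting and reacquires
    # it on wakeup.
    with cond:
        while not work_ready:     # loop guards against spurious wakeups
            cond.wait()
        results.append(cpu_id)

threads = [threading.Thread(target=vcpu_thread, args=(i,)) for i in range(4)]
for t in threads:
    t.start()

# Main-loop side: change the condition under the lock, then wake sleepers.
with cond:
    work_ready = True
    cond.notify_all()

for t in threads:
    t.join()

print(sorted(results))  # -> [0, 1, 2, 3]
```

In a healthy run the sleepers are eventually kicked, as above; a hang with all threads stuck in the wait loop means whoever was supposed to change the condition and notify never did.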

Andrey
...
>>
>>
>> I run QEMU with virt-manager like this:
>>
>> qemu      7311     1 70 19:15 ?        00:00:05
>> /usr/local/bin/qemu-system-aarch64 -name
>> guest=EulerOS-2.8-Rich,debug-threads=on -S -object
>> secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-95-EulerOS-2.8-Rich/master-key.aes
>> -machine virt-6.1,accel=tcg,usb=off,dump-guest-core=off,gic-version=3
>> -cpu max -drive
>> file=/usr/share/AAVMF/AAVMF_CODE.fd,if=pflash,format=raw,unit=0,readonly=on
>> -drive
>> file=/var/lib/libvirt/qemu/nvram/EulerOS-2.8-Rich_VARS.fd,if=pflash,format=raw,unit=1
>> -m 4096 -smp 4,sockets=4,cores=1,threads=1 -uuid
>> c95e0e92-011b-449a-8e3f-b5f0938aaaa7 -display none -no-user-config
>> -nodefaults -chardev socket,id=charmonitor,fd=26,server,nowait -mon
>> chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown
>> -boot strict=on -device
>> pcie-root-port,port=0x8,chassis=1,id=pci.1,bus=pcie.0,multifunction=on,addr=0x1
>> -device
>> pcie-root-port,port=0x9,chassis=2,id=pci.2,bus=pcie.0,addr=0x1.0x1
>> -device
>> pcie-root-port,port=0xa,chassis=3,id=pci.3,bus=pcie.0,addr=0x1.0x2
>> -device
>> pcie-root-port,port=0xb,chassis=4,id=pci.4,bus=pcie.0,addr=0x1.0x3
>> -device qemu-xhci,p2=8,p3=8,id=usb,bus=pci.2,addr=0x0 -device
>> virtio-scsi-pci,id=scsi0,bus=pci.3,addr=0x0 -drive
>> file=/var/lib/libvirt/images/EulerOS-2.8-Rich.qcow2,format=qcow2,if=none,id=drive-scsi0-0-0-0
>> -device
>> scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=1
>> -drive if=none,id=drive-scsi0-0-0-1,readonly=on -device
>> scsi-cd,bus=scsi0.0,channel=0,scsi-id=0,lun=1,drive=drive-scsi0-0-0-1,id=scsi0-0-0-1
>> -netdev tap,fd=28,id=hostnet0 -device
>> virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:f9:e0:69,bus=pci.1,addr=0x0
>> -chardev pty,id=charserial0 -serial chardev:charserial0 -msg timestamp=on
>>
>> The issue is reproducible and persists.
>> 1. Do you think that applying the series results in this deadlock in
>> MTTCG, or could there be another reason?
>> 2. Which piece of QEMU source code should I investigate to locate the issue?
>>
>> Best regards,
>> Andrey Shinkevich
>>
>>
...
> 



^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2021-06-18 15:19 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-05-11 17:51 GICv3 for MTTCG Andrey Shinkevich
2021-05-11 19:53 ` Richard Henderson
2021-05-12  1:44 ` Zenghui Yu
2021-05-12 15:26 ` Alex Bennée
2021-05-13 16:35   ` Andrey Shinkevich
2021-05-13 16:45     ` Shashi Mallela
2021-05-13 18:29       ` Andrey Shinkevich
2021-06-17 16:43       ` Andrey Shinkevich
2021-06-17 17:44         ` shashi.mallela
2021-06-17 18:55           ` Andrey Shinkevich
2021-06-18 13:15         ` Alex Bennée
2021-06-18 15:18           ` Andrey Shinkevich
2021-05-13 17:19     ` Alex Bennée
2021-05-13 18:33       ` Andrey Shinkevich
2021-05-14  5:21       ` Andrey Shinkevich
