kvm.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Alexandru Elisei <alexandru.elisei@arm.com>
To: Zenghui Yu <yuzenghui@huawei.com>,
	kvm@vger.kernel.org, kvmarm@lists.cs.columbia.edu,
	drjones@redhat.com
Cc: andre.przywara@arm.com
Subject: Re: [kvm-unit-tests PATCH 10/10] arm64: gic: Use IPI test checking for the LPI tests
Date: Tue, 1 Dec 2020 15:09:44 +0000	[thread overview]
Message-ID: <1dc913fd-80f5-4c8b-9c4d-52a1f0eee400@arm.com> (raw)
In-Reply-To: <eb8f5df2-6ea5-7aad-344a-70fcef3adad8@arm.com>

Hi,

On 11/30/20 2:19 PM, Alexandru Elisei wrote:
> Hi Zenghui,
>
> On 11/30/20 1:59 PM, Zenghui Yu wrote:
>> Hi Alex,
>>
>> On 2020/11/27 22:50, Alexandru Elisei wrote:
>>> Hi Zhenghui,
>>>
>>> Thank you for having a look at this!
>>>
>>> On 11/26/20 9:30 AM, Zenghui Yu wrote:
>>>> On 2020/11/25 23:51, Alexandru Elisei wrote:
>>>>> The reason for the failure is that the test "dev2/eventid=20 now triggers
>>>>> an LPI" triggers 2 LPIs, not one. This behavior was present before this
>>>>> patch, but it was ignored because check_lpi_stats() wasn't looking at the
>>>>> acked array.
>>>>>
>>>>> I'm not familiar with the ITS so I'm not sure if this is expected, if the
>>>>> test is incorrect or if there is something wrong with KVM emulation.
>>>> I think this is expected, or not.
>>>>
>>>> Before INVALL, the LPI-8195 was already pending but disabled. On
>>>> receiving INVALL, VGIC will reload configuration for all LPIs targeting
>>>> collection-3 and deliver the now enabled LPI-8195. We'll therefore see
>>>> and handle it before sending the following INT (which will set the
>>>> LPI-8195 pending again).
>>>>
>>>>> Did some more testing on an Ampere eMAG (fast out-of-order cores) using
>>>>> qemu and kvmtool and Linux v5.8, here's what I found:
>>>>>
>>>>> - Using qemu and gic.flat built from*master*: error encountered 864 times
>>>>>     out of 1088 runs.
>>>>> - Using qemu: error encountered 852 times out of 1027 runs.
>>>>> - Using kvmtool: error encountered 8164 times out of 10602 runs.
>>>> If vcpu-3 hadn't seen and handled LPI-8195 as quickly as possible (e.g.,
>>>> vcpu-3 hadn't been scheduled), the following INT will set the already
>>>> pending LPI-8195 pending again and we'll receive it *once* on vcpu-3.
>>>> And we won't see the mentioned failure.
>>>>
>>>> I think we can just drop the (meaningless and confusing?) INT.
>>> I think I understand your explanation, the VCPU takes the interrupt immediately
>>> after the INVALL and before the INT, and the second interrupt that I am seeing is
>>> the one caused by the INT command.
>> Yes.
>>
>>> I tried modifying the test like this:
>>>
>>> diff --git a/arm/gic.c b/arm/gic.c
>>> index 6e93da80fe0d..0ef8c12ea234 100644
>>> --- a/arm/gic.c
>>> +++ b/arm/gic.c
>>> @@ -761,10 +761,17 @@ static void test_its_trigger(void)
>>>          wmb();
>>>          cpumask_clear(&mask);
>>>          cpumask_set_cpu(3, &mask);
>>> -       its_send_int(dev2, 20);
>> Shouldn't its_send_invall(col3) be moved down here? See below.
>>
>>>          wait_for_interrupts(&mask);
>>>          report(check_acked(&mask, 0, 8195),
>>> -                       "dev2/eventid=20 now triggers an LPI");
>>> +                       "dev2/eventid=20 pending LPI is received");
>>> +
>>> +       stats_reset();
>>> +       wmb();
>>> +       cpumask_clear(&mask);
>>> +       cpumask_set_cpu(3, &mask);
>>> +       its_send_int(dev2, 20);
>>> +       wait_for_interrupts(&mask);
>>> +       report(check_acked(&mask, 0, 8195), "dev2/eventid=20 triggers an LPI");
>>>            report_prefix_pop();
>>>   I removed the INT from the initial test, and added a separate one to check that
>>> the INT command still works. That looks to me that preserves the spirit of the
>>> original test. After doing stress testing this is what I got:
>>>
>>> - with kvmtool, 47,709 iterations, 27 times the test timed out when waiting for
>>> the interrupt after INVALL.
>>> - with qemu, 15,511 iterations, 258 times the test timed out when waiting for the
>>> interrupt after INVALL, just like with kvmtool.
>> I guess the reason of failure is that the LPI is taken *immediately*
>> after the INVALL?
>>
>>     /* Now call the invall and check the LPI hits */
>>     its_send_invall(col3);
>>         <- LPI is taken, acked[]++
>>     stats_reset();
>>         <- acked[] is cleared unexpectedly
>>     wmb();
>>     cpumask_clear(&mask);
>>     cpumask_set_cpu(3, &mask);
>>     wait_for_interrupts(&mask);
>>         <- we'll hit timed-out since acked[] is 0
> Yes, of course, you're right, I didn't realize that I was resetting the stats
> *after* the interrupt was enabled. This also explains why I was still seeing
> timeouts even when the timeout duration was set to 50 seconds. I'll retest with
> the fix:
>
> diff --git a/arm/gic.c b/arm/gic.c
> index 6e93da80fe0d..c4240f5aba39 100644
> --- a/arm/gic.c
> +++ b/arm/gic.c
> @@ -756,15 +756,22 @@ static void test_its_trigger(void)
>                         "dev2/eventid=20 still does not trigger any LPI");
>  
>         /* Now call the invall and check the LPI hits */
> +       stats_reset();
> +       wmb();
> +       cpumask_clear(&mask);
> +       cpumask_set_cpu(3, &mask);
>         its_send_invall(col3);
> +       wait_for_interrupts(&mask);
> +       report(check_acked(&mask, 0, 8195),
> +                       "dev2/eventid=20 pending LPI is received");
> +
>         stats_reset();
>         wmb();
>         cpumask_clear(&mask);
>         cpumask_set_cpu(3, &mask);
>         its_send_int(dev2, 20);
>         wait_for_interrupts(&mask);
> -       report(check_acked(&mask, 0, 8195),
> -                       "dev2/eventid=20 now triggers an LPI");
> +       report(check_acked(&mask, 0, 8195), "dev2/eventid20 triggers an LPI");
>  
>         report_prefix_pop();
>  
> I also pushed a branch at [1].
>
> Thank you so much for spotting this! You've saved me (and probably others) a lot
> of time debugging.
>
> [1] https://gitlab.arm.com/linux-arm/kvm-unit-tests-ae/-/tree/fixes1-v2

I have been testing the branch, no failures after 17,996 runs with qemu and 58,669
runs with kvmtool. This looks fine to me, I'll send a v2 with the fix.

Thanks,
Alex

  reply	other threads:[~2020-12-01 15:09 UTC|newest]

Thread overview: 39+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-11-25 15:51 [kvm-unit-tests PATCH 00/10] GIC fixes and improvements Alexandru Elisei
2020-11-25 15:51 ` [kvm-unit-tests PATCH 01/10] lib: arm/arm64: gicv3: Add missing barrier when sending IPIs Alexandru Elisei
2020-12-01 16:37   ` Auger Eric
2020-12-01 17:37     ` Alexandru Elisei
2020-11-25 15:51 ` [kvm-unit-tests PATCH 02/10] lib: arm/arm64: gicv2: " Alexandru Elisei
2020-12-01 16:37   ` Auger Eric
2020-11-25 15:51 ` [kvm-unit-tests PATCH 03/10] arm/arm64: gic: Remove memory synchronization from ipi_clear_active_handler() Alexandru Elisei
2020-12-01 16:37   ` Auger Eric
2020-12-02 14:02     ` Alexandru Elisei
2020-12-02 14:14       ` Alexandru Elisei
2020-12-03  9:41         ` Auger Eric
2020-11-25 15:51 ` [kvm-unit-tests PATCH 04/10] arm/arm64: gic: Remove unnecessary synchronization with stats_reset() Alexandru Elisei
2020-12-01 16:48   ` Auger Eric
2020-12-02 14:06     ` Alexandru Elisei
2020-12-03 13:10   ` Auger Eric
2020-11-25 15:51 ` [kvm-unit-tests PATCH 05/10] arm/arm64: gic: Use correct memory ordering for the IPI test Alexandru Elisei
2020-12-03 13:10   ` Auger Eric
2020-12-03 13:21     ` Alexandru Elisei
2020-11-25 15:51 ` [kvm-unit-tests PATCH 06/10] arm/arm64: gic: Check spurious and bad_sender in the active test Alexandru Elisei
2020-12-03 13:10   ` Auger Eric
2020-11-25 15:51 ` [kvm-unit-tests PATCH 07/10] arm/arm64: gic: Wait for writes to acked or spurious to complete Alexandru Elisei
2020-12-03 13:21   ` Auger Eric
2020-11-25 15:51 ` [kvm-unit-tests PATCH 08/10] arm/arm64: gic: Split check_acked() into two functions Alexandru Elisei
2020-12-03 13:39   ` Auger Eric
2020-12-10 14:45     ` Alexandru Elisei
2020-12-15 13:58       ` Auger Eric
2020-12-16 11:40         ` Alexandru Elisei
2020-12-16 12:37           ` Auger Eric
2020-11-25 15:51 ` [kvm-unit-tests PATCH 09/10] arm/arm64: gic: Make check_acked() more generic Alexandru Elisei
2020-12-03 14:59   ` Auger Eric
2020-11-25 15:51 ` [kvm-unit-tests PATCH 10/10] arm64: gic: Use IPI test checking for the LPI tests Alexandru Elisei
2020-11-26  9:30   ` Zenghui Yu
2020-11-27 14:50     ` Alexandru Elisei
2020-11-30 13:59       ` Zenghui Yu
2020-11-30 14:19         ` Alexandru Elisei
2020-12-01 15:09           ` Alexandru Elisei [this message]
2020-11-30 17:48     ` Auger Eric
2020-12-03 14:59   ` Auger Eric
2020-12-09 10:29     ` Alexandru Elisei

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1dc913fd-80f5-4c8b-9c4d-52a1f0eee400@arm.com \
    --to=alexandru.elisei@arm.com \
    --cc=andre.przywara@arm.com \
    --cc=drjones@redhat.com \
    --cc=kvm@vger.kernel.org \
    --cc=kvmarm@lists.cs.columbia.edu \
    --cc=yuzenghui@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).