kvm.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Eric Auger <eric.auger@redhat.com>
To: Alexandru Elisei <alexandru.elisei@arm.com>
Cc: Ricardo Koller <ricarkol@google.com>,
	kvm@vger.kernel.org, kvmarm@lists.cs.columbia.edu,
	andrew.jones@linux.dev, maz@kernel.org, oliver.upton@linux.dev,
	reijiw@google.com
Subject: Re: [kvm-unit-tests PATCH v3 0/3] arm: pmu: Fixes for bare metal
Date: Wed, 5 Oct 2022 11:50:09 +0200	[thread overview]
Message-ID: <2c577c0a-7bdb-81b0-f0c3-6ede3688b94d@redhat.com> (raw)
In-Reply-To: <Yz1MiE64ZEa7twtM@monolith.localdoman>

Hi Alexandru,

On 10/5/22 11:21, Alexandru Elisei wrote:
> Hi Eric,
>
> On Tue, Oct 04, 2022 at 07:31:25PM +0200, Eric Auger wrote:
>> Hi Alexandru,
>>
>> On 10/4/22 18:58, Alexandru Elisei wrote:
>>> Hi Eric,
>>>
>>> On Tue, Oct 04, 2022 at 06:20:23PM +0200, Eric Auger wrote:
>>>> Hi Ricardo, Marc,
>>>>
>>>> On 8/5/22 02:41, Ricardo Koller wrote:
>>>>> There are some tests that fail when running on bare metal (including a
>>>>> passthrough prototype).  There are three issues with the tests.  The
>>>>> first one is that there are some missing isb()'s between enabling event
>>>>> counting and the actual counting. This wasn't an issue on KVM as
>>>>> trapping on registers served as context synchronization events. The
>>>>> second issue is that some tests assume that registers reset to 0.  And
>>>>> finally, the third issue is that overflowing the low counter of a
>>>>> chained event sets the overflow flag in PMVOS and some tests fail by
>>>>> checking for it not being set.
>>>>>
>>>>> Addressed all comments from the previous version:
>>>>> https://lore.kernel.org/kvmarm/20220803182328.2438598-1-ricarkol@google.com/T/#t
>>>>> - adding missing isb() and fixed the commit message (Alexandru).
>>>>> - fixed wording of a report() check (Andrew).
>>>>>
>>>>> Thanks!
>>>>> Ricardo
>>>>>
>>>>> Ricardo Koller (3):
>>>>>   arm: pmu: Add missing isb()'s after sys register writing
>>>>>   arm: pmu: Reset the pmu registers before starting some tests
>>>>>   arm: pmu: Check for overflow in the low counter in chained counters
>>>>>     tests
>>>>>
>>>>>  arm/pmu.c | 56 ++++++++++++++++++++++++++++++++++++++-----------------
>>>>>  1 file changed, 39 insertions(+), 17 deletions(-)
>>>>>
>>>> While testing this series and the related '[PATCH 0/9] KVM: arm64: PMU:
>>>> Fixing chained events, and PMUv3p5 support' I noticed I have kvm unit
>>>> test failures on some machines. This does not seem related to those
>>>> series though since I was able to get them without. The failures happen
>>>> on Amberwing machine for instance with the pmu-chain-promotion.
>>>>
>>>> While further investigating I noticed there is a lot of variability on
>>>> the kvm unit test mem_access_loop() count. I can get the counter = 0x1F
>>>> on the first iteration and 0x96 on the subsequent ones for instance.
>>>> While running mem_access_loop(addr, 20, pmu.pmcr_ro | PMU_PMCR_E) I was
>>>> expecting the counter to be close to 20. It is on some HW.
>>>>
>>>> [..]
>>>>
>>>> So I come to the actual question. Can we do any assumption on the
>>>> (virtual) PMU quality/precision? If not, the tests I originally wrote
>>>> are damned to fail on some HW (on some other they always pass) and I
>>>> need to make a decision wrt re-writing part of them, expecially those
>>>> which expect overflow after a given amount of ops. Otherwise, there is
>>>> either something wrong in the test (asm?) or in KVM PMU emulation.
>>>>
>>>> I tried to bisect because I did observe the same behavior on some older
>>>> kernels but the bisect was not successful as the issue does not happen
>>>> always.
>>>>
>>>> Thoughts?
>>> Looking at mem_access_loop(), the first thing that jumps out is the fact
>>> that is missing a DSB barrier. ISB affects only instructions, not memory
>>> accesses and without a DSB, the PE can reorder memory accesses however it
>>> sees fit.
>> Following your suggestion I added a dsh ish at the end of loop and
>> before disabling pmcr_el0 (I hope this is the place you were thinking
>> of) but unfortunately it does not seem to fix my issue.
> Yes, DSB ISH after "b.gt 1b\n" and before the write to PMCR_EL0 that
> disables the PMU.
>
> I think you also need a DSB ISH before the write to PMCR_EL0 that enables
> the PMU in the first instruction of the asm block. In your example, the
> MEM_ACCESS event count is higher than expected, and one explanation for the
> large disparity that I can think of is that previous memory accesses are
> reordered past the instruction that enables the PMU, which makes the PMU
> add these events to the total event count.

Makes sense. I added those at the 2 locations but unfortunately it does
not change the result for me.
>
>>> I also believe precise_instrs_loop() to be in the same situation, as the
>>> architecture doesn't guarantee that the cycle counter increments after
>>> every CPU cycle (ARM DDI 0487I.a, page D11-5246):
>>>
>>> "Although the architecture requires that direct reads of PMCCNTR_EL0 or
>>> PMCCNTR occur in program order, there is no requirement that the count
>>> increments between two such reads. Even when the counter is incrementing on
>>> every clock cycle, software might need check that the difference between
>>> two reads of the counter is nonzero."
>> OK
>>> There's also an entire section in ARM DDI 0487I.a dedicated to this, titled
>>> "A reasonable degree of inaccuracy" (page D11-5248). I'll post some
>>> snippets that I found interesting, but there are more examples and
>>> explanations to be found in that chapter.
>> yeah I saw that, hence my question about the reasonable disparity we can
>> expect from the HW/SW stack.
>>> "In exceptional circumstances, such as a change in Security state or other
>>> boundary condition, it is acceptable for the count to be inaccurate."
>>>
>>> PMCR writes are trapped by KVM. Is a change in exception level an
>>> "exception circumstance"? Could be, but couldn't find anything definitive.
>>> For example, the architecture allows an implementation to drop an event in
>>> the case of an interrupt:
>>>
>>> "However, dropping a single branch count as the result of a rare
>>> interaction with an interrupt is acceptable."
>>>
>>> So events could definitely be dropped because of an interrupt for the host.
>>>
>>> And there's also this:
>>>
>>> "The imprecision means that the counter might have counted an event around
>>> the time the counter was disabled, but does not allow the event to be
>>> observed as counted after the counter was disabled."
>> In our case there seems to be a huge discrepancy.
> I agree. There is this about the MEM_ACCESS event in the Arm ARM:
>
> "The counter counts each Memory-read operation or Memory-write operation
> that the PE makes."
>
> As for what a Memory-read operation is (emphasis added by me):
>
> "A memory-read operation might be due to:
> The result of an architecturally executed memory-reading instructions.
> The result of a Speculatively executed memory-reading instructions <- this
> is why the DSB ISH is needed before enabling the PMU.
> **A translation table walk**."
>
> Those extra memory accesses might be caused by the table walker deciding to
> walk the tables, speculatively or not. Software has no control over the
> table walker (as long as it is enabled).
That's indeed an interesting track. But can it be possible that for 20
expected load instructions we end up with ~150 actual memory accesses.
I can't help thinking this is a quite surprising amount.  Also the
pattern is surprising: the first iteration gives low counter count (~30)
while subsequent ones bring higher and constant ones (~150). I would
have expected the opposite, no? I will try to run the same experience on
various HW I have access to.

Anyway there is a problem while interpreting the result of the tests.
Either it can happen on some HW (it is a valid behavior according to the
ARM spec) and the test is simply not runnable or it is a bug somewhere
in the SW stack. 

It would be interesting to run the same tests at baremetal level on
Amberwing and see what are the results. Ricardo/Drew, could you give
some links about the setup?

Thanks

Eric
>
> Thanks,
> Alex
>
>>> If you want my opinion, if it is necessary to count the number of events
>>> for a test instead, I would define a margin of error on the number of
>>> events counted. Or the test could be changed to check that at least one
>>> such event was observed.
>> I agree with you on the fact a reasonable margin must be observed and
>> the tests may need to be rewritten to account for the observed disparity
>> if considered "normal". Another way to proceed is to compute the
>> disparity before launching the main tests and if too big, skip the main
>> tests. Again on some HW, the counts are really 'as expected' and constant.
>>
>> Thanks!
>>
>> Eric
>>> Thanks,
>>> Alex
>>>


  parent reply	other threads:[~2022-10-05  9:50 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-08-05  0:41 [kvm-unit-tests PATCH v3 0/3] arm: pmu: Fixes for bare metal Ricardo Koller
2022-08-05  0:41 ` [kvm-unit-tests PATCH v3 1/3] arm: pmu: Add missing isb()'s after sys register writing Ricardo Koller
2022-08-09 15:21   ` Alexandru Elisei
2022-08-05  0:41 ` [kvm-unit-tests PATCH v3 2/3] arm: pmu: Reset the pmu registers before starting some tests Ricardo Koller
2022-08-10 19:02   ` Andrew Jones
2022-08-10 23:33     ` Ricardo Koller
2022-08-11  7:04       ` Andrew Jones
2022-08-11 18:51         ` Ricardo Koller
2022-08-05  0:41 ` [kvm-unit-tests PATCH v3 3/3] arm: pmu: Check for overflow in the low counter in chained counters tests Ricardo Koller
2022-08-10 17:30   ` Oliver Upton
2022-08-10 18:28     ` Andrew Jones
2022-08-10 17:33 ` [kvm-unit-tests PATCH v3 0/3] arm: pmu: Fixes for bare metal Oliver Upton
2022-10-04 16:20 ` Eric Auger
2022-10-04 16:58   ` Alexandru Elisei
2022-10-04 17:31     ` Eric Auger
2022-10-05  9:21       ` Alexandru Elisei
2022-10-05  9:41         ` Alexandru Elisei
2022-10-05  9:51           ` Eric Auger
2022-10-05  9:50         ` Eric Auger [this message]
2022-10-06  9:25           ` Alexandru Elisei
2022-10-11  3:50           ` Ricardo Koller

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=2c577c0a-7bdb-81b0-f0c3-6ede3688b94d@redhat.com \
    --to=eric.auger@redhat.com \
    --cc=alexandru.elisei@arm.com \
    --cc=andrew.jones@linux.dev \
    --cc=kvm@vger.kernel.org \
    --cc=kvmarm@lists.cs.columbia.edu \
    --cc=maz@kernel.org \
    --cc=oliver.upton@linux.dev \
    --cc=reijiw@google.com \
    --cc=ricarkol@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).