From: Manoj Iyer <manoj.iyer@canonical.com>
To: James Morse <james.morse@arm.com>
Cc: Manoj Iyer <manoj.iyer@canonical.com>,
Shanker Donthineni <shankerd@codeaurora.org>,
Will Deacon <will.deacon@arm.com>,
Marc Zyngier <marc.zyngier@arm.com>,
linux-arm-kernel@lists.infradead.org,
Catalin Marinas <catalin.marinas@arm.com>,
Ard Biesheuvel <ard.biesheuvel@linaro.org>,
Matt Fleming <matt@codeblueprint.co.uk>,
Christoffer Dall <christoffer.dall@linaro.org>,
linux-kernel@vger.kernel.org, linux-efi@vger.kernel.org,
kvmarm@lists.cs.columbia.edu
Subject: Re: [3/3] arm64: Add software workaround for Falkor erratum 1041
Date: Thu, 9 Nov 2017 10:14:10 -0600 (CST)
Message-ID: <alpine.DEB.2.20.1711091010570.15101@lazy>
In-Reply-To: <alpine.DEB.2.20.1711090949110.15101@lazy>
On Thu, 9 Nov 2017, Manoj Iyer wrote:
>
> James,
>
> (sorry for top-posting)
>
> Applied patch 3 patches to Ubuntu Artful Kernel ( 4.13.0-16-generic )
>
> - Start 20 VMs one at a time
>
> In a loop:
> - Stop (virsh destroy) 20 VMs one at a time
> - Start (virsh start) 20 VMs one at a time.
Correcting some confusion I may have introduced in my previous email:
- Applied all 3 patches to Ubuntu Artful kernel (4.13.0-16-generic)
- Created 20 VMs, one at a time
In a loop:
- Stop (virsh destroy) 20 VMs, one at a time
- Start (virsh start) 20 VMs, one at a time
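
For anyone wanting to repeat the loop above, here is a rough sketch of it in Python. It is not the script that was used for this test. The domain names and the loop count are hypothetical, and the `run` parameter exists only so the command sequence can be checked without libvirt installed. The only external commands it issues are `virsh destroy <name>` and `virsh start <name>`.

```python
import subprocess


def cycle_vms(names, loops, run=subprocess.run):
    """Stop every VM one at a time, then start every VM one at a time.

    `names` are libvirt domain names (hypothetical here) and `loops` is
    how many stop/start passes to make. Returns the commands issued, in
    order, so the sequence can be inspected afterwards.
    """
    issued = []
    for _ in range(loops):
        for action in ("destroy", "start"):
            for name in names:
                cmd = ["virsh", action, name]
                # check=True aborts the loop on the first failing command
                run(cmd, check=True)
                issued.append(cmd)
    return issued


# Example call (assumed names vm01..vm20, assumed 10 passes):
#   cycle_vms([f"vm{i:02d}" for i in range(1, 21)], loops=10)
```

Stopping all VMs before starting any of them means the VM count reaches zero once per pass.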
>
> The system resets itself after starting the last VM on the 1st loop,
> displaying the following:
>
> awrep6 login: [ 603.349141] ACPI CPPC: PCC check channel failed. Status=0
> [ 603.765101] ACPI CPPC: PCC check channel failed. Status=0
> [ 603.937389] ACPI CPPC: PCC check channel failed. Status=0
> [ 608.285495] ACPI CPPC: PCC check channel failed. Status=0
> [ 608.289481] ACPI CPPC: PCC check channel failed. Status=0
>
> SYS_DBG: Running SDI image (immediate mode)
> SYS_DBG: Ram Dump Init
> SYS_DBG: Failed to init SD card
> SYS_DBG: Resetting system!
>
> Followed by the following messages on system reboot:
> [ 6.616891] BERT: Error records from previous boot:
> [ 6.621655] [Hardware Error]: event severity: fatal
> [ 6.626516] [Hardware Error]: imprecise tstamp: 0000-00-00 00:00:00
> [ 6.632851] [Hardware Error]: Error 0, type: fatal
> [ 6.637713] [Hardware Error]: section type: unknown, d2e2621c-f936-468d-0d84-15a4ed015c8b
> [ 6.646045] [Hardware Error]: section length: 0x238
> [ 6.651082] [Hardware Error]: 00000000: 72724502 5220726f 6f736165 6e55206e  .Error Reason Un
> [ 6.659761] [Hardware Error]: 00000010: 776f6e6b 0000006e 00000000 00000000  known...........
> [ 6.668442] [Hardware Error]: 00000020: 00000000 00000000 00000000 00000000  ................
> [ 6.677122] [Hardware Error]: 00000030: 00000000 00000000 00000000 00000000  ................
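
As an aside, the ASCII column of the dump above can be reproduced by hand. The sketch below assumes the words are 32-bit values printed on a little-endian host, which matches the ".Error Reason Un" / "known" text shown next to them. The function names are made up for illustration and are not part of any kernel or ACPI tooling.

```python
import struct


def words_to_bytes(words):
    """Turn hex words as printed in the dump into the underlying bytes.

    Assumes each word is a 32-bit little-endian value.
    """
    return b"".join(struct.pack("<I", int(w, 16)) for w in words)


def printable(raw):
    """Render bytes the way a hex dump does: non-printables become '.'."""
    return "".join(chr(b) if 32 <= b < 127 else "." for b in raw)


# First six words of the BERT section payload quoted above
words = "72724502 5220726f 6f736165 6e55206e 776f6e6b 0000006e".split()
print(printable(words_to_bytes(words)))  # .Error Reason Unknown...
```

So the firmware record carries only the string "Error Reason Unknown" after a leading 0x02 byte; the rest of the quoted payload is zeros.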
>
>
> On Thu, 9 Nov 2017, James Morse wrote:
>
>> Hi Manoj,
>>
>> On 08/11/17 19:05, Manoj Iyer wrote:
>>> On Thu, 2 Nov 2017, Shanker Donthineni wrote:
>>>> The ARM architecture defines the memory locations that are permitted
>>>> to be accessed as the result of a speculative instruction fetch from
>>>> an exception level for which all stages of translation are disabled.
>>>> Specifically, the core is permitted to speculatively fetch from the
>>>> 4KB region containing the current program counter and next 4KB.
>>>>
>>>> When translation is changed from enabled to disabled for the running
>>>> exception level (SCTLR_ELn[M] changed from a value of 1 to 0), the
>>>> Falkor core may errantly speculatively access memory locations outside
>>>> of the 4KB region permitted by the architecture. The errant memory
>>>> access may lead to one of the following unexpected behaviors.
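
To make the quoted rule concrete, here is a small worked example of the window the description allows. It is only one reading of the text: it assumes "the 4KB region containing the current program counter" means the 4KB-aligned page holding the PC. The names are illustrative and do not come from the patch.

```python
PAGE_SIZE = 4096  # 4KB, as in the description above


def permitted_fetch_window(pc):
    """Return (start, end) of the permitted speculative fetch range.

    Assumption: the window is the 4KB-aligned page containing `pc` plus
    the following 4KB page. `end` is exclusive.
    """
    base = pc & ~(PAGE_SIZE - 1)
    return base, base + 2 * PAGE_SIZE


def is_permitted(addr, pc):
    """True if `addr` falls inside the window computed for `pc`."""
    start, end = permitted_fetch_window(pc)
    return start <= addr < end


start, end = permitted_fetch_window(0x80001234)
print(hex(start), hex(end))  # 0x80001000 0x80003000
```

Under this reading, an access at 0x80003000 with the PC at 0x80001234 would already be outside the permitted range.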
>>
>>> I applied the 3 patches to Ubuntu 4.13.0-16-generic (Artful) kernel and
>>> ran stress-ng cpu tests on QDF2400 server
>>
>> [...]
>>
>>> Where stress-ng would spawn N workers and test cpu offline/online, perform
>>> matrix operations, do rapid context switches, and anonymous mmaps. Although
>>> I was not able to reproduce the erratum on the stock 4.13 kernel using the
>>> same test case, the patched kernel did not seem to introduce any
>>> regressions either. I ran the stress-ng tests for over 8 hrs and found the
>>> system to be stable.
>>
>>
>> Could you throw kexec and KVM into the mix? This issue only shows up when we
>> disable the MMU, which we almost never do.
>>
>> For CPU offline/online we make the PSCI 'offline' call with the MMU enabled.
>> When the CPU comes back firmware has reset the EL2/EL1 SCTLR from a higher
>> exception level, so it won't hit this issue.
>>
>> One place we do this is kexec, where we drop into purgatory with the MMU
>> disabled.
>>
>> The other is KVM unloading itself to return to the hyp stub. You can stress
>> this by starting and stopping a VM. When the number of VMs reaches 0 KVM
>> should unload via 'kvm_arch_hardware_disable()'.
>>
>>
>> Thanks,
>>
>> James
>>
>>
>
> --
> ============================
> Manoj Iyer
> Ubuntu/Canonical
> ARM Servers - Cloud
> ============================
>
>
--
============================
Manoj Iyer
Ubuntu/Canonical
ARM Servers - Cloud
============================