xen-devel.lists.xenproject.org archive mirror
 help / color / mirror / Atom feed
From: Jan Beulich <jbeulich@suse.com>
To: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: xen-devel@lists.xenproject.org,
	"Roger Pau Monné" <roger.pau@citrix.com>,
	"committers@xenproject.org" <committers@xenproject.org>
Subject: Re: Regressed XSA-286, was [xen-unstable test] 161917: regressions - FAIL
Date: Thu, 17 Jun 2021 13:56:29 +0200	[thread overview]
Message-ID: <99833b7b-f626-fac5-d426-109afd4ffa38@suse.com> (raw)
In-Reply-To: <637ff3c7-afeb-aae4-0f1d-5ae168e01e01@citrix.com>

On 16.06.2021 17:43, Andrew Cooper wrote:
> On 16/06/2021 09:48, Jan Beulich wrote:
>> On 13.05.2021 22:15, Andrew Cooper wrote:
>>> On 13/05/2021 04:56, osstest service owner wrote:
>>>> flight 161917 xen-unstable real [real]
>>>> http://logs.test-lab.xenproject.org/osstest/logs/161917/
>>>>
>>>> Regressions :-(
>>>>
>>>> Tests which did not succeed and are blocking,
>>>> including tests which could not be run:
>>>>  test-arm64-arm64-examine      8 reboot                   fail REGR. vs. 161898
>>>>  test-arm64-arm64-xl-thunderx  8 xen-boot                 fail REGR. vs. 161898
>>>>  test-arm64-arm64-xl-credit1   8 xen-boot                 fail REGR. vs. 161898
>>>>  test-arm64-arm64-xl-credit2   8 xen-boot                 fail REGR. vs. 161898
>>>>  test-arm64-arm64-xl           8 xen-boot                 fail REGR. vs. 161898
>>> I reported these on IRC, and Julien/Stefano have already committed a fix.
>>>
>>>> Tests which are failing intermittently (not blocking):
>>>>  test-xtf-amd64-amd64-3 92 xtf/test-pv32pae-xsa-286 fail in 161909 pass in 161917
>>> While noticing the ARM issue above, I also spotted this one by chance. 
>>> There are two issues.
>>>
>>> First, I have reverted bed7e6cad30 and edcfce55917.  The XTF test is
>>> correct, and they really do reintroduce XSA-286.  It is a miracle of
>>> timing that we don't need an XSA/CVE against Xen 4.15.
>> As expressed at the time already, I view this reverting you did, without
>> there being any emergency and without you having gathered any acks or
>> allowed for objections, as overstepping your competencies. I did post a
>> patch to the XTF test, which I believe is wrong, without having had any
>> feedback there either. Unless I hear back by the end of this week with
>> substantial arguments of why I am wrong (which would need to also cover
>> the fact that an issue was found with 32-bit PAE only, in turn supporting
>> my view on the overall state), I intend to revert your revert early next
>> week.
> 
> It has frankly taken a while to formulate a civil reply.
> 
> I am very irritated that you have *twice* recently introduced security
> vulnerabilities by bypassing my reviews/objections on patches.

I'm sorry, Andrew, but already in my original reply a month ago I did
express that I couldn't find any record of you having objected to the
changes. It doesn't help that you claim you've objected when you
really didn't (which is the impression I get from not finding anything,
and which also matches my recollection of what was discussed).

I don't think I know which 2nd instance you're referring to, and hence
I can't respond to that aspect.

> At the time, I had to drop work on an in-progress security issue to
> urgently investigate why we'd regressed upstream, and why OSSTest hadn't
> blocked it.
> 
> I am more generally irritated that you are constantly breaking things
> which GitlabCI can tell you is broken, and that I'm having to drop work
> I'm supposed to be doing to unbreak them.

GitlabCI doesn't tell me anything just yet, unless I go actively poll
it. And as mentioned just yesterday on irc, I don't think I can easily
navigate my way through those web pages, to find breakage I may have
introduced and hence would better go fix. Unlike osstest, where I am
told what failed, and I know where to find the corresponding logs.

It's also not clear to me at all in how far GitlabCI would have
spotted the issue here, no matter whether it's caused by a hypervisor
change or the XTF test being wrong. So far I've seen GitlabCI only
spot build issues.

I'm also puzzled, to put it mildly, of your use of "constantly" here.

> In the case of this revert specifically, I did get agreement on IRC
> before reverting.

How can I know you did? You didn't even care to reply to my mail from
a month ago. And there was no reason to make an emergency out of this
and ask on irc. You could have sent mail just like is done for all
other normal bug fixes etc. Iirc I was on PTO at that time; it would
hence only have been fair to wait until my return.

> In your proposed edit to the XTF test, you say
> 
>   L3 entry updates aren't specified to take immediate effect in PAE mode:
> 
> but this is not accurate.  It's what the Intel SDM says, but is
> contradicted by the AMD APM which states that this behaviour is not true
> under NPT under any circumstance, nor is it true on native.
> 
> Furthermore, any 32bit PV guest knowing it is running on a 64bit Xen
> (even from simply checking Xen >= 4.3) can rely on the relaxed
> behaviour, irrespective of what the unwritten PV ABI might want to say
> on the matter, due to knowing that it is running on Long mode paging as
> opposed to legacy PAE paging.

Neither of these are reasons for a 32-bit guest to _rely_ on such
behavior. Hence the change to the XTF test, which so far you also
didn't care to reply to.

I'm aware of NPT having different behavior, but can you point me to
the place in AMD doc saying so also for native? In fact I can find a
statement to the contrary:

"The behavior of PAE mode in a nested-paging guest differs slightly
 from the behavior of (host-only) legacy PAE mode, in that the
 guest’s four PDPEs are not loaded into the processor at the time
 CR3 is written. Instead, the PDPEs are accessed on demand as part
 of a table walk."

This to me implies that in the native case the behavior matches
Intel's.

> If these two technical reasons aren't good enough, then consider the
> manifestation of the issue itself.  XSA-286 is specifically about Xen
> editing the wrong PTE, because of the use of linear pagetables, in light
> of the guest not flushing the TLB.

The PTE edited is, as said, only perceived wrong by the XTF test.
Hence the patch to correct it.

> If you were to remove linear pagetables from Xen, the issue
> (do_mmu_update() edits the wrong PTE) would cease to manifest even on
> legacy PAE paging, demonstrating that the problem is with Xen's actions,
> not with the guests.

And if I introduced shadowing of the L3E writes, pushing the new ones
into the live page tables only upon CR3 writes, the issue would
reappear. It ought to be permissible to make such a change to Xen,
even if we may have no specific reason to do so at this point (albeit
I think we really better would, to match bare metal behavior).

Also, in my request to you (still in context above) I did specifically
ask about the aspect of the observed issue only manifesting on 32-bit,
yet you claiming a general problem, i.e. also affecting 64-bit. You
didn't comment on this at all.

Jan



  reply	other threads:[~2021-06-17 11:56 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-05-13  3:56 [xen-unstable test] 161917: regressions - FAIL osstest service owner
2021-05-13 20:15 ` Regressed XSA-286, was " Andrew Cooper
2021-05-17  8:43   ` Jan Beulich
2021-05-17 10:59     ` Jan Beulich
2021-06-16  8:48   ` Jan Beulich
2021-06-16 15:43     ` Andrew Cooper
2021-06-17 11:56       ` Jan Beulich [this message]
2021-06-17 13:05         ` Ian Jackson
2021-06-17 14:40           ` Jan Beulich
2021-06-17 14:49             ` Ian Jackson
2021-06-17 14:55               ` Jan Beulich
2021-06-28 12:35           ` Ping: " Jan Beulich
2021-06-17 21:26         ` Stefano Stabellini

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=99833b7b-f626-fac5-d426-109afd4ffa38@suse.com \
    --to=jbeulich@suse.com \
    --cc=andrew.cooper3@citrix.com \
    --cc=committers@xenproject.org \
    --cc=roger.pau@citrix.com \
    --cc=xen-devel@lists.xenproject.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).