All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH for-4.12] passthrough/vtd: Drop the "workaround_bios_bug" logic entirely
@ 2019-03-21 20:26 Andrew Cooper
  2019-03-21 20:47 ` Igor Druzhinin
                   ` (4 more replies)
  0 siblings, 5 replies; 17+ messages in thread
From: Andrew Cooper @ 2019-03-21 20:26 UTC (permalink / raw)
  To: Xen-devel
  Cc: Juergen Gross, Igor Druzhinin, Kevin Tian, Stefano Stabellini,
	Wei Liu, Jun Nakajima, Konrad Rzeszutek Wilk, George Dunlap,
	Andrew Cooper, Tim Deegan, Julien Grall, Paul Durrant,
	Jan Beulich, Ian Jackson, Roger Pau Monné

It turns out that this code was previously dead.

c/s dcf41790 " x86/mmcfg/drhd: Move acpi_mmcfg_init() call before calling
acpi_parse_dmar()" resulted in PCI segment 0 now having been initialised
enough for acpi_parse_one_drhd() to not take the

  /* Skip checking if segment is not accessible yet. */

path unconditionally.  However, some systems have DMAR tables which list
devices which are disabled by user choice (in particular, Dell PowerEdge R740
with I/O AT DMA disabled), and turning off all IOMMU functionality in this
case is entirely unhelpful behaviour.

Leave the warning which identifies the problematic devices, but drop the
remaining logic.  This leaves the system in better overall state, and working
in the same way that it did in previous releases.

Reported-by: Igor Druzhinin <igor.druzhinin@citrix.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: George Dunlap <George.Dunlap@eu.citrix.com>
CC: Ian Jackson <ian.jackson@citrix.com>
CC: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
CC: Stefano Stabellini <sstabellini@kernel.org>
CC: Tim Deegan <tim@xen.org>
CC: Wei Liu <wei.liu2@citrix.com>
CC: Julien Grall <julien.grall@arm.com>
CC: Jun Nakajima <jun.nakajima@intel.com>
CC: Kevin Tian <kevin.tian@intel.com>
CC: Roger Pau Monné <roger.pau@citrix.com>
CC: Igor Druzhinin <igor.druzhinin@citrix.com>
CC: Paul Durrant <paul.durrant@citrix.com>
CC: Juergen Gross <jgross@suse.com>

This is a candidate for 4.12.  Given the absense of Jan as the maintaner, and
the proximity to the 4.12 release, I put this patch to The Rest for a
hopefully-more-timely decision and review.
---
 docs/misc/xen-command-line.pandoc  |  7 +------
 xen/drivers/passthrough/iommu.c    |  3 ---
 xen/drivers/passthrough/vtd/dmar.c | 29 ++---------------------------
 xen/include/xen/iommu.h            |  3 +--
 4 files changed, 4 insertions(+), 38 deletions(-)

diff --git a/docs/misc/xen-command-line.pandoc b/docs/misc/xen-command-line.pandoc
index a9fe449..6db82f3 100644
--- a/docs/misc/xen-command-line.pandoc
+++ b/docs/misc/xen-command-line.pandoc
@@ -1173,8 +1173,7 @@ detection of systems known to misbehave upon accesses to that port.
 ### iommu
     = List of [ <bool>, verbose, debug, force, required,
                 sharept, intremap, intpost, crash-disable,
-                snoop, qinval, igfx, workaround_bios_bug,
-                amd-iommu-perdev-intremap,
+                snoop, qinval, igfx, amd-iommu-perdev-intremap,
                 dom0-{passthrough,strict} ]
 
     All sub-options are boolean in nature.
@@ -1265,10 +1264,6 @@ The following options are specific to Intel VT-d hardware:
     similar to Linux's `intel_iommu=igfx_off` option.  If specifying `no-igfx`
     fixes anything, please report the problem.
 
-*   The `workaround_bios_bug` boolean is disabled by default.  It can be used
-    to ignore errors when parsing the ACPI tables, and finding a listed PCI
-    device which doesn't appear to exist in the system.
-
 The following options are specific to AMD-Vi hardware:
 
 *   The `amd-iommu-perdev-intremap` boolean controls whether the interrupt
diff --git a/xen/drivers/passthrough/iommu.c b/xen/drivers/passthrough/iommu.c
index 611e857..a6697d5 100644
--- a/xen/drivers/passthrough/iommu.c
+++ b/xen/drivers/passthrough/iommu.c
@@ -30,7 +30,6 @@ bool_t __initdata iommu_enable = 1;
 bool_t __read_mostly iommu_enabled;
 bool_t __read_mostly force_iommu;
 bool_t __read_mostly iommu_verbose;
-bool_t __read_mostly iommu_workaround_bios_bug;
 bool_t __read_mostly iommu_igfx = 1;
 bool_t __read_mostly iommu_snoop = 1;
 bool_t __read_mostly iommu_qinval = 1;
@@ -75,8 +74,6 @@ static int __init parse_iommu_param(const char *s)
         else if ( (val = parse_boolean("force", s, ss)) >= 0 ||
                   (val = parse_boolean("required", s, ss)) >= 0 )
             force_iommu = val;
-        else if ( (val = parse_boolean("workaround_bios_bug", s, ss)) >= 0 )
-            iommu_workaround_bios_bug = val;
         else if ( (val = parse_boolean("igfx", s, ss)) >= 0 )
             iommu_igfx = val;
         else if ( (val = parse_boolean("verbose", s, ss)) >= 0 )
diff --git a/xen/drivers/passthrough/vtd/dmar.c b/xen/drivers/passthrough/vtd/dmar.c
index 81afa54..2372cd2 100644
--- a/xen/drivers/passthrough/vtd/dmar.c
+++ b/xen/drivers/passthrough/vtd/dmar.c
@@ -514,7 +514,7 @@ acpi_parse_one_drhd(struct acpi_dmar_header *header)
     else
     {
         u8 b, d, f;
-        unsigned int i = 0, invalid_cnt = 0;
+        unsigned int i = 0;
         union {
             const void *raw;
             const struct acpi_dmar_device_scope *scope;
@@ -536,37 +536,12 @@ acpi_parse_one_drhd(struct acpi_dmar_header *header)
             f = PCI_FUNC(dmaru->scope.devices[i]);
 
             if ( !pci_device_detect(drhd->segment, b, d, f) )
-            {
                 printk(XENLOG_WARNING VTDPREFIX
                        " Non-existent device (%04x:%02x:%02x.%u) in this DRHD's scope!\n",
                        drhd->segment, b, d, f);
-                invalid_cnt++;
-            }
         }
 
-        if ( invalid_cnt )
-        {
-            if ( iommu_workaround_bios_bug &&
-                 invalid_cnt == dmaru->scope.devices_cnt )
-            {
-                printk(XENLOG_WARNING VTDPREFIX
-                       "  Workaround BIOS bug: ignoring DRHD (no devices in its scope are PCI discoverable)\n");
-
-                scope_devices_free(&dmaru->scope);
-                iommu_free(dmaru);
-                xfree(dmaru);
-            }
-            else
-            {
-                printk(XENLOG_WARNING VTDPREFIX
-                       "  DRHD is invalid (some devices in its scope are not PCI discoverable)\n");
-                printk(XENLOG_WARNING VTDPREFIX
-                       "  Try \"iommu=force\" or \"iommu=workaround_bios_bug\" if you really want VT-d\n");
-                ret = -EINVAL;
-            }
-        }
-        else
-            acpi_register_drhd_unit(dmaru);
+        acpi_register_drhd_unit(dmaru);
     }
 
 out:
diff --git a/xen/include/xen/iommu.h b/xen/include/xen/iommu.h
index 64a5078..62a24d5 100644
--- a/xen/include/xen/iommu.h
+++ b/xen/include/xen/iommu.h
@@ -53,8 +53,7 @@ static inline bool_t dfn_eq(dfn_t x, dfn_t y)
 }
 
 extern bool_t iommu_enable, iommu_enabled;
-extern bool_t force_iommu, iommu_verbose;
-extern bool_t iommu_workaround_bios_bug, iommu_igfx;
+extern bool_t force_iommu, iommu_verbose, iommu_igfx;
 extern bool_t iommu_snoop, iommu_qinval, iommu_intremap, iommu_intpost;
 extern bool_t iommu_hap_pt_share;
 extern bool_t iommu_debug;
-- 
2.1.4


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH for-4.12] passthrough/vtd: Drop the "workaround_bios_bug" logic entirely
  2019-03-21 20:26 [PATCH for-4.12] passthrough/vtd: Drop the "workaround_bios_bug" logic entirely Andrew Cooper
@ 2019-03-21 20:47 ` Igor Druzhinin
  2019-03-22  1:43 ` Roger Pau Monné
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 17+ messages in thread
From: Igor Druzhinin @ 2019-03-21 20:47 UTC (permalink / raw)
  To: Andrew Cooper, Xen-devel
  Cc: Juergen Gross, Kevin Tian, Stefano Stabellini, Wei Liu,
	Jun Nakajima, Konrad Rzeszutek Wilk, George Dunlap, Tim Deegan,
	Julien Grall, Paul Durrant, Jan Beulich, Ian Jackson,
	Roger Pau Monné

On 21/03/2019 20:26, Andrew Cooper wrote:
> It turns out that this code was previously dead.
> 
> c/s dcf41790 " x86/mmcfg/drhd: Move acpi_mmcfg_init() call before calling
> acpi_parse_dmar()" resulted in PCI segment 0 now having been initialised
> enough for acpi_parse_one_drhd() to not take the
> 
>   /* Skip checking if segment is not accessible yet. */
> 
> path unconditionally.  However, some systems have DMAR tables which list
> devices which are disabled by user choice (in particular, Dell PowerEdge R740
> with I/O AT DMA disabled), and turning off all IOMMU functionality in this
> case is entirely unhelpful behaviour.
> 
> Leave the warning which identifies the problematic devices, but drop the
> remaining logic.  This leaves the system in better overall state, and working
> in the same way that it did in previous releases.
> 
> Reported-by: Igor Druzhinin <igor.druzhinin@citrix.com>
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

Reviewed-by: Igor Druzhinin <igor.druzhinin@citrix.com>

Igor

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH for-4.12] passthrough/vtd: Drop the "workaround_bios_bug" logic entirely
  2019-03-21 20:26 [PATCH for-4.12] passthrough/vtd: Drop the "workaround_bios_bug" logic entirely Andrew Cooper
  2019-03-21 20:47 ` Igor Druzhinin
@ 2019-03-22  1:43 ` Roger Pau Monné
  2019-03-22  6:07 ` Juergen Gross
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 17+ messages in thread
From: Roger Pau Monné @ 2019-03-22  1:43 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Juergen Gross, Igor Druzhinin, Kevin Tian, Stefano Stabellini,
	Wei Liu, Jun Nakajima, Konrad Rzeszutek Wilk, George Dunlap,
	Tim Deegan, Xen-devel, Julien Grall, Paul Durrant, Jan Beulich,
	Ian Jackson

On Thu, Mar 21, 2019 at 08:26:20PM +0000, Andrew Cooper wrote:
> It turns out that this code was previously dead.
> 
> c/s dcf41790 " x86/mmcfg/drhd: Move acpi_mmcfg_init() call before calling
> acpi_parse_dmar()" resulted in PCI segment 0 now having been initialised
> enough for acpi_parse_one_drhd() to not take the
> 
>   /* Skip checking if segment is not accessible yet. */
> 
> path unconditionally.  However, some systems have DMAR tables which list
> devices which are disabled by user choice (in particular, Dell PowerEdge R740
> with I/O AT DMA disabled), and turning off all IOMMU functionality in this
> case is entirely unhelpful behaviour.
> 
> Leave the warning which identifies the problematic devices, but drop the
> remaining logic.  This leaves the system in better overall state, and working
> in the same way that it did in previous releases.
> 
> Reported-by: Igor Druzhinin <igor.druzhinin@citrix.com>
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>

I think this is a more sane behavior.

> -        if ( invalid_cnt )
> -        {
> -            if ( iommu_workaround_bios_bug &&
> -                 invalid_cnt == dmaru->scope.devices_cnt )
> -            {
> -                printk(XENLOG_WARNING VTDPREFIX
> -                       "  Workaround BIOS bug: ignoring DRHD (no devices in its scope are PCI discoverable)\n");
> -
> -                scope_devices_free(&dmaru->scope);
> -                iommu_free(dmaru);
> -                xfree(dmaru);
> -            }
> -            else
> -            {
> -                printk(XENLOG_WARNING VTDPREFIX
> -                       "  DRHD is invalid (some devices in its scope are not PCI discoverable)\n");
> -                printk(XENLOG_WARNING VTDPREFIX
> -                       "  Try \"iommu=force\" or \"iommu=workaround_bios_bug\" if you really want VT-d\n");
> -                ret = -EINVAL;
> -            }

The workaround_bios_bug option seems quite pointless anyway, it only
prevents propagating the error to the caller if all the devices in the
DMAR scope are non-existent, or else the DMAR won't be registered
and an error would be returned to the caller.

Thanks, Roger.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH for-4.12] passthrough/vtd: Drop the "workaround_bios_bug" logic entirely
  2019-03-21 20:26 [PATCH for-4.12] passthrough/vtd: Drop the "workaround_bios_bug" logic entirely Andrew Cooper
  2019-03-21 20:47 ` Igor Druzhinin
  2019-03-22  1:43 ` Roger Pau Monné
@ 2019-03-22  6:07 ` Juergen Gross
  2019-03-22 11:53 ` George Dunlap
  2019-03-25 15:24 ` Jan Beulich
  4 siblings, 0 replies; 17+ messages in thread
From: Juergen Gross @ 2019-03-22  6:07 UTC (permalink / raw)
  To: Andrew Cooper, Xen-devel
  Cc: Igor Druzhinin, Kevin Tian, Stefano Stabellini, Wei Liu,
	Jun Nakajima, Konrad Rzeszutek Wilk, George Dunlap, Tim Deegan,
	Julien Grall, Paul Durrant, Jan Beulich, Ian Jackson,
	Roger Pau Monné

On 21/03/2019 21:26, Andrew Cooper wrote:
> It turns out that this code was previously dead.
> 
> c/s dcf41790 " x86/mmcfg/drhd: Move acpi_mmcfg_init() call before calling
> acpi_parse_dmar()" resulted in PCI segment 0 now having been initialised
> enough for acpi_parse_one_drhd() to not take the
> 
>   /* Skip checking if segment is not accessible yet. */
> 
> path unconditionally.  However, some systems have DMAR tables which list
> devices which are disabled by user choice (in particular, Dell PowerEdge R740
> with I/O AT DMA disabled), and turning off all IOMMU functionality in this
> case is entirely unhelpful behaviour.
> 
> Leave the warning which identifies the problematic devices, but drop the
> remaining logic.  This leaves the system in better overall state, and working
> in the same way that it did in previous releases.
> 
> Reported-by: Igor Druzhinin <igor.druzhinin@citrix.com>
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

Release-acked-by: Juergen Gross <jgross@suse.com>


Juergen

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH for-4.12] passthrough/vtd: Drop the "workaround_bios_bug" logic entirely
  2019-03-21 20:26 [PATCH for-4.12] passthrough/vtd: Drop the "workaround_bios_bug" logic entirely Andrew Cooper
                   ` (2 preceding siblings ...)
  2019-03-22  6:07 ` Juergen Gross
@ 2019-03-22 11:53 ` George Dunlap
  2019-03-25 15:24 ` Jan Beulich
  4 siblings, 0 replies; 17+ messages in thread
From: George Dunlap @ 2019-03-22 11:53 UTC (permalink / raw)
  To: Andrew Cooper, Xen-devel
  Cc: Juergen Gross, Igor Druzhinin, Kevin Tian, Stefano Stabellini,
	Wei Liu, Jun Nakajima, Konrad Rzeszutek Wilk, George Dunlap,
	Tim Deegan, Julien Grall, Paul Durrant, Jan Beulich, Ian Jackson,
	Roger Pau Monné

On 3/21/19 8:26 PM, Andrew Cooper wrote:
> It turns out that this code was previously dead.
> 
> c/s dcf41790 " x86/mmcfg/drhd: Move acpi_mmcfg_init() call before calling
> acpi_parse_dmar()" resulted in PCI segment 0 now having been initialised
> enough for acpi_parse_one_drhd() to not take the
> 
>   /* Skip checking if segment is not accessible yet. */
> 
> path unconditionally.  However, some systems have DMAR tables which list
> devices which are disabled by user choice (in particular, Dell PowerEdge R740
> with I/O AT DMA disabled), and turning off all IOMMU functionality in this
> case is entirely unhelpful behaviour.
> 
> Leave the warning which identifies the problematic devices, but drop the
> remaining logic.  This leaves the system in better overall state, and working
> in the same way that it did in previous releases.
> 
> Reported-by: Igor Druzhinin <igor.druzhinin@citrix.com>
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

In the maintianer's absence, and in light of the other R-b's, and the
timing, I approve this going in without the maintianer's ack:

Acked-by: George Dunlap <george.dunlap@citrix.com>

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH for-4.12] passthrough/vtd: Drop the "workaround_bios_bug" logic entirely
  2019-03-21 20:26 [PATCH for-4.12] passthrough/vtd: Drop the "workaround_bios_bug" logic entirely Andrew Cooper
                   ` (3 preceding siblings ...)
  2019-03-22 11:53 ` George Dunlap
@ 2019-03-25 15:24 ` Jan Beulich
  2019-03-25 17:36   ` Andrew Cooper
  2019-03-27 11:02   ` George Dunlap
  4 siblings, 2 replies; 17+ messages in thread
From: Jan Beulich @ 2019-03-25 15:24 UTC (permalink / raw)
  To: Andrew Cooper, George Dunlap, Juergen Gross
  Cc: Igor Druzhinin, Kevin Tian, Stefano Stabellini, Wei Liu,
	Konrad Rzeszutek Wilk, Tim Deegan, Lars Kurth, Xen-devel,
	Julien Grall, Paul Durrant, Jun Nakajima, Ian Jackson,
	Roger Pau Monne

>>> On 21.03.19 at 21:26, <andrew.cooper3@citrix.com> wrote:
> It turns out that this code was previously dead.

If it was entirely dead, why the rush to get the change into 4.12?
(I suppose the later parts of description are then justifying why
the code wasn't actually dead, and why it was getting in the way,
but I think this way of putting it is at least confusing.)

> c/s dcf41790 " x86/mmcfg/drhd: Move acpi_mmcfg_init() call before calling
> acpi_parse_dmar()" resulted in PCI segment 0 now having been initialised
> enough for acpi_parse_one_drhd() to not take the
> 
>   /* Skip checking if segment is not accessible yet. */
> 
> path unconditionally.

Or am I misreading the initial sentence, and you're really suggesting
that prior to said commit the code in question had been dead? How's
that commit related then? Segment 0 is valid irrespective of any
MMCFG information gained from ACPI tables (see pci_segments_init()).

>  However, some systems have DMAR tables which list
> devices which are disabled by user choice (in particular, Dell PowerEdge R740
> with I/O AT DMA disabled), and turning off all IOMMU functionality in this
> case is entirely unhelpful behaviour.

As in many other cases, what is or is not unhelpful is often a matter
of perception and hence possibly subjective. I can see your point,
but I also can see why the authors of the code considered it a rather
bad sign if non-existing PCI devices get named by an ACPI table.
What if e.g. later a device gets hot-plugged into that very SBDF?

> Leave the warning which identifies the problematic devices, but drop the
> remaining logic.  This leaves the system in better overall state, and working
> in the same way that it did in previous releases.

I wonder whether you've taken the time to look at the description
of the commit first introducing this logic (a8059ffced "VT-d: improve
RMRR validity checking"). I find it worrying in particular to
effectively revert a change which claims 'to avoid any security
vulnerability with malicious s/s re-enabling "supposed disabled"
devices' without any discussion of why that may have been a
wrong perspective to take.

I'd also appreciate clarification on you saying "working in the same
way that it did in previous releases" - it sounds as if you might
have spotted a regression here, but it's not really becoming clear
to me what that regression then would have been.

> This is a candidate for 4.12.  Given the absense of Jan as the maintaner, and
> the proximity to the 4.12 release, I put this patch to The Rest for a
> hopefully-more-timely decision and review.

To be honest, I have two problems with this: For one the main
part of your change falls in Kevin's realm, not mine. And then,
even if it would have been mainly me to ack the change, I was
gone for three days, not three months. Yet the code had been
this way for over 9 years. One thing seems pretty clear to me:
It is at best non-obvious that there is no risk of regressions
here.

Much less risky changes have been rejected as coming too late
for 4.12. I don't think this should have been rushed into the
tree, and even less so for a release about to be cut. Especially
not without a proper maintainer ack.

The main reason why, at least for the moment, I'm not meaning
to officially request a revert is that I'm not sure the original
commit's description is fully correct and / or it was really
addressing some ia64-specific quirk.

Jan



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH for-4.12] passthrough/vtd: Drop the "workaround_bios_bug" logic entirely
  2019-03-25 15:24 ` Jan Beulich
@ 2019-03-25 17:36   ` Andrew Cooper
  2019-03-26  9:08     ` Jan Beulich
  2019-03-27 11:02   ` George Dunlap
  1 sibling, 1 reply; 17+ messages in thread
From: Andrew Cooper @ 2019-03-25 17:36 UTC (permalink / raw)
  To: Jan Beulich, George Dunlap, Juergen Gross
  Cc: Igor Druzhinin, Kevin Tian, Stefano Stabellini, Wei Liu,
	Konrad Rzeszutek Wilk, Tim Deegan, Lars Kurth, Xen-devel,
	Julien Grall, Paul Durrant, Jun Nakajima, Ian Jackson,
	Roger Pau Monne

On 25/03/2019 15:24, Jan Beulich wrote:
>>>> On 21.03.19 at 21:26, <andrew.cooper3@citrix.com> wrote:
>> It turns out that this code was previously dead.
> If it was entirely dead, why the rush to get the change into 4.12?
> (I suppose the later parts of description are then justifying why
> the code wasn't actually dead, and why it was getting in the way,
> but I think this way of putting it is at least confusing.)
>
>> c/s dcf41790 " x86/mmcfg/drhd: Move acpi_mmcfg_init() call before calling
>> acpi_parse_dmar()" resulted in PCI segment 0 now having been initialised
>> enough for acpi_parse_one_drhd() to not take the
>>
>>   /* Skip checking if segment is not accessible yet. */
>>
>> path unconditionally.
> Or am I misreading the initial sentence, and you're really suggesting
> that prior to said commit the code in question had been dead?

This logic (which is being deleted) used to be dead, and became non-dead
with the identified commit.

The code, now being run, constitutes a functional regression on some
hardware, which worked fine with Xen 4.11.

> How's that commit related then? Segment 0 is valid irrespective of any
> MMCFG information gained from ACPI tables (see pci_segments_init()).

Exactly - that is why it is a regression.

Before pci_segments_init() (which is the first action of
acpi_mmcfg_init(), and hence is moved by the mentioned commit),
acpi_parse_one_drhd()'s query of segment zero returns "not present",
causing the check to be skipped.

>
>>  However, some systems have DMAR tables which list
>> devices which are disabled by user choice (in particular, Dell PowerEdge R740
>> with I/O AT DMA disabled), and turning off all IOMMU functionality in this
>> case is entirely unhelpful behaviour.
> As in many other cases, what is or is not unhelpful is often a matter
> of perception and hence possibly subjective. I can see your point,
> but I also can see why the authors of the code considered it a rather
> bad sign if non-existing PCI devices get named by an ACPI table.
> What if e.g. later a device gets hot-plugged into that very SBDF?

Exactly the same as what happens on Xen 4.11 and earlier, whatever that
happens to be.

>
>> Leave the warning which identifies the problematic devices, but drop the
>> remaining logic.  This leaves the system in better overall state, and working
>> in the same way that it did in previous releases.
> I wonder whether you've taken the time to look at the description
> of the commit first introducing this logic (a8059ffced "VT-d: improve
> RMRR validity checking"). I find it worrying in particular to
> effectively revert a change which claims 'to avoid any security
> vulnerability with malicious s/s re-enabling "supposed disabled"
> devices' without any discussion of why that may have been a
> wrong perspective to take.

I had, and as a maintainer, I'd reject a patch like that were it
presented today.

There is a nebulous claim of security, but it is exactly that -
nebulous.  There isn't enough information to work out what the concern
was, and even if the concern was valid, disabling VT-d across the system
isn't an appropriate action to take.

>
> I'd also appreciate clarification on you saying "working in the same
> way that it did in previous releases" - it sounds as if you might
> have spotted a regression here, but it's not really becoming clear
> to me what that regression then would have been.
>
>> This is a candidate for 4.12.  Given the absense of Jan as the maintaner, and
>> the proximity to the 4.12 release, I put this patch to The Rest for a
>> hopefully-more-timely decision and review.
> To be honest, I have two problems with this: For one the main
> part of your change falls in Kevin's realm, not mine. And then,
> even if it would have been mainly me to ack the change, I was
> gone for three days, not three months. Yet the code had been
> this way for over 9 years. One thing seems pretty clear to me:
> It is at best non-obvious that there is no risk of regressions
> here.

The commit message states clearly the commit which caused the function
regression, and the justification for why deleting the code resolves the
regression by making the behaviour identical to 4.11 and earlier.

I'm not sure what more you are looking for, but this is very clear cut
and safe from my point of view.

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH for-4.12] passthrough/vtd: Drop the "workaround_bios_bug" logic entirely
  2019-03-25 17:36   ` Andrew Cooper
@ 2019-03-26  9:08     ` Jan Beulich
  2019-03-26 12:43       ` Andrew Cooper
  0 siblings, 1 reply; 17+ messages in thread
From: Jan Beulich @ 2019-03-26  9:08 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Juergen Gross, Igor Druzhinin, Kevin Tian, Stefano Stabellini,
	Wei Liu, Konrad Rzeszutek Wilk, George Dunlap, Tim Deegan,
	Lars Kurth, Xen-devel, Julien Grall, Paul Durrant, Jun Nakajima,
	Ian Jackson, Roger Pau Monne

>>> On 25.03.19 at 18:36, <andrew.cooper3@citrix.com> wrote:
> On 25/03/2019 15:24, Jan Beulich wrote:
>>>>> On 21.03.19 at 21:26, <andrew.cooper3@citrix.com> wrote:
>>> It turns out that this code was previously dead.
>> If it was entirely dead, why the rush to get the change into 4.12?
>> (I suppose the later parts of description are then justifying why
>> the code wasn't actually dead, and why it was getting in the way,
>> but I think this way of putting it is at least confusing.)
>>
>>> c/s dcf41790 " x86/mmcfg/drhd: Move acpi_mmcfg_init() call before calling
>>> acpi_parse_dmar()" resulted in PCI segment 0 now having been initialised
>>> enough for acpi_parse_one_drhd() to not take the
>>>
>>>   /* Skip checking if segment is not accessible yet. */
>>>
>>> path unconditionally.
>> Or am I misreading the initial sentence, and you're really suggesting
>> that prior to said commit the code in question had been dead?
> 
> This logic (which is being deleted) used to be dead, and became non-dead
> with the identified commit.
> 
> The code, now being run, constitutes a functional regression on some
> hardware, which worked fine with Xen 4.11.
> 
>> How's that commit related then? Segment 0 is valid irrespective of any
>> MMCFG information gained from ACPI tables (see pci_segments_init()).
> 
> Exactly - that is why it is a regression.
> 
> Before pci_segments_init() (which is the first action of
> acpi_mmcfg_init(), and hence is moved by the mentioned commit),
> acpi_parse_one_drhd()'s query of segment zero returns "not present",
> causing the check to be skipped.
> 
>>>  However, some systems have DMAR tables which list
>>> devices which are disabled by user choice (in particular, Dell PowerEdge R740
>>> with I/O AT DMA disabled), and turning off all IOMMU functionality in this
>>> case is entirely unhelpful behaviour.
>> As in many other cases, what is or is not unhelpful is often a matter
>> of perception and hence possibly subjective. I can see your point,
>> but I also can see why the authors of the code considered it a rather
>> bad sign if non-existing PCI devices get named by an ACPI table.
>> What if e.g. later a device gets hot-plugged into that very SBDF?
> 
> Exactly the same as what happens on Xen 4.11 and earlier, whatever that
> happens to be.
> 
>>> Leave the warning which identifies the problematic devices, but drop the
>>> remaining logic.  This leaves the system in better overall state, and working
>>> in the same way that it did in previous releases.
>> I wonder whether you've taken the time to look at the description
>> of the commit first introducing this logic (a8059ffced "VT-d: improve
>> RMRR validity checking"). I find it worrying in particular to
>> effectively revert a change which claims 'to avoid any security
>> vulnerability with malicious s/s re-enabling "supposed disabled"
>> devices' without any discussion of why that may have been a
>> wrong perspective to take.
> 
> I had, and as a maintainer, I'd reject a patch like that were it
> presented today.

Understood. But whether you'd accept it with a better description
is unknown, I assume.

> There is a nebulous claim of security, but it is exactly that -
> nebulous.  There isn't enough information to work out what the concern
> was, and even if the concern was valid, disabling VT-d across the system
> isn't an appropriate action to take.

This heavily depends on the position the system's admin takes:
Enabling VT-d in an incomplete fashion may as well be considered
worse than not enabling it at all. That's because without VT-d
pass-through of devices to HVM guests won't be allowed at all.
Whereas with VT-d enabled passing through a device may, in
such a situation, put the system at higher risk than the admin is
aware.

Furthermore, as much as the security related claim there is
nebulous, your description - I'm sorry to say that - isn't much
better, as you don't clarify why there's _no_ security aspect
there. Stating that "this leaves the system in better overall
state" without making clear why that is _for everyone_ is not
helpful at all.

>> I'd also appreciate clarification on you saying "working in the same
>> way that it did in previous releases" - it sounds as if you might
>> have spotted a regression here, but it's not really becoming clear
>> to me what that regression then would have been.
>>
>>> This is a candidate for 4.12.  Given the absense of Jan as the maintaner, and
>>> the proximity to the 4.12 release, I put this patch to The Rest for a
>>> hopefully-more-timely decision and review.
>> To be honest, I have two problems with this: For one the main
>> part of your change falls in Kevin's realm, not mine. And then,
>> even if it would have been mainly me to ack the change, I was
>> gone for three days, not three months. Yet the code had been
>> this way for over 9 years. One thing seems pretty clear to me:
>> It is at best non-obvious that there is no risk of regressions
>> here.
> 
> The commit message states clearly the commit which caused the function
> regression, and the justification for why deleting the code resolves the
> regression by making the behaviour identical to 4.11 and earlier.

But are you seriously suggesting the behavior in 4.11 and earlier
was intended? I'm of the opinion that the assumption was that
pci_known_segment() would unconditionally return true for
segment 0. When writing my original reply it didn't even occur to
me that this could not have been the case.

> I'm not sure what more you are looking for, but this is very clear cut
> and safe from my point of view.

Well, your claim regarding "4.11 and earlier" is clearly wrong, albeit
in another way than I first thought: Looking back far enough, prior
to the introduction of support for segments other than 0 the code
in question did not skip the checking on segment 0. Obviously at
that time nothing other than segment 0 would have been checked
(and as it seems the DRHD's segment specification was silently
ignored).

IOW there was an earlier regression causing the checks to be
skipped. This looks to have been my fault with fd1e17183b ("VT-d:
don't reject possibly valid DRHD or RMRR"), not realizing that the
call to pt_pci_init() would have had to happen ahead of that to
acpi_boot_init() in order for pci_known_segment() to return true
for segment 0 at all times.

So along the lines of what I've been saying before - what I would
have expected in the commit message (and what I still hope for
in a reply of yours) is a discussion of why exactly the original
commit adding the checking was wrong, the more with its later
amending by the command line (sub-)option you've now removed.
As said above - it should be the admin's decision whether to
enable the IOMMU despite there being firmware flaws. You've
taken away this (policy) decision from them, uniformly forcing the
behavior you consider the only possible good one onto everyone.

As an aside - the description of your change also makes it sound
as if the observed behavior was correct. I doubt this to be the
case; I think devices disabled by user choice should not have
references left in ACPI tables.

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH for-4.12] passthrough/vtd: Drop the "workaround_bios_bug" logic entirely
  2019-03-26  9:08     ` Jan Beulich
@ 2019-03-26 12:43       ` Andrew Cooper
  2019-03-26 13:39         ` Jan Beulich
  0 siblings, 1 reply; 17+ messages in thread
From: Andrew Cooper @ 2019-03-26 12:43 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Juergen Gross, Igor Druzhinin, Kevin Tian, Stefano Stabellini,
	Wei Liu, Konrad Rzeszutek Wilk, George Dunlap, Tim Deegan,
	Lars Kurth, Xen-devel, Julien Grall, Paul Durrant, Jun Nakajima,
	Ian Jackson, Roger Pau Monne

On 26/03/2019 09:08, Jan Beulich wrote:
>>
>>>> Leave the warning which identifies the problematic devices, but drop the
>>>> remaining logic.  This leaves the system in better overall state, and working
>>>> in the same way that it did in previous releases.
>>> I wonder whether you've taken the time to look at the description
>>> of the commit first introducing this logic (a8059ffced "VT-d: improve
>>> RMRR validity checking"). I find it worrying in particular to
>>> effectively revert a change which claims 'to avoid any security
>>> vulnerability with malicious s/s re-enabling "supposed disabled"
>>> devices' without any discussion of why that may have been a
>>> wrong perspective to take.
>> I had, and as a maintainer, I'd reject a patch like that were it
>> presented today.
> Understood. But whether you'd accept it with a better description
> is unknown, I assume.

I severely doubt I'd accept it at all, because it is entirely
unreasonable behaviour.

At best, it is the equivalent of throwing your hands up in the air and
saying "I give up", and that is not good enough behaviour for Xen.

>
>> There is a nebulous claim of security, but it is exactly that -
>> nebulous.  There isn't enough information to work out what the concern
>> was, and even if the concern was valid, disabling VT-d across the system
>> isn't an appropriate action to take.
> This heavily depends on the position the system's admin takes:
> Enabling VT-d in an incomplete fashion may as well be considered
> worse than not enabling it at all.

No - that's simply not true, or a reasonable position to take. 
Disabling the IOMMU prevents the system from booting with a PVH dom0.

I am not aware of a credible case where partially enabled VT-d is less
secure than no VT-d, and there is one headline case now where disabled
VT-d causes a failure to boot.

> Furthermore, as much as the security related claim there is
> nebulous, your description - I'm sorry to say that - isn't much
> better, as you don't clarify why there's _no_ security aspect
> there. Stating that "this leaves the system in better overall
> state" without making clear why that is _for everyone_ is not
> helpful at all.

The nebulous security claim is not relevant to this patch.

This code was not run previously.  An unexpected consequence of a change
in 4.12 caused it to run, and break booting on some (sadly rather
common) systems.

This is a regression in 4.12 and needs resolving.  The choice is between
reverting dcf41790 or removing this code, and reverting dcf41790 is
obviously not a valid thing to do.

Beyond that, I really don't care what the exact behaviour of 4.11 was. 
If there is a real security issue then it still needs fixing on all
versions of Xen, and this change doesn't alter that property.

However, until someone can work out what the alleged issue is, we can't
really progress this argument, and we mustn't keep broken code simply
because it purports to "fix" an unspecified issue.

>
>> I'm not sure what more you are looking for, but this is very clear cut
>> and safe from my point of view.
> Well, your claim regarding "4.11 and earlier" is clearly wrong

I have made a statement, backed up with specific reference to the code
which, to the best of my ability, demonstrates it to be true.

If you believe contrary then clearly identify the fault in my reasoning.

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH for-4.12] passthrough/vtd: Drop the "workaround_bios_bug" logic entirely
  2019-03-26 12:43       ` Andrew Cooper
@ 2019-03-26 13:39         ` Jan Beulich
  2019-03-27 14:38           ` Andrew Cooper
  2019-03-28 15:06           ` George Dunlap
  0 siblings, 2 replies; 17+ messages in thread
From: Jan Beulich @ 2019-03-26 13:39 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Juergen Gross, Igor Druzhinin, Kevin Tian, Stefano Stabellini,
	Wei Liu, Konrad Rzeszutek Wilk, George Dunlap, Tim Deegan,
	Lars Kurth, Xen-devel, Julien Grall, Paul Durrant, Jun Nakajima,
	Ian Jackson, Roger Pau Monne

>>> On 26.03.19 at 13:43, <andrew.cooper3@citrix.com> wrote:
> On 26/03/2019 09:08, Jan Beulich wrote:
>>>
>>>>> Leave the warning which identifies the problematic devices, but drop the
>>>>> remaining logic.  This leaves the system in better overall state, and working
>>>>> in the same way that it did in previous releases.
>>>> I wonder whether you've taken the time to look at the description
>>>> of the commit first introducing this logic (a8059ffced "VT-d: improve
>>>> RMRR validity checking"). I find it worrying in particular to
>>>> effectively revert a change which claims 'to avoid any security
>>>> vulnerability with malicious s/s re-enabling "supposed disabled"
>>>> devices' without any discussion of why that may have been a
>>>> wrong perspective to take.
>>> I had, and as a maintainer, I'd reject a patch like that were it
>>> presented today.
>> Understood. But whether you'd accept it with a better description
>> is unknown, I assume.
> 
> I severely doubt I'd accept it at all, because it is entirely
> unreasonable behaviour.
> 
> At best, it is the equivalent of throwing your hands up in the air and
> saying "I give up", and that is not good enough behaviour for Xen.
> 
>>
>>> There is a nebulous claim of security, but it is exactly that -
>>> nebulous.  There isn't enough information to work out what the concern
>>> was, and even if the concern was valid, disabling VT-d across the system
>>> isn't an appropriate action to take.
>> This heavily depends on the position the system's admin takes:
>> Enabling VT-d in an incomplete fashion may as well be considered
>> worse than not enabling it at all.
> 
> No - that's simply not true, or a reasonable position to take. 

As is every way of thinking differently than you do? I'm sorry to
be putting it this way, but you continue to make claims about
how people ought to think without giving any reason why that's
the only valid way. I can't see anything wrong with someone
putting themselves on the position that if they see an enabled
IOMMU, they assume that pass-through is as safe as it can
(currently) be. Just to then be caught by surprise that there is
a device not actually handled by any IOMMU? After all a non-
existent device listed in a table may as well be a hint that it's
just its SBDF which the firmware got wrong.

> Disabling the IOMMU prevents the system from booting with a PVH dom0.

But doing what you did is not the only way of getting around this.
Defaulting to workaround_bios_bug=1 in the PVH case would be
another, as would be a mode in which the IOMMU exists for Dom0's
purposes only (i.e. still disallowing any pass-through to DomU-s).

> I am not aware of a credible case where partially enabled VT-d is less
> secure than no VT-d, and there is one headline case now where disabled
> VT-d causes a failure to boot.
> 
>> Furthermore, as much as the security related claim there is
>> nebulous, your description - I'm sorry to say that - isn't much
>> better, as you don't clarify why there's _no_ security aspect
>> there. Stating that "this leaves the system in better overall
>> state" without making clear why that is _for everyone_ is not
>> helpful at all.
> 
> The nebulous security claim is not relevant to this patch.
> 
> This code was not run previously.  An unexpected consequence of a change
> in 4.12 caused it to run, and break booting on some (sadly rather
> common) systems.
> 
> This is a regression in 4.12 and needs resolving.  The choice is between
> reverting dcf41790 or removing this code, and reverting dcf41790 is
> obviously not a valid thing to do.

As explained before, there was an earlier regression, which - if it
had been noticed in time - would have made all versions from 4.2
to 4.11 behave like 4.12 without your change. This behavior was
intended by the original author. Ripping the code out by convincing
people to bypass normal review flow is, well, not very nice to put it
mildly.

> Beyond that, I really don't care what the exact behaviour of 4.11 was. 
> If there is a real security issue then it still needs fixing on all
> versions of Xen, and this change doesn't alter that property.
> 
> However, until someone can work out what the alleged issue is, we can't
> really progress this argument, and we mustn't keep broken code simply
> because it purports to "fix" an unspecified issue.

You seem to forget that your change is to deal with one form of
broken firmware. It is simply impossible to enumerate all ways in
which firmware _might_ be broken. The original code allegedly
tried to deal with some other form of firmware flaw.

Just like in the EFI case, where there's so much breakage, I do
think that default behavior of software ought to be to assume
sane firmware behavior, allowing for workarounds where
needed. Unless positively identified to be needed on a system,
and unless needed virually everywhere, such workarounds
should not be enabled by default. That is, in the given case a
DMI quirk could have been added enabling workaround_bios_bug
by default for the R740.

>>> I'm not sure what more you are looking for, but this is very clear cut
>>> and safe from my point of view.
>> Well, your claim regarding "4.11 and earlier" is clearly wrong
> 
> I have made a statement, backed up with specific reference to the code
> which, to the best of my ability, demonstrates it to be true.
> 
> If you believe contrary then clearly identify the fault in my reasoning.

I did, by pointing out the earlier regression, which you elected to
ignore altogether in your reply.

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH for-4.12] passthrough/vtd: Drop the "workaround_bios_bug" logic entirely
  2019-03-25 15:24 ` Jan Beulich
  2019-03-25 17:36   ` Andrew Cooper
@ 2019-03-27 11:02   ` George Dunlap
  2019-03-27 11:46     ` Jan Beulich
  1 sibling, 1 reply; 17+ messages in thread
From: George Dunlap @ 2019-03-27 11:02 UTC (permalink / raw)
  To: Jan Beulich, Andrew Cooper, George Dunlap, Juergen Gross
  Cc: Igor Druzhinin, Kevin Tian, Stefano Stabellini, Wei Liu,
	Konrad Rzeszutek Wilk, Tim Deegan, Lars Kurth, Xen-devel,
	Julien Grall, Paul Durrant, Jun Nakajima, Ian Jackson,
	Roger Pau Monne

On 3/25/19 3:24 PM, Jan Beulich wrote:
>>>> On 21.03.19 at 21:26, <andrew.cooper3@citrix.com> wrote:
>> It turns out that this code was previously dead.
> 
> If it was entirely dead, why the rush to get the change into 4.12?
> (I suppose the later parts of description are then justifying why
> the code wasn't actually dead, and why it was getting in the way,
> but I think this way of putting it is at least confusing.)
> 
>> c/s dcf41790 " x86/mmcfg/drhd: Move acpi_mmcfg_init() call before calling
>> acpi_parse_dmar()" resulted in PCI segment 0 now having been initialised
>> enough for acpi_parse_one_drhd() to not take the
>>
>>   /* Skip checking if segment is not accessible yet. */
>>
>> path unconditionally.
> 
> Or am I misreading the initial sentence, and you're really suggesting
> that prior to said commit the code in question had been dead? How's
> that commit related then? Segment 0 is valid irrespective of any
> MMCFG information gained from ACPI tables (see pci_segments_init()).
> 
>>  However, some systems have DMAR tables which list
>> devices which are disabled by user choice (in particular, Dell PowerEdge R740
>> with I/O AT DMA disabled), and turning off all IOMMU functionality in this
>> case is entirely unhelpful behaviour.
> 
> As in many other cases, what is or is not unhelpful is often a matter
> of perception and hence possibly subjective. I can see your point,
> but I also can see why the authors of the code considered it a rather
> bad sign if non-existing PCI devices get named by an ACPI table.
> What if e.g. later a device gets hot-plugged into that very SBDF?
> 
>> Leave the warning which identifies the problematic devices, but drop the
>> remaining logic.  This leaves the system in better overall state, and working
>> in the same way that it did in previous releases.
> 
> I wonder whether you've taken the time to look at the description
> of the commit first introducing this logic (a8059ffced "VT-d: improve
> RMRR validity checking"). I find it worrying in particular to
> effectively revert a change which claims 'to avoid any security
> vulnerability with malicious s/s re-enabling "supposed disabled"
> devices' without any discussion of why that may have been a
> wrong perspective to take.

Having read the patch description, I think you have the polarity of that
comment wrong.

If I understand correctly, that patch disables part of the IOMMU if it
finds something strange, *unless* iommu=force is on.  The text gives the
idea that it is *less* safe to disable the IOMMU; and that enabling the
IOMMU functionality, even with invalid entries, is safer than leaving it
off.

The patch we checked in (if I understand correctly), enables the IOMMU
in more situations, even when iommu=force is not set; and thus (by the
logic of the original patch) is more "safe" by default than the previous
patch.

 -George

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH for-4.12] passthrough/vtd: Drop the "workaround_bios_bug" logic entirely
  2019-03-27 11:02   ` George Dunlap
@ 2019-03-27 11:46     ` Jan Beulich
  0 siblings, 0 replies; 17+ messages in thread
From: Jan Beulich @ 2019-03-27 11:46 UTC (permalink / raw)
  To: george.dunlap
  Cc: Juergen Gross, Igor Druzhinin, Kevin Tian, Stefano Stabellini,
	Wei Liu, Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper,
	Tim Deegan, Lars Kurth, Xen-devel, Julien Grall, Paul Durrant,
	Jun Nakajima, Ian Jackson, Roger Pau Monne

>>> On 27.03.19 at 12:02, <george.dunlap@citrix.com> wrote:
> On 3/25/19 3:24 PM, Jan Beulich wrote:
>>>>> On 21.03.19 at 21:26, <andrew.cooper3@citrix.com> wrote:
>>> It turns out that this code was previously dead.
>> 
>> If it was entirely dead, why the rush to get the change into 4.12?
>> (I suppose the later parts of description are then justifying why
>> the code wasn't actually dead, and why it was getting in the way,
>> but I think this way of putting it is at least confusing.)
>> 
>>> c/s dcf41790 " x86/mmcfg/drhd: Move acpi_mmcfg_init() call before calling
>>> acpi_parse_dmar()" resulted in PCI segment 0 now having been initialised
>>> enough for acpi_parse_one_drhd() to not take the
>>>
>>>   /* Skip checking if segment is not accessible yet. */
>>>
>>> path unconditionally.
>> 
>> Or am I misreading the initial sentence, and you're really suggesting
>> that prior to said commit the code in question had been dead? How's
>> that commit related then? Segment 0 is valid irrespective of any
>> MMCFG information gained from ACPI tables (see pci_segments_init()).
>> 
>>>  However, some systems have DMAR tables which list
>>> devices which are disabled by user choice (in particular, Dell PowerEdge 
> R740
>>> with I/O AT DMA disabled), and turning off all IOMMU functionality in this
>>> case is entirely unhelpful behaviour.
>> 
>> As in many other cases, what is or is not unhelpful is often a matter
>> of perception and hence possibly subjective. I can see your point,
>> but I also can see why the authors of the code considered it a rather
>> bad sign if non-existing PCI devices get named by an ACPI table.
>> What if e.g. later a device gets hot-plugged into that very SBDF?
>> 
>>> Leave the warning which identifies the problematic devices, but drop the
>>> remaining logic.  This leaves the system in better overall state, and 
> working
>>> in the same way that it did in previous releases.
>> 
>> I wonder whether you've taken the time to look at the description
>> of the commit first introducing this logic (a8059ffced "VT-d: improve
>> RMRR validity checking"). I find it worrying in particular to
>> effectively revert a change which claims 'to avoid any security
>> vulnerability with malicious s/s re-enabling "supposed disabled"
>> devices' without any discussion of why that may have been a
>> wrong perspective to take.
> 
> Having read the patch description, I think you have the polarity of that
> comment wrong.
> 
> If I understand correctly, that patch disables part of the IOMMU if it
> finds something strange, *unless* iommu=force is on.  The text gives the
> idea that it is *less* safe to disable the IOMMU; and that enabling the
> IOMMU functionality, even with invalid entries, is safer than leaving it
> off.

Hmm, indeed, I did associate the security vulnerability statement
with the wrong context. Yet still, "iommu=force" is what is precisely
meant for such a situation: Enable the IOMMU despite there having
been some issues.

> The patch we checked in (if I understand correctly), enables the IOMMU
> in more situations, even when iommu=force is not set; and thus (by the
> logic of the original patch) is more "safe" by default than the previous
> patch.

I'm not sure about this conclusion of yours: If the goal had been to
make things "more secure" by default, why would they have disabled
the IOMMU in the first place when finding non-discoverable devices?

How do we know (in the abstract general case) that enabling the
IOMMU is not going to cause well hidden problems when the firmware
provided data is inconsistent? Leaving the IOMMU off in such a case
puts the system in a well known (albeit likely less secure) state.
Turning the IOMMU on, otoh, puts the system in an unknown (albeit
likely more secure) state. This calls for an admin decision, and I
continue to think that the well known state is preferable as the
default, because of the risk that the firmware flaw opens an unknown
security hole.

Jan



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH for-4.12] passthrough/vtd: Drop the "workaround_bios_bug" logic entirely
  2019-03-26 13:39         ` Jan Beulich
@ 2019-03-27 14:38           ` Andrew Cooper
  2019-03-27 15:41             ` Jan Beulich
  2019-03-28 14:49             ` George Dunlap
  2019-03-28 15:06           ` George Dunlap
  1 sibling, 2 replies; 17+ messages in thread
From: Andrew Cooper @ 2019-03-27 14:38 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Juergen Gross, Igor Druzhinin, Kevin Tian, Stefano Stabellini,
	Wei Liu, Konrad Rzeszutek Wilk, George Dunlap, Tim Deegan,
	Lars Kurth, Xen-devel, Julien Grall, Paul Durrant, Jun Nakajima,
	Ian Jackson, Roger Pau Monne

On 26/03/2019 13:39, Jan Beulich wrote:
>>>> On 26.03.19 at 13:43, <andrew.cooper3@citrix.com> wrote:
>> On 26/03/2019 09:08, Jan Beulich wrote:
>>>>>> Leave the warning which identifies the problematic devices, but drop the
>>>>>> remaining logic.  This leaves the system in better overall state, and working
>>>>>> in the same way that it did in previous releases.
>>>>> I wonder whether you've taken the time to look at the description
>>>>> of the commit first introducing this logic (a8059ffced "VT-d: improve
>>>>> RMRR validity checking"). I find it worrying in particular to
>>>>> effectively revert a change which claims 'to avoid any security
>>>>> vulnerability with malicious s/s re-enabling "supposed disabled"
>>>>> devices' without any discussion of why that may have been a
>>>>> wrong perspective to take.
>>>> I had, and as a maintainer, I'd reject a patch like that were it
>>>> presented today.
>>> Understood. But whether you'd accept it with a better description
>>> is unknown, I assume.
>> I severely doubt I'd accept it at all, because it is entirely
>> unreasonable behaviour.
>>
>> At best, it is the equivalent of throwing your hands up in the air and
>> saying "I give up", and that is not good enough behaviour for Xen.
>>
>>>> There is a nebulous claim of security, but it is exactly that -
>>>> nebulous.  There isn't enough information to work out what the concern
>>>> was, and even if the concern was valid, disabling VT-d across the system
>>>> isn't an appropriate action to take.
>>> This heavily depends on the position the system's admin takes:
>>> Enabling VT-d in an incomplete fashion may as well be considered
>>> worse than not enabling it at all.
>> No - that's simply not true, or a reasonable position to take. 
> As is every way of thinking differently than you do?

No, but I do expect common sense to be used in the judgement of what is
appropriate and/or reasonable end user behaviour.

> I'm sorry to
> be putting it this way, but you continue to make claims about
> how people ought to think without giving any reason why that's
> the only valid way. I can't see anything wrong with someone
> putting themselves on the position that if they see an enabled
> IOMMU, they assume that pass-through is as safe as it can
> (currently) be.

Once again, we get back to a un-justified (and disputed) claim of
"security".

*What* is unsafe about having non-active devices behind an IOMMU?

How does this scenario differ from one of PCI hotplug where the device
really doesn't exist at boot time, and comes into existence later?

> Just to then be caught by surprise that there is
> a device not actually handled by any IOMMU? After all a non-
> existent device listed in a table may as well be a hint that it's
> just its SBDF which the firmware got wrong.

and where does Xen currently check this?  (this is a rhetorical question
- Xen cannot check this.)

There is absolutely nothing *at all* which guarantees that just because
a number of devices are identified to be behind specific IOMMUs, that
DMA wont start appearing from elsewhere in the system.

Security of the system when it comes to IOMMUs *is and always will be* a
mutually cooperative and trusting relationship between Xen and the firmware.

The notion of "I'm safe because there were no inconsistencies in a piece
of information I have to trust fully" is security theatre, not security.

>
>> Disabling the IOMMU prevents the system from booting with a PVH dom0.
> But doing what you did is not the only way of getting around this.
> Defaulting to workaround_bios_bug=1 in the PVH case would be
> another, as would be a mode in which the IOMMU exists for Dom0's
> purposes only (i.e. still disallowing any pass-through to DomU-s).

A discussion along these lines might be appropriate in the middle of a
dev cycle, and might even be valid for a discussion of future improvements.

It is not appropriate for resolving an issue identified as a 4.12
blocker by the RM, on a timescale which needs to fit into the 4.12
release plans.

>> I am not aware of a credible case where partially enabled VT-d is less
>> secure than no VT-d, and there is one headline case now where disabled
>> VT-d causes a failure to boot.
>>
>>> Furthermore, as much as the security related claim there is
>>> nebulous, your description - I'm sorry to say that - isn't much
>>> better, as you don't clarify why there's _no_ security aspect
>>> there. Stating that "this leaves the system in better overall
>>> state" without making clear why that is _for everyone_ is not
>>> helpful at all.
>> The nebulous security claim is not relevant to this patch.
>>
>> This code was not run previously.  An unexpected consequence of a change
>> in 4.12 caused it to run, and break booting on some (sadly rather
>> common) systems.
>>
>> This is a regression in 4.12 and needs resolving.  The choice is between
>> reverting dcf41790 or removing this code, and reverting dcf41790 is
>> obviously not a valid thing to do.
> As explained before, there was an earlier regression, which - if it
> had been noticed in time - would have made all versions from 4.2
> to 4.11 behave like 4.12 without your change.

And if you'd not broken the behaviour back in 4.2, this class of system
would have been discovered the first time someone tried booting Xen on
it, not at the point someone is trying to upgrade 4.11 to 4.12.

I also wouldn't be pushing this specific fix in this specific
situation.  (In reality, I'd have already pushed this fix when the bug
was originally reported, because its unreasonable behaviour no matter
the situation which calls for it to be modified.)

> This behavior was
> intended by the original author. Ripping the code out by convincing
> people to bypass normal review flow is, well, not very nice to put it
> mildly.

That again was explicitly called out, with an explanation of why I was
deviating from usual process, including an agreement from The Rest that
the exceptional circumstances were warranted.

>
>> Beyond that, I really don't care what the exact behaviour of 4.11 was. 
>> If there is a real security issue then it still needs fixing on all
>> versions of Xen, and this change doesn't alter that property.
>>
>> However, until someone can work out what the alleged issue is, we can't
>> really progress this argument, and we mustn't keep broken code simply
>> because it purports to "fix" an unspecified issue.
> You seem to forget that your change is to deal with one form of
> broken firmware. It is simply impossible to enumerate all ways in
> which firmware _might_ be broken. The original code allegedly
> tried to deal with some other form of firmware flaw.

There is no such thing as a non-buggy firmware, just like there is no
such thing as a bug-free Xen.

What matters is that Xen copes in a way which is not detrimental to a
user being able to reconfigure their system.

>
> Just like in the EFI case, where there's so much breakage, I do
> think that default behavior of software ought to be to assume
> sane firmware behavior, allowing for workarounds where
> needed. Unless positively identified to be needed on a system,
> and unless needed virually everywhere, such workarounds
> should not be enabled by default. That is, in the given case a
> DMI quirk could have been added enabling workaround_bios_bug
> by default for the R740.

I am not going to start this argument again.

Disagreements with all of these points have been raised by multiple
parties on xen-devel, and I find myself in a position where I don't have
anything civil to add.

>
>>>> I'm not sure what more you are looking for, but this is very clear cut
>>>> and safe from my point of view.
>>> Well, your claim regarding "4.11 and earlier" is clearly wrong
>> I have made a statement, backed up with specific reference to the code
>> which, to the best of my ability, demonstrates it to be true.
>>
>> If you believe contrary then clearly identify the fault in my reasoning.
> I did, by pointing out the earlier regression, which you elected to
> ignore altogether in your reply.

You identified why Xen 4.11 didn't behave in the way you expected.

In doing so, you also demonstrated why the commit message was, in fact,
correct.


Like other parts of this thread, it was deviating sufficiently
off-topic/relevance that I chose to trim it.  I will continue doing so
in an effort to reduce the amount of my time that this thread is
wasting.  I don't know for certain, but I expect you've also got better
things to do with your time than arguing over off-topic aspects of this
thread.

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH for-4.12] passthrough/vtd: Drop the "workaround_bios_bug" logic entirely
  2019-03-27 14:38           ` Andrew Cooper
@ 2019-03-27 15:41             ` Jan Beulich
  2019-03-28 14:49             ` George Dunlap
  1 sibling, 0 replies; 17+ messages in thread
From: Jan Beulich @ 2019-03-27 15:41 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Juergen Gross, Igor Druzhinin, Kevin Tian, Stefano Stabellini,
	Wei Liu, Konrad Rzeszutek Wilk, George Dunlap, Tim Deegan,
	Lars Kurth, Xen-devel, Julien Grall, Paul Durrant, Jun Nakajima,
	Ian Jackson, Roger Pau Monne

>>> On 27.03.19 at 15:38, <andrew.cooper3@citrix.com> wrote:
> There is absolutely nothing *at all* which guarantees that just because
> a number of devices are identified to be behind specific IOMMUs, that
> DMA wont start appearing from elsewhere in the system.

I fully agree here.

> Security of the system when it comes to IOMMUs *is and always will be* a
> mutually cooperative and trusting relationship between Xen and the firmware.
> 
> The notion of "I'm safe because there were no inconsistencies in a piece
> of information I have to trust fully" is security theatre, not security.

Doing our best to sanity check information we're handed is, I think,
not just "security theater".

>>> Disabling the IOMMU prevents the system from booting with a PVH dom0.
>> But doing what you did is not the only way of getting around this.
>> Defaulting to workaround_bios_bug=1 in the PVH case would be
>> another, as would be a mode in which the IOMMU exists for Dom0's
>> purposes only (i.e. still disallowing any pass-through to DomU-s).
> 
> A discussion along these lines might be appropriate in the middle of a
> dev cycle, and might even be valid for a discussion of future improvements.
> 
> It is not appropriate for resolving an issue identified as a 4.12
> blocker by the RM, on a timescale which needs to fit into the 4.12
> release plans.

Okay, I clearly must have missed the mail where this was flagged
as release critical.

Irrespective of this I don't think, though, that a pending release
is sufficient justification to rush in a controversial patch. We're
yet to hear from Kevin, but as you can see I would not have ack-
ed the patch. Emergency ack-s from The Rest maintainers ought
to be given on the assumption that the actual maintainer(s)
would likely not object. I don't think George tried to violate this,
i.e. I think he was assuming that the maintainer(s) would agree,
but as we see this was wrong in this case.

> And if you'd not broken the behaviour back in 4.2, this class of system
> would have been discovered the first time someone tried booting Xen on
> it, not at the point someone is trying to upgrade 4.11 to 4.12.

Correct, and I take the blame for this, but I don't think it helps
the situation. If the problem had been discovered earlier, I
don't think the fix would have been to rip out that code
altogether.

>>> I have made a statement, backed up with specific reference to the code
>>> which, to the best of my ability, demonstrates it to be true.
>>>
>>> If you believe contrary then clearly identify the fault in my reasoning.
>> I did, by pointing out the earlier regression, which you elected to
>> ignore altogether in your reply.
> 
> You identified why Xen 4.11 didn't behave in the way you expected.
> 
> In doing so, you also demonstrated why the commit message was, in fact,
> correct.

Well, we continue to disagree here. It was at best misleading and/or
incomplete.

> Like other parts of this thread, it was deviating sufficiently
> off-topic/relevance that I chose to trim it.  I will continue doing so
> in an effort to reduce the amount of my time that this thread is
> wasting.  I don't know for certain, but I expect you've also got better
> things to do with your time than arguing over off-topic aspects of this
> thread.

Purely from a technical pov the discussion on whether this should
have been rushed in may indeed be considered off topic. But this
doesn't means it's irrelevant. Would you have liked it better if I
had started a separate thread, e.g. by formally requesting a
revert?

I can understand your frustration, but it's not you alone who is
frustrated. Throughout this discussion I've not seen a single
sign that you would be willing to find a compromise. I'm sorry to
repeat myself, but it imo is not reasonable to assume that
your way of thinking is the only possible or reasonable one.
Claiming that anything else is beyond "common sense" is, well,
offending.

And yes, I do have better things to do with my time. But I don't
think I can leave uncommented how things have gone here, if
nothing else then in the hopes that it wouldn't repeat again.

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH for-4.12] passthrough/vtd: Drop the "workaround_bios_bug" logic entirely
  2019-03-27 14:38           ` Andrew Cooper
  2019-03-27 15:41             ` Jan Beulich
@ 2019-03-28 14:49             ` George Dunlap
  1 sibling, 0 replies; 17+ messages in thread
From: George Dunlap @ 2019-03-28 14:49 UTC (permalink / raw)
  To: Andrew Cooper, Jan Beulich
  Cc: Juergen Gross, Igor Druzhinin, Kevin Tian, Stefano Stabellini,
	Wei Liu, Konrad Rzeszutek Wilk, George Dunlap, Tim Deegan,
	Lars Kurth, Xen-devel, Julien Grall, Paul Durrant, Jun Nakajima,
	Ian Jackson, Roger Pau Monne

On 3/27/19 2:38 PM, Andrew Cooper wrote:
> On 26/03/2019 13:39, Jan Beulich wrote:
>>>>> On 26.03.19 at 13:43, <andrew.cooper3@citrix.com> wrote:
>>> On 26/03/2019 09:08, Jan Beulich wrote:
>>>>>>> Leave the warning which identifies the problematic devices, but drop the
>>>>>>> remaining logic.  This leaves the system in better overall state, and working
>>>>>>> in the same way that it did in previous releases.
>>>>>> I wonder whether you've taken the time to look at the description
>>>>>> of the commit first introducing this logic (a8059ffced "VT-d: improve
>>>>>> RMRR validity checking"). I find it worrying in particular to
>>>>>> effectively revert a change which claims 'to avoid any security
>>>>>> vulnerability with malicious s/s re-enabling "supposed disabled"
>>>>>> devices' without any discussion of why that may have been a
>>>>>> wrong perspective to take.
>>>>> I had, and as a maintainer, I'd reject a patch like that were it
>>>>> presented today.
>>>> Understood. But whether you'd accept it with a better description
>>>> is unknown, I assume.
>>> I severely doubt I'd accept it at all, because it is entirely
>>> unreasonable behaviour.
>>>
>>> At best, it is the equivalent of throwing your hands up in the air and
>>> saying "I give up", and that is not good enough behaviour for Xen.
>>>
>>>>> There is a nebulous claim of security, but it is exactly that -
>>>>> nebulous.  There isn't enough information to work out what the concern
>>>>> was, and even if the concern was valid, disabling VT-d across the system
>>>>> isn't an appropriate action to take.
>>>> This heavily depends on the position the system's admin takes:
>>>> Enabling VT-d in an incomplete fashion may as well be considered
>>>> worse than not enabling it at all.
>>> No - that's simply not true, or a reasonable position to take. 
>> As is every way of thinking differently than you do?
> 
> No, but I do expect common sense to be used in the judgement of what is
> appropriate and/or reasonable end user behaviour.

Andy, you're not being reasonable here.  Just because *you* can't think
of how disabling the DRHD could be useful behavior doesn't mean there
isn't one.  The original patch took time and effort to write; so one of
two things is true:

1. The authors were attempting to address a theoretical concern; the
behavior in question didn't fix a real problem they had, or

2. The authors were attempting to address a real problem they had, and
the patch in question fixed it (for some value of "fixed").

#1 does happen, but on the whole, #2 is more likely; so it's much better
to assume that there was a problem that the patch fixed, even if it
might not have been the best *way* to fix it.

And in fact, if you go back and look at the original discussion [1]
(which involved Intel, Fujitsu, HP, and others), #2 turns out to to be
the case.  Lots of BIOSes had issues with misreporting RMRRs and DRHDs,
and on at least one of those, enabing a DRHD which had invalid RMRRs and
things behind it caused the box not to boot [2].  Recall that at the
time, VT-d was very new, and didn't have wide support.  So, Keir, seeing
all these reports, said:

"If we want to keep iommu=1 as default, then it is unacceptable to fail
to boot on a fairly wide range of modern systems. We have to warn-and
disable, partially or completely, unless iommu=force is specified. Or we
need to revert to iommu=0 as the default."  [3]

I think that was a very sensible approach, given the circumstances.

Now, as it happens, while there were lots of reports of invalid RMRR /
DRHD information from BIOSes, the only report I could find of something
actually failing to boot was a Fujitsu private platform [4].  So it
might actually be the case that, while BIOS bugs were common, failing to
boot when enabling "invalid" DRHDs was pretty rare.  Or it may have been
common.  We don't really have any way of knowing.

I continue to think that given that none of this was captured in the
commit message or code comments, and that we had two releases where this
behavior was disabled with no bug reports, that removing the code was a
reasonable thing to do.  But asserting that there is no conceivable
reason for the code to ever have existed is not -- even without doing
the archaeology.

Regarding what to do in light of this further background:  Given that
VT-d is now a more mature and widespread technology, given that it's
required on many systems, given that we've had two release cycles with
no reported problems, given XenRT's extensive testing, and finally given
the fact that the only known situation where disabling the DRHD was
necessary was on a "private" platform, think that removing the code and
seeing what happens is the best approach.

 -George

[1]
https://lists.xenproject.org/archives/html/xen-devel/2010-01/msg00665.html
[2]
https://lists.xenproject.org/archives/html/xen-devel/2010-01/msg00691.html
[3]
https://lists.xenproject.org/archives/html/xen-devel/2010-01/msg00731.html
[4]
https://lists.xenproject.org/archives/html/xen-devel/2010-01/msg00786.html

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH for-4.12] passthrough/vtd: Drop the "workaround_bios_bug" logic entirely
  2019-03-26 13:39         ` Jan Beulich
  2019-03-27 14:38           ` Andrew Cooper
@ 2019-03-28 15:06           ` George Dunlap
  2019-03-28 16:12             ` Jan Beulich
  1 sibling, 1 reply; 17+ messages in thread
From: George Dunlap @ 2019-03-28 15:06 UTC (permalink / raw)
  To: Jan Beulich, Andrew Cooper
  Cc: Juergen Gross, Igor Druzhinin, Kevin Tian, Stefano Stabellini,
	Wei Liu, Konrad Rzeszutek Wilk, George Dunlap, Tim Deegan,
	Lars Kurth, Xen-devel, Julien Grall, Paul Durrant, Jun Nakajima,
	Ian Jackson, Roger Pau Monne

On 3/26/19 1:39 PM, Jan Beulich wrote:
>> This is a regression in 4.12 and needs resolving.  The choice is between
>> reverting dcf41790 or removing this code, and reverting dcf41790 is
>> obviously not a valid thing to do.
> 
> As explained before, there was an earlier regression, which - if it
> had been noticed in time - would have made all versions from 4.2
> to 4.11 behave like 4.12 without your change. This behavior was
> intended by the original author. Ripping the code out by convincing
> people to bypass normal review flow is, well, not very nice to put it
> mildly.

I would like to say, I thought Andy tried to be very scrupulous here.
He had two different R-b's from people familiar with the code, and an
Ack from the release coordinator. I think he would have been justified
in checking the patch in on that basis, but just to make sure he was
doing the right thing, he asked someone else from "The Rest" (me) to
double-check to make sure.

Given the difference between how the two of you approach this sort of
thing, I understand why you might interpret that in a negative light.
But I don't think it's in the best interest of the project for people to
be completely risk-averse in this sort of situation.  We have source
control for a reason: If the change turns out to have been wrong, we can
revert it.

 -George


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH for-4.12] passthrough/vtd: Drop the "workaround_bios_bug" logic entirely
  2019-03-28 15:06           ` George Dunlap
@ 2019-03-28 16:12             ` Jan Beulich
  0 siblings, 0 replies; 17+ messages in thread
From: Jan Beulich @ 2019-03-28 16:12 UTC (permalink / raw)
  To: george.dunlap
  Cc: Juergen Gross, Igor Druzhinin, Kevin Tian, Stefano Stabellini,
	Wei Liu, Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper,
	Tim Deegan, Lars Kurth, Xen-devel, Julien Grall, Paul Durrant,
	Jun Nakajima, Ian Jackson, Roger Pau Monne

>>> On 28.03.19 at 16:06, <george.dunlap@citrix.com> wrote:
> On 3/26/19 1:39 PM, Jan Beulich wrote:
>>> This is a regression in 4.12 and needs resolving.  The choice is between
>>> reverting dcf41790 or removing this code, and reverting dcf41790 is
>>> obviously not a valid thing to do.
>> 
>> As explained before, there was an earlier regression, which - if it
>> had been noticed in time - would have made all versions from 4.2
>> to 4.11 behave like 4.12 without your change. This behavior was
>> intended by the original author. Ripping the code out by convincing
>> people to bypass normal review flow is, well, not very nice to put it
>> mildly.
> 
> I would like to say, I thought Andy tried to be very scrupulous here.
> He had two different R-b's from people familiar with the code, and an
> Ack from the release coordinator. I think he would have been justified
> in checking the patch in on that basis,

Hmm, a very interesting position - maintainer acks are then not
necessary anymore. I think in that case quite a few of my patches
could have been committed long ago. In fact, had I been fast
enough (just as was the case here) I could then have committed
"x86/mtrr: fix build with gcc9" as well, since Andrew's sort of
objecting response arrived only about a week after I had received
Wei's and Roger's R-b.

While taking that position would eliminate some of the gigantic
stalls I'm observing for some of my patches, I think a more formal
weakening of the need for maintainer acks should then first be
put in place.

> but just to make sure he was
> doing the right thing, he asked someone else from "The Rest" (me) to
> double-check to make sure.

But there was no real urgency: The supposedly regressing patch was
put in back in August. So there was an entire half year to notice and
work around the issue. (I was made vaguely aware of it by Andrew
a few days before this patch arrived.)

> Given the difference between how the two of you approach this sort of
> thing, I understand why you might interpret that in a negative light.
> But I don't think it's in the best interest of the project for people to
> be completely risk-averse in this sort of situation.  We have source
> control for a reason: If the change turns out to have been wrong, we can
> revert it.

Sure, I can accept taking this position in the middle of a dev cycle.
I don't think it's appropriate immediately prior to a release.

Jan



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2019-03-28 16:12 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-03-21 20:26 [PATCH for-4.12] passthrough/vtd: Drop the "workaround_bios_bug" logic entirely Andrew Cooper
2019-03-21 20:47 ` Igor Druzhinin
2019-03-22  1:43 ` Roger Pau Monné
2019-03-22  6:07 ` Juergen Gross
2019-03-22 11:53 ` George Dunlap
2019-03-25 15:24 ` Jan Beulich
2019-03-25 17:36   ` Andrew Cooper
2019-03-26  9:08     ` Jan Beulich
2019-03-26 12:43       ` Andrew Cooper
2019-03-26 13:39         ` Jan Beulich
2019-03-27 14:38           ` Andrew Cooper
2019-03-27 15:41             ` Jan Beulich
2019-03-28 14:49             ` George Dunlap
2019-03-28 15:06           ` George Dunlap
2019-03-28 16:12             ` Jan Beulich
2019-03-27 11:02   ` George Dunlap
2019-03-27 11:46     ` Jan Beulich

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.