All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] iommu: leave IOMMU enabled by default during kexec crash transition
@ 2019-02-18 16:21 Igor Druzhinin
  2019-02-18 18:30 ` Andrew Cooper
  2019-02-19  8:13 ` Jan Beulich
  0 siblings, 2 replies; 18+ messages in thread
From: Igor Druzhinin @ 2019-02-18 16:21 UTC (permalink / raw)
  To: andrew.cooper3, jbeulich, xen-devel
  Cc: George.Dunlap, Igor Druzhinin, julien.grall, wei.liu2, roger.pau

It's unsafe to disable IOMMU on a live system which is the case
if we're crashing since remapping hardware doesn't usually know what
to do with ongoing bus transactions and frequently raises NMI/MCE/SMI,
etc. (depends on the firmware configuration) to signal these abnormalities.
This, in turn, doesn't play well with kexec transition process as there is
no any handling available at the moment for this kind of events resulting
in failures to enter the kernel.

Modern Linux kernels taught to copy all the necessary DMAR/IR tables
following kexec from the previous kernel (Xen in our case) - so it's
currently normal to keep IOMMU enabled. It would only require to change
crash kernel command line by enabling IOMMU drivers from the existing users.

An option is left for compatibility with ancient crash kernels which
didn't like to have IOMMU active under their feet on boot.

Signed-off-by: Igor Druzhinin <igor.druzhinin@citrix.com>
---

Jan, Andrew, should we have this option here and, if so, what is the default
value for it should be?

---
 docs/misc/xen-command-line.pandoc | 5 +++++
 xen/arch/x86/crash.c              | 5 +++--
 xen/drivers/passthrough/iommu.c   | 6 ++++++
 3 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/docs/misc/xen-command-line.pandoc b/docs/misc/xen-command-line.pandoc
index c8d1ced..8fd4791 100644
--- a/docs/misc/xen-command-line.pandoc
+++ b/docs/misc/xen-command-line.pandoc
@@ -1235,6 +1235,11 @@ boolean (e.g. `iommu=no`) can override this and leave the IOMMUs disabled.
     This option depends on `intremap`, and is disabled by default due to some
     corner cases in the implementation which have yet to be resolved.
 
+*   The `crash-shutdown` boolean controls shutting down IOMMU before switching
+    to a crash kernel through kexec. This option is inactive by default and
+    is for compatibility with older kexec kernels only as modern kernels copy
+    all the necessary tables from the previous kernel following kexec transition.
+
 The following options are specific to Intel VT-d hardware:
 
 *   The `snoop` boolean controls the Snoop Control sub-feature, and is active
diff --git a/xen/arch/x86/crash.c b/xen/arch/x86/crash.c
index 60c98b6..fd67c10 100644
--- a/xen/arch/x86/crash.c
+++ b/xen/arch/x86/crash.c
@@ -162,8 +162,9 @@ static void nmi_shootdown_cpus(void)
         printk("Failed to shoot down CPUs {%*pbl}\n",
                nr_cpu_ids, cpumask_bits(&waiting_to_crash));
 
-    /* Crash shutdown any IOMMU functionality as the crashdump kernel is not
-     * happy when booting if interrupt/dma remapping is still enabled */
+    /* Try to crash shutdown IOMMU functionality as some old crashdump
+     * kernels are not happy when booting if interrupt/dma remapping
+     * is still enabled */
     iommu_crash_shutdown();
 
     __stop_this_cpu();
diff --git a/xen/drivers/passthrough/iommu.c b/xen/drivers/passthrough/iommu.c
index 5ecaa10..75f1211 100644
--- a/xen/drivers/passthrough/iommu.c
+++ b/xen/drivers/passthrough/iommu.c
@@ -35,6 +35,7 @@ bool_t __read_mostly iommu_igfx = 1;
 bool_t __read_mostly iommu_snoop = 1;
 bool_t __read_mostly iommu_qinval = 1;
 bool_t __read_mostly iommu_intremap = 1;
+bool_t __read_mostly iommu_crash_shutdown_enable;
 
 static bool __hwdom_initdata iommu_hwdom_none;
 bool __hwdom_initdata iommu_hwdom_strict;
@@ -88,6 +89,8 @@ static int __init parse_iommu_param(const char *s)
             iommu_intremap = val;
         else if ( (val = parse_boolean("intpost", s, ss)) >= 0 )
             iommu_intpost = val;
+        else if ( (val = parse_boolean("crash-shutdown", s, ss)) >= 0 )
+            iommu_crash_shutdown_enable = val;
         else if ( (val = parse_boolean("debug", s, ss)) >= 0 )
         {
             iommu_debug = val;
@@ -579,6 +582,9 @@ void iommu_share_p2m_table(struct domain* d)
 
 void iommu_crash_shutdown(void)
 {
+    if ( !iommu_crash_shutdown_enable )
+        return;
+
     if ( iommu_enabled )
         iommu_get_ops()->crash_shutdown();
     iommu_enabled = iommu_intremap = iommu_intpost = 0;
-- 
2.7.4


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH] iommu: leave IOMMU enabled by default during kexec crash transition
  2019-02-18 16:21 [PATCH] iommu: leave IOMMU enabled by default during kexec crash transition Igor Druzhinin
@ 2019-02-18 18:30 ` Andrew Cooper
  2019-02-19  7:43   ` Jan Beulich
  2019-02-19  8:13 ` Jan Beulich
  1 sibling, 1 reply; 18+ messages in thread
From: Andrew Cooper @ 2019-02-18 18:30 UTC (permalink / raw)
  To: Igor Druzhinin, jbeulich, xen-devel
  Cc: George.Dunlap, julien.grall, wei.liu2, roger.pau

On 18/02/2019 16:21, Igor Druzhinin wrote:
> It's unsafe to disable IOMMU on a live system which is the case
> if we're crashing since remapping hardware doesn't usually know what
> to do with ongoing bus transactions and frequently raises NMI/MCE/SMI,
> etc. (depends on the firmware configuration) to signal these abnormalities.
> This, in turn, doesn't play well with kexec transition process as there is
> no any handling available at the moment for this kind of events resulting
> in failures to enter the kernel.
>
> Modern Linux kernels taught to copy all the necessary DMAR/IR tables
> following kexec from the previous kernel (Xen in our case) - so it's
> currently normal to keep IOMMU enabled. It would only require to change
> crash kernel command line by enabling IOMMU drivers from the existing users.
>
> An option is left for compatibility with ancient crash kernels which
> didn't like to have IOMMU active under their feet on boot.
>
> Signed-off-by: Igor Druzhinin <igor.druzhinin@citrix.com>

To provide a bit of extra background, it turns out that in hindsight,
turning off the IOMMU in a crash usually makes things worse rather than
better.

In particular, any guest with a PCI device which happens to allocate a
DMA buffer in GFN space which matches the crash region in MFN space will
end up corrupting the crash kernel when DMA remapping gets turned off.

Being able to boot with an IOMMU already active is becoming common, not
least because of the ongoing efforts to enforce pre-DXE DMA protection
to protect against cold-boot DMA rootkits.

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] iommu: leave IOMMU enabled by default during kexec crash transition
  2019-02-18 18:30 ` Andrew Cooper
@ 2019-02-19  7:43   ` Jan Beulich
  2019-02-19 12:31     ` Igor Druzhinin
  2019-02-19 21:19     ` Andrew Cooper
  0 siblings, 2 replies; 18+ messages in thread
From: Jan Beulich @ 2019-02-19  7:43 UTC (permalink / raw)
  To: Andrew Cooper, Igor Druzhinin
  Cc: George Dunlap, xen-devel, Julien Grall, Wei Liu, Roger Pau Monne

>>> On 18.02.19 at 19:30, <andrew.cooper3@citrix.com> wrote:
> On 18/02/2019 16:21, Igor Druzhinin wrote:
>> It's unsafe to disable IOMMU on a live system which is the case
>> if we're crashing since remapping hardware doesn't usually know what
>> to do with ongoing bus transactions and frequently raises NMI/MCE/SMI,
>> etc. (depends on the firmware configuration) to signal these abnormalities.
>> This, in turn, doesn't play well with kexec transition process as there is
>> no any handling available at the moment for this kind of events resulting
>> in failures to enter the kernel.
>>
>> Modern Linux kernels taught to copy all the necessary DMAR/IR tables
>> following kexec from the previous kernel (Xen in our case) - so it's
>> currently normal to keep IOMMU enabled. It would only require to change
>> crash kernel command line by enabling IOMMU drivers from the existing users.

Is this the normal option ("intel_iommu=on" in the Intel case), or
rather something special? Considering that you explicitly talk about
Linux here anyway, mentioning the option(s) explicitly would seem
to make sense.

>> An option is left for compatibility with ancient crash kernels which
>> didn't like to have IOMMU active under their feet on boot.
>>
>> Signed-off-by: Igor Druzhinin <igor.druzhinin@citrix.com>
> 
> To provide a bit of extra background, it turns out that in hindsight,
> turning off the IOMMU in a crash usually makes things worse rather than
> better.

For an unknown definition of "usually". Corrupted (IOMMU) page
tables are not really an impossible crash reason.

> In particular, any guest with a PCI device which happens to allocate a
> DMA buffer in GFN space which matches the crash region in MFN space will
> end up corrupting the crash kernel when DMA remapping gets turned off.

Indeed, but that's only PVH Dom0 (unsupported as of yet) or PV
Dom0 using PV IOMMU functionality (not even in tree as of yet). So
the question is how applicable this change really is at this point in
time. I notice it hasn't been tagged for 4.12, so please don't take
this as objection to it going in - I'm only trying to better understand
the implications.

> Being able to boot with an IOMMU already active is becoming common, not
> least because of the ongoing efforts to enforce pre-DXE DMA protection
> to protect against cold-boot DMA rootkits.

What about the interrupt remapping part of the IOMMU functionality?

Jan



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] iommu: leave IOMMU enabled by default during kexec crash transition
  2019-02-18 16:21 [PATCH] iommu: leave IOMMU enabled by default during kexec crash transition Igor Druzhinin
  2019-02-18 18:30 ` Andrew Cooper
@ 2019-02-19  8:13 ` Jan Beulich
  2019-02-19 12:08   ` Igor Druzhinin
  1 sibling, 1 reply; 18+ messages in thread
From: Jan Beulich @ 2019-02-19  8:13 UTC (permalink / raw)
  To: Igor Druzhinin
  Cc: Wei Liu, George Dunlap, Andrew Cooper, Julien Grall, xen-devel,
	Roger Pau Monne

>>> On 18.02.19 at 17:21, <igor.druzhinin@citrix.com> wrote:

First of all - please follow patch submission rules: They get sent _to_
the list, with maintainers and others _cc_-ed.

> It's unsafe to disable IOMMU on a live system which is the case
> if we're crashing since remapping hardware doesn't usually know what
> to do with ongoing bus transactions and frequently raises NMI/MCE/SMI,
> etc. (depends on the firmware configuration) to signal these abnormalities.
> This, in turn, doesn't play well with kexec transition process as there is
> no any handling available at the moment for this kind of events resulting
> in failures to enter the kernel.
> 
> Modern Linux kernels taught to copy all the necessary DMAR/IR tables
> following kexec from the previous kernel (Xen in our case) - so it's
> currently normal to keep IOMMU enabled. It would only require to change
> crash kernel command line by enabling IOMMU drivers from the existing users.
> 
> An option is left for compatibility with ancient crash kernels which
> didn't like to have IOMMU active under their feet on boot.
> 
> Signed-off-by: Igor Druzhinin <igor.druzhinin@citrix.com>
> ---
> 
> Jan, Andrew, should we have this option here and, if so, what is the default
> value for it should be?

I think the option should definitely be here (I can't even see what
the alternative would be, as then the patch would be effectively
be deletion of the iommu_crash_shutdown() invocation, just for a
patch adding the option to re-instate it.

The default is more difficult to choose: Keeping the IOMMU on for
an unaware crash kernel is perhaps about as bad as turning it off
when the crash kernel would know how to deal with it.

Wouldn't it be possible to allow the kexec tool to specify the
intended behavior via a (flag to a) hypercall? How is adding of the
respective command line option controlled in the bare metal Linux
case?

> --- a/docs/misc/xen-command-line.pandoc
> +++ b/docs/misc/xen-command-line.pandoc
> @@ -1235,6 +1235,11 @@ boolean (e.g. `iommu=no`) can override this and leave the IOMMUs disabled.
>      This option depends on `intremap`, and is disabled by default due to some
>      corner cases in the implementation which have yet to be resolved.
>  
> +*   The `crash-shutdown` boolean controls shutting down IOMMU before switching
> +    to a crash kernel through kexec. This option is inactive by default and
> +    is for compatibility with older kexec kernels only as modern kernels copy
> +    all the necessary tables from the previous kernel following kexec transition.

You also want to append to the "List of" at the top of this option's
description.

> --- a/xen/arch/x86/crash.c
> +++ b/xen/arch/x86/crash.c
> @@ -162,8 +162,9 @@ static void nmi_shootdown_cpus(void)
>          printk("Failed to shoot down CPUs {%*pbl}\n",
>                 nr_cpu_ids, cpumask_bits(&waiting_to_crash));
>  
> -    /* Crash shutdown any IOMMU functionality as the crashdump kernel is not
> -     * happy when booting if interrupt/dma remapping is still enabled */
> +    /* Try to crash shutdown IOMMU functionality as some old crashdump
> +     * kernels are not happy when booting if interrupt/dma remapping
> +     * is still enabled */
>      iommu_crash_shutdown();

Please can you correct comment style here seeing the you
re-write it anyway?

> @@ -88,6 +89,8 @@ static int __init parse_iommu_param(const char *s)
>              iommu_intremap = val;
>          else if ( (val = parse_boolean("intpost", s, ss)) >= 0 )
>              iommu_intpost = val;
> +        else if ( (val = parse_boolean("crash-shutdown", s, ss)) >= 0 )
> +            iommu_crash_shutdown_enable = val;

#ifdef CONFIG_KEXEC ?

> @@ -579,6 +582,9 @@ void iommu_share_p2m_table(struct domain* d)
>  
>  void iommu_crash_shutdown(void)
>  {
> +    if ( !iommu_crash_shutdown_enable )
> +        return;

How to read this is very ambiguous - the way I've been reading it
first ("enable IOMMU on crash shutdown") the condition seemed
outright wrong. I think the command line option wants to be
something like "iommu=crash-disable" and the variable then
iommu_crash_disable (subject to further improvement if still
considered potentially ambiguous).

Since it looks as if we leave the IOMMU entirely untouched in the
non-crash-kexec case already, I also wonder whether an inverse
control for that case wouldn't have been desirable already in the
past.

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] iommu: leave IOMMU enabled by default during kexec crash transition
  2019-02-19  8:13 ` Jan Beulich
@ 2019-02-19 12:08   ` Igor Druzhinin
  2019-02-20  8:48     ` Jan Beulich
  0 siblings, 1 reply; 18+ messages in thread
From: Igor Druzhinin @ 2019-02-19 12:08 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Wei Liu, George Dunlap, Andrew Cooper, Julien Grall, xen-devel,
	Roger Pau Monne

On 19/02/2019 08:13, Jan Beulich wrote:
>>>> On 18.02.19 at 17:21, <igor.druzhinin@citrix.com> wrote:
> 
> First of all - please follow patch submission rules: They get sent _to_
> the list, with maintainers and others _cc_-ed.
> 
>> It's unsafe to disable IOMMU on a live system which is the case
>> if we're crashing since remapping hardware doesn't usually know what
>> to do with ongoing bus transactions and frequently raises NMI/MCE/SMI,
>> etc. (depends on the firmware configuration) to signal these abnormalities.
>> This, in turn, doesn't play well with kexec transition process as there is
>> no any handling available at the moment for this kind of events resulting
>> in failures to enter the kernel.
>>
>> Modern Linux kernels taught to copy all the necessary DMAR/IR tables
>> following kexec from the previous kernel (Xen in our case) - so it's
>> currently normal to keep IOMMU enabled. It would only require to change
>> crash kernel command line by enabling IOMMU drivers from the existing users.
>>
>> An option is left for compatibility with ancient crash kernels which
>> didn't like to have IOMMU active under their feet on boot.
>>
>> Signed-off-by: Igor Druzhinin <igor.druzhinin@citrix.com>
>> ---
>>
>> Jan, Andrew, should we have this option here and, if so, what is the default
>> value for it should be?
> 
> I think the option should definitely be here (I can't even see what
> the alternative would be, as then the patch would be effectively
> be deletion of the iommu_crash_shutdown() invocation, just for a
> patch adding the option to re-instate it.
> 
> The default is more difficult to choose: Keeping the IOMMU on for
> an unaware crash kernel is perhaps about as bad as turning it off
> when the crash kernel would know how to deal with it.
> 
> Wouldn't it be possible to allow the kexec tool to specify the
> intended behavior via a (flag to a) hypercall? How is adding of the
> respective command line option controlled in the bare metal Linux
> case?

We certainly don't want to change the existing ABI but as to extending:
Could you give a usecase for kexec tool having such an additional
hypercall? What additional value to the command line option it would add?

I'm not following your last question. Could you elaborate?

>> --- a/docs/misc/xen-command-line.pandoc
>> +++ b/docs/misc/xen-command-line.pandoc
>> @@ -1235,6 +1235,11 @@ boolean (e.g. `iommu=no`) can override this and leave the IOMMUs disabled.
>>      This option depends on `intremap`, and is disabled by default due to some
>>      corner cases in the implementation which have yet to be resolved.
>>  
>> +*   The `crash-shutdown` boolean controls shutting down IOMMU before switching
>> +    to a crash kernel through kexec. This option is inactive by default and
>> +    is for compatibility with older kexec kernels only as modern kernels copy
>> +    all the necessary tables from the previous kernel following kexec transition.
> 
> You also want to append to the "List of" at the top of this option's
> description.
> 

Will do.

>> --- a/xen/arch/x86/crash.c
>> +++ b/xen/arch/x86/crash.c
>> @@ -162,8 +162,9 @@ static void nmi_shootdown_cpus(void)
>>          printk("Failed to shoot down CPUs {%*pbl}\n",
>>                 nr_cpu_ids, cpumask_bits(&waiting_to_crash));
>>  
>> -    /* Crash shutdown any IOMMU functionality as the crashdump kernel is not
>> -     * happy when booting if interrupt/dma remapping is still enabled */
>> +    /* Try to crash shutdown IOMMU functionality as some old crashdump
>> +     * kernels are not happy when booting if interrupt/dma remapping
>> +     * is still enabled */
>>      iommu_crash_shutdown();
> 
> Please can you correct comment style here seeing the you
> re-write it anyway?
> 

Ok.

>> @@ -88,6 +89,8 @@ static int __init parse_iommu_param(const char *s)
>>              iommu_intremap = val;
>>          else if ( (val = parse_boolean("intpost", s, ss)) >= 0 )
>>              iommu_intpost = val;
>> +        else if ( (val = parse_boolean("crash-shutdown", s, ss)) >= 0 )
>> +            iommu_crash_shutdown_enable = val;
> 
> #ifdef CONFIG_KEXEC ?
> 
>> @@ -579,6 +582,9 @@ void iommu_share_p2m_table(struct domain* d)
>>  
>>  void iommu_crash_shutdown(void)
>>  {
>> +    if ( !iommu_crash_shutdown_enable )
>> +        return;
> 
> How to read this is very ambiguous - the way I've been reading it
> first ("enable IOMMU on crash shutdown") the condition seemed
> outright wrong. I think the command line option wants to be
> something like "iommu=crash-disable" and the variable then
> iommu_crash_disable (subject to further improvement if still
> considered potentially ambiguous).
> 

I can't see how "crash-shutdown" is ambiguous but if you prefer
"crash-disable" I'll rename it.

Igor

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] iommu: leave IOMMU enabled by default during kexec crash transition
  2019-02-19  7:43   ` Jan Beulich
@ 2019-02-19 12:31     ` Igor Druzhinin
  2019-02-20  8:55       ` Jan Beulich
  2019-02-19 21:19     ` Andrew Cooper
  1 sibling, 1 reply; 18+ messages in thread
From: Igor Druzhinin @ 2019-02-19 12:31 UTC (permalink / raw)
  To: Jan Beulich, Andrew Cooper
  Cc: George Dunlap, xen-devel, Julien Grall, Wei Liu, Roger Pau Monne

On 19/02/2019 07:43, Jan Beulich wrote:
>>>> On 18.02.19 at 19:30, <andrew.cooper3@citrix.com> wrote:
>> On 18/02/2019 16:21, Igor Druzhinin wrote:
>>> It's unsafe to disable IOMMU on a live system which is the case
>>> if we're crashing since remapping hardware doesn't usually know what
>>> to do with ongoing bus transactions and frequently raises NMI/MCE/SMI,
>>> etc. (depends on the firmware configuration) to signal these abnormalities.
>>> This, in turn, doesn't play well with kexec transition process as there is
>>> no any handling available at the moment for this kind of events resulting
>>> in failures to enter the kernel.
>>>
>>> Modern Linux kernels taught to copy all the necessary DMAR/IR tables
>>> following kexec from the previous kernel (Xen in our case) - so it's
>>> currently normal to keep IOMMU enabled. It would only require to change
>>> crash kernel command line by enabling IOMMU drivers from the existing users.
> 
> Is this the normal option ("intel_iommu=on" in the Intel case), or
> rather something special? Considering that you explicitly talk about
> Linux here anyway, mentioning the option(s) explicitly would seem
> to make sense.
> 

Yes, it's intel_iommu / amd_iommu (which is enabled by default now).
I'll give an example in the commit message.

>>> An option is left for compatibility with ancient crash kernels which
>>> didn't like to have IOMMU active under their feet on boot.
>>>
>>> Signed-off-by: Igor Druzhinin <igor.druzhinin@citrix.com>
>>
>> To provide a bit of extra background, it turns out that in hindsight,
>> turning off the IOMMU in a crash usually makes things worse rather than
>> better.
> 
> For an unknown definition of "usually". Corrupted (IOMMU) page
> tables are not really an impossible crash reason.
> 

Well, that still falls into the definition of "usually". Having IOMMU
pages corrupted given proper isolation of devices sounds unlikely.

>> In particular, any guest with a PCI device which happens to allocate a
>> DMA buffer in GFN space which matches the crash region in MFN space will
>> end up corrupting the crash kernel when DMA remapping gets turned off.
> 
> Indeed, but that's only PVH Dom0 (unsupported as of yet) or PV
> Dom0 using PV IOMMU functionality (not even in tree as of yet). So
> the question is how applicable this change really is at this point in
> time. I notice it hasn't been tagged for 4.12, so please don't take
> this as objection to it going in - I'm only trying to better understand
> the implications.
> 

I think the more prevalent case that I wanted to cover here is the one
described in the commit message: assuming you have ongoing DMAs anywhere
in the system with IOMMU enabled hardware will raise an event (that
we're unable to handle) as soon as you turn IOMMU translation off. This
happens so commonly now that we cannot simply tolerate it happening
sporadically.

And I have no strong feelings if it doesn't get into 4.12.

Igor

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] iommu: leave IOMMU enabled by default during kexec crash transition
  2019-02-19  7:43   ` Jan Beulich
  2019-02-19 12:31     ` Igor Druzhinin
@ 2019-02-19 21:19     ` Andrew Cooper
  2019-02-20  9:01       ` Jan Beulich
  1 sibling, 1 reply; 18+ messages in thread
From: Andrew Cooper @ 2019-02-19 21:19 UTC (permalink / raw)
  To: Jan Beulich, Igor Druzhinin
  Cc: George Dunlap, xen-devel, Julien Grall, Wei Liu, Roger Pau Monne

On 19/02/2019 07:43, Jan Beulich wrote:
>
>>> An option is left for compatibility with ancient crash kernels which
>>> didn't like to have IOMMU active under their feet on boot.
>>>
>>> Signed-off-by: Igor Druzhinin <igor.druzhinin@citrix.com>
>> To provide a bit of extra background, it turns out that in hindsight,
>> turning off the IOMMU in a crash usually makes things worse rather than
>> better.
> For an unknown definition of "usually". Corrupted (IOMMU) page
> tables are not really an impossible crash reason.

And?  Why is this relevant in context?

>
>> In particular, any guest with a PCI device which happens to allocate a
>> DMA buffer in GFN space which matches the crash region in MFN space will
>> end up corrupting the crash kernel when DMA remapping gets turned off.
> Indeed, but that's only PVH Dom0 (unsupported as of yet) or PV
> Dom0 using PV IOMMU functionality (not even in tree as of yet).

It is every single HVM guest with a PCI device.

The kexec/crash path is very broken already in Xen as soon as any kind
of PCI Passthrough is in use.

>> Being able to boot with an IOMMU already active is becoming common, not
>> least because of the ongoing efforts to enforce pre-DXE DMA protection
>> to protect against cold-boot DMA rootkits.
> What about the interrupt remapping part of the IOMMU functionality?

What about it?  It is a necessary part of protection against rogue devices.

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] iommu: leave IOMMU enabled by default during kexec crash transition
  2019-02-19 12:08   ` Igor Druzhinin
@ 2019-02-20  8:48     ` Jan Beulich
  2019-02-20 18:19       ` Igor Druzhinin
  0 siblings, 1 reply; 18+ messages in thread
From: Jan Beulich @ 2019-02-20  8:48 UTC (permalink / raw)
  To: Igor Druzhinin
  Cc: Wei Liu, George Dunlap, Andrew Cooper, Julien Grall, xen-devel,
	Roger Pau Monne

>>> On 19.02.19 at 13:08, <igor.druzhinin@citrix.com> wrote:
> On 19/02/2019 08:13, Jan Beulich wrote:
>>>>> On 18.02.19 at 17:21, <igor.druzhinin@citrix.com> wrote:
>>> Jan, Andrew, should we have this option here and, if so, what is the default
>>> value for it should be?
>> 
>> I think the option should definitely be here (I can't even see what
>> the alternative would be, as then the patch would be effectively
>> be deletion of the iommu_crash_shutdown() invocation, just for a
>> patch adding the option to re-instate it.
>> 
>> The default is more difficult to choose: Keeping the IOMMU on for
>> an unaware crash kernel is perhaps about as bad as turning it off
>> when the crash kernel would know how to deal with it.
>> 
>> Wouldn't it be possible to allow the kexec tool to specify the
>> intended behavior via a (flag to a) hypercall? How is adding of the
>> respective command line option controlled in the bare metal Linux
>> case?
> 
> We certainly don't want to change the existing ABI but as to extending:
> Could you give a usecase for kexec tool having such an additional
> hypercall? What additional value to the command line option it would add?

Some entity needs to decide whether to add the respective command
line option to the crash kernel's command line. It should be this same
entity to tell Xen whether to keep the IOMMU enabled while invoking
the crash kernel. I am merely guessing that this entity is the kexec
tool.

> I'm not following your last question. Could you elaborate?

I'm simply asking what the bare metal equivalent behavior is here,
i.e. how command line option addition and IOMMU state are being
kept in sync in that case. Just for reference, in the hope that what
they do is sane, and hence we could follow that model.

>>> @@ -88,6 +89,8 @@ static int __init parse_iommu_param(const char *s)
>>>              iommu_intremap = val;
>>>          else if ( (val = parse_boolean("intpost", s, ss)) >= 0 )
>>>              iommu_intpost = val;
>>> +        else if ( (val = parse_boolean("crash-shutdown", s, ss)) >= 0 )
>>> +            iommu_crash_shutdown_enable = val;
>> 
>> #ifdef CONFIG_KEXEC ?
>> 
>>> @@ -579,6 +582,9 @@ void iommu_share_p2m_table(struct domain* d)
>>>  
>>>  void iommu_crash_shutdown(void)
>>>  {
>>> +    if ( !iommu_crash_shutdown_enable )
>>> +        return;
>> 
>> How to read this is very ambiguous - the way I've been reading it
>> first ("enable IOMMU on crash shutdown") the condition seemed
>> outright wrong. I think the command line option wants to be
>> something like "iommu=crash-disable" and the variable then
>> iommu_crash_disable (subject to further improvement if still
>> considered potentially ambiguous).
>> 
> 
> I can't see how "crash-shutdown" is ambiguous but if you prefer
> "crash-disable" I'll rename it.

I've explained how I read this initially. The main problem I see here
is that "shutdown" in this situation has two meanings - the fact
that we shut down (the entire system) in preparation of invoking
the crash kernel (that's what in my understanding the function's
name was derived from), and "shutting down" (i.e. disabling) of
the IOMMU in this case (which in my understanding is what you've
based the option and symbol names upon).

Commonly we don't use "shut down" for an action on the IOMMU.
We "enable" or "disable" it. Hence my alternative name suggestion.

Jan



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] iommu: leave IOMMU enabled by default during kexec crash transition
  2019-02-19 12:31     ` Igor Druzhinin
@ 2019-02-20  8:55       ` Jan Beulich
  2019-02-20 19:05         ` Igor Druzhinin
  0 siblings, 1 reply; 18+ messages in thread
From: Jan Beulich @ 2019-02-20  8:55 UTC (permalink / raw)
  To: Igor Druzhinin
  Cc: Wei Liu, George Dunlap, Andrew Cooper, Julien Grall, xen-devel,
	Roger Pau Monne

>>> On 19.02.19 at 13:31, <igor.druzhinin@citrix.com> wrote:
> On 19/02/2019 07:43, Jan Beulich wrote:
>>>>> On 18.02.19 at 19:30, <andrew.cooper3@citrix.com> wrote:
>>> On 18/02/2019 16:21, Igor Druzhinin wrote:
>>>> It's unsafe to disable IOMMU on a live system which is the case
>>>> if we're crashing since remapping hardware doesn't usually know what
>>>> to do with ongoing bus transactions and frequently raises NMI/MCE/SMI,
>>>> etc. (depends on the firmware configuration) to signal these abnormalities.
>>>> This, in turn, doesn't play well with kexec transition process as there is
>>>> no any handling available at the moment for this kind of events resulting
>>>> in failures to enter the kernel.
>>>>
>>>> Modern Linux kernels taught to copy all the necessary DMAR/IR tables
>>>> following kexec from the previous kernel (Xen in our case) - so it's
>>>> currently normal to keep IOMMU enabled. It would only require to change
>>>> crash kernel command line by enabling IOMMU drivers from the existing users.
>> 
>> Is this the normal option ("intel_iommu=on" in the Intel case), or
>> rather something special? Considering that you explicitly talk about
>> Linux here anyway, mentioning the option(s) explicitly would seem
>> to make sense.
>> 
> 
> Yes, it's intel_iommu / amd_iommu (which is enabled by default now).

Not as far as I can tell, at least in the Intel case: There's still
CONFIG_INTEL_IOMMU_DEFAULT_ON. (As an aside, while unlikely, it's
also still possible to build kernels without IOMMU support inn the first
place.)

> I'll give an example in the commit message.

Thanks.

>>>> An option is left for compatibility with ancient crash kernels which
>>>> didn't like to have IOMMU active under their feet on boot.
>>>>
>>>> Signed-off-by: Igor Druzhinin <igor.druzhinin@citrix.com>
>>>
>>> To provide a bit of extra background, it turns out that in hindsight,
>>> turning off the IOMMU in a crash usually makes things worse rather than
>>> better.
>> 
>> For an unknown definition of "usually". Corrupted (IOMMU) page
>> tables are not really an impossible crash reason.
>> 
> 
> Well, that still falls into the definition of "usually". Having IOMMU
> pages corrupted given proper isolation of devices sounds unlikely.

Everything, absolutely everything is possible as a cause for a crash.
I don't see why device isolation would matter here at all. Page table
corruption (be it IOMMU or CPU one) can be caused by
malfunctioning kernel code, entirely unrelated to what devices do.

>>> In particular, any guest with a PCI device which happens to allocate a
>>> DMA buffer in GFN space which matches the crash region in MFN space will
>>> end up corrupting the crash kernel when DMA remapping gets turned off.
>> 
>> Indeed, but that's only PVH Dom0 (unsupported as of yet) or PV
>> Dom0 using PV IOMMU functionality (not even in tree as of yet). So
>> the question is how applicable this change really is at this point in
>> time. I notice it hasn't been tagged for 4.12, so please don't take
>> this as objection to it going in - I'm only trying to better understand
>> the implications.
>> 
> 
> I think the more prevalent case that I wanted to cover here is the one
> described in the commit message: assuming you have ongoing DMAs anywhere
> in the system with IOMMU enabled hardware will raise an event (that
> we're unable to handle) as soon as you turn IOMMU translation off. This
> happens so commonly now that we cannot simply tolerate it happening
> sporadically.

Well, as validly pointed out by Andrew, I did wrongly limit my thinking
to Dom0 only here. So please disregard this part of my reply.

Jan



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] iommu: leave IOMMU enabled by default during kexec crash transition
  2019-02-19 21:19     ` Andrew Cooper
@ 2019-02-20  9:01       ` Jan Beulich
  2019-02-20 18:57         ` Igor Druzhinin
  0 siblings, 1 reply; 18+ messages in thread
From: Jan Beulich @ 2019-02-20  9:01 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Igor Druzhinin, Wei Liu, George Dunlap, Julien Grall, xen-devel,
	Roger Pau Monne

>>> On 19.02.19 at 22:19, <andrew.cooper3@citrix.com> wrote:
> On 19/02/2019 07:43, Jan Beulich wrote:
>>
>>>> An option is left for compatibility with ancient crash kernels which
>>>> didn't like to have IOMMU active under their feet on boot.
>>>>
>>>> Signed-off-by: Igor Druzhinin <igor.druzhinin@citrix.com>
>>> To provide a bit of extra background, it turns out that in hindsight,
>>> turning off the IOMMU in a crash usually makes things worse rather than
>>> better.
>> For an unknown definition of "usually". Corrupted (IOMMU) page
>> tables are not really an impossible crash reason.
> 
> And?  Why is this relevant in context?

Because our chances of recovering (with the IOMMU still enabled)
depend on uncorrupted page tables for at least those parts of the
address space to/from which I/O is still in flight.

>>> In particular, any guest with a PCI device which happens to allocate a
>>> DMA buffer in GFN space which matches the crash region in MFN space will
>>> end up corrupting the crash kernel when DMA remapping gets turned off.
>> Indeed, but that's only PVH Dom0 (unsupported as of yet) or PV
>> Dom0 using PV IOMMU functionality (not even in tree as of yet).
> 
> It is every single HVM guest with a PCI device.
> 
> The kexec/crash path is very broken already in Xen as soon as any kind
> of PCI Passthrough is in use.

Indeed, as said in the other reply to Sergey, I did wrongly consider
Dom0 only here.

>>> Being able to boot with an IOMMU already active is becoming common, not
>>> least because of the ongoing efforts to enforce pre-DXE DMA protection
>>> to protect against cold-boot DMA rootkits.
>> What about the interrupt remapping part of the IOMMU functionality?
> 
> What about it?  It is a necessary part of protection against rogue devices.

But isn't it a valid question whether keeping interrupt remapping
enabled is helpful or potentially making things worse? The
description of the patch discusses the DMA translation aspects
only. Unless the crash kernel would always operate in polling
mode only, it needs to have interrupts routed to the right
handler(s). Whether that's guaranteed with remapping left
enabled is not something that goes without saying, imo.

Jan



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] iommu: leave IOMMU enabled by default during kexec crash transition
  2019-02-20  8:48     ` Jan Beulich
@ 2019-02-20 18:19       ` Igor Druzhinin
  2019-02-22  9:52         ` Jan Beulich
  0 siblings, 1 reply; 18+ messages in thread
From: Igor Druzhinin @ 2019-02-20 18:19 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Wei Liu, George Dunlap, Andrew Cooper, Julien Grall, xen-devel,
	Roger Pau Monne

On 20/02/2019 08:48, Jan Beulich wrote:
> 
> Some entity needs to decide whether to add the respective command
> line option to the crash kernel's command line. It should be this same
> entity to tell Xen whether to keep the IOMMU enabled while invoking
> the crash kernel. I am merely guessing that this entity is the kexec
> tool.
> 

I was just double checking and it seems (assuming the device reset
correctly in the crash kernel) everything seem to work even without
enabling intel_iommu in the command line - newer kernels handle this
case by explicitly disabling translation in any case: they expect it to
be enabled by the previous kernel.

I intended Xen command line option be more of a fallback mechanism for
those still using ancient kdump kernels. For all the others the
transition should be transparent. With the above said, having an
orchestration tool on top of that seems a bit of overkill.

>> I'm not following your last question. Could you elaborate?
> 
> I'm simply asking what the bare metal equivalent behavior is here,
> i.e. how command line option addition and IOMMU state are being
> kept in sync in that case. Just for reference, in the hope that what
> they do is sane, and hence we could follow that model.
> 

As mentioned above, Linux (as the main kernel) currently only supports
secondary kernels that expect IOMMU to be enabled and have basic DMAR
understanding which means crash kernel is supposed to be up to date with
the main one. On the other hand, Linux (as a crash kernel) expects the
main kernel (which might be Xen) to do either of things: keep IOMMU
enabled or disabled - both cases are handled.

Igor


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] iommu: leave IOMMU enabled by default during kexec crash transition
  2019-02-20  9:01       ` Jan Beulich
@ 2019-02-20 18:57         ` Igor Druzhinin
  0 siblings, 0 replies; 18+ messages in thread
From: Igor Druzhinin @ 2019-02-20 18:57 UTC (permalink / raw)
  To: Jan Beulich, Andrew Cooper
  Cc: George Dunlap, xen-devel, Julien Grall, Wei Liu, Roger Pau Monne

On 20/02/2019 09:01, Jan Beulich wrote:
> But isn't it a valid question whether keeping interrupt remapping
> enabled is helpful or potentially making things worse? The
> description of the patch discusses the DMA translation aspects
> only. Unless the crash kernel would always operate in polling
> mode only, it needs to have interrupts routed to the right
> handler(s). Whether that's guaranteed with remapping left
> enabled is not something that goes without saying, imo.
> 

IR copy logic works as follows - IRT is copied as is and still in
functioning state while remapable interrupts might be in-flight. New IRT
entries are allocated in addition to existing ones which makes the
transition transparent for a device. I'll mention this aspect in the
commit message since it doesn't seem obvious - you're right.

Igor

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] iommu: leave IOMMU enabled by default during kexec crash transition
  2019-02-20  8:55       ` Jan Beulich
@ 2019-02-20 19:05         ` Igor Druzhinin
  2019-02-22  9:49           ` Jan Beulich
  0 siblings, 1 reply; 18+ messages in thread
From: Igor Druzhinin @ 2019-02-20 19:05 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Wei Liu, George Dunlap, Andrew Cooper, Julien Grall, xen-devel,
	Roger Pau Monne

On 20/02/2019 08:55, Jan Beulich wrote:
> 
> Everything, absolutely everything is possible as a cause for a crash.
> I don't see why device isolation would matter here at all. Page table
> corruption (be it IOMMU or CPU one) can be caused by
> malfunctioning kernel code, entirely unrelated to what devices do.
> 

Yes, I blindly assumed that stray DMA is the most common source of
kernel data corruption. Unfortunately, we don't really have a choice now
- almost all platforms these days raise some form of an error in the
event of unhanded translation fault (at least most of those we have now
in our labs). So it's either slight chance of entering the crash kernel
with corrupted IOMMU pages or not entering it at all in most of the cases.

Igor

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] iommu: leave IOMMU enabled by default during kexec crash transition
  2019-02-20 19:05         ` Igor Druzhinin
@ 2019-02-22  9:49           ` Jan Beulich
  0 siblings, 0 replies; 18+ messages in thread
From: Jan Beulich @ 2019-02-22  9:49 UTC (permalink / raw)
  To: Igor Druzhinin
  Cc: Wei Liu, George Dunlap, Andrew Cooper, Julien Grall, xen-devel,
	Roger Pau Monne

>>> On 20.02.19 at 20:05, <igor.druzhinin@citrix.com> wrote:
> On 20/02/2019 08:55, Jan Beulich wrote:
>> 
>> Everything, absolutely everything is possible as a cause for a crash.
>> I don't see why device isolation would matter here at all. Page table
>> corruption (be it IOMMU or CPU one) can be caused by
>> malfunctioning kernel code, entirely unrelated to what devices do.
>> 
> 
> Yes, I blindly assumed that stray DMA is the most common source of
> kernel data corruption. Unfortunately, we don't really have a choice now
> - almost all platforms these days raise some form of an error in the
> event of unhanded translation fault (at least most of those we have now
> in our labs). So it's either slight chance of entering the crash kernel
> with corrupted IOMMU pages or not entering it at all in most of the cases.

And I'm fine with this, as long as the new (even if smaller) risk is
mentioned in the description.

Jan



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] iommu: leave IOMMU enabled by default during kexec crash transition
  2019-02-20 18:19       ` Igor Druzhinin
@ 2019-02-22  9:52         ` Jan Beulich
  2019-02-22 12:40           ` Igor Druzhinin
  0 siblings, 1 reply; 18+ messages in thread
From: Jan Beulich @ 2019-02-22  9:52 UTC (permalink / raw)
  To: Igor Druzhinin
  Cc: Wei Liu, George Dunlap, Andrew Cooper, Julien Grall, xen-devel,
	Roger Pau Monne

>>> On 20.02.19 at 19:19, <igor.druzhinin@citrix.com> wrote:
> On 20/02/2019 08:48, Jan Beulich wrote:
>> 
>> Some entity needs to decide whether to add the respective command
>> line option to the crash kernel's command line. It should be this same
>> entity to tell Xen whether to keep the IOMMU enabled while invoking
>> the crash kernel. I am merely guessing that this entity is the kexec
>> tool.
>> 
> 
> I was just double checking and it seems (assuming the device reset
> correctly in the crash kernel) everything seem to work even without
> enabling intel_iommu in the command line - newer kernels handle this
> case by explicitly disabling translation in any case: they expect it to
> be enabled by the previous kernel.

So if they disable translation, how is them doing so any better than us
doing so, other than theirs occurring slightly later on the time scale and
hence there being better chances of in-flight (at the time of the crash)
DMA having completed meanwhile?

Jan



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] iommu: leave IOMMU enabled by default during kexec crash transition
  2019-02-22  9:52         ` Jan Beulich
@ 2019-02-22 12:40           ` Igor Druzhinin
  2019-02-22 12:51             ` Jan Beulich
  0 siblings, 1 reply; 18+ messages in thread
From: Igor Druzhinin @ 2019-02-22 12:40 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Wei Liu, George Dunlap, Andrew Cooper, Julien Grall, xen-devel,
	Roger Pau Monne

On 22/02/2019 09:52, Jan Beulich wrote:
>>>> On 20.02.19 at 19:19, <igor.druzhinin@citrix.com> wrote:
>> On 20/02/2019 08:48, Jan Beulich wrote:
>>>
>>> Some entity needs to decide whether to add the respective command
>>> line option to the crash kernel's command line. It should be this same
>>> entity to tell Xen whether to keep the IOMMU enabled while invoking
>>> the crash kernel. I am merely guessing that this entity is the kexec
>>> tool.
>>>
>>
>> I was just double checking and it seems (assuming the device reset
>> correctly in the crash kernel) everything seem to work even without
>> enabling intel_iommu in the command line - newer kernels handle this
>> case by explicitly disabling translation in any case: they expect it to
>> be enabled by the previous kernel.
> 
> So if they disable translation, how is them doing so any better than us
> doing so, other than theirs occurring slightly later on the time scale and
> hence there being better chances of in-flight (at the time of the crash)
> DMA having completed meanwhile?
> 

There are several reasons why it's better:
a) kernel is able to perform device reset properly as it has bus
specific code that does this. There is even a comment in the code
mentioning that at the moment it disables the translation bus-specific
reset is finished and it's safer (as devices likely stopped DMA at this
point) to do it now.
b) kernel has the drivers that do per-driver-specific reset of the
devices that do not work well with bus-specific reset. It's simply
impossible to implement that in Xen
c) even if a device is uncooperative and keeps sending bus transactions,
an error event that I mentioned earlier will be properly handled as we
have facilities for it in the kernel at that point

Igor

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] iommu: leave IOMMU enabled by default during kexec crash transition
  2019-02-22 12:40           ` Igor Druzhinin
@ 2019-02-22 12:51             ` Jan Beulich
  2019-02-22 13:12               ` Igor Druzhinin
  0 siblings, 1 reply; 18+ messages in thread
From: Jan Beulich @ 2019-02-22 12:51 UTC (permalink / raw)
  To: Igor Druzhinin
  Cc: Wei Liu, George Dunlap, Andrew Cooper, Julien Grall, xen-devel,
	Roger Pau Monne

>>> On 22.02.19 at 13:40, <igor.druzhinin@citrix.com> wrote:
> On 22/02/2019 09:52, Jan Beulich wrote:
>>>>> On 20.02.19 at 19:19, <igor.druzhinin@citrix.com> wrote:
>>> On 20/02/2019 08:48, Jan Beulich wrote:
>>>>
>>>> Some entity needs to decide whether to add the respective command
>>>> line option to the crash kernel's command line. It should be this same
>>>> entity to tell Xen whether to keep the IOMMU enabled while invoking
>>>> the crash kernel. I am merely guessing that this entity is the kexec
>>>> tool.
>>>>
>>>
>>> I was just double checking and it seems (assuming the device reset
>>> correctly in the crash kernel) everything seem to work even without
>>> enabling intel_iommu in the command line - newer kernels handle this
>>> case by explicitly disabling translation in any case: they expect it to
>>> be enabled by the previous kernel.
>> 
>> So if they disable translation, how is them doing so any better than us
>> doing so, other than theirs occurring slightly later on the time scale and
>> hence there being better chances of in-flight (at the time of the crash)
>> DMA having completed meanwhile?
>> 
> 
> There are several reasons why it's better:
> a) kernel is able to perform device reset properly as it has bus
> specific code that does this. There is even a comment in the code
> mentioning that at the moment it disables the translation bus-specific
> reset is finished and it's safer (as devices likely stopped DMA at this
> point) to do it now.
> b) kernel has the drivers that do per-driver-specific reset of the
> devices that do not work well with bus-specific reset. It's simply
> impossible to implement that in Xen
> c) even if a device is uncooperative and keeps sending bus transactions,
> an error event that I mentioned earlier will be properly handled as we
> have facilities for it in the kernel at that point

Okay, c) is a convincing argument. a) and b) are partly only: Iirc
crash kernels don't load unnecessary drivers, so a babbling device
may be left untouched unless generic kernel code can reset or
otherwise silence it.

Jan



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] iommu: leave IOMMU enabled by default during kexec crash transition
  2019-02-22 12:51             ` Jan Beulich
@ 2019-02-22 13:12               ` Igor Druzhinin
  0 siblings, 0 replies; 18+ messages in thread
From: Igor Druzhinin @ 2019-02-22 13:12 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Wei Liu, George Dunlap, Andrew Cooper, Julien Grall, xen-devel,
	Roger Pau Monne

On 22/02/2019 12:51, Jan Beulich wrote:
>>>> On 22.02.19 at 13:40, <igor.druzhinin@citrix.com> wrote:
>> There are several reasons why it's better:
>> a) kernel is able to perform device reset properly as it has bus
>> specific code that does this. There is even a comment in the code
>> mentioning that at the moment it disables the translation bus-specific
>> reset is finished and it's safer (as devices likely stopped DMA at this
>> point) to do it now.
>> b) kernel has the drivers that do per-driver-specific reset of the
>> devices that do not work well with bus-specific reset. It's simply
>> impossible to implement that in Xen
>> c) even if a device is uncooperative and keeps sending bus transactions,
>> an error event that I mentioned earlier will be properly handled as we
>> have facilities for it in the kernel at that point
> 
> Okay, c) is a convincing argument. a) and b) are partly only: Iirc
> crash kernels don't load unnecessary drivers, so a babbling device
> may be left untouched unless generic kernel code can reset or
> otherwise silence it.

Crash kernel loads whatever drivers are necessary to save crash dumps:
if it needs to load RAID controller drivers which is still sending DMAs
it will do it - therefore it will try to reset the device.

In order to avoid scenario in (c) if the device is untouched it's still
unsafe to disable translation while any of the devices on a bus haven't
been properly reset. So the kernel will try it's best to avoid (c) and
will reset all the devices.

I want to also mention that I've tested this particular change in our
lab on about 500 of machines of different ages and classes. It
definitely makes the crash path more reliable - before this change
entering the crash dump failed in ~25-30% of the cases while with this
change reliability is close to 98-99%.

Igor

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2019-02-22 13:12 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-02-18 16:21 [PATCH] iommu: leave IOMMU enabled by default during kexec crash transition Igor Druzhinin
2019-02-18 18:30 ` Andrew Cooper
2019-02-19  7:43   ` Jan Beulich
2019-02-19 12:31     ` Igor Druzhinin
2019-02-20  8:55       ` Jan Beulich
2019-02-20 19:05         ` Igor Druzhinin
2019-02-22  9:49           ` Jan Beulich
2019-02-19 21:19     ` Andrew Cooper
2019-02-20  9:01       ` Jan Beulich
2019-02-20 18:57         ` Igor Druzhinin
2019-02-19  8:13 ` Jan Beulich
2019-02-19 12:08   ` Igor Druzhinin
2019-02-20  8:48     ` Jan Beulich
2019-02-20 18:19       ` Igor Druzhinin
2019-02-22  9:52         ` Jan Beulich
2019-02-22 12:40           ` Igor Druzhinin
2019-02-22 12:51             ` Jan Beulich
2019-02-22 13:12               ` Igor Druzhinin

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.