From: Andrew Cooper <andrew.cooper3@citrix.com>
To: "Thimo E." <abc@digithi.de>
Cc: Keir Fraser <keir@xen.org>, Jan Beulich <JBeulich@suse.com>,
	"Dong, Eddie" <eddie.dong@intel.com>,
	Xen-develList <xen-devel@lists.xen.org>,
	"Nakajima, Jun" <jun.nakajima@intel.com>,
	"Zhang, Yang Z" <yang.z.zhang@intel.com>,
	"Zhang, Xiantao" <xiantao.zhang@intel.com>
Subject: Re: cpuidle and un-eoid interrupts at the local apic
Date: Wed, 4 Sep 2013 19:55:22 +0100
Message-ID: <5227821A.9090201@citrix.com>
In-Reply-To: <52277CDA.8010401@digithi.de>

On 04/09/13 19:32, Thimo E. wrote:
> Hello again,
>
> the last two weeks no crash with pinning dom0_vcpus_pin and
> restricting dom0 to 1 cpu. But yesterday it crashed again. So changed
> the command line again to:
>
> iommu=no-intremap noirqbalance com1=115200,8n1,0xe050,0
> console=com1,vga mem=1024G dom0_max_vcpus=4 dom0_mem=752M,max:752M
> watchdog_timeout=300 lowmem_emergency_pool=1M crashkernel=64M@32M
> cpuid_mask_xsave_eax=0
>
> And today server crashed again and produced a lot of debugging
> messages, see attached. The "..." in the logfiles mean that the
> message above the points was repeated very often.
>
> My summary so far:
> - With only 1 cpu attached to dom0 the server was stable for 2 weeks,
> the crash there did not really show any irq problems, see
> crash20130903.txt
>    You can find Andrew's ideas on this in
> http://forums.citrix.com/thread.jspa?messageID=1760771#1760771
> - With more than 1 cpu and irqbalance the server produced the crashes
> I've already posted before
> - Without irqbalance crash with some other fancy output, see
> crash20130904.txt
>
> Next step is to change the network card.
>
> Zhang, any update from your side ? Or do the others have any idea ?
> Could "ioapic_ack=old" help somewhere ?
>
> Best regards
>   Thimo
>

Ok - the second attachment (crash20130903.txt) is the one I have triaged
before, and the crash is impossible given the expected code flow through
the function.

%r14 is calculated as the per-cpu cpu_info, which cannot possibly be
-1 at the point of the fault.  The only explanation is that the
pagefault is a result of a spurious jump to this location.
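
To illustrate why -1 is not a reachable value, here is a minimal sketch of
how a per-cpu info pointer is typically derived from the stack pointer. It
assumes cpu_info sits at the top of a power-of-two-aligned per-cpu stack.
The constants, the struct layout and the name cpu_info_addr() are
hypothetical stand-ins, not copied from the Xen tree.

```c
#include <stdint.h>

/* Assumed values for illustration only: 4 KiB pages, 8-page stacks. */
#define SKETCH_PAGE_SIZE   UINT64_C(4096)
#define SKETCH_STACK_ORDER 3
#define SKETCH_STACK_SIZE  (SKETCH_PAGE_SIZE << SKETCH_STACK_ORDER)

/* Hypothetical stand-in for the real structure; only its size matters. */
struct cpu_info_sketch {
    uint64_t saved_regs[21];
    uint32_t processor_id;
    uint32_t pad;
    uint64_t current_vcpu;
};

/*
 * Round the stack pointer down to the base of its stack, step to the top
 * of the stack, then back off by the size of the structure.
 */
static uint64_t cpu_info_addr(uint64_t sp)
{
    uint64_t stack_base = sp & ~(SKETCH_STACK_SIZE - 1);

    return stack_base + SKETCH_STACK_SIZE - sizeof(struct cpu_info_sketch);
}
```

Whatever the stack pointer is, the low bits of the result are always
SKETCH_STACK_SIZE - sizeof(struct cpu_info_sketch). An all-ones value would
need a one-byte structure, so under these assumptions the calculation itself
cannot produce -1.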

From a quick glance at the other crash, vector 2e was the problematic
one (iirc).  The "Bad vmexit (reason 3)" at the top would suggest that
something on the system has sent an INIT to pcpu 2, which seems antisocial.
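
For reference, the mapping from "reason 3" to INIT follows the basic VM-exit
reason numbering in the Intel SDM. A minimal sketch of the decode, covering
only the first five codes; the function name vmx_exit_reason_name() is
hypothetical and is not the routine that printed the message:

```c
/*
 * Basic VM-exit reasons 0-4 as listed in the Intel SDM. Anything else is
 * deliberately left out of this sketch.
 */
static const char *vmx_exit_reason_name(unsigned int reason)
{
    static const char *const names[] = {
        "Exception or NMI",
        "External interrupt",
        "Triple fault",
        "INIT signal",
        "Start-up IPI (SIPI)",
    };
    /* The basic exit reason occupies the low 16 bits of the field. */
    unsigned int basic = reason & 0xffffu;

    if (basic < sizeof(names) / sizeof(names[0]))
        return names[basic];

    return "(not covered by this sketch)";
}
```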

As we have identified that the hardware is delivering invalid
interrupts, I wouldn't necessarily read any more into this new crash;
something is very broken in the hardware.

I would be interested in any update from Intel regarding the ISR violation.

~Andrew
