From: <Peter.Kurfer@gdata.de>
To: <jbeulich@suse.com>
Cc: xen-devel@lists.xenproject.org
Subject: Re: [Xen-devel] Host freezing after "fixing" recursive fault starting in multicalls.c
Date: Wed, 29 Jan 2020 13:52:54 +0000
Message-ID: <7bb4c86ad97445269aee940c1ce07d4f@gdata.de>
In-Reply-To: <fc3bef3c-a10d-2cba-0277-d4a6b32bebf8@suse.com>

> Right, but the bad news is that there are no helpful hypervisor
> messages at all. Sadly this is partly my fault, because I should
> have asked you to do this log collection with a debug hypervisor.
> Most of the possibly interesting messages would appear only there.

> In any event, problems start quite a bit earlier, and typically
> it's the first instance of a problem that is the most helpful to
> analyze, as later ones may be cascade issues. The first sign of
> problems is an overlapping

To be honest, I was already wondering why there were so few logs. I had already found the CMDLINE_XEN options for debug logging, but so far I haven't found any documentation on how to build a debug hypervisor. It also took me some time to work around the fact that I don't have physical access to the server to attach an actual serial cable, and so on.

I will try to compile Xen with debugging enabled and collect more logs afterwards.
Is there anything I should be aware of?


From: Jan Beulich <jbeulich@suse.com>
Sent: Wednesday, 29 January 2020 09:59
To: Kurfer, Peter
Cc: xen-devel@lists.xenproject.org
Subject: Re: Host freezing after "fixing" recursive fault starting in multicalls.c
    
On 29.01.2020 09:29, Peter.Kurfer@gdata.de wrote:
> As requested I configured one host with:
> 
>> loglvl=all guest_loglvl=all
> 
> and collected one day of logs via serial interface:
> 
>  https://drive.google.com/drive/folders/1sQvyNH0Sz28tUeVRZl9mowhB0Htd8ZpO?usp=sharing
> 
> Searching for "error" or "multicalls.c" turns up some stack traces that might be interesting.

Right, but the bad news is that there are no helpful hypervisor
messages at all. Sadly this is partly my fault, because I should
have asked you to do this log collection with a debug hypervisor.
Most of the possibly interesting messages would appear only there.

In any event, problems start quite a bit earlier, and typically
it's the first instance of a problem that is the most helpful to
analyze, as later ones may be cascade issues. The first sign of
problems is an overlapping

[14991.827762] BUG: unable to handle page fault for address: ffff888ae2eb6bd8

and

[14991.828172] WARNING: CPU: 5 PID: 2585 at arch/x86/xen/multicalls.c:102 xen_mc_flush+0x194/0x1c0

on CPUs 8 and 5.
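Since the first instance of a problem is usually the most useful one to analyze, a small helper can pull the earliest kernel BUG/WARNING lines out of a long serial log by their printk timestamps. The following is a minimal Python sketch, not an existing tool. It assumes the dom0 kernel lines carry the usual `[seconds.microseconds]` prefix, possibly preceded by other console text. The function name `first_problems` and the marker list are hypothetical and can be adjusted.

```python
import re
from typing import Iterable, List, Tuple

# Matches a kernel printk timestamp like "[14991.827762] ..." anywhere in the
# line, so prefixes added by the serial console do not prevent a match.
_TS = re.compile(r"\[\s*(\d+\.\d+)\]\s*(.*)$")

# Hypothetical set of message prefixes treated as "problem" lines.
MARKERS = ("BUG:", "WARNING:", "Oops", "general protection fault")


def first_problems(lines: Iterable[str], markers=MARKERS,
                   limit: int = 5) -> List[Tuple[float, str]]:
    """Return the earliest `limit` problem lines as (timestamp, message)."""
    hits = []
    for raw in lines:
        m = _TS.search(raw.rstrip("\n"))
        if not m:
            continue  # no printk timestamp on this line
        ts, msg = float(m.group(1)), m.group(2)
        if msg.startswith(tuple(markers)):
            hits.append((ts, msg))
    # Sort by timestamp so cascade failures come after the first instance.
    hits.sort(key=lambda h: h[0])
    return hits[:limit]
```

Fed the two lines quoted above, it reports the BUG at 14991.827762 before the WARNING at 14991.828172. Sorting by timestamp rather than by file position is deliberate, because messages from different CPUs can be interleaved out of order in the console output.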

> As far as I know, the ACPI errors in the context of IPMI can be ignored.

Looks like so, yes, at least for the purposes here. What I wouldn't
dismiss as a possible reason for problems is the significant number
of temperature-related messages. What I also find at least curious
(but possibly just because I know too little of the respective
aspects of modern kernels) are the recurring __text_poke() instances
on the stack traces. Assuming these are to be expected in the first
place, there might be a race here which is either Xen-specific or
simply has a much better chance of hitting (larger window?) when
running on Xen. But I'm afraid this will need looking into (or at
least commenting on) by a kernel person.

Jan
    
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel


Thread overview: 7+ messages
2020-01-18 18:59 [Xen-devel] Host freezing after "fixing" recursive fault starting in multicalls.c Peter.Kurfer
2020-01-20  9:46 ` Jan Beulich
2020-01-20 12:09   ` Peter.Kurfer
2020-01-20 12:13     ` Jan Beulich
2020-01-29  8:29       ` Peter.Kurfer
2020-01-29  8:59         ` Jan Beulich
2020-01-29 13:52           ` Peter.Kurfer [this message]
