From: Sander Eikelenboom <linux@eikelenboom.it>
To: Juergen Gross <jgross@suse.com>,
	Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: xen-devel@lists.xen.org
Subject: Re: Resend: Linux 4.11-rc7: kernel BUG at drivers/xen/events/events_base.c:1221
Date: Tue, 25 Apr 2017 13:54:52 +0200
Message-ID: <4f77989d-73b8-d156-15df-74763a6c4bff@eikelenboom.it>
In-Reply-To: <ee75cb6c-677e-2e51-44ce-74d0e8593169@suse.com>

On 25/04/17 13:38, Juergen Gross wrote:
> On 25/04/17 13:28, Sander Eikelenboom wrote:
>> On 25/04/17 13:00, Juergen Gross wrote:
>>> On 25/04/17 12:33, Sander Eikelenboom wrote:
>>>> On 25/04/17 09:01, Juergen Gross wrote:
>>>>> On 25/04/17 08:57, Sander Eikelenboom wrote:
>>>>>> On 25/04/17 08:42, Juergen Gross wrote:
>>>>>>> On 25/04/17 08:35, Sander Eikelenboom wrote:
>>>>>>>> (XEN) [2017-04-24 21:20:53.203] d0v0 Unhandled invalid opcode fault/trap [#6, ec=ffffffff]
>>>>>>>> (XEN) [2017-04-24 21:20:53.203] domain_crash_sync called from entry.S: fault at ffff82d080358f70 entry.o#create_bounce_frame+0x145/0x154
>>>>>>>> (XEN) [2017-04-24 21:20:53.203] Domain 0 (vcpu#0) crashed on cpu#0:
>>>>>>>> (XEN) [2017-04-24 21:20:53.203] ----[ Xen-4.9-unstable  x86_64  debug=y   Not tainted ]----
>>>>>>>> (XEN) [2017-04-24 21:20:53.203] CPU:    0
>>>>>>>> (XEN) [2017-04-24 21:20:53.203] RIP:    e033:[<ffffffff8255a485>]
>>>>>>>
>>>>>>> Can you please tell us symbol+offset for RIP?
>>>>>>>
>>>>>>> Juergen
>>>>>>>
>>>>>>
>>>>>> Sure:
>>>>>> # addr2line -e vmlinux-4.11.0-rc8-20170424-linus-doflr-xennext-boris+ ffffffff8255a485
>>>>>> linux-linus/arch/x86/xen/enlighten_pv.c:288
>>>>>>
>>>>>> Which is:
>>>>>> static bool __init xen_check_xsave(void)
>>>>>> {
>>>>>>         unsigned int err, eax, edx;
>>>>>>
>>>>>>         /*
>>>>>>          * Xen 4.0 and older accidentally leaked the host XSAVE flag into guest
>>>>>>          * view, despite not being able to support guests using the
>>>>>>          * functionality. Probe for the actual availability of XSAVE by seeing
>>>>>>          * whether xgetbv executes successfully or raises #UD.
>>>>>>          */
>>>>>> HERE -->    asm volatile("1: .byte 0x0f,0x01,0xd0\n\t" /* xgetbv */    
>>>>>>                      "xor %[err], %[err]\n"
>>>>>>                      "2:\n\t"
>>>>>>                      ".pushsection .fixup,\"ax\"\n\t"
>>>>>>                      "3: movl $1,%[err]\n\t"
>>>>>>                      "jmp 2b\n\t"
>>>>>>                      ".popsection\n\t"
>>>>>>                      _ASM_EXTABLE(1b, 3b)
>>>>>>                      : [err] "=r" (err), "=a" (eax), "=d" (edx)
>>>>>>                      : "c" (0));
>>>>>>
>>>>>>         return err == 0;
>>>>>
>>>>> I hoped so. :-)
>>>>>
>>>>> I posted a patch to repair this some minutes ago. Would you mind to try
>>>>> it? See:
>>>>>
>>>>> https://lists.xen.org/archives/html/xen-devel/2017-04/msg02925.html
>>>>>
>>>>>
>>>>> Juergen
>>>>
>>>> Hmm next up seems to be a hanging dom0 kernel somewhat later during boot, with not too many clues.
>>>> (any output of xen debug-keys that could be of interest ?)
>>>>
>>> ...
>>>> [    0.000000] ACPI: Early table checksum verification disabled
>>>> [    0.000000] ACPI: RSDP 0x00000000000FB100 000014 (v00 ACPIAM)
>>>> [    0.000000] ACPI: RSDT 0x00000000AFF90000 000048 (v01 MSI    OEMSLIC  20100913 MSFT 00000097)
>>>> [    0.000000] ACPI: FACP 0x00000000AFF90200 000084 (v01 7640MS A7640100 20100913 MSFT 00000097)
>>>> [    0.000000] ACPI: DSDT 0x00000000AFF905E0 009427 (v01 A7640  A7640100 00000100 INTL 20051117)
>>>> [    0.000000] ACPI: FACS 0x00000000AFF9E000 000040
>>>> [    0.000000] ACPI: APIC 0x00000000AFF90390 000088 (v01 7640MS A7640100 20100913 MSFT 00000097)
>>>> [    0.000000] ACPI: MCFG 0x00000000AFF90420 00003C (v01 7640MS OEMMCFG  20100913 MSFT 00000097)
>>>> [    0.000000] ACPI: SLIC 0x00000000AFF90460 000176 (v01 MSI    OEMSLIC  20100913 MSFT 00000097)
>>>> [    0.000000] ACPI: OEMB 0x00000000AFF9E040 000072 (v01 7640MS A7640100 20100913 MSFT 00000097)
>>>> [    0.000000] ACPI: SRAT 0x00000000AFF9A5E0 000108 (v03 AMD    FAM_F_10 00000002 AMD  00000001)
>>>> [    0.000000] ACPI: HPET 0x00000000AFF9A6F0 000038 (v01 7640MS OEMHPET  20100913 MSFT 00000097)
>>>> [    0.000000] ACPI: IVRS 0x00000000AFF9A730 000110 (v01 AMD    RD890S   00202031 AMD  00000000)
>>>> [    0.000000] ACPI: SSDT 0x00000000AFF9A840 000DA4 (v01 A M I  POWERNOW 00000001 AMD  00000001)
>>>> [    0.000000] ACPI: Local APIC address 0xfee00000
>>>> [    0.000000] Setting APIC ro
>>>
>>> Hmm, this seems to be only the first part of a message.
>>>
>>> Could you try debug-key "0" (probably multiple times) and have a look
>>> where dom0 vcpu 0 is spending its time?
>>>
>>>
>>> Juergen
>>>
>>
>> Here you are:
>>
>> [    0.000000] ACPI: Early table checksum verification disabled
>> [    0.000000] ACPI: RSDP 0x00000000000FB100 000014 (v00 ACPIAM)
>> [    0.000000] ACPI: RSDT 0x00000000AFF90000 000048 (v01 MSI    OEMSLIC  20100913 MSFT 00000097)
>> [    0.000000] ACPI: FACP 0x00000000AFF90200 000084 (v01 7640MS A7640100 20100913 MSFT 00000097)
>> [    0.000000] ACPI: DSDT 0x00000000AFF905E0 009427 (v01 A7640  A7640100 00000100 INTL 20051117)
>> [    0.000000] ACPI: FACS 0x00000000AFF9E000 000040
>> [    0.000000] ACPI: APIC 0x00000000AFF90390 000088 (v01 7640MS A7640100 20100913 MSFT 00000097)
>> [    0.000000] ACPI: MCFG 0x00000000AFF90420 00003C (v01 7640MS OEMMCFG  20100913 MSFT 00000097)
>> [    0.000000] ACPI: SLIC 0x00000000AFF90460 000176 (v01 MSI    OEMSLIC  20100913 MSFT 00000097)
>> [    0.000000] ACPI: OEMB 0x00000000AFF9E040 000072 (v01 7640MS A7640100 20100913 MSFT 00000097)
>> [    0.000000] ACPI: SRAT 0x00000000AFF9A5E0 000108 (v03 AMD    FAM_F_10 00000002 AMD  00000001)
>> [    0.000000] ACPI: HPET 0x00000000AFF9A6F0 000038 (v01 7640MS OEMHPET  20100913 MSFT 00000097)
>> [    0.000000] ACPI: IVRS 0x00000000AFF9A730 000110 (v01 AMD    RD890S   00202031 AMD  00000000)
>> [    0.000000] ACPI: SSDT 0x00000000AFF9A840 000DA4 (v01 A M I  POWERNOW 00000001 AMD  00000001)
>> [    0.000000] ACPI: Local APIC address 0xfee00000
>> [    0.000000] Setting AP(XEN) [2017-04-25 11:11:35.568] *** Serial input -> Xen (type 'CTRL-a' three times to switch input to DOM0)
>> (XEN) [2017-04-25 11:11:37.000] '0' pressed -> dumping Dom0's registers
>> (XEN) [2017-04-25 11:11:37.001] *** Dumping Dom0 vcpu#0 state: ***
>> (XEN) [2017-04-25 11:11:37.001] RIP:    e033:[<ffffffff81cccde9>]
>> (XEN) [2017-04-25 11:11:45.422] RIP:    e033:[<ffffffff81cccde9>]
>> (XEN) [2017-04-25 11:11:56.132] RIP:    e033:[<ffffffff81ccc740>]
>> (XEN) [2017-04-25 11:12:02.474] RIP:    e033:[<ffffffff81cccde9>]
>> (XEN) [2017-04-25 11:12:06.224] RIP:    e033:[<ffffffff81ccc740>]
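
The repeated debug-key samples above can be tallied to see which addresses dominate. A minimal sketch, assuming the Xen console output is fed on stdin and that the RIP lines keep the `RIP:    e033:[<addr>]` shape shown above; `rip_histogram` is a hypothetical helper name, not an existing tool:

```shell
# Hypothetical helper: count how often each RIP value appears in
# Xen console output read from stdin, most frequent first.
rip_histogram() {
    # Keep only lines carrying a RIP and reduce them to the bare address.
    sed -n 's/.*RIP:[[:space:]]*[0-9a-f]*:\[<\([0-9a-f]*\)>].*/\1/p' |
        sort |
        uniq -c |
        sort -rn |
        # Normalise the padding that uniq -c adds: "count address".
        awk '{ print $1, $2 }'
}
```

Run against a saved console log (for example `rip_histogram < xen-console.log`, where the file name is an assumption), the five samples above come out as three hits on ffffffff81cccde9 and two on ffffffff81ccc740.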

ffffffff81cccde9
arch/x86/entry/entry_64.o:?

ffffffff81ccc740
arch/x86/entry/entry_64.S:1007

         #ifdef CONFIG_XEN
         idtentry xen_debug              do_debug                has_error_code=0
         idtentry xen_int3               do_int3                 has_error_code=0
HERE --> idtentry xen_stack_segment      do_stack_segment        has_error_code=1
         #endif

> Thanks. If you just could translate those again to symbol+offset?
> And the suspicious addresses on the stack as well:
> 
> ffffffff81023bfd
arch/x86/kernel/process_64.c:485
        if (static_cpu_has_bug(X86_BUG_SYSRET_SS_ATTRS)) {
                /*
                 * AMD CPUs have a misfeature: SYSRET sets the SS selector but
                 * does not update the cached descriptor.  As a result, if we
                 * do SYSRET while SS is NULL, we'll end up in user mode with
                 * SS apparently equal to __USER_DS but actually unusable.
                 *
                 * The straightforward workaround would be to fix it up just
                 * before SYSRET, but that would slow down the system call
                 * fast paths.  Instead, we ensure that SS is never NULL in
                 * system call context.  We do this by replacing NULL SS
                 * selectors at every context switch.  SYSCALL sets up a valid
                 * SS, so the only way to get NULL is to re-enter the kernel
                 * from CPL 3 through an interrupt.  Since that can't happen
                 * in the same task as a running syscall, we are guaranteed to
                 * context switch between every interrupt vector entry and a
                 * subsequent SYSRET.
                 *
                 * We read SS first because SS reads are much faster than
                 * writes.  Out of caution, we force SS to __KERNEL_DS even if
                 * it previously had a different non-NULL value.
                 */
                unsigned short ss_sel;
                savesegment(ss, ss_sel);
                if (ss_sel != __KERNEL_DS)
HERE -->               loadsegment(ss, __KERNEL_DS);
        }



> ffffffff81cc4620
init/main.c:956

         static int __ref kernel_init(void *unused)
HERE --> {
            int ret;

            kernel_init_freeable();
            /* need to finish all async __init code before freeing the memory */
            async_synchronize_full();



> ffffffff81ccb060
arch/x86/entry/entry_64.S:412

           /*
            * A newly forked process directly context switches into this address.
            *
            * rax: prev task we switched from
            * rbx: kernel thread func (NULL for user thread)
            * r12: kernel thread arg
            */
            ENTRY(ret_from_fork)
HERE -->        FRAME_BEGIN                     /* help unwinder find end of stack */
                movq    %rax, %rdi
                call    schedule_tail           /* rdi: 'prev' task parameter */


> 
> 
> Juergen
> 


