From: Andy Lutomirski <luto@kernel.org>
To: Thomas Garnier <thgarnie@google.com>
Cc: "Ingo Molnar" <mingo@kernel.org>,
	"Andy Lutomirski" <luto@kernel.org>,
	"Linus Torvalds" <torvalds@linux-foundation.org>,
	"kernel test robot" <xiaolong.ye@intel.com>,
	"Alexander Potapenko" <glider@google.com>,
	"Andrew Morton" <akpm@linux-foundation.org>,
	"Andrey Ryabinin" <aryabinin@virtuozzo.com>,
	"Ard Biesheuvel" <ard.biesheuvel@linaro.org>,
	"Boris Ostrovsky" <boris.ostrovsky@oracle.com>,
	"Borislav Petkov" <bp@suse.de>,
	"Chris Wilson" <chris@chris-wilson.co.uk>,
	"Christian Borntraeger" <borntraeger@de.ibm.com>,
	"Dmitry Vyukov" <dvyukov@google.com>,
	"Frederic Weisbecker" <fweisbec@gmail.com>,
	"Jiri Kosina" <jikos@kernel.org>,
	"Joerg Roedel" <joro@8bytes.org>,
	"Jonathan Corbet" <corbet@lwn.net>,
	"Josh Poimboeuf" <jpoimboe@redhat.com>,
	"Juergen Gross" <jgross@suse.com>,
	"Kees Cook" <keescook@chromium.org>,
	"Len Brown" <len.brown@intel.com>,
	"Lorenzo Stoakes" <lstoakes@gmail.com>,
	"Luis R . Rodriguez" <mcgrof@kernel.org>,
	"Matt Fleming" <matt@codeblueprint.co.uk>,
	"Michal Hocko" <mhocko@suse.com>,
	"Paolo Bonzini" <pbonzini@redhat.com>,
	"Paul Gortmaker" <paul.gortmaker@windriver.com>,
	"Pavel Machek" <pavel@ucw.cz>,
	"Peter Zijlstra" <peterz@infradead.org>,
	"Radim Krčmář" <rkrcmar@redhat.com>,
	"Rafael J . Wysocki" <rjw@rjwysocki.net>,
	"Rusty Russell" <rusty@rustcorp.com.au>,
	"Stanislaw Gruszka" <sgruszka@redhat.com>,
	"Thomas Gleixner" <tglx@linutronix.de>,
	"Tim Chen" <tim.c.chen@linux.intel.com>,
	"Vitaly Kuznetsov" <vkuznets@redhat.com>,
	zijun_hu <zijun_hu@htc.com>, LKML <linux-kernel@vger.kernel.org>,
	"Stephen Rothwell" <sfr@canb.auug.org.au>, LKP <lkp@01.org>
Subject: Re: [lkp-robot] [x86] 69218e4799: BUG:kernel_hang_in_boot_stage
Date: Wed, 22 Mar 2017 09:59:29 -0700
Message-ID: <CALCETrUJ2qU9i0CQmMyW-We-RodN7pZRFEVeb5TT9Bz1jUTa_A@mail.gmail.com>
In-Reply-To: <CAJcbSZF1MdvoafLQwvyt8MObVxnbKZ-T04g-8Uv8YB0eV59qVA@mail.gmail.com>

On Wed, Mar 22, 2017 at 9:38 AM, Thomas Garnier <thgarnie@google.com> wrote:
> On Wed, Mar 22, 2017 at 9:33 AM, Andy Lutomirski <luto@amacapital.net> wrote:
>> On Wed, Mar 22, 2017 at 12:36 AM, Ingo Molnar <mingo@kernel.org> wrote:
>>>
>>> * Thomas Garnier <thgarnie@google.com> wrote:
>>>
>>>> >  static inline void setup_fixmap_gdt(int cpu)
>>>> >  {
>>>> >         __set_fixmap(get_cpu_gdt_ro_index(cpu),
>>>> > -                    __pa(get_cpu_gdt_rw(cpu)), pg_fixmap_gdt_flags);
>>>> > +                    slow_virt_to_phys(get_cpu_gdt_rw(cpu)),
>>>> > +                    pg_fixmap_gdt_flags);
>>>> >  }
>>>> >
>>>> >  /* Load the original GDT from the per-cpu structure */
>>>> >
>>>> > This makes UP boot for me, but SMP (2 cpus) is still busted.
>>>>
>>>> This change fixed boot for me:
>>>>
>>>> diff --git a/arch/x86/include/asm/fixmap.h b/arch/x86/include/asm/fixmap.h
>>>> index b65155cc3760..4e30707d9f9a 100644
>>>> --- a/arch/x86/include/asm/fixmap.h
>>>> +++ b/arch/x86/include/asm/fixmap.h
>>>> @@ -104,7 +104,9 @@ enum fixed_addresses {
>>>>         FIX_GDT_REMAP_BEGIN,
>>>>         FIX_GDT_REMAP_END = FIX_GDT_REMAP_BEGIN + NR_CPUS - 1,
>>>>
>>>> -       __end_of_permanent_fixed_addresses,
>>>> +       __end_of_permanent_fixed_addresses =
>>>> +               (FIX_GDT_REMAP_END + PTRS_PER_PTE - 1) &
>>>> +               -PTRS_PER_PTE,
>>>>
>>>> Just ensure PKMAP_BASE & FIX_WP_TEST are on a different PMD.
>>>>
>>>> I don't think that's the right fix, but it might help us understand
>>>> the exact root cause.
>>>
>>> Could this be related to the permission bits in the PMD itself getting out of sync
>>> with the PTEs? WP test marks a page writable/unwritable, and maybe we mess up the
>>> restoration. If they are on separate PMDs then this is worked around because the
>>> fixmap GDT is on a separate PMD.
>>>
>>
>> I don't think so.  I think it's a pair of bugs related to the way that
>> percpu areas are virtually mapped.
>>
>> Bug 1: __pa is totally bogus on percpu pointers.  Oddly, we have one
>> older instance of exactly the same bug (on the same GDT address) in
>> the kernel.  I'll send a patch.
>>
>> Bug 2: Nothing syncs a freshly-set-up CPU's percpu area into
>> initial_page_table.  This makes access to the gdt fail in
>> startup_32_smp.  This looks like a longstanding bug, and I don't see
>> what it has to do with Thomas' series.  I'm still mulling over what to
>> do about it.
>
> Why do you think padding the fixmap also fixes the problem? That's the
> thing I don't get.
>
> With the padding the PA is now correct and the memcmp check also
> succeeds. That's odd.
>

Not sure.  There are some complicated heuristics in the percpu code
that determine how the percpu area is allocated, and the padding might
be affecting those heuristics.
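
For reference, the padding expression in the quoted fixmap.h hunk is
plain round-up arithmetic: it rounds the end-of-permanent index up to
the next multiple of PTRS_PER_PTE.  A minimal sketch of just that
arithmetic, assuming a stand-in PTRS_PER_PTE of 1024 (the real value
depends on the kernel configuration) and a hypothetical helper name
that is not part of the kernel:

```c
#include <assert.h>

/* Assumed stand-in: 1024 entries per page table (non-PAE 32-bit x86).
 * The real value comes from the kernel's page-table headers. */
#define PTRS_PER_PTE 1024

/* Hypothetical helper: round idx up to the next multiple of
 * PTRS_PER_PTE, using the same expression as the quoted hunk.
 * Only valid because PTRS_PER_PTE is a power of two. */
static int pad_to_pte_table(int idx)
{
	return (idx + PTRS_PER_PTE - 1) & -PTRS_PER_PTE;
}

/* Sanity checks for the rounding behaviour. */
static void pad_to_pte_table_selfcheck(void)
{
	assert(pad_to_pte_table(0) == 0);       /* already aligned */
	assert(pad_to_pte_table(1) == 1024);    /* rounds up */
	assert(pad_to_pte_table(1024) == 1024); /* exact multiple unchanged */
	assert(pad_to_pte_table(1025) == 2048); /* next table's worth */
}
```

This only shows what the expression computes; it says nothing about
why the padding changes the percpu behaviour.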

--Andy
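
Note on "Bug 1" above: __pa() on 32-bit x86 is a linear-map
translation (subtract PAGE_OFFSET), so it only gives the right answer
for direct-mapped addresses.  If a percpu area is mapped elsewhere in
the kernel's virtual address space, the physical address has to come
from the page tables, which is what slow_virt_to_phys() in the quoted
patch does.  The toy model below is a sketch under stated assumptions,
not kernel code: the toy_* names, the flat mapping table, and the
PAGE_OFFSET value (the common 3G/1G split) are all illustrative.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT  12
#define PAGE_SIZE   (1u << PAGE_SHIFT)
#define PAGE_OFFSET 0xC0000000u /* assumed 3G/1G split */

/* Linear-map translation, in the spirit of __pa(): correct only for
 * addresses that really live in the direct mapping. */
static uint32_t toy_pa(uint32_t vaddr)
{
	return vaddr - PAGE_OFFSET;
}

/* Toy stand-in for a page table: one entry per mapped virtual page. */
struct toy_mapping {
	uint32_t vpage;  /* virtual page number */
	uint32_t pframe; /* physical frame number */
};

/* Look the address up in the table, in the spirit of a page-table
 * walk.  Returns 1 and stores the physical address on success,
 * 0 if the page is not mapped. */
static int toy_walk(const struct toy_mapping *tbl, size_t n,
		    uint32_t vaddr, uint32_t *paddr)
{
	for (size_t i = 0; i < n; i++) {
		if (tbl[i].vpage == (vaddr >> PAGE_SHIFT)) {
			*paddr = (tbl[i].pframe << PAGE_SHIFT) |
				 (vaddr & (PAGE_SIZE - 1));
			return 1;
		}
	}
	return 0;
}

/* A page mapped outside the linear map: the two translations differ. */
static void toy_translation_selfcheck(void)
{
	const struct toy_mapping tbl[] = { { 0xF8000u, 0x01234u } };
	uint32_t pa = 0;

	assert(toy_walk(tbl, 1, 0xF8000010u, &pa) == 1);
	assert(pa == 0x01234010u);                  /* from the table */
	assert(toy_pa(0xF8000010u) == 0x38000010u); /* linear guess */
	assert(toy_pa(0xF8000010u) != pa);          /* the guess is wrong */
}
```

For an address inside the direct mapping the two methods would agree;
the mismatch only appears for pages mapped elsewhere, as in the
self-check above.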
