From: Baoquan He <bhe@redhat.com>
To: Dan Williams <dan.j.williams@intel.com>
Cc: LKML <linux-kernel@vger.kernel.org>,
	"linux-nvdimm@lists.01.org" <linux-nvdimm@lists.01.org>,
	Thomas Garnier <thgarnie@google.com>,
	Ingo Molnar <mingo@kernel.org>
Subject: Re: KASLR causes intermittent boot failures on some systems
Date: Tue, 25 Apr 2017 07:07:38 +0800	[thread overview]
Message-ID: <20170424230738.GA11734@x1> (raw)
In-Reply-To: <CAPcyv4gzdwDmtxoz7O5=1zGow+6zx2YqQqjRu1=kBX087MLx2Q@mail.gmail.com>

On 04/24/17 at 01:52pm, Dan Williams wrote:
> On Mon, Apr 24, 2017 at 1:37 PM, Thomas Garnier <thgarnie@google.com> wrote:
> >
> > On Thu, Apr 20, 2017 at 6:26 AM, Baoquan He <bhe@redhat.com> wrote:
> >> On 04/19/17 at 07:27am, Thomas Garnier wrote:
> >>> On Wed, Apr 19, 2017 at 6:36 AM, Baoquan He <bhe@redhat.com> wrote:
> >>> > Hi all,
> >>> >
> >>> > I logged in to Jeff's system and added debug code, but found no
> >>> > clue. However, DaveY found that disabling only the page_offset
> >>> > randomization made the EFI issue disappear on his system with
> >>> > KASLR enabled. I did the same on Jeff's pmem system with the same
> >>> > result: I rebooted several times and every boot succeeded. In the
> >>> > current code no __PAGE_OFFSET_BASE is used directly, so I don't
> >>> > know why it failed.
> >>>
> >>> Great! I still cannot repro it.
> >>>
> >>> >
> >>> > Does anyone have an idea or a hint I can try? I have read the
> >>> > pmem code around devm_nsio_enable/pmem_attach_disk/arch_add_memory,
> >>> > but have no idea yet.
> >>>
> >>> I would test a couple of things (see the sketch after this list):
> >>>  - Set page_offset_base to 0 by default and set it to
> >>> __PAGE_OFFSET_BASE in kernel_randomize_memory (without randomizing
> >>> it). If it crashes on a low address, it might be due to using __va or
> >>> PAGE_OFFSET in general before randomization is done.
> >>>  - Does any change in __PAGE_OFFSET lead to a crash, or only when
> >>> __PAGE_OFFSET is in a specific range? Given that you may have to
> >>> reboot multiple times to get a crash, I assume that a specific range
> >>> is the problem, but it might be worth checking.
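
A minimal sketch of the first experiment, assuming the 4.11-era
arch/x86/mm/kaslr.c layout (an untested illustration, not a patch from
this thread): leave page_offset_base at 0 until kernel_randomize_memory()
assigns the fixed base, so any earlier __va()/PAGE_OFFSET use faults low.

diff --git a/arch/x86/mm/kaslr.c b/arch/x86/mm/kaslr.c
--- a/arch/x86/mm/kaslr.c
+++ b/arch/x86/mm/kaslr.c
@@
-unsigned long page_offset_base = __PAGE_OFFSET_BASE;
+/* 0 by default: any __va()/PAGE_OFFSET use before randomization faults low */
+unsigned long page_offset_base;
@@
 void __init kernel_randomize_memory(void)
 {
+	/* assign the fixed base here, but skip the actual randomization */
+	page_offset_base = __PAGE_OFFSET_BASE;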
> >>
> >> I added debug code and collected boot logs for both failure and
> >> success cases; the failures seem related to the range crossing a pgd
> >> entry. The change below is part of my debug code (I added printing in
> >> many places; this excerpt just makes the printed output below easier
> >> to follow). The emulated pmem memory is [1TB, 1TB+192G), namely
> >> [0x10000000000, 0x13000000000). If fewer than 192 pud entries are
> >> left between the entry that maps 1TB and the next pgd boundary, the
> >> boot fails (see the boundary arithmetic sketch after the diff).
> >> init_memory_mapping might have handled the direct mapping well; I am
> >> not sure whether __add_pages is OK.
> >>
> >> diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
> >> index 5b536be..f3f8d43 100644
> >> --- a/drivers/nvdimm/pmem.c
> >> +++ b/drivers/nvdimm/pmem.c
> >> @@ -87,6 +87,8 @@ static int read_pmem(struct page *page, unsigned int off,
> >>  {
> >>         int rc;
> >>         void *mem = kmap_atomic(page);
> >> +       pr_info("pfn:0x%llu, off=0x%lx, pmem_addr:0x%llx, len:0x%lx\n",
> >> +               page_to_pfn(page), off, pmem_addr, len);
> >>
> >>         rc = memcpy_from_pmem(mem + off, pmem_addr, len);
> >>         kunmap_atomic(mem);
> >> @@ -312,6 +318,8 @@ static int pmem_attach_disk(struct device *dev,
> >>         if (IS_ERR(addr))
> >>                 return PTR_ERR(addr);
> >>         pmem->virt_addr = addr;
> >> +       pr_info("pmem->virt_addr:0x%llx, pmem->phys_addr:0x%llx, pmem->size:0x%llx\n",
> >> +               pmem->virt_addr, pmem->phys_addr, pmem->size);
> >>
> >>         blk_queue_write_cache(q, true, true);
> >>         blk_queue_make_request(q, pmem_make_request);
> >>
> >>
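The boundary arithmetic above, as a stand-alone sketch (a hypothetical
illustration, not code from this thread): with 4-level paging one pgd
entry covers 512GB (512 pud entries of 1GB each), so the 192GB range
needs 192 contiguous pud slots below the next pgd boundary.

#include <stdio.h>

#define PGD_SPAN (1ULL << 39)	/* 512GB covered per pgd entry */
#define PUD_SPAN (1ULL << 30)	/* 1GB covered per pud entry */

int main(void)
{
	/* example direct-map base; KASLR randomizes this in 1GB steps */
	unsigned long long base = 0xffff880000000000ULL;
	unsigned long long va = base + 0x10000000000ULL; /* VA of phys 1TB */
	unsigned long long next_pgd = (va / PGD_SPAN + 1) * PGD_SPAN;
	unsigned long long puds_left = (next_pgd - va) / PUD_SPAN;

	printf("pud slots left before next pgd boundary: %llu\n", puds_left);
	printf("192GB pmem range %s a pgd boundary\n",
	       puds_left < 192 ? "crosses" : "fits below");
	return 0;
}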
> >
> > Super useful. I can see that the virt_addr field can be set in three
> > locations (http://lxr.free-electrons.com/source/drivers/nvdimm/pmem.c#L288).
> > Can you check which one is used for the faulting addresses?
> >
> > Also the two functions used (devm_memremap_pages and devm_memremap)
> > seem to check whether the region intersects IORESOURCE_SYSTEM_RAM; if
> > it does, the mapping is not done and the __va is returned. I would be
> > interested to know if this is what's happening. Basically, log the VA
> > on these lines (a debug sketch follows after this message):
> >
> >  - http://lxr.free-electrons.com/source/kernel/memremap.c#L307
> >  - http://lxr.free-electrons.com/source/kernel/memremap.c#L98
> >
> > This way we can get closer to which code does not handle the pgd
> > boundary correctly.
> >
> > Thanks!
> >
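A sketch of the suggested logging, assuming the 4.11-era
kernel/memremap.c where devm_memremap_pages() returns __va() early for a
range that overlaps System RAM (the exact context at those line numbers
is an assumption, not verified here):

diff --git a/kernel/memremap.c b/kernel/memremap.c
--- a/kernel/memremap.c
+++ b/kernel/memremap.c
@@ devm_memremap_pages()
-	if (is_ram == REGION_INTERSECTS)
+	if (is_ram == REGION_INTERSECTS) {
+		pr_info("%s: RAM overlap, returning __va %p\n",
+			__func__, __va(res->start));
 		return __va(res->start);
+	}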
> 
> When using the memmap= parameter we're using this call by default:
> 
>         } else if (pmem_should_map_pages(dev)) {
>                 addr = devm_memremap_pages(dev, &nsio->res,
>                                 &q->q_usage_counter, NULL);
>                 pmem->pfn_flags |= PFN_MAP;
>         } else
> 
> ...where we are assuming that the memmap= parameter does not specify a
> range-size that will exhaust all of system-memory just to hold the
> struct page array.
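
For scale (assuming 4KB pages and a 64-byte struct page on x86_64): a
192GB memmap= range needs 192GB / 4KB = 50,331,648 struct pages, i.e.
about 3GB of regular system memory just for the page array that
devm_memremap_pages() sets up.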

Yeah, according to my debug tracing it goes as Dan said, and is_ram is
REGION_DISJOINT. Down at arch_add_memory the parameters still look
right: "arch_add_memory, align_start:0x10000000000, align_size:0x3000000000",
i.e. phys 1TB with size 192GB, so up to that point it seems to be going
well.

Hi Dan,

One thing still confuses me: in devm_memremap_pages the altmap parameter
passed in is NULL, yet it uses devres_alloc_node to allocate a page_map,
and that page_map contains an altmap instance, not a pointer. The
address range is then inserted into pgmap_radix with the page_map as its
value. Why, then, does to_vmem_altmap() return NULL later in
__add_pages, according to my debug code?

Thanks
Baoquan
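
For reference, a user-space model of the altmap handling in the
4.11-era devm_memremap_pages() (a simplified, hypothetical sketch, not
the actual kernel code): page_map embeds a vmem_altmap as storage, but
dev_pagemap.altmap is a pointer that is only set when the caller passes
one, so to_vmem_altmap() hands back NULL on the memmap= path.

#include <stdio.h>
#include <string.h>

struct vmem_altmap { unsigned long reserve; };
struct dev_pagemap { struct vmem_altmap *altmap; };
struct page_map {
	struct dev_pagemap pgmap;
	struct vmem_altmap altmap;	/* embedded storage, may stay unused */
};

/* models the altmap setup inside devm_memremap_pages() */
static void setup_page_map(struct page_map *pm, struct vmem_altmap *altmap)
{
	pm->pgmap.altmap = NULL;
	if (altmap) {			/* the pointer is only set here */
		memcpy(&pm->altmap, altmap, sizeof(*altmap));
		pm->pgmap.altmap = &pm->altmap;
	}
}

/* models to_vmem_altmap(): it returns the pointer, not the instance */
static struct vmem_altmap *to_vmem_altmap(struct page_map *pm)
{
	return pm->pgmap.altmap;
}

int main(void)
{
	struct page_map pm;

	setup_page_map(&pm, NULL);	/* the pmem memmap= path */
	printf("altmap: %p\n", (void *)to_vmem_altmap(&pm)); /* prints (nil) */
	return 0;
}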

Thread overview: 70+ messages

2017-04-07 14:41 KASLR causes intermittent boot failures on some systems Jeff Moyer
2017-04-07 14:49 ` Thomas Garnier
2017-04-07 14:51   ` Jeff Moyer
2017-04-07 21:25 ` Kees Cook
2017-04-10 15:49   ` Jeff Moyer
2017-04-10 18:13     ` Kees Cook
2017-04-10 18:22       ` Jeff Moyer
2017-04-10 19:03         ` Kees Cook
2017-04-10 19:18           ` Jeff Moyer
2017-04-08  2:51 ` Baoquan He
2017-04-08  4:08 ` Baoquan He
2017-04-08  7:02   ` Dan Williams
2017-04-08  7:52     ` Baoquan He
2017-04-10 15:57   ` Jeff Moyer
2017-04-12  8:24 ` Dave Young
2017-04-12  8:27   ` Dave Young
2017-04-12  8:40   ` Dave Young
2017-04-12 12:52     ` Jeff Moyer
2017-04-19 13:36 ` Baoquan He
2017-04-19 14:27   ` Thomas Garnier
2017-04-19 14:34     ` Dan Williams
2017-04-19 14:56       ` Baoquan He
2017-04-19 14:56       ` Thomas Garnier
2017-04-19 14:55     ` Baoquan He
2017-04-20 13:26     ` Baoquan He
2017-04-24 20:37       ` Thomas Garnier
2017-04-24 20:52         ` Dan Williams
2017-04-24 23:07           ` Baoquan He [this message]
2017-04-24 23:18             ` Dan Williams
2017-04-24 23:56               ` Baoquan He
2017-04-25  0:41             ` Thomas Garnier
2017-04-25  1:18               ` Baoquan He
2017-05-01 11:32 ` Baoquan He
