mm, kasan: don't poison boot memory

Message ID 487751e1ccec8fcd32e25a06ce000617e96d7ae1.1613595269.git.andreyknvl@google.com
State In Next
Commit c34a66c47f469930e105e596d96dcb2402741afa
Series
  • mm, kasan: don't poison boot memory

Commit Message

Andrey Konovalov Feb. 17, 2021, 8:56 p.m. UTC
During boot, all non-reserved memblock memory is exposed to the buddy
allocator. Poisoning all that memory with KASAN lengthens boot time,
especially on systems with a large amount of RAM. This patch makes
page_alloc not call kasan_free_pages() on all new memory.

__free_pages_core() is used when exposing fresh memory during system
boot and when onlining memory during hotplug. This patch adds a new
FPI_SKIP_KASAN_POISON flag and passes it to __free_pages_ok() through
free_pages_prepare() from __free_pages_core().

This has little impact on KASAN memory tracking.

Assuming that there are no references to newly exposed pages before they
are ever allocated, there won't be any intended (but buggy) accesses to
that memory that KASAN would normally detect.

However, with this patch, KASAN stops detecting wild and large
out-of-bounds accesses that happen to land on a fresh memory page that
was never allocated. This is taken as an acceptable trade-off.

All memory allocated normally when the boot is over keeps getting
poisoned as usual.

Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Change-Id: Iae6b1e4bb8216955ffc14af255a7eaaa6f35324d
---
 mm/page_alloc.c | 43 ++++++++++++++++++++++++++++++++-----------
 1 file changed, 32 insertions(+), 11 deletions(-)
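
The flag-passing idea in the commit message can be sketched as a small userspace model. This is only an illustration under stated assumptions, not the kernel code: every toy_* name, the flag bit value, and the one-byte-per-page "shadow" are hypothetical stand-ins for the real definitions in mm/page_alloc.c.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag type and bit values, invented for this sketch. */
typedef unsigned int toy_fpi_t;
#define TOY_FPI_NONE              0u
#define TOY_FPI_SKIP_KASAN_POISON (1u << 0)

#define TOY_POISON 0xFFu /* stand-in for a "freed page" shadow value */

/* Models free_pages_prepare(): poison each page unless the caller opted out. */
size_t toy_free_pages_prepare(unsigned char *shadow, size_t npages, toy_fpi_t flags)
{
	size_t poisoned = 0;

	assert(shadow != NULL);
	if (flags & TOY_FPI_SKIP_KASAN_POISON)
		return 0;
	for (size_t i = 0; i < npages; i++) {
		shadow[i] = TOY_POISON;
		poisoned++;
	}
	return poisoned;
}

/* Models the boot/hotplug path (__free_pages_core()): skip poisoning. */
size_t toy_free_pages_core(unsigned char *shadow, size_t npages)
{
	return toy_free_pages_prepare(shadow, npages, TOY_FPI_SKIP_KASAN_POISON);
}

/* Models an ordinary free after boot: poison as usual. */
size_t toy_free_pages(unsigned char *shadow, size_t npages)
{
	return toy_free_pages_prepare(shadow, npages, TOY_FPI_NONE);
}
```

In this model, calling toy_free_pages_core() on a fresh array leaves the shadow bytes untouched, while toy_free_pages() marks every page as poisoned, which mirrors the split between boot-time exposure and normal frees described above.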

Comments

David Hildenbrand Feb. 18, 2021, 8:55 a.m. UTC | #1
On 17.02.21 21:56, Andrey Konovalov wrote:
> During boot, all non-reserved memblock memory is exposed to the buddy
> allocator. Poisoning all that memory with KASAN lengthens boot time,
> especially on systems with large amount of RAM. This patch makes
> page_alloc to not call kasan_free_pages() on all new memory.
> 
> __free_pages_core() is used when exposing fresh memory during system
> boot and when onlining memory during hotplug. This patch adds a new
> FPI_SKIP_KASAN_POISON flag and passes it to __free_pages_ok() through
> free_pages_prepare() from __free_pages_core().
> 
> This has little impact on KASAN memory tracking.
> 
> Assuming that there are no references to newly exposed pages before they
> are ever allocated, there won't be any intended (but buggy) accesses to
> that memory that KASAN would normally detect.
> 
> However, with this patch, KASAN stops detecting wild and large
> out-of-bounds accesses that happen to land on a fresh memory page that
> was never allocated. This is taken as an acceptable trade-off.
> 
> All memory allocated normally when the boot is over keeps getting
> poisoned as usual.
> 
> Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
> Change-Id: Iae6b1e4bb8216955ffc14af255a7eaaa6f35324d

Not sure this is the right thing to do, see

https://lkml.kernel.org/r/bcf8925d-0949-3fe1-baa8-cc536c529860@oracle.com

Reversing the order in which memory gets allocated + used during boot 
(in a patch by me) might have revealed an invalid memory access during boot.

I suspect that that issue would no longer get detected with your patch, 
as the invalid memory access would simply not get detected. Now, I 
cannot prove that :)
Andrey Konovalov Feb. 18, 2021, 7:40 p.m. UTC | #2
On Thu, Feb 18, 2021 at 9:55 AM David Hildenbrand <david@redhat.com> wrote:
>
> On 17.02.21 21:56, Andrey Konovalov wrote:
> > During boot, all non-reserved memblock memory is exposed to the buddy
> > allocator. Poisoning all that memory with KASAN lengthens boot time,
> > especially on systems with large amount of RAM. This patch makes
> > page_alloc to not call kasan_free_pages() on all new memory.
> >
> > __free_pages_core() is used when exposing fresh memory during system
> > boot and when onlining memory during hotplug. This patch adds a new
> > FPI_SKIP_KASAN_POISON flag and passes it to __free_pages_ok() through
> > free_pages_prepare() from __free_pages_core().
> >
> > This has little impact on KASAN memory tracking.
> >
> > Assuming that there are no references to newly exposed pages before they
> > are ever allocated, there won't be any intended (but buggy) accesses to
> > that memory that KASAN would normally detect.
> >
> > However, with this patch, KASAN stops detecting wild and large
> > out-of-bounds accesses that happen to land on a fresh memory page that
> > was never allocated. This is taken as an acceptable trade-off.
> >
> > All memory allocated normally when the boot is over keeps getting
> > poisoned as usual.
> >
> > Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
> > Change-Id: Iae6b1e4bb8216955ffc14af255a7eaaa6f35324d
>
> Not sure this is the right thing to do, see
>
> https://lkml.kernel.org/r/bcf8925d-0949-3fe1-baa8-cc536c529860@oracle.com
>
> Reversing the order in which memory gets allocated + used during boot
> (in a patch by me) might have revealed an invalid memory access during boot.
>
> I suspect that that issue would no longer get detected with your patch,
> as the invalid memory access would simply not get detected. Now, I
> cannot prove that :)

This looks like a good example.

Ok, what we can do is:

1. For KASAN_GENERIC: leave everything as is to be able to detect
these boot-time bugs.

2. For KASAN_SW_TAGS: remove boot-time poisoning via
kasan_free_pages(), but use the "invalid" tag as the default shadow
value. The end result should be the same: bad accesses will be
detected, for unallocated memory because it has the default "invalid"
tag, and for allocated memory because it is poisoned properly when
allocated/freed.

3. For KASAN_HW_TAGS: just remove boot-time poisoning via
kasan_free_pages(). As the memory tags have a random unspecified
value, we'll still have a 15/16 chance to detect a memory corruption.

This also makes sense from the performance perspective: KASAN_GENERIC
isn't meant to be running in production, so having a larger perf
impact is acceptable. The other two modes will be faster.
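
The 15/16 figure in point 3 can be checked with a few lines of C. This is a minimal sketch, assuming 4-bit tags (the width used by Arm MTE), uniformly random memory tags, and no special "match-all" tag; the function name is hypothetical.

```c
#include <assert.h>

/*
 * Count how many possible memory tag values differ from a given pointer tag.
 * A bad access is detected whenever the two tags differ.
 */
unsigned int toy_mismatching_tags(unsigned int tag_bits, unsigned int ptr_tag)
{
	unsigned int mismatches = 0;

	assert(tag_bits > 0 && tag_bits < 16);
	unsigned int ntags = 1u << tag_bits; /* number of distinct tag values */
	assert(ptr_tag < ntags);

	for (unsigned int mem_tag = 0; mem_tag < ntags; mem_tag++)
		if (mem_tag != ptr_tag)
			mismatches++;
	return mismatches;
}
```

With tag_bits set to 4 there are 16 tag values, and for any pointer tag 15 of them mismatch, so a stray access into untagged boot memory is caught in 15 out of 16 cases under these assumptions.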
David Hildenbrand Feb. 18, 2021, 7:46 p.m. UTC | #3
On 18.02.21 20:40, Andrey Konovalov wrote:
> On Thu, Feb 18, 2021 at 9:55 AM David Hildenbrand <david@redhat.com> wrote:
>>
>> On 17.02.21 21:56, Andrey Konovalov wrote:
>>> During boot, all non-reserved memblock memory is exposed to the buddy
>>> allocator. Poisoning all that memory with KASAN lengthens boot time,
>>> especially on systems with large amount of RAM. This patch makes
>>> page_alloc to not call kasan_free_pages() on all new memory.
>>>
>>> __free_pages_core() is used when exposing fresh memory during system
>>> boot and when onlining memory during hotplug. This patch adds a new
>>> FPI_SKIP_KASAN_POISON flag and passes it to __free_pages_ok() through
>>> free_pages_prepare() from __free_pages_core().
>>>
>>> This has little impact on KASAN memory tracking.
>>>
>>> Assuming that there are no references to newly exposed pages before they
>>> are ever allocated, there won't be any intended (but buggy) accesses to
>>> that memory that KASAN would normally detect.
>>>
>>> However, with this patch, KASAN stops detecting wild and large
>>> out-of-bounds accesses that happen to land on a fresh memory page that
>>> was never allocated. This is taken as an acceptable trade-off.
>>>
>>> All memory allocated normally when the boot is over keeps getting
>>> poisoned as usual.
>>>
>>> Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
>>> Change-Id: Iae6b1e4bb8216955ffc14af255a7eaaa6f35324d
>>
>> Not sure this is the right thing to do, see
>>
>> https://lkml.kernel.org/r/bcf8925d-0949-3fe1-baa8-cc536c529860@oracle.com
>>
>> Reversing the order in which memory gets allocated + used during boot
>> (in a patch by me) might have revealed an invalid memory access during boot.
>>
>> I suspect that that issue would no longer get detected with your patch,
>> as the invalid memory access would simply not get detected. Now, I
>> cannot prove that :)
> 
> This looks like a good example.
> 
> Ok, what we can do is:
> 
> 1. For KASAN_GENERIC: leave everything as is to be able to detect
> these boot-time bugs.
> 
> 2. For KASAN_SW_TAGS: remove boot-time poisoning via
> kasan_free_pages(), but use the "invalid" tag as the default shadow
> value. The end result should be the same: bad accesses will be
> detected. For unallocated memory as it has the default "invalid" tag,
> and for allocated memory as it's poisoned properly when
> allocated/freed.
> 
> 3. For KASAN_HW_TAGS: just remove boot-time poisoning via
> kasan_free_pages(). As the memory tags have a random unspecified
> value, we'll still have a 15/16 chance to detect a memory corruption.
> 
> This also makes sense from the performance perspective: KASAN_GENERIC
> isn't meant to be running in production, so having a larger perf
> impact is acceptable. The other two modes will be faster.

Sounds in principle sane to me.

Side note: I am not sure if anybody runs KASAN in production. Memory is 
expensive. Feel free to prove me wrong, I'd be very interested in actual
users.
Andrey Konovalov Feb. 18, 2021, 8:26 p.m. UTC | #4
On Thu, Feb 18, 2021 at 8:46 PM David Hildenbrand <david@redhat.com> wrote:
>
> > 1. For KASAN_GENERIC: leave everything as is to be able to detect
> > these boot-time bugs.
> >
> > 2. For KASAN_SW_TAGS: remove boot-time poisoning via
> > kasan_free_pages(), but use the "invalid" tag as the default shadow
> > value. The end result should be the same: bad accesses will be
> > detected. For unallocated memory as it has the default "invalid" tag,
> > and for allocated memory as it's poisoned properly when
> > allocated/freed.
> >
> > 3. For KASAN_HW_TAGS: just remove boot-time poisoning via
> > kasan_free_pages(). As the memory tags have a random unspecified
> > value, we'll still have a 15/16 chance to detect a memory corruption.
> >
> > This also makes sense from the performance perspective: KASAN_GENERIC
> > isn't meant to be running in production, so having a larger perf
> > impact is acceptable. The other two modes will be faster.
>
> Sounds in principle sane to me.

I'll post a v2 soon, thanks!

> Side note: I am not sure if anybody runs KASAN in production. Memory is
> expensive. Feel free to prove me wrong, I'd be very interested in actual
> users.

We run KASAN_SW_TAGS on some dogfood testing devices, and
KASAN_HW_TAGS is being developed with the goal to be running in
production.
George Kennedy Feb. 19, 2021, 12:06 a.m. UTC | #5
On 2/18/2021 3:55 AM, David Hildenbrand wrote:
> On 17.02.21 21:56, Andrey Konovalov wrote:
>> During boot, all non-reserved memblock memory is exposed to the buddy
>> allocator. Poisoning all that memory with KASAN lengthens boot time,
>> especially on systems with large amount of RAM. This patch makes
>> page_alloc to not call kasan_free_pages() on all new memory.
>>
>> __free_pages_core() is used when exposing fresh memory during system
>> boot and when onlining memory during hotplug. This patch adds a new
>> FPI_SKIP_KASAN_POISON flag and passes it to __free_pages_ok() through
>> free_pages_prepare() from __free_pages_core().
>>
>> This has little impact on KASAN memory tracking.
>>
>> Assuming that there are no references to newly exposed pages before they
>> are ever allocated, there won't be any intended (but buggy) accesses to
>> that memory that KASAN would normally detect.
>>
>> However, with this patch, KASAN stops detecting wild and large
>> out-of-bounds accesses that happen to land on a fresh memory page that
>> was never allocated. This is taken as an acceptable trade-off.
>>
>> All memory allocated normally when the boot is over keeps getting
>> poisoned as usual.
>>
>> Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
>> Change-Id: Iae6b1e4bb8216955ffc14af255a7eaaa6f35324d
>
> Not sure this is the right thing to do, see
>
> https://lkml.kernel.org/r/bcf8925d-0949-3fe1-baa8-cc536c529860@oracle.com
>
> Reversing the order in which memory gets allocated + used during boot 
> (in a patch by me) might have revealed an invalid memory access during 
> boot.
>
> I suspect that that issue would no longer get detected with your 
> patch, as the invalid memory access would simply not get detected. 
> Now, I cannot prove that :)

Since David's patch we're having trouble with the iBFT ACPI table, which 
is mapped in via kmap() - see acpi_map() in "drivers/acpi/osl.c". KASAN 
detects that it is being used after free when ibft_init() accesses the 
iBFT table, but as of yet we can't find where it gets freed (we've
instrumented calls to kunmap()).

Thank you,
George
Andrey Konovalov Feb. 19, 2021, 12:09 a.m. UTC | #6
On Fri, Feb 19, 2021 at 1:06 AM George Kennedy
<george.kennedy@oracle.com> wrote:
>
>
>
> On 2/18/2021 3:55 AM, David Hildenbrand wrote:
> > On 17.02.21 21:56, Andrey Konovalov wrote:
> >> During boot, all non-reserved memblock memory is exposed to the buddy
> >> allocator. Poisoning all that memory with KASAN lengthens boot time,
> >> especially on systems with large amount of RAM. This patch makes
> >> page_alloc to not call kasan_free_pages() on all new memory.
> >>
> >> __free_pages_core() is used when exposing fresh memory during system
> >> boot and when onlining memory during hotplug. This patch adds a new
> >> FPI_SKIP_KASAN_POISON flag and passes it to __free_pages_ok() through
> >> free_pages_prepare() from __free_pages_core().
> >>
> >> This has little impact on KASAN memory tracking.
> >>
> >> Assuming that there are no references to newly exposed pages before they
> >> are ever allocated, there won't be any intended (but buggy) accesses to
> >> that memory that KASAN would normally detect.
> >>
> >> However, with this patch, KASAN stops detecting wild and large
> >> out-of-bounds accesses that happen to land on a fresh memory page that
> >> was never allocated. This is taken as an acceptable trade-off.
> >>
> >> All memory allocated normally when the boot is over keeps getting
> >> poisoned as usual.
> >>
> >> Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
> >> Change-Id: Iae6b1e4bb8216955ffc14af255a7eaaa6f35324d
> >
> > Not sure this is the right thing to do, see
> >
> > https://lkml.kernel.org/r/bcf8925d-0949-3fe1-baa8-cc536c529860@oracle.com
> >
> > Reversing the order in which memory gets allocated + used during boot
> > (in a patch by me) might have revealed an invalid memory access during
> > boot.
> >
> > I suspect that that issue would no longer get detected with your
> > patch, as the invalid memory access would simply not get detected.
> > Now, I cannot prove that :)
>
> Since David's patch we're having trouble with the iBFT ACPI table, which
> is mapped in via kmap() - see acpi_map() in "drivers/acpi/osl.c". KASAN
> detects that it is being used after free when ibft_init() accesses the
> iBFT table, but as of yet we can't find where it gets freed (we've
> instrumented calls to kunmap()).

Maybe it doesn't get freed, but what you see is a wild or a large
out-of-bounds access. Since KASAN marks all memory as freed during the
memblock->page_alloc transition, such bugs can manifest as
use-after-frees.
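
Why a wild access can be reported as a use-after-free can be shown with a toy shadow map. This is a sketch under stated assumptions: one shadow entry per page, two states only, and hypothetical toy_* names that do not correspond to real KASAN functions.

```c
#include <assert.h>
#include <stddef.h>

enum toy_shadow { TOY_ACCESSIBLE = 0, TOY_FREED = 1 };
enum toy_report { TOY_OK, TOY_USE_AFTER_FREE };

/* Models the memblock->page_alloc transition: every page is marked freed. */
void toy_release_boot_memory(enum toy_shadow *shadow, size_t npages)
{
	for (size_t i = 0; i < npages; i++)
		shadow[i] = TOY_FREED;
}

/* Models a page allocation: the page becomes accessible. */
void toy_alloc_page(enum toy_shadow *shadow, size_t npages, size_t pfn)
{
	assert(pfn < npages);
	shadow[pfn] = TOY_ACCESSIBLE;
}

/* The checker only sees the shadow state, not the page's history. */
enum toy_report toy_check_access(const enum toy_shadow *shadow, size_t npages,
				 size_t pfn)
{
	assert(pfn < npages);
	return shadow[pfn] == TOY_FREED ? TOY_USE_AFTER_FREE : TOY_OK;
}
```

In this model, an access to a page that was never allocated produces the same report as an access to a page that was allocated and later freed, because both carry the "freed" shadow state.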
George Kennedy Feb. 19, 2021, 4:45 p.m. UTC | #7
On 2/18/2021 7:09 PM, Andrey Konovalov wrote:
> On Fri, Feb 19, 2021 at 1:06 AM George Kennedy
> <george.kennedy@oracle.com> wrote:
>>
>>
>> On 2/18/2021 3:55 AM, David Hildenbrand wrote:
>>> On 17.02.21 21:56, Andrey Konovalov wrote:
>>>> During boot, all non-reserved memblock memory is exposed to the buddy
>>>> allocator. Poisoning all that memory with KASAN lengthens boot time,
>>>> especially on systems with large amount of RAM. This patch makes
>>>> page_alloc to not call kasan_free_pages() on all new memory.
>>>>
>>>> __free_pages_core() is used when exposing fresh memory during system
>>>> boot and when onlining memory during hotplug. This patch adds a new
>>>> FPI_SKIP_KASAN_POISON flag and passes it to __free_pages_ok() through
>>>> free_pages_prepare() from __free_pages_core().
>>>>
>>>> This has little impact on KASAN memory tracking.
>>>>
>>>> Assuming that there are no references to newly exposed pages before they
>>>> are ever allocated, there won't be any intended (but buggy) accesses to
>>>> that memory that KASAN would normally detect.
>>>>
>>>> However, with this patch, KASAN stops detecting wild and large
>>>> out-of-bounds accesses that happen to land on a fresh memory page that
>>>> was never allocated. This is taken as an acceptable trade-off.
>>>>
>>>> All memory allocated normally when the boot is over keeps getting
>>>> poisoned as usual.
>>>>
>>>> Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
>>>> Change-Id: Iae6b1e4bb8216955ffc14af255a7eaaa6f35324d
>>> Not sure this is the right thing to do, see
>>>
>>> https://lkml.kernel.org/r/bcf8925d-0949-3fe1-baa8-cc536c529860@oracle.com
>>>
>>> Reversing the order in which memory gets allocated + used during boot
>>> (in a patch by me) might have revealed an invalid memory access during
>>> boot.
>>>
>>> I suspect that that issue would no longer get detected with your
>>> patch, as the invalid memory access would simply not get detected.
>>> Now, I cannot prove that :)
>> Since David's patch we're having trouble with the iBFT ACPI table, which
>> is mapped in via kmap() - see acpi_map() in "drivers/acpi/osl.c". KASAN
>> detects that it is being used after free when ibft_init() accesses the
>> iBFT table, but as of yet we can't find where it gets freed (we've
>> instrumented calls to kunmap()).
> Maybe it doesn't get freed, but what you see is a wild or a large
> out-of-bounds access. Since KASAN marks all memory as freed during the
> memblock->page_alloc transition, such bugs can manifest as
> use-after-frees.

It gets freed and re-used. By the time the iBFT table is accessed by
ibft_init() the page has been overwritten.

Setting page flags like the following before the call to kmap() prevents 
the iBFT table page from being freed:

diff --git a/drivers/acpi/osl.c b/drivers/acpi/osl.c
index 0418feb..41c1bbd 100644
--- a/drivers/acpi/osl.c
+++ b/drivers/acpi/osl.c
@@ -287,9 +287,14 @@ static void __iomem *acpi_map(acpi_physical_address pg_off, unsigned long pg_sz)

         pfn = pg_off >> PAGE_SHIFT;
         if (should_use_kmap(pfn)) {
+               struct page *page =  pfn_to_page(pfn);
+
                 if (pg_sz > PAGE_SIZE)
                         return NULL;
-               return (void __iomem __force *)kmap(pfn_to_page(pfn));
+
+               page->flags |= ((1UL << PG_unevictable) | (1UL << PG_reserved) | (1UL << PG_locked));
+
+               return (void __iomem __force *)kmap(page);
         } else
                 return acpi_os_ioremap(pg_off, pg_sz);
  }

Just not sure of the correct way to set the page flags.

George
George Kennedy Feb. 19, 2021, 11:04 p.m. UTC | #8
On 2/19/2021 11:45 AM, George Kennedy wrote:
>
>
> On 2/18/2021 7:09 PM, Andrey Konovalov wrote:
>> On Fri, Feb 19, 2021 at 1:06 AM George Kennedy
>> <george.kennedy@oracle.com> wrote:
>>>
>>>
>>> On 2/18/2021 3:55 AM, David Hildenbrand wrote:
>>>> On 17.02.21 21:56, Andrey Konovalov wrote:
>>>>> During boot, all non-reserved memblock memory is exposed to the buddy
>>>>> allocator. Poisoning all that memory with KASAN lengthens boot time,
>>>>> especially on systems with large amount of RAM. This patch makes
>>>>> page_alloc to not call kasan_free_pages() on all new memory.
>>>>>
>>>>> __free_pages_core() is used when exposing fresh memory during system
>>>>> boot and when onlining memory during hotplug. This patch adds a new
>>>>> FPI_SKIP_KASAN_POISON flag and passes it to __free_pages_ok() through
>>>>> free_pages_prepare() from __free_pages_core().
>>>>>
>>>>> This has little impact on KASAN memory tracking.
>>>>>
>>>>> Assuming that there are no references to newly exposed pages 
>>>>> before they
>>>>> are ever allocated, there won't be any intended (but buggy) 
>>>>> accesses to
>>>>> that memory that KASAN would normally detect.
>>>>>
>>>>> However, with this patch, KASAN stops detecting wild and large
>>>>> out-of-bounds accesses that happen to land on a fresh memory page 
>>>>> that
>>>>> was never allocated. This is taken as an acceptable trade-off.
>>>>>
>>>>> All memory allocated normally when the boot is over keeps getting
>>>>> poisoned as usual.
>>>>>
>>>>> Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
>>>>> Change-Id: Iae6b1e4bb8216955ffc14af255a7eaaa6f35324d
>>>> Not sure this is the right thing to do, see
>>>>
>>>> https://lkml.kernel.org/r/bcf8925d-0949-3fe1-baa8-cc536c529860@oracle.com 
>>>>
>>>>
>>>> Reversing the order in which memory gets allocated + used during boot
>>>> (in a patch by me) might have revealed an invalid memory access during
>>>> boot.
>>>>
>>>> I suspect that that issue would no longer get detected with your
>>>> patch, as the invalid memory access would simply not get detected.
>>>> Now, I cannot prove that :)
>>> Since David's patch we're having trouble with the iBFT ACPI table, 
>>> which
>>> is mapped in via kmap() - see acpi_map() in "drivers/acpi/osl.c". KASAN
>>> detects that it is being used after free when ibft_init() accesses the
>>> iBFT table, but as of yet we can't find where it gets freed (we've
>>> instrumented calls to kunmap()).
>> Maybe it doesn't get freed, but what you see is a wild or a large
>> out-of-bounds access. Since KASAN marks all memory as freed during the
>> memblock->page_alloc transition, such bugs can manifest as
>> use-after-frees.
>
> It gets freed and re-used. By the time the iBFT table is accessed by 
> ibft_init() the page has been over-written.
>
> Setting page flags like the following before the call to kmap() 
> prevents the iBFT table page from being freed:

Cleaned up version:

diff --git a/drivers/acpi/osl.c b/drivers/acpi/osl.c
index 0418feb..8f0a8e7 100644
--- a/drivers/acpi/osl.c
+++ b/drivers/acpi/osl.c
@@ -287,9 +287,12 @@ static void __iomem *acpi_map(acpi_physical_address pg_off, unsigned long pg_sz)

      pfn = pg_off >> PAGE_SHIFT;
      if (should_use_kmap(pfn)) {
+        struct page *page = pfn_to_page(pfn);
+
          if (pg_sz > PAGE_SIZE)
              return NULL;
-        return (void __iomem __force *)kmap(pfn_to_page(pfn));
+        SetPageReserved(page);
+        return (void __iomem __force *)kmap(page);
      } else
          return acpi_os_ioremap(pg_off, pg_sz);
  }
@@ -299,9 +302,12 @@ static void acpi_unmap(acpi_physical_address pg_off, void __iomem *vaddr)
      unsigned long pfn;

      pfn = pg_off >> PAGE_SHIFT;
-    if (should_use_kmap(pfn))
-        kunmap(pfn_to_page(pfn));
-    else
+    if (should_use_kmap(pfn)) {
+        struct page *page = pfn_to_page(pfn);
+
+        ClearPageReserved(page);
+        kunmap(page);
+    } else
          iounmap(vaddr);
  }

David, the above works, but I'm wondering why it is now necessary. kunmap()
is not hit. What other ways could a page mapped via kmap() be unmapped?

Thank you,
George

>
> diff --git a/drivers/acpi/osl.c b/drivers/acpi/osl.c
> index 0418feb..41c1bbd 100644
> --- a/drivers/acpi/osl.c
> +++ b/drivers/acpi/osl.c
> @@ -287,9 +287,14 @@ static void __iomem 
> *acpi_map(acpi_physical_address pg_off, unsigned long pg_sz)
>
>         pfn = pg_off >> PAGE_SHIFT;
>         if (should_use_kmap(pfn)) {
> +               struct page *page =  pfn_to_page(pfn);
> +
>                 if (pg_sz > PAGE_SIZE)
>                         return NULL;
> -               return (void __iomem __force *)kmap(pfn_to_page(pfn));
> +
> +               page->flags |= ((1UL << PG_unevictable) | (1UL << 
> PG_reserved) | (1UL << PG_locked));
> +
> +               return (void __iomem __force *)kmap(page);
>         } else
>                 return acpi_os_ioremap(pg_off, pg_sz);
>  }
>
> Just not sure of the correct way to set the page flags.
>
> George
>
David Hildenbrand Feb. 22, 2021, 9:52 a.m. UTC | #9
On 20.02.21 00:04, George Kennedy wrote:
> 
> 
> On 2/19/2021 11:45 AM, George Kennedy wrote:
>>
>>
>> On 2/18/2021 7:09 PM, Andrey Konovalov wrote:
>>> On Fri, Feb 19, 2021 at 1:06 AM George Kennedy
>>> <george.kennedy@oracle.com> wrote:
>>>>
>>>>
>>>> On 2/18/2021 3:55 AM, David Hildenbrand wrote:
>>>>> On 17.02.21 21:56, Andrey Konovalov wrote:
>>>>>> During boot, all non-reserved memblock memory is exposed to the buddy
>>>>>> allocator. Poisoning all that memory with KASAN lengthens boot time,
>>>>>> especially on systems with large amount of RAM. This patch makes
>>>>>> page_alloc to not call kasan_free_pages() on all new memory.
>>>>>>
>>>>>> __free_pages_core() is used when exposing fresh memory during system
>>>>>> boot and when onlining memory during hotplug. This patch adds a new
>>>>>> FPI_SKIP_KASAN_POISON flag and passes it to __free_pages_ok() through
>>>>>> free_pages_prepare() from __free_pages_core().
>>>>>>
>>>>>> This has little impact on KASAN memory tracking.
>>>>>>
>>>>>> Assuming that there are no references to newly exposed pages
>>>>>> before they
>>>>>> are ever allocated, there won't be any intended (but buggy)
>>>>>> accesses to
>>>>>> that memory that KASAN would normally detect.
>>>>>>
>>>>>> However, with this patch, KASAN stops detecting wild and large
>>>>>> out-of-bounds accesses that happen to land on a fresh memory page
>>>>>> that
>>>>>> was never allocated. This is taken as an acceptable trade-off.
>>>>>>
>>>>>> All memory allocated normally when the boot is over keeps getting
>>>>>> poisoned as usual.
>>>>>>
>>>>>> Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
>>>>>> Change-Id: Iae6b1e4bb8216955ffc14af255a7eaaa6f35324d
>>>>> Not sure this is the right thing to do, see
>>>>>
>>>>> https://lkml.kernel.org/r/bcf8925d-0949-3fe1-baa8-cc536c529860@oracle.com
>>>>>
>>>>>
>>>>> Reversing the order in which memory gets allocated + used during boot
>>>>> (in a patch by me) might have revealed an invalid memory access during
>>>>> boot.
>>>>>
>>>>> I suspect that that issue would no longer get detected with your
>>>>> patch, as the invalid memory access would simply not get detected.
>>>>> Now, I cannot prove that :)
>>>> Since David's patch we're having trouble with the iBFT ACPI table,
>>>> which
>>>> is mapped in via kmap() - see acpi_map() in "drivers/acpi/osl.c". KASAN
>>>> detects that it is being used after free when ibft_init() accesses the
>>>> iBFT table, but as of yet we can't find where it gets freed (we've
>>>> instrumented calls to kunmap()).
>>> Maybe it doesn't get freed, but what you see is a wild or a large
>>> out-of-bounds access. Since KASAN marks all memory as freed during the
>>> memblock->page_alloc transition, such bugs can manifest as
>>> use-after-frees.
>>
>> It gets freed and re-used. By the time the iBFT table is accessed by
>> ibft_init() the page has been over-written.
>>
>> Setting page flags like the following before the call to kmap()
>> prevents the iBFT table page from being freed:
> 
> Cleaned up version:
> 
> diff --git a/drivers/acpi/osl.c b/drivers/acpi/osl.c
> index 0418feb..8f0a8e7 100644
> --- a/drivers/acpi/osl.c
> +++ b/drivers/acpi/osl.c
> @@ -287,9 +287,12 @@ static void __iomem *acpi_map(acpi_physical_address
> pg_off, unsigned long pg_sz)
> 
>        pfn = pg_off >> PAGE_SHIFT;
>        if (should_use_kmap(pfn)) {
> +        struct page *page = pfn_to_page(pfn);
> +
>            if (pg_sz > PAGE_SIZE)
>                return NULL;
> -        return (void __iomem __force *)kmap(pfn_to_page(pfn));
> +        SetPageReserved(page);
> +        return (void __iomem __force *)kmap(page);
>        } else
>            return acpi_os_ioremap(pg_off, pg_sz);
>    }
> @@ -299,9 +302,12 @@ static void acpi_unmap(acpi_physical_address
> pg_off, void __iomem *vaddr)
>        unsigned long pfn;
> 
>        pfn = pg_off >> PAGE_SHIFT;
> -    if (should_use_kmap(pfn))
> -        kunmap(pfn_to_page(pfn));
> -    else
> +    if (should_use_kmap(pfn)) {
> +        struct page *page = pfn_to_page(pfn);
> +
> +        ClearPageReserved(page);
> +        kunmap(page);
> +    } else
>            iounmap(vaddr);
>    }
> 
> David, the above works, but wondering why it is now necessary. kunmap()
> is not hit. What other ways could a page mapped via kmap() be unmapped?
> 

Let me look into the code ... I have little experience with ACPI 
details, so bear with me.

I assume that acpi_map()/acpi_unmap() map some firmware blob that is 
provided via firmware/bios/... to us.

should_use_kmap() tells us whether
a) we have a "struct page" and should kmap() that one
b) we don't have a "struct page" and should ioremap.

As it is a blob, the firmware should always reserve that memory region 
via memblock (e.g., memblock_reserve()), such that we either
1) don't create a memmap ("struct page") at all (-> case b) )
2) if we have to create a memmap, we mark the page PG_reserved and
    *never* expose it to the buddy (-> case a) )


Are you telling me that in this case we might have a memmap for the HW 
blob that is *not* PG_reserved? In that case it most probably got 
exposed to the buddy where it can happily get allocated/freed.

The latent BUG would be that that blob gets exposed to the system like 
ordinary RAM, and not reserved via memblock early during boot. Assuming 
that blob has a low physical address, with my patch it will get 
allocated/used a lot earlier - which would mean we trigger this latent 
BUG now more easily.

There have been similar latent BUGs on ARM boards that my patch 
discovered where special RAM regions did not get marked as reserved via 
the device tree properly.

Now, this is just a wild guess :) Can you dump the page when mapping
(before SetPageReserved()) and when unmapping, to see what the state of
that memmap is?
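
The suspected latent bug can be illustrated with a toy model of the handover from memblock to the buddy allocator. This is only a sketch under stated assumptions: the toy_* names and the flat page-frame array are hypothetical, and the real memblock and page_alloc code is far more involved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* A reserved physical range in page frame numbers, [start_pfn, end_pfn). */
struct toy_range {
	size_t start_pfn;
	size_t end_pfn;
};

static bool toy_is_reserved(const struct toy_range *reserved, size_t nres,
			    size_t pfn)
{
	for (size_t i = 0; i < nres; i++)
		if (pfn >= reserved[i].start_pfn && pfn < reserved[i].end_pfn)
			return true;
	return false;
}

/*
 * Hand every non-reserved page to the (toy) buddy allocator.
 * in_buddy[pfn] is set to true for pages that become allocatable.
 * Returns the number of pages released.
 */
size_t toy_release_to_buddy(const struct toy_range *reserved, size_t nres,
			    bool *in_buddy, size_t npages)
{
	size_t released = 0;

	assert(in_buddy != NULL);
	for (size_t pfn = 0; pfn < npages; pfn++) {
		in_buddy[pfn] = !toy_is_reserved(reserved, nres, pfn);
		if (in_buddy[pfn])
			released++;
	}
	return released;
}
```

If a firmware blob sits at page frame 3 and no range covering it is passed in as reserved, the model hands that page to the allocator like ordinary RAM; passing a reserved range { 3, 4 } keeps it out, which is the behaviour the reasoning above expects from a correct early reservation.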
George Kennedy Feb. 22, 2021, 3:13 p.m. UTC | #10
On 2/22/2021 4:52 AM, David Hildenbrand wrote:
> On 20.02.21 00:04, George Kennedy wrote:
>>
>>
>> On 2/19/2021 11:45 AM, George Kennedy wrote:
>>>
>>>
>>> On 2/18/2021 7:09 PM, Andrey Konovalov wrote:
>>>> On Fri, Feb 19, 2021 at 1:06 AM George Kennedy
>>>> <george.kennedy@oracle.com> wrote:
>>>>>
>>>>>
>>>>> On 2/18/2021 3:55 AM, David Hildenbrand wrote:
>>>>>> On 17.02.21 21:56, Andrey Konovalov wrote:
>>>>>>> During boot, all non-reserved memblock memory is exposed to the 
>>>>>>> buddy
>>>>>>> allocator. Poisoning all that memory with KASAN lengthens boot 
>>>>>>> time,
>>>>>>> especially on systems with large amount of RAM. This patch makes
>>>>>>> page_alloc to not call kasan_free_pages() on all new memory.
>>>>>>>
>>>>>>> __free_pages_core() is used when exposing fresh memory during 
>>>>>>> system
>>>>>>> boot and when onlining memory during hotplug. This patch adds a new
>>>>>>> FPI_SKIP_KASAN_POISON flag and passes it to __free_pages_ok() 
>>>>>>> through
>>>>>>> free_pages_prepare() from __free_pages_core().
>>>>>>>
>>>>>>> This has little impact on KASAN memory tracking.
>>>>>>>
>>>>>>> Assuming that there are no references to newly exposed pages
>>>>>>> before they
>>>>>>> are ever allocated, there won't be any intended (but buggy)
>>>>>>> accesses to
>>>>>>> that memory that KASAN would normally detect.
>>>>>>>
>>>>>>> However, with this patch, KASAN stops detecting wild and large
>>>>>>> out-of-bounds accesses that happen to land on a fresh memory page
>>>>>>> that
>>>>>>> was never allocated. This is taken as an acceptable trade-off.
>>>>>>>
>>>>>>> All memory allocated normally when the boot is over keeps getting
>>>>>>> poisoned as usual.
>>>>>>>
>>>>>>> Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
>>>>>>> Change-Id: Iae6b1e4bb8216955ffc14af255a7eaaa6f35324d
>>>>>> Not sure this is the right thing to do, see
>>>>>>
>>>>>> https://lkml.kernel.org/r/bcf8925d-0949-3fe1-baa8-cc536c529860@oracle.com 
>>>>>>
>>>>>>
>>>>>>
>>>>>> Reversing the order in which memory gets allocated + used during 
>>>>>> boot
>>>>>> (in a patch by me) might have revealed an invalid memory access 
>>>>>> during
>>>>>> boot.
>>>>>>
>>>>>> I suspect that that issue would no longer get detected with your
>>>>>> patch, as the invalid memory access would simply not get detected.
>>>>>> Now, I cannot prove that :)
>>>>> Since David's patch we're having trouble with the iBFT ACPI table,
>>>>> which
>>>>> is mapped in via kmap() - see acpi_map() in "drivers/acpi/osl.c". 
>>>>> KASAN
>>>>> detects that it is being used after free when ibft_init() accesses 
>>>>> the
>>>>> iBFT table, but as of yet we can't find where it gets freed (we've
>>>>> instrumented calls to kunmap()).
>>>> Maybe it doesn't get freed, but what you see is a wild or a large
>>>> out-of-bounds access. Since KASAN marks all memory as freed during the
>>>> memblock->page_alloc transition, such bugs can manifest as
>>>> use-after-frees.
>>>
>>> It gets freed and re-used. By the time the iBFT table is accessed by
>>> ibft_init() the page has been over-written.
>>>
>>> Setting page flags like the following before the call to kmap()
>>> prevents the iBFT table page from being freed:
>>
>> Cleaned up version:
>>
>> diff --git a/drivers/acpi/osl.c b/drivers/acpi/osl.c
>> index 0418feb..8f0a8e7 100644
>> --- a/drivers/acpi/osl.c
>> +++ b/drivers/acpi/osl.c
>> @@ -287,9 +287,12 @@ static void __iomem *acpi_map(acpi_physical_address
>> pg_off, unsigned long pg_sz)
>>
>>        pfn = pg_off >> PAGE_SHIFT;
>>        if (should_use_kmap(pfn)) {
>> +        struct page *page = pfn_to_page(pfn);
>> +
>>            if (pg_sz > PAGE_SIZE)
>>                return NULL;
>> -        return (void __iomem __force *)kmap(pfn_to_page(pfn));
>> +        SetPageReserved(page);
>> +        return (void __iomem __force *)kmap(page);
>>        } else
>>            return acpi_os_ioremap(pg_off, pg_sz);
>>    }
>> @@ -299,9 +302,12 @@ static void acpi_unmap(acpi_physical_address
>> pg_off, void __iomem *vaddr)
>>        unsigned long pfn;
>>
>>        pfn = pg_off >> PAGE_SHIFT;
>> -    if (should_use_kmap(pfn))
>> -        kunmap(pfn_to_page(pfn));
>> -    else
>> +    if (should_use_kmap(pfn)) {
>> +        struct page *page = pfn_to_page(pfn);
>> +
>> +        ClearPageReserved(page);
>> +        kunmap(page);
>> +    } else
>>            iounmap(vaddr);
>>    }
>>
>> David, the above works, but wondering why it is now necessary. kunmap()
>> is not hit. What other ways could a page mapped via kmap() be unmapped?
>>
>
> Let me look into the code ... I have little experience with ACPI 
> details, so bear with me.
>
> I assume that acpi_map()/acpi_unmap() map some firmware blob that is 
> provided via firmware/bios/... to us.
>
> should_use_kmap() tells us whether
> a) we have a "struct page" and should kmap() that one
> b) we don't have a "struct page" and should ioremap.
>
> As it is a blob, the firmware should always reserve that memory region 
> via memblock (e.g., memblock_reserve()), such that we either
> 1) don't create a memmap ("struct page") at all (-> case b) )
> 2) if we have to create a memmap, we mark the page PG_reserved and
>    *never* expose it to the buddy (-> case a) )
>
>
> Are you telling me that in this case we might have a memmap for the HW 
> blob that is *not* PG_reserved? In that case it most probably got 
> exposed to the buddy where it can happily get allocated/freed.
>
> The latent BUG would be that that blob gets exposed to the system like 
> ordinary RAM, and not reserved via memblock early during boot. 
> Assuming that blob has a low physical address, with my patch it will 
> get allocated/used a lot earlier - which would mean we trigger this 
> latent BUG now more easily.
>
> There have been similar latent BUGs on ARM boards that my patch 
> discovered where special RAM regions did not get marked as reserved 
> via the device tree properly.
>
> Now, this is just a wild guess :) Can you dump the page when mapping 
> (before PageReserved()) and when unmapping, to see what the state of 
> that memmap is?

Thank you David for the explanation and your help on this,

dump_page() before SetPageReserved() and before kmap() in the above patch:

[    1.116480] ACPI: Core revision 20201113
[    1.117628] XXX acpi_map: about to call kmap()...
[    1.118561] page:ffffea0002f914c0 refcount:0 mapcount:0 
mapping:0000000000000000 index:0x0 pfn:0xbe453
[    1.120381] flags: 0xfffffc0000000()
[    1.121116] raw: 000fffffc0000000 ffffea0002f914c8 ffffea0002f914c8 
0000000000000000
[    1.122638] raw: 0000000000000000 0000000000000000 00000000ffffffff 
0000000000000000
[    1.124146] page dumped because: acpi_map pre SetPageReserved

I also added dump_page() before unmapping, but it is not hit. The 
following now shows up for the same pfn, I believe as a result of 
setting PageReserved:

[   28.098208] BUG: Bad page state in process modprobe  pfn:be453
[   28.098394] page:ffffea0002f914c0 refcount:0 mapcount:0 
mapping:0000000000000000 index:0x1 pfn:0xbe453
[   28.098394] flags: 0xfffffc0001000(reserved)
[   28.098394] raw: 000fffffc0001000 dead000000000100 dead000000000122 
0000000000000000
[   28.098394] raw: 0000000000000001 0000000000000000 00000000ffffffff 
0000000000000000
[   28.098394] page dumped because: PAGE_FLAGS_CHECK_AT_PREP flag(s) set
[   28.098394] page_owner info is not present (never set?)
[   28.098394] Modules linked in:
[   28.098394] CPU: 2 PID: 204 Comm: modprobe Not tainted 5.11.0-3dbd5e3 #66
[   28.098394] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), 
BIOS 0.0.0 02/06/2015
[   28.098394] Call Trace:
[   28.098394]  dump_stack+0xdb/0x120
[   28.098394]  bad_page.cold.108+0xc6/0xcb
[   28.098394]  check_new_page_bad+0x47/0xa0
[   28.098394]  get_page_from_freelist+0x30cd/0x5730
[   28.098394]  ? __isolate_free_page+0x4f0/0x4f0
[   28.098394]  ? init_object+0x7e/0x90
[   28.098394]  __alloc_pages_nodemask+0x2d8/0x650
[   28.098394]  ? write_comp_data+0x2f/0x90
[   28.098394]  ? __alloc_pages_slowpath.constprop.103+0x2110/0x2110
[   28.098394]  ? __sanitizer_cov_trace_pc+0x21/0x50
[   28.098394]  alloc_pages_vma+0xe2/0x560
[   28.098394]  do_fault+0x194/0x12c0
[   28.098394]  ? write_comp_data+0x2f/0x90
[   28.098394]  __handle_mm_fault+0x1650/0x26c0
[   28.098394]  ? copy_page_range+0x1350/0x1350
[   28.098394]  ? write_comp_data+0x2f/0x90
[   28.098394]  ? write_comp_data+0x2f/0x90
[   28.098394]  handle_mm_fault+0x1f9/0x810
[   28.098394]  ? write_comp_data+0x2f/0x90
[   28.098394]  do_user_addr_fault+0x6f7/0xca0
[   28.098394]  exc_page_fault+0xaf/0x1a0
[   28.098394]  asm_exc_page_fault+0x1e/0x30
[   28.098394] RIP: 0010:__clear_user+0x30/0x60

What would be the correct way to reserve the page so that the above 
would not be hit?

BTW, this is running with Konrad's patch that pairs acpi_get_table() and 
acpi_put_table() for the iBFT table, which should result in an eventual 
call to acpi_unmap() and kunmap(), though that does not occur. This could 
be an ACPI page refcount issue that will have to be looked into.

George
David Hildenbrand Feb. 22, 2021, 4:13 p.m. UTC | #11
On 22.02.21 16:13, George Kennedy wrote:
> 
> 
> On 2/22/2021 4:52 AM, David Hildenbrand wrote:
>> On 20.02.21 00:04, George Kennedy wrote:
>>>
>>>
>>> On 2/19/2021 11:45 AM, George Kennedy wrote:
>>>>
>>>>
>>>> On 2/18/2021 7:09 PM, Andrey Konovalov wrote:
>>>>> On Fri, Feb 19, 2021 at 1:06 AM George Kennedy
>>>>> <george.kennedy@oracle.com> wrote:
>>>>>>
>>>>>>
>>>>>> On 2/18/2021 3:55 AM, David Hildenbrand wrote:
>>>>>>> On 17.02.21 21:56, Andrey Konovalov wrote:
>>>>>>>> During boot, all non-reserved memblock memory is exposed to the
>>>>>>>> buddy
>>>>>>>> allocator. Poisoning all that memory with KASAN lengthens boot
>>>>>>>> time,
>>>>>>>> especially on systems with large amount of RAM. This patch makes
>>>>>>>> page_alloc to not call kasan_free_pages() on all new memory.
>>>>>>>>
>>>>>>>> __free_pages_core() is used when exposing fresh memory during
>>>>>>>> system
>>>>>>>> boot and when onlining memory during hotplug. This patch adds a new
>>>>>>>> FPI_SKIP_KASAN_POISON flag and passes it to __free_pages_ok()
>>>>>>>> through
>>>>>>>> free_pages_prepare() from __free_pages_core().
>>>>>>>>
>>>>>>>> This has little impact on KASAN memory tracking.
>>>>>>>>
>>>>>>>> Assuming that there are no references to newly exposed pages
>>>>>>>> before they
>>>>>>>> are ever allocated, there won't be any intended (but buggy)
>>>>>>>> accesses to
>>>>>>>> that memory that KASAN would normally detect.
>>>>>>>>
>>>>>>>> However, with this patch, KASAN stops detecting wild and large
>>>>>>>> out-of-bounds accesses that happen to land on a fresh memory page
>>>>>>>> that
>>>>>>>> was never allocated. This is taken as an acceptable trade-off.
>>>>>>>>
>>>>>>>> All memory allocated normally when the boot is over keeps getting
>>>>>>>> poisoned as usual.
>>>>>>>>
>>>>>>>> Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
>>>>>>>> Change-Id: Iae6b1e4bb8216955ffc14af255a7eaaa6f35324d
>>>>>>> Not sure this is the right thing to do, see
>>>>>>>
>>>>>>> https://lkml.kernel.org/r/bcf8925d-0949-3fe1-baa8-cc536c529860@oracle.com
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Reversing the order in which memory gets allocated + used during
>>>>>>> boot
>>>>>>> (in a patch by me) might have revealed an invalid memory access
>>>>>>> during
>>>>>>> boot.
>>>>>>>
>>>>>>> I suspect that that issue would no longer get detected with your
>>>>>>> patch, as the invalid memory access would simply not get detected.
>>>>>>> Now, I cannot prove that :)
>>>>>> Since David's patch we're having trouble with the iBFT ACPI table,
>>>>>> which
>>>>>> is mapped in via kmap() - see acpi_map() in "drivers/acpi/osl.c".
>>>>>> KASAN
>>>>>> detects that it is being used after free when ibft_init() accesses
>>>>>> the
>>>>>> iBFT table, but as of yet we can't find where it gets freed (we've
>>>>>> instrumented calls to kunmap()).
>>>>> Maybe it doesn't get freed, but what you see is a wild or a large
>>>>> out-of-bounds access. Since KASAN marks all memory as freed during the
>>>>> memblock->page_alloc transition, such bugs can manifest as
>>>>> use-after-frees.
>>>>
>>>> It gets freed and re-used. By the time the iBFT table is accessed by
>>>> ibft_init() the page has been over-written.
>>>>
>>>> Setting page flags like the following before the call to kmap()
>>>> prevents the iBFT table page from being freed:
>>>
>>> Cleaned up version:
>>>
>>> diff --git a/drivers/acpi/osl.c b/drivers/acpi/osl.c
>>> index 0418feb..8f0a8e7 100644
>>> --- a/drivers/acpi/osl.c
>>> +++ b/drivers/acpi/osl.c
>>> @@ -287,9 +287,12 @@ static void __iomem *acpi_map(acpi_physical_address
>>> pg_off, unsigned long pg_sz)
>>>
>>>         pfn = pg_off >> PAGE_SHIFT;
>>>         if (should_use_kmap(pfn)) {
>>> +        struct page *page = pfn_to_page(pfn);
>>> +
>>>             if (pg_sz > PAGE_SIZE)
>>>                 return NULL;
>>> -        return (void __iomem __force *)kmap(pfn_to_page(pfn));
>>> +        SetPageReserved(page);
>>> +        return (void __iomem __force *)kmap(page);
>>>         } else
>>>             return acpi_os_ioremap(pg_off, pg_sz);
>>>     }
>>> @@ -299,9 +302,12 @@ static void acpi_unmap(acpi_physical_address
>>> pg_off, void __iomem *vaddr)
>>>         unsigned long pfn;
>>>
>>>         pfn = pg_off >> PAGE_SHIFT;
>>> -    if (should_use_kmap(pfn))
>>> -        kunmap(pfn_to_page(pfn));
>>> -    else
>>> +    if (should_use_kmap(pfn)) {
>>> +        struct page *page = pfn_to_page(pfn);
>>> +
>>> +        ClearPageReserved(page);
>>> +        kunmap(page);
>>> +    } else
>>>             iounmap(vaddr);
>>>     }
>>>
>>> David, the above works, but wondering why it is now necessary. kunmap()
>>> is not hit. What other ways could a page mapped via kmap() be unmapped?
>>>
>>
>> Let me look into the code ... I have little experience with ACPI
>> details, so bear with me.
>>
>> I assume that acpi_map()/acpi_unmap() map some firmware blob that is
>> provided via firmware/bios/... to us.
>>
>> should_use_kmap() tells us whether
>> a) we have a "struct page" and should kmap() that one
>> b) we don't have a "struct page" and should ioremap.
>>
>> As it is a blob, the firmware should always reserve that memory region
>> via memblock (e.g., memblock_reserve()), such that we either
>> 1) don't create a memmap ("struct page") at all (-> case b) )
>> 2) if we have to create a memmap, we mark the page PG_reserved and
>>     *never* expose it to the buddy (-> case a) )
>>
>>
>> Are you telling me that in this case we might have a memmap for the HW
>> blob that is *not* PG_reserved? In that case it most probably got
>> exposed to the buddy where it can happily get allocated/freed.
>>
>> The latent BUG would be that that blob gets exposed to the system like
>> ordinary RAM, and not reserved via memblock early during boot.
>> Assuming that blob has a low physical address, with my patch it will
>> get allocated/used a lot earlier - which would mean we trigger this
>> latent BUG now more easily.
>>
>> There have been similar latent BUGs on ARM boards that my patch
>> discovered where special RAM regions did not get marked as reserved
>> via the device tree properly.
>>
>> Now, this is just a wild guess :) Can you dump the page when mapping
>> (before PageReserved()) and when unmapping, to see what the state of
>> that memmap is?
> 
> Thank you David for the explanation and your help on this,
> 
> dump_page() before PageReserved and before kmap() in the above patch:
> 
> [    1.116480] ACPI: Core revision 20201113
> [    1.117628] XXX acpi_map: about to call kmap()...
> [    1.118561] page:ffffea0002f914c0 refcount:0 mapcount:0
> mapping:0000000000000000 index:0x0 pfn:0xbe453
> [    1.120381] flags: 0xfffffc0000000()
> [    1.121116] raw: 000fffffc0000000 ffffea0002f914c8 ffffea0002f914c8
> 0000000000000000
> [    1.122638] raw: 0000000000000000 0000000000000000 00000000ffffffff
> 0000000000000000
> [    1.124146] page dumped because: acpi_map pre SetPageReserved
> 
> I also added dump_page() before unmapping, but it is not hit. The
> following for the same pfn now shows up I believe as a result of setting
> PageReserved:
> 
> [   28.098208] BUG: Bad page state in process modprobe  pfn:be453
> [   28.098394] page:ffffea0002f914c0 refcount:0 mapcount:0
> mapping:0000000000000000 index:0x1 pfn:0xbe453
> [   28.098394] flags: 0xfffffc0001000(reserved)
> [   28.098394] raw: 000fffffc0001000 dead000000000100 dead000000000122
> 0000000000000000
> [   28.098394] raw: 0000000000000001 0000000000000000 00000000ffffffff
> 0000000000000000
> [   28.098394] page dumped because: PAGE_FLAGS_CHECK_AT_PREP flag(s) set
> [   28.098394] page_owner info is not present (never set?)
> [   28.098394] Modules linked in:
> [   28.098394] CPU: 2 PID: 204 Comm: modprobe Not tainted 5.11.0-3dbd5e3 #66
> [   28.098394] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
> BIOS 0.0.0 02/06/2015
> [   28.098394] Call Trace:
> [   28.098394]  dump_stack+0xdb/0x120
> [   28.098394]  bad_page.cold.108+0xc6/0xcb
> [   28.098394]  check_new_page_bad+0x47/0xa0
> [   28.098394]  get_page_from_freelist+0x30cd/0x5730
> [   28.098394]  ? __isolate_free_page+0x4f0/0x4f0
> [   28.098394]  ? init_object+0x7e/0x90
> [   28.098394]  __alloc_pages_nodemask+0x2d8/0x650
> [   28.098394]  ? write_comp_data+0x2f/0x90
> [   28.098394]  ? __alloc_pages_slowpath.constprop.103+0x2110/0x2110
> [   28.098394]  ? __sanitizer_cov_trace_pc+0x21/0x50
> [   28.098394]  alloc_pages_vma+0xe2/0x560
> [   28.098394]  do_fault+0x194/0x12c0
> [   28.098394]  ? write_comp_data+0x2f/0x90
> [   28.098394]  __handle_mm_fault+0x1650/0x26c0
> [   28.098394]  ? copy_page_range+0x1350/0x1350
> [   28.098394]  ? write_comp_data+0x2f/0x90
> [   28.098394]  ? write_comp_data+0x2f/0x90
> [   28.098394]  handle_mm_fault+0x1f9/0x810
> [   28.098394]  ? write_comp_data+0x2f/0x90
> [   28.098394]  do_user_addr_fault+0x6f7/0xca0
> [   28.098394]  exc_page_fault+0xaf/0x1a0
> [   28.098394]  asm_exc_page_fault+0x1e/0x30
> [   28.098394] RIP: 0010:__clear_user+0x30/0x60

I think the PAGE_FLAGS_CHECK_AT_PREP check in this instance means that 
someone is trying to allocate that page with the PG_reserved bit set. 
This means that the page actually was exposed to the buddy.
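As a rough cross-check of the two dump_page() outputs above, the flags 
words can be compared directly. This is a minimal userspace sketch, not 
kernel code: changed_bits() and lowest_set_bit() are hypothetical helpers, 
and the two constants are copied from the dumps. The only bit that differs 
is 0x1000 (bit 12), which the second dump decodes as "reserved".

```c
#include <stdint.h>

/* Flags words copied from the two dump_page() outputs in this thread. */
#define FLAGS_BEFORE 0xfffffc0000000ULL /* "flags: 0xfffffc0000000()" */
#define FLAGS_AFTER  0xfffffc0001000ULL /* "flags: 0xfffffc0001000(reserved)" */

/* Hypothetical helper: bits that differ between two flags words. */
static uint64_t changed_bits(uint64_t before, uint64_t after)
{
	return before ^ after;
}

/* Hypothetical helper: index of the lowest set bit, or -1 if none is set. */
static int lowest_set_bit(uint64_t v)
{
	int i;

	for (i = 0; i < 64; i++)
		if (v & (1ULL << i))
			return i;
	return -1;
}
```

Calling changed_bits(FLAGS_BEFORE, FLAGS_AFTER) yields 0x1000, so nothing 
other than the reserved bit changed between the two dumps.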

However, when you SetPageReserved(), I don't think that PG_buddy is set 
and the refcount is 0. That could indicate that the page is on the buddy 
PCP list. Could be that it is getting reused a couple of times.

The PFN 0xbe453 looks a little strange, though. Do we expect ACPI tables 
close to 3 GiB? No idea. Could it be that you are trying to map the wrong 
table? Just a guess.
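For reference, the "close to 3 GiB" estimate follows from the PFN in the 
dump. A minimal sketch of the arithmetic, assuming 4 KiB pages (a page 
shift of 12, as on x86-64); EXAMPLE_PAGE_SHIFT, pfn_to_phys_addr() and 
phys_to_mib() are illustrative names, not kernel symbols:

```c
#include <stdint.h>

/* Assumption: 4 KiB pages, i.e. a page shift of 12. */
#define EXAMPLE_PAGE_SHIFT 12

/* Physical address of the first byte of the page with the given PFN. */
static uint64_t pfn_to_phys_addr(uint64_t pfn)
{
	return pfn << EXAMPLE_PAGE_SHIFT;
}

/* Physical address expressed in whole MiB (rounds down). */
static uint64_t phys_to_mib(uint64_t phys)
{
	return phys >> 20;
}
```

For pfn 0xbe453 this gives physical address 0xbe453000, which is 3044 MiB 
rounded down, roughly 28 MiB below the 3 GiB (3072 MiB) mark.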

> 
> What would be the correct way to reserve the page so that the above
> would not be hit?

I would have assumed that if this is a binary blob, someone (which I 
think would be the ACPI code) reserved it via memblock_reserve() early 
during boot.

E.g., see drivers/acpi/tables.c:acpi_table_upgrade()->memblock_reserve().
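To illustrate the idea (not the actual memblock implementation), here is a 
toy userspace model: ranges recorded as reserved are skipped when pages 
are handed to the allocator, so a reserved firmware page can never be 
allocated and overwritten. All names (toy_region, toy_reserve(), 
toy_is_reserved(), toy_release_free_pages()) are hypothetical and exist 
only for this sketch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for a reserved range: PFNs in [start_pfn, end_pfn). */
struct toy_region {
	uint64_t start_pfn;
	uint64_t end_pfn; /* exclusive */
};

#define TOY_MAX_RESERVED 8
static struct toy_region toy_reserved[TOY_MAX_RESERVED];
static size_t toy_nr_reserved;

/* Record a range as reserved; fails if the table is full or the range is empty. */
static bool toy_reserve(uint64_t start_pfn, uint64_t end_pfn)
{
	if (toy_nr_reserved == TOY_MAX_RESERVED || start_pfn >= end_pfn)
		return false;
	toy_reserved[toy_nr_reserved].start_pfn = start_pfn;
	toy_reserved[toy_nr_reserved].end_pfn = end_pfn;
	toy_nr_reserved++;
	return true;
}

/* True if the PFN falls inside any recorded reserved range. */
static bool toy_is_reserved(uint64_t pfn)
{
	size_t i;

	for (i = 0; i < toy_nr_reserved; i++)
		if (pfn >= toy_reserved[i].start_pfn && pfn < toy_reserved[i].end_pfn)
			return true;
	return false;
}

/* Count the pages in [start_pfn, end_pfn) that would reach the allocator. */
static uint64_t toy_release_free_pages(uint64_t start_pfn, uint64_t end_pfn)
{
	uint64_t pfn, released = 0;

	for (pfn = start_pfn; pfn < end_pfn; pfn++)
		if (!toy_is_reserved(pfn))
			released++;
	return released;
}
```

In this model, reserving pfn 0xbe453 before releasing the surrounding 
range keeps that one page out of the allocator. Without the reservation it 
is released like ordinary RAM, which matches the latent bug suspected in 
this thread.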
David Hildenbrand Feb. 22, 2021, 4:39 p.m. UTC | #12
On 22.02.21 17:13, David Hildenbrand wrote:
> On 22.02.21 16:13, George Kennedy wrote:
>>
>>
>> On 2/22/2021 4:52 AM, David Hildenbrand wrote:
>>> On 20.02.21 00:04, George Kennedy wrote:
>>>>
>>>>
>>>> On 2/19/2021 11:45 AM, George Kennedy wrote:
>>>>>
>>>>>
>>>>> On 2/18/2021 7:09 PM, Andrey Konovalov wrote:
>>>>>> On Fri, Feb 19, 2021 at 1:06 AM George Kennedy
>>>>>> <george.kennedy@oracle.com> wrote:
>>>>>>>
>>>>>>>
>>>>>>> On 2/18/2021 3:55 AM, David Hildenbrand wrote:
>>>>>>>> On 17.02.21 21:56, Andrey Konovalov wrote:
>>>>>>>>> During boot, all non-reserved memblock memory is exposed to the
>>>>>>>>> buddy
>>>>>>>>> allocator. Poisoning all that memory with KASAN lengthens boot
>>>>>>>>> time,
>>>>>>>>> especially on systems with large amount of RAM. This patch makes
>>>>>>>>> page_alloc to not call kasan_free_pages() on all new memory.
>>>>>>>>>
>>>>>>>>> __free_pages_core() is used when exposing fresh memory during
>>>>>>>>> system
>>>>>>>>> boot and when onlining memory during hotplug. This patch adds a new
>>>>>>>>> FPI_SKIP_KASAN_POISON flag and passes it to __free_pages_ok()
>>>>>>>>> through
>>>>>>>>> free_pages_prepare() from __free_pages_core().
>>>>>>>>>
>>>>>>>>> This has little impact on KASAN memory tracking.
>>>>>>>>>
>>>>>>>>> Assuming that there are no references to newly exposed pages
>>>>>>>>> before they
>>>>>>>>> are ever allocated, there won't be any intended (but buggy)
>>>>>>>>> accesses to
>>>>>>>>> that memory that KASAN would normally detect.
>>>>>>>>>
>>>>>>>>> However, with this patch, KASAN stops detecting wild and large
>>>>>>>>> out-of-bounds accesses that happen to land on a fresh memory page
>>>>>>>>> that
>>>>>>>>> was never allocated. This is taken as an acceptable trade-off.
>>>>>>>>>
>>>>>>>>> All memory allocated normally when the boot is over keeps getting
>>>>>>>>> poisoned as usual.
>>>>>>>>>
>>>>>>>>> Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
>>>>>>>>> Change-Id: Iae6b1e4bb8216955ffc14af255a7eaaa6f35324d
>>>>>>>> Not sure this is the right thing to do, see
>>>>>>>>
>>>>>>>> https://lkml.kernel.org/r/bcf8925d-0949-3fe1-baa8-cc536c529860@oracle.com
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Reversing the order in which memory gets allocated + used during
>>>>>>>> boot
>>>>>>>> (in a patch by me) might have revealed an invalid memory access
>>>>>>>> during
>>>>>>>> boot.
>>>>>>>>
>>>>>>>> I suspect that that issue would no longer get detected with your
>>>>>>>> patch, as the invalid memory access would simply not get detected.
>>>>>>>> Now, I cannot prove that :)
>>>>>>> Since David's patch we're having trouble with the iBFT ACPI table,
>>>>>>> which
>>>>>>> is mapped in via kmap() - see acpi_map() in "drivers/acpi/osl.c".
>>>>>>> KASAN
>>>>>>> detects that it is being used after free when ibft_init() accesses
>>>>>>> the
>>>>>>> iBFT table, but as of yet we can't find where it gets freed (we've
>>>>>>> instrumented calls to kunmap()).
>>>>>> Maybe it doesn't get freed, but what you see is a wild or a large
>>>>>> out-of-bounds access. Since KASAN marks all memory as freed during the
>>>>>> memblock->page_alloc transition, such bugs can manifest as
>>>>>> use-after-frees.
>>>>>
>>>>> It gets freed and re-used. By the time the iBFT table is accessed by
>>>>> ibft_init() the page has been over-written.
>>>>>
>>>>> Setting page flags like the following before the call to kmap()
>>>>> prevents the iBFT table page from being freed:
>>>>
>>>> Cleaned up version:
>>>>
>>>> diff --git a/drivers/acpi/osl.c b/drivers/acpi/osl.c
>>>> index 0418feb..8f0a8e7 100644
>>>> --- a/drivers/acpi/osl.c
>>>> +++ b/drivers/acpi/osl.c
>>>> @@ -287,9 +287,12 @@ static void __iomem *acpi_map(acpi_physical_address
>>>> pg_off, unsigned long pg_sz)
>>>>
>>>>          pfn = pg_off >> PAGE_SHIFT;
>>>>          if (should_use_kmap(pfn)) {
>>>> +        struct page *page = pfn_to_page(pfn);
>>>> +
>>>>              if (pg_sz > PAGE_SIZE)
>>>>                  return NULL;
>>>> -        return (void __iomem __force *)kmap(pfn_to_page(pfn));
>>>> +        SetPageReserved(page);
>>>> +        return (void __iomem __force *)kmap(page);
>>>>          } else
>>>>              return acpi_os_ioremap(pg_off, pg_sz);
>>>>      }
>>>> @@ -299,9 +302,12 @@ static void acpi_unmap(acpi_physical_address
>>>> pg_off, void __iomem *vaddr)
>>>>          unsigned long pfn;
>>>>
>>>>          pfn = pg_off >> PAGE_SHIFT;
>>>> -    if (should_use_kmap(pfn))
>>>> -        kunmap(pfn_to_page(pfn));
>>>> -    else
>>>> +    if (should_use_kmap(pfn)) {
>>>> +        struct page *page = pfn_to_page(pfn);
>>>> +
>>>> +        ClearPageReserved(page);
>>>> +        kunmap(page);
>>>> +    } else
>>>>              iounmap(vaddr);
>>>>      }
>>>>
>>>> David, the above works, but wondering why it is now necessary. kunmap()
>>>> is not hit. What other ways could a page mapped via kmap() be unmapped?
>>>>
>>>
>>> Let me look into the code ... I have little experience with ACPI
>>> details, so bear with me.
>>>
>>> I assume that acpi_map()/acpi_unmap() map some firmware blob that is
>>> provided via firmware/bios/... to us.
>>>
>>> should_use_kmap() tells us whether
>>> a) we have a "struct page" and should kmap() that one
>>> b) we don't have a "struct page" and should ioremap.
>>>
>>> As it is a blob, the firmware should always reserve that memory region
>>> via memblock (e.g., memblock_reserve()), such that we either
>>> 1) don't create a memmap ("struct page") at all (-> case b) )
>>> 2) if we have to create a memmap, we mark the page PG_reserved and
>>>      *never* expose it to the buddy (-> case a) )
>>>
>>>
>>> Are you telling me that in this case we might have a memmap for the HW
>>> blob that is *not* PG_reserved? In that case it most probably got
>>> exposed to the buddy where it can happily get allocated/freed.
>>>
>>> The latent BUG would be that that blob gets exposed to the system like
>>> ordinary RAM, and not reserved via memblock early during boot.
>>> Assuming that blob has a low physical address, with my patch it will
>>> get allocated/used a lot earlier - which would mean we trigger this
>>> latent BUG now more easily.
>>>
>>> There have been similar latent BUGs on ARM boards that my patch
>>> discovered where special RAM regions did not get marked as reserved
>>> via the device tree properly.
>>>
>>> Now, this is just a wild guess :) Can you dump the page when mapping
>>> (before PageReserved()) and when unmapping, to see what the state of
>>> that memmap is?
>>
>> Thank you David for the explanation and your help on this,
>>
>> dump_page() before PageReserved and before kmap() in the above patch:
>>
>> [    1.116480] ACPI: Core revision 20201113
>> [    1.117628] XXX acpi_map: about to call kmap()...
>> [    1.118561] page:ffffea0002f914c0 refcount:0 mapcount:0
>> mapping:0000000000000000 index:0x0 pfn:0xbe453
>> [    1.120381] flags: 0xfffffc0000000()
>> [    1.121116] raw: 000fffffc0000000 ffffea0002f914c8 ffffea0002f914c8
>> 0000000000000000
>> [    1.122638] raw: 0000000000000000 0000000000000000 00000000ffffffff
>> 0000000000000000
>> [    1.124146] page dumped because: acpi_map pre SetPageReserved
>>
>> I also added dump_page() before unmapping, but it is not hit. The
>> following for the same pfn now shows up I believe as a result of setting
>> PageReserved:
>>
>> [   28.098208] BUG: Bad page state in process modprobe  pfn:be453
>> [   28.098394] page:ffffea0002f914c0 refcount:0 mapcount:0
>> mapping:0000000000000000 index:0x1 pfn:0xbe453
>> [   28.098394] flags: 0xfffffc0001000(reserved)
>> [   28.098394] raw: 000fffffc0001000 dead000000000100 dead000000000122
>> 0000000000000000
>> [   28.098394] raw: 0000000000000001 0000000000000000 00000000ffffffff
>> 0000000000000000
>> [   28.098394] page dumped because: PAGE_FLAGS_CHECK_AT_PREP flag(s) set
>> [   28.098394] page_owner info is not present (never set?)
>> [   28.098394] Modules linked in:
>> [   28.098394] CPU: 2 PID: 204 Comm: modprobe Not tainted 5.11.0-3dbd5e3 #66
>> [   28.098394] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
>> BIOS 0.0.0 02/06/2015
>> [   28.098394] Call Trace:
>> [   28.098394]  dump_stack+0xdb/0x120
>> [   28.098394]  bad_page.cold.108+0xc6/0xcb
>> [   28.098394]  check_new_page_bad+0x47/0xa0
>> [   28.098394]  get_page_from_freelist+0x30cd/0x5730
>> [   28.098394]  ? __isolate_free_page+0x4f0/0x4f0
>> [   28.098394]  ? init_object+0x7e/0x90
>> [   28.098394]  __alloc_pages_nodemask+0x2d8/0x650
>> [   28.098394]  ? write_comp_data+0x2f/0x90
>> [   28.098394]  ? __alloc_pages_slowpath.constprop.103+0x2110/0x2110
>> [   28.098394]  ? __sanitizer_cov_trace_pc+0x21/0x50
>> [   28.098394]  alloc_pages_vma+0xe2/0x560
>> [   28.098394]  do_fault+0x194/0x12c0
>> [   28.098394]  ? write_comp_data+0x2f/0x90
>> [   28.098394]  __handle_mm_fault+0x1650/0x26c0
>> [   28.098394]  ? copy_page_range+0x1350/0x1350
>> [   28.098394]  ? write_comp_data+0x2f/0x90
>> [   28.098394]  ? write_comp_data+0x2f/0x90
>> [   28.098394]  handle_mm_fault+0x1f9/0x810
>> [   28.098394]  ? write_comp_data+0x2f/0x90
>> [   28.098394]  do_user_addr_fault+0x6f7/0xca0
>> [   28.098394]  exc_page_fault+0xaf/0x1a0
>> [   28.098394]  asm_exc_page_fault+0x1e/0x30
>> [   28.098394] RIP: 0010:__clear_user+0x30/0x60
> 
> I think the PAGE_FLAGS_CHECK_AT_PREP check in this instance means that
> someone is trying to allocate that page with the PG_reserved bit set.
> This means that the page actually was exposed to the buddy.
> 
> However, when you SetPageReserved(), I don't think that PG_buddy is set
> and the refcount is 0. That could indicate that the page is on the buddy
> PCP list. Could be that it is getting reused a couple of times.
> 
> The PFN 0xbe453 looks a little strange, though. Do we expect ACPI tables
> close to 3 GiB ? No idea. Could it be that you are trying to map a wrong
> table? Just a guess.

... but I assume ibft_check_device() would bail out on an invalid 
checksum. So the question is: why is this page not already properly 
marked as reserved?
Konrad Rzeszutek Wilk Feb. 22, 2021, 5:40 p.m. UTC | #13
On Mon, Feb 22, 2021 at 05:39:29PM +0100, David Hildenbrand wrote:
> On 22.02.21 17:13, David Hildenbrand wrote:
> > On 22.02.21 16:13, George Kennedy wrote:
> > > 
> > > 
> > > On 2/22/2021 4:52 AM, David Hildenbrand wrote:
> > > > On 20.02.21 00:04, George Kennedy wrote:
> > > > > 
> > > > > 
> > > > > On 2/19/2021 11:45 AM, George Kennedy wrote:
> > > > > > 
> > > > > > 
> > > > > > On 2/18/2021 7:09 PM, Andrey Konovalov wrote:
> > > > > > > On Fri, Feb 19, 2021 at 1:06 AM George Kennedy
> > > > > > > <george.kennedy@oracle.com> wrote:
> > > > > > > > 
> > > > > > > > 
> > > > > > > > On 2/18/2021 3:55 AM, David Hildenbrand wrote:
> > > > > > > > > On 17.02.21 21:56, Andrey Konovalov wrote:
> > > > > > > > > > During boot, all non-reserved memblock memory is exposed to the
> > > > > > > > > > buddy
> > > > > > > > > > allocator. Poisoning all that memory with KASAN lengthens boot
> > > > > > > > > > time,
> > > > > > > > > > especially on systems with large amount of RAM. This patch makes
> > > > > > > > > > page_alloc to not call kasan_free_pages() on all new memory.
> > > > > > > > > > 
> > > > > > > > > > __free_pages_core() is used when exposing fresh memory during
> > > > > > > > > > system
> > > > > > > > > > boot and when onlining memory during hotplug. This patch adds a new
> > > > > > > > > > FPI_SKIP_KASAN_POISON flag and passes it to __free_pages_ok()
> > > > > > > > > > through
> > > > > > > > > > free_pages_prepare() from __free_pages_core().
> > > > > > > > > > 
> > > > > > > > > > This has little impact on KASAN memory tracking.
> > > > > > > > > > 
> > > > > > > > > > Assuming that there are no references to newly exposed pages
> > > > > > > > > > before they
> > > > > > > > > > are ever allocated, there won't be any intended (but buggy)
> > > > > > > > > > accesses to
> > > > > > > > > > that memory that KASAN would normally detect.
> > > > > > > > > > 
> > > > > > > > > > However, with this patch, KASAN stops detecting wild and large
> > > > > > > > > > out-of-bounds accesses that happen to land on a fresh memory page
> > > > > > > > > > that
> > > > > > > > > > was never allocated. This is taken as an acceptable trade-off.
> > > > > > > > > > 
> > > > > > > > > > All memory allocated normally when the boot is over keeps getting
> > > > > > > > > > poisoned as usual.
> > > > > > > > > > 
> > > > > > > > > > Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
> > > > > > > > > > Change-Id: Iae6b1e4bb8216955ffc14af255a7eaaa6f35324d
> > > > > > > > > Not sure this is the right thing to do, see
> > > > > > > > > 
> > > > > > > > > https://lkml.kernel.org/r/bcf8925d-0949-3fe1-baa8-cc536c529860@oracle.com
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > Reversing the order in which memory gets allocated + used during
> > > > > > > > > boot
> > > > > > > > > (in a patch by me) might have revealed an invalid memory access
> > > > > > > > > during
> > > > > > > > > boot.
> > > > > > > > > 
> > > > > > > > > I suspect that that issue would no longer get detected with your
> > > > > > > > > patch, as the invalid memory access would simply not get detected.
> > > > > > > > > Now, I cannot prove that :)
> > > > > > > > Since David's patch we're having trouble with the iBFT ACPI table,
> > > > > > > > which
> > > > > > > > is mapped in via kmap() - see acpi_map() in "drivers/acpi/osl.c".
> > > > > > > > KASAN
> > > > > > > > detects that it is being used after free when ibft_init() accesses
> > > > > > > > the
> > > > > > > > iBFT table, but as of yet we can't find where it gets freed (we've
> > > > > > > > instrumented calls to kunmap()).
> > > > > > > Maybe it doesn't get freed, but what you see is a wild or a large
> > > > > > > out-of-bounds access. Since KASAN marks all memory as freed during the
> > > > > > > memblock->page_alloc transition, such bugs can manifest as
> > > > > > > use-after-frees.
> > > > > > 
> > > > > > It gets freed and re-used. By the time the iBFT table is accessed by
> > > > > > ibft_init() the page has been over-written.
> > > > > > 
> > > > > > Setting page flags like the following before the call to kmap()
> > > > > > prevents the iBFT table page from being freed:
> > > > > 
> > > > > Cleaned up version:
> > > > > 
> > > > > diff --git a/drivers/acpi/osl.c b/drivers/acpi/osl.c
> > > > > index 0418feb..8f0a8e7 100644
> > > > > --- a/drivers/acpi/osl.c
> > > > > +++ b/drivers/acpi/osl.c
> > > > > @@ -287,9 +287,12 @@ static void __iomem *acpi_map(acpi_physical_address
> > > > > pg_off, unsigned long pg_sz)
> > > > > 
> > > > >          pfn = pg_off >> PAGE_SHIFT;
> > > > >          if (should_use_kmap(pfn)) {
> > > > > +        struct page *page = pfn_to_page(pfn);
> > > > > +
> > > > >              if (pg_sz > PAGE_SIZE)
> > > > >                  return NULL;
> > > > > -        return (void __iomem __force *)kmap(pfn_to_page(pfn));
> > > > > +        SetPageReserved(page);
> > > > > +        return (void __iomem __force *)kmap(page);
> > > > >          } else
> > > > >              return acpi_os_ioremap(pg_off, pg_sz);
> > > > >      }
> > > > > @@ -299,9 +302,12 @@ static void acpi_unmap(acpi_physical_address
> > > > > pg_off, void __iomem *vaddr)
> > > > >          unsigned long pfn;
> > > > > 
> > > > >          pfn = pg_off >> PAGE_SHIFT;
> > > > > -    if (should_use_kmap(pfn))
> > > > > -        kunmap(pfn_to_page(pfn));
> > > > > -    else
> > > > > +    if (should_use_kmap(pfn)) {
> > > > > +        struct page *page = pfn_to_page(pfn);
> > > > > +
> > > > > +        ClearPageReserved(page);
> > > > > +        kunmap(page);
> > > > > +    } else
> > > > >              iounmap(vaddr);
> > > > >      }
> > > > > 
> > > > > David, the above works, but wondering why it is now necessary. kunmap()
> > > > > is not hit. What other ways could a page mapped via kmap() be unmapped?
> > > > > 
> > > > 
> > > > Let me look into the code ... I have little experience with ACPI
> > > > details, so bear with me.
> > > > 
> > > > I assume that acpi_map()/acpi_unmap() map some firmware blob that is
> > > > provided via firmware/bios/... to us.
> > > > 
> > > > should_use_kmap() tells us whether
> > > > a) we have a "struct page" and should kmap() that one
> > > > b) we don't have a "struct page" and should ioremap.
> > > > 
> > > > As it is a blob, the firmware should always reserve that memory region
> > > > via memblock (e.g., memblock_reserve()), such that we either
> > > > 1) don't create a memmap ("struct page") at all (-> case b) )
> > > > > 2) if we have to create a memmap, we mark the page PG_reserved and
> > > >      *never* expose it to the buddy (-> case a) )
> > > > 
> > > > 
> > > > Are you telling me that in this case we might have a memmap for the HW
> > > > blob that is *not* PG_reserved? In that case it most probably got
> > > > exposed to the buddy where it can happily get allocated/freed.
> > > > 
> > > > The latent BUG would be that that blob gets exposed to the system like
> > > > ordinary RAM, and not reserved via memblock early during boot.
> > > > Assuming that blob has a low physical address, with my patch it will
> > > > get allocated/used a lot earlier - which would mean we trigger this
> > > > latent BUG now more easily.
> > > > 
> > > > There have been similar latent BUGs on ARM boards that my patch
> > > > discovered where special RAM regions did not get marked as reserved
> > > > via the device tree properly.
> > > > 
> > > > Now, this is just a wild guess :) Can you dump the page when mapping
> > > > (before PageReserved()) and when unmapping, to see what the state of
> > > > that memmap is?
> > > 
> > > Thank you David for the explanation and your help on this,
> > > 
> > > dump_page() before PageReserved and before kmap() in the above patch:
> > > 
> > > [    1.116480] ACPI: Core revision 20201113
> > > [    1.117628] XXX acpi_map: about to call kmap()...
> > > [    1.118561] page:ffffea0002f914c0 refcount:0 mapcount:0
> > > mapping:0000000000000000 index:0x0 pfn:0xbe453
> > > [    1.120381] flags: 0xfffffc0000000()
> > > [    1.121116] raw: 000fffffc0000000 ffffea0002f914c8 ffffea0002f914c8
> > > 0000000000000000
> > > [    1.122638] raw: 0000000000000000 0000000000000000 00000000ffffffff
> > > 0000000000000000
> > > [    1.124146] page dumped because: acpi_map pre SetPageReserved
> > > 
> > > I also added dump_page() before unmapping, but it is not hit. The
> > > following for the same pfn now shows up I believe as a result of setting
> > > PageReserved:
> > > 
> > > [   28.098208] BUG: Bad page state in process modprobe  pfn:be453
> > > [   28.098394] page:ffffea0002f914c0 refcount:0 mapcount:0
> > > mapping:0000000000000000 index:0x1 pfn:0xbe453
> > > [   28.098394] flags: 0xfffffc0001000(reserved)
> > > [   28.098394] raw: 000fffffc0001000 dead000000000100 dead000000000122
> > > 0000000000000000
> > > [   28.098394] raw: 0000000000000001 0000000000000000 00000000ffffffff
> > > 0000000000000000
> > > [   28.098394] page dumped because: PAGE_FLAGS_CHECK_AT_PREP flag(s) set
> > > [   28.098394] page_owner info is not present (never set?)
> > > [   28.098394] Modules linked in:
> > > [   28.098394] CPU: 2 PID: 204 Comm: modprobe Not tainted 5.11.0-3dbd5e3 #66
> > > [   28.098394] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
> > > BIOS 0.0.0 02/06/2015
> > > [   28.098394] Call Trace:
> > > [   28.098394]  dump_stack+0xdb/0x120
> > > [   28.098394]  bad_page.cold.108+0xc6/0xcb
> > > [   28.098394]  check_new_page_bad+0x47/0xa0
> > > [   28.098394]  get_page_from_freelist+0x30cd/0x5730
> > > [   28.098394]  ? __isolate_free_page+0x4f0/0x4f0
> > > [   28.098394]  ? init_object+0x7e/0x90
> > > [   28.098394]  __alloc_pages_nodemask+0x2d8/0x650
> > > [   28.098394]  ? write_comp_data+0x2f/0x90
> > > [   28.098394]  ? __alloc_pages_slowpath.constprop.103+0x2110/0x2110
> > > [   28.098394]  ? __sanitizer_cov_trace_pc+0x21/0x50
> > > [   28.098394]  alloc_pages_vma+0xe2/0x560
> > > [   28.098394]  do_fault+0x194/0x12c0
> > > [   28.098394]  ? write_comp_data+0x2f/0x90
> > > [   28.098394]  __handle_mm_fault+0x1650/0x26c0
> > > [   28.098394]  ? copy_page_range+0x1350/0x1350
> > > [   28.098394]  ? write_comp_data+0x2f/0x90
> > > [   28.098394]  ? write_comp_data+0x2f/0x90
> > > [   28.098394]  handle_mm_fault+0x1f9/0x810
> > > [   28.098394]  ? write_comp_data+0x2f/0x90
> > > [   28.098394]  do_user_addr_fault+0x6f7/0xca0
> > > [   28.098394]  exc_page_fault+0xaf/0x1a0
> > > [   28.098394]  asm_exc_page_fault+0x1e/0x30
> > > [   28.098394] RIP: 0010:__clear_user+0x30/0x60
> > 
> > I think the PAGE_FLAGS_CHECK_AT_PREP check in this instance means that
> > someone is trying to allocate that page with the PG_reserved bit set.
> > This means that the page actually was exposed to the buddy.
> > 
> > However, when you SetPageReserved(), I don't think that PG_buddy is set
> > and the refcount is 0. That could indicate that the page is on the buddy
> > PCP list. Could be that it is getting reused a couple of times.
> > 
> > The PFN 0xbe453 looks a little strange, though. Do we expect ACPI tables
> > close to 3 GiB ? No idea. Could it be that you are trying to map a wrong
> > table? Just a guess.

Nah, ACPI MADT enumerates the table and that is the proper location of it.
> 
> ... but I assume ibft_check_device() would bail out on an invalid checksum.
> So the question is, why is this page not properly marked as reserved
> already.

ibft_check_device() ends up being called as a module, way after the
kernel has cleaned the memory.

The funny thing about iBFT is that (it is also mentioned in the spec)
that the table can resize in memory .. or in the ACPI regions (which
have no E820_RAM and are considered "MMIO" regions).

Either place is fine, so it can be in either RAM or MMIO :-(

> 
> -- 
> Thanks,
> 
> David / dhildenb
>
George Kennedy Feb. 22, 2021, 6:42 p.m. UTC | #14
On 2/22/2021 11:13 AM, David Hildenbrand wrote:
> On 22.02.21 16:13, George Kennedy wrote:
>>
>>
>> On 2/22/2021 4:52 AM, David Hildenbrand wrote:
>>> On 20.02.21 00:04, George Kennedy wrote:
>>>>
>>>>
>>>> On 2/19/2021 11:45 AM, George Kennedy wrote:
>>>>>
>>>>>
>>>>> On 2/18/2021 7:09 PM, Andrey Konovalov wrote:
>>>>>> On Fri, Feb 19, 2021 at 1:06 AM George Kennedy
>>>>>> <george.kennedy@oracle.com> wrote:
>>>>>>>
>>>>>>>
>>>>>>> On 2/18/2021 3:55 AM, David Hildenbrand wrote:
>>>>>>>> On 17.02.21 21:56, Andrey Konovalov wrote:
>>>>>>>>> During boot, all non-reserved memblock memory is exposed to the buddy
>>>>>>>>> allocator. Poisoning all that memory with KASAN lengthens boot time,
>>>>>>>>> especially on systems with large amount of RAM. This patch makes
>>>>>>>>> page_alloc to not call kasan_free_pages() on all new memory.
>>>>>>>>>
>>>>>>>>> __free_pages_core() is used when exposing fresh memory during system
>>>>>>>>> boot and when onlining memory during hotplug. This patch adds a new
>>>>>>>>> FPI_SKIP_KASAN_POISON flag and passes it to __free_pages_ok() through
>>>>>>>>> free_pages_prepare() from __free_pages_core().
>>>>>>>>>
>>>>>>>>> This has little impact on KASAN memory tracking.
>>>>>>>>>
>>>>>>>>> Assuming that there are no references to newly exposed pages before they
>>>>>>>>> are ever allocated, there won't be any intended (but buggy) accesses to
>>>>>>>>> that memory that KASAN would normally detect.
>>>>>>>>>
>>>>>>>>> However, with this patch, KASAN stops detecting wild and large
>>>>>>>>> out-of-bounds accesses that happen to land on a fresh memory page that
>>>>>>>>> was never allocated. This is taken as an acceptable trade-off.
>>>>>>>>>
>>>>>>>>> All memory allocated normally when the boot is over keeps getting
>>>>>>>>> poisoned as usual.
>>>>>>>>>
>>>>>>>>> Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
>>>>>>>>> Change-Id: Iae6b1e4bb8216955ffc14af255a7eaaa6f35324d
>>>>>>>> Not sure this is the right thing to do, see
>>>>>>>>
>>>>>>>> https://lkml.kernel.org/r/bcf8925d-0949-3fe1-baa8-cc536c529860@oracle.com 
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Reversing the order in which memory gets allocated + used during boot
>>>>>>>> (in a patch by me) might have revealed an invalid memory access during
>>>>>>>> boot.
>>>>>>>>
>>>>>>>> I suspect that that issue would no longer get detected with your
>>>>>>>> patch, as the invalid memory access would simply not get detected.
>>>>>>>> Now, I cannot prove that :)
>>>>>>> Since David's patch we're having trouble with the iBFT ACPI table,
>>>>>>> which is mapped in via kmap() - see acpi_map() in "drivers/acpi/osl.c".
>>>>>>> KASAN detects that it is being used after free when ibft_init() accesses
>>>>>>> the iBFT table, but as of yet we can't find where it gets freed (we've
>>>>>>> instrumented calls to kunmap()).
>>>>>> Maybe it doesn't get freed, but what you see is a wild or a large
>>>>>> out-of-bounds access. Since KASAN marks all memory as freed during the
>>>>>> memblock->page_alloc transition, such bugs can manifest as
>>>>>> use-after-frees.
>>>>>
>>>>> It gets freed and re-used. By the time the iBFT table is accessed by
>>>>> ibft_init() the page has been over-written.
>>>>>
>>>>> Setting page flags like the following before the call to kmap()
>>>>> prevents the iBFT table page from being freed:
>>>>
>>>> Cleaned up version:
>>>>
>>>> diff --git a/drivers/acpi/osl.c b/drivers/acpi/osl.c
>>>> index 0418feb..8f0a8e7 100644
>>>> --- a/drivers/acpi/osl.c
>>>> +++ b/drivers/acpi/osl.c
>>>> @@ -287,9 +287,12 @@ static void __iomem *acpi_map(acpi_physical_address pg_off, unsigned long pg_sz)
>>>>
>>>>         pfn = pg_off >> PAGE_SHIFT;
>>>>         if (should_use_kmap(pfn)) {
>>>> +        struct page *page = pfn_to_page(pfn);
>>>> +
>>>>             if (pg_sz > PAGE_SIZE)
>>>>                 return NULL;
>>>> -        return (void __iomem __force *)kmap(pfn_to_page(pfn));
>>>> +        SetPageReserved(page);
>>>> +        return (void __iomem __force *)kmap(page);
>>>>         } else
>>>>             return acpi_os_ioremap(pg_off, pg_sz);
>>>>     }
>>>> @@ -299,9 +302,12 @@ static void acpi_unmap(acpi_physical_address
>>>> pg_off, void __iomem *vaddr)
>>>>         unsigned long pfn;
>>>>
>>>>         pfn = pg_off >> PAGE_SHIFT;
>>>> -    if (should_use_kmap(pfn))
>>>> -        kunmap(pfn_to_page(pfn));
>>>> -    else
>>>> +    if (should_use_kmap(pfn)) {
>>>> +        struct page *page = pfn_to_page(pfn);
>>>> +
>>>> +        ClearPageReserved(page);
>>>> +        kunmap(page);
>>>> +    } else
>>>>             iounmap(vaddr);
>>>>     }
>>>>
>>>> David, the above works, but wondering why it is now necessary. kunmap()
>>>> is not hit. What other ways could a page mapped via kmap() be unmapped?
>>>>
>>>
>>> Let me look into the code ... I have little experience with ACPI
>>> details, so bear with me.
>>>
>>> I assume that acpi_map()/acpi_unmap() map some firmware blob that is
>>> provided via firmware/bios/... to us.
>>>
>>> should_use_kmap() tells us whether
>>> a) we have a "struct page" and should kmap() that one
>>> b) we don't have a "struct page" and should ioremap.
>>>
>>> As it is a blob, the firmware should always reserve that memory region
>>> via memblock (e.g., memblock_reserve()), such that we either
>>> 1) don't create a memmap ("struct page") at all (-> case b) )
>>> 2) if we have to create a memmap, we mark the page PG_reserved and
>>>     *never* expose it to the buddy (-> case a) )
>>>
>>>
>>> Are you telling me that in this case we might have a memmap for the HW
>>> blob that is *not* PG_reserved? In that case it most probably got
>>> exposed to the buddy where it can happily get allocated/freed.
>>>
>>> The latent BUG would be that that blob gets exposed to the system like
>>> ordinary RAM, and not reserved via memblock early during boot.
>>> Assuming that blob has a low physical address, with my patch it will
>>> get allocated/used a lot earlier - which would mean we trigger this
>>> latent BUG now more easily.
>>>
>>> There have been similar latent BUGs on ARM boards that my patch
>>> discovered where special RAM regions did not get marked as reserved
>>> via the device tree properly.
>>>
>>> Now, this is just a wild guess :) Can you dump the page when mapping
>>> (before PageReserved()) and when unmapping, to see what the state of
>>> that memmap is?
>>
>> Thank you David for the explanation and your help on this,
>>
>> dump_page() before PageReserved and before kmap() in the above patch:
>>
>> [    1.116480] ACPI: Core revision 20201113
>> [    1.117628] XXX acpi_map: about to call kmap()...
>> [    1.118561] page:ffffea0002f914c0 refcount:0 mapcount:0
>> mapping:0000000000000000 index:0x0 pfn:0xbe453
>> [    1.120381] flags: 0xfffffc0000000()
>> [    1.121116] raw: 000fffffc0000000 ffffea0002f914c8 ffffea0002f914c8
>> 0000000000000000
>> [    1.122638] raw: 0000000000000000 0000000000000000 00000000ffffffff
>> 0000000000000000
>> [    1.124146] page dumped because: acpi_map pre SetPageReserved
>>
>> I also added dump_page() before unmapping, but it is not hit. The
>> following for the same pfn now shows up I believe as a result of setting
>> PageReserved:
>>
>> [   28.098208] BUG: Bad page state in process modprobe  pfn:be453
>> [   28.098394] page:ffffea0002f914c0 refcount:0 mapcount:0
>> mapping:0000000000000000 index:0x1 pfn:0xbe453
>> [   28.098394] flags: 0xfffffc0001000(reserved)
>> [   28.098394] raw: 000fffffc0001000 dead000000000100 dead000000000122
>> 0000000000000000
>> [   28.098394] raw: 0000000000000001 0000000000000000 00000000ffffffff
>> 0000000000000000
>> [   28.098394] page dumped because: PAGE_FLAGS_CHECK_AT_PREP flag(s) set
>> [   28.098394] page_owner info is not present (never set?)
>> [   28.098394] Modules linked in:
>> [   28.098394] CPU: 2 PID: 204 Comm: modprobe Not tainted 5.11.0-3dbd5e3 #66
>> [   28.098394] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
>> BIOS 0.0.0 02/06/2015
>> [   28.098394] Call Trace:
>> [   28.098394]  dump_stack+0xdb/0x120
>> [   28.098394]  bad_page.cold.108+0xc6/0xcb
>> [   28.098394]  check_new_page_bad+0x47/0xa0
>> [   28.098394]  get_page_from_freelist+0x30cd/0x5730
>> [   28.098394]  ? __isolate_free_page+0x4f0/0x4f0
>> [   28.098394]  ? init_object+0x7e/0x90
>> [   28.098394]  __alloc_pages_nodemask+0x2d8/0x650
>> [   28.098394]  ? write_comp_data+0x2f/0x90
>> [   28.098394]  ? __alloc_pages_slowpath.constprop.103+0x2110/0x2110
>> [   28.098394]  ? __sanitizer_cov_trace_pc+0x21/0x50
>> [   28.098394]  alloc_pages_vma+0xe2/0x560
>> [   28.098394]  do_fault+0x194/0x12c0
>> [   28.098394]  ? write_comp_data+0x2f/0x90
>> [   28.098394]  __handle_mm_fault+0x1650/0x26c0
>> [   28.098394]  ? copy_page_range+0x1350/0x1350
>> [   28.098394]  ? write_comp_data+0x2f/0x90
>> [   28.098394]  ? write_comp_data+0x2f/0x90
>> [   28.098394]  handle_mm_fault+0x1f9/0x810
>> [   28.098394]  ? write_comp_data+0x2f/0x90
>> [   28.098394]  do_user_addr_fault+0x6f7/0xca0
>> [   28.098394]  exc_page_fault+0xaf/0x1a0
>> [   28.098394]  asm_exc_page_fault+0x1e/0x30
>> [   28.098394] RIP: 0010:__clear_user+0x30/0x60
>
> I think the PAGE_FLAGS_CHECK_AT_PREP check in this instance means that 
> someone is trying to allocate that page with the PG_reserved bit set. 
> This means that the page actually was exposed to the buddy.
>
> However, when you SetPageReserved(), I don't think that PG_buddy is 
> set and the refcount is 0. That could indicate that the page is on the 
> buddy PCP list. Could be that it is getting reused a couple of times.
>
> The PFN 0xbe453 looks a little strange, though. Do we expect ACPI 
> tables close to 3 GiB ? No idea. Could it be that you are trying to 
> map a wrong table? Just a guess.
>
>>
>> What would be  the correct way to reserve the page so that the above
>> would not be hit?
>
> I would have assumed that if this is a binary blob, that someone 
> (which I think would be acpi code) reserved via memblock_reserve() 
> early during boot.
>
> E.g., see drivers/acpi/tables.c:acpi_table_upgrade()->memblock_reserve().

acpi_table_upgrade() gets called, but bails out before 
memblock_reserve() is called. Thus, it appears no pages are getting 
reserved.

void __init acpi_table_upgrade(void)
{
        void *data;
        size_t size;
        int sig, no, table_nr = 0, total_offset = 0;
        long offset = 0;
        struct acpi_table_header *table;
        char cpio_path[32] = "kernel/firmware/acpi/";
        struct cpio_data file;

        if (IS_ENABLED(CONFIG_ACPI_TABLE_OVERRIDE_VIA_BUILTIN_INITRD)) {
                data = __initramfs_start;
                size = __initramfs_size;
        } else {
                data = (void *)initrd_start;
                size = initrd_end - initrd_start;
        }

        if (data == NULL || size == 0)
                return;

        for (no = 0; no < NR_ACPI_INITRD_TABLES; no++) {
                file = find_cpio_data(cpio_path, data, size, &offset);
                if (!file.data)
                        break;
...
                all_tables_size += table->length;
                acpi_initrd_files[table_nr].data = file.data;
                acpi_initrd_files[table_nr].size = file.size;
                table_nr++;
        }
        if (table_nr == 0)
                return;                                 <-- bails out here
"drivers/acpi/tables.c"

George
Mike Rapoport Feb. 22, 2021, 6:45 p.m. UTC | #15
On Mon, Feb 22, 2021 at 12:40:36PM -0500, Konrad Rzeszutek Wilk wrote:
> On Mon, Feb 22, 2021 at 05:39:29PM +0100, David Hildenbrand wrote:
> > On 22.02.21 17:13, David Hildenbrand wrote:
> > > On 22.02.21 16:13, George Kennedy wrote:
> > > > 
> > > > 
> > > > On 2/22/2021 4:52 AM, David Hildenbrand wrote:
> > > > > On 20.02.21 00:04, George Kennedy wrote:
> > > > > > 
> > > > > > 
> > > > > > On 2/19/2021 11:45 AM, George Kennedy wrote:
> > > > > > > 
> > > > > > > 
> > > > > > > On 2/18/2021 7:09 PM, Andrey Konovalov wrote:
> > > > > > > > On Fri, Feb 19, 2021 at 1:06 AM George Kennedy
> > > > > > > > <george.kennedy@oracle.com> wrote:
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > On 2/18/2021 3:55 AM, David Hildenbrand wrote:
> > > > > > > > > > On 17.02.21 21:56, Andrey Konovalov wrote:
> > > > > > > > > > > During boot, all non-reserved memblock memory is exposed to the buddy
> > > > > > > > > > > allocator. Poisoning all that memory with KASAN lengthens boot time,
> > > > > > > > > > > especially on systems with large amount of RAM. This patch makes
> > > > > > > > > > > page_alloc to not call kasan_free_pages() on all new memory.
> > > > > > > > > > >
> > > > > > > > > > > __free_pages_core() is used when exposing fresh memory during system
> > > > > > > > > > > boot and when onlining memory during hotplug. This patch adds a new
> > > > > > > > > > > FPI_SKIP_KASAN_POISON flag and passes it to __free_pages_ok() through
> > > > > > > > > > > free_pages_prepare() from __free_pages_core().
> > > > > > > > > > >
> > > > > > > > > > > This has little impact on KASAN memory tracking.
> > > > > > > > > > >
> > > > > > > > > > > Assuming that there are no references to newly exposed pages before they
> > > > > > > > > > > are ever allocated, there won't be any intended (but buggy) accesses to
> > > > > > > > > > > that memory that KASAN would normally detect.
> > > > > > > > > > >
> > > > > > > > > > > However, with this patch, KASAN stops detecting wild and large
> > > > > > > > > > > out-of-bounds accesses that happen to land on a fresh memory page that
> > > > > > > > > > > was never allocated. This is taken as an acceptable trade-off.
> > > > > > > > > > >
> > > > > > > > > > > All memory allocated normally when the boot is over keeps getting
> > > > > > > > > > > poisoned as usual.
> > > > > > > > > > > 
> > > > > > > > > > > Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
> > > > > > > > > > > Change-Id: Iae6b1e4bb8216955ffc14af255a7eaaa6f35324d
> > > > > > > > > > Not sure this is the right thing to do, see
> > > > > > > > > > 
> > > > > > > > > > https://lkml.kernel.org/r/bcf8925d-0949-3fe1-baa8-cc536c529860@oracle.com
> > > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > > Reversing the order in which memory gets allocated + used during boot
> > > > > > > > > > (in a patch by me) might have revealed an invalid memory access during
> > > > > > > > > > boot.
> > > > > > > > > >
> > > > > > > > > > I suspect that that issue would no longer get detected with your
> > > > > > > > > > patch, as the invalid memory access would simply not get detected.
> > > > > > > > > > Now, I cannot prove that :)
> > > > > > > > > Since David's patch we're having trouble with the iBFT ACPI table,
> > > > > > > > > which is mapped in via kmap() - see acpi_map() in "drivers/acpi/osl.c".
> > > > > > > > > KASAN detects that it is being used after free when ibft_init() accesses
> > > > > > > > > the iBFT table, but as of yet we can't find where it gets freed (we've
> > > > > > > > > instrumented calls to kunmap()).
> > > > > > > > Maybe it doesn't get freed, but what you see is a wild or a large
> > > > > > > > out-of-bounds access. Since KASAN marks all memory as freed during the
> > > > > > > > memblock->page_alloc transition, such bugs can manifest as
> > > > > > > > use-after-frees.
> > > > > > > 
> > > > > > > It gets freed and re-used. By the time the iBFT table is accessed by
> > > > > > > ibft_init() the page has been over-written.
> > > > > > > 
> > > > > > > Setting page flags like the following before the call to kmap()
> > > > > > > prevents the iBFT table page from being freed:
> > > > > > 
> > > > > > Cleaned up version:
> > > > > > 
> > > > > > diff --git a/drivers/acpi/osl.c b/drivers/acpi/osl.c
> > > > > > index 0418feb..8f0a8e7 100644
> > > > > > --- a/drivers/acpi/osl.c
> > > > > > +++ b/drivers/acpi/osl.c
> > > > > > @@ -287,9 +287,12 @@ static void __iomem *acpi_map(acpi_physical_address
> > > > > > pg_off, unsigned long pg_sz)
> > > > > > 
> > > > > >          pfn = pg_off >> PAGE_SHIFT;
> > > > > >          if (should_use_kmap(pfn)) {
> > > > > > +        struct page *page = pfn_to_page(pfn);
> > > > > > +
> > > > > >              if (pg_sz > PAGE_SIZE)
> > > > > >                  return NULL;
> > > > > > -        return (void __iomem __force *)kmap(pfn_to_page(pfn));
> > > > > > +        SetPageReserved(page);
> > > > > > +        return (void __iomem __force *)kmap(page);
> > > > > >          } else
> > > > > >              return acpi_os_ioremap(pg_off, pg_sz);
> > > > > >      }
> > > > > > @@ -299,9 +302,12 @@ static void acpi_unmap(acpi_physical_address
> > > > > > pg_off, void __iomem *vaddr)
> > > > > >          unsigned long pfn;
> > > > > > 
> > > > > >          pfn = pg_off >> PAGE_SHIFT;
> > > > > > -    if (should_use_kmap(pfn))
> > > > > > -        kunmap(pfn_to_page(pfn));
> > > > > > -    else
> > > > > > +    if (should_use_kmap(pfn)) {
> > > > > > +        struct page *page = pfn_to_page(pfn);
> > > > > > +
> > > > > > +        ClearPageReserved(page);
> > > > > > +        kunmap(page);
> > > > > > +    } else
> > > > > >              iounmap(vaddr);
> > > > > >      }
> > > > > > 
> > > > > > David, the above works, but wondering why it is now necessary. kunmap()
> > > > > > is not hit. What other ways could a page mapped via kmap() be unmapped?
> > > > > > 
> > > > > 
> > > > > Let me look into the code ... I have little experience with ACPI
> > > > > details, so bear with me.
> > > > > 
> > > > > I assume that acpi_map()/acpi_unmap() map some firmware blob that is
> > > > > provided via firmware/bios/... to us.
> > > > > 
> > > > > should_use_kmap() tells us whether
> > > > > a) we have a "struct page" and should kmap() that one
> > > > > b) we don't have a "struct page" and should ioremap.
> > > > > 
> > > > > As it is a blob, the firmware should always reserve that memory region
> > > > > via memblock (e.g., memblock_reserve()), such that we either
> > > > > 1) don't create a memmap ("struct page") at all (-> case b) )
> > > > > > 2) if we have to create a memmap, we mark the page PG_reserved and
> > > > >      *never* expose it to the buddy (-> case a) )
> > > > > 
> > > > > 
> > > > > Are you telling me that in this case we might have a memmap for the HW
> > > > > blob that is *not* PG_reserved? In that case it most probably got
> > > > > exposed to the buddy where it can happily get allocated/freed.
> > > > > 
> > > > > The latent BUG would be that that blob gets exposed to the system like
> > > > > ordinary RAM, and not reserved via memblock early during boot.
> > > > > Assuming that blob has a low physical address, with my patch it will
> > > > > get allocated/used a lot earlier - which would mean we trigger this
> > > > > latent BUG now more easily.
> > > > > 
> > > > > There have been similar latent BUGs on ARM boards that my patch
> > > > > discovered where special RAM regions did not get marked as reserved
> > > > > via the device tree properly.
> > > > > 
> > > > > Now, this is just a wild guess :) Can you dump the page when mapping
> > > > > (before PageReserved()) and when unmapping, to see what the state of
> > > > > that memmap is?
> > > > 
> > > > Thank you David for the explanation and your help on this,
> > > > 
> > > > dump_page() before PageReserved and before kmap() in the above patch:
> > > > 
> > > > [    1.116480] ACPI: Core revision 20201113
> > > > [    1.117628] XXX acpi_map: about to call kmap()...
> > > > [    1.118561] page:ffffea0002f914c0 refcount:0 mapcount:0
> > > > mapping:0000000000000000 index:0x0 pfn:0xbe453
> > > > [    1.120381] flags: 0xfffffc0000000()
> > > > [    1.121116] raw: 000fffffc0000000 ffffea0002f914c8 ffffea0002f914c8
> > > > 0000000000000000
> > > > [    1.122638] raw: 0000000000000000 0000000000000000 00000000ffffffff
> > > > 0000000000000000
> > > > [    1.124146] page dumped because: acpi_map pre SetPageReserved
> > > > 
> > > > I also added dump_page() before unmapping, but it is not hit. The
> > > > following for the same pfn now shows up I believe as a result of setting
> > > > PageReserved:
> > > > 
> > > > [   28.098208] BUG: Bad page state in process modprobe  pfn:be453
> > > > [   28.098394] page:ffffea0002f914c0 refcount:0 mapcount:0
> > > > mapping:0000000000000000 index:0x1 pfn:0xbe453
> > > > [   28.098394] flags: 0xfffffc0001000(reserved)
> > > > [   28.098394] raw: 000fffffc0001000 dead000000000100 dead000000000122
> > > > 0000000000000000
> > > > [   28.098394] raw: 0000000000000001 0000000000000000 00000000ffffffff
> > > > 0000000000000000
> > > > [   28.098394] page dumped because: PAGE_FLAGS_CHECK_AT_PREP flag(s) set
> > > > [   28.098394] page_owner info is not present (never set?)
> > > > [   28.098394] Modules linked in:
> > > > [   28.098394] CPU: 2 PID: 204 Comm: modprobe Not tainted 5.11.0-3dbd5e3 #66
> > > > [   28.098394] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
> > > > BIOS 0.0.0 02/06/2015
> > > > [   28.098394] Call Trace:
> > > > [   28.098394]  dump_stack+0xdb/0x120
> > > > [   28.098394]  bad_page.cold.108+0xc6/0xcb
> > > > [   28.098394]  check_new_page_bad+0x47/0xa0
> > > > [   28.098394]  get_page_from_freelist+0x30cd/0x5730
> > > > [   28.098394]  ? __isolate_free_page+0x4f0/0x4f0
> > > > [   28.098394]  ? init_object+0x7e/0x90
> > > > [   28.098394]  __alloc_pages_nodemask+0x2d8/0x650
> > > > [   28.098394]  ? write_comp_data+0x2f/0x90
> > > > [   28.098394]  ? __alloc_pages_slowpath.constprop.103+0x2110/0x2110
> > > > [   28.098394]  ? __sanitizer_cov_trace_pc+0x21/0x50
> > > > [   28.098394]  alloc_pages_vma+0xe2/0x560
> > > > [   28.098394]  do_fault+0x194/0x12c0
> > > > [   28.098394]  ? write_comp_data+0x2f/0x90
> > > > [   28.098394]  __handle_mm_fault+0x1650/0x26c0
> > > > [   28.098394]  ? copy_page_range+0x1350/0x1350
> > > > [   28.098394]  ? write_comp_data+0x2f/0x90
> > > > [   28.098394]  ? write_comp_data+0x2f/0x90
> > > > [   28.098394]  handle_mm_fault+0x1f9/0x810
> > > > [   28.098394]  ? write_comp_data+0x2f/0x90
> > > > [   28.098394]  do_user_addr_fault+0x6f7/0xca0
> > > > [   28.098394]  exc_page_fault+0xaf/0x1a0
> > > > [   28.098394]  asm_exc_page_fault+0x1e/0x30
> > > > [   28.098394] RIP: 0010:__clear_user+0x30/0x60
> > > 
> > > I think the PAGE_FLAGS_CHECK_AT_PREP check in this instance means that
> > > someone is trying to allocate that page with the PG_reserved bit set.
> > > This means that the page actually was exposed to the buddy.
> > > 
> > > However, when you SetPageReserved(), I don't think that PG_buddy is set
> > > and the refcount is 0. That could indicate that the page is on the buddy
> > > PCP list. Could be that it is getting reused a couple of times.
> > > 
> > > The PFN 0xbe453 looks a little strange, though. Do we expect ACPI tables
> > > close to 3 GiB ? No idea. Could it be that you are trying to map a wrong
> > > table? Just a guess.
> 
> Nah, ACPI MADT enumerates the table and that is the proper location of it.
> > 
> > ... but I assume ibft_check_device() would bail out on an invalid checksum.
> > So the question is, why is this page not properly marked as reserved
> > already.
> 
> The ibft_check_device ends up being called as module way way after the
> kernel has cleaned the memory.
> 
> The funny thing about iBFT is that (it is also mentioned in the spec)
> that the table can resize in memory .. or in the ACPI regions (which

                   ^ reside I presume?

> have no E820_RAM and are considered "MMIO" regions).
> 
> Either place is fine, so it can be in either RAM or MMIO :-(

I'd say that the tables in this case are in E820_RAM, because with MMIO we
wouldn't get to kmap() in the first place.
It can be easily confirmed by comparing the problematic address with
/proc/iomem.
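
A minimal userspace sketch of that check, assuming 4 KiB pages and the usual
"start-end : name" layout of /proc/iomem. pfn_to_phys() and iomem_lookup()
are hypothetical helper names, not kernel APIs. It turns the PFN from
dump_page() into a physical address and reports the innermost region that
contains it:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define PAGE_SHIFT 12 /* assumption: 4 KiB pages, as on x86-64 */

/* Convert a page frame number to the physical address of its first byte. */
static unsigned long long pfn_to_phys(unsigned long long pfn)
{
	return pfn << PAGE_SHIFT;
}

/*
 * Scan text in /proc/iomem format ("start-end : name" per line, nested
 * entries indented below their parent) and copy the name of the last
 * matching line, i.e. the innermost region containing @addr, into @name.
 * Returns 1 if some region contains @addr, 0 otherwise.
 */
static int iomem_lookup(const char *text, unsigned long long addr,
			char *name, size_t len)
{
	int found = 0;

	assert(text && name && len > 0);
	while (*text) {
		unsigned long long start, end;
		char line[256], label[128];
		size_t n = strcspn(text, "\n");
		size_t copy = n < sizeof(line) - 1 ? n : sizeof(line) - 1;

		/* Work on a NUL-terminated copy of the current line. */
		memcpy(line, text, copy);
		line[copy] = '\0';

		if (sscanf(line, " %llx-%llx : %127[^\n]",
			   &start, &end, label) == 3 &&
		    start <= addr && addr <= end) {
			snprintf(name, len, "%s", label);
			found = 1;
		}

		text += n;
		if (*text == '\n')
			text++;
	}
	return found;
}
```

For pfn 0xbe453 the address to look up is 0xbe453000. The text passed to
iomem_lookup() would be the contents of /proc/iomem read as root, since
unprivileged reads show zeroed addresses. A "System RAM" match would mean
the table sits in E820_RAM.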

Can't say I have a clue about what's going on there, but the theory that
the iBFT table somehow does not get PG_reserved during boot makes sense.

Do you see "iBFT found at 0x<addr>" early in the kernel log?
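
To line that message up with the dump_page() output, something along these
lines could pull the address out of the log line and convert it to a PFN.
This is only a sketch: it assumes the line has exactly the
"iBFT found at 0x<addr>" form and 4 KiB pages, and ibft_log_to_pfn() is a
hypothetical helper name:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define PAGE_SHIFT 12 /* assumption: 4 KiB pages */

/*
 * Look for "iBFT found at 0x<addr>" in a single kernel log line and store
 * the page frame number of <addr> in @pfn.
 * Returns 1 if the message was found and parsed, 0 otherwise.
 */
static int ibft_log_to_pfn(const char *line, unsigned long long *pfn)
{
	static const char tag[] = "iBFT found at ";
	const char *p;
	unsigned long long addr;

	assert(line && pfn);
	p = strstr(line, tag);
	/* %llx accepts the optional "0x" prefix and stops at the first non-hex character. */
	if (!p || sscanf(p + strlen(tag), "%llx", &addr) != 1)
		return 0;

	*pfn = addr >> PAGE_SHIFT;
	return 1;
}
```

If the PFN derived this way equals the one in the dump (0xbe453), the page
being kmap()ed is the one the early iBFT scan reported.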

I don't know if ACPI relocates the tables, but I could not find anywhere
that it reserves the original ones. The memblock_reserve() in
acpi_table_upgrade() is merely part of an open-coded memblock allocation.
Mike Rapoport Feb. 22, 2021, 9:55 p.m. UTC | #16
On Mon, Feb 22, 2021 at 01:42:56PM -0500, George Kennedy wrote:
> 
> On 2/22/2021 11:13 AM, David Hildenbrand wrote:
> > On 22.02.21 16:13, George Kennedy wrote:
> > > 
> > > On 2/22/2021 4:52 AM, David Hildenbrand wrote:
> > > > 
> > > > Let me look into the code ... I have little experience with ACPI
> > > > details, so bear with me.
> > > > 
> > > > I assume that acpi_map()/acpi_unmap() map some firmware blob that is
> > > > provided via firmware/bios/... to us.
> > > > 
> > > > should_use_kmap() tells us whether
> > > > a) we have a "struct page" and should kmap() that one
> > > > b) we don't have a "struct page" and should ioremap.
> > > > 
> > > > As it is a blob, the firmware should always reserve that memory region
> > > > via memblock (e.g., memblock_reserve()), such that we either
> > > > 1) don't create a memmap ("struct page") at all (-> case b) )
> > > > > 2) if we have to create a memmap, we mark the page PG_reserved and
> > > >     *never* expose it to the buddy (-> case a) )
> > > > 
> > > > 
> > > > Are you telling me that in this case we might have a memmap for the HW
> > > > blob that is *not* PG_reserved? In that case it most probably got
> > > > exposed to the buddy where it can happily get allocated/freed.
> > > > 
> > > > The latent BUG would be that that blob gets exposed to the system like
> > > > ordinary RAM, and not reserved via memblock early during boot.
> > > > Assuming that blob has a low physical address, with my patch it will
> > > > get allocated/used a lot earlier - which would mean we trigger this
> > > > latent BUG now more easily.
> > > > 
> > > > There have been similar latent BUGs on ARM boards that my patch
> > > > discovered where special RAM regions did not get marked as reserved
> > > > via the device tree properly.
> > > > 
> > > > Now, this is just a wild guess :) Can you dump the page when mapping
> > > > (before PageReserved()) and when unmapping, to see what the state of
> > > > that memmap is?
> > > 
> > > Thank you David for the explanation and your help on this,
> > > 
> > > dump_page() before PageReserved and before kmap() in the above patch:
> > > 
> > > [    1.116480] ACPI: Core revision 20201113
> > > [    1.117628] XXX acpi_map: about to call kmap()...
> > > [    1.118561] page:ffffea0002f914c0 refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0xbe453
> > > [    1.120381] flags: 0xfffffc0000000()
> > > [    1.121116] raw: 000fffffc0000000 ffffea0002f914c8 ffffea0002f914c8 0000000000000000
> > > [    1.122638] raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
> > > [    1.124146] page dumped because: acpi_map pre SetPageReserved
> > > 
> > > I also added dump_page() before unmapping, but it is not hit. The
> > > following for the same pfn now shows up I believe as a result of setting
> > > PageReserved:
> > > 
> > > [   28.098208] BUG: Bad page state in process modprobe  pfn:be453
> > > [   28.098394] page:ffffea0002f914c0 refcount:0 mapcount:0 mapping:0000000000000000 index:0x1 pfn:0xbe453
> > > [   28.098394] flags: 0xfffffc0001000(reserved)
> > > [   28.098394] raw: 000fffffc0001000 dead000000000100 dead000000000122 0000000000000000
> > > [   28.098394] raw: 0000000000000001 0000000000000000 00000000ffffffff 0000000000000000
> > > [   28.098394] page dumped because: PAGE_FLAGS_CHECK_AT_PREP flag(s) set
> > > [   28.098394] page_owner info is not present (never set?)
> > > [   28.098394] Modules linked in:
> > > [   28.098394] CPU: 2 PID: 204 Comm: modprobe Not tainted 5.11.0-3dbd5e3 #66
> > > [   28.098394] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 0.0.0 02/06/2015
> > > [   28.098394] Call Trace:
> > > [   28.098394]  dump_stack+0xdb/0x120
> > > [   28.098394]  bad_page.cold.108+0xc6/0xcb
> > > [   28.098394]  check_new_page_bad+0x47/0xa0
> > > [   28.098394]  get_page_from_freelist+0x30cd/0x5730
> > > [   28.098394]  ? __isolate_free_page+0x4f0/0x4f0
> > > [   28.098394]  ? init_object+0x7e/0x90
> > > [   28.098394]  __alloc_pages_nodemask+0x2d8/0x650
> > > [   28.098394]  ? write_comp_data+0x2f/0x90
> > > [   28.098394]  ? __alloc_pages_slowpath.constprop.103+0x2110/0x2110
> > > [   28.098394]  ? __sanitizer_cov_trace_pc+0x21/0x50
> > > [   28.098394]  alloc_pages_vma+0xe2/0x560
> > > [   28.098394]  do_fault+0x194/0x12c0
> > > [   28.098394]  ? write_comp_data+0x2f/0x90
> > > [   28.098394]  __handle_mm_fault+0x1650/0x26c0
> > > [   28.098394]  ? copy_page_range+0x1350/0x1350
> > > [   28.098394]  ? write_comp_data+0x2f/0x90
> > > [   28.098394]  ? write_comp_data+0x2f/0x90
> > > [   28.098394]  handle_mm_fault+0x1f9/0x810
> > > [   28.098394]  ? write_comp_data+0x2f/0x90
> > > [   28.098394]  do_user_addr_fault+0x6f7/0xca0
> > > [   28.098394]  exc_page_fault+0xaf/0x1a0
> > > [   28.098394]  asm_exc_page_fault+0x1e/0x30
> > > [   28.098394] RIP: 0010:__clear_user+0x30/0x60
> > 
> > I think the PAGE_FLAGS_CHECK_AT_PREP check in this instance means that
> > someone is trying to allocate that page with the PG_reserved bit set.
> > This means that the page actually was exposed to the buddy.
> > 
> > However, when you SetPageReserved(), I don't think that PG_buddy is set
> > and the refcount is 0. That could indicate that the page is on the buddy
> > PCP list. Could be that it is getting reused a couple of times.
> > 
> > The PFN 0xbe453 looks a little strange, though. Do we expect ACPI tables
> > close to 3 GiB ? No idea. Could it be that you are trying to map a wrong
> > table? Just a guess.
> > 
> > > 
> > > What would be  the correct way to reserve the page so that the above
> > > would not be hit?
> > 
> > I would have assumed that if this is a binary blob, that someone (which
> > I think would be acpi code) reserved via memblock_reserve() early during
> > boot.
> > 
> > E.g., see drivers/acpi/tables.c:acpi_table_upgrade()->memblock_reserve().
> 
> acpi_table_upgrade() gets called, but bails out before memblock_reserve() is
> called. Thus, it appears no pages are getting reserved.

acpi_table_upgrade() does not actually reserve memory but rather open
codes memblock allocation with memblock_find_in_range() +
memblock_reserve(), so it does not seem related anyway.

Do you have by chance a full boot log handy? 
 
>     503 void __init acpi_table_upgrade(void)
>     504 {

...

>     568         if (table_nr == 0)
>     569                 return;                                 <-- bails out here
> "drivers/acpi/tables.c"
> 
> George
>
Mike Rapoport Feb. 23, 2021, 10:33 a.m. UTC | #17
(re-added CC)

On Mon, Feb 22, 2021 at 08:24:59PM -0500, George Kennedy wrote:
> 
> On 2/22/2021 4:55 PM, Mike Rapoport wrote:
> > On Mon, Feb 22, 2021 at 01:42:56PM -0500, George Kennedy wrote:
> > > On 2/22/2021 11:13 AM, David Hildenbrand wrote:
> > > > On 22.02.21 16:13, George Kennedy wrote:
> > > > 
> > > > The PFN 0xbe453 looks a little strange, though. Do we expect ACPI tables
> > > > close to 3 GiB ? No idea. Could it be that you are trying to map a wrong
> > > > table? Just a guess.
> > > > 
> > > > > What would be  the correct way to reserve the page so that the above
> > > > > would not be hit?
> > > > I would have assumed that if this is a binary blob, that someone (which
> > > > I think would be acpi code) reserved via memblock_reserve() early during
> > > > boot.
> > > > 
> > > > E.g., see drivers/acpi/tables.c:acpi_table_upgrade()->memblock_reserve().
> > > acpi_table_upgrade() gets called, but bails out before memblock_reserve() is
> > > called. Thus, it appears no pages are getting reserved.
> > acpi_table_upgrade() does not actually reserve memory but rather open
> > codes memblock allocation with memblock_find_in_range() +
> > memblock_reserve(), so it does not seem related anyway.
> > 
> > Do you have by chance a full boot log handy?
> 
> Hello Mike,
> 
> Are you after the console output? See attached.
> 
> It includes my patch to set PG_Reserved along with the dump_page() debug
> that David asked for - see: "page:"

So, iBFT is indeed at pfn 0xbe453:

[    0.077698] ACPI: iBFT 0x00000000BE453000 000800 (v01 BOCHS  BXPCFACP 00000000      00000000)
 
and it's in E820_TYPE_RAM region rather than in ACPI data:

[    0.000000] BIOS-e820: [mem 0x0000000000810000-0x00000000008fffff] ACPI NVS
[    0.000000] BIOS-e820: [mem 0x0000000000900000-0x00000000be49afff] usable
[    0.000000] BIOS-e820: [mem 0x00000000be49b000-0x00000000be49bfff] ACPI data

I could not find anywhere in x86 setup or in ACPI tables parsing the code
that reserves this memory or any other ACPI data for that matter. It could
be that I've missed some copying of the data to statically allocated
initial_tables, but AFAICS any ACPI data that was not marked as such in
e820 tables by BIOS resides in memory that is considered as free.
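
To make the address arithmetic explicit, here is a small userspace sketch
(not kernel code) that converts the PFN to a physical address and looks it
up in the three e820 entries quoted above. The names pfn_to_phys(),
find_region() and struct region are made up for illustration, and 4 KiB
pages are assumed.

```c
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT 12	/* assumption: 4 KiB pages, as on x86-64 */

enum region_type { REGION_USABLE, REGION_ACPI_NVS, REGION_ACPI_DATA };

struct region {
	uint64_t start;
	uint64_t end;		/* inclusive, as printed in the BIOS-e820 lines */
	enum region_type type;
};

/* Only the three BIOS-e820 entries quoted in this mail, not the full map. */
static const struct region e820_excerpt[] = {
	{ 0x0000000000810000ULL, 0x00000000008fffffULL, REGION_ACPI_NVS },
	{ 0x0000000000900000ULL, 0x00000000be49afffULL, REGION_USABLE },
	{ 0x00000000be49b000ULL, 0x00000000be49bfffULL, REGION_ACPI_DATA },
};

/* Page frame number to physical address of the first byte of that page. */
static uint64_t pfn_to_phys(uint64_t pfn)
{
	return pfn << PAGE_SHIFT;
}

/* Return the excerpt entry containing phys, or NULL if none does. */
static const struct region *find_region(uint64_t phys)
{
	size_t i;

	for (i = 0; i < sizeof(e820_excerpt) / sizeof(e820_excerpt[0]); i++) {
		if (phys >= e820_excerpt[i].start && phys <= e820_excerpt[i].end)
			return &e820_excerpt[i];
	}
	return NULL;
}
```

With these entries, find_region(pfn_to_phys(0xbe453)) lands in the "usable"
range, while the BGRT address 0xBE49B000 from the ACPI table listing lands
in the "ACPI data" range.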

Can you please check if this hack (entirely untested) changes anything:

diff --git a/arch/x86/kernel/acpi/boot.c b/arch/x86/kernel/acpi/boot.c
index 7bdc0239a943..c118dd54a747 100644
--- a/arch/x86/kernel/acpi/boot.c
+++ b/arch/x86/kernel/acpi/boot.c
@@ -1551,6 +1551,7 @@ void __init acpi_boot_table_init(void)
 	if (acpi_disabled)
 		return;
 
+#if 0
 	/*
 	 * Initialize the ACPI boot-time table parser.
 	 */
@@ -1558,6 +1559,7 @@ void __init acpi_boot_table_init(void)
 		disable_acpi();
 		return;
 	}
+#endif
 
 	acpi_table_parse(ACPI_SIG_BOOT, acpi_parse_sbf);
 
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index d883176ef2ce..c8a07a7b9577 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -1032,6 +1032,14 @@ void __init setup_arch(char **cmdline_p)
 	 */
 	find_smp_config();
 
+	/*
+	 * Initialize the ACPI boot-time table parser.
+	 */
+	if (acpi_table_init()) {
+		disable_acpi();
+		return;
+	}
+
 	reserve_ibft_region();
 
 	early_alloc_pgt_buf();
diff --git a/drivers/firmware/iscsi_ibft_find.c b/drivers/firmware/iscsi_ibft_find.c
index 64bb94523281..2e5e04090fe2 100644
--- a/drivers/firmware/iscsi_ibft_find.c
+++ b/drivers/firmware/iscsi_ibft_find.c
@@ -80,6 +80,21 @@ static int __init find_ibft_in_mem(void)
 done:
 	return len;
 }
+
+static void __init acpi_find_ibft_region(void)
+{
+	int i;
+	struct acpi_table_header *table = NULL;
+
+	if (acpi_disabled)
+		return;
+
+	for (i = 0; i < ARRAY_SIZE(ibft_signs) && !ibft_addr; i++) {
+		acpi_get_table(ibft_signs[i].sign, 0, &table);
+		ibft_addr = (struct acpi_table_ibft *)table;
+	}
+}
+
 /*
  * Routine used to find the iSCSI Boot Format Table. The logical
  * kernel address is set in the ibft_addr global variable.
@@ -93,6 +108,8 @@ unsigned long __init find_ibft_region(unsigned long *sizep)
 
 	if (!efi_enabled(EFI_BOOT))
 		find_ibft_in_mem();
+	else
+		acpi_find_ibft_region();
 
 	if (ibft_addr) {
 		*sizep = PAGE_ALIGN(ibft_addr->header.length);

> Thank you,
> George
Mike Rapoport Feb. 23, 2021, 3:47 p.m. UTC | #18
Hi George,

On Tue, Feb 23, 2021 at 09:35:32AM -0500, George Kennedy wrote:
> 
> On 2/23/2021 5:33 AM, Mike Rapoport wrote:
> > (re-added CC)
> > 
> > On Mon, Feb 22, 2021 at 08:24:59PM -0500, George Kennedy wrote:
> > > On 2/22/2021 4:55 PM, Mike Rapoport wrote:
> > > > On Mon, Feb 22, 2021 at 01:42:56PM -0500, George Kennedy wrote:
> > > > > On 2/22/2021 11:13 AM, David Hildenbrand wrote:
> > > > > > On 22.02.21 16:13, George Kennedy wrote:
> > > > > > 
> > > > > > The PFN 0xbe453 looks a little strange, though. Do we expect ACPI tables
> > > > > > close to 3 GiB ? No idea. Could it be that you are trying to map a wrong
> > > > > > table? Just a guess.
> > > > > > 
> > > > > > > What would be  the correct way to reserve the page so that the above
> > > > > > > would not be hit?
> > > > > > I would have assumed that if this is a binary blob, that someone (which
> > > > > > I think would be acpi code) reserved via memblock_reserve() early during
> > > > > > boot.
> > > > > > 
> > > > > > E.g., see drivers/acpi/tables.c:acpi_table_upgrade()->memblock_reserve().
> > > > > acpi_table_upgrade() gets called, but bails out before memblock_reserve() is
> > > > > called. Thus, it appears no pages are getting reserved.
> > > > acpi_table_upgrade() does not actually reserve memory but rather open
> > > > codes memblock allocation with memblock_find_in_range() +
> > > > memblock_reserve(), so it does not seem related anyway.
> > > > 
> > > > Do you have by chance a full boot log handy?
> > > Hello Mike,
> > > 
> > > Are you after the console output? See attached.
> > > 
> > > It includes my patch to set PG_Reserved along with the dump_page() debug
> > > that David asked for - see: "page:"
> > So, iBFT is indeed at pfn 0xbe453:
> > 
> > [    0.077698] ACPI: iBFT 0x00000000BE453000 000800 (v01 BOCHS  BXPCFACP 00000000      00000000)
> > and it's in E820_TYPE_RAM region rather than in ACPI data:
> > 
> > [    0.000000] BIOS-e820: [mem 0x0000000000810000-0x00000000008fffff] ACPI NVS
> > [    0.000000] BIOS-e820: [mem 0x0000000000900000-0x00000000be49afff] usable
> > [    0.000000] BIOS-e820: [mem 0x00000000be49b000-0x00000000be49bfff] ACPI data
> > 
> > I could not find anywhere in x86 setup or in ACPI tables parsing the code
> > that reserves this memory or any other ACPI data for that matter. It could
> > be that I've missed some copying of the data to statically allocated
> > initial_tables, but AFAICS any ACPI data that was not marked as such in
> > e820 tables by BIOS resides in memory that is considered as free.
> > 
> 
> Close...
> 
> Applied the patch, see "[   30.136157] iBFT detected.", but now hit the
> following (missing iounmap()? see full console output attached):
> 
> diff --git a/drivers/firmware/iscsi_ibft_find.c b/drivers/firmware/iscsi_ibft_find.c
> index 64bb945..2e5e040 100644
> --- a/drivers/firmware/iscsi_ibft_find.c
> +++ b/drivers/firmware/iscsi_ibft_find.c
> @@ -80,6 +80,21 @@ static int __init find_ibft_in_mem(void)
>  done:
>         return len;
>  }
> +
> +static void __init acpi_find_ibft_region(void)
> +{
> +       int i;
> +       struct acpi_table_header *table = NULL;
> +
> +       if (acpi_disabled)
> +               return;
> +
> +       for (i = 0; i < ARRAY_SIZE(ibft_signs) && !ibft_addr; i++) {
> +               acpi_get_table(ibft_signs[i].sign, 0, &table);
> +               ibft_addr = (struct acpi_table_ibft *)table;

Can you try adding 

	acpi_put_table(table);

here?

> +       }
> +}
> +
George Kennedy Feb. 23, 2021, 6:05 p.m. UTC | #19
On 2/23/2021 10:47 AM, Mike Rapoport wrote:
> Hi George,
>
> On Tue, Feb 23, 2021 at 09:35:32AM -0500, George Kennedy wrote:
>> On 2/23/2021 5:33 AM, Mike Rapoport wrote:
>>> (re-added CC)
>>>
>>> On Mon, Feb 22, 2021 at 08:24:59PM -0500, George Kennedy wrote:
>>>> On 2/22/2021 4:55 PM, Mike Rapoport wrote:
>>>>> On Mon, Feb 22, 2021 at 01:42:56PM -0500, George Kennedy wrote:
>>>>>> On 2/22/2021 11:13 AM, David Hildenbrand wrote:
>>>>>>> On 22.02.21 16:13, George Kennedy wrote:
>>>>>>>
>>>>>>> The PFN 0xbe453 looks a little strange, though. Do we expect ACPI tables
>>>>>>> close to 3 GiB ? No idea. Could it be that you are trying to map a wrong
>>>>>>> table? Just a guess.
>>>>>>>
>>>>>>>> What would be  the correct way to reserve the page so that the above
>>>>>>>> would not be hit?
>>>>>>> I would have assumed that if this is a binary blob, that someone (which
>>>>>>> I think would be acpi code) reserved via memblock_reserve() early during
>>>>>>> boot.
>>>>>>>
>>>>>>> E.g., see drivers/acpi/tables.c:acpi_table_upgrade()->memblock_reserve().
>>>>>> acpi_table_upgrade() gets called, but bails out before memblock_reserve() is
>>>>>> called. Thus, it appears no pages are getting reserved.
>>>>> acpi_table_upgrade() does not actually reserve memory but rather open
>>>>> codes memblock allocation with memblock_find_in_range() +
>>>>> memblock_reserve(), so it does not seem related anyway.
>>>>>
>>>>> Do you have by chance a full boot log handy?
>>>> Hello Mike,
>>>>
>>>> Are you after the console output? See attached.
>>>>
>>>> It includes my patch to set PG_Reserved along with the dump_page() debug
>>>> that David asked for - see: "page:"
>>> So, iBFT is indeed at pfn 0xbe453:
>>>
>>> [    0.077698] ACPI: iBFT 0x00000000BE453000 000800 (v01 BOCHS  BXPCFACP 00000000      00000000)
>>> and it's in E820_TYPE_RAM region rather than in ACPI data:
>>>
>>> [    0.000000] BIOS-e820: [mem 0x0000000000810000-0x00000000008fffff] ACPI NVS
>>> [    0.000000] BIOS-e820: [mem 0x0000000000900000-0x00000000be49afff] usable
>>> [    0.000000] BIOS-e820: [mem 0x00000000be49b000-0x00000000be49bfff] ACPI data
>>>
>>> I could not find anywhere in x86 setup or in ACPI tables parsing the code
>>> that reserves this memory or any other ACPI data for that matter. It could
>>> be that I've missed some copying of the data to statically allocated
>>> initial_tables, but AFAICS any ACPI data that was not marked as such in
>>> e820 tables by BIOS resides in memory that is considered as free.
>>>
>> Close...
>>
>> Applied the patch, see "[   30.136157] iBFT detected.", but now hit the
>> following (missing iounmap()? see full console output attached):
>>
>> diff --git a/drivers/firmware/iscsi_ibft_find.c b/drivers/firmware/iscsi_ibft_find.c
>> index 64bb945..2e5e040 100644
>> --- a/drivers/firmware/iscsi_ibft_find.c
>> +++ b/drivers/firmware/iscsi_ibft_find.c
>> @@ -80,6 +80,21 @@ static int __init find_ibft_in_mem(void)
>>   done:
>>          return len;
>>   }
>> +
>> +static void __init acpi_find_ibft_region(void)
>> +{
>> +       int i;
>> +       struct acpi_table_header *table = NULL;
>> +
>> +       if (acpi_disabled)
>> +               return;
>> +
>> +       for (i = 0; i < ARRAY_SIZE(ibft_signs) && !ibft_addr; i++) {
>> +               acpi_get_table(ibft_signs[i].sign, 0, &table);
>> +               ibft_addr = (struct acpi_table_ibft *)table;
> Can you try adding
>
> 	acpi_put_table(table);
>
> here?
Mike,

It now crashes here:

[    0.051019] ACPI: Early table checksum verification disabled
[    0.056721] ACPI: RSDP 0x00000000BFBFA014 000024 (v02 BOCHS )
[    0.057874] ACPI: XSDT 0x00000000BFBF90E8 00004C (v01 BOCHS BXPCFACP 00000001      01000013)
[    0.059590] ACPI: FACP 0x00000000BFBF5000 000074 (v01 BOCHS BXPCFACP 00000001 BXPC 00000001)
[    0.061306] ACPI: DSDT 0x00000000BFBF6000 00238D (v01 BOCHS BXPCDSDT 00000001 BXPC 00000001)
[    0.063006] ACPI: FACS 0x00000000BFBFD000 000040
[    0.063938] ACPI: APIC 0x00000000BFBF4000 000090 (v01 BOCHS BXPCAPIC 00000001 BXPC 00000001)
[    0.065638] ACPI: HPET 0x00000000BFBF3000 000038 (v01 BOCHS BXPCHPET 00000001 BXPC 00000001)
[    0.067335] ACPI: BGRT 0x00000000BE49B000 000038 (v01 INTEL EDK2     00000002      01000013)
[    0.069030] ACPI: iBFT 0x00000000BE453000 000800 (v01 BOCHS BXPCFACP 00000000      00000000)
[    0.070734] XXX acpi_find_ibft_region:
[    0.071468] XXX iBFT, status=0
[    0.072073] XXX about to call acpi_put_table()... ibft_addr=ffffffffff240000
[    0.073449] XXX acpi_find_ibft_region(EXIT):
PANIC: early exception 0x0e IP 10:ffffffff9259f439 error 0 cr2 0xffffffffff240004
[    0.075711] CPU: 0 PID: 0 Comm: swapper Not tainted 5.11.0-34a2105 #8
[    0.076983] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 0.0.0 02/06/2015
[    0.078579] RIP: 0010:find_ibft_region+0x470/0x577
[    0.079541] Code: f1 40 0f 9e c6 84 c9 0f 95 c1 40 84 ce 75 11 83 e0 07 38 c2 0f 9e c1 84 d2 0f 95 c0 84 c1 74 0a be 04 00 00 00 e8 37 f8 5f ef <8b> 5b 04 4c 89 fa b8 ff ff 37 00 48 c1 ea 03 48 c1 e0 2a 81 c3 ff
[    0.083207] RSP: 0000:ffffffff8fe07ca8 EFLAGS: 00010046 ORIG_RAX: 0000000000000000
[    0.084709] RAX: 0000000000000000 RBX: ffffffffff240000 RCX: ffffffff815fcf01
[    0.086109] RDX: dffffc0000000000 RSI: 0000000000000001 RDI: ffffffffff240004
[    0.087509] RBP: ffffffff8fe07d60 R08: fffffbfff1fc0f21 R09: fffffbfff1fc0f21
[    0.088911] R10: ffffffff8fe07907 R11: fffffbfff1fc0f20 R12: ffffffff8fe07d38
[    0.090310] R13: 0000000000000001 R14: 0000000000000001 R15: ffffffff8fe07e80
[    0.091716] FS:  0000000000000000(0000) GS:ffffffff92409000(0000) knlGS:0000000000000000
[    0.093304] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    0.094435] CR2: ffffffffff240004 CR3: 0000000027630000 CR4: 00000000000006a0
[    0.095843] Call Trace:
[    0.096345]  ? acpi_table_init+0x3eb/0x428
[    0.097164]  ? dmi_id_init+0x871/0x871
[    0.097912]  ? early_memunmap+0x22/0x27
[    0.098683]  ? smp_scan_config+0x20e/0x230
[    0.099500]  setup_arch+0xd3e/0x181d
[    0.100221]  ? reserve_standard_io_resources+0x3e/0x3e
[    0.101265]  ? __sanitizer_cov_trace_pc+0x21/0x50
[    0.102203]  ? vprintk_func+0xe9/0x200
[    0.102953]  ? printk+0xac/0xd4
[    0.103589]  ? record_print_text.cold.38+0x16/0x16
[    0.104540]  ? write_comp_data+0x2f/0x90
[    0.105325]  ? __sanitizer_cov_trace_pc+0x21/0x50
[    0.106262]  start_kernel+0x6c/0x474
[    0.106981]  x86_64_start_reservations+0x37/0x39
[    0.107902]  x86_64_start_kernel+0x7b/0x7e
[    0.108722]  secondary_startup_64_no_verify+0xb0/0xbb


Added debug to dump out the ibft_addr:

[root@gkennedy-20210107-1202 linux-upwork]# git diff
diff --git a/drivers/firmware/iscsi_ibft_find.c b/drivers/firmware/iscsi_ibft_find.c
index 2e5e040..a246373 100644
--- a/drivers/firmware/iscsi_ibft_find.c
+++ b/drivers/firmware/iscsi_ibft_find.c
@@ -83,16 +83,22 @@ static int __init find_ibft_in_mem(void)

  static void __init acpi_find_ibft_region(void)
  {
-       int i;
+       int i, status;
         struct acpi_table_header *table = NULL;
-
+printk(KERN_ERR "XXX acpi_find_ibft_region:\n");
         if (acpi_disabled)
                 return;

         for (i = 0; i < ARRAY_SIZE(ibft_signs) && !ibft_addr; i++) {
-               acpi_get_table(ibft_signs[i].sign, 0, &table);
-               ibft_addr = (struct acpi_table_ibft *)table;
+               status = acpi_get_table(ibft_signs[i].sign, 0, &table);
+               printk(KERN_ERR "XXX %s, status=%x\n", ibft_signs[i].sign, status);
+               if (ACPI_SUCCESS(status)) {
+                       ibft_addr = (struct acpi_table_ibft *)table;
+                       printk(KERN_ERR "XXX about to call acpi_put_table()... ibft_addr=%llx\n", (u64)ibft_addr);
+                       acpi_put_table(table);
+               }
         }
+printk(KERN_ERR "XXX acpi_find_ibft_region(EXIT):\n");
  }

  /*

George
>
>> +       }
>> +}
>> +
Mike Rapoport Feb. 23, 2021, 8:09 p.m. UTC | #20
On Tue, Feb 23, 2021 at 01:05:05PM -0500, George Kennedy wrote:
> On 2/23/2021 10:47 AM, Mike Rapoport wrote:
> 
> It now crashes here:
> 
> [    0.051019] ACPI: Early table checksum verification disabled
> [    0.056721] ACPI: RSDP 0x00000000BFBFA014 000024 (v02 BOCHS )
> [    0.057874] ACPI: XSDT 0x00000000BFBF90E8 00004C (v01 BOCHS BXPCFACP 00000001      01000013)
> [    0.059590] ACPI: FACP 0x00000000BFBF5000 000074 (v01 BOCHS BXPCFACP 00000001 BXPC 00000001)
> [    0.061306] ACPI: DSDT 0x00000000BFBF6000 00238D (v01 BOCHS BXPCDSDT 00000001 BXPC 00000001)
> [    0.063006] ACPI: FACS 0x00000000BFBFD000 000040
> [    0.063938] ACPI: APIC 0x00000000BFBF4000 000090 (v01 BOCHS BXPCAPIC 00000001 BXPC 00000001)
> [    0.065638] ACPI: HPET 0x00000000BFBF3000 000038 (v01 BOCHS BXPCHPET 00000001 BXPC 00000001)
> [    0.067335] ACPI: BGRT 0x00000000BE49B000 000038 (v01 INTEL EDK2     00000002      01000013)
> [    0.069030] ACPI: iBFT 0x00000000BE453000 000800 (v01 BOCHS BXPCFACP 00000000      00000000)
> [    0.070734] XXX acpi_find_ibft_region:
> [    0.071468] XXX iBFT, status=0
> [    0.072073] XXX about to call acpi_put_table()... ibft_addr=ffffffffff240000
> [    0.073449] XXX acpi_find_ibft_region(EXIT):
> PANIC: early exception 0x0e IP 10:ffffffff9259f439 error 0 cr2 0xffffffffff240004

Right, I've missed the dereference of the ibft_addr after
acpi_find_ibft_region(). 
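
As a sanity check on that reading of the crash: cr2 is ibft_addr + 4, and 4
is the offset of the length field in an ACPI table header (a 4-byte
signature followed by a 32-bit length). So the faulting access is consistent
with reading ibft_addr->header.length after the table had already been put.
The userspace sketch below models only those first two header fields; the
struct and helper names are invented for illustration and are not the
kernel's definitions.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Partial, illustrative model of an ACPI table header: only the first
 * two fields. The real header has further fields after length.
 */
struct table_header_prefix {
	char     signature[4];	/* e.g. "iBFT" */
	uint32_t length;	/* total table length in bytes */
};

/* Address that a read of header.length would touch for a given mapping. */
static uint64_t length_field_addr(uint64_t table_vaddr)
{
	return table_vaddr + offsetof(struct table_header_prefix, length);
}
```

Feeding in the ibft_addr value printed by the debug output
(0xffffffffff240000) gives 0xffffffffff240004, which matches the cr2 value
in the PANIC line.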

With this change to iscsi_ibft_find.c instead of the previous one it should
be better:

diff --git a/drivers/firmware/iscsi_ibft_find.c b/drivers/firmware/iscsi_ibft_find.c
index 64bb94523281..1be7481d5c69 100644
--- a/drivers/firmware/iscsi_ibft_find.c
+++ b/drivers/firmware/iscsi_ibft_find.c
@@ -80,6 +80,27 @@ static int __init find_ibft_in_mem(void)
 done:
 	return len;
 }
+
+static void __init acpi_find_ibft_region(unsigned long *sizep)
+{
+	int i;
+	struct acpi_table_header *table = NULL;
+	acpi_status status;
+
+	if (acpi_disabled)
+		return;
+
+	for (i = 0; i < ARRAY_SIZE(ibft_signs) && !ibft_addr; i++) {
+		status = acpi_get_table(ibft_signs[i].sign, 0, &table);
+		if (ACPI_SUCCESS(status)) {
+			ibft_addr = (struct acpi_table_ibft *)table;
+			*sizep = PAGE_ALIGN(ibft_addr->header.length);
+			acpi_put_table(table);
+			break;
+		}
+	}
+}
+
 /*
  * Routine used to find the iSCSI Boot Format Table. The logical
  * kernel address is set in the ibft_addr global variable.
@@ -91,14 +112,16 @@ unsigned long __init find_ibft_region(unsigned long *sizep)
 	/* iBFT 1.03 section 1.4.3.1 mandates that UEFI machines will
 	 * only use ACPI for this */
 
-	if (!efi_enabled(EFI_BOOT))
+	if (!efi_enabled(EFI_BOOT)) {
 		find_ibft_in_mem();
-
-	if (ibft_addr) {
 		*sizep = PAGE_ALIGN(ibft_addr->header.length);
-		return (u64)virt_to_phys(ibft_addr);
+	} else {
+		acpi_find_ibft_region(sizep);
 	}
 
+	if (ibft_addr)
+		return (u64)virt_to_phys(ibft_addr);
+
 	*sizep = 0;
 	return 0;
 }
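
For reference, the size that PAGE_ALIGN() yields for this table can be
worked out by hand. The sketch below is a userspace model assuming 4 KiB
pages; page_align() and region_last_byte() are illustrative names, not
kernel helpers. The inputs are the iBFT address and length from the ACPI
table listing in the log (0xBE453000, 0x800).

```c
#include <stdint.h>

#define PAGE_SIZE_BYTES 4096ULL	/* assumption: 4 KiB pages */

/* Round len up to the next multiple of the page size. */
static uint64_t page_align(uint64_t len)
{
	return (len + PAGE_SIZE_BYTES - 1) & ~(PAGE_SIZE_BYTES - 1);
}

/* Last byte covered by a region of page-aligned size starting at start. */
static uint64_t region_last_byte(uint64_t start, uint64_t len)
{
	return start + page_align(len) - 1;
}
```

With a 0x800-byte table the aligned size is a single page (0x1000), so the
region in question would be 0xBE453000-0xBE453FFF, which sits entirely
inside the "usable" e820 range ending at 0xbe49afff.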

> [    0.075711] CPU: 0 PID: 0 Comm: swapper Not tainted 5.11.0-34a2105 #8
> [    0.076983] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 0.0.0 02/06/2015
> [    0.078579] RIP: 0010:find_ibft_region+0x470/0x577
George Kennedy Feb. 23, 2021, 9:16 p.m. UTC | #21
On 2/23/2021 3:09 PM, Mike Rapoport wrote:
> On Tue, Feb 23, 2021 at 01:05:05PM -0500, George Kennedy wrote:
>> On 2/23/2021 10:47 AM, Mike Rapoport wrote:
>>
>> It now crashes here:
>>
>> [    0.051019] ACPI: Early table checksum verification disabled
>> [    0.056721] ACPI: RSDP 0x00000000BFBFA014 000024 (v02 BOCHS )
>> [    0.057874] ACPI: XSDT 0x00000000BFBF90E8 00004C (v01 BOCHS BXPCFACP 00000001      01000013)
>> [    0.059590] ACPI: FACP 0x00000000BFBF5000 000074 (v01 BOCHS BXPCFACP 00000001 BXPC 00000001)
>> [    0.061306] ACPI: DSDT 0x00000000BFBF6000 00238D (v01 BOCHS BXPCDSDT 00000001 BXPC 00000001)
>> [    0.063006] ACPI: FACS 0x00000000BFBFD000 000040
>> [    0.063938] ACPI: APIC 0x00000000BFBF4000 000090 (v01 BOCHS BXPCAPIC 00000001 BXPC 00000001)
>> [    0.065638] ACPI: HPET 0x00000000BFBF3000 000038 (v01 BOCHS BXPCHPET 00000001 BXPC 00000001)
>> [    0.067335] ACPI: BGRT 0x00000000BE49B000 000038 (v01 INTEL EDK2 00000002      01000013)
>> [    0.069030] ACPI: iBFT 0x00000000BE453000 000800 (v01 BOCHS BXPCFACP 00000000      00000000)
>> [    0.070734] XXX acpi_find_ibft_region:
>> [    0.071468] XXX iBFT, status=0
>> [    0.072073] XXX about to call acpi_put_table()... ibft_addr=ffffffffff240000
>> [    0.073449] XXX acpi_find_ibft_region(EXIT):
>> PANIC: early exception 0x0e IP 10:ffffffff9259f439 error 0 cr2 0xffffffffff240004
> Right, I've missed the dereference of the ibft_addr after
> acpi_find_ibft_region().
>
> With this change to iscsi_ibft_find.c instead of the previous one it should
> be better:
>
> diff --git a/drivers/firmware/iscsi_ibft_find.c b/drivers/firmware/iscsi_ibft_find.c
> index 64bb94523281..1be7481d5c69 100644
> --- a/drivers/firmware/iscsi_ibft_find.c
> +++ b/drivers/firmware/iscsi_ibft_find.c
> @@ -80,6 +80,27 @@ static int __init find_ibft_in_mem(void)
>   done:
>   	return len;
>   }
> +
> +static void __init acpi_find_ibft_region(unsigned long *sizep)
> +{
> +	int i;
> +	struct acpi_table_header *table = NULL;
> +	acpi_status status;
> +
> +	if (acpi_disabled)
> +		return;
> +
> +	for (i = 0; i < ARRAY_SIZE(ibft_signs) && !ibft_addr; i++) {
> +		status = acpi_get_table(ibft_signs[i].sign, 0, &table);
> +		if (ACPI_SUCCESS(status)) {
> +			ibft_addr = (struct acpi_table_ibft *)table;
> +			*sizep = PAGE_ALIGN(ibft_addr->header.length);
> +			acpi_put_table(table);
> +			break;
> +		}
> +	}
> +}
> +
>   /*
>    * Routine used to find the iSCSI Boot Format Table. The logical
>    * kernel address is set in the ibft_addr global variable.
> @@ -91,14 +112,16 @@ unsigned long __init find_ibft_region(unsigned long *sizep)
>   	/* iBFT 1.03 section 1.4.3.1 mandates that UEFI machines will
>   	 * only use ACPI for this */
>   
> -	if (!efi_enabled(EFI_BOOT))
> +	if (!efi_enabled(EFI_BOOT)) {
>   		find_ibft_in_mem();
> -
> -	if (ibft_addr) {
>   		*sizep = PAGE_ALIGN(ibft_addr->header.length);
> -		return (u64)virt_to_phys(ibft_addr);
> +	} else {
> +		acpi_find_ibft_region(sizep);
>   	}
>   
> +	if (ibft_addr)
> +		return (u64)virt_to_phys(ibft_addr);
> +
>   	*sizep = 0;
>   	return 0;
>   }
Mike,

No luck. Back to the original KASAN ibft_init crash.

I ran with only the above patch from you. Was that what you wanted? Your 
previous patch had a section defined out by #if 0. Was that supposed to 
be in there as well?

See the attached console output.

This is all I ran with:

# git diff
diff --git a/drivers/firmware/iscsi_ibft_find.c b/drivers/firmware/iscsi_ibft_find.c
index 64bb945..1be7481 100644
--- a/drivers/firmware/iscsi_ibft_find.c
+++ b/drivers/firmware/iscsi_ibft_find.c
@@ -80,6 +80,27 @@ static int __init find_ibft_in_mem(void)
  done:
         return len;
  }
+
+static void __init acpi_find_ibft_region(unsigned long *sizep)
+{
+       int i;
+       struct acpi_table_header *table = NULL;
+       acpi_status status;
+
+       if (acpi_disabled)
+               return;
+
+       for (i = 0; i < ARRAY_SIZE(ibft_signs) && !ibft_addr; i++) {
+               status = acpi_get_table(ibft_signs[i].sign, 0, &table);
+               if (ACPI_SUCCESS(status)) {
+                       ibft_addr = (struct acpi_table_ibft *)table;
+                       *sizep = PAGE_ALIGN(ibft_addr->header.length);
+                       acpi_put_table(table);
+                       break;
+               }
+       }
+}
+
  /*
   * Routine used to find the iSCSI Boot Format Table. The logical
   * kernel address is set in the ibft_addr global variable.
@@ -91,14 +112,16 @@ unsigned long __init find_ibft_region(unsigned long *sizep)
         /* iBFT 1.03 section 1.4.3.1 mandates that UEFI machines will
          * only use ACPI for this */

-       if (!efi_enabled(EFI_BOOT))
+       if (!efi_enabled(EFI_BOOT)) {
                 find_ibft_in_mem();
-
-       if (ibft_addr) {
                 *sizep = PAGE_ALIGN(ibft_addr->header.length);
-               return (u64)virt_to_phys(ibft_addr);
+       } else {
+               acpi_find_ibft_region(sizep);
         }

+       if (ibft_addr)
+               return (u64)virt_to_phys(ibft_addr);
+
         *sizep = 0;
         return 0;
  }


Thank you,
George
>> [    0.075711] CPU: 0 PID: 0 Comm: swapper Not tainted 5.11.0-34a2105 #8
>> [    0.076983] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 0.0.0 02/06/2015
>> [    0.078579] RIP: 0010:find_ibft_region+0x470/0x577
[    0.000000] Linux version 5.11.0-f9593a0 (root@gkennedy-20210107-1202) (gcc (GCC) 8.3.1 20190311 (Red Hat 8.3.1-3.2.0.1), GNU ld version 2.30-55.0.1.el7.2) #9 SMP Tue Feb 23 20:46:12 GMT 2021
[    0.000000] Command line:  slub_debug=FZPU page_owner=on earlyprintk=serial oops=panic nmi_watchdog=panic panic_on_warn=1 panic=1 ftrace_dump_on_oops=orig_cpu vsyscall=native net.ifnames=0 biosdevname=0 root=/dev/sda console=ttyS0 kvm-intel.nested=1 kvm-intel.unrestricted_guest=1 kvm-intel.vmm_exclusive=1 kvm-intel.fasteoi=1 kvm-intel.ept=1 kvm-intel.flexpriority=1 kvm-intel.vpid=1 kvm-intel.emulate_invalid_guest_state=1 kvm-intel.eptad=1 kvm-intel.enable_shadow_vmcs=1 kvm-intel.pml=1 kvm-intel.enable_apicv=1 console=ttyS0 loglevel=8 root=/dev/mapper/ol-root
[    0.000000] x86/fpu: x87 FPU will use FXSAVE
[    0.000000] BIOS-provided physical RAM map:
[    0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009ffff] usable
[    0.000000] BIOS-e820: [mem 0x0000000000100000-0x00000000007fffff] usable
[    0.000000] BIOS-e820: [mem 0x0000000000800000-0x0000000000807fff] ACPI NVS
[    0.000000] BIOS-e820: [mem 0x0000000000808000-0x000000000080ffff] usable
[    0.000000] BIOS-e820: [mem 0x0000000000810000-0x00000000008fffff] ACPI NVS
[    0.000000] BIOS-e820: [mem 0x0000000000900000-0x00000000be49afff] usable
[    0.000000] BIOS-e820: [mem 0x00000000be49b000-0x00000000be49bfff] ACPI data
[    0.000000] BIOS-e820: [mem 0x00000000be49c000-0x00000000be91cfff] usable
[    0.000000] BIOS-e820: [mem 0x00000000be91d000-0x00000000be920fff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000be921000-0x00000000be9c7fff] usable
[    0.000000] BIOS-e820: [mem 0x00000000be9c8000-0x00000000be9d3fff] ACPI NVS
[    0.000000] BIOS-e820: [mem 0x00000000be9d4000-0x00000000be9ebfff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000be9ec000-0x00000000bea70fff] usable
[    0.000000] BIOS-e820: [mem 0x00000000bea71000-0x00000000beb1afff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000beb1b000-0x00000000bfb9afff] usable
[    0.000000] BIOS-e820: [mem 0x00000000bfb9b000-0x00000000bfbf2fff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000bfbf3000-0x00000000bfbfafff] ACPI data
[    0.000000] BIOS-e820: [mem 0x00000000bfbfb000-0x00000000bfbfefff] ACPI NVS
[    0.000000] BIOS-e820: [mem 0x00000000bfbff000-0x00000000bfedbfff] usable
[    0.000000] BIOS-e820: [mem 0x00000000bfedc000-0x00000000bff5ffff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000bff60000-0x00000000bfffffff] ACPI NVS
[    0.000000] BIOS-e820: [mem 0x00000000ffc00000-0x00000000ffffffff] reserved
[    0.000000] BIOS-e820: [mem 0x0000000100000000-0x000000013fffffff] usable
[    0.000000] printk: bootconsole [earlyser0] enabled
[    0.000000] ERROR: earlyprintk= earlyser already used
[    0.000000] ERROR: earlyprintk= earlyser already used
[    0.000000] Malformed early option 'vsyscall'
[    0.000000] NX (Execute Disable) protection: active
[    0.000000] e820: update [mem 0xbe490018-0xbe499457] usable ==> usable
[    0.000000] e820: update [mem 0xbe490018-0xbe499457] usable ==> usable
[    0.000000] e820: update [mem 0xbe454018-0xbe48f457] usable ==> usable
[    0.000000] e820: update [mem 0xbe454018-0xbe48f457] usable ==> usable
[    0.000000] extended physical RAM map:
[    0.000000] reserve setup_data: [mem 0x0000000000000000-0x000000000009ffff] usable
[    0.000000] reserve setup_data: [mem 0x0000000000100000-0x00000000007fffff] usable
[    0.000000] reserve setup_data: [mem 0x0000000000800000-0x0000000000807fff] ACPI NVS
[    0.000000] reserve setup_data: [mem 0x0000000000808000-0x000000000080ffff] usable
[    0.000000] reserve setup_data: [mem 0x0000000000810000-0x00000000008fffff] ACPI NVS
[    0.000000] reserve setup_data: [mem 0x0000000000900000-0x00000000be454017] usable
[    0.000000] reserve setup_data: [mem 0x00000000be454018-0x00000000be48f457] usable
[    0.000000] reserve setup_data: [mem 0x00000000be48f458-0x00000000be490017] usable
[    0.000000] reserve setup_data: [mem 0x00000000be490018-0x00000000be499457] usable
[    0.000000] reserve setup_data: [mem 0x00000000be499458-0x00000000be49afff] usable
[    0.000000] reserve setup_data: [mem 0x00000000be49b000-0x00000000be49bfff] ACPI data
[    0.000000] reserve setup_data: [mem 0x00000000be49c000-0x00000000be91cfff] usable
[    0.000000] reserve setup_data: [mem 0x00000000be91d000-0x00000000be920fff] reserved
[    0.000000] reserve setup_data: [mem 0x00000000be921000-0x00000000be9c7fff] usable
[    0.000000] reserve setup_data: [mem 0x00000000be9c8000-0x00000000be9d3fff] ACPI NVS
[    0.000000] reserve setup_data: [mem 0x00000000be9d4000-0x00000000be9ebfff] reserved
[    0.000000] reserve setup_data: [mem 0x00000000be9ec000-0x00000000bea70fff] usable
[    0.000000] reserve setup_data: [mem 0x00000000bea71000-0x00000000beb1afff] reserved
[    0.000000] reserve setup_data: [mem 0x00000000beb1b000-0x00000000bfb9afff] usable
[    0.000000] reserve setup_data: [mem 0x00000000bfb9b000-0x00000000bfbf2fff] reserved
[    0.000000] reserve setup_data: [mem 0x00000000bfbf3000-0x00000000bfbfafff] ACPI data
[    0.000000] reserve setup_data: [mem 0x00000000bfbfb000-0x00000000bfbfefff] ACPI NVS
[    0.000000] reserve setup_data: [mem 0x00000000bfbff000-0x00000000bfedbfff] usable
[    0.000000] reserve setup_data: [mem 0x00000000bfedc000-0x00000000bff5ffff] reserved
[    0.000000] reserve setup_data: [mem 0x00000000bff60000-0x00000000bfffffff] ACPI NVS
[    0.000000] reserve setup_data: [mem 0x00000000ffc00000-0x00000000ffffffff] reserved
[    0.000000] reserve setup_data: [mem 0x0000000100000000-0x000000013fffffff] usable
[    0.000000] efi: EFI v2.70 by EDK II
[    0.000000] efi: SMBIOS=0xbfbcc000 ACPI=0xbfbfa000 ACPI 2.0=0xbfbfa014 MEMATTR=0xbf2a2018 RNG=0xbfbcd898 
[    0.000000] efi: seeding entropy pool
[    0.000000] SMBIOS 2.8 present.
[    0.000000] DMI: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 0.0.0 02/06/2015
[    0.000000] Hypervisor detected: KVM
[    0.000000] kvm-clock: Using msrs 4b564d01 and 4b564d00
[    0.000000] kvm-clock: cpu 0, msr 28a01001, primary cpu clock
[    0.000001] kvm-clock: using sched offset of 4659585693 cycles
[    0.001117] clocksource: kvm-clock: mask: 0xffffffffffffffff max_cycles: 0x1cd42e4dffb, max_idle_ns: 881590591483 ns
[    0.008816] tsc: Detected 1995.312 MHz processor
[    0.009873] e820: update [mem 0x00000000-0x00000fff] usable ==> reserved
[    0.011184] e820: remove [mem 0x000a0000-0x000fffff] usable
[    0.012320] last_pfn = 0x140000 max_arch_pfn = 0x400000000
[    0.013436] MTRR default type: write-back
[    0.014215] MTRR fixed ranges enabled:
[    0.014946]   00000-9FFFF write-back
[    0.015654]   A0000-FFFFF uncachable
[    0.016357] MTRR variable ranges enabled:
[    0.017135]   0 base 00C0000000 mask FFC0000000 uncachable
[    0.018208]   1 base 2000000000 mask E000000000 uncachable
[    0.019289]   2 disabled
[    0.019784]   3 disabled
[    0.020287]   4 disabled
[    0.020781]   5 disabled
[    0.021278]   6 disabled
[    0.021775]   7 disabled
[    0.022289] x86/PAT: PAT not supported by the CPU.
[    0.023224] x86/PAT: Configuration [0-7]: WB  WT  UC- UC  WB  WT  UC- UC  
[    0.024555] last_pfn = 0xbfedc max_arch_pfn = 0x400000000
[    0.055231] check: Scanning 1 areas for low memory corruption
[    0.062548] Secure boot disabled
[    0.063174] RAMDISK: [mem 0x7eac5000-0x7fffffff]
[    0.064137] ACPI: Early table checksum verification disabled
[    0.065255] ACPI: RSDP 0x00000000BFBFA014 000024 (v02 BOCHS )
[    0.066385] ACPI: XSDT 0x00000000BFBF90E8 00004C (v01 BOCHS  BXPCFACP 00000001      01000013)
[    0.068061] ACPI: FACP 0x00000000BFBF5000 000074 (v01 BOCHS  BXPCFACP 00000001 BXPC 00000001)
[    0.069755] ACPI: DSDT 0x00000000BFBF6000 00238D (v01 BOCHS  BXPCDSDT 00000001 BXPC 00000001)
[    0.071433] ACPI: FACS 0x00000000BFBFD000 000040
[    0.072356] ACPI: APIC 0x00000000BFBF4000 000090 (v01 BOCHS  BXPCAPIC 00000001 BXPC 00000001)
[    0.074033] ACPI: HPET 0x00000000BFBF3000 000038 (v01 BOCHS  BXPCHPET 00000001 BXPC 00000001)
[    0.075707] ACPI: BGRT 0x00000000BE49B000 000038 (v01 INTEL  EDK2     00000002      01000013)
[    0.077381] ACPI: iBFT 0x00000000BE453000 000800 (v01 BOCHS  BXPCFACP 00000000      00000000)
[    0.079064] ACPI: Local APIC address 0xfee00000
[    0.081542] No NUMA configuration found
[    0.082285] Faking a node at [mem 0x0000000000000000-0x000000013fffffff]
[    0.083690] NODE_DATA(0) allocated [mem 0x13ffcf000-0x13fff9fff]
[    0.090839] Zone ranges:
[    0.091340]   DMA      [mem 0x0000000000001000-0x0000000000ffffff]
[    0.092544]   DMA32    [mem 0x0000000001000000-0x00000000ffffffff]
[    0.093742]   Normal   [mem 0x0000000100000000-0x000000013fffffff]
[    0.094939]   Device   empty
[    0.095502] Movable zone start for each node
[    0.096377] Early memory node ranges
[    0.097079]   node   0: [mem 0x0000000000001000-0x000000000009ffff]
[    0.098297]   node   0: [mem 0x0000000000100000-0x00000000007fffff]
[    0.099507]   node   0: [mem 0x0000000000808000-0x000000000080ffff]
[    0.100715]   node   0: [mem 0x0000000000900000-0x00000000be49afff]
[    0.101927]   node   0: [mem 0x00000000be49c000-0x00000000be91cfff]
[    0.103136]   node   0: [mem 0x00000000be921000-0x00000000be9c7fff]
[    0.104348]   node   0: [mem 0x00000000be9ec000-0x00000000bea70fff]
[    0.105556]   node   0: [mem 0x00000000beb1b000-0x00000000bfb9afff]
[    0.106783]   node   0: [mem 0x00000000bfbff000-0x00000000bfedbfff]
[    0.108005]   node   0: [mem 0x0000000100000000-0x000000013fffffff]
[    0.109282] Zeroed struct page in unavailable ranges: 948 pages
[    0.109293] Initmem setup node 0 [mem 0x0000000000001000-0x000000013fffffff]
[    0.111790] On node 0 totalpages: 1047628
[    0.112571]   DMA zone: 59 pages used for memmap
[    0.113464]   DMA zone: 1814 pages reserved
[    0.114273]   DMA zone: 3751 pages, LIFO batch:0
[    0.115324]   DMA32 zone: 12215 pages used for memmap
[    0.116301]   DMA32 zone: 781733 pages, LIFO batch:63
[    0.147523]   Normal zone: 4096 pages used for memmap
[    0.148520]   Normal zone: 262144 pages, LIFO batch:63
[    0.297005] kasan: KernelAddressSanitizer initialized
[    0.303935] ACPI: PM-Timer IO Port: 0xb008
[    0.304737] ACPI: Local APIC address 0xfee00000
[    0.305656] ACPI: LAPIC_NMI (acpi_id[0xff] dfl dfl lint[0x1])
[    0.306835] IOAPIC[0]: apic_id 0, version 17, address 0xfec00000, GSI 0-23
[    0.308180] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
[    0.309418] ACPI: INT_SRC_OVR (bus 0 bus_irq 5 global_irq 5 high level)
[    0.310706] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
[    0.311993] ACPI: INT_SRC_OVR (bus 0 bus_irq 10 global_irq 10 high level)
[    0.313305] ACPI: INT_SRC_OVR (bus 0 bus_irq 11 global_irq 11 high level)
[    0.314620] ACPI: IRQ0 used by override.
[    0.315383] ACPI: IRQ5 used by override.
[    0.316142] ACPI: IRQ9 used by override.
[    0.316901] ACPI: IRQ10 used by override.
[    0.317676] ACPI: IRQ11 used by override.
[    0.318462] Using ACPI (MADT) for SMP configuration information
[    0.319603] ACPI: HPET id: 0x8086a201 base: 0xfed00000
[    0.320631] e820: update [mem 0xbec6a000-0xbec72fff] usable ==> reserved
[    0.321997] smpboot: Allowing 4 CPUs, 0 hotplug CPUs
[    0.323279] PM: hibernation: Registered nosave memory: [mem 0x00000000-0x00000fff]
[    0.324755] PM: hibernation: Registered nosave memory: [mem 0x000a0000-0x000fffff]
[    0.326229] PM: hibernation: Registered nosave memory: [mem 0x00800000-0x00807fff]
[    0.327707] PM: hibernation: Registered nosave memory: [mem 0x00810000-0x008fffff]
[    0.329177] PM: hibernation: Registered nosave memory: [mem 0xbe454000-0xbe454fff]
[    0.330657] PM: hibernation: Registered nosave memory: [mem 0xbe48f000-0xbe48ffff]
[    0.332118] PM: hibernation: Registered nosave memory: [mem 0xbe490000-0xbe490fff]
[    0.333590] PM: hibernation: Registered nosave memory: [mem 0xbe499000-0xbe499fff]
[    0.335069] PM: hibernation: Registered nosave memory: [mem 0xbe49b000-0xbe49bfff]
[    0.336546] PM: hibernation: Registered nosave memory: [mem 0xbe91d000-0xbe920fff]
[    0.338017] PM: hibernation: Registered nosave memory: [mem 0xbe9c8000-0xbe9d3fff]
[    0.339488] PM: hibernation: Registered nosave memory: [mem 0xbe9d4000-0xbe9ebfff]
[    0.340960] PM: hibernation: Registered nosave memory: [mem 0xbea71000-0xbeb1afff]
[    0.342438] PM: hibernation: Registered nosave memory: [mem 0xbec6a000-0xbec72fff]
[    0.343910] PM: hibernation: Registered nosave memory: [mem 0xbfb9b000-0xbfbf2fff]
[    0.345366] PM: hibernation: Registered nosave memory: [mem 0xbfbf3000-0xbfbfafff]
[    0.346821] PM: hibernation: Registered nosave memory: [mem 0xbfbfb000-0xbfbfefff]
[    0.348300] PM: hibernation: Registered nosave memory: [mem 0xbfedc000-0xbff5ffff]
[    0.349753] PM: hibernation: Registered nosave memory: [mem 0xbff60000-0xbfffffff]
[    0.351212] PM: hibernation: Registered nosave memory: [mem 0xc0000000-0xffbfffff]
[    0.352671] PM: hibernation: Registered nosave memory: [mem 0xffc00000-0xffffffff]
[    0.354129] [mem 0xc0000000-0xffbfffff] available for PCI devices
[    0.355317] Booting paravirtualized kernel on KVM
[    0.356222] clocksource: refined-jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 1910969940391419 ns
[    0.358292] setup_percpu: NR_CPUS:8192 nr_cpumask_bits:4 nr_cpu_ids:4 nr_node_ids:1
[    0.360504] percpu: Embedded 64 pages/cpu s225280 r8192 d28672 u524288
[    0.361934] pcpu-alloc: s225280 r8192 d28672 u524288 alloc=1*2097152
[    0.363188] pcpu-alloc: [0] 0 1 2 3 
[    0.364333] kvm-guest: stealtime: cpu 0, msr 10ac36080
[    0.365344] kvm-guest: PV spinlocks disabled, no host support
[    0.366482] Built 1 zonelists, mobility grouping on.  Total pages: 1029444
[    0.367824] Policy zone: Normal
[    0.368440] Kernel command line:  slub_debug=FZPU page_owner=on earlyprintk=serial oops=panic nmi_watchdog=panic panic_on_warn=1 panic=1 ftrace_dump_on_oops=orig_cpu vsyscall=native net.ifnames=0 biosdevname=0 root=/dev/sda console=ttyS0 kvm-intel.nested=1 kvm-intel.unrestricted_guest=1 kvm-intel.vmm_exclusive=1 kvm-intel.fasteoi=1 kvm-intel.ept=1 kvm-intel.flexpriority=1 kvm-intel.vpid=1 kvm-intel.emulate_invalid_guest_state=1 kvm-intel.eptad=1 kvm-intel.enable_shadow_vmcs=1 kvm-intel.pml=1 kvm-intel.enable_apicv=1 console=ttyS0 loglevel=8 root=/dev/mapper/ol-root
[    0.386154] Dentry cache hash table entries: 524288 (order: 10, 4194304 bytes, linear)
[    0.388146] Inode-cache hash table entries: 262144 (order: 9, 2097152 bytes, linear)
[    0.389915] mem auto-init: stack:off, heap alloc:off, heap free:off
[    0.481702] Memory: 2851380K/4190512K available (198686K kernel code, 38641K rwdata, 42112K rodata, 6012K init, 18396K bss, 1298756K reserved, 0K cma-reserved)
[    0.485010] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=4, Nodes=1
[    0.486303] Kernel/User page tables isolation: enabled
[    0.487669] ftrace: allocating 176586 entries in 690 pages
[    0.939816] ftrace: allocated 690 pages with 5 groups
[    0.945552] 
[    0.945842] **********************************************************
[    0.947080] **   NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE   **
[    0.948307] **                                                      **
[    0.949539] ** trace_printk() being used. Allocating extra memory.  **
[    0.950766] **                                                      **
[    0.951994] ** This means that this is a DEBUG kernel and it is     **
[    0.953231] ** unsafe for production use.                           **
[    0.954459] **                                                      **
[    0.955688] ** If you see this message and you are not debugging    **
[    0.956925] ** the kernel, report this immediately to your vendor!  **
[    0.958155] **                                                      **
[    0.959389] **   NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE   **
[    0.960621] **********************************************************
[    0.965174] rcu: Hierarchical RCU implementation.
[    0.966081] rcu: 	RCU restricting CPUs from NR_CPUS=8192 to nr_cpu_ids=4.
[    0.967375] 	Rude variant of Tasks RCU enabled.
[    0.968231] 	Tracing variant of Tasks RCU enabled.
[    0.969133] rcu: RCU calculated value of scheduler-enlistment delay is 100 jiffies.
[    0.970583] rcu: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=4
[    1.049576] NR_IRQS: 524544, nr_irqs: 456, preallocated irqs: 16
[    1.051620] random: get_random_bytes called from start_kernel+0x237/0x474 with crng_init=0
[    1.051869] Console: colour dummy device 80x25
[    1.054340] printk: console [ttyS0] enabled
[    1.054340] printk: console [ttyS0] enabled
[    1.055949] printk: bootconsole [earlyser0] disabled
[    1.055949] printk: bootconsole [earlyser0] disabled
[    1.057927] ACPI: Core revision 20201113
[    1.059524] clocksource: hpet: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 19112604467 ns
[    1.061460] APIC: Switch to symmetric I/O mode setup
[    1.062874] x2apic enabled
[    1.063887] Switched APIC routing to physical x2apic.
[    1.067745] ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
[    1.068984] clocksource: tsc-early: mask: 0xffffffffffffffff max_cycles: 0x3985c314e25, max_idle_ns: 881590612270 ns
[    1.071056] Calibrating delay loop (skipped) preset value.. 3990.62 BogoMIPS (lpj=1995312)
[    1.072056] pid_max: default: 32768 minimum: 301
[    1.080396] LSM: Security Framework initializing
[    1.081112] Yama: becoming mindful.
[    1.082139] SELinux:  Initializing.
[    1.083535] Mount-cache hash table entries: 8192 (order: 4, 65536 bytes, linear)
[    1.084069] Mountpoint-cache hash table entries: 8192 (order: 4, 65536 bytes, linear)
[    1.087913] Last level iTLB entries: 4KB 0, 2MB 0, 4MB 0
[    1.088059] Last level dTLB entries: 4KB 0, 2MB 0, 4MB 0, 1GB 0
[    1.089059] Spectre V1 : Mitigation: usercopy/swapgs barriers and __user pointer sanitization
[    1.090058] Spectre V2 : Mitigation: Full generic retpoline
[    1.091055] Spectre V2 : Spectre v2 / SpectreRSB mitigation: Filling RSB on context switch
[    1.092055] Speculative Store Bypass: Vulnerable
[    1.093060] MDS: Vulnerable: Clear CPU buffers attempted, no microcode
[    1.095292] Freeing SMP alternatives memory: 128K
[    1.208377] smpboot: CPU0: Intel QEMU Virtual CPU version 2.5+ (family: 0x6, model: 0x6, stepping: 0x3)
[    1.210711] Performance Events: PMU not available due to virtualization, using software events only.
[    1.213228] rcu: Hierarchical SRCU implementation.
[    1.230343] NMI watchdog: Perf NMI watchdog permanently disabled
[    1.231506] smp: Bringing up secondary CPUs ...
[    1.236204] x86: Booting SMP configuration:
[    1.237035] .... node  #0, CPUs:      #1
[    0.015565] kvm-clock: cpu 1, msr 28a01041, secondary cpu clock
[    0.015565] smpboot: CPU 1 Converting physical 0 to logical die 1
[    1.241149] kvm-guest: stealtime: cpu 1, msr 10acb6080
[    1.246292]  #2
[    0.015565] kvm-clock: cpu 2, msr 28a01081, secondary cpu clock
[    0.015565] smpboot: CPU 2 Converting physical 0 to logical die 2
[    1.250114] kvm-guest: stealtime: cpu 2, msr 10ad36080
[    1.254984]  #3
[    0.015565] kvm-clock: cpu 3, msr 28a010c1, secondary cpu clock
[    0.015565] smpboot: CPU 3 Converting physical 0 to logical die 3
[    1.259119] kvm-guest: stealtime: cpu 3, msr 10adb6080
[    1.260138] smp: Brought up 1 node, 4 CPUs
[    1.261065] smpboot: Max logical packages: 4
[    1.262056] smpboot: Total of 4 processors activated (15962.49 BogoMIPS)
[    1.266774] node 0 deferred pages initialised in 1ms
[    1.286501] allocated 41943040 bytes of page_ext
[    1.287264] Node 0, zone      DMA: page owner found early allocated 0 pages
[    1.294645] Node 0, zone    DMA32: page owner found early allocated 32 pages
[    1.310991] Node 0, zone   Normal: page owner found early allocated 15143 pages
[    1.312066] devtmpfs: initialized
[    1.313601] x86/mm: Memory block size: 128MB
[    1.369040] PM: Registering ACPI NVS region [mem 0x00800000-0x00807fff] (32768 bytes)
[    1.370113] PM: Registering ACPI NVS region [mem 0x00810000-0x008fffff] (983040 bytes)
[    1.372501] PM: Registering ACPI NVS region [mem 0xbe9c8000-0xbe9d3fff] (49152 bytes)
[    1.374099] PM: Registering ACPI NVS region [mem 0xbfbfb000-0xbfbfefff] (16384 bytes)
[    1.375089] PM: Registering ACPI NVS region [mem 0xbff60000-0xbfffffff] (655360 bytes)
[    1.377645] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 1911260446275000 ns
[    1.378066] futex hash table entries: 1024 (order: 4, 65536 bytes, linear)
[    1.380364] pinctrl core: initialized pinctrl subsystem
[    1.385702] NET: Registered protocol family 16
[    1.394727] DMA: preallocated 512 KiB GFP_KERNEL pool for atomic allocations
[    1.395120] DMA: preallocated 512 KiB GFP_KERNEL|GFP_DMA pool for atomic allocations
[    1.396129] DMA: preallocated 512 KiB GFP_KERNEL|GFP_DMA32 pool for atomic allocations
[    1.397166] audit: initializing netlink subsys (disabled)
[    1.398225] audit: type=2000 audit(1614113722.831:1): state=initialized audit_enabled=0 res=1
[    1.400852] thermal_sys: Registered thermal governor 'fair_share'
[    1.400864] thermal_sys: Registered thermal governor 'bang_bang'
[    1.401057] thermal_sys: Registered thermal governor 'step_wise'
[    1.402131] cpuidle: using governor menu
[    1.411373] ACPI: bus type PCI registered
[    1.412058] acpiphp: ACPI Hot Plug PCI Controller Driver version: 0.5
[    1.414263] dca service started, version 1.12.1
[    1.415269] PCI: Using configuration type 1 for base access
[    1.922479] HugeTLB registered 2.00 MiB page size, pre-allocated 0 pages
[    1.930360] cryptd: max_cpu_qlen set to 1000
[    1.959057] raid6: sse2x4   gen()  2950 MB/s
[    1.976061] raid6: sse2x4   xor()  1499 MB/s
[    1.993055] raid6: sse2x2   gen()  1851 MB/s
[    2.010062] raid6: sse2x2   xor()   845 MB/s
[    2.027058] raid6: sse2x1   gen()  1117 MB/s
[    2.044064] raid6: sse2x1   xor()   457 MB/s
[    2.044908] raid6: using algorithm sse2x4 gen() 2950 MB/s
[    2.045054] raid6: .... xor() 1499 MB/s, rmw enabled
[    2.046056] raid6: using intx1 recovery algorithm
[    2.048573] ACPI: Added _OSI(Module Device)
[    2.049057] ACPI: Added _OSI(Processor Device)
[    2.050062] ACPI: Added _OSI(3.0 _SCP Extensions)
[    2.051056] ACPI: Added _OSI(Processor Aggregator Device)
[    2.052070] ACPI: Added _OSI(Linux-Dell-Video)
[    2.053083] ACPI: Added _OSI(Linux-Lenovo-NV-HDMI-Audio)
[    2.054070] ACPI: Added _OSI(Linux-HPI-Hybrid-Graphics)
[    2.130850] ACPI: 1 ACPI AML tables successfully acquired and loaded
[    2.172947] ACPI: Interpreter enabled
[    2.173283] ACPI: (supports S0 S3 S4 S5)
[    2.174057] ACPI: Using IOAPIC for interrupt routing
[    2.175239] PCI: Using host bridge windows from ACPI; if necessary, use "pci=nocrs" and report a bug
[    2.181882] ACPI: Enabled 2 GPEs in block 00 to 0F
[    2.369532] ACPI: PCI Root Bridge [PCI0] (domain 0000 [bus 00-ff])
[    2.370116] acpi PNP0A03:00: _OSC: OS supports [ASPM ClockPM Segments MSI HPX-Type3]
[    2.371368] acpi PNP0A03:00: fail to add MMCONFIG information, can't access extended PCI configuration space under this bridge.
[    2.382402] acpiphp: Slot [3] registered
[    2.383433] acpiphp: Slot [4] registered
[    2.384404] acpiphp: Slot [5] registered
[    2.385415] acpiphp: Slot [6] registered
[    2.387081] acpiphp: Slot [7] registered
[    2.388236] acpiphp: Slot [8] registered
[    2.389398] acpiphp: Slot [9] registered
[    2.390423] acpiphp: Slot [10] registered
[    2.391409] acpiphp: Slot [11] registered
[    2.392410] acpiphp: Slot [12] registered
[    2.394099] acpiphp: Slot [13] registered
[    2.395281] acpiphp: Slot [14] registered
[    2.396400] acpiphp: Slot [15] registered
[    2.397407] acpiphp: Slot [16] registered
[    2.399230] acpiphp: Slot [19] registered
[    2.400417] acpiphp: Slot [20] registered
[    2.401396] acpiphp: Slot [21] registered
[    2.402433] acpiphp: Slot [22] registered
[    2.403430] acpiphp: Slot [23] registered
[    2.405166] acpiphp: Slot [24] registered
[    2.406335] acpiphp: Slot [25] registered
[    2.407431] acpiphp: Slot [26] registered
[    2.408418] acpiphp: Slot [27] registered
[    2.409403] acpiphp: Slot [28] registered
[    2.411058] acpiphp: Slot [29] registered
[    2.412238] acpiphp: Slot [30] registered
[    2.413413] acpiphp: Slot [31] registered
[    2.414205] PCI host bridge to bus 0000:00
[    2.415068] pci_bus 0000:00: root bus resource [io  0x0000-0x0cf7 window]
[    2.416066] pci_bus 0000:00: root bus resource [io  0x0d00-0xffff window]
[    2.417066] pci_bus 0000:00: root bus resource [mem 0x000a0000-0x000bffff window]
[    2.418066] pci_bus 0000:00: root bus resource [mem 0xc0000000-0xfebfffff window]
[    2.419066] pci_bus 0000:00: root bus resource [mem 0x2000000000-0x207fffffff window]
[    2.420073] pci_bus 0000:00: root bus resource [bus 00-ff]
[    2.421293] pci 0000:00:00.0: [8086:1237] type 00 class 0x060000
[    2.426332] pci 0000:00:01.0: [8086:7000] type 00 class 0x060100
[    2.431505] pci 0000:00:01.1: [8086:7010] type 00 class 0x010180
[    2.436068] pci 0000:00:01.1: reg 0x20: [io  0xd2c0-0xd2cf]
[    2.438414] pci 0000:00:01.1: legacy IDE quirk: reg 0x10: [io  0x01f0-0x01f7]
[    2.439058] pci 0000:00:01.1: legacy IDE quirk: reg 0x14: [io  0x03f6]
[    2.440057] pci 0000:00:01.1: legacy IDE quirk: reg 0x18: [io  0x0170-0x0177]
[    2.441058] pci 0000:00:01.1: legacy IDE quirk: reg 0x1c: [io  0x0376]
[    2.445036] pci 0000:00:01.3: [8086:7113] type 00 class 0x068000
[    2.446765] pci 0000:00:01.3: quirk: [io  0xb000-0xb03f] claimed by PIIX4 ACPI
[    2.447071] pci 0000:00:01.3: quirk: [io  0xb100-0xb10f] claimed by PIIX4 SMB
[    2.452240] pci 0000:00:02.0: [1234:1111] type 00 class 0x030000
[    2.455106] pci 0000:00:02.0: reg 0x10: [mem 0xc0000000-0xc0ffffff pref]
[    2.460104] pci 0000:00:02.0: reg 0x18: [mem 0xc1402000-0xc1402fff]
[    2.469104] pci 0000:00:02.0: reg 0x30: [mem 0xffff0000-0xffffffff pref]
[    2.471128] pci 0000:00:02.0: BAR 0: assigned to efifb
[    2.475885] pci 0000:00:03.0: [1af4:1005] type 00 class 0x00ff00
[    2.476905] pci 0000:00:03.0: reg 0x10: [io  0xd2a0-0xd2bf]
[    2.480060] pci 0000:00:03.0: reg 0x20: [mem 0x2000100000-0x2000103fff 64bit pref]
[    2.487024] pci 0000:00:04.0: [14e4:16dc] type 00 class 0x020000
[    2.490775] pci 0000:00:04.0: reg 0x10: [mem 0x2000104000-0x2000107fff 64bit pref]
[    2.493104] pci 0000:00:04.0: reg 0x18: [mem 0x2000000000-0x20000fffff 64bit pref]
[    2.496106] pci 0000:00:04.0: reg 0x20: [mem 0x2000108000-0x200010bfff 64bit pref]
[    2.499168] pci 0000:00:04.0: enabling Extended Tags
[    2.504607] pci 0000:00:05.0: [1af4:1000] type 00 class 0x020000
[    2.506551] pci 0000:00:05.0: reg 0x10: [io  0xd280-0xd29f]
[    2.507990] pci 0000:00:05.0: reg 0x14: [mem 0xc1401000-0xc1401fff]
[    2.511060] pci 0000:00:05.0: reg 0x20: [mem 0x200010c000-0x200010ffff 64bit pref]
[    2.513060] pci 0000:00:05.0: reg 0x30: [mem 0xfffc0000-0xffffffff pref]
[    2.518348] pci 0000:00:10.0: [1af4:1001] type 00 class 0x010000
[    2.520592] pci 0000:00:10.0: reg 0x10: [io  0xd200-0xd27f]
[    2.522559] pci 0000:00:10.0: reg 0x14: [mem 0xc1400000-0xc1400fff]
[    2.526060] pci 0000:00:10.0: reg 0x20: [mem 0x2000110000-0x2000113fff 64bit pref]
[    2.533096] pci 0000:00:11.0: [1b36:0001] type 01 class 0x060400
[    2.535544] pci 0000:00:11.0: reg 0x10: [mem 0x2000115000-0x20001150ff 64bit]
[    2.541646] pci 0000:00:12.0: [1b36:0001] type 01 class 0x060400
[    2.544060] pci 0000:00:12.0: reg 0x10: [mem 0x2000114000-0x20001140ff 64bit]
[    2.551297] pci_bus 0000:01: extended config space not accessible
[    2.561689] acpiphp: Slot [0] registered
[    2.562444] acpiphp: Slot [1] registered
[    2.564075] acpiphp: Slot [2] registered
[    2.567007] acpiphp: Slot [3-2] registered
[    2.567563] acpiphp: Slot [4-2] registered
[    2.568542] acpiphp: Slot [5-2] registered
[    2.570355] acpiphp: Slot [6-2] registered
[    2.571558] acpiphp: Slot [7-2] registered
[    2.572555] acpiphp: Slot [8-2] registered
[    2.574394] acpiphp: Slot [9-2] registered
[    2.575554] acpiphp: Slot [10-2] registered
[    2.577096] acpiphp: Slot [11-2] registered
[    2.578448] acpiphp: Slot [12-2] registered
[    2.579570] acpiphp: Slot [13-2] registered
[    2.581196] acpiphp: Slot [14-2] registered
[    2.582570] acpiphp: Slot [15-2] registered
[    2.583555] acpiphp: Slot [16-2] registered
[    2.585159] acpiphp: Slot [17] registered
[    2.586338] acpiphp: Slot [18] registered
[    2.587562] acpiphp: Slot [19-2] registered
[    2.588572] acpiphp: Slot [20-2] registered
[    2.590404] acpiphp: Slot [21-2] registered
[    2.591568] acpiphp: Slot [22-2] registered
[    2.593150] acpiphp: Slot [23-2] registered
[    2.594502] acpiphp: Slot [24-2] registered
[    2.595543] acpiphp: Slot [25-2] registered
[    2.597235] acpiphp: Slot [26-2] registered
[    2.598557] acpiphp: Slot [27-2] registered
[    2.599556] acpiphp: Slot [28-2] registered
[    2.601309] acpiphp: Slot [29-2] registered
[    2.602580] acpiphp: Slot [30-2] registered
[    2.603538] acpiphp: Slot [31-2] registered
[    2.605263] pci 0000:00:11.0: PCI bridge to [bus 01]
[    2.606082] pci 0000:00:11.0:   bridge window [io  0xd000-0xdfff]
[    2.607080] pci 0000:00:11.0:   bridge window [mem 0xc1200000-0xc13fffff]
[    2.609618] pci_bus 0000:02: extended config space not accessible
[    2.619345] acpiphp: Slot [0-2] registered
[    2.620552] acpiphp: Slot [1-2] registered
[    2.621562] acpiphp: Slot [2-2] registered
[    2.623412] acpiphp: Slot [3-3] registered
[    2.624562] acpiphp: Slot [4-3] registered
[    2.626093] acpiphp: Slot [5-3] registered
[    2.627434] acpiphp: Slot [6-3] registered
[    2.628582] acpiphp: Slot [7-3] registered
[    2.630147] acpiphp: Slot [8-3] registered
[    2.631486] acpiphp: Slot [9-3] registered
[    2.632584] acpiphp: Slot [10-3] registered
[    2.634219] acpiphp: Slot [11-3] registered
[    2.635581] acpiphp: Slot [12-3] registered
[    2.636560] acpiphp: Slot [13-3] registered
[    2.638332] acpiphp: Slot [14-3] registered
[    2.639563] acpiphp: Slot [15-3] registered
[    2.640539] acpiphp: Slot [16-3] registered
[    2.642411] acpiphp: Slot [17-2] registered
[    2.643569] acpiphp: Slot [18-2] registered
[    2.645147] acpiphp: Slot [19-3] registered
[    2.646501] acpiphp: Slot [20-3] registered
[    2.647573] acpiphp: Slot [21-3] registered
[    2.649241] acpiphp: Slot [22-3] registered
[    2.650572] acpiphp: Slot [23-3] registered
[    2.651569] acpiphp: Slot [24-3] registered
[    2.653344] acpiphp: Slot [25-3] registered
[    2.654560] acpiphp: Slot [26-3] registered
[    2.656093] acpiphp: Slot [27-3] registered
[    2.657453] acpiphp: Slot [28-3] registered
[    2.658572] acpiphp: Slot [29-3] registered
[    2.660192] acpiphp: Slot [30-3] registered
[    2.661548] acpiphp: Slot [31-3] registered
[    2.662426] pci 0000:00:12.0: PCI bridge to [bus 02]
[    2.663081] pci 0000:00:12.0:   bridge window [io  0xc000-0xcfff]
[    2.665078] pci 0000:00:12.0:   bridge window [mem 0xc1000000-0xc11fffff]
[    2.681149] ACPI: PCI Interrupt Link [LNKA] (IRQs 5 10 *11)
[    2.684593] ACPI: PCI Interrupt Link [LNKB] (IRQs 5 10 *11)
[    2.688002] ACPI: PCI Interrupt Link [LNKC] (IRQs 5 *10 11)
[    2.690367] ACPI: PCI Interrupt Link [LNKD] (IRQs 5 *10 11)
[    2.692506] ACPI: PCI Interrupt Link [LNKS] (IRQs *9)
[    2.701882] iommu: Default domain type: Translated 
[    2.703238] pci 0000:00:02.0: vgaarb: setting as boot VGA device
[    2.704052] pci 0000:00:02.0: vgaarb: VGA device added: decodes=io+mem,owns=io+mem,locks=none
[    2.704065] pci 0000:00:02.0: vgaarb: bridge control possible
[    2.705059] vgaarb: loaded
[    2.709611] SCSI subsystem initialized
[    2.710453] libata version 3.00 loaded.
[    2.712272] ACPI: bus type USB registered
[    2.713445] usbcore: registered new interface driver usbfs
[    2.714213] usbcore: registered new interface driver hub
[    2.715163] usbcore: registered new device driver usb
[    2.717828] mc: Linux media interface: v0.10
[    2.718153] videodev: Linux video capture interface: v2.00
[    2.720179] pps_core: LinuxPPS API ver. 1 registered
[    2.721055] pps_core: Software ver. 5.3.6 - Copyright 2005-2007 Rodolfo Giometti <giometti@linux.it>
[    2.723082] PTP clock support registered
[    2.724157] EDAC MC: Ver: 3.0.0
[    2.727630] Registered efivars operations
[    2.730743] Advanced Linux Sound Architecture Driver Initialized.
[    2.733676] Bluetooth: Core ver 2.22
[    2.734163] NET: Registered protocol family 31
[    2.735055] Bluetooth: HCI device and connection manager initialized
[    2.736069] Bluetooth: HCI socket layer initialized
[    2.737063] Bluetooth: L2CAP socket layer initialized
[    2.738094] Bluetooth: SCO socket layer initialized
[    2.739083] NET: Registered protocol family 8
[    2.740055] NET: Registered protocol family 20
[    2.741196] NetLabel: Initializing
[    2.742055] NetLabel:  domain hash size = 128
[    2.743054] NetLabel:  protocols = UNLABELED CIPSOv4 CALIPSO
[    2.744329] NetLabel:  unlabeled traffic allowed by default
[    2.746476] PCI: Using ACPI for IRQ routing
[    2.747063] PCI: pci_cache_line_size set to 64 bytes
[    2.748116] pci 0000:00:01.1: can't claim BAR 4 [io  0xd2c0-0xd2cf]: address conflict with PCI Bus 0000:01 [io  0xd000-0xdfff]
[    2.749105] pci 0000:00:03.0: can't claim BAR 0 [io  0xd2a0-0xd2bf]: address conflict with PCI Bus 0000:01 [io  0xd000-0xdfff]
[    2.750100] pci 0000:00:05.0: can't claim BAR 0 [io  0xd280-0xd29f]: address conflict with PCI Bus 0000:01 [io  0xd000-0xdfff]
[    2.751098] pci 0000:00:10.0: can't claim BAR 0 [io  0xd200-0xd27f]: address conflict with PCI Bus 0000:01 [io  0xd000-0xdfff]
[    2.752230] e820: reserve RAM buffer [mem 0x00810000-0x008fffff]
[    2.753071] e820: reserve RAM buffer [mem 0xbe454018-0xbfffffff]
[    2.754081] e820: reserve RAM buffer [mem 0xbe490018-0xbfffffff]
[    2.755078] e820: reserve RAM buffer [mem 0xbe49b000-0xbfffffff]
[    2.756077] e820: reserve RAM buffer [mem 0xbe91d000-0xbfffffff]
[    2.758086] e820: reserve RAM buffer [mem 0xbe9c8000-0xbfffffff]
[    2.759075] e820: reserve RAM buffer [mem 0xbea71000-0xbfffffff]
[    2.760073] e820: reserve RAM buffer [mem 0xbec6a000-0xbfffffff]
[    2.761071] e820: reserve RAM buffer [mem 0xbfb9b000-0xbfffffff]
[    2.762069] e820: reserve RAM buffer [mem 0xbfedc000-0xbfffffff]
[    2.764114] dcdbas dcdbas: Dell Systems Management Base Driver (version 5.6.0-3.4)
[    2.772887] hpet: 3 channels of 0 reserved for per-cpu timers
[    2.773084] hpet0: at MMIO 0xfed00000, IRQs 2, 8, 0
[    2.774055] hpet0: 3 comparators, 64-bit 100.000000 MHz counter
[    2.777386] clocksource: Switched to clocksource kvm-clock
[    3.326901] VFS: Disk quotas dquot_6.6.0
[    3.327894] VFS: Dquot-cache hash table entries: 512 (order 0, 4096 bytes)
[    3.329987] FS-Cache: Loaded
[    3.331278] CacheFiles: Loaded
[    3.332208] pnp: PnP ACPI init
[    3.333738] pnp 00:00: Plug and Play ACPI device, IDs PNP0b00 (active)
[    3.335751] pnp 00:01: Plug and Play ACPI device, IDs PNP0303 (active)
[    3.337680] pnp 00:02: Plug and Play ACPI device, IDs PNP0f13 (active)
[    3.339202] pnp 00:03: [dma 2]
[    3.340063] pnp 00:03: Plug and Play ACPI device, IDs PNP0700 (active)
[    3.342482] pnp 00:04: Plug and Play ACPI device, IDs PNP0400 (active)
[    3.344719] pnp 00:05: Plug and Play ACPI device, IDs PNP0501 (active)
[    3.353663] pnp: PnP ACPI: found 6 devices
[    3.391644] clocksource: acpi_pm: mask: 0xffffff max_cycles: 0xffffff, max_idle_ns: 2085701024 ns
[    3.395010] NET: Registered protocol family 2
[    8.146025] tcp_listen_portaddr_hash hash table entries: 2048 (order: 3, 32768 bytes, linear)
[    8.152511] TCP established hash table entries: 32768 (order: 6, 262144 bytes, linear)
[    8.154546] TCP bind hash table entries: 32768 (order: 7, 524288 bytes, linear)
[    8.156626] TCP: Hash tables configured (established 32768 bind 32768)
[    8.158340] UDP hash table entries: 2048 (order: 4, 65536 bytes, linear)
[    8.159764] UDP-Lite hash table entries: 2048 (order: 4, 65536 bytes, linear)
[    8.161882] NET: Registered protocol family 1
[    8.164190] RPC: Registered named UNIX socket transport module.
[    8.165370] RPC: Registered udp transport module.
[    8.166310] RPC: Registered tcp transport module.
[    8.167245] RPC: Registered tcp NFSv4.1 backchannel transport module.
[    8.170438] pci 0000:00:02.0: can't claim BAR 6 [mem 0xffff0000-0xffffffff pref]: no compatible bridge window
[    8.172397] pci 0000:00:05.0: can't claim BAR 6 [mem 0xfffc0000-0xffffffff pref]: no compatible bridge window
[    8.174454] pci 0000:00:05.0: BAR 6: assigned [mem 0xc1440000-0xc147ffff pref]
[    8.175888] pci 0000:00:02.0: BAR 6: assigned [mem 0xc1410000-0xc141ffff pref]
[    8.177309] pci 0000:00:10.0: BAR 0: assigned [io  0x1000-0x107f]
[    8.179307] pci 0000:00:03.0: BAR 0: assigned [io  0x1080-0x109f]
[    8.181211] pci 0000:00:05.0: BAR 0: assigned [io  0x10a0-0x10bf]
[    8.183110] pci 0000:00:01.1: BAR 4: assigned [io  0x10c0-0x10cf]
[    8.184996] pci 0000:00:11.0: PCI bridge to [bus 01]
[    8.186005] pci 0000:00:11.0:   bridge window [io  0xd000-0xdfff]
[    8.188298] pci 0000:00:11.0:   bridge window [mem 0xc1200000-0xc13fffff]
[    8.191756] pci 0000:00:12.0: PCI bridge to [bus 02]
[    8.192771] pci 0000:00:12.0:   bridge window [io  0xc000-0xcfff]
[    8.195032] pci 0000:00:12.0:   bridge window [mem 0xc1000000-0xc11fffff]
[    8.198457] pci_bus 0000:00: resource 4 [io  0x0000-0x0cf7 window]
[    8.200195] pci_bus 0000:00: resource 5 [io  0x0d00-0xffff window]
[    8.201422] pci_bus 0000:00: resource 6 [mem 0x000a0000-0x000bffff window]
[    8.202779] pci_bus 0000:00: resource 7 [mem 0xc0000000-0xfebfffff window]
[    8.204137] pci_bus 0000:00: resource 8 [mem 0x2000000000-0x207fffffff window]
[    8.205573] pci_bus 0000:01: resource 0 [io  0xd000-0xdfff]
[    8.206683] pci_bus 0000:01: resource 1 [mem 0xc1200000-0xc13fffff]
[    8.207921] pci_bus 0000:02: resource 0 [io  0xc000-0xcfff]
[    8.209025] pci_bus 0000:02: resource 1 [mem 0xc1000000-0xc11fffff]
[    8.213676] pci 0000:00:01.0: PIIX3: Enabling Passive Release
[    8.214845] pci 0000:00:00.0: Limiting direct PCI/PCI transfers
[    8.216044] pci 0000:00:01.0: Activating ISA DMA hang workarounds
[    8.217412] pci 0000:00:02.0: Video device with shadowed ROM at [mem 0x000c0000-0x000dffff]
[    8.219257] PCI: CLS 0 bytes, default 64
[    8.220605] Trying to unpack rootfs image as initramfs...
<20>[   10.535028] Freeing initrd memory: 21740K
[   10.535903] PCI-DMA: Using software bounce buffering for IO (SWIOTLB)
[   10.537161] software IO TLB: mapped [mem 0x00000000b7e5e000-0x00000000bbe5e000] (64MB)
[   10.545696] kvm: no hardware support
[   10.546439] has_svm: not amd or hygon
[   10.547181] kvm: no hardware support
[   10.547897] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x3985c314e25, max_idle_ns: 881590612270 ns
[   10.550126] mce: Machine check injector initialized
[   10.555487] check: Scanning for low memory corruption every 60 seconds
[   10.558373] CPU feature 'AVX registers' is not supported.
[   10.560740] CPU feature 'AVX registers' is not supported.
[   10.561819] AVX2 instructions are not detected.
[   10.564259] AVX or AES-NI instructions are not detected.
[   10.565321] AVX2 or AES-NI instructions are not detected.
[   10.567208] CPU feature 'AVX registers' is not supported.
[   10.568284] CPU feature 'AVX registers' is not supported.
[   10.569358] PCLMULQDQ-NI instructions are not detected.
[   10.593762] Initialise system trusted keyrings
[   10.594785] Key type blacklist registered
[   10.595897] workingset: timestamp_bits=36 max_order=20 bucket_order=0
[   10.636285] zbud: loaded
[   10.643362] DLM installed
[   10.650246] squashfs: version 4.0 (2009/01/31) Phillip Lougher
[   10.655932] FS-Cache: Netfs 'nfs' registered for caching
[   10.659415] NFS: Registering the id_resolver key type
[   10.660461] Key type id_resolver registered
[   10.661294] Key type id_legacy registered
[   10.662479] nfs4filelayout_init: NFSv4 File Layout Driver Registering...
[   10.663808] Installing knfsd (copyright (C) 1996 okir@monad.swb.de).
[   10.669344] FS-Cache: Netfs 'cifs' registered for caching
[   10.672454] Key type cifs.spnego registered
[   10.673346] Key type cifs.idmap registered
[   10.674182] jffs2: version 2.2. (NAND) (SUMMARY)  © 2001-2006 Red Hat, Inc.
[   10.679101] fuse: init (API version 7.33)
[   10.681992] SGI XFS with ACLs, security attributes, realtime, quota, no debug enabled
[   10.691456] ocfs2: Registered cluster interface o2cb
[   10.692721] ocfs2: Registered cluster interface user
[   10.694154] OCFS2 User DLM kernel interface loaded
[   10.700947] gfs2: GFS2 installed
[   10.703693] FS-Cache: Netfs 'ceph' registered for caching
[   10.704763] ceph: loaded (mds proto 32)
[   10.708143] Allocating IMA blacklist keyring.
[   10.727208] NET: Registered protocol family 38
[   10.728141] xor: measuring software checksum speed
[   10.730016]    prefetch64-sse  : 10806 MB/sec
[   10.731865]    generic_sse     : 10207 MB/sec
[   10.732733] xor: using function: prefetch64-sse (10806 MB/sec)
[   10.733909] async_tx: api initialized (async)
[   10.734780] Key type asymmetric registered
[   10.735603] Asymmetric key parser 'x509' registered
[   10.736572] Key type pkcs7_test registered
[   10.737480] Block layer SCSI generic (bsg) driver version 0.4 loaded (major 243)
[   10.739123] io scheduler mq-deadline registered
[   10.740018] io scheduler kyber registered
<19>[   20.714496] atomic64_test: passed for x86-64 platform with CX8 and with SSE
[   20.776736] acpiphp_ibm: ibm_acpiphp_init: acpi_walk_namespace failed
[   20.778226] hv_vmbus: registering driver hv_pci
[   20.780234] VIA Graphics Integration Chipset framebuffer 2.4 initializing
[   20.785303] hv_vmbus: registering driver hyperv_fb
[   20.786781] vga16fb: initializing
[   20.787480] vga16fb: mapped to 0x(____ptrval____)
[   21.062874] Console: switching to colour frame buffer device 80x30
[   21.247676] fb0: VGA16 VGA frame buffer device
[   21.267378] IPMI message handler: version 39.2
[   21.270227] ipmi device interface
[   21.271074] ipmi_si: IPMI System Interface driver
[   21.272580] ipmi_si: Unable to find any System Interface(s)
[   21.273684] ipmi_ssif: IPMI SSIF Interface driver
[   21.274789] IPMI Watchdog: driver initialized
[   21.275651] IPMI poweroff: Copyright (C) 2004 MontaVista Software - IPMI Powerdown via sys_reboot
[   21.281007] input: Power Button as /devices/LNXSYSTM:00/LNXPWRBN:00/input/input0
[   21.282981] ACPI: Power Button [PWRF]
[   21.288499] EINJ: EINJ table not found.
[   21.289281] ERST DBG: ERST support is disabled.
[   21.290878] ioatdma: Intel(R) QuickData Technology Driver 5.00
[   23.406676] PCI Interrupt Link [LNKC] enabled at IRQ 10
[   25.595148] PCI Interrupt Link [LNKA] enabled at IRQ 11
[   27.942571] PCI Interrupt Link [LNKD] enabled at IRQ 10
[   27.946961] N_HDLC line discipline registered with maxframe=4096
[   27.948233] Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled
[   27.950310] 00:05: ttyS0 at I/O 0x3f8 (irq = 4, base_baud = 115200) is a 16550A
[   27.965003] Cyclades driver 2.6
[   27.965846] Initializing Nozomi driver 2.1d
[   27.966890] SyncLink GT
[   27.967441] SyncLink GT, tty major#237
[   27.968351] SyncLink GT no devices found
[   27.970246] mmtimer: Hardware unsupported
[   27.978787] lp: driver loaded but no devices found
[   27.980756] Non-volatile memory driver v1.3
[   27.983384] random: fast init done
[   27.984361] random: crng init done
[   27.996500] ppdev: user-space parallel port driver
[   27.997548] telclk_interrupt = 0xf non-mcpbl0010 hw.
[   27.998545] Linux agpgart interface v0.103
[   28.000643] Hangcheck: starting hangcheck timer 0.9.1 (tick is 180 seconds, margin is 60 seconds).
[   28.002999] AMD-Vi: AMD IOMMUv2 driver by Joerg Roedel <jroedel@suse.de>
[   28.004314] AMD-Vi: AMD IOMMUv2 functionality not available on this system
[   28.005950] [drm] radeon kernel modesetting enabled.
[   28.012643] [drm] Initialized vgem 1.0.0 20120112 for vgem on minor 0
[   28.014361] usbcore: registered new interface driver udl
[   28.015842] checking generic (a0000 10000) vs hw (c0000000 1000000)
[   28.017087] checking generic (a0000 10000) vs hw (c1402000 1000)
[   28.018286] fb0: switching to bochsdrmfb from VGA16 VGA
[   28.079637] Console: switching to colour dummy device 80x25
[   28.080943] bochs-drm 0000:00:02.0: vgaarb: deactivate vga console
[   28.082910] bochs-drm 0000:00:02.0: BAR 0: can't reserve [mem 0xc0000000-0xc0ffffff pref]
[   28.084542] [drm] Cannot request framebuffer, boot fb still active?
[   28.085857] [drm] Found bochs VGA, ID 0xb0c5.
[   28.086723] [drm] Framebuffer size 16384 kB @ 0xc0000000, mmio @ 0xc1402000.
[   28.088831] [TTM] Zone  kernel: Available graphics memory: 1470040 KiB
[   28.092814] [drm] Found EDID data blob.
[   28.095050] [drm] Initialized bochs-drm 1.0.0 20130925 for 0000:00:02.0 on minor 1
[   28.103101] fbcon: bochs-drmdrmfb (fb0) is primary device
[   28.135899] Console: switching to colour frame buffer device 128x48
[   28.174382] bochs-drm 0000:00:02.0: [drm] fb0: bochs-drmdrmfb frame buffer device
[   28.179569] parport_pc 00:04: reported by Plug and Play ACPI
[   28.181673] parport0: PC-style at 0x378, irq 7 [PCSPP,TRISTATE]
[   28.276543] lp0: using parport0 (interrupt-driven).
[   28.277560] lp0: console ready
[   28.291822] Floppy drive(s): fd0 is 2.88M AMI BIOS
[   28.308677] FDC 0 is a S82078B
[   28.317841] brd: module loaded
[   28.318785] loop: module loaded
[   28.320803] MM: desc_per_page = 128
[   28.392353] virtio_blk virtio2: [vda] 97677312 512-byte logical blocks (50.0 GB/46.6 GiB)
[   28.399859]  vda: vda1 vda2 vda3
[   28.511614] drbd: initialized. Version: 8.4.11 (api:1/proto:86-101)
[   28.512880] drbd: built-in
[   28.513453] drbd: registered as block device major 147
[   28.515688] rbd: loaded (major 250)
[   28.516401] mtip32xx Version 1.3.1
[   28.518982] zram: Added device: zram0
[   28.524083] null_blk: module loaded
[   28.526957] Guest personality initialized and is inactive
[   28.528455] VMCI host device registered (name=vmci, major=10, minor=119)
[   28.529801] Initialized host personality
[   28.532074] usbcore: registered new interface driver viperboard
[   28.541026] Loading iSCSI transport class v2.0-870.
[   28.544669] rdac: device handler registered
[   28.545884] hp_sw: device handler registered
[   28.546743] emc: device handler registered
[   28.547910] alua: device handler registered
[   28.550322] fnic: Cisco FCoE HBA Driver, ver 1.6.0.53
[   28.551606] fnic: Successfully Initialized Trace Buffer
[   28.553098] fnic: Successfully Initialized FC_CTLR Trace Buffer
[   28.555984] snic:Cisco SCSI NIC Driver, ver 0.0.1.18
[   28.557185] snic:Trace Facility Enabled.
[   28.557185]  Trace Buffer SZ 16 Pages.
[   28.560227] bnx2fc: QLogic FCoE Driver bnx2fc v2.12.13 (October 15, 2015)
[   28.562747] [0000:00:00.0]:[qedf_init:4053]: QLogic FCoE Offload Driver v8.42.3.0.
[   28.565219] iscsi: registered transport (tcp)
[   28.566771] Adaptec aacraid driver 1.2.1[50983]-custom
[   28.568016] aic94xx: Adaptec aic94xx SAS/SATA driver version 1.0.3 loaded
[   28.570332] isci: Intel(R) C600 SAS Controller Driver - version 1.2.0
[   28.572411] qla2xxx [0000:00:00.0]-0005: : QLogic Fibre Channel HBA Driver: 10.02.00.104-k.
[   28.574681] iscsi: registered transport (qla4xxx)
[   28.575770] QLogic iSCSI HBA Driver
[   28.576473] Emulex LightPulse Fibre Channel SCSI driver 12.8.0.6
[   28.577650] Copyright (C) 2017-2020 Broadcom. All Rights Reserved. The term "Broadcom" refers to Broadcom Inc. and/or its subsidiaries.
[   28.580558] QLogic BR-series BFA FC/FCOE SCSI driver - version: 3.2.25.1
[   28.582090] csiostor: Chelsio FCoE driver 1.0.0-ko
[   28.583520] Microsemi PQI Driver (v1.2.16-012)
[   28.584804] megaraid cmm: 2.20.2.7 (Release Date: Sun Jul 16 00:01:03 EST 2006)
[   28.586554] megaraid: 2.20.5.1 (Release Date: Thu Nov 16 15:32:35 EST 2006)
[   28.588122] megasas: 07.714.04.00-rc1
[   28.589146] mpt3sas version 36.100.00.00 loaded
[   28.591248] 3ware 9000 Storage Controller device driver for Linux v2.26.02.014.
[   28.592865] LSI 3ware SAS/SATA-RAID Controller device driver for Linux v3.26.02.000.
[   28.594526] ppa: Version 2.07 (for Linux 2.4.x)
[   28.598872] imm: Version 2.05 (for Linux 2.4.0)
[   28.601538] RocketRAID 3xxx/4xxx Controller driver v1.10.0
[   28.602832] stex: Promise SuperTrak EX Driver version: 6.02.0000.01
[   28.604545] libcxgbi:libcxgbi_init_module: Chelsio iSCSI driver library libcxgbi v0.9.1-ko (Apr. 2015)
[   28.606371] Chelsio T3 iSCSI Driver cxgb3i v2.0.1-ko (Apr. 2015)
[   28.607710] iscsi: registered transport (cxgb3i)
[   28.608634] Chelsio T4-T6 iSCSI Driver cxgb4i v0.9.5-ko (Apr. 2015)
[   28.610067] iscsi: registered transport (cxgb4i)
[   28.610992] QLogic NetXtreme II iSCSI Driver bnx2i v2.7.10.1 (Jul 16, 2014)
[   28.612488] iscsi: registered transport (bnx2i)
[   28.614550] iscsi: registered transport (qedi)
[   28.623941] iscsi: registered transport (be2iscsi)
[   28.624911] In beiscsi_module_init, tt=00000000bd936c63
[   28.627128] VMware PVSCSI driver - version 1.0.7.0-k
[   28.628309] hv_vmbus: registering driver hv_storvsc
[   28.629300] st: Version 20160209, fixed bufsize 32768, s/g segs 256
[   28.631477] SCSI Media Changer driver v0.25 
[   28.644303] scsi host0: scsi_debug: version 0190 [20200710]
[   28.644303]   dev_size_mb=8, opts=0x0, submit_queues=1, statistics=0
[   28.654444] scsi 0:0:0:0: Direct-Access     Linux    scsi_debug       0190 PQ: 0 ANSI: 7
[   28.658498] sd 0:0:0:0: Attached scsi generic sg0 type 0
[   28.658573] sd 0:0:0:0: Power-on or device reset occurred
[   28.660179] ata_piix 0000:00:01.1: version 2.13
[   28.663314] sd 0:0:0:0: [sda] 16384 512-byte logical blocks: (8.39 MB/8.00 MiB)
[   28.665901] sd 0:0:0:0: [sda] Write Protect is off
[   28.666692] scsi host1: ata_piix
[   28.666924] sd 0:0:0:0: [sda] Mode Sense: 73 00 10 08
[   28.669134] scsi host2: ata_piix
[   28.670502] ata1: PATA max MWDMA2 cmd 0x1f0 ctl 0x3f6 bmdma 0x10c0 irq 14
[   28.670752] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, supports DPO and FUA
[   28.671845] ata2: PATA max MWDMA2 cmd 0x170 ctl 0x376 bmdma 0x10c8 irq 15
[   28.676817] sd 0:0:0:0: [sda] Optimal transfer size 524288 bytes
[   28.684481] Rounding down aligned max_sectors from 4294967295 to 4294967288
[   28.686266] db_root: cannot open: /etc/target
[   28.690660] SSFDC read-only Flash Translation layer
[   28.692031] mtdoops: mtd device (mtddev=name/number) must be supplied
[   28.693317] device id = 2440
[   28.693898] device id = 2480
[   28.694489] device id = 24c0
[   28.695078] device id = 24d0
[   28.695666] device id = 25a1
[   28.696264] device id = 2670
[   28.697234] Ramix PMC551 PCI Mezzanine Ram Driver. (C) 1999,2000 Nortel Networks.
[   28.698697] pmc551: not detected
[   28.702521] sd 0:0:0:0: [sda] Attached SCSI disk
[   28.707276] eql: Equalizer2002: Simon Janes (simon@ncm.com) and David S. Miller (davem@redhat.com)
[   28.719711] MACsec IEEE 802.1AE
[   28.726157] libphy: Fixed MDIO Bus: probed
[   28.740373] tun: Universal TUN/TAP device driver, 1.6
[   28.746969] vcan: Virtual CAN interface driver
[   28.747880] slcan: serial line CAN interface driver
[   28.748845] slcan: 10 dynamic interface channels.
[   28.749789] CAN device driver interface
[   28.750707] usbcore: registered new interface driver usb_8dev
[   28.751975] usbcore: registered new interface driver ems_usb
[   28.753238] usbcore: registered new interface driver esd_usb2
[   28.754485] usbcore: registered new interface driver gs_usb
[   28.755750] usbcore: registered new interface driver kvaser_usb
[   28.757032] usbcore: registered new interface driver peak_usb
[   28.758352] cc770: CAN netdevice driver
[   28.759348] sja1000 CAN netdevice driver
[   28.770364] cnic: QLogic cnicDriver v2.5.22 (July 20, 2015)
[   28.780176] bnxt_en 0000:00:04.0 eth1: Broadcom NetXtreme-E Ethernet Virtual Function found at mem 2000104000, node addr 36:bd:8c:d7:40:33
[   28.782679] bnxt_en 0000:00:04.0: 0.000 Gb/s available PCIe bandwidth (Unknown x8 link)
[   28.788573] e100: Intel(R) PRO/100 Network Driver
[   28.789530] e100: Copyright(c) 1999-2006 Intel Corporation
[   28.790787] e1000: Intel(R) PRO/1000 Network Driver
[   28.791756] e1000: Copyright (c) 1999-2006 Intel Corporation.
[   28.793098] e1000e: Intel(R) PRO/1000 Network Driver
[   28.794085] e1000e: Copyright(c) 1999 - 2015 Intel Corporation.
[   28.795494] igb: Intel(R) Gigabit Ethernet Network Driver
[   28.796558] igb: Copyright (c) 2007-2014 Intel Corporation.
[   28.797829] Intel(R) 2.5G Ethernet Linux Driver
[   28.798734] Copyright(c) 2018 Intel Corporation.
[   28.799824] igbvf: Intel(R) Gigabit Virtual Function Network Driver
[   28.801067] igbvf: Copyright (c) 2009 - 2012 Intel Corporation.
[   28.802382] ixgbe: Intel(R) 10 Gigabit PCI Express Network Driver
[   28.803582] ixgbe: Copyright (c) 1999-2016 Intel Corporation.
[   28.805390] ixgbevf: Intel(R) 10 Gigabit PCI Express Virtual Function Network Driver
[   28.806915] ixgbevf: Copyright (c) 2009 - 2018 Intel Corporation.
[   28.808634] i40e: Intel(R) Ethernet Connection XL710 Network Driver
[   28.809885] i40e: Copyright (c) 2013 - 2019 Intel Corporation.
[   28.811556] ixgb: Intel(R) PRO/10GbE Network Driver
[   28.812535] ixgb: Copyright (c) 1999-2008 Intel Corporation.
[   28.813844] iavf: Intel(R) Ethernet Adaptive Virtual Function Network Driver
[   28.815229] Copyright (c) 2013 - 2018 Intel Corporation.
[   28.816830] Intel(R) Ethernet Switch Host Interface Driver
[   28.817922] Copyright(c) 2013 - 2019 Intel Corporation.
[   28.819485] ice: Intel(R) Ethernet Connection E800 Series Linux Driver
[   28.820768] ice: Copyright (c) 2018, Intel Corporation.
[   28.822368] jme: JMicron JMC2XX ethernet driver version 1.0.8
[   28.823927] sky2: driver version 1.30
[   28.825960] myri10ge: Version 1.5.3-1.534
[   28.827108] vxge: Copyright(c) 2002-2010 Exar Corp.
[   28.828078] vxge: Driver version: 2.5.3.22640-k
[   28.829539] QLogic 1/10 GbE Converged/Intelligent Ethernet Driver v5.3.66
[   28.834195] ata1.01: NODEV after polling detection
[   28.835531] ata2.01: NODEV after polling detection
[   28.836886] ata1.00: ATA-7: QEMU HARDDISK, 2.5+, max UDMA/100
[   28.837854] QLogic/NetXen Network Driver v4.0.82
[   28.838046] ata1.00: 31457280 sectors, multi 16: LBA48 
[   28.839163] QLogic FastLinQ 4xxxx Core Module qed 8.37.0.20
[   28.840611] ata2.00: ATAPI: QEMU DVD-ROM, 2.5+, max UDMA/100
[   28.841239] qede_init: QLogic FastLinQ 4xxxx Ethernet Driver qede 8.37.0.20
[   28.841239] 
[   28.845366] scsi 1:0:0:0: Direct-Access     ATA      QEMU HARDDISK    2.5+ PQ: 0 ANSI: 5
[   28.845708] Solarflare NET driver
[   28.849223] sd 1:0:0:0: Attached scsi generic sg1 type 0
[   28.849927] tehuti: Tehuti Networks(R) Network Driver, 7.29.3
[   28.851442] tehuti: Options: hw_csum 
[   28.852085] scsi 2:0:0:0: CD-ROM            QEMU     QEMU DVD-ROM     2.5+ PQ: 0 ANSI: 5
[   28.852424] tlan: ThunderLAN driver v1.17
[   28.855010] tlan: 0 devices installed, PCI: 0  EISA: 0
[   28.856400] PPP generic driver version 2.4.2
[   28.857878] PPP BSD Compression module registered
[   28.857962] sd 1:0:0:0: [sdb] 31457280 512-byte logical blocks: (16.1 GB/15.0 GiB)
[   28.858830] PPP Deflate Compression module registered
[   28.860439] sd 1:0:0:0: [sdb] Write Protect is off
[   28.861403] PPP MPPE Compression module registered
[   28.862370] sd 1:0:0:0: [sdb] Mode Sense: 00 3a 00 00
[   28.863298] NET: Registered protocol family 24
[   28.864442] sd 1:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[   28.865257] PPTP driver version 0.8.5
[   28.868414] SLIP: version 0.8.4-NET3.019-NEWTTY (dynamic channels, max=256).
[   28.869593]  sdb: sdb1 sdb2
[   28.870032] CSLIP: code copyright 1989 Regents of the University of California.
[   28.872041] SLIP linefill/keepalive option.
[   28.872895] hdlc: HDLC support module revision 1.22
[   28.874631] usbcore: registered new interface driver ath9k_htc
[   28.875086] sd 1:0:0:0: [sdb] Attached SCSI disk
[   28.875947] usbcore: registered new interface driver carl9170
[   28.878305] Atmel at76x USB Wireless LAN Driver 0.17 loading
[   28.879600] usbcore: registered new interface driver at76c50x-usb
[   28.881077] Broadcom 43xx driver loaded [ Features: PNLS ]
[   28.882570] Broadcom 43xx-legacy driver loaded [ Features: PLID ]
[   28.882622] sr 2:0:0:0: [sr0] scsi3-mmc drive: 4x/4x cd/rw xa/form2 tray
[   28.884204] usbcore: registered new interface driver brcmfmac
[   28.885136] cdrom: Uniform CD-ROM driver Revision: 3.20
[   28.886608] airo(): Probing for PCI adapters
[   28.888456] airo(): Finished probing for PCI adapters
[   28.889562] ipw2100: Intel(R) PRO/Wireless 2100 Network Driver, git-1.2.2
[   28.890895] ipw2100: Copyright(c) 2003-2006 Intel Corporation
[   28.892252] ipw2200: Intel(R) PRO/Wireless 2200/2915 Network Driver, 1.2.2kdmprq
[   28.893695] ipw2200: Copyright(c) 2003-2006 Intel Corporation
[   28.895008] libipw: 802.11 data/management/control stack, git-1.1.13
[   28.896264] libipw: Copyright (C) 2004-2005 Intel Corporation <jketreno@linux.intel.com>
[   28.897836] iwl4965: Intel(R) Wireless WiFi 4965 driver for Linux, in-tree:d
[   28.899212] iwl4965: Copyright(c) 2003-2011 Intel Corporation
[   28.900525] iwl3945: Intel(R) PRO/Wireless 3945ABG/BG Network Connection driver for Linux, in-tree:ds
[   28.902315] iwl3945: Copyright(c) 2003-2011 Intel Corporation
[   28.903444] iwl3945: hw_scan is disabled
[   28.904411] Intel(R) Wireless WiFi driver for Linux
[   28.906390] orinoco 0.15 (David Gibson <hermes@gibson.dropbear.id.au>, Pavel Roskin <proski@gnu.org>, et al)
[   28.908408] orinoco_plx 0.15 (Pavel Roskin <proski@gnu.org>, David Gibson <hermes@gibson.dropbear.id.au>, Daniel Barlow <dan@telent.net>)
[   28.910945] orinoco_tmd 0.15 (Joerg Dorchain <joerg@dorchain.net>)
[   28.912309] orinoco_nortel 0.15 (Tobias Hoffmann & Christoph Jungegger <disdos@traum404.de>)
[   28.914326] usbcore: registered new interface driver p54usb
[   28.915746] usbcore: registered new interface driver usb8xxx
[   28.916938] libertas_sdio: Libertas SDIO driver
[   28.917841] libertas_sdio: Copyright Pierre Ossman
[   28.919364] usbcore: registered new interface driver lbtf_usb
[   28.920932] usbcore: registered new interface driver mwifiex_usb
[   28.923128] usbcore: registered new interface driver rt2500usb
[   28.924433] usbcore: registered new interface driver rt73usb
[   28.925711] usbcore: registered new interface driver rt2800usb
[   28.927135] usbcore: registered new interface driver rtl8187
[   28.935550] usbcore: registered new interface driver rtl8192cu
[   28.938442] usbcore: registered new interface driver zd1211rw
[   28.939753] usbcore: registered new interface driver zd1201
[   28.941379] usbcore: registered new interface driver rndis_wlan
[   28.942690] mac80211_hwsim: initializing netlink
[   28.944417] ieee80211 phy0: Selected rate control algorithm 'minstrel_ht'
[   28.951370] ieee80211 phy1: Selected rate control algorithm 'minstrel_ht'
[   28.952958] sr 2:0:0:0: Attached scsi CD-ROM sr0
[   28.954896] sr 2:0:0:0: Attached scsi generic sg2 type 5
[   28.959292] fakelb driver is marked as deprecated, please use mac802154_hwsim!
[   28.965035] ieee802154fakelb ieee802154fakelb: added 2 fake ieee802154 hardware devices
[   28.966754] VMware vmxnet3 virtual NIC driver - version 1.5.0.0-k-NAPI
[   28.968377] usbcore: registered new interface driver catc
[   28.969564] usbcore: registered new interface driver kaweth
[   28.970663] pegasus: v0.9.3 (2013/04/25), Pegasus/Pegasus II USB Ethernet driver
[   28.972252] usbcore: registered new interface driver pegasus
[   28.973508] usbcore: registered new interface driver rtl8150
[   28.974789] usbcore: registered new interface driver r8152
[   28.975872] hso: drivers/net/usb/hso.c: Option Wireless
[   28.977178] usbcore: registered new interface driver hso
[   28.978398] usbcore: registered new interface driver asix
[   28.979586] usbcore: registered new interface driver ax88179_178a
[   28.980917] usbcore: registered new interface driver cdc_ether
[   28.982202] usbcore: registered new interface driver cdc_eem
[   28.983453] usbcore: registered new interface driver dm9601
[   28.984667] usbcore: registered new interface driver sr9700
[   28.993244] usbcore: registered new interface driver CoreChips
[   28.994575] usbcore: registered new interface driver smsc75xx
[   28.995865] usbcore: registered new interface driver smsc95xx
[   28.997130] usbcore: registered new interface driver gl620a
[   28.998373] usbcore: registered new interface driver net1080
[   28.999622] usbcore: registered new interface driver plusb
[   29.000829] usbcore: registered new interface driver rndis_host
[   29.002129] usbcore: registered new interface driver cdc_subset
[   29.003427] usbcore: registered new interface driver zaurus
[   29.004665] usbcore: registered new interface driver MOSCHIP usb-ethernet driver
[   29.006247] usbcore: registered new interface driver int51x1
[   29.007487] usbcore: registered new interface driver cdc_phonet
[   29.008791] usbcore: registered new interface driver kalmia
[   29.010022] usbcore: registered new interface driver ipheth
[   29.011265] usbcore: registered new interface driver sierra_net
[   29.012558] usbcore: registered new interface driver cx82310_eth
[   29.013912] usbcore: registered new interface driver cdc_ncm
[   29.015168] usbcore: registered new interface driver huawei_cdc_ncm
[   29.016549] usbcore: registered new interface driver lg-vl600
[   29.017818] usbcore: registered new interface driver qmi_wwan
[   29.019103] usbcore: registered new interface driver cdc_mbim
[   29.020251] hv_vmbus: registering driver hv_netvsc
[   29.021218] Fusion MPT base driver 3.04.20
[   29.022026] Copyright (c) 1999-2008 LSI Corporation
[   29.023017] Fusion MPT SPI Host driver 3.04.20
[   29.024084] Fusion MPT FC Host driver 3.04.20
[   29.025169] Fusion MPT SAS Host driver 3.04.20
[   29.026246] Fusion MPT misc device (ioctl) driver 3.04.20
[   29.027630] mptctl: Registered with Fusion MPT base driver
[   29.028729] mptctl: /dev/mptctl @ (major,minor=10,220)
[   29.029758] Fusion MPT LAN driver 3.04.20
[   29.032653] hv_vmbus: registering driver uio_hv_generic
[   29.034174] VFIO - User Level meta-driver version: 0.3
[   29.036238] parport0: cannot grant exclusive access for device ks0108
[   29.037515] ks0108: ERROR: parport didn't register new device
[   29.038647] cfag12864b: ERROR: ks0108 is not initialized
[   29.039702] cfag12864bfb: ERROR: cfag12864b is not initialized
[   29.043773] aoe: AoE v85 initialised.
[   29.045195] ehci_hcd: USB 2.0 'Enhanced' Host Controller (EHCI) Driver
[   29.046510] ehci-pci: EHCI PCI platform driver
[   29.047616] ohci_hcd: USB 1.1 'Open' Host Controller (OHCI) Driver
[   29.048978] ohci-pci: OHCI PCI platform driver
[   29.050024] uhci_hcd: USB Universal Host Controller Interface driver
[   29.052047] driver u132_hcd
[   29.053197] usbcore: registered new interface driver cdc_acm
[   29.054323] cdc_acm: USB Abstract Control Model driver for USB modems and ISDN adapters
[   29.056052] usbcore: registered new interface driver usblp
[   29.057293] usbcore: registered new interface driver cdc_wdm
[   29.058562] usbcore: registered new interface driver usbtmc
[   29.060110] usbcore: registered new interface driver uas
[   29.061343] usbcore: registered new interface driver usb-storage
[   29.062655] usbcore: registered new interface driver ums-alauda
[   29.063931] usbcore: registered new interface driver ums-cypress
[   29.065243] usbcore: registered new interface driver ums-datafab
[   29.066536] usbcore: registered new interface driver ums_eneub6250
[   29.067887] usbcore: registered new interface driver ums-freecom
[   29.069174] usbcore: registered new interface driver ums-isd200
[   29.070464] usbcore: registered new interface driver ums-jumpshot
[   29.071769] usbcore: registered new interface driver ums-karma
[   29.073066] usbcore: registered new interface driver ums-onetouch
[   29.074420] usbcore: registered new interface driver ums-realtek
[   29.075737] usbcore: registered new interface driver ums-sddr09
[   29.077018] usbcore: registered new interface driver ums-sddr55
[   29.078313] usbcore: registered new interface driver ums-usbat
[   29.079632] usbcore: registered new interface driver mdc800
[   29.080736] mdc800: v0.7.5 (30/10/2000):USB Driver for Mustek MDC800 Digital Camera
[   29.082368] usbcore: registered new interface driver microtekX6
[   29.083867] usbcore: registered new interface driver usbserial_generic
[   29.085237] usbserial: USB Serial support registered for generic
[   29.092837] usbcore: registered new interface driver aircable
[   29.094071] usbserial: USB Serial support registered for aircable
[   29.095435] usbcore: registered new interface driver ark3116
[   29.096625] usbserial: USB Serial support registered for ark3116
[   29.097941] usbcore: registered new interface driver belkin_sa
[   29.099185] usbserial: USB Serial support registered for Belkin / Peracom / GoHubs USB Serial Adapter
[   29.101142] usbcore: registered new interface driver ch341
[   29.102302] usbserial: USB Serial support registered for ch341-uart
[   29.103668] usbcore: registered new interface driver cp210x
[   29.104838] usbserial: USB Serial support registered for cp210x
[   29.106148] usbcore: registered new interface driver cyberjack
[   29.107372] usbserial: USB Serial support registered for Reiner SCT Cyberjack USB card reader
[   29.109175] usbcore: registered new interface driver cypress_m8
[   29.110419] usbserial: USB Serial support registered for DeLorme Earthmate USB
[   29.111917] usbserial: USB Serial support registered for HID->COM RS232 Adapter
[   29.113415] usbserial: USB Serial support registered for Nokia CA-42 V2 Adapter
[   29.115000] usbcore: registered new interface driver usb_debug
[   29.116227] usbserial: USB Serial support registered for debug
[   29.117460] usbserial: USB Serial support registered for xhci_dbc
[   29.118775] usbcore: registered new interface driver digi_acceleport
[   29.120112] usbserial: USB Serial support registered for Digi 2 port USB adapter
[   29.121635] usbserial: USB Serial support registered for Digi 4 port USB adapter
[   29.123222] usbcore: registered new interface driver io_edgeport
[   29.124474] usbserial: USB Serial support registered for Edgeport 2 port adapter
[   29.126001] usbserial: USB Serial support registered for Edgeport 4 port adapter
[   29.127518] usbserial: USB Serial support registered for Edgeport 8 port adapter
[   29.129045] usbserial: USB Serial support registered for EPiC device
[   29.130415] usbcore: registered new interface driver io_ti
[   29.131575] usbserial: USB Serial support registered for Edgeport TI 1 port adapter
[   29.133135] usbserial: USB Serial support registered for Edgeport TI 2 port adapter
[   29.134773] usbcore: registered new interface driver empeg
[   29.135937] usbserial: USB Serial support registered for empeg
[   29.137235] usbcore: registered new interface driver f81534a_ctrl
[   29.138555] usbcore: registered new interface driver f81232
[   29.139735] usbserial: USB Serial support registered for f81232
[   29.140995] usbserial: USB Serial support registered for f81534a
[   29.142316] usbcore: registered new interface driver ftdi_sio
[   29.143523] usbserial: USB Serial support registered for FTDI USB Serial Device
[   29.145095] usbcore: registered new interface driver garmin_gps
[   29.146350] usbserial: USB Serial support registered for Garmin GPS usb/tty
[   29.147865] usbcore: registered new interface driver ipaq
[   29.149004] usbserial: USB Serial support registered for PocketPC PDA
[   29.150411] usbcore: registered new interface driver ipw
[   29.151535] usbserial: USB Serial support registered for IPWireless converter
[   29.153079] usbcore: registered new interface driver ir_usb
[   29.154266] usbserial: USB Serial support registered for IR Dongle
[   29.155615] usbcore: registered new interface driver iuu_phoenix
[   29.156886] usbserial: USB Serial support registered for iuu_phoenix
[   29.158281] usbcore: registered new interface driver keyspan
[   29.159469] usbserial: USB Serial support registered for Keyspan - (without firmware)
[   29.161111] usbserial: USB Serial support registered for Keyspan 1 port adapter
[   29.162617] usbserial: USB Serial support registered for Keyspan 2 port adapter
[   29.164132] usbserial: USB Serial support registered for Keyspan 4 port adapter
[   29.165694] usbcore: registered new interface driver keyspan_pda
[   29.166959] usbserial: USB Serial support registered for Keyspan PDA
[   29.168291] usbserial: USB Serial support registered for Keyspan PDA - (prerenumeration)
[   29.170009] usbcore: registered new interface driver kl5kusb105
[   29.171250] usbserial: USB Serial support registered for KL5KUSB105D / PalmConnect
[   29.172885] usbcore: registered new interface driver kobil_sct
[   29.174114] usbserial: USB Serial support registered for KOBIL USB smart card terminal
[   29.175802] usbcore: registered new interface driver mct_u232
[   29.177012] usbserial: USB Serial support registered for MCT U232
[   29.178348] usbcore: registered new interface driver metro_usb
[   29.179567] usbserial: USB Serial support registered for Metrologic USB to Serial
[   29.181169] usbcore: registered new interface driver mos7720
[   29.182392] usbserial: USB Serial support registered for Moschip 2 port adapter
[   29.183960] usbcore: registered new interface driver mos7840
[   29.185164] usbserial: USB Serial support registered for Moschip 7840/7820 USB Serial Driver
[   29.186946] usbcore: registered new interface driver mxuport
[   29.188135] usbserial: USB Serial support registered for MOXA UPort
[   29.189517] usbcore: registered new interface driver navman
[   29.190691] usbserial: USB Serial support registered for navman
[   29.191990] usbcore: registered new interface driver omninet
[   29.193174] usbserial: USB Serial support registered for ZyXEL - omni.net lcd plus usb
[   29.194866] usbcore: registered new interface driver opticon
[   29.196093] usbserial: USB Serial support registered for opticon
[   29.197414] usbcore: registered new interface driver option
[   29.198583] usbserial: USB Serial support registered for GSM modem (1-port)
[   29.200094] usbcore: registered new interface driver oti6858
[   29.201299] usbserial: USB Serial support registered for oti6858
[   29.202630] usbcore: registered new interface driver pl2303
[   29.203799] usbserial: USB Serial support registered for pl2303
[   29.205107] usbcore: registered new interface driver qcaux
[   29.213363] usbserial: USB Serial support registered for qcaux
[   29.214679] usbcore: registered new interface driver qcserial
[   29.215893] usbserial: USB Serial support registered for Qualcomm USB modem
[   29.217395] usbcore: registered new interface driver quatech2
[   29.218603] usbserial: USB Serial support registered for Quatech 2nd gen USB to Serial Driver
[   29.220405] usbcore: registered new interface driver safe_serial
[   29.221671] usbserial: USB Serial support registered for safe_serial
[   29.223052] usbcore: registered new interface driver sierra
[   29.224249] usbserial: USB Serial support registered for Sierra USB modem
[   29.225715] usbcore: registered new interface driver usb_serial_simple
[   29.227075] usbserial: USB Serial support registered for carelink
[   29.228368] usbserial: USB Serial support registered for zio
[   29.229571] usbserial: USB Serial support registered for funsoft
[   29.230836] usbserial: USB Serial support registered for flashloader
[   29.232171] usbserial: USB Serial support registered for google
[   29.233431] usbserial: USB Serial support registered for libtransistor
[   29.234781] usbserial: USB Serial support registered for vivopay
[   29.236064] usbserial: USB Serial support registered for moto_modem
[   29.237368] usbserial: USB Serial support registered for motorola_tetra
[   29.238750] usbserial: USB Serial support registered for novatel_gps
[   29.240075] usbserial: USB Serial support registered for hp4x
[   29.241287] usbserial: USB Serial support registered for suunto
[   29.242520] usbserial: USB Serial support registered for siemens_mpi
[   29.243924] usbcore: registered new interface driver spcp8x5
[   29.245123] usbserial: USB Serial support registered for SPCP8x5
[   29.246444] usbcore: registered new interface driver ssu100
[   29.247615] usbserial: USB Serial support registered for Quatech SSU-100 USB to Serial Driver
[   29.249422] usbcore: registered new interface driver symbolserial
[   29.250700] usbserial: USB Serial support registered for symbol
[   29.252019] usbcore: registered new interface driver ti_usb_3410_5052
[   29.253355] usbserial: USB Serial support registered for TI USB 3410 1 port adapter
[   29.254929] usbserial: USB Serial support registered for TI USB 5052 2 port adapter
[   29.256564] usbcore: registered new interface driver visor
[   29.257736] usbserial: USB Serial support registered for Handspring Visor / Palm OS
[   29.259308] usbserial: USB Serial support registered for Sony Clie 5.0
[   29.260679] usbserial: USB Serial support registered for Sony Clie 3.5
[   29.262090] usbcore: registered new interface driver wishbone_serial
[   29.263426] usbserial: USB Serial support registered for wishbone_serial
[   29.264876] usbcore: registered new interface driver whiteheat
[   29.266131] usbserial: USB Serial support registered for Connect Tech - WhiteHEAT - (prerenumeration)
[   29.267999] usbserial: USB Serial support registered for Connect Tech - WhiteHEAT
[   29.269599] usbcore: registered new interface driver xsens_mt
[   29.270819] usbserial: USB Serial support registered for xsens_mt
[   29.272189] usbcore: registered new interface driver adutux
[   29.273421] usbcore: registered new interface driver appledisplay
[   29.274772] usbcore: registered new interface driver emi26 - firmware loader
[   29.276292] usbcore: registered new interface driver emi62 - firmware loader
[   29.277679] ftdi_elan: driver ftdi-elan
[   29.278588] usbcore: registered new interface driver ftdi-elan
[   29.279873] usbcore: registered new interface driver idmouse
[   29.281143] usbcore: registered new interface driver iowarrior
[   29.282426] usbcore: registered new interface driver isight_firmware
[   29.283832] usbcore: registered new interface driver usblcd
[   29.285075] usbcore: registered new interface driver ldusb
[   29.286308] usbcore: registered new interface driver legousbtower
[   29.287646] usbcore: registered new interface driver uss720
[   29.288753] uss720: USB Parport Cable driver for Cables using the Lucent Technologies USS720 Chip
[   29.290486] uss720: NOTE: this is a special purpose driver to allow nonstandard
[   29.291912] uss720: protocols (eg. bitbang) over USS720 usb to parallel cables
[   29.293332] uss720: If you just want to connect to a printer, use usblp instead
[   29.294904] usbcore: registered new interface driver usbsevseg
[   29.296377] usbcore: registered new interface driver sisusb
[   29.297621] usbcore: registered new interface driver cxacru
[   29.298898] usbcore: registered new interface driver speedtch
[   29.300234] usbcore: registered new interface driver ueagle-atm
[   29.301410] xusbatm: malformed module parameters
[   29.303191] i8042: PNP: PS/2 Controller [PNP0303:KBD,PNP0f13:MOU] at 0x60,0x64 irq 1,12
[   29.306264] serio: i8042 KBD port at 0x60,0x64 irq 1
[   29.307289] serio: i8042 AUX port at 0x60,0x64 irq 12
[   29.309532] hv_vmbus: registering driver hyperv_keyboard
[   29.311098] mousedev: PS/2 mouse device common for all mice
[   29.313740] input: AT Translated Set 2 keyboard as /devices/platform/i8042/serio0/input/input1
[   29.315779] usbcore: registered new interface driver appletouch
[   29.317182] usbcore: registered new interface driver bcm5974
[   29.319067] usbcore: registered new interface driver synaptics_usb
[   29.320543] usbcore: registered new interface driver usb_acecad
[   29.321858] usbcore: registered new interface driver aiptek
[   29.323096] usbcore: registered new interface driver hanwang
[   29.324459] usbcore: registered new interface driver kbtab
[   29.326391] input: VirtualPS/2 VMware VMMouse as /devices/platform/i8042/serio1/input/input4
[   29.330570] input: VirtualPS/2 VMware VMMouse as /devices/platform/i8042/serio1/input/input3
[   29.333957] mk712: device not present
[   29.334899] usbcore: registered new interface driver usbtouchscreen
[   29.337363] apanel: Fujitsu BIOS signature 'FJKEYINF' not found...
[   29.338758] usbcore: registered new interface driver ati_remote2
[   29.340159] cm109: Keymap for Komunikate KIP1000 phone loaded
[   29.341455] usbcore: registered new interface driver cm109
[   29.342550] cm109: CM109 phone driver: 20080805 (C) Alfred E. Heggestad
[   29.344017] usbcore: registered new interface driver keyspan_remote
[   29.345888] input: PC Speaker as /devices/platform/pcspkr/input/input5
[   29.347912] usbcore: registered new interface driver powermate
[   29.349580] usbcore: registered new interface driver yealink
[   29.351136] rtc_cmos 00:00: RTC can wake from S4
[   29.354598] rtc_cmos 00:00: registered as rtc0
[   29.355732] rtc_cmos 00:00: setting system clock to 2021-02-23T20:55:50 UTC (1614113750)
[   29.357577] rtc_cmos 00:00: alarms up to one day, y3k, 114 bytes nvram, hpet irqs
[   29.362133] i2c /dev entries driver
[   29.364220] piix4_smbus 0000:00:01.3: SMBus Host Controller at 0xb100, revision 0
[   29.369373] eeprom 0-0050: eeprom driver is deprecated, please use at24 instead
[   29.379666] eeprom 0-0051: eeprom driver is deprecated, please use at24 instead
[   29.382694] eeprom 0-0052: eeprom driver is deprecated, please use at24 instead
[   29.385696] eeprom 0-0053: eeprom driver is deprecated, please use at24 instead
[   29.388661] eeprom 0-0054: eeprom driver is deprecated, please use at24 instead
[   29.391634] eeprom 0-0055: eeprom driver is deprecated, please use at24 instead
[   29.394584] eeprom 0-0056: eeprom driver is deprecated, please use at24 instead
[   29.402017] eeprom 0-0057: eeprom driver is deprecated, please use at24 instead
[   29.405225] usbcore: registered new interface driver i2c-diolan-u2c
[   29.406474] i2c-parport: adapter type unspecified
[   29.407549] usbcore: registered new interface driver RobotFuzz Open Source InterFace, OSIF
[   29.409463] usbcore: registered new interface driver i2c-tiny-usb
[   29.421905] IR NEC protocol handler initialized
[   29.422816] IR RC5(x/sz) protocol handler initialized
[   29.423814] IR RC6 protocol handler initialized
[   29.424709] IR JVC protocol handler initialized
[   29.425605] IR Sony protocol handler initialized
[   29.426523] IR SANYO protocol handler initialized
[   29.427452] IR Sharp protocol handler initialized
[   29.428384] IR MCE Keyboard/mouse protocol handler initialized
[   29.429526] IR XMP protocol handler initialized
[   29.430620] usbcore: registered new interface driver ati_remote
[   29.431932] usbcore: registered new interface driver imon
[   29.433232] usbcore: registered new interface driver mceusb
[   29.434732] usbcore: registered new interface driver redrat3
[   29.435990] usbcore: registered new interface driver streamzap
[   29.437285] Registered IR keymap rc-empty
[   29.438325] rc rc0: rc-core loopback device as /devices/virtual/rc/rc0
[   29.440015] input: rc-core loopback device as /devices/virtual/rc/rc0/input6
[   29.442315] usbcore: registered new interface driver iguanair
[   29.443629] usbcore: registered new interface driver ttusbir
[   29.444820] b2c2-flexcop: B2C2 FlexcopII/II(b)/III digital TV receiver chip loaded successfully
[   29.446594] saa7146: register extension 'budget dvb'
[   29.447827] saa7146: register extension 'budget_av'
[   29.449010] saa7146: register extension 'budget_ci dvb'
[   29.450227] saa7146: register extension 'budget_patch dvb'
[   29.451520] saa7146: register extension 'av7110'
[   29.453684] ngene: nGene PCIE bridge driver, Copyright (C) 2005-2007 Micronas
[   29.455282] ddbridge: Digital Devices PCIE bridge driver 0.9.33-integrated, Copyright (C) 2010-17 Digital Devices GmbH
[   29.457642] ivtv: Start initialization, version 1.4.3
[   29.458813] ivtv: End initialization
[   29.459543] ivtvfb: no cards found
[   29.460231] cx18:  Start initialization, version 1.5.1
[   29.461424] cx18:  End initialization
[   29.462157] cx18-alsa: module loading...
[   29.462933] cx23885: cx23885 driver version 0.0.4 loaded
[   29.464642] cx88_blackbird: cx2388x blackbird driver version 1.0.0 loaded
[   29.465975] cx8802: registering cx8802 driver, type: blackbird access: shared
[   29.467376] cx88_dvb: cx2388x dvb driver version 1.0.0 loaded
[   29.468505] cx8802: registering cx8802 driver, type: dvb access: shared
[   29.469800] bttv: driver version 0.9.19 loaded
[   29.470683] bttv: using 8 buffers with 2080k (520 pages) each for capture
[   29.472015] bttv: Host bridge needs ETBF enabled
[   29.473239] bt878: AUDIO driver version 0.0.0 loaded
[   29.474513] saa7134: saa7130/34: v4l2 driver version 0, 2, 17 loaded
[   29.476288] saa7164 driver loaded
[   29.477136] usbcore: registered new interface driver ttusb-dec
[   29.478461] usbcore: registered new interface driver ttusb
[   29.479714] usbcore: registered new interface driver dvb_usb_vp7045
[   29.481126] usbcore: registered new interface driver dvb_usb_vp702x
[   29.482507] usbcore: registered new interface driver dvb_usb_gp8psk
[   29.483925] usbcore: registered new interface driver dvb_usb_dtt200u
[   29.485319] usbcore: registered new interface driver dvb_usb_a800
[   29.486696] usbcore: registered new interface driver dvb_usb_dibusb_mb
[   29.488122] usbcore: registered new interface driver dvb_usb_dibusb_mc
[   29.489591] usbcore: registered new interface driver dvb_usb_nova_t_usb2
[   29.491045] usbcore: registered new interface driver dvb_usb_umt_010
[   29.492488] usbcore: registered new interface driver dvb_usb_m920x
[   29.493843] usbcore: registered new interface driver dvb_usb_digitv
[   29.495279] usbcore: registered new interface driver dvb_usb_cxusb
[   29.496650] usbcore: registered new interface driver dvb_usb_ttusb2
[   29.498047] usbcore: registered new interface driver dvb_usb_dib0700
[   29.499454] usbcore: registered new interface driver opera1
[   29.500720] usbcore: registered new interface driver dvb_usb_af9005
[   29.502136] usbcore: registered new interface driver pctv452e
[   29.503447] usbcore: registered new interface driver dw2102
[   29.504737] usbcore: registered new interface driver dvb_usb_dtv5100
[   29.506176] usbcore: registered new interface driver cinergyT2
[   29.507487] usbcore: registered new interface driver dvb_usb_az6027
[   29.508877] usbcore: registered new interface driver dvb_usb_technisat_usb2
[   29.510465] usbcore: registered new interface driver dvb_usb_af9015
[   29.511899] usbcore: registered new interface driver dvb_usb_af9035
[   29.513349] usbcore: registered new interface driver dvb_usb_anysee
[   29.514803] usbcore: registered new interface driver dvb_usb_au6610
[   29.516176] usbcore: registered new interface driver dvb_usb_az6007
[   29.517579] usbcore: registered new interface driver dvb_usb_ce6230
[   29.518970] usbcore: registered new interface driver dvb_usb_ec168
[   29.520330] usbcore: registered new interface driver dvb_usb_lmedm04
[   29.521719] usbcore: registered new interface driver dvb_usb_gl861
[   29.523089] usbcore: registered new interface driver dvb_usb_mxl111sf
[   29.524516] usbcore: registered new interface driver dvb_usb_rtl28xxu
[   29.525952] usbcore: registered new interface driver dvb_usb_dvbsky
[   29.534104] usbcore: registered new interface driver smsusb
[   29.535412] usbcore: registered new interface driver b2c2_flexcop_usb
[   29.536820] usbcore: registered new interface driver zr364xx
[   29.538145] usbcore: registered new interface driver stkwebcam
[   29.539460] usbcore: registered new interface driver s2255
[   29.540739] usbcore: registered new interface driver uvcvideo
[   29.541873] USB Video Class driver (1.1.1)
[   29.542688] gspca_main: v2.14.0 registered
[   29.543628] usbcore: registered new interface driver benq
[   29.544851] usbcore: registered new interface driver conex
[   29.546064] usbcore: registered new interface driver cpia1
[   29.547289] usbcore: registered new interface driver dtcs033
[   29.548547] usbcore: registered new interface driver etoms
[   29.549793] usbcore: registered new interface driver finepix
[   29.551045] usbcore: registered new interface driver jeilinj
[   29.552313] usbcore: registered new interface driver jl2005bcd
[   29.553680] usbcore: registered new interface driver kinect
[   29.554929] usbcore: registered new interface driver konica
[   29.556160] usbcore: registered new interface driver mars
[   29.557427] usbcore: registered new interface driver mr97310a
[   29.558730] usbcore: registered new interface driver nw80x
[   29.559994] usbcore: registered new interface driver ov519
[   29.561204] usbcore: registered new interface driver ov534
[   29.562425] usbcore: registered new interface driver ov534_9
[   29.563730] usbcore: registered new interface driver pac207
[   29.564964] usbcore: registered new interface driver gspca_pac7302
[   29.566312] usbcore: registered new interface driver pac7311
[   29.567574] usbcore: registered new interface driver se401
[   29.568786] usbcore: registered new interface driver sn9c2028
[   29.570075] usbcore: registered new interface driver gspca_sn9c20x
[   29.571413] usbcore: registered new interface driver sonixb
[   29.572651] usbcore: registered new interface driver sonixj
[   29.573889] usbcore: registered new interface driver spca500
[   29.575147] usbcore: registered new interface driver spca501
[   29.576399] usbcore: registered new interface driver spca505
[   29.577774] usbcore: registered new interface driver spca506
[   29.579091] usbcore: registered new interface driver spca508
[   29.580426] usbcore: registered new interface driver spca561
[   29.581690] usbcore: registered new interface driver spca1528
[   29.582974] usbcore: registered new interface driver sq905
[   29.584202] usbcore: registered new interface driver sq905c
[   29.585444] usbcore: registered new interface driver sq930x
[   29.586670] usbcore: registered new interface driver sunplus
[   29.587933] usbcore: registered new interface driver stk014
[   29.589160] usbcore: registered new interface driver stk1135
[   29.590417] usbcore: registered new interface driver stv0680
[   29.591688] usbcore: registered new interface driver t613
[   29.592932] usbcore: registered new interface driver gspca_topro
[   29.594253] usbcore: registered new interface driver tv8532
[   29.595503] usbcore: registered new interface driver vc032x
[   29.596727] usbcore: registered new interface driver vicam
[   29.598009] usbcore: registered new interface driver xirlink-cit
[   29.599364] usbcore: registered new interface driver gspca_zc3xx
[   29.600735] usbcore: registered new interface driver ALi m5602
[   29.602040] usbcore: registered new interface driver STV06xx
[   29.603344] usbcore: registered new interface driver gspca_gl860
[   29.604671] usbcore: registered new interface driver Philips webcam
[   29.605902] au0828: au0828 driver loaded
[   29.606880] usbcore: registered new interface driver au0828
[   29.608139] usbcore: registered new interface driver hdpvr
[   29.609773] usbcore: registered new interface driver pvrusb2
[   29.610900] pvrusb2: V4L in-tree version:Hauppauge WinTV-PVR-USB2 MPEG2 Encoder/Tuner
[   29.612425] pvrusb2: Debug mask is 31 (0x1f)
[   29.613481] usbcore: registered new interface driver stk1160
[   29.614745] usbcore: registered new interface driver cx231xx
[   29.616099] usbcore: registered new interface driver tm6000
[   29.617362] usbcore: registered new interface driver em28xx
[   29.618470] em28xx: Registered (Em28xx Audio Extension) extension
[   29.619666] em28xx: Registered (Em28xx dvb Extension) extension
[   29.620829] em28xx: Registered (Em28xx Input Extension) extension
[   29.622025] smssdio: Siano SMS1xxx SDIO driver
[   29.622905] smssdio: Copyright Pierre Ossman
[   29.623937] pps_ldisc: PPS line discipline registered
[   29.624934] pps_parport: parallel port PPS client
[   29.626094] parport0: cannot grant exclusive access for device pps_parport
[   29.627449] pps_parport: couldn't register with parport0
[   29.629107] Driver for 1-wire Dallas network protocol.
[   29.720898] applesmc: supported laptop not found!
[   29.721856] applesmc: driver init failed (ret=-19)!
[   29.898929] pc87360: PC8736x not detected, module not inserted
[   29.935838] intel_powerclamp: CPU does not support MWAIT
[   29.937355] usbcore: registered new interface driver pcwd_usb
[   29.938495] acquirewdt: WDT driver for Acquire single board computer initialising
[   29.940306] acquirewdt: I/O address 0x0043 already in use
[   29.941387] acquirewdt: probe of acquirewdt failed with error -5
[   29.942798] advantechwdt: WDT driver for Advantech single board computer initialising
[   29.945026] advantechwdt: initialized. timeout=60 sec (nowayout=0)
[   29.946358] alim7101_wdt: Steve Hill <steve@navaho.co.uk>
[   29.947424] alim7101_wdt: ALi M7101 PMU not present - WDT not set
[   29.948761] genirq: Flags mismatch irq 10. 00000000 (eurwdt) vs. 00000080 (virtio0)
[   29.949629] CPU: 2 PID: 1 Comm: swapper/0 Not tainted 5.11.0-f9593a0 #9
[   29.949629] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 0.0.0 02/06/2015
[   29.949629] Call Trace:
[   29.949629]  dump_stack+0xdb/0x120
[   29.949629]  __setup_irq.cold.57+0xfc/0x21f
[   29.949629]  request_threaded_irq+0x29d/0x3c0
[   29.949629]  ? fitpc2_wdt_init+0x1a2/0x1a2
[   29.949629]  eurwdt_init+0x33/0x1d4
[   29.949629]  do_one_initcall+0xc4/0x3e0
[   29.949629]  ? perf_trace_initcall_level+0x3e0/0x3e0
[   29.949629]  ? unpoison_range+0x14/0x40
[   29.949629]  ? ____kasan_kmalloc.constprop.5+0x8f/0xc0
[   29.949629]  ? kernel_init_freeable+0x420/0x652
[   29.949629]  ? __kasan_kmalloc+0x9/0x10
[   29.949629]  ? __sanitizer_cov_trace_pc+0x21/0x50
[   29.949629]  kernel_init_freeable+0x596/0x652
[   29.949629]  ? console_on_rootfs+0x7d/0x7d
[   29.949629]  ? __sanitizer_cov_trace_pc+0x21/0x50
[   29.949629]  ? rest_init+0xf0/0xf0
[   29.949629]  kernel_init+0x16/0x1d0
[   29.949629]  ? rest_init+0xf0/0xf0
[   29.949629]  ret_from_fork+0x22/0x30
[   29.969171] eurotechwdt: IRQ 10 is not free
[   29.969998] ib700wdt: WDT driver for IB700 single board computer initialising
[   29.971752] ib700wdt: START method I/O 443 is not available
[   29.972871] ib700wdt: probe of ib700wdt failed with error -5
[   29.974216] wafer5823wdt: WDT driver for Wafer 5823 single board computer initialising
[   29.975812] wafer5823wdt: I/O address 0x0443 already in use
[   29.977225] iTCO_vendor_support: vendor-support=0
[   29.978349] it87_wdt: no device
[   29.979195] sc1200wdt: build 20020303
[   29.980028] sc1200wdt: io parameter must be specified
[   29.981094] pc87413_wdt: Version 1.1 at io 0x2E
[   29.981993] pc87413_wdt: cannot register miscdev on minor=130 (err=-16)
[   29.983292] nv_tco: NV TCO WatchDog Timer Driver v0.01
[   29.984753] sbc60xxwdt: I/O address 0x0443 already in use
[   29.985841] cpu5wdt: misc_register failed
[   29.986727] smsc37b787_wdt: SMsC 37B787 watchdog component driver 1.1 initialising...
[   29.989360] smsc37b787_wdt: Unable to register miscdev on minor 130
[   29.990877] w83877f_wdt: I/O address 0x0443 already in use
[   29.991966] w83977f_wdt: driver v1.00
[   29.992706] w83977f_wdt: cannot register miscdev on minor=130 (err=-16)
[   29.994003] machzwd: MachZ ZF-Logic Watchdog driver initializing
[   29.995187] machzwd: no ZF-Logic found
[   29.995937] sbc_epx_c3: cannot register miscdev on minor=130 (err=-16)
[   29.997494] watchdog: Software Watchdog: cannot register miscdev on minor=130 (err=-16).
[   29.999068] watchdog: Software Watchdog: a legacy watchdog module is probably present.
[   30.007851] softdog: initialized. soft_noboot=0 soft_margin=60 sec soft_panic=0 (nowayout=0)
[   30.009510] softdog:              soft_reboot_cmd=<not set> soft_active_on_boot=0
[   30.011616] md-cluster: support raid1 and raid10 (limited support)
[   30.012837] Registering Cluster MD functions
[   30.014963] device-mapper: uevent: version 1.0.3
[   30.017020] device-mapper: ioctl: 4.43.0-ioctl (2020-10-01) initialised: dm-devel@redhat.com
[   30.019991] device-mapper: multipath round-robin: version 1.2.0 loaded
[   30.021282] device-mapper: multipath queue-length: version 0.2.0 loaded
[   30.022588] device-mapper: multipath service-time: version 0.3.0 loaded
[   30.024790] device-mapper: dm-log-userspace: version 1.3.0 loaded
[   30.025992] device-mapper: raid: Loading target version 1.15.1
[   30.028330] Bluetooth: HCI UART driver ver 2.3
[   30.029224] Bluetooth: HCI UART protocol H4 registered
[   30.030240] Bluetooth: HCI UART protocol BCSP registered
[   30.031373] Bluetooth: HCI UART protocol LL registered
[   30.032387] Bluetooth: HCI UART protocol ATH3K registered
[   30.033530] Bluetooth: HCI UART protocol Three-wire (H5) registered
[   30.034929] usbcore: registered new interface driver bcm203x
[   30.036210] usbcore: registered new interface driver bpa10x
[   30.037477] usbcore: registered new interface driver bfusb
[   30.038978] usbcore: registered new interface driver btusb
[   30.040283] usbcore: registered new interface driver ath3k
[   30.042147] CAPI 2.0 started up with major 68 (middleware)
[   30.043282] Modular ISDN core version 1.1.29
[   30.044645] NET: Registered protocol family 34
[   30.045566] DSP module 2.0
[   30.046143] mISDN_dsp: DSP clocks every 64 samples. This equals 8 jiffies.
[   30.068976] mISDN: Layer-1-over-IP driver Rev. 2.00
[   30.071096] 0 virtual devices registered
[   30.072107] mISDN: HFC-multi driver 2.03
[   30.073268] usbcore: registered new interface driver HFC-S_USB
[   30.074416] AVM Fritz PCI driver Rev. 2.3
[   30.075385] Sedlbauer Speedfax+ Driver Rev. 2.0
[   30.076454] Infineon ISDN Driver Rev. 1.0
[   30.077435] Winbond W6692 PCI driver Rev. 2.0
[   30.078474] Netjet PCI driver Rev. 2.0
[   30.079400] mISDNipac module version 2.0
[   30.080187] mISDN: ISAR driver Rev. 2.1
[   30.083502] intel_pstate: CPU model not supported
[   30.084777] sdhci: Secure Digital Host Controller Interface driver
[   30.085990] sdhci: Copyright(c) Pierre Ossman
[   30.087171] wbsd: Winbond W83L51xD SD/MMC card interface driver
[   30.088336] wbsd: Copyright(c) Pierre Ossman
[   30.089681] VUB300 Driver rom wait states = 1C irqpoll timeout = 0400
[   30.090853] usbcore: registered new interface driver vub300
[   30.093385] usbcore: registered new interface driver ushc
[   30.094556] sdhci-pltfm: SDHCI platform and OF driver helper
[   30.098046] leds_ss4200: no LED devices found
[   30.101872] i40e: Registered client i40iw
[   30.111292] usnic_verbs: Cisco VIC (USNIC) Verbs Driver v1.0.3 (December 19, 2013)
[   30.112830] usnic_verbs:usnic_uiom_init:563: 
[   30.112844] IOMMU required but not present or enabled.  USNIC QPs will not function w/o enabling IOMMU
[   30.115505] usnic_verbs:usnic_ib_init:647: 
[   30.115518] Unable to initialize umem with err -1
[   30.117278] qedr: discovered and registered 0 RDMA funcs
[   30.118324] bnxt_re: Broadcom NetXtreme-C/E RoCE Driver
[   30.119750] bnxt_en 0000:00:04.0: bnxt_re: probe error: RoCE is not supported on this device
[   30.122468] iscsi: registered transport (iser)
[   30.124650] iBFT detected.
[   30.125228] ==================================================================
[   30.126201] BUG: KASAN: use-after-free in ibft_init+0x134/0xc33
[   30.126201] Read of size 4 at addr ffff8880be453004 by task swapper/0/1
[   30.126201] 
[   30.126201] CPU: 2 PID: 1 Comm: swapper/0 Not tainted 5.11.0-f9593a0 #9
[   30.126201] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 0.0.0 02/06/2015
[   30.126201] Call Trace:
[   30.126201]  dump_stack+0xdb/0x120
[   30.126201]  ? ibft_init+0x134/0xc33
[   30.126201]  print_address_description.constprop.7+0x41/0x60
[   30.126201]  ? ibft_init+0x134/0xc33
[   30.126201]  ? ibft_init+0x134/0xc33
[   30.126201]  kasan_report.cold.10+0x78/0xd1
[   30.126201]  ? ibft_init+0x134/0xc33
[   30.126201]  __asan_report_load_n_noabort+0xf/0x20
[   30.126201]  ibft_init+0x134/0xc33
[   30.126201]  ? write_comp_data+0x2f/0x90
[   30.126201]  ? ibft_check_initiator_for+0x159/0x159
[   30.126201]  ? write_comp_data+0x2f/0x90
[   30.126201]  ? ibft_check_initiator_for+0x159/0x159
[   30.126201]  do_one_initcall+0xc4/0x3e0
[   30.126201]  ? perf_trace_initcall_level+0x3e0/0x3e0
[   30.126201]  ? unpoison_range+0x14/0x40
[   30.126201]  ? ____kasan_kmalloc.constprop.5+0x8f/0xc0
[   30.126201]  ? kernel_init_freeable+0x420/0x652
[   30.126201]  ? __kasan_kmalloc+0x9/0x10
[   30.126201]  ? __sanitizer_cov_trace_pc+0x21/0x50
[   30.126201]  kernel_init_freeable+0x596/0x652
[   30.126201]  ? console_on_rootfs+0x7d/0x7d
[   30.126201]  ? __sanitizer_cov_trace_pc+0x21/0x50
[   30.126201]  ? rest_init+0xf0/0xf0
[   30.126201]  kernel_init+0x16/0x1d0
[   30.126201]  ? rest_init+0xf0/0xf0
[   30.126201]  ret_from_fork+0x22/0x30
[   30.126201] 
[   30.126201] The buggy address belongs to the page:
[   30.126201] page:0000000091b8f2b4 refcount:0 mapcount:0 mapping:0000000000000000 index:0x1 pfn:0xbe453
[   30.126201] flags: 0xfffffc0000000()
[   30.126201] raw: 000fffffc0000000 ffffea0002fac708 ffffea0002fac748 0000000000000000
[   30.126201] raw: 0000000000000001 0000000000000000 00000000ffffffff 0000000000000000
[   30.126201] page dumped because: kasan: bad access detected
[   30.126201] page_owner tracks the page as freed
[   30.126201] page last allocated via order 0, migratetype Movable, gfp_mask 0x100dca(GFP_HIGHUSER_MOVABLE|__GFP_ZERO), pid 204, ts 27975563827
[   30.126201]  prep_new_page+0xfb/0x140
[   30.126201]  get_page_from_freelist+0x3503/0x5730
[   30.126201]  __alloc_pages_nodemask+0x2d8/0x650
[   30.126201]  alloc_pages_vma+0xe2/0x560
[   30.126201]  __handle_mm_fault+0x930/0x26c0
[   30.126201]  handle_mm_fault+0x1f9/0x810
[   30.126201]  do_user_addr_fault+0x6f7/0xca0
[   30.126201]  exc_page_fault+0xaf/0x1a0
[   30.126201]  asm_exc_page_fault+0x1e/0x30
[   30.126201] page last free stack trace:
[   30.126201]  free_pcp_prepare+0x122/0x290
[   30.126201]  free_unref_page_list+0xe6/0x490
[   30.126201]  release_pages+0x2ed/0x1270
[   30.126201]  free_pages_and_swap_cache+0x245/0x2e0
[   30.126201]  tlb_flush_mmu+0x11e/0x680
[   30.126201]  tlb_finish_mmu+0xa6/0x3e0
[   30.126201]  exit_mmap+0x2b3/0x540
[   30.126201]  mmput+0x11d/0x450
[   30.126201]  do_exit+0xaa6/0x2d40
[   30.126201]  do_group_exit+0x128/0x340
[   30.126201]  __x64_sys_exit_group+0x43/0x50
[   30.126201]  do_syscall_64+0x37/0x50
[   30.126201]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[   30.126201] 
[   30.126201] Memory state around the buggy address:
[   30.126201]  ffff8880be452f00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[   30.126201]  ffff8880be452f80: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[   30.126201] >ffff8880be453000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[   30.126201]                    ^
[   30.126201]  ffff8880be453080: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[   30.126201]  ffff8880be453100: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[   30.126201] ==================================================================
[   30.126201] Disabling lock debugging due to kernel taint
[   30.195934] Kernel panic - not syncing: panic_on_warn set ...
[   30.196900] CPU: 2 PID: 1 Comm: swapper/0 Tainted: G    B             5.11.0-f9593a0 #9
[   30.198187] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 0.0.0 02/06/2015
[   30.198187] Call Trace:
[   30.198187]  dump_stack+0xdb/0x120
[   30.198187]  ? ibft_init+0xba/0xc33
[   30.198187]  panic+0x28f/0x6ad
[   30.202184]  ? print_oops_end_marker.cold.10+0x15/0x15
[   30.202184]  ? add_taint+0x68/0xc0
[   30.202184]  ? ibft_init+0x134/0xc33
[   30.202184]  ? ibft_init+0x134/0xc33
[   30.202184]  end_report+0x5c/0x64
[   30.206185]  kasan_report.cold.10+0x66/0xd1
[   30.206185]  ? ibft_init+0x134/0xc33
[   30.206185]  __asan_report_load_n_noabort+0xf/0x20
[   30.206185]  ibft_init+0x134/0xc33
[   30.206185]  ? write_comp_data+0x2f/0x90
[   30.210190]  ? ibft_check_initiator_for+0x159/0x159
[   30.210190]  ? write_comp_data+0x2f/0x90
[   30.210190]  ? ibft_check_initiator_for+0x159/0x159
[   30.210190]  do_one_initcall+0xc4/0x3e0
[   30.210190]  ? perf_trace_initcall_level+0x3e0/0x3e0
[   30.214185]  ? unpoison_range+0x14/0x40
[   30.214185]  ? ____kasan_kmalloc.constprop.5+0x8f/0xc0
[   30.214185]  ? kernel_init_freeable+0x420/0x652
[   30.214185]  ? __kasan_kmalloc+0x9/0x10
[   30.218184]  ? __sanitizer_cov_trace_pc+0x21/0x50
[   30.218184]  kernel_init_freeable+0x596/0x652
[   30.218184]  ? console_on_rootfs+0x7d/0x7d
[   30.218184]  ? __sanitizer_cov_trace_pc+0x21/0x50
[   30.218184]  ? rest_init+0xf0/0xf0
[   30.222185]  kernel_init+0x16/0x1d0
[   30.222185]  ? rest_init+0xf0/0xf0
[   30.222185]  ret_from_fork+0x22/0x30
[   30.222185] Dumping ftrace buffer:
[   30.222185]    (ftrace buffer empty)
[   30.222185] Kernel Offset: disabled
[   30.222185] Rebooting in 1 seconds..
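
The "Memory state around the buggy address" dump in the report above shows KASAN shadow bytes, not the memory itself. As a rough user-space sketch of how generic KASAN maps an address to its shadow byte and decides whether a small access is allowed: the constants below assume the default x86-64 generic KASAN layout (shadow scale shift 3, shadow offset 0xdffffc0000000000), the helper names are invented for this example, and the check only covers accesses that stay within one 8-byte granule.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed defaults for generic KASAN on x86-64; not read from this kernel's config. */
#define SKETCH_KASAN_SHADOW_SCALE_SHIFT 3
#define SKETCH_KASAN_SHADOW_OFFSET 0xdffffc0000000000ULL

/* One shadow byte describes one 8-byte granule of kernel memory. */
static uint64_t sketch_mem_to_shadow(uint64_t addr)
{
	return (addr >> SKETCH_KASAN_SHADOW_SCALE_SHIFT) + SKETCH_KASAN_SHADOW_OFFSET;
}

/*
 * Shadow value 0 means the whole granule is accessible, 1..7 means only the
 * first N bytes are, and a negative value (such as 0xff) means poisoned.
 * Only valid for accesses that do not cross a granule boundary.
 */
static bool sketch_access_ok(int8_t shadow, uint64_t addr, unsigned int size)
{
	int8_t last_byte;

	if (shadow == 0)
		return true;

	last_byte = (int8_t)((addr & 7) + size - 1);
	return last_byte < shadow;
}
```

In the report, every shadow byte around ffff8880be453000 is ff, so the 4-byte read at ffff8880be453004 fails this check, which matches the page_owner note that the page is tracked as freed.
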
George Kennedy Feb. 23, 2021, 9:26 p.m. UTC | #22
On 2/23/2021 3:09 PM, Mike Rapoport wrote:
> On Tue, Feb 23, 2021 at 01:05:05PM -0500, George Kennedy wrote:
>> On 2/23/2021 10:47 AM, Mike Rapoport wrote:
>>
>> It now crashes here:
>>
>> [    0.051019] ACPI: Early table checksum verification disabled
>> [    0.056721] ACPI: RSDP 0x00000000BFBFA014 000024 (v02 BOCHS )
>> [    0.057874] ACPI: XSDT 0x00000000BFBF90E8 00004C (v01 BOCHS BXPCFACP 00000001      01000013)
>> [    0.059590] ACPI: FACP 0x00000000BFBF5000 000074 (v01 BOCHS BXPCFACP 00000001 BXPC 00000001)
>> [    0.061306] ACPI: DSDT 0x00000000BFBF6000 00238D (v01 BOCHS BXPCDSDT 00000001 BXPC 00000001)
>> [    0.063006] ACPI: FACS 0x00000000BFBFD000 000040
>> [    0.063938] ACPI: APIC 0x00000000BFBF4000 000090 (v01 BOCHS BXPCAPIC 00000001 BXPC 00000001)
>> [    0.065638] ACPI: HPET 0x00000000BFBF3000 000038 (v01 BOCHS BXPCHPET 00000001 BXPC 00000001)
>> [    0.067335] ACPI: BGRT 0x00000000BE49B000 000038 (v01 INTEL EDK2 00000002      01000013)
>> [    0.069030] ACPI: iBFT 0x00000000BE453000 000800 (v01 BOCHS BXPCFACP 00000000      00000000)
>> [    0.070734] XXX acpi_find_ibft_region:
>> [    0.071468] XXX iBFT, status=0
>> [    0.072073] XXX about to call acpi_put_table()... ibft_addr=ffffffffff240000
>> [    0.073449] XXX acpi_find_ibft_region(EXIT):
>> PANIC: early exception 0x0e IP 10:ffffffff9259f439 error 0 cr2 0xffffffffff240004
> Right, I've missed the dereference of the ibft_addr after
> acpi_find_ibft_region().
>
> With this change to iscsi_ibft_find.c instead of the previous one it should
> be better:
>
> diff --git a/drivers/firmware/iscsi_ibft_find.c b/drivers/firmware/iscsi_ibft_find.c
> index 64bb94523281..1be7481d5c69 100644
> --- a/drivers/firmware/iscsi_ibft_find.c
> +++ b/drivers/firmware/iscsi_ibft_find.c
> @@ -80,6 +80,27 @@ static int __init find_ibft_in_mem(void)
>   done:
>   	return len;
>   }
> +
> +static void __init acpi_find_ibft_region(unsigned long *sizep)
> +{
> +	int i;
> +	struct acpi_table_header *table = NULL;
> +	acpi_status status;
> +
> +	if (acpi_disabled)
> +		return;
> +
> +	for (i = 0; i < ARRAY_SIZE(ibft_signs) && !ibft_addr; i++) {
> +		status = acpi_get_table(ibft_signs[i].sign, 0, &table);
> +		if (ACPI_SUCCESS(status)) {
> +			ibft_addr = (struct acpi_table_ibft *)table;
> +			*sizep = PAGE_ALIGN(ibft_addr->header.length);
> +			acpi_put_table(table);
> +			break;
> +		}
> +	}
> +}
> +
>   /*
>    * Routine used to find the iSCSI Boot Format Table. The logical
>    * kernel address is set in the ibft_addr global variable.
> @@ -91,14 +112,16 @@ unsigned long __init find_ibft_region(unsigned long *sizep)
>   	/* iBFT 1.03 section 1.4.3.1 mandates that UEFI machines will
>   	 * only use ACPI for this */
>   
> -	if (!efi_enabled(EFI_BOOT))
> +	if (!efi_enabled(EFI_BOOT)) {
>   		find_ibft_in_mem();
> -
> -	if (ibft_addr) {
>   		*sizep = PAGE_ALIGN(ibft_addr->header.length);
> -		return (u64)virt_to_phys(ibft_addr);
> +	} else {
> +		acpi_find_ibft_region(sizep);
>   	}
>   
> +	if (ibft_addr)
> +		return (u64)virt_to_phys(ibft_addr);
> +
>   	*sizep = 0;
>   	return 0;
>   }
Mike,

No luck. Back to the original KASAN ibft_init crash.

I ran with only the above patch from you. Was that what you wanted? Your 
previous patch had a section defined out by #if 0. Was that supposed to 
be in there as well?

If you need the console output, let me know. It got bounced because it was
too large.

[   30.124650] iBFT detected.
[   30.125228] ==================================================================
[   30.126201] BUG: KASAN: use-after-free in ibft_init+0x134/0xc33
[   30.126201] Read of size 4 at addr ffff8880be453004 by task swapper/0/1
[   30.126201]
[   30.126201] CPU: 2 PID: 1 Comm: swapper/0 Not tainted 5.11.0-f9593a0 #9
[   30.126201] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 0.0.0 02/06/2015
[   30.126201] Call Trace:
[   30.126201]  dump_stack+0xdb/0x120
[   30.126201]  ? ibft_init+0x134/0xc33
[   30.126201]  print_address_description.constprop.7+0x41/0x60
[   30.126201]  ? ibft_init+0x134/0xc33
[   30.126201]  ? ibft_init+0x134/0xc33
[   30.126201]  kasan_report.cold.10+0x78/0xd1
[   30.126201]  ? ibft_init+0x134/0xc33
[   30.126201]  __asan_report_load_n_noabort+0xf/0x20
[   30.126201]  ibft_init+0x134/0xc33
[   30.126201]  ? write_comp_data+0x2f/0x90
[   30.126201]  ? ibft_check_initiator_for+0x159/0x159
[   30.126201]  ? write_comp_data+0x2f/0x90
[   30.126201]  ? ibft_check_initiator_for+0x159/0x159
[   30.126201]  do_one_initcall+0xc4/0x3e0
[   30.126201]  ? perf_trace_initcall_level+0x3e0/0x3e0
[   30.126201]  ? unpoison_range+0x14/0x40
[   30.126201]  ? ____kasan_kmalloc.constprop.5+0x8f/0xc0
[   30.126201]  ? kernel_init_freeable+0x420/0x652
[   30.126201]  ? __kasan_kmalloc+0x9/0x10
[   30.126201]  ? __sanitizer_cov_trace_pc+0x21/0x50
[   30.126201]  kernel_init_freeable+0x596/0x652
[   30.126201]  ? console_on_rootfs+0x7d/0x7d
[   30.126201]  ? __sanitizer_cov_trace_pc+0x21/0x50
[   30.126201]  ? rest_init+0xf0/0xf0
[   30.126201]  kernel_init+0x16/0x1d0
[   30.126201]  ? rest_init+0xf0/0xf0
[   30.126201]  ret_from_fork+0x22/0x30
[   30.126201]
[   30.126201] The buggy address belongs to the page:
[   30.126201] page:0000000091b8f2b4 refcount:0 mapcount:0 mapping:0000000000000000 index:0x1 pfn:0xbe453
[   30.126201] flags: 0xfffffc0000000()
[   30.126201] raw: 000fffffc0000000 ffffea0002fac708 ffffea0002fac748 0000000000000000
[   30.126201] raw: 0000000000000001 0000000000000000 00000000ffffffff 0000000000000000
[   30.126201] page dumped because: kasan: bad access detected
[   30.126201] page_owner tracks the page as freed
[   30.126201] page last allocated via order 0, migratetype Movable, gfp_mask 0x100dca(GFP_HIGHUSER_MOVABLE|__GFP_ZERO), pid 204, ts 27975563827
[   30.126201]  prep_new_page+0xfb/0x140
[   30.126201]  get_page_from_freelist+0x3503/0x5730
[   30.126201]  __alloc_pages_nodemask+0x2d8/0x650
[   30.126201]  alloc_pages_vma+0xe2/0x560
[   30.126201]  __handle_mm_fault+0x930/0x26c0
[   30.126201]  handle_mm_fault+0x1f9/0x810
[   30.126201]  do_user_addr_fault+0x6f7/0xca0
[   30.126201]  exc_page_fault+0xaf/0x1a0
[   30.126201]  asm_exc_page_fault+0x1e/0x30
[   30.126201] page last free stack trace:
[   30.126201]  free_pcp_prepare+0x122/0x290
[   30.126201]  free_unref_page_list+0xe6/0x490
[   30.126201]  release_pages+0x2ed/0x1270
[   30.126201]  free_pages_and_swap_cache+0x245/0x2e0
[   30.126201]  tlb_flush_mmu+0x11e/0x680
[   30.126201]  tlb_finish_mmu+0xa6/0x3e0
[   30.126201]  exit_mmap+0x2b3/0x540
[   30.126201]  mmput+0x11d/0x450
[   30.126201]  do_exit+0xaa6/0x2d40
[   30.126201]  do_group_exit+0x128/0x340
[   30.126201]  __x64_sys_exit_group+0x43/0x50
[   30.126201]  do_syscall_64+0x37/0x50
[   30.126201]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[   30.126201]
[   30.126201] Memory state around the buggy address:
[   30.126201]  ffff8880be452f00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[   30.126201]  ffff8880be452f80: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[   30.126201] >ffff8880be453000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[   30.126201]                    ^
[   30.126201]  ffff8880be453080: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[   30.126201]  ffff8880be453100: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[   30.126201] ==================================================================
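
The faulting address in this report can be tied back to the iBFT entry in the ACPI table dump quoted earlier (physical address 0xBE453000, length 0x800). A small user-space sketch of that arithmetic follows; it assumes the x86-64 direct map starts at 0xffff888000000000 (the usual base with 4-level paging when the layout is not randomized; the log reports "Kernel Offset: disabled") and 4 KiB pages, and the helper names are made up for illustration.

```c
#include <stdint.h>

/* Assumed direct-map base and page size; not taken from this kernel's config. */
#define SKETCH_DIRECT_MAP_BASE 0xffff888000000000ULL
#define SKETCH_PAGE_SHIFT 12

/* Translate a direct-map virtual address to a physical address. */
static uint64_t sketch_virt_to_phys(uint64_t virt)
{
	return virt - SKETCH_DIRECT_MAP_BASE;
}

/* Page frame number of a physical address. */
static uint64_t sketch_phys_to_pfn(uint64_t phys)
{
	return phys >> SKETCH_PAGE_SHIFT;
}

/* Byte offset of a physical address inside a table starting at table_phys. */
static uint64_t sketch_offset_in_table(uint64_t phys, uint64_t table_phys)
{
	return phys - table_phys;
}
```

Under those assumptions, ffff8880be453004 maps to physical 0xbe453004, pfn 0xbe453 (the pfn printed in the report), at offset 4 into the iBFT table. Offset 4 is where the 32-bit length field of an ACPI table header sits, which is consistent with the reported "Read of size 4".
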


This is all I ran with:

# git diff
diff --git a/drivers/firmware/iscsi_ibft_find.c b/drivers/firmware/iscsi_ibft_find.c
index 64bb945..1be7481 100644
--- a/drivers/firmware/iscsi_ibft_find.c
+++ b/drivers/firmware/iscsi_ibft_find.c
@@ -80,6 +80,27 @@ static int __init find_ibft_in_mem(void)
  done:
         return len;
  }
+
+static void __init acpi_find_ibft_region(unsigned long *sizep)
+{
+       int i;
+       struct acpi_table_header *table = NULL;
+       acpi_status status;
+
+       if (acpi_disabled)
+               return;
+
+       for (i = 0; i < ARRAY_SIZE(ibft_signs) && !ibft_addr; i++) {
+               status = acpi_get_table(ibft_signs[i].sign, 0, &table);
+               if (ACPI_SUCCESS(status)) {
+                       ibft_addr = (struct acpi_table_ibft *)table;
+                       *sizep = PAGE_ALIGN(ibft_addr->header.length);
+                       acpi_put_table(table);
+                       break;
+               }
+       }
+}
+
  /*
   * Routine used to find the iSCSI Boot Format Table. The logical
   * kernel address is set in the ibft_addr global variable.
@@ -91,14 +112,16 @@ unsigned long __init find_ibft_region(unsigned long *sizep)
         /* iBFT 1.03 section 1.4.3.1 mandates that UEFI machines will
          * only use ACPI for this */

-       if (!efi_enabled(EFI_BOOT))
+       if (!efi_enabled(EFI_BOOT)) {
                 find_ibft_in_mem();
-
-       if (ibft_addr) {
                 *sizep = PAGE_ALIGN(ibft_addr->header.length);
-               return (u64)virt_to_phys(ibft_addr);
+       } else {
+               acpi_find_ibft_region(sizep);
         }

+       if (ibft_addr)
+               return (u64)virt_to_phys(ibft_addr);
+
         *sizep = 0;
         return 0;
  }


Thank you,
George
>> [    0.075711] CPU: 0 PID: 0 Comm: swapper Not tainted 5.11.0-34a2105 #8
>> [    0.076983] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 0.0.0 02/06/2015
>> [    0.078579] RIP: 0010:find_ibft_region+0x470/0x577
Mike Rapoport Feb. 23, 2021, 9:32 p.m. UTC | #23
On Tue, Feb 23, 2021 at 04:16:44PM -0500, George Kennedy wrote:
> 
> 
> On 2/23/2021 3:09 PM, Mike Rapoport wrote:
> > On Tue, Feb 23, 2021 at 01:05:05PM -0500, George Kennedy wrote:
> > > On 2/23/2021 10:47 AM, Mike Rapoport wrote:
> > > 
> > > It now crashes here:
> > > 
> > > [    0.051019] ACPI: Early table checksum verification disabled
> > > [    0.056721] ACPI: RSDP 0x00000000BFBFA014 000024 (v02 BOCHS )
> > > [    0.057874] ACPI: XSDT 0x00000000BFBF90E8 00004C (v01 BOCHS BXPCFACP 00000001      01000013)
> > > [    0.059590] ACPI: FACP 0x00000000BFBF5000 000074 (v01 BOCHS BXPCFACP 00000001 BXPC 00000001)
> > > [    0.061306] ACPI: DSDT 0x00000000BFBF6000 00238D (v01 BOCHS BXPCDSDT 00000001 BXPC 00000001)
> > > [    0.063006] ACPI: FACS 0x00000000BFBFD000 000040
> > > [    0.063938] ACPI: APIC 0x00000000BFBF4000 000090 (v01 BOCHS BXPCAPIC 00000001 BXPC 00000001)
> > > [    0.065638] ACPI: HPET 0x00000000BFBF3000 000038 (v01 BOCHS BXPCHPET 00000001 BXPC 00000001)
> > > [    0.067335] ACPI: BGRT 0x00000000BE49B000 000038 (v01 INTEL EDK2 00000002      01000013)
> > > [    0.069030] ACPI: iBFT 0x00000000BE453000 000800 (v01 BOCHS BXPCFACP 00000000      00000000)
> > > [    0.070734] XXX acpi_find_ibft_region:
> > > [    0.071468] XXX iBFT, status=0
> > > [    0.072073] XXX about to call acpi_put_table()... ibft_addr=ffffffffff240000
> > > [    0.073449] XXX acpi_find_ibft_region(EXIT):
> > > PANIC: early exception 0x0e IP 10:ffffffff9259f439 error 0 cr2 0xffffffffff240004
> > Right, I've missed the dereference of the ibft_addr after
> > acpi_find_ibft_region().
> > 
> > With this change to iscsi_ibft_find.c instead of the previous one it should
> > be better:
> > 
> > diff --git a/drivers/firmware/iscsi_ibft_find.c b/drivers/firmware/iscsi_ibft_find.c
> > index 64bb94523281..1be7481d5c69 100644
> > --- a/drivers/firmware/iscsi_ibft_find.c
> > +++ b/drivers/firmware/iscsi_ibft_find.c
> > @@ -80,6 +80,27 @@ static int __init find_ibft_in_mem(void)
> >   done:
> >   	return len;
> >   }
> > +
> > +static void __init acpi_find_ibft_region(unsigned long *sizep)
> > +{
> > +	int i;
> > +	struct acpi_table_header *table = NULL;
> > +	acpi_status status;
> > +
> > +	if (acpi_disabled)
> > +		return;
> > +
> > +	for (i = 0; i < ARRAY_SIZE(ibft_signs) && !ibft_addr; i++) {
> > +		status = acpi_get_table(ibft_signs[i].sign, 0, &table);
> > +		if (ACPI_SUCCESS(status)) {
> > +			ibft_addr = (struct acpi_table_ibft *)table;
> > +			*sizep = PAGE_ALIGN(ibft_addr->header.length);
> > +			acpi_put_table(table);
> > +			break;
> > +		}
> > +	}
> > +}
> > +
> >   /*
> >    * Routine used to find the iSCSI Boot Format Table. The logical
> >    * kernel address is set in the ibft_addr global variable.
> > @@ -91,14 +112,16 @@ unsigned long __init find_ibft_region(unsigned long *sizep)
> >   	/* iBFT 1.03 section 1.4.3.1 mandates that UEFI machines will
> >   	 * only use ACPI for this */
> > -	if (!efi_enabled(EFI_BOOT))
> > +	if (!efi_enabled(EFI_BOOT)) {
> >   		find_ibft_in_mem();
> > -
> > -	if (ibft_addr) {
> >   		*sizep = PAGE_ALIGN(ibft_addr->header.length);
> > -		return (u64)virt_to_phys(ibft_addr);
> > +	} else {
> > +		acpi_find_ibft_region(sizep);
> >   	}
> > +	if (ibft_addr)
> > +		return (u64)virt_to_phys(ibft_addr);
> > +
> >   	*sizep = 0;
> >   	return 0;
> >   }
> Mike,
> 
> No luck. Back to the original KASAN ibft_init crash.
> 
> I ran with only the above patch from you. Was that what you wanted? Your
> previous patch had a section defined out by #if 0. Was that supposed to be
> in there as well?

Sorry, I wasn't clear. I meant for you to keep the first patch and replace
only its changes to iscsi_ibft_find.c with the new ones.

Here's the full patch to be sure we're on the same page:

diff --git a/arch/x86/kernel/acpi/boot.c b/arch/x86/kernel/acpi/boot.c
index 7bdc0239a943..c118dd54a747 100644
--- a/arch/x86/kernel/acpi/boot.c
+++ b/arch/x86/kernel/acpi/boot.c
@@ -1551,6 +1551,7 @@ void __init acpi_boot_table_init(void)
 	if (acpi_disabled)
 		return;
 
+#if 0
 	/*
 	 * Initialize the ACPI boot-time table parser.
 	 */
@@ -1558,6 +1559,7 @@ void __init acpi_boot_table_init(void)
 		disable_acpi();
 		return;
 	}
+#endif
 
 	acpi_table_parse(ACPI_SIG_BOOT, acpi_parse_sbf);
 
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index d883176ef2ce..c8a07a7b9577 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -1032,6 +1032,14 @@ void __init setup_arch(char **cmdline_p)
 	 */
 	find_smp_config();
 
+	/*
+	 * Initialize the ACPI boot-time table parser.
+	 */
+	if (acpi_table_init()) {
+		disable_acpi();
+		return;
+	}
+
 	reserve_ibft_region();
 
 	early_alloc_pgt_buf();
diff --git a/drivers/firmware/iscsi_ibft_find.c b/drivers/firmware/iscsi_ibft_find.c
index 64bb94523281..1be7481d5c69 100644
--- a/drivers/firmware/iscsi_ibft_find.c
+++ b/drivers/firmware/iscsi_ibft_find.c
@@ -80,6 +80,27 @@ static int __init find_ibft_in_mem(void)
 done:
 	return len;
 }
+
+static void __init acpi_find_ibft_region(unsigned long *sizep)
+{
+	int i;
+	struct acpi_table_header *table = NULL;
+	acpi_status status;
+
+	if (acpi_disabled)
+		return;
+
+	for (i = 0; i < ARRAY_SIZE(ibft_signs) && !ibft_addr; i++) {
+		status = acpi_get_table(ibft_signs[i].sign, 0, &table);
+		if (ACPI_SUCCESS(status)) {
+			ibft_addr = (struct acpi_table_ibft *)table;
+			*sizep = PAGE_ALIGN(ibft_addr->header.length);
+			acpi_put_table(table);
+			break;
+		}
+	}
+}
+
 /*
  * Routine used to find the iSCSI Boot Format Table. The logical
  * kernel address is set in the ibft_addr global variable.
@@ -91,14 +112,16 @@ unsigned long __init find_ibft_region(unsigned long *sizep)
 	/* iBFT 1.03 section 1.4.3.1 mandates that UEFI machines will
 	 * only use ACPI for this */
 
-	if (!efi_enabled(EFI_BOOT))
+	if (!efi_enabled(EFI_BOOT)) {
 		find_ibft_in_mem();
-
-	if (ibft_addr) {
 		*sizep = PAGE_ALIGN(ibft_addr->header.length);
-		return (u64)virt_to_phys(ibft_addr);
+	} else {
+		acpi_find_ibft_region(sizep);
 	}
 
+	if (ibft_addr)
+		return (u64)virt_to_phys(ibft_addr);
+
 	*sizep = 0;
 	return 0;
 }
George Kennedy Feb. 23, 2021, 9:46 p.m. UTC | #24
On 2/23/2021 4:32 PM, Mike Rapoport wrote:
> diff --git a/arch/x86/kernel/acpi/boot.c b/arch/x86/kernel/acpi/boot.c
> index 7bdc0239a943..c118dd54a747 100644
> --- a/arch/x86/kernel/acpi/boot.c
> +++ b/arch/x86/kernel/acpi/boot.c
> @@ -1551,6 +1551,7 @@ void __init acpi_boot_table_init(void)
>   	if (acpi_disabled)
>   		return;
>   
> +#if 0
>   	/*
>   	 * Initialize the ACPI boot-time table parser.
>   	 */
> @@ -1558,6 +1559,7 @@ void __init acpi_boot_table_init(void)
>   		disable_acpi();
>   		return;
>   	}
> +#endif
>   
>   	acpi_table_parse(ACPI_SIG_BOOT, acpi_parse_sbf);
>   
> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> index d883176ef2ce..c8a07a7b9577 100644
> --- a/arch/x86/kernel/setup.c
> +++ b/arch/x86/kernel/setup.c
> @@ -1032,6 +1032,14 @@ void __init setup_arch(char **cmdline_p)
>   	 */
>   	find_smp_config();
>   
> +	/*
> +	 * Initialize the ACPI boot-time table parser.
> +	 */
> +	if (acpi_table_init()) {
> +		disable_acpi();
> +		return;
> +	}
> +
>   	reserve_ibft_region();
>   
>   	early_alloc_pgt_buf();
> diff --git a/drivers/firmware/iscsi_ibft_find.c b/drivers/firmware/iscsi_ibft_find.c
> index 64bb94523281..1be7481d5c69 100644
> --- a/drivers/firmware/iscsi_ibft_find.c
> +++ b/drivers/firmware/iscsi_ibft_find.c
> @@ -80,6 +80,27 @@ static int __init find_ibft_in_mem(void)
>   done:
>   	return len;
>   }
> +
> +static void __init acpi_find_ibft_region(unsigned long *sizep)
> +{
> +	int i;
> +	struct acpi_table_header *table = NULL;
> +	acpi_status status;
> +
> +	if (acpi_disabled)
> +		return;
> +
> +	for (i = 0; i < ARRAY_SIZE(ibft_signs) && !ibft_addr; i++) {
> +		status = acpi_get_table(ibft_signs[i].sign, 0, &table);
> +		if (ACPI_SUCCESS(status)) {
> +			ibft_addr = (struct acpi_table_ibft *)table;
> +			*sizep = PAGE_ALIGN(ibft_addr->header.length);
> +			acpi_put_table(table);
> +			break;
> +		}
> +	}
> +}
> +
>   /*
>    * Routine used to find the iSCSI Boot Format Table. The logical
>    * kernel address is set in the ibft_addr global variable.
> @@ -91,14 +112,16 @@ unsigned long __init find_ibft_region(unsigned long *sizep)
>   	/* iBFT 1.03 section 1.4.3.1 mandates that UEFI machines will
>   	 * only use ACPI for this */
>   
> -	if (!efi_enabled(EFI_BOOT))
> +	if (!efi_enabled(EFI_BOOT)) {
>   		find_ibft_in_mem();
> -
> -	if (ibft_addr) {
>   		*sizep = PAGE_ALIGN(ibft_addr->header.length);
> -		return (u64)virt_to_phys(ibft_addr);
> +	} else {
> +		acpi_find_ibft_region(sizep);
>   	}
>   
> +	if (ibft_addr)
> +		return (u64)virt_to_phys(ibft_addr);
> +
>   	*sizep = 0;
>   	return 0;
>   }
>   
Mike,

Still no luck.

[   30.193723] iscsi: registered transport (iser)
[   30.195970] iBFT detected.
[   30.196571] BUG: unable to handle page fault for address: ffffffffff240004
[   30.196824] #PF: supervisor read access in kernel mode
[   30.196824] #PF: error_code(0x0000) - not-present page
[   30.196824] PGD 24e34067 P4D 24e34067 PUD 24e36067 PMD 27a0e067 PTE 0
[   30.196824] Oops: 0000 [#1] SMP KASAN PTI
[   30.196824] CPU: 3 PID: 1 Comm: swapper/0 Not tainted 5.11.0-f9593a0 #10
[   30.196824] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 0.0.0 02/06/2015
[   30.196824] RIP: 0010:ibft_init+0x13d/0xc33
[   30.196824] Code: c1 40 84 ce 75 11 83 e0 07 38 c2 0f 9e c1 84 d2 0f 95 c0 84 c1 74 0a be 04 00 00 00 e8 77 f2 5f ef 49 8d 7f 08 b8 ff ff 37 00 <4d> 63 6f 04 48 89 fa 48 c1 e0 2a 48 c1 ea 03 8a 04 02 48 89 fa 83
[   30.196824] RSP: 0000:ffff888100fafc30 EFLAGS: 00010246
[   30.196824] RAX: 000000000037ffff RBX: ffffffff937c6fc0 RCX: ffffffff815fcf01
[   30.196824] RDX: dffffc0000000000 RSI: 0000000000000001 RDI: ffffffffff240008
[   30.196824] RBP: ffff888100fafcf8 R08: ffffed10201f5f12 R09: ffffed10201f5f12
[   30.196824] R10: ffff888100faf88f R11: ffffed10201f5f11 R12: dffffc0000000000
[   30.196824] R13: ffff888100fafdc0 R14: ffff888100fafcd0 R15: ffffffffff240000
[   30.196824] FS:  0000000000000000(0000) GS:ffff88810ad80000(0000) knlGS:0000000000000000
[   30.196824] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   30.196824] CR2: ffffffffff240004 CR3: 0000000024e30000 CR4: 00000000000006e0
[   30.196824] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   30.196824] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   30.196824] Call Trace:
[   30.196824]  ? write_comp_data+0x2f/0x90
[   30.196824]  ? __sanitizer_cov_trace_pc+0x21/0x50
[   30.196824]  ? ibft_check_initiator_for+0x159/0x159
[   30.196824]  ? dmi_setup+0x46c/0x46c
[   30.196824]  ? write_comp_data+0x2f/0x90
[   30.196824]  ? ibft_check_initiator_for+0x159/0x159
[   30.196824]  do_one_initcall+0xc4/0x3e0
[   30.196824]  ? perf_trace_initcall_level+0x3e0/0x3e0
[   30.196824]  ? asm_sysvec_error_interrupt+0x10/0x20
[   30.196824]  ? do_one_initcall+0x18c/0x3e0
[   30.196824]  kernel_init_freeable+0x596/0x652
[   30.196824]  ? console_on_rootfs+0x7d/0x7d
[   30.196824]  ? __sanitizer_cov_trace_pc+0x21/0x50
[   30.196824]  ? rest_init+0xf0/0xf0
[   30.196824]  kernel_init+0x16/0x1d0
[   30.196824]  ? rest_init+0xf0/0xf0
[   30.196824]  ret_from_fork+0x22/0x30
[   30.196824] Modules linked in:
[   30.196824] Dumping ftrace buffer:
[   30.196824]    (ftrace buffer empty)
[   30.196824] CR2: ffffffffff240004
[   30.196824] ---[ end trace 293eae51adac1398 ]---
[   30.196824] RIP: 0010:ibft_init+0x13d/0xc33
[   30.196824] Code: c1 40 84 ce 75 11 83 e0 07 38 c2 0f 9e c1 84 d2 0f 95 c0 84 c1 74 0a be 04 00 00 00 e8 77 f2 5f ef 49 8d 7f 08 b8 ff ff 37 00 <4d> 63 6f 04 48 89 fa 48 c1 e0 2a 48 c1 ea 03 8a 04 02 48 89 fa 83
[   30.196824] RSP: 0000:ffff888100fafc30 EFLAGS: 00010246
[   30.196824] RAX: 000000000037ffff RBX: ffffffff937c6fc0 RCX: ffffffff815fcf01
[   30.196824] RDX: dffffc0000000000 RSI: 0000000000000001 RDI: ffffffffff240008
[   30.196824] RBP: ffff888100fafcf8 R08: ffffed10201f5f12 R09: ffffed10201f5f12
[   30.196824] R10: ffff888100faf88f R11: ffffed10201f5f11 R12: dffffc0000000000
[   30.196824] R13: ffff888100fafdc0 R14: ffff888100fafcd0 R15: ffffffffff240000
[   30.196824] FS:  0000000000000000(0000) GS:ffff88810ad80000(0000) knlGS:0000000000000000
[   30.196824] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   30.196824] CR2: ffffffffff240004 CR3: 0000000024e30000 CR4: 00000000000006e0
[   30.196824] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   30.196824] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   30.196824] Kernel panic - not syncing: Fatal exception
[   30.196824] Dumping ftrace buffer:
[   30.196824]    (ftrace buffer empty)
[   30.196824] Kernel Offset: disabled
[   30.196824] Rebooting in 1 seconds..

George
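
The oops above reports error_code(0x0000) and describes it as a supervisor read of a not-present page. As a rough illustration of how that description follows from the x86 page-fault error code bits, here is a small user-space decoder. The struct and function names are invented for this example and are not kernel interfaces; only the five low architectural bits are decoded.

```c
#include <stdbool.h>
#include <stdint.h>

/* Decoded view of the low bits of an x86 page-fault error code. */
struct sketch_pf_info {
	bool protection_violation; /* bit 0: 0 = page not present, 1 = protection fault */
	bool write;                /* bit 1: 0 = read access, 1 = write access */
	bool user;                 /* bit 2: 0 = supervisor mode, 1 = user mode */
	bool reserved_bit;         /* bit 3: reserved bit set in a paging entry */
	bool instruction_fetch;    /* bit 4: fault was an instruction fetch */
};

static struct sketch_pf_info sketch_decode_pf_error_code(uint32_t error_code)
{
	struct sketch_pf_info info;

	info.protection_violation = (error_code & (1u << 0)) != 0;
	info.write = (error_code & (1u << 1)) != 0;
	info.user = (error_code & (1u << 2)) != 0;
	info.reserved_bit = (error_code & (1u << 3)) != 0;
	info.instruction_fetch = (error_code & (1u << 4)) != 0;
	return info;
}
```

Decoding 0 gives every flag as false: not a protection violation (so the page was not present, matching "PTE 0" in the page-table walk), a read, and in supervisor mode.
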
Mike Rapoport Feb. 24, 2021, 10:37 a.m. UTC | #25
On Tue, Feb 23, 2021 at 04:46:28PM -0500, George Kennedy wrote:
> 
> Mike,
> 
> Still no luck.
> 
> [   30.193723] iscsi: registered transport (iser)
> [   30.195970] iBFT detected.
> [   30.196571] BUG: unable to handle page fault for address: ffffffffff240004

Hmm, we cannot set ibft_addr to an early pointer to the ACPI table.
Let's try something more disruptive and move the reservation back to
iscsi_ibft_find.c.

diff --git a/arch/x86/kernel/acpi/boot.c b/arch/x86/kernel/acpi/boot.c
index 7bdc0239a943..c118dd54a747 100644
--- a/arch/x86/kernel/acpi/boot.c
+++ b/arch/x86/kernel/acpi/boot.c
@@ -1551,6 +1551,7 @@ void __init acpi_boot_table_init(void)
 	if (acpi_disabled)
 		return;
 
+#if 0
 	/*
 	 * Initialize the ACPI boot-time table parser.
 	 */
@@ -1558,6 +1559,7 @@ void __init acpi_boot_table_init(void)
 		disable_acpi();
 		return;
 	}
+#endif
 
 	acpi_table_parse(ACPI_SIG_BOOT, acpi_parse_sbf);
 
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index d883176ef2ce..c615ce96c9a2 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -570,16 +570,6 @@ void __init reserve_standard_io_resources(void)
 
 }
 
-static __init void reserve_ibft_region(void)
-{
-	unsigned long addr, size = 0;
-
-	addr = find_ibft_region(&size);
-
-	if (size)
-		memblock_reserve(addr, size);
-}
-
 static bool __init snb_gfx_workaround_needed(void)
 {
 #ifdef CONFIG_PCI
@@ -1032,6 +1022,12 @@ void __init setup_arch(char **cmdline_p)
 	 */
 	find_smp_config();
 
+	/*
+	 * Initialize the ACPI boot-time table parser.
+	 */
+	if (acpi_table_init())
+		disable_acpi();
+
 	reserve_ibft_region();
 
 	early_alloc_pgt_buf();
diff --git a/drivers/firmware/iscsi_ibft_find.c b/drivers/firmware/iscsi_ibft_find.c
index 64bb94523281..01be513843d6 100644
--- a/drivers/firmware/iscsi_ibft_find.c
+++ b/drivers/firmware/iscsi_ibft_find.c
@@ -47,7 +47,25 @@ static const struct {
 #define VGA_MEM 0xA0000 /* VGA buffer */
 #define VGA_SIZE 0x20000 /* 128kB */
 
-static int __init find_ibft_in_mem(void)
+static void __init *acpi_find_ibft_region(void)
+{
+	int i;
+	struct acpi_table_header *table = NULL;
+	acpi_status status;
+
+	if (acpi_disabled)
+		return NULL;
+
+	for (i = 0; i < ARRAY_SIZE(ibft_signs) && !ibft_addr; i++) {
+		status = acpi_get_table(ibft_signs[i].sign, 0, &table);
+		if (ACPI_SUCCESS(status))
+			return table;
+	}
+
+	return NULL;
+}
+
+static void __init *find_ibft_in_mem(void)
 {
 	unsigned long pos;
 	unsigned int len = 0;
@@ -70,35 +88,44 @@ static int __init find_ibft_in_mem(void)
 				/* if the length of the table extends past 1M,
 				 * the table cannot be valid. */
 				if (pos + len <= (IBFT_END-1)) {
-					ibft_addr = (struct acpi_table_ibft *)virt;
 					pr_info("iBFT found at 0x%lx.\n", pos);
-					goto done;
+					return virt;
 				}
 			}
 		}
 	}
-done:
-	return len;
+
+	return NULL;
 }
+
+static void __init *find_ibft(void)
+{
+	/* iBFT 1.03 section 1.4.3.1 mandates that UEFI machines will
+	 * only use ACPI for this */
+	if (!efi_enabled(EFI_BOOT))
+		return find_ibft_in_mem();
+	else
+		return acpi_find_ibft_region();
+}
+
 /*
  * Routine used to find the iSCSI Boot Format Table. The logical
  * kernel address is set in the ibft_addr global variable.
  */
-unsigned long __init find_ibft_region(unsigned long *sizep)
+void __init reserve_ibft_region(void)
 {
-	ibft_addr = NULL;
+	struct acpi_table_ibft *table;
+	unsigned long size;
 
-	/* iBFT 1.03 section 1.4.3.1 mandates that UEFI machines will
-	 * only use ACPI for this */
+	table = find_ibft();
+	if (!table)
+		return;
 
-	if (!efi_enabled(EFI_BOOT))
-		find_ibft_in_mem();
-
-	if (ibft_addr) {
-		*sizep = PAGE_ALIGN(ibft_addr->header.length);
-		return (u64)virt_to_phys(ibft_addr);
-	}
+	size = PAGE_ALIGN(table->header.length);
+	memblock_reserve(virt_to_phys(table), size);
 
-	*sizep = 0;
-	return 0;
+	if (efi_enabled(EFI_BOOT))
+		acpi_put_table(&table->header);
+	else
+		ibft_addr = table;
 }
diff --git a/include/linux/iscsi_ibft.h b/include/linux/iscsi_ibft.h
index b7b45ca82bea..da813c891990 100644
--- a/include/linux/iscsi_ibft.h
+++ b/include/linux/iscsi_ibft.h
@@ -26,13 +26,9 @@ extern struct acpi_table_ibft *ibft_addr;
  * mapped address is set in the ibft_addr variable.
  */
 #ifdef CONFIG_ISCSI_IBFT_FIND
-unsigned long find_ibft_region(unsigned long *sizep);
+void reserve_ibft_region(void);
 #else
-static inline unsigned long find_ibft_region(unsigned long *sizep)
-{
-	*sizep = 0;
-	return 0;
-}
+static inline void reserve_ibft_region(void) {}
 #endif
 
 #endif /* ISCSI_IBFT_H */
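
The reworked reserve_ibft_region() in the patch above reads header.length while the table pointer is still usable, reserves the range, and in the EFI case releases the table without storing the pointer in ibft_addr. The user-space model below sketches only that ordering. Every name in it is hypothetical, malloc() stands in for whatever temporary mapping backs the early table pointer, and the 4 KiB page size is an assumption.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096ULL
#define SKETCH_PAGE_ALIGN(x) (((x) + SKETCH_PAGE_SIZE - 1) & ~(SKETCH_PAGE_SIZE - 1))

/* Hypothetical stand-in for the first fields of an ACPI table header. */
struct sketch_header {
	char signature[4];
	uint32_t length;
};

/* Physical range to reserve; size 0 means nothing was reserved. */
struct sketch_region {
	uint64_t base;
	uint64_t size;
};

/* Model of a temporary mapping: the pointer is valid only until sketch_unmap(). */
static struct sketch_header *sketch_map(uint32_t length)
{
	struct sketch_header *h = malloc(sizeof(*h));

	if (!h)
		return NULL;
	memcpy(h->signature, "iBFT", sizeof(h->signature));
	h->length = length;
	return h;
}

static void sketch_unmap(struct sketch_header *h)
{
	free(h);
}

/*
 * Read everything needed while the mapping is live, then drop it.
 * Only plain numbers (base, size) outlive the mapping; the pointer does not.
 */
static struct sketch_region sketch_reserve_table(uint64_t phys, uint32_t length)
{
	struct sketch_region region = { 0, 0 };
	struct sketch_header *h = sketch_map(length);

	if (!h)
		return region;

	region.base = phys;
	region.size = SKETCH_PAGE_ALIGN((uint64_t)h->length);
	sketch_unmap(h);
	return region;
}
```

With the iBFT entry from the ACPI dump earlier in the thread (physical address 0xBE453000, length 0x800), sketch_reserve_table(0xBE453000, 0x800) yields base 0xBE453000 and a page-aligned size of 0x1000.
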
George Kennedy Feb. 24, 2021, 2:22 p.m. UTC | #26
On 2/24/2021 5:37 AM, Mike Rapoport wrote:
> On Tue, Feb 23, 2021 at 04:46:28PM -0500, George Kennedy wrote:
>> Mike,
>>
>> Still no luck.
>>
>> [   30.193723] iscsi: registered transport (iser)
>> [   30.195970] iBFT detected.
>> [   30.196571] BUG: unable to handle page fault for address: ffffffffff240004
> Hmm, we cannot set ibft_addr to early pointer to the ACPI table.
> Let's try something more disruptive and move the reservation back to
> iscsi_ibft_find.c.
>
> diff --git a/arch/x86/kernel/acpi/boot.c b/arch/x86/kernel/acpi/boot.c
> index 7bdc0239a943..c118dd54a747 100644
> --- a/arch/x86/kernel/acpi/boot.c
> +++ b/arch/x86/kernel/acpi/boot.c
> @@ -1551,6 +1551,7 @@ void __init acpi_boot_table_init(void)
>   	if (acpi_disabled)
>   		return;
>   
> +#if 0
>   	/*
>   	 * Initialize the ACPI boot-time table parser.
>   	 */
> @@ -1558,6 +1559,7 @@ void __init acpi_boot_table_init(void)
>   		disable_acpi();
>   		return;
>   	}
> +#endif
>   
>   	acpi_table_parse(ACPI_SIG_BOOT, acpi_parse_sbf);
>   
> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> index d883176ef2ce..c615ce96c9a2 100644
> --- a/arch/x86/kernel/setup.c
> +++ b/arch/x86/kernel/setup.c
> @@ -570,16 +570,6 @@ void __init reserve_standard_io_resources(void)
>   
>   }
>   
> -static __init void reserve_ibft_region(void)
> -{
> -	unsigned long addr, size = 0;
> -
> -	addr = find_ibft_region(&size);
> -
> -	if (size)
> -		memblock_reserve(addr, size);
> -}
> -
>   static bool __init snb_gfx_workaround_needed(void)
>   {
>   #ifdef CONFIG_PCI
> @@ -1032,6 +1022,12 @@ void __init setup_arch(char **cmdline_p)
>   	 */
>   	find_smp_config();
>   
> +	/*
> +	 * Initialize the ACPI boot-time table parser.
> +	 */
> +	if (acpi_table_init())
> +		disable_acpi();
> +
>   	reserve_ibft_region();
>   
>   	early_alloc_pgt_buf();
> diff --git a/drivers/firmware/iscsi_ibft_find.c b/drivers/firmware/iscsi_ibft_find.c
> index 64bb94523281..01be513843d6 100644
> --- a/drivers/firmware/iscsi_ibft_find.c
> +++ b/drivers/firmware/iscsi_ibft_find.c
> @@ -47,7 +47,25 @@ static const struct {
>   #define VGA_MEM 0xA0000 /* VGA buffer */
>   #define VGA_SIZE 0x20000 /* 128kB */
>   
> -static int __init find_ibft_in_mem(void)
> +static void __init *acpi_find_ibft_region(void)
> +{
> +	int i;
> +	struct acpi_table_header *table = NULL;
> +	acpi_status status;
> +
> +	if (acpi_disabled)
> +		return NULL;
> +
> +	for (i = 0; i < ARRAY_SIZE(ibft_signs) && !ibft_addr; i++) {
> +		status = acpi_get_table(ibft_signs[i].sign, 0, &table);
> +		if (ACPI_SUCCESS(status))
> +			return table;
> +	}
> +
> +	return NULL;
> +}
> +
> +static void __init *find_ibft_in_mem(void)
>   {
>   	unsigned long pos;
>   	unsigned int len = 0;
> @@ -70,35 +88,44 @@ static int __init find_ibft_in_mem(void)
>   				/* if the length of the table extends past 1M,
>   				 * the table cannot be valid. */
>   				if (pos + len <= (IBFT_END-1)) {
> -					ibft_addr = (struct acpi_table_ibft *)virt;
>   					pr_info("iBFT found at 0x%lx.\n", pos);
> -					goto done;
> +					return virt;
>   				}
>   			}
>   		}
>   	}
> -done:
> -	return len;
> +
> +	return NULL;
>   }
> +
> +static void __init *find_ibft(void)
> +{
> +	/* iBFT 1.03 section 1.4.3.1 mandates that UEFI machines will
> +	 * only use ACPI for this */
> +	if (!efi_enabled(EFI_BOOT))
> +		return find_ibft_in_mem();
> +	else
> +		return acpi_find_ibft_region();
> +}
> +
>   /*
>    * Routine used to find the iSCSI Boot Format Table. The logical
>    * kernel address is set in the ibft_addr global variable.
>    */
> -unsigned long __init find_ibft_region(unsigned long *sizep)
> +void __init reserve_ibft_region(void)
>   {
> -	ibft_addr = NULL;
> +	struct acpi_table_ibft *table;
> +	unsigned long size;
>   
> -	/* iBFT 1.03 section 1.4.3.1 mandates that UEFI machines will
> -	 * only use ACPI for this */
> +	table = find_ibft();
> +	if (!table)
> +		return;
>   
> -	if (!efi_enabled(EFI_BOOT))
> -		find_ibft_in_mem();
> -
> -	if (ibft_addr) {
> -		*sizep = PAGE_ALIGN(ibft_addr->header.length);
> -		return (u64)virt_to_phys(ibft_addr);
> -	}
> +	size = PAGE_ALIGN(table->header.length);
> +	memblock_reserve(virt_to_phys(table), size);
>   
> -	*sizep = 0;
> -	return 0;
> +	if (efi_enabled(EFI_BOOT))
> +		acpi_put_table(&table->header);
> +	else
> +		ibft_addr = table;
>   }
> diff --git a/include/linux/iscsi_ibft.h b/include/linux/iscsi_ibft.h
> index b7b45ca82bea..da813c891990 100644
> --- a/include/linux/iscsi_ibft.h
> +++ b/include/linux/iscsi_ibft.h
> @@ -26,13 +26,9 @@ extern struct acpi_table_ibft *ibft_addr;
>    * mapped address is set in the ibft_addr variable.
>    */
>   #ifdef CONFIG_ISCSI_IBFT_FIND
> -unsigned long find_ibft_region(unsigned long *sizep);
> +void reserve_ibft_region(void);
>   #else
> -static inline unsigned long find_ibft_region(unsigned long *sizep)
> -{
> -	*sizep = 0;
> -	return 0;
> -}
> +static inline void reserve_ibft_region(void) {}
>   #endif
>   
>   #endif /* ISCSI_IBFT_H */

Still no luck, Mike.

We're back to the original problem where the only thing that worked was 
to run "SetPageReserved(page)" before calling "kmap(page)". The page is 
being "freed" before ibft_init() is called as a result of the recent 
buddy page freeing changes.
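
As a rough illustration of that ordering problem, here is a toy user-space model (not kernel code; every toy_* name is made up for this sketch, and the single "poisoned" state only loosely stands in for what KASAN tracks). A page that is not reserved before boot memory is handed to the allocator ends up marked as freed, so a later read of the table is flagged:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_NPAGES 16

/* Hypothetical page states for the sketch. */
enum toy_state { TOY_BOOT, TOY_RESERVED, TOY_FREE_POISONED };

struct toy_mem {
	enum toy_state page[TOY_NPAGES];
};

static void toy_init(struct toy_mem *m)
{
	for (size_t i = 0; i < TOY_NPAGES; i++)
		m->page[i] = TOY_BOOT;
}

/* Stand-in for memblock_reserve(): keep a page range away from the allocator. */
static void toy_reserve(struct toy_mem *m, size_t first, size_t count)
{
	assert(first + count <= TOY_NPAGES);
	for (size_t i = first; i < first + count; i++)
		m->page[i] = TOY_RESERVED;
}

/* Stand-in for releasing boot memory: every non-reserved page becomes free. */
static void toy_release_all(struct toy_mem *m)
{
	for (size_t i = 0; i < TOY_NPAGES; i++)
		if (m->page[i] == TOY_BOOT)
			m->page[i] = TOY_FREE_POISONED;
}

/* A read of a freed page is what the toy checker reports. */
static bool toy_read_is_flagged(const struct toy_mem *m, size_t pfn)
{
	assert(pfn < TOY_NPAGES);
	return m->page[pfn] == TOY_FREE_POISONED;
}

/*
 * Returns true if a late read of the (assumed) table page is flagged.
 * reserve_first selects whether the page is reserved before the release.
 */
bool toy_late_read_flagged(bool reserve_first)
{
	struct toy_mem m;
	const size_t table_pfn = 5; /* arbitrary page chosen for the sketch */

	toy_init(&m);
	if (reserve_first)
		toy_reserve(&m, table_pfn, 1);
	toy_release_all(&m);
	return toy_read_is_flagged(&m, table_pfn);
}
```

Calling toy_late_read_flagged(false) models the failing case (no reservation before the release), while toy_late_read_flagged(true) models a reservation that happens early enough.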

[   30.385207] iscsi: registered transport (iser)
[   30.387462] iBFT detected.
[   30.388042] ==================================================================
[   30.388119] BUG: KASAN: use-after-free in ibft_init+0x134/0xc33
[   30.388119] Read of size 4 at addr ffff8880be453004 by task swapper/0/1
[   30.388119]
[   30.388119] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.11.0-f9593a0 #11
[   30.388119] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 0.0.0 02/06/2015
[   30.388119] Call Trace:
[   30.388119]  dump_stack+0xdb/0x120
[   30.388119]  ? ibft_init+0x134/0xc33
[   30.388119]  print_address_description.constprop.7+0x41/0x60
[   30.388119]  ? ibft_init+0x134/0xc33
[   30.388119]  ? ibft_init+0x134/0xc33
[   30.388119]  kasan_report.cold.10+0x78/0xd1
[   30.388119]  ? ibft_init+0x134/0xc33
[   30.388119]  __asan_report_load_n_noabort+0xf/0x20
[   30.388119]  ibft_init+0x134/0xc33
[   30.388119]  ? write_comp_data+0x2f/0x90
[   30.388119]  ? ibft_check_initiator_for+0x159/0x159
[   30.388119]  ? write_comp_data+0x2f/0x90
[   30.388119]  ? ibft_check_initiator_for+0x159/0x159
[   30.388119]  do_one_initcall+0xc4/0x3e0
[   30.388119]  ? perf_trace_initcall_level+0x3e0/0x3e0
[   30.388119]  ? unpoison_range+0x14/0x40
[   30.388119]  ? ____kasan_kmalloc.constprop.5+0x8f/0xc0
[   30.388119]  ? kernel_init_freeable+0x420/0x652

George

Mike Rapoport Feb. 25, 2021, 8:53 a.m. UTC | #27
Hi George,

> Still no luck Mike,
> 
> We're back to the original problem where the only thing that worked was to
> run "SetPageReserved(page)" before calling "kmap(page)". The page is being
> "freed" before ibft_init() is called as a result of the recent buddy page
> freeing changes.

I keep missing some little details each time :(
OK, let's try from a different angle.

diff --git a/drivers/acpi/acpica/tbutils.c b/drivers/acpi/acpica/tbutils.c
index 4b9b329a5a92..ec43e1447336 100644
--- a/drivers/acpi/acpica/tbutils.c
+++ b/drivers/acpi/acpica/tbutils.c
@@ -7,6 +7,8 @@
  *
  *****************************************************************************/
 
+#include <linux/memblock.h>
+
 #include <acpi/acpi.h>
 #include "accommon.h"
 #include "actables.h"
@@ -339,6 +341,21 @@ acpi_tb_parse_root_table(acpi_physical_address rsdp_address)
 			acpi_tb_parse_fadt();
 		}
 
+		if (ACPI_SUCCESS(status) &&
+		    ACPI_COMPARE_NAMESEG(&acpi_gbl_root_table_list.
+					 tables[table_index].signature,
+					 ACPI_SIG_IBFT)) {
+			struct acpi_table_header *ibft;
+			struct acpi_table_desc *desc;
+
+			desc = &acpi_gbl_root_table_list.tables[table_index];
+			status = acpi_tb_get_table(desc, &ibft);
+			if (ACPI_SUCCESS(status)) {
+				memblock_reserve(address, ibft->length);
+				acpi_tb_put_table(desc);
+			}
+		}
+
 next_table:
 
 		table_entry += table_entry_size;
George Kennedy Feb. 25, 2021, 12:38 p.m. UTC | #28
On 2/25/2021 3:53 AM, Mike Rapoport wrote:
> Hi George,
>
>> Still no luck Mike,
>>
>> We're back to the original problem where the only thing that worked was to
>> run "SetPageReserved(page)" before calling "kmap(page)". The page is being
>> "freed" before ibft_init() is called as a result of the recent buddy page
>> freeing changes.
> I keep missing some little details each time :(
No worries. Thanks for all your help. Does this patch go on top of your 
previous patch or is it standalone?

George
Mike Rapoport Feb. 25, 2021, 2:57 p.m. UTC | #29
On Thu, Feb 25, 2021 at 07:38:19AM -0500, George Kennedy wrote:
> On 2/25/2021 3:53 AM, Mike Rapoport wrote:
> > Hi George,
> > 
> > > Still no luck Mike,
> > > 
> > > We're back to the original problem where the only thing that worked was to
> > > run "SetPageReserved(page)" before calling "kmap(page)". The page is being
> > > "freed" before ibft_init() is called as a result of the recent buddy page
> > > freeing changes.
> > I keep missing some little details each time :(
> No worries. Thanks for all your help. Does this patch go on top of your
> previous patch or is it standalone?

This is standalone.
 
George Kennedy Feb. 25, 2021, 3:22 p.m. UTC | #30
On 2/25/2021 9:57 AM, Mike Rapoport wrote:
> On Thu, Feb 25, 2021 at 07:38:19AM -0500, George Kennedy wrote:
>> On 2/25/2021 3:53 AM, Mike Rapoport wrote:
>>> Hi George,
>>>
>>>> Still no luck Mike,
>>>>
>>>> We're back to the original problem where the only thing that worked was to
>>>> run "SetPageReserved(page)" before calling "kmap(page)". The page is being
>>>> "freed" before ibft_init() is called as a result of the recent buddy page
>>>> freeing changes.
>>> I keep missing some little details each time :(
>> No worries. Thanks for all your help. Does this patch go on top of your
>> previous patch or is it standalone?
> This is standalone.
>   
>> George
>>> Ok, let's try from the different angle.
>>>
>>> diff --git a/drivers/acpi/acpica/tbutils.c b/drivers/acpi/acpica/tbutils.c
>>> index 4b9b329a5a92..ec43e1447336 100644
>>> --- a/drivers/acpi/acpica/tbutils.c
>>> +++ b/drivers/acpi/acpica/tbutils.c
>>> @@ -7,6 +7,8 @@
>>>     *
>>>     *****************************************************************************/
>>> +#include <linux/memblock.h>
>>> +
>>>    #include <acpi/acpi.h>
>>>    #include "accommon.h"
>>>    #include "actables.h"
>>> @@ -339,6 +341,21 @@ acpi_tb_parse_root_table(acpi_physical_address rsdp_address)
>>>    			acpi_tb_parse_fadt();
>>>    		}
>>> +		if (ACPI_SUCCESS(status) &&
>>> +		    ACPI_COMPARE_NAMESEG(&acpi_gbl_root_table_list.
>>> +					 tables[table_index].signature,
>>> +					 ACPI_SIG_IBFT)) {
>>> +			struct acpi_table_header *ibft;
>>> +			struct acpi_table_desc *desc;
>>> +
>>> +			desc = &acpi_gbl_root_table_list.tables[table_index];
>>> +			status = acpi_tb_get_table(desc, &ibft);
>>> +			if (ACPI_SUCCESS(status)) {
>>> +				memblock_reserve(address, ibft->length);
>>> +				acpi_tb_put_table(desc);
>>> +		
>>> +		}
>>> +
>>>    next_table:
>>>    		table_entry += table_entry_size;
>>>
>>>
Applied just your latest patch, but same failure.

I thought there was an earlier comment (which I can't find now) that 
stated that memblock_reserve() wouldn't reserve the page, which is 
what's needed here.

[   30.308229] iBFT detected..
[   30.308796] ==================================================================
[   30.308890] BUG: KASAN: use-after-free in ibft_init+0x134/0xc33
[   30.308890] Read of size 4 at addr ffff8880be453004 by task swapper/0/1
[   30.308890]
[   30.308890] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 5.11.0-f9593a0 #12
[   30.308890] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 0.0.0 02/06/2015
[   30.308890] Call Trace:
[   30.308890]  dump_stack+0xdb/0x120
[   30.308890]  ? ibft_init+0x134/0xc33
[   30.308890]  print_address_description.constprop.7+0x41/0x60
[   30.308890]  ? ibft_init+0x134/0xc33
[   30.308890]  ? ibft_init+0x134/0xc33
[   30.308890]  kasan_report.cold.10+0x78/0xd1
[   30.308890]  ? ibft_init+0x134/0xc33
[   30.308890]  __asan_report_load_n_noabort+0xf/0x20
[   30.308890]  ibft_init+0x134/0xc33
[   30.308890]  ? write_comp_data+0x2f/0x90
[   30.308890]  ? ibft_check_initiator_for+0x159/0x159
[   30.308890]  ? write_comp_data+0x2f/0x90
[   30.308890]  ? ibft_check_initiator_for+0x159/0x159
[   30.308890]  do_one_initcall+0xc4/0x3e0
[   30.308890]  ? perf_trace_initcall_level+0x3e0/0x3e0
[   30.308890]  ? unpoison_range+0x14/0x40
[   30.308890]  ? ____kasan_kmalloc.constprop.5+0x8f/0xc0
[   30.308890]  ? kernel_init_freeable+0x420/0x652
[   30.308890]  ? __kasan_kmalloc+0x9/0x10
[   30.308890]  ? __sanitizer_cov_trace_pc+0x21/0x50
[   30.308890]  kernel_init_freeable+0x596/0x652
[   30.308890]  ? console_on_rootfs+0x7d/0x7d
[   30.308890]  ? __sanitizer_cov_trace_pc+0x21/0x50
[   30.308890]  ? rest_init+0xf0/0xf0
[   30.308890]  kernel_init+0x16/0x1d0
[   30.308890]  ? rest_init+0xf0/0xf0
[   30.308890]  ret_from_fork+0x22/0x30
[   30.308890]
[   30.308890] The buggy address belongs to the page:
[   30.308890] page:0000000001b7b17c refcount:0 mapcount:0 mapping:0000000000000000 index:0x1 pfn:0xbe453
[   30.308890] flags: 0xfffffc0000000()
[   30.308890] raw: 000fffffc0000000 ffffea0002ef9788 ffffea0002f91488 0000000000000000
[   30.308890] raw: 0000000000000001 0000000000000000 00000000ffffffff 0000000000000000
[   30.308890] page dumped because: kasan: bad access detected
[   30.308890] page_owner tracks the page as freed
[   30.308890] page last allocated via order 0, migratetype Movable, gfp_mask 0x100dca(GFP_HIGHUSER_MOVABLE|__GFP_ZERO), pid 204, ts 28121288605
[   30.308890]  prep_new_page+0xfb/0x140
[   30.308890]  get_page_from_freelist+0x3503/0x5730
[   30.308890]  __alloc_pages_nodemask+0x2d8/0x650
[   30.308890]  alloc_pages_vma+0xe2/0x560
[   30.308890]  __handle_mm_fault+0x930/0x26c0
[   30.308890]  handle_mm_fault+0x1f9/0x810
[   30.308890]  do_user_addr_fault+0x6f7/0xca0
[   30.308890]  exc_page_fault+0xaf/0x1a0
[   30.308890]  asm_exc_page_fault+0x1e/0x30
[   30.308890] page last free stack trace:
[   30.308890]  free_pcp_prepare+0x122/0x290
[   30.308890]  free_unref_page_list+0xe6/0x490
[   30.308890]  release_pages+0x2ed/0x1270
[   30.308890]  free_pages_and_swap_cache+0x245/0x2e0
[   30.308890]  tlb_flush_mmu+0x11e/0x680
[   30.308890]  tlb_finish_mmu+0xa6/0x3e0
[   30.308890]  exit_mmap+0x2b3/0x540
[   30.308890]  mmput+0x11d/0x450
[   30.308890]  do_exit+0xaa6/0x2d40
[   30.308890]  do_group_exit+0x128/0x340
[   30.308890]  __x64_sys_exit_group+0x43/0x50
[   30.308890]  do_syscall_64+0x37/0x50
[   30.308890]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[   30.308890]
[   30.308890] Memory state around the buggy address:
[   30.308890]  ffff8880be452f00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[   30.308890]  ffff8880be452f80: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[   30.308890] >ffff8880be453000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[   30.308890]                    ^
[   30.308890]  ffff8880be453080: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[   30.308890]  ffff8880be453100: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[   30.308890] ==================================================================

George
George Kennedy Feb. 25, 2021, 4:06 p.m. UTC | #31
On 2/25/2021 10:22 AM, George Kennedy wrote:
>
>
> On 2/25/2021 9:57 AM, Mike Rapoport wrote:
>> On Thu, Feb 25, 2021 at 07:38:19AM -0500, George Kennedy wrote:
>>> On 2/25/2021 3:53 AM, Mike Rapoport wrote:
>>>> Hi George,
>>>>
>>>>> On 2/24/2021 5:37 AM, Mike Rapoport wrote:
>>>>>> On Tue, Feb 23, 2021 at 04:46:28PM -0500, George Kennedy wrote:
>>>>>>> Mike,
>>>>>>>
>>>>>>> Still no luck.
>>>>>>>
>>>>>>> [   30.193723] iscsi: registered transport (iser)
>>>>>>> [   30.195970] iBFT detected.
>>>>>>> [   30.196571] BUG: unable to handle page fault for address: 
>>>>>>> ffffffffff240004
>>>>>> Hmm, we cannot set ibft_addr to early pointer to the ACPI table.
>>>>>> Let's try something more disruptive and move the reservation back to
>>>>>> iscsi_ibft_find.c.
>>>>>>
>>>>>> diff --git a/arch/x86/kernel/acpi/boot.c 
>>>>>> b/arch/x86/kernel/acpi/boot.c
>>>>>> index 7bdc0239a943..c118dd54a747 100644
>>>>>> --- a/arch/x86/kernel/acpi/boot.c
>>>>>> +++ b/arch/x86/kernel/acpi/boot.c
>>>>>> @@ -1551,6 +1551,7 @@ void __init acpi_boot_table_init(void)
>>>>>>         if (acpi_disabled)
>>>>>>             return;
>>>>>> +#if 0
>>>>>>         /*
>>>>>>          * Initialize the ACPI boot-time table parser.
>>>>>>          */
>>>>>> @@ -1558,6 +1559,7 @@ void __init acpi_boot_table_init(void)
>>>>>>             disable_acpi();
>>>>>>             return;
>>>>>>         }
>>>>>> +#endif
>>>>>>         acpi_table_parse(ACPI_SIG_BOOT, acpi_parse_sbf);
>>>>>> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
>>>>>> index d883176ef2ce..c615ce96c9a2 100644
>>>>>> --- a/arch/x86/kernel/setup.c
>>>>>> +++ b/arch/x86/kernel/setup.c
>>>>>> @@ -570,16 +570,6 @@ void __init reserve_standard_io_resources(void)
>>>>>>     }
>>>>>> -static __init void reserve_ibft_region(void)
>>>>>> -{
>>>>>> -    unsigned long addr, size = 0;
>>>>>> -
>>>>>> -    addr = find_ibft_region(&size);
>>>>>> -
>>>>>> -    if (size)
>>>>>> -        memblock_reserve(addr, size);
>>>>>> -}
>>>>>> -
>>>>>>     static bool __init snb_gfx_workaround_needed(void)
>>>>>>     {
>>>>>>     #ifdef CONFIG_PCI
>>>>>> @@ -1032,6 +1022,12 @@ void __init setup_arch(char **cmdline_p)
>>>>>>          */
>>>>>>         find_smp_config();
>>>>>> +    /*
>>>>>> +     * Initialize the ACPI boot-time table parser.
>>>>>> +     */
>>>>>> +    if (acpi_table_init())
>>>>>> +        disable_acpi();
>>>>>> +
>>>>>>         reserve_ibft_region();
>>>>>>         early_alloc_pgt_buf();
>>>>>> diff --git a/drivers/firmware/iscsi_ibft_find.c 
>>>>>> b/drivers/firmware/iscsi_ibft_find.c
>>>>>> index 64bb94523281..01be513843d6 100644
>>>>>> --- a/drivers/firmware/iscsi_ibft_find.c
>>>>>> +++ b/drivers/firmware/iscsi_ibft_find.c
>>>>>> @@ -47,7 +47,25 @@ static const struct {
>>>>>>     #define VGA_MEM 0xA0000 /* VGA buffer */
>>>>>>     #define VGA_SIZE 0x20000 /* 128kB */
>>>>>> -static int __init find_ibft_in_mem(void)
>>>>>> +static void __init *acpi_find_ibft_region(void)
>>>>>> +{
>>>>>> +    int i;
>>>>>> +    struct acpi_table_header *table = NULL;
>>>>>> +    acpi_status status;
>>>>>> +
>>>>>> +    if (acpi_disabled)
>>>>>> +        return NULL;
>>>>>> +
>>>>>> +    for (i = 0; i < ARRAY_SIZE(ibft_signs) && !ibft_addr; i++) {
>>>>>> +        status = acpi_get_table(ibft_signs[i].sign, 0, &table);
>>>>>> +        if (ACPI_SUCCESS(status))
>>>>>> +            return table;
>>>>>> +    }
>>>>>> +
>>>>>> +    return NULL;
>>>>>> +}
>>>>>> +
>>>>>> +static void __init *find_ibft_in_mem(void)
>>>>>>     {
>>>>>>         unsigned long pos;
>>>>>>         unsigned int len = 0;
>>>>>> @@ -70,35 +88,44 @@ static int __init find_ibft_in_mem(void)
>>>>>>                     /* if the length of the table extends past 1M,
>>>>>>                      * the table cannot be valid. */
>>>>>>                     if (pos + len <= (IBFT_END-1)) {
>>>>>> -                    ibft_addr = (struct acpi_table_ibft *)virt;
>>>>>>                         pr_info("iBFT found at 0x%lx.\n", pos);
>>>>>> -                    goto done;
>>>>>> +                    return virt;
>>>>>>                     }
>>>>>>                 }
>>>>>>             }
>>>>>>         }
>>>>>> -done:
>>>>>> -    return len;
>>>>>> +
>>>>>> +    return NULL;
>>>>>>     }
>>>>>> +
>>>>>> +static void __init *find_ibft(void)
>>>>>> +{
>>>>>> +    /* iBFT 1.03 section 1.4.3.1 mandates that UEFI machines will
>>>>>> +     * only use ACPI for this */
>>>>>> +    if (!efi_enabled(EFI_BOOT))
>>>>>> +        return find_ibft_in_mem();
>>>>>> +    else
>>>>>> +        return acpi_find_ibft_region();
>>>>>> +}
>>>>>> +
>>>>>>     /*
>>>>>>      * Routine used to find the iSCSI Boot Format Table. The logical
>>>>>>      * kernel address is set in the ibft_addr global variable.
>>>>>>      */
>>>>>> -unsigned long __init find_ibft_region(unsigned long *sizep)
>>>>>> +void __init reserve_ibft_region(void)
>>>>>>     {
>>>>>> -    ibft_addr = NULL;
>>>>>> +    struct acpi_table_ibft *table;
>>>>>> +    unsigned long size;
>>>>>> -    /* iBFT 1.03 section 1.4.3.1 mandates that UEFI machines will
>>>>>> -     * only use ACPI for this */
>>>>>> +    table = find_ibft();
>>>>>> +    if (!table)
>>>>>> +        return;
>>>>>> -    if (!efi_enabled(EFI_BOOT))
>>>>>> -        find_ibft_in_mem();
>>>>>> -
>>>>>> -    if (ibft_addr) {
>>>>>> -        *sizep = PAGE_ALIGN(ibft_addr->header.length);
>>>>>> -        return (u64)virt_to_phys(ibft_addr);
>>>>>> -    }
>>>>>> +    size = PAGE_ALIGN(table->header.length);
>>>>>> +    memblock_reserve(virt_to_phys(table), size);
>>>>>> -    *sizep = 0;
>>>>>> -    return 0;
>>>>>> +    if (efi_enabled(EFI_BOOT))
>>>>>> +        acpi_put_table(&table->header);
>>>>>> +    else
>>>>>> +        ibft_addr = table;
>>>>>>     }
>>>>>> diff --git a/include/linux/iscsi_ibft.h b/include/linux/iscsi_ibft.h
>>>>>> index b7b45ca82bea..da813c891990 100644
>>>>>> --- a/include/linux/iscsi_ibft.h
>>>>>> +++ b/include/linux/iscsi_ibft.h
>>>>>> @@ -26,13 +26,9 @@ extern struct acpi_table_ibft *ibft_addr;
>>>>>>      * mapped address is set in the ibft_addr variable.
>>>>>>      */
>>>>>>     #ifdef CONFIG_ISCSI_IBFT_FIND
>>>>>> -unsigned long find_ibft_region(unsigned long *sizep);
>>>>>> +void reserve_ibft_region(void);
>>>>>>     #else
>>>>>> -static inline unsigned long find_ibft_region(unsigned long *sizep)
>>>>>> -{
>>>>>> -    *sizep = 0;
>>>>>> -    return 0;
>>>>>> -}
>>>>>> +static inline void reserve_ibft_region(void) {}
>>>>>>     #endif
>>>>>>     #endif /* ISCSI_IBFT_H */
>>>>> Still no luck Mike,
>>>>>
>>>>> We're back to the original problem where the only thing that 
>>>>> worked was to
>>>>> run "SetPageReserved(page)" before calling "kmap(page)". The page 
>>>>> is being
>>>>> "freed" before ibft_init() is called as a result of the recent 
>>>>> buddy page
>>>>> freeing changes.
>>>> I keep missing some little details each time :(
>>> No worries. Thanks for all your help. Does this patch go on top of your
>>> previous patch or is it standalone?
>> This is standalone.
>>> George
>>>> Ok, let's try from the different angle.
>>>>
>>>> diff --git a/drivers/acpi/acpica/tbutils.c 
>>>> b/drivers/acpi/acpica/tbutils.c
>>>> index 4b9b329a5a92..ec43e1447336 100644
>>>> --- a/drivers/acpi/acpica/tbutils.c
>>>> +++ b/drivers/acpi/acpica/tbutils.c
>>>> @@ -7,6 +7,8 @@
>>>>     *
>>>> *****************************************************************************/
>>>> +#include <linux/memblock.h>
>>>> +
>>>>    #include <acpi/acpi.h>
>>>>    #include "accommon.h"
>>>>    #include "actables.h"
>>>> @@ -339,6 +341,21 @@ acpi_tb_parse_root_table(acpi_physical_address 
>>>> rsdp_address)
>>>>                acpi_tb_parse_fadt();
>>>>            }
>>>> +        if (ACPI_SUCCESS(status) &&
>>>> + ACPI_COMPARE_NAMESEG(&acpi_gbl_root_table_list.
>>>> +                     tables[table_index].signature,
>>>> +                     ACPI_SIG_IBFT)) {
>>>> +            struct acpi_table_header *ibft;
>>>> +            struct acpi_table_desc *desc;
>>>> +
>>>> +            desc = &acpi_gbl_root_table_list.tables[table_index];
>>>> +            status = acpi_tb_get_table(desc, &ibft);
>>>> +            if (ACPI_SUCCESS(status)) {
>>>> +                memblock_reserve(address, ibft->length);
>>>> +                acpi_tb_put_table(desc);
>>>> +
>>>> +        }
>>>> +
>>>>    next_table:
>>>>            table_entry += table_entry_size;
>>>>
>>>>
> Applied just your latest patch, but same failure.
>
> I thought there was an earlier comment (which I can't find now) that 
> stated that memblock_reserve() wouldn't reserve the page, which is 
> what's needed here.
Mike,

Here was David's explanation of what he thinks is going on (or should be 
going on) from a few days ago:

QUOTE...
I assume that acpi_map()/acpi_unmap() map some firmware blob that is 
provided via firmware/bios/... to us.

should_use_kmap() tells us whether
a) we have a "struct page" and should kmap() that one
b) we don't have a "struct page" and should ioremap.

As it is a blob, the firmware should always reserve that memory region 
via memblock (e.g., memblock_reserve()), such that we either
1) don't create a memmap ("struct page") at all (-> case b) )
2) if we have to create a memmap, we mark the page PG_reserved and
*never* expose it to the buddy (-> case a) )


Are you telling me that in this case we might have a memmap for the HW 
blob that is *not* PG_reserved? In that case it most probably got 
exposed to the buddy where it can happily get allocated/freed.

The latent BUG would be that that blob gets exposed to the system like 
ordinary RAM, and not reserved via memblock early during boot. Assuming 
that blob has a low physical address, with my patch it will get 
allocated/used a lot earlier - which would mean we trigger this latent 
BUG now more easily.
...END_QUOTE
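
The rule David describes can be sketched as a small userspace model. This is only an illustration under stated assumptions, not kernel code: `toy_reserve()`, `toy_page_is_reserved()` and `toy_released_to_buddy()` are hypothetical stand-ins for memblock_reserve(), the PG_reserved marking and the boot-time hand-off to the buddy allocator. The model assumes that a page frame overlapping any reserved range is treated as reserved and is never released.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SHIFT   12
#define TOY_PAGE_SIZE    (UINT64_C(1) << TOY_PAGE_SHIFT)
#define TOY_MAX_RESERVED 8

/* One reserved physical range: [base, base + size). */
struct toy_region {
	uint64_t base;
	uint64_t size;
};

static struct toy_region toy_reserved[TOY_MAX_RESERVED];
static size_t toy_nr_reserved;

/* Hypothetical stand-in for memblock_reserve(): record a physical range. */
static void toy_reserve(uint64_t base, uint64_t size)
{
	assert(toy_nr_reserved < TOY_MAX_RESERVED);
	toy_reserved[toy_nr_reserved].base = base;
	toy_reserved[toy_nr_reserved].size = size;
	toy_nr_reserved++;
}

/* Model of "page is PG_reserved": the frame overlaps a reserved range. */
static bool toy_page_is_reserved(uint64_t pfn)
{
	uint64_t start = pfn << TOY_PAGE_SHIFT;
	uint64_t end = start + TOY_PAGE_SIZE;

	for (size_t i = 0; i < toy_nr_reserved; i++) {
		uint64_t rs = toy_reserved[i].base;
		uint64_t re = rs + toy_reserved[i].size;

		if (rs < end && start < re)
			return true;
	}
	return false;
}

/* Model of the boot-time hand-off: only unreserved frames reach the buddy. */
static bool toy_released_to_buddy(uint64_t pfn)
{
	return !toy_page_is_reserved(pfn);
}
```

In this model, pfn 0xbe453 (the page named in the KASAN report) is handed to the buddy allocator unless the reservation is recorded before the hand-off happens, which matches the failure being chased in this thread.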

Your most recent patch has added the memblock_reserve(), but it's still 
missing the PG_reserved setting.

Thanks,
George

Mike Rapoport Feb. 25, 2021, 4:07 p.m. UTC | #32
On Thu, Feb 25, 2021 at 10:22:44AM -0500, George Kennedy wrote:
> 
> > > > > On 2/24/2021 5:37 AM, Mike Rapoport wrote:
>
> Applied just your latest patch, but same failure.
> 
> I thought there was an earlier comment (which I can't find now) that stated
> that memblock_reserve() wouldn't reserve the page, which is what's needed
> here.

Actually, I think that memblock_reserve() should be just fine, but it seems
I'm missing something in address calculation each time.

What would happen if you stuck

	memblock_reserve(0xbe453000, PAGE_SIZE);

say, at the beginning of find_ibft_region()?
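
For reference, the hard-coded address in that suggestion can be re-derived from the KASAN report. The sketch below is a userspace calculation only; it assumes the default x86-64 direct-map base of 0xffff888000000000 (4-level paging, no KASLR offset), and the `toy_*` helpers and `TOY_*` macros are hypothetical names, not kernel APIs.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_PAGE_SHIFT 12
#define TOY_PAGE_MASK  (~((UINT64_C(1) << TOY_PAGE_SHIFT) - 1))
/* Assumption: default x86-64 direct-map base, 4-level paging, no KASLR. */
#define TOY_PAGE_OFFSET UINT64_C(0xffff888000000000)

/* Direct-map virtual address -> physical address (assumed layout). */
static uint64_t toy_virt_to_phys(uint64_t vaddr)
{
	return vaddr - TOY_PAGE_OFFSET;
}

/* Round a physical address down to the start of its page. */
static uint64_t toy_page_base(uint64_t paddr)
{
	return paddr & TOY_PAGE_MASK;
}

/* Physical address -> page frame number. */
static uint64_t toy_pfn(uint64_t paddr)
{
	return paddr >> TOY_PAGE_SHIFT;
}

/* Re-derive the numbers printed in the KASAN report quoted in this thread. */
void toy_check_report(void)
{
	uint64_t phys = toy_virt_to_phys(UINT64_C(0xffff8880be453004));

	assert(phys == UINT64_C(0xbe453004));
	assert(toy_page_base(phys) == UINT64_C(0xbe453000));
	assert(toy_pfn(phys) == UINT64_C(0xbe453));
}
```

Under that assumption, the faulting read at ffff8880be453004 falls in the page starting at physical address 0xbe453000, which agrees with the pfn:0xbe453 line in the report.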
 
George Kennedy Feb. 25, 2021, 4:31 p.m. UTC | #33
On 2/25/2021 11:07 AM, Mike Rapoport wrote:
> On Thu, Feb 25, 2021 at 10:22:44AM -0500, George Kennedy wrote:
>>>>>> On 2/24/2021 5:37 AM, Mike Rapoport wrote:
>> Applied just your latest patch, but same failure.
>>
>> I thought there was an earlier comment (which I can't find now) that stated
>> that memblock_reserve() wouldn't reserve the page, which is what's needed
>> here.
> Actually, I think that memblock_reserve() should be just fine, but it seems
> I'm missing something in address calculation each time.
>
> What would happen if you stuck
>
> 	memblock_reserve(0xbe453000, PAGE_SIZE);
>
> say, at the beginning of find_ibft_region()?

Added debug to your patch and this is all that shows up. Looks like the 
patch is in the wrong place as acpi_tb_parse_root_table() is only called 
for the RSDP address.

[    0.064317] ACPI: Early table checksum verification disabled
[    0.065437] XXX acpi_tb_parse_root_table: rsdp_address=bfbfa014
[    0.066612] ACPI: RSDP 0x00000000BFBFA014 000024 (v02 BOCHS )
[    0.067759] ACPI: XSDT 0x00000000BFBF90E8 00004C (v01 BOCHS BXPCFACP 00000001      01000013)
[    0.069470] ACPI: FACP 0x00000000BFBF5000 000074 (v01 BOCHS BXPCFACP 00000001 BXPC 00000001)
[    0.071183] ACPI: DSDT 0x00000000BFBF6000 00238D (v01 BOCHS BXPCDSDT 00000001 BXPC 00000001)
[    0.072876] ACPI: FACS 0x00000000BFBFD000 000040
[    0.073806] ACPI: APIC 0x00000000BFBF4000 000090 (v01 BOCHS BXPCAPIC 00000001 BXPC 00000001)
[    0.075501] ACPI: HPET 0x00000000BFBF3000 000038 (v01 BOCHS BXPCHPET 00000001 BXPC 00000001)
[    0.077194] ACPI: BGRT 0x00000000BE49B000 000038 (v01 INTEL EDK2     00000002      01000013)
[    0.078880] ACPI: iBFT 0x00000000BE453000 000800 (v01 BOCHS BXPCFACP 00000000      00000000)
[    0.080588] ACPI: Local APIC address 0xfee00000

diff --git a/drivers/acpi/acpica/tbutils.c b/drivers/acpi/acpica/tbutils.c
index dfe1ac3..603b3a8 100644
--- a/drivers/acpi/acpica/tbutils.c
+++ b/drivers/acpi/acpica/tbutils.c
@@ -7,6 +7,8 @@
   *
*****************************************************************************/

+#include <linux/memblock.h>
+
  #include <acpi/acpi.h>
  #include "accommon.h"
  #include "actables.h"
@@ -232,6 +234,8 @@ struct acpi_table_header *acpi_tb_copy_dsdt(u32 table_index)
      acpi_status status;
      u32 table_index;

+printk(KERN_ERR "XXX acpi_tb_parse_root_table: rsdp_address=%llx\n", rsdp_address);
+
      ACPI_FUNCTION_TRACE(tb_parse_root_table);

      /* Map the entire RSDP and extract the address of the RSDT or XSDT */
@@ -339,6 +343,22 @@ struct acpi_table_header *acpi_tb_copy_dsdt(u32 table_index)
              acpi_tb_parse_fadt();
          }

+        if (ACPI_SUCCESS(status) &&
+            ACPI_COMPARE_NAMESEG(&acpi_gbl_root_table_list.
+                     tables[table_index].signature,
+                     ACPI_SIG_IBFT)) {
+            struct acpi_table_header *ibft;
+            struct acpi_table_desc *desc;
+
+            desc = &acpi_gbl_root_table_list.tables[table_index];
+            status = acpi_tb_get_table(desc, &ibft);
+            if (ACPI_SUCCESS(status)) {
+printk(KERN_ERR "XXX acpi_tb_parse_root_table(calling memblock_reserve()): address=%llx, ibft->length=%x\n", address, ibft->length);
+                memblock_reserve(address, ibft->length);
+                acpi_tb_put_table(desc);
+            }
+        }
+
  next_table:

          table_entry += table_entry_size;


>   
>> [   30.308229] iBFT detected..
>> [   30.308796]
>> ==================================================================
>> [   30.308890] BUG: KASAN: use-after-free in ibft_init+0x134/0xc33
>> [   30.308890] Read of size 4 at addr ffff8880be453004 by task swapper/0/1
>> [   30.308890]
>> [   30.308890] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 5.11.0-f9593a0 #12
>> [   30.308890] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
>> 0.0.0 02/06/2015
>> [   30.308890] Call Trace:
>> [   30.308890]  dump_stack+0xdb/0x120
>> [   30.308890]  ? ibft_init+0x134/0xc33
>> [   30.308890]  print_address_description.constprop.7+0x41/0x60
>> [   30.308890]  ? ibft_init+0x134/0xc33
>> [   30.308890]  ? ibft_init+0x134/0xc33
>> [   30.308890]  kasan_report.cold.10+0x78/0xd1
>> [   30.308890]  ? ibft_init+0x134/0xc33
>> [   30.308890]  __asan_report_load_n_noabort+0xf/0x20
>> [   30.308890]  ibft_init+0x134/0xc33
>> [   30.308890]  ? write_comp_data+0x2f/0x90
>> [   30.308890]  ? ibft_check_initiator_for+0x159/0x159
>> [   30.308890]  ? write_comp_data+0x2f/0x90
>> [   30.308890]  ? ibft_check_initiator_for+0x159/0x159
>> [   30.308890]  do_one_initcall+0xc4/0x3e0
>> [   30.308890]  ? perf_trace_initcall_level+0x3e0/0x3e0
>> [   30.308890]  ? unpoison_range+0x14/0x40
>> [   30.308890]  ? ____kasan_kmalloc.constprop.5+0x8f/0xc0
>> [   30.308890]  ? kernel_init_freeable+0x420/0x652
>> [   30.308890]  ? __kasan_kmalloc+0x9/0x10
>> [   30.308890]  ? __sanitizer_cov_trace_pc+0x21/0x50
>> [   30.308890]  kernel_init_freeable+0x596/0x652
>> [   30.308890]  ? console_on_rootfs+0x7d/0x7d
>> [   30.308890]  ? __sanitizer_cov_trace_pc+0x21/0x50
>> [   30.308890]  ? rest_init+0xf0/0xf0
>> [   30.308890]  kernel_init+0x16/0x1d0
>> [   30.308890]  ? rest_init+0xf0/0xf0
>> [   30.308890]  ret_from_fork+0x22/0x30
>> [   30.308890]
>> [   30.308890] The buggy address belongs to the page:
>> [   30.308890] page:0000000001b7b17c refcount:0 mapcount:0
>> mapping:0000000000000000 index:0x1 pfn:0xbe453
>> [   30.308890] flags: 0xfffffc0000000()
>> [   30.308890] raw: 000fffffc0000000 ffffea0002ef9788 ffffea0002f91488
>> 0000000000000000
>> [   30.308890] raw: 0000000000000001 0000000000000000 00000000ffffffff
>> 0000000000000000
>> [   30.308890] page dumped because: kasan: bad access detected
>> [   30.308890] page_owner tracks the page as freed
>> [   30.308890] page last allocated via order 0, migratetype Movable,
>> gfp_mask 0x100dca(GFP_HIGHUSER_MOVABLE|__GFP_ZERO), pid 204, ts 28121288605
>> [   30.308890]  prep_new_page+0xfb/0x140
>> [   30.308890]  get_page_from_freelist+0x3503/0x5730
>> [   30.308890]  __alloc_pages_nodemask+0x2d8/0x650
>> [   30.308890]  alloc_pages_vma+0xe2/0x560
>> [   30.308890]  __handle_mm_fault+0x930/0x26c0
>> [   30.308890]  handle_mm_fault+0x1f9/0x810
>> [   30.308890]  do_user_addr_fault+0x6f7/0xca0
>> [   30.308890]  exc_page_fault+0xaf/0x1a0
>> [   30.308890]  asm_exc_page_fault+0x1e/0x30
>> [   30.308890] page last free stack trace:
>> [   30.308890]  free_pcp_prepare+0x122/0x290
>> [   30.308890]  free_unref_page_list+0xe6/0x490
>> [   30.308890]  release_pages+0x2ed/0x1270
>> [   30.308890]  free_pages_and_swap_cache+0x245/0x2e0
>> [   30.308890]  tlb_flush_mmu+0x11e/0x680
>> [   30.308890]  tlb_finish_mmu+0xa6/0x3e0
>> [   30.308890]  exit_mmap+0x2b3/0x540
>> [   30.308890]  mmput+0x11d/0x450
>> [   30.308890]  do_exit+0xaa6/0x2d40
>> [   30.308890]  do_group_exit+0x128/0x340
>> [   30.308890]  __x64_sys_exit_group+0x43/0x50
>> [   30.308890]  do_syscall_64+0x37/0x50
>> [   30.308890]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
>> [   30.308890]
>> [   30.308890] Memory state around the buggy address:
>> [   30.308890]  ffff8880be452f00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff
>> ff ff
>> [   30.308890]  ffff8880be452f80: ff ff ff ff ff ff ff ff ff ff ff ff ff ff
>> ff ff
>> [   30.308890] >ffff8880be453000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff
>> ff ff
>> [   30.308890]                    ^
>> [   30.308890]  ffff8880be453080: ff ff ff ff ff ff ff ff ff ff ff ff ff ff
>> ff ff
>> [   30.308890]  ffff8880be453100: ff ff ff ff ff ff ff ff ff ff ff ff ff ff
>> ff ff
>> [   30.308890]
>> ==================================================================
>>
>> George
>>
David Hildenbrand Feb. 25, 2021, 5:23 p.m. UTC | #34
On 25.02.21 17:31, George Kennedy wrote:
> : rsdp_address=bfbfa014
> [    0.066612] ACPI: RSDP 0x00000000BFBFA014 000024 (v02 BOCHS )
> [    0.067759] ACPI: XSDT 0x00000000BFBF90E8 00004C (v01 BOCHS BXPCFACP
> 00000001      01000013)
> [    0.069470] ACPI: FACP 0x00000000BFBF5000 000074 (v01 BOCHS BXPCFACP
> 00000001 BXPC 00000001)
> [    0.071183] ACPI: DSDT 0x00000000BFBF6000 00238D (v01 BOCHS BXPCDSDT
> 00000001 BXPC 00000001)
> [    0.072876] ACPI: FACS 0x00000000BFBFD000 000040
> [    0.073806] ACPI: APIC 0x00000000BFBF4000 000090 (v01 BOCHS BXPCAPIC
> 00000001 BXPC 00000001)
> [    0.075501] ACPI: HPET 0x00000000BFBF3000 000038 (v01 BOCHS BXPCHPET
> 00000001 BXPC 00000001)
> [    0.077194] ACPI: BGRT 0x00000000BE49B000 000038 (v01 INTEL EDK2
> 00000002      01000013)
> [    0.078880] ACPI: iBFT 0x00000000BE453000 000800 (v01 BOCHS BXPCFACP
> 00000000      00000000)


Can you explore the relevant area using the page-types tool (in the 
Linux source tree at tools/vm/page-types.c)?


./page-types -L -r -a 0xbe490,0xbe4a0
George Kennedy Feb. 25, 2021, 5:33 p.m. UTC | #35
On 2/25/2021 11:07 AM, Mike Rapoport wrote:
> On Thu, Feb 25, 2021 at 10:22:44AM -0500, George Kennedy wrote:
>>>>>> On 2/24/2021 5:37 AM, Mike Rapoport wrote:
>> Applied just your latest patch, but same failure.
>>
>> I thought there was an earlier comment (which I can't find now) that stated
>> that memblock_reserve() wouldn't reserve the page, which is what's needed
>> here.
> Actually, I think that memblock_reserve() should be just fine, but it seems
> I'm missing something in address calculation each time.
>
> What would happen if you stuck
>
> 	memblock_reserve(0xbe453000, PAGE_SIZE);
>
> say, at the beginning of find_ibft_region()?

Good news, Mike!

The above hack, applied on top of yesterday's last patch, works: 10 
successful reboots. See "BE453" below for the hack.

I'll modify the patch to use "table_desc->address" instead, which is the 
physical address of the table.

diff --git a/arch/x86/kernel/acpi/boot.c b/arch/x86/kernel/acpi/boot.c
index 7bdc023..c118dd5 100644
--- a/arch/x86/kernel/acpi/boot.c
+++ b/arch/x86/kernel/acpi/boot.c
@@ -1551,6 +1551,7 @@ void __init acpi_boot_table_init(void)
      if (acpi_disabled)
          return;

+#if 0
      /*
       * Initialize the ACPI boot-time table parser.
       */
@@ -1558,6 +1559,7 @@ void __init acpi_boot_table_init(void)
          disable_acpi();
          return;
      }
+#endif

      acpi_table_parse(ACPI_SIG_BOOT, acpi_parse_sbf);

diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 740f3bdb..b045ab2 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -571,16 +571,6 @@ void __init reserve_standard_io_resources(void)

  }

-static __init void reserve_ibft_region(void)
-{
-    unsigned long addr, size = 0;
-
-    addr = find_ibft_region(&size);
-
-    if (size)
-        memblock_reserve(addr, size);
-}
-
  static bool __init snb_gfx_workaround_needed(void)
  {
  #ifdef CONFIG_PCI
@@ -1033,6 +1023,12 @@ void __init setup_arch(char **cmdline_p)
       */
      find_smp_config();

+    /*
+     * Initialize the ACPI boot-time table parser.
+     */
+    if (acpi_table_init())
+        disable_acpi();
+
      reserve_ibft_region();

      early_alloc_pgt_buf();
diff --git a/drivers/firmware/iscsi_ibft_find.c b/drivers/firmware/iscsi_ibft_find.c
index 64bb945..95fc1a6 100644
--- a/drivers/firmware/iscsi_ibft_find.c
+++ b/drivers/firmware/iscsi_ibft_find.c
@@ -47,7 +47,25 @@
  #define VGA_MEM 0xA0000 /* VGA buffer */
  #define VGA_SIZE 0x20000 /* 128kB */

-static int __init find_ibft_in_mem(void)
+static void __init *acpi_find_ibft_region(void)
+{
+    int i;
+    struct acpi_table_header *table = NULL;
+    acpi_status status;
+
+    if (acpi_disabled)
+        return NULL;
+
+    for (i = 0; i < ARRAY_SIZE(ibft_signs) && !ibft_addr; i++) {
+        status = acpi_get_table(ibft_signs[i].sign, 0, &table);
+        if (ACPI_SUCCESS(status))
+            return table;
+    }
+
+    return NULL;
+}
+
+static void __init *find_ibft_in_mem(void)
  {
      unsigned long pos;
      unsigned int len = 0;
@@ -70,35 +88,52 @@ static int __init find_ibft_in_mem(void)
                  /* if the length of the table extends past 1M,
                   * the table cannot be valid. */
                  if (pos + len <= (IBFT_END-1)) {
-                    ibft_addr = (struct acpi_table_ibft *)virt;
                      pr_info("iBFT found at 0x%lx.\n", pos);
-                    goto done;
+                    return virt;
                  }
              }
          }
      }
-done:
-    return len;
+
+    return NULL;
  }
+
+static void __init *find_ibft(void)
+{
+    /* iBFT 1.03 section 1.4.3.1 mandates that UEFI machines will
+     * only use ACPI for this */
+    if (!efi_enabled(EFI_BOOT))
+        return find_ibft_in_mem();
+    else
+        return acpi_find_ibft_region();
+}
+
  /*
   * Routine used to find the iSCSI Boot Format Table. The logical
   * kernel address is set in the ibft_addr global variable.
   */
-unsigned long __init find_ibft_region(unsigned long *sizep)
+void __init reserve_ibft_region(void)
  {
-    ibft_addr = NULL;
+    struct acpi_table_ibft *table;
+    unsigned long size;

-    /* iBFT 1.03 section 1.4.3.1 mandates that UEFI machines will
-     * only use ACPI for this */
+    table = find_ibft();
+    if (!table)
+        return;

-    if (!efi_enabled(EFI_BOOT))
-        find_ibft_in_mem();
-
-    if (ibft_addr) {
-        *sizep = PAGE_ALIGN(ibft_addr->header.length);
-        return (u64)virt_to_phys(ibft_addr);
-    }
+    size = PAGE_ALIGN(table->header.length);
+#if 0
+printk(KERN_ERR "XXX reserve_ibft_region: table=%llx, virt_to_phys(table)=%llx, size=%lx\n",
+    (u64)table, virt_to_phys(table), size);
+    memblock_reserve(virt_to_phys(table), size);
+#else
+printk(KERN_ERR "XXX reserve_ibft_region: table=%llx, 0x00000000BE453000, size=%lx\n",
+    (u64)table, size);
+    memblock_reserve(0x00000000BE453000, size);
+#endif

-    *sizep = 0;
-    return 0;
+    if (efi_enabled(EFI_BOOT))
+        acpi_put_table(&table->header);
+    else
+        ibft_addr = table;
  }
diff --git a/include/linux/iscsi_ibft.h b/include/linux/iscsi_ibft.h
index b7b45ca..da813c8 100644
--- a/include/linux/iscsi_ibft.h
+++ b/include/linux/iscsi_ibft.h
@@ -26,13 +26,9 @@
   * mapped address is set in the ibft_addr variable.
   */
  #ifdef CONFIG_ISCSI_IBFT_FIND
-unsigned long find_ibft_region(unsigned long *sizep);
+void reserve_ibft_region(void);
  #else
-static inline unsigned long find_ibft_region(unsigned long *sizep)
-{
-    *sizep = 0;
-    return 0;
-}
+static inline void reserve_ibft_region(void) {}
  #endif

  #endif /* ISCSI_IBFT_H */


Debug from the above:

[    0.020293] last_pfn = 0xbfedc max_arch_pfn = 0x400000000
[    0.050778] ACPI: Early table checksum verification disabled
[    0.056475] ACPI: RSDP 0x00000000BFBFA014 000024 (v02 BOCHS )
[    0.057628] ACPI: XSDT 0x00000000BFBF90E8 00004C (v01 BOCHS BXPCFACP 00000001      01000013)
[    0.059341] ACPI: FACP 0x00000000BFBF5000 000074 (v01 BOCHS BXPCFACP 00000001 BXPC 00000001)
[    0.061043] ACPI: DSDT 0x00000000BFBF6000 00238D (v01 BOCHS BXPCDSDT 00000001 BXPC 00000001)
[    0.062740] ACPI: FACS 0x00000000BFBFD000 000040
[    0.063673] ACPI: APIC 0x00000000BFBF4000 000090 (v01 BOCHS BXPCAPIC 00000001 BXPC 00000001)
[    0.065369] ACPI: HPET 0x00000000BFBF3000 000038 (v01 BOCHS BXPCHPET 00000001 BXPC 00000001)
[    0.067061] ACPI: BGRT 0x00000000BE49B000 000038 (v01 INTEL EDK2     00000002      01000013)
[    0.068761] ACPI: iBFT 0x00000000BE453000 000800 (v01 BOCHS BXPCFACP 00000000      00000000)
[    0.070461] XXX reserve_ibft_region: table=ffffffffff240000, 0x00000000BE453000, size=1000
[    0.072231] check: Scanning 1 areas for low memory corruption

George
Mike Rapoport Feb. 25, 2021, 5:41 p.m. UTC | #36
On Thu, Feb 25, 2021 at 06:23:24PM +0100, David Hildenbrand wrote:
> On 25.02.21 17:31, George Kennedy wrote:
> > : rsdp_address=bfbfa014
> > [    0.066612] ACPI: RSDP 0x00000000BFBFA014 000024 (v02 BOCHS )
> > [    0.067759] ACPI: XSDT 0x00000000BFBF90E8 00004C (v01 BOCHS BXPCFACP
> > 00000001      01000013)
> > [    0.069470] ACPI: FACP 0x00000000BFBF5000 000074 (v01 BOCHS BXPCFACP
> > 00000001 BXPC 00000001)
> > [    0.071183] ACPI: DSDT 0x00000000BFBF6000 00238D (v01 BOCHS BXPCDSDT
> > 00000001 BXPC 00000001)
> > [    0.072876] ACPI: FACS 0x00000000BFBFD000 000040
> > [    0.073806] ACPI: APIC 0x00000000BFBF4000 000090 (v01 BOCHS BXPCAPIC
> > 00000001 BXPC 00000001)
> > [    0.075501] ACPI: HPET 0x00000000BFBF3000 000038 (v01 BOCHS BXPCHPET
> > 00000001 BXPC 00000001)
> > [    0.077194] ACPI: BGRT 0x00000000BE49B000 000038 (v01 INTEL EDK2
> > 00000002      01000013)
> > [    0.078880] ACPI: iBFT 0x00000000BE453000 000800 (v01 BOCHS BXPCFACP
> > 00000000      00000000)
> 
> 
> Can you explore the relevant area using the page-types tool (in the
> Linux source tree at tools/vm/page-types.c)?
> 
> 
> ./page-types -L -r -a 0xbe490,0xbe4a0

These are not iBFT; they are "ACPI data", so they should get PG_reserved
set in init_unavailable_mem().


[    0.000000] BIOS-e820: [mem 0x0000000000808000-0x000000000080ffff] usable
[    0.000000] BIOS-e820: [mem 0x0000000000810000-0x00000000008fffff] ACPI NVS
[    0.000000] BIOS-e820: [mem 0x0000000000900000-0x00000000be49afff] usable

                               ^ iBFT@0xbe453 lives here ^ 

And it should be a normal page: it is in "usable" memory and nothing
reserves it at boot, so there is no reason it won't be freed to the buddy
allocator.

If the iBFT were in low memory (<1M), it would have been reserved by
reserve_ibft_region(), but with ACPI any block the BIOS does not mark as
"ACPI something" is treated like normal memory, and nothing reserves it.

So we do need to memblock_reserve() the iBFT region, but I still couldn't
find the right place to get its address without duplicating the ACPI table
parsing :(

[    0.000000] BIOS-e820: [mem 0x00000000be49b000-0x00000000be49bfff] ACPI data
Mike Rapoport Feb. 25, 2021, 5:50 p.m. UTC | #37
On Thu, Feb 25, 2021 at 11:31:04AM -0500, George Kennedy wrote:
> 
> 
> On 2/25/2021 11:07 AM, Mike Rapoport wrote:
> > On Thu, Feb 25, 2021 at 10:22:44AM -0500, George Kennedy wrote:
> > > > > > > On 2/24/2021 5:37 AM, Mike Rapoport wrote:
> > > Applied just your latest patch, but same failure.
> > > 
> > > I thought there was an earlier comment (which I can't find now) that stated
> > > that memblock_reserve() wouldn't reserve the page, which is what's needed
> > > here.
> > Actually, I think that memblock_reserve() should be just fine, but it seems
> > I'm missing something in address calculation each time.
> > 
> > What would happen if you stuck
> > 
> > 	memblock_reserve(0xbe453000, PAGE_SIZE);
> > 
> > say, at the beginning of find_ibft_region()?
> 
> Added debug to your patch and this is all that shows up. Looks like the
> patch is in the wrong place as acpi_tb_parse_root_table() is only called for
> the RSDP address.

Right, but I think it parses the descriptors of the other tables and
populates the local table list with them.
I think the problem is in how I compare the signatures; please see below.

> [    0.064317] ACPI: Early table checksum verification disabled
> [    0.065437] XXX acpi_tb_parse_root_table: rsdp_address=bfbfa014
> [    0.066612] ACPI: RSDP 0x00000000BFBFA014 000024 (v02 BOCHS )
> [    0.067759] ACPI: XSDT 0x00000000BFBF90E8 00004C (v01 BOCHS BXPCFACP
> 00000001      01000013)
> [    0.069470] ACPI: FACP 0x00000000BFBF5000 000074 (v01 BOCHS BXPCFACP
> 00000001 BXPC 00000001)
> [    0.071183] ACPI: DSDT 0x00000000BFBF6000 00238D (v01 BOCHS BXPCDSDT
> 00000001 BXPC 00000001)
> [    0.072876] ACPI: FACS 0x00000000BFBFD000 000040
> [    0.073806] ACPI: APIC 0x00000000BFBF4000 000090 (v01 BOCHS BXPCAPIC
> 00000001 BXPC 00000001)
> [    0.075501] ACPI: HPET 0x00000000BFBF3000 000038 (v01 BOCHS BXPCHPET
> 00000001 BXPC 00000001)
> [    0.077194] ACPI: BGRT 0x00000000BE49B000 000038 (v01 INTEL EDK2    
> 00000002      01000013)
> [    0.078880] ACPI: iBFT 0x00000000BE453000 000800 (v01 BOCHS BXPCFACP
> 00000000      00000000)
> [    0.080588] ACPI: Local APIC address 0xfee00000
> 
> diff --git a/drivers/acpi/acpica/tbutils.c b/drivers/acpi/acpica/tbutils.c
> index dfe1ac3..603b3a8 100644
> --- a/drivers/acpi/acpica/tbutils.c
> +++ b/drivers/acpi/acpica/tbutils.c
> @@ -7,6 +7,8 @@
>   *
> *****************************************************************************/
> 
> +#include <linux/memblock.h>
> +
>  #include <acpi/acpi.h>
>  #include "accommon.h"
>  #include "actables.h"
> @@ -232,6 +234,8 @@ struct acpi_table_header *acpi_tb_copy_dsdt(u32
> table_index)
>      acpi_status status;
>      u32 table_index;
> 
> +printk(KERN_ERR "XXX acpi_tb_parse_root_table: rsdp_address=%llx\n",
> rsdp_address);
> +
>      ACPI_FUNCTION_TRACE(tb_parse_root_table);
> 
>      /* Map the entire RSDP and extract the address of the RSDT or XSDT */
> @@ -339,6 +343,22 @@ struct acpi_table_header *acpi_tb_copy_dsdt(u32
> table_index)
>              acpi_tb_parse_fadt();
>          }
> 
> +        if (ACPI_SUCCESS(status) &&
> +            ACPI_COMPARE_NAMESEG(&acpi_gbl_root_table_list.
> +                     tables[table_index].signature,
> +                     ACPI_SIG_IBFT)) {

We have:

include/acpi/actbl1.h:#define ACPI_SIG_IBFT           "IBFT"    /* iSCSI Boot Firmware Table */

and the BIOS uses "iBFT", so we need to loop over the possible signature
variants like iscsi_ibft_find does.

Do you mind replacing ACPI_SIG_IBFT with "iBFT" and trying again?

> +            struct acpi_table_header *ibft;
> +            struct acpi_table_desc *desc;
> +
> +            desc = &acpi_gbl_root_table_list.tables[table_index];
> +            status = acpi_tb_get_table(desc, &ibft);
> +            if (ACPI_SUCCESS(status)) {
> +printk(KERN_ERR "XXX acpi_tb_parse_root_table(calling memblock_reserve()):
> addres=%llx, ibft->length=%x\n", address, ibft->length);
> +                memblock_reserve(address, ibft->length);
> +                acpi_tb_put_table(desc);
> +            }
> +        }
> +
>  next_table:
> 
>          table_entry += table_entry_size;
> 
George Kennedy Feb. 26, 2021, 1:19 a.m. UTC | #38
On 2/25/2021 12:33 PM, George Kennedy wrote:
>
>
> On 2/25/2021 11:07 AM, Mike Rapoport wrote:
>> On Thu, Feb 25, 2021 at 10:22:44AM -0500, George Kennedy wrote:
>>>>>>> On 2/24/2021 5:37 AM, Mike Rapoport wrote:
>>> Applied just your latest patch, but same failure.
>>>
>>> I thought there was an earlier comment (which I can't find now) that 
>>> stated
>>> that memblock_reserve() wouldn't reserve the page, which is what's 
>>> needed
>>> here.
>> Actually, I think that memblock_reserve() should be just fine, but it 
>> seems
>> I'm missing something in address calculation each time.
>>
>> What would happen if you stuck
>>
>>     memblock_reserve(0xbe453000, PAGE_SIZE);
>>
>> say, at the beginning of find_ibft_region()?
>
> Good news Mike!
>
> The above hack in yesterday's last patch works - 10 successful 
> reboots. See: "BE453" below for the hack.
>
> I'll modify the patch to use "table_desc->address" instead, which is 
> the physical address of the table.
>
> diff --git a/arch/x86/kernel/acpi/boot.c b/arch/x86/kernel/acpi/boot.c
> index 7bdc023..c118dd5 100644
> --- a/arch/x86/kernel/acpi/boot.c
> +++ b/arch/x86/kernel/acpi/boot.c
> @@ -1551,6 +1551,7 @@ void __init acpi_boot_table_init(void)
>      if (acpi_disabled)
>          return;
>
> +#if 0
>      /*
>       * Initialize the ACPI boot-time table parser.
>       */
> @@ -1558,6 +1559,7 @@ void __init acpi_boot_table_init(void)
>          disable_acpi();
>          return;
>      }
> +#endif
>
>      acpi_table_parse(ACPI_SIG_BOOT, acpi_parse_sbf);
>
> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> index 740f3bdb..b045ab2 100644
> --- a/arch/x86/kernel/setup.c
> +++ b/arch/x86/kernel/setup.c
> @@ -571,16 +571,6 @@ void __init reserve_standard_io_resources(void)
>
>  }
>
> -static __init void reserve_ibft_region(void)
> -{
> -    unsigned long addr, size = 0;
> -
> -    addr = find_ibft_region(&size);
> -
> -    if (size)
> -        memblock_reserve(addr, size);
> -}
> -
>  static bool __init snb_gfx_workaround_needed(void)
>  {
>  #ifdef CONFIG_PCI
> @@ -1033,6 +1023,12 @@ void __init setup_arch(char **cmdline_p)
>       */
>      find_smp_config();
>
> +    /*
> +     * Initialize the ACPI boot-time table parser.
> +     */
> +    if (acpi_table_init())
> +        disable_acpi();
> +
>      reserve_ibft_region();
>
>      early_alloc_pgt_buf();
> diff --git a/drivers/firmware/iscsi_ibft_find.c 
> b/drivers/firmware/iscsi_ibft_find.c
> index 64bb945..95fc1a6 100644
> --- a/drivers/firmware/iscsi_ibft_find.c
> +++ b/drivers/firmware/iscsi_ibft_find.c
> @@ -47,7 +47,25 @@
>  #define VGA_MEM 0xA0000 /* VGA buffer */
>  #define VGA_SIZE 0x20000 /* 128kB */
>
> -static int __init find_ibft_in_mem(void)
> +static void __init *acpi_find_ibft_region(void)
> +{
> +    int i;
> +    struct acpi_table_header *table = NULL;
> +    acpi_status status;
> +
> +    if (acpi_disabled)
> +        return NULL;
> +
> +    for (i = 0; i < ARRAY_SIZE(ibft_signs) && !ibft_addr; i++) {
> +        status = acpi_get_table(ibft_signs[i].sign, 0, &table);
> +        if (ACPI_SUCCESS(status))
> +            return table;
> +    }
> +
> +    return NULL;
> +}
> +
> +static void __init *find_ibft_in_mem(void)
>  {
>      unsigned long pos;
>      unsigned int len = 0;
> @@ -70,35 +88,52 @@ static int __init find_ibft_in_mem(void)
>                  /* if the length of the table extends past 1M,
>                   * the table cannot be valid. */
>                  if (pos + len <= (IBFT_END-1)) {
> -                    ibft_addr = (struct acpi_table_ibft *)virt;
>                      pr_info("iBFT found at 0x%lx.\n", pos);
> -                    goto done;
> +                    return virt;
>                  }
>              }
>          }
>      }
> -done:
> -    return len;
> +
> +    return NULL;
>  }
> +
> +static void __init *find_ibft(void)
> +{
> +    /* iBFT 1.03 section 1.4.3.1 mandates that UEFI machines will
> +     * only use ACPI for this */
> +    if (!efi_enabled(EFI_BOOT))
> +        return find_ibft_in_mem();
> +    else
> +        return acpi_find_ibft_region();
> +}
> +
>  /*
>   * Routine used to find the iSCSI Boot Format Table. The logical
>   * kernel address is set in the ibft_addr global variable.
>   */
> -unsigned long __init find_ibft_region(unsigned long *sizep)
> +void __init reserve_ibft_region(void)
>  {
> -    ibft_addr = NULL;
> +    struct acpi_table_ibft *table;
> +    unsigned long size;
>
> -    /* iBFT 1.03 section 1.4.3.1 mandates that UEFI machines will
> -     * only use ACPI for this */
> +    table = find_ibft();
> +    if (!table)
> +        return;
>
> -    if (!efi_enabled(EFI_BOOT))
> -        find_ibft_in_mem();
> -
> -    if (ibft_addr) {
> -        *sizep = PAGE_ALIGN(ibft_addr->header.length);
> -        return (u64)virt_to_phys(ibft_addr);
> -    }
> +    size = PAGE_ALIGN(table->header.length);
> +#if 0
> +printk(KERN_ERR "XXX reserve_ibft_region: table=%llx, 
> virt_to_phys(table)=%llx, size=%lx\n",
> +    (u64)table, virt_to_phys(table), size);
> +    memblock_reserve(virt_to_phys(table), size);
> +#else
> +printk(KERN_ERR "XXX reserve_ibft_region: table=%llx, 
> 0x00000000BE453000, size=%lx\n",
> +    (u64)table, size);
> +    memblock_reserve(0x00000000BE453000, size);
> +#endif
>
> -    *sizep = 0;
> -    return 0;
> +    if (efi_enabled(EFI_BOOT))
> +        acpi_put_table(&table->header);
> +    else
> +        ibft_addr = table;
>  }
> diff --git a/include/linux/iscsi_ibft.h b/include/linux/iscsi_ibft.h
> index b7b45ca..da813c8 100644
> --- a/include/linux/iscsi_ibft.h
> +++ b/include/linux/iscsi_ibft.h
> @@ -26,13 +26,9 @@
>   * mapped address is set in the ibft_addr variable.
>   */
>  #ifdef CONFIG_ISCSI_IBFT_FIND
> -unsigned long find_ibft_region(unsigned long *sizep);
> +void reserve_ibft_region(void);
>  #else
> -static inline unsigned long find_ibft_region(unsigned long *sizep)
> -{
> -    *sizep = 0;
> -    return 0;
> -}
> +static inline void reserve_ibft_region(void) {}
>  #endif
>
>  #endif /* ISCSI_IBFT_H */

Mike,

To get rid of the hardcoded 0x00000000BE453000, I added the following 
patch on top of your patch above to get the iBFT table "address" to pass 
to memblock_reserve():

diff --git a/drivers/acpi/acpica/tbfind.c b/drivers/acpi/acpica/tbfind.c
index 56d81e4..4bc7bf3 100644
--- a/drivers/acpi/acpica/tbfind.c
+++ b/drivers/acpi/acpica/tbfind.c
@@ -120,3 +120,34 @@
      (void)acpi_ut_release_mutex(ACPI_MTX_TABLES);
      return_ACPI_STATUS(status);
  }
+
+acpi_physical_address
+acpi_tb_find_table_address(char *signature)
+{
+    acpi_physical_address address = 0;
+    struct acpi_table_desc *table_desc;
+    int i;
+
+    ACPI_FUNCTION_TRACE(tb_find_table_address);
+
+printk(KERN_ERR "XXX acpi_tb_find_table_address: signature=%s\n", signature);
+
+    (void)acpi_ut_acquire_mutex(ACPI_MTX_TABLES);
+    for (i = 0; i < acpi_gbl_root_table_list.current_table_count; ++i) {
+        if (memcmp(&(acpi_gbl_root_table_list.tables[i].signature),
+               signature, ACPI_NAMESEG_SIZE)) {
+
+            /* Not the requested table */
+
+            continue;
+        }
+
+        /* Table with matching signature has been found */
+        table_desc = &acpi_gbl_root_table_list.tables[i];
+        address = table_desc->address;
+    }
+
+    (void)acpi_ut_release_mutex(ACPI_MTX_TABLES);
+printk(KERN_ERR "XXX acpi_tb_find_table_address(EXIT): address=%llx\n", address);
+    return address;
+}
diff --git a/drivers/firmware/iscsi_ibft_find.c b/drivers/firmware/iscsi_ibft_find.c
index 95fc1a6..0de70b4 100644
--- a/drivers/firmware/iscsi_ibft_find.c
+++ b/drivers/firmware/iscsi_ibft_find.c
@@ -28,6 +28,8 @@

  #include <asm/mmzone.h>

+extern acpi_physical_address acpi_tb_find_table_address(char *signature);
+
  /*
   * Physical location of iSCSI Boot Format Table.
   */
@@ -116,24 +118,32 @@ void __init reserve_ibft_region(void)
  {
      struct acpi_table_ibft *table;
      unsigned long size;
+    acpi_physical_address address;

      table = find_ibft();
      if (!table)
          return;

      size = PAGE_ALIGN(table->header.length);
+    address = acpi_tb_find_table_address(table->header.signature);
  #if 0
  printk(KERN_ERR "XXX reserve_ibft_region: table=%llx, virt_to_phys(table)=%llx, size=%lx\n",
      (u64)table, virt_to_phys(table), size);
      memblock_reserve(virt_to_phys(table), size);
  #else
-printk(KERN_ERR "XXX reserve_ibft_region: table=%llx, 0x00000000BE453000, size=%lx\n",
-    (u64)table, size);
-    memblock_reserve(0x00000000BE453000, size);
+printk(KERN_ERR "XXX reserve_ibft_region: table=%llx, address=%llx, size=%lx\n",
+    (u64)table, address, size);
+    if (address)
+        memblock_reserve(address, size);
+    else
+        printk(KERN_ERR "%s: Can't find table address\n", __func__);
  #endif

-    if (efi_enabled(EFI_BOOT))
+    if (efi_enabled(EFI_BOOT)) {
+printk(KERN_ERR "XXX reserve_ibft_region: calling acpi_put_table(%llx)\n", (u64)&table->header);
          acpi_put_table(&table->header);
-    else
+    } else {
          ibft_addr = table;
+printk(KERN_ERR "XXX reserve_ibft_region: ibft_addr=%llx\n", (u64)ibft_addr);
+    }
  }

Debug from the above:
[    0.050646] ACPI: Early table checksum verification disabled
[    0.051778] ACPI: RSDP 0x00000000BFBFA014 000024 (v02 BOCHS )
[    0.052922] ACPI: XSDT 0x00000000BFBF90E8 00004C (v01 BOCHS BXPCFACP 00000001      01000013)
[    0.054623] ACPI: FACP 0x00000000BFBF5000 000074 (v01 BOCHS BXPCFACP 00000001 BXPC 00000001)
[    0.056326] ACPI: DSDT 0x00000000BFBF6000 00238D (v01 BOCHS BXPCDSDT 00000001 BXPC 00000001)
[    0.058016] ACPI: FACS 0x00000000BFBFD000 000040
[    0.058940] ACPI: APIC 0x00000000BFBF4000 000090 (v01 BOCHS BXPCAPIC 00000001 BXPC 00000001)
[    0.060627] ACPI: HPET 0x00000000BFBF3000 000038 (v01 BOCHS BXPCHPET 00000001 BXPC 00000001)
[    0.062304] ACPI: BGRT 0x00000000BE49B000 000038 (v01 INTEL EDK2     00000002      01000013)
[    0.063987] ACPI: iBFT 0x00000000BE453000 000800 (v01 BOCHS BXPCFACP 00000000      00000000)
[    0.065683] XXX acpi_tb_find_table_address: signature=iBFT
[    0.066754] XXX acpi_tb_find_table_address(EXIT): address=be453000
[    0.067959] XXX reserve_ibft_region: table=ffffffffff240000, address=be453000, size=1000
[    0.069534] XXX reserve_ibft_region: calling acpi_put_table(ffffffffff240000)

I'm not sure this is the right thing to do, but I added
"acpi_tb_find_table_address()" to return the physical address of a table
for use with memblock_reserve().
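
For illustration only, the lookup can be modelled in userspace as below.
This is a minimal sketch, not ACPICA code: struct toy_table_desc,
toy_tables[] and toy_find_table_address() are hypothetical stand-ins for
the root table list, filled in with a few entries from the boot log
above. Unlike the loop in the patch, which keeps scanning after a match
and so returns the last matching entry, the sketch stops at the first
match.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TOY_NAMESEG_SIZE 4	/* ACPI signatures are four characters */

/* Hypothetical stand-in for one entry of the root table list. */
struct toy_table_desc {
	char signature[TOY_NAMESEG_SIZE];	/* not NUL-terminated */
	uint64_t address;			/* physical address of the table */
};

/* A few entries copied from the boot log above. */
static const struct toy_table_desc toy_tables[] = {
	{ { 'F', 'A', 'C', 'P' }, 0xBFBF5000ULL },
	{ { 'D', 'S', 'D', 'T' }, 0xBFBF6000ULL },
	{ { 'B', 'G', 'R', 'T' }, 0xBE49B000ULL },
	{ { 'i', 'B', 'F', 'T' }, 0xBE453000ULL },
};

/* Return the physical address of the first table whose signature matches,
 * or 0 if there is none (0 means "not found", as in the patch). */
static uint64_t toy_find_table_address(const char *signature)
{
	size_t i;

	assert(signature != NULL);

	for (i = 0; i < sizeof(toy_tables) / sizeof(toy_tables[0]); i++) {
		/* Compare exactly four bytes; signatures carry no terminator. */
		if (memcmp(toy_tables[i].signature, signature,
			   TOY_NAMESEG_SIZE) == 0)
			return toy_tables[i].address;
	}

	return 0;
}
```

With these entries, looking up "iBFT" yields 0xBE453000, the same
address the debug output reports.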

virt_to_phys(table) does not seem to return the physical address of the 
iBFT table (it would be nice if struct acpi_table_header also had an 
"address" element holding the physical address of the table).

Ran 10 boots with the above without failure.

George
>
>
> Debug from the above:
>
> [    0.020293] last_pfn = 0xbfedc max_arch_pfn = 0x400000000
> [    0.050778] ACPI: Early table checksum verification disabled
> [    0.056475] ACPI: RSDP 0x00000000BFBFA014 000024 (v02 BOCHS )
> [    0.057628] ACPI: XSDT 0x00000000BFBF90E8 00004C (v01 BOCHS 
> BXPCFACP 00000001      01000013)
> [    0.059341] ACPI: FACP 0x00000000BFBF5000 000074 (v01 BOCHS 
> BXPCFACP 00000001 BXPC 00000001)
> [    0.061043] ACPI: DSDT 0x00000000BFBF6000 00238D (v01 BOCHS 
> BXPCDSDT 00000001 BXPC 00000001)
> [    0.062740] ACPI: FACS 0x00000000BFBFD000 000040
> [    0.063673] ACPI: APIC 0x00000000BFBF4000 000090 (v01 BOCHS 
> BXPCAPIC 00000001 BXPC 00000001)
> [    0.065369] ACPI: HPET 0x00000000BFBF3000 000038 (v01 BOCHS 
> BXPCHPET 00000001 BXPC 00000001)
> [    0.067061] ACPI: BGRT 0x00000000BE49B000 000038 (v01 INTEL 
> EDK2     00000002      01000013)
> [    0.068761] ACPI: iBFT 0x00000000BE453000 000800 (v01 BOCHS 
> BXPCFACP 00000000      00000000)
> [    0.070461] XXX reserve_ibft_region: table=ffffffffff240000, 
> 0x00000000BE453000, size=1000
> [    0.072231] check: Scanning 1 areas for low memory corruption
>
> George
>>> [   30.308229] iBFT detected..
>>> [   30.308796]
>>> ==================================================================
>>> [   30.308890] BUG: KASAN: use-after-free in ibft_init+0x134/0xc33
>>> [   30.308890] Read of size 4 at addr ffff8880be453004 by task 
>>> swapper/0/1
>>> [   30.308890]
>>> [   30.308890] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 
>>> 5.11.0-f9593a0 #12
>>> [   30.308890] Hardware name: QEMU Standard PC (i440FX + PIIX, 
>>> 1996), BIOS
>>> 0.0.0 02/06/2015
>>> [   30.308890] Call Trace:
>>> [   30.308890]  dump_stack+0xdb/0x120
>>> [   30.308890]  ? ibft_init+0x134/0xc33
>>> [   30.308890] print_address_description.constprop.7+0x41/0x60
>>> [   30.308890]  ? ibft_init+0x134/0xc33
>>> [   30.308890]  ? ibft_init+0x134/0xc33
>>> [   30.308890]  kasan_report.cold.10+0x78/0xd1
>>> [   30.308890]  ? ibft_init+0x134/0xc33
>>> [   30.308890]  __asan_report_load_n_noabort+0xf/0x20
>>> [   30.308890]  ibft_init+0x134/0xc33
>>> [   30.308890]  ? write_comp_data+0x2f/0x90
>>> [   30.308890]  ? ibft_check_initiator_for+0x159/0x159
>>> [   30.308890]  ? write_comp_data+0x2f/0x90
>>> [   30.308890]  ? ibft_check_initiator_for+0x159/0x159
>>> [   30.308890]  do_one_initcall+0xc4/0x3e0
>>> [   30.308890]  ? perf_trace_initcall_level+0x3e0/0x3e0
>>> [   30.308890]  ? unpoison_range+0x14/0x40
>>> [   30.308890]  ? ____kasan_kmalloc.constprop.5+0x8f/0xc0
>>> [   30.308890]  ? kernel_init_freeable+0x420/0x652
>>> [   30.308890]  ? __kasan_kmalloc+0x9/0x10
>>> [   30.308890]  ? __sanitizer_cov_trace_pc+0x21/0x50
>>> [   30.308890]  kernel_init_freeable+0x596/0x652
>>> [   30.308890]  ? console_on_rootfs+0x7d/0x7d
>>> [   30.308890]  ? __sanitizer_cov_trace_pc+0x21/0x50
>>> [   30.308890]  ? rest_init+0xf0/0xf0
>>> [   30.308890]  kernel_init+0x16/0x1d0
>>> [   30.308890]  ? rest_init+0xf0/0xf0
>>> [   30.308890]  ret_from_fork+0x22/0x30
>>> [   30.308890]
>>> [   30.308890] The buggy address belongs to the page:
>>> [   30.308890] page:0000000001b7b17c refcount:0 mapcount:0
>>> mapping:0000000000000000 index:0x1 pfn:0xbe453
>>> [   30.308890] flags: 0xfffffc0000000()
>>> [   30.308890] raw: 000fffffc0000000 ffffea0002ef9788 ffffea0002f91488
>>> 0000000000000000
>>> [   30.308890] raw: 0000000000000001 0000000000000000 00000000ffffffff
>>> 0000000000000000
>>> [   30.308890] page dumped because: kasan: bad access detected
>>> [   30.308890] page_owner tracks the page as freed
>>> [   30.308890] page last allocated via order 0, migratetype Movable,
>>> gfp_mask 0x100dca(GFP_HIGHUSER_MOVABLE|__GFP_ZERO), pid 204, ts 
>>> 28121288605
>>> [   30.308890]  prep_new_page+0xfb/0x140
>>> [   30.308890]  get_page_from_freelist+0x3503/0x5730
>>> [   30.308890]  __alloc_pages_nodemask+0x2d8/0x650
>>> [   30.308890]  alloc_pages_vma+0xe2/0x560
>>> [   30.308890]  __handle_mm_fault+0x930/0x26c0
>>> [   30.308890]  handle_mm_fault+0x1f9/0x810
>>> [   30.308890]  do_user_addr_fault+0x6f7/0xca0
>>> [   30.308890]  exc_page_fault+0xaf/0x1a0
>>> [   30.308890]  asm_exc_page_fault+0x1e/0x30
>>> [   30.308890] page last free stack trace:
>>> [   30.308890]  free_pcp_prepare+0x122/0x290
>>> [   30.308890]  free_unref_page_list+0xe6/0x490
>>> [   30.308890]  release_pages+0x2ed/0x1270
>>> [   30.308890]  free_pages_and_swap_cache+0x245/0x2e0
>>> [   30.308890]  tlb_flush_mmu+0x11e/0x680
>>> [   30.308890]  tlb_finish_mmu+0xa6/0x3e0
>>> [   30.308890]  exit_mmap+0x2b3/0x540
>>> [   30.308890]  mmput+0x11d/0x450
>>> [   30.308890]  do_exit+0xaa6/0x2d40
>>> [   30.308890]  do_group_exit+0x128/0x340
>>> [   30.308890]  __x64_sys_exit_group+0x43/0x50
>>> [   30.308890]  do_syscall_64+0x37/0x50
>>> [   30.308890]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
>>> [   30.308890]
>>> [   30.308890] Memory state around the buggy address:
>>> [   30.308890]  ffff8880be452f00: ff ff ff ff ff ff ff ff ff ff ff 
>>> ff ff ff
>>> ff ff
>>> [   30.308890]  ffff8880be452f80: ff ff ff ff ff ff ff ff ff ff ff 
>>> ff ff ff
>>> ff ff
>>> [   30.308890] >ffff8880be453000: ff ff ff ff ff ff ff ff ff ff ff 
>>> ff ff ff
>>> ff ff
>>> [   30.308890]                    ^
>>> [   30.308890]  ffff8880be453080: ff ff ff ff ff ff ff ff ff ff ff 
>>> ff ff ff
>>> ff ff
>>> [   30.308890]  ffff8880be453100: ff ff ff ff ff ff ff ff ff ff ff 
>>> ff ff ff
>>> ff ff
>>> [   30.308890]
>>> ==================================================================
>>>
>>> George
>>>
>
Mike Rapoport Feb. 26, 2021, 11:17 a.m. UTC | #39
Hi George,

On Thu, Feb 25, 2021 at 08:19:18PM -0500, George Kennedy wrote:
> 
> Mike,
> 
> To get rid of the 0x00000000BE453000 hardcoding, I added the following patch
> to your above patch to get the iBFT table "address" to use with
> memblock_reserve():
> 
> diff --git a/drivers/acpi/acpica/tbfind.c b/drivers/acpi/acpica/tbfind.c
> index 56d81e4..4bc7bf3 100644
> --- a/drivers/acpi/acpica/tbfind.c
> +++ b/drivers/acpi/acpica/tbfind.c
> @@ -120,3 +120,34 @@
>      (void)acpi_ut_release_mutex(ACPI_MTX_TABLES);
>      return_ACPI_STATUS(status);
>  }
> +
> +acpi_physical_address
> +acpi_tb_find_table_address(char *signature)
> +{
> +    acpi_physical_address address = 0;
> +    struct acpi_table_desc *table_desc;
> +    int i;
> +
> +    ACPI_FUNCTION_TRACE(tb_find_table_address);
> +
> +printk(KERN_ERR "XXX acpi_tb_find_table_address: signature=%s\n",
> signature);
> +
> +    (void)acpi_ut_acquire_mutex(ACPI_MTX_TABLES);
> +    for (i = 0; i < acpi_gbl_root_table_list.current_table_count; ++i) {
> +        if (memcmp(&(acpi_gbl_root_table_list.tables[i].signature),
> +               signature, ACPI_NAMESEG_SIZE)) {
> +
> +            /* Not the requested table */
> +
> +            continue;
> +        }
> +
> +        /* Table with matching signature has been found */
> +        table_desc = &acpi_gbl_root_table_list.tables[i];
> +        address = table_desc->address;
> +    }
> +
> +    (void)acpi_ut_release_mutex(ACPI_MTX_TABLES);
> +printk(KERN_ERR "XXX acpi_tb_find_table_address(EXIT): address=%llx\n",
> address);
> +    return address;
> +}
> diff --git a/drivers/firmware/iscsi_ibft_find.c
> b/drivers/firmware/iscsi_ibft_find.c
> index 95fc1a6..0de70b4 100644
> --- a/drivers/firmware/iscsi_ibft_find.c
> +++ b/drivers/firmware/iscsi_ibft_find.c
> @@ -28,6 +28,8 @@
> 
>  #include <asm/mmzone.h>
> 
> +extern acpi_physical_address acpi_tb_find_table_address(char *signature);
> +
>  /*
>   * Physical location of iSCSI Boot Format Table.
>   */
> @@ -116,24 +118,32 @@ void __init reserve_ibft_region(void)
>  {
>      struct acpi_table_ibft *table;
>      unsigned long size;
> +    acpi_physical_address address;
> 
>      table = find_ibft();
>      if (!table)
>          return;
> 
>      size = PAGE_ALIGN(table->header.length);
> +    address = acpi_tb_find_table_address(table->header.signature);
>  #if 0
>  printk(KERN_ERR "XXX reserve_ibft_region: table=%llx,
> virt_to_phys(table)=%llx, size=%lx\n",
>      (u64)table, virt_to_phys(table), size);
>      memblock_reserve(virt_to_phys(table), size);
>  #else
> -printk(KERN_ERR "XXX reserve_ibft_region: table=%llx, 0x00000000BE453000,
> size=%lx\n",
> -    (u64)table, size);
> -    memblock_reserve(0x00000000BE453000, size);
> +printk(KERN_ERR "XXX reserve_ibft_region: table=%llx, address=%llx,
> size=%lx\n",
> +    (u64)table, address, size);
> +    if (address)
> +        memblock_reserve(address, size);
> +    else
> +        printk(KERN_ERR "%s: Can't find table address\n", __func__);
>  #endif
> 
> -    if (efi_enabled(EFI_BOOT))
> +    if (efi_enabled(EFI_BOOT)) {
> +printk(KERN_ERR "XXX reserve_ibft_region: calling acpi_put_table(%llx)\n",
> (u64)&table->header);
>          acpi_put_table(&table->header);
> -    else
> +    } else {
>          ibft_addr = table;
> +printk(KERN_ERR "XXX reserve_ibft_region: ibft_addr=%llx\n",
> (u64)ibft_addr);
> +    }
>  }
> 
> Debug from the above:
> [    0.050646] ACPI: Early table checksum verification disabled
> [    0.051778] ACPI: RSDP 0x00000000BFBFA014 000024 (v02 BOCHS )
> [    0.052922] ACPI: XSDT 0x00000000BFBF90E8 00004C (v01 BOCHS BXPCFACP
> 00000001      01000013)
> [    0.054623] ACPI: FACP 0x00000000BFBF5000 000074 (v01 BOCHS BXPCFACP
> 00000001 BXPC 00000001)
> [    0.056326] ACPI: DSDT 0x00000000BFBF6000 00238D (v01 BOCHS BXPCDSDT
> 00000001 BXPC 00000001)
> [    0.058016] ACPI: FACS 0x00000000BFBFD000 000040
> [    0.058940] ACPI: APIC 0x00000000BFBF4000 000090 (v01 BOCHS BXPCAPIC
> 00000001 BXPC 00000001)
> [    0.060627] ACPI: HPET 0x00000000BFBF3000 000038 (v01 BOCHS BXPCHPET
> 00000001 BXPC 00000001)
> [    0.062304] ACPI: BGRT 0x00000000BE49B000 000038 (v01 INTEL EDK2    
> 00000002      01000013)
> [    0.063987] ACPI: iBFT 0x00000000BE453000 000800 (v01 BOCHS BXPCFACP
> 00000000      00000000)
> [    0.065683] XXX acpi_tb_find_table_address: signature=iBFT
> [    0.066754] XXX acpi_tb_find_table_address(EXIT): address=be453000
> [    0.067959] XXX reserve_ibft_region: table=ffffffffff240000,
> address=be453000, size=1000
> [    0.069534] XXX reserve_ibft_region: calling
> acpi_put_table(ffffffffff240000)
> 
> Not sure if it's the right thing to do, but added
> "acpi_tb_find_table_address()" to return the physical address of a table to
> use with memblock_reserve().
> 
> virt_to_phys(table) does not seem to return the physical address for the
> iBFT table (it would be nice if struct acpi_table_header also had a
> "address" element for the physical address of the table).

virt_to_phys() does not work that early because the table is then mapped
with early_memremap(), which uses a different virtual-to-physical scheme.
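
To make the difference concrete, here is a rough userspace sketch (not
kernel code). It assumes a linear-map base of 0xffff888000000000, which
is consistent with the KASAN report address ffff8880be453004 for
physical 0xbe453004; the toy_* names and the single-slot mapping struct
are hypothetical. A linear-map subtraction applied to the early mapping
address from the debug output gives a value that has nothing to do with
the table's physical address, while a lookup through the mapping itself
recovers it.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed base of the linear (direct) map; real values depend on the
 * kernel configuration and on KASLR. */
#define TOY_PAGE_OFFSET 0xffff888000000000ULL

/* Linear-map translation: only meaningful for direct-map addresses. */
static uint64_t toy_virt_to_phys(uint64_t vaddr)
{
	return vaddr - TOY_PAGE_OFFSET;
}

/* Hypothetical record of one temporary early mapping. */
struct toy_early_map {
	uint64_t vaddr;	/* where the mapping was placed */
	uint64_t paddr;	/* physical address that was mapped */
	uint64_t size;	/* length of the mapping in bytes */
};

/* Translate through the mapping record. Returns 0 and stores the
 * physical address on success, -1 if vaddr is outside the mapping. */
static int toy_early_lookup(const struct toy_early_map *map, uint64_t vaddr,
			    uint64_t *paddr)
{
	assert(map != 0 && paddr != 0);

	if (vaddr < map->vaddr || vaddr - map->vaddr >= map->size)
		return -1;

	*paddr = map->paddr + (vaddr - map->vaddr);
	return 0;
}
```

With table=ffffffffff240000 from the log, toy_virt_to_phys() returns
0x777fff240000 rather than 0xbe453000.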

I'd say that acpi_tb_find_table_address() makes sense if we'd like to
reserve ACPI tables outside of drivers/acpi. 

But we should probably just reserve all the tables during
acpi_table_init(), so that any table the firmware put in normal memory
is sure to be reserved.
 
> Ran 10 successful boots with the above without failure.

That's good news indeed :)

> George
> > 
> >
George Kennedy Feb. 26, 2021, 4:16 p.m. UTC | #40
Hi Mike,

On 2/26/2021 6:17 AM, Mike Rapoport wrote:
> Hi George,
>
> On Thu, Feb 25, 2021 at 08:19:18PM -0500, George Kennedy wrote:
>> Mike,
>>
>> To get rid of the 0x00000000BE453000 hardcoding, I added the following patch
>> to your above patch to get the iBFT table "address" to use with
>> memblock_reserve():
>>
>> diff --git a/drivers/acpi/acpica/tbfind.c b/drivers/acpi/acpica/tbfind.c
>> index 56d81e4..4bc7bf3 100644
>> --- a/drivers/acpi/acpica/tbfind.c
>> +++ b/drivers/acpi/acpica/tbfind.c
>> @@ -120,3 +120,34 @@
>>       (void)acpi_ut_release_mutex(ACPI_MTX_TABLES);
>>       return_ACPI_STATUS(status);
>>   }
>> +
>> +acpi_physical_address
>> +acpi_tb_find_table_address(char *signature)
>> +{
>> +    acpi_physical_address address = 0;
>> +    struct acpi_table_desc *table_desc;
>> +    int i;
>> +
>> +    ACPI_FUNCTION_TRACE(tb_find_table_address);
>> +
>> +printk(KERN_ERR "XXX acpi_tb_find_table_address: signature=%s\n",
>> signature);
>> +
>> +    (void)acpi_ut_acquire_mutex(ACPI_MTX_TABLES);
>> +    for (i = 0; i < acpi_gbl_root_table_list.current_table_count; ++i) {
>> +        if (memcmp(&(acpi_gbl_root_table_list.tables[i].signature),
>> +               signature, ACPI_NAMESEG_SIZE)) {
>> +
>> +            /* Not the requested table */
>> +
>> +            continue;
>> +        }
>> +
>> +        /* Table with matching signature has been found */
>> +        table_desc = &acpi_gbl_root_table_list.tables[i];
>> +        address = table_desc->address;
>> +    }
>> +
>> +    (void)acpi_ut_release_mutex(ACPI_MTX_TABLES);
>> +printk(KERN_ERR "XXX acpi_tb_find_table_address(EXIT): address=%llx\n",
>> address);
>> +    return address;
>> +}
>> diff --git a/drivers/firmware/iscsi_ibft_find.c
>> b/drivers/firmware/iscsi_ibft_find.c
>> index 95fc1a6..0de70b4 100644
>> --- a/drivers/firmware/iscsi_ibft_find.c
>> +++ b/drivers/firmware/iscsi_ibft_find.c
>> @@ -28,6 +28,8 @@
>>
>>   #include <asm/mmzone.h>
>>
>> +extern acpi_physical_address acpi_tb_find_table_address(char *signature);
>> +
>>   /*
>>    * Physical location of iSCSI Boot Format Table.
>>    */
>> @@ -116,24 +118,32 @@ void __init reserve_ibft_region(void)
>>   {
>>       struct acpi_table_ibft *table;
>>       unsigned long size;
>> +    acpi_physical_address address;
>>
>>       table = find_ibft();
>>       if (!table)
>>           return;
>>
>>       size = PAGE_ALIGN(table->header.length);
>> +    address = acpi_tb_find_table_address(table->header.signature);
>>   #if 0
>>   printk(KERN_ERR "XXX reserve_ibft_region: table=%llx,
>> virt_to_phys(table)=%llx, size=%lx\n",
>>       (u64)table, virt_to_phys(table), size);
>>       memblock_reserve(virt_to_phys(table), size);
>>   #else
>> -printk(KERN_ERR "XXX reserve_ibft_region: table=%llx, 0x00000000BE453000,
>> size=%lx\n",
>> -    (u64)table, size);
>> -    memblock_reserve(0x00000000BE453000, size);
>> +printk(KERN_ERR "XXX reserve_ibft_region: table=%llx, address=%llx,
>> size=%lx\n",
>> +    (u64)table, address, size);
>> +    if (address)
>> +        memblock_reserve(address, size);
>> +    else
>> +        printk(KERN_ERR "%s: Can't find table address\n", __func__);
>>   #endif
>>
>> -    if (efi_enabled(EFI_BOOT))
>> +    if (efi_enabled(EFI_BOOT)) {
>> +printk(KERN_ERR "XXX reserve_ibft_region: calling acpi_put_table(%llx)\n",
>> (u64)&table->header);
>>           acpi_put_table(&table->header);
>> -    else
>> +    } else {
>>           ibft_addr = table;
>> +printk(KERN_ERR "XXX reserve_ibft_region: ibft_addr=%llx\n",
>> (u64)ibft_addr);
>> +    }
>>   }
>>
>> Debug from the above:
>> [    0.050646] ACPI: Early table checksum verification disabled
>> [    0.051778] ACPI: RSDP 0x00000000BFBFA014 000024 (v02 BOCHS )
>> [    0.052922] ACPI: XSDT 0x00000000BFBF90E8 00004C (v01 BOCHS BXPCFACP
>> 00000001      01000013)
>> [    0.054623] ACPI: FACP 0x00000000BFBF5000 000074 (v01 BOCHS BXPCFACP
>> 00000001 BXPC 00000001)
>> [    0.056326] ACPI: DSDT 0x00000000BFBF6000 00238D (v01 BOCHS BXPCDSDT
>> 00000001 BXPC 00000001)
>> [    0.058016] ACPI: FACS 0x00000000BFBFD000 000040
>> [    0.058940] ACPI: APIC 0x00000000BFBF4000 000090 (v01 BOCHS BXPCAPIC
>> 00000001 BXPC 00000001)
>> [    0.060627] ACPI: HPET 0x00000000BFBF3000 000038 (v01 BOCHS BXPCHPET
>> 00000001 BXPC 00000001)
>> [    0.062304] ACPI: BGRT 0x00000000BE49B000 000038 (v01 INTEL EDK2
>> 00000002      01000013)
>> [    0.063987] ACPI: iBFT 0x00000000BE453000 000800 (v01 BOCHS BXPCFACP
>> 00000000      00000000)
>> [    0.065683] XXX acpi_tb_find_table_address: signature=iBFT
>> [    0.066754] XXX acpi_tb_find_table_address(EXIT): address=be453000
>> [    0.067959] XXX reserve_ibft_region: table=ffffffffff240000,
>> address=be453000, size=1000
>> [    0.069534] XXX reserve_ibft_region: calling
>> acpi_put_table(ffffffffff240000)
>>
>> Not sure if it's the right thing to do, but added
>> "acpi_tb_find_table_address()" to return the physical address of a table to
>> use with memblock_reserve().
>>
>> virt_to_phys(table) does not seem to return the physical address for the
>> iBFT table (it would be nice if struct acpi_table_header also had a
>> "address" element for the physical address of the table).
> virt_to_phys() does not work that early because then it is mapped with
> early_memremap()  which uses different virtual to physical scheme.
>
> I'd say that acpi_tb_find_table_address() makes sense if we'd like to
> reserve ACPI tables outside of drivers/acpi.
>
> But probably we should simply reserve all the tables during
> acpi_table_init() so that any table that firmware put in the normal memory
> will be surely reserved.
>   
>> Ran 10 successful boots with the above without failure.
> That's good news indeed :)

I wonder if we could do something like this instead (trying to keep 
changes minimal): just do the memblock_reserve() for all the standard 
tables.

diff --git a/drivers/acpi/acpica/tbinstal.c b/drivers/acpi/acpica/tbinstal.c
index 0bb15ad..830f82c 100644
--- a/drivers/acpi/acpica/tbinstal.c
+++ b/drivers/acpi/acpica/tbinstal.c
@@ -7,6 +7,7 @@
   *
*****************************************************************************/

+#include <linux/memblock.h>
  #include <acpi/acpi.h>
  #include "accommon.h"
  #include "actables.h"
@@ -14,6 +15,23 @@
  #define _COMPONENT          ACPI_TABLES
  ACPI_MODULE_NAME("tbinstal")

+void
+acpi_tb_reserve_standard_table(acpi_physical_address address,
+               struct acpi_table_header *header)
+{
+    struct acpi_table_header local_header;
+
+    if ((ACPI_COMPARE_NAMESEG(header->signature, ACPI_SIG_FACS)) ||
+        (ACPI_VALIDATE_RSDP_SIG(header->signature))) {
+        return;
+    }
+    /* Standard ACPI table with full common header */
+
+    memcpy(&local_header, header, sizeof(struct acpi_table_header));
+
+    memblock_reserve(address, PAGE_ALIGN(local_header.length));
+}
+
  /*******************************************************************************
   *
   * FUNCTION:    acpi_tb_install_table_with_override
@@ -58,6 +76,9 @@
                        new_table_desc->flags,
                        new_table_desc->pointer);

+    acpi_tb_reserve_standard_table(new_table_desc->address,
+                   new_table_desc->pointer);
+
      acpi_tb_print_table_header(new_table_desc->address,
                     new_table_desc->pointer);

There should be no harm in doing the memblock_reserve() for all the 
standard tables, right?

Ran 10 boots with the above without failure.
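
As a worked example of the size passed to memblock_reserve() above, here
is a small userspace sketch. It assumes 4 KiB pages; toy_page_align()
and toy_reserved_end() are hypothetical helpers that mirror the
PAGE_ALIGN(length) arithmetic, not kernel functions.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_PAGE_SIZE 4096ULL	/* assumption: 4 KiB pages */

/* Round a length up to the next multiple of the page size. */
static uint64_t toy_page_align(uint64_t len)
{
	return (len + TOY_PAGE_SIZE - 1) & ~(TOY_PAGE_SIZE - 1);
}

/* Exclusive end of the range that a call of the form
 * memblock_reserve(address, PAGE_ALIGN(length)) would cover. */
static uint64_t toy_reserved_end(uint64_t address, uint64_t length)
{
	assert(length != 0);
	return address + toy_page_align(length);
}
```

For the iBFT (address 0xBE453000, length 0x800) this gives a size of
0x1000, matching the earlier debug output, and a range ending at
0xBE454000. For a table that does not start on a page boundary, such as
the XSDT at 0xBFBF90E8, only the length is rounded, so the range ends at
0xBFBFA0E8 and is not itself page aligned.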

George
>> George
>>>
Mike Rapoport Feb. 28, 2021, 6:08 p.m. UTC | #41
On Fri, Feb 26, 2021 at 11:16:06AM -0500, George Kennedy wrote:
> On 2/26/2021 6:17 AM, Mike Rapoport wrote:
> > Hi George,
> > 
> > On Thu, Feb 25, 2021 at 08:19:18PM -0500, George Kennedy wrote:
> > > 
> > > Not sure if it's the right thing to do, but added
> > > "acpi_tb_find_table_address()" to return the physical address of a table to
> > > use with memblock_reserve().
> > > 
> > > virt_to_phys(table) does not seem to return the physical address for the
> > > iBFT table (it would be nice if struct acpi_table_header also had a
> > > "address" element for the physical address of the table).
> >
> > virt_to_phys() does not work that early because then it is mapped with
> > early_memremap()  which uses different virtual to physical scheme.
> > 
> > I'd say that acpi_tb_find_table_address() makes sense if we'd like to
> > reserve ACPI tables outside of drivers/acpi.
> > 
> > But probably we should simply reserve all the tables during
> > acpi_table_init() so that any table that firmware put in the normal memory
> > will be surely reserved.
> > > Ran 10 successful boots with the above without failure.
> > That's good news indeed :)
> 
> Wondering if we could do something like this instead (trying to keep changes
> minimal). Just do the memblock_reserve() for all the standard tables.

I think something like this should work, but I'm not enough of an ACPI
expert to say whether this is the best way to reserve the tables.
 
> diff --git a/drivers/acpi/acpica/tbinstal.c b/drivers/acpi/acpica/tbinstal.c
> index 0bb15ad..830f82c 100644
> --- a/drivers/acpi/acpica/tbinstal.c
> +++ b/drivers/acpi/acpica/tbinstal.c
> @@ -7,6 +7,7 @@
>   *
> *****************************************************************************/
> 
> +#include <linux/memblock.h>
>  #include <acpi/acpi.h>
>  #include "accommon.h"
>  #include "actables.h"
> @@ -14,6 +15,23 @@
>  #define _COMPONENT          ACPI_TABLES
>  ACPI_MODULE_NAME("tbinstal")
> 
> +void
> +acpi_tb_reserve_standard_table(acpi_physical_address address,
> +               struct acpi_table_header *header)
> +{
> +    struct acpi_table_header local_header;
> +
> +    if ((ACPI_COMPARE_NAMESEG(header->signature, ACPI_SIG_FACS)) ||
> +        (ACPI_VALIDATE_RSDP_SIG(header->signature))) {
> +        return;
> +    }
> +    /* Standard ACPI table with full common header */
> +
> +    memcpy(&local_header, header, sizeof(struct acpi_table_header));
> +
> +    memblock_reserve(address, PAGE_ALIGN(local_header.length));
> +}
> +
>  /*******************************************************************************
>   *
>   * FUNCTION:    acpi_tb_install_table_with_override
> @@ -58,6 +76,9 @@
>                        new_table_desc->flags,
>                        new_table_desc->pointer);
> 
> +    acpi_tb_reserve_standard_table(new_table_desc->address,
> +                   new_table_desc->pointer);
> +
>      acpi_tb_print_table_header(new_table_desc->address,
>                     new_table_desc->pointer);
> 
> There should be no harm in doing the memblock_reserve() for all the standard
> tables, right?

It should be ok to memblock_reserve() all the tables very early as long as
we don't run out of static entries in memblock.reserved.

We just need to make sure the tables are reserved before memblock
allocations are possible, so we'd still need to move acpi_table_init() in
x86::setup_arch() before e820__memblock_setup().
I'm not sure how early ACPI is initialized on arm64.
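
A toy model of why the ordering matters, as a userspace sketch only. It
is not the memblock implementation: struct toy_memblock, toy_reserve()
and toy_alloc() are hypothetical, the allocator is assumed to be
top-down, and TOY_MAX_RESERVED is an arbitrary small number standing in
for the static reserved array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_RESERVED 4	/* arbitrary; models a fixed-size array */

struct toy_range {
	uint64_t base;
	uint64_t size;
};

struct toy_memblock {
	struct toy_range ram;				/* one bank of usable RAM */
	struct toy_range reserved[TOY_MAX_RESERVED];	/* ranges already taken */
	size_t nr_reserved;
};

static int toy_overlaps(const struct toy_range *r, uint64_t base, uint64_t size)
{
	return base < r->base + r->size && r->base < base + size;
}

/* Record a range as reserved. Returns -1 once the static array is full. */
static int toy_reserve(struct toy_memblock *mb, uint64_t base, uint64_t size)
{
	if (mb->nr_reserved == TOY_MAX_RESERVED)
		return -1;
	mb->reserved[mb->nr_reserved].base = base;
	mb->reserved[mb->nr_reserved].size = size;
	mb->nr_reserved++;
	return 0;
}

/* Top-down search for a free block of 'size' bytes, stepping in units of
 * 'size'. Returns the base of the block, or 0 if nothing fits. */
static uint64_t toy_alloc(struct toy_memblock *mb, uint64_t size)
{
	uint64_t base;
	size_t i;

	if (size == 0 || size > mb->ram.size)
		return 0;

	for (base = mb->ram.base + mb->ram.size - size; ; base -= size) {
		int busy = 0;

		for (i = 0; i < mb->nr_reserved; i++)
			if (toy_overlaps(&mb->reserved[i], base, size))
				busy = 1;

		if (!busy)
			return toy_reserve(mb, base, size) == 0 ? base : 0;

		/* Stop before stepping below the start of RAM. */
		if (base - mb->ram.base < size)
			break;
	}

	return 0;
}
```

With RAM covering 0xbe450000-0xbe453fff, an allocation made before the
table at 0xbe453000 is reserved is handed that very page; with the
reservation in place first, the same call returns 0xbe452000 instead.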
 
> Ran 10 boots with the above without failure.
> 
> George
George Kennedy March 1, 2021, 2:29 p.m. UTC | #42
On 2/28/2021 1:08 PM, Mike Rapoport wrote:
> On Fri, Feb 26, 2021 at 11:16:06AM -0500, George Kennedy wrote:
>> On 2/26/2021 6:17 AM, Mike Rapoport wrote:
>>> Hi George,
>>>
>>> On Thu, Feb 25, 2021 at 08:19:18PM -0500, George Kennedy wrote:
>>>> Not sure if it's the right thing to do, but added
>>>> "acpi_tb_find_table_address()" to return the physical address of a table to
>>>> use with memblock_reserve().
>>>>
>>>> virt_to_phys(table) does not seem to return the physical address for the
>>>> iBFT table (it would be nice if struct acpi_table_header also had a
>>>> "address" element for the physical address of the table).
>>> virt_to_phys() does not work that early because then it is mapped with
>>> early_memremap()  which uses different virtual to physical scheme.
>>>
>>> I'd say that acpi_tb_find_table_address() makes sense if we'd like to
>>> reserve ACPI tables outside of drivers/acpi.
>>>
>>> But probably we should simply reserve all the tables during
>>> acpi_table_init() so that any table that firmware put in the normal memory
>>> will be surely reserved.
>>>> Ran 10 successful boots with the above without failure.
>>> That's good news indeed :)
>> Wondering if we could do something like this instead (trying to keep changes
>> minimal). Just do the memblock_reserve() for all the standard tables.
> I think something like this should work, but I'm not an ACPI expert to say
> if this the best way to reserve the tables.
Adding ACPI maintainers to the CC list.
>   
>> diff --git a/drivers/acpi/acpica/tbinstal.c b/drivers/acpi/acpica/tbinstal.c
>> index 0bb15ad..830f82c 100644
>> --- a/drivers/acpi/acpica/tbinstal.c
>> +++ b/drivers/acpi/acpica/tbinstal.c
>> @@ -7,6 +7,7 @@
>>    *
>> *****************************************************************************/
>>
>> +#include <linux/memblock.h>
>>   #include <acpi/acpi.h>
>>   #include "accommon.h"
>>   #include "actables.h"
>> @@ -14,6 +15,23 @@
>>   #define _COMPONENT          ACPI_TABLES
>>   ACPI_MODULE_NAME("tbinstal")
>>
>> +void
>> +acpi_tb_reserve_standard_table(acpi_physical_address address,
>> +               struct acpi_table_header *header)
>> +{
>> +    struct acpi_table_header local_header;
>> +
>> +    if ((ACPI_COMPARE_NAMESEG(header->signature, ACPI_SIG_FACS)) ||
>> +        (ACPI_VALIDATE_RSDP_SIG(header->signature))) {
>> +        return;
>> +    }
>> +    /* Standard ACPI table with full common header */
>> +
>> +    memcpy(&local_header, header, sizeof(struct acpi_table_header));
>> +
>> +    memblock_reserve(address, PAGE_ALIGN(local_header.length));
>> +}
>> +
>>   /*******************************************************************************
>>    *
>>    * FUNCTION:    acpi_tb_install_table_with_override
>> @@ -58,6 +76,9 @@
>>                         new_table_desc->flags,
>>                         new_table_desc->pointer);
>>
>> +    acpi_tb_reserve_standard_table(new_table_desc->address,
>> +                   new_table_desc->pointer);
>> +
>>       acpi_tb_print_table_header(new_table_desc->address,
>>                      new_table_desc->pointer);
>>
>> There should be no harm in doing the memblock_reserve() for all the standard
>> tables, right?
> It should be ok to memblock_reserve() all the tables very early as long as
> we don't run out of static entries in memblock.reserved.
>
> We just need to make sure the tables are reserved before memblock
> allocations are possible, so we'd still need to move acpi_table_init() in
> x86::setup_arch() before e820__memblock_setup().
> Not sure how early ACPI is initialized on arm64.

Thanks Mike. I will try to move the memblock_reserve() calls before 
e820__memblock_setup().

George
>   
>> Ran 10 boots with the above without failure.
>>
>> George
George Kennedy March 2, 2021, 1:20 a.m. UTC | #43
On 3/1/2021 9:29 AM, George Kennedy wrote:
>
>
> On 2/28/2021 1:08 PM, Mike Rapoport wrote:
>> On Fri, Feb 26, 2021 at 11:16:06AM -0500, George Kennedy wrote:
>>> On 2/26/2021 6:17 AM, Mike Rapoport wrote:
>>>> Hi George,
>>>>
>>>> On Thu, Feb 25, 2021 at 08:19:18PM -0500, George Kennedy wrote:
>>>>> Not sure if it's the right thing to do, but added
>>>>> "acpi_tb_find_table_address()" to return the physical address of a 
>>>>> table to
>>>>> use with memblock_reserve().
>>>>>
>>>>> virt_to_phys(table) does not seem to return the physical address 
>>>>> for the
>>>>> iBFT table (it would be nice if struct acpi_table_header also had a
>>>>> "address" element for the physical address of the table).
>>>> virt_to_phys() does not work that early because then it is mapped with
>>>> early_memremap()  which uses different virtual to physical scheme.
>>>>
>>>> I'd say that acpi_tb_find_table_address() makes sense if we'd like to
>>>> reserve ACPI tables outside of drivers/acpi.
>>>>
>>>> But probably we should simply reserve all the tables during
>>>> acpi_table_init() so that any table that firmware put in the normal 
>>>> memory
>>>> will be surely reserved.
>>>>> Ran 10 successful boots with the above without failure.
>>>> That's good news indeed :)
>>> Wondering if we could do something like this instead (trying to keep 
>>> changes
>>> minimal). Just do the memblock_reserve() for all the standard tables.
>> I think something like this should work, but I'm not an ACPI expert 
>> to say
>> if this the best way to reserve the tables.
> Adding ACPI maintainers to the CC list.
>>> diff --git a/drivers/acpi/acpica/tbinstal.c 
>>> b/drivers/acpi/acpica/tbinstal.c
>>> index 0bb15ad..830f82c 100644
>>> --- a/drivers/acpi/acpica/tbinstal.c
>>> +++ b/drivers/acpi/acpica/tbinstal.c
>>> @@ -7,6 +7,7 @@
>>>    *
>>> *****************************************************************************/ 
>>>
>>>
>>> +#include <linux/memblock.h>
>>>   #include <acpi/acpi.h>
>>>   #include "accommon.h"
>>>   #include "actables.h"
>>> @@ -14,6 +15,23 @@
>>>   #define _COMPONENT          ACPI_TABLES
>>>   ACPI_MODULE_NAME("tbinstal")
>>>
>>> +void
>>> +acpi_tb_reserve_standard_table(acpi_physical_address address,
>>> +               struct acpi_table_header *header)
>>> +{
>>> +    struct acpi_table_header local_header;
>>> +
>>> +    if ((ACPI_COMPARE_NAMESEG(header->signature, ACPI_SIG_FACS)) ||
>>> +        (ACPI_VALIDATE_RSDP_SIG(header->signature))) {
>>> +        return;
>>> +    }
>>> +    /* Standard ACPI table with full common header */
>>> +
>>> +    memcpy(&local_header, header, sizeof(struct acpi_table_header));
>>> +
>>> +    memblock_reserve(address, PAGE_ALIGN(local_header.length));
>>> +}
>>> +
>>>   /******************************************************************************* 
>>>
>>>    *
>>>    * FUNCTION:    acpi_tb_install_table_with_override
>>> @@ -58,6 +76,9 @@
>>>                         new_table_desc->flags,
>>>                         new_table_desc->pointer);
>>>
>>> + acpi_tb_reserve_standard_table(new_table_desc->address,
>>> +                   new_table_desc->pointer);
>>> +
>>>       acpi_tb_print_table_header(new_table_desc->address,
>>>                      new_table_desc->pointer);
>>>
>>> There should be no harm in doing the memblock_reserve() for all the 
>>> standard
>>> tables, right?
>> It should be ok to memblock_reserve() all the tables very early as 
>> long as
>> we don't run out of static entries in memblock.reserved.
>>
>> We just need to make sure the tables are reserved before memblock
>> allocations are possible, so we'd still need to move 
>> acpi_table_init() in
>> x86::setup_arch() before e820__memblock_setup().
>> Not sure how early ACPI is initialized on arm64.
>
> Thanks Mike. Will try to move the memblock_reserves() before 
> e820__memblock_setup().

Hi Mike,

Moved acpi_table_init() in x86::setup_arch() before 
e820__memblock_setup() as you suggested.

Ran 10 boots with the following without error.

diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 740f3bdb..3b1dd24 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -1047,6 +1047,7 @@ void __init setup_arch(char **cmdline_p)
      cleanup_highmap();

      memblock_set_current_limit(ISA_END_ADDRESS);
+    acpi_boot_table_init();
      e820__memblock_setup();

      /*
@@ -1140,8 +1141,6 @@ void __init setup_arch(char **cmdline_p)
      /*
       * Parse the ACPI tables for possible boot-time SMP configuration.
       */
-    acpi_boot_table_init();
-
      early_acpi_boot_init();

      initmem_init();
diff --git a/drivers/acpi/acpica/tbinstal.c b/drivers/acpi/acpica/tbinstal.c
index 0bb15ad..7830109 100644
--- a/drivers/acpi/acpica/tbinstal.c
+++ b/drivers/acpi/acpica/tbinstal.c
@@ -7,6 +7,7 @@
   *
*****************************************************************************/

+#include <linux/memblock.h>
  #include <acpi/acpi.h>
  #include "accommon.h"
  #include "actables.h"
@@ -16,6 +17,33 @@

  /*******************************************************************************
   *
+ * FUNCTION:    acpi_tb_reserve_standard_table
+ *
+ * PARAMETERS:  address             - Table physical address
+ *              header              - Table header
+ *
+ * RETURN:      None
+ *
+ * DESCRIPTION: To avoid an acpi table page from being "stolen" by the buddy
+ *              allocator run memblock_reserve() on all the standard acpi tables.
+ *
+ ******************************************************************************/
+void
+acpi_tb_reserve_standard_table(acpi_physical_address address,
+               struct acpi_table_header *header)
+{
+    if ((ACPI_COMPARE_NAMESEG(header->signature, ACPI_SIG_FACS)) ||
+        (ACPI_VALIDATE_RSDP_SIG(header->signature)))
+        return;
+
+    if (header->length > PAGE_SIZE) /* same check as in acpi_map() */
+        return;
+
+    memblock_reserve(address, PAGE_ALIGN(header->length));
+}
+
+/*******************************************************************************
+ *
   * FUNCTION:    acpi_tb_install_table_with_override
   *
   * PARAMETERS:  new_table_desc          - New table descriptor to install
@@ -58,6 +86,9 @@
                        new_table_desc->flags,
                        new_table_desc->pointer);

+    acpi_tb_reserve_standard_table(new_table_desc->address,
+                   new_table_desc->pointer);
+
      acpi_tb_print_table_header(new_table_desc->address,
                     new_table_desc->pointer);

George

>
> George
>>> Ran 10 boots with the above without failure.
>>>
>>> George
>
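
For readers following along, the decision logic of the posted acpi_tb_reserve_standard_table() can be sketched in userspace. This is only an illustration, not the kernel code: the 4 KiB page size, the sketch_* names and the sketch_range struct are assumptions made here, and the range is returned to the caller instead of being passed to memblock_reserve().

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Assumption: 4 KiB pages, as on typical x86 configurations. */
#define SKETCH_PAGE_SIZE 4096ULL
#define SKETCH_PAGE_ALIGN(x) \
	(((x) + SKETCH_PAGE_SIZE - 1) & ~(SKETCH_PAGE_SIZE - 1))

/* Hypothetical stand-in for the range handed to memblock_reserve(). */
struct sketch_range {
	uint64_t base;
	uint64_t size;
};

/*
 * Mirrors the checks in the posted function: skip FACS and RSDP, skip
 * tables longer than one page, otherwise reserve a page-aligned length
 * starting at the table's physical address.
 * 'signature' must be a NUL-terminated string in this sketch.
 * Returns true and fills *out when a reservation would be made.
 */
static bool sketch_should_reserve(const char *signature, uint64_t address,
				  uint32_t length, struct sketch_range *out)
{
	if (strncmp(signature, "FACS", 4) == 0 ||
	    strncmp(signature, "RSD PTR ", 8) == 0)
		return false;

	if (length > SKETCH_PAGE_SIZE)
		return false;

	out->base = address;
	out->size = SKETCH_PAGE_ALIGN((uint64_t)length);
	return true;
}
```

With these checks a 0x800-byte iBFT table gets one full page reserved, while any table longer than one page is not reserved at all.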
Mike Rapoport March 2, 2021, 9:57 a.m. UTC | #44
Hi George,

On Mon, Mar 01, 2021 at 08:20:45PM -0500, George Kennedy wrote:
> > > > > 
> > > > There should be no harm in doing the memblock_reserve() for all
> > > > the standard
> > > > tables, right?
> > > It should be ok to memblock_reserve() all the tables very early as
> > > long as
> > > we don't run out of static entries in memblock.reserved.
> > > 
> > > We just need to make sure the tables are reserved before memblock
> > > allocations are possible, so we'd still need to move
> > > acpi_table_init() in
> > > x86::setup_arch() before e820__memblock_setup().
> > > Not sure how early ACPI is initialized on arm64.
> > 
> > Thanks Mike. Will try to move the memblock_reserves() before
> > e820__memblock_setup().
> 
> Hi Mike,
> 
> Moved acpi_table_init() in x86::setup_arch() before e820__memblock_setup()
> as you suggested.
> 
> Ran 10 boots with the following without error.

I'd suggest sending it as a formal patch to see what the x86 and ACPI folks
have to say about this.
 
> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> index 740f3bdb..3b1dd24 100644
> --- a/arch/x86/kernel/setup.c
> +++ b/arch/x86/kernel/setup.c
> @@ -1047,6 +1047,7 @@ void __init setup_arch(char **cmdline_p)
>      cleanup_highmap();
> 
>      memblock_set_current_limit(ISA_END_ADDRESS);
> +    acpi_boot_table_init();
>      e820__memblock_setup();
> 
>      /*
> @@ -1140,8 +1141,6 @@ void __init setup_arch(char **cmdline_p)
>      /*
>       * Parse the ACPI tables for possible boot-time SMP configuration.
>       */
> -    acpi_boot_table_init();
> -
>      early_acpi_boot_init();
> 
>      initmem_init();
> diff --git a/drivers/acpi/acpica/tbinstal.c b/drivers/acpi/acpica/tbinstal.c
> index 0bb15ad..7830109 100644
> --- a/drivers/acpi/acpica/tbinstal.c
> +++ b/drivers/acpi/acpica/tbinstal.c
> @@ -7,6 +7,7 @@
>   *
> *****************************************************************************/
> 
> +#include <linux/memblock.h>
>  #include <acpi/acpi.h>
>  #include "accommon.h"
>  #include "actables.h"
> @@ -16,6 +17,33 @@
> 
>  /*******************************************************************************
>   *
> + * FUNCTION:    acpi_tb_reserve_standard_table
> + *
> + * PARAMETERS:  address             - Table physical address
> + *              header              - Table header
> + *
> + * RETURN:      None
> + *
> + * DESCRIPTION: To avoid an acpi table page from being "stolen" by the buddy
> + *              allocator run memblock_reserve() on all the standard acpi tables.
> + *
> + ******************************************************************************/
> +void
> +acpi_tb_reserve_standard_table(acpi_physical_address address,
> +               struct acpi_table_header *header)
> +{
> +    if ((ACPI_COMPARE_NAMESEG(header->signature, ACPI_SIG_FACS)) ||
> +        (ACPI_VALIDATE_RSDP_SIG(header->signature)))
> +        return;
> +

Why should these be excluded?

> +    if (header->length > PAGE_SIZE) /* same check as in acpi_map() */
> +        return;

I don't think this is required. I believe acpi_map() has this check because
kmap() cannot handle multiple pages.

> +
> +    memblock_reserve(address, PAGE_ALIGN(header->length));
> +}
> +
> +/*******************************************************************************
> + *
>   * FUNCTION:    acpi_tb_install_table_with_override
>   *
>   * PARAMETERS:  new_table_desc          - New table descriptor to install
> @@ -58,6 +86,9 @@
>                        new_table_desc->flags,
>                        new_table_desc->pointer);
> 
> +    acpi_tb_reserve_standard_table(new_table_desc->address,
> +                   new_table_desc->pointer);
> +
>      acpi_tb_print_table_header(new_table_desc->address,
>                     new_table_desc->pointer);
> 
> George
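
As a rough illustration of the review comment about the PAGE_SIZE check, the sketch below computes the range that memblock_reserve(address, PAGE_ALIGN(length)) would cover once that check is dropped. It is a userspace approximation under stated assumptions: 4 KiB pages, a nonzero table length, and hypothetical tbl_* / TBL_* names that do not exist in the kernel.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 4 KiB pages. */
#define TBL_PAGE_SHIFT 12
#define TBL_PAGE_SIZE (1ULL << TBL_PAGE_SHIFT)
#define TBL_PAGE_ALIGN(x) (((x) + TBL_PAGE_SIZE - 1) & ~(TBL_PAGE_SIZE - 1))

/* Hypothetical description of a reserved physical range. */
struct tbl_span {
	uint64_t base;
	uint64_t size;
};

/* Range reserved for a table of any length; 'length' must be nonzero. */
static struct tbl_span tbl_reserve_span(uint64_t address, uint32_t length)
{
	struct tbl_span s = { address, TBL_PAGE_ALIGN((uint64_t)length) };
	return s;
}

/* True when the whole table lies inside the reserved span. */
static bool tbl_span_covers(struct tbl_span s, uint64_t address,
			    uint32_t length)
{
	return s.base <= address && address + length <= s.base + s.size;
}

/* Number of page frames that intersect the reserved span. */
static uint64_t tbl_span_pages(struct tbl_span s)
{
	uint64_t first = s.base >> TBL_PAGE_SHIFT;
	uint64_t last = (s.base + s.size - 1) >> TBL_PAGE_SHIFT;

	return last - first + 1;
}
```

Under these assumptions a 0x2345-byte table at a page-aligned address is covered by a three-page span, and the same table at an address 0x800 bytes into a page is still fully covered, with the span touching four page frames.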

Patch

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 0b55c9c95364..f10966e3b4a5 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -108,6 +108,17 @@  typedef int __bitwise fpi_t;
  */
 #define FPI_TO_TAIL		((__force fpi_t)BIT(1))
 
+/*
+ * Don't poison memory with KASAN.
+ * During boot, all non-reserved memblock memory is exposed to the buddy
+ * allocator. Poisoning all that memory lengthens boot time, especially on
+ * systems with large amount of RAM. This flag is used to skip that poisoning.
+ * Assuming that there are no references to those newly exposed pages before
+ * they are ever allocated, this has little effect on KASAN memory tracking.
+ * All memory allocated normally after boot gets poisoned as usual.
+ */
+#define FPI_SKIP_KASAN_POISON	((__force fpi_t)BIT(2))
+
 /* prevent >1 _updater_ of zone percpu pageset ->high and ->batch fields */
 static DEFINE_MUTEX(pcp_batch_high_lock);
 #define MIN_PERCPU_PAGELIST_FRACTION	(8)
@@ -384,10 +395,14 @@  static DEFINE_STATIC_KEY_TRUE(deferred_pages);
  * on-demand allocation and then freed again before the deferred pages
  * initialization is done, but this is not likely to happen.
  */
-static inline void kasan_free_nondeferred_pages(struct page *page, int order)
+static inline void kasan_free_nondeferred_pages(struct page *page, int order,
+							fpi_t fpi_flags)
 {
-	if (!static_branch_unlikely(&deferred_pages))
-		kasan_free_pages(page, order);
+	if (static_branch_unlikely(&deferred_pages))
+		return;
+	if (fpi_flags & FPI_SKIP_KASAN_POISON)
+		return;
+	kasan_free_pages(page, order);
 }
 
 /* Returns true if the struct page for the pfn is uninitialised */
@@ -438,7 +453,13 @@  defer_init(int nid, unsigned long pfn, unsigned long end_pfn)
 	return false;
 }
 #else
-#define kasan_free_nondeferred_pages(p, o)	kasan_free_pages(p, o)
+static inline void kasan_free_nondeferred_pages(struct page *page, int order,
+							fpi_t fpi_flags)
+{
+	if (fpi_flags & FPI_SKIP_KASAN_POISON)
+		return;
+	kasan_free_pages(page, order);
+}
 
 static inline bool early_page_uninitialised(unsigned long pfn)
 {
@@ -1216,7 +1237,7 @@  static void kernel_init_free_pages(struct page *page, int numpages)
 }
 
 static __always_inline bool free_pages_prepare(struct page *page,
-					unsigned int order, bool check_free)
+			unsigned int order, bool check_free, fpi_t fpi_flags)
 {
 	int bad = 0;
 
@@ -1290,7 +1311,7 @@  static __always_inline bool free_pages_prepare(struct page *page,
 
 	debug_pagealloc_unmap_pages(page, 1 << order);
 
-	kasan_free_nondeferred_pages(page, order);
+	kasan_free_nondeferred_pages(page, order, fpi_flags);
 
 	return true;
 }
@@ -1303,7 +1324,7 @@  static __always_inline bool free_pages_prepare(struct page *page,
  */
 static bool free_pcp_prepare(struct page *page)
 {
-	return free_pages_prepare(page, 0, true);
+	return free_pages_prepare(page, 0, true, FPI_NONE);
 }
 
 static bool bulkfree_pcp_prepare(struct page *page)
@@ -1323,9 +1344,9 @@  static bool bulkfree_pcp_prepare(struct page *page)
 static bool free_pcp_prepare(struct page *page)
 {
 	if (debug_pagealloc_enabled_static())
-		return free_pages_prepare(page, 0, true);
+		return free_pages_prepare(page, 0, true, FPI_NONE);
 	else
-		return free_pages_prepare(page, 0, false);
+		return free_pages_prepare(page, 0, false, FPI_NONE);
 }
 
 static bool bulkfree_pcp_prepare(struct page *page)
@@ -1533,7 +1554,7 @@  static void __free_pages_ok(struct page *page, unsigned int order,
 	int migratetype;
 	unsigned long pfn = page_to_pfn(page);
 
-	if (!free_pages_prepare(page, order, true))
+	if (!free_pages_prepare(page, order, true, fpi_flags))
 		return;
 
 	migratetype = get_pfnblock_migratetype(page, pfn);
@@ -1570,7 +1591,7 @@  void __free_pages_core(struct page *page, unsigned int order)
 	 * Bypass PCP and place fresh pages right to the tail, primarily
 	 * relevant for memory onlining.
 	 */
-	__free_pages_ok(page, order, FPI_TO_TAIL);
+	__free_pages_ok(page, order, FPI_TO_TAIL | FPI_SKIP_KASAN_POISON);
 }
 
 #ifdef CONFIG_NEED_MULTIPLE_NODES