linux-riscv.lists.infradead.org archive mirror
From: Palmer Dabbelt <palmer@sifive.com>
To: schwab@suse.de
Cc: linux-riscv@lists.infradead.org,
	David Abdurachmanov <david.abdurachmanov@gmail.com>,
	opensbi@lists.infradead.org,
	Paul Walmsley <paul.walmsley@sifive.com>
Subject: Re: Random memory corruption with v5.2
Date: Thu, 01 Aug 2019 19:00:07 -0700 (PDT)
Message-ID: <mhng-780916c8-0f2d-4487-b55c-2b1236e8778b@palmer-si-x1c4>
In-Reply-To: <mvmwofw68ji.fsf@suse.de>

On Thu, 01 Aug 2019 11:32:33 PDT (-0700), schwab@suse.de wrote:
> On Jul 30 2019, Paul Walmsley <paul.walmsley@sifive.com> wrote:
>
>> On Tue, 30 Jul 2019, Andreas Schwab wrote:
>>
>>> On Jul 30 2019, David Abdurachmanov <david.abdurachmanov@gmail.com> wrote:
>>>
>>> > On Mon, Jul 29, 2019 at 1:51 PM Andreas Schwab <schwab@suse.de> wrote:
>>> >>
>>> >> Since switching to 5.2 kernels I'm seeing random crashes and
>>> >> misbehaviors on the HiFive, for example while building gcc or glibc.
>>> >> Perhaps missing TLB flushes?
>>> >
>>> > Do you have some examples of crashes?
>>>
>>> While building glibc:
>>>
>>> an_ES.UTF-8...realloc(): invalid pointer
>>> /bin/sh: line 1:  7841 Aborted                 (core dumped) I18NPATH=. GCONV_PATH=/home/abuild/rpmbuild/BUILD/glibc-2.29/cc-base/iconvdata LC_ALL=C /home/abuild/rpmbuild/BUILD/glibc-2.29/cc-base/elf/ld-linux-riscv64-lp64d.so.1 --library-path /home/abuild/rpmbuild/BUILD/glibc-2.29/cc-base:/home/abuild/rpmbuild/BUILD/glibc-2.29/cc-base/math:/home/abuild/rpmbuild/BUILD/glibc-2.29/cc-base/elf:/home/abuild/rpmbuild/BUILD/glibc-2.29/cc-base/dlfcn:/home/abuild/rpmbuild/BUILD/glibc-2.29/cc-base/nss:/home/abuild/rpmbuild/BUILD/glibc-2.29/cc-base/nis:/home/abuild/rpmbuild/BUILD/glibc-2.29/cc-base/rt:/home/abuild/rpmbuild/BUILD/glibc-2.29/cc-base/resolv:/home/abuild/rpmbuild/BUILD/glibc-2.29/cc-base/mathvec:/home/abuild/rpmbuild/BUILD/glibc-2.29/cc-base/support:/home/abuild/rpmbuild/BUILD/glibc-2.29/cc-base/nptl /home/abuild/rpmbuild/BUILD/glibc-2.29/cc-base/locale/localedef $flags --alias-file=../intl/locale.alias -i locales/$input -f charmaps/$charset --prefix=/home/abuild/rpmbuild/BUILDROOT/glibc-2.29-0.riscv64
>>> make[2]: *** [Makefile:422: install-archive-an_ES.UTF-8/UTF-8] Error 134
>>>
>>> While building gcc:
>>>
>>> ../../gcc/ada/exp_aggr.adb: In function 'Exp_Aggr.Expand_N_Aggregate':
>>> ../../gcc/ada/exp_aggr.adb:5311:21: warning: 'Csiz' may be used uninitialized in this function [-Wmaybe-uninitialized]
>>> ../../gcc/ada/exp_aggr.adb:5220:10: note: 'Csiz' was declared here
>>> +===========================GNAT BUG DETECTED==============================+
>>> | 10.0.0 20190727 (experimental) [trunk revision 273844] (riscv64-suse-linux) |
>>> | Storage_Error stack overflow or erroneous memory access                  |
>>> | Error detected at output.ads:39:8                                        |
>>> realloc(): invalid pointer
>>
>> I personally haven't seen these issues; but then again, I haven't done any
>> glibc or gcc builds on v5.2.  Will take a closer look.
>
> I think there is some fundamental problem with SBI_REMOTE_SFENCE_VMA or
> the kernel interface to it.
>
> For example, flush_tlb_page is defined as:
>
> #define flush_tlb_page(vma, addr) flush_tlb_range(vma, addr, 0)
>
> But the third argument of flush_tlb_range is supposed to be the end
> address, so this should actually be:
>
> #define flush_tlb_page(vma, addr) flush_tlb_range(vma, addr, (addr) + PAGE_SIZE)
>
> Alas, that doesn't fix the crashes.
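
To make the end-address point above concrete, here is a minimal standalone
sketch (not kernel code).  It assumes a range flush derives its size as
end - start, which is an assumption made for illustration only; the names
DEMO_PAGE_SIZE, range_size, page_flush_size_buggy, page_flush_size_fixed and
demo_self_check are all hypothetical.

```c
#include <assert.h>

#define DEMO_PAGE_SIZE 4096UL /* hypothetical stand-in for PAGE_SIZE */

/* Assumed model: a range flush covers [start, end), so size = end - start. */
static unsigned long range_size(unsigned long start, unsigned long end)
{
	return end - start; /* unsigned arithmetic wraps when end < start */
}

/* Shape of the original macro: 0 is passed as the end address. */
static unsigned long page_flush_size_buggy(unsigned long addr)
{
	return range_size(addr, 0);
}

/* Shape of the proposed macro: addr + PAGE_SIZE is the end address. */
static unsigned long page_flush_size_fixed(unsigned long addr)
{
	return range_size(addr, addr + DEMO_PAGE_SIZE);
}

/* Small self-check: only the fixed form describes exactly one page. */
static void demo_self_check(void)
{
	unsigned long addr = 0x10000UL;

	assert(page_flush_size_fixed(addr) == DEMO_PAGE_SIZE);
	assert(page_flush_size_buggy(addr) != DEMO_PAGE_SIZE);
}
```

Under this assumed model the original form never describes a one-page range,
whatever the callee then does with the size it is handed.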

This line of reasoning smells like it'd find the issue: BBL just flushes the
entire TLB every time, but IIRC OpenSBI respects the ranges.  It looks like

    Fixes: 90cb4917b584 ("lib: Implement sfence.vma correctly.")

is what introduced the new behavior in OpenSBI, which may have triggered a lot
of latent bugs in Linux.  If you have an easy way to compile OpenSBI, does
something like

    $ git diff | cat
    diff --git a/lib/sbi/sbi_tlb.c b/lib/sbi/sbi_tlb.c
    index cffda52d66ab..007266b1f970 100644
    --- a/lib/sbi/sbi_tlb.c
    +++ b/lib/sbi/sbi_tlb.c
    @@ -133,50 +133,12 @@ static void sbi_tlb_flush_all(void)
    
     static void sbi_tlb_fifo_sfence_vma(struct sbi_tlb_info *tinfo)
     {
    -       unsigned long start = tinfo->start;
    -       unsigned long size  = tinfo->size;
    -       unsigned long i;
    -
    -       if ((start == 0 && size == 0) || (size == SBI_TLB_FLUSH_ALL)) {
    -               sbi_tlb_flush_all();
    -               return;
    -       }
    -
    -       for (i = 0; i < size; i += PAGE_SIZE) {
    -               __asm__ __volatile__("sfence.vma %0"
    -                                    :
    -                                    : "r"(start + i)
    -                                    : "memory");
    -       }
    +       sbi_tlb_flush_all();
     }
    
     static void sbi_tlb_fifo_sfence_vma_asid(struct sbi_tlb_info *tinfo)
     {
    -       unsigned long start = tinfo->start;
    -       unsigned long size  = tinfo->size;
    -       unsigned long asid  = tinfo->asid;
    -       unsigned long i;
    -
    -       if (start == 0 && size == 0) {
    -               sbi_tlb_flush_all();
    -               return;
    -       }
    -
    -       /* Flush entire MM context for a given ASID */
    -       if (size == SBI_TLB_FLUSH_ALL) {
    -               __asm__ __volatile__("sfence.vma x0, %0"
    -                                    :
    -                                    : "r"(asid)
    -                                    : "memory");
    -               return;
    -       }
    -
    -       for (i = 0; i < size; i += PAGE_SIZE) {
    -               __asm__ __volatile__("sfence.vma %0, %1"
    -                                    :
    -                                    : "r"(start + i), "r"(asid)
    -                                    : "memory");
    -       }
    +       sbi_tlb_flush_all();
     }
    
     void sbi_tlb_fifo_process(struct sbi_scratch *scratch, u32 event)

cause the issue to go away?  If so, then we probably need to scour Linux for
broken TLB flushing: the one you found is pretty obvious, so I'd bet there are
a lot more...
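
As a rough illustration of why a flush-everything firmware can hide this
class of bug, here is a toy model in plain C.  It is not OpenSBI or kernel
code: struct demo_tlb and every demo_* name are hypothetical, the "TLB" is
just an array of page addresses, and addresses are assumed page-aligned.
Only the loop shape in demo_flush_range mirrors the per-page loop in the
diff above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_PAGE_SIZE 4096UL /* hypothetical stand-in for PAGE_SIZE */
#define DEMO_TLB_SLOTS 8

/* Toy TLB: each slot caches one page-aligned virtual address. */
struct demo_tlb {
	unsigned long page[DEMO_TLB_SLOTS];
	bool valid[DEMO_TLB_SLOTS];
};

/* Populate every slot with consecutive pages starting at base. */
static void demo_tlb_fill(struct demo_tlb *tlb, unsigned long base)
{
	for (size_t s = 0; s < DEMO_TLB_SLOTS; s++) {
		tlb->page[s] = base + s * DEMO_PAGE_SIZE;
		tlb->valid[s] = true;
	}
}

/* BBL-like behaviour as described above: drop everything. */
static void demo_flush_all(struct demo_tlb *tlb)
{
	for (size_t s = 0; s < DEMO_TLB_SLOTS; s++)
		tlb->valid[s] = false;
}

/* Range-respecting behaviour: drop only pages in [start, start + size). */
static void demo_flush_range(struct demo_tlb *tlb, unsigned long start,
			     unsigned long size)
{
	for (unsigned long i = 0; i < size; i += DEMO_PAGE_SIZE)
		for (size_t s = 0; s < DEMO_TLB_SLOTS; s++)
			if (tlb->valid[s] && tlb->page[s] == start + i)
				tlb->valid[s] = false;
}

/* Number of entries that survived the flush. */
static size_t demo_valid_count(const struct demo_tlb *tlb)
{
	size_t n = 0;

	for (size_t s = 0; s < DEMO_TLB_SLOTS; s++)
		if (tlb->valid[s])
			n++;
	return n;
}
```

In this model a caller that passes a too-small size leaves stale entries
behind with demo_flush_range, while demo_flush_all leaves none regardless of
the arguments, so a wrong range goes unnoticed until the callee starts
honouring it.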

>
> Andreas.

