* [PATCH] powerpc/8xx: fix regression introduced by cache coherency rewrite
@ 2009-09-24 0:45 Rex Feany
2009-09-24 6:44 ` Benjamin Herrenschmidt
0 siblings, 1 reply; 58+ messages in thread
From: Rex Feany @ 2009-09-24 0:45 UTC (permalink / raw)
To: Benjamin Herrenschmidt; +Cc: linuxppc-dev
After upgrading to the latest kernel on my mpc875 userspace started
running incredibly slow (hours to get to a shell, even!).
I tracked it down to commit 8d30c14cab30d405a05f2aaceda1e9ad57800f36,
that patch removed a work-around for the 8xx. Adding it
back makes my problem go away.
Signed-off-by: Rex Feany <rfeany@mrv.com>
diff --git a/arch/powerpc/mm/pgtable.c b/arch/powerpc/mm/pgtable.c
index 627767d..d8e6725 100644
--- a/arch/powerpc/mm/pgtable.c
+++ b/arch/powerpc/mm/pgtable.c
@@ -30,6 +30,8 @@
#include <asm/tlbflush.h>
#include <asm/tlb.h>
+#include "mmu_decl.h"
+
static DEFINE_PER_CPU(struct pte_freelist_batch *, pte_freelist_cur);
static unsigned long pte_freelist_forced_free;
@@ -119,7 +121,7 @@ void pte_free_finish(void)
/*
* Handle i/d cache flushing, called from set_pte_at() or ptep_set_access_flags()
*/
-static pte_t do_dcache_icache_coherency(pte_t pte)
+static pte_t do_dcache_icache_coherency(pte_t pte, unsigned long addr)
{
unsigned long pfn = pte_pfn(pte);
struct page *page;
@@ -128,6 +130,17 @@ static pte_t do_dcache_icache_coherency(pte_t pte)
return pte;
page = pfn_to_page(pfn);
+#ifdef CONFIG_8xx
+ /* On 8xx, cache control instructions (particularly
+ * "dcbst" from flush_dcache_icache) fault as write
+ * operation if there is an unpopulated TLB entry
+ * for the address in question. To workaround that,
+ * we invalidate the TLB here, thus avoiding dcbst
+ * misbehaviour.
+ */
+ _tlbil_va(addr, 0 /* 8xx doesn't care about PID */);
+#endif
+
if (!PageReserved(page) && !test_bit(PG_arch_1, &page->flags)) {
pr_devel("do_dcache_icache_coherency... flushing\n");
flush_dcache_icache_page(page);
@@ -198,7 +211,7 @@ void set_pte_at(struct mm_struct *mm, unsigned long addr, pte_t *ptep, pte_t pte
*/
pte = __pte(pte_val(pte) & ~_PAGE_HPTEFLAGS);
if (pte_need_exec_flush(pte, 1))
- pte = do_dcache_icache_coherency(pte);
+ pte = do_dcache_icache_coherency(pte, addr);
/* Perform the setting of the PTE */
__set_pte_at(mm, addr, ptep, pte, 0);
@@ -216,7 +229,7 @@ int ptep_set_access_flags(struct vm_area_struct *vma, unsigned long address,
{
int changed;
if (!dirty && pte_need_exec_flush(entry, 0))
- entry = do_dcache_icache_coherency(entry);
+ entry = do_dcache_icache_coherency(entry, address);
changed = !pte_same(*(ptep), entry);
if (changed) {
if (!(vma->vm_flags & VM_HUGETLB))
^ permalink raw reply related [flat|nested] 58+ messages in thread
* Re: [PATCH] powerpc/8xx: fix regression introduced by cache coherency rewrite
2009-09-24 0:45 [PATCH] powerpc/8xx: fix regression introduced by cache coherency rewrite Rex Feany
@ 2009-09-24 6:44 ` Benjamin Herrenschmidt
2009-09-24 23:33 ` Rex Feany
0 siblings, 1 reply; 58+ messages in thread
From: Benjamin Herrenschmidt @ 2009-09-24 6:44 UTC (permalink / raw)
To: Rex Feany; +Cc: linuxppc-dev
On Wed, 2009-09-23 at 17:45 -0700, Rex Feany wrote:
> After upgrading to the latest kernel on my mpc875 userspace started
> running incredibly slow (hours to get to a shell, even!).
> I tracked it down to commit 8d30c14cab30d405a05f2aaceda1e9ad57800f36,
> that patch removed a work-around for the 8xx. Adding it
> back makes my problem go away.
>
> Signed-off-by: Rex Feany <rfeany@mrv.com>
Note that while your patch applies to earlier kernels such as 2.6.31, it
doesn't apply with current Linus tree due to changes in that file. I've
updated the powerpc "merge" branch of my tree with a hand-modified
version, please test and let me know. That should also contain the
_PAGE_SPECIAL fix.
You can get my tree at:
git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc.git
(To get the "merge" branch, just add "merge" after the clone if you
are cloning it, or just create a local branch and manually pull
into it)
Note that i just pushed out so it may take a little while for
the kernel.org mirrors to get it, the commit ID is:
ae48e383e3299c2f87723d21df2db97927774b1e
Cheers,
Ben.
> diff --git a/arch/powerpc/mm/pgtable.c b/arch/powerpc/mm/pgtable.c
> index 627767d..d8e6725 100644
> --- a/arch/powerpc/mm/pgtable.c
> +++ b/arch/powerpc/mm/pgtable.c
> @@ -30,6 +30,8 @@
> #include <asm/tlbflush.h>
> #include <asm/tlb.h>
>
> +#include "mmu_decl.h"
> +
> static DEFINE_PER_CPU(struct pte_freelist_batch *, pte_freelist_cur);
> static unsigned long pte_freelist_forced_free;
>
> @@ -119,7 +121,7 @@ void pte_free_finish(void)
> /*
> * Handle i/d cache flushing, called from set_pte_at() or ptep_set_access_flags()
> */
> -static pte_t do_dcache_icache_coherency(pte_t pte)
> +static pte_t do_dcache_icache_coherency(pte_t pte, unsigned long addr)
> {
> unsigned long pfn = pte_pfn(pte);
> struct page *page;
> @@ -128,6 +130,17 @@ static pte_t do_dcache_icache_coherency(pte_t pte)
> return pte;
> page = pfn_to_page(pfn);
>
> +#ifdef CONFIG_8xx
> + /* On 8xx, cache control instructions (particularly
> + * "dcbst" from flush_dcache_icache) fault as write
> + * operation if there is an unpopulated TLB entry
> + * for the address in question. To workaround that,
> + * we invalidate the TLB here, thus avoiding dcbst
> + * misbehaviour.
> + */
> + _tlbil_va(addr, 0 /* 8xx doesn't care about PID */);
> +#endif
> +
> if (!PageReserved(page) && !test_bit(PG_arch_1, &page->flags)) {
> pr_devel("do_dcache_icache_coherency... flushing\n");
> flush_dcache_icache_page(page);
> @@ -198,7 +211,7 @@ void set_pte_at(struct mm_struct *mm, unsigned long addr, pte_t *ptep, pte_t pte
> */
> pte = __pte(pte_val(pte) & ~_PAGE_HPTEFLAGS);
> if (pte_need_exec_flush(pte, 1))
> - pte = do_dcache_icache_coherency(pte);
> + pte = do_dcache_icache_coherency(pte, addr);
>
> /* Perform the setting of the PTE */
> __set_pte_at(mm, addr, ptep, pte, 0);
> @@ -216,7 +229,7 @@ int ptep_set_access_flags(struct vm_area_struct *vma, unsigned long address,
> {
> int changed;
> if (!dirty && pte_need_exec_flush(entry, 0))
> - entry = do_dcache_icache_coherency(entry);
> + entry = do_dcache_icache_coherency(entry, address);
> changed = !pte_same(*(ptep), entry);
> if (changed) {
> if (!(vma->vm_flags & VM_HUGETLB))
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH] powerpc/8xx: fix regression introduced by cache coherency rewrite
2009-09-24 6:44 ` Benjamin Herrenschmidt
@ 2009-09-24 23:33 ` Rex Feany
2009-09-24 23:52 ` Benjamin Herrenschmidt
0 siblings, 1 reply; 58+ messages in thread
From: Rex Feany @ 2009-09-24 23:33 UTC (permalink / raw)
To: Benjamin Herrenschmidt; +Cc: linuxppc-dev
Thus spake Benjamin Herrenschmidt (benh@kernel.crashing.org):
> You can get my tree at:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc.git
>
> (To get the "merge" branch, just add "merge" after the clone if you
> are cloning it, or just create a local branch and manually pull
> into it)
Your tree hangs on boot, similar to what I saw without the 8xx
work-around patch -- it is hard to tell if it is the same though. :(
> Note that i just pushed out so it may take a little while for
> the kernel.org mirrors to get it, the commit ID is:
>
> ae48e383e3299c2f87723d21df2db97927774b1e
Maybe I fail git 101 -- I can't seem to find that commit ID in the repo
above, but I do see the patches you commited to fix PAGE_SPECIAL and the
8xx work-around.
thanks!
/rex.
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH] powerpc/8xx: fix regression introduced by cache coherency rewrite
2009-09-24 23:33 ` Rex Feany
@ 2009-09-24 23:52 ` Benjamin Herrenschmidt
2009-09-25 1:35 ` Rex Feany
0 siblings, 1 reply; 58+ messages in thread
From: Benjamin Herrenschmidt @ 2009-09-24 23:52 UTC (permalink / raw)
To: Rex Feany; +Cc: linuxppc-dev
On Thu, 2009-09-24 at 16:33 -0700, Rex Feany wrote:
> Thus spake Benjamin Herrenschmidt (benh@kernel.crashing.org):
>
> > You can get my tree at:
> >
> > git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc.git
> >
> > (To get the "merge" branch, just add "merge" after the clone if you
> > are cloning it, or just create a local branch and manually pull
> > into it)
>
> Your tree hangs on boot, similar to what I saw without the 8xx
> work-around patch -- it is hard to tell if it is the same though. :(
There's no backtrace ? Where does it hang ? Also which workaround
patch ? The missing tlbil_va() or the _PAGE_SPECIAL problem ?
> > Note that i just pushed out so it may take a little while for
> > the kernel.org mirrors to get it, the commit ID is:
> >
> > ae48e383e3299c2f87723d21df2db97927774b1e
>
> Maybe I fail git 101 -- I can't seem to find that commit ID in the repo
> above, but I do see the patches you commited to fix PAGE_SPECIAL and the
> 8xx work-around.
Right. I must have screwed something with the ID.
Ben.
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH] powerpc/8xx: fix regression introduced by cache coherency rewrite
2009-09-24 23:52 ` Benjamin Herrenschmidt
@ 2009-09-25 1:35 ` Rex Feany
2009-09-25 1:51 ` Benjamin Herrenschmidt
0 siblings, 1 reply; 58+ messages in thread
From: Rex Feany @ 2009-09-25 1:35 UTC (permalink / raw)
To: Benjamin Herrenschmidt; +Cc: linuxppc-dev
Thus spake Benjamin Herrenschmidt (benh@kernel.crashing.org):
> > Your tree hangs on boot, similar to what I saw without the 8xx
> > work-around patch -- it is hard to tell if it is the same though. :(
>
> There's no backtrace ? Where does it hang ? Also which workaround
> patch ? The missing tlbil_va() or the _PAGE_SPECIAL problem ?
>
Ben, I'm sorry, my last email was basically useless. I was refering to the
missing tlbil_va(). The system doesn't crash, but it does seem to hang right
after "Freeing unused kernel memory: 100k init" using your tree.
If I move the tlbil_va() outside of the test for PG_arch_1 :
diff --git a/arch/powerpc/mm/pgtable.c b/arch/powerpc/mm/pgtable.c
index 5304093..d927269 100644
--- a/arch/powerpc/mm/pgtable.c
+++ b/arch/powerpc/mm/pgtable.c
@@ -176,7 +176,7 @@ static pte_t set_pte_filter(pte_t pte, unsigned long addr)
struct page *pg = maybe_pte_to_page(pte);
if (!pg)
return pte;
- if (!test_bit(PG_arch_1, &pg->flags)) {
+
#ifdef CONFIG_8xx
/* On 8xx, cache control instructions (particularly
* "dcbst" from flush_dcache_icache) fault as write
@@ -188,6 +188,8 @@ static pte_t set_pte_filter(pte_t pte, unsigned long addr)
/* 8xx doesn't care about PID, size or ind args */
_tlbil_va(addr, 0, 0, 0);
#endif /* CONFIG_8xx */
+
+ if (!test_bit(PG_arch_1, &pg->flags)) {
flush_dcache_icache_page(pg);
set_bit(PG_arch_1, &pg->flags);
}
Then I can boot and get to a shell, but userspace is slow. 8 seconds to mount
/proc (vs. less then a second using my old kernel)! Maybe this is an
unrelated issue? I'm pretty clueless about the details, I'm sorry.
PG_arch_1 is used to prevent a cache flush unless it is actually needed?
Then why would changing the location of the tlbil_va() make a difference?
take care!
/rex.
^ permalink raw reply related [flat|nested] 58+ messages in thread
* Re: [PATCH] powerpc/8xx: fix regression introduced by cache coherency rewrite
2009-09-25 1:35 ` Rex Feany
@ 2009-09-25 1:51 ` Benjamin Herrenschmidt
2009-09-25 3:03 ` Benjamin Herrenschmidt
2009-09-29 1:21 ` Rex Feany
0 siblings, 2 replies; 58+ messages in thread
From: Benjamin Herrenschmidt @ 2009-09-25 1:51 UTC (permalink / raw)
To: Rex Feany; +Cc: linuxppc-dev
On Thu, 2009-09-24 at 18:35 -0700, Rex Feany wrote:
>
> Then I can boot and get to a shell, but userspace is slow. 8 seconds
> to mount
> /proc (vs. less then a second using my old kernel)! Maybe this is an
> unrelated issue? I'm pretty clueless about the details, I'm sorry.
> PG_arch_1 is used to prevent a cache flush unless it is actually
> needed?
> Then why would changing the location of the tlbil_va() make a
> difference?
I think there's more finishyness to 8xx than we thought. IE. That
tlbil_va might have more reasons to be there than what the comment
seems to advertize. Can you try to move it even higher up ? IE.
Unconditionally at the beginning of set_pte_filter ?
Also, if that doesn't help, can you try putting one in
set_access_flags_filter() just below ?
(Beware that there's two different versions of both functions, only the
first one is compiled/used on 8xx).
It's going to be hard for me to get that "right" since I don't really
know what's going on with the core here, but I suppose if we get it
moving along with extra tlb invalidations, that should be "good enough"
until somebody who really knows what's going on comes up with possibly
a better fix.
Cheers,
Ben.
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH] powerpc/8xx: fix regression introduced by cache coherency rewrite
2009-09-25 1:51 ` Benjamin Herrenschmidt
@ 2009-09-25 3:03 ` Benjamin Herrenschmidt
2009-09-25 8:31 ` Joakim Tjernlund
2009-09-25 21:18 ` Rex Feany
2009-09-29 1:21 ` Rex Feany
1 sibling, 2 replies; 58+ messages in thread
From: Benjamin Herrenschmidt @ 2009-09-25 3:03 UTC (permalink / raw)
To: Rex Feany; +Cc: linuxppc-dev
> I think there's more finishyness to 8xx than we thought. IE. That
> tlbil_va might have more reasons to be there than what the comment
> seems to advertize. Can you try to move it even higher up ? IE.
> Unconditionally at the beginning of set_pte_filter ?
>
> Also, if that doesn't help, can you try putting one in
> set_access_flags_filter() just below ?
Ok, I got a refresher on the whole concept of "unpopulated TLB entries"
on 8xx, and that's damn scary. I think what mislead me initially is that
the comment around the workaround is simply not properly describing the
extent of the problem :-)
So I'm not going to make the 8xx TLB miss code sane, that's beyond what
I'm prepare to do with it, but I suspect that this should fix it (on top
of upstream). Let me know if that's enough or if we also need to put
one of these in ptep_set_access_flags().
Please let me know if that works for you.
Cheers,
Ben.
diff --git a/arch/powerpc/mm/pgtable.c b/arch/powerpc/mm/pgtable.c
index 5304093..7a8e676 100644
--- a/arch/powerpc/mm/pgtable.c
+++ b/arch/powerpc/mm/pgtable.c
@@ -170,6 +170,16 @@ struct page * maybe_pte_to_page(pte_t pte)
static pte_t set_pte_filter(pte_t pte, unsigned long addr)
{
+#ifdef CONFIG_8xx
+ /* 8xx has a weird concept of "unpopulated" entries. When we take
+ * a TLB miss for a non-valid PTE, we insert such an entry which
+ * causes a page fault the next time around. This entry must now
+ * be kicked out or we'll just fault again
+ */
+ /* 8xx doesn't care about PID, size or ind args */
+ _tlbil_va(addr, 0, 0, 0);
+#endif /* CONFIG_8xx */
+
pte = __pte(pte_val(pte) & ~_PAGE_HPTEFLAGS);
if (pte_looks_normal(pte) && !(cpu_has_feature(CPU_FTR_COHERENT_ICACHE) ||
cpu_has_feature(CPU_FTR_NOEXECUTE))) {
@@ -177,17 +187,6 @@ static pte_t set_pte_filter(pte_t pte, unsigned long addr)
if (!pg)
return pte;
if (!test_bit(PG_arch_1, &pg->flags)) {
-#ifdef CONFIG_8xx
- /* On 8xx, cache control instructions (particularly
- * "dcbst" from flush_dcache_icache) fault as write
- * operation if there is an unpopulated TLB entry
- * for the address in question. To workaround that,
- * we invalidate the TLB here, thus avoiding dcbst
- * misbehaviour.
- */
- /* 8xx doesn't care about PID, size or ind args */
- _tlbil_va(addr, 0, 0, 0);
-#endif /* CONFIG_8xx */
flush_dcache_icache_page(pg);
set_bit(PG_arch_1, &pg->flags);
}
^ permalink raw reply related [flat|nested] 58+ messages in thread
* Re: [PATCH] powerpc/8xx: fix regression introduced by cache coherency rewrite
2009-09-25 3:03 ` Benjamin Herrenschmidt
@ 2009-09-25 8:31 ` Joakim Tjernlund
2009-09-25 9:47 ` Benjamin Herrenschmidt
2009-09-25 21:18 ` Rex Feany
1 sibling, 1 reply; 58+ messages in thread
From: Joakim Tjernlund @ 2009-09-25 8:31 UTC (permalink / raw)
To: Benjamin Herrenschmidt; +Cc: linuxppc-dev, Rex Feany
>
>
> > I think there's more finishyness to 8xx than we thought. IE. That
> > tlbil_va might have more reasons to be there than what the comment
> > seems to advertize. Can you try to move it even higher up ? IE.
> > Unconditionally at the beginning of set_pte_filter ?
> >
> > Also, if that doesn't help, can you try putting one in
> > set_access_flags_filter() just below ?
>
> Ok, I got a refresher on the whole concept of "unpopulated TLB entries"
> on 8xx, and that's damn scary. I think what mislead me initially is that
> the comment around the workaround is simply not properly describing the
> extent of the problem :-)
>
> So I'm not going to make the 8xx TLB miss code sane, that's beyond what
> I'm prepare to do with it, but I suspect that this should fix it (on top
> of upstream). Let me know if that's enough or if we also need to put
> one of these in ptep_set_access_flags().
>
> Please let me know if that works for you.
>
> Cheers,
> Ben.
>
> diff --git a/arch/powerpc/mm/pgtable.c b/arch/powerpc/mm/pgtable.c
> index 5304093..7a8e676 100644
> --- a/arch/powerpc/mm/pgtable.c
> +++ b/arch/powerpc/mm/pgtable.c
> @@ -170,6 +170,16 @@ struct page * maybe_pte_to_page(pte_t pte)
>
> static pte_t set_pte_filter(pte_t pte, unsigned long addr)
> {
> +#ifdef CONFIG_8xx
> + /* 8xx has a weird concept of "unpopulated" entries. When we take
> + * a TLB miss for a non-valid PTE, we insert such an entry which
> + * causes a page fault the next time around. This entry must now
> + * be kicked out or we'll just fault again
> + */
> + /* 8xx doesn't care about PID, size or ind args */
> + _tlbil_va(addr, 0, 0, 0);
> +#endif /* CONFIG_8xx */
> +
The main problem with 8xx it does not update the DAR register in
the TLB Miss/Fault handlers for cache instructions :( It on old bug
that was found only some years ago.
I think the old comment is correct though, as I recall it was Marcelo
that found the problem and added the workaround.
Jocke
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH] powerpc/8xx: fix regression introduced by cache coherency rewrite
2009-09-25 8:31 ` Joakim Tjernlund
@ 2009-09-25 9:47 ` Benjamin Herrenschmidt
2009-09-25 10:21 ` Joakim Tjernlund
0 siblings, 1 reply; 58+ messages in thread
From: Benjamin Herrenschmidt @ 2009-09-25 9:47 UTC (permalink / raw)
To: Joakim Tjernlund; +Cc: linuxppc-dev, Rex Feany
On Fri, 2009-09-25 at 10:31 +0200, Joakim Tjernlund wrote:
>
> The main problem with 8xx it does not update the DAR register in
> the TLB Miss/Fault handlers for cache instructions :( It on old bug
> that was found only some years ago.
>
> I think the old comment is correct though, as I recall it was Marcelo
> that found the problem and added the workaround.
But the TLB needs flushing on more than just the cache instructions,
no ?
IE. We take a TLB miss, there's no valid PTE, we put one of those
"unpopulated" entries in and get into the page fault, at which point we
do a set_pte, we -still- need to do an invalidation to get rid of the
unpopulated entry so it gets a new TLB miss no ? Without that, it's just
going to fault over and over again...
In any case, I think flushing unconditionally the target address isn't
going to hurt since we are just changing its PTE anyways.
As for the DAR problem, I'm not sure whether we really need a workaround
since I haven't seem much people complaining about it so far :-)
Can you educate me more on the problem ? Can it be fixed without
bloating those handlers to oblivion ?
Cheers,
Ben.
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH] powerpc/8xx: fix regression introduced by cache coherency rewrite
2009-09-25 9:47 ` Benjamin Herrenschmidt
@ 2009-09-25 10:21 ` Joakim Tjernlund
0 siblings, 0 replies; 58+ messages in thread
From: Joakim Tjernlund @ 2009-09-25 10:21 UTC (permalink / raw)
To: Benjamin Herrenschmidt; +Cc: linuxppc-dev, Rex Feany
[-- Attachment #1: Type: text/plain, Size: 2112 bytes --]
Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote on 25/09/2009 11:47:34:
>
> On Fri, 2009-09-25 at 10:31 +0200, Joakim Tjernlund wrote:
> >
> > The main problem with 8xx it does not update the DAR register in
> > the TLB Miss/Fault handlers for cache instructions :( It on old bug
> > that was found only some years ago.
> >
> > I think the old comment is correct though, as I recall it was Marcelo
> > that found the problem and added the workaround.
>
> But the TLB needs flushing on more than just the cache instructions,
> no ?
>
> IE. We take a TLB miss, there's no valid PTE, we put one of those
> "unpopulated" entries in and get into the page fault, at which point we
> do a set_pte, we -still- need to do an invalidation to get rid of the
> unpopulated entry so it gets a new TLB miss no ? Without that, it's just
> going to fault over and over again...
I don't know enough about 8xx in 2.6 as we still use 2.4 for 8xx to
say for sure.
>
> In any case, I think flushing unconditionally the target address isn't
> going to hurt since we are just changing its PTE anyways.
>
> As for the DAR problem, I'm not sure whether we really need a workaround
> since I haven't seem much people complaining about it so far :-)
I did some years ago on 2.4 but no one cared enough :(
The drawbacks of not handling this problem is that you will have
to very carful to use cache instructions and user space must
be especially compiled to omit using them in optimizations.
>
> Can you educate me more on the problem ? Can it be fixed without
> bloating those handlers to oblivion ?
Yes, I fixed it for myself but the fix was never accepted. Currently
only TLB Error depends on DAR so what I did was to tag DAR with an impossible
value and test for that value in the TLB Error handler. If it matched I
branched to a subroutine the did instruction decoding in assembler to
get at registers used and calculate DAR, then return to the TLB error
handler. In hindsight it would have been better to do this work in
handle_page_fault.
I am attaching my old head_8xx.S for 2.4
Jocke
(See attached file: head_8xx.S)
[-- Attachment #2: head_8xx.S --]
[-- Type: application/octet-stream, Size: 33768 bytes --]
/*
* arch/ppc/kernel/except_8xx.S
*
* PowerPC version
* Copyright (C) 1995-1996 Gary Thomas (gdt@linuxppc.org)
* Rewritten by Cort Dougan (cort@cs.nmt.edu) for PReP
* Copyright (C) 1996 Cort Dougan <cort@cs.nmt.edu>
* Low-level exception handlers and MMU support
* rewritten by Paul Mackerras.
* Copyright (C) 1996 Paul Mackerras.
* MPC8xx modifications by Dan Malek
* Copyright (C) 1997 Dan Malek (dmalek@jlc.net).
*
* This file contains low-level support and setup for PowerPC 8xx
* embedded processors, including trap and interrupt dispatch.
*
* This program is free software; you can redistribute it and/or
* modify it under the terms of the GNU General Public License
* as published by the Free Software Foundation; either version
* 2 of the License, or (at your option) any later version.
*
*/
#include <linux/config.h>
#include <asm/processor.h>
#include <asm/page.h>
#include <asm/mmu.h>
#include <asm/cache.h>
#include <asm/pgtable.h>
#include <asm/cputable.h>
#include <asm/ppc_asm.h>
#include "ppc_defs.h"
#ifdef CONFIG_8xx_DCBxFIXED
/* These macros are used to tag DAR with a known value so that the
* DataTLBError can recognize a buggy dcbx instruction and workaround
* the problem.
*/
#define TAG_VAL 0x00f0 /* -1 may also be used */
#define TAG_DAR_R20 \
li r20, TAG_VAL;\
mtspr DAR, r20;
#else
#define TAG_DAR_R20
#endif
/* Macro to make the code more readable. */
#ifdef CONFIG_8xx_CPU6
#define DO_8xx_CPU6(val, reg) \
li reg, val; \
stw reg, 12(r0); \
lwz reg, 12(r0);
#else
#define DO_8xx_CPU6(val, reg)
#endif
.text
.globl _stext
_stext:
/*
* _start is defined this way because the XCOFF loader in the OpenFirmware
* on the powermac expects the entry point to be a procedure descriptor.
*/
.text
.globl _start
_start:
/* MPC8xx
* This port was done on an MBX board with an 860. Right now I only
* support an ELF compressed (zImage) boot from EPPC-Bug because the
* code there loads up some registers before calling us:
* r3: ptr to board info data
* r4: initrd_start or if no initrd then 0
* r5: initrd_end - unused if r4 is 0
* r6: Start of command line string
* r7: End of command line string
*
* I decided to use conditional compilation instead of checking PVR and
* adding more processor specific branches around code I don't need.
* Since this is an embedded processor, I also appreciate any memory
* savings I can get.
*
* The MPC8xx does not have any BATs, but it supports large page sizes.
* We first initialize the MMU to support 8M byte pages, then load one
* entry into each of the instruction and data TLBs to map the first
* 8M 1:1. I also mapped an additional I/O space 1:1 so we can get to
* the "internal" processor registers before MMU_init is called.
*
* The TLB code currently contains a major hack. Since I use the condition
* code register, I have to save and restore it. I am out of registers, so
* I just store it in memory location 0 (the TLB handlers are not reentrant).
* To avoid making any decisions, I need to use the "segment" valid bit
* in the first level table, but that would require many changes to the
* Linux page directory/table functions that I don't want to do right now.
*
* I used to use SPRG2 for a temporary register in the TLB handler, but it
* has since been put to other uses. I now use a hack to save a register
* and the CCR at memory location 0.....Someday I'll fix this.....
* -- Dan
*/
.globl __start
__start:
/* To accomodate some SMP systems that overwrite the first few
* locations before cpu 0 starts, the bootloader starts us at 0xc.
*/
nop
nop
nop
mr r31,r3 /* save parameters */
mr r30,r4
mr r29,r5
mr r28,r6
mr r27,r7
li r24,0 /* cpu # */
/* We have to turn on the MMU right away so we get cache modes
* set correctly.
*/
bl initial_mmu
/* We now have the lower 8 Meg mapped into TLB entries, and the caches
* ready to work.
*/
turn_on_mmu:
mfmsr r0
ori r0,r0,MSR_DR|MSR_IR
mtspr SRR1,r0
lis r0,start_here@h
ori r0,r0,start_here@l
mtspr SRR0,r0
SYNC
rfi /* enables MMU */
/*
* Exception entry code. This code runs with address translation
* turned off, i.e. using physical addresses.
* We assume sprg3 has the physical address of the current
* task's thread_struct.
*/
#define EXCEPTION_PROLOG \
mtspr SPRG0,r20; \
mtspr SPRG1,r21; \
mfcr r20; \
mfspr r21,SPRG2; /* exception stack to use from */ \
cmpwi 0,r21,0; /* user mode or RTAS */ \
bne 1f; \
tophys(r21,r1); /* use tophys(kernel sp) otherwise */ \
subi r21,r21,INT_FRAME_SIZE; /* alloc exc. frame */\
1: stw r20,_CCR(r21); /* save registers */ \
stw r22,GPR22(r21); \
stw r23,GPR23(r21); \
mfspr r20,SPRG0; \
stw r20,GPR20(r21); \
mfspr r22,SPRG1; \
stw r22,GPR21(r21); \
mflr r20; \
stw r20,_LINK(r21); \
mfctr r22; \
stw r22,_CTR(r21); \
mfspr r20,XER; \
stw r20,_XER(r21); \
mfspr r22,SRR0; \
mfspr r23,SRR1; \
stw r0,GPR0(r21); \
stw r1,GPR1(r21); \
stw r2,GPR2(r21); \
stw r1,0(r21); \
tovirt(r1,r21); /* set new kernel sp */ \
SAVE_4GPRS(3, r21); \
SAVE_GPR(7, r21);
/*
* Note: code which follows this uses cr0.eq (set if from kernel),
* r21, r22 (SRR0), and r23 (SRR1).
*/
/*
* Exception vectors.
*/
#define FINISH_EXCEPTION(func) \
bl transfer_to_handler; \
.long func; \
.long ret_from_except
#define STD_EXCEPTION(n, label, hdlr) \
. = n; \
label: \
EXCEPTION_PROLOG; \
TAG_DAR_R20; \
addi r3,r1,STACK_FRAME_OVERHEAD; \
li r20,MSR_KERNEL; \
FINISH_EXCEPTION(hdlr)
/* System reset */
STD_EXCEPTION(0x100, Reset, UnknownException)
/* Machine check */
STD_EXCEPTION(0x200, MachineCheck, MachineCheckException)
/* Data access exception.
* This is "never generated" by the MPC8xx. We jump to it for other
* translation errors.
*/
. = 0x300
DataAccess:
EXCEPTION_PROLOG
mfspr r20,DSISR
stw r20,_DSISR(r21)
mr r5,r20
mfspr r4,DAR
stw r4,_DAR(r21)
TAG_DAR_R20
addi r3,r1,STACK_FRAME_OVERHEAD
li r20,MSR_KERNEL
rlwimi r20,r23,0,16,16 /* copy EE bit from saved MSR */
FINISH_EXCEPTION(do_page_fault)
/* Instruction access exception.
* This is "never generated" by the MPC8xx. We jump to it for other
* translation errors.
*/
. = 0x400
InstructionAccess:
EXCEPTION_PROLOG
addi r3,r1,STACK_FRAME_OVERHEAD
mr r4,r22
mr r5,r23
li r20,MSR_KERNEL
rlwimi r20,r23,0,16,16 /* copy EE bit from saved MSR */
FINISH_EXCEPTION(do_page_fault)
/* External interrupt */
. = 0x500;
HardwareInterrupt:
EXCEPTION_PROLOG;
addi r3,r1,STACK_FRAME_OVERHEAD
li r20,MSR_KERNEL
li r4,0
bl transfer_to_handler
.globl do_IRQ_intercept
do_IRQ_intercept:
.long do_IRQ;
.long ret_from_intercept
/* Alignment exception */
. = 0x600
Alignment:
EXCEPTION_PROLOG
mfspr r4,DAR
stw r4,_DAR(r21)
TAG_DAR_R20
mfspr r5,DSISR
stw r5,_DSISR(r21)
addi r3,r1,STACK_FRAME_OVERHEAD
li r20,MSR_KERNEL
rlwimi r20,r23,0,16,16 /* copy EE bit from saved MSR */
FINISH_EXCEPTION(AlignmentException)
/* Program check exception */
. = 0x700
ProgramCheck:
EXCEPTION_PROLOG
addi r3,r1,STACK_FRAME_OVERHEAD
li r20,MSR_KERNEL
rlwimi r20,r23,0,16,16 /* copy EE bit from saved MSR */
FINISH_EXCEPTION(ProgramCheckException)
/* No FPU on MPC8xx. This exception is not supposed to happen.
*/
STD_EXCEPTION(0x800, FPUnavailable, UnknownException)
. = 0x900
Decrementer:
EXCEPTION_PROLOG
addi r3,r1,STACK_FRAME_OVERHEAD
li r20,MSR_KERNEL
bl transfer_to_handler
.globl timer_interrupt_intercept
timer_interrupt_intercept:
.long timer_interrupt
.long ret_from_intercept
STD_EXCEPTION(0xa00, Trap_0a, UnknownException)
STD_EXCEPTION(0xb00, Trap_0b, UnknownException)
/* System call */
. = 0xc00
SystemCall:
EXCEPTION_PROLOG
stw r3,ORIG_GPR3(r21)
li r20,MSR_KERNEL
rlwimi r20,r23,0,16,16 /* copy EE bit from saved MSR */
FINISH_EXCEPTION(DoSyscall)
/* Single step - not used on 601 */
STD_EXCEPTION(0xd00, SingleStep, SingleStepException)
STD_EXCEPTION(0xe00, Trap_0e, UnknownException)
STD_EXCEPTION(0xf00, Trap_0f, UnknownException)
/* On the MPC8xx, this is a software emulation interrupt. It occurs
* for all unimplemented and illegal instructions.
*/
STD_EXCEPTION(0x1000, SoftEmu, SoftwareEmulation)
. = 0x1100
/*
* For the MPC8xx, this is a software tablewalk to load the instruction
* TLB. It is modelled after the example in the Motorola manual. The task
* switch loads the M_TWB register with the pointer to the first level table.
* If we discover there is no second level table (the value is zero), the
* plan was to load that into the TLB, which causes another fault into the
* TLB Error interrupt where we can handle such problems. However, that did
* not work, so if we discover there is no second level table, we restore
* registers and branch to the error exception. We have to use the MD_xxx
* registers for the tablewalk because the equivalent MI_xxx registers
* only perform the attribute functions.
*/
InstructionTLBMiss:
#ifdef CONFIG_8xx_CPU6
stw r3, 8(r0)
li r3, 0x3f80
stw r3, 12(r0)
lwz r3, 12(r0)
#endif
mtspr M_TW, r20 /* Save a couple of working registers */
#if !CONFIG_PIN_TLB || CONFIG_MODULES
mfcr r20
stw r20, 0(r0)
#endif
stw r21, 4(r0)
mfspr r20, SRR0 /* Get effective address of fault */
#ifdef CONFIG_8xx_CPU6
li r3, 0x3780
stw r3, 12(r0)
lwz r3, 12(r0)
#endif
mtspr MD_EPN, r20 /* Have to use MD_EPN for walk, MI_EPN can't */
mfspr r20, M_TWB /* Get level 1 table entry address */
#if !CONFIG_PIN_TLB || CONFIG_MODULES
/* If we are faulting a kernel address, we have to use the
* kernel page tables.
*/
andi. r21, r20, 0x0800 /* Address >= 0x80000000 */
beq 3f
lis r21, swapper_pg_dir@h
ori r21, r21, swapper_pg_dir@l
rlwimi r20, r21, 0, 2, 19
3:
lwz r21, 0(r20) /* Get the level 1 entry */
rlwinm. r20, r21,0,0,19 /* Extract page descriptor page address */
tophys(r21,r21)
ori r21,r21,1 /* Set valid bit */
beq 2f /* If zero, don't try to find a pte */
#else
lwz r21, 0(r20) /* Get the level 1 entry */
mfcr r20
cmplwi cr0,r21,0x0fff /* Test page descriptor page address */
tophys(r21,r21)
ori r21,r21,1 /* Set valid bit */
bng- 2f /* If zero, don't try to find a pte */
mtcr r20
#endif
/* We have a pte table, so load the MI_TWC with the attributes
* for this "segment."
*/
#ifdef CONFIG_8xx_CPU6
li r3, 0x2b80
stw r3, 12(r0)
lwz r3, 12(r0)
#endif
mtspr MI_TWC, r21 /* Set segment attributes */
#ifdef CONFIG_8xx_CPU6
li r3, 0x3b80
stw r3, 12(r0)
lwz r3, 12(r0)
#endif
mtspr MD_TWC, r21 /* Load pte table base address */
mfspr r21, MD_TWC /* ....and get the pte address */
lwz r20, 0(r21) /* Get the pte */
ori r20, r20, _PAGE_ACCESSED
stw r20, 0(r21)
/* The Linux PTE won't go exactly into the MMU TLB.
* Software indicator bits 21, 22 and 28 must be clear.
* Software indicator bits 24, 25, 26, and 27 must be
* set. All other Linux PTE bits control the behavior
* of the MMU.
*/
li r21, 0x00f0
rlwimi r20, r21, 0, 24, 28 /* Set 24-27, clear 28 */
#ifdef CONFIG_8xx_CPU6
li r3, 0x2d80
stw r3, 12(r0)
lwz r3, 12(r0)
#endif
mtspr MI_RPN, r20 /* Update TLB entry */
mfspr r20, M_TW /* Restore registers */
#if !CONFIG_PIN_TLB || CONFIG_MODULES
lwz r21, 0(r0)
mtcr r21
#endif
lwz r21, 4(r0)
#ifdef CONFIG_8xx_CPU6
lwz r3, 8(r0)
#endif
rfi
2: /* Restore registers */
#if !CONFIG_PIN_TLB || CONFIG_MODULES
lwz r21, 0(r0)
mtcr r21
#else
mtcr r20
#endif
mfspr r20, M_TW
lwz r21, 4(r0)
#ifdef CONFIG_8xx_CPU6
lwz r3, 8(r0)
#endif
b InstructionAccess
. = 0x1200
DataStoreTLBMiss:
#ifdef CONFIG_8xx_CPU6
stw r3, 8(r0)
li r3, 0x3f80
stw r3, 12(r0)
lwz r3, 12(r0)
#endif
mtspr M_TW, r20 /* Save a couple of working registers */
mfcr r20
stw r20, 0(r0)
stw r21, 4(r0)
mfspr r20, M_TWB /* Get level 1 table entry address */
/* If we are faulting a kernel address, we have to use the
* kernel page tables.
*/
andi. r21, r20, 0x0800
beq+ 3f
lis r21, swapper_pg_dir@h
ori r21, r21, swapper_pg_dir@l
rlwimi r20, r21, 0, 2, 19
3:
lwz r21, 0(r20) /* Get the level 1 entry */
rlwinm. r20, r21,0,0,19 /* Extract page descriptor page address */
// beq 4f /* If zero, don't try to find a pte */
/* We have a pte table, so load fetch the pte from the table.
*/
tophys(r21, r21)
ori r21, r21, 1 /* Set valid bit in physical L2 page */
// beq- 4f /* If zero, don't try to find a pte */
beq- 2f /* If zero, don't try to find a pte */
#ifdef CONFIG_8xx_CPU6
li r3, 0x3b80
stw r3, 12(r0)
lwz r3, 12(r0)
#endif
mtspr MD_TWC, r21 /* Load pte table base address */
mfspr r20, MD_TWC /* ....and get the pte address */
lwz r20, 0(r20) /* Get the pte */
/* Insert the Guarded flag into the TWC from the Linux PTE.
* It is bit 27 of both the Linux PTE and the TWC (at least
* I got that right :-). It will be better when we can put
* this into the Linux pgd/pmd and load it in the operation
* above.
*/
rlwimi r21, r20, 0, 27, 27
#ifdef CONFIG_8xx_CPU6
li r3, 0x3b80
stw r3, 12(r0)
lwz r3, 12(r0)
#endif
mtspr MD_TWC, r21
// mfspr r21, MD_TWC /* get the pte address again */
ori r20, r20, _PAGE_ACCESSED
999: mfspr r21, MD_TWC /* get the pte address again */
stw r20, 0(r21)
/* The Linux PTE won't go exactly into the MMU TLB.
* Software indicator bits 21, 22 and 28 must be clear.
* Software indicator bits 24, 25, 26, and 27 must be
* set. All other Linux PTE bits control the behavior
* of the MMU.
*/
4: li r21, 0x00f0
rlwimi r20, r21, 0, 24, 28 /* Set 24-27, clear 28 */
#ifdef CONFIG_8xx_CPU6
li r3, 0x3d80
stw r3, 12(r0)
lwz r3, 12(r0)
#endif
mtspr MD_RPN, r20 /* Update TLB entry */
#ifdef CONFIG_8xx_DCBxFIXED
#if TAG_VAL == 0x00f0 /* Save 1 instr. by reusing the val loaded in r21 above */
mtspr DAR, r21
#else
TAG_DAR_R20
#endif
#endif
mfspr r20, M_TW /* Restore registers */
lwz r21, 0(r0)
mtcr r21
lwz r21, 4(r0)
#ifdef CONFIG_8xx_CPU6
lwz r3, 8(r0)
#endif
rfi
2:
#ifdef CONFIG_8xx_DCBxFIXED
/* Copy 20 msb from MD_EPN to DAR since the dcxx instructions fails
* to update DAR when they cause a DTLB Miss.
*/
mfspr r21, MD_EPN
mfspr r20, DAR
rlwimi r20, r21, 0, 0, 19
mtspr DAR, r20
#endif
mfspr r20, M_TW /* Restore registers */
lwz r21, 0(r0)
mtcr r21
lwz r21, 4(r0)
#ifdef CONFIG_8xx_CPU6
lwz r3, 8(r0)
#endif
b DataAccess
/* This is an instruction TLB error on the MPC8xx. This could be due
* to many reasons, such as executing guarded memory or illegal instruction
* addresses. There is nothing to do but handle a big time error fault.
*/
. = 0x1300
InstructionTLBError:
b InstructionAccess
/* This is the data TLB error on the MPC8xx. This could be due to
* many reasons, including a dirty update to a pte. We can catch that
* one here, but anything else is an error. First, we track down the
* Linux pte. If it is valid, write access is allowed, but the
* page dirty bit is not set, we will set it and reload the TLB. For
* any other case, we bail out to a higher level function that can
* handle it.
*/
. = 0x1400
DataTLBError:
#ifdef CONFIG_8xx_CPU6
stw r3, 8(r0)
li r3, 0x3f80
stw r3, 12(r0)
lwz r3, 12(r0)
#endif
mtspr M_TW, r20 /* Save a couple of working registers */
mfcr r20
stw r20, 0(r0)
stw r21, 4(r0)
mfspr r20, DAR
#ifdef CONFIG_8xx_DCBxFIXED
/* If DAR contains TAG_VAL implies a buggy dcbx instruction
* that did not set DAR.
*/
cmpwi cr0, r20, TAG_VAL
beq- 100f /* Branch if TAG_VAL to dcbx workaround procedure */
101: /* return from dcbx instruction bug workaround, r20 holds value of DAR */
/* First, make sure this was a store operation.
*/
#endif
mfspr r21, DSISR
andis. r21, r21, 0x0200 /* If set, indicates store op */
// beq 2f
/* The EA of a data TLB miss is automatically stored in the MD_EPN
* register. The EA of a data TLB error is automatically stored in
* the DAR, but not the MD_EPN register. We must copy the 20 most
* significant bits of the EA from the DAR to MD_EPN before we
* start walking the page tables. We also need to copy the CASID
* value from the M_CASID register.
* Addendum: The EA of a data TLB error is _supposed_ to be stored
* in DAR, but it seems that this doesn't happen in some cases, such
* as when the error is due to a dcbi instruction to a page with a
* TLB that doesn't have the changed bit set. In such cases, there
* does not appear to be any way to recover the EA of the error
* since it is neither in DAR nor MD_EPN. As a workaround, the
* _PAGE_HWWRITE bit is set for all kernel data pages when the PTEs
* are initialized in mapin_ram(). This will avoid the problem,
* assuming we only use the dcbi instruction on kernel addresses.
*/
/* DAR is in r20 already */
rlwinm r21, r20, 0, 0, 19
ori r21, r21, MD_EVALID
beq- 2f
mfspr r20, M_CASID
rlwimi r21, r20, 0, 28, 31
#ifdef CONFIG_8xx_CPU6
li r3, 0x3780
stw r3, 12(r0)
lwz r3, 12(r0)
#endif
mtspr MD_EPN, r21
mfspr r20, M_TWB /* Get level 1 table entry address */
/* If we are faulting a kernel address, we have to use the
* kernel page tables.
*/
andi. r21, r20, 0x0800
beq+ 3f
lis r21, swapper_pg_dir@h
ori r21, r21, swapper_pg_dir@l
rlwimi r20, r21, 0, 2, 19
3:
lwz r21, 0(r20) /* Get the level 1 entry */
rlwinm. r20, r21,0,0,19 /* Extract page descriptor page address */
// beq 2f /* If zero, bail */
/* We have a pte table, so fetch the pte from the table.
*/
tophys(r21, r21)
ori r21, r21, 1 /* Set valid bit in physical L2 page */
beq- 2f /* If zero, bail */
#ifdef CONFIG_8xx_CPU6
li r3, 0x3b80
stw r3, 12(r0)
lwz r3, 12(r0)
#endif
mtspr MD_TWC, r21 /* Load pte table base address */
mfspr r21, MD_TWC /* ....and get the pte address */
lwz r20, 0(r21) /* Get the pte */
andi. r21, r20, _PAGE_RW /* Is it writeable? */
// beq 2f /* Bail out if not */
/* Update 'changed', among others.
*/
ori r20, r20, _PAGE_DIRTY|_PAGE_ACCESSED|_PAGE_HWWRITE
beq- 2f /* Bail out if not */
b 999b
mfspr r21, MD_TWC /* Get pte address again */
stw r20, 0(r21) /* and update pte in table */
/* The Linux PTE won't go exactly into the MMU TLB.
* Software indicator bits 21, 22 and 28 must be clear.
* Software indicator bits 24, 25, 26, and 27 must be
* set. All other Linux PTE bits control the behavior
* of the MMU.
*/
li r21, 0x00f0
rlwimi r20, r21, 0, 24, 28 /* Set 24-27, clear 28 */
#ifdef CONFIG_8xx_CPU6
li r3, 0x3d80
stw r3, 12(r0)
lwz r3, 12(r0)
#endif
mtspr MD_RPN, r20 /* Update TLB entry */
#ifdef CONFIG_8xx_DCBxFIXED
#if TAG_VAL == 0x00f0 /* Save 1 instr. by reusing the val loaded in r21 above */
mtspr DAR, r21
#else
TAG_DAR_R20
#endif
#endif
mfspr r20, M_TW /* Restore registers */
lwz r21, 0(r0)
mtcr r21
lwz r21, 4(r0)
#ifdef CONFIG_8xx_CPU6
lwz r3, 8(r0)
#endif
rfi
2:
mfspr r20, M_TW /* Restore registers */
lwz r21, 0(r0)
mtcr r21
lwz r21, 4(r0)
#ifdef CONFIG_8xx_CPU6
lwz r3, 8(r0)
#endif
b DataAccess
STD_EXCEPTION(0x1500, Trap_15, UnknownException)
STD_EXCEPTION(0x1600, Trap_16, UnknownException)
STD_EXCEPTION(0x1700, Trap_17, TAUException)
STD_EXCEPTION(0x1800, Trap_18, UnknownException)
STD_EXCEPTION(0x1900, Trap_19, UnknownException)
STD_EXCEPTION(0x1a00, Trap_1a, UnknownException)
STD_EXCEPTION(0x1b00, Trap_1b, UnknownException)
/* On the MPC8xx, these next four traps are used for development
* support of breakpoints and such. Someday I will get around to
* using them.
*/
STD_EXCEPTION(0x1c00, Trap_1c, UnknownException)
STD_EXCEPTION(0x1d00, Trap_1d, UnknownException)
STD_EXCEPTION(0x1e00, Trap_1e, UnknownException)
STD_EXCEPTION(0x1f00, Trap_1f, UnknownException)
. = 0x2000
#ifdef CONFIG_8xx_DCBxFIXED
/* This is the workaround procedure to calculate the data EA for buggy dcbx,dcbi instructions
* by decoding the registers used by the dcbx instruction and adding them.
* DAR is set to the calculated address and r20 also holds the EA on exit.
*/
//#define INSTR_CHECK /* define to verify if it is a dcbx instr. Should not be needed. */
//#define NO_SELF_MODIFYING_CODE /* define if you don't want to use self modifying code */
//#define DEBUG_DCBX_INSTRUCTIONS /* for debugging only. Needs INSTR_CHECK defined as well. */
//#define KERNEL_SPACE_ONLY /* define if user space do NOT contain dcbx instructions. */
#ifndef KERNEL_SPACE_ONLY
nop /* A few nops to make the modified_instr: space below cache line aligned */
nop
139: /* fetch instruction from userspace memory */
DO_8xx_CPU6(0x3780, r3)
mtspr MD_EPN, r20
mfspr r21, M_TWB /* Get level 1 table entry address */
lwz r21, 0(r21) /* Get the level 1 entry */
tophys (r21, r21)
DO_8xx_CPU6(0x3b80, r3)
mtspr MD_TWC, r21 /* Load pte table base address */
mfspr r21, MD_TWC /* ....and get the pte address */
lwz r21, 0(r21) /* Get the pte */
/* concat physical page address(r21) and page offset(r20) */
rlwimi r21, r20, 0, 20, 31
b 140f
#endif
100: /* Entry point for dcbx workaround. */
/* fetch instruction from memory. */
mfspr r20,SRR0
#ifndef KERNEL_SPACE_ONLY
andis. r21, r20, 0x8000
tophys (r21, r20)
beq- 139b /* Branch if user space address */
#else
tophys (r21, r20)
#endif
140: lwz r21,0(r21)
#ifdef INSTR_CHECK
/* Check if it really is a dcbx instruction. This is not needed as far as I can tell */
/* dcbt and dcbtst does not generate DTLB Misses/Errors, no need to include them here */
rlwinm r20, r21, 0, 21, 30
cmpwi cr0, r20, 2028 /* Is dcbz? */
beq+ 142f
cmpwi cr0, r20, 940 /* Is dcbi? */
beq+ 142f
cmpwi cr0, r20, 108 /* Is dcbst? */
beq+ 142f
cmpwi cr0, r20, 172 /* Is dcbf? */
beq+ 142f
cmpwi cr0, r20, 1964 /* Is icbi? */
beq+ 142f
#ifdef DEBUG_DCBX_INSTRUCTIONS
141: b 141b /* Stop here if no dcbx instruction */
#endif
mfspr r20, DAR /* r20 must hold DAR at exit */
b 101b /* None of the above, go back to normal TLB processing */
142: /* continue, it was a dcbx instruction. */
#endif
#ifdef CONFIG_8xx_CPU6
lwz r3, 8(r0) /* restore r3 from memory */
#endif
#ifndef NO_SELF_MODIFYING_CODE
andis. r20,r21,0x1f /* test if reg RA is r0 */
li r20,modified_instr@l
dcbtst r0,r20 /* touch for store */
rlwinm r21,r21,0,0,20 /* Zero lower 10 bits */
oris r21,r21,640 /* Transform instr. to a "add r20,RA,RB" */
ori r21,r21,532
stw r21,0(r20) /* store add/and instruction */
dcbf 0,r20 /* flush new instr. to memory. */
icbi 0,r20 /* invalidate instr. cache line */
lwz r21, 4(r0) /* restore r21 from memory */
mfspr r20, M_TW /* restore r20 from M_TW */
isync /* Wait until new instr is loaded from memory */
modified_instr:
.space 4 /* this is where the add/and instr. is stored */
#ifdef DEBUG_DCBX_INSTRUCTIONS
/* fill with some garbage */
li r21,0xffff
stw r21,0(r21)
#endif
bne+ 143f
subf r20,r0,r20 /* r20=r20-r0, only if reg RA is r0 */
143: mtdar r20 /* store faulting EA in DAR */
b 101b /* Go back to normal TLB handling */
#else
mfctr r20
mtdar r20 /* save ctr reg in DAR */
rlwinm r20, r21, 24, 24, 28 /* offset into jump table for reg RB */
addi r20, r20, 150f@l /* add start of table */
mtctr r20 /* load ctr with jump address */
xor r20, r20, r20 /* sum starts at zero */
bctr /* jump into table */
150:
add r20, r20, r0
b 151f
add r20, r20, r1
b 151f
add r20, r20, r2
b 151f
add r20, r20, r3
b 151f
add r20, r20, r4
b 151f
add r20, r20, r5
b 151f
add r20, r20, r6
b 151f
add r20, r20, r7
b 151f
add r20, r20, r8
b 151f
add r20, r20, r9
b 151f
add r20, r20, r10
b 151f
add r20, r20, r11
b 151f
add r20, r20, r12
b 151f
add r20, r20, r13
b 151f
add r20, r20, r14
b 151f
add r20, r20, r15
b 151f
add r20, r20, r16
b 151f
add r20, r20, r17
b 151f
add r20, r20, r18
b 151f
add r20, r20, r19
b 151f
mtctr r21 /* reg 20 needs special handling */
b 154f
mtctr r21 /* reg 21 needs special handling */
b 153f
add r20, r20, r22
b 151f
add r20, r20, r23
b 151f
add r20, r20, r24
b 151f
add r20, r20, r25
b 151f
add r20, r20, r25
b 151f
add r20, r20, r27
b 151f
add r20, r20, r28
b 151f
add r20, r20, r29
b 151f
add r20, r20, r30
b 151f
add r20, r20, r31
151:
rlwinm. r21,r21,19,24,28 /* offset into jump table for reg RA */
beq 152f /* if reg RA is zero, don't add it */
addi r21, r21, 150b@l /* add start of table */
mtctr r21 /* load ctr with jump address */
rlwinm r21,r21,0,16,10 /* make sure we don't execute this more than once */
bctr /* jump into table */
152:
mfdar r21
mtctr r21 /* restore ctr reg from DAR */
mtdar r20 /* save fault EA to DAR */
b 101b /* Go back to normal TLB handling */
/* special handling for r20,r21 since these are modified already */
153: lwz r21, 4(r0) /* load r21 from memory */
b 155f
154: mfspr r21, M_TW /* load r20 from M_TW */
155: add r20, r20, r21 /* add it */
mfctr r21 /* restore r21 */
b 151b
#endif
#endif
/*
* This code finishes saving the registers to the exception frame
* and jumps to the appropriate handler for the exception, turning
* on address translation.
*/
.globl transfer_to_handler
transfer_to_handler:
stw r22,_NIP(r21)
lis r22,MSR_POW@h
andc r23,r23,r22
stw r23,_MSR(r21)
SAVE_4GPRS(8, r21)
SAVE_8GPRS(12, r21)
SAVE_8GPRS(24, r21)
andi. r23,r23,MSR_PR
mfspr r23,SPRG3 /* if from user, fix up THREAD.regs */
beq 2f
addi r24,r1,STACK_FRAME_OVERHEAD
stw r24,PT_REGS(r23)
2: addi r2,r23,-THREAD /* set r2 to current */
tovirt(r2,r2)
mflr r23
andi. r24,r23,0x3f00 /* get vector offset */
stw r24,TRAP(r21)
li r22,0
stw r22,RESULT(r21)
mtspr SPRG2,r22 /* r1 is now kernel sp */
addi r24,r2,TASK_STRUCT_SIZE /* check for kernel stack overflow */
cmplw 0,r1,r2
cmplw 1,r1,r24
crand 1,1,4
bgt- stack_ovf /* if r2 < r1 < r2+TASK_STRUCT_SIZE */
lwz r24,0(r23) /* virtual address of handler */
lwz r23,4(r23) /* where to go when done */
mtspr SRR0,r24
mtspr SRR1,r20
mtlr r23
SYNC
rfi /* jump to handler, enable MMU */
/*
* On kernel stack overflow, load up an initial stack pointer
* and call StackOverflow(regs), which should not return.
*/
stack_ovf:
addi r3,r1,STACK_FRAME_OVERHEAD
lis r1,init_task_union@ha
addi r1,r1,init_task_union@l
addi r1,r1,TASK_UNION_SIZE-STACK_FRAME_OVERHEAD
lis r24,StackOverflow@ha
addi r24,r24,StackOverflow@l
li r20,MSR_KERNEL
mtspr SRR0,r24
mtspr SRR1,r20
SYNC
rfi
.globl giveup_fpu
giveup_fpu:
blr
/* Maybe someday.......
*/
_GLOBAL(__setup_cpu_8xx)
blr
/*
* This is where the main kernel code starts.
*/
start_here:
/* ptr to current */
lis r2,init_task_union@h
ori r2,r2,init_task_union@l
/* ptr to phys current thread */
tophys(r4,r2)
addi r4,r4,THREAD /* init task's THREAD */
mtspr SPRG3,r4
li r3,0
mtspr SPRG2,r3 /* 0 => r1 has kernel sp */
/* stack */
addi r1,r2,TASK_UNION_SIZE
li r0,0
stwu r0,-STACK_FRAME_OVERHEAD(r1)
bl early_init /* We have to do this with MMU on */
/*
* Decide what sort of machine this is and initialize the MMU.
*/
mr r3,r31
mr r4,r30
mr r5,r29
mr r6,r28
mr r7,r27
bl machine_init
bl MMU_init
/*
* Go back to running unmapped so we can load up new values
* and change to using our exception vectors.
* On the 8xx, all we have to do is invalidate the TLB to clear
* the old 8M byte TLB mappings and load the page table base register.
*/
/* The right way to do this would be to track it down through
* init's THREAD like the context switch code does, but this is
* easier......until someone changes init's static structures.
*/
lis r6, swapper_pg_dir@h
ori r6, r6, swapper_pg_dir@l
tophys(r6,r6)
#ifdef CONFIG_8xx_CPU6
lis r4, cpu6_errata_word@h
ori r4, r4, cpu6_errata_word@l
li r3, 0x3980
stw r3, 12(r4)
lwz r3, 12(r4)
#endif
mtspr M_TWB, r6
lis r4,2f@h
ori r4,r4,2f@l
tophys(r4,r4)
li r3,MSR_KERNEL & ~(MSR_IR|MSR_DR)
mtspr SRR0,r4
mtspr SRR1,r3
rfi
/* Load up the kernel context */
2:
SYNC /* Force all PTE updates to finish */
tlbia /* Clear all TLB entries */
sync /* wait for tlbia/tlbie to finish */
TLBSYNC /* ... on all CPUs */
#ifdef CONFIG_BDI_SWITCH
/* Add helper information for the Abatron bdiGDB debugger.
* We do this here because we know the mmu is disabled, and
* will be enabled for real in just a few instructions.
*/
tovirt(r6,r6)
lis r5, abatron_pteptrs@h
ori r5, r5, abatron_pteptrs@l
stw r5, 0xf0(r0) /* Must match your Abatron config file */
tophys(r5,r5)
stw r6, 0(r5)
#endif
/* Now turn on the MMU for real! */
li r4,MSR_KERNEL
lis r3,start_kernel@h
ori r3,r3,start_kernel@l
mtspr SRR0,r3
mtspr SRR1,r4
rfi /* enable MMU and jump to start_kernel */
/* Set up the initial MMU state so we can do the first level of
* kernel initialization. This maps the first 8 MBytes of memory 1:1
* virtual to physical. Also, set the cache mode since that is defined
* by TLB entries and perform any additional mapping (like of the IMMR).
* If configured to pin some TLBs, we pin the first 8 Mbytes of kernel,
* 24 Mbytes of data, and the 8M IMMR space. Anything not covered by
* these mappings is mapped by page tables.
*/
initial_mmu:
tlbia /* Invalidate all TLB entries */
#ifdef CONFIG_PIN_TLB
lis r8, MI_RSV4I@h
ori r8, r8, 0x1c00
#else
li r8, 0
#endif
mtspr MI_CTR, r8 /* Set instruction MMU control */
#ifdef CONFIG_PIN_TLB
lis r10, (MD_RSV4I | MD_RESETVAL)@h
ori r10, r10, 0x1c00
mr r8, r10
#else
lis r10, MD_RESETVAL@h
#endif
#ifndef CONFIG_8xx_COPYBACK
oris r10, r10, MD_WTDEF@h
#endif
mtspr MD_CTR, r10 /* Set data TLB control */
/* Now map the lower 8 Meg into the TLBs. For this quick hack,
* we can load the instruction and data TLB registers with the
* same values.
*/
lis r8, KERNELBASE@h /* Create vaddr for TLB */
ori r8, r8, MI_EVALID /* Mark it valid */
mtspr MI_EPN, r8
mtspr MD_EPN, r8
li r8, MI_PS8MEG /* Set 8M byte page */
ori r8, r8, MI_SVALID /* Make it valid */
mtspr MI_TWC, r8
mtspr MD_TWC, r8
li r8, MI_BOOTINIT /* Create RPN for address 0 */
mtspr MI_RPN, r8 /* Store TLB entry */
mtspr MD_RPN, r8
lis r8, MI_Kp@h /* Set the protection mode */
mtspr MI_AP, r8
mtspr MD_AP, r8
/* Map another 8 MByte at the IMMR to get the processor
* internal registers (among other things).
*/
#ifdef CONFIG_PIN_TLB
addi r10, r10, 0x0100
mtspr MD_CTR, r10
#endif
mfspr r9, 638 /* Get current IMMR */
andis. r9, r9, 0xff80 /* Get 8Mbyte boundary */
mr r8, r9 /* Create vaddr for TLB */
ori r8, r8, MD_EVALID /* Mark it valid */
mtspr MD_EPN, r8
li r8, MD_PS8MEG /* Set 8M byte page */
ori r8, r8, MD_SVALID /* Make it valid */
mtspr MD_TWC, r8
mr r8, r9 /* Create paddr for TLB */
ori r8, r8, MI_BOOTINIT|0x2 /* Inhibit cache -- Cort */
mtspr MD_RPN, r8
#ifdef CONFIG_PIN_TLB
/* Map two more 8M kernel data pages.
*/
addi r10, r10, 0x0100
mtspr MD_CTR, r10
lis r8, KERNELBASE@h /* Create vaddr for TLB */
addis r8, r8, 0x0080 /* Add 8M */
ori r8, r8, MI_EVALID /* Mark it valid */
mtspr MD_EPN, r8
li r9, MI_PS8MEG /* Set 8M byte page */
ori r9, r9, MI_SVALID /* Make it valid */
mtspr MD_TWC, r9
li r11, MI_BOOTINIT /* Create RPN for address 0 */
addis r11, r11, 0x0080 /* Add 8M */
mtspr MD_RPN, r8
addis r8, r8, 0x0080 /* Add 8M */
mtspr MD_EPN, r8
mtspr MD_TWC, r9
addis r11, r11, 0x0080 /* Add 8M */
mtspr MD_RPN, r8
#endif
/* Since the cache is enabled according to the information we
* just loaded into the TLB, invalidate and enable the caches here.
* We should probably check/set other modes....later.
*/
lis r8, IDC_INVALL@h
mtspr IC_CST, r8
mtspr DC_CST, r8
lis r8, IDC_ENABLE@h
mtspr IC_CST, r8
#ifdef CONFIG_8xx_COPYBACK
mtspr DC_CST, r8
#else
/* For a debug option, I left this here to easily enable
* the write through cache mode
*/
lis r8, DC_SFWT@h
mtspr DC_CST, r8
lis r8, IDC_ENABLE@h
mtspr DC_CST, r8
#endif
blr
/*
* Set up to use a given MMU context.
* r3 is context number, r4 is PGD pointer.
*
* We place the physical address of the new task page directory loaded
* into the MMU base register, and set the ASID compare register with
* the new "context."
*/
_GLOBAL(set_context)
#ifdef CONFIG_BDI_SWITCH
/* Context switch the PTE pointer for the Abatron BDI2000.
* The PGDIR is passed as second argument.
*/
lis r5, KERNELBASE@h
lwz r5, 0xf0(r5)
stw r4, 0x4(r5)
#endif
#ifdef CONFIG_8xx_CPU6
lis r6, cpu6_errata_word@h
ori r6, r6, cpu6_errata_word@l
tophys (r4, r4)
li r7, 0x3980
stw r7, 12(r6)
lwz r7, 12(r6)
mtspr M_TWB, r4 /* Update MMU base address */
li r7, 0x3380
stw r7, 12(r6)
lwz r7, 12(r6)
mtspr M_CASID, r3 /* Update context */
#else
mtspr M_CASID,r3 /* Update context */
tophys (r4, r4)
mtspr M_TWB, r4 /* and pgd */
#endif
SYNC
blr
#ifdef CONFIG_8xx_CPU6
/* It's here because it is unique to the 8xx.
* It is important we get called with interrupts disabled. I used to
* do that, but it appears that all code that calls this already had
* interrupt disabled.
*/
.globl set_dec_cpu6
set_dec_cpu6:
lis r7, cpu6_errata_word@h
ori r7, r7, cpu6_errata_word@l
li r4, 0x2c00
stw r4, 8(r7)
lwz r4, 8(r7)
mtspr 22, r3 /* Update Decrementer */
SYNC
blr
#endif
/*
* We put a few things here that have to be page-aligned.
* This stuff goes at the beginning of the data segment,
* which is page-aligned.
*/
.data
.globl sdata
sdata:
.globl empty_zero_page
empty_zero_page:
.space 4096
.globl swapper_pg_dir
swapper_pg_dir:
.space 4096
/*
* This space gets a copy of optional info passed to us by the bootstrap
* Used to pass parameters into the kernel like root=/dev/sda1, etc.
*/
.globl cmd_line
cmd_line:
.space 512
#ifdef CONFIG_BDI_SWITCH
/* Room for two PTE table poiners, usually the kernel and current user
* pointer to their respective root page table (pgdir).
*/
abatron_pteptrs:
.space 8
#endif
#ifdef CONFIG_8xx_CPU6
.globl cpu6_errata_word
cpu6_errata_word:
.space 16
#endif
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH] powerpc/8xx: fix regression introduced by cache coherency rewrite
2009-09-25 3:03 ` Benjamin Herrenschmidt
2009-09-25 8:31 ` Joakim Tjernlund
@ 2009-09-25 21:18 ` Rex Feany
2009-09-27 13:22 ` Joakim Tjernlund
1 sibling, 1 reply; 58+ messages in thread
From: Rex Feany @ 2009-09-25 21:18 UTC (permalink / raw)
To: Benjamin Herrenschmidt; +Cc: linuxppc-dev
Thus spake Benjamin Herrenschmidt (benh@kernel.crashing.org):
>
> > I think there's more finishyness to 8xx than we thought. IE. That
> > tlbil_va might have more reasons to be there than what the comment
> > seems to advertize. Can you try to move it even higher up ? IE.
> > Unconditionally at the beginning of set_pte_filter ?
> >
> > Also, if that doesn't help, can you try putting one in
> > set_access_flags_filter() just below ?
>
> Ok, I got a refresher on the whole concept of "unpopulated TLB entries"
> on 8xx, and that's damn scary. I think what mislead me initially is that
> the comment around the workaround is simply not properly describing the
> extent of the problem :-)
Oh boy, that sounds bad. Where is a good place to read about this?
> So I'm not going to make the 8xx TLB miss code sane, that's beyond what
> I'm prepare to do with it, but I suspect that this should fix it (on top
> of upstream). Let me know if that's enough or if we also need to put
> one of these in ptep_set_access_flags().
>
> Please let me know if that works for you.
Putting the tlbil_va() in the top of set_pte_filter() doesn't work - it
hangs on boot before it even prints any messages to the console.
However, adding tlbil_va() to ptep_set_access_flags() as you suggested
makes everything happy. I need to test it some more, but it looks good
so far. Below is what I am testing now.
thanks!
/rex.
diff --git a/arch/powerpc/mm/pgtable.c b/arch/powerpc/mm/pgtable.c
index 5304093..aef552a 100644
--- a/arch/powerpc/mm/pgtable.c
+++ b/arch/powerpc/mm/pgtable.c
@@ -176,18 +176,19 @@ static pte_t set_pte_filter(pte_t pte, unsigned long addr)
struct page *pg = maybe_pte_to_page(pte);
if (!pg)
return pte;
- if (!test_bit(PG_arch_1, &pg->flags)) {
#ifdef CONFIG_8xx
- /* On 8xx, cache control instructions (particularly
- * "dcbst" from flush_dcache_icache) fault as write
- * operation if there is an unpopulated TLB entry
- * for the address in question. To workaround that,
- * we invalidate the TLB here, thus avoiding dcbst
- * misbehaviour.
- */
- /* 8xx doesn't care about PID, size or ind args */
- _tlbil_va(addr, 0, 0, 0);
+ /* On 8xx, cache control instructions (particularly
+ * "dcbst" from flush_dcache_icache) fault as write
+ * operation if there is an unpopulated TLB entry
+ * for the address in question. To workaround that,
+ * we invalidate the TLB here, thus avoiding dcbst
+ * misbehaviour.
+ */
+ /* 8xx doesn't care about PID, size or ind args */
+ _tlbil_va(addr, 0, 0, 0);
#endif /* CONFIG_8xx */
+
+ if (!test_bit(PG_arch_1, &pg->flags)) {
flush_dcache_icache_page(pg);
set_bit(PG_arch_1, &pg->flags);
}
@@ -308,6 +309,12 @@ int ptep_set_access_flags(struct vm_area_struct *vma, unsigned long address,
int changed;
entry = set_access_flags_filter(entry, vma, dirty);
changed = !pte_same(*(ptep), entry);
+
+#ifdef CONFIG_8xx
+ /* 8xx doesn't care about PID, size or ind args */
+ _tlbil_va(address, 0, 0, 0);
+#endif /* CONFIG_8xx */
+
if (changed) {
if (!(vma->vm_flags & VM_HUGETLB))
assert_pte_locked(vma->vm_mm, address);
^ permalink raw reply related [flat|nested] 58+ messages in thread
* Re: [PATCH] powerpc/8xx: fix regression introduced by cache coherency rewrite
2009-09-25 21:18 ` Rex Feany
@ 2009-09-27 13:22 ` Joakim Tjernlund
2009-09-28 3:21 ` Benjamin Herrenschmidt
0 siblings, 1 reply; 58+ messages in thread
From: Joakim Tjernlund @ 2009-09-27 13:22 UTC (permalink / raw)
To: Rex Feany; +Cc: linuxppc-dev
>
> Thus spake Benjamin Herrenschmidt (benh@kernel.crashing.org):
>
> >
> > > I think there's more finishyness to 8xx than we thought. IE. That
> > > tlbil_va might have more reasons to be there than what the comment
> > > seems to advertize. Can you try to move it even higher up ? IE.
> > > Unconditionally at the beginning of set_pte_filter ?
> > >
> > > Also, if that doesn't help, can you try putting one in
> > > set_access_flags_filter() just below ?
> >
> > Ok, I got a refresher on the whole concept of "unpopulated TLB entries"
> > on 8xx, and that's damn scary. I think what mislead me initially is that
> > the comment around the workaround is simply not properly describing the
> > extent of the problem :-)
>
> Oh boy, that sounds bad. Where is a good place to read about this?
>
> > So I'm not going to make the 8xx TLB miss code sane, that's beyond what
> > I'm prepare to do with it, but I suspect that this should fix it (on top
> > of upstream). Let me know if that's enough or if we also need to put
> > one of these in ptep_set_access_flags().
> >
> > Please let me know if that works for you.
>
> Putting the tlbil_va() in the top of set_pte_filter() doesn't work - it
> hangs on boot before it even prints any messages to the console.
>
> However, adding tlbil_va() to ptep_set_access_flags() as you suggested
> makes everything happy. I need to test it some more, but it looks good
> so far. Below is what I am testing now.
8xx, is getting very hacky and I suspect that the only long term fix is
add code to trap the cache instructions in TLB error/miss and fixup the
exception in page fault handler. This will also have the added benefit on being able
to use the cache instructions in both kernel and user space like any other
ppc arch.
Jocke
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH] powerpc/8xx: fix regression introduced by cache coherency rewrite
2009-09-27 13:22 ` Joakim Tjernlund
@ 2009-09-28 3:21 ` Benjamin Herrenschmidt
2009-09-28 7:22 ` Joakim Tjernlund
0 siblings, 1 reply; 58+ messages in thread
From: Benjamin Herrenschmidt @ 2009-09-28 3:21 UTC (permalink / raw)
To: Joakim Tjernlund; +Cc: linuxppc-dev, Rex Feany
On Sun, 2009-09-27 at 15:22 +0200, Joakim Tjernlund wrote:
> > However, adding tlbil_va() to ptep_set_access_flags() as you suggested
> > makes everything happy. I need to test it some more, but it looks good
> > so far. Below is what I am testing now.
>
> 8xx, is getting very hacky and I suspect that the only long term fix is
> add code to trap the cache instructions in TLB error/miss and fixup the
> exception in page fault handler. This will also have the added benefit on being able
> to use the cache instructions in both kernel and user space like any other
> ppc arch.
First I'd like to understand exactly what's happening today, since
it makes little sense :-) I suppose I'll have to get myself some
8xx doco and understand how the bloody MMU works.
Then, I saw your old patch and it's -very- invasive. If we can get away
with a one liner just adding tlbil_va in the right place, I think I'm
happy to stick with it until somebody comes up with a real good reason
to do more :-) 8xx is on life support and has been around for long
enough without people feeling the need overall to work around that
problem so I'm tempted to keep the status-quo here.
Cheers,
Ben
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH] powerpc/8xx: fix regression introduced by cache coherency rewrite
2009-09-28 3:21 ` Benjamin Herrenschmidt
@ 2009-09-28 7:22 ` Joakim Tjernlund
2009-09-28 7:34 ` Benjamin Herrenschmidt
2009-09-28 10:02 ` Joakim Tjernlund
0 siblings, 2 replies; 58+ messages in thread
From: Joakim Tjernlund @ 2009-09-28 7:22 UTC (permalink / raw)
To: Benjamin Herrenschmidt; +Cc: linuxppc-dev, Rex Feany
Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote on 28/09/2009 05:21:00:
>
> On Sun, 2009-09-27 at 15:22 +0200, Joakim Tjernlund wrote:
> > > However, adding tlbil_va() to ptep_set_access_flags() as you suggested
> > > makes everything happy. I need to test it some more, but it looks good
> > > so far. Below is what I am testing now.
> >
> > 8xx, is getting very hacky and I suspect that the only long term fix is
> > add code to trap the cache instructions in TLB error/miss and fixup the
> > exception in page fault handler. This will also have the added benefit on being able
> > to use the cache instructions in both kernel and user space like any other
> > ppc arch.
>
> First I'd like to understand exactly what's happening today, since
> it makes little sense :-) I suppose I'll have to get myself some
> 8xx doco and understand how the bloody MMU works.
hmm, I recall that the reason for the currect "force a TLB error"
procedure was to fix a problem Montavista had, possibly Tom Rini.
He fixed the problem differently at first but later it
was changed to what is in use today.
If you have access to the old linuxppc-embedded you will find a lot
of info in a thread with subject "[PATCH 2.6.14] mm: 8xx MM fix for"
Also I made a ref that I cannot access anymore:
http://ozlabs.org/pipermail/linuxppc-embedded/2005-January/016382.html
>
> Then, I saw your old patch and it's -very- invasive. If we can get away
> with a one liner just adding tlbil_va in the right place, I think I'm
Yes, my patch is invasive but that is because I did it in asm. If you do
it in the page fault handler, it will be much easier.
In principle you would only have to tag DAR and test for
the tag value in asm then branch to the page fault handler. Once there
it is easy to decode the insns.
> happy to stick with it until somebody comes up with a real good reason
> to do more :-) 8xx is on life support and has been around for long
yeah, I figured that too but Freescale seems to be cranking out new variants
still.
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH] powerpc/8xx: fix regression introduced by cache coherency rewrite
2009-09-28 7:22 ` Joakim Tjernlund
@ 2009-09-28 7:34 ` Benjamin Herrenschmidt
2009-09-28 7:39 ` Joakim Tjernlund
2009-09-28 10:02 ` Joakim Tjernlund
1 sibling, 1 reply; 58+ messages in thread
From: Benjamin Herrenschmidt @ 2009-09-28 7:34 UTC (permalink / raw)
To: Joakim Tjernlund; +Cc: linuxppc-dev, Rex Feany
On Mon, 2009-09-28 at 09:22 +0200, Joakim Tjernlund wrote:
>
> > happy to stick with it until somebody comes up with a real good
> reason
> > to do more :-) 8xx is on life support and has been around for long
>
> yeah, I figured that too but Freescale seems to be cranking out new
> variants still.
Ugh ? And still not fixing the major core bugs ? Grmbl...
Ben.
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH] powerpc/8xx: fix regression introduced by cache coherency rewrite
2009-09-28 7:34 ` Benjamin Herrenschmidt
@ 2009-09-28 7:39 ` Joakim Tjernlund
0 siblings, 0 replies; 58+ messages in thread
From: Joakim Tjernlund @ 2009-09-28 7:39 UTC (permalink / raw)
To: Benjamin Herrenschmidt; +Cc: linuxppc-dev, Rex Feany
Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote on 28/09/2009 09:34:46:
>
> On Mon, 2009-09-28 at 09:22 +0200, Joakim Tjernlund wrote:
> >
> > > happy to stick with it until somebody comes up with a real good
> > reason
> > > to do more :-) 8xx is on life support and has been around for long
> >
> > yeah, I figured that too but Freescale seems to be cranking out new
> > variants still.
>
> Ugh ? And still not fixing the major core bugs ? Grmbl...
yes, I even had them chase down the problem so they know exactly where
it is :( I don't think it is in any errata though, but I haven't looked
for a few years.
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH] powerpc/8xx: fix regression introduced by cache coherency rewrite
2009-09-28 7:22 ` Joakim Tjernlund
2009-09-28 7:34 ` Benjamin Herrenschmidt
@ 2009-09-28 10:02 ` Joakim Tjernlund
1 sibling, 0 replies; 58+ messages in thread
From: Joakim Tjernlund @ 2009-09-28 10:02 UTC (permalink / raw)
Cc: Rex Feany, linuxppc-dev
>
> Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote on 28/09/2009 05:21:00:
> >
> > On Sun, 2009-09-27 at 15:22 +0200, Joakim Tjernlund wrote:
> > > > However, adding tlbil_va() to ptep_set_access_flags() as you suggested
> > > > makes everything happy. I need to test it some more, but it looks good
> > > > so far. Below is what I am testing now.
> > >
> > > 8xx, is getting very hacky and I suspect that the only long term fix is
> > > add code to trap the cache instructions in TLB error/miss and fixup the
> > > exception in page fault handler. This will also have the added benefit on being able
> > > to use the cache instructions in both kernel and user space like any other
> > > ppc arch.
> >
> > First I'd like to understand exactly what's happening today, since
> > it makes little sense :-) I suppose I'll have to get myself some
> > 8xx doco and understand how the bloody MMU works.
>
> hmm, I recall that the reason for the currect "force a TLB error"
> procedure was to fix a problem Montavista had, possibly Tom Rini.
> He fixed the problem differently at first but later it
> was changed to what is in use today.
>
> If you have access to the old linuxppc-embedded you will find a lot
> of info in a thread with subject "[PATCH 2.6.14] mm: 8xx MM fix for"
> Also I made a ref that I cannot access anymore:
> http://ozlabs.org/pipermail/linuxppc-embedded/2005-January/016382.html
Found a link to the original problem
http://www.mail-archive.com/linuxppc-embedded@ozlabs.org/msg03376.html
seems like I had something to do with the solution that is haunting
8xx now :(
Basically 8xx should revert back to the fix suggested in the above link.
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH] powerpc/8xx: fix regression introduced by cache coherency rewrite
2009-09-25 1:51 ` Benjamin Herrenschmidt
2009-09-25 3:03 ` Benjamin Herrenschmidt
@ 2009-09-29 1:21 ` Rex Feany
2009-09-29 6:26 ` Joakim Tjernlund
2009-09-29 7:07 ` Benjamin Herrenschmidt
1 sibling, 2 replies; 58+ messages in thread
From: Rex Feany @ 2009-09-29 1:21 UTC (permalink / raw)
To: Benjamin Herrenschmidt; +Cc: linuxppc-dev
Thus spake Benjamin Herrenschmidt (benh@kernel.crashing.org):
> On Thu, 2009-09-24 at 18:35 -0700, Rex Feany wrote:
> >
> > Then I can boot and get to a shell, but userspace is slow. 8 seconds
> > to mount
> > /proc (vs. less then a second using my old kernel)! Maybe this is an
> > unrelated issue? I'm pretty clueless about the details, I'm sorry.
> > PG_arch_1 is used to prevent a cache flush unless it is actually
> > needed?
> > Then why would changing the location of the tlbil_va() make a
> > difference?
>
> I think there's more finishyness to 8xx than we thought. IE. That
> tlbil_va might have more reasons to be there than what the comment
> seems to advertize. Can you try to move it even higher up ? IE.
> Unconditionally at the beginning of set_pte_filter ?
>
> Also, if that doesn't help, can you try putting one in
> set_access_flags_filter() just below ?
>
> (Beware that there's two different versions of both functions, only the
> first one is compiled/used on 8xx).
>
> It's going to be hard for me to get that "right" since I don't really
> know what's going on with the core here, but I suppose if we get it
> moving along with extra tlb invalidations, that should be "good enough"
> until somebody who really knows what's going on comes up with possibly
> a better fix.
I've tried sticking tlbil_va() in those places, nothing seems to help.
In some cases userspace is slow, in other cases userspace is faster and
unstable: sometimes commands hang, sometimes I am able to ctrl-c and
and kill it, sometimes I get other strange crashes or falures (so far no
kernel oopses though).
take care!
/rex.
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH] powerpc/8xx: fix regression introduced by cache coherency rewrite
2009-09-29 1:21 ` Rex Feany
@ 2009-09-29 6:26 ` Joakim Tjernlund
2009-09-29 7:07 ` Benjamin Herrenschmidt
2009-09-29 7:07 ` Benjamin Herrenschmidt
1 sibling, 1 reply; 58+ messages in thread
From: Joakim Tjernlund @ 2009-09-29 6:26 UTC (permalink / raw)
To: Rex Feany; +Cc: linuxppc-dev
>
> Thus spake Benjamin Herrenschmidt (benh@kernel.crashing.org):
>
> > On Thu, 2009-09-24 at 18:35 -0700, Rex Feany wrote:
> > >
> > > Then I can boot and get to a shell, but userspace is slow. 8 seconds
> > > to mount
> > > /proc (vs. less then a second using my old kernel)! Maybe this is an
> > > unrelated issue? I'm pretty clueless about the details, I'm sorry.
> > > PG_arch_1 is used to prevent a cache flush unless it is actually
> > > needed?
> > > Then why would changing the location of the tlbil_va() make a
> > > difference?
> >
> > I think there's more finishyness to 8xx than we thought. IE. That
> > tlbil_va might have more reasons to be there than what the comment
> > seems to advertize. Can you try to move it even higher up ? IE.
> > Unconditionally at the beginning of set_pte_filter ?
> >
> > Also, if that doesn't help, can you try putting one in
> > set_access_flags_filter() just below ?
> >
> > (Beware that there's two different versions of both functions, only the
> > first one is compiled/used on 8xx).
> >
> > It's going to be hard for me to get that "right" since I don't really
> > know what's going on with the core here, but I suppose if we get it
> > moving along with extra tlb invalidations, that should be "good enough"
> > until somebody who really knows what's going on comes up with possibly
> > a better fix.
>
> I've tried sticking tlbil_va() in those places, nothing seems to help.
> In some cases userspace is slow, in other cases userspace is faster and
> unstable: sometimes commands hang, sometimes I am able to ctrl-c and
> and kill it, sometimes I get other strange crashes or falures (so far no
> kernel oopses though).
This is exactly what you get when the "cache insn does not update DAR" bug hits
you.
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH] powerpc/8xx: fix regression introduced by cache coherency rewrite
2009-09-29 1:21 ` Rex Feany
2009-09-29 6:26 ` Joakim Tjernlund
@ 2009-09-29 7:07 ` Benjamin Herrenschmidt
2009-09-29 21:09 ` Rex Feany
1 sibling, 1 reply; 58+ messages in thread
From: Benjamin Herrenschmidt @ 2009-09-29 7:07 UTC (permalink / raw)
To: Rex Feany; +Cc: linuxppc-dev
On Mon, 2009-09-28 at 18:21 -0700, Rex Feany wrote:
> > It's going to be hard for me to get that "right" since I don't really
> > know what's going on with the core here, but I suppose if we get it
> > moving along with extra tlb invalidations, that should be "good enough"
> > until somebody who really knows what's going on comes up with possibly
> > a better fix.
>
> I've tried sticking tlbil_va() in those places, nothing seems to help.
> In some cases userspace is slow, in other cases userspace is faster and
> unstable: sometimes commands hang, sometimes I am able to ctrl-c and
> and kill it, sometimes I get other strange crashes or falures (so far no
> kernel oopses though).
And you are positive that with 2.6.31 and your other patch, it works
both fast and stable ? This is strange... the code should be mostly
identical. I'll have a second look and see if I can get you a patch that
reproduce -exactly- the behaviour of 2.6.31 plus your patch.
Cheers,
Ben.
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH] powerpc/8xx: fix regression introduced by cache coherency rewrite
2009-09-29 6:26 ` Joakim Tjernlund
@ 2009-09-29 7:07 ` Benjamin Herrenschmidt
2009-09-29 8:13 ` Joakim Tjernlund
0 siblings, 1 reply; 58+ messages in thread
From: Benjamin Herrenschmidt @ 2009-09-29 7:07 UTC (permalink / raw)
To: Joakim Tjernlund; +Cc: linuxppc-dev, Rex Feany
On Tue, 2009-09-29 at 08:26 +0200, Joakim Tjernlund wrote:
> > I've tried sticking tlbil_va() in those places, nothing seems to
> help.
> > In some cases userspace is slow, in other cases userspace is faster
> and
> > unstable: sometimes commands hang, sometimes I am able to ctrl-c and
> > and kill it, sometimes I get other strange crashes or falures (so
> far no
> > kernel oopses though).
>
> This is exactly what you get when the "cache insn does not update DAR"
> bug hits
> you.
But then why was it working fine before ? Or it wasn't ?
Ben.
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH] powerpc/8xx: fix regression introduced by cache coherency rewrite
2009-09-29 7:07 ` Benjamin Herrenschmidt
@ 2009-09-29 8:13 ` Joakim Tjernlund
2009-09-29 8:16 ` Benjamin Herrenschmidt
0 siblings, 1 reply; 58+ messages in thread
From: Joakim Tjernlund @ 2009-09-29 8:13 UTC (permalink / raw)
To: Benjamin Herrenschmidt; +Cc: linuxppc-dev, Rex Feany
Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote on 29/09/2009 09:07:37:
>
> On Tue, 2009-09-29 at 08:26 +0200, Joakim Tjernlund wrote:
> > > I've tried sticking tlbil_va() in those places, nothing seems to
> > help.
> > > In some cases userspace is slow, in other cases userspace is faster
> > and
> > > unstable: sometimes commands hang, sometimes I am able to ctrl-c and
> > > and kill it, sometimes I get other strange crashes or falures (so
> > far no
> > > kernel oopses though).
> >
> > This is exactly what you get when the "cache insn does not update DAR"
> > bug hits
> > you.
>
> But then why was it working fine before ? Or it wasn't ?
hmm, yes. You do get this and mysterious SEGV if you hit the but so does
other bugs too so this is probably due to missing invalidation.
I suspect that something like below will fix the problem and
is the "correct" fix(untested, not even compiled):
diff --git a/arch/powerpc/kernel/head_8xx.S b/arch/powerpc/kernel/head_8xx.S
index 52ff8c5..f579a11 100644
--- a/arch/powerpc/kernel/head_8xx.S
+++ b/arch/powerpc/kernel/head_8xx.S
@@ -428,7 +428,7 @@ DataStoreTLBMiss:
* set. All other Linux PTE bits control the behavior
* of the MMU.
*/
-2: li r11, 0x00f0
+ li r11, 0x00f0
rlwimi r10, r11, 0, 24, 28 /* Set 24-27, clear 28 */
DO_8xx_CPU6(0x3d80, r3)
mtspr SPRN_MD_RPN, r10 /* Update TLB entry */
@@ -441,7 +441,23 @@ DataStoreTLBMiss:
lwz r3, 8(r0)
#endif
rfi
+2:
+ /* Copy 20 msb from MD_EPN to DAR since the dcxx instructions fails
+ * to update DAR when they cause a DTLB Miss.
+ */
+ mfspr r11, SPRN_MD_EPN
+ mfspr r10, SPRN_DAR
+ rlwimi r10, r11, 0, 0, 19
+ mtspr SPRN_DAR, r10
+ mfspr r10, SPRN_M_TW /* Restore registers */
+ lwz r11, 0(r0)
+ mtcr r11
+ lwz r11, 4(r0)
+#ifdef CONFIG_8xx_CPU6
+ lwz r3, 8(r0)
+#endif
+ b DataAccess
/* This is an instruction TLB error on the MPC8xx. This could be due
* to many reasons, such as executing guarded memory or illegal instruction
* addresses. There is nothing to do but handle a big time error fault.
^ permalink raw reply related [flat|nested] 58+ messages in thread
* Re: [PATCH] powerpc/8xx: fix regression introduced by cache coherency rewrite
2009-09-29 8:13 ` Joakim Tjernlund
@ 2009-09-29 8:16 ` Benjamin Herrenschmidt
2009-09-29 8:24 ` Joakim Tjernlund
2009-09-29 11:56 ` Joakim Tjernlund
0 siblings, 2 replies; 58+ messages in thread
From: Benjamin Herrenschmidt @ 2009-09-29 8:16 UTC (permalink / raw)
To: Joakim Tjernlund; +Cc: linuxppc-dev, Rex Feany
> hmm, yes. You do get this and mysterious SEGV if you hit the but so does
> other bugs too so this is probably due to missing invalidation.
>
> I suspect that something like below will fix the problem and
> is the "correct" fix(untested, not even compiled):
Ok but do we also still have to worry about the "unpopulated" TLB
entries and invalidate them somehow when populating ?
Cheers,
Ben.
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH] powerpc/8xx: fix regression introduced by cache coherency rewrite
2009-09-29 8:16 ` Benjamin Herrenschmidt
@ 2009-09-29 8:24 ` Joakim Tjernlund
2009-09-29 11:56 ` Joakim Tjernlund
1 sibling, 0 replies; 58+ messages in thread
From: Joakim Tjernlund @ 2009-09-29 8:24 UTC (permalink / raw)
To: Benjamin Herrenschmidt; +Cc: linuxppc-dev, Rex Feany
Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote on 29/09/2009 10:16:38:
>
>
> > hmm, yes. You do get this and mysterious SEGV if you hit the but so does
> > other bugs too so this is probably due to missing invalidation.
> >
> > I suspect that something like below will fix the problem and
> > is the "correct" fix(untested, not even compiled):
>
> Ok but do we also still have to worry about the "unpopulated" TLB
> entries and invalidate them somehow when populating ?
No, this patch is to not populate the MMU with invalid entries at
all. Hopefully you can remove those invalidate 8xx TLBs hacks
you currently have.
Jocke
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH] powerpc/8xx: fix regression introduced by cache coherency rewrite
2009-09-29 8:16 ` Benjamin Herrenschmidt
2009-09-29 8:24 ` Joakim Tjernlund
@ 2009-09-29 11:56 ` Joakim Tjernlund
2009-09-29 21:03 ` Rex Feany
1 sibling, 1 reply; 58+ messages in thread
From: Joakim Tjernlund @ 2009-09-29 11:56 UTC (permalink / raw)
To: Benjamin Herrenschmidt; +Cc: linuxppc-dev, Rex Feany
Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote on 29/09/2009 10:16:38:
>
>
> > hmm, yes. You do get this and mysterious SEGV if you hit the but so does
> > other bugs too so this is probably due to missing invalidation.
> >
> > I suspect that something like below will fix the problem and
> > is the "correct" fix(untested, not even compiled):
>
> Ok but do we also still have to worry about the "unpopulated" TLB
> entries and invalidate them somehow when populating ?
Since I am probably the only one that knows about DAR problem I figured
I should take a stab at it. This is not tested, but I hope Rex and the list
can do that. Once this works as it should, we can remove all special handling
for 8xx in copy_tofrom_user() and friends.
No sign-off yet, want some confirmation first.
diff --git a/arch/powerpc/kernel/entry_32.S b/arch/powerpc/kernel/entry_32.S
index 4dd38f1..691ebd3 100644
--- a/arch/powerpc/kernel/entry_32.S
+++ b/arch/powerpc/kernel/entry_32.S
@@ -774,7 +774,14 @@ restore:
lwz r11,_CTR(r1)
mtspr SPRN_XER,r10
mtctr r11
-
+#ifdef CONFIG_8xx
+ /* Tag DAR with a well know value.
+ * This needs to match head_8xx.S and
+ * do_page_fault()
+ */
+ li r10, 0xf0
+ mtspr SPRN_DAR, r10
+#endif
PPC405_ERR77(0,r1)
BEGIN_FTR_SECTION
lwarx r11,0,r1
diff --git a/arch/powerpc/kernel/head_8xx.S b/arch/powerpc/kernel/head_8xx.S
index 52ff8c5..418ea96 100644
--- a/arch/powerpc/kernel/head_8xx.S
+++ b/arch/powerpc/kernel/head_8xx.S
@@ -39,6 +39,15 @@
#else
#define DO_8xx_CPU6(val, reg)
#endif
+
+/* DAR needs to be tagged with a known value so that the
+ * DataTLB Miss/Error and do_page_fault() can recognize a
+ * buggy dcbx instruction and workaround the problem.
+ * dcbf, dcbi, dcbst, dcbz instructions do not update DAR
+ * when trapping into a Data TLB Miss/Error. See
+ * DataStoreTLBMiss and DataTLBError for details
+ */
+
__HEAD
_ENTRY(_stext);
_ENTRY(_start);
@@ -428,7 +437,8 @@ DataStoreTLBMiss:
* set. All other Linux PTE bits control the behavior
* of the MMU.
*/
-2: li r11, 0x00f0
+ li r11, 0x00f0
+ mtspr SPRN_DAR, r11 /* Tag DAR */
rlwimi r10, r11, 0, 24, 28 /* Set 24-27, clear 28 */
DO_8xx_CPU6(0x3d80, r3)
mtspr SPRN_MD_RPN, r10 /* Update TLB entry */
@@ -441,7 +451,15 @@ DataStoreTLBMiss:
lwz r3, 8(r0)
#endif
rfi
-
+2:
+ mfspr r10, SPRN_M_TW /* Restore registers */
+ lwz r11, 0(r0)
+ mtcr r11
+ lwz r11, 4(r0)
+#ifdef CONFIG_8xx_CPU6
+ lwz r3, 8(r0)
+#endif
+ b DataAccess
/* This is an instruction TLB error on the MPC8xx. This could be due
* to many reasons, such as executing guarded memory or illegal instruction
* addresses. There is nothing to do but handle a big time error fault.
@@ -492,6 +510,8 @@ DataTLBError:
* assuming we only use the dcbi instruction on kernel addresses.
*/
mfspr r10, SPRN_DAR
+ cmpwi cr0, r10, 0xf0 /* check it DAR holds a tag */
+ beq- 2f
rlwinm r11, r10, 0, 0, 19
ori r11, r11, MD_EVALID
mfspr r10, SPRN_M_CASID
@@ -547,6 +567,7 @@ DataTLBError:
* of the MMU.
*/
li r11, 0x00f0
+ mtspr SPRN_DAR, r11 /* Tag DAR */
rlwimi r10, r11, 0, 24, 28 /* Set 24-27, clear 28 */
DO_8xx_CPU6(0x3d80, r3)
mtspr SPRN_MD_RPN, r10 /* Update TLB entry */
diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
index 7699394..be779b2 100644
--- a/arch/powerpc/mm/fault.c
+++ b/arch/powerpc/mm/fault.c
@@ -125,6 +125,32 @@ int __kprobes do_page_fault(struct pt_regs *regs, unsigned long address,
int trap = TRAP(regs);
int is_exec = trap == 0x400;
+#if defined(CONFIG_8xx)
+/*
+ Workarund DTLB Miss/Error, as these do not update DAR
+ for dcbf, dcbi, dcbst, dcbz instructions
+ This relies on every exception tagging DAR with 0xf0
+ before returning (rfi)
+ DAR as passed as address to this function.
+ */
+#define RA(inst) (((inst) & 0x001F0000) >> 16)
+#define RB(inst) (((inst) & 0x0000F800) >> 11)
+ {
+ unsigned long ra, rb, dar, insns;
+
+ if (trap == 0x300 && address == 0xf0) {
+ insns = *((unsigned long *)regs->nip);
+ /* Really check if it is an dcbf, dcbi, dcbst, dcbz insns ? */
+ ra = RA(insns); /* Reg Ra */
+ rb = RB(insns); /* Reg Rb */
+ dar = regs->gpr[rb];
+ if (ra)
+ dar += regs->gpr[ra];
+ /* regs->dar = dar; perhaps */
+ address = dar;
+ }
+ }
+#endif
#if !(defined(CONFIG_4xx) || defined(CONFIG_BOOKE))
/*
* Fortunately the bit assignments in SRR1 for an instruction
^ permalink raw reply related [flat|nested] 58+ messages in thread
* Re: [PATCH] powerpc/8xx: fix regression introduced by cache coherency rewrite
2009-09-29 11:56 ` Joakim Tjernlund
@ 2009-09-29 21:03 ` Rex Feany
2009-09-30 7:59 ` Joakim Tjernlund
0 siblings, 1 reply; 58+ messages in thread
From: Rex Feany @ 2009-09-29 21:03 UTC (permalink / raw)
To: Joakim Tjernlund; +Cc: linuxppc-dev
Thus spake Joakim Tjernlund (joakim.tjernlund@transmode.se):
> Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote on 29/09/2009 10:16:38:
> >
> >
> > > hmm, yes. You do get this and mysterious SEGV if you hit the but so does
> > > other bugs too so this is probably due to missing invalidation.
> > >
> > > I suspect that something like below will fix the problem and
> > > is the "correct" fix(untested, not even compiled):
> >
> > Ok but do we also still have to worry about the "unpopulated" TLB
> > entries and invalidate them somehow when populating ?
>
> Since I am probably the only one that knows about DAR problem I figured
> I should take a stab at it. This is not tested, but I hope Rex and the list
> can do that. Once this works as it should, we can remove all special handling
> for 8xx in copy_tofrom_user() and friends.
> No sign-off yet, want some confirmation first.
It doesn't make a difference. I applied it to the top of the tree,
it got to userspace but it is stuck. Using break I am able to dump the
registers, I'm not sure if this is useful. Oh, well, I finally got to a shell
but it is unusably slow. Is there some test code that would be better to run?
I tried looking through the mailing list archives but I couldn't find anything.
thanks!
/rex.
SysRq : Show Regs
NIP: c00588d0 LR: c000e3c4 CTR: 00001fde
REGS: c3459d90 TRAP: 0501 Not tainted (2.6.32-rc2-00013-g2d222d9-dirty)
MSR: 00009032 <EE,ME,IR,DR> CR: 44008422 XER: 20006a02
TASK = c3438050[13] 'rc.sysinit' THREAD: c3458000
GPR00: c000e3c4 c3459e40 c3438050 c21a2a60 c345bdf4 0fd81032 00000000 3001e920
GPR08: 00000000 00000005 c21a2a60 c0210000 03438260
NIP [c00588d0] handle_mm_fault+0x10/0xacc
LR [c000e3c4] do_page_fault+0x2f0/0x474
Call Trace:
[c3459e40] [c0059328] handle_mm_fault+0xa68/0xacc (unreliable)
[c3459e90] [c000e418] do_page_fault+0x344/0x474
[c3459f40] [c000d520] handle_page_fault+0xc/0x80
Instruction dump:
4bff15f1 39600000 80010014 7d635b78 7c0803a6 bbc10008 38210010 4e800020
7c0802a6 9421ffb0 3d60c021 be810020 <90010054> 38000000 90020000 396b4bec
SysRq : Show Regs
NIP: c000e0e0 LR: c000d520 CTR: 00001fde
REGS: c21adde0 TRAP: 0501 Not tainted (2.6.32-rc2-00013-g2d222d9-dirty)
MSR: 00009032 <EE,ME,IR,DR> CR: 48008424 XER: 00006a02
TASK = c3438460[18] 'sh' THREAD: c21ac000
GPR00: c000d520 c21ade90 c3438460 c21adf50 0fd76cb8 c0000000 00000004 0fee8c44
GPR08: 00009f6c c000d788 00009032 c000d514 03438670
NIP [c000e0e0] do_page_fault+0xc/0x474
LR [c000d520] handle_page_fault+0xc/0x80
Call Trace:
[c21ade90] [c000e418] do_page_fault+0x344/0x474 (unreliable)
[c21adf40] [c000d520] handle_page_fault+0xc/0x80
Instruction dump:
3863f604 7fe4fb78 7fc5f378 4bffcd39 80010014 bbc10008 7c0803a6 38210010
4e800020 7c0802a6 9421ff50 bf010090 <900100b4> 7c7e1b78 800300a0 7c9d2378
SysRq : Show Regs
NIP: c005930c LR: c000e3c4 CTR: 00001fde
REGS: c21add90 TRAP: 0501 Not tainted (2.6.32-rc2-00013-g2d222d9-dirty)
MSR: 00009032 <EE,ME,IR,DR> CR: 24002422 XER: 00006a02
TASK = c3438460[18] 'sh' THREAD: c21ac000
GPR00: c0236000 c21ade40 c3438460 c21a2a60 c3444df4 c3456000 00000000 feff0000
GPR08: c21ac000 03456000 c0220000 00000001 03438670
NIP [c005930c] handle_mm_fault+0xa4c/0xacc
LR [c000e3c4] do_page_fault+0x2f0/0x474
Call Trace:
[c21ade40] [c0059328] handle_mm_fault+0xa68/0xacc (unreliable)
[c21ade90] [c000e3c4] do_page_fault+0x2f0/0x474
[c21adf40] [c000d520] handle_page_fault+0xc/0x80
Instruction dump:
4802192d 48000094 2f9f0000 3ba00001 419e0088 7fe3fb78 4bff46a1 48000078
7fe3fb78 4bff4695 48000070 63390020 <633f0080> 7f45d378 7f63db78 7f04c378
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH] powerpc/8xx: fix regression introduced by cache coherency rewrite
2009-09-29 7:07 ` Benjamin Herrenschmidt
@ 2009-09-29 21:09 ` Rex Feany
0 siblings, 0 replies; 58+ messages in thread
From: Rex Feany @ 2009-09-29 21:09 UTC (permalink / raw)
To: Benjamin Herrenschmidt; +Cc: linuxppc-dev
Thus spake Benjamin Herrenschmidt (benh@kernel.crashing.org):
> On Mon, 2009-09-28 at 18:21 -0700, Rex Feany wrote:
> > > It's going to be hard for me to get that "right" since I don't really
> > > know what's going on with the core here, but I suppose if we get it
> > > moving along with extra tlb invalidations, that should be "good enough"
> > > until somebody who really knows what's going on comes up with possibly
> > > a better fix.
> >
> > I've tried sticking tlbil_va() in those places, nothing seems to help.
> > In some cases userspace is slow, in other cases userspace is faster and
> > unstable: sometimes commands hang, sometimes I am able to ctrl-c and
> > and kill it, sometimes I get other strange crashes or falures (so far no
> > kernel oopses though).
>
> And you are positive that with 2.6.31 and your other patch, it works
> both fast and stable ? This is strange... the code should be mostly
> identical. I'll have a second look and see if I can get you a patch that
> reproduce -exactly- the behaviour of 2.6.31 plus your patch.
The difference is night and day - with 2.6.31 I can boot into single
user mode, I can run our custom software (I left it running over night
with very simple test script - no crashes).
With the top of the tree I can sometimes boot into a shell, and if it
isn't crashing or hanging on boot it runs very slow (10+ seconds to do
anything, if I am lucky).
thanks!
/rex
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH] powerpc/8xx: fix regression introduced by cache coherency rewrite
2009-09-29 21:03 ` Rex Feany
@ 2009-09-30 7:59 ` Joakim Tjernlund
2009-09-30 8:19 ` Joakim Tjernlund
0 siblings, 1 reply; 58+ messages in thread
From: Joakim Tjernlund @ 2009-09-30 7:59 UTC (permalink / raw)
To: Rex Feany; +Cc: linuxppc-dev
Rex Feany <RFeany@mrv.com> wrote on 29/09/2009 23:03:31:
>
> Thus spake Joakim Tjernlund (joakim.tjernlund@transmode.se):
>
> > Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote on 29/09/2009 10:16:38:
> > >
> > >
> > > > hmm, yes. You do get this and mysterious SEGV if you hit the but so does
> > > > other bugs too so this is probably due to missing invalidation.
> > > >
> > > > I suspect that something like below will fix the problem and
> > > > is the "correct" fix(untested, not even compiled):
> > >
> > > Ok but do we also still have to worry about the "unpopulated" TLB
> > > entries and invalidate them somehow when populating ?
> >
> > Since I am probably the only one that knows about DAR problem I figured
> > I should take a stab at it. This is not tested, but I hope Rex and the list
> > can do that. Once this works as it should, we can remove all special handling
> > for 8xx in copy_tofrom_user() and friends.
> > No sign-off yet, want some confirmation first.
>
> It doesn't make a difference. I applied it to the top of the tree,
> it got to userspace but it is stuck. Using break I am able to dump the
> registers, I'm not sure if this is useful. Oh, well, I finally got to a shell
> but it is unusably slow. Is there some test code that would be better to run?
> I tried looking through the mailing list archives but I couldn't find anything.
>
> thanks!
> /rex.
Ok, I have made some minor tweaks and added debug code in
do_page_fault(). Would be great if you could try on both
.31 and top of tree.
Jocke
diff --git a/arch/powerpc/kernel/entry_32.S b/arch/powerpc/kernel/entry_32.S
index 4dd38f1..b3f6687 100644
--- a/arch/powerpc/kernel/entry_32.S
+++ b/arch/powerpc/kernel/entry_32.S
@@ -709,6 +709,14 @@ ret_from_except:
SYNC /* Some chip revs have problems here... */
MTMSRD(r10) /* disable interrupts */
+#ifdef CONFIG_8xx
+ /* Tag DAR with a well know value.
+ * This needs to match head_8xx.S and
+ * do_page_fault()
+ */
+ li r3, 0x00f0
+ mtspr SPRN_DAR, r3
+#endif
lwz r3,_MSR(r1) /* Returning to user mode? */
andi. r0,r3,MSR_PR
beq resume_kernel
diff --git a/arch/powerpc/kernel/head_8xx.S b/arch/powerpc/kernel/head_8xx.S
index 52ff8c5..f9e2363 100644
--- a/arch/powerpc/kernel/head_8xx.S
+++ b/arch/powerpc/kernel/head_8xx.S
@@ -39,6 +39,15 @@
#else
#define DO_8xx_CPU6(val, reg)
#endif
+
+/* DAR needs to be tagged with a known value so that the
+ * DataTLB Miss/Error and do_page_fault() can recognize a
+ * buggy dcbx instruction and workaround the problem.
+ * dcbf, dcbi, dcbst, dcbz instructions do not update DAR
+ * when trapping into a Data TLB Miss/Error. See
+ * DataStoreTLBMiss and DataTLBError for details
+ */
+
__HEAD
_ENTRY(_stext);
_ENTRY(_start);
@@ -352,7 +361,7 @@ InstructionTLBMiss:
* set. All other Linux PTE bits control the behavior
* of the MMU.
*/
-2: li r11, 0x00f0
+ li r11, 0x00f0
rlwimi r10, r11, 0, 24, 28 /* Set 24-27, clear 28 */
DO_8xx_CPU6(0x2d80, r3)
mtspr SPRN_MI_RPN, r10 /* Update TLB entry */
@@ -365,6 +374,19 @@ InstructionTLBMiss:
lwz r3, 8(r0)
#endif
rfi
+2:
+ mfspr r10, SPRN_SRR1
+ rlwinm r10,r10,0,5,3 /* clear bit 4(0x08000000) */
+ mtspr SPRN_SRR1, r10
+
+ mfspr r10, SPRN_M_TW /* Restore registers */
+ lwz r11, 0(r0)
+ mtcr r11
+ lwz r11, 4(r0)
+#ifdef CONFIG_8xx_CPU6
+ lwz r3, 8(r0)
+#endif
+ b InstructionAccess
. = 0x1200
DataStoreTLBMiss:
@@ -428,10 +450,11 @@ DataStoreTLBMiss:
* set. All other Linux PTE bits control the behavior
* of the MMU.
*/
-2: li r11, 0x00f0
+ li r11, 0x00f0
rlwimi r10, r11, 0, 24, 28 /* Set 24-27, clear 28 */
DO_8xx_CPU6(0x3d80, r3)
mtspr SPRN_MD_RPN, r10 /* Update TLB entry */
+ mtspr SPRN_DAR, r11 /* Tag DAR */
mfspr r10, SPRN_M_TW /* Restore registers */
lwz r11, 0(r0)
@@ -441,7 +464,15 @@ DataStoreTLBMiss:
lwz r3, 8(r0)
#endif
rfi
-
+2:
+ mfspr r10, SPRN_M_TW /* Restore registers */
+ lwz r11, 0(r0)
+ mtcr r11
+ lwz r11, 4(r0)
+#ifdef CONFIG_8xx_CPU6
+ lwz r3, 8(r0)
+#endif
+ b DataAccess
/* This is an instruction TLB error on the MPC8xx. This could be due
* to many reasons, such as executing guarded memory or illegal instruction
* addresses. There is nothing to do but handle a big time error fault.
@@ -492,6 +523,8 @@ DataTLBError:
* assuming we only use the dcbi instruction on kernel addresses.
*/
mfspr r10, SPRN_DAR
+ cmpwi cr0, r10, 0x00f0 /* check if DAR holds a tag */
+ beq- 2f
rlwinm r11, r10, 0, 0, 19
ori r11, r11, MD_EVALID
mfspr r10, SPRN_M_CASID
@@ -550,6 +583,7 @@ DataTLBError:
rlwimi r10, r11, 0, 24, 28 /* Set 24-27, clear 28 */
DO_8xx_CPU6(0x3d80, r3)
mtspr SPRN_MD_RPN, r10 /* Update TLB entry */
+ mtspr SPRN_DAR, r11 /* Tag DAR */
mfspr r10, SPRN_M_TW /* Restore registers */
lwz r11, 0(r0)
diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
index 7699394..4fefd01 100644
--- a/arch/powerpc/mm/fault.c
+++ b/arch/powerpc/mm/fault.c
@@ -125,6 +125,61 @@ int __kprobes do_page_fault(struct pt_regs *regs, unsigned long address,
int trap = TRAP(regs);
int is_exec = trap == 0x400;
+#if 1 /* defined(CONFIG_8xx)*/
+/*
+ Work around DTLB Miss/Error, as these do not update
+ DAR for dcbf, dcbi, dcbst, dcbz instructions
+ This relies on every exception tagging DAR with 0xf0
+ before returning (rfi)
+ DAR is passed as 'address' to this function.
+ */
+ {
+ unsigned long ra, rb, dar, insn;
+ char *istr = NULL;
+
+ insn = *((unsigned long *)regs->nip);
+ if ((insn >> (31-5)) == 31) {
+ if ((insn >> 1) == 1014) /* dcbz ? */
+ istr = "dcbz";
+ if ((insn >> 1) == 86) /* dcbf ? */
+ istr = "dcbf";
+ if ((insn >> 1) == 470) /* dcbi ? */
+ istr = "dcbi";
+ if ((insn >> 1) == 54) /* dcbst ? */
+ istr = "dcbst";
+ }
+ if (istr) {
+ ra = (insn >> (31-15)) & 0x1f; /* Reg RA */
+ rb = (insn >> (31-20)) & 0x1f; /* Reg RB */
+ printk(KERN_CRIT "%s r%ld,r%ld found at NIP:%lx\n",
+ istr, regs->nip, ra, rb);
+ printk(KERN_CRIT "DAR is:%lx (should be 0xf0)\n", regs->dar);
+
+ }
+ if (regs->dar == 0xf0 && !istr)
+ printk(KERN_CRIT "DAR is 0xf0 but insn at NIP is: %lx is"
+ "not a cache insn:%lx!\n", regs->nip, insn);
+ if (trap == 0x300 && address != regs->dar)
+ printk(KERN_CRIT "DAR:%lx != address:%lx!\n", address, regs->dar);
+
+ if (istr || (trap == 0x300 && address == 0xf0)) {
+ insn = *((unsigned long *)regs->nip);
+ /* Check if it is a
+ * dcbf, dcbi, dcbst, dcbz insn ?
+ if ((insn >> (31-5)) != 31)
+ break; // Not a dcbx instruction
+ */
+
+ ra = (insn >> (31-15)) & 0x1f; /* Reg RA */
+ rb = (insn >> (31-20)) & 0x1f; /* Reg RB */
+ dar = regs->gpr[rb];
+ if (ra)
+ dar += regs->gpr[ra];
+ regs->dar = dar;
+ address = dar;
+ }
+ }
+#endif
#if !(defined(CONFIG_4xx) || defined(CONFIG_BOOKE))
/*
* Fortunately the bit assignments in SRR1 for an instruction
^ permalink raw reply related [flat|nested] 58+ messages in thread
* Re: [PATCH] powerpc/8xx: fix regression introduced by cache coherency rewrite
2009-09-30 7:59 ` Joakim Tjernlund
@ 2009-09-30 8:19 ` Joakim Tjernlund
2009-09-30 9:00 ` Rex Feany
0 siblings, 1 reply; 58+ messages in thread
From: Joakim Tjernlund @ 2009-09-30 8:19 UTC (permalink / raw)
Cc: linuxppc-dev, Rex Feany
>
> Rex Feany <RFeany@mrv.com> wrote on 29/09/2009 23:03:31:
> >
> > Thus spake Joakim Tjernlund (joakim.tjernlund@transmode.se):
> >
> > > Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote on 29/09/2009 10:16:38:
> > > >
> > > >
> > > > > hmm, yes. You do get this and mysterious SEGV if you hit the but so does
> > > > > other bugs too so this is probably due to missing invalidation.
> > > > >
> > > > > I suspect that something like below will fix the problem and
> > > > > is the "correct" fix(untested, not even compiled):
> > > >
> > > > Ok but do we also still have to worry about the "unpopulated" TLB
> > > > entries and invalidate them somehow when populating ?
> > >
> > > Since I am probably the only one that knows about DAR problem I figured
> > > I should take a stab at it. This is not tested, but I hope Rex and the list
> > > can do that. Once this works as it should, we can remove all special handling
> > > for 8xx in copy_tofrom_user() and friends.
> > > No sign-off yet, want some confirmation first.
> >
> > It doesn't make a difference. I applied it to the top of the tree,
> > it got to userspace but it is stuck. Using break I am able to dump the
> > registers, I'm not sure if this is useful. Oh, well, I finally got to a shell
> > but it is unusably slow. Is there some test code that would be better to run?
> > I tried looking through the mailing list archives but I couldn't find anything.
> >
> > thanks!
> > /rex.
>
> Ok, I have made some minor tweaks and added debug code in
> do_page_fault(). Would be great if you could try on both
> .31 and top of tree.
>
> Jocke
OOPS, found a bug. Use this one instead:
diff --git a/arch/powerpc/kernel/entry_32.S b/arch/powerpc/kernel/entry_32.S
index 4dd38f1..b3f6687 100644
--- a/arch/powerpc/kernel/entry_32.S
+++ b/arch/powerpc/kernel/entry_32.S
@@ -709,6 +709,14 @@ ret_from_except:
SYNC /* Some chip revs have problems here... */
MTMSRD(r10) /* disable interrupts */
+#ifdef CONFIG_8xx
+ /* Tag DAR with a well know value.
+ * This needs to match head_8xx.S and
+ * do_page_fault()
+ */
+ li r3, 0x00f0
+ mtspr SPRN_DAR, r3
+#endif
lwz r3,_MSR(r1) /* Returning to user mode? */
andi. r0,r3,MSR_PR
beq resume_kernel
diff --git a/arch/powerpc/kernel/head_8xx.S b/arch/powerpc/kernel/head_8xx.S
index 52ff8c5..f9e2363 100644
--- a/arch/powerpc/kernel/head_8xx.S
+++ b/arch/powerpc/kernel/head_8xx.S
@@ -39,6 +39,15 @@
#else
#define DO_8xx_CPU6(val, reg)
#endif
+
+/* DAR needs to be tagged with a known value so that the
+ * DataTLB Miss/Error and do_page_fault() can recognize a
+ * buggy dcbx instruction and workaround the problem.
+ * dcbf, dcbi, dcbst, dcbz instructions do not update DAR
+ * when trapping into a Data TLB Miss/Error. See
+ * DataStoreTLBMiss and DataTLBError for details
+ */
+
__HEAD
_ENTRY(_stext);
_ENTRY(_start);
@@ -352,7 +361,7 @@ InstructionTLBMiss:
* set. All other Linux PTE bits control the behavior
* of the MMU.
*/
-2: li r11, 0x00f0
+ li r11, 0x00f0
rlwimi r10, r11, 0, 24, 28 /* Set 24-27, clear 28 */
DO_8xx_CPU6(0x2d80, r3)
mtspr SPRN_MI_RPN, r10 /* Update TLB entry */
@@ -365,6 +374,19 @@ InstructionTLBMiss:
lwz r3, 8(r0)
#endif
rfi
+2:
+ mfspr r10, SPRN_SRR1
+ rlwinm r10,r10,0,5,3 /* clear bit 4(0x08000000) */
+ mtspr SPRN_SRR1, r10
+
+ mfspr r10, SPRN_M_TW /* Restore registers */
+ lwz r11, 0(r0)
+ mtcr r11
+ lwz r11, 4(r0)
+#ifdef CONFIG_8xx_CPU6
+ lwz r3, 8(r0)
+#endif
+ b InstructionAccess
. = 0x1200
DataStoreTLBMiss:
@@ -428,10 +450,11 @@ DataStoreTLBMiss:
* set. All other Linux PTE bits control the behavior
* of the MMU.
*/
-2: li r11, 0x00f0
+ li r11, 0x00f0
rlwimi r10, r11, 0, 24, 28 /* Set 24-27, clear 28 */
DO_8xx_CPU6(0x3d80, r3)
mtspr SPRN_MD_RPN, r10 /* Update TLB entry */
+ mtspr SPRN_DAR, r11 /* Tag DAR */
mfspr r10, SPRN_M_TW /* Restore registers */
lwz r11, 0(r0)
@@ -441,7 +464,15 @@ DataStoreTLBMiss:
lwz r3, 8(r0)
#endif
rfi
-
+2:
+ mfspr r10, SPRN_M_TW /* Restore registers */
+ lwz r11, 0(r0)
+ mtcr r11
+ lwz r11, 4(r0)
+#ifdef CONFIG_8xx_CPU6
+ lwz r3, 8(r0)
+#endif
+ b DataAccess
/* This is an instruction TLB error on the MPC8xx. This could be due
* to many reasons, such as executing guarded memory or illegal instruction
* addresses. There is nothing to do but handle a big time error fault.
@@ -492,6 +523,8 @@ DataTLBError:
* assuming we only use the dcbi instruction on kernel addresses.
*/
mfspr r10, SPRN_DAR
+ cmpwi cr0, r10, 0x00f0 /* check if DAR holds a tag */
+ beq- 2f
rlwinm r11, r10, 0, 0, 19
ori r11, r11, MD_EVALID
mfspr r10, SPRN_M_CASID
@@ -550,6 +583,7 @@ DataTLBError:
rlwimi r10, r11, 0, 24, 28 /* Set 24-27, clear 28 */
DO_8xx_CPU6(0x3d80, r3)
mtspr SPRN_MD_RPN, r10 /* Update TLB entry */
+ mtspr SPRN_DAR, r11 /* Tag DAR */
mfspr r10, SPRN_M_TW /* Restore registers */
lwz r11, 0(r0)
diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
index 7699394..789b16b 100644
--- a/arch/powerpc/mm/fault.c
+++ b/arch/powerpc/mm/fault.c
@@ -125,6 +125,61 @@ int __kprobes do_page_fault(struct pt_regs *regs, unsigned long address,
int trap = TRAP(regs);
int is_exec = trap == 0x400;
+#if 1 /* defined(CONFIG_8xx)*/
+/*
+ Work around DTLB Miss/Error, as these do not update
+ DAR for dcbf, dcbi, dcbst, dcbz instructions
+ This relies on every exception tagging DAR with 0xf0
+ before returning (rfi)
+ DAR is passed as 'address' to this function.
+ */
+ {
+ unsigned long ra, rb, dar, insn;
+ char *istr = NULL;
+
+ insn = *((unsigned long *)regs->nip);
+ if (((insn >> (31-5)) & 0x3f) == 31) {
+ if (((insn >> 1) & 0x3ff) == 1014) /* dcbz ? 0x3f6 */
+ istr = "dcbz";
+ if (((insn >> 1) & 0x3ff) == 86) /* dcbf ? 0x56 */
+ istr = "dcbf";
+ if (((insn >> 1) & 0x3ff) == 470) /* dcbi ? 0x1d6 */
+ istr = "dcbi";
+ if (((insn >> 1) & 0x3ff) == 54) /* dcbst ? 0x36 */
+ istr = "dcbst";
+ }
+ if (istr) {
+ ra = (insn >> (31-15)) & 0x1f; /* Reg RA */
+ rb = (insn >> (31-20)) & 0x1f; /* Reg RB */
+ printk(KERN_CRIT "%s r%ld,r%ld found at NIP:%lx\n",
+ istr, regs->nip, ra, rb);
+ printk(KERN_CRIT "DAR is:%lx (should be 0xf0)\n", regs->dar);
+
+ }
+ if (regs->dar == 0x00f0 && !istr)
+ printk(KERN_CRIT "DAR is 0x00f0 but insn at NIP is: %lx is"
+ "not a cache insn:%lx!\n", regs->nip, insn);
+ if (trap == 0x300 && address != regs->dar)
+ printk(KERN_CRIT "DAR:%lx != address:%lx!\n", address, regs->dar);
+
+ if (istr || (trap == 0x300 && address == 0x00f0)) {
+ insn = *((unsigned long *)regs->nip);
+ /* Check if it is a
+ * dcbf, dcbi, dcbst, dcbz insn ?
+ if ((insn >> (31-5)) != 31)
+ break; // Not a dcbx instruction
+ */
+
+ ra = (insn >> (31-15)) & 0x1f; /* Reg RA */
+ rb = (insn >> (31-20)) & 0x1f; /* Reg RB */
+ dar = regs->gpr[rb];
+ if (ra)
+ dar += regs->gpr[ra];
+ regs->dar = dar;
+ address = dar;
+ }
+ }
+#endif
#if !(defined(CONFIG_4xx) || defined(CONFIG_BOOKE))
/*
* Fortunately the bit assignments in SRR1 for an instruction
^ permalink raw reply related [flat|nested] 58+ messages in thread
* Re: [PATCH] powerpc/8xx: fix regression introduced by cache coherency rewrite
2009-09-30 8:19 ` Joakim Tjernlund
@ 2009-09-30 9:00 ` Rex Feany
2009-09-30 9:58 ` Joakim Tjernlund
2009-09-30 11:18 ` Joakim Tjernlund
0 siblings, 2 replies; 58+ messages in thread
From: Rex Feany @ 2009-09-30 9:00 UTC (permalink / raw)
To: Joakim Tjernlund; +Cc: linuxppc-dev
Thus spake Joakim Tjernlund (joakim.tjernlund@transmode.se):
> > Ok, I have made some minor tweaks and added debug code in
> > do_page_fault(). Would be great if you could try on both
> > .31 and top of tree.
> >
> > Jocke
>
> OOPS, found a bug. Use this one instead:
.31 - no change, it worked before your patch and it works after.
None of your debugging printks show up. I tried removing the tlbil_va()
from do_dcache_icache_coherency() and userspace goes back to be being
slow/non functional.
top of tree - userspace is slow and unstable, random segfaults,
and none of your debugging printks show up :(
take care!
/rex
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH] powerpc/8xx: fix regression introduced by cache coherency rewrite
2009-09-30 9:00 ` Rex Feany
@ 2009-09-30 9:58 ` Joakim Tjernlund
2009-09-30 11:18 ` Joakim Tjernlund
1 sibling, 0 replies; 58+ messages in thread
From: Joakim Tjernlund @ 2009-09-30 9:58 UTC (permalink / raw)
To: Rex Feany; +Cc: linuxppc-dev
Rex Feany <RFeany@mrv.com> wrote on 30/09/2009 11:00:02:
>
> Thus spake Joakim Tjernlund (joakim.tjernlund@transmode.se):
>
> > > Ok, I have made some minor tweaks and added debug code in
> > > do_page_fault(). Would be great if you could try on both
> > > .31 and top of tree.
> > >
> > > Jocke
> >
> > OOPS, found a bug. Use this one instead:
>
> .31 - no change, it worked before your patch and it works after.
> None of your debugging printks show up. I tried removing the tlbil_va()
> from do_dcache_icache_coherency() and userspace goes back to be being
> slow/non functional.
>
> top of tree - userspace is slow and unstable, random segfaults,
> and none of your debugging printks show up :(
OK, something strange is going on. I am starting to suspect that there
is some other problem here.
If my patch is any good you should be able to use dcbx insn in copy_tofrom_user().
You could try changing all CONFIG_8xx to CONFIG_8xx_deleted arch/powerpc/lib/copy_32.S
and see if that works and if you see any debug prints.
Jocke
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH] powerpc/8xx: fix regression introduced by cache coherency rewrite
2009-09-30 9:00 ` Rex Feany
2009-09-30 9:58 ` Joakim Tjernlund
@ 2009-09-30 11:18 ` Joakim Tjernlund
2009-09-30 17:23 ` Joakim Tjernlund
1 sibling, 1 reply; 58+ messages in thread
From: Joakim Tjernlund @ 2009-09-30 11:18 UTC (permalink / raw)
To: Rex Feany; +Cc: linuxppc-dev
Rex Feany <RFeany@mrv.com> wrote on 30/09/2009 11:00:02:
>
> Thus spake Joakim Tjernlund (joakim.tjernlund@transmode.se):
>
> > > Ok, I have made some minor tweaks and added debug code in
> > > do_page_fault(). Would be great if you could try on both
> > > .31 and top of tree.
> > >
> > > Jocke
> >
> > OOPS, found a bug. Use this one instead:
>
> .31 - no change, it worked before your patch and it works after.
> None of your debugging printks show up. I tried removing the tlbil_va()
> from do_dcache_icache_coherency() and userspace goes back to be being
> slow/non functional.
Had a look at linus tree and there is something I don't understand.
Your fix, e0908085fc2391c85b85fb814ae1df377c8e0dcb, fixes a problem
that was introduced by 8d30c14cab30d405a05f2aaceda1e9ad57800f36 but
8d30c14cab30d405a05f2aaceda1e9ad57800f36 was included in .31 and .31
works and top of tree does not, how can that be?
To me it seems more likely that some mm change introduced between .31 and
top of tree is the culprit.
My patch addresses the problem described in the comment:
/* On 8xx, cache control instructions (particularly
* "dcbst" from flush_dcache_icache) fault as write
* operation if there is an unpopulated TLB entry
* for the address in question. To workaround that,
* we invalidate the TLB here, thus avoiding dcbst
* misbehaviour.
*/
Now you are using this old fix to paper over some other bug or so I think.
Jocke
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH] powerpc/8xx: fix regression introduced by cache coherency rewrite
2009-09-30 11:18 ` Joakim Tjernlund
@ 2009-09-30 17:23 ` Joakim Tjernlund
2009-09-30 22:35 ` Benjamin Herrenschmidt
0 siblings, 1 reply; 58+ messages in thread
From: Joakim Tjernlund @ 2009-09-30 17:23 UTC (permalink / raw)
Cc: linuxppc-dev, Rex Feany
>
> Rex Feany <RFeany@mrv.com> wrote on 30/09/2009 11:00:02:
> >
> > Thus spake Joakim Tjernlund (joakim.tjernlund@transmode.se):
> >
> > > > Ok, I have made some minor tweaks and added debug code in
> > > > do_page_fault(). Would be great if you could try on both
> > > > .31 and top of tree.
> > > >
> > > > Jocke
> > >
> > > OOPS, found a bug. Use this one instead:
> >
> > .31 - no change, it worked before your patch and it works after.
> > None of your debugging printks show up. I tried removing the tlbil_va()
> > from do_dcache_icache_coherency() and userspace goes back to be being
> > slow/non functional.
>
> Had a look at linus tree and there is something I don't understand.
> Your fix, e0908085fc2391c85b85fb814ae1df377c8e0dcb, fixes a problem
> that was introduced by 8d30c14cab30d405a05f2aaceda1e9ad57800f36 but
> 8d30c14cab30d405a05f2aaceda1e9ad57800f36 was included in .31 and .31
> works and top of tree does not, how can that be?
>
> To me it seems more likely that some mm change introduced between .31 and
> top of tree is the culprit.
> My patch addresses the problem described in the comment:
> /* On 8xx, cache control instructions (particularly
> * "dcbst" from flush_dcache_icache) fault as write
> * operation if there is an unpopulated TLB entry
> * for the address in question. To workaround that,
> * we invalidate the TLB here, thus avoiding dcbst
> * misbehaviour.
> */
> Now you are using this old fix to paper over some other bug or so I think.
There is something fishy with the TLB status, looking into the mpc862 manual I
don't see how it can work reliably. Need to dwell some more.
Anyhow, I have incorporated some of my findings into a new patch,
perhaps this will be the golden one?
diff --git a/arch/powerpc/kernel/entry_32.S b/arch/powerpc/kernel/entry_32.S
index 4dd38f1..b3f6687 100644
--- a/arch/powerpc/kernel/entry_32.S
+++ b/arch/powerpc/kernel/entry_32.S
@@ -709,6 +709,14 @@ ret_from_except:
SYNC /* Some chip revs have problems here... */
MTMSRD(r10) /* disable interrupts */
+#ifdef CONFIG_8xx
+ /* Tag DAR with a well know value.
+ * This needs to match head_8xx.S and
+ * do_page_fault()
+ */
+ li r3, 0x00f0
+ mtspr SPRN_DAR, r3
+#endif
lwz r3,_MSR(r1) /* Returning to user mode? */
andi. r0,r3,MSR_PR
beq resume_kernel
diff --git a/arch/powerpc/kernel/head_8xx.S b/arch/powerpc/kernel/head_8xx.S
index 52ff8c5..de1fd58 100644
--- a/arch/powerpc/kernel/head_8xx.S
+++ b/arch/powerpc/kernel/head_8xx.S
@@ -39,6 +39,15 @@
#else
#define DO_8xx_CPU6(val, reg)
#endif
+
+/* DAR needs to be tagged with a known value so that the
+ * DataTLB Miss/Error and do_page_fault() can recognize a
+ * buggy dcbx instruction and workaround the problem.
+ * dcbf, dcbi, dcbst, dcbz instructions do not update DAR
+ * when trapping into a Data TLB Miss/Error. See
+ * DataStoreTLBMiss and DataTLBError for details
+ */
+
__HEAD
_ENTRY(_stext);
_ENTRY(_start);
@@ -222,6 +231,7 @@ DataAccess:
stw r10,_DSISR(r11)
mr r5,r10
mfspr r4,SPRN_DAR
+ stw r4,_DAR(r11)
EXC_XFER_EE_LITE(0x300, handle_page_fault)
/* Instruction access exception.
@@ -352,7 +362,7 @@ InstructionTLBMiss:
* set. All other Linux PTE bits control the behavior
* of the MMU.
*/
-2: li r11, 0x00f0
+ li r11, 0x00f0
rlwimi r10, r11, 0, 24, 28 /* Set 24-27, clear 28 */
DO_8xx_CPU6(0x2d80, r3)
mtspr SPRN_MI_RPN, r10 /* Update TLB entry */
@@ -365,6 +375,19 @@ InstructionTLBMiss:
lwz r3, 8(r0)
#endif
rfi
+2:
+ mfspr r10, SPRN_SRR1
+ rlwinm r10,r10,0,5,3 /* clear bit 4(0x08000000) */
+ mtspr SPRN_SRR1, r10
+
+ mfspr r10, SPRN_M_TW /* Restore registers */
+ lwz r11, 0(r0)
+ mtcr r11
+ lwz r11, 4(r0)
+#ifdef CONFIG_8xx_CPU6
+ lwz r3, 8(r0)
+#endif
+ b InstructionAccess
. = 0x1200
DataStoreTLBMiss:
@@ -428,10 +451,11 @@ DataStoreTLBMiss:
* set. All other Linux PTE bits control the behavior
* of the MMU.
*/
-2: li r11, 0x00f0
+ li r11, 0x00f0
rlwimi r10, r11, 0, 24, 28 /* Set 24-27, clear 28 */
DO_8xx_CPU6(0x3d80, r3)
mtspr SPRN_MD_RPN, r10 /* Update TLB entry */
+ mtspr SPRN_DAR, r11 /* Tag DAR */
mfspr r10, SPRN_M_TW /* Restore registers */
lwz r11, 0(r0)
@@ -441,7 +465,18 @@ DataStoreTLBMiss:
lwz r3, 8(r0)
#endif
rfi
+2:
+ li r10, 0
+ mtspr SPRN_DSISR, r10 /* Clear all bits */
+ mfspr r10, SPRN_M_TW /* Restore registers */
+ lwz r11, 0(r0)
+ mtcr r11
+ lwz r11, 4(r0)
+#ifdef CONFIG_8xx_CPU6
+ lwz r3, 8(r0)
+#endif
+ b DataAccess
/* This is an instruction TLB error on the MPC8xx. This could be due
* to many reasons, such as executing guarded memory or illegal instruction
* addresses. There is nothing to do but handle a big time error fault.
@@ -492,6 +527,8 @@ DataTLBError:
* assuming we only use the dcbi instruction on kernel addresses.
*/
mfspr r10, SPRN_DAR
+ cmpwi cr0, r10, 0x00f0 /* check if DAR holds a tag */
+ beq- 2f
rlwinm r11, r10, 0, 0, 19
ori r11, r11, MD_EVALID
mfspr r10, SPRN_M_CASID
@@ -550,6 +587,7 @@ DataTLBError:
rlwimi r10, r11, 0, 24, 28 /* Set 24-27, clear 28 */
DO_8xx_CPU6(0x3d80, r3)
mtspr SPRN_MD_RPN, r10 /* Update TLB entry */
+ mtspr SPRN_DAR, r11 /* Tag DAR */
mfspr r10, SPRN_M_TW /* Restore registers */
lwz r11, 0(r0)
^ permalink raw reply related [flat|nested] 58+ messages in thread
* Re: [PATCH] powerpc/8xx: fix regression introduced by cache coherency rewrite
2009-09-30 17:23 ` Joakim Tjernlund
@ 2009-09-30 22:35 ` Benjamin Herrenschmidt
2009-10-01 7:05 ` Joakim Tjernlund
` (2 more replies)
0 siblings, 3 replies; 58+ messages in thread
From: Benjamin Herrenschmidt @ 2009-09-30 22:35 UTC (permalink / raw)
To: Joakim Tjernlund; +Cc: linuxppc-dev, Rex Feany
> > Had a look at linus tree and there is something I don't understand.
> > Your fix, e0908085fc2391c85b85fb814ae1df377c8e0dcb, fixes a problem
> > that was introduced by 8d30c14cab30d405a05f2aaceda1e9ad57800f36 but
> > 8d30c14cab30d405a05f2aaceda1e9ad57800f36 was included in .31 and .31
> > works and top of tree does not, how can that be?
> >
> > To me it seems more likely that some mm change introduced between .31 and
> > top of tree is the culprit.
> > My patch addresses the problem described in the comment:
> > /* On 8xx, cache control instructions (particularly
> > * "dcbst" from flush_dcache_icache) fault as write
> > * operation if there is an unpopulated TLB entry
> > * for the address in question. To workaround that,
> > * we invalidate the TLB here, thus avoiding dcbst
> > * misbehaviour.
> > */
> > Now you are using this old fix to paper over some other bug or so I think.
>
> There is something fishy with the TLB status, looking into the mpc862 manual I
> don't see how it can work reliably. Need to dwell some more.
> Anyhow, I have incorporated some of my findings into a new patch,
> perhaps this will be the golden one?
Well, I still wonder about what whole unpopulated entry business.
>From what I can see, the TLB miss code will check _PAGE_PRESENT, and
when not set, it will -still- insert something into the TLB (unlike
all other CPU types that go straight to data access faults from there).
So we end up populating with those unpopulated entries that will then
cause us to take a DSI (or ISI) instead of a TLB miss the next time
around ... and so we need to remove them once we do that, no ? IE. Once
we have actually put a valid PTE in.
At least that's my understanding and why I believe we should always have
tlbil_va() in set_pte() and ptep_set_access_flags(), which boils down
in putting it into the 2 "filter" functions in the new code.
Well.. actually, the ptep_set_access_flags() will already do a
flush_tlb_page_nohash() when the PTE is changed. So I suppose all we
really need here would be in set_pte_filter(), to do the same if we are
writing a PTE that has _PAGE_PRESENT, at least on 8xx.
But just unconditionally doing a tlbil_va() in both the filter functions
shouldn't harm and also fix the problem, though Rex seems to indicate
that is not the case.
Now, we -might- have something else broken on 8xx, hard to tell. You may
want to try to bisect, adding back the removed tlbil_va() as you go
backward, between .31 and upstream...
Cheers,
Ben.
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH] powerpc/8xx: fix regression introduced by cache coherency rewrite
2009-09-30 22:35 ` Benjamin Herrenschmidt
@ 2009-10-01 7:05 ` Joakim Tjernlund
2009-10-02 13:06 ` Joakim Tjernlund
2009-10-02 21:49 ` Scott Wood
2 siblings, 0 replies; 58+ messages in thread
From: Joakim Tjernlund @ 2009-10-01 7:05 UTC (permalink / raw)
To: Benjamin Herrenschmidt; +Cc: linuxppc-dev, Rex Feany
Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote on 01/10/2009 00:35:59:
>
>
> > > Had a look at linus tree and there is something I don't understand.
> > > Your fix, e0908085fc2391c85b85fb814ae1df377c8e0dcb, fixes a problem
> > > that was introduced by 8d30c14cab30d405a05f2aaceda1e9ad57800f36 but
> > > 8d30c14cab30d405a05f2aaceda1e9ad57800f36 was included in .31 and .31
> > > works and top of tree does not, how can that be?
> > >
> > > To me it seems more likely that some mm change introduced between .31 and
> > > top of tree is the culprit.
> > > My patch addresses the problem described in the comment:
> > > /* On 8xx, cache control instructions (particularly
> > > * "dcbst" from flush_dcache_icache) fault as write
> > > * operation if there is an unpopulated TLB entry
> > > * for the address in question. To workaround that,
> > > * we invalidate the TLB here, thus avoiding dcbst
> > > * misbehaviour.
> > > */
> > > Now you are using this old fix to paper over some other bug or so I think.
> >
> > There is something fishy with the TLB status, looking into the mpc862 manual I
> > don't see how it can work reliably. Need to dwell some more.
> > Anyhow, I have incorporated some of my findings into a new patch,
> > perhaps this will be the golden one?
>
> Well, I still wonder about what whole unpopulated entry business.
Yes, is a odd compared to the other ppc arch. I know why it is there
and I also know that there is alternative way to work around the problem
it is supposed to fix. I went back to the old way in my patch but it didn't
help. although I don't see why there is such a benefit to branch to DataAccess
directly, is it so expensive to to a DTLB error and then go to DataAccess?
>
> >From what I can see, the TLB miss code will check _PAGE_PRESENT, and
No, it will branch to before that happens.
> when not set, it will -still- insert something into the TLB (unlike
> all other CPU types that go straight to data access faults from there).
Yes, a non valid pte so that you will trap to DTLB Error.
The other arch also muck around to get load/store bit etc. and
accroding to the 8xx user manual there are no such info available, not even DAR:
Table 6-14. Register Settings after a Data TLB Miss Exception
Register Setting
SRR0 Set to the EA of the instruction that caused the exception.
SRR1 1–4 0
10–15 0
Others Loaded from MSR[16-31]. SRR1[30] is cleared only by loading a zero from MSR[RI].
MSR IP No change
ME No change
LE Copied from the ILE setting of the interrupted process
Other 0
However from the past we know that DAR is avaliable (but not for dcbX insn though)
For a TLB Error there are:
Table 6-16. Register Settings after a Data TLB Error Exception
Register Setting
SRR0 Set to the EA of the instruction that caused the exception.
SRR1 1–4 0
10–15 0
Others—Loaded from MSR[16-31]. SRR1[30] is cleared only by loading a zero from MSR[RI].
MSR IP No change
ME No change
LE Copied from the ILE setting of the interrupted process
Other 0
DSISR 0 0
1 Set if the translation of an attempted access is not found in the translation tables. Otherwise, cleared
2–3 0
4 Set if the memory access is not permitted by the protection mechanism; otherwise cleared
5 0
6 1 for a store operation; 0 for a load operation.
7–31 0
DAR Set to the EA of the data access that caused the exception.
For completeness here are ITLB Miss/Error too:
Table 6-13. Register Settings after an Instruction TLB Miss Exception
Register Setting
SRR0 Set to the EA of the instruction that caused the exception.
SRR1 0–3 0
4 1
10 1
11–15 0
Others Loaded from MSR[16-31]. SRR1[30] is cleared only by loading a zero from MSR[RI].
MSR IP No change
ME No change
LE Copied from the ILE setting of the interrupted process
Other 0
Table 6-15. Register Settings after an Instruction TLB Error Exception
Register Setting
SRR0 Set to the EA of the instruction that caused the exception.
SRR1 Note: Only one of bits 1, 3, and 4 will be set.
1 1 if the translation of an attempted access is not in the translation tables. Otherwise 0
2 0
3 1 if the fetch access was to guarded memory when MSR[IR] = 1. Otherwise 0
4 1 if the access is not permitted by the protection mechanism; otherwise 0.
11–15 0
Others Loaded from MSR[16-31]. SRR1[30] is cleared only by loading a zero from MSR[RI].
MSR IP No change
ME No change
LE Copied from the ILE setting of the interrupted process
Other 0
>
> So we end up populating with those unpopulated entries that will then
> cause us to take a DSI (or ISI) instead of a TLB miss the next time
> around ... and so we need to remove them once we do that, no ? IE. Once
> we have actually put a valid PTE in.
>
> At least that's my understanding and why I believe we should always have
> tlbil_va() in set_pte() and ptep_set_access_flags(), which boils down
> in putting it into the 2 "filter" functions in the new code.
>
> Well.. actually, the ptep_set_access_flags() will already do a
> flush_tlb_page_nohash() when the PTE is changed. So I suppose all we
> really need here would be in set_pte_filter(), to do the same if we are
> writing a PTE that has _PAGE_PRESENT, at least on 8xx.
I am pondering another idea. I suspect that this bit is set for non-valid ptes:
1 Set if the translation of an attempted access is not found in the translation tables. Otherwise, cleared
So perhaps we can test this bit in do_page_fault() and then do the
tlbil_va() directly?
>
> But just unconditionally doing a tlbil_va() in both the filter functions
> shouldn't harm and also fix the problem, though Rex seems to indicate
> that is not the case.
>
> Now, we -might- have something else broken on 8xx, hard to tell. You may
> want to try to bisect, adding back the removed tlbil_va() as you go
> backward, between .31 and upstream...
>
> Cheers,
> Ben.
>
>
>
>
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH] powerpc/8xx: fix regression introduced by cache coherency rewrite
2009-09-30 22:35 ` Benjamin Herrenschmidt
2009-10-01 7:05 ` Joakim Tjernlund
@ 2009-10-02 13:06 ` Joakim Tjernlund
2009-10-02 18:10 ` Joakim Tjernlund
2009-10-02 21:49 ` Scott Wood
2 siblings, 1 reply; 58+ messages in thread
From: Joakim Tjernlund @ 2009-10-02 13:06 UTC (permalink / raw)
To: Benjamin Herrenschmidt; +Cc: linuxppc-dev, Rex Feany
Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote on 01/10/2009 00:35:59:
>
>
> > > Had a look at linus tree and there is something I don't understand.
> > > Your fix, e0908085fc2391c85b85fb814ae1df377c8e0dcb, fixes a problem
> > > that was introduced by 8d30c14cab30d405a05f2aaceda1e9ad57800f36 but
> > > 8d30c14cab30d405a05f2aaceda1e9ad57800f36 was included in .31 and .31
> > > works and top of tree does not, how can that be?
> > >
> > > To me it seems more likely that some mm change introduced between .31 and
> > > top of tree is the culprit.
> > > My patch addresses the problem described in the comment:
> > > /* On 8xx, cache control instructions (particularly
> > > * "dcbst" from flush_dcache_icache) fault as write
> > > * operation if there is an unpopulated TLB entry
> > > * for the address in question. To workaround that,
> > > * we invalidate the TLB here, thus avoiding dcbst
> > > * misbehaviour.
> > > */
> > > Now you are using this old fix to paper over some other bug or so I think.
> >
> > There is something fishy with the TLB status, looking into the mpc862 manual I
> > don't see how it can work reliably. Need to dwell some more.
> > Anyhow, I have incorporated some of my findings into a new patch,
> > perhaps this will be the golden one?
>
> Well, I still wonder about what whole unpopulated entry business.
>
> >From what I can see, the TLB miss code will check _PAGE_PRESENT, and
> when not set, it will -still- insert something into the TLB (unlike
> all other CPU types that go straight to data access faults from there).
>
> So we end up populating with those unpopulated entries that will then
> cause us to take a DSI (or ISI) instead of a TLB miss the next time
> around ... and so we need to remove them once we do that, no ? IE. Once
> we have actually put a valid PTE in.
>
> At least that's my understanding and why I believe we should always have
> tlbil_va() in set_pte() and ptep_set_access_flags(), which boils down
> in putting it into the 2 "filter" functions in the new code.
>
> Well.. actually, the ptep_set_access_flags() will already do a
> flush_tlb_page_nohash() when the PTE is changed. So I suppose all we
> really need here would be in set_pte_filter(), to do the same if we are
> writing a PTE that has _PAGE_PRESENT, at least on 8xx.
>
> But just unconditionally doing a tlbil_va() in both the filter functions
> shouldn't harm and also fix the problem, though Rex seems to indicate
> that is not the case.
>
> Now, we -might- have something else broken on 8xx, hard to tell. You may
> want to try to bisect, adding back the removed tlbil_va() as you go
> backward, between .31 and upstream...
Found something odd, perhaps Rex can test?
I tested this on my old 2.4 and it worked well.
Jocke
>From dd0f213d51437bb48ff42c1c0e690f243266c133 Mon Sep 17 00:00:00 2001
From: Joakim Tjernlund <Joakim.Tjernlund@transmode.se>
Date: Fri, 2 Oct 2009 14:59:21 +0200
Subject: [PATCH] powerpc, 8xx: DTLB Error must check for more errors.
DataTLBError do only check for store op's. I think this
is too narrow. Protection and and no translation errors
needs to be checked too.
---
arch/powerpc/kernel/head_8xx.S | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/arch/powerpc/kernel/head_8xx.S b/arch/powerpc/kernel/head_8xx.S
index 52ff8c5..ef8a5ce 100644
--- a/arch/powerpc/kernel/head_8xx.S
+++ b/arch/powerpc/kernel/head_8xx.S
@@ -472,7 +472,7 @@ DataTLBError:
/* First, make sure this was a store operation.
*/
mfspr r10, SPRN_DSISR
- andis. r11, r10, 0x0200 /* If set, indicates store op */
+ andis. r11, r10, 0x4a00 /* If set, store, protection or no trans. */
beq 2f
/* The EA of a data TLB miss is automatically stored in the MD_EPN
--
1.6.4.4
^ permalink raw reply related [flat|nested] 58+ messages in thread
* Re: [PATCH] powerpc/8xx: fix regression introduced by cache coherency rewrite
2009-10-02 13:06 ` Joakim Tjernlund
@ 2009-10-02 18:10 ` Joakim Tjernlund
0 siblings, 0 replies; 58+ messages in thread
From: Joakim Tjernlund @ 2009-10-02 18:10 UTC (permalink / raw)
Cc: linuxppc-dev
>
> Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote on 01/10/2009 00:35:59:
> >
> >
> > > > Had a look at linus tree and there is something I don't understand.
> > > > Your fix, e0908085fc2391c85b85fb814ae1df377c8e0dcb, fixes a problem
> > > > that was introduced by 8d30c14cab30d405a05f2aaceda1e9ad57800f36 but
> > > > 8d30c14cab30d405a05f2aaceda1e9ad57800f36 was included in .31 and .31
> > > > works and top of tree does not, how can that be?
> > > >
> > > > To me it seems more likely that some mm change introduced between .31 and
> > > > top of tree is the culprit.
> > > > My patch addresses the problem described in the comment:
> > > > /* On 8xx, cache control instructions (particularly
> > > > * "dcbst" from flush_dcache_icache) fault as write
> > > > * operation if there is an unpopulated TLB entry
> > > > * for the address in question. To workaround that,
> > > > * we invalidate the TLB here, thus avoiding dcbst
> > > > * misbehaviour.
> > > > */
> > > > Now you are using this old fix to paper over some other bug or so I think.
> > >
> > > There is something fishy with the TLB status, looking into the mpc862 manual I
> > > don't see how it can work reliably. Need to dwell some more.
> > > Anyhow, I have incorporated some of my findings into a new patch,
> > > perhaps this will be the golden one?
> >
> > Well, I still wonder about what whole unpopulated entry business.
> >
> > >From what I can see, the TLB miss code will check _PAGE_PRESENT, and
> > when not set, it will -still- insert something into the TLB (unlike
> > all other CPU types that go straight to data access faults from there).
> >
> > So we end up populating with those unpopulated entries that will then
> > cause us to take a DSI (or ISI) instead of a TLB miss the next time
> > around ... and so we need to remove them once we do that, no ? IE. Once
> > we have actually put a valid PTE in.
> >
> > At least that's my understanding and why I believe we should always have
> > tlbil_va() in set_pte() and ptep_set_access_flags(), which boils down
> > in putting it into the 2 "filter" functions in the new code.
> >
> > Well.. actually, the ptep_set_access_flags() will already do a
> > flush_tlb_page_nohash() when the PTE is changed. So I suppose all we
> > really need here would be in set_pte_filter(), to do the same if we are
> > writing a PTE that has _PAGE_PRESENT, at least on 8xx.
> >
> > But just unconditionally doing a tlbil_va() in both the filter functions
> > shouldn't harm and also fix the problem, though Rex seems to indicate
> > that is not the case.
> >
> > Now, we -might- have something else broken on 8xx, hard to tell. You may
> > want to try to bisect, adding back the removed tlbil_va() as you go
> > backward, between .31 and upstream...
>
> Found something odd, perhaps Rex can test?
> I tested this on my old 2.4 and it worked well.
That was not quite correct, sorry.
Try this instead:
>From c5f1a70561730b8a443f7081fbd9c5b023147043 Mon Sep 17 00:00:00 2001
From: Joakim Tjernlund <Joakim.Tjernlund@transmode.se>
Date: Fri, 2 Oct 2009 14:59:21 +0200
Subject: [PATCH] powerpc, 8xx: DTLB Error must check for more errors.
DataTLBError currently does:
if ((err & 0x02000000) == 0)
DSI();
This won't handle a store with no valid translation.
Change this to
if ((err & 0x48000000) != 0)
DSI();
that is, branch to DSI if either !permission or
!translation.
---
arch/powerpc/kernel/head_8xx.S | 4 ++--
1 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/arch/powerpc/kernel/head_8xx.S b/arch/powerpc/kernel/head_8xx.S
index 52ff8c5..118bb05 100644
--- a/arch/powerpc/kernel/head_8xx.S
+++ b/arch/powerpc/kernel/head_8xx.S
@@ -472,8 +472,8 @@ DataTLBError:
/* First, make sure this was a store operation.
*/
mfspr r10, SPRN_DSISR
- andis. r11, r10, 0x0200 /* If set, indicates store op */
- beq 2f
+ andis. r11, r10, 0x4800 /* no translation, no permission. */
+ bne 2f /* branch if either is set */
/* The EA of a data TLB miss is automatically stored in the MD_EPN
* register. The EA of a data TLB error is automatically stored in
--
1.6.4.4
^ permalink raw reply related [flat|nested] 58+ messages in thread
* Re: [PATCH] powerpc/8xx: fix regression introduced by cache coherency rewrite
2009-09-30 22:35 ` Benjamin Herrenschmidt
2009-10-01 7:05 ` Joakim Tjernlund
2009-10-02 13:06 ` Joakim Tjernlund
@ 2009-10-02 21:49 ` Scott Wood
2009-10-02 22:04 ` Benjamin Herrenschmidt
` (2 more replies)
2 siblings, 3 replies; 58+ messages in thread
From: Scott Wood @ 2009-10-02 21:49 UTC (permalink / raw)
To: Benjamin Herrenschmidt; +Cc: linuxppc-dev, Rex Feany
On Thu, Oct 01, 2009 at 08:35:59AM +1000, Benjamin Herrenschmidt wrote:
> >From what I can see, the TLB miss code will check _PAGE_PRESENT, and
> when not set, it will -still- insert something into the TLB (unlike
> all other CPU types that go straight to data access faults from there).
>
> So we end up populating with those unpopulated entries that will then
> cause us to take a DSI (or ISI) instead of a TLB miss the next time
> around ... and so we need to remove them once we do that, no ? IE. Once
> we have actually put a valid PTE in.
>
> At least that's my understanding and why I believe we should always have
> tlbil_va() in set_pte() and ptep_set_access_flags(), which boils down
> in putting it into the 2 "filter" functions in the new code.
>
> Well.. actually, the ptep_set_access_flags() will already do a
> flush_tlb_page_nohash() when the PTE is changed. So I suppose all we
> really need here would be in set_pte_filter(), to do the same if we are
> writing a PTE that has _PAGE_PRESENT, at least on 8xx.
>
> But just unconditionally doing a tlbil_va() in both the filter functions
> shouldn't harm and also fix the problem, though Rex seems to indicate
> that is not the case.
Adding a tlbil_va to do_page_fault makes the problem go away for me (on
top of your "merge" branch) -- none of the other changes in this thread
do (assuming I didn't miss any). FWIW, when it gets stuck on a fault,
DSISR is 0xc0000000, and handle_mm_fault returns zero.
-Scott
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH] powerpc/8xx: fix regression introduced by cache coherency rewrite
2009-10-02 21:49 ` Scott Wood
@ 2009-10-02 22:04 ` Benjamin Herrenschmidt
2009-10-05 19:28 ` Scott Wood
2009-10-03 8:05 ` Joakim Tjernlund
2009-10-04 20:10 ` Joakim Tjernlund
2 siblings, 1 reply; 58+ messages in thread
From: Benjamin Herrenschmidt @ 2009-10-02 22:04 UTC (permalink / raw)
To: Scott Wood; +Cc: linuxppc-dev, Rex Feany
On Fri, 2009-10-02 at 16:49 -0500, Scott Wood wrote:
> Adding a tlbil_va to do_page_fault makes the problem go away for me (on
> top of your "merge" branch) -- none of the other changes in this thread
> do (assuming I didn't miss any). FWIW, when it gets stuck on a fault,
> DSISR is 0xc0000000, and handle_mm_fault returns zero.
But in that case, it should hit ptep_set_access_flags() (via
handle_mm_fault) and from there call tlbil_va (provided we add a call to
it in the relevant filter function which I proposed initially), no ?
Ben.
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH] powerpc/8xx: fix regression introduced by cache coherency rewrite
2009-10-02 21:49 ` Scott Wood
2009-10-02 22:04 ` Benjamin Herrenschmidt
@ 2009-10-03 8:05 ` Joakim Tjernlund
2009-10-03 8:31 ` Benjamin Herrenschmidt
2009-10-05 18:24 ` Scott Wood
2009-10-04 20:10 ` Joakim Tjernlund
2 siblings, 2 replies; 58+ messages in thread
From: Joakim Tjernlund @ 2009-10-03 8:05 UTC (permalink / raw)
To: Scott Wood; +Cc: Rex Feany, linuxppc-dev
Scott Wood <scottwood@freescale.com> wrote on 02/10/2009 23:49:49:
>
> On Thu, Oct 01, 2009 at 08:35:59AM +1000, Benjamin Herrenschmidt wrote:
> > >From what I can see, the TLB miss code will check _PAGE_PRESENT, and
> > when not set, it will -still- insert something into the TLB (unlike
> > all other CPU types that go straight to data access faults from there).
> >
> > So we end up populating with those unpopulated entries that will then
> > cause us to take a DSI (or ISI) instead of a TLB miss the next time
> > around ... and so we need to remove them once we do that, no ? IE. Once
> > we have actually put a valid PTE in.
> >
> > At least that's my understanding and why I believe we should always have
> > tlbil_va() in set_pte() and ptep_set_access_flags(), which boils down
> > in putting it into the 2 "filter" functions in the new code.
> >
> > Well.. actually, the ptep_set_access_flags() will already do a
> > flush_tlb_page_nohash() when the PTE is changed. So I suppose all we
> > really need here would be in set_pte_filter(), to do the same if we are
> > writing a PTE that has _PAGE_PRESENT, at least on 8xx.
> >
> > But just unconditionally doing a tlbil_va() in both the filter functions
> > shouldn't harm and also fix the problem, though Rex seems to indicate
> > that is not the case.
>
> Adding a tlbil_va to do_page_fault makes the problem go away for me (on
> top of your "merge" branch) -- none of the other changes in this thread
> do (assuming I didn't miss any). FWIW, when it gets stuck on a fault,
> DSISR is 0xc0000000, and handle_mm_fault returns zero.
OK, that is a no translation error for a load (assuming trap is 0x400)
Do you know what insn this is? I am adding a patch last in this mail
for catching dcbX insn in do_page_fault()
You need the patch I posted yesterday too:
>From c5f1a70561730b8a443f7081fbd9c5b023147043 Mon Sep 17 00:00:00 2001
From: Joakim Tjernlund <Joakim.Tjernlund@transmode.se>
Date: Fri, 2 Oct 2009 14:59:21 +0200
Subject: [PATCH] powerpc, 8xx: DTLB Error must check for more errors.
DataTLBError currently does:
if ((err & 0x02000000) == 0)
DSI();
This won't handle a store with no valid translation.
Change this to
if ((err & 0x48000000) != 0)
DSI();
that is, branch to DSI if either !permission or
!translation.
---
arch/powerpc/kernel/head_8xx.S | 4 ++--
1 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/arch/powerpc/kernel/head_8xx.S b/arch/powerpc/kernel/head_8xx.S
index 52ff8c5..118bb05 100644
--- a/arch/powerpc/kernel/head_8xx.S
+++ b/arch/powerpc/kernel/head_8xx.S
@@ -472,8 +472,8 @@ DataTLBError:
/* First, make sure this was a store operation.
*/
mfspr r10, SPRN_DSISR
- andis. r11, r10, 0x0200 /* If set, indicates store op */
- beq 2f
+ andis. r11, r10, 0x4800 /* no translation, no permission. */
+ bne 2f /* branch if either is set */
/* The EA of a data TLB miss is automatically stored in the MD_EPN
* register. The EA of a data TLB error is automatically stored in
--
1.6.4.4
Cannot shake the feeling that it this snip of code that causes it
lwz r11, 0(r10) /* Get the level 1 entry */
rlwinm. r10, r11,0,0,19 /* Extract page descriptor page address */
beq 2f /* If zero, don't try to find a pte */
Perhaps we can do something better? I still feel that we need to
force a TLB Error as the TLBMiss does not set DSISR so we have no way of
knowing if it is an load or store.
Jocke
diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
index 7699394..c33c6de 100644
--- a/arch/powerpc/mm/fault.c
+++ b/arch/powerpc/mm/fault.c
@@ -139,6 +139,88 @@ int __kprobes do_page_fault(struct pt_regs *regs, unsigned long address,
#else
is_write = error_code & ESR_DST;
#endif /* CONFIG_4xx || CONFIG_BOOKE */
+#if 1 /* defined(CONFIG_8xx)*/
+#define DEBUG_DCBX
+/*
+ Work around DTLB Miss/Error, as these do not update
+ DAR for dcbf, dcbi, dcbst, dcbz and icbi instructions
+ This relies on every exception tagging DAR with 0xf0
+ before returning (rfi)
+ DAR is passed as 'address' to this function.
+ */
+ {
+ unsigned long ra, rb, dar, insn;
+#ifdef DEBUG_DCBX
+ const char *istr = NULL;
+
+ insn = *((unsigned long *)regs->nip);
+ if (((insn >> (31-5)) & 0x3f) == 31) {
+ if (((insn >> 1) & 0x3ff) == 1014) /* dcbz ? 0x3f6 */
+ istr = "dcbz";
+ if (((insn >> 1) & 0x3ff) == 86) /* dcbf ? 0x56 */
+ istr = "dcbf";
+ if (((insn >> 1) & 0x3ff) == 470) /* dcbi ? 0x1d6 */
+ istr = "dcbi";
+ if (((insn >> 1) & 0x3ff) == 54) /* dcbst ? 0x36 */
+ istr = "dcbst";
+ if (((insn >> 1) & 0x3ff) == 982) /* icbi ? 0x3d6 */
+ istr = "icbi";
+ if (istr) {
+ ra = (insn >> (31-15)) & 0x1f; /* Reg RA */
+ rb = (insn >> (31-20)) & 0x1f; /* Reg RB */
+ dar = regs->gpr[rb];
+ if (ra)
+ dar += regs->gpr[ra];
+ if (dar != address && address != 0x00f0 && trap == 0x300)
+ printk(KERN_CRIT "%s: address:%lx, dar:%lx!\n", istr, address, dar);
+ if (!strcmp(istr, "dcbst") && is_write) {
+ printk(KERN_CRIT "dcbst R%ld,R%ld = %lx as a store, fixing!\n",
+ ra, rb, dar);
+ is_write = 0;
+ }
+
+ if (trap == 0x300 && address != dar) {
+ __asm__ ("mtdar %0" : : "r" (dar));
+ return 0;
+ }
+ }
+ }
+#endif
+ if (address == 0x00f0 && trap == 0x300) {
+ pte_t *ptep;
+
+ /* This is from a dcbX or icbi insn gone bad, these
+ * insn do not set DAR so we have to do it here instead */
+ insn = *((unsigned long *)regs->nip);
+
+ ra = (insn >> (31-15)) & 0x1f; /* Reg RA */
+ rb = (insn >> (31-20)) & 0x1f; /* Reg RB */
+ dar = regs->gpr[rb];
+ if (ra)
+ dar += regs->gpr[ra];
+ /* Set DAR to correct address for the DTLB Miss/Error handler
+ * to redo the TLB exception. This time with correct address */
+ __asm__ ("mtdar %0" : : "r" (dar));
+#ifdef DEBUG_DCBX
+ printk(KERN_CRIT "trap:%x address:%lx, dar:%lx,err:%lx %s\n",
+ trap, address, dar, error_code, istr);
+#endif
+ address = dar;
+#if 1
+ if (is_write && get_pteptr(mm, dar, &ptep, NULL)) {
+ pte_t my_pte = *ptep;
+
+ if (pte_present(my_pte) && pte_write(my_pte)) {
+ pte_val(my_pte) |= _PAGE_DIRTY|_PAGE_ACCESSED|_PAGE_HWWRITE;
+ set_pte_at(mm, dar, ptep, my_pte);
+ }
+ }
+#else
+ return 0;
+#endif
+ }
+ }
+#endif
if (notify_page_fault(regs))
return 0;
^ permalink raw reply related [flat|nested] 58+ messages in thread
* Re: [PATCH] powerpc/8xx: fix regression introduced by cache coherency rewrite
2009-10-03 8:05 ` Joakim Tjernlund
@ 2009-10-03 8:31 ` Benjamin Herrenschmidt
2009-10-03 9:24 ` Joakim Tjernlund
2009-10-05 18:24 ` Scott Wood
1 sibling, 1 reply; 58+ messages in thread
From: Benjamin Herrenschmidt @ 2009-10-03 8:31 UTC (permalink / raw)
To: Joakim Tjernlund; +Cc: Scott Wood, linuxppc-dev, Rex Feany
On Sat, 2009-10-03 at 10:05 +0200, Joakim Tjernlund wrote:
> Cannot shake the feeling that it this snip of code that causes it
> lwz r11, 0(r10) /* Get the level 1 entry */
> rlwinm. r10, r11,0,0,19 /* Extract page descriptor page
> address */
> beq 2f /* If zero, don't try to find a pte */
> Perhaps we can do something better? I still feel that we need to
> force a TLB Error as the TLBMiss does not set DSISR so we have no way
> of
> knowing if it is an load or store.
Can't we manufacture a DSISR and branch to the right function ?
Ben.
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH] powerpc/8xx: fix regression introduced by cache coherency rewrite
2009-10-03 8:31 ` Benjamin Herrenschmidt
@ 2009-10-03 9:24 ` Joakim Tjernlund
2009-10-03 10:57 ` Benjamin Herrenschmidt
0 siblings, 1 reply; 58+ messages in thread
From: Joakim Tjernlund @ 2009-10-03 9:24 UTC (permalink / raw)
To: Benjamin Herrenschmidt; +Cc: Scott Wood, linuxppc-dev, Rex Feany
Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote on 03/10/2009 10:31:18:
>
> On Sat, 2009-10-03 at 10:05 +0200, Joakim Tjernlund wrote:
> > Cannot shake the feeling that it this snip of code that causes it
> > lwz r11, 0(r10) /* Get the level 1 entry */
> > rlwinm. r10, r11,0,0,19 /* Extract page descriptor page
> > address */
> > beq 2f /* If zero, don't try to find a pte */
> > Perhaps we can do something better? I still feel that we need to
> > force a TLB Error as the TLBMiss does not set DSISR so we have no way
> > of
> > knowing if it is an load or store.
>
> Can't we manufacture a DSISR and branch to the right function ?
Not if we want know if it is a load or store. There is no info to manufacture
a DSISR from. The best we can do here is try getting the RPN physical page
number correct. Perhaps something like this will do:
/* Copy 20 msb from MD_EPN to r20 to get the correct page
* number. Do not rely on DAR since the dcxx instructions fails
* to update DAR when they cause a DTLB Miss
*/
mfspr r21, MD_EPN
li r20, 0x0
rlwimi r20, r21, 0, 0, 19
Then go back and set the RPN accordingly.
The 8xx is different as as it will force a TLB error every time
it needs to deal with a page fault.
I suspect adding
if (!ret)
_tlbil_va(address);
in do_page_fault() will do the trick too.
So yes, there is a missing _tlbil_va() missing for 8xx somewhere
but there is something more too.
Maybe your new filter functions and my
powerpc, 8xx: DTLB Error must check for more errors.
will do the trick?
Jocke
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH] powerpc/8xx: fix regression introduced by cache coherency rewrite
2009-10-03 9:24 ` Joakim Tjernlund
@ 2009-10-03 10:57 ` Benjamin Herrenschmidt
2009-10-03 11:47 ` Joakim Tjernlund
2009-10-04 8:35 ` Joakim Tjernlund
0 siblings, 2 replies; 58+ messages in thread
From: Benjamin Herrenschmidt @ 2009-10-03 10:57 UTC (permalink / raw)
To: Joakim Tjernlund; +Cc: Scott Wood, linuxppc-dev, Rex Feany
On Sat, 2009-10-03 at 11:24 +0200, Joakim Tjernlund wrote:
>
> So yes, there is a missing _tlbil_va() missing for 8xx somewhere
> but there is something more too.
> Maybe your new filter functions and my
> powerpc, 8xx: DTLB Error must check for more errors.
> will do the trick?
Well, if we can't tell between a load and a store on a TLB miss, then
we should probably let it create an unpopulated entry in all cases,
so that we do take a proper DSI/ISI the second time around, which
would then tell us where we come from...
Ben.
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH] powerpc/8xx: fix regression introduced by cache coherency rewrite
2009-10-03 10:57 ` Benjamin Herrenschmidt
@ 2009-10-03 11:47 ` Joakim Tjernlund
2009-10-04 8:35 ` Joakim Tjernlund
1 sibling, 0 replies; 58+ messages in thread
From: Joakim Tjernlund @ 2009-10-03 11:47 UTC (permalink / raw)
To: Benjamin Herrenschmidt; +Cc: Scott Wood, linuxppc-dev, Rex Feany
Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote on 03/10/2009 12:57:28:
> On Sat, 2009-10-03 at 11:24 +0200, Joakim Tjernlund wrote:
> >
> > So yes, there is a missing _tlbil_va() missing for 8xx somewhere
> > but there is something more too.
> > Maybe your new filter functions and my
> > powerpc, 8xx: DTLB Error must check for more errors.
> > will do the trick?
>
> Well, if we can't tell between a load and a store on a TLB miss, then
> we should probably let it create an unpopulated entry in all cases,
> so that we do take a proper DSI/ISI the second time around, which
> would then tell us where we come from...
Not quite sure what you mean here?
Always branch to DSI at TLB Miss and create a unpopulated
entry? That does not feel right.
The current method is better. The only odd thing is
the null pmd entry and what to load into RPN. I am not convinced
that it is a problem but I would like to see a generic impl. of
a null pmd and use that instead. A 8xx impl. of that
looks like this(tested on 2.4, works fine)
lwz r11, 0(r10) /* Get the level 1 entry */
rlwinm. r10, r11,0,0,19 /* Extract page descriptor page address */
bne+ 4f /* If zero, use a null pmd */
lis r11, null_pmd_dir@h
ori r11, r11, null_pmd_dir@l
4: tophys(r11,r11)
...
.data
.globl null_pmd_dir
null_pmd_dir:
.space 4096
If the pmd_none() and friends are updated to test for a null_pmd_dir
we can loose the test above and save 4 insn :)
Anyhow did you post a patch(can't find one) about your suggested
filter functions for 8xx? I sure would like to see one :)
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH] powerpc/8xx: fix regression introduced by cache coherency rewrite
2009-10-03 10:57 ` Benjamin Herrenschmidt
2009-10-03 11:47 ` Joakim Tjernlund
@ 2009-10-04 8:35 ` Joakim Tjernlund
2009-10-04 20:26 ` Benjamin Herrenschmidt
1 sibling, 1 reply; 58+ messages in thread
From: Joakim Tjernlund @ 2009-10-04 8:35 UTC (permalink / raw)
To: Benjamin Herrenschmidt; +Cc: Scott Wood, linuxppc-dev, Rex Feany
Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote on 03/10/2009 12:57:28:
>
> On Sat, 2009-10-03 at 11:24 +0200, Joakim Tjernlund wrote:
> >
> > So yes, there is a missing _tlbil_va() missing for 8xx somewhere
> > but there is something more too.
> > Maybe your new filter functions and my
> > powerpc, 8xx: DTLB Error must check for more errors.
> > will do the trick?
>
> Well, if we can't tell between a load and a store on a TLB miss, then
> we should probably let it create an unpopulated entry in all cases,
> so that we do take a proper DSI/ISI the second time around, which
> would then tell us where we come from...
While looking closer on 8xx TLB handling I noticed we were taking
an extra TLB Error when making a page dirty. Tracked it down to:
static inline pte_t pte_mkdirty(pte_t pte) {
pte_val(pte) |= _PAGE_DIRTY; return pte; }
pte_mkdirty does not set HWWRITE thus forcing a new
TLB error to clear it. Adding HWWRITE to pte_mkdirty fixes the
problem.
Looking at (especially pte_mkclean):
static inline pte_t pte_wrprotect(pte_t pte) {
pte_val(pte) &= ~(_PAGE_RW | _PAGE_HWWRITE); return pte; }
static inline pte_t pte_mkclean(pte_t pte) {
pte_val(pte) &= ~(_PAGE_DIRTY | _PAGE_HWWRITE); return pte; }
it looks like a mistake not to include HWWRITE in pte_mkdirty(),
what do you think?
Jocke
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH] powerpc/8xx: fix regression introduced by cache coherency rewrite
2009-10-02 21:49 ` Scott Wood
2009-10-02 22:04 ` Benjamin Herrenschmidt
2009-10-03 8:05 ` Joakim Tjernlund
@ 2009-10-04 20:10 ` Joakim Tjernlund
2009-10-04 20:28 ` Benjamin Herrenschmidt
2 siblings, 1 reply; 58+ messages in thread
From: Joakim Tjernlund @ 2009-10-04 20:10 UTC (permalink / raw)
To: Scott Wood, Rex Feany; +Cc: linuxppc-dev
Scott Wood <scottwood@freescale.com> wrote on 02/10/2009 23:49:49:
>
> On Thu, Oct 01, 2009 at 08:35:59AM +1000, Benjamin Herrenschmidt wrote:
> > >From what I can see, the TLB miss code will check _PAGE_PRESENT, and
> > when not set, it will -still- insert something into the TLB (unlike
> > all other CPU types that go straight to data access faults from there).
> >
> > So we end up populating with those unpopulated entries that will then
> > cause us to take a DSI (or ISI) instead of a TLB miss the next time
> > around ... and so we need to remove them once we do that, no ? IE. Once
> > we have actually put a valid PTE in.
> >
> > At least that's my understanding and why I believe we should always have
> > tlbil_va() in set_pte() and ptep_set_access_flags(), which boils down
> > in putting it into the 2 "filter" functions in the new code.
> >
> > Well.. actually, the ptep_set_access_flags() will already do a
> > flush_tlb_page_nohash() when the PTE is changed. So I suppose all we
> > really need here would be in set_pte_filter(), to do the same if we are
> > writing a PTE that has _PAGE_PRESENT, at least on 8xx.
> >
> > But just unconditionally doing a tlbil_va() in both the filter functions
> > shouldn't harm and also fix the problem, though Rex seems to indicate
> > that is not the case.
>
> Adding a tlbil_va to do_page_fault makes the problem go away for me (on
> top of your "merge" branch) -- none of the other changes in this thread
> do (assuming I didn't miss any). FWIW, when it gets stuck on a fault,
> DSISR is 0xc0000000, and handle_mm_fault returns zero.
Scott and Rex
I have managed to update the TLB code to make proper use of dirty and accessed states.
Advantages are:
- I/D TLB Miss never needs to write to the linux pte, saving a few cycles
- Accessed is only set by I/D TLB Error, should be a plus when SWAP is used.
- _PAGE_DIRTY is mapped to 0x100, the changed bit, and is set directly
and there will be no extra DTLB Error to actually set the changed bit
when a page has been made dirty.
- Proper RO/RW mapping of user space.
Cons:
- 4 more insn in TLB Miss handlers, but the since the linux pte isn't
written it should still be a win.
However, I did this on my 2.4 tree but I can port it to 2.6 if you guys
can test it for me.
Jocke
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH] powerpc/8xx: fix regression introduced by cache coherency rewrite
2009-10-04 8:35 ` Joakim Tjernlund
@ 2009-10-04 20:26 ` Benjamin Herrenschmidt
2009-10-04 20:38 ` Joakim Tjernlund
0 siblings, 1 reply; 58+ messages in thread
From: Benjamin Herrenschmidt @ 2009-10-04 20:26 UTC (permalink / raw)
To: Joakim Tjernlund; +Cc: Scott Wood, linuxppc-dev, Rex Feany
On Sun, 2009-10-04 at 10:35 +0200, Joakim Tjernlund wrote:
> Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote on 03/10/2009 12:57:28:
> >
> > On Sat, 2009-10-03 at 11:24 +0200, Joakim Tjernlund wrote:
> > >
> > > So yes, there is a missing _tlbil_va() missing for 8xx somewhere
> > > but there is something more too.
> > > Maybe your new filter functions and my
> > > powerpc, 8xx: DTLB Error must check for more errors.
> > > will do the trick?
> >
> > Well, if we can't tell between a load and a store on a TLB miss, then
> > we should probably let it create an unpopulated entry in all cases,
> > so that we do take a proper DSI/ISI the second time around, which
> > would then tell us where we come from...
>
> While looking closer on 8xx TLB handling I noticed we were taking
> an extra TLB Error when making a page dirty. Tracked it down to:
> static inline pte_t pte_mkdirty(pte_t pte) {
> pte_val(pte) |= _PAGE_DIRTY; return pte; }
> pte_mkdirty does not set HWWRITE thus forcing a new
> TLB error to clear it. Adding HWWRITE to pte_mkdirty fixes the
> problem.
We should just get rid of HWWRITE like I did for 44x and BookE.
> Looking at (especially pte_mkclean):
> static inline pte_t pte_wrprotect(pte_t pte) {
> pte_val(pte) &= ~(_PAGE_RW | _PAGE_HWWRITE); return pte; }
> static inline pte_t pte_mkclean(pte_t pte) {
> pte_val(pte) &= ~(_PAGE_DIRTY | _PAGE_HWWRITE); return pte; }
>
> it looks like a mistake not to include HWWRITE in pte_mkdirty(),
> what do you think?
Maybe yes. No big deal right now, we have more important problems
to fix no ? :-)
Ben.
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH] powerpc/8xx: fix regression introduced by cache coherency rewrite
2009-10-04 20:10 ` Joakim Tjernlund
@ 2009-10-04 20:28 ` Benjamin Herrenschmidt
2009-10-04 20:45 ` Joakim Tjernlund
` (2 more replies)
0 siblings, 3 replies; 58+ messages in thread
From: Benjamin Herrenschmidt @ 2009-10-04 20:28 UTC (permalink / raw)
To: Joakim Tjernlund; +Cc: Scott Wood, linuxppc-dev, Rex Feany
> I have managed to update the TLB code to make proper use of dirty and accessed states.
> Advantages are:
> - I/D TLB Miss never needs to write to the linux pte, saving a few cycles
That's good, that leaves us with only 40x to fix now. Also we can remove
atomic updates of PTEs for all non-hash. It's pointless on those CPUs
anyway.
> - Accessed is only set by I/D TLB Error, should be a plus when SWAP is used.
No need for that neither.
ISI/DSI shouldn't touch the PTE. They should just fall back to C code
which takes care of it all.l
> - _PAGE_DIRTY is mapped to 0x100, the changed bit, and is set directly
> and there will be no extra DTLB Error to actually set the changed bit
> when a page has been made dirty.
> - Proper RO/RW mapping of user space.
>
> Cons:
> - 4 more insn in TLB Miss handlers, but the since the linux pte isn't
> written it should still be a win.
>
> However, I did this on my 2.4 tree but I can port it to 2.6 if you guys
> can test it for me.
Why don't you use and test 2.6 ? :-)
Cheers,
Ben.
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH] powerpc/8xx: fix regression introduced by cache coherency rewrite
2009-10-04 20:26 ` Benjamin Herrenschmidt
@ 2009-10-04 20:38 ` Joakim Tjernlund
0 siblings, 0 replies; 58+ messages in thread
From: Joakim Tjernlund @ 2009-10-04 20:38 UTC (permalink / raw)
To: Benjamin Herrenschmidt; +Cc: Scott Wood, linuxppc-dev, Rex Feany
Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote on 04/10/2009 22:26:42:
> On Sun, 2009-10-04 at 10:35 +0200, Joakim Tjernlund wrote:
> > Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote on 03/10/2009 12:57:28:
> > >
> > > On Sat, 2009-10-03 at 11:24 +0200, Joakim Tjernlund wrote:
> > > >
> > > > So yes, there is a missing _tlbil_va() missing for 8xx somewhere
> > > > but there is something more too.
> > > > Maybe your new filter functions and my
> > > > powerpc, 8xx: DTLB Error must check for more errors.
> > > > will do the trick?
> > >
> > > Well, if we can't tell between a load and a store on a TLB miss, then
> > > we should probably let it create an unpopulated entry in all cases,
> > > so that we do take a proper DSI/ISI the second time around, which
> > > would then tell us where we come from...
> >
> > While looking closer on 8xx TLB handling I noticed we were taking
> > an extra TLB Error when making a page dirty. Tracked it down to:
> > static inline pte_t pte_mkdirty(pte_t pte) {
> > pte_val(pte) |= _PAGE_DIRTY; return pte; }
> > pte_mkdirty does not set HWWRITE thus forcing a new
> > TLB error to clear it. Adding HWWRITE to pte_mkdirty fixes the
> > problem.
>
> We should just get rid of HWWRITE like I did for 44x and BookE.
I am trying :)
>
> > Looking at (especially pte_mkclean):
> > static inline pte_t pte_wrprotect(pte_t pte) {
> > pte_val(pte) &= ~(_PAGE_RW | _PAGE_HWWRITE); return pte; }
> > static inline pte_t pte_mkclean(pte_t pte) {
> > pte_val(pte) &= ~(_PAGE_DIRTY | _PAGE_HWWRITE); return pte; }
> >
> > it looks like a mistake not to include HWWRITE in pte_mkdirty(),
> > what do you think?
>
> Maybe yes. No big deal right now, we have more important problems
> to fix no ? :-)
The missing invalidate you guys need to work out. I have no clue
where to put it. If we are really lucky, getting rid of HWWRITE
might help :)
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH] powerpc/8xx: fix regression introduced by cache coherency rewrite
2009-10-04 20:28 ` Benjamin Herrenschmidt
@ 2009-10-04 20:45 ` Joakim Tjernlund
2009-10-05 7:28 ` Joakim Tjernlund
2009-10-05 19:16 ` Joakim Tjernlund
2 siblings, 0 replies; 58+ messages in thread
From: Joakim Tjernlund @ 2009-10-04 20:45 UTC (permalink / raw)
To: Benjamin Herrenschmidt; +Cc: Scott Wood, linuxppc-dev, Rex Feany
Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote on 04/10/2009 22:28:38:
>
> > I have managed to update the TLB code to make proper use of dirty and accessed states.
> > Advantages are:
> > - I/D TLB Miss never needs to write to the linux pte, saving a few cycles
>
> That's good, that leaves us with only 40x to fix now. Also we can remove
> atomic updates of PTEs for all non-hash. It's pointless on those CPUs
> anyway.
>
> > - Accessed is only set by I/D TLB Error, should be a plus when SWAP is used.
>
> No need for that neither.
Since 8xx lacks HW support for ACCESSED, the only way is map
the page NoAccess and take a TLB Error on first access that sets
access bit (or bails to do_page_fault)
>
> ISI/DSI shouldn't touch the PTE. They should just fall back to C code
> which takes care of it all.l
Yes, that is what I do now(i.e I only read the pte). ISI and DSI is the
TLB Miss handlers on 8xx.
>
> > - _PAGE_DIRTY is mapped to 0x100, the changed bit, and is set directly
> > and there will be no extra DTLB Error to actually set the changed bit
> > when a page has been made dirty.
> > - Proper RO/RW mapping of user space.
> >
> > Cons:
> > - 4 more insn in TLB Miss handlers, but the since the linux pte isn't
> > written it should still be a win.
> >
> > However, I did this on my 2.4 tree but I can port it to 2.6 if you guys
> > can test it for me.
>
> Why don't you use and test 2.6 ? :-)
Because porting my 8xx board to 2.6 isn't going to be easy so
I havn't yet. One day I might when we can't get away with 2.4 on our
old boards.
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH] powerpc/8xx: fix regression introduced by cache coherency rewrite
2009-10-04 20:28 ` Benjamin Herrenschmidt
2009-10-04 20:45 ` Joakim Tjernlund
@ 2009-10-05 7:28 ` Joakim Tjernlund
2009-10-05 19:16 ` Joakim Tjernlund
2 siblings, 0 replies; 58+ messages in thread
From: Joakim Tjernlund @ 2009-10-05 7:28 UTC (permalink / raw)
To: Benjamin Herrenschmidt; +Cc: Scott Wood, linuxppc-dev, Rex Feany
Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote on 04/10/2009 22:28:38:
>
>
> > I have managed to update the TLB code to make proper use of dirty and accessed states.
> > Advantages are:
> > - I/D TLB Miss never needs to write to the linux pte, saving a few cycles
>
> That's good, that leaves us with only 40x to fix now. Also we can remove
> atomic updates of PTEs for all non-hash. It's pointless on those CPUs
> anyway.
>
> > - Accessed is only set by I/D TLB Error, should be a plus when SWAP is used.
>
> No need for that neither.
>
> ISI/DSI shouldn't touch the PTE. They should just fall back to C code
> which takes care of it all.l
>
> > - _PAGE_DIRTY is mapped to 0x100, the changed bit, and is set directly
> > and there will be no extra DTLB Error to actually set the changed bit
> > when a page has been made dirty.
> > - Proper RO/RW mapping of user space.
> >
> > Cons:
> > - 4 more insn in TLB Miss handlers, but the since the linux pte isn't
> > written it should still be a win.
> >
> > However, I did this on my 2.4 tree but I can port it to 2.6 if you guys
> > can test it for me.
So it was easy to update the patch for 2.6, this is on top
of "powerpc, 8xx: DTLB Error must check for more errors."
You probably need the extra tlbil_va(), but let us know if you
can get away without it.
Scott and Rex, please give this a spin. Comments welcome too :)
Jocke
>From 7880d402cc05dd6e27d8804218ee4c80a879403e Mon Sep 17 00:00:00 2001
From: Joakim Tjernlund <Joakim.Tjernlund@transmode.se>
Date: Mon, 5 Oct 2009 09:08:31 +0200
Subject: [PATCH] 8xx: get rid of _PAGE_HWWRITE dependency in MMU.
Update the TLB asm to make proper use of _PAGE_DIRY and _PAGE_ACCESSED.
Pros:
- I/D TLB Miss never needs to write to the linux pte.
- _PAGE_ACCESSED is only set on I/D TLB Error fixing accounting
- _PAGE_DIRTY is mapped to 0x100, the changed bit, and is set directly
when a page has been made dirty.
- Proper RO/RW mapping of user space.
Cons:
- 4 more instructions in I/D TLB Miss, but the since the linux pte is
not written anymore, it should still be a win.
---
arch/powerpc/include/asm/pte-8xx.h | 9 +-
arch/powerpc/kernel/head_8xx.S | 163 ++++++++++++++++++++++++++----------
2 files changed, 122 insertions(+), 50 deletions(-)
diff --git a/arch/powerpc/include/asm/pte-8xx.h b/arch/powerpc/include/asm/pte-8xx.h
index 8c6e312..af541a2 100644
--- a/arch/powerpc/include/asm/pte-8xx.h
+++ b/arch/powerpc/include/asm/pte-8xx.h
@@ -32,22 +32,21 @@
#define _PAGE_FILE 0x0002 /* when !present: nonlinear file mapping */
#define _PAGE_NO_CACHE 0x0002 /* I: cache inhibit */
#define _PAGE_SHARED 0x0004 /* No ASID (context) compare */
+#define _PAGE_DIRTY 0x0100 /* C: page changed */
/* These five software bits must be masked out when the entry is loaded
* into the TLB.
*/
#define _PAGE_EXEC 0x0008 /* software: i-cache coherency required */
#define _PAGE_GUARDED 0x0010 /* software: guarded access */
-#define _PAGE_DIRTY 0x0020 /* software: page changed */
-#define _PAGE_RW 0x0040 /* software: user write access allowed */
-#define _PAGE_ACCESSED 0x0080 /* software: page referenced */
+#define _PAGE_USER 0x0020 /* software: User space access */
/* Setting any bits in the nibble with the follow two controls will
* require a TLB exception handler change. It is assumed unused bits
* are always zero.
*/
-#define _PAGE_HWWRITE 0x0100 /* h/w write enable: never set in Linux PTE */
-#define _PAGE_USER 0x0800 /* One of the PP bits, the other is USER&~RW */
+#define _PAGE_RW 0x0400 /* lsb PP bits */
+#define _PAGE_ACCESSED 0x0800 /* msb PP bits */
#define _PMD_PRESENT 0x0001
#define _PMD_BAD 0x0ff0
diff --git a/arch/powerpc/kernel/head_8xx.S b/arch/powerpc/kernel/head_8xx.S
index 118bb05..e111d2f 100644
--- a/arch/powerpc/kernel/head_8xx.S
+++ b/arch/powerpc/kernel/head_8xx.S
@@ -333,21 +333,15 @@ InstructionTLBMiss:
mfspr r11, SPRN_MD_TWC /* ....and get the pte address */
lwz r10, 0(r11) /* Get the pte */
-#ifdef CONFIG_SWAP
- /* do not set the _PAGE_ACCESSED bit of a non-present page */
- andi. r11, r10, _PAGE_PRESENT
- beq 4f
- ori r10, r10, _PAGE_ACCESSED
- mfspr r11, SPRN_MD_TWC /* get the pte address again */
- stw r10, 0(r11)
-4:
-#else
- ori r10, r10, _PAGE_ACCESSED
- stw r10, 0(r11)
-#endif
-
+ andi. r11, r10, _PAGE_USER | _PAGE_ACCESSED
+ cmpwi cr0, r11, _PAGE_USER | _PAGE_ACCESSED
+ beq- cr0, 5f /* branch if access allowed */
+ rlwinm r10, r10, 0, 22, 19 /* r20 &= ~(_PAGE_ACCESSED | _PAGE_RW) */
+ b 6f
+5: xori r10, r10, _PAGE_RW /* invert RW bit */
+6:
/* The Linux PTE won't go exactly into the MMU TLB.
- * Software indicator bits 21, 22 and 28 must be clear.
+ * Software indicator bit 28 must be clear.
* Software indicator bits 24, 25, 26, and 27 must be
* set. All other Linux PTE bits control the behavior
* of the MMU.
@@ -409,21 +403,15 @@ DataStoreTLBMiss:
DO_8xx_CPU6(0x3b80, r3)
mtspr SPRN_MD_TWC, r11
-#ifdef CONFIG_SWAP
- /* do not set the _PAGE_ACCESSED bit of a non-present page */
- andi. r11, r10, _PAGE_PRESENT
- beq 4f
- ori r10, r10, _PAGE_ACCESSED
-4:
- /* and update pte in table */
-#else
- ori r10, r10, _PAGE_ACCESSED
-#endif
- mfspr r11, SPRN_MD_TWC /* get the pte address again */
- stw r10, 0(r11)
-
+ andi. r11, r10, _PAGE_USER | _PAGE_ACCESSED
+ cmpwi cr0, r11, _PAGE_USER | _PAGE_ACCESSED
+ beq- cr0, 5f /* branch if access allowed */
+ rlwinm r10, r10, 0, 22, 19 /* r20 &= ~(_PAGE_ACCESSED | _PAGE_RW) */
+ b 6f
+5: xori r10, r10, _PAGE_RW /* invert RW bit */
+6:
/* The Linux PTE won't go exactly into the MMU TLB.
- * Software indicator bits 21, 22 and 28 must be clear.
+ * Software indicator bit 28 must be clear.
* Software indicator bits 24, 25, 26, and 27 must be
* set. All other Linux PTE bits control the behavior
* of the MMU.
@@ -448,6 +436,91 @@ DataStoreTLBMiss:
*/
. = 0x1300
InstructionTLBError:
+#ifdef CONFIG_8xx_CPU6
+ stw r3, 8(r0)
+#endif
+ DO_8xx_CPU6(0x3f80, r3)
+ mtspr SPRN_M_TW, r10 /* Save a couple of working registers */
+ mfcr r10
+ stw r10, 0(r0)
+ stw r11, 4(r0)
+
+ mfspr r11, SRR1
+ andis. r11, r11, 0x5000 /* no translation, guarded */
+ bne 2f
+
+ mfspr r10, SPRN_SRR0 /* Get effective address of fault */
+#ifdef CONFIG_8xx_CPU15
+ addi r11, r10, 0x1000
+ tlbie r11
+ addi r11, r10, -0x1000
+ tlbie r11
+#endif
+ DO_8xx_CPU6(0x3780, r3)
+ mtspr SPRN_MD_EPN, r10 /* Have to use MD_EPN for walk, MI_EPN can't */
+ mfspr r10, SPRN_M_TWB /* Get level 1 table entry address */
+
+ /* If we are faulting a kernel address, we have to use the
+ * kernel page tables.
+ */
+ andi. r11, r10, 0x0800 /* Address >= 0x80000000 */
+ beq 3f
+ lis r11, swapper_pg_dir@h
+ ori r11, r11, swapper_pg_dir@l
+ rlwimi r10, r11, 0, 2, 19
+3:
+ lwz r11, 0(r10) /* Get the level 1 entry */
+ rlwinm. r10, r11,0,0,19 /* Extract page descriptor page address */
+ beq 2f /* If zero, don't try to find a pte */
+
+ /* We have a pte table, so load the MI_TWC with the attributes
+ * for this "segment."
+ */
+ ori r11,r11,1 /* Set valid bit */
+ DO_8xx_CPU6(0x2b80, r3)
+ mtspr SPRN_MI_TWC, r11 /* Set segment attributes */
+ DO_8xx_CPU6(0x3b80, r3)
+ mtspr SPRN_MD_TWC, r11 /* Load pte table base address */
+
+ mfspr r11, SRR1
+ andi. r11, r11, 0x4000 /* MSR[PR] */
+ mfspr r11, MD_TWC /* ....and get the pte address */
+ lwz r10, 0(r11) /* Get the pte */
+ beq 5f /* Kernel access always OK */
+ andi. r11,r10, _PAGE_USER
+ beq 2f
+5: ori r10, r10, _PAGE_ACCESSED
+ mfspr r21, MD_TWC /* ....and get the pte address */
+ stw r10, 0(r11)
+ xori r10, r10, _PAGE_RW /* invert RW bit */
+
+ /* The Linux PTE won't go exactly into the MMU TLB.
+ * Software indicator bit 28 must be clear.
+ * Software indicator bits 24, 25, 26, and 27 must be
+ * set. All other Linux PTE bits control the behavior
+ * of the MMU.
+ */
+ li r11, 0x00f0
+ rlwimi r10, r11, 0, 24, 28 /* Set 24-27, clear 28 */
+ DO_8xx_CPU6(0x2d80, r3)
+ mtspr SPRN_MI_RPN, r10 /* Update TLB entry */
+
+ mfspr r10, SPRN_M_TW /* Restore registers */
+ lwz r11, 0(r0)
+ mtcr r11
+ lwz r11, 4(r0)
+#ifdef CONFIG_8xx_CPU6
+ lwz r3, 8(r0)
+#endif
+ rfi
+
+2: mfspr r10, SPRN_M_TW /* Restore registers */
+ lwz r11, 0(r0)
+ mtcr r11
+ lwz r11, 4(r0)
+#ifdef CONFIG_8xx_CPU6
+ lwz r3, 8(r0)
+#endif
b InstructionAccess
/* This is the data TLB error on the MPC8xx. This could be due to
@@ -472,8 +545,8 @@ DataTLBError:
/* First, make sure this was a store operation.
*/
mfspr r10, SPRN_DSISR
- andis. r11, r10, 0x4800 /* no translation, no permission. */
- bne 2f /* branch if either is set */
+ andis. r11, r10, 0x4000 /* no translation */
+ bne 2f /* branch if set */
/* The EA of a data TLB miss is automatically stored in the MD_EPN
* register. The EA of a data TLB error is automatically stored in
@@ -522,26 +595,26 @@ DataTLBError:
mfspr r11, SPRN_MD_TWC /* ....and get the pte address */
lwz r10, 0(r11) /* Get the pte */
- andi. r11, r10, _PAGE_RW /* Is it writeable? */
- beq 2f /* Bail out if not */
+ mfspr r11, SRR1
+ andi. r11, r11, 0x4000 /* MSR[PR] */
+ beq 5f /* Kernel access always OK */
+ andi. r11,r10, _PAGE_USER
+ beq 2f
+5: mfspr r11, DSISR
+ andis. r11, r11, 0x0200 /* store */
+ beq 6f
+ andi. r11, r10, _PAGE_RW /* writeable? */
+ beq 2f /* branch if not */
+ ori r10, r10, _PAGE_DIRTY | _PAGE_HWWRITE
+6: ori r10, r10, _PAGE_ACCESSED
- /* Update 'changed', among others.
- */
-#ifdef CONFIG_SWAP
- ori r10, r10, _PAGE_DIRTY|_PAGE_HWWRITE
- /* do not set the _PAGE_ACCESSED bit of a non-present page */
- andi. r11, r10, _PAGE_PRESENT
- beq 4f
- ori r10, r10, _PAGE_ACCESSED
-4:
-#else
- ori r10, r10, _PAGE_DIRTY|_PAGE_ACCESSED|_PAGE_HWWRITE
-#endif
mfspr r11, SPRN_MD_TWC /* Get pte address again */
stw r10, 0(r11) /* and update pte in table */
+ xori r10, r10, _PAGE_RW /* Invert RW bit */
+
/* The Linux PTE won't go exactly into the MMU TLB.
- * Software indicator bits 21, 22 and 28 must be clear.
+ * Software indicator bit 28 must be clear.
* Software indicator bits 24, 25, 26, and 27 must be
* set. All other Linux PTE bits control the behavior
* of the MMU.
--
1.6.4.4
^ permalink raw reply related [flat|nested] 58+ messages in thread
* Re: [PATCH] powerpc/8xx: fix regression introduced by cache coherency rewrite
2009-10-03 8:05 ` Joakim Tjernlund
2009-10-03 8:31 ` Benjamin Herrenschmidt
@ 2009-10-05 18:24 ` Scott Wood
2009-10-05 18:50 ` Joakim Tjernlund
1 sibling, 1 reply; 58+ messages in thread
From: Scott Wood @ 2009-10-05 18:24 UTC (permalink / raw)
To: Joakim Tjernlund; +Cc: Rex Feany, linuxppc-dev
On Sat, Oct 03, 2009 at 10:05:46AM +0200, Joakim Tjernlund wrote:
> Scott Wood <scottwood@freescale.com> wrote on 02/10/2009 23:49:49:
> > Adding a tlbil_va to do_page_fault makes the problem go away for me (on
> > top of your "merge" branch) -- none of the other changes in this thread
> > do (assuming I didn't miss any). FWIW, when it gets stuck on a fault,
> > DSISR is 0xc0000000, and handle_mm_fault returns zero.
>
> OK, that is a no translation error for a load (assuming trap is 0x400)
Yes, 0x400.
> Do you know what insn this is?
Various lbz and lwz.
-Scott
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH] powerpc/8xx: fix regression introduced by cache coherency rewrite
2009-10-05 18:24 ` Scott Wood
@ 2009-10-05 18:50 ` Joakim Tjernlund
0 siblings, 0 replies; 58+ messages in thread
From: Joakim Tjernlund @ 2009-10-05 18:50 UTC (permalink / raw)
To: Scott Wood; +Cc: Rex Feany, linuxppc-dev
Scott Wood <scottwood@freescale.com> wrote on 05/10/2009 20:24:29:
>
> On Sat, Oct 03, 2009 at 10:05:46AM +0200, Joakim Tjernlund wrote:
> > Scott Wood <scottwood@freescale.com> wrote on 02/10/2009 23:49:49:
> > > Adding a tlbil_va to do_page_fault makes the problem go away for me (on
> > > top of your "merge" branch) -- none of the other changes in this thread
> > > do (assuming I didn't miss any). FWIW, when it gets stuck on a fault,
> > > DSISR is 0xc0000000, and handle_mm_fault returns zero.
> >
> > OK, that is a no translation error for a load (assuming trap is 0x400)
>
> Yes, 0x400.
>
> > Do you know what insn this is?
>
> Various lbz and lwz.
OK, this rules out any dcbX problem. Perhaps you can try any of Bens
ideas? Not sure what they were but hopefully you and Ben do :)
Preferably after you have tested my new patches :)
Jocke
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH] powerpc/8xx: fix regression introduced by cache coherency rewrite
2009-10-04 20:28 ` Benjamin Herrenschmidt
2009-10-04 20:45 ` Joakim Tjernlund
2009-10-05 7:28 ` Joakim Tjernlund
@ 2009-10-05 19:16 ` Joakim Tjernlund
2009-10-05 20:28 ` Benjamin Herrenschmidt
2 siblings, 1 reply; 58+ messages in thread
From: Joakim Tjernlund @ 2009-10-05 19:16 UTC (permalink / raw)
To: Benjamin Herrenschmidt; +Cc: Scott Wood, linuxppc-dev, Rex Feany
Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote on 04/10/2009 22:28:38:
>
>
> > I have managed to update the TLB code to make proper use of dirty and accessed states.
> > Advantages are:
> > - I/D TLB Miss never needs to write to the linux pte, saving a few cycles
>
> That's good, that leaves us with only 40x to fix now. Also we can remove
> atomic updates of PTEs for all non-hash. It's pointless on those CPUs
> anyway.
>
> > - Accessed is only set by I/D TLB Error, should be a plus when SWAP is used.
>
> No need for that neither.
>
> ISI/DSI shouldn't touch the PTE. They should just fall back to C code
> which takes care of it all.l
Ben, for my understanding: It seems to that the TLB Miss routines in
head_32.S are less than optimal as it too touches the pte every time
it hits. Would it not be better to test if ACCESSED and friends are already set
and skip storing the same pte over and over again?
Jocke
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH] powerpc/8xx: fix regression introduced by cache coherency rewrite
2009-10-02 22:04 ` Benjamin Herrenschmidt
@ 2009-10-05 19:28 ` Scott Wood
2009-10-05 20:29 ` Benjamin Herrenschmidt
0 siblings, 1 reply; 58+ messages in thread
From: Scott Wood @ 2009-10-05 19:28 UTC (permalink / raw)
To: Benjamin Herrenschmidt; +Cc: linuxppc-dev, Rex Feany
On Sat, Oct 03, 2009 at 08:04:33AM +1000, Benjamin Herrenschmidt wrote:
> On Fri, 2009-10-02 at 16:49 -0500, Scott Wood wrote:
> > Adding a tlbil_va to do_page_fault makes the problem go away for me (on
> > top of your "merge" branch) -- none of the other changes in this thread
> > do (assuming I didn't miss any). FWIW, when it gets stuck on a fault,
> > DSISR is 0xc0000000, and handle_mm_fault returns zero.
>
> But in that case, it should hit ptep_set_access_flags() (via
> handle_mm_fault) and from there call tlbil_va (provided we add a call to
> it in the relevant filter function which I proposed initially), no ?
Yes, it hits ptep_set_access_flags() and adding _tlbil_va there helps (I
didn't put it in the filter function, because that doesn't take address as a
parameter). I'd misread your suggestion as referring to the various changes
to set_pte_filter() that were posted.
As for unconditionally invalidating in set_pte_filter(), that doesn't boot
for me unless I check for kernel addresses -- I guess we populate page
tables that overlap the pinned large page region?
-Scott
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH] powerpc/8xx: fix regression introduced by cache coherency rewrite
2009-10-05 19:16 ` Joakim Tjernlund
@ 2009-10-05 20:28 ` Benjamin Herrenschmidt
0 siblings, 0 replies; 58+ messages in thread
From: Benjamin Herrenschmidt @ 2009-10-05 20:28 UTC (permalink / raw)
To: Joakim Tjernlund; +Cc: Scott Wood, linuxppc-dev, Rex Feany
On Mon, 2009-10-05 at 21:16 +0200, Joakim Tjernlund wrote:
> Ben, for my understanding: It seems to that the TLB Miss routines in
> head_32.S are less than optimal as it too touches the pte every time
> it hits. Would it not be better to test if ACCESSED and friends are
> already set
> and skip storing the same pte over and over again?
I wouldn't think it's a big deal, but then, the 32-bit hash code
has to also update _PAGE_HASHPTE etc... overall I wouldn't touch
it for now.
However, 8xx should instead look at what I do in recent versions
of head_44x.S or what Kumar did in head_fsl_booke.S
Cheers,
Ben.
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH] powerpc/8xx: fix regression introduced by cache coherency rewrite
2009-10-05 19:28 ` Scott Wood
@ 2009-10-05 20:29 ` Benjamin Herrenschmidt
2009-10-05 21:04 ` Scott Wood
0 siblings, 1 reply; 58+ messages in thread
From: Benjamin Herrenschmidt @ 2009-10-05 20:29 UTC (permalink / raw)
To: Scott Wood; +Cc: linuxppc-dev, Rex Feany
On Mon, 2009-10-05 at 14:28 -0500, Scott Wood wrote:
> Yes, it hits ptep_set_access_flags() and adding _tlbil_va there helps (I
> didn't put it in the filter function, because that doesn't take address as a
> parameter). I'd misread your suggestion as referring to the various changes
> to set_pte_filter() that were posted.
>
> As for unconditionally invalidating in set_pte_filter(), that doesn't boot
> for me unless I check for kernel addresses -- I guess we populate page
> tables that overlap the pinned large page region?
Good point. I think we do on 8xx. Does doing it in set_pte_filter()
(with the kernel check) makes any difference ? faster ? slower ? no
visible change ?
Cheers,
Ben.
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH] powerpc/8xx: fix regression introduced by cache coherency rewrite
2009-10-05 20:29 ` Benjamin Herrenschmidt
@ 2009-10-05 21:04 ` Scott Wood
0 siblings, 0 replies; 58+ messages in thread
From: Scott Wood @ 2009-10-05 21:04 UTC (permalink / raw)
To: Benjamin Herrenschmidt; +Cc: linuxppc-dev, Rex Feany
Benjamin Herrenschmidt wrote:
> On Mon, 2009-10-05 at 14:28 -0500, Scott Wood wrote:
>> Yes, it hits ptep_set_access_flags() and adding _tlbil_va there helps (I
>> didn't put it in the filter function, because that doesn't take address as a
>> parameter). I'd misread your suggestion as referring to the various changes
>> to set_pte_filter() that were posted.
>>
>> As for unconditionally invalidating in set_pte_filter(), that doesn't boot
>> for me unless I check for kernel addresses -- I guess we populate page
>> tables that overlap the pinned large page region?
>
> Good point. I think we do on 8xx. Does doing it in set_pte_filter()
> (with the kernel check) makes any difference ? faster ? slower ? no
> visible change ?
ptep_set_access_flags() is enough to make the worst of it go away. I'm
having another intermittent problem that seems to happen more often
without the change in set_pte_filter(), but I've yet to narrow down
what's actually going on there.
-Scott
^ permalink raw reply [flat|nested] 58+ messages in thread
end of thread, other threads:[~2009-10-05 21:04 UTC | newest]
Thread overview: 58+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-09-24 0:45 [PATCH] powerpc/8xx: fix regression introduced by cache coherency rewrite Rex Feany
2009-09-24 6:44 ` Benjamin Herrenschmidt
2009-09-24 23:33 ` Rex Feany
2009-09-24 23:52 ` Benjamin Herrenschmidt
2009-09-25 1:35 ` Rex Feany
2009-09-25 1:51 ` Benjamin Herrenschmidt
2009-09-25 3:03 ` Benjamin Herrenschmidt
2009-09-25 8:31 ` Joakim Tjernlund
2009-09-25 9:47 ` Benjamin Herrenschmidt
2009-09-25 10:21 ` Joakim Tjernlund
2009-09-25 21:18 ` Rex Feany
2009-09-27 13:22 ` Joakim Tjernlund
2009-09-28 3:21 ` Benjamin Herrenschmidt
2009-09-28 7:22 ` Joakim Tjernlund
2009-09-28 7:34 ` Benjamin Herrenschmidt
2009-09-28 7:39 ` Joakim Tjernlund
2009-09-28 10:02 ` Joakim Tjernlund
2009-09-29 1:21 ` Rex Feany
2009-09-29 6:26 ` Joakim Tjernlund
2009-09-29 7:07 ` Benjamin Herrenschmidt
2009-09-29 8:13 ` Joakim Tjernlund
2009-09-29 8:16 ` Benjamin Herrenschmidt
2009-09-29 8:24 ` Joakim Tjernlund
2009-09-29 11:56 ` Joakim Tjernlund
2009-09-29 21:03 ` Rex Feany
2009-09-30 7:59 ` Joakim Tjernlund
2009-09-30 8:19 ` Joakim Tjernlund
2009-09-30 9:00 ` Rex Feany
2009-09-30 9:58 ` Joakim Tjernlund
2009-09-30 11:18 ` Joakim Tjernlund
2009-09-30 17:23 ` Joakim Tjernlund
2009-09-30 22:35 ` Benjamin Herrenschmidt
2009-10-01 7:05 ` Joakim Tjernlund
2009-10-02 13:06 ` Joakim Tjernlund
2009-10-02 18:10 ` Joakim Tjernlund
2009-10-02 21:49 ` Scott Wood
2009-10-02 22:04 ` Benjamin Herrenschmidt
2009-10-05 19:28 ` Scott Wood
2009-10-05 20:29 ` Benjamin Herrenschmidt
2009-10-05 21:04 ` Scott Wood
2009-10-03 8:05 ` Joakim Tjernlund
2009-10-03 8:31 ` Benjamin Herrenschmidt
2009-10-03 9:24 ` Joakim Tjernlund
2009-10-03 10:57 ` Benjamin Herrenschmidt
2009-10-03 11:47 ` Joakim Tjernlund
2009-10-04 8:35 ` Joakim Tjernlund
2009-10-04 20:26 ` Benjamin Herrenschmidt
2009-10-04 20:38 ` Joakim Tjernlund
2009-10-05 18:24 ` Scott Wood
2009-10-05 18:50 ` Joakim Tjernlund
2009-10-04 20:10 ` Joakim Tjernlund
2009-10-04 20:28 ` Benjamin Herrenschmidt
2009-10-04 20:45 ` Joakim Tjernlund
2009-10-05 7:28 ` Joakim Tjernlund
2009-10-05 19:16 ` Joakim Tjernlund
2009-10-05 20:28 ` Benjamin Herrenschmidt
2009-09-29 7:07 ` Benjamin Herrenschmidt
2009-09-29 21:09 ` Rex Feany
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.