All of lore.kernel.org
 help / color / mirror / Atom feed
From: Michael Neuling <mikey@neuling.org>
To: Eric Van Hensbergen <ericvh@gmail.com>
Cc: linux-kernel@vger.kernel.org, linuxppc-dev@lists.ozlabs.org,
	bg-linux@lists.anl-external.org
Subject: Re: [PATCH 3/7] [RFC] add support for BlueGene/P FPU
Date: Fri, 20 May 2011 09:16:41 +1000	[thread overview]
Message-ID: <6241.1305847001@neuling.org> (raw)
In-Reply-To: <BANLkTi=rc5vZm3xAXHpHSxSH1wBWKhv92A@mail.gmail.com>

In message <BANLkTi=rc5vZm3xAXHpHSxSH1wBWKhv92A@mail.gmail.com> you wrote:
> Damnit Mikey, just after I hit send on [V2].....
> 
> On Thu, May 19, 2011 at 4:36 PM, Michael Neuling <mikey@neuling.org> wrote:
> > In message <BANLkTimKhApFW8G1-pG0u_9Kv2YB0R1O0w@mail.gmail.com> you wrote=
> :
> >> On Thu, May 19, 2011 at 12:58 AM, Michael Neuling <mikey@neuling.org> wr=
> ote=3D
> >> :
> >> > Eric,
> >> >
> >> >> This patch adds save/restore register support for the BlueGene/P
> >> >> double hummer FPU.
> >> >
> >> > What does this mean? =3DA0Needs more details here.
> >> >
> 
> okay, I've changed it a bit in [V2], if you want more I can do my best.

If you can describe the whole primary and secondary registers that'd be
cool.  ASCII art would be awesome! :-)


> 
> >> "Each of the two FPU units contains 32 64-bit floating point registers
> >> for a total of 64 FP registers per processor." which would seem to
> >> point to the kittyhawk version - but they have a second SAVE_32SFPRS
> >> for the second hummer. =A0What wasn't clear to me with this version of
> >> the code was whether or not they were doing something clever like
> >> saving the pair of the 64-bit FPU registers in a single 128-bit slot
> >> (seems plausible).
> >
> > Ok, sounds like there is 32*8*2 bytes of data, rather than the normal
> > 32*8 bytes for FP only (ignoring VSX). =A0If this is the case, then you'l=
> l
> > need make 'fpr' in the thread struct bigger which you can do by setting
> > TS_FPRWIDTH =3D 2 like we do for VSX.
> >
> 
> Okay, I'll incorporate that into [V3].
> 
> > If there is some instruction that saves and restores two of these at a
> > time (which LFPDX/STFPDX might I guess), then we can use that, otherwise
> > we'll have to do 64 saves/restores. =A0Double load/stores will be faster
> > I'm guessing though.
> 
> I assume that's true.
> 
> >
> > If two at a time, do we need to increase the index in pairs?
> >
> 
> I don't believe so.
> 
> >> If this is not the way to go, I can certainly
> >> switch the kittyhawk version of the patch with the *, the extra
> >> SAVE32SFPR and the extra double hummer specific storage space in the
> >> thread_struct.
> >
> > I'd be tempted to keep it in the 'fpr' part of the struct so you can
> > then access it with ptrace/signals/core dumps.
> >
> >> If it would help I can post an alternate version of the patch for
> >> discussion with the kittyhawk version.
> >
> > Sure.
> >
> 
> Kittyhawk version can be seen here:
> 
> http://git.kernel.org/?p=3Dlinux/kernel/git/ericvh/bluegene.git;a=3Dcommitd=
> iff;h=3D94bffe786324b9bd07187b11afd836e3ec362d95

OK.  I can see the secondary.

BTW I think it's buggy in a different way.

--- a/arch/powerpc/kernel/fpu.S
+++ b/arch/powerpc/kernel/fpu.S
@@ -51,6 +51,9 @@ _GLOBAL(load_up_fpu)
        toreal(r4)
        addi    r4,r4,THREAD            /* want last_task_used_math->thread */
        SAVE_32FPRS(0, r4)
+#ifdef CONFIG_DOUBLE_HUMMER
+        SAVE_32SFPRS(0, r10, r3)
+#endif /* CONFIG_DOUBLE_HUMMER */
        mffs    fr0
        stfd    fr0,THREAD_FPSCR(r4)
        PPC_LL  r5,PT_REGS(r4)
@@ -78,6 +81,9 @@ _GLOBAL(load_up_fpu)
        lfd     fr0,THREAD_FPSCR(r5)
        MTFSF_L(fr0)
        REST_32FPRS(0, r5)
+#ifdef  CONFIG_DOUBLE_HUMMER
+        REST_32SFPRS(0, r10, r5)
+#endif  /* CONFIG_DOUBLE_HUMMER */

REST uses r5 as the base in both cases (primary and secondary) which is
good.  SAVE uses r4 in the primary case and r3 in the secondary, which
is the wrong base. 

> 
> >
> > The most useful thing would be to see the instruction definition for
> > STFPDX/LFPDX.
> >
> 
> https://wiki.alcf.anl.gov/images/d/d9/PPC440_FP2_arch.pdf

stfpdx  does Primary->DW[EA]  Secondary->DW[EA+8]

I'm tempted to continue to use this and store the data in 'fpr' in the
thread_struct.  Doing it this way the primary register will continue to
be in the same location as before, which will mean ptrace etc will
continue to work at least for the primary.  The secondary will be
accessible using ptrace etc as well, but it'll be a bit of kludge
because it'll appear in the VSX location.

Putting the secondary register in a new area in the thread struct will
mean it's totally inaccessible for debugging without extra code in
ptrace.c/signals.c etc

We are going to need 16x spacing but you are doing to have to increase
the size using TS_FPRWIDTH = 2.

> >>
> >> >> =3DA0/*
> >> >> diff --git a/arch/powerpc/platforms/44x/Kconfig b/arch/powerpc/platfo=
> rms=3D
> >> /44x/
> >> > Kconfig
> >> >> index f485fc5f..24a515e 100644
> >> >> --- a/arch/powerpc/platforms/44x/Kconfig
> >> >> +++ b/arch/powerpc/platforms/44x/Kconfig
> >> >> @@ -169,6 +169,15 @@ config YOSEMITE
> >> >> =3DA0 =3DA0 =3DA0 help
> >> >> =3DA0 =3DA0 =3DA0 =3DA0 This option enables support for the AMCC PPC4=
> 40EP evalua=3D
> >> tion board.
> >> >>
> >> >> +config =3DA0 =3DA0 =3DA0 BGP
> >> >
> >> > Does this FPU feature have a specific name like double hammer? =3DA0I'=
> d
> >> > rather have the BGP defconfig depend on PPC_FPU_DOUBLE_HUMMER, or
> >> > something like that...
> >> >
> >> >> + =3DA0 =3DA0 bool "Blue Gene/P"
> >> >> + =3DA0 =3DA0 depends on 44x
> >> >> + =3DA0 =3DA0 default n
> >> >> + =3DA0 =3DA0 select PPC_FPU
> >> >> + =3DA0 =3DA0 select PPC_DOUBLE_FPU
> >> >
> >> > ... in fact, it seem you are doing something like these here but you
> >> > don't use PPC_DOUBLE_FPU anywhere?
> >> >
> >>
> >> A fair point. =A0I'm fine with calling it DOUBLE_HUMMER, but I wasn't su=
> re if
> >> that was "too internal" of a name for the kernel. =A0Let me know and
> >> I'll fix it up.
> >
> > What I'm mostly concerned about is disassociating it with a particular
> > CPU.
> >
> > If it has an external name, then all the better.
> >
> 
> Since it isn't available on other chips, shoudl it just be PPC_BGP_FPU
> or PPC_BGP_DOUBLE_FPU?

I'd probably still prefer it disassociated with the CPU name, but we are
really bike shedding here.  I'm not too fussed.

Mikey

WARNING: multiple messages have this Message-ID (diff)
From: Michael Neuling <mikey@neuling.org>
To: Eric Van Hensbergen <ericvh@gmail.com>
Cc: linuxppc-dev@lists.ozlabs.org, linux-kernel@vger.kernel.org,
	bg-linux@lists.anl-external.org
Subject: Re: [PATCH 3/7] [RFC] add support for BlueGene/P FPU
Date: Fri, 20 May 2011 09:16:41 +1000	[thread overview]
Message-ID: <6241.1305847001@neuling.org> (raw)
In-Reply-To: <BANLkTi=rc5vZm3xAXHpHSxSH1wBWKhv92A@mail.gmail.com>

In message <BANLkTi=rc5vZm3xAXHpHSxSH1wBWKhv92A@mail.gmail.com> you wrote:
> Damnit Mikey, just after I hit send on [V2].....
> 
> On Thu, May 19, 2011 at 4:36 PM, Michael Neuling <mikey@neuling.org> wrote:
> > In message <BANLkTimKhApFW8G1-pG0u_9Kv2YB0R1O0w@mail.gmail.com> you wrote=
> :
> >> On Thu, May 19, 2011 at 12:58 AM, Michael Neuling <mikey@neuling.org> wr=
> ote=3D
> >> :
> >> > Eric,
> >> >
> >> >> This patch adds save/restore register support for the BlueGene/P
> >> >> double hummer FPU.
> >> >
> >> > What does this mean? =3DA0Needs more details here.
> >> >
> 
> okay, I've changed it a bit in [V2], if you want more I can do my best.

If you can describe the whole primary and secondary registers that'd be
cool.  ASCII art would be awesome! :-)


> 
> >> "Each of the two FPU units contains 32 64-bit floating point registers
> >> for a total of 64 FP registers per processor." which would seem to
> >> point to the kittyhawk version - but they have a second SAVE_32SFPRS
> >> for the second hummer. =A0What wasn't clear to me with this version of
> >> the code was whether or not they were doing something clever like
> >> saving the pair of the 64-bit FPU registers in a single 128-bit slot
> >> (seems plausible).
> >
> > Ok, sounds like there is 32*8*2 bytes of data, rather than the normal
> > 32*8 bytes for FP only (ignoring VSX). =A0If this is the case, then you'l=
> l
> > need make 'fpr' in the thread struct bigger which you can do by setting
> > TS_FPRWIDTH =3D 2 like we do for VSX.
> >
> 
> Okay, I'll incorporate that into [V3].
> 
> > If there is some instruction that saves and restores two of these at a
> > time (which LFPDX/STFPDX might I guess), then we can use that, otherwise
> > we'll have to do 64 saves/restores. =A0Double load/stores will be faster
> > I'm guessing though.
> 
> I assume that's true.
> 
> >
> > If two at a time, do we need to increase the index in pairs?
> >
> 
> I don't believe so.
> 
> >> If this is not the way to go, I can certainly
> >> switch the kittyhawk version of the patch with the *, the extra
> >> SAVE32SFPR and the extra double hummer specific storage space in the
> >> thread_struct.
> >
> > I'd be tempted to keep it in the 'fpr' part of the struct so you can
> > then access it with ptrace/signals/core dumps.
> >
> >> If it would help I can post an alternate version of the patch for
> >> discussion with the kittyhawk version.
> >
> > Sure.
> >
> 
> Kittyhawk version can be seen here:
> 
> http://git.kernel.org/?p=3Dlinux/kernel/git/ericvh/bluegene.git;a=3Dcommitd=
> iff;h=3D94bffe786324b9bd07187b11afd836e3ec362d95

OK.  I can see the secondary.

BTW I think it's buggy in a different way.

--- a/arch/powerpc/kernel/fpu.S
+++ b/arch/powerpc/kernel/fpu.S
@@ -51,6 +51,9 @@ _GLOBAL(load_up_fpu)
        toreal(r4)
        addi    r4,r4,THREAD            /* want last_task_used_math->thread */
        SAVE_32FPRS(0, r4)
+#ifdef CONFIG_DOUBLE_HUMMER
+        SAVE_32SFPRS(0, r10, r3)
+#endif /* CONFIG_DOUBLE_HUMMER */
        mffs    fr0
        stfd    fr0,THREAD_FPSCR(r4)
        PPC_LL  r5,PT_REGS(r4)
@@ -78,6 +81,9 @@ _GLOBAL(load_up_fpu)
        lfd     fr0,THREAD_FPSCR(r5)
        MTFSF_L(fr0)
        REST_32FPRS(0, r5)
+#ifdef  CONFIG_DOUBLE_HUMMER
+        REST_32SFPRS(0, r10, r5)
+#endif  /* CONFIG_DOUBLE_HUMMER */

REST uses r5 as the base in both cases (primary and secondary) which is
good.  SAVE uses r4 in the primary case and r3 in the secondary, which
is the wrong base. 

> 
> >
> > The most useful thing would be to see the instruction definition for
> > STFPDX/LFPDX.
> >
> 
> https://wiki.alcf.anl.gov/images/d/d9/PPC440_FP2_arch.pdf

stfpdx  does Primary->DW[EA]  Secondary->DW[EA+8]

I'm tempted to continue to use this and store the data in 'fpr' in the
thread_struct.  Doing it this way the primary register will continue to
be in the same location as before, which will mean ptrace etc will
continue to work at least for the primary.  The secondary will be
accessible using ptrace etc as well, but it'll be a bit of kludge
because it'll appear in the VSX location.

Putting the secondary register in a new area in the thread struct will
mean it's totally inaccessible for debugging without extra code in
ptrace.c/signals.c etc

We are going to need 16x spacing but you are doing to have to increase
the size using TS_FPRWIDTH = 2.

> >>
> >> >> =3DA0/*
> >> >> diff --git a/arch/powerpc/platforms/44x/Kconfig b/arch/powerpc/platfo=
> rms=3D
> >> /44x/
> >> > Kconfig
> >> >> index f485fc5f..24a515e 100644
> >> >> --- a/arch/powerpc/platforms/44x/Kconfig
> >> >> +++ b/arch/powerpc/platforms/44x/Kconfig
> >> >> @@ -169,6 +169,15 @@ config YOSEMITE
> >> >> =3DA0 =3DA0 =3DA0 help
> >> >> =3DA0 =3DA0 =3DA0 =3DA0 This option enables support for the AMCC PPC4=
> 40EP evalua=3D
> >> tion board.
> >> >>
> >> >> +config =3DA0 =3DA0 =3DA0 BGP
> >> >
> >> > Does this FPU feature have a specific name like double hammer? =3DA0I'=
> d
> >> > rather have the BGP defconfig depend on PPC_FPU_DOUBLE_HUMMER, or
> >> > something like that...
> >> >
> >> >> + =3DA0 =3DA0 bool "Blue Gene/P"
> >> >> + =3DA0 =3DA0 depends on 44x
> >> >> + =3DA0 =3DA0 default n
> >> >> + =3DA0 =3DA0 select PPC_FPU
> >> >> + =3DA0 =3DA0 select PPC_DOUBLE_FPU
> >> >
> >> > ... in fact, it seem you are doing something like these here but you
> >> > don't use PPC_DOUBLE_FPU anywhere?
> >> >
> >>
> >> A fair point. =A0I'm fine with calling it DOUBLE_HUMMER, but I wasn't su=
> re if
> >> that was "too internal" of a name for the kernel. =A0Let me know and
> >> I'll fix it up.
> >
> > What I'm mostly concerned about is disassociating it with a particular
> > CPU.
> >
> > If it has an external name, then all the better.
> >
> 
> Since it isn't available on other chips, shoudl it just be PPC_BGP_FPU
> or PPC_BGP_DOUBLE_FPU?

I'd probably still prefer it disassociated with the CPU name, but we are
really bike shedding here.  I'm not too fussed.

Mikey

  reply	other threads:[~2011-05-19 23:16 UTC|newest]

Thread overview: 79+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-05-18 21:24 [PATCH 1/7] [RFC] Mainline BG/P platform support Eric Van Hensbergen
2011-05-18 21:24 ` Eric Van Hensbergen
2011-05-18 21:24 ` [PATCH 2/7] [RFC] add bluegene entry to cputable Eric Van Hensbergen
2011-05-18 21:24   ` Eric Van Hensbergen
2011-05-20  0:35   ` Benjamin Herrenschmidt
2011-05-20  0:35     ` Benjamin Herrenschmidt
2011-05-20  1:08     ` Eric Van Hensbergen
2011-05-20  1:08       ` Eric Van Hensbergen
2011-05-20  1:50       ` Benjamin Herrenschmidt
2011-05-20  1:50         ` Benjamin Herrenschmidt
2011-05-18 21:24 ` [PATCH 3/7] [RFC] add support for BlueGene/P FPU Eric Van Hensbergen
2011-05-18 21:24   ` Eric Van Hensbergen
2011-05-19  5:58   ` Michael Neuling
2011-05-19  5:58     ` Michael Neuling
2011-05-19 13:53     ` Eric Van Hensbergen
2011-05-19 13:53       ` Eric Van Hensbergen
2011-05-19 15:22       ` [bg-linux] " Kazutomo Yoshii
2011-05-19 21:36       ` Michael Neuling
2011-05-19 21:36         ` Michael Neuling
2011-05-19 21:55         ` Eric Van Hensbergen
2011-05-19 21:55           ` Eric Van Hensbergen
2011-05-19 23:16           ` Michael Neuling [this message]
2011-05-19 23:16             ` Michael Neuling
2011-05-20  0:30             ` Eric Van Hensbergen
2011-05-20  0:30               ` Eric Van Hensbergen
2011-05-20  0:43               ` Michael Neuling
2011-05-20  0:43                 ` Michael Neuling
2011-05-20  0:53       ` Benjamin Herrenschmidt
2011-05-20  0:52     ` Benjamin Herrenschmidt
2011-05-20  0:52       ` Benjamin Herrenschmidt
2011-05-19 21:41   ` [PATCH 3/7] [RFC][V2] add support for BlueGene/P Double FPU Eric Van Hensbergen
2011-05-19 21:41     ` Eric Van Hensbergen
2011-05-18 21:24 ` [PATCH 4/7] [RFC] enable L1_WRITETHROUGH mode for BG/P Eric Van Hensbergen
2011-05-18 21:24   ` Eric Van Hensbergen
2011-05-19 10:43   ` Josh Boyer
2011-05-19 10:43     ` Josh Boyer
2011-05-19 12:53     ` Eric Van Hensbergen
2011-05-19 12:53       ` Eric Van Hensbergen
2011-05-19 21:42   ` [PATCH 4/7] [RFC][V2] enable BGP_L1_WRITETHROUGH " Eric Van Hensbergen
2011-05-19 21:42     ` Eric Van Hensbergen
2011-05-20  1:01   ` [PATCH 4/7] [RFC] enable L1_WRITETHROUGH " Benjamin Herrenschmidt
2011-05-20  1:01     ` Benjamin Herrenschmidt
2011-05-18 21:24 ` [PATCH 5/7] [RFC] force 32-byte aligned kmallocs Eric Van Hensbergen
2011-05-18 21:24   ` Eric Van Hensbergen
2011-05-20  0:36   ` Benjamin Herrenschmidt
2011-05-20  0:36     ` Benjamin Herrenschmidt
2011-05-20  0:47     ` Eric Van Hensbergen
2011-05-20  0:47       ` Eric Van Hensbergen
2011-05-20  1:50       ` Benjamin Herrenschmidt
2011-05-20  1:50         ` Benjamin Herrenschmidt
2011-05-20  1:32     ` [bg-linux] " Kazutomo Yoshii
2011-05-20  2:08       ` Benjamin Herrenschmidt
2011-05-20  2:08         ` Benjamin Herrenschmidt
2011-05-20  2:13         ` Benjamin Herrenschmidt
2011-05-20  2:13           ` Benjamin Herrenschmidt
2011-05-20  3:02           ` Kazutomo Yoshii
2011-05-20  3:13             ` Benjamin Herrenschmidt
2011-05-18 21:24 ` [PATCH 6/7] [RFC] enable early TLBs for BG/P Eric Van Hensbergen
2011-05-18 21:24   ` Eric Van Hensbergen
2011-05-20  0:39   ` Benjamin Herrenschmidt
2011-05-20  0:39     ` Benjamin Herrenschmidt
2011-05-20  1:21     ` Eric Van Hensbergen
2011-05-20  1:21       ` Eric Van Hensbergen
2011-05-20  1:54       ` Benjamin Herrenschmidt
2011-05-20  1:54         ` Benjamin Herrenschmidt
2011-05-20  3:38         ` [bg-linux] " Kazutomo Yoshii
2011-05-20  3:38           ` Kazutomo Yoshii
2011-05-20  3:52           ` Benjamin Herrenschmidt
2011-05-20  3:52             ` Benjamin Herrenschmidt
2011-05-20 13:01             ` Eric Van Hensbergen
2011-05-20 22:20               ` Benjamin Herrenschmidt
2011-05-18 21:24 ` [PATCH 7/7] [RFC] SMP support code Eric Van Hensbergen
2011-05-18 21:24   ` Eric Van Hensbergen
2011-05-20  1:05   ` Benjamin Herrenschmidt
2011-05-20  1:05     ` Benjamin Herrenschmidt
2011-05-19 11:01 ` [PATCH 1/7] [RFC] Mainline BG/P platform support Josh Boyer
2011-05-19 11:01   ` Josh Boyer
2011-05-19 12:35   ` Eric Van Hensbergen
2011-05-19 12:35     ` Eric Van Hensbergen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=6241.1305847001@neuling.org \
    --to=mikey@neuling.org \
    --cc=bg-linux@lists.anl-external.org \
    --cc=ericvh@gmail.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linuxppc-dev@lists.ozlabs.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.