* SH sigcontext ABI is broken
@ 2015-06-19  7:09 Rich Felker
  2015-06-19  7:41 ` Andreas Schwab
                   ` (23 more replies)
  0 siblings, 24 replies; 25+ messages in thread
From: Rich Felker @ 2015-06-19  7:09 UTC (permalink / raw)
  To: linux-sh

Presently the SH version of the sigcontext structure, and thus
mcontext_t/ucontext_t, varies in a way that mismatches and breaks ABI.
On the kernel side, whether it has space for FPU registers (or worse,
uses a completely different SH5 layout) depends on whether the kernel
was built for hardware with or without FPU (or for pre-SH5 vs SH5). On
the userspace side, glibc always uses the pre-SH5 layout, but whether
it has space for FPU registers depends on whether the _userspace_
binary was compiled for FPU or no-FPU. This can and does mismatch the
kernel's definition when a no-FPU binary is being run on
hardware/kernel with FPU, and the mismatch is particularly bad because
the uc_sigmask member, which signal handlers can legitimately inspect,
moves around depending on which version of the structure is in use.
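
To make the mismatch concrete, here is a minimal sketch of the two
layouts. The struct and function names (sc_nofpu, sc_fpu, uc_nofpu,
uc_fpu, sigmask_shift, check_layouts) are hypothetical, and the field
sizes are assumptions chosen to model a 32-bit target, not copies of
the kernel or glibc headers. It only shows that a member placed after
the machine context lands at a different offset once the FPU block is
present.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative stand-ins only: field names follow the description in
 * this thread, but sizes and ordering are assumptions. Fixed-width
 * types model a 32-bit SH target regardless of the build host. */
struct sc_nofpu {
    uint32_t oldmask;
    uint32_t sc_regs[16];
    uint32_t sc_pc, sc_pr, sc_sr, sc_gbr, sc_mach, sc_macl;
};

struct sc_fpu {
    uint32_t oldmask;
    uint32_t sc_regs[16];
    uint32_t sc_pc, sc_pr, sc_sr, sc_gbr, sc_mach, sc_macl;
    uint32_t sc_fpregs[16];            /* only in the FPU layout */
    uint32_t sc_xfpregs[16];
    uint32_t sc_fpscr, sc_fpul, sc_ownedfp;
};

/* Hypothetical ucontext wrappers: the signal mask follows the
 * machine context, so its offset depends on the context's size. */
struct uc_nofpu {
    uint32_t uc_flags, uc_link;
    uint32_t uc_stack[3];
    struct sc_nofpu uc_mcontext;
    uint32_t uc_sigmask[2];
};

struct uc_fpu {
    uint32_t uc_flags, uc_link;
    uint32_t uc_stack[3];
    struct sc_fpu uc_mcontext;
    uint32_t uc_sigmask[2];
};

/* Distance, in bytes, that uc_sigmask moves between the two layouts. */
size_t sigmask_shift(void)
{
    return offsetof(struct uc_fpu, uc_sigmask) -
           offsetof(struct uc_nofpu, uc_sigmask);
}

/* The shift is exactly the size of the extra FPU block. */
void check_layouts(void)
{
    assert(sigmask_shift() ==
           sizeof(struct sc_fpu) - sizeof(struct sc_nofpu));
}
```

A handler compiled against one layout and handed the other would read
uc_sigmask from the wrong place by exactly that distance.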

I did some research and this issue goes way back, to before the
beginning of the kernel git repository.

I see two possible ways forward. The complex but "compatible" (if
there's even such a thing as "compatible" with this mess) is
introducing new personalities for hardfloat vs softfloat sigcontext
ABIs, and having the kernel generate the proper layout for the
personality in use. The way I would prefer is just getting rid of the
#ifdefs around the fpu registers and always including storage for them
in the sigcontext, regardless of whether the machine has FPU. This
would be an ABI "change" in some sense for no-FPU environments, but
being that the ABI is already broken depending on which kernel you're
running, and nobody seems to have even noticed or cared up til now, I
think it's justified.

The SH5 layout is another matter and it's not clear to me whether SH5
can even run 32-bit binaries or whether it's essentially a completely
different arch -- I've never worked with it.

Rich


* Re: SH sigcontext ABI is broken
  2015-06-19  7:09 SH sigcontext ABI is broken Rich Felker
@ 2015-06-19  7:41 ` Andreas Schwab
  2015-06-19 19:12 ` [musl] " Rich Felker
                   ` (22 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: Andreas Schwab @ 2015-06-19  7:41 UTC (permalink / raw)
  To: linux-sh

Rich Felker <dalias@libc.org> writes:

> I did some research and this issue goes way back, to before the
> beginning of the kernel git repository.

There are various git trees that render the pre-git history, see
<http://stackoverflow.com/questions/3264283/linux-kernel-historical-git-repository-with-full-history>
for pointers.

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."


* Re: [musl] Re: SH sigcontext ABI is broken
  2015-06-19  7:09 SH sigcontext ABI is broken Rich Felker
  2015-06-19  7:41 ` Andreas Schwab
@ 2015-06-19 19:12 ` Rich Felker
  2015-06-19 19:57 ` Andreas Schwab
                   ` (21 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: Rich Felker @ 2015-06-19 19:12 UTC (permalink / raw)
  To: linux-sh

On Fri, Jun 19, 2015 at 09:41:58AM +0200, Andreas Schwab wrote:
> Rich Felker <dalias@libc.org> writes:
> 
> > I did some research and this issue goes way back, to before the
> > beginning of the kernel git repository.
> 
> There are various git trees that render the pre-git history, see
> <http://stackoverflow.com/questions/3264283/linux-kernel-historical-git-repository-with-full-history>
> for pointers.

Thanks, but most of the links seem to be broken. Do you think there
would be value (in terms of helping determine how to solve this
problem) in digging up the history further?

Rich
--
To unsubscribe from this list: send the line "unsubscribe linux-sh" in


* Re: [musl] Re: SH sigcontext ABI is broken
  2015-06-19  7:09 SH sigcontext ABI is broken Rich Felker
  2015-06-19  7:41 ` Andreas Schwab
  2015-06-19 19:12 ` [musl] " Rich Felker
@ 2015-06-19 19:57 ` Andreas Schwab
  2015-06-19 20:32 ` Rich Felker
                   ` (20 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: Andreas Schwab @ 2015-06-19 19:57 UTC (permalink / raw)
  To: linux-sh

Rich Felker <dalias@libc.org> writes:

> Thanks, but most of the links seem to be broken.

Are they?  I'm only seeing a single broken link, which has a mirror.

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."

* Re: [musl] Re: SH sigcontext ABI is broken
  2015-06-19  7:09 SH sigcontext ABI is broken Rich Felker
                   ` (2 preceding siblings ...)
  2015-06-19 19:57 ` Andreas Schwab
@ 2015-06-19 20:32 ` Rich Felker
  2015-06-20  8:10 ` Geert Uytterhoeven
                   ` (19 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: Rich Felker @ 2015-06-19 20:32 UTC (permalink / raw)
  To: linux-sh

On Fri, Jun 19, 2015 at 09:57:22PM +0200, Andreas Schwab wrote:
> Rich Felker <dalias@libc.org> writes:
> 
> > Thanks, but most of the links seem to be broken.
> 
> Are they?  I'm only seeing a single broken link, which has a mirror.

My bad. Indeed only the davej one is broken, but that's where the code
must have been introduced (even the earliest commit in tglx
history.git has the #ifdef __SH4__ for FPU regs) and I can't find a
cgit interface to it. Fetching several GB to browse history locally is
going to take a while if I have to do that..

Rich

* Re: [musl] Re: SH sigcontext ABI is broken
  2015-06-19  7:09 SH sigcontext ABI is broken Rich Felker
                   ` (3 preceding siblings ...)
  2015-06-19 20:32 ` Rich Felker
@ 2015-06-20  8:10 ` Geert Uytterhoeven
  2015-06-20 18:06 ` [musl] " Rich Felker
                   ` (18 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: Geert Uytterhoeven @ 2015-06-20  8:10 UTC (permalink / raw)
  To: linux-sh

On Fri, Jun 19, 2015 at 10:32 PM, Rich Felker <dalias@libc.org> wrote:
> On Fri, Jun 19, 2015 at 09:57:22PM +0200, Andreas Schwab wrote:
>> Rich Felker <dalias@libc.org> writes:
>>
>> > Thanks, but most of the links seem to be broken.
>>
>> Are they?  I'm only seeing a single broken link, which has a mirror.
>
> My bad. Indeed only the davej one is broken, but that's where the code
> must have been introduced (even the earliest commit in tglx
> history.git has the #ifdef __SH4__ for FPU regs) and I can't find a
> cgit interface to it. Fetching several GB to browse history locally is
> going to take a while if I have to do that..

Using web interfaces for archeology doesn't fly.
If you're doing serious Linux work, you should already have a git repository
of the kernel. full-history-linux.git.tar weighs in at only ca. 0.5 GiB.

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

* Re: [musl] SH sigcontext ABI is broken
  2015-06-19  7:09 SH sigcontext ABI is broken Rich Felker
                   ` (4 preceding siblings ...)
  2015-06-20  8:10 ` Geert Uytterhoeven
@ 2015-06-20 18:06 ` Rich Felker
  2015-06-20 19:59 ` [musl] " Rob Landley
                   ` (17 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: Rich Felker @ 2015-06-20 18:06 UTC (permalink / raw)
  To: linux-sh

On Fri, Jun 19, 2015 at 03:09:12AM -0400, Rich Felker wrote:
> Presently the SH version of the sigcontext structure, and thus
> mcontext_t/ucontext_t, varies in a way that mismatches and breaks ABI.
> On the kernel side, whether it has space for FPU registers (or worse,
> uses a completely different SH5 layout) depends on whether the kernel
> was built for hardware with or without FPU (or for pre-SH5 vs SH5). On
> the userspace side, glibc always uses the pre-SH5 layout, but whether
> it has space for FPU registers depends on whether the _userspace_
> binary was compiled for FPU or no-FPU. This can and does mismatch the
> kernel's definition when a no-FPU binary is being run on
> hardware/kernel with FPU, and the mismatch is particularly bad because
> the uc_sigmask member, which signal handlers can legitimately inspect,
> moves around depending on which version of the structure is in use.
> 
> I did some research and this issue goes way back, to before the
> beginning of the kernel git repository.

From further research (glibc repo), it looks like glibc has had
separate definitions of ucontext for sh3/sh4 ever since it supported
them, and apparently never considered what happens if you run sh3
binaries on a sh4 kernel/hardware. But I don't have pre-git history so
maybe there are ancient details I'm missing.

From one copy of the davej history repo I found, the FPU registers
seem to have been added in 2.3.99pre4-1. Before that, signal.c had no
support at all for saving/restoring FPU registers, as far as I can
tell, despite entry.S handling them for context switches on sh4. At
the same time the FPU registers were added, the layout of the base
register set was also changed; sp was moved from sc_sp in its own
discontiguous location to sc_regs[15].

So there's a lot of historical mess and breakage here, but sh3
binaries have been running with a stable (albeit wrong, IMO)
definition of ucontext_t/mcontext_t/sigcontext for around 14 years
now (as long as they only run on sh3 hardware, not sh4). So I'm a bit
hesitant to consider this something that could be changed with no path
for compatibility.

What would be the right approach with personality? Is there any way
for the kernel to automatically set a personality based on the ELF
headers? There are two userspace ABIs anyway (fpu ABI, only available
on sh4 or sh2a, and nofpu ABI, available everywhere in theory but
presently broken if you run it on hardware with fpu) and they can be
distinguished by the e_flags ELF header. Alternatively, userspace
could be responsible for calling SYS_personality with the right value
in start code moving forward.
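
As a rough sketch of the e_flags idea, a loader could classify a
binary along these lines. The helper names and the enum are
hypothetical, and the EF_SH_* values are as recalled from binutils'
include/elf/sh.h, so treat them as assumptions to verify against real
headers. Nothing here reflects existing kernel code.

```c
#include <assert.h>
#include <stdint.h>

/* Machine-variant values as recalled from binutils' include/elf/sh.h.
 * Assumptions: check them against your toolchain before relying on them. */
#define EF_SH_MACH_MASK 0x1f
#define EF_SH2          2
#define EF_SH3          3
#define EF_SH4          9
#define EF_SH4A         12
#define EF_SH2A         13
#define EF_SH4_NOFPU    16
#define EF_SH4A_NOFPU   17
#define EF_SH2A_NOFPU   19

/* Hypothetical tag for which sigcontext layout a process expects. */
enum sh_sigctx_abi { SH_SIGCTX_NOFPU = 0, SH_SIGCTX_FPU = 1 };

/* Hypothetical helper: choose a layout from the ELF header's e_flags.
 * Anything not known to use the FPU ABI falls back to the nofpu layout. */
enum sh_sigctx_abi sh_sigctx_abi_from_eflags(uint32_t e_flags)
{
    switch (e_flags & EF_SH_MACH_MASK) {
    case EF_SH4:
    case EF_SH4A:
    case EF_SH2A:
        return SH_SIGCTX_FPU;
    default:
        return SH_SIGCTX_NOFPU;
    }
}

/* Usage example: an SH4 binary gets the FPU layout, SH4-nofpu does not. */
void sh_sigctx_selfcheck(void)
{
    assert(sh_sigctx_abi_from_eflags(EF_SH4) == SH_SIGCTX_FPU);
    assert(sh_sigctx_abi_from_eflags(EF_SH4_NOFPU) == SH_SIGCTX_NOFPU);
    assert(sh_sigctx_abi_from_eflags(EF_SH3) == SH_SIGCTX_NOFPU);
}
```

Defaulting unknown variants to the nofpu layout is a design choice for
this sketch only; a real implementation would have to decide what to
do with binaries whose flags are missing or wrong.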

Rich

* Re: [musl] Re: SH sigcontext ABI is broken
  2015-06-19  7:09 SH sigcontext ABI is broken Rich Felker
                   ` (5 preceding siblings ...)
  2015-06-20 18:06 ` [musl] " Rich Felker
@ 2015-06-20 19:59 ` Rob Landley
  2015-06-24  4:25 ` [musl] " Rob Landley
                   ` (16 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: Rob Landley @ 2015-06-20 19:59 UTC (permalink / raw)
  To: linux-sh

On 06/20/2015 03:10 AM, Geert Uytterhoeven wrote:
> On Fri, Jun 19, 2015 at 10:32 PM, Rich Felker <dalias@libc.org> wrote:
>> On Fri, Jun 19, 2015 at 09:57:22PM +0200, Andreas Schwab wrote:
>>> Rich Felker <dalias@libc.org> writes:
>>>
>>>> Thanks, but most of the links seem to be broken.
>>>
>>> Are they?  I'm only seeing a single broken link, which has a mirror.
>>
>> My bad. Indeed only the davej one is broken, but that's where the code
>> must have been introduced (even the earliest commit in tglx
>> history.git has the #ifdef __SH4__ for FPU regs) and I can't find a
>> cgit interface to it. Fetching several GB to browse history locally is
>> going to take a while if I have to do that..
> 
> Using web interfaces for archeology doesn't fly.
> If you're doing serious Linux work, you should already have a git repository
> of the kernel. full-history-linux.git.tar weighs in at only ca. 0.5 GiB.

I have a somewhat updated version of that at
http://landley.net/kdocs/local/linux-fullhist.tar.bz2 which I should
probably update for the 4.0 release. (It's pulled to 3.0 currently.)

Meanwhile you can download and extract that tarball, cd into it, then
"git checkout -f" followed by "git pull". (It doesn't have the checked
out files because it would make the tarball bigger, it's just the .git
directory.) That gives you a repository that goes from 0.0.1 to current,
although I haven't gone back and tagged the old releases yet.

Oh, you may want to edit .git/config so it pulls from linux instead of
linux-2.6. Largely cosmetic, but eh...

> Gr{oetje,eeting}s,
> 
>                         Geert

Rob

* Re: [musl] SH sigcontext ABI is broken
  2015-06-19  7:09 SH sigcontext ABI is broken Rich Felker
                   ` (6 preceding siblings ...)
  2015-06-20 19:59 ` [musl] " Rob Landley
@ 2015-06-24  4:25 ` Rob Landley
  2015-06-24  4:52 ` Rich Felker
                   ` (15 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: Rob Landley @ 2015-06-24  4:25 UTC (permalink / raw)
  To: linux-sh

On 06/20/2015 01:06 PM, Rich Felker wrote:
> So there's a lot of historical mess and breakage here, but sh3
> binaries have been running with a stable (albeit wrong, IMO)
> definition of ucontext_t/mcontext_t/sigcontext for around 14 years
> now (as long as they only run on sh3 hardware, not sh4). So I'm a bit
> hesitant to consider this something that could be changed with no path
> for compatibility.

I'm told SH3 was only on sale for about a year between its introduction
and sh4 coming out, at which point everybody switched. There were
significant sh2 deployments and significant sh4 deployments, but sh3 was
more or less a rounding error. The Wikipedia[citation needed] article
doesn't even break it out separately because there's really nothing to
say: https://en.wikipedia.org/?title=SuperH

(Again, there's a reason qemu-system-sh4 has a 4 in it. At $DAYJOB their
plan is to eventually jump from sh2 straight to sh4 because sh3 doesn't
matter.)

sh2a was a retcon, started shipping in 2007, a decade after the
dreamcast. Hitachi had already unloaded superh onto Renesas, which did a
big Not Invented Here on superh and kept trying to come up with their
own processor designs. The H in H8300 also stands for Hitachi, so you
can imagine how well Renesas supported it:

http://permalink.gmane.org/gmane.linux.ports.sh.devel/7237

Seriously, it only became interesting again when the patents expired...

Rob


* Re: [musl] SH sigcontext ABI is broken
  2015-06-19  7:09 SH sigcontext ABI is broken Rich Felker
                   ` (7 preceding siblings ...)
  2015-06-24  4:25 ` [musl] " Rob Landley
@ 2015-06-24  4:52 ` Rich Felker
  2015-06-24  7:12 ` Rob Landley
                   ` (14 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: Rich Felker @ 2015-06-24  4:52 UTC (permalink / raw)
  To: linux-sh

On Tue, Jun 23, 2015 at 11:25:08PM -0500, Rob Landley wrote:
> On 06/20/2015 01:06 PM, Rich Felker wrote:
> > So there's a lot of historical mess and breakage here, but sh3
> > binaries have been running with a stable (albeit wrong, IMO)
> > definition of ucontext_t/mcontext_t/sigcontext for around 14 years
> > now (as long as they only run on sh3 hardware, not sh4). So I'm a bit
> > hesitant to consider this something that could be changed with no path
> > for compatibility.
> 
> I'm told SH3 was only on sale for about a year between its introduction
> and sh4 coming out, at which point everybody switched. There were
> significant sh2 deployments and significant sh4 deployments, but sh3 was
> more or less a rounding error. The Wikipedia[citation needed] article
> doesn't even break it out separately because there's really nothing to
> say: https://en.wikipedia.org/?title=SuperH
> 
> (Again, there's a reason qemu-system-sh4 has a 4 in it. At $DAYJOB their
> plan is to eventually jump from sh2 straight to sh4 because sh3 doesn't
> matter.)
> 
> sh2a was a retcon, started shipping in 2007, a decade after the
> dreamcast. Hitachi had already unloaded superh onto Renesas, which did a
> big Not Invented Here on superh and kept trying to come up with their
> own processor designs. The H in H8300 also stands for Hitachi, so you
> can imagine how well Renesas supported it:
> 
> http://permalink.gmane.org/gmane.linux.ports.sh.devel/7237
> 
> Seriously, it only became interesting again when the patents expired...

It's easy to declare SH3 irrelevant when we're not using it, but if we
want SH in general to be a serious platform moving forward, there
needs to be proper attention to things like not breaking kernel
API/ABI and a concern for consensus among users of the platform.
Nominally SH3 support remains in both the kernel and glibc. If it can
be established that multiple parties agree that there's really no one
left who cares about the old no-FPU sigcontext ABI on SH3, I will be
all for dropping it and unifying sigcontext.

Perhaps a good starting point would be making SH2 (and SH1 if it's
even supported at all) use the SH4(/SH2A)-compatible sigcontext
layout. For these, I think it's completely implausible that existing
software depends on the layout.

Rich


* Re: [musl] SH sigcontext ABI is broken
  2015-06-19  7:09 SH sigcontext ABI is broken Rich Felker
                   ` (8 preceding siblings ...)
  2015-06-24  4:52 ` Rich Felker
@ 2015-06-24  7:12 ` Rob Landley
  2015-06-24  8:23 ` Rob Landley
                   ` (13 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: Rob Landley @ 2015-06-24  7:12 UTC (permalink / raw)
  To: linux-sh



On 06/23/2015 11:52 PM, Rich Felker wrote:
> On Tue, Jun 23, 2015 at 11:25:08PM -0500, Rob Landley wrote:
>> On 06/20/2015 01:06 PM, Rich Felker wrote:
>>> So there's a lot of historical mess and breakage here, but sh3
>>> binaries have been running with a stable (albeit wrong, IMO)
>>> definition of ucontext_t/mcontext_t/sigcontext for around 14 years
>>> now (as long as they only run on sh3 hardware, not sh4). So I'm a bit
>>> hesitant to consider this something that could be changed with no path
>>> for compatibility.
>>
>> I'm told SH3 was only on sale for about a year between its introduction
>> and sh4 coming out, at which point everybody switched. There were
>> significant sh2 deployments and significant sh4 deployments, but sh3 was
>> more or less a rounding error. The Wikipedia[citation needed] article
>> doesn't even break it out separately because there's really nothing to
>> say: https://en.wikipedia.org/?title=SuperH
>>
>> (Again, there's a reason qemu-system-sh4 has a 4 in it. At $DAYJOB their
>> plan is to eventually jump from sh2 straight to sh4 because sh3 doesn't
>> matter.)
>>
>> sh2a was a retcon, started shipping in 2007, a decade after the
>> dreamcast. Hitachi had already unloaded superh onto Renesas, which did a
>> big Not Invented Here on superh and kept trying to come up with their
>> own processor designs. The H in H8300 also stands for Hitachi, so you
>> can imagine how well Renesas supported it:
>>
>> http://permalink.gmane.org/gmane.linux.ports.sh.devel/7237
>>
>> Seriously, it only became interesting again when the patents expired...
> 
> It's easy to declare SH3 irrelevant when we're not using it,

If nobody is using it it's irrelevant, yes.

> but if we
> want SH in general to be a serious platform moving forward, there
> needs to be proper attention to things like not breaking kernel
> API/ABI and a concern for consensus among users of the platform.

You're aware that modern x86 processors dropped support for the
binary-coded-decimal instructions in the original 8086, right? Obviously
x86 is not a serious platform...

You're saying that historically there have been multiple incompatible
ABIs, which nobody noticed the brokenness of for years (clone system
call arguments, etc) because _if_ anybody was still using them
(unlikely) they haven't upgraded their kernel in years. (We found things
that wouldn't build with a 4.x toolchain but the people building a lot
of this were using a 2.x toolchain and pthreads, not nptl...)

As part of your "unified" binary you want to invent a new file format
(ELF/fdpic combo) that uses a new system call trap number, and you're
going to patch the kernel to understand this new stuff due to a concern
about backwards compatibility...?

I've lost the plot here, is what I'm saying.

> Nominally SH3 support remains in both the kernel and glibc. If it can
> be established that multiple parties agree that there's really no one
> left who cares about the old no-FPU sigcontext ABI on SH3, I will be
> all for dropping it and unifying sigcontext.

Multiple parties like who?

If you feel it important to create infrastructure in search of a user
unless I can prove a negative, it's your libc. But I really, really,
really don't see the point. "This is the interesting subset." "But
somebody else might exist!" "Wait to hear from them?"

> Perhaps a good starting point would be making SH2 (and SH1 if it's
> even supported at all) use the SH4(/SH2A)-compatible sigcontext
> layout. For these, I think it's completely implausible that existing
> software depends on the layout.

Your post said the FPU is what changed the layout. I don't think sh2 had
an FPU? (Again, sh2a first shipped in 2007...)

I don't understand why you want a common abi between a nommu system and
an mmu system which did not historically have the same system calls or
even use the same binary format. What's the point?

> Rich

Rob


* Re: [musl] SH sigcontext ABI is broken
  2015-06-19  7:09 SH sigcontext ABI is broken Rich Felker
                   ` (9 preceding siblings ...)
  2015-06-24  7:12 ` Rob Landley
@ 2015-06-24  8:23 ` Rob Landley
  2015-06-24  8:40 ` Rob Landley
                   ` (12 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: Rob Landley @ 2015-06-24  8:23 UTC (permalink / raw)
  To: linux-sh

On 06/24/2015 09:10 AM, Joseph Myers wrote:
> On Wed, 24 Jun 2015, Rich Felker wrote:
> 
>> Nominally SH3 support remains in both the kernel and glibc. If it can
>> be established that multiple parties agree that there's really no one
>> left who cares about the old no-FPU sigcontext ABI on SH3, I will be
>> all for dropping it and unifying sigcontext.
> 
> Note that right now we have BE and LE versions of *three* ABIs for SH in 
> glibc (SH3 soft-float, SH4 soft-float, SH4 hard-float) (and as noted in 
> this discussion, right now each would only work properly on a kernel with 
> the corresponding configuration).  See 
> <https://sourceware.org/glibc/wiki/ABIList>.
> 
> We can, of course, choose to declare processor or ABI variants no longer 
> supported in glibc, much like we desupported i386 in glibc (requiring i486 
> or later - albeit the official desupporting happening several years after 
> i386 would no longer build) or removed support for non-EABI ARM.  But 
> since we don't have an SH maintainer at all in glibc at present, it's 
> harder to make such a decision (whereas if an architecture maintainer 
> decided some variants were no longer relevant, they could just remove 
> support - make those variants give a configure-time error - in the absence 
> of someone objecting and willing to take over maintaining support for 
> those variants).
> 
> I think the next glibc change likely to require action from each 
> architecture's maintainer to avoid breaking the build may be Adhemerval's 
> cancellation changes - so if no-one comes forward as SH maintainer to at 
> least update SH for those changes when they are ready to go in, the build 
> for SH will be broken and that will indicate, as per 
> <https://sourceware.org/ml/libc-alpha/2015-06/msg00424.html>, that it may 
> be time to remove the port from glibc.

Eh, ping me when that happens. I may at least do necessary changes to
keep it building. (Although I can only test glibc on qemu-system-sh4.)

Rob



* Re: [musl] SH sigcontext ABI is broken
  2015-06-19  7:09 SH sigcontext ABI is broken Rich Felker
                   ` (10 preceding siblings ...)
  2015-06-24  8:23 ` Rob Landley
@ 2015-06-24  8:40 ` Rob Landley
  2015-06-24  9:14 ` Rob Landley
                   ` (11 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: Rob Landley @ 2015-06-24  8:40 UTC (permalink / raw)
  To: linux-sh



On 06/24/2015 01:03 PM, Rich Felker wrote:
> On Wed, Jun 24, 2015 at 02:12:58AM -0500, Rob Landley wrote:
>> I've lost the plot here, is what I'm saying.
> 
> OK, I'll try to get us back on it then.
> 
> To begin with, let's put aside musl, revival of SH, and anything new
> and just look at the existing situation.
> 
> Right now, SH3 or SH4-nofpu binaries are ABI-incompatible with SH4
> kernels.

On glibc? Never tested it, all my sh4 stuff was uclibc until musl showed
an interest.

> This incompatibility is in a place very few applications are
> going to use or care about, but it's essential for musl and it's going
> to be essential for glibc once they get around to fixing cancellation.
> 
> Likewise, SH2 binaries are incompatible with SH2A kernels and SH4
> kernels.

And nobody ever cared before because elf2flt or fdpic didn't run on
systems _with_ an mmu, so the binary packaging format incompatibility
would hit you long before anything else did.

> I can't imagine this being intentional. While the original SH2 work
> was not intended to produce binaries capable of running on later
> models, SH3 and SH4 were treated like a normal MMU-ful Linux arch,
> where it should always be possible to run a binary built for cpu
> revision M on an actual cpu revision N, where M<=N.

Nobody cared because sh3 was supplanted by sh4 literally a year after it
came out (Hitachi had a "new processor design each year" program going
for a bit, until they figured out why that was dumb). So sh4 was already
an order of magnitude more popular than sh3 by the time Linux grew
support for sh4.

> Since our new SH2 binaries (using ELF, musl, and possibly glibc if the
> port is not dropped) are also going to be compatible with running on
> later MMU-ful hardware (e.g. J4), I don't want this same issue to be a
> point of breakage for them.

This is the "lost the plot" part. I don't get it. What's the point? I do
not understand why you have this as a goal.

> The userspace SH2 ABI is nofpu (no float registers for float args), so
> there is already a separate userspace ABI for SH2 (and SH3) vs the
> usual SH4 ABI with float. That's not a problem.

Yes, a separate ABI for sh2 vs sh4 has not, historically speaking, been
a problem.

> Dynamic linked
> binaries have their own separate shared library ecosystem, and for
> static linked binaries, there's no userspace ABI boundary left once ld
> runs. However kernel-user ABI breakage is a show-stopper. It means
> that, even if you had the right ldso and libraries for nofpu SH2
> binaries, you couldn't safely run them on SH4 because the kernel would
> give you the wrong ucontext_t layout.

And historically used entirely different trap numbers for system calls,
although you made a kernel patch for that. And a couple more to the
kernel binary loader...

> If we want the SH-nofpu ABI to use the old nofpu ucontext_t layout,
> then the kernel (and qemu-user) is going to need a way to detect
> nofpu-ABI binaries and generate the right ucontext_t for them.

Or sh2 vs sh4 could be different compile-time targets with different
libc instances?

> If we switch to using the same ucontext_t layout everywhere, the
> kernel does not have to be smart, and the kernel ABI looks the same
> for all SH variants, but old binaries (if they depend on ucontext_t
> layout, which is _rare_ anyway) could break.

Old binaries can run under old kernels with old userspace partitions.

> My leaning at this point, especially since you say SH3 is irrelevant,
> is to use the same ucontext_t layout for them all (with the float reg
> space empty for nofpu chips). If any real-world old apps break and
> people care about them, we could make a personality that you set
> manually for old-nofpu ucontext_t layout. But I suspect the issue will
> just go away.

I suspect the issue will just go away too.

After more patents expire next year, we can add full sh4 compatibility
to j-core. If we want a better userspace api ala x86's x32 or mips
o32/n32/nubi or arm's oabi/eabi, we can do that. (In fact that's one of
0pf.org's goals, kawasaki-san is _trying_ to run a standards body. If
you want to wave an abi proposal at him for comment, he is the original
superh architect...)

I want musl to support sh2 but I _also_ want it to support coldfire and
h8300 and so on. If musl is the successor to uclibc (which needs to be
put out of its misery), it needs nommu support for several different
architectures. If you insist that every nommu architecture must also run
those nommu binaries on with-mmu sibling architectures, you're going to
be unifying coldfire and m68k next...

> Rich

Rob


* Re: [musl] SH sigcontext ABI is broken
  2015-06-19  7:09 SH sigcontext ABI is broken Rich Felker
                   ` (11 preceding siblings ...)
  2015-06-24  8:40 ` Rob Landley
@ 2015-06-24  9:14 ` Rob Landley
  2015-06-24 14:10 ` Joseph Myers
                   ` (10 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: Rob Landley @ 2015-06-24  9:14 UTC (permalink / raw)
  To: linux-sh



On 06/24/2015 01:12 PM, Rich Felker wrote:
> On Wed, Jun 24, 2015 at 02:10:06PM +0000, Joseph Myers wrote:
>> On Wed, 24 Jun 2015, Rich Felker wrote:
>>
>>> Nominally SH3 support remains in both the kernel and glibc. If it can
>>> be established that multiple parties agree that there's really no one
>>> left who cares about the old no-FPU sigcontext ABI on SH3, I will be
>>> all for dropping it and unifying sigcontext.
>>
>> Note that right now we have BE and LE versions of *three* ABIs for SH in 
>> glibc (SH3 soft-float, SH4 soft-float, SH4 hard-float) (and as noted in 
>> this discussion, right now each would only work properly on a kernel with 
>> the corresponding configuration).  See 
>> <https://sourceware.org/glibc/wiki/ABIList>.
> 
> Is your understanding that SH4 soft-float is using the SH4 ucontext_t
> layout? I don't think it's even working at all.

I never bothered to test floating point on it. It doesn't come up much
with anything I do, and qemu's floating point emulation is notoriously
dicey.

If I do an x86-64 linux from scratch build the perl build dies with:
https://twitter.com/landley/status/571883794279493633

Of course it doesn't happen in a chroot or using distcc to call out to
the cross compiler, only when gcc does those floating point calculations
under qemu-system-x86_64. (Presumably it wouldn't happen if I was using
kvm instead of qemu either...) Given that, trying to prove anything
about qemu-system-sh4's floating point seemed like a waste of time.

> Glibc uses the layout
> with fpu registers only if __SH4__ or __SH4A__ is defined,

I've never built glibc for sh4. I could try installing the old debian
sh4 chroot? (What release was that, squiggy? I tried installing Debian's
alpha lenny chroot yesterday and "apt-get update" in the chroot is
failing trying to hand off the wget data to gzip. Something with pipes
in qemu-alpha application emulation, I think. It's on the todo list.)

If you're curious, I was following the qemu-debootstrap instructions on
https://wiki.debian.org/ArmHardFloatChroot substituting in info from
https://www.debian.org/ports/ (hence the ping on #musl about whether
musl debian ports would be interesting). Also there's a debian sh4 page
at https://wiki.debian.org/SH4 so if I needed to poke at glibc for sh4,
that would probably be my starting point.

> but GCC
> does not define these macros when -m4-nofpu is used. Instead it
> defines both __SH3__ and __SH4_NOFPU__.

I hack around that sort of thing in builds all the time. Various bits of
gnu software only ever agree with each other (or anything else) by
coincidence.

> On the other hand, the kernel uses:
> 
> #if defined(__SH4__) || defined(CONFIG_CPU_SH4) || \
>     defined(__SH2A__) || defined(CONFIG_CPU_SH2A)
> 
> to determine whether to include the FPU regs in the struct.
> CONFIG_CPU_SH4 is presumably defined whenever the kernel is built for
> the SH4 entry point code. So I don't think it's even possible to build
> a kernel that's compatible with glibc's SH4 soft-float.

You think this is in any way unusual?

http://landley.net/hg/aboriginal/file/tip/sources/patches

Patching stuff to make this kind of thing match up during a build is
_normal_. It means you're not on x86 (or these days, arm).

> This seems to have been a silent ABI regression in glibc when the sh
> sys/* sysdep headers were merged. Back when there were separate
> versions in the sh3 and sh4 dirs, it _should_ have worked with the
> kernel's definitions.

Embedded development 101: first time the package broke most of the
userbase just didn't upgrade to the broken version. If they're stuck on
2.4 (or 2.0!) as a result, and the device wasn't connected to the
internet, they did not care. (The sad parts are where the device IS
connected to the internet and they _still_ don't care.)

> I think this level of breakage (that nobody seems to have noticed or
> cared about) is sufficient to say let's just throw out the old no-fpu
> ucontext_t and use the same struct everywhere for now. We can always
> add a personality to get the old one back if anyone ever needs it.

Seriously, the person you should be talking to is either Jeff (founder
of uclinux.org) or Kawasaki-san (original superh architect). I can
forward questions to 'em, but we've established that I'm a very
inefficient intermediary. :)

>> I think the next glibc change likely to require action from each 
>> architecture's maintainer to avoid breaking the build may be Adhemerval's 
>> cancellation changes - so if no-one comes forward as SH maintainer to at 
>> least update SH for those changes when they are ready to go in, the build 
>> for SH will be broken and that will indicate, as per 
>> <https://sourceware.org/ml/libc-alpha/2015-06/msg00424.html>, that it may 
>> be time to remove the port from glibc.
> 
> I may be available to do the cancellation changes (it's my design, so
> I'm familiar with the requirements), but I'll probably have to get
> copyright assignment paperwork taken care of first.

Ah right, copyright assignment. Rich is a much better choice to do this
then.

> Rich

Rob

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [musl] SH sigcontext ABI is broken
  2015-06-19  7:09 SH sigcontext ABI is broken Rich Felker
                   ` (12 preceding siblings ...)
  2015-06-24  9:14 ` Rob Landley
@ 2015-06-24 14:10 ` Joseph Myers
  2015-06-24 18:03 ` Rich Felker
                   ` (9 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: Joseph Myers @ 2015-06-24 14:10 UTC (permalink / raw)
  To: linux-sh

On Wed, 24 Jun 2015, Rich Felker wrote:

> Nominally SH3 support remains in both the kernel and glibc. If it can
> be established that multiple parties agree that there's really no one
> left who cares about the old no-FPU sigcontext ABI on SH3, I will be
> all for dropping it and unifying sigcontext.

Note that right now we have BE and LE versions of *three* ABIs for SH in 
glibc (SH3 soft-float, SH4 soft-float, SH4 hard-float) (and as noted in 
this discussion, right now each would only work properly on a kernel with 
the corresponding configuration).  See 
<https://sourceware.org/glibc/wiki/ABIList>.

We can, of course, choose to declare processor or ABI variants no longer 
supported in glibc, much like we desupported i386 in glibc (requiring i486 
or later - albeit the official desupporting happening several years after 
i386 would no longer build) or removed support for non-EABI ARM.  But 
since we don't have an SH maintainer at all in glibc at present, it's 
harder to make such a decision (whereas if an architecture maintainer 
decided some variants were no longer relevant, they could just remove 
support - make those variants give a configure-time error - in the absence 
of someone objecting and willing to take over maintaining support for 
those variants).

I think the next glibc change likely to require action from each 
architecture's maintainer to avoid breaking the build may be Adhemerval's 
cancellation changes - so if no-one comes forward as SH maintainer to at 
least update SH for those changes when they are ready to go in, the build 
for SH will be broken and that will indicate, as per 
<https://sourceware.org/ml/libc-alpha/2015-06/msg00424.html>, that it may 
be time to remove the port from glibc.

-- 
Joseph S. Myers
joseph@codesourcery.com

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [musl] SH sigcontext ABI is broken
  2015-06-19  7:09 SH sigcontext ABI is broken Rich Felker
                   ` (13 preceding siblings ...)
  2015-06-24 14:10 ` Joseph Myers
@ 2015-06-24 18:03 ` Rich Felker
  2015-06-24 18:12 ` Rich Felker
                   ` (8 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: Rich Felker @ 2015-06-24 18:03 UTC (permalink / raw)
  To: linux-sh

On Wed, Jun 24, 2015 at 02:12:58AM -0500, Rob Landley wrote:
> I've lost the plot here, is what I'm saying.

OK, I'll try to get us back on it then.

To begin with, let's put aside musl, revival of SH, and anything new
and just look at the existing situation.

Right now, SH3 or SH4-nofpu binaries are ABI-incompatible with SH4
kernels. This incompatibility is in a place very few applications are
going to use or care about, but it's essential for musl and it's going
to be essential for glibc once they get around to fixing cancellation.

Likewise, SH2 binaries are incompatible with SH2A kernels and SH4
kernels.

I can't imagine this being intentional. While the original SH2 work
was not intended to produce binaries capable of running on later
models, SH3 and SH4 were treated like a normal MMU-ful Linux arch,
where it should always be possible to run a binary built for cpu
revision M on an actual cpu revision N, where M<=N.

Since our new SH2 binaries (using ELF, musl, and possibly glibc if the
port is not dropped) are also going to be compatible with running on
later MMU-ful hardware (e.g. J4), I don't want this same issue to be a
point of breakage for them.

The userspace SH2 ABI is nofpu (no float registers for float args), so
there is already a separate userspace ABI for SH2 (and SH3) vs the
usual SH4 ABI with float. That's not a problem. Dynamic linked
binaries have their own separate shared library ecosystem, and for
static linked binaries, there's no userspace ABI boundary left once ld
runs. However kernel-user ABI breakage is a show-stopper. It means
that, even if you had the right ldso and libraries for nofpu SH2
binaries, you couldn't safely run them on SH4 because the kernel would
give you the wrong ucontext_t layout.
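
To make the layout problem concrete, here is a rough model of how the
position of uc_sigmask shifts when FPU storage is added to the
sigcontext. This is a sketch, not the real headers: the struct and
field names are illustrative assumptions, and fixed-width 32-bit types
are used so the offsets come out the same on any host.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a 32-bit sigcontext WITHOUT FPU storage. */
struct model_sigcontext_nofpu {
    uint32_t oldmask;
    uint32_t regs[16];
    uint32_t pc, pr, sr, gbr, mach, macl;
};

/* Same model WITH FPU storage appended (assumed field set). */
struct model_sigcontext_fpu {
    uint32_t oldmask;
    uint32_t regs[16];
    uint32_t pc, pr, sr, gbr, mach, macl;
    uint32_t fpregs[16];
    uint32_t xfpregs[16];
    uint32_t fpscr, fpul, ownedfp;
};

/* Model ucontext: the signal mask follows the machine context, so its
 * offset depends on which sigcontext variant is embedded. */
struct model_ucontext_nofpu {
    uint32_t uc_flags;
    uint32_t uc_link;
    uint32_t uc_stack[3];
    struct model_sigcontext_nofpu uc_mcontext;
    uint32_t uc_sigmask[2];
};

struct model_ucontext_fpu {
    uint32_t uc_flags;
    uint32_t uc_link;
    uint32_t uc_stack[3];
    struct model_sigcontext_fpu uc_mcontext;
    uint32_t uc_sigmask[2];
};

/* Offset of the signal mask as seen by a no-FPU userspace. */
size_t sigmask_offset_nofpu(void)
{
    return offsetof(struct model_ucontext_nofpu, uc_sigmask);
}

/* Offset of the signal mask as written by a kernel built with FPU. */
size_t sigmask_offset_fpu(void)
{
    return offsetof(struct model_ucontext_fpu, uc_sigmask);
}
```

In this model the two offsets differ by the size of the FPU block (35
32-bit words), so a handler compiled against the no-FPU struct reads
register save space where it expects the mask.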

If we want the SH-nofpu ABI to use the old nofpu ucontext_t layout,
then the kernel (and qemu-user) is going to need a way to detect
nofpu-ABI binaries and generate the right ucontext_t for them.

If we switch to using the same ucontext_t layout everywhere, the
kernel does not have to be smart, and the kernel ABI looks the same
for all SH variants, but old binaries (if they depend on ucontext_t
layout, which is _rare_ anyway) could break.

My leaning at this point, especially since you say SH3 is irrelevant,
is to use the same ucontext_t layout for them all (with the float reg
space empty for nofpu chips). If any real-world old apps break and
people care about them, we could make a personality that you set
manually for old-nofpu ucontext_t layout. But I suspect the issue will
just go away.

Rich

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [musl] SH sigcontext ABI is broken
  2015-06-19  7:09 SH sigcontext ABI is broken Rich Felker
                   ` (14 preceding siblings ...)
  2015-06-24 18:03 ` Rich Felker
@ 2015-06-24 18:12 ` Rich Felker
  2015-06-24 19:37 ` Joseph Myers
                   ` (7 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: Rich Felker @ 2015-06-24 18:12 UTC (permalink / raw)
  To: linux-sh

On Wed, Jun 24, 2015 at 02:10:06PM +0000, Joseph Myers wrote:
> On Wed, 24 Jun 2015, Rich Felker wrote:
> 
> > Nominally SH3 support remains in both the kernel and glibc. If it can
> > be established that multiple parties agree that there's really no one
> > left who cares about the old no-FPU sigcontext ABI on SH3, I will be
> > all for dropping it and unifying sigcontext.
> 
> Note that right now we have BE and LE versions of *three* ABIs for SH in 
> glibc (SH3 soft-float, SH4 soft-float, SH4 hard-float) (and as noted in 
> this discussion, right now each would only work properly on a kernel with 
> the corresponding configuration).  See 
> <https://sourceware.org/glibc/wiki/ABIList>.

Is your understanding that SH4 soft-float is using the SH4 ucontext_t
layout? I don't think it's even working at all. Glibc uses the layout
with fpu registers only if __SH4__ or __SH4A__ is defined, but GCC
does not define these macros when -m4-nofpu is used. Instead it
defines both __SH3__ and __SH4_NOFPU__. On the other hand, the kernel
uses:

#if defined(__SH4__) || defined(CONFIG_CPU_SH4) || \
    defined(__SH2A__) || defined(CONFIG_CPU_SH2A)

to determine whether to include the FPU regs in the struct.
CONFIG_CPU_SH4 is presumably defined whenever the kernel is built for
the SH4 entry point code. So I don't think it's even possible to build
a kernel that's compatible with glibc's SH4 soft-float.
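
The mismatch between the two selection rules can be modelled in plain
C. This is only a sketch under the assumptions stated in this message:
the struct and function names are hypothetical, and the booleans stand
in for whether each named macro is defined in a given build.

```c
#include <stdbool.h>

/* Which of the macros named above are defined in one build.
 * Hypothetical model; not a real glibc or kernel data structure. */
struct sh_build {
    bool def_sh4;         /* __SH4__ */
    bool def_sh4a;        /* __SH4A__ */
    bool def_sh2a;        /* __SH2A__ */
    bool config_cpu_sh4;  /* CONFIG_CPU_SH4 (kernel builds only) */
    bool config_cpu_sh2a; /* CONFIG_CPU_SH2A (kernel builds only) */
};

/* glibc rule as described: FPU layout only for __SH4__ or __SH4A__. */
bool glibc_layout_has_fpu(const struct sh_build *b)
{
    return b->def_sh4 || b->def_sh4a;
}

/* Kernel rule as described: any of the four named macros. */
bool kernel_layout_has_fpu(const struct sh_build *b)
{
    return b->def_sh4 || b->config_cpu_sh4 ||
           b->def_sh2a || b->config_cpu_sh2a;
}

/* True when userspace and kernel pick the same struct layout. */
bool layouts_agree(const struct sh_build *user, const struct sh_build *kernel)
{
    return glibc_layout_has_fpu(user) == kernel_layout_has_fpu(kernel);
}
```

A userspace built with -m4-nofpu has neither __SH4__ nor __SH4A__
defined, so every field in its model is false, while a kernel with
CONFIG_CPU_SH4 set always selects the FPU layout; layouts_agree()
returns false for that pair.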

This seems to have been a silent ABI regression in glibc when the sh
sys/* sysdep headers were merged. Back when there were separate
versions in the sh3 and sh4 dirs, it _should_ have worked with the
kernel's definitions.

I think this level of breakage (that nobody seems to have noticed or
cared about) is sufficient to say let's just throw out the old no-fpu
ucontext_t and use the same struct everywhere for now. We can always
add a personality to get the old one back if anyone ever needs it.

> I think the next glibc change likely to require action from each 
> architecture's maintainer to avoid breaking the build may be Adhemerval's 
> cancellation changes - so if no-one comes forward as SH maintainer to at 
> least update SH for those changes when they are ready to go in, the build 
> for SH will be broken and that will indicate, as per 
> <https://sourceware.org/ml/libc-alpha/2015-06/msg00424.html>, that it may 
> be time to remove the port from glibc.

I may be available to do the cancellation changes (it's my design, so
I'm familiar with the requirements), but I'll probably have to get
copyright assignment paperwork taken care of first.

Rich

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [musl] SH sigcontext ABI is broken
  2015-06-19  7:09 SH sigcontext ABI is broken Rich Felker
                   ` (15 preceding siblings ...)
  2015-06-24 18:12 ` Rich Felker
@ 2015-06-24 19:37 ` Joseph Myers
  2015-06-24 20:08 ` Rich Felker
                   ` (6 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: Joseph Myers @ 2015-06-24 19:37 UTC (permalink / raw)
  To: linux-sh

On Wed, 24 Jun 2015, Rich Felker wrote:

> On Wed, Jun 24, 2015 at 02:10:06PM +0000, Joseph Myers wrote:
> > On Wed, 24 Jun 2015, Rich Felker wrote:
> > 
> > > Nominally SH3 support remains in both the kernel and glibc. If it can
> > > be established that multiple parties agree that there's really no one
> > > left who cares about the old no-FPU sigcontext ABI on SH3, I will be
> > > all for dropping it and unifying sigcontext.
> > 
> > Note that right now we have BE and LE versions of *three* ABIs for SH in 
> > glibc (SH3 soft-float, SH4 soft-float, SH4 hard-float) (and as noted in 
> > this discussion, right now each would only work properly on a kernel with 
> > the corresponding configuration).  See 
> > <https://sourceware.org/glibc/wiki/ABIList>.
> 
> Is your understanding that SH4 soft-float is using the SH4 ucontext_t
> layout? I don't think it's even working at all. Glibc uses the layout

My understanding is what Kaz affirmed in 
<https://sourceware.org/ml/libc-alpha/2014-01/msg00388.html>.  It's 
entirely possible there are bugs (including regressions) in this area; if 
so, they should be filed in Bugzilla.

-- 
Joseph S. Myers
joseph@codesourcery.com

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [musl] SH sigcontext ABI is broken
  2015-06-19  7:09 SH sigcontext ABI is broken Rich Felker
                   ` (16 preceding siblings ...)
  2015-06-24 19:37 ` Joseph Myers
@ 2015-06-24 20:08 ` Rich Felker
  2015-06-24 21:34 ` Rich Felker
                   ` (5 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: Rich Felker @ 2015-06-24 20:08 UTC (permalink / raw)
  To: linux-sh

On Wed, Jun 24, 2015 at 07:37:45PM +0000, Joseph Myers wrote:
> On Wed, 24 Jun 2015, Rich Felker wrote:
> 
> > On Wed, Jun 24, 2015 at 02:10:06PM +0000, Joseph Myers wrote:
> > > On Wed, 24 Jun 2015, Rich Felker wrote:
> > > 
> > > > Nominally SH3 support remains in both the kernel and glibc. If it can
> > > > be established that multiple parties agree that there's really no one
> > > > left who cares about the old no-FPU sigcontext ABI on SH3, I will be
> > > > all for dropping it and unifying sigcontext.
> > > 
> > > Note that right now we have BE and LE versions of *three* ABIs for SH in 
> > > glibc (SH3 soft-float, SH4 soft-float, SH4 hard-float) (and as noted in 
> > > this discussion, right now each would only work properly on a kernel with 
> > > the corresponding configuration).  See 
> > > <https://sourceware.org/glibc/wiki/ABIList>.
> > 
> > Is your understanding that SH4 soft-float is using the SH4 ucontext_t
> > layout? I don't think it's even working at all. Glibc uses the layout
> 
> My understanding is what Kaz affirmed in 
> <https://sourceware.org/ml/libc-alpha/2014-01/msg00388.html>.  It's 
> entirely possible there are bugs (including regressions) in this area; if 
> so, they should be filed in Bugzilla.

OK, if that's the intent, I don't think it matches the present
reality. If I can confirm this I'll file a bug.

Rich

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [musl] SH sigcontext ABI is broken
  2015-06-19  7:09 SH sigcontext ABI is broken Rich Felker
                   ` (17 preceding siblings ...)
  2015-06-24 20:08 ` Rich Felker
@ 2015-06-24 21:34 ` Rich Felker
  2015-06-24 22:02 ` Rich Felker
                   ` (4 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: Rich Felker @ 2015-06-24 21:34 UTC (permalink / raw)
  To: linux-sh

On Wed, Jun 24, 2015 at 03:40:50AM -0500, Rob Landley wrote:
> 
> 
> On 06/24/2015 01:03 PM, Rich Felker wrote:
> > On Wed, Jun 24, 2015 at 02:12:58AM -0500, Rob Landley wrote:
> >> I've lost the plot here, is what I'm saying.
> > 
> > OK, I'll try to get us back on it then.
> > 
> > To begin with, let's put aside musl, revival of SH, and anything new
> > and just look at the existing situation.
> > 
> > Right now, SH3 or SH4-nofpu binaries are ABI-incompatible with SH4
> > kernels.
> 
> On glibc? Never tested it, all my sh4 stuff was uclibc until musl showed
> an interest.

Yes, on glibc.

> > This incompatibility is in a place very few applications are
> > going to use or care about, but it's essential for musl and it's going
> > to be essential for glibc once they get around to fixing cancellation.
> > 
> > Likewise, SH2 binaries are incompatible with SH2A kernels and SH4
> > kernels.
> 
> And nobody ever cared before because elf2flt or fdpic didn't run on
> systems _with_ an mmu, so the binary packaging format incompatibility
> would hit you long before anything else did.

You missed the SH2A part. Otherwise that's largely correct.

> > Since our new SH2 binaries (using ELF, musl, and possibly glibc if the
> > port is not dropped) are also going to be compatible with running on
> > later MMU-ful hardware (e.g. J4), I don't want this same issue to be a
> > point of breakage for them.
> 
> This is the "lost the plot" part. I don't get it. What's the point? I do
> not understand why you have this as a goal.

I'll try to keep this brief since it's not the point of this thread:

- Ability to test your binaries with qemu-[system-]sh4[eb].
- Ability to test and debug on a real machine with MMU where crashes
  are debuggable and don't bring down the whole system.
- Avoiding death-by-target-combinatorics (ala uclibc).
- Upgrade path from SH2 to SH4.
- Sharing base userspace between low-end SH2 devices and higher-end
  SH4-based model.

If you want to discuss this in more detail let's do it somewhere other
than in a thread CC'd on several lists it's not terribly relevant to.

> > The userspace SH2 ABI is nofpu (no float registers for float args), so
> > there is already a separate userspace ABI for SH2 (and SH3) vs the
> > usual SH4 ABI with float. That's not a problem.
> 
> Yes, a separate ABI for sh2 vs sh4 has not, historically speaking, been
> a problem.

Separate userspace ABI is not a problem. The problem is the kernel
ABI.

> > Dynamic linked
> > binaries have their own separate shared library ecosystem, and for
> > static linked binaries, there's no userspace ABI boundary left once ld
> > runs. However kernel-user ABI breakage is a show-stopper. It means
> > that, even if you had the right ldso and libraries for nofpu SH2
> > binaries, you couldn't safely run them on SH4 because the kernel would
> > give you the wrong ucontext_t layout.
> 
> And historically used entirely different trap numbers for system calls,
> although you made a kernel patch for that. And a couple more to the
> kernel binary loader...

The trap numbers are something that can be worked around even without
my patch; it's just ugly. There's really no workaround for a type
varying at runtime, though. (The closest thing to a valid workaround
would be wrapped signal handlers, which are really ugly to implement;
glibc has considered doing them but so far held off. I don't want to
do them.)

> > If we want the SH-nofpu ABI to use the old nofpu ucontext_t layout,
> > then the kernel (and qemu-user) is going to need a way to detect
> > nofpu-ABI binaries and generate the right ucontext_t for them.
> 
> Or sh2 vs sh4 could be different compile-time targets with different
> libc instances?

They already are (assuming you use FPU on SH4; then one is using the
hard-float ABI and the other is using soft-float ABI), but that
doesn't help. The problem is the kernel ABI.

> > If we switch to using the same ucontext_t layout everywhere, the
> > kernel does not have to be smart, and the kernel ABI looks the same
> > for all SH variants, but old binaries (if they depend on ucontext_t
> > layout, which is _rare_ anyway) could break.
> 
> Old binaries can run under old kernels with old userspace partitions.

Well, not so practical if those old kernels have critical vulns...
But unless someone steps forward and says SH3 ucontext_t ABI is
important to existing applications that are deploying new kernels, I
think we can just wait to address this issue (with a personality)
if/when it ever arises.

> > My leaning at this point, especially since you say SH3 is irrelevant,
> > is to use the same ucontext_t layout for them all (with the float reg
> > space empty for nofpu chips). If any real-world old apps break and
> > people care about them, we could make a personality that you set
> > manually for old-nofpu ucontext_t layout. But I suspect the issue will
> > just go away.
> 
> I suspect the issue will just go away too.
> 
> After more patents expire next year, we can add full sh4 compatibility
> to j-core. If we want a better userspace api ala x86's x32 or mips
> o32/n32/nubi or arm's oabi/eabi, we can do that. (In fact that's one of
> 0pf.org's goals, kawasaki-san is _trying_ to run a standards body. If
> you want to wave an abi proposal at him for comment, he is the original
> superh architect...)

The SH ABI seems pretty good as-is, especially considering the
constraints it's working with. The only additional need for ABI work I
see at the moment is getting FDPIC working.

> I want musl to support sh2 but I _also_ want it to support coldfire and
> h8300 and so on. If musl is the successor to uclibc (which needs to be
> put out of its misery), it needs nommu support for several different
> architectures. If you insist that every nommu architecture must also run
> those nommu binaries on with-mmu sibling architectures, you're going to
> be unifying coldfire and m68k next...

If you look at the kernel I'm pretty sure that already works...
Coldfire does not seem to be a separate arch/ABI as far as the kernel
is concerned.

Rich

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [musl] SH sigcontext ABI is broken
  2015-06-19  7:09 SH sigcontext ABI is broken Rich Felker
                   ` (18 preceding siblings ...)
  2015-06-24 21:34 ` Rich Felker
@ 2015-06-24 22:02 ` Rich Felker
  2015-06-25  6:24 ` Geert Uytterhoeven
                   ` (3 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: Rich Felker @ 2015-06-24 22:02 UTC (permalink / raw)
  To: linux-sh

On Wed, Jun 24, 2015 at 04:14:45AM -0500, Rob Landley wrote:
> 
> 
> On 06/24/2015 01:12 PM, Rich Felker wrote:
> > On Wed, Jun 24, 2015 at 02:10:06PM +0000, Joseph Myers wrote:
> >> On Wed, 24 Jun 2015, Rich Felker wrote:
> >>
> >>> Nominally SH3 support remains in both the kernel and glibc. If it can
> >>> be established that multiple parties agree that there's really no one
> >>> left who cares about the old no-FPU sigcontext ABI on SH3, I will be
> >>> all for dropping it and unifying sigcontext.
> >>
> >> Note that right now we have BE and LE versions of *three* ABIs for SH in 
> >> glibc (SH3 soft-float, SH4 soft-float, SH4 hard-float) (and as noted in 
> >> this discussion, right now each would only work properly on a kernel with 
> >> the corresponding configuration).  See 
> >> <https://sourceware.org/glibc/wiki/ABIList>.
> > 
> > Is your understanding that SH4 soft-float is using the SH4 ucontext_t
> > layout? I don't think it's even working at all.
> 
> I never bothered to test floating point on it. It doesn't come up much
> with anything I do, and qemu's floating point emulation is notoriously
> dicey.

Float is not what you need to test to see the breakage. Rather you
need to test inspecting the uc_sigmask member of ucontext_t. With the
mismatched ABI it's in the wrong place so the signal handler sees the
wrong set of blocked signals if it inspects them. Without this working
right, it's impossible to implement cancellation correctly.
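
A minimal sketch of such a test, using only standard POSIX calls. The
function and variable names are made up for illustration. It blocks
SIGUSR2, takes SIGUSR1 through an SA_SIGINFO handler, and checks that
the saved mask in the ucontext_t reports SIGUSR2 blocked and SIGUSR1
not blocked.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <signal.h>
#include <string.h>
#include <ucontext.h>

/* Results recorded by the handler; -1 means "handler never ran". */
static volatile sig_atomic_t saw_usr2_blocked = -1;
static volatile sig_atomic_t saw_usr1_blocked = -1;

/* Inspect the interrupted context's signal mask. */
static void probe_handler(int sig, siginfo_t *si, void *ctx)
{
    ucontext_t *uc = ctx;
    (void)sig;
    (void)si;
    saw_usr2_blocked = sigismember(&uc->uc_sigmask, SIGUSR2);
    saw_usr1_blocked = sigismember(&uc->uc_sigmask, SIGUSR1);
}

/* Returns 1 if uc_sigmask matched the mask in effect when the signal
 * was delivered, 0 if it did not, -1 if the setup calls failed. */
int check_uc_sigmask(void)
{
    struct sigaction sa, old_sa;
    sigset_t only_usr2, old_mask;

    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = probe_handler;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, &old_sa) != 0)
        return -1;

    /* Mask with exactly SIGUSR2 blocked, so SIGUSR1 is deliverable. */
    sigemptyset(&only_usr2);
    sigaddset(&only_usr2, SIGUSR2);
    if (sigprocmask(SIG_SETMASK, &only_usr2, &old_mask) != 0)
        return -1;

    raise(SIGUSR1); /* handler runs before raise() returns */

    /* Restore the caller's mask and disposition. */
    sigprocmask(SIG_SETMASK, &old_mask, NULL);
    sigaction(SIGUSR1, &old_sa, NULL);

    return saw_usr2_blocked == 1 && saw_usr1_blocked == 0;
}
```

On a system where kernel and libc agree on the ucontext_t layout this
should return 1. With the mismatch described above the handler would be
reading the mask from the wrong offset, so the result is whatever
happens to be stored there.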

> If I do an x86-64 linux from scratch build the perl build dies with:
> https://twitter.com/landley/status/571883794279493633
> 
> Of course it doesn't happen in a chroot or using distcc to call out to
> the cross compiler, only when gcc does those floating point calculations
> under qemu-system-x86_64. (Presumably it wouldn't happen if I was using
> kvm instead of qemu either...) Given that, trying to prove anything
> about qemu-system-sh4's floating point seemed like a waste of time.

It's likely only x86 that's broken. Nobody emulates ld80 right.

> > Glibc uses the layout
> > with fpu registers only if __SH4__ or __SH4A__ is defined,
> 
> I've never built glibc for sh4. I could try installing the old debian
> sh4 chroot? (What release was that, squiggy? I tried installing Debian's
> alpha lenny chroot yesterday and "apt-get update" in the chroot is
> failing trying to hand off the wget data to gzip. Something with pipes
> in qemu-alpha application emulation, I think. It's on the todo list.)
> 
> If you're curious, I was following the qemu-debootstrap instructions on
> https://wiki.debian.org/ArmHardFloatChroot substituting in info from
> https://www.debian.org/ports/ (hence the ping on #musl about whether
> musl debian ports would be interesting). Also there's a debian sh4 page
> at https://wiki.debian.org/SH4 so if I needed to poke at glibc for sh4,
> that would probably be my starting point.

I doubt it would help test the SH4-nofpu config that seems to be
broken; surely they use the normal SH4 ABI with fpu. You'd need a
multilib gcc with support for -m4-nofpu (or a cross compiler for it).

> > but GCC
> > does not define these macros when -m4-nofpu is used. Instead it
> > defines both __SH3__ and __SH4_NOFPU__.
> 
> I hack around that sort of thing in builds all the time. Various bits of
> gnu software only ever agree with each other (or anything else) by
> coincidence.

You _really_ don't want to change this; doing so will break anything
using the macros. There's a reason they're done the way they are. In
particular __SH4__ indicates the availability of FPU and the
associated float ABI. For example, if __SH4__ is wrongly defined here,
musl would select asm for hard-float ABI despite the compiler
generating soft-float object files, and you'd end up with
ABI-mismatched files when linking.

> > On the other hand, the kernel uses:
> > 
> > #if defined(__SH4__) || defined(CONFIG_CPU_SH4) || \
> >     defined(__SH2A__) || defined(CONFIG_CPU_SH2A)
> > 
> > to determine whether to include the FPU regs in the struct.
> > CONFIG_CPU_SH4 is presumably defined whenever the kernel is built for
> > the SH4 entry point code. So I don't think it's even possible to build
> > a kernel that's compatible with glibc's SH4 soft-float.
> 
> You think this is in any way unusual?

Yes. Patches to fix minor bugs are one thing, but having to patch
public kernel API/ABI breaking changes into a system is not something
that should be treated as normal/expected. If that feels normal to
people used to working with uclibc, well, that's part of the reason
uclibc needs to be replaced.

> http://landley.net/hg/aboriginal/file/tip/sources/patches
> 
> Patching stuff to make this kind of thing match up during a build is
> _normal_. It means you're not on x86 (or these days, arm).

I have not encountered any breakage like this on any of the other
archs we work with. Kernel stability is usually taken very seriously.
I don't see any patches in your repo above that change kernel API/ABI.

> > This seems to have been a silent ABI regression in glibc when the sh
> > sys/* sysdep headers were merged. Back when there were separate
> > versions in the sh3 and sh4 dirs, it _should_ have worked with the
> > kernel's definitions.
> 
> Embedded development 101: first time the package broke most of the
> userbase just didn't upgrade to the broken version. If they're stuck on
> 2.4 (or 2.0!) as a result, and the device wasn't connected to the
> internet, they did not care. (The sad parts are where the device IS
> connected to the internet and they _still_ don't care.)

Just because it's been that way in the past doesn't mean it's
acceptable. Embedded/IoT are disasters waiting to happen given the way
embedded development has been handled up til now. I don't claim we can
fix everyone's practices, but without addressing fundamentally broken
things like this, it's hardly possible for anyone to fix their own.

> >> I think the next glibc change likely to require action from each 
> >> architecture's maintainer to avoid breaking the build may be Adhemerval's 
> >> cancellation changes - so if no-one comes forward as SH maintainer to at 
> >> least update SH for those changes when they are ready to go in, the build 
> >> for SH will be broken and that will indicate, as per 
> >> <https://sourceware.org/ml/libc-alpha/2015-06/msg00424.html>, that it may 
> >> be time to remove the port from glibc.
> > 
> > I may be available to do the cancellation changes (it's my design, so
> > I'm familiar with the requirements), but I'll probably have to get
> > copyright assignment paperwork taken care of first.
> 
> Ah right, copyright assignment. Rich is a much better choice to do this
> then.

:-)

Rich

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [musl] SH sigcontext ABI is broken
  2015-06-19  7:09 SH sigcontext ABI is broken Rich Felker
                   ` (19 preceding siblings ...)
  2015-06-24 22:02 ` Rich Felker
@ 2015-06-25  6:24 ` Geert Uytterhoeven
  2015-07-02 19:23 ` [musl] " Maciej W. Rozycki
                   ` (2 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: Geert Uytterhoeven @ 2015-06-25  6:24 UTC (permalink / raw)
  To: linux-sh

On Wed, Jun 24, 2015 at 11:34 PM, Rich Felker <dalias@libc.org> wrote:
>> I want musl to support sh2 but I _also_ want it to support coldfire and
>> h8300 and so on. If musl is the successor to uclibc (which needs to be
>> put out of its misery), it needs nommu support for several different
>> architectures. If you insist that every nommu architecture must also run
>> those nommu binaries on with-mmu sibling architectures, you're going to
>> be unifying coldfire and m68k next...
>
> If you look at the kernel I'm pretty sure that already works...
> Coldfire does not seem to be a separate arch/ABI as far as the kernel
> is concerned.

Off-topic, but:
1. While the Coldfire user mode instruction set is (more or less?) a subset
   of the classic m68k user mode instruction set, there are larger differences
   in the supervisor mode instruction sets.
2. Coldfire uses a different MMU than classic m68k.

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [musl] Re: SH sigcontext ABI is broken
  2015-06-19  7:09 SH sigcontext ABI is broken Rich Felker
                   ` (20 preceding siblings ...)
  2015-06-25  6:24 ` Geert Uytterhoeven
@ 2015-07-02 19:23 ` Maciej W. Rozycki
  2015-07-02 22:51 ` Rob Landley
  2015-07-03  6:43 ` Andreas Schwab
  23 siblings, 0 replies; 25+ messages in thread
From: Maciej W. Rozycki @ 2015-07-02 19:23 UTC (permalink / raw)
  To: linux-sh

On Sat, 20 Jun 2015, Rob Landley wrote:

> >>>> Thanks, but most of the links seem to be broken.
> >>>
> >>> Are they?  I'm only seeing a single broken link, which has a mirror.
> >>
> >> My bad. Indeed only the davej one is broken, but that's where the code
> >> must have been introduced (even the earliest commit in tglx
> >> history.git has the #ifdef __SH4__ for FPU regs) and I can't find a
> >> cgit interface to it. Fetching several GB to browse history locally is
> >> going to take a while if I have to do that..
> > 
> > Using web interfaces for archeology doesn't fly.
> > If you're doing serious Linux work, you should already have a git repository
> > of the kernel. full-history-linux.git.tar weighs in at only ca. 0.5 GiB.
> 
> I have a somewhat updated version of that at
> http://landley.net/kdocs/local/linux-fullhist.tar.bz2 which I should
> probably update for the 4.0 release. (It's pulled to 3.0 currently.)

 For the record the LMO tree <git://git.linux-mips.org/pub/scm/ralf/linux> 
has a full history recorded and is in sync with kernel.org.  There's some 
GIT magic that cuts some operations like `git log' at 2.6.12-rc2, but you 
can go beyond that if you know the right commit id, e.g.:

$ git log -p 66f0a432 -- arch/sh

I can see the initial SH import was with 2.3.19.

  Maciej

* Re: [musl] Re: SH sigcontext ABI is broken
  2015-06-19  7:09 SH sigcontext ABI is broken Rich Felker
                   ` (21 preceding siblings ...)
  2015-07-02 19:23 ` [musl] " Maciej W. Rozycki
@ 2015-07-02 22:51 ` Rob Landley
  2015-07-03  6:43 ` Andreas Schwab
  23 siblings, 0 replies; 25+ messages in thread
From: Rob Landley @ 2015-07-02 22:51 UTC (permalink / raw)
  To: linux-sh

On 07/02/2015 02:23 PM, Maciej W. Rozycki wrote:
> On Sat, 20 Jun 2015, Rob Landley wrote:
> 
>>>>>> Thanks, but most of the links seem to be broken.
>>>>>
>>>>> Are they?  I'm only seeing a single broken link, which has a mirror.
>>>>
>>>> My bad. Indeed only the davej one is broken, but that's where the code
>>>> must have been introduced (even the earliest commit in tglx
>>>> history.git has the #ifdef __SH4__ for FPU regs) and I can't find a
>>>> cgit interface to it. Fetching several GB to browse history locally is
>>>> going to take a while if I have to do that..
>>>
>>> Using web interfaces for archeology doesn't fly.
>>> If you're doing serious Linux work, you should already have a git repository
>>> of the kernel. full-history-linux.git.tar weighs in at only ca. 0.5 GiB.
>>
>> I have a somewhat updated version of that at
>> http://landley.net/kdocs/local/linux-fullhist.tar.bz2 which I should
>> probably update for the 4.0 release. (It's pulled to 3.0 currently.)
> 
>  For the record the LMO tree <git://git.linux-mips.org/pub/scm/ralf/linux> 
> has a full history recorded and is in sync with kernel.org.  There's some 
> GIT magic that cuts some operations like `git log' at 2.6.12-rc2, but you 
> can go beyond that if you know the right commit id, e.g.:
> 
> $ git log -p 66f0a432 -- arch/sh
> 
> I can see the initial SH import was with 2.3.19.

If you grab the above tarball, git checkout -f, and git pull, operations
like git log work just fine all the way back to 0.0.1.

The problem is that each git commit hash includes metadata that
describes the parents, so if you retroactively insert parent commits you
change the hash ID of every single descendant.
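
As a rough illustration of the point above, here is a minimal Python sketch
of how a commit id is derived. It assumes the standard git object encoding
(a "commit <size>" header, a NUL byte, then the commit text). The helper
name commit_id, the author string, and the commit messages are made up for
the example; the tree id is the well-known id of git's empty tree.

```python
import hashlib

# Id of git's empty tree, used so the example needs no real files.
EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"


def commit_id(tree, parents, message,
              who="A U Thor <author@example.org> 0 +0000"):
    """Return the SHA-1 id git would assign to a commit with these fields.

    Hypothetical helper for illustration only; author and committer are
    set to the same placeholder identity.
    """
    lines = ["tree " + tree]
    lines += ["parent " + p for p in parents]
    lines += ["author " + who, "committer " + who, "", message, ""]
    body = "\n".join(lines).encode()
    # Git hashes "<type> <size>\0" followed by the object body.
    return hashlib.sha1(b"commit %d\0" % len(body) + body).hexdigest()


# History as published: a root commit and one child.
root = commit_id(EMPTY_TREE, [], "2.6.12-rc2 import")
child = commit_id(EMPTY_TREE, [root], "next change")

# Same content, but the root now records an older parent.
older = commit_id(EMPTY_TREE, [], "pre-git history")
root_spliced = commit_id(EMPTY_TREE, [older], "2.6.12-rc2 import")
child_spliced = commit_id(EMPTY_TREE, [root_spliced], "next change")

# Both the spliced commit and its descendant get new ids.
print(root != root_spliced, child != child_spliced)
```

Because each commit text names its parents by id, the change at the root
propagates to every commit built on top of it.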

You can add extra metadata nodes that glue commits together, but "git
clone" won't always look for that extra metadata when working out what
constitutes the branch of the tree you asked for.
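
The "glue" idea can be modelled in a few lines of Python. This is a toy
sketch, not git's implementation: it assumes the extra metadata behaves like
a lookup table from a commit id to a stand-in commit that names different
parents, which is roughly how replace refs work. The function name
first_parent_chain and all ids below are invented for the example.

```python
def first_parent_chain(tip, parents, replace=None):
    """Walk first parents from tip, consulting an optional replace table.

    parents maps a commit id to its list of parent ids; replace maps a
    commit id to a stand-in id whose parent list is used instead.
    """
    replace = replace or {}
    chain = []
    current = tip
    while current is not None:
        chain.append(current)
        # The id that descendants recorded is kept; only the object that
        # is read for it changes.
        stand_in = replace.get(current, current)
        found = parents.get(stand_in, [])
        current = found[0] if found else None
    return chain


parents = {
    "tip": ["root"],
    "root": [],                 # published history stops here
    "root-grafted": ["old-2"],  # copy of root that names an older parent
    "old-2": ["old-1"],
    "old-1": [],
}

print(first_parent_chain("tip", parents))
print(first_parent_chain("tip", parents, {"root": "root-grafted"}))
```

Without the table the walk stops at "root"; with it the walk continues into
the older commits while "tip" and "root" keep their ids. A reader who never
receives the table sees only the short history.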

I just found something that worked and stopped fiddling with it. The guy
who did it used "git graft" ala http://www.padator.org/linux.php

And apparently you're supposed to use "subtree" commands instead these
days (as opposed to "splice" which may or may not have anything to do
with graft?):
https://www.kernel.org/pub/software/scm/git/docs/howto/using-merge-subtree.html

Because it just wouldn't be git if there weren't 3 subtly different ways
to do the same thing. (And this is ignoring the actual history rewriting
packages people made which _do_ change the sha1sums.)

In any case, the tag list only goes back to 2.6.12-rc2. I never went
back and tagged the earlier commits when I put my tree up on
kernel.org/doc back when 3.0 came out.

Rob

* Re: [musl] Re: SH sigcontext ABI is broken
  2015-06-19  7:09 SH sigcontext ABI is broken Rich Felker
                   ` (22 preceding siblings ...)
  2015-07-02 22:51 ` Rob Landley
@ 2015-07-03  6:43 ` Andreas Schwab
  23 siblings, 0 replies; 25+ messages in thread
From: Andreas Schwab @ 2015-07-03  6:43 UTC (permalink / raw)
  To: linux-sh

Rob Landley <rob@landley.net> writes:

> And apparently you're supposed to use "subtree" commands instead these

No, git replace --graft

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."

Thread overview: 25+ messages
2015-06-19  7:09 SH sigcontext ABI is broken Rich Felker
2015-06-19  7:41 ` Andreas Schwab
2015-06-19 19:12 ` [musl] " Rich Felker
2015-06-19 19:57 ` Andreas Schwab
2015-06-19 20:32 ` Rich Felker
2015-06-20  8:10 ` Geert Uytterhoeven
2015-06-20 18:06 ` [musl] " Rich Felker
2015-06-20 19:59 ` [musl] " Rob Landley
2015-06-24  4:25 ` [musl] " Rob Landley
2015-06-24  4:52 ` Rich Felker
2015-06-24  7:12 ` Rob Landley
2015-06-24  8:23 ` Rob Landley
2015-06-24  8:40 ` Rob Landley
2015-06-24  9:14 ` Rob Landley
2015-06-24 14:10 ` Joseph Myers
2015-06-24 18:03 ` Rich Felker
2015-06-24 18:12 ` Rich Felker
2015-06-24 19:37 ` Joseph Myers
2015-06-24 20:08 ` Rich Felker
2015-06-24 21:34 ` Rich Felker
2015-06-24 22:02 ` Rich Felker
2015-06-25  6:24 ` Geert Uytterhoeven
2015-07-02 19:23 ` [musl] " Maciej W. Rozycki
2015-07-02 22:51 ` Rob Landley
2015-07-03  6:43 ` Andreas Schwab
