* Re: C3600 kernel/64bit 4.* slow IO due to -mlong-calls
       [not found] <CA+QBN9C=wHxV5Y=SSt2UB_4b1H_Nx1mZxZu3rvuTY2wRe=GWBw@mail.gmail.com>
@ 2018-03-15 20:43 ` Helge Deller
  2018-03-15 20:48   ` John David Anglin
  2018-03-15 20:51   ` John David Anglin
  0 siblings, 2 replies; 11+ messages in thread
From: Helge Deller @ 2018-03-15 20:43 UTC
  To: debian-hppa; +Cc: linux-parisc

Hi Carlo,

On 15.03.2018 16:36, Carlo Pisani wrote:
> I am experiencing a very annoying behavior with my HPPA C3600: if I
> compile the (linux) kernel with  -mlong-calls then the IO (e.g. file
> copy) becomes very slow, and the PCI becomes unstable (i.e. it crashes
> the machine)
> 
> kernel  gcc     binutils    with mlong    without mlong
> 4.15.7  4.9.3   2.25.1     13.4 MB/s    27.0 MB/s
> 4.15.7  6.4.0   2.25.1     13.4 MB/s    27.0 MB/s
> 4.15.7  6.4.0   2.29.1     14.4 MB/s    25.0 MB/s

Interesting bad results!
 
> these tests were performed with
> 
> dd if=/dev/zero of=here bs=1k count=100000
> 
> -mlong-calls is enabled in the kernel by "CONFIG_MLONGCALLS"

I don't think anybody else has noticed the bad performance due to CONFIG_MLONGCALLS yet.
I've now started testing whether we can disable that option on the Debian
kernels...

> the help-guide says "If you configure the kernel to include many
> drivers built-in instead as modules, the kernel executable may become
> too big, so that the linker will not be able to resolve some long
> branches and fails to link your vmlinux kernel. In that case enabling
> this option will help you to overcome this limit by using the
> -mlong-calls compiler option. Usually you want to say N here, unless
> you e.g. want to build a kernel which includes all necessary drivers
> built-in and which can be used for TFTP booting without the need to
> have an initrd ramdisk. Enabling this option will probably slow down
> your kernel"
> 
> I need -mlong-calls because I need to compile the kernel without
> kernel-modules

Why?

> all built-in, that makes the size of the kernel of
> about 23Mbytes, thus without -mlong-calls the linker fails to "link"
> objects
> 
> let me know

I'm not sure what kind of help you expect here.
The only option I see is to disable some options (modules) you won't
need and thus reduce the kernel size. xfs, ipv6 and such are good candidates.
Or use a 32-bit kernel?

Helge


* Re: C3600 kernel/64bit 4.* slow IO due to -mlong-calls
  2018-03-15 20:43 ` C3600 kernel/64bit 4.* slow IO due to -mlong-calls Helge Deller
@ 2018-03-15 20:48   ` John David Anglin
  2018-03-15 20:51   ` John David Anglin
  1 sibling, 0 replies; 11+ messages in thread
From: John David Anglin @ 2018-03-15 20:48 UTC
  To: Helge Deller, debian-hppa; +Cc: linux-parisc

On 2018-03-15 4:43 PM, Helge Deller wrote:
> On 15.03.2018 16:36, Carlo Pisani wrote:
>> I am experiencing a very annoying behavior with my HPPA C3600: if I
>> compile the (linux) kernel with  -mlong-calls then the IO (e.g. file
>> copy) becomes very slow, and the PCI becomes unstable (i.e. it crashes
>> the machine)
>>
>> kernel  gcc     binutils    with mlong    without mlong
>> 4.15.7  4.9.3   2.25.1     13.4 MB/s    27.0 MB/s
>> 4.15.7  6.4.0   2.25.1     13.4 MB/s    27.0 MB/s
>> 4.15.7  6.4.0   2.29.1     14.4 MB/s    25.0 MB/s
> Interesting bad results!
I don't believe the instability mentioned is due to long calls. It's the 
following issue:
https://www.spinics.net/lists/linux-parisc/msg01024.html


* Re: C3600 kernel/64bit 4.* slow IO due to -mlong-calls
  2018-03-15 20:43 ` C3600 kernel/64bit 4.* slow IO due to -mlong-calls Helge Deller
  2018-03-15 20:48   ` John David Anglin
@ 2018-03-15 20:51   ` John David Anglin
  2018-03-16 11:25     ` Aw: " Helge Deller
  1 sibling, 1 reply; 11+ messages in thread
From: John David Anglin @ 2018-03-15 20:51 UTC
  To: Helge Deller, debian-hppa; +Cc: linux-parisc

On 2018-03-15 4:43 PM, Helge Deller wrote:
>> kernel  gcc     binutils    with mlong    without mlong
>> 4.15.7  4.9.3   2.25.1     13.4 MB/s    27.0 MB/s
>> 4.15.7  6.4.0   2.25.1     13.4 MB/s    27.0 MB/s
>> 4.15.7  6.4.0   2.29.1     14.4 MB/s    25.0 MB/s
> Interesting bad results!
>   
It's hard to understand why the performance would deteriorate so much 
but I see essentially
the same behavior.

Dave


* Aw: Re: C3600 kernel/64bit 4.* slow IO due to -mlong-calls
  2018-03-15 20:51   ` John David Anglin
@ 2018-03-16 11:25     ` Helge Deller
  2018-03-16 13:37       ` John David Anglin
  2018-03-16 17:29       ` Matt Turner
  0 siblings, 2 replies; 11+ messages in thread
From: Helge Deller @ 2018-03-16 11:25 UTC
  To: John David Anglin; +Cc: debian-hppa, linux-parisc

> >> kernel  gcc     binutils    with mlong    without mlong
> >> 4.15.7  4.9.3   2.25.1     13.4 MB/s    27.0 MB/s
> >> 4.15.7  6.4.0   2.25.1     13.4 MB/s    27.0 MB/s
> >> 4.15.7  6.4.0   2.29.1     14.4 MB/s    25.0 MB/s
> > Interesting bad results!
> >   
> It's hard to understand why the performance would deteriorate so much 
> but I see essentially the same behavior.

Speaking of the Debian kernel, it's nearly impossible to link a kernel without mlong-calls.

Compiling without mlong-calls generates this (R_PARISC_PCREL22F):
        b,l external_func,%r2
        nop

With -mlong-calls it is much more complex:
.LC0:
        .dword  P%external_func
.globl a
a:
        addil LT'.LC0,%r27
        ldd RT'.LC0(%r1),%r28
        ldd 0(%r28),%r28
        ldd 16(%r28),%r2
        bve,l (%r2),%r2


Since our kernel is running in the first 4GB of RAM (even on 64bit), couldn't we instead
introduce a gcc option, e.g. "-mkernel-indirect-calls", which translates to:
        ldil    L%external_func, %r2        // R_PARISC_DIR21L
        ldo     R%external_func(%r2), %r2   // R_PARISC_DIR14R  
        bve,l (%r2),%r2
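
The ldil/ldo pair above builds an absolute address from two relocated halves. As a rough sketch of that split, the Python below models L% as the upper 21 bits and R% as the lower 11 bits of the address. That reading of the plain (non-rounding) field selectors, the 31-bit limit used to keep sign extension out of the picture, and the example address are all assumptions made for illustration, not something taken from the toolchain sources.

```python
# Sketch of how the L% / R% field selectors split an absolute address
# for an ldil + ldo pair. Assumption: plain L%/R% without rounding,
# i.e. the upper 21 bits go to ldil and the lower 11 bits go to ldo.

def split_address(addr):
    """Return the (left, right) parts that ldil and ldo would supply."""
    # Restricted to 31 bits so that sign extension of the ldil result
    # never matters in this sketch.
    if not 0 <= addr < 2**31:
        raise ValueError("sketch only covers addresses below 2 GB")
    left = addr & ~0x7FF   # L%addr: upper 21 bits, low 11 bits cleared
    right = addr & 0x7FF   # R%addr: low 11 bits
    return left, right

def materialize(addr):
    """Model ldil L%addr,%r2 followed by ldo R%addr(%r2),%r2."""
    left, right = split_address(addr)
    return left + right

example = 0x40123ABC  # hypothetical kernel text address
left, right = split_address(example)
print(hex(left), hex(right), hex(materialize(example)))
```

The point of the sketch is only that two immediates are enough when the target address is known to be small, which is why no memory load is needed in this sequence.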

Does -mfast-indirect-calls have any effect at all?
I haven't seen any difference when using this option.

Thoughts?
Helge


* Re: Aw: Re: C3600 kernel/64bit 4.* slow IO due to -mlong-calls
  2018-03-16 11:25     ` Aw: " Helge Deller
@ 2018-03-16 13:37       ` John David Anglin
  2018-03-18 13:31         ` John David Anglin
  2018-03-16 17:29       ` Matt Turner
  1 sibling, 1 reply; 11+ messages in thread
From: John David Anglin @ 2018-03-16 13:37 UTC
  To: Helge Deller; +Cc: debian-hppa, linux-parisc

On 2018-03-16 7:25 AM, Helge Deller wrote:
>>>> kernel  gcc     binutils    with mlong    without mlong
>>>> 4.15.7  4.9.3   2.25.1     13.4 MB/s    27.0 MB/s
>>>> 4.15.7  6.4.0   2.25.1     13.4 MB/s    27.0 MB/s
>>>> 4.15.7  6.4.0   2.29.1     14.4 MB/s    25.0 MB/s
>>> Interesting bad results!
>>>    
>> It's hard to understand why the performance would deteriorate so much
>> but I see essentially the same behavior.
> Speaking of the Debian kernel, it's nearly impossible to link a kernel without mlong-calls.
>
> Compiling without mlong-calls generates this (R_PARISC_PCREL22F):
>          b,l external_func,%r2
>          nop
On PA 2.0, this is a 22-bit pc-relative call with a branch distance
of 8 MB.  We have no stub support
in the GNU 64-bit linker.  If we had stub support, this would be the best
solution.
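
The 8 MB figure follows from the displacement width. A minimal sketch of the arithmetic, assuming the displacement is a signed count of 4-byte instruction words, and treating the roughly 23 MB kernel reported earlier in the thread as one contiguous span of code (a simplification, since not all of the image is text):

```python
# Reach of a pc-relative branch whose displacement is a signed count
# of 4-byte instruction words (assumption stated in the lead-in).

def branch_reach_bytes(displacement_bits):
    """Maximum forward or backward distance in bytes."""
    # One bit is the sign, the rest select a word offset.
    return (1 << (displacement_bits - 1)) * 4

MB = 1024 * 1024
reach = branch_reach_bytes(22)
kernel_size = 23 * MB  # size reported for the all-built-in kernel

print("22-bit reach in MB:", reach // MB)
print("kernel larger than reach:", kernel_size > reach)
```

Under these assumptions a call from one end of such an image to the other cannot be encoded in a single short branch, which matches the link failures described above.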

In addition to the argument registers, the argument pointer needs to be 
loaded for each call.

>
> With -mlong-calls it is much more complex:
> .LC0:
>          .dword  P%external_func
> .globl a
> a:
>          addil LT'.LC0,%r27
>          ldd RT'.LC0(%r1),%r28
>          ldd 0(%r28),%r28
>          ldd 16(%r28),%r2
>          bve,l (%r2),%r2
This is the standard 64-bit indirect call.  It calls via a function
descriptor.  It assumes the PIC register may change
and the callee may be in a different space (i.e., 64-bit hpux runtime).
The bve instruction is specific to PA 2.0.

In the kernel, we probably don't need the load of the new PIC register
(omitted from the above).

>
>
> Since our kernel is running in the first 4GB of RAM (even on 64bit), couldn't we instead
> introduce a gcc option, e.g. "-mkernel-indirect-calls", which translates to:
>          ldil    L%external_func, %r2        // R_PARISC_DIR21L
>          ldo     R%external_func(%r2), %r2   // R_PARISC_DIR14R
>          bve,l (%r2),%r2
Another option is to use ble (i.e., call sequence generated using 
-mfast-indirect-calls).  It yields the same length
call sequence as your above sequence and it works on both PA 1.x and 2.0.

The above sequence is not PIC.  What about modules?

In the above three sequences, there is a delay slot after the branch 
which might be filled by the compiler with a
useful instruction.
>
> Does -mfast-indirect-calls have any effect at all?
> I haven't seen any difference when using this option.
At the moment, this option only applies to the 32-bit compiler.
>
> Thoughts?

I don't remember any huge increase in gcc build time with -mlong-calls.  
Calls don't usually dominate performance.

Dave


* Re: Re: C3600 kernel/64bit 4.* slow IO due to -mlong-calls
  2018-03-16 11:25     ` Aw: " Helge Deller
  2018-03-16 13:37       ` John David Anglin
@ 2018-03-16 17:29       ` Matt Turner
  2018-03-16 17:44         ` John David Anglin
  1 sibling, 1 reply; 11+ messages in thread
From: Matt Turner @ 2018-03-16 17:29 UTC
  To: Helge Deller; +Cc: John David Anglin, debian-hppa, linux-parisc

On Fri, Mar 16, 2018 at 4:25 AM, Helge Deller <deller@gmx.de> wrote:
>> >> kernel  gcc     binutils    with mlong    without mlong
>> >> 4.15.7  4.9.3   2.25.1     13.4 MB/s    27.0 MB/s
>> >> 4.15.7  6.4.0   2.25.1     13.4 MB/s    27.0 MB/s
>> >> 4.15.7  6.4.0   2.29.1     14.4 MB/s    25.0 MB/s
>> > Interesting bad results!
>> >
>> It's hard to understand why the performance would deteriorate so much
>> but I see essentially the same behavior.
>
> Speaking of the Debian kernel, it's nearly impossible to link a kernel without mlong-calls.

This week I succeeded in building a stripped-down kernel without
mlong-calls (in Gentoo), but was unable to get anything to link
without mlong-calls when CONFIG_PARISC_PAGE_SIZE_16KB=y.

With mlong-calls, I couldn't get a 16K page-size kernel to boot
either. Is this a configuration anyone uses or tests? 16K pages are
supposed to give better performance, so it'd be good if they worked.


* Re: C3600 kernel/64bit 4.* slow IO due to -mlong-calls
  2018-03-16 17:29       ` Matt Turner
@ 2018-03-16 17:44         ` John David Anglin
  2018-03-16 17:58           ` Matt Turner
  0 siblings, 1 reply; 11+ messages in thread
From: John David Anglin @ 2018-03-16 17:44 UTC
  To: Matt Turner, Helge Deller; +Cc: debian-hppa, linux-parisc

On 2018-03-16 1:29 PM, Matt Turner wrote:
> This week I succeeded in building a stripped-down kernel without
> mlong-calls (in Gentoo), but was unable to get anything to link
> without mlong-calls when CONFIG_PARISC_PAGE_SIZE_16KB=y.
>
I wouldn't recommend the above option.  It affects alignment of some 
things in kernel
and as you found it makes the kernel bigger.  There are also some things 
in userspace
that assume 4KB pages.

Dave


* Re: C3600 kernel/64bit 4.* slow IO due to -mlong-calls
  2018-03-16 17:44         ` John David Anglin
@ 2018-03-16 17:58           ` Matt Turner
  2018-03-16 18:03             ` John David Anglin
  0 siblings, 1 reply; 11+ messages in thread
From: Matt Turner @ 2018-03-16 17:58 UTC
  To: John David Anglin; +Cc: Helge Deller, debian-hppa, linux-parisc

On Fri, Mar 16, 2018 at 10:44 AM, John David Anglin
<dave.anglin@bell.net> wrote:
> On 2018-03-16 1:29 PM, Matt Turner wrote:
>>
>> This week I succeeded in building a stripped-down kernel without
>> mlong-calls (in Gentoo), but was unable to get anything to link
>> without mlong-calls when CONFIG_PARISC_PAGE_SIZE_16KB=y.
>>
> I wouldn't recommend the above option.  It affects alignment of some things
> in kernel
> and as you found it makes the kernel bigger.  There are also some things in
> userspace
> that assume 4KB pages.

I expect we're past the point where there are any significant
obstacles to userspace support. Most platforms successfully support
multiple page sizes these days.


* Re: C3600 kernel/64bit 4.* slow IO due to -mlong-calls
  2018-03-16 17:58           ` Matt Turner
@ 2018-03-16 18:03             ` John David Anglin
  2018-03-16 19:50               ` Helge Deller
  0 siblings, 1 reply; 11+ messages in thread
From: John David Anglin @ 2018-03-16 18:03 UTC
  To: Matt Turner; +Cc: Helge Deller, debian-hppa, linux-parisc

On 2018-03-16 1:58 PM, Matt Turner wrote:
> On Fri, Mar 16, 2018 at 10:44 AM, John David Anglin
> <dave.anglin@bell.net> wrote:
>> On 2018-03-16 1:29 PM, Matt Turner wrote:
>>> This week I succeeded in building a stripped-down kernel without
>>> mlong-calls (in Gentoo), but was unable to get anything to link
>>> without mlong-calls when CONFIG_PARISC_PAGE_SIZE_16KB=y.
>>>
>> I wouldn't recommend the above option.  It affects alignment of some things
>> in kernel
>> and as you found it makes the kernel bigger.  There are also some things in
>> userspace
>> that assume 4KB pages.
> I expect we're past the point where there are any significant
> obstacles to userspace support. Most platforms successfully support
> multiple page sizes these days.
>
But they don't have the problem we do with non-equivalent aliasing. The
data section starts on a
page boundary on parisc and it doesn't overlap text.

Dave


* Re: C3600 kernel/64bit 4.* slow IO due to -mlong-calls
  2018-03-16 18:03             ` John David Anglin
@ 2018-03-16 19:50               ` Helge Deller
  0 siblings, 0 replies; 11+ messages in thread
From: Helge Deller @ 2018-03-16 19:50 UTC
  To: John David Anglin, Matt Turner; +Cc: debian-hppa, linux-parisc

On 16.03.2018 19:03, John David Anglin wrote:
> On 2018-03-16 1:58 PM, Matt Turner wrote:
>> On Fri, Mar 16, 2018 at 10:44 AM, John David Anglin
>> <dave.anglin@bell.net> wrote:
>>> On 2018-03-16 1:29 PM, Matt Turner wrote:
>>>> This week I succeeded in building a stripped-down kernel without
>>>> mlong-calls (in Gentoo), but was unable to get anything to link
>>>> without mlong-calls when CONFIG_PARISC_PAGE_SIZE_16KB=y.
>>>>
>>> I wouldn't recommend the above option.  It affects alignment of some things
>>> in kernel
>>> and as you found it makes the kernel bigger.  There are also some things in
>>> userspace
>>> that assume 4KB pages.
>> I expect we're past the point where there are any significant
>> obstacles to userspace support. Most platforms successfully support
>> multiple page sizes these days.
>>
> But they don't have the problem we do with non-equivalent aliasing. The data section starts on a
> page boundary on parisc and it doesn't overlap text.

I'd be astonished if anything other than a 4 kB page size were able to
boot to a login prompt.
For example, I know the parisc PCI-specific code (dino, lba, ...) still
depends on 4 kB page sizes.
I've left the CONFIG options in the source in the hope somebody
will try to finish >4kb page support at some point.

Helge


* Re: Aw: Re: C3600 kernel/64bit 4.* slow IO due to -mlong-calls
  2018-03-16 13:37       ` John David Anglin
@ 2018-03-18 13:31         ` John David Anglin
  0 siblings, 0 replies; 11+ messages in thread
From: John David Anglin @ 2018-03-18 13:31 UTC
  To: Helge Deller; +Cc: debian-hppa, linux-parisc

On 2018-03-16 9:37 AM, John David Anglin wrote:
> On 2018-03-16 7:25 AM, Helge Deller wrote:
>>>>> kernel  gcc     binutils    with mlong    without mlong
>>>>> 4.15.7  4.9.3   2.25.1     13.4 MB/s    27.0 MB/s
>>>>> 4.15.7  6.4.0   2.25.1     13.4 MB/s    27.0 MB/s
>>>>> 4.15.7  6.4.0   2.29.1     14.4 MB/s    25.0 MB/s
>>>> Interesting bad results!
>>> It's hard to understand why the performance would deteriorate so much
>>> but I see essentially the same behavior.
The performance difference between with and without long calls is 
exaggerated by the I/O
test used for the above results.  I see 22:05 and 21:39 hours for a gcc 
build and check with
and without kernel long calls on c8000, respectively.
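
To put the two measurements side by side, here is a small worked calculation using only the numbers quoted in this thread. Note that the dd figures come from a C3600 and the build times from a c8000, so the comparison is indicative at best; the helper names are made up for this sketch.

```python
# Relative cost of long calls: dd throughput (C3600) versus
# gcc build-and-check wall time (c8000), using figures from the thread.

def to_minutes(hhmm):
    """Convert an 'HH:MM' duration string to minutes."""
    hours, minutes = hhmm.split(":")
    return int(hours) * 60 + int(minutes)

def percent_change(baseline, value):
    """Change of value relative to baseline, in percent."""
    return (value - baseline) / baseline * 100.0

build_with = to_minutes("22:05")     # with kernel long calls
build_without = to_minutes("21:39")  # without kernel long calls
build_overhead = percent_change(build_without, build_with)

# (with mlong, without mlong) in MB/s, one tuple per row of the table
dd_rows = [(13.4, 27.0), (13.4, 27.0), (14.4, 25.0)]
dd_drops = [-percent_change(without, with_) for with_, without in dd_rows]

print(f"gcc build+check takes {build_overhead:.1f}% longer with long calls")
for drop in dd_drops:
    print(f"dd throughput is {drop:.1f}% lower with long calls")
```

The build overhead comes out at about 2 percent, while the dd test loses roughly 42 to 50 percent of its throughput, which is the sense in which the I/O test exaggerates the difference.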

I think the poor performance of long calls is primarily due to the loads 
which can trigger
TLB misses.  This implies we should work to minimize the impact of TLB 
flushes.
Flushing the whole TLB is quite detrimental to overall performance and 
it doesn't scale
well to multiple CPUs.  On rp3440, a pdc instruction takes about 570 
cycles because of
the broadcast to other CPUs.  So, we need to know whether a mapping is 
local and possibly
the set of CPUs a mapping applies to.

>> Speaking of the Debian kernel, it's nearly impossible to link a kernel
>> without mlong-calls.
>>
>> Compiling without mlong-calls generates this (R_PARISC_PCREL22F):
>>          b,l external_func,%r2
>>          nop
> On PA 2.0, this is a 22-bit pc-relative call with a branch
> distance of 8 MB.  We have no stub support
> in the GNU 64-bit linker.  If we had stub support, this would be the best
> solution.
>
> In addition to the argument registers, the argument pointer needs to 
> be loaded for each call.
>
>>
>> With -mlong-calls it is much more complex:
>> .LC0:
>>          .dword  P%external_func
>> .globl a
>> a:
>>          addil LT'.LC0,%r27
>>          ldd RT'.LC0(%r1),%r28
>>          ldd 0(%r28),%r28
>>          ldd 16(%r28),%r2
>>          bve,l (%r2),%r2
> This is the standard 64-bit indirect call.  It calls via a function
> descriptor.  It assumes the PIC register may change
> and the callee may be in a different space (i.e., 64-bit hpux
> runtime).  The bve instruction is specific to PA 2.0.
>
> In the kernel, we probably don't need the load of the new PIC register
> (omitted from the above).
We don't think we need function descriptors in the kernel.  They are 
only needed to load a new PIC register.
So, we can load the function address directly from the linkage table.

         addil LT'external_function,%r27
         ldd RT'external_function(%r1),%r2
         bve,l (%r2),%r2
         Delay slot

The above sequence is PIC.  It is the same length as the one suggested 
by Helge below and the
linker could convert it to Helge's sequence when the call is not 
external to the main linux kernel.
It does have one load that might trigger a TLB miss.
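
For reference, the call sequences posted in this thread can be tallied by instruction count and by number of memory loads. The sketch below just counts mnemonics as they appear in the listings above; the delay slot is left out of every sequence, and the dictionary labels are descriptive names chosen here, not compiler options.

```python
# Instruction and load counts for the call sequences shown in the thread.
# Delay-slot instructions are excluded from every entry.

SEQUENCES = {
    "short call": ["b,l"],
    "-mlong-calls via function descriptor": [
        "addil", "ldd", "ldd", "ldd", "bve,l",
    ],
    "linkage-table load": ["addil", "ldd", "bve,l"],
    "absolute ldil/ldo": ["ldil", "ldo", "bve,l"],
}

def summarize(mnemonics):
    """Return (instruction count, number of ldd loads)."""
    loads = sum(1 for m in mnemonics if m == "ldd")
    return len(mnemonics), loads

for name, mnemonics in SEQUENCES.items():
    count, loads = summarize(mnemonics)
    print(f"{name}: {count} instructions, {loads} loads")
```

Each ldd is a memory access that can miss in the TLB, so the tally lines up with the statement that the linkage-table sequence has one such load, against three for the current -mlong-calls output and none for the absolute form.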

I don't know enough about the call sequences used to call functions in 
external modules but
it might be easier to do the relocation for the above.  It's probably 
already handled as the addil/ldd
sequence should already load the address of external_function.

It might also be possible to use a 32-bit PIC pc-relative sequence, but 
it is longer and 32-bit
pc-relative relocations might not be supported.

>
>>
>>
>> Since our kernel is running in the first 4GB of RAM (even on 64bit), 
>> couldn't we instead
>> introduce a gcc option, e.g. "-mkernel-indirect-calls", which 
>> translates to:
>>          ldil    L%external_func, %r2        // R_PARISC_DIR21L
>>          ldo     R%external_func(%r2), %r2   // R_PARISC_DIR14R
>>          bve,l (%r2),%r2
> Another option is to use ble (i.e., call sequence generated using 
> -mfast-indirect-calls).  It yields the same length
> call sequence as your above sequence and it works on both PA 1.x and 2.0.
>
> The above sequence is not PIC.  What about modules?
>
> In the above three sequences, there is a delay slot after the branch 
> which might be filled by the compiler with a
> useful instruction.
>>
>> Does -mfast-indirect-calls have any effect at all?
>> I haven't seen any difference when using this option.
> At the moment, this option only applies to the 32-bit compiler.
>>
>> Thoughts?
>
> I don't remember any huge increase in gcc build time with 
> -mlong-calls.  Calls don't usually dominate performance.
>
> Dave
>

