From: Baptiste Jonglez <baptiste@bitsofnetworks.org>
To: "René van Dorst" <opensource@vdorst.com>
Cc: wireguard@lists.zx2c4.com
Subject: Re: [WireGuard] News about MIPS and ARM optimized code?
Date: Fri, 9 Sep 2016 15:52:02 +0200	[thread overview]
Message-ID: <20160909135202.GA32666@lud.imag.fr> (raw)
In-Reply-To: <20160909134611.Horde.d1CtbRQrioV8yr-kI71aUI3@www.vdorst.com>

Nice work!  I had tried to write chacha20_generic_block in MIPS assembly,
but I got confused with endianness issues and the code didn't work in the
end.

Is your code available somewhere?  I'd be happy to test on a variety of
MIPS routers.

On Fri, Sep 09, 2016 at 01:46:11PM +0000, René van Dorst wrote:
> Due to misaligned data fetching, functions like poly1305 cause a
> regression on MIPS.
> 
> 	h0 += (le32_to_cpuvp(src +  0) >> 0) & 0x3ffffff;
> 	h1 += (le32_to_cpuvp(src +  3) >> 2) & 0x3ffffff;
> 	h2 += (le32_to_cpuvp(src +  6) >> 4) & 0x3ffffff;
> 	h3 += (le32_to_cpuvp(src +  9) >> 6) & 0x3ffffff;
> 	h4 += (le32_to_cpuvp(src + 12) >> 8) | hibit;
> 
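For anyone following along, here is a minimal sketch of what those five loads do. The offsets 3, 6 and 9 can never all be word-aligned, whatever the alignment of src, so some of the 32-bit fetches are always misaligned. The names load_le32 and poly1305_add_block are hypothetical, and the byte-wise load is only one portable way to do an unaligned little-endian read; it is an assumption, not how le32_to_cpuvp is implemented in the module and not René's fix.

```c
#include <stdint.h>

/* Hypothetical helper: read a 32-bit little-endian value one byte at a
 * time. Works at any alignment and on either host endianness. */
static inline uint32_t load_le32(const uint8_t *p)
{
	return (uint32_t)p[0] |
	       ((uint32_t)p[1] << 8) |
	       ((uint32_t)p[2] << 16) |
	       ((uint32_t)p[3] << 24);
}

/* Hypothetical wrapper around the quoted lines: add one 16-byte block to
 * the accumulator h[0..4] as five 26-bit limbs. hibit is OR-ed into the
 * top limb, as in the quoted code. */
static inline void poly1305_add_block(uint32_t h[5], const uint8_t src[16],
				      uint32_t hibit)
{
	h[0] += (load_le32(src +  0) >> 0) & 0x3ffffff;
	h[1] += (load_le32(src +  3) >> 2) & 0x3ffffff;
	h[2] += (load_le32(src +  6) >> 4) & 0x3ffffff;
	h[3] += (load_le32(src +  9) >> 6) & 0x3ffffff;
	h[4] += (load_le32(src + 12) >> 8) | hibit;
}
```

Whether byte loads are actually faster than the compiler's unaligned word access on a 24Kc would have to be measured on the device.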
> 
> I had 26 Mbit/s before; now I get over 42 Mbit/s.
> 
> root@lede:~# iperf3 -c 10.0.0.1 -i 10
> Connecting to host 10.0.0.1, port 5201
> [  4] local 10.0.0.2 port 36216 connected to 10.0.0.1 port 5201
> [ ID] Interval           Transfer     Bandwidth       Retr  Cwnd
> [  4]   0.00-10.08  sec  51.2 MBytes  42.7 Mbits/sec    0    171 KBytes
> - - - - - - - - - - - - - - - - - - - - - - - - -
> [ ID] Interval           Transfer     Bandwidth       Retr
> [  4]   0.00-10.08  sec  51.2 MBytes  42.7 Mbits/sec    0             sender
> [  4]   0.00-10.08  sec  51.2 MBytes  42.7 Mbits/sec                  receiver
> 
> iperf Done.
> root@lede:~# iperf3 -c 10.0.0.1 -u -b 1G -i 10
> Connecting to host 10.0.0.1, port 5201
> [  4] local 10.0.0.2 port 60714 connected to 10.0.0.1 port 5201
> [ ID] Interval           Transfer     Bandwidth       Total Datagrams
> [  4]   0.00-10.00  sec  56.3 MBytes  47.2 Mbits/sec  7209
> - - - - - - - - - - - - - - - - - - - - - - - - -
> [ ID] Interval           Transfer     Bandwidth       Jitter    Lost/Total Datagrams
> [  4]   0.00-10.00  sec  56.3 MBytes  47.2 Mbits/sec  0.034 ms  0/7209 (0%)
> [  4] Sent 7209 datagrams
> 
> iperf Done.
> root@lede:~#
> 
> 
> The work is not done yet, but it is a good start.
> 
> Greetings,
> 
> René van Dorst.
> 
> Quoting René van Dorst <opensource@vdorst.com>:
> 
> >I tried writing some MIPS32r2 code.
> >I wrote chacha20_keysetup, chacha20_generic_block and
> >poly1305_generic_blocks in assembly.
> >I tried to keep all needed variables in registers, which should reduce
> >the memory overhead.
> >But it is very difficult for me to profile the code, or to isolate it
> >and build benchmark programs like supercop.
> >So testing was simple: cross-compile the code, copy and load the module
> >on the target, then run the setup script and iperf.
> >
> >#ifdef CONFIG_CPU_MIPS32_R2
> >asmlinkage void chacha20_keysetup(struct chacha20_ctx *ctx,
> >		const u8 key[static 32], const u8 nonce[static 8]);
> >asmlinkage void chacha20_generic_block(struct chacha20_ctx *ctx);
> >asmlinkage unsigned int poly1305_generic_blocks(struct poly1305_ctx *ctx,
> >		const u8 *src, unsigned int srclen, u32 hibit);
> >#endif
> >
> >But the speed is equal or lower on my TP-Link WR1043ND, which is a
> >big-endian MIPS32r2 24Kc.
> >So GCC does a good job. Also, the 24Kc has no special coprocessors or FPU.
> >
> >The biggest improvement I got was from changing the buildroot default
> >optimization from -Os to -O2.
> >This gives around a 1-3% speed improvement.
> >
> >Ideas:
> >- Remove the little-endian conversions on MIPS.
> >  Of course, the other side would have to do the same.
> >  On this device I can't switch endianness.
> >  But I did not see any improvement. Swapping a 32-bit register needs
> >  2 instructions.
> >  By a quick calculation it could save around 0.4%, which is ~0.1 Mbit/s
> >  on this device.
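As an aside on the two-instruction swap: my assumption is that this refers to the MIPS32r2 pair wsbh (swap the bytes within each halfword) followed by rotr by 16 (swap the two halfwords). The sketch below models those two steps in portable C so the byte movement is easy to follow. The function names are hypothetical, and a compiler targeting MIPS32r2 may or may not turn them into exactly those instructions.

```c
#include <stdint.h>

/* Step 1 (models wsbh): swap the two bytes inside each 16-bit halfword.
 * 0x11223344 -> 0x22114433 */
static inline uint32_t swap_bytes_in_halfwords(uint32_t x)
{
	return ((x & 0x00ff00ffu) << 8) | ((x >> 8) & 0x00ff00ffu);
}

/* Step 2 (models rotr by 16): exchange the upper and lower halfwords.
 * 0x22114433 -> 0x44332211 */
static inline uint32_t rotate_right_16(uint32_t x)
{
	return (x >> 16) | (x << 16);
}

/* Full 32-bit byte swap built from the two steps above. */
static inline uint32_t bswap32_two_step(uint32_t x)
{
	return rotate_right_16(swap_bytes_in_halfwords(x));
}
```

With only two register-to-register operations per word, it is plausible that removing the swaps saves as little as the ~0.4% estimated above.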
> >
> >Greetings,
> >
> >René van Dorst.
> >
> >_______________________________________________
> >WireGuard mailing list
> >WireGuard@lists.zx2c4.com
> >http://lists.zx2c4.com/mailman/listinfo/wireguard
