linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Greg KH <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org, stable@vger.kernel.org
Cc: torvalds@linux-foundation.org, akpm@linux-foundation.org,
	alan@lxorguk.ukuu.org.uk, Alexey Dobriyan <adobriyan@gmail.com>,
	Herbert Xu <herbert@gondor.hengli.com.au>
Subject: [11/65] crypto: sha512 - reduce stack usage to safe number
Date: Wed, 01 Feb 2012 12:55:51 -0800	[thread overview]
Message-ID: <20120201205738.452643756@clark.kroah.org> (raw)
In-Reply-To: <20120201210236.GA25966@kroah.com>

3.0-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Alexey Dobriyan <adobriyan@gmail.com>

commit 51fc6dc8f948047364f7d42a4ed89b416c6cc0a3 upstream.

For rounds 16--79, W[i] only depends on W[i - 2], W[i - 7], W[i - 15] and W[i - 16].
Consequently, keeping all W[80] array on stack is unnecessary,
only 16 values are really needed.

Using W[16] instead of W[80] greatly reduces stack usage
(~750 bytes to ~340 bytes on x86_64).

Line by line explanation:
* BLEND_OP
  array is "circular" now, all indexes have to be modulo 16.
  Round number is positive, so remainder operation should be
  without surprises.

* initial full message scheduling is trimmed to first 16 values which
  come from data block, the rest is calculated before it's needed.

* original loop body is unrolled version of new SHA512_0_15 and
  SHA512_16_79 macros, unrolling was done to not do explicit variable
  renaming. Otherwise it's the very same code after preprocessing.
  See sha1_transform() code which does the same trick.

Patch survives in-tree crypto test and original bugreport test
(ping flood with hmac(sha512).

See FIPS 180-2 for SHA-512 definition
http://csrc.nist.gov/publications/fips/fips180-2/fips180-2withchangenotice.pdf

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 crypto/sha512_generic.c |   58 ++++++++++++++++++++++++++++--------------------
 1 file changed, 34 insertions(+), 24 deletions(-)

--- a/crypto/sha512_generic.c
+++ b/crypto/sha512_generic.c
@@ -78,7 +78,7 @@ static inline void LOAD_OP(int I, u64 *W
 
 static inline void BLEND_OP(int I, u64 *W)
 {
-	W[I] = s1(W[I-2]) + W[I-7] + s0(W[I-15]) + W[I-16];
+	W[I % 16] += s1(W[(I-2) % 16]) + W[(I-7) % 16] + s0(W[(I-15) % 16]);
 }
 
 static void
@@ -87,38 +87,48 @@ sha512_transform(u64 *state, const u8 *i
 	u64 a, b, c, d, e, f, g, h, t1, t2;
 
 	int i;
-	u64 W[80];
+	u64 W[16];
 
 	/* load the input */
         for (i = 0; i < 16; i++)
                 LOAD_OP(i, W, input);
 
-        for (i = 16; i < 80; i++) {
-                BLEND_OP(i, W);
-        }
-
 	/* load the state into our registers */
 	a=state[0];   b=state[1];   c=state[2];   d=state[3];
 	e=state[4];   f=state[5];   g=state[6];   h=state[7];
 
-	/* now iterate */
-	for (i=0; i<80; i+=8) {
-		t1 = h + e1(e) + Ch(e,f,g) + sha512_K[i  ] + W[i  ];
-		t2 = e0(a) + Maj(a,b,c);    d+=t1;    h=t1+t2;
-		t1 = g + e1(d) + Ch(d,e,f) + sha512_K[i+1] + W[i+1];
-		t2 = e0(h) + Maj(h,a,b);    c+=t1;    g=t1+t2;
-		t1 = f + e1(c) + Ch(c,d,e) + sha512_K[i+2] + W[i+2];
-		t2 = e0(g) + Maj(g,h,a);    b+=t1;    f=t1+t2;
-		t1 = e + e1(b) + Ch(b,c,d) + sha512_K[i+3] + W[i+3];
-		t2 = e0(f) + Maj(f,g,h);    a+=t1;    e=t1+t2;
-		t1 = d + e1(a) + Ch(a,b,c) + sha512_K[i+4] + W[i+4];
-		t2 = e0(e) + Maj(e,f,g);    h+=t1;    d=t1+t2;
-		t1 = c + e1(h) + Ch(h,a,b) + sha512_K[i+5] + W[i+5];
-		t2 = e0(d) + Maj(d,e,f);    g+=t1;    c=t1+t2;
-		t1 = b + e1(g) + Ch(g,h,a) + sha512_K[i+6] + W[i+6];
-		t2 = e0(c) + Maj(c,d,e);    f+=t1;    b=t1+t2;
-		t1 = a + e1(f) + Ch(f,g,h) + sha512_K[i+7] + W[i+7];
-		t2 = e0(b) + Maj(b,c,d);    e+=t1;    a=t1+t2;
+#define SHA512_0_15(i, a, b, c, d, e, f, g, h)			\
+	t1 = h + e1(e) + Ch(e, f, g) + sha512_K[i] + W[i];	\
+	t2 = e0(a) + Maj(a, b, c);				\
+	d += t1;						\
+	h = t1 + t2
+
+#define SHA512_16_79(i, a, b, c, d, e, f, g, h)			\
+	BLEND_OP(i, W);						\
+	t1 = h + e1(e) + Ch(e, f, g) + sha512_K[i] + W[(i)%16];	\
+	t2 = e0(a) + Maj(a, b, c);				\
+	d += t1;						\
+	h = t1 + t2
+
+	for (i = 0; i < 16; i += 8) {
+		SHA512_0_15(i, a, b, c, d, e, f, g, h);
+		SHA512_0_15(i + 1, h, a, b, c, d, e, f, g);
+		SHA512_0_15(i + 2, g, h, a, b, c, d, e, f);
+		SHA512_0_15(i + 3, f, g, h, a, b, c, d, e);
+		SHA512_0_15(i + 4, e, f, g, h, a, b, c, d);
+		SHA512_0_15(i + 5, d, e, f, g, h, a, b, c);
+		SHA512_0_15(i + 6, c, d, e, f, g, h, a, b);
+		SHA512_0_15(i + 7, b, c, d, e, f, g, h, a);
+	}
+	for (i = 16; i < 80; i += 8) {
+		SHA512_16_79(i, a, b, c, d, e, f, g, h);
+		SHA512_16_79(i + 1, h, a, b, c, d, e, f, g);
+		SHA512_16_79(i + 2, g, h, a, b, c, d, e, f);
+		SHA512_16_79(i + 3, f, g, h, a, b, c, d, e);
+		SHA512_16_79(i + 4, e, f, g, h, a, b, c, d);
+		SHA512_16_79(i + 5, d, e, f, g, h, a, b, c);
+		SHA512_16_79(i + 6, c, d, e, f, g, h, a, b);
+		SHA512_16_79(i + 7, b, c, d, e, f, g, h, a);
 	}
 
 	state[0] += a; state[1] += b; state[2] += c; state[3] += d;



  parent reply	other threads:[~2012-02-01 21:48 UTC|newest]

Thread overview: 66+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-02-01 21:02 [00/65] 3.0.19-stable review Greg KH
2012-02-01 20:55 ` [01/65] ALSA: hda - Fix silent outputs from docking-station jacks of Dell laptops Greg KH
2012-02-01 20:55 ` [02/65] eCryptfs: Sanitize write counts of /dev/ecryptfs Greg KH
2012-02-01 20:55 ` [03/65] ecryptfs: Improve metadata read failure logging Greg KH
2012-02-01 20:55 ` [04/65] eCryptfs: Make truncate path killable Greg KH
2012-02-01 20:55 ` [05/65] eCryptfs: Check inode changes in setattr Greg KH
2012-02-01 20:55 ` [06/65] eCryptfs: Fix oops when printing debug info in extent crypto functions Greg KH
2012-02-01 20:55 ` [07/65] drm/radeon/kms: Add an MSI quirk for Dell RS690 Greg KH
2012-02-01 20:55 ` [08/65] drm: Fix authentication kernel crash Greg KH
2012-02-01 20:55 ` [09/65] xfs: Fix missing xfs_iunlock() on error recovery path in xfs_readlink() Greg KH
2012-02-01 20:55 ` [10/65] crypto: sha512 - make it work, undo percpu message schedule Greg KH
2012-02-01 20:55 ` Greg KH [this message]
2012-02-01 20:55 ` [12/65] ftrace: Balance records when updating the hash Greg KH
2012-02-01 20:55 ` [13/65] ftrace: Update filter when tracing enabled in set_ftrace_filter() Greg KH
2012-02-01 20:55 ` [14/65] ftrace: Fix unregister ftrace_ops accounting Greg KH
2012-02-01 20:55 ` [15/65] ah: Dont return NET_XMIT_DROP on input Greg KH
2012-02-01 20:55 ` [16/65] xfs: fix endian conversion issue in discard code Greg KH
2012-02-01 20:55 ` [17/65] x86/uv: Fix uv_gpa_to_soc_phys_ram() shift Greg KH
2012-02-01 20:55 ` [18/65] x86/microcode_amd: Add support for CPU family specific container files Greg KH
2012-02-01 20:55 ` [19/65] ALSA: hda - Fix silent output on ASUS A6Rp Greg KH
2012-02-01 20:56 ` [20/65] ALSA: hda - Fix silent output on Haier W18 laptop Greg KH
2012-02-01 20:56 ` [21/65] drm/i915/sdvo: always set positive sync polarity Greg KH
2012-02-01 20:56 ` [22/65] cap_syslog: dont use WARN_ONCE for CAP_SYS_ADMIN deprecation warning Greg KH
2012-02-01 20:56 ` [23/65] mach-ux500: enable ARM errata 764369 Greg KH
2012-02-01 20:56 ` [24/65] ARM: 7296/1: proc-v7.S: remove HARVARD_CACHE preprocessor guards Greg KH
2012-02-01 20:56 ` [25/65] USB: option: Add LG docomo L-02C Greg KH
2012-02-01 20:56 ` [26/65] USB: ftdi_sio: fix TIOCSSERIAL baud_base handling Greg KH
2012-02-01 20:56 ` [27/65] USB: ftdi_sio: fix initial baud rate Greg KH
2012-02-01 20:56 ` [28/65] USB: ftdi_sio: add PID for TI XDS100v2 / BeagleBone A3 Greg KH
2012-02-01 20:56 ` [29/65] USB: serial: ftdi additional IDs Greg KH
2012-02-01 20:56 ` [30/65] USB: ftdi_sio: Add more identifiers Greg KH
2012-02-01 20:56 ` [31/65] USB: cdc-wdm: updating desc->length must be protected by spin_lock Greg KH
2012-02-01 20:56 ` [32/65] USB: cdc-wdm: use two mutexes to allow simultaneous read and write Greg KH
2012-02-01 20:56 ` [33/65] qcaux: add more Pantech UML190 and UML290 ports Greg KH
2012-02-01 20:56 ` [34/65] usb: io_ti: Make edge_remove_sysfs_attrs the port_remove method Greg KH
2012-02-01 20:56 ` [35/65] TTY: fix UV serial console regression Greg KH
2012-02-01 20:56 ` [36/65] serial: amba-pl011: lock console writes against interrupts Greg KH
2012-02-01 20:56 ` [37/65] jsm: Fixed EEH recovery error Greg KH
2012-02-01 20:56 ` [38/65] vmwgfx: Fix assignment in vmw_framebuffer_create_handle Greg KH
2012-02-01 20:56 ` [39/65] USB: usbsevseg: fix max length Greg KH
2012-02-01 20:56 ` [40/65] drivers/usb/host/ehci-fsl.c: add missing iounmap Greg KH
2012-02-01 20:56 ` [41/65] xhci: Fix USB 3.0 device restart on resume Greg KH
2012-02-01 20:56 ` [42/65] xHCI: Cleanup isoc transfer ring when TD length mismatch found Greg KH
2012-02-01 20:56 ` [43/65] hwmon: (f71805f) Fix clamping of temperature limits Greg KH
2012-02-01 20:56 ` [44/65] hwmon: (w83627ehf) Disable setting DC mode for pwm2, pwm3 on NCT6776F Greg KH
2012-02-01 20:56 ` [45/65] hwmon: (sht15) fix bad error code Greg KH
2012-02-01 20:56 ` [46/65] USB: cdc-wdm: call wake_up_all to allow driver to shutdown on device removal Greg KH
2012-02-01 20:56 ` [47/65] USB: cdc-wdm: better allocate a buffer that is at least as big as we tell the USB core Greg KH
2012-02-01 20:56 ` [48/65] USB: cdc-wdm: Avoid hanging on interface with no USB_CDC_DMM_TYPE Greg KH
2012-02-01 20:56 ` [49/65] netns: fix net_alloc_generic() Greg KH
2012-02-01 20:56 ` [50/65] netns: Fail conspicously if someone uses net_generic at an inappropriate time Greg KH
2012-02-01 20:56 ` [51/65] net caif: Register properly as a pernet subsystem Greg KH
2012-02-01 20:56 ` [52/65] bonding: fix enslaving in alb mode when link down Greg KH
2012-02-01 20:56 ` [53/65] l2tp: l2tp_ip - fix possible oops on packet receive Greg KH
2012-02-01 20:56 ` [54/65] net: bpf_jit: fix divide by 0 generation Greg KH
2012-02-01 20:56 ` [55/65] rds: Make rds_sock_lock BH rather than IRQ safe Greg KH
2012-02-01 20:56 ` [56/65] tcp: fix tcp_trim_head() to adjust segment count with skb MSS Greg KH
2012-02-01 20:56 ` [57/65] tcp: md5: using remote adress for md5 lookup in rst packet Greg KH
2012-02-01 20:56 ` [58/65] USB: serial: CP210x: Added USB-ID for the Link Instruments MSO-19 Greg KH
2012-02-01 20:56 ` [59/65] USB: cp210x: call generic open last in open Greg KH
2012-02-01 20:56 ` [60/65] USB: cp210x: fix CP2104 baudrate usage Greg KH
2012-02-01 20:56 ` [61/65] USB: cp210x: do not map baud rates to B0 Greg KH
2012-02-01 20:56 ` [62/65] USB: cp210x: fix up set_termios variables Greg KH
2012-02-01 20:56 ` [63/65] USB: cp210x: clean up, refactor and document speed handling Greg KH
2012-02-01 20:56 ` [64/65] USB: cp210x: initialise baud rate at open Greg KH
2012-02-01 20:56 ` [65/65] USB: cp210x: allow more baud rates above 1Mbaud Greg KH

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20120201205738.452643756@clark.kroah.org \
    --to=gregkh@linuxfoundation.org \
    --cc=adobriyan@gmail.com \
    --cc=akpm@linux-foundation.org \
    --cc=alan@lxorguk.ukuu.org.uk \
    --cc=herbert@gondor.hengli.com.au \
    --cc=linux-kernel@vger.kernel.org \
    --cc=stable@vger.kernel.org \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).