* Re: [sparc64] crc32c misbehave
From: Anatoly Pugachev @ 2017-05-31 11:56 UTC (permalink / raw)
  To: Sparc kernel list; +Cc: Eric Sandeen, linux-crypto

Hello!

While debugging occasional crc32c checksum errors with xfs disk reads on
sparc64 (T5 [sun4v] 3.6 GHz CPU ldom, debian unstable/sid), Eric has found
that crc32c sometimes returns a wrong checksum for data. Eric made a simple
test kernel module (included), which produces the following results on my
sparc64 machines:

# cat test3.c
#include <linux/module.h>
#include <linux/printk.h>
#include <linux/init.h>
#include <linux/ktime.h>
#include <linux/string.h>
#include <linux/crc32c.h>

#define CRC_SEED (~(u32)0)

static int __init mymodule_init(void)
{
	char data[512];
	u32 oldcrc = 0xd00dface, crc = 0xdeadbeef;
	int i;
	u64 nsec;

	memset(data, 0, 512);

	nsec = ktime_get_ns();
	for (i = 0; i < 1000000; i++) {
		/* Same seed, same all-zero buffer: crc must never change. */
		crc = crc32c(CRC_SEED, data, 512);

		if (i > 0 && crc != oldcrc)
			printk("i: %d: oldcrc: 0x%x, crc: 0x%x\n", i, oldcrc, crc);
		oldcrc = crc;
	}
	nsec = ktime_get_ns() - nsec;

	/* nsec is a u64, so print it with %llu */
	printk("Loop done in %llu nsec\n", nsec);
	return 0;
}

static void __exit mymodule_exit(void)
{
	printk("Module uninitialized successfully \n");
}

module_init(mymodule_init);
module_exit(mymodule_exit);
MODULE_LICENSE("GPL");


root@ttip# modprobe libcrc32c
root@ttip# for i in `seq 1 100`; do echo -n "$i "; insmod ./test3.ko; sleep 1; rmmod test3; done
# journalctl -k -f

CONFIG_CRYPTO_CRC32C = M
CONFIG_CRYPTO_CRC32C_SPARC64 = M

# lsmod| grep crc
crc32test               1557  0
libcrc32c               1382  3 nf_conntrack,xfs,nf_nat
crc32c_generic          2528  0
crc16                   1745  1 ext4
crc32c_sparc64          3493  3


May 31 12:35:13 ttip kernel: Module uninitialized successfully
May 31 12:35:13 ttip kernel: Loop done in 139269659 nsec
May 31 12:35:14 ttip kernel: Module uninitialized successfully
May 31 12:35:15 ttip kernel: Loop done in 139650571 nsec
May 31 12:35:16 ttip kernel: Module uninitialized successfully
May 31 12:35:16 ttip kernel: Loop done in 139559959 nsec
May 31 12:35:17 ttip kernel: Module uninitialized successfully
May 31 12:35:17 ttip kernel: Loop done in 139212192 nsec
May 31 12:35:18 ttip kernel: Module uninitialized successfully
May 31 12:35:18 ttip kernel: Loop done in 139619805 nsec
May 31 12:35:19 ttip kernel: Module uninitialized successfully
May 31 12:35:20 ttip kernel: Loop done in 139558722 nsec
May 31 12:35:21 ttip kernel: Module uninitialized successfully
May 31 12:35:21 ttip kernel: i: 34706: oldcrc: 0xcf03123f, crc: 0x0
May 31 12:35:21 ttip kernel: i: 34707: oldcrc: 0x0, crc: 0xcf03123f
May 31 12:35:36 ttip kernel: Module uninitialized successfully
May 31 12:35:37 ttip kernel: i: 695650: oldcrc: 0xcf03123f, crc: 0x0
May 31 12:35:37 ttip kernel: i: 695651: oldcrc: 0x0, crc: 0xcf03123f
May 31 12:36:24 ttip kernel: Module uninitialized successfully
May 31 12:36:24 ttip kernel: i: 664460: oldcrc: 0xcf03123f, crc: 0x0
May 31 12:36:24 ttip kernel: i: 664461: oldcrc: 0x0, crc: 0xcf03123f

Another run, with a kernel configured as:
CONFIG_CRYPTO_CRC32C = M
CONFIG_CRYPTO_CRC32C_SPARC64 is not set

# lsmod | grep crc
libcrc32c               1382  3 nf_conntrack,xfs,nf_nat
crc32c_generic          2528  3
crc16                   1745  1 ext4

May 31 12:57:26 ttip kernel: test3: loading out-of-tree module taints kernel.
May 31 12:57:26 ttip kernel: Loop done in 439555353 nsec
May 31 12:57:27 ttip kernel: Module uninitialized successfully
May 31 12:57:28 ttip kernel: Loop done in 441111064 nsec
May 31 12:57:29 ttip kernel: Module uninitialized successfully
May 31 12:57:29 ttip kernel: Loop done in 439476126 nsec
May 31 12:57:30 ttip kernel: Module uninitialized successfully
May 31 12:57:31 ttip kernel: Loop done in 440995512 nsec
May 31 12:57:32 ttip kernel: Module uninitialized successfully
May 31 12:57:33 ttip kernel: Loop done in 439825440 nsec
May 31 12:57:34 ttip kernel: Module uninitialized successfully
May 31 12:57:34 ttip kernel: i: 293384: oldcrc: 0xcf03123f, crc: 0x0
May 31 12:57:34 ttip kernel: i: 293385: oldcrc: 0x0, crc: 0xcf03123f
May 31 12:57:34 ttip kernel: Loop done in 439500110 nsec
May 31 12:57:35 ttip kernel: Module uninitialized successfully
May 31 13:02:26 ttip kernel: i: 293577: oldcrc: 0xcf03123f, crc: 0x0
May 31 13:02:26 ttip kernel: i: 293578: oldcrc: 0x0, crc: 0xcf03123f
May 31 13:02:26 ttip kernel: i: 515055: oldcrc: 0xcf03123f, crc: 0x0
May 31 13:02:26 ttip kernel: i: 515056: oldcrc: 0x0, crc: 0xcf03123f
May 31 13:03:15 ttip kernel: Module uninitialized successfully
May 31 13:03:15 ttip kernel: i: 259986: oldcrc: 0xcf03123f, crc: 0x0
May 31 13:03:15 ttip kernel: i: 259987: oldcrc: 0x0, crc: 0xcf03123f
May 31 13:03:16 ttip kernel: Loop done in 449601790 nsec


Cycle-loading the crc32test module (CONFIG_CRC32_SELFTEST) with the same
loop (for i in `seq 1 100`) shows all tests passing (no errors).

Running on an older sparc64 (sun4u) v215 machine (1.5 GHz CPU) hits the
wrong crc error immediately:

# lsmod | grep crc
crc16                   1591  1 ext4
libcrc32c               1234  1 raid456
crc32c_generic          2270  1

root@v215# journalctl  -k -b
May 31 14:32:13 v215 kernel: systemd: 28 output lines suppressed due to ratelimiting
May 31 14:36:34 v215 kernel: test3: loading out-of-tree module taints kernel.
May 31 14:36:35 v215 kernel: i: 99466: oldcrc: 0xcf03123f, crc: 0x0
May 31 14:36:35 v215 kernel: i: 99467: oldcrc: 0x0, crc: 0xcf03123f
May 31 14:36:35 v215 kernel: Loop done in 1017018626 nsec
May 31 14:36:36 v215 kernel: Module uninitialized successfully
May 31 14:36:37 v215 kernel: i: 320351: oldcrc: 0xcf03123f, crc: 0x0
May 31 14:36:37 v215 kernel: i: 320352: oldcrc: 0x0, crc: 0xcf03123f
May 31 14:36:38 v215 kernel: Loop done in 1019229670 nsec
May 31 14:36:39 v215 kernel: Module uninitialized successfully
May 31 14:36:39 v215 kernel: i: 275151: oldcrc: 0xcf03123f, crc: 0x0
May 31 14:36:39 v215 kernel: i: 275152: oldcrc: 0x0, crc: 0xcf03123f
May 31 14:36:40 v215 kernel: Loop done in 1019406833 nsec
May 31 14:36:41 v215 kernel: Module uninitialized successfully
May 31 14:36:41 v215 kernel: i: 356512: oldcrc: 0xcf03123f, crc: 0x0
May 31 14:36:41 v215 kernel: i: 356513: oldcrc: 0x0, crc: 0xcf03123f
May 31 14:36:42 v215 kernel: Loop done in 1019136933 nsec
May 31 14:36:43 v215 kernel: Module uninitialized successfully
May 31 14:36:43 v215 kernel: i: 243633: oldcrc: 0xcf03123f, crc: 0x0
May 31 14:36:43 v215 kernel: i: 243634: oldcrc: 0x0, crc: 0xcf03123f
May 31 14:36:44 v215 kernel: i: 409279: oldcrc: 0xcf03123f, crc: 0x0
May 31 14:36:44 v215 kernel: i: 409280: oldcrc: 0x0, crc: 0xcf03123f
May 31 14:36:44 v215 kernel: i: 516166: oldcrc: 0xcf03123f, crc: 0x0
May 31 14:36:44 v215 kernel: i: 516167: oldcrc: 0x0, crc: 0xcf03123f
May 31 14:36:44 v215 kernel: Loop done in 1285740619 nsec
May 31 14:36:45 v215 kernel: Module uninitialized successfully

I also tested on x86_64 and ppc64 machines and can't reproduce it there.

Can someone please look at what is wrong with crc32c on sparc64?
Thanks!

* Re: [sparc64] crc32c misbehave
From: Anatoly Pugachev @ 2017-05-31 12:12 UTC (permalink / raw)
  To: Sparc kernel list; +Cc: Eric Sandeen, linux-crypto

A bit more on the test machines:

The kernel on the T5 ldom (ttip) is a git kernel:

Linux ttip 4.12.0-rc3-00011-gf511c0b17b08 #327 SMP Wed May 31 12:54:02 MSK 2017 sparc64 GNU/Linux

The kernel on the v215 is the Debian sid kernel:

Linux v215 4.9.0-3-sparc64-smp #1 SMP Debian 4.9.25-1 (2017-05-02) sparc64 GNU/Linux

mator@ttip:~$ gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/sparc64-linux-gnu/6/lto-wrapper
Target: sparc64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Debian
6.3.0-18' --with-bugurl=file:///usr/share/doc/gcc-6/README.Bugs
--enable-languages=c,ada,c++,java,go,d,fortran,objc,obj-c++
--prefix=/usr --program-suffix=-6 --program-prefix=sparc64-linux-gnu-
--enable-shared --enable-linker-build-id --libexecdir=/usr/lib
--without-included-gettext --enable-threads=posix --libdir=/usr/lib
--enable-nls --with-sysroot=/ --enable-clocale=gnu
--enable-libstdcxx-debug --enable-libstdcxx-time=yes
--with-default-libstdcxx-abi=new --enable-gnu-unique-object
--disable-libquadmath --enable-plugin --enable-default-pie
--with-system-zlib --disable-browser-plugin --enable-java-awt=gtk
--enable-gtk-cairo
--with-java-home=/usr/lib/jvm/java-1.5.0-gcj-6-sparc64/jre
--enable-java-home
--with-jvm-root-dir=/usr/lib/jvm/java-1.5.0-gcj-6-sparc64
--with-jvm-jar-dir=/usr/lib/jvm-exports/java-1.5.0-gcj-6-sparc64
--with-arch-directory=sparc64
--with-ecj-jar=/usr/share/java/eclipse-ecj.jar --enable-objc-gc=auto
--enable-multiarch --enable-targets=all --with-cpu-32=ultrasparc
--with-long-double-128 --enable-multilib --enable-checking=release
--build=sparc64-linux-gnu --host=sparc64-linux-gnu
--target=sparc64-linux-gnu
Thread model: posix
gcc version 6.3.0 20170516 (Debian 6.3.0-18)

* Re: [sparc64] crc32c misbehave
From: David Miller @ 2017-05-31 15:53 UTC (permalink / raw)
  To: matorola; +Cc: sparclinux, sandeen, linux-crypto

From: Anatoly Pugachev <matorola@gmail.com>
Date: Wed, 31 May 2017 14:56:52 +0300

> While debugging occasional crc32c checksum errors with xfs disk reads on
> sparc64 (T5 [sun4v] 3.6 GHz CPU ldom, debian unstable/sid), Eric has found
> that crc32c sometimes returns a wrong checksum for data. Eric made a simple
> test kernel module (included), which produces the following results on my
> sparc64 machines:

I don't think that crc32c() is thread safe because of the way it is
implemented with a shared TFM crypto object allocated once at boot
time.

I think you are seeing the corruption any time an interrupt comes in
on the same cpu as your test module is running on and does a crc32c()
calculation, corrupting the context key value being used by your
invocation.
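Here is a minimal userspace model of the hazard described above. It rests on some stated assumptions. A plain global stands in for whatever state the kernel path might share. A hypothetical `mid_hook` fires once, half-way through a buffer, to play the role of an interrupt on the same CPU. Every name is illustrative, not kernel API. The model only shows why shared running state is unsafe against a nested caller. It does not establish that libcrc32c actually keeps its state this way.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One byte of bitwise CRC32C (reflected polynomial 0x82F63B78). */
static uint32_t crc32c_byte(uint32_t crc, unsigned char b)
{
	crc ^= b;
	for (int k = 0; k < 8; k++)
		crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
	return crc;
}

/* Hypothetical hook, fired once half-way through a buffer, standing in
 * for an interrupt that arrives on the same CPU mid-computation. */
static void (*mid_hook)(void);

static void fire_hook_once(void)
{
	void (*h)(void) = mid_hook;

	mid_hook = NULL;        /* the nested call must not re-fire it */
	if (h)
		h();
}

/* Variant 1: the running state lives in one context shared by all callers. */
static uint32_t shared_ctx;

static uint32_t crc32c_shared(uint32_t seed, const unsigned char *p, size_t len)
{
	shared_ctx = seed;
	for (size_t i = 0; i < len; i++) {
		if (i == len / 2)
			fire_hook_once();
		shared_ctx = crc32c_byte(shared_ctx, p[i]);
	}
	return shared_ctx;
}

/* Variant 2: the running state lives on each caller's own stack. */
static uint32_t crc32c_local(uint32_t seed, const unsigned char *p, size_t len)
{
	uint32_t ctx = seed;

	for (size_t i = 0; i < len; i++) {
		if (i == len / 2)
			fire_hook_once();
		ctx = crc32c_byte(ctx, p[i]);
	}
	return ctx;
}

/* The "interrupt" checksums a different 256-byte buffer with the same seed. */
static const unsigned char irq_buf[256] = { 1 };

static void fake_irq_shared(void)
{
	(void)crc32c_shared(~(uint32_t)0, irq_buf, sizeof(irq_buf));
}

static void fake_irq_local(void)
{
	(void)crc32c_local(~(uint32_t)0, irq_buf, sizeof(irq_buf));
}
```

When the state lives in `shared_ctx`, the nested call leaves its own value in the register and the outer checksum comes out wrong. The same nested call is harmless when each invocation keeps its running value in a local.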

At least that's my guess, I could have misread how the key is stored
and managed around operations.

Can you try something like disabling cpu IRQs around the crc32c() function
in lib/libcrc32c.c?  Something like:

	u32 retval;

	local_irq_disable();

	shash->tfm = tfm;
	shash->flags = 0;
	*ctx = crc;

	err = crypto_shash_update(shash, address, length);
	BUG_ON(err);

	retval = *ctx;

	local_irq_enable();

	return retval;

Thanks.

* Re: [sparc64] crc32c misbehave
From: David Miller @ 2017-05-31 16:03 UTC (permalink / raw)
  To: matorola; +Cc: sparclinux, sandeen, linux-crypto

From: David Miller <davem@davemloft.net>
Date: Wed, 31 May 2017 11:53:35 -0400 (EDT)

> Can you try something like disabling cpu IRQs around the crc32c() function
> in lib/libcrc32c.c?  Something like:
> 
> 	u32 retval;
> 
> 	local_irq_disable();
> 
> 	shash->tfm = tfm;
> 	shash->flags = 0;
> 	*ctx = crc;
> 
> 	err = crypto_shash_update(shash, address, length);
> 	BUG_ON(err);
> 
> 	retval = *ctx;
> 
> 	local_irq_enable();
> 
> 	return retval;

Actually you would need a spinlock, with IRQs disabled, to properly test
this theory since the TFM is shared across the entire system.
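The spinlock suggestion can be sketched as a userspace analogue, with hypothetical names throughout. There is one shared context, guarded by a pthread mutex, so only one caller at a time can seed it, update it and read the result back. In the kernel, the corresponding primitive would be a spinlock taken with spin_lock_irqsave(), which also masks interrupts on the local CPU. A mutex between threads models only the cross-CPU part of the problem.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

static uint32_t shared_ctx;                 /* one context for every caller */
static pthread_mutex_t ctx_lock = PTHREAD_MUTEX_INITIALIZER;

/* Seed, update and read back the shared context while holding the lock. */
static uint32_t crc32c_locked(uint32_t seed, const unsigned char *p, size_t len)
{
	uint32_t ret;

	pthread_mutex_lock(&ctx_lock);
	shared_ctx = seed;
	for (size_t i = 0; i < len; i++) {
		shared_ctx ^= p[i];
		for (int k = 0; k < 8; k++)
			shared_ctx = (shared_ctx >> 1) ^
				     (0x82F63B78u & (0u - (shared_ctx & 1u)));
	}
	ret = shared_ctx;       /* copy the result out before unlocking */
	pthread_mutex_unlock(&ctx_lock);
	return ret;
}

#define NTHREADS 4
#define NITER    2000

static unsigned char buf[512];      /* all zero, like the test module */

/* Each worker recomputes the same CRC and counts how often it changes. */
static void *worker(void *arg)
{
	size_t *changes = arg;
	uint32_t first = crc32c_locked(~(uint32_t)0, buf, sizeof(buf));

	for (int i = 0; i < NITER; i++)
		if (crc32c_locked(~(uint32_t)0, buf, sizeof(buf)) != first)
			(*changes)++;
	return NULL;
}

/* Run the workers concurrently and return the total number of changes. */
static size_t run_workers(void)
{
	pthread_t t[NTHREADS];
	size_t changes[NTHREADS] = {0};
	size_t total = 0;

	for (int i = 0; i < NTHREADS; i++)
		pthread_create(&t[i], NULL, worker, &changes[i]);
	for (int i = 0; i < NTHREADS; i++) {
		pthread_join(t[i], NULL);
		total += changes[i];
	}
	return total;
}
```

The important detail is that the result is copied out before the lock is dropped. Reading `shared_ctx` after unlocking would reopen the window.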

The really suspicious part of your test results is that the corrupted
checksum always evaluates to zero.
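One property makes the zero stand out in this particular test. This is an observation, not a diagnosis. With all-zero input, each bit step of the reflected CRC32C register is a linear map that sends 0 to 0. The map is also invertible, because the polynomial 0x82F63B78 has its top bit set. So a register seeded with ~0 can never reach 0 over these 512 bytes, while a register that is zeroed at any point stays 0 to the end. The hypothetical userspace helper below demonstrates both facts. It is bitwise, applies no final inversion, and is assumed to match the crc32c(seed, buf, len) convention the module uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Raw bitwise CRC32C register update (reflected poly 0x82F63B78),
 * returning the register without the final inversion. */
static uint32_t crc32c_raw(uint32_t crc, const unsigned char *p, size_t len)
{
	while (len--) {
		crc ^= *p++;
		for (int k = 0; k < 8; k++)
			crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
	}
	return crc;
}
```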

* Re: [sparc64] crc32c misbehave
From: Eric Sandeen @ 2017-05-31 16:19 UTC (permalink / raw)
  To: David Miller, matorola; +Cc: sparclinux, linux-xfs

On 5/31/17 10:53 AM, David Miller wrote:
> From: Anatoly Pugachev <matorola@gmail.com>
> Date: Wed, 31 May 2017 14:56:52 +0300
> 
>> While debugging occasional crc32c checksum errors with xfs disk reads on
>> sparc64 (T5 [sun4v] 3.6 GHz CPU ldom, debian unstable/sid), Eric has found
>> that crc32c sometimes returns a wrong checksum for data. Eric made a simple
>> test kernel module (included), which produces the following results on my
>> sparc64 machines:

cc: linux-xfs, because this problem cropped up on xfs/sparc.

-Eric

> I don't think that crc32c() is thread safe because of the way it is
> implemented with a shared TFM crypto object allocated once at boot
> time.
> 
> I think you are seeing the corruption any time an interrupt comes in
> on the same cpu as your test module is running on and does a crc32c()
> calculation, corrupting the context key value being used by your
> invocation.
> 
> At least that's my guess, I could have misread how the key is stored
> and managed around operations.
> 
> Can you try something like disabling cpu IRQs around the crc32c() function
> in lib/libcrc32c.c?  Something like:
> 
> 	u32 retval;
> 
> 	local_irq_disable();
> 
> 	shash->tfm = tfm;
> 	shash->flags = 0;
> 	*ctx = crc;
> 
> 	err = crypto_shash_update(shash, address, length);
> 	BUG_ON(err);
> 
> 	retval = *ctx;
> 
> 	local_irq_enable();
> 
> 	return retval;
> 
> Thanks.
> 

* Re: [sparc64] crc32c misbehave
From: Eric Sandeen @ 2017-05-31 16:31 UTC (permalink / raw)
  To: David Miller, matorola; +Cc: sparclinux, linux-xfs

On 5/31/17 11:19 AM, Eric Sandeen wrote:
> On 5/31/17 10:53 AM, David Miller wrote:
>> From: Anatoly Pugachev <matorola@gmail.com>
>> Date: Wed, 31 May 2017 14:56:52 +0300
>>
>>> While debugging occasional crc32c checksum errors with xfs disk reads on
>>> sparc64 (T5 [sun4v] 3.6 GHz CPU ldom, debian unstable/sid), Eric has found
>>> that crc32c sometimes returns a wrong checksum for data. Eric made a simple
>>> test kernel module (included), which produces the following results on my
>>> sparc64 machines:
> 
> cc: linux-xfs, because this problem cropped up on xfs/sparc.

FWIW, the testcase (module which does

	crc = crc32c(CRC_SEED, data, 512);

1 million times in a loop on the same data, and printk's if
the result ever changes) does not fail on x86_64 or ARM
(well, not after a gcc bug was fixed on ARM ...)
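The same repeat-and-compare check can also be run in userspace with a portable table-driven CRC32C. That allows comparison across machines without involving the kernel's crypto layer. This is a sketch: `count_changes()` and the table code are illustrative, not kernel code. The function follows the convention the test module relies on: seed passed in, raw register returned, no final inversion.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static uint32_t crc_table[256];

/* Build the byte-at-a-time table for the reflected polynomial 0x82F63B78. */
static void crc32c_init_table(void)
{
	for (uint32_t n = 0; n < 256; n++) {
		uint32_t c = n;

		for (int k = 0; k < 8; k++)
			c = (c >> 1) ^ (0x82F63B78u & (0u - (c & 1u)));
		crc_table[n] = c;
	}
}

/* Raw table-driven update, no final inversion. */
static uint32_t crc32c_tab(uint32_t crc, const unsigned char *p, size_t len)
{
	while (len--)
		crc = (crc >> 8) ^ crc_table[(crc ^ *p++) & 0xFFu];
	return crc;
}

/* Userspace counterpart of the module's loop: CRC the same 512 zero bytes
 * `iterations` times and count how often the result differs from the
 * previous one. A correct implementation always returns 0. */
static unsigned long count_changes(unsigned long iterations)
{
	unsigned char data[512];
	uint32_t oldcrc = 0, crc;
	unsigned long changes = 0;

	memset(data, 0, sizeof(data));
	crc32c_init_table();
	for (unsigned long i = 0; i < iterations; i++) {
		crc = crc32c_tab(~(uint32_t)0, data, sizeof(data));
		if (i > 0 && crc != oldcrc)
			changes++;
		oldcrc = crc;
	}
	return changes;
}
```

The function is deterministic, so any non-zero return from `count_changes()` (for example, with the module's 1,000,000 iterations) would suggest a problem beneath the crypto API rather than in libcrc32c itself.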

-Eric

> -Eric
> 
>> I don't think that crc32c() is thread safe because of the way it is
>> implemented with a shared TFM crypto object allocated once at boot
>> time.
>>
>> I think you are seeing the corruption any time an interrupt comes in
>> on the same cpu as your test module is running on and does a crc32c()
>> calculation, corrupting the context key value being used by your
>> invocation.
>>
>> At least that's my guess, I could have misread how the key is stored
>> and managed around operations.
>>
>> Can you try something like disabling cpu IRQs around the crc32c() function
>> in lib/libcrc32c.c?  Something like:
>>
>> 	u32 retval;
>>
>> 	local_irq_disable();
>>
>> 	shash->tfm = tfm;
>> 	shash->flags = 0;
>> 	*ctx = crc;
>>
>> 	err = crypto_shash_update(shash, address, length);
>> 	BUG_ON(err);
>>
>> 	retval = *ctx;
>>
>> 	local_irq_enable();
>>
>> 	return retval;
>>
>> Thanks.
>>

* Re: [sparc64] crc32c misbehave
From: David Miller @ 2017-05-31 16:49 UTC (permalink / raw)
  To: sandeen; +Cc: matorola, sparclinux, linux-xfs

From: Eric Sandeen <sandeen@sandeen.net>
Date: Wed, 31 May 2017 11:31:10 -0500

> On 5/31/17 11:19 AM, Eric Sandeen wrote:
>> On 5/31/17 10:53 AM, David Miller wrote:
>>> From: Anatoly Pugachev <matorola@gmail.com>
>>> Date: Wed, 31 May 2017 14:56:52 +0300
>>>
>>>> While debugging occasional crc32c checksum errors with xfs disk reads on
>>>> sparc64 (T5 [sun4v] 3.6 GHz CPU ldom, debian unstable/sid), Eric has found
>>>> that crc32c sometimes returns a wrong checksum for data. Eric made a simple
>>>> test kernel module (included), which produces the following results on my
>>>> sparc64 machines:
>> 
>> cc: linux-xfs, because this problem cropped up on xfs/sparc.
> 
> FWIW, the testcase (module which does
> 
> 	crc = crc32c(CRC_SEED, data, 512);
> 
> 1 million times in a loop on the same data, and printk's if
> the result ever changes) does not fail on x86_64 or ARM
> (well, not after a gcc bug was fixed on ARM ...)

Is the machine doing things that would cause crc32c() operations in
interrupts (SCTP protocol traffic) or on other cpus?

That's the danger in comparing with other machines: the context and what's
running on them are different.

* Re: [sparc64] crc32c misbehave
  2017-05-31 16:49           ` David Miller
@ 2017-06-01 21:44             ` David Miller
  -1 siblings, 0 replies; 28+ messages in thread
From: David Miller @ 2017-06-01 21:44 UTC (permalink / raw)
  To: sandeen; +Cc: matorola, sparclinux, linux-xfs

From: David Miller <davem@davemloft.net>
Date: Wed, 31 May 2017 12:49:16 -0400 (EDT)

> From: Eric Sandeen <sandeen@sandeen.net>
> Date: Wed, 31 May 2017 11:31:10 -0500
> 
>> On 5/31/17 11:19 AM, Eric Sandeen wrote:
>>> On 5/31/17 10:53 AM, David Miller wrote:
>>>> From: Anatoly Pugachev <matorola@gmail.com>
>>>> Date: Wed, 31 May 2017 14:56:52 +0300
>>>>
>>>>> While debugging occasional crc32c checksum errors with xfs disk reads on
>>>>> sparc64 (T5 [sun4v] 3.6 GHz CPU ldom, debian unstable/sid), Eric has found
>>>>> that crc32c sometimes returns a wrong checksum for the data. Eric made a
>>>>> simple test kernel module (included), which produces the following results
>>>>> on my sparc64 machines:
>>> 
>>> cc: linux-xfs, because this problem cropped up on xfs/sparc.
>> 
>> FWIW, the testcase (module which does
>> 
>> 	crc = crc32c(CRC_SEED, data, 512);
>> 
>> 1 million times in a loop on the same data, and printk's if
>> the result ever changes) does not fail on x86_64 or ARM
>> (well, not after a gcc bug was fixed on ARM ...)
> 
> Is the machine doing things that would cause crc32c() operations in
> interrupts (SCTP protocol traffic) or on other cpus?
> 
> That's the danger in comparing with other machines: the context and
> what's running on them are different.

Ok, I can reproduce this bug on my systems.  I'll see if I can figure out
what is going on.

* Re: [sparc64] crc32c misbehave
  2017-06-01 21:44             ` David Miller
@ 2017-06-02  1:57               ` David Miller
  -1 siblings, 0 replies; 28+ messages in thread
From: David Miller @ 2017-06-02  1:57 UTC (permalink / raw)
  To: sandeen; +Cc: matorola, sparclinux, linux-xfs

From: David Miller <davem@davemloft.net>
Date: Thu, 01 Jun 2017 17:44:19 -0400 (EDT)

> Ok, I can reproduce this bug on my systems.  I'll see if I can figure out
> what is going on.

So I've done several tests to try and narrow down the cause.

First, I implemented crc32c() inside of the test module, doing
exactly the same thing that lib/libcrc32c.c is doing.  So this
makes it use a separate tfm.

This never fails.

Then, I implemented a separate module "davem_crc32c.ko" that is
identical to lib/libcrc32c.c except that it uses its own 'tfm' and
exports the symbol davem_crc32c() instead of crc32c(). And finally I
adjusted the test case to call davem_crc32c() instead of crc32c().

This also never fails.

So it only fails if we use the crc32c() from lib/libcrc32c.c that is
shared with the rest of the kernel.

I really can't figure out yet why this sharing can even matter.  The
per-computation state is all in the on-stack 'shash':

	SHASH_DESC_ON_STACK(shash, tfm);

So invocations of crc32c() should not be able to corrupt the state of
other parallel invocations.

I'll keep digging, but that is where I am right now.

* Re: [sparc64] crc32c misbehave
  2017-06-02  1:57               ` David Miller
@ 2017-06-02  2:10                 ` Eric Sandeen
  -1 siblings, 0 replies; 28+ messages in thread
From: Eric Sandeen @ 2017-06-02  2:10 UTC (permalink / raw)
  To: David Miller; +Cc: matorola, sparclinux, linux-xfs

On 6/1/17 8:57 PM, David Miller wrote:
> From: David Miller <davem@davemloft.net>
> Date: Thu, 01 Jun 2017 17:44:19 -0400 (EDT)
> 
>> Ok, I can reproduce this bug on my systems.  I'll see if I can figure out
>> what is going on.
> 
> So I've done several tests to try and narrow down the cause.
> 
> First, I implemented crc32c() inside of the test module, doing
> exactly the same thing that lib/libcrc32c.c is doing.  So this
> makes it use a separate tfm.
> 
> This never fails.
> 
> Then, I implemented a separate module "davem_crc32c.ko" that is
> identical to lib/libcrc32c.c except that it uses its own 'tfm' and
> exports the symbol davem_crc32c() instead of crc32c(). And finally I
> adjusted the test case to call davem_crc32c() instead of crc32c().
> 
> This also never fails.
> 
> So it only fails if we use the crc32c() from lib/libcrc32c.c that is
> shared with the rest of the kernel.
> 
> I really can't figure out yet why this sharing can even matter.  The
> per-computation state is all in the on-stack 'shash':
> 
> 	SHASH_DESC_ON_STACK(shash, tfm);
> 
> So invocations of crc32c() should not be able to corrupt the state of
> other parallel invocations.
> 
> I'll keep digging, but that is where I am right now.

Thanks for digging.

On ARM, there was a gcc bug causing similar results - I /think/
it was https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63293

"programs could fail sporadically with this if an interrupt happens at
the wrong instant in time and data was written onto the current stack."

https://gcc.gnu.org/ml/gcc-patches/2014-09/msg02292.html

Maybe totally unrelated; if not, hope it helps.  :)

-Eric

* Re: [sparc64] crc32c misbehave
  2017-06-02  2:10                 ` Eric Sandeen
@ 2017-06-02  3:33                   ` David Miller
  -1 siblings, 0 replies; 28+ messages in thread
From: David Miller @ 2017-06-02  3:33 UTC (permalink / raw)
  To: sandeen; +Cc: matorola, sparclinux, linux-xfs

From: Eric Sandeen <sandeen@sandeen.net>
Date: Thu, 1 Jun 2017 21:10:50 -0500

> On ARM, there was a gcc bug causing similar results - I /think/
> it was https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63293
> 
> "programs could fail sporadically with this if an interrupt happens at
> the wrong instant in time and data was written onto the current stack."
> 
> https://gcc.gnu.org/ml/gcc-patches/2014-09/msg02292.html
> 
> Maybe totally unrelated; if not, hope it helps.  :)

Wow, that looks exactly like what the bug is:

crc32c:
        .register       %g2, #scratch
        save    %sp, -176, %sp  !
        sethi   %hi(tfm), %g1   !, tmp121
        mov     %i2, %o2        ! length,
        ldx     [%g1+%lo(tfm)], %g2     ! tfm, tfm.0_4
        mov     %i1, %o1        ! address,
        lduw    [%g2], %g1      ! tfm.0_4->descsize, tfm.0_4->descsize
        add     %g1, 38, %g1    ! tfm.0_4->descsize,, tmp126
        srlx    %g1, 4, %g1     ! tmp126,, tmp127
        sllx    %g1, 4, %g1     ! tmp127,, tmp128
        sub     %sp, %g1, %sp   !, tmp128,
        add     %sp, 2230, %i5  !,, tmp130

Ok, %i5 holds the stack address of the shash context:

 ...
        return  %i7+8
         lduw   [%o5+16], %o0   ! MEM[(u32 *)__shash_desc.1_10 + 16B],

'return' deallocates the stack frame plus the register window, and at
the same time does a delayed control transfer to "%i7 + 8".  So in the
branch delay slot instruction, %i5 has already become %o5.

And here we are accessing deallocated stack memory in the delay slot.

I'm using gcc-6.3.0 here.

And indeed the following patch makes the problem go away:

diff --git a/lib/libcrc32c.c b/lib/libcrc32c.c
index 74a54b7..bf831e2 100644
--- a/lib/libcrc32c.c
+++ b/lib/libcrc32c.c
@@ -43,7 +43,7 @@ static struct crypto_shash *tfm;
 u32 crc32c(u32 crc, const void *address, unsigned int length)
 {
 	SHASH_DESC_ON_STACK(shash, tfm);
-	u32 *ctx = (u32 *)shash_desc_ctx(shash);
+	u32 ret, *ctx = (u32 *)shash_desc_ctx(shash);
 	int err;
 
 	shash->tfm = tfm;
@@ -53,7 +53,9 @@ u32 crc32c(u32 crc, const void *address, unsigned int length)
 	err = crypto_shash_update(shash, address, length);
 	BUG_ON(err);
 
-	return *ctx;
+	ret = *ctx;
+	barrier();
+	return ret;
 }
 
 EXPORT_SYMBOL(crc32c);

* Re: [sparc64] crc32c misbehave
  2017-06-02  3:33                   ` David Miller
@ 2017-06-02  3:34                     ` Eric Sandeen
  -1 siblings, 0 replies; 28+ messages in thread
From: Eric Sandeen @ 2017-06-02  3:34 UTC (permalink / raw)
  To: David Miller; +Cc: matorola, sparclinux, linux-xfs



On 6/1/17 10:33 PM, David Miller wrote:
> From: Eric Sandeen <sandeen@sandeen.net>
> Date: Thu, 1 Jun 2017 21:10:50 -0500
> 
>> On ARM, there was a gcc bug causing similar results - I /think/
>> it was https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63293
>>
>> "programs could fail sporadically with this if an interrupt happens at
>> the wrong instant in time and data was written onto the current stack."
>>
>> https://gcc.gnu.org/ml/gcc-patches/2014-09/msg02292.html
>>
>> Maybe totally unrelated; if not, hope it helps.  :)
> 
> Wow, that looks exactly like what the bug is:

Sweet.

\o/

-Eric

* Re: [sparc64] crc32c misbehave
  2017-05-31 16:49           ` David Miller
@ 2017-06-06 19:05             ` David Miller
  -1 siblings, 0 replies; 28+ messages in thread
From: David Miller @ 2017-06-06 19:05 UTC (permalink / raw)
  To: sandeen; +Cc: matorola, sparclinux, linux-xfs


Just FYI, I've meanwhile pushed a fix for the GCC bug that created
this mess to all active branches.

* Re: [sparc64] crc32c misbehave
  2017-06-06 19:05             ` David Miller
@ 2017-06-06 19:09               ` Eric Sandeen
  -1 siblings, 0 replies; 28+ messages in thread
From: Eric Sandeen @ 2017-06-06 19:09 UTC (permalink / raw)
  To: David Miller; +Cc: matorola, sparclinux, linux-xfs

On 6/6/17 2:05 PM, David Miller wrote:
> 
> Just FYI, I've meanwhile pushed a fix for the GCC bug that created
> this mess to all active branches.

David -

Awesome.  Thanks for digging into it.

-Eric

end of thread

Thread overview:
     [not found] <CADxRZqzgaL4ew6PVek3WBsdwo6GcT0ORx=7h+6p0V3NAr8qF+w@mail.gmail.com>
2017-05-31 11:56 ` [sparc64] crc32c misbehave Anatoly Pugachev
2017-05-31 12:12   ` Anatoly Pugachev
2017-05-31 15:53   ` David Miller
2017-05-31 16:03     ` David Miller
2017-05-31 16:19     ` Eric Sandeen
2017-05-31 16:31       ` Eric Sandeen
2017-05-31 16:49         ` David Miller
2017-06-01 21:44           ` David Miller
2017-06-02  1:57             ` David Miller
2017-06-02  2:10               ` Eric Sandeen
2017-06-02  3:33                 ` David Miller
2017-06-02  3:34                   ` Eric Sandeen
2017-06-06 19:05           ` David Miller
2017-06-06 19:09             ` Eric Sandeen
