On Fri, 2016-05-20 at 10:18 +0200, Peter Zijlstra wrote:
> On Fri, May 20, 2016 at 10:13:15AM +0200, Peter Zijlstra wrote:
> > On Thu, May 19, 2016 at 10:39:26PM -0700, Davidlohr Bueso wrote:
> > 
> > > [1] https://hg.java.net/hg/libmicro~hg-repo
> > 
> > So far I've managed to install mercurial and clone this thing, but it
> > doesn't actually build :/
> > 
> > I'll try harder..
> 
> The stuff needs this..
> 
> ---
> diff -r 7dd95b416c3c Makefile.com
> --- a/Makefile.com Thu Jul 26 12:56:00 2012 -0700
> +++ b/Makefile.com Fri May 20 10:18:08 2016 +0200
> @@ -107,7 +107,7 @@
>  	echo "char compiler_version[] = \""`$(COMPILER_VERSION_CMD)`"\";" > tattle.h
>  	echo "char CC[] = \""$(CC)"\";" >> tattle.h
>  	echo "char extra_compiler_flags[] = \""$(extra_CFLAGS)"\";" >> tattle.h
> -	$(CC) -o tattle $(CFLAGS) -I. ../tattle.c libmicro.a -lrt -lm
> +	$(CC) -o tattle $(CFLAGS) -I. ../tattle.c libmicro.a -lrt -lm -lpthread
>  
>  $(ELIDED_BENCHMARKS): ../elided.c
>  	$(CC) -o $(@) ../elided.c
> 

Hello Peter,

right, we forgot to mention that the libmicro Makefile is broken; sorry
for the hassle.

At the bottom of this message you'll find the script I use to reproduce
the problem; you might have to modify the variable $CASCADE_PATH. The
script needs an argument, which is the offending benchmark to run, like

    $ ./run_cascade.sh c_flock_200

or

    $ ./run_cascade.sh c_cond_10

This runs the benchmark 10 times, and kills it if it lasts too long. I
get around 3 hangs per invocation, and on the affected kernels (4.2 or
later) I get around one panic per invocation of this reproducer.

The .config file with which you build the kernel seems to affect that,
too; I attach 2 config files:

- config.no-bug
- config.with-bug

The results I report (hangs & panics) happen if I compile with
config.with-bug, but disappear with config.no-bug.

If you take config.no-bug as reference, config.with-bug introduces

    CONFIG_MFD_SYSCON=y
    CONFIG_NO_HZ_IDLE=y
    CONFIG_QUEUED_SPINLOCK=y
    CONFIG_REGMAP=y
    CONFIG_REGMAP_MMIO=y
    CONFIG_TICK_CPU_ACCOUNTING=y

and removes

    CONFIG_BLK_DEV_DM=m
    CONFIG_BLK_DEV_DM_BUILTIN=y
    CONFIG_CONTEXT_TRACKING=y
    CONFIG_DM_UEVENT=y
    CONFIG_NO_HZ_FULL=y
    CONFIG_PAGE_EXTENSION=y
    CONFIG_PAGE_OWNER=y
    CONFIG_PARAVIRT_SPINLOCKS=y
    CONFIG_PERSISTENT_KEYRINGS=y
    CONFIG_RCU_NOCB_CPU=y
    CONFIG_RCU_NOCB_CPU_NONE=y
    CONFIG_RCU_USER_QS=y
    CONFIG_STAGING=y
    CONFIG_UNINLINE_SPIN_UNLOCK=y
    CONFIG_VIRT_CPU_ACCOUNTING=y
    CONFIG_VIRT_CPU_ACCOUNTING_GEN=y

Most of those params might be irrelevant, but some must trigger the
problem. Both configs are taken from /proc/config.gz on a running
system.

FWIW my test machine has 48 Haswell cores and 64GB of RAM.

Giovanni
SUSE Labs

----------- run_cascade.sh -------------------------------------
#!/bin/bash

TESTCASE=$1
CASCADE_PATH="libmicro-1-installed/bin-x86_64"

case $TESTCASE in
c_flock_200)
    BINNAME="cascade_flock"
    COMMAND="$CASCADE_PATH/cascade_flock -E -D 60000 -L -S -W \
             -N c_flock_200 \
             -P 200 -I 5000000"
    # c_flock_200 is supposed to last 60 seconds.
    SLEEPTIME=70
    ;;
c_cond_10)
    BINNAME="cascade_cond"
    COMMAND="$CASCADE_PATH/cascade_cond -E -C 2000 -L -S -W \
             -N c_cond_10 \
             -T 10 -I 3000"
    # c_cond_10 terminates in less than 1 second.
    SLEEPTIME=5
    ;;
*)
    echo "Unknown test case" >&2
    exit 1
    ;;
esac

ERRORS=0
uname -a
for i in {1..10} ; do
    # Start the benchmark in the background and give it SLEEPTIME
    # seconds to finish.
    { eval $COMMAND & } >/dev/null 2>&1
    sleep $SLEEPTIME
    # Anything still running after SLEEPTIME counts as a hang.
    if pidof $BINNAME >/dev/null ; then
        echo Run \#$i: $TESTCASE hangs
        # Summarize where the stuck processes are blocked in the kernel.
        for PID in $(pidof $BINNAME) ; do
            head -n 1 /proc/$PID/stack
        done | sort | uniq -c
        ERRORS=$((ERRORS+1))
        killall $BINNAME
    else
        echo Run \#$i: $TESTCASE exits successfully
    fi
done
echo $TESTCASE hung $ERRORS times.
----------------------------------------------------------------
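
For anyone who wants to recompute the "introduces"/"removes" lists between the two attached configs, here is a minimal sketch. The function name config_delta is made up for illustration and is not part of the reproducer. It only compares lines starting with CONFIG_, so an option whose value changed (say =m to =y) would show up in both lists.

```shell
#!/bin/sh
# Hypothetical helper (not part of run_cascade.sh): list the CONFIG_ lines
# that differ between a reference config ($1) and another config ($2).
config_delta() {
    echo "introduced:"
    # Lines set in $2 that do not appear verbatim in $1.
    grep '^CONFIG_' "$2" | grep -vxF -f "$1" | sort
    echo "removed:"
    # Lines set in $1 that do not appear verbatim in $2.
    grep '^CONFIG_' "$1" | grep -vxF -f "$2" | sort
}
```

It would be called as `config_delta config.no-bug config.with-bug`, with both files being uncompressed copies of /proc/config.gz (for example from `zcat /proc/config.gz`).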