On Fri, Feb 09, 2018 at 03:01:57PM +0100, Daniel Borkmann wrote:
> On 02/09/2018 06:14 AM, Li Zhijian wrote:
> > Hi
> >
> > INTEL 0-Day noticed that bpf/test_maps has different results at different platforms.
> > when it fails, the details are like
>
> Sorry for the late reply and thanks for reporting! More below:
>
> > ------------------
> >   880 Failed to create hashmap key=16 value=131072 'Cannot allocate memory'
> >   881 Failed to create hashmap key=8 value=32768 'Cannot allocate memory'
> >   882 Failed to create hashmap key=8 value=131072 'Cannot allocate memory'
> >   883 Failed to create hashmap key=16 value=32768 'Cannot allocate memory'
> >   884 Failed to create hashmap key=8 value=16384 'Cannot allocate memory'
> >   885 Failed to create hashmap key=16 value=16384 'Cannot allocate memory'
> >   886 Failed to create hashmap key=8 value=65536 'Cannot allocate memory'
> >   887 Failed to create hashmap key=16 value=131072 'Cannot allocate memory'
> >   888 Failed to create hashmap key=16 value=32768 'Cannot allocate memory'
> >   889 Failed to create hashmap key=16 value=65536 'Cannot allocate memory'
> >   890 Failed to create hashmap key=8 value=65536 'Cannot allocate memory'
> >   891 Failed to create hashmap key=8 value=131072 'Cannot allocate memory'
> >   892 Failed to create hashmap key=8 value=131072 'Cannot allocate memory'
> >   893 Failed to create hashmap key=16 value=32768 'Cannot allocate memory'
> >   894 Failed to create hashmap key=8 value=16384 'Cannot allocate memory'
> >   895 Failed to create hashmap key=8 value=131072 'Cannot allocate memory'
> >   896 Failed to create hashmap key=16 value=8192 'Cannot allocate memory'
> >   897 Failed to create hashmap key=8 value=32768 'Cannot allocate memory'
> >   898 Failed to create hashmap key=16 value=8192 'Cannot allocate memory'
> >   899 Failed to create hashmap key=8 value=262144 'Cannot allocate memory'
> >   900 Failed to create hashmap key=8 value=262144 'Cannot allocate memory'
> >   901 Failed to create hashmap key=8 value=262144 'Cannot allocate memory'
> >   902 Failed to create hashmap key=16 value=262144 'Cannot allocate memory'
> >   903 Failed to create hashmap key=8 value=262144 'Cannot allocate memory'
> >   904 Failed to create hashmap key=8 value=262144 'Cannot allocate memory'
> >   905 test_maps: test_maps.c:955: run_parallel: Assertion `status == 0' failed.
> >   906 Aborted
> >   907 not ok 1..3 selftests:  test_maps [FAIL]
> > ------------------
> >
> > After a simply looking at the code, looks it's related to the cpu number and system memory.
> >
> > below are the result under different platform
> > 1. Good
> > model: Sandy Bridge
> > nr_node: 1
> > nr_cpu: 4
> > memory: 6G
> >
> > 2. Good
> > model: qemu-system-x86_64 -enable-kvm
> > nr_cpu: 2
> > memory: 4G
> >
> > 3. Bad
> > model: Ivytown Ivy Bridge-EP
> > nr_cpu: 48
> > memory: 64G
> >
> > 4. Bad
> > model: Skylake
> > nr_cpu: 104
> > memory: 64G
> >
> > I try to change the process number to 10 from 100, so it can pass at above Skylake(4) machine.
> > ------------
> > lizhijian@haswell-OptiPlex-9020:~/lkp/linux/tools/testing/selftests/bpf$ git diff
> > diff --git a/tools/testing/selftests/bpf/test_maps.c b/tools/testing/selftests/bpf/test_maps.c
> > index 040356e..b788ca1 100644
> > --- a/tools/testing/selftests/bpf/test_maps.c
> > +++ b/tools/testing/selftests/bpf/test_maps.c
> > @@ -960,7 +960,7 @@ static void test_map_stress(void)
> >  {
> >         run_parallel(100, test_hashmap, NULL);
> >         run_parallel(100, test_hashmap_percpu, NULL);
> > -       run_parallel(100, test_hashmap_sizes, NULL);
> > +       run_parallel(10, test_hashmap_sizes, NULL);
> >         run_parallel(100, test_hashmap_walk, NULL);
> >
> >         run_parallel(100, test_arraymap, NULL);
>
> Unless Alexei has some better idea, I think if the bpf_create_map() error in
> the stress test is about ENOMEM, then we shouldn't fail hard via exit(), for
> all other cases we should however.
> So probably makes sense to just check for
> errno == ENOMEM in case of fd < 0 in test_hashmap_sizes() and then continue
> to keep trying under stress. Feel free to send a patch, Li.

That's probably a good path for now.
I also see that test_maps fails on a freshly booted kernel with such an assert,
but then restarting test_maps works and repeated runs succeed too.
I suspect there is a deeper issue here related to memory allocation:
either the slab or the percpu allocator is behaving funky.
It needs to be debugged further.
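
For illustration only, here is a rough sketch of the errno check Daniel describes: treat ENOMEM from map creation as "skip and keep going", and keep every other error fatal. This is not the actual patch. The names try_create(), create_fn and the fake_create_*() stubs are hypothetical and stand in for the real bpf_create_map() call so that the snippet builds without libbpf.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a map-creation call such as bpf_create_map():
 * returns an fd >= 0 on success, or -1 with errno set on failure. */
typedef int (*create_fn)(int key_size, int value_size);

enum create_result {
	CREATE_OK,	/* map created, fd stored in *fd_out */
	CREATE_SKIPPED,	/* ENOMEM under stress: tolerate and move on */
	CREATE_FATAL,	/* any other error: caller should fail hard */
};

/* Classify the outcome of one creation attempt by looking at errno. */
static enum create_result try_create(create_fn create, int key_size,
				     int value_size, int *fd_out)
{
	int fd = create(key_size, value_size);

	if (fd >= 0) {
		*fd_out = fd;
		return CREATE_OK;
	}
	if (errno == ENOMEM)
		return CREATE_SKIPPED;

	fprintf(stderr, "Failed to create hashmap key=%d value=%d '%s'\n",
		key_size, value_size, strerror(errno));
	return CREATE_FATAL;
}

/* Illustrative fakes, only here to exercise the three outcomes. */
static int fake_create_ok(int key_size, int value_size)
{
	(void)key_size; (void)value_size;
	return 3;		/* pretend fd */
}

static int fake_create_enomem(int key_size, int value_size)
{
	(void)key_size; (void)value_size;
	errno = ENOMEM;
	return -1;
}

static int fake_create_einval(int key_size, int value_size)
{
	(void)key_size; (void)value_size;
	errno = EINVAL;
	return -1;
}
```

In a loop like the one in test_hashmap_sizes(), a caller would presumably close(fd) on CREATE_OK, continue on CREATE_SKIPPED, and exit(1) only on CREATE_FATAL. Note that this only hides the symptom in the selftest; it does not address the allocator behaviour suspected above.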