On Sun, 2010-09-19 at 12:47 +0530, Suzuki Poulose wrote: > Here is a patch which I had, sometime back. It is on an older version of > LTP. > > Cheers > > Suzuki > > > , Suzuki Poulose wrote: > > , renxiu liang wrote: > >> On Sun, 2010-09-19 at 12:06 +0530, Suzuki Poulose wrote: > >>> Hello Renxiu, > >>> > >>> Sorry for the late reply. Could you please resend the patch ? I don't > >>> find the patch in the e-mail. > >>> > >> Hi Suzuki > >> > >> attached failure logs on different archs and patch for you. > > > > I don't understand why there should be ACCERR. The mmap is done with > > PROT_WRIT|PROT_READ. Could we get more info on which condition triggers > > this ? > > I'm trying to explain the ACCERR case with strace output because I'm not able to debug this with GDB, it's much more time sensitive case. I added some print info in mmap1.c to indicate the read and write thread. See below: strace -o /tmp/output8 -f ./mmap1 -x 0.05 mmap1 0 INFO : pthread_create for map_write_unmap:583 mmap1 0 INFO : pid[5406]: map, change contents, unmap files 1000 times mmap1 0 INFO : created write thread[1082139792] mmap1 0 INFO : pthread_create for read_mem:594 mmap1 0 INFO : pid[5406] - read contents of memory 0x40803000 1000 times mmap1 0 INFO : before strncmp in read_mem: 393 mmap1 0 INFO : info->si_code=2, info->si_addr=0x40803000,map_address=0x40803000 mmap1 0 INFO : value at map_address= mmap1 0 INFO : info->si_code=2, info->si_addr=0x40803000,map_address=0x40803000 caught unexpected signal - 11 --- exiting root@arm926:/opt/ltp-full/testcases/bin> Here the si_code=2 is ACCERR error; the error occurs when strncmp at map_address(0x40803000). map_address(0x40803000) is used for another read thread and protected with PROT_NON in mprotect(),you can get this in strace output; any access to 0x40803000 and 0x40803000+4096 will trigger SIGSEGV with ACCERR. Below is part of the strace output: 5406 open("ashfileeGKbWx", O_RDWR|O_CREAT|O_EXCL, 0600) = 3 5406 unlink("ashfileeGKbWx") = 0 5406 lseek(3, 1023, SEEK_SET) = 1023 5406 write(3, "\0", 1) = 1 5406 fsync(3) = 0 5406 fstat64(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(5, 1), ...}) = 0 5406 ioctl(1, SNDCTL_TMR_TIMEBASE or TCGETS, {B38400 opost isig icanon echo ...}) = 0 5406 mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x40002000 5406 write(1, "mmap1 0 INFO : pthread_"..., 63) = 63 5406 mmap2(NULL, 8388608, PROT_READ|PROT_WRITE, MAP_PRIVATE| MAP_ANONYMOUS, -1, 0) = 0x40003000 5406 brk(0) = 0x16000 5406 brk(0x37000) = 0x37000 5406 mprotect(0x40003000, 4096, PROT_NONE) = 0 5406 clone(child_stack=0x40801fe8, flags=CLONE_VM|CLONE_FS|CLONE_FILES| CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS| CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tidptr=0x408024d8, tls=0x40802930, child_tidptr=0x408024d8) = 5407 5407 SYS_338(0x408024e0, 0xc, 0x4f2fc000, 0xc, 0x408024d8) = 0 5407 write(1, "mmap1 0 INFO : pid[5406"..., 80) = 80 5407 mmap2(NULL, 1024, PROT_READ|PROT_WRITE, MAP_SHARED, 3, 0) = 0x40803000 5407 nanosleep({0, 1000}, NULL) = 0 5407 munmap(0x40803000, 1024) = 0 5407 mmap2(NULL, 1024, PROT_READ|PROT_WRITE, MAP_SHARED, 3, 0) = 0x40803000 5407 nanosleep({0, 1000}, NULL) = 0 5407 munmap(0x40803000, 1024) = 0 5407 mmap2(NULL, 1024, PROT_READ|PROT_WRITE, MAP_SHARED, 3, 0) = 0x40803000 5407 nanosleep({0, 1000}, NULL) = 0 5407 munmap(0x40803000, 1024) = 0 ............. ............. <<<<<<<<<<>>>>>>>> ............. 5407 nanosleep({0, 1000}, NULL) = 0 5407 munmap(0x40803000, 1024) = 0 5407 mmap2(NULL, 1024, PROT_READ|PROT_WRITE, MAP_SHARED, 3, 0) = 0x40803000 5407 nanosleep({0, 1000}, NULL) = 0 5407 munmap(0x40803000, 1024) = 0 5407 open("/lib/libgcc_s.so.1", O_RDONLY 5406 write(1, "mmap1 0 INFO : created "..., 57 5407 <... open resumed> ) = 4 5407 read(4, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0(\0\1\0\0\0P \246XO4"..., 512) = 512 5407 fstat64(4, {st_mode=S_IFREG|0755, st_size=260598, ...}) = 0 5407 mmap2(0x4f588000, 75760, PROT_READ|PROT_EXEC, MAP_PRIVATE| MAP_DENYWRITE, 4, 0) = 0x4f588000 5407 mprotect(0x4f593000, 28672, PROT_NONE) = 0 5407 mmap2(0x4f59a000, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE| MAP_FIXED|MAP_DENYWRITE, 4, 0xa) = 0x4f59a000 5407 close(4) = 0 5407 exit(0) = ? 5406 <... write resumed> ) = 57 5406 sched_yield() = 0 5406 write(1, "mmap1 0 INFO : pthread_"..., 56) = 56 5406 mmap2(NULL, 8388608, PROT_READ|PROT_WRITE, MAP_PRIVATE| MAP_ANONYMOUS, -1, 0) = 0x40803000 5406 mprotect(0x40803000, 4096, PROT_NONE) = 0 5406 clone(child_stack=0x41001fe8, flags=CLONE_VM|CLONE_FS|CLONE_FILES| CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS| CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tidptr=0x410024d8, tls=0x41002930, child_tidptr=0x410024d8) = 5408 5408 SYS_338(0x410024e0, 0xc, 0x4f2fc000, 0xc, 0x410024d8) = 0 5408 write(1, "mmap1 0 INFO : pid[5406"..., 82) = 82 5408 write(1, "mmap1 0 INFO : before s"..., 56) = 56 5408 --- SIGSEGV (Segmentation fault) @ 0 (0) --- 5408 write(1, "mmap1 0 INFO : info->si"..., 89) = 89 5408 write(1, "mmap1 0 INFO : value at"..., 46) = 46 5408 futex(0x4f2c4010, 0x81 /* FUTEX_??? */, 1) = 0 5408 write(1, "mmap1 0 INFO : info->si"..., 89) = 89 5408 write(2, "caught unexpected signal - 11 --"..., 42) = 42 5408 exit_group(-1) = ? See above, pid 5406 indicates the main process; pid 5407 indicates the write thread; pid 5408 indicates the read thread; 0x40803000 is used to create the read thread 5408 and mprotected with PROT_NON, in the meantime, strncmp in read_mem tries to access the address, this triggers the ACCERR error: 5406 mmap2(NULL, 8388608, PROT_READ|PROT_WRITE, MAP_PRIVATE| MAP_ANONYMOUS, -1, 0) = 0x40803000 5406 mprotect(0x40803000, 4096, PROT_NONE) = 0 5406 clone(child_stack=0x41001fe8, flags=CLONE_VM|CLONE_FS|CLONE_FILES| CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS| CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tidptr=0x410024d8, tls=0x41002930, child_tidptr=0x410024d8) = 5408 Please refer to the attached strace log. > > You may use GDB to break for the ACCERR case and take a look at the > > instruction which triggers this. > > > > Coming back to your patch : > > > > I think we should ensure the si_addr is within the map. (map_address, > > map_address+mapsize-1) ? We should not let the other SIGSEGVs go through. > > I agree with this, but this does not cover the following case: root@localhost:/opt/ltp-full/testcases/bin> ./mmap1 -x 0.002 mmap1 0 TINFO : created thread[724210864] mmap1 0 TINFO : created thread[732665008] mmap1 0 TINFO : pid[836] - read contents of memory (nil) 1000 times mmap1 0 TINFO : info->si_code=1, info->si_addr=(nil), map_address=(nil) mmap1 0 TINFO : page fault occurred at (nil) mmap1 0 TINFO : pid[836]: map, change contents, unmap files 1000 times mmap1 0 TINFO : info->si_code=1, info->si_addr=(nil), map_address=(nil) mmap1 0 TINFO : info->si_code=1, info->si_addr=(nil), map_address=0x2baba000 caught unexpected signal - 11 --- exiting Here, read thread first triggers signal 11, si_addr and map_addr are nil, while seconds later, write thread successfully did mmap, then map_addr changed to 0x2baba000, thus sig_handler function got the inconsistency between si_addr(nil) and map_address(0x2baba000), and exited unexpectedly. I think we should consider ACCERR case in mmap1.c, also should cover all possible cases about the si_addr and map_address. Looking forward to your advices. Thanks, Renxiu > > It may be like : > > > > case SIGSEGV: > > if ((si_addr >= map_address) && > > (si_addr < (map_address + map_size) ) { > > ... > > .. > > } > > > > > > Thanks > > > > Suzuki > > > > > > > > > >> > >> thanks for help review this. > >> > >> BR > >> Renxiu > >>> Thanks > >>> > >>> Suzuki > >>> , renxiu liang wrote: > >>>> On Wed, 2010-09-01 at 00:27 -0700, Garrett Cooper wrote: > >>>>> On Wed, Sep 1, 2010 at 12:26 AM, Garrett Cooper > >>>>> wrote: > >>>>>> On Mon, Aug 30, 2010 at 6:51 PM, renxiu liang > >>>>>> wrote: > >>>>>>> On Wed, 2010-08-25 at 20:36 -0700, Garrett Cooper wrote: > >>>>>>>> On Wed, Aug 18, 2010 at 12:33 AM, renxiu liang > >>>>>>>> wrote: > >>>>>>>>> Hi, > >>>>>>>>> > >>>>>>>>> We met the mtest06 failure several times; this is because in > >>>>>>>>> mtest06/mmap1.c, when handling the signal 11 in sig_handler, it > >>>>>>>>> doesn't > >>>>>>>>> cover another two race conditions: > >>>>>>>>> one is si_code equals "SEGV_MAPERR" but si_address does not > >>>>>>>>> equal to > >>>>>>>>> map_address; and one is si_code equals to "SEGV_ACCERR"; > >>>>>>>>> > >>>>>>>>> see below error log: > >>>>>>>>> > >>>>>>>>> <<>> > >>>>>>>>> tag=mtest06 stime=1270902396 > >>>>>>>>> cmdline=" mmap1 -x 0.05" > >>>>>>>>> contacts="" > >>>>>>>>> analysis=exit > >>>>>>>>> initiation_status="ok" > >>>>>>>>> <<>> > >>>>>>>>> mmap1 0 INFO : pid[5456]: map, change contents, unmap files > >>>>>>>>> 1000 times > >>>>>>>>> mmap1 0 INFO : created thread[1216369840] > >>>>>>>>> mmap1 0 INFO : pid[5456] - read contents of memory 0x48002000 > >>>>>>>>> 1000 times > >>>>>>>>> mmap1 0 INFO : page fault occurred at 0x48002000 > >>>>>>>>> mmap1 0 INFO : page fault occurred at 0x48002000 > >>>>>>>>> mmap1 0 INFO : page fault occurred at 0x48002000 > >>>>>>>>> mmap1 0 INFO : page fault occurred at 0x48002000 > >>>>>>>>> ...... > >>>>>>>>> mmap1 0 INFO : page fault occurred at 0x48002000 > >>>>>>>>> mmap1 0 INFO : page fault occurred at 0x48002000 > >>>>>>>>> caught unexpected signal - 11 --- exiting > >>>>>>>>> <<>> > >>>>>>>>> duration=1 termination_type=exited termination_id=255 corefile=no > >>>>>>>>> cutime=0 cstime=4 > >>>>>>>>> <<>> > >>>>>>>>> > >>>>>>>>> I made a patch to cover all the three race conditions in one > >>>>>>>>> case in > >>>>>>>>> sig_handler(), > >>>>>>>>> then run mtest06 repeatedly on different archs, test will not > >>>>>>>>> fail by > >>>>>>>>> catching signal 11. > >>>>>>>> > >>>>>>>> Could you print out what the value of info->si_code and > >>>>>>>> info->si_addr > >>>>>>>> are at the time of the fault? Also, which architecture(s) are you > >>>>>>>> running into this issue on? > >>>>>>> Hi Garrett, > >>>>>>> > >>>>>>> Sorry for the late reply; See attached mtest06 logs, I printed > >>>>>>> out the > >>>>>>> values of si_code, si_addr, map_address; > >>>>>>> it is reproducible on arm, x86, ppc and mips; though it is not > >>>>>>> reproducible every time, but can be reproducible if run the case > >>>>>>> repeatedly. > >>>>>> > >>>>>> Nothing is jumping out at me as being incorrect, so I think it's best > >>>>>> that I bring in a better pair of eyes, just in case. > >>>>>> > >>>>>> Suzuki-san, > >>>>>> Could you please help review this patch? > >>>>> > >>>> Hi Garrett, > >>>> No reply from Suzuki-san after almost three weeks; I'm afraid > >>>> Suzuki-san > >>>> didn't catch the mail, could you find someone else to review the patch? > >>>> thanks! > >>>> > >>>> BR, > >>>> Renxiu > >>>> > >>>>> CCing Suzuki-san... > >>>>> -Garrett > >>>> > >>> > >> > > >