Subject: Re: fio signal 11
From: Jens Axboe
Date: Tue, 26 Jul 2016 08:17:28 -0600
To: Jan Kara
Cc: Jeff Furlong, Sitsofe Wheeler, fio@vger.kernel.org
Message-ID: <2bd421b5-7d16-e948-e86f-da19f5ae297e@kernel.dk>
In-Reply-To: <20160726084307.GA6860@quack2.suse.cz>
References: <575CCF51.8020704@kernel.dk> <575CD738.1050505@kernel.dk> <20160615144502.GH1607@quack2.suse.cz> <5762500B.2040202@kernel.dk> <20160720050832.GA3918@quack2.suse.cz> <8683247d-429c-e639-78a5-912316ea9e21@kernel.dk> <20160726084307.GA6860@quack2.suse.cz>

On 07/26/2016 02:43 AM, Jan Kara wrote:
> On Mon 25-07-16 09:21:00, Jens Axboe wrote:
>> On 07/19/2016 11:08 PM, Jan Kara wrote:
>>> On Thu 16-06-16 09:06:51, Jens Axboe wrote:
>>>> On 06/15/2016 04:45 PM, Jan Kara wrote:
>>>>> On Sat 11-06-16 21:30:00, Jens Axboe wrote:
>>>>>> On 06/11/2016 08:56 PM, Jens Axboe wrote:
>>>>>>> On 06/10/2016 12:42 PM, Jeff Furlong wrote:
>>>>>>>> Good point. Here is the trace:
>>>>>>>>
>>>>>>>> [New LWP 59231]
>>>>>>>> [Thread debugging using libthread_db enabled]
>>>>>>>> Using host libthread_db library "/lib64/libthread_db.so.1".
>>>>>>>> Core was generated by `fio --name=test_job --ioengine=libaio
>>>>>>>> --direct=1 --rw=write --iodepth=32'.
>>>>>>>> Program terminated with signal 11, Segmentation fault.
>>>>>>>> #0  0x0000000000421e39 in regrow_log (iolog=0x7f828c0c5ad0) at
>>>>>>>> stat.c:1909
>>>>>>>> 1909          if (!cur_log) {
>>>>>>>>
>>>>>>>> (gdb) bt
>>>>>>>> #0  0x0000000000421e39 in regrow_log (iolog=0x7f828c0c5ad0) at
>>>>>>>> stat.c:1909
>>>>>>>> #1  0x000000000042d4df in regrow_logs (td=td@entry=0x7f8277de0000)
>>>>>>>> at stat.c:1965
>>>>>>>> #2  0x000000000040ca90 in wait_for_completions
>>>>>>>> (td=td@entry=0x7f8277de0000, time=time@entry=0x7fffcfb6b300) at
>>>>>>>> backend.c:446
>>>>>>>> #3  0x000000000045ade7 in do_io (bytes_done=<optimized out>,
>>>>>>>> td=0x7f8277de0000) at backend.c:991
>>>>>>>> #4  thread_main (data=data@entry=0x264d450) at backend.c:1667
>>>>>>>> #5  0x000000000045cfec in run_threads (sk_out=sk_out@entry=0x0) at
>>>>>>>> backend.c:2217
>>>>>>>> #6  0x000000000045d2cd in fio_backend (sk_out=sk_out@entry=0x0) at
>>>>>>>> backend.c:2349
>>>>>>>> #7  0x000000000040d09c in main (argc=22, argv=0x7fffcfb6f638,
>>>>>>>> envp=<optimized out>) at fio.c:63
>>>>>>>
>>>>>>> That looks odd, thanks for reporting this. I'll see if I can get to
>>>>>>> this on Monday; if not, it'll have to wait until after my vacation.
>>>>>>> So while I appreciate people running -git and finding issues like
>>>>>>> these before they show up in a release, it might be best to revert
>>>>>>> back to 2.2.11 until I can get this debugged.
>>>>>>
>>>>>> I take that back - continue using -git! Just pull a fresh copy, it
>>>>>> should be fixed now.
>>>>>>
>>>>>> Jan, the reporter is right: 2.11 works and -git does not. So I just
>>>>>> ran a quick bisect, changing the logging interval from every second
>>>>>> to every 100ms to make it reproduce faster. I don't have time to
>>>>>> look into why yet, so I just reverted the commit.
>>>>>>
>>>>>> commit d7982dd0ab2a1a315b5f9859c67a02414ce6274f
>>>>>> Author: Jan Kara
>>>>>> Date:   Tue May 24 17:03:21 2016 +0200
>>>>>>
>>>>>>     fio: Simplify forking of processes
>>>>>
>>>>> Hum, I've tried reproducing this but failed (I've tried using
>>>>> /dev/ram0 and /dev/sda4 as devices for fio). Is it somehow dependent
>>>>> on the device fio works with? I have used commit
>>>>> 54d0a3150d44adca3ee4047fabd85651c6ea2db1 (just before you reverted my
>>>>> patch) for testing.
>>>>
>>>> On vacation right now, I'll check when I get back. It is possible
>>>> that it was just a fluke, since there was another bug there related
>>>> to shared memory, but it was predictably crashing at the same point
>>>> during the bisect.
>>>>
>>>> It doesn't make a lot of sense, however.
>>>
>>> Did you have a chance to look into this?
>>
>> I have not, unfortunately, but I'm suspecting the original patch is
>> fine and that the later fix to allocate the cur_log out of the shared
>> pool was the real fix.
>
> That's what I'd suspect as well, but I'm not able to reproduce even the
> original crash, so I cannot verify this theory... What's the plan going
> forward? Will you re-apply the patch? Frankly, I don't care much, it
> was just a small cleanup. I'm just curious whether it was really that
> other bug or whether I'm missing something.

Yes, I think re-applying it would be the best way forward, especially
since 2.13 was just released, so we'll have a while to iron out any
issues. But I really don't see how the patch could be the reason for
the issue; I'm guessing it just exacerbated it somehow.

-- 
Jens Axboe
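
A note on the shared-pool theory discussed above: when fio runs jobs as
fork()ed processes rather than threads, a buffer obtained with a plain
malloc() in one process is not visible to the others after the fork, so
growable per-job logs have to live in memory shared across processes.
The sketch below only illustrates that general mechanism; it is not
fio's actual smalloc/regrow_log code, and the struct cur_log name and
fields here are purely hypothetical stand-ins.

/* shared_log.c: illustrative sketch only, not fio source code. */
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for a growable log; fio's real struct differs. */
struct cur_log {
	size_t nr_samples;
	unsigned long samples[1024];
};

/*
 * Allocate the log from an anonymous shared mapping instead of malloc(),
 * so updates made in a fork()ed job process are visible to the parent.
 */
static struct cur_log *shared_log_alloc(void)
{
	void *p = mmap(NULL, sizeof(struct cur_log), PROT_READ | PROT_WRITE,
		       MAP_SHARED | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return NULL;
	memset(p, 0, sizeof(struct cur_log));
	return p;
}

int main(void)
{
	struct cur_log *log = shared_log_alloc();
	pid_t pid;

	if (!log)
		return 1;

	pid = fork();
	if (pid == 0) {
		/* "Job" process: append a sample to the shared log. */
		log->samples[log->nr_samples++] = 42;
		_exit(0);
	}
	waitpid(pid, NULL, 0);

	/*
	 * The parent sees nr_samples == 1 only because the mapping is
	 * MAP_SHARED; with malloc() the child would have grown its own
	 * copy-on-write copy and the parent would still see 0.
	 */
	printf("samples recorded by child: %zu\n", log->nr_samples);
	munmap(log, sizeof(struct cur_log));
	return 0;
}

Built with a plain "gcc shared_log.c" on Linux, this prints a count of 1;
swapping the mmap() call for malloc() would make the parent print 0,
which is the kind of cross-process invisibility the shared-pool
allocation avoids.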