* Exit all jobs on error
@ 2015-12-10 7:01 Sitsofe Wheeler
  2015-12-10 17:11 ` Jens Axboe
  0 siblings, 1 reply; 16+ messages in thread
From: Sitsofe Wheeler @ 2015-12-10 7:01 UTC
  To: fio

Hi,

Is there an option to exit all jobs but only on error? If I have a job like this

[global]
stonewall=1
verify=crc32
rw=write
[pass1]
bs=4k
[pass2]
bs=8k

I want fio to stop if pass1 fails verification and for pass2 not to be
performed at all. I'm aware of "exitall" but using that will make fio
quit even if pass1 is successful.

--
Sitsofe | http://sucs.org/~sits/

* Re: Exit all jobs on error
  2015-12-10 7:01 Exit all jobs on error Sitsofe Wheeler
@ 2015-12-10 17:11 ` Jens Axboe
  2015-12-10 17:58   ` Sitsofe Wheeler
  0 siblings, 1 reply; 16+ messages in thread
From: Jens Axboe @ 2015-12-10 17:11 UTC
  To: Sitsofe Wheeler, fio

On 12/10/2015 12:01 AM, Sitsofe Wheeler wrote:
> Hi,
>
> Is there an option to exit all jobs but only on error? If I have a job like this
>
> [global]
> stonewall=1
> verify=crc32
> rw=write
> [pass1]
> bs=4k
> [pass2]
> bs=8k
>
> I want fio to stop if pass1 fails verification and for pass2 not to be
> performed at all. I'm aware of "exitall" but using that will make fio
> quit even if pass1 is successful.

That doesn't exist, but we could add a exitall_on_error to have that
behavior. Should be pretty easy to add.

--
Jens Axboe

* Re: Exit all jobs on error
  2015-12-10 17:11 ` Jens Axboe
@ 2015-12-10 17:58   ` Sitsofe Wheeler
  2015-12-10 18:11     ` Andrey Kuzmin
  2015-12-10 18:15     ` Jens Axboe
  0 siblings, 2 replies; 16+ messages in thread
From: Sitsofe Wheeler @ 2015-12-10 17:58 UTC
  To: Jens Axboe; +Cc: fio

On 10 December 2015 at 17:11, Jens Axboe <axboe@kernel.dk> wrote:
> On 12/10/2015 12:01 AM, Sitsofe Wheeler wrote:
>>
>> Hi,
>>
>> Is there an option to exit all jobs but only on error? If I have a job
>> like this
>>
>> [global]
>> stonewall=1
>> verify=crc32
>> rw=write
>> [pass1]
>> bs=4k
>> [pass2]
>> bs=8k
>>
>> I want fio to stop if pass1 fails verification and for pass2 not to be
>> performed at all. I'm aware of "exitall" but using that will make fio
>> quit even if pass1 is successful.
>
> That doesn't exist, but we could add a exitall_on_error to have that
> behavior. Should be pretty easy to add.

That would work for me - that way it could be put in the global
section or per (stonewall) group.

--
Sitsofe | http://sucs.org/~sits/

* Re: Exit all jobs on error
  2015-12-10 17:58 ` Sitsofe Wheeler
@ 2015-12-10 18:11   ` Andrey Kuzmin
  2015-12-10 18:15     ` Jens Axboe
  2015-12-10 18:15   ` Jens Axboe
  1 sibling, 1 reply; 16+ messages in thread
From: Andrey Kuzmin @ 2015-12-10 18:11 UTC
  To: Sitsofe Wheeler; +Cc: Jens Axboe, fio

I've also encountered a similar issue a number of times where the job
failed to stop (and refused to terminate in response to C-C) when a
thread/process fails, e.g. due to an error. My guess is that the loop
that waits for completions doesn't check for td->terminate being set.

Regards,
Andrey

On Thu, Dec 10, 2015 at 8:58 PM, Sitsofe Wheeler <sitsofe@gmail.com> wrote:
> On 10 December 2015 at 17:11, Jens Axboe <axboe@kernel.dk> wrote:
>> On 12/10/2015 12:01 AM, Sitsofe Wheeler wrote:
>>>
>>> Hi,
>>>
>>> Is there an option to exit all jobs but only on error? If I have a job
>>> like this
>>>
>>> [global]
>>> stonewall=1
>>> verify=crc32
>>> rw=write
>>> [pass1]
>>> bs=4k
>>> [pass2]
>>> bs=8k
>>>
>>> I want fio to stop if pass1 fails verification and for pass2 not to be
>>> performed at all. I'm aware of "exitall" but using that will make fio
>>> quit even if pass1 is successful.
>>
>> That doesn't exist, but we could add a exitall_on_error to have that
>> behavior. Should be pretty easy to add.
>
> That would work for me - that way it could be put in the global
> section or per (stonewall) group.
>
> --
> Sitsofe | http://sucs.org/~sits/

* Re: Exit all jobs on error
  2015-12-10 18:11 ` Andrey Kuzmin
@ 2015-12-10 18:15   ` Jens Axboe
  2015-12-10 18:17     ` Andrey Kuzmin
  0 siblings, 1 reply; 16+ messages in thread
From: Jens Axboe @ 2015-12-10 18:15 UTC
  To: Andrey Kuzmin; +Cc: Sitsofe Wheeler, fio

On Thu, Dec 10 2015, Andrey Kuzmin wrote:
> I've also encountered a similar issue a number of times where the job
> failed to stop (and refused to terminate in response to C-C) when a
> thread/process fails, e.g. due to an error. My guess is that the loop
> that waits for completions doesn't check for td->terminate being set.

Attach with gdb and see what they are doing, could be a missing
terminate check. Or it could already be sitting waiting for completions.

--
Jens Axboe

* Re: Exit all jobs on error
  2015-12-10 18:15 ` Jens Axboe
@ 2015-12-10 18:17   ` Andrey Kuzmin
  2015-12-10 18:24     ` Jens Axboe
  0 siblings, 1 reply; 16+ messages in thread
From: Andrey Kuzmin @ 2015-12-10 18:17 UTC
  To: Jens Axboe; +Cc: Sitsofe Wheeler, fio

On Thu, Dec 10, 2015 at 9:15 PM, Jens Axboe <axboe@kernel.dk> wrote:
> On Thu, Dec 10 2015, Andrey Kuzmin wrote:
>> I've also encountered a similar issue a number of times where the job
>> failed to stop (and refused to terminate in response to C-C) when a
>> thread/process fails, e.g. due to an error. My guess is that the loop
>> that waits for completions doesn't check for td->terminate being set.
>
> Attach with gdb and see what they are doing, could be a missing
> terminate check. Or it could already be sitting waiting for completions.

It just sits there waiting for completions, as gdb understandably
predominantly hits the wait state.

Regards,
Andrey

>
> --
> Jens Axboe
>

* Re: Exit all jobs on error
  2015-12-10 18:17 ` Andrey Kuzmin
@ 2015-12-10 18:24   ` Jens Axboe
  2015-12-10 18:27     ` Andrey Kuzmin
  0 siblings, 1 reply; 16+ messages in thread
From: Jens Axboe @ 2015-12-10 18:24 UTC
  To: Andrey Kuzmin; +Cc: Sitsofe Wheeler, fio

On 12/10/2015 11:17 AM, Andrey Kuzmin wrote:
> On Thu, Dec 10, 2015 at 9:15 PM, Jens Axboe <axboe@kernel.dk> wrote:
>> On Thu, Dec 10 2015, Andrey Kuzmin wrote:
>>> I've also encountered a similar issue a number of times where the job
>>> failed to stop (and refused to terminate in response to C-C) when a
>>> thread/process fails, e.g. due to an error. My guess is that the loop
>>> that waits for completions doesn't check for td->terminate being set.
>>
>> Attach with gdb and see what they are doing, could be a missing
>> terminate check. Or it could already be sitting waiting for completions.
>
> It just sits there waiting for completions, as gdb understandably
> predominantly hits the wait state.

Where is it sitting and/or looping?

--
Jens Axboe

* Re: Exit all jobs on error
  2015-12-10 18:24 ` Jens Axboe
@ 2015-12-10 18:27   ` Andrey Kuzmin
  2015-12-10 18:29     ` Jens Axboe
  0 siblings, 1 reply; 16+ messages in thread
From: Andrey Kuzmin @ 2015-12-10 18:27 UTC
  To: Jens Axboe; +Cc: Sitsofe Wheeler, fio

On Thu, Dec 10, 2015 at 9:24 PM, Jens Axboe <axboe@kernel.dk> wrote:
> On 12/10/2015 11:17 AM, Andrey Kuzmin wrote:
>>
>> On Thu, Dec 10, 2015 at 9:15 PM, Jens Axboe <axboe@kernel.dk> wrote:
>>>
>>> On Thu, Dec 10 2015, Andrey Kuzmin wrote:
>>>>
>>>> I've also encountered a similar issue a number of times where the job
>>>> failed to stop (and refused to terminate in response to C-C) when a
>>>> thread/process fails, e.g. due to an error. My guess is that the loop
>>>> that waits for completions doesn't check for td->terminate being set.
>>>
>>> Attach with gdb and see what they are doing, could be a missing
>>> terminate check. Or it could already be sitting waiting for completions.
>>
>> It just sits there waiting for completions, as gdb understandably
>> predominantly hits the wait state.
>
> Where is it sitting and/or looping?

unix/wait smth ;), as far as I recall.

If you need an exact ref, let me make up an error in the code, run,
and get back to you with the exact gdb frame info.

Regards,
A.

>
> --
> Jens Axboe
>

* Re: Exit all jobs on error
  2015-12-10 18:27 ` Andrey Kuzmin
@ 2015-12-10 18:29   ` Jens Axboe
  2015-12-10 18:30     ` Andrey Kuzmin
  0 siblings, 1 reply; 16+ messages in thread
From: Jens Axboe @ 2015-12-10 18:29 UTC
  To: Andrey Kuzmin; +Cc: Sitsofe Wheeler, fio

On 12/10/2015 11:27 AM, Andrey Kuzmin wrote:
> On Thu, Dec 10, 2015 at 9:24 PM, Jens Axboe <axboe@kernel.dk> wrote:
>> On 12/10/2015 11:17 AM, Andrey Kuzmin wrote:
>>>
>>> On Thu, Dec 10, 2015 at 9:15 PM, Jens Axboe <axboe@kernel.dk> wrote:
>>>>
>>>> On Thu, Dec 10 2015, Andrey Kuzmin wrote:
>>>>>
>>>>> I've also encountered a similar issue a number of times where the job
>>>>> failed to stop (and refused to terminate in response to C-C) when a
>>>>> thread/process fails, e.g. due to an error. My guess is that the loop
>>>>> that waits for completions doesn't check for td->terminate being set.
>>>>
>>>> Attach with gdb and see what they are doing, could be a missing
>>>> terminate check. Or it could already be sitting waiting for completions.
>>>
>>> It just sits there waiting for completions, as gdb understandably
>>> predominantly hits the wait state.
>>
>> Where is it sitting and/or looping?
>
> unix/wait smth ;), as far as I recall.
>
> If you need an exact ref, let me make up an error in the code, run,
> and get back to you with the exact gdb frame info.

I'm generally not in the crystal ball or guessing game :-)

So yeah, a stack trace would be helpful.

--
Jens Axboe

* Re: Exit all jobs on error
  2015-12-10 18:29 ` Jens Axboe
@ 2015-12-10 18:30   ` Andrey Kuzmin
  2015-12-11 10:01     ` Andrey Kuzmin
  0 siblings, 1 reply; 16+ messages in thread
From: Andrey Kuzmin @ 2015-12-10 18:30 UTC
  To: Jens Axboe; +Cc: Sitsofe Wheeler, fio

On Thu, Dec 10, 2015 at 9:29 PM, Jens Axboe <axboe@kernel.dk> wrote:
> On 12/10/2015 11:27 AM, Andrey Kuzmin wrote:
>>
>> On Thu, Dec 10, 2015 at 9:24 PM, Jens Axboe <axboe@kernel.dk> wrote:
>>>
>>> On 12/10/2015 11:17 AM, Andrey Kuzmin wrote:
>>>>
>>>> On Thu, Dec 10, 2015 at 9:15 PM, Jens Axboe <axboe@kernel.dk> wrote:
>>>>>
>>>>> On Thu, Dec 10 2015, Andrey Kuzmin wrote:
>>>>>>
>>>>>> I've also encountered a similar issue a number of times where the job
>>>>>> failed to stop (and refused to terminate in response to C-C) when a
>>>>>> thread/process fails, e.g. due to an error. My guess is that the loop
>>>>>> that waits for completions doesn't check for td->terminate being set.
>>>>>
>>>>> Attach with gdb and see what they are doing, could be a missing
>>>>> terminate check. Or it could already be sitting waiting for
>>>>> completions.
>>>>
>>>> It just sits there waiting for completions, as gdb understandably
>>>> predominantly hits the wait state.
>>>
>>> Where is it sitting and/or looping?
>>
>> unix/wait smth ;), as far as I recall.
>>
>> If you need an exact ref, let me make up an error in the code, run,
>> and get back to you with the exact gdb frame info.
>
> I'm generally not in the crystal ball or guessing game :-)
>
> So yeah, a stack trace would be helpful.

OK, will do.

Regards,
Andrey

>
> --
> Jens Axboe
>

* Re: Exit all jobs on error
  2015-12-10 18:30 ` Andrey Kuzmin
@ 2015-12-11 10:01   ` Andrey Kuzmin
  2015-12-11 15:32     ` Jens Axboe
  0 siblings, 1 reply; 16+ messages in thread
From: Andrey Kuzmin @ 2015-12-11 10:01 UTC
  To: Jens Axboe; +Cc: Sitsofe Wheeler, fio

On Thu, Dec 10, 2015 at 9:30 PM, Andrey Kuzmin <andrey.v.kuzmin@gmail.com> wrote:
> On Thu, Dec 10, 2015 at 9:29 PM, Jens Axboe <axboe@kernel.dk> wrote:
>> On 12/10/2015 11:27 AM, Andrey Kuzmin wrote:
>>>
>>> On Thu, Dec 10, 2015 at 9:24 PM, Jens Axboe <axboe@kernel.dk> wrote:
>>>>
>>>> On 12/10/2015 11:17 AM, Andrey Kuzmin wrote:
>>>>>
>>>>> On Thu, Dec 10, 2015 at 9:15 PM, Jens Axboe <axboe@kernel.dk> wrote:
>>>>>>
>>>>>> On Thu, Dec 10 2015, Andrey Kuzmin wrote:
>>>>>>>
>>>>>>> I've also encountered a similar issue a number of times where the job
>>>>>>> failed to stop (and refused to terminate in response to C-C) when a
>>>>>>> thread/process fails, e.g. due to an error. My guess is that the loop
>>>>>>> that waits for completions doesn't check for td->terminate being set.
>>>>>>
>>>>>> Attach with gdb and see what they are doing, could be a missing
>>>>>> terminate check. Or it could already be sitting waiting for
>>>>>> completions.
>>>>>
>>>>> It just sits there waiting for completions, as gdb understandably
>>>>> predominantly hits the wait state.
>>>>
>>>> Where is it sitting and/or looping?
>>>
>>> unix/wait smth ;), as far as I recall.
>>>
>>> If you need an exact ref, let me make up an error in the code, run,
>>> and get back to you with the exact gdb frame info.
>>
>> I'm generally not in the crystal ball or guessing game :-)
>>
>> So yeah, a stack trace would be helpful.
>
> OK, will do.
>

^Cbs: 1 (f=1): [w(1)] [0.0% done] [0KB/0KB/0KB /s] [0/0/0 iops] [eta 01d:12h:24m:29s]
Program received signal SIGINT, Interrupt.
0x00007ffff6b7ff3d in nanosleep () at ../sysdeps/unix/syscall-template.S:81
81 ../sysdeps/unix/syscall-template.S: No such file or directory.
(gdb) bt
#0 0x00007ffff6b7ff3d in nanosleep () at ../sysdeps/unix/syscall-template.S:81
#1 0x00007ffff6bb14a4 in usleep (useconds=<optimized out>) at ../sysdeps/unix/sysv/linux/usleep.c:32
#2 0x000000000045a7ed in do_usleep (usecs=10000) at backend.c:1951
#3 0x000000000045b33c in run_threads () at backend.c:2216
#4 0x000000000045b6a8 in fio_backend () at backend.c:2333
#5 0x00000000004991cb in main (argc=4, argv=0x7fffffffdda8, envp=0x7fffffffddd0) at fio.c:60

The log says "fio: terminating on signal 2", but killing it with ^C if
not running under gdb doesn't work - the job continues, seemingly
waiting for the completion that never comes.

Regards,
Andrey

>
> Regards,
> Andrey
>
>>
>> --
>> Jens Axboe
>>

* Re: Exit all jobs on error
  2015-12-11 10:01 ` Andrey Kuzmin
@ 2015-12-11 15:32   ` Jens Axboe
  2015-12-11 19:59     ` Sitsofe Wheeler
  0 siblings, 1 reply; 16+ messages in thread
From: Jens Axboe @ 2015-12-11 15:32 UTC
  To: Andrey Kuzmin; +Cc: Sitsofe Wheeler, fio

On 12/11/2015 03:01 AM, Andrey Kuzmin wrote:
> ^Cbs: 1 (f=1): [w(1)] [0.0% done] [0KB/0KB/0KB /s] [0/0/0 iops] [eta
> 01d:12h:24m:29s]
> Program received signal SIGINT, Interrupt.
> 0x00007ffff6b7ff3d in nanosleep () at ../sysdeps/unix/syscall-template.S:81
> 81 ../sysdeps/unix/syscall-template.S: No such file or directory.
> (gdb) bt
> #0 0x00007ffff6b7ff3d in nanosleep () at ../sysdeps/unix/syscall-template.S:81
> #1 0x00007ffff6bb14a4 in usleep (useconds=<optimized out>) at
> ../sysdeps/unix/sysv/linux/usleep.c:32
> #2 0x000000000045a7ed in do_usleep (usecs=10000) at backend.c:1951
> #3 0x000000000045b33c in run_threads () at backend.c:2216
> #4 0x000000000045b6a8 in fio_backend () at backend.c:2333
> #5 0x00000000004991cb in main (argc=4, argv=0x7fffffffdda8,
> envp=0x7fffffffddd0) at fio.c:60

That's not one of the IO threads, that's the main thread. It'll sit and wait
in that loop until jobs finish. You'll need the backtrace of one of the
stuck IO thread instead, this trace is quite normal and expected of backend.

--
Jens Axboe

* Re: Exit all jobs on error
  2015-12-11 15:32 ` Jens Axboe
@ 2015-12-11 19:59   ` Sitsofe Wheeler
  2015-12-11 20:32     ` Andrey Kuzmin
  0 siblings, 1 reply; 16+ messages in thread
From: Sitsofe Wheeler @ 2015-12-11 19:59 UTC
  To: Jens Axboe; +Cc: Andrey Kuzmin, fio

On 11 December 2015 at 15:32, Jens Axboe <axboe@kernel.dk> wrote:
> On 12/11/2015 03:01 AM, Andrey Kuzmin wrote:
>>
>> ^Cbs: 1 (f=1): [w(1)] [0.0% done] [0KB/0KB/0KB /s] [0/0/0 iops] [eta
>> 01d:12h:24m:29s]
>> Program received signal SIGINT, Interrupt.
>> 0x00007ffff6b7ff3d in nanosleep () at
>> ../sysdeps/unix/syscall-template.S:81
>> 81 ../sysdeps/unix/syscall-template.S: No such file or directory.
>> (gdb) bt
>> #0 0x00007ffff6b7ff3d in nanosleep () at
>> ../sysdeps/unix/syscall-template.S:81
>> #1 0x00007ffff6bb14a4 in usleep (useconds=<optimized out>) at
>> ../sysdeps/unix/sysv/linux/usleep.c:32
>> #2 0x000000000045a7ed in do_usleep (usecs=10000) at backend.c:1951
>> #3 0x000000000045b33c in run_threads () at backend.c:2216
>> #4 0x000000000045b6a8 in fio_backend () at backend.c:2333
>> #5 0x00000000004991cb in main (argc=4, argv=0x7fffffffdda8,
>> envp=0x7fffffffddd0) at fio.c:60
>
> That's not one of the IO threads, that's the main thread. It'll sit and wait
> in that loop until jobs finish. You'll need the backtrace of one of the
> stuck IO thread instead, this trace is quite normal and expected of backend.
>
> --
> Jens Axboe
>

Andrey:

Could you try
thread apply all bt full
(found over on https://wiki.gentoo.org/wiki/Project:Quality_Assurance/Backtraces
)?

--
Sitsofe | http://sucs.org/~sits/

* Re: Exit all jobs on error
  2015-12-11 19:59 ` Sitsofe Wheeler
@ 2015-12-11 20:32   ` Andrey Kuzmin
  2015-12-11 20:38     ` Jens Axboe
  0 siblings, 1 reply; 16+ messages in thread
From: Andrey Kuzmin @ 2015-12-11 20:32 UTC
  To: Sitsofe Wheeler; +Cc: fio, Jens Axboe

On Dec 11, 2015 22:59, "Sitsofe Wheeler" <sitsofe@gmail.com> wrote:
>
> On 11 December 2015 at 15:32, Jens Axboe <axboe@kernel.dk> wrote:
> > On 12/11/2015 03:01 AM, Andrey Kuzmin wrote:
> >>
> >> ^Cbs: 1 (f=1): [w(1)] [0.0% done] [0KB/0KB/0KB /s] [0/0/0 iops] [eta
> >> 01d:12h:24m:29s]
> >> Program received signal SIGINT, Interrupt.
> >> 0x00007ffff6b7ff3d in nanosleep () at
> >> ../sysdeps/unix/syscall-template.S:81
> >> 81 ../sysdeps/unix/syscall-template.S: No such file or directory.
> >> (gdb) bt
> >> #0 0x00007ffff6b7ff3d in nanosleep () at
> >> ../sysdeps/unix/syscall-template.S:81
> >> #1 0x00007ffff6bb14a4 in usleep (useconds=<optimized out>) at
> >> ../sysdeps/unix/sysv/linux/usleep.c:32
> >> #2 0x000000000045a7ed in do_usleep (usecs=10000) at backend.c:1951
> >> #3 0x000000000045b33c in run_threads () at backend.c:2216
> >> #4 0x000000000045b6a8 in fio_backend () at backend.c:2333
> >> #5 0x00000000004991cb in main (argc=4, argv=0x7fffffffdda8,
> >> envp=0x7fffffffddd0) at fio.c:60
> >
> > That's not one of the IO threads, that's the main thread. It'll sit and wait
> > in that loop until jobs finish. You'll need the backtrace of one of the
> > stuck IO thread instead, this trace is quite normal and expected of backend.
> >
> > --
> > Jens Axboe
> >
>
> Andrey:
>
> Could you try
> thread apply all bt full
> (found over on https://wiki.gentoo.org/wiki/Project:Quality_Assurance/Backtraces
> )?
>

That test case is already gone, but - if interested - you can easily
simulate it by randomly dropping an io_u inside the engine.

Regards,
Andrey

>
> --
> Sitsofe | http://sucs.org/~sits/

* Re: Exit all jobs on error
  2015-12-11 20:32 ` Andrey Kuzmin
@ 2015-12-11 20:38   ` Jens Axboe
  0 siblings, 0 replies; 16+ messages in thread
From: Jens Axboe @ 2015-12-11 20:38 UTC
  To: Andrey Kuzmin, Sitsofe Wheeler; +Cc: fio

On 12/11/2015 01:32 PM, Andrey Kuzmin wrote:
>
> On Dec 11, 2015 22:59, "Sitsofe Wheeler" <sitsofe@gmail.com> wrote:
> >
> > On 11 December 2015 at 15:32, Jens Axboe <axboe@kernel.dk> wrote:
> > > On 12/11/2015 03:01 AM, Andrey Kuzmin wrote:
> > >>
> > >> ^Cbs: 1 (f=1): [w(1)] [0.0% done] [0KB/0KB/0KB /s] [0/0/0 iops] [eta
> > >> 01d:12h:24m:29s]
> > >> Program received signal SIGINT, Interrupt.
> > >> 0x00007ffff6b7ff3d in nanosleep () at
> > >> ../sysdeps/unix/syscall-template.S:81
> > >> 81 ../sysdeps/unix/syscall-template.S: No such file or directory.
> > >> (gdb) bt
> > >> #0 0x00007ffff6b7ff3d in nanosleep () at
> > >> ../sysdeps/unix/syscall-template.S:81
> > >> #1 0x00007ffff6bb14a4 in usleep (useconds=<optimized out>) at
> > >> ../sysdeps/unix/sysv/linux/usleep.c:32
> > >> #2 0x000000000045a7ed in do_usleep (usecs=10000) at backend.c:1951
> > >> #3 0x000000000045b33c in run_threads () at backend.c:2216
> > >> #4 0x000000000045b6a8 in fio_backend () at backend.c:2333
> > >> #5 0x00000000004991cb in main (argc=4, argv=0x7fffffffdda8,
> > >> envp=0x7fffffffddd0) at fio.c:60
> > >
> > > That's not one of the IO threads, that's the main thread. It'll sit and wait
> > > in that loop until jobs finish. You'll need the backtrace of one of the
> > > stuck IO thread instead, this trace is quite normal and expected of backend.
> > >
> > > --
> > > Jens Axboe
> > >
> >
> > Andrey:
> >
> > Could you try
> > thread apply all bt full
> > (found over on https://wiki.gentoo.org/wiki/Project:Quality_Assurance/Backtraces
> > )?
> >
>
> That test case is already gone, but - if interested - you can easily
> simulate it by randomly dropping an io_u inside the engine.

To follow up on this, since apparently parts of that thread ended up
outside of the mailing list.

If you drop an io_u inside the engine, then fio will of course get stuck
waiting for completions. That would be an IO engine bug. Fio does not
track timeouts internally, because it does not have to:

For the more real case of being stuck waiting for IO that has been
submitted to the kernel, we strictly depend on the kernel completing
those IOs. If not, that's a kernel bug, and it won't matter if we
explicitly wait for the IO, since it'll happen in any case when we drop
the aio context. Either the IO gets completed by the device, or a driver
timeout will take care of completing it. In either case, we get a
completion event.

There's no fio bug here.

--
Jens Axboe
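
Why a dropped io_u leaves the job hanging can be seen in a toy model of submission and completion accounting. This is not fio code: `struct toy_queue`, `toy_submit()`, `toy_complete()`, `toy_drained()` and `toy_run()` are hypothetical names, and the sketch assumes only that a job may exit once every submitted I/O has produced a completion event.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical counter pair standing in for a job's I/O accounting. */
struct toy_queue {
	unsigned int submitted; /* I/Os handed to the engine */
	unsigned int completed; /* completion events the engine reported back */
};

/* Hand one I/O to the engine. */
static void toy_submit(struct toy_queue *q)
{
	q->submitted++;
}

/* The engine reports one completion; it can never report more than it got. */
static void toy_complete(struct toy_queue *q)
{
	assert(q->completed < q->submitted);
	q->completed++;
}

/* A job may exit only when nothing is left in flight. */
static bool toy_drained(const struct toy_queue *q)
{
	return q->completed == q->submitted;
}

/*
 * Submit `total` I/Os and complete all but `dropped` of them, as a buggy
 * engine that loses I/Os would. Returns whether the queue ever drains.
 */
static bool toy_run(unsigned int total, unsigned int dropped)
{
	struct toy_queue q = { 0, 0 };
	unsigned int i;

	assert(dropped <= total);
	for (i = 0; i < total; i++)
		toy_submit(&q);
	for (i = 0; i < total - dropped; i++)
		toy_complete(&q);
	return toy_drained(&q);
}
```

In this model `toy_run(8, 0)` drains while `toy_run(8, 1)` does not: with one completion missing, nothing is left that could wake a waiter, which is the engine-side hang described above.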

* Re: Exit all jobs on error
  2015-12-10 17:58 ` Sitsofe Wheeler
  2015-12-10 18:11   ` Andrey Kuzmin
@ 2015-12-10 18:15   ` Jens Axboe
  1 sibling, 0 replies; 16+ messages in thread
From: Jens Axboe @ 2015-12-10 18:15 UTC
  To: Sitsofe Wheeler; +Cc: fio

On Thu, Dec 10 2015, Sitsofe Wheeler wrote:
> On 10 December 2015 at 17:11, Jens Axboe <axboe@kernel.dk> wrote:
> > On 12/10/2015 12:01 AM, Sitsofe Wheeler wrote:
> >>
> >> Hi,
> >>
> >> Is there an option to exit all jobs but only on error? If I have a job
> >> like this
> >>
> >> [global]
> >> stonewall=1
> >> verify=crc32
> >> rw=write
> >> [pass1]
> >> bs=4k
> >> [pass2]
> >> bs=8k
> >>
> >> I want fio to stop if pass1 fails verification and for pass2 not to be
> >> performed at all. I'm aware of "exitall" but using that will make fio
> >> quit even if pass1 is successful.
> >
> > That doesn't exist, but we could add a exitall_on_error to have that
> > behavior. Should be pretty easy to add.
>
> That would work for me - that way it could be put in the global
> section or per (stonewall) group.

Something like the below should work. Apart from the 'adding the option'
sugar, it's a few lines of changes. If you could test, that'd be great.

diff --git a/HOWTO b/HOWTO
index eb9c8245d4e3..b21d27e3b15f 100644
--- a/HOWTO
+++ b/HOWTO
@@ -1227,6 +1227,9 @@ exitall		When one job finishes, terminate the rest. The default is
 		to wait for each job to finish, sometimes that is not the
 		desired action.
 
+exitall_on_error	When one job finishes in error, terminate the rest. The
+		default is to wait for each job to finish.
+
 bwavgtime=int	Average the calculated bandwidth over the given time. Value
 		is specified in milliseconds.
 
diff --git a/backend.c b/backend.c
index 425b0ee94c37..e37fffb7b183 100644
--- a/backend.c
+++ b/backend.c
@@ -974,7 +974,7 @@ reap:
 
 		if (!in_ramp_time(td) && should_check_rate(td)) {
 			if (check_min_rate(td, &comp_time)) {
-				if (exitall_on_terminate)
+				if (exitall_on_terminate || td->o.exitall_error)
 					fio_terminate_threads(td->groupid);
 				td_verror(td, EIO, "check_min_rate");
 				break;
@@ -1662,7 +1662,7 @@ static void *thread_main(void *data)
 	if (o->exec_postrun)
 		exec_string(o, o->exec_postrun, (const char *)"postrun");
 
-	if (exitall_on_terminate)
+	if (exitall_on_terminate || (o->exitall_error && td->error))
 		fio_terminate_threads(td->groupid);
 
 err:
diff --git a/cconv.c b/cconv.c
index c0168c47d7dd..a476aad6376a 100644
--- a/cconv.c
+++ b/cconv.c
@@ -167,6 +167,7 @@ void convert_thread_options_to_cpu(struct thread_options *o,
 	o->fsync_on_close = le32_to_cpu(top->fsync_on_close);
 	o->bs_is_seq_rand = le32_to_cpu(top->bs_is_seq_rand);
 	o->random_distribution = le32_to_cpu(top->random_distribution);
+	o->exitall_error = le32_to_cpu(top->exitall_error);
 	o->zipf_theta.u.f = fio_uint64_to_double(le64_to_cpu(top->zipf_theta.u.i));
 	o->pareto_h.u.f = fio_uint64_to_double(le64_to_cpu(top->pareto_h.u.i));
 	o->gauss_dev.u.f = fio_uint64_to_double(le64_to_cpu(top->gauss_dev.u.i));
@@ -353,6 +354,7 @@ void convert_thread_options_to_net(struct thread_options_pack *top,
 	top->fsync_on_close = cpu_to_le32(o->fsync_on_close);
 	top->bs_is_seq_rand = cpu_to_le32(o->bs_is_seq_rand);
 	top->random_distribution = cpu_to_le32(o->random_distribution);
+	top->exitall_error = cpu_to_le32(o->exitall_error);
 	top->zipf_theta.u.i = __cpu_to_le64(fio_double_to_uint64(o->zipf_theta.u.f));
 	top->pareto_h.u.i = __cpu_to_le64(fio_double_to_uint64(o->pareto_h.u.f));
 	top->gauss_dev.u.i = __cpu_to_le64(fio_double_to_uint64(o->gauss_dev.u.f));
diff --git a/fio.1 b/fio.1
index eab20d779e35..4fe1be27c31c 100644
--- a/fio.1
+++ b/fio.1
@@ -1126,6 +1126,10 @@ Should be a multiple of 1MB. Default: 4MB.
 .B exitall
 Terminate all jobs when one finishes. Default: wait for each job to finish.
 .TP
+.B exitall_on_error \fR=\fPbool
+Terminate all jobs if one job finishes in error. Default: wait for each job
+to finish.
+.TP
 .BI bwavgtime \fR=\fPint
 Average bandwidth calculations over the given time in milliseconds. Default:
 500ms.
diff --git a/init.c b/init.c
index 0100da213a24..63ba32481b9b 100644
--- a/init.c
+++ b/init.c
@@ -47,6 +47,7 @@ static char **job_sections;
 static int nr_job_sections;
 
 int exitall_on_terminate = 0;
+int exitall_on_terminate_error = 0;
 int output_format = FIO_OUTPUT_NORMAL;
 int eta_print = FIO_ETA_AUTO;
 int eta_new_line = 0;
diff --git a/options.c b/options.c
index 627029cd732c..46d5fb92ea98 100644
--- a/options.c
+++ b/options.c
@@ -3141,6 +3141,15 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.group	= FIO_OPT_G_PROCESS,
 	},
 	{
+		.name	= "exitall_on_error",
+		.lname	= "Exit-all on terminate in error",
+		.type	= FIO_OPT_BOOL,
+		.off1	= td_var_offset(exitall_error),
+		.help	= "Terminate all jobs when one exits in error",
+		.category = FIO_OPT_C_GENERAL,
+		.group	= FIO_OPT_G_PROCESS,
+	},
+	{
 		.name	= "stonewall",
 		.lname	= "Wait for previous",
 		.alias	= "wait_for_previous",
diff --git a/thread_options.h b/thread_options.h
index 02c867f31936..6ae0335698c1 100644
--- a/thread_options.h
+++ b/thread_options.h
@@ -131,6 +131,7 @@ struct thread_options {
 	unsigned int verify_only;
 
 	unsigned int random_distribution;
+	unsigned int exitall_error;
 
 	fio_fp64_t zipf_theta;
 	fio_fp64_t pareto_h;
@@ -376,7 +377,7 @@ struct thread_options_pack {
 	uint32_t bs_is_seq_rand;
 
 	uint32_t random_distribution;
-	uint32_t pad;
+	uint32_t exitall_error;
 
 	fio_fp64_t zipf_theta;
 	fio_fp64_t pareto_h;

--
Jens Axboe
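
The rule the patch adds to thread_main() can be restated as a small standalone function. This is only a sketch of the decision, not fio code: `struct job_state` and `should_terminate_group()` are made-up names, and the three fields are stand-ins for `exitall_on_terminate`, `o->exitall_error` and `td->error` from the diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the bits of fio state the check looks at. */
struct job_state {
	bool exitall;          /* models exitall_on_terminate (the "exitall" option) */
	bool exitall_on_error; /* models o->exitall_error */
	int error;             /* models td->error; 0 means the job succeeded */
};

/*
 * Returns true when a finished job should take the others down with it:
 * always with exitall, only after a failure with exitall_on_error.
 */
static bool should_terminate_group(const struct job_state *job)
{
	assert(job != NULL);
	return job->exitall || (job->exitall_on_error && job->error != 0);
}
```

In the terms of the original request, a pass1 that fails verification ends with a non-zero error and trips the check, while a clean pass1 leaves it false so pass2 can go ahead.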