* How to cancel job
@ 2019-09-16  7:16 Elliott Balsley
  2019-09-16 11:53 ` Sitsofe Wheeler
  0 siblings, 1 reply; 8+ messages in thread
From: Elliott Balsley @ 2019-09-16  7:16 UTC (permalink / raw)
  To: fio

Hello.  Simple question, what is the best way to cancel an fio job in
progress?  If I start a job with wrong options by mistake, I try
Ctrl-C but this just prints "fio: terminating on signal 2" and the job
continues.


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: How to cancel job
  2019-09-16  7:16 How to cancel job Elliott Balsley
@ 2019-09-16 11:53 ` Sitsofe Wheeler
  2019-10-01  0:15   ` Elliott Balsley
  0 siblings, 1 reply; 8+ messages in thread
From: Sitsofe Wheeler @ 2019-09-16 11:53 UTC (permalink / raw)
  To: Elliott Balsley; +Cc: fio

Hi,

On Mon, 16 Sep 2019 at 09:06, Elliott Balsley <elliott@altsystems.com> wrote:
>
> Hello.  Simple question, what is the best way to cancel an fio job in
> progress?  If I start a job with wrong options by mistake, I try
> Ctrl-C but this just prints "fio: terminating on signal 2" and the job
> continues.

That should be as good as it gets for an abrupt stop outside of
something like killall -9 fio. Don't you get some message like
"hasn't exited in 60 seconds, it appears to be stuck. Doing forceful
exit of this job." after 60 seconds?
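A minimal sketch of that escalation (SIGINT is what Ctrl-C sends; SIGKILL is the last resort) -- the helper name stop_job is made up:

```shell
# Hypothetical helper: send progressively stronger signals, pausing
# between attempts to give the job a chance to exit cleanly.
stop_job() {
    pid="$1"
    for sig in INT TERM KILL; do
        kill -s "$sig" "$pid" 2>/dev/null || return 0  # already gone
        sleep 2
        kill -0 "$pid" 2>/dev/null || return 0         # it exited
    done
    return 1  # survived even SIGKILL, e.g. stuck in the kernel
}
```

killall -9 fio covers the common case in one step; the loop just makes the escalation explicit. Note that kill -0 can report a not-yet-reaped child of your own shell as still alive.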

--
Sitsofe | http://sucs.org/~sits/



* Re: How to cancel job
  2019-09-16 11:53 ` Sitsofe Wheeler
@ 2019-10-01  0:15   ` Elliott Balsley
  2019-10-03  8:04     ` Sitsofe Wheeler
  0 siblings, 1 reply; 8+ messages in thread
From: Elliott Balsley @ 2019-10-01  0:15 UTC (permalink / raw)
  To: Sitsofe Wheeler; +Cc: fio

> That should be as good as it gets for an abrupt stop outside of
> something like killall -9 fio. Don't you get some message like
> "hasn't exited in 60 seconds, it appears to be stuck. Doing forceful
> exit of this job." after 60 seconds?

No, after 60 seconds it is still going.  Sometimes kill -9 works, but
not always.  Sometimes it leaves the [fio] process showing in ps and
disk activity continues.  This is frustrating, not having a proper way
to quit.



* Re: How to cancel job
  2019-10-01  0:15   ` Elliott Balsley
@ 2019-10-03  8:04     ` Sitsofe Wheeler
  2019-10-04 21:34       ` Elliott Balsley
  0 siblings, 1 reply; 8+ messages in thread
From: Sitsofe Wheeler @ 2019-10-03  8:04 UTC (permalink / raw)
  To: Elliott Balsley; +Cc: fio

On Tue, 1 Oct 2019 at 01:15, Elliott Balsley <elliott@altsystems.com> wrote:
>
> No, after 60 seconds it is still going.  Sometimes kill -9 works, but
> not always.  Sometimes it leaves the [fio] process showing in ps and
> disk activity continues.  This is frustrating, not having a proper way
> to quit.

This sounds unexpected. Are there easy reproduction steps for this? Is
I/O definitely being processed by the lower levels of the kernel?

-- 
Sitsofe | http://sucs.org/~sits/



* Re: How to cancel job
  2019-10-03  8:04     ` Sitsofe Wheeler
@ 2019-10-04 21:34       ` Elliott Balsley
  2019-10-05 10:14         ` Sitsofe Wheeler
  0 siblings, 1 reply; 8+ messages in thread
From: Elliott Balsley @ 2019-10-04 21:34 UTC (permalink / raw)
  To: Sitsofe Wheeler; +Cc: fio

> This sounds unexpected. Are there easy reproduction steps for this? Is
> I/O definitely being processed by the lower levels of the kernel?

I'm not sure how to reproduce it reliably.  It seems to happen more
often with NFS than with local filesystems.  Here is an example where
Ctrl-C does nothing.  Then I ran "killall -9 fio" in another terminal,
and it took about 20 seconds before the disk activity actually stopped
in iostat.

$ fio --name=write --rw=write --bs=1M --size=100G --end_fsync=1
--filename_format=/mnt/rivendell/fio.\$jobnum.\$filenum
write: (g=0): rw=write, bs=(R) 1024KiB-1024KiB, (W) 1024KiB-1024KiB,
(T) 1024KiB-1024KiB, ioengine=psync, iodepth=1
fio-3.1
Starting 1 process
write: Laying out IO file (1 file / 102400MiB)
fio: native_fallocate call failed: Operation not supported
^Cbs: 1 (f=1): [W(1)][14.3%][r=0KiB/s,w=1327MiB/s][r=0,w=1327 IOPS][eta 00m:42s]
fio: terminating on signal 2
Killed1 (f=1): [F(1)][100.0%][r=0KiB/s,w=0KiB/s][r=0,w=0 IOPS][eta 00m:00s]

$ ps aux | grep fio
root     243211 36.0  0.0 931316  3068 ?        Ds   14:25   0:08 fio
--name=write --rw=write --bs=1M --size=100G --end_fsync=1
--filename_format=/mnt/rivendell/fio.$jobnum.$filenum
root     243238  0.0  0.0 112708   988 pts/0    S+   14:26   0:00 grep
--color=auto fio
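For scanning the whole process table for tasks in the same state as the leftover process above (the D in the STAT column), a sketch assuming a procps-style ps:

```shell
# List tasks in uninterruptible sleep: the state field (column 2 of
# "ps -eo pid,stat,comm") starts with "D".
ps -eo pid,stat,comm | awk '$2 ~ /^D/ {print}'
```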



* Re: How to cancel job
  2019-10-04 21:34       ` Elliott Balsley
@ 2019-10-05 10:14         ` Sitsofe Wheeler
  2019-10-05 19:51           ` Elliott Balsley
  0 siblings, 1 reply; 8+ messages in thread
From: Sitsofe Wheeler @ 2019-10-05 10:14 UTC (permalink / raw)
  To: Elliott Balsley; +Cc: fio

On Fri, 4 Oct 2019 at 22:34, Elliott Balsley <elliott@altsystems.com> wrote:
>
> > This sounds unexpected. Are there easy reproduction steps for this? Is
> > I/O definitely being processed by the lower levels of the kernel?
>
> I'm not sure how to reproduce it reliably.  It seems to happen more
> often with NFS than with local filesystems.  Here is an example where
> Ctrl-C does nothing.  Then I ran "killall -9 fio" in another terminal,
> and it took about 20 seconds before the disk activity actually stopped
> in iostat.
>
> $ fio --name=write --rw=write --bs=1M --size=100G --end_fsync=1
> --filename_format=/mnt/rivendell/fio.\$jobnum.\$filenum
> write: (g=0): rw=write, bs=(R) 1024KiB-1024KiB, (W) 1024KiB-1024KiB,
> (T) 1024KiB-1024KiB, ioengine=psync, iodepth=1
> fio-3.1
> Starting 1 process
> write: Laying out IO file (1 file / 102400MiB)
> fio: native_fallocate call failed: Operation not supported
> ^Cbs: 1 (f=1): [W(1)][14.3%][r=0KiB/s,w=1327MiB/s][r=0,w=1327 IOPS][eta 00m:42s]
> fio: terminating on signal 2
> Killed1 (f=1): [F(1)][100.0%][r=0KiB/s,w=0KiB/s][r=0,w=0 IOPS][eta 00m:00s]
>
> $ ps aux | grep fio
> root     243211 36.0  0.0 931316  3068 ?        Ds   14:25   0:08 fio
> --name=write --rw=write --bs=1M --size=100G --end_fsync=1
> --filename_format=/mnt/rivendell/fio.$jobnum.$filenum
> root     243238  0.0  0.0 112708   988 pts/0    S+   14:26   0:00 grep
> --color=auto fio

That process state (D) means that it is uninterruptible sleep. Doing
I/O through NFS can result in the process being unkillable until it
gets out of sending/receiving some batch of data to/from the NFS
server. Some part of fio saw the Ctrl-C (hence the terminating
message) but the part actually processing I/O presumably didn't
respond - maybe your NFS share is mounted with the "hard" option and
is retrying I/O indefinitely?
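One way to check is to read the mount table; a sketch with a made-up helper name (note that on Linux NFS mounts, hard is the default when neither hard nor soft appears):

```shell
# Hypothetical helper: print each NFS mount and whether its effective
# options include "hard" (retry forever) rather than "soft" (error out
# after the retrans attempts).  Reads /proc/mounts unless given a file.
nfs_mount_mode() {
    awk '$3 ~ /^nfs/ {
        mode = ($4 ~ /(^|,)hard(,|$)/) ? "hard" : "soft/other"
        print $2, mode
    }' "${1:-/proc/mounts}"
}
```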

-- 
Sitsofe | http://sucs.org/~sits/



* Re: How to cancel job
  2019-10-05 10:14         ` Sitsofe Wheeler
@ 2019-10-05 19:51           ` Elliott Balsley
  2019-12-19  7:19             ` Sitsofe Wheeler
  0 siblings, 1 reply; 8+ messages in thread
From: Elliott Balsley @ 2019-10-05 19:51 UTC (permalink / raw)
  To: Sitsofe Wheeler; +Cc: fio

Thanks, I had not noticed the D state.  I learn something every day.
I am indeed using an NFS hard mount, but when this issue happens there
is no problem with the mount itself; other apps can read and write
just fine.  So maybe this could be improved by making fio sleep
interruptibly?  Usually this happens when I accidentally start a job
with a size much larger than I intended.
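As a defensive sketch for the oversized-job case: GNU timeout(1) can impose a deadline from the outside, and fio's own runtime= option bounds the job from the inside. The wrapper name, fio arguments, and filename below are illustrative:

```shell
# Hypothetical wrapper: cap the run at 10 minutes of wall-clock time,
# sending SIGINT at the deadline and SIGKILL 30s later if the process
# is still alive.  fio's runtime=600 also stops the job internally.
capped_fio() {
    timeout --signal=INT --kill-after=30s 10m \
        fio --name=write --rw=write --bs=1M --size=100G --end_fsync=1 \
            --runtime=600 --filename=/mnt/rivendell/fio.testfile "$@"
}
```

A SIGKILL from timeout still cannot interrupt a task in D state, but it bounds the damage in the usual case.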

On Sat, Oct 5, 2019 at 3:15 AM Sitsofe Wheeler <sitsofe@gmail.com> wrote:
>
> On Fri, 4 Oct 2019 at 22:34, Elliott Balsley <elliott@altsystems.com> wrote:
> >
> > > This sounds unexpected. Are there easy reproduction steps for this? Is
> > > I/O definitely being processed by the lower levels of the kernel?
> >
> > I'm not sure how to reproduce it reliably.  It seems to happen more
> > often with NFS than with local filesystems.  Here is an example where
> > Ctrl-C does nothing.  Then I ran "killall -9 fio" in another terminal,
> > and it took about 20 seconds before the disk activity actually stopped
> > in iostat.
> >
> > $ fio --name=write --rw=write --bs=1M --size=100G --end_fsync=1
> > --filename_format=/mnt/rivendell/fio.\$jobnum.\$filenum
> > write: (g=0): rw=write, bs=(R) 1024KiB-1024KiB, (W) 1024KiB-1024KiB,
> > (T) 1024KiB-1024KiB, ioengine=psync, iodepth=1
> > fio-3.1
> > Starting 1 process
> > write: Laying out IO file (1 file / 102400MiB)
> > fio: native_fallocate call failed: Operation not supported
> > ^Cbs: 1 (f=1): [W(1)][14.3%][r=0KiB/s,w=1327MiB/s][r=0,w=1327 IOPS][eta 00m:42s]
> > fio: terminating on signal 2
> > Killed1 (f=1): [F(1)][100.0%][r=0KiB/s,w=0KiB/s][r=0,w=0 IOPS][eta 00m:00s]
> >
> > $ ps aux | grep fio
> > root     243211 36.0  0.0 931316  3068 ?        Ds   14:25   0:08 fio
> > --name=write --rw=write --bs=1M --size=100G --end_fsync=1
> > --filename_format=/mnt/rivendell/fio.$jobnum.$filenum
> > root     243238  0.0  0.0 112708   988 pts/0    S+   14:26   0:00 grep
> > --color=auto fio
>
> That process state (D) means that it is uninterruptible sleep. Doing
> I/O through NFS can result in the process being unkillable until it
> gets out of sending/receiving some batch of data to/from the NFS
> server. Some part of fio saw the Ctrl-C (hence the terminating
> message) but the part actually processing I/O presumably didn't
> respond - maybe your NFS share is mounted with the "hard" option and
> is retrying I/O indefinitely?
>
> --
> Sitsofe | http://sucs.org/~sits/



* Re: How to cancel job
  2019-10-05 19:51           ` Elliott Balsley
@ 2019-12-19  7:19             ` Sitsofe Wheeler
  0 siblings, 0 replies; 8+ messages in thread
From: Sitsofe Wheeler @ 2019-12-19  7:19 UTC (permalink / raw)
  To: Elliott Balsley; +Cc: fio

On Sat, 5 Oct 2019 at 20:51, Elliott Balsley <elliott@altsystems.com> wrote:
>
> Thanks, I had not noticed the D state.  I learn something every day.
> I am indeed using an NFS hard mount, but when this issue happens there
> is no problem with the mount itself; other apps can read and write
> just fine.  So maybe this could be improved by making fio sleep
> interruptibly?  Usually this happens when I accidentally start a job
> with a size much larger than I intended.

(Sorry for the late reply) It's not fio's choice - it's the kernel's.
Fio is doing I/O down to a file (maybe just that file is causing the
NFS server heartache?) and the kernel chooses whether doing that I/O
makes the task (hopefully temporarily) uninterruptible. Your other
tasks might not be doing enough NFS I/O to get caught out (presumably
the queue of I/O is backing up) and/or you don't spot them hanging
because you're not actively killing them.
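To see roughly where such a task is stuck, ps can print the kernel wait channel, and /proc/PID/stack (readable by root) gives the full kernel backtrace; a sketch with a made-up helper name:

```shell
# Show a task's state and kernel wait channel (wchan); for an NFS hang
# this often points at an RPC or writeback wait.  Also dump the kernel
# stack if we have permission to read it.
blocked_where() {
    pid="$1"
    ps -o pid,stat,wchan:30,comm -p "$pid"
    if [ -r "/proc/$pid/stack" ]; then
        cat "/proc/$pid/stack"
    fi
}
```

For example, blocked_where 243211 for the stuck fio process shown earlier.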

-- 
Sitsofe | http://sucs.org/~sits/




Thread overview: 8 messages
2019-09-16  7:16 How to cancel job Elliott Balsley
2019-09-16 11:53 ` Sitsofe Wheeler
2019-10-01  0:15   ` Elliott Balsley
2019-10-03  8:04     ` Sitsofe Wheeler
2019-10-04 21:34       ` Elliott Balsley
2019-10-05 10:14         ` Sitsofe Wheeler
2019-10-05 19:51           ` Elliott Balsley
2019-12-19  7:19             ` Sitsofe Wheeler
