* [PATCH] fix rand_seed mismatches in verify phase
@ 2014-02-03 21:31 Puthikorn Voravootivat
  2014-02-03 22:54 ` Grant Grundler
  2014-02-05 17:15 ` Jens Axboe
  0 siblings, 2 replies; 12+ messages in thread
From: Puthikorn Voravootivat @ 2014-02-03 21:31 UTC (permalink / raw)
  To: Jens Axboe
  Cc: FIO List, Gwendal Grignou, Grant Grundler, Puthikorn Voravootivat

In the verify phase, the rand_seed generated on replay does not match
the rand_seed that was written.

Multiple problems are causing this:
1. In the verify phase, fio does not set io_u->rand_seed to compare against
   hdr->rand_seed.
2. In a randrw scenario, the fio log is stored in a red-black tree in
   "sorted by LBA" order.  Thus, it is impossible to replay the written
   order, or rather to generate the seeds again in the same order.
3. In the write phase, the code currently generates rand_seed, writes the
   data, and then logs rand_seed.  When the queue depth is > 1, the writes
   can complete in a different order than the rand_seeds were generated.
   Thus, when replaying the log, the generated rand_seed might not match
   what was written.
4. The verify_backlog option starts verification before all the data has
   been written, which breaks rand_seed replay with the current design.

Proposed fixes:
1. Use the existing verify_state to generate the verify header (assuming
   this was the original intention of verify_state), and add code to
   replay rand_seed in the verify phase.
2. If the verifysort option is not enabled, store the write log in a list
   instead of the red-black tree.  Otherwise, don't attempt to verify the
   rand_seed in the header.
3. In the write phase, generate rand_seed, log rand_seed, then write the
   data, i.e. log IO transactions in the order generated, not completed
   (see the ordering sketch below).
4. Don't verify rand_seed when verify_backlog is enabled.
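
To make the ordering problem concrete, here is a minimal, self-contained
C sketch.  It is not fio code: srand()/rand() stand in for fio's
__verify_state PRNG, and the completion order is made up.  It only shows
that re-deriving seeds in the verify phase works when the log is replayed
in issue order, and mismatches when the log reflects completion order.

/* ordering_sketch.c - illustrative only, not fio code */
#include <stdio.h>
#include <stdlib.h>

#define NR_IOS 4

int main(void)
{
	unsigned int issue_seeds[NR_IOS], replay_seeds[NR_IOS];
	/* hypothetical completion order with queue depth > 1 */
	int completion_order[NR_IOS] = { 1, 0, 3, 2 };
	int i;

	/* write phase: one seed drawn per io_u, in issue order */
	srand(0x1234);
	for (i = 0; i < NR_IOS; i++)
		issue_seeds[i] = rand();

	/* verify phase: same PRNG state, seeds re-derived in order */
	srand(0x1234);
	for (i = 0; i < NR_IOS; i++)
		replay_seeds[i] = rand();

	for (i = 0; i < NR_IOS; i++) {
		int c = completion_order[i];

		printf("io %d: issue-order log %s, completion-order log %s\n", i,
		       issue_seeds[i] == replay_seeds[i] ? "matches" : "MISMATCH",
		       issue_seeds[c] == replay_seeds[i] ? "matches" : "MISMATCH");
	}
	return 0;
}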

Signed-off-by: Puthikorn Voravootivat <puthik@chromium.org>
---
 backend.c | 13 +++++++++++++
 io_u.c    |  6 ------
 iolog.c   |  2 +-
 verify.c  | 14 ++++++++++++--
 4 files changed, 26 insertions(+), 9 deletions(-)

diff --git a/backend.c b/backend.c
index 93e6632..bf9d066 100644
--- a/backend.c
+++ b/backend.c
@@ -697,6 +697,13 @@ static uint64_t do_io(struct thread_data *td)
 		 */
 		if (td->o.verify != VERIFY_NONE && io_u->ddir == DDIR_READ &&
 		    ((io_u->flags & IO_U_F_VER_LIST) || !td_rw(td))) {
+
+			if (!td->o.verify_pattern_bytes) {
+				io_u->rand_seed = __rand(&td->__verify_state);
+				if (sizeof(int) != sizeof(long *))
+					io_u->rand_seed *= __rand(&td->__verify_state);
+			}
+
 			if (td->o.verify_async)
 				io_u->end_io = verify_io_u_async;
 			else
@@ -707,6 +714,12 @@ static uint64_t do_io(struct thread_data *td)
 		else
 			td_set_runstate(td, TD_RUNNING);
 
+		if (td_write(td) && io_u->ddir == DDIR_WRITE &&
+		    td->o.do_verify &&
+		    td->o.verify != VERIFY_NONE &&
+		    !td->o.experimental_verify)
+			log_io_piece(td, io_u);
+
 		ret = td_io_queue(td, io_u);
 		switch (ret) {
 		case FIO_Q_COMPLETED:
diff --git a/io_u.c b/io_u.c
index 518d884..f68b213 100644
--- a/io_u.c
+++ b/io_u.c
@@ -1623,12 +1623,6 @@ static void io_completed(struct thread_data *td, struct io_u *io_u,
 					 utime_since_now(&td->start));
 		}
 
-		if (td_write(td) && idx == DDIR_WRITE &&
-		    td->o.do_verify &&
-		    td->o.verify != VERIFY_NONE &&
-		    !td->o.experimental_verify)
-			log_io_piece(td, io_u);
-
 		icd->bytes_done[idx] += bytes;
 
 		if (io_u->end_io) {
diff --git a/iolog.c b/iolog.c
index ec29971..017b235 100644
--- a/iolog.c
+++ b/iolog.c
@@ -209,7 +209,7 @@ void log_io_piece(struct thread_data *td, struct io_u *io_u)
 	 * drop the old one, which we rely on the rb insert/lookup for
 	 * handling.
 	 */
-	if ((!td_random(td) || !td->o.overwrite) &&
+	if (((!td->o.verifysort) || !td_random(td) || !td->o.overwrite) &&
 	      (file_randommap(td, ipo->file) || td->o.verify == VERIFY_NONE)) {
 		INIT_FLIST_HEAD(&ipo->list);
 		flist_add_tail(&ipo->list, &td->io_hist_list);
diff --git a/verify.c b/verify.c
index 568bae8..970b867 100644
--- a/verify.c
+++ b/verify.c
@@ -72,10 +72,10 @@ void fill_verify_pattern(struct thread_data *td, void *p, unsigned int len,
 		if (use_seed)
 			__fill_random_buf(p, len, seed);
 		else
-			io_u->rand_seed = fill_random_buf(&td->buf_state, p, len);
+			io_u->rand_seed = fill_random_buf(&td->__verify_state, p, len);
 		return;
 	}
-	
+
 	if (io_u->buf_filled_len >= len) {
 		dprint(FD_VERIFY, "using already filled verify pattern b=%d len=%u\n",
 			td->o.verify_pattern_bytes, len);
@@ -718,6 +718,10 @@ int verify_io_u(struct thread_data *td, struct io_u *io_u)
 			memswp(p, p + td->o.verify_offset, header_size);
 		hdr = p;
 
+		// Make rand_seed check pass when have verifysort or verify_backlog.
+		if (td->o.verifysort || (td->flags & TD_F_VER_BACKLOG))
+			io_u->rand_seed = hdr->rand_seed;
+
 		ret = verify_header(io_u, hdr);
 		switch (ret) {
 		case 0:
@@ -1056,6 +1060,12 @@ int get_next_verify(struct thread_data *td, struct io_u *io_u)
 		remove_trim_entry(td, ipo);
 		free(ipo);
 		dprint(FD_VERIFY, "get_next_verify: ret io_u %p\n", io_u);
+
+		if (!td->o.verify_pattern_bytes) {
+			io_u->rand_seed = __rand(&td->__verify_state);
+			if (sizeof(int) != sizeof(long *))
+				io_u->rand_seed *= __rand(&td->__verify_state);
+		}
 		return 0;
 	}
 
-- 
1.9.0.rc1.175.g0b1dcb5



^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH] fix rand_seed mismatches in verify phase
  2014-02-03 21:31 [PATCH] fix rand_seed mismatches in verify phase Puthikorn Voravootivat
@ 2014-02-03 22:54 ` Grant Grundler
  2014-02-04 18:54   ` Puthikorn Voravootivat
  2014-02-04 18:56   ` Puthikorn Voravootivat
  2014-02-05 17:15 ` Jens Axboe
  1 sibling, 2 replies; 12+ messages in thread
From: Grant Grundler @ 2014-02-03 22:54 UTC (permalink / raw)
  To: Puthikorn Voravootivat
  Cc: Jens Axboe, FIO List, Gwendal Grignou, Grant Grundler

On Mon, Feb 3, 2014 at 1:31 PM, Puthikorn Voravootivat
<puthik@chromium.org> wrote:
> In verify phase, the rand_seed generated on replay does not match
> the written rand_seed.
>
> Multiple problems are causing this:
> 1. In verify phase fio does not set io_u->rand_seed to compare with
>    hdr->rand_seed
> 2. In randrw scenario, fio log is stored in red-black tree in "sorted by LBA"
>    order.  Thus, it is imposible to replay the written order, or rather
>    generate the seeds again in the same order.
> 3. In write phase, the code currently will generate rand_seed, write data
>    and log rand_seed. When queuedepth > 1, it's possible the writes complete
>    in a different order than rand_seed was generated.  Thus when replaying
>    the log, the generated rand_seed might not match what was written.
> 4. verify_backlog option will start verification before all the data has been
>    written and it make rand_seed replay code broken with current design.
>
> Proposed fixes:
> 1. Use of existing verify_state to generate verify header.
>    (and assumes this was the original intention of verify_state). And also
>    adds code to replay rand_seed in verify phase.
> 2. If verifysort option is not enabled, store the write log in a list instead
>    of the red-black tree. Otherwise, don't attempt to verify the rand_seed
>    in the header.
> 3. In write phase,  generate rand_seed, log rand_seed, write data. I.e. log
>    IO transactions in the order generated, not completed.
> 4. Don't verify rand_seed when verify_backlog is enabled.


We are still seeing failures with rw=read + verify_backlog=1. Investigating.

Please send any feedback if you get a chance to look at this in the
meantime.  We'll fix the C++-style comment. :)

cheers,
grant

>
> Signed-off-by: Puthikorn Voravootivat <puthik@chromium.org>
> ---
>  backend.c | 13 +++++++++++++
>  io_u.c    |  6 ------
>  iolog.c   |  2 +-
>  verify.c  | 14 ++++++++++++--
>  4 files changed, 26 insertions(+), 9 deletions(-)
>
> diff --git a/backend.c b/backend.c
> index 93e6632..bf9d066 100644
> --- a/backend.c
> +++ b/backend.c
> @@ -697,6 +697,13 @@ static uint64_t do_io(struct thread_data *td)
>                  */
>                 if (td->o.verify != VERIFY_NONE && io_u->ddir == DDIR_READ &&
>                     ((io_u->flags & IO_U_F_VER_LIST) || !td_rw(td))) {
> +
> +                       if (!td->o.verify_pattern_bytes) {
> +                               io_u->rand_seed = __rand(&td->__verify_state);
> +                               if (sizeof(int) != sizeof(long *))
> +                                       io_u->rand_seed *= __rand(&td->__verify_state);
> +                       }
> +
>                         if (td->o.verify_async)
>                                 io_u->end_io = verify_io_u_async;
>                         else
> @@ -707,6 +714,12 @@ static uint64_t do_io(struct thread_data *td)
>                 else
>                         td_set_runstate(td, TD_RUNNING);
>
> +               if (td_write(td) && io_u->ddir == DDIR_WRITE &&
> +                   td->o.do_verify &&
> +                   td->o.verify != VERIFY_NONE &&
> +                   !td->o.experimental_verify)
> +                       log_io_piece(td, io_u);
> +
>                 ret = td_io_queue(td, io_u);
>                 switch (ret) {
>                 case FIO_Q_COMPLETED:
> diff --git a/io_u.c b/io_u.c
> index 518d884..f68b213 100644
> --- a/io_u.c
> +++ b/io_u.c
> @@ -1623,12 +1623,6 @@ static void io_completed(struct thread_data *td, struct io_u *io_u,
>                                          utime_since_now(&td->start));
>                 }
>
> -               if (td_write(td) && idx == DDIR_WRITE &&
> -                   td->o.do_verify &&
> -                   td->o.verify != VERIFY_NONE &&
> -                   !td->o.experimental_verify)
> -                       log_io_piece(td, io_u);
> -
>                 icd->bytes_done[idx] += bytes;
>
>                 if (io_u->end_io) {
> diff --git a/iolog.c b/iolog.c
> index ec29971..017b235 100644
> --- a/iolog.c
> +++ b/iolog.c
> @@ -209,7 +209,7 @@ void log_io_piece(struct thread_data *td, struct io_u *io_u)
>          * drop the old one, which we rely on the rb insert/lookup for
>          * handling.
>          */
> -       if ((!td_random(td) || !td->o.overwrite) &&
> +       if (((!td->o.verifysort) || !td_random(td) || !td->o.overwrite) &&
>               (file_randommap(td, ipo->file) || td->o.verify == VERIFY_NONE)) {
>                 INIT_FLIST_HEAD(&ipo->list);
>                 flist_add_tail(&ipo->list, &td->io_hist_list);
> diff --git a/verify.c b/verify.c
> index 568bae8..970b867 100644
> --- a/verify.c
> +++ b/verify.c
> @@ -72,10 +72,10 @@ void fill_verify_pattern(struct thread_data *td, void *p, unsigned int len,
>                 if (use_seed)
>                         __fill_random_buf(p, len, seed);
>                 else
> -                       io_u->rand_seed = fill_random_buf(&td->buf_state, p, len);
> +                       io_u->rand_seed = fill_random_buf(&td->__verify_state, p, len);
>                 return;
>         }
> -
> +
>         if (io_u->buf_filled_len >= len) {
>                 dprint(FD_VERIFY, "using already filled verify pattern b=%d len=%u\n",
>                         td->o.verify_pattern_bytes, len);
> @@ -718,6 +718,10 @@ int verify_io_u(struct thread_data *td, struct io_u *io_u)
>                         memswp(p, p + td->o.verify_offset, header_size);
>                 hdr = p;
>
> +               // Make rand_seed check pass when have verifysort or verify_backlog.
> +               if (td->o.verifysort || (td->flags & TD_F_VER_BACKLOG))
> +                       io_u->rand_seed = hdr->rand_seed;
> +
>                 ret = verify_header(io_u, hdr);
>                 switch (ret) {
>                 case 0:
> @@ -1056,6 +1060,12 @@ int get_next_verify(struct thread_data *td, struct io_u *io_u)
>                 remove_trim_entry(td, ipo);
>                 free(ipo);
>                 dprint(FD_VERIFY, "get_next_verify: ret io_u %p\n", io_u);
> +
> +               if (!td->o.verify_pattern_bytes) {
> +                       io_u->rand_seed = __rand(&td->__verify_state);
> +                       if (sizeof(int) != sizeof(long *))
> +                               io_u->rand_seed *= __rand(&td->__verify_state);
> +               }
>                 return 0;
>         }
>
> --
> 1.9.0.rc1.175.g0b1dcb5
>


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] fix rand_seed mismatches in verify phase
  2014-02-03 22:54 ` Grant Grundler
@ 2014-02-04 18:54   ` Puthikorn Voravootivat
  2014-02-04 19:21     ` Grant Grundler
  2014-02-05 18:27     ` Jens Axboe
  2014-02-04 18:56   ` Puthikorn Voravootivat
  1 sibling, 2 replies; 12+ messages in thread
From: Puthikorn Voravootivat @ 2014-02-04 18:54 UTC (permalink / raw)
  To: Grant Grundler
  Cc: Puthikorn Voravootivat, Jens Axboe, FIO List, Gwendal Grignou

[-- Attachment #1: Type: text/plain, Size: 7376 bytes --]

I tried to run the same test with fio 2.1.3 and saw the same error.
This error always occurs with rw=read or rw=randread when verify_backlog=1.

Here is an example job file and error message.

Job file:
[md5-sync-1-0-1-1-1]
filename=test213/fio/md5-sync-1-0-1-1-1
loops=1
direct=0
iodepth=1
ioengine=sync
verify=md5
size=1m
bs=4k
rw=read
verifysort=1
verify_backlog=1

Error message:
verify: bad magic header 869, wanted acca at file
test213/fio/md5-sync-1-0-1-1-1 offset 585728, length 2290190


On Mon, Feb 3, 2014 at 2:54 PM, Grant Grundler <grundler@chromium.org>wrote:

> On Mon, Feb 3, 2014 at 1:31 PM, Puthikorn Voravootivat
> <puthik@chromium.org> wrote:
> > In verify phase, the rand_seed generated on replay does not match
> > the written rand_seed.
> >
> > Multiple problems are causing this:
> > 1. In verify phase fio does not set io_u->rand_seed to compare with
> >    hdr->rand_seed
> > 2. In randrw scenario, fio log is stored in red-black tree in "sorted by
> LBA"
> >    order.  Thus, it is imposible to replay the written order, or rather
> >    generate the seeds again in the same order.
> > 3. In write phase, the code currently will generate rand_seed, write data
> >    and log rand_seed. When queuedepth > 1, it's possible the writes
> complete
> >    in a different order than rand_seed was generated.  Thus when
> replaying
> >    the log, the generated rand_seed might not match what was written.
> > 4. verify_backlog option will start verification before all the data has
> been
> >    written and it make rand_seed replay code broken with current design.
> >
> > Proposed fixes:
> > 1. Use of existing verify_state to generate verify header.
> >    (and assumes this was the original intention of verify_state). And
> also
> >    adds code to replay rand_seed in verify phase.
> > 2. If verifysort option is not enabled, store the write log in a list
> instead
> >    of the red-black tree. Otherwise, don't attempt to verify the
> rand_seed
> >    in the header.
> > 3. In write phase,  generate rand_seed, log rand_seed, write data. I.e.
> log
> >    IO transactions in the order generated, not completed.
> > 4. Don't verify rand_seed when verify_backlog is enabled.
>
>
> We are still seeing failures with rw=read + verify_backlog=1.
> Investigating.
>
> Please send any feedback if you have a chance to look at this anyway.
> We'll fix the C++ comment. :)
>
> cheers,
> grant
>
> >
> > Signed-off-by: Puthikorn Voravootivat <puthik@chromium.org>
> > ---
> >  backend.c | 13 +++++++++++++
> >  io_u.c    |  6 ------
> >  iolog.c   |  2 +-
> >  verify.c  | 14 ++++++++++++--
> >  4 files changed, 26 insertions(+), 9 deletions(-)
> >
> > diff --git a/backend.c b/backend.c
> > index 93e6632..bf9d066 100644
> > --- a/backend.c
> > +++ b/backend.c
> > @@ -697,6 +697,13 @@ static uint64_t do_io(struct thread_data *td)
> >                  */
> >                 if (td->o.verify != VERIFY_NONE && io_u->ddir ==
> DDIR_READ &&
> >                     ((io_u->flags & IO_U_F_VER_LIST) || !td_rw(td))) {
> > +
> > +                       if (!td->o.verify_pattern_bytes) {
> > +                               io_u->rand_seed =
> __rand(&td->__verify_state);
> > +                               if (sizeof(int) != sizeof(long *))
> > +                                       io_u->rand_seed *=
> __rand(&td->__verify_state);
> > +                       }
> > +
> >                         if (td->o.verify_async)
> >                                 io_u->end_io = verify_io_u_async;
> >                         else
> > @@ -707,6 +714,12 @@ static uint64_t do_io(struct thread_data *td)
> >                 else
> >                         td_set_runstate(td, TD_RUNNING);
> >
> > +               if (td_write(td) && io_u->ddir == DDIR_WRITE &&
> > +                   td->o.do_verify &&
> > +                   td->o.verify != VERIFY_NONE &&
> > +                   !td->o.experimental_verify)
> > +                       log_io_piece(td, io_u);
> > +
> >                 ret = td_io_queue(td, io_u);
> >                 switch (ret) {
> >                 case FIO_Q_COMPLETED:
> > diff --git a/io_u.c b/io_u.c
> > index 518d884..f68b213 100644
> > --- a/io_u.c
> > +++ b/io_u.c
> > @@ -1623,12 +1623,6 @@ static void io_completed(struct thread_data *td,
> struct io_u *io_u,
> >                                          utime_since_now(&td->start));
> >                 }
> >
> > -               if (td_write(td) && idx == DDIR_WRITE &&
> > -                   td->o.do_verify &&
> > -                   td->o.verify != VERIFY_NONE &&
> > -                   !td->o.experimental_verify)
> > -                       log_io_piece(td, io_u);
> > -
> >                 icd->bytes_done[idx] += bytes;
> >
> >                 if (io_u->end_io) {
> > diff --git a/iolog.c b/iolog.c
> > index ec29971..017b235 100644
> > --- a/iolog.c
> > +++ b/iolog.c
> > @@ -209,7 +209,7 @@ void log_io_piece(struct thread_data *td, struct
> io_u *io_u)
> >          * drop the old one, which we rely on the rb insert/lookup for
> >          * handling.
> >          */
> > -       if ((!td_random(td) || !td->o.overwrite) &&
> > +       if (((!td->o.verifysort) || !td_random(td) || !td->o.overwrite)
> &&
> >               (file_randommap(td, ipo->file) || td->o.verify ==
> VERIFY_NONE)) {
> >                 INIT_FLIST_HEAD(&ipo->list);
> >                 flist_add_tail(&ipo->list, &td->io_hist_list);
> > diff --git a/verify.c b/verify.c
> > index 568bae8..970b867 100644
> > --- a/verify.c
> > +++ b/verify.c
> > @@ -72,10 +72,10 @@ void fill_verify_pattern(struct thread_data *td,
> void *p, unsigned int len,
> >                 if (use_seed)
> >                         __fill_random_buf(p, len, seed);
> >                 else
> > -                       io_u->rand_seed =
> fill_random_buf(&td->buf_state, p, len);
> > +                       io_u->rand_seed =
> fill_random_buf(&td->__verify_state, p, len);
> >                 return;
> >         }
> > -
> > +
> >         if (io_u->buf_filled_len >= len) {
> >                 dprint(FD_VERIFY, "using already filled verify pattern
> b=%d len=%u\n",
> >                         td->o.verify_pattern_bytes, len);
> > @@ -718,6 +718,10 @@ int verify_io_u(struct thread_data *td, struct io_u
> *io_u)
> >                         memswp(p, p + td->o.verify_offset, header_size);
> >                 hdr = p;
> >
> > +               // Make rand_seed check pass when have verifysort or
> verify_backlog.
> > +               if (td->o.verifysort || (td->flags & TD_F_VER_BACKLOG))
> > +                       io_u->rand_seed = hdr->rand_seed;
> > +
> >                 ret = verify_header(io_u, hdr);
> >                 switch (ret) {
> >                 case 0:
> > @@ -1056,6 +1060,12 @@ int get_next_verify(struct thread_data *td,
> struct io_u *io_u)
> >                 remove_trim_entry(td, ipo);
> >                 free(ipo);
> >                 dprint(FD_VERIFY, "get_next_verify: ret io_u %p\n",
> io_u);
> > +
> > +               if (!td->o.verify_pattern_bytes) {
> > +                       io_u->rand_seed = __rand(&td->__verify_state);
> > +                       if (sizeof(int) != sizeof(long *))
> > +                               io_u->rand_seed *=
> __rand(&td->__verify_state);
> > +               }
> >                 return 0;
> >         }
> >
> > --
> > 1.9.0.rc1.175.g0b1dcb5
> >
>

[-- Attachment #2: Type: text/html, Size: 10134 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] fix rand_seed mismatches in verify phase
  2014-02-03 22:54 ` Grant Grundler
  2014-02-04 18:54   ` Puthikorn Voravootivat
@ 2014-02-04 18:56   ` Puthikorn Voravootivat
  1 sibling, 0 replies; 12+ messages in thread
From: Puthikorn Voravootivat @ 2014-02-04 18:56 UTC (permalink / raw)
  To: Grant Grundler
  Cc: Puthikorn Voravootivat, Jens Axboe, FIO List, Gwendal Grignou

*Resending in plain-text mode

I tried to run the same test with fio 2.1.3 and saw the same error.
This error always occurs with rw=read or rw=randread when verify_backlog=1.

Here is an example job file and error message.

Job file:
[md5-sync-1-0-1-1-1]
filename=test213/fio/md5-sync-1-0-1-1-1
loops=1
direct=0
iodepth=1
ioengine=sync
verify=md5
size=1m
bs=4k
rw=read
verifysort=1
verify_backlog=1

Error message:
verify: bad magic header 869, wanted acca at file
test213/fio/md5-sync-1-0-1-1-1 offset 585728, length 2290190

On Mon, Feb 3, 2014 at 2:54 PM, Grant Grundler <grundler@chromium.org> wrote:
> On Mon, Feb 3, 2014 at 1:31 PM, Puthikorn Voravootivat
> <puthik@chromium.org> wrote:
>> In verify phase, the rand_seed generated on replay does not match
>> the written rand_seed.
>>
>> Multiple problems are causing this:
>> 1. In verify phase fio does not set io_u->rand_seed to compare with
>>    hdr->rand_seed
>> 2. In randrw scenario, fio log is stored in red-black tree in "sorted by LBA"
>>    order.  Thus, it is imposible to replay the written order, or rather
>>    generate the seeds again in the same order.
>> 3. In write phase, the code currently will generate rand_seed, write data
>>    and log rand_seed. When queuedepth > 1, it's possible the writes complete
>>    in a different order than rand_seed was generated.  Thus when replaying
>>    the log, the generated rand_seed might not match what was written.
>> 4. verify_backlog option will start verification before all the data has been
>>    written and it make rand_seed replay code broken with current design.
>>
>> Proposed fixes:
>> 1. Use of existing verify_state to generate verify header.
>>    (and assumes this was the original intention of verify_state). And also
>>    adds code to replay rand_seed in verify phase.
>> 2. If verifysort option is not enabled, store the write log in a list instead
>>    of the red-black tree. Otherwise, don't attempt to verify the rand_seed
>>    in the header.
>> 3. In write phase,  generate rand_seed, log rand_seed, write data. I.e. log
>>    IO transactions in the order generated, not completed.
>> 4. Don't verify rand_seed when verify_backlog is enabled.
>
>
> We are still seeing failures with rw=read + verify_backlog=1. Investigating.
>
> Please send any feedback if you have a chance to look at this anyway.
> We'll fix the C++ comment. :)
>
> cheers,
> grant
>
>>
>> Signed-off-by: Puthikorn Voravootivat <puthik@chromium.org>
>> ---
>>  backend.c | 13 +++++++++++++
>>  io_u.c    |  6 ------
>>  iolog.c   |  2 +-
>>  verify.c  | 14 ++++++++++++--
>>  4 files changed, 26 insertions(+), 9 deletions(-)
>>
>> diff --git a/backend.c b/backend.c
>> index 93e6632..bf9d066 100644
>> --- a/backend.c
>> +++ b/backend.c
>> @@ -697,6 +697,13 @@ static uint64_t do_io(struct thread_data *td)
>>                  */
>>                 if (td->o.verify != VERIFY_NONE && io_u->ddir == DDIR_READ &&
>>                     ((io_u->flags & IO_U_F_VER_LIST) || !td_rw(td))) {
>> +
>> +                       if (!td->o.verify_pattern_bytes) {
>> +                               io_u->rand_seed = __rand(&td->__verify_state);
>> +                               if (sizeof(int) != sizeof(long *))
>> +                                       io_u->rand_seed *= __rand(&td->__verify_state);
>> +                       }
>> +
>>                         if (td->o.verify_async)
>>                                 io_u->end_io = verify_io_u_async;
>>                         else
>> @@ -707,6 +714,12 @@ static uint64_t do_io(struct thread_data *td)
>>                 else
>>                         td_set_runstate(td, TD_RUNNING);
>>
>> +               if (td_write(td) && io_u->ddir == DDIR_WRITE &&
>> +                   td->o.do_verify &&
>> +                   td->o.verify != VERIFY_NONE &&
>> +                   !td->o.experimental_verify)
>> +                       log_io_piece(td, io_u);
>> +
>>                 ret = td_io_queue(td, io_u);
>>                 switch (ret) {
>>                 case FIO_Q_COMPLETED:
>> diff --git a/io_u.c b/io_u.c
>> index 518d884..f68b213 100644
>> --- a/io_u.c
>> +++ b/io_u.c
>> @@ -1623,12 +1623,6 @@ static void io_completed(struct thread_data *td, struct io_u *io_u,
>>                                          utime_since_now(&td->start));
>>                 }
>>
>> -               if (td_write(td) && idx == DDIR_WRITE &&
>> -                   td->o.do_verify &&
>> -                   td->o.verify != VERIFY_NONE &&
>> -                   !td->o.experimental_verify)
>> -                       log_io_piece(td, io_u);
>> -
>>                 icd->bytes_done[idx] += bytes;
>>
>>                 if (io_u->end_io) {
>> diff --git a/iolog.c b/iolog.c
>> index ec29971..017b235 100644
>> --- a/iolog.c
>> +++ b/iolog.c
>> @@ -209,7 +209,7 @@ void log_io_piece(struct thread_data *td, struct io_u *io_u)
>>          * drop the old one, which we rely on the rb insert/lookup for
>>          * handling.
>>          */
>> -       if ((!td_random(td) || !td->o.overwrite) &&
>> +       if (((!td->o.verifysort) || !td_random(td) || !td->o.overwrite) &&
>>               (file_randommap(td, ipo->file) || td->o.verify == VERIFY_NONE)) {
>>                 INIT_FLIST_HEAD(&ipo->list);
>>                 flist_add_tail(&ipo->list, &td->io_hist_list);
>> diff --git a/verify.c b/verify.c
>> index 568bae8..970b867 100644
>> --- a/verify.c
>> +++ b/verify.c
>> @@ -72,10 +72,10 @@ void fill_verify_pattern(struct thread_data *td, void *p, unsigned int len,
>>                 if (use_seed)
>>                         __fill_random_buf(p, len, seed);
>>                 else
>> -                       io_u->rand_seed = fill_random_buf(&td->buf_state, p, len);
>> +                       io_u->rand_seed = fill_random_buf(&td->__verify_state, p, len);
>>                 return;
>>         }
>> -
>> +
>>         if (io_u->buf_filled_len >= len) {
>>                 dprint(FD_VERIFY, "using already filled verify pattern b=%d len=%u\n",
>>                         td->o.verify_pattern_bytes, len);
>> @@ -718,6 +718,10 @@ int verify_io_u(struct thread_data *td, struct io_u *io_u)
>>                         memswp(p, p + td->o.verify_offset, header_size);
>>                 hdr = p;
>>
>> +               // Make rand_seed check pass when have verifysort or verify_backlog.
>> +               if (td->o.verifysort || (td->flags & TD_F_VER_BACKLOG))
>> +                       io_u->rand_seed = hdr->rand_seed;
>> +
>>                 ret = verify_header(io_u, hdr);
>>                 switch (ret) {
>>                 case 0:
>> @@ -1056,6 +1060,12 @@ int get_next_verify(struct thread_data *td, struct io_u *io_u)
>>                 remove_trim_entry(td, ipo);
>>                 free(ipo);
>>                 dprint(FD_VERIFY, "get_next_verify: ret io_u %p\n", io_u);
>> +
>> +               if (!td->o.verify_pattern_bytes) {
>> +                       io_u->rand_seed = __rand(&td->__verify_state);
>> +                       if (sizeof(int) != sizeof(long *))
>> +                               io_u->rand_seed *= __rand(&td->__verify_state);
>> +               }
>>                 return 0;
>>         }
>>
>> --
>> 1.9.0.rc1.175.g0b1dcb5
>>


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] fix rand_seed mismatches in verify phase
  2014-02-04 18:54   ` Puthikorn Voravootivat
@ 2014-02-04 19:21     ` Grant Grundler
  2014-02-05 18:27     ` Jens Axboe
  1 sibling, 0 replies; 12+ messages in thread
From: Grant Grundler @ 2014-02-04 19:21 UTC (permalink / raw)
  To: Puthikorn Voravootivat
  Cc: Grant Grundler, Jens Axboe, FIO List, Gwendal Grignou

On Tue, Feb 4, 2014 at 10:54 AM, Puthikorn Voravootivat
<puthik@chromium.org> wrote:
> I tried to run the same test with fio 2.1.3 and saw the same error.
> This error always occurs when rw=read or randread and verify_backlog=1
>
> Here is an example job file and error message.
>
> Job file:
> [md5-sync-1-0-1-1-1]
> filename=test213/fio/md5-sync-1-0-1-1-1
> loops=1
> direct=0
> iodepth=1
> ioengine=sync
> verify=md5
> size=1m
> bs=4k
> rw=read
> verifysort=1
> verify_backlog=1
>
> Error message:
> verify: bad magic header 869, wanted acca at file
> test213/fio/md5-sync-1-0-1-1-1 offset 585728, length 2290190

Jens,
To restate the obvious, this is an existing bug in the 2.1.3 release.  Can
you please review this patch?  I believe it resolves the remaining
issues directly related to rand_seed verification.

My gut feeling (after having spent most of yesterday trying to track
this down) is that there is a race condition in the usage of struct io_u
between the verify_backlog and the "workload" thread. In particular, I
was looking at the use of IO_U_F_FREE_DEF to see whether an io_u
was being marked free (put_io_u()) prematurely, before the
verify_backlog was done. But I didn't find that (yet?).

All the "high touch" payload verification methods pass and "low touch"
(meta) verification fails most (not all) of the time.

thanks
grant


>
>
> On Mon, Feb 3, 2014 at 2:54 PM, Grant Grundler <grundler@chromium.org>
> wrote:
>>
>> On Mon, Feb 3, 2014 at 1:31 PM, Puthikorn Voravootivat
>> <puthik@chromium.org> wrote:
>> > In verify phase, the rand_seed generated on replay does not match
>> > the written rand_seed.
>> >
>> > Multiple problems are causing this:
>> > 1. In verify phase fio does not set io_u->rand_seed to compare with
>> >    hdr->rand_seed
>> > 2. In randrw scenario, fio log is stored in red-black tree in "sorted by
>> > LBA"
>> >    order.  Thus, it is imposible to replay the written order, or rather
>> >    generate the seeds again in the same order.
>> > 3. In write phase, the code currently will generate rand_seed, write
>> > data
>> >    and log rand_seed. When queuedepth > 1, it's possible the writes
>> > complete
>> >    in a different order than rand_seed was generated.  Thus when
>> > replaying
>> >    the log, the generated rand_seed might not match what was written.
>> > 4. verify_backlog option will start verification before all the data has
>> > been
>> >    written and it make rand_seed replay code broken with current design.
>> >
>> > Proposed fixes:
>> > 1. Use of existing verify_state to generate verify header.
>> >    (and assumes this was the original intention of verify_state). And
>> > also
>> >    adds code to replay rand_seed in verify phase.
>> > 2. If verifysort option is not enabled, store the write log in a list
>> > instead
>> >    of the red-black tree. Otherwise, don't attempt to verify the
>> > rand_seed
>> >    in the header.
>> > 3. In write phase,  generate rand_seed, log rand_seed, write data. I.e.
>> > log
>> >    IO transactions in the order generated, not completed.
>> > 4. Don't verify rand_seed when verify_backlog is enabled.
>>
>>
>> We are still seeing failures with rw=read + verify_backlog=1.
>> Investigating.
>>
>> Please send any feedback if you have a chance to look at this anyway.
>> We'll fix the C++ comment. :)
>>
>> cheers,
>> grant
>>
>> >
>> > Signed-off-by: Puthikorn Voravootivat <puthik@chromium.org>
>> > ---
>> >  backend.c | 13 +++++++++++++
>> >  io_u.c    |  6 ------
>> >  iolog.c   |  2 +-
>> >  verify.c  | 14 ++++++++++++--
>> >  4 files changed, 26 insertions(+), 9 deletions(-)
>> >
>> > diff --git a/backend.c b/backend.c
>> > index 93e6632..bf9d066 100644
>> > --- a/backend.c
>> > +++ b/backend.c
>> > @@ -697,6 +697,13 @@ static uint64_t do_io(struct thread_data *td)
>> >                  */
>> >                 if (td->o.verify != VERIFY_NONE && io_u->ddir ==
>> > DDIR_READ &&
>> >                     ((io_u->flags & IO_U_F_VER_LIST) || !td_rw(td))) {
>> > +
>> > +                       if (!td->o.verify_pattern_bytes) {
>> > +                               io_u->rand_seed =
>> > __rand(&td->__verify_state);
>> > +                               if (sizeof(int) != sizeof(long *))
>> > +                                       io_u->rand_seed *=
>> > __rand(&td->__verify_state);
>> > +                       }
>> > +
>> >                         if (td->o.verify_async)
>> >                                 io_u->end_io = verify_io_u_async;
>> >                         else
>> > @@ -707,6 +714,12 @@ static uint64_t do_io(struct thread_data *td)
>> >                 else
>> >                         td_set_runstate(td, TD_RUNNING);
>> >
>> > +               if (td_write(td) && io_u->ddir == DDIR_WRITE &&
>> > +                   td->o.do_verify &&
>> > +                   td->o.verify != VERIFY_NONE &&
>> > +                   !td->o.experimental_verify)
>> > +                       log_io_piece(td, io_u);
>> > +
>> >                 ret = td_io_queue(td, io_u);
>> >                 switch (ret) {
>> >                 case FIO_Q_COMPLETED:
>> > diff --git a/io_u.c b/io_u.c
>> > index 518d884..f68b213 100644
>> > --- a/io_u.c
>> > +++ b/io_u.c
>> > @@ -1623,12 +1623,6 @@ static void io_completed(struct thread_data *td,
>> > struct io_u *io_u,
>> >                                          utime_since_now(&td->start));
>> >                 }
>> >
>> > -               if (td_write(td) && idx == DDIR_WRITE &&
>> > -                   td->o.do_verify &&
>> > -                   td->o.verify != VERIFY_NONE &&
>> > -                   !td->o.experimental_verify)
>> > -                       log_io_piece(td, io_u);
>> > -
>> >                 icd->bytes_done[idx] += bytes;
>> >
>> >                 if (io_u->end_io) {
>> > diff --git a/iolog.c b/iolog.c
>> > index ec29971..017b235 100644
>> > --- a/iolog.c
>> > +++ b/iolog.c
>> > @@ -209,7 +209,7 @@ void log_io_piece(struct thread_data *td, struct
>> > io_u *io_u)
>> >          * drop the old one, which we rely on the rb insert/lookup for
>> >          * handling.
>> >          */
>> > -       if ((!td_random(td) || !td->o.overwrite) &&
>> > +       if (((!td->o.verifysort) || !td_random(td) || !td->o.overwrite)
>> > &&
>> >               (file_randommap(td, ipo->file) || td->o.verify ==
>> > VERIFY_NONE)) {
>> >                 INIT_FLIST_HEAD(&ipo->list);
>> >                 flist_add_tail(&ipo->list, &td->io_hist_list);
>> > diff --git a/verify.c b/verify.c
>> > index 568bae8..970b867 100644
>> > --- a/verify.c
>> > +++ b/verify.c
>> > @@ -72,10 +72,10 @@ void fill_verify_pattern(struct thread_data *td,
>> > void *p, unsigned int len,
>> >                 if (use_seed)
>> >                         __fill_random_buf(p, len, seed);
>> >                 else
>> > -                       io_u->rand_seed =
>> > fill_random_buf(&td->buf_state, p, len);
>> > +                       io_u->rand_seed =
>> > fill_random_buf(&td->__verify_state, p, len);
>> >                 return;
>> >         }
>> > -
>> > +
>> >         if (io_u->buf_filled_len >= len) {
>> >                 dprint(FD_VERIFY, "using already filled verify pattern
>> > b=%d len=%u\n",
>> >                         td->o.verify_pattern_bytes, len);
>> > @@ -718,6 +718,10 @@ int verify_io_u(struct thread_data *td, struct io_u
>> > *io_u)
>> >                         memswp(p, p + td->o.verify_offset, header_size);
>> >                 hdr = p;
>> >
>> > +               // Make rand_seed check pass when have verifysort or
>> > verify_backlog.
>> > +               if (td->o.verifysort || (td->flags & TD_F_VER_BACKLOG))
>> > +                       io_u->rand_seed = hdr->rand_seed;
>> > +
>> >                 ret = verify_header(io_u, hdr);
>> >                 switch (ret) {
>> >                 case 0:
>> > @@ -1056,6 +1060,12 @@ int get_next_verify(struct thread_data *td,
>> > struct io_u *io_u)
>> >                 remove_trim_entry(td, ipo);
>> >                 free(ipo);
>> >                 dprint(FD_VERIFY, "get_next_verify: ret io_u %p\n",
>> > io_u);
>> > +
>> > +               if (!td->o.verify_pattern_bytes) {
>> > +                       io_u->rand_seed = __rand(&td->__verify_state);
>> > +                       if (sizeof(int) != sizeof(long *))
>> > +                               io_u->rand_seed *=
>> > __rand(&td->__verify_state);
>> > +               }
>> >                 return 0;
>> >         }
>> >
>> > --
>> > 1.9.0.rc1.175.g0b1dcb5
>> >
>
>


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] fix rand_seed mismatches in verify phase
  2014-02-03 21:31 [PATCH] fix rand_seed mismatches in verify phase Puthikorn Voravootivat
  2014-02-03 22:54 ` Grant Grundler
@ 2014-02-05 17:15 ` Jens Axboe
  2014-02-05 18:16   ` Grant Grundler
  1 sibling, 1 reply; 12+ messages in thread
From: Jens Axboe @ 2014-02-05 17:15 UTC (permalink / raw)
  To: Puthikorn Voravootivat; +Cc: FIO List, Gwendal Grignou, Grant Grundler

On Mon, Feb 03 2014, Puthikorn Voravootivat wrote:
> In verify phase, the rand_seed generated on replay does not match
> the written rand_seed.
> 
> Multiple problems are causing this:
> 1. In verify phase fio does not set io_u->rand_seed to compare with
>    hdr->rand_seed
> 2. In randrw scenario, fio log is stored in red-black tree in "sorted by LBA"
>    order.  Thus, it is imposible to replay the written order, or rather
>    generate the seeds again in the same order.
> 3. In write phase, the code currently will generate rand_seed, write data
>    and log rand_seed. When queuedepth > 1, it's possible the writes complete
>    in a different order than rand_seed was generated.  Thus when replaying
>    the log, the generated rand_seed might not match what was written.
> 4. verify_backlog option will start verification before all the data has been
>    written and it make rand_seed replay code broken with current design.
> 
> Proposed fixes:
> 1. Use of existing verify_state to generate verify header.
>    (and assumes this was the original intention of verify_state). And also
>    adds code to replay rand_seed in verify phase.
> 2. If verifysort option is not enabled, store the write log in a list instead
>    of the red-black tree. Otherwise, don't attempt to verify the rand_seed
>    in the header.
> 3. In write phase,  generate rand_seed, log rand_seed, write data. I.e. log
>    IO transactions in the order generated, not completed.
> 4. Don't verify rand_seed when verify_backlog is enabled.

Seems like a reasonable solution to it all. Thanks! I'll get this
applied.

-- 
Jens Axboe



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] fix rand_seed mismatches in verify phase
  2014-02-05 17:15 ` Jens Axboe
@ 2014-02-05 18:16   ` Grant Grundler
  0 siblings, 0 replies; 12+ messages in thread
From: Grant Grundler @ 2014-02-05 18:16 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Puthikorn Voravootivat, FIO List, Gwendal Grignou, Grant Grundler

On Wed, Feb 5, 2014 at 9:15 AM, Jens Axboe <axboe@kernel.dk> wrote:
> On Mon, Feb 03 2014, Puthikorn Voravootivat wrote:
>> In verify phase, the rand_seed generated on replay does not match
>> the written rand_seed.
>>
>> Multiple problems are causing this:
>> 1. In verify phase fio does not set io_u->rand_seed to compare with
>>    hdr->rand_seed
>> 2. In randrw scenario, fio log is stored in red-black tree in "sorted by LBA"
>>    order.  Thus, it is imposible to replay the written order, or rather
>>    generate the seeds again in the same order.
>> 3. In write phase, the code currently will generate rand_seed, write data
>>    and log rand_seed. When queuedepth > 1, it's possible the writes complete
>>    in a different order than rand_seed was generated.  Thus when replaying
>>    the log, the generated rand_seed might not match what was written.
>> 4. verify_backlog option will start verification before all the data has been
>>    written and it make rand_seed replay code broken with current design.
>>
>> Proposed fixes:
>> 1. Use of existing verify_state to generate verify header.
>>    (and assumes this was the original intention of verify_state). And also
>>    adds code to replay rand_seed in verify phase.
>> 2. If verifysort option is not enabled, store the write log in a list instead
>>    of the red-black tree. Otherwise, don't attempt to verify the rand_seed
>>    in the header.
>> 3. In write phase,  generate rand_seed, log rand_seed, write data. I.e. log
>>    IO transactions in the order generated, not completed.
>> 4. Don't verify rand_seed when verify_backlog is enabled.
>
> Seems like a reasonable solution to it all. Thanks! I'll get this
> applied.

Awesome - I see that it's applied. Puthikorn has one more fix in
flight and I'm going to suggest something you'll probably reject in
another email.

cheers,
grant


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] fix rand_seed mismatches in verify phase
  2014-02-04 18:54   ` Puthikorn Voravootivat
  2014-02-04 19:21     ` Grant Grundler
@ 2014-02-05 18:27     ` Jens Axboe
  2014-02-05 18:31       ` Grant Grundler
  1 sibling, 1 reply; 12+ messages in thread
From: Jens Axboe @ 2014-02-05 18:27 UTC (permalink / raw)
  To: Puthikorn Voravootivat; +Cc: Grant Grundler, FIO List, Gwendal Grignou

On Tue, Feb 04 2014, Puthikorn Voravootivat wrote:
> I tried to run the same test with fio 2.1.3 and saw the same error.
> This error always occurs when rw=read or randread and verify_backlog=1
> 
> Here is an example job file and error message.
> 
> Job file:
> [md5-sync-1-0-1-1-1]
> filename=test213/fio/md5-sync-1-0-1-1-1
> loops=1
> direct=0
> iodepth=1
> ioengine=sync
> verify=md5
> size=1m
> bs=4k
> rw=read
> verifysort=1
> verify_backlog=1
> 
> Error message:
> verify: bad magic header 869, wanted acca at file
> test213/fio/md5-sync-1-0-1-1-1 offset 585728, length 2290190

This is indeed breakage between 2.1.2 and 2.1.3. I ran a bisect on it,
and it reveals:

20876c53b5d32f2da9049af5e7fb102133946981 is the first bad commit
commit 20876c53b5d32f2da9049af5e7fb102133946981
Author: Juan Casse <jcasse@chromium.org>
Date:   Mon Sep 16 09:22:10 2013 -0700

    Add condition to stop issuing io in do_io().
    
    Problem:
    When specifying iodepth > 1 while doing random async io
    without a random map (and using the default random generator), the
    main while loop in do_io() will issue extra io in the amount of io
    allowed to accumulate in the queue. For example, a job with size=24k
    and bs=8k will perform 4 io instead of the expected 3.
    
    Reason:
    The above behavior occurs because the while loop in do_io()
    will continue until the amount of "completed io" >= size, but io
    in the queue are not accounted for because they have not been
    completed by io_u_queued_complete().
    
    Exceptions:
    The above behavior does not manifest itself when:
        using random_generator=lfsr because lfsr knows when to stop
        generating, and get_io_u() inside the while loop returns NULL,
        breaking out of the loop.
        using random map because random map ensures that all blocks
        are hit, and after they are all hit, no more are generated.
    
    Proposed Solution:
    Stop while loop based on bytes issued instead of bytes completed.
    
    Signed-off-by: Juan Casse <jcasse@chromium.org>
    Reviewed-by: Grant Grundler <grundler@chromium.org>
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
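
To see the effect the commit describes, here is a toy model (an
assumption-laden sketch, not fio code: the queue is only reaped when it
fills up).  With size=24k, bs=8k and iodepth=2 it issues 4 ios when the
loop stops on completed bytes and 3 when it stops on issued bytes,
matching the example in the commit message.

/* issued_vs_completed.c - toy model, not fio code */
#include <stdio.h>

static long count_ios(long size, long bs, int iodepth, int stop_on_issued)
{
	long issued = 0, completed = 0, nr = 0;

	for (;;) {
		if (stop_on_issued ? issued >= size : completed >= size)
			break;
		issued += bs;		/* issue one io_u */
		nr++;
		if (issued - completed >= (long)iodepth * bs)
			completed += bs;	/* queue full: reap one completion */
	}
	return nr;
}

int main(void)
{
	long size = 24 * 1024, bs = 8 * 1024;

	printf("stop on completed bytes: %ld ios\n", count_ios(size, bs, 2, 0));
	printf("stop on issued bytes:    %ld ios\n", count_ios(size, bs, 2, 1));
	return 0;
}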


-- 
Jens Axboe



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] fix rand_seed mismatches in verify phase
  2014-02-05 18:27     ` Jens Axboe
@ 2014-02-05 18:31       ` Grant Grundler
  2014-02-05 19:30         ` Jens Axboe
  0 siblings, 1 reply; 12+ messages in thread
From: Grant Grundler @ 2014-02-05 18:31 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Puthikorn Voravootivat, Grant Grundler, FIO List, Gwendal Grignou

On Wed, Feb 5, 2014 at 10:27 AM, Jens Axboe <axboe@kernel.dk> wrote:
> On Tue, Feb 04 2014, Puthikorn Voravootivat wrote:
>> I tried to run the same test with fio 2.1.3 and saw the same error.
>> This error always occurs when rw=read or randread and verify_backlog=1
>>
>> Here is an example job file and error message.
>>
>> Job file:
>> [md5-sync-1-0-1-1-1]
>> filename=test213/fio/md5-sync-1-0-1-1-1
>> loops=1
>> direct=0
>> iodepth=1
>> ioengine=sync
>> verify=md5
>> size=1m
>> bs=4k
>> rw=read
>> verifysort=1
>> verify_backlog=1
>>
>> Error message:
>> verify: bad magic header 869, wanted acca at file
>> test213/fio/md5-sync-1-0-1-1-1 offset 585728, length 2290190
>
> This is indeed breakage between 2.1.2 and 2.1.3. I ran a bisect on it,
> it reveals:
>
> 20876c53b5d32f2da9049af5e7fb102133946981 is the first bad commit
> commit 20876c53b5d32f2da9049af5e7fb102133946981
> Author: Juan Casse <jcasse@chromium.org>
> Date:   Mon Sep 16 09:22:10 2013 -0700
>
>     Add condition to stop issuing io in do_io().

I think Puthik has a fix related to this but I haven't reviewed his
next patch yet.  We've been talking about this behavior though.

I'll take another look at this patch to see what it's changing.

thanks,
grant

>
>     Problem:
>     When specifying iodepth > 1 while doing random async io
>     without a random map (and using the default random generator), the
>     main while loop in do_io() will issue extra io in the amount of io
>     allowed to accumulate in the queue. For example, a job with size=24k
>     and bs=8k will perform 4 io instead of the expected 3.
>
>     Reason:
>     The above behavior occurs because the while loop in do_io()
>     will continue until the amount of "completed io" >= size, but io
>     in the queue are not accounted for because they have not been
>     completed by io_u_queued_complete().
>
>     Exceptions:
>     The above behavior does not manifest itself when:
>         using random_generator=lfsr bacause lfsr knows when to stop
>         generating, and get_io_u() inside the while loop returns NULL,
>         breaking out of the loop.
>         using random map because random map ensures that all blocks
>         are hit, and after they are all hit, no more are generated.
>
>     Proposed Solution:
>     Stop while loop based on bytes issued instead of bytes completed.
>
>     Signed-off-by: Juan Casse <jcasse@chromium.org>
>     Reviewed-by: Grant Grundler <grundler@chromium.org>
>     Signed-off-by: Jens Axboe <axboe@kernel.dk>
>
>
> --
> Jens Axboe
>


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] fix rand_seed mismatches in verify phase
  2014-02-05 18:31       ` Grant Grundler
@ 2014-02-05 19:30         ` Jens Axboe
  2014-02-05 21:31           ` Grant Grundler
  0 siblings, 1 reply; 12+ messages in thread
From: Jens Axboe @ 2014-02-05 19:30 UTC (permalink / raw)
  To: Grant Grundler; +Cc: Puthikorn Voravootivat, FIO List, Gwendal Grignou

On Wed, Feb 05 2014, Grant Grundler wrote:
> On Wed, Feb 5, 2014 at 10:27 AM, Jens Axboe <axboe@kernel.dk> wrote:
> > On Tue, Feb 04 2014, Puthikorn Voravootivat wrote:
> >> I tried to run the same test with fio 2.1.3 and saw the same error.
> >> This error always occurs when rw=read or randread and verify_backlog=1
> >>
> >> Here is an example job file and error message.
> >>
> >> Job file:
> >> [md5-sync-1-0-1-1-1]
> >> filename=test213/fio/md5-sync-1-0-1-1-1
> >> loops=1
> >> direct=0
> >> iodepth=1
> >> ioengine=sync
> >> verify=md5
> >> size=1m
> >> bs=4k
> >> rw=read
> >> verifysort=1
> >> verify_backlog=1
> >>
> >> Error message:
> >> verify: bad magic header 869, wanted acca at file
> >> test213/fio/md5-sync-1-0-1-1-1 offset 585728, length 2290190
> >
> > This is indeed breakage between 2.1.2 and 2.1.3. I ran a bisect on it,
> > it reveals:
> >
> > 20876c53b5d32f2da9049af5e7fb102133946981 is the first bad commit
> > commit 20876c53b5d32f2da9049af5e7fb102133946981
> > Author: Juan Casse <jcasse@chromium.org>
> > Date:   Mon Sep 16 09:22:10 2013 -0700
> >
> >     Add condition to stop issuing io in do_io().
> 
> I think Puthik has a fix related to this but I haven't reviewed his
> next patch yet.  We've been talking about this behavior though.
> 
> I'll take another look at this patch to see what it's changing.

The below should take care of it. Juan's fix didn't account for verify
exceeding size=.

diff --git a/backend.c b/backend.c
index bf9d066e012a..3ac72e771cb9 100644
--- a/backend.c
+++ b/backend.c
@@ -642,7 +642,7 @@ static uint64_t do_io(struct thread_data *td)
 	uint64_t bytes_done[DDIR_RWDIR_CNT] = { 0, 0, 0 };
 	unsigned int i;
 	int ret = 0;
-	uint64_t bytes_issued = 0;
+	uint64_t total_bytes, bytes_issued = 0;
 
 	if (in_ramp_time(td))
 		td_set_runstate(td, TD_RAMP);
@@ -651,6 +651,10 @@ static uint64_t do_io(struct thread_data *td)
 
 	lat_target_init(td);
 
+	total_bytes = td->o.size;
+	if (td->o.verify != VERIFY_NONE && td_write(td))
+		total_bytes += td->o.size;
+
 	while ((td->o.read_iolog_file && !flist_empty(&td->io_log_list)) ||
 		(!flist_empty(&td->trim_list)) || !io_bytes_exceeded(td) ||
 		td->o.time_based) {
@@ -678,7 +682,7 @@ static uint64_t do_io(struct thread_data *td)
 		if (flow_threshold_exceeded(td))
 			continue;
 
-		if (bytes_issued >= (uint64_t) td->o.size)
+		if (bytes_issued >= total_bytes)
 			break;
 
 		io_u = get_io_u(td);

-- 
Jens Axboe



^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH] fix rand_seed mismatches in verify phase
  2014-02-05 19:30         ` Jens Axboe
@ 2014-02-05 21:31           ` Grant Grundler
  2014-02-05 21:38             ` Jens Axboe
  0 siblings, 1 reply; 12+ messages in thread
From: Grant Grundler @ 2014-02-05 21:31 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Grant Grundler, Puthikorn Voravootivat, FIO List, Gwendal Grignou

On Wed, Feb 5, 2014 at 11:30 AM, Jens Axboe <axboe@kernel.dk> wrote:
...
> The below should take care of it. Juan's fix didn't account for verify
> exceeding size=.

Ah... Juan didn't test with that option, but Puthikorn did.

> diff --git a/backend.c b/backend.c
> index bf9d066e012a..3ac72e771cb9 100644
> --- a/backend.c
> +++ b/backend.c
> @@ -642,7 +642,7 @@ static uint64_t do_io(struct thread_data *td)
>         uint64_t bytes_done[DDIR_RWDIR_CNT] = { 0, 0, 0 };
>         unsigned int i;
>         int ret = 0;
> -       uint64_t bytes_issued = 0;
> +       uint64_t total_bytes, bytes_issued = 0;
>
>         if (in_ramp_time(td))
>                 td_set_runstate(td, TD_RAMP);
> @@ -651,6 +651,10 @@ static uint64_t do_io(struct thread_data *td)
>
>         lat_target_init(td);
>
> +       total_bytes = td->o.size;
> +       if (td->o.verify != VERIFY_NONE && td_write(td))
> +               total_bytes += td->o.size;
> +

The amount of verify IO depends on what percentage is write traffic.
Is each thread strictly reader or writer?
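
As a rough illustration of the tighter cap being hinted at here (this is
a hypothetical helper, not fio code; it assumes only writes get re-read
for verification and uses the rwmixwrite percentage to scale the extra
budget):

/* hypothetical sketch, not fio code */
#include <stdint.h>

static uint64_t verify_total_bytes(uint64_t size, unsigned int rwmixwrite,
				   int verify_enabled)
{
	uint64_t total = size;

	/* only the written fraction of the workload is re-read for verify */
	if (verify_enabled)
		total += size * rwmixwrite / 100;

	return total;
}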

I didn't think so, or we wouldn't need "td->this_io_bytes[]" for
DDIR_WRITE, DDIR_READ, and DDIR_TRIM at the same time.

>         while ((td->o.read_iolog_file && !flist_empty(&td->io_log_list)) ||
>                 (!flist_empty(&td->trim_list)) || !io_bytes_exceeded(td) ||

I've gotten a bit hung up on what io_bytes_exceeded() uses
(td->this_io_bytes[]) versus bytes_done[] and bytes_issued. It seems like
we have three different mechanisms doing essentially the same thing. Can
I rip two of them out and just count "issued IO" for WRITE/READ/TRIM?

"io_bytes_exceeded()" is the only place I would expect this logic to
live in this loop. Having additional "break" in the middle of the loop
implies we've broken the original design.

cheers,
grant

>                 td->o.time_based) {
> @@ -678,7 +682,7 @@ static uint64_t do_io(struct thread_data *td)
>                 if (flow_threshold_exceeded(td))
>                         continue;
>
> -               if (bytes_issued >= (uint64_t) td->o.size)
> +               if (bytes_issued >= total_bytes)
>                         break;
>
>                 io_u = get_io_u(td);
>
> --
> Jens Axboe
>


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] fix rand_seed mismatches in verify phase
  2014-02-05 21:31           ` Grant Grundler
@ 2014-02-05 21:38             ` Jens Axboe
  0 siblings, 0 replies; 12+ messages in thread
From: Jens Axboe @ 2014-02-05 21:38 UTC (permalink / raw)
  To: Grant Grundler; +Cc: Puthikorn Voravootivat, FIO List, Gwendal Grignou

On Wed, Feb 05 2014, Grant Grundler wrote:
> On Wed, Feb 5, 2014 at 11:30 AM, Jens Axboe <axboe@kernel.dk> wrote:
> ...
> > The below should take care of it. Juan's fix didn't account for verify
> > exceeding size=.
> 
> Ah...Juan didn't test with that option and Puthikorn did.
> 
> > diff --git a/backend.c b/backend.c
> > index bf9d066e012a..3ac72e771cb9 100644
> > --- a/backend.c
> > +++ b/backend.c
> > @@ -642,7 +642,7 @@ static uint64_t do_io(struct thread_data *td)
> >         uint64_t bytes_done[DDIR_RWDIR_CNT] = { 0, 0, 0 };
> >         unsigned int i;
> >         int ret = 0;
> > -       uint64_t bytes_issued = 0;
> > +       uint64_t total_bytes, bytes_issued = 0;
> >
> >         if (in_ramp_time(td))
> >                 td_set_runstate(td, TD_RAMP);
> > @@ -651,6 +651,10 @@ static uint64_t do_io(struct thread_data *td)
> >
> >         lat_target_init(td);
> >
> > +       total_bytes = td->o.size;
> > +       if (td->o.verify != VERIFY_NONE && td_write(td))
> > +               total_bytes += td->o.size;
> > +
> 
> The amount of verify IO depends on what percentage is write traffic.
> Is each thread strictly reader or writer?

It does indeed. Note I updated this to depend on verify_backlog as well,
since that is the important bit. Each thread can have any mix of
read/write/trim.

> I didn't think so or we wouldn't need "td->this_io_bytes[]" for
> DDIR_WRITE, DDIR_READ, and DDIR_TRIM at the same time.
> 
> >         while ((td->o.read_iolog_file && !flist_empty(&td->io_log_list)) ||
> >                 (!flist_empty(&td->trim_list)) || !io_bytes_exceeded(td) ||
> 
> I've gotten a bit hung up on what io_bytes_exceeded() uses (
> td->this_io_bytes[]), bytes_done[], and bytes_issued. Seems like we
> have three different mechanisms to do essentially the same thing. Can
> I rip two of them out and just count "issued IO" for WRITE/READ/TRIM?

this_io_bytes[] is per loop.
> 
> "io_bytes_exceeded()" is the only place I would expect this logic to
> live in this loop. Having additional "break" in the middle of the loop
> implies we've broken the original design.

The fix is a bit of a hack, but the original commit was a hack too. It
would be a lot better to track this explicitly. For now it's just an
upper cap; normally we expect other conditions to terminate the loop.

So I'd welcome a cleanup in this area for sure.

-- 
Jens Axboe



^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2014-02-05 21:38 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-02-03 21:31 [PATCH] fix rand_seed mismatches in verify phase Puthikorn Voravootivat
2014-02-03 22:54 ` Grant Grundler
2014-02-04 18:54   ` Puthikorn Voravootivat
2014-02-04 19:21     ` Grant Grundler
2014-02-05 18:27     ` Jens Axboe
2014-02-05 18:31       ` Grant Grundler
2014-02-05 19:30         ` Jens Axboe
2014-02-05 21:31           ` Grant Grundler
2014-02-05 21:38             ` Jens Axboe
2014-02-04 18:56   ` Puthikorn Voravootivat
2014-02-05 17:15 ` Jens Axboe
2014-02-05 18:16   ` Grant Grundler
