All of lore.kernel.org
 help / color / mirror / Atom feed
* Re: mkfs.ext2 succeeds despite nbd write errors?
       [not found]   ` <CALuDM+Aw9JCMVbQcRKaaUu+=+xvDpMgL8sUZpT2snz9Qp5DqpA@mail.gmail.com>
@ 2015-11-07 21:02     ` Richard W.M. Jones
  2015-11-07 23:09       ` Jason Pepas
  2015-11-07 23:23       ` Jason Pepas
  0 siblings, 2 replies; 6+ messages in thread
From: Richard W.M. Jones @ 2015-11-07 21:02 UTC (permalink / raw)
  To: Jason Pepas; +Cc: linux-ext4, libguestfs

[Adding linux-ext4 mailing list.  The original bug report is here:
https://www.redhat.com/archives/libguestfs/2015-November/msg00078.html ]

On Sat, Nov 07, 2015 at 01:22:45PM -0600, Jason Pepas wrote:
> On Sat, Nov 7, 2015 at 5:03 AM, Richard W.M. Jones <rjones@redhat.com> wrote:
> > How about 'strace mkfs.ext2 ..' and see if any system calls are
> > returning errors.  That would show you whether nbd-client is throwing
> > errors away, or whether mkfs is getting the errors and ignoring them
> > (seems pretty unlikely, but you never know).
> >
> > After that, it'd be down to tracing where the errors end up in the
> > kernel.
> 
> Thanks for the tip!
> 
> The results are interesting.  It looks like all of mkfs's pwrite()
> calls succeed, but its final fsync() calls do actually fail:
> 
> 
> root@debian:~# strace mkfs.ext2 /dev/nbd0 2>&1 | tee strace.out
> 
> root@debian:~# cat strace.out | grep pwrite
> pwrite(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,
> 32768, 8187379712) = 32768
> pwrite(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,
> 32768, 8187412480) = 32768
> pwrite(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,
> 32768, 8187445248) = 32768
> ...
> 
> root@debian:~# cat strace.out | grep fsync
> fsync(3)                                = -1 EIO (Input/output error)
> fsync(3)                                = -1 EIO (Input/output error)
> 
> 
> The fsync() calls happen just before mkfs exists success:
> 
> 
> root@debian:~# cat strace.out | tail
> pwrite(3, "\1\2\0\0\2\2\0\0\3\2\0\0\367{\365\37\2\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,
> 4096, 6576672768) = 4096
> fsync(3)                                = -1 EIO (Input/output error)
> pwrite(3, "\0\0\10\0\0\0
> \0\231\231\1\0qm\37\0\365\377\7\0\0\0\0\0\2\0\0\0\2\0\0\0"..., 1024,
> 1024) = 1024
> fsync(3)                                = -1 EIO (Input/output error)
> close(3)                                = 0
> write(1, "done\n\n", 6done
> 
> )                 = 6
> exit_group(0)                           = ?
> +++ exited with 0 +++
> root@debian:~#
> 
> 
> I did manage to find two calls to fsync in the e2fsprogs source which
> are not return-value-checked:
> 
> https://github.com/tytso/e2fsprogs/blob/956b0f18a5ddb6815a9dff4f10a1e3125cdca9ba/misc/filefrag.c#L303
> https://github.com/tytso/e2fsprogs/blob/956b0f18a5ddb6815a9dff4f10a1e3125cdca9ba/lib/ext2fs/unix_io.c#L915

That second one looks very suspicious to me.  I don't think that it's
ever right for mke2fs to ignore the return value from an fsync call,
so assuming mke2fs calls that function it's surely a bug.

> I'll see about submitting a patch there.
> 
> I'm not sure where to start with hunting down why mkfs's pwrite()
> calls aren't failing.  I'd look to the kernel source for that?

It looks like it's really an e2fsprogs problem, not a kernel problem.
That's pretty surprising - I wasn't expecting it.

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-p2v converts physical machines to virtual machines.  Boot with a
live CD or over the network (PXE) and turn machines into KVM guests.
http://libguestfs.org/virt-v2v

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: mkfs.ext2 succeeds despite nbd write errors?
  2015-11-07 21:02     ` mkfs.ext2 succeeds despite nbd write errors? Richard W.M. Jones
@ 2015-11-07 23:09       ` Jason Pepas
  2015-11-07 23:20         ` Richard W.M. Jones
  2015-11-07 23:23       ` Jason Pepas
  1 sibling, 1 reply; 6+ messages in thread
From: Jason Pepas @ 2015-11-07 23:09 UTC (permalink / raw)
  To: Richard W.M. Jones; +Cc: linux-ext4, libguestfs

On Sat, Nov 7, 2015 at 3:02 PM, Richard W.M. Jones <rjones@redhat.com> wrote:
>> I'm not sure where to start with hunting down why mkfs's pwrite()
>> calls aren't failing.  I'd look to the kernel source for that?
>
> It looks like it's really an e2fsprogs problem, not a kernel problem.
> That's pretty surprising - I wasn't expecting it.

I agree the fsync() issue is an e2fsprogs problem, but as for
specifically the pwrite() calls not getting a -1 return value, that's
the kernel's fault, right?

I've been rolling this around in my mind and I think I can see why the
kernel would correctly make fsync() fail but not pwrite() fail.  Let
me run this by you:

When a pwrite() happens, that doesn't immediately cause nbd to send a
network packet out, and doesn't wait on a network reply before
returning, right?  It just ends up in some dirty block device queue,
I'm guessing?  And then something triggers a bunch of dirty blocks to
get flushed out to "disk"?  If that's the case, then its impossible
for the kernel to give an accurate return code to pwrite(), because it
doesn't know those blocks will eventually fail to be written to "disk"
(nbd).

But as for fsync(), the kernel is probably waiting until every last
dirty sector gets written before it decides what the return code is,
which is why we see that pwrite() isn't failing, but fsync() is
failing.

Does that make sense?

I wonder if the block device were opened with O_DIRECT by e2fsprogs if
that would cause the pwrite() calls to fail correctly?

-jason

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: mkfs.ext2 succeeds despite nbd write errors?
  2015-11-07 23:09       ` Jason Pepas
@ 2015-11-07 23:20         ` Richard W.M. Jones
  2015-11-07 23:25           ` Jason Pepas
  0 siblings, 1 reply; 6+ messages in thread
From: Richard W.M. Jones @ 2015-11-07 23:20 UTC (permalink / raw)
  To: Jason Pepas; +Cc: linux-ext4, libguestfs

On Sat, Nov 07, 2015 at 05:09:52PM -0600, Jason Pepas wrote:
> On Sat, Nov 7, 2015 at 3:02 PM, Richard W.M. Jones <rjones@redhat.com> wrote:
> >> I'm not sure where to start with hunting down why mkfs's pwrite()
> >> calls aren't failing.  I'd look to the kernel source for that?
> >
> > It looks like it's really an e2fsprogs problem, not a kernel problem.
> > That's pretty surprising - I wasn't expecting it.
> 
> I agree the fsync() issue is an e2fsprogs problem, but as for
> specifically the pwrite() calls not getting a -1 return value, that's
> the kernel's fault, right?

I'm definitely not an expert here, but I do recall being told that
writes and reads are allowed to return an "OK" indication, but a later
close(2) or fsync(2) might fail.  That is particularly a problem with NFS.

I'll leave the rest to the true experts on the ext4 mailing list.

> I've been rolling this around in my mind and I think I can see why the
> kernel would correctly make fsync() fail but not pwrite() fail.  Let
> me run this by you:
> 
> When a pwrite() happens, that doesn't immediately cause nbd to send a
> network packet out, and doesn't wait on a network reply before
> returning, right?  It just ends up in some dirty block device queue,
> I'm guessing?  And then something triggers a bunch of dirty blocks to
> get flushed out to "disk"?  If that's the case, then its impossible
> for the kernel to give an accurate return code to pwrite(), because it
> doesn't know those blocks will eventually fail to be written to "disk"
> (nbd).
> 
> But as for fsync(), the kernel is probably waiting until every last
> dirty sector gets written before it decides what the return code is,
> which is why we see that pwrite() isn't failing, but fsync() is
> failing.
> 
> Does that make sense?
> 
> I wonder if the block device were opened with O_DIRECT by e2fsprogs if
> that would cause the pwrite() calls to fail correctly?

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
libguestfs lets you edit virtual machines.  Supports shell scripting,
bindings from many languages.  http://libguestfs.org

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: mkfs.ext2 succeeds despite nbd write errors?
  2015-11-07 21:02     ` mkfs.ext2 succeeds despite nbd write errors? Richard W.M. Jones
  2015-11-07 23:09       ` Jason Pepas
@ 2015-11-07 23:23       ` Jason Pepas
  1 sibling, 0 replies; 6+ messages in thread
From: Jason Pepas @ 2015-11-07 23:23 UTC (permalink / raw)
  To: Richard W.M. Jones; +Cc: linux-ext4, libguestfs

[-- Attachment #1: Type: text/plain, Size: 2693 bytes --]

On Sat, Nov 7, 2015 at 3:02 PM, Richard W.M. Jones <rjones@redhat.com> wrote:
> On Sat, Nov 07, 2015 at 01:22:45PM -0600, Jason Pepas wrote:
>> I did manage to find two calls to fsync in the e2fsprogs source which
>> are not return-value-checked:
>>
>> https://github.com/tytso/e2fsprogs/blob/956b0f18a5ddb6815a9dff4f10a1e3125cdca9ba/misc/filefrag.c#L303
>> https://github.com/tytso/e2fsprogs/blob/956b0f18a5ddb6815a9dff4f10a1e3125cdca9ba/lib/ext2fs/unix_io.c#L915
>
> That second one looks very suspicious to me.  I don't think that it's
> ever right for mke2fs to ignore the return value from an fsync call,
> so assuming mke2fs calls that function it's surely a bug.


I locally patched mke2fs to check the return value of those two
fsync() calls, and now it exits failure like it should:


root@debian:~/mkfs/e2fsprogs-1.42.12/build/misc# strace ./mke2fs
/dev/nbd0 2>&1 | tail
write(1, "\10\10\10\10\1)          = 5
pwrite(3, "\1\2\0\0\2\2\0\0\3\2\0\0\367{\365\37\2\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,
4096, 6576672768) = 4096
fsync(3)                                = -1 EIO (Input/output error)
pwrite(3, "\0\0\10\0\0\0
\0\231\231\1\0qm\37\0\365\377\7\0\0\0\0\0\2\0\0\0\2\0\0\0"..., 1024,
1024) = 1024
fsync(3)                                = -1 EIO (Input/output error)
close(3)                                = 0
write(2, "\nWarning, had trouble writing ou"..., 46
Warning, had trouble writing out superblocks.) = 46
exit_group(5)                           = ?
+++ exited with 5 +++



The patches are pretty simple:


root@debian:~/mkfs# diff -urN e2fsprogs-1.42.12.orig/misc/filefrag.c
e2fsprogs-1.42.12/misc/filefrag.c
--- e2fsprogs-1.42.12.orig/misc/filefrag.c      2014-08-13
14:57:29.000000000 -0500
+++ e2fsprogs-1.42.12/misc/filefrag.c   2015-11-07 13:44:17.089128243 -0600
@@ -292,8 +292,11 @@
                fm_ext.fe_flags = FIEMAP_EXTENT_MERGED;
        }

-       if (sync_file)
-               fsync(fd);
+       if (sync_file) {
+               int rc = fsync(fd);
+               if (rc < 0)
+                       return rc;
+       }

        for (i = 0, logical = 0, *num_extents = 0, count = last_block = 0;
             i < numblocks;
root@debian:~/mkfs# diff -urN
e2fsprogs-1.42.12.orig/lib/ext2fs/unix_io.c
e2fsprogs-1.42.12/lib/ext2fs/unix_io.c
--- e2fsprogs-1.42.12.orig/lib/ext2fs/unix_io.c 2014-08-08
15:59:39.000000000 -0500
+++ e2fsprogs-1.42.12/lib/ext2fs/unix_io.c      2015-11-07
14:31:28.209005806 -0600
@@ -887,7 +887,9 @@
 #ifndef NO_IO_CACHE
        retval = flush_cached_blocks(channel, data, 0);
 #endif
-       fsync(data->dev);
+       if (fsync(data->dev) == -1)
+               return errno;
+
        return retval;
 }


(attached)

-jason

[-- Attachment #2: filefrag.patch --]
[-- Type: text/x-patch, Size: 395 bytes --]

--- e2fsprogs-1.42.12.orig/lib/ext2fs/unix_io.c 2014-08-08 15:59:39.000000000 -0500
+++ e2fsprogs-1.42.12/lib/ext2fs/unix_io.c      2015-11-07 14:31:28.209005806 -0600
@@ -887,7 +887,9 @@
 #ifndef NO_IO_CACHE
        retval = flush_cached_blocks(channel, data, 0);
 #endif
-       fsync(data->dev);
+       if (fsync(data->dev) == -1)
+               return errno;
+
        return retval;
 }
 

[-- Attachment #3: unix_io.patch --]
[-- Type: text/x-patch, Size: 395 bytes --]

--- e2fsprogs-1.42.12.orig/lib/ext2fs/unix_io.c 2014-08-08 15:59:39.000000000 -0500
+++ e2fsprogs-1.42.12/lib/ext2fs/unix_io.c      2015-11-07 14:31:28.209005806 -0600
@@ -887,7 +887,9 @@
 #ifndef NO_IO_CACHE
        retval = flush_cached_blocks(channel, data, 0);
 #endif
-       fsync(data->dev);
+       if (fsync(data->dev) == -1)
+               return errno;
+
        return retval;
 }
 

[-- Attachment #4: Type: text/plain, Size: 144 bytes --]

_______________________________________________
Libguestfs mailing list
Libguestfs@redhat.com
https://www.redhat.com/mailman/listinfo/libguestfs

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: mkfs.ext2 succeeds despite nbd write errors?
  2015-11-07 23:20         ` Richard W.M. Jones
@ 2015-11-07 23:25           ` Jason Pepas
  2015-11-07 23:47             ` [Libguestfs] " Jason Pepas
  0 siblings, 1 reply; 6+ messages in thread
From: Jason Pepas @ 2015-11-07 23:25 UTC (permalink / raw)
  To: Richard W.M. Jones; +Cc: linux-ext4, libguestfs

On Sat, Nov 7, 2015 at 5:20 PM, Richard W.M. Jones <rjones@redhat.com> wrote:
> I'm definitely not an expert here, but I do recall being told that
> writes and reads are allowed to return an "OK" indication, but a later
> close(2) or fsync(2) might fail.  That is particularly a problem with NFS.
>
> I'll leave the rest to the true experts on the ext4 mailing list.

Thanks for you help!  I'll move the rest of this discussion to their
mailing list.

-jason

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [Libguestfs] mkfs.ext2 succeeds despite nbd write errors?
  2015-11-07 23:25           ` Jason Pepas
@ 2015-11-07 23:47             ` Jason Pepas
  0 siblings, 0 replies; 6+ messages in thread
From: Jason Pepas @ 2015-11-07 23:47 UTC (permalink / raw)
  Cc: linux-ext4

On Sat, Nov 7, 2015 at 5:25 PM, Jason Pepas <jasonpepas@gmail.com> wrote:
> On Sat, Nov 7, 2015 at 5:20 PM, Richard W.M. Jones <rjones@redhat.com> wrote:
>>
>> I'll leave the rest to the true experts on the ext4 mailing list.
>
> Thanks for you help!  I'll move the rest of this discussion to their
> mailing list.

Hi ext4 folk,

I ran into a situation where mke2fs was writing to faulty block device
(an nbd server), but mke2fs didn't report a problem, so I didn't know
that my newly created filesystem was corrupted.

I've traced this down to the return value not being checked on the
final fsync() call.

I've confirmed locally that checking this return value and passing it
up the call stack will correctly cause mke2fs to exit non-zero.

I've created a pull request on github here:
https://github.com/tytso/e2fsprogs/pull/4

Thanks,
Jason Pepas

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2015-11-07 23:47 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <CALuDM+AtyUUMFcdTOocxCw9BxQ4PBx5aeYVYwmy_UErh1MZjww@mail.gmail.com>
     [not found] ` <20151107110354.GM1908@redhat.com>
     [not found]   ` <CALuDM+Aw9JCMVbQcRKaaUu+=+xvDpMgL8sUZpT2snz9Qp5DqpA@mail.gmail.com>
2015-11-07 21:02     ` mkfs.ext2 succeeds despite nbd write errors? Richard W.M. Jones
2015-11-07 23:09       ` Jason Pepas
2015-11-07 23:20         ` Richard W.M. Jones
2015-11-07 23:25           ` Jason Pepas
2015-11-07 23:47             ` [Libguestfs] " Jason Pepas
2015-11-07 23:23       ` Jason Pepas

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.