All of lore.kernel.org
 help / color / mirror / Atom feed
* Question about `generic_write_checks()` FIXME comment
@ 2021-12-30  8:07 Hans Montero
  2021-12-30 13:59 ` Al Viro
  2021-12-30 19:04 ` Theodore Ts'o
  0 siblings, 2 replies; 5+ messages in thread
From: Hans Montero @ 2021-12-30  8:07 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: Tal Zussman, Xijiao Li, OS-TA

Hey there,

We're the teaching staff for Columbia University's graduate-level operating
systems course and something came up as we were improving our filesystem
assignment.

For context, we have students develop a very simple filesystem that implements
the `read()`/`write()` `struct file_operations` functions as opposed to
`read_iter()`/`write_iter()`.

We noticed that the following bash redirection didn't work as expected in our
filesystem: `echo test >> foo.txt`. The write would occur at the beginning of
the file because the `ppos` argument to `write()` was set to 0. Through
`strace`, we verified that bash opens the file with `O_APPEND` so we assumed
that VFS would handle setting `ppos` to the file size for us, yet it did not.

We dug through kernel code to see if other filesystems explicitly account for
this case and surely enough, they do! Most of the filesystems we saw
directly/indirectly call `generic_write_checks()` which is implemented for
`write_iter()` usage. We were able to solve our bug by simply porting the
following logic for our version of `write()`:

  /* FIXME: this is for backwards compatibility with 2.4 */
  if (iocb->ki_flags & IOCB_APPEND)
          iocb->ki_pos = i_size_read(inode);

We noticed the FIXME comment and we weren't sure exactly what it meant so we
kept tracing through older versions of `generic_write_checks()`, going as far as
Linux 2.5.75, before it was implemented for `write_iter()` usage:

  if (!isblk) {
          /* FIXME: this is for backwards compatibility with 2.4 */
          if (file->f_flags & O_APPEND)
                  *pos = inode->i_size;
          ...
  }

Can someone explain what this comment is referring to? And why does `!isblk`
matter?

Furthermore, why doesn't VFS do the `O_APPEND` check itself instead of
delegating the task to the filesystems? It seems like a universal check,
especially given the following snippet on `O_APPEND` from the man page for
open(2):

  APPEND
      The file is opened in append mode. Before each write(2), the file offset
      is positioned at the end of the file, as if with lseek(2).

Best,
Hans

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Question about `generic_write_checks()` FIXME comment
  2021-12-30  8:07 Question about `generic_write_checks()` FIXME comment Hans Montero
@ 2021-12-30 13:59 ` Al Viro
  2021-12-30 19:04 ` Theodore Ts'o
  1 sibling, 0 replies; 5+ messages in thread
From: Al Viro @ 2021-12-30 13:59 UTC (permalink / raw)
  To: Hans Montero; +Cc: linux-fsdevel, Tal Zussman, Xijiao Li, OS-TA

On Thu, Dec 30, 2021 at 03:07:25AM -0500, Hans Montero wrote:
 
> Furthermore, why doesn't VFS do the `O_APPEND` check itself instead of
> delegating the task to the filesystems? It seems like a universal check,
> especially given the following snippet on `O_APPEND` from the man page for
> open(2):
> 
>   APPEND
>       The file is opened in append mode. Before each write(2), the file offset
>       is positioned at the end of the file, as if with lseek(2).

Because we need to make sure that no other thread will grow the file between
the check and actual IO?  And exclusion might very well differ for different
filesystems.  Incidentally, what kind of semantics could you assign to
O_APPEND on a block device?  Other than "ignore it", that is...

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Question about `generic_write_checks()` FIXME comment
  2021-12-30  8:07 Question about `generic_write_checks()` FIXME comment Hans Montero
  2021-12-30 13:59 ` Al Viro
@ 2021-12-30 19:04 ` Theodore Ts'o
  2021-12-30 20:06   ` Al Viro
  1 sibling, 1 reply; 5+ messages in thread
From: Theodore Ts'o @ 2021-12-30 19:04 UTC (permalink / raw)
  To: Hans Montero; +Cc: linux-fsdevel, Tal Zussman, Xijiao Li, OS-TA, Al Viro

On Thu, Dec 30, 2021 at 03:07:25AM -0500, Hans Montero wrote:
> We noticed the FIXME comment and we weren't sure exactly what it meant so we
> kept tracing through older versions of `generic_write_checks()`, going as far as
> Linux 2.5.75, before it was implemented for `write_iter()` usage:
> 
>   if (!isblk) {
>           /* FIXME: this is for backwards compatibility with 2.4 */
>           if (file->f_flags & O_APPEND)
>                   *pos = inode->i_size;
>           ...
>   }

I was able to trace it back farther, to v2.4.14.8 -> v2.4.14.9.  And it
looks like the mysterious FIXME is about the block device:

commit 1040c54c3b98ac4f8d91bc313cdc9d6669481da3
Author: Linus Torvalds <torvalds@athlon.transmeta.com>
Date:   Mon Feb 4 20:33:54 2002 -0800
	...
@@ -2765,7 +2876,8 @@ generic_file_write(struct file *file,const char *buf,size_t count, loff_t *ppos)
 
 	written = 0;
 
-	if (file->f_flags & O_APPEND)
+	/* FIXME: this is for backwards compatibility with 2.4 */
+	if (!S_ISBLK(inode->i_mode) && file->f_flags & O_APPEND)
 		pos = inode->i_size;
 
 	/*

 But as Al Viro pointed out, O_APPEND really can't have any real
 meaning for a block device.  It would be neat if we could magically
 make a 10TB HDD to become as 12TB HDD by writing 2TB using
 O_APPEND, but reality doesn't work that way.  :-)

It probably makes sense just to remove the FIXME from the current
kernel sources so that future people don't get confused, asking the
same questions you have.

Cheers,

						- Ted

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Question about `generic_write_checks()` FIXME comment
  2021-12-30 19:04 ` Theodore Ts'o
@ 2021-12-30 20:06   ` Al Viro
  2021-12-31  7:23     ` Tal Zussman
  0 siblings, 1 reply; 5+ messages in thread
From: Al Viro @ 2021-12-30 20:06 UTC (permalink / raw)
  To: Theodore Ts'o
  Cc: Hans Montero, linux-fsdevel, Tal Zussman, Xijiao Li, OS-TA

On Thu, Dec 30, 2021 at 02:04:46PM -0500, Theodore Ts'o wrote:

>  But as Al Viro pointed out, O_APPEND really can't have any real
>  meaning for a block device.  It would be neat if we could magically
>  make a 10TB HDD to become as 12TB HDD by writing 2TB using
>  O_APPEND, but reality doesn't work that way.  :-)
> 
> It probably makes sense just to remove the FIXME from the current
> kernel sources so that future people don't get confused, asking the
> same questions you have.

*nod*

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Question about `generic_write_checks()` FIXME comment
  2021-12-30 20:06   ` Al Viro
@ 2021-12-31  7:23     ` Tal Zussman
  0 siblings, 0 replies; 5+ messages in thread
From: Tal Zussman @ 2021-12-31  7:23 UTC (permalink / raw)
  To: Al Viro; +Cc: Theodore Ts'o, Hans Montero, linux-fsdevel, Xijiao Li, OS-TA

Thanks so much for the detailed responses! That makes a lot of sense.
We'll send out a patch removing the comment soon.

Thanks,
Tal

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2021-12-31  8:14 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-12-30  8:07 Question about `generic_write_checks()` FIXME comment Hans Montero
2021-12-30 13:59 ` Al Viro
2021-12-30 19:04 ` Theodore Ts'o
2021-12-30 20:06   ` Al Viro
2021-12-31  7:23     ` Tal Zussman

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.