From: Daniel Black <daniel@mariadb.org>
To: Matthew Wilcox <willy@infradead.org>
Cc: linux-fsdevel@vger.kernel.org
Subject: Re: fcntl(fd, F_SETFL, O_DIRECT) succeeds followed by EINVAL in write
Date: Thu, 27 Jan 2022 09:03:36 +1100
Message-ID: <CABVffEPReS0d1dN2eKCry_k6K0LCGNNjGf04O3c7-h6P1Q_9zg@mail.gmail.com>
In-Reply-To: <YfC5vuwQyxoMfWLP@casper.infradead.org>

On Wed, Jan 26, 2022 at 2:02 PM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Wed, Jan 26, 2022 at 09:05:48AM +1100, Daniel Black wrote:
>
> O_RDONLY is defined to be 0, so don't worry about it.

Thanks.

> > The kernel code in setfl seems to want to return EINVAL for
> > filesystems without a direct_IO structure member assigned,
> >
> > A noop_direct_IO seems to be used frequently to just return EINVAL
> > (like cifs_direct_io).
>
> Sorry for the confusion.  You've caught us mid-transition.  Eventually,
> ->direct_IO will be deleted, but for now it signifies whether or not the
> filesystem supports O_DIRECT, even though it's not used (except in some
> scenarios you don't care about).

Is it going to be reasonable to expect fcntl(fd, F_SETFL, O_DIRECT) to
return EINVAL if O_DIRECT isn't supported?
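
For concreteness, the userspace side of that question could look roughly
like the sketch below. The helper name try_enable_o_direct and its 1/0/-1
return convention are made up for illustration, and treating EINVAL as
"not supported, stay buffered" is an assumption about what an application
would want, not something the kernel promises. As the subject line notes,
a successful fcntl() here does not by itself guarantee that later writes
succeed.

```c
#define _GNU_SOURCE   /* exposes O_DIRECT in <fcntl.h> on glibc */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>

/*
 * Hypothetical helper: try to switch an already-open descriptor to O_DIRECT.
 * Returns 1 if the flag was set, 0 if the kernel refused with EINVAL
 * (treated here as "not supported, keep using buffered I/O"), and -1 on
 * any other error, with errno left as set by fcntl().
 */
static int try_enable_o_direct(int fd)
{
    assert(fd >= 0);

    int flags = fcntl(fd, F_GETFL);
    if (flags == -1)
        return -1;

    if (fcntl(fd, F_SETFL, flags | O_DIRECT) == 0)
        return 1;

    return errno == EINVAL ? 0 : -1;
}
```

A caller would then branch on the result: 1 means try direct I/O (and
still be prepared for EINVAL from the write itself), 0 means carry on
buffered.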

> > Lastly on the list of peculiar behaviors here, is tmpfs will return
> > EINVAL from the fcntl call however it works fine with O_DIRECT
> > (https://bugs.mysql.com/bug.php?id=26662). MySQL (and MariaDB still
> > has the same code) that currently ignores EINVAL, but I'm willing to
> > make that code better.
>
> Out of interest, what behaviour do you _want_ from doing O_DIRECT
> to tmpfs?  O_DIRECT is defined to bypass the page cache, but tmpfs
> only stores data in the page cache.  So what do you intend to happen?

I suspect that, because EINVAL is returned, it's just operating in
non-O_DIRECT mode.

My guess is that someone added this because (too much) MySQL/MariaDB
testing is done on tmpfs, and they didn't want to adjust the test suite
to handle O_DIRECT failures everywhere. I don't think there was any
kernel expectation there.

That seems to be my problem; I'll see what I can do to get back to
using real filesystems more.

> > Does a userspace have to fully try to write to an O_DIRECT file, note
> > the failure, reopen or clear O_DIRECT, and resubmit to use O_DIRECT?
> >
> > While I see that the success/failure of a O_DIRECT read/write can be
> > related to the capabilities of the underlying block device depending
> > on offset/length of the read/write, are there other traps?
>
> It also must be aligned in memory,

yep, knew this one.

> but I'm not quite sure what
> limitations cifs imposes.
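
The "try the write, note the failure, clear O_DIRECT, resubmit" flow asked
about above might be sketched as follows. Everything named here is
hypothetical: write_direct_or_buffered is an illustrative helper, and
ASSUMED_ALIGN (4096) is only a guess at an alignment that satisfies most
block devices; the real requirement depends on the device and filesystem.
The sketch also treats any EINVAL from pwrite() as "O_DIRECT not usable
here", which is an assumption, and it does not handle short writes.

```c
#define _GNU_SOURCE   /* exposes O_DIRECT in <fcntl.h> on glibc */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Assumed alignment for buffer, offset and length; not a kernel guarantee. */
#define ASSUMED_ALIGN 4096

/*
 * Hypothetical helper: write len bytes at offset off through an aligned
 * bounce buffer. If the write fails with EINVAL while O_DIRECT is set,
 * clear the flag and retry once as a buffered write.
 * Returns the pwrite() result: bytes written, or -1 on error.
 */
static ssize_t write_direct_or_buffered(int fd, const void *data,
                                        size_t len, off_t off)
{
    assert(len > 0 && len % ASSUMED_ALIGN == 0 && off % ASSUMED_ALIGN == 0);

    void *buf = NULL;
    int err = posix_memalign(&buf, ASSUMED_ALIGN, len);
    if (err != 0) {
        errno = err;          /* posix_memalign returns the error code */
        return -1;
    }
    memcpy(buf, data, len);

    ssize_t n = pwrite(fd, buf, len, off);
    if (n == -1 && errno == EINVAL) {
        int flags = fcntl(fd, F_GETFL);
        if (flags != -1 && (flags & O_DIRECT) &&
            fcntl(fd, F_SETFL, flags & ~O_DIRECT) == 0)
            n = pwrite(fd, buf, len, off);   /* buffered retry */
    }

    int saved = errno;        /* keep errno from the last failing call */
    free(buf);
    errno = saved;
    return n;
}
```

On a descriptor that never had O_DIRECT set, this degenerates to a plain
pwrite() through a copy, so it is only worth the extra buffer when direct
I/O is actually being attempted.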
