From: Luis Henriques <lhenriques@suse.com>
To: Dave Chinner <david@fromorbit.com>
Cc: "Yan, Zheng" <zyan@redhat.com>,
	Nikolay Borisov <nborisov@suse.com>,
	fstests@vger.kernel.org, ceph-devel@vger.kernel.org
Subject: Re: [RFC PATCH 2/2] ceph: test basic ceph.quota.max_bytes quota
Date: Tue, 16 Apr 2019 11:48:12 +0100
Message-ID: <874l6yrztf.fsf@suse.com>
In-Reply-To: <20190416081356.GB1454@dread.disaster.area> (Dave Chinner's message of "Tue, 16 Apr 2019 18:13:56 +1000")

Dave Chinner <david@fromorbit.com> writes:

> On Mon, Apr 15, 2019 at 10:16:18AM +0800, Yan, Zheng wrote:
>> On 4/15/19 6:15 AM, Dave Chinner wrote:
>> > On Fri, Apr 12, 2019 at 11:37:55AM +0800, Yan, Zheng wrote:
>> > > On 4/12/19 9:15 AM, Dave Chinner wrote:
>> > > > On Thu, Apr 04, 2019 at 11:18:22AM +0100, Luis Henriques wrote:
>> > > > > Dave Chinner <david@fromorbit.com> writes:
>> > > For a DSYNC write, the client has already written the data to the object
>> > > store.  If the client crashes, the MDS will set the file to the
>> > > 'recovering' state and probe the file size by checking the object store.
>> > > Access to the file is blocked during recovery.
>> > 
>> > IOWs, ceph allows data integrity writes to the object store even
>> > though those writes breach limits on that object store? i.e.
>> > ceph quota essentially ignores O_SYNC/O_DSYNC metadata requirements?
>> > 
>> 
>> The current cephfs quota implementation checks quota (comparing i_size
>> with the quota setting) at the very beginning of ceph_write_iter().  It
>> has nothing to do with O_SYNC or O_DSYNC.
>
> Hold on, if the quota is checked on the client at the start of every
> write, then why is it not enforced /exactly/? Where does this "we
> didn't notice we'd run out of quota" overrun come from then?

Ok, there's an extra piece of information that I don't think has been
mentioned in the thread yet: the (recursive) directory statistics.
These stats are maintained by the MDS, and it's against these stats
that the client actually checks whether there's an overrun.  The
checks can be done against the number of bytes ('max_bytes' quota)
and/or the number of files ('max_files' quota), depending on which
quotas are enabled.
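
To make that a bit more concrete, here's a rough userspace sketch of
the kind of check the client ends up doing.  This is _not_ the actual
fs/ceph code; the struct and helper names are made up, and it just
models "compare the MDS-provided recursive stats against the limits":

#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified view of a quota realm (a directory with
 * quotas set) and the recursive stats last reported by the MDS. */
struct quota_realm {
	uint64_t max_bytes;	/* ceph.quota.max_bytes; 0 == no limit */
	uint64_t max_files;	/* ceph.quota.max_files; 0 == no limit */
	uint64_t rbytes;	/* recursive byte count from the MDS */
	uint64_t rfiles;	/* recursive file count from the MDS */
};

/* Would writing 'len' more bytes overrun the byte quota?  Note that
 * rbytes is only as fresh as the last stats update from the MDS,
 * which is where the small overruns seen by the test come from. */
static bool quota_bytes_exceeded(const struct quota_realm *realm,
				 uint64_t len)
{
	if (!realm->max_bytes)
		return false;	/* no byte quota on this realm */
	return realm->rbytes + len > realm->max_bytes;
}

/* Same idea for file creation against the max_files quota. */
static bool quota_files_exceeded(const struct quota_realm *realm)
{
	if (!realm->max_files)
		return false;	/* no file quota on this realm */
	return realm->rfiles + 1 > realm->max_files;
}

The point being that the client can only ever be as accurate as the
rbytes/rfiles values it last received from the MDS.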

_Maybe_ there's room for some optimization on the client side by
locally updating the stats received from the MDS with local writes,
file creation/deletion, etc.  But I'm not sure it's worth the effort,
because:

- for each operation (write, truncate, create, ...) we would also have
  the extra overhead of walking up the directory tree (or at least the
  quota realms tree) to update the stats for each directory, as the
  sketch after this list shows;  and

- more clients may be writing to the same directories, messing up the
  dir quotas anyway.  I.e. the quota overrun problem would still be
  there, and we would still need to open/close files in the test.
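
To illustrate the first point, here's a hypothetical sketch of the
extra work every write/truncate/create would have to do under that
scheme (again, made-up types rather than actual fs/ceph code):

#include <stdint.h>

/* Same hypothetical realm as in the previous sketch, now with a
 * parent pointer so cached stats can be propagated up the tree. */
struct quota_realm {
	struct quota_realm *parent;	/* NULL at the root realm */
	uint64_t rbytes;		/* locally cached recursive bytes */
};

/* The extra per-operation overhead: walk all the way up the quota
 * realms tree adjusting the cached stats.  And even then, another
 * client writing to the same directories would immediately make
 * these cached values wrong again. */
static void update_cached_rbytes(struct quota_realm *realm, int64_t delta)
{
	for (; realm; realm = realm->parent)
		realm->rbytes += (uint64_t)delta;
}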

Hopefully this helps clarify why the open/close loop is a necessary
hack with the current quota design.

Cheers,
-- 
Luis

>
> i.e. the test changes are implying that quota is not accurately
> checked and enforced on every write, and that there is something
> less than exact about quotas on the ceph client. Yet you say they
> are checked on every write.
>
> Where does the need to open/close files and force flushing client
> state to the MDS come from if quota is actually being checked
> on every write as you say it is?
>
> i.e. I'm trying to work out if this change is just working around
> bugs in ceph quota accounting and I'm being told conflicting things
> about how the ceph client accounts and enforces quota limits. Can
> you please clearly explain how the quota enforcement works and why
> close/open between writes is necessary for accurate quota
> enforcement so that we have some clue as to why these rubbery limit
> hacks are necessary?
>
> If we don't understand why a test does something and it's not
> adequately documented, we can't really be expected to maintain
> it in working order....
>
> Cheers,
>
> Dave.
