From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-btrfs-owner@vger.kernel.org>
Received: from cn.fujitsu.com ([59.151.112.132]:1135 "EHLO
        heian.cn.fujitsu.com" rhost-flags-OK-FAIL-OK-FAIL) by vger.kernel.org
        with ESMTP id S1750776AbdBTFfT (ORCPT
        <rfc822;linux-btrfs@vger.kernel.org>);
        Mon, 20 Feb 2017 00:35:19 -0500
Subject: Re: [PATCH] qgroup: Retry after commit on getting EDQUOT
To: Goldwyn Rodrigues <rgoldwyn@suse.de>, <linux-btrfs@vger.kernel.org>
References: <20170219133059.20939-1-rgoldwyn@suse.de>
 <41d7ef6d-28fe-7d32-982f-57823e3ce943@cn.fujitsu.com>
 <227e0a99-5eae-03d4-1a9a-b9bc45861c8a@suse.de>
CC: Goldwyn Rodrigues <rgoldwyn@suse.com>
From: Qu Wenruo <quwenruo@cn.fujitsu.com>
Message-ID: <378f0cca-67fe-212d-f6fe-b5f248cb85ee@cn.fujitsu.com>
Date: Mon, 20 Feb 2017 13:35:02 +0800
MIME-Version: 1.0
In-Reply-To: <227e0a99-5eae-03d4-1a9a-b9bc45861c8a@suse.de>
Content-Type: text/plain; charset="utf-8"; format=flowed
Sender: linux-btrfs-owner@vger.kernel.org
List-ID: <linux-btrfs.vger.kernel.org>


At 02/20/2017 12:35 PM, Goldwyn Rodrigues wrote:
> Hi Qu,
>
> On 02/19/2017 09:07 PM, Qu Wenruo wrote:
>>
>>
>> At 02/19/2017 09:30 PM, Goldwyn Rodrigues wrote:
>>> From: Goldwyn Rodrigues <rgoldwyn@suse.com>
>>>
>>> We are facing the same problem with EDQUOT which was experienced with
>>> ENOSPC. Not sure if we require a full ticketing system such as ENOSPC,
>>> but
>>> here is a fix. Let me know if it is too big a hammer.
>>>
>>> Quotas are reserved during the start of an operation, incrementing
>>> qg->reserved. However, it is written to disk in a commit_transaction
>>> which could take as long as commit_interval. In the meantime there
>>> could be deletions which are not accounted for because deletions are
>>> accounted for only while committed (free_refroot). So, when we get
>>> a EDQUOT flush the data to disk and try again.
>>
>> IIRC Jeff submitted a qgroup patch to allow unlink to be done even we
>> already hit EDQUOT.
>> https://patchwork.kernel.org/patch/9537193/
>>
>> I think that's a better solution, which is quite like what we do to
>> handle ENOSPC.
>>
>
> These are two separate issues. This issue is where qg->reserved gets
> over-inflated because of repeated deletions and recreations within a
> commit_interval.

My fault, that's indeed two different bugs.

And I succeeded to reproduce the bug.
>
> Jeff's patch deals with allowing removal even if the quota is exceeded
> so that eventually there can be space freed.
>
> I would suggest you apply Jeff's patch and try to run the script I have
> presented.
>
>
>> Thanks,
>> Qu
>>
>>>
>>> I combined the conditions of rfer and excl to reduce code lines, though
>>> the condition looks uglier.
>>>
>>> Here is a sample script which shows this issue.
>>>
>>> DEVICE=/dev/vdb
>>> MOUNTPOINT=/mnt
>>> TESTVOL=$MOUNTPOINT/tmp
>>> QUOTA=5
>>> PROG=btrfs
>>> DD_BS="4k"
>>> DD_COUNT="256"
>>> RUN_TIMES=5000
>>>
>>> mkfs.btrfs -f $DEVICE
>>> mount -o commit=240 $DEVICE $MOUNTPOINT
>>> $PROG subvolume create $TESTVOL
>>> $PROG quota enable $TESTVOL
>>> $PROG qgroup limit ${QUOTA}G $TESTVOL
>>>
>>> typeset -i DD_RUN_GOOD
>>> typeset -i QUOTA
>>>
>>> function _check_cmd() {
>>>         if [[ ${?} > 0 ]]; then
>>>                 echo -n "$(date) E: Running previous command"
>>>                 echo ${*}
>>>         echo "Without sync"
>>>         $PROG qgroup show -pcreFf ${TESTVOL}
>>>         echo "With sync"
>>>         $PROG qgroup show -pcreFf --sync ${TESTVOL}
>>>                 exit 1
>>>         fi
>>> }
>>>
>>> while true; do
>>> DD_RUN_GOOD=$RUN_TIMES
>>>
>>> while (( ${DD_RUN_GOOD} != 0 )); do
>>>     dd if=/dev/zero of=${TESTVOL}/quotatest${DD_RUN_GOOD} bs=${DD_BS}
>>> count=${DD_COUNT}
>>>     _check_cmd "dd if=/dev/zero of=${TESTVOL}/quotatest${DD_RUN_GOOD}
>>> bs=${DD_BS} count=${DD_COUNT}"
>>>     DD_RUN_GOOD=(${DD_RUN_GOOD}-1)
>>> done
>>>
>>> $PROG qgroup show -pcref $TESTVOL
>>> echo "----------- Cleanup ---------- "
>>> rm $TESTVOL/quotatest*
>>>
>>> done

It would be better if we can reduce the reproduce time and submit it as 
a fstest test case.

>>>
>>> Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
>>> ---
>>>  fs/btrfs/qgroup.c | 28 +++++++++++++++++-----------
>>>  1 file changed, 17 insertions(+), 11 deletions(-)
>>>
>>> diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
>>> index 4700cac..9ace407 100644
>>> --- a/fs/btrfs/qgroup.c
>>> +++ b/fs/btrfs/qgroup.c
>>> @@ -2093,6 +2093,7 @@ static int qgroup_reserve(struct btrfs_root
>>> *root, u64 num_bytes)
>>>      struct btrfs_fs_info *fs_info = root->fs_info;
>>>      u64 ref_root = root->root_key.objectid;
>>>      int ret = 0;
>>> +    int retried = 0;
>>>      struct ulist_node *unode;
>>>      struct ulist_iterator uiter;
>>>
>>> @@ -2101,7 +2102,7 @@ static int qgroup_reserve(struct btrfs_root
>>> *root, u64 num_bytes)
>>>
>>>      if (num_bytes == 0)
>>>          return 0;
>>> -
>>> +retry:
>>>      spin_lock(&fs_info->qgroup_lock);
>>>      quota_root = fs_info->quota_root;
>>>      if (!quota_root)
>>> @@ -2127,16 +2128,21 @@ static int qgroup_reserve(struct btrfs_root
>>> *root, u64 num_bytes)
>>>
>>>          qg = u64_to_ptr(unode->aux);
>>>
>>> -        if ((qg->lim_flags & BTRFS_QGROUP_LIMIT_MAX_RFER) &&
>>> -            qg->reserved + (s64)qg->rfer + num_bytes >
>>> -            qg->max_rfer) {
>>> -            ret = -EDQUOT;
>>> -            goto out;
>>> -        }
>>> -
>>> -        if ((qg->lim_flags & BTRFS_QGROUP_LIMIT_MAX_EXCL) &&
>>> -            qg->reserved + (s64)qg->excl + num_bytes >
>>> -            qg->max_excl) {
>>> +        if (((qg->lim_flags & BTRFS_QGROUP_LIMIT_MAX_RFER) &&
>>> +              qg->reserved + (s64)qg->rfer + num_bytes >
>>> qg->max_rfer) ||
>>> +            ((qg->lim_flags & BTRFS_QGROUP_LIMIT_MAX_EXCL) &&
>>> +            qg->reserved + (s64)qg->excl + num_bytes > qg->max_excl)) {
>>> +            if (!retried && qg->reserved > 0) {

Does the qg->reserved > 0 check has any extra meaning here?

If qg->reserved equals to 0, IIRC it won't goes here anyway.

Or did I miss something?

Thanks,
Qu

>>> +                struct btrfs_trans_handle *trans;
>>> +                spin_unlock(&fs_info->qgroup_lock);
>>> +                btrfs_start_delalloc_roots(fs_info, 0, -1);
>>> +                trans = btrfs_join_transaction(root);
>>> +                if (IS_ERR(trans))
>>> +                    return PTR_ERR(trans);
>>> +                btrfs_commit_transaction(trans, root);
>>> +                retried++;
>>> +                goto retry;
>>> +            }
>>>              ret = -EDQUOT;
>>>              goto out;
>>>          }
>>>
>>
>>
>