* [PATCH] qgroup: Retry after commit on getting EDQUOT @ 2017-02-19 13:30 Goldwyn Rodrigues 2017-02-20 3:07 ` Qu Wenruo ` (2 more replies) 0 siblings, 3 replies; 10+ messages in thread From: Goldwyn Rodrigues @ 2017-02-19 13:30 UTC (permalink / raw) To: linux-btrfs; +Cc: Goldwyn Rodrigues From: Goldwyn Rodrigues <rgoldwyn@suse.com> We are facing the same problem with EDQUOT which was experienced with ENOSPC. Not sure if we require a full ticketing system such as ENOSPC, but here is a fix. Let me know if it is too big a hammer. Quotas are reserved during the start of an operation, incrementing qg->reserved. However, it is written to disk in a commit_transaction which could take as long as commit_interval. In the meantime there could be deletions which are not accounted for because deletions are accounted for only while committed (free_refroot). So, when we get a EDQUOT flush the data to disk and try again. I combined the conditions of rfer and excl to reduce code lines, though the condition looks uglier. Here is a sample script which shows this issue. DEVICE=/dev/vdb MOUNTPOINT=/mnt TESTVOL=$MOUNTPOINT/tmp QUOTA=5 PROG=btrfs DD_BS="4k" DD_COUNT="256" RUN_TIMES=5000 mkfs.btrfs -f $DEVICE mount -o commit=240 $DEVICE $MOUNTPOINT $PROG subvolume create $TESTVOL $PROG quota enable $TESTVOL $PROG qgroup limit ${QUOTA}G $TESTVOL typeset -i DD_RUN_GOOD typeset -i QUOTA function _check_cmd() { if [[ ${?} > 0 ]]; then echo -n "$(date) E: Running previous command" echo ${*} echo "Without sync" $PROG qgroup show -pcreFf ${TESTVOL} echo "With sync" $PROG qgroup show -pcreFf --sync ${TESTVOL} exit 1 fi } while true; do DD_RUN_GOOD=$RUN_TIMES while (( ${DD_RUN_GOOD} != 0 )); do dd if=/dev/zero of=${TESTVOL}/quotatest${DD_RUN_GOOD} bs=${DD_BS} count=${DD_COUNT} _check_cmd "dd if=/dev/zero of=${TESTVOL}/quotatest${DD_RUN_GOOD} bs=${DD_BS} count=${DD_COUNT}" DD_RUN_GOOD=(${DD_RUN_GOOD}-1) done $PROG qgroup show -pcref $TESTVOL echo "----------- Cleanup ---------- " rm $TESTVOL/quotatest* done Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com> --- fs/btrfs/qgroup.c | 28 +++++++++++++++++----------- 1 file changed, 17 insertions(+), 11 deletions(-) diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c index 4700cac..9ace407 100644 --- a/fs/btrfs/qgroup.c +++ b/fs/btrfs/qgroup.c @@ -2093,6 +2093,7 @@ static int qgroup_reserve(struct btrfs_root *root, u64 num_bytes) struct btrfs_fs_info *fs_info = root->fs_info; u64 ref_root = root->root_key.objectid; int ret = 0; + int retried = 0; struct ulist_node *unode; struct ulist_iterator uiter; @@ -2101,7 +2102,7 @@ static int qgroup_reserve(struct btrfs_root *root, u64 num_bytes) if (num_bytes == 0) return 0; - +retry: spin_lock(&fs_info->qgroup_lock); quota_root = fs_info->quota_root; if (!quota_root) @@ -2127,16 +2128,21 @@ static int qgroup_reserve(struct btrfs_root *root, u64 num_bytes) qg = u64_to_ptr(unode->aux); - if ((qg->lim_flags & BTRFS_QGROUP_LIMIT_MAX_RFER) && - qg->reserved + (s64)qg->rfer + num_bytes > - qg->max_rfer) { - ret = -EDQUOT; - goto out; - } - - if ((qg->lim_flags & BTRFS_QGROUP_LIMIT_MAX_EXCL) && - qg->reserved + (s64)qg->excl + num_bytes > - qg->max_excl) { + if (((qg->lim_flags & BTRFS_QGROUP_LIMIT_MAX_RFER) && + qg->reserved + (s64)qg->rfer + num_bytes > qg->max_rfer) || + ((qg->lim_flags & BTRFS_QGROUP_LIMIT_MAX_EXCL) && + qg->reserved + (s64)qg->excl + num_bytes > qg->max_excl)) { + if (!retried && qg->reserved > 0) { + struct btrfs_trans_handle *trans; + spin_unlock(&fs_info->qgroup_lock); + btrfs_start_delalloc_roots(fs_info, 0, -1); + trans = btrfs_join_transaction(root); + if (IS_ERR(trans)) + return PTR_ERR(trans); + btrfs_commit_transaction(trans, root); + retried++; + goto retry; + } ret = -EDQUOT; goto out; } -- 2.10.2 ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH] qgroup: Retry after commit on getting EDQUOT 2017-02-19 13:30 [PATCH] qgroup: Retry after commit on getting EDQUOT Goldwyn Rodrigues @ 2017-02-20 3:07 ` Qu Wenruo 2017-02-20 4:35 ` Goldwyn Rodrigues 2017-03-09 16:54 ` David Sterba 2017-03-10 16:06 ` Filipe Manana 2 siblings, 1 reply; 10+ messages in thread From: Qu Wenruo @ 2017-02-20 3:07 UTC (permalink / raw) To: Goldwyn Rodrigues, linux-btrfs; +Cc: Goldwyn Rodrigues At 02/19/2017 09:30 PM, Goldwyn Rodrigues wrote: > From: Goldwyn Rodrigues <rgoldwyn@suse.com> > > We are facing the same problem with EDQUOT which was experienced with > ENOSPC. Not sure if we require a full ticketing system such as ENOSPC, but > here is a fix. Let me know if it is too big a hammer. > > Quotas are reserved during the start of an operation, incrementing > qg->reserved. However, it is written to disk in a commit_transaction > which could take as long as commit_interval. In the meantime there > could be deletions which are not accounted for because deletions are > accounted for only while committed (free_refroot). So, when we get > a EDQUOT flush the data to disk and try again. IIRC Jeff submitted a qgroup patch to allow unlink to be done even we already hit EDQUOT. https://patchwork.kernel.org/patch/9537193/ I think that's a better solution, which is quite like what we do to handle ENOSPC. Thanks, Qu > > I combined the conditions of rfer and excl to reduce code lines, though > the condition looks uglier. > > Here is a sample script which shows this issue. > > DEVICE=/dev/vdb > MOUNTPOINT=/mnt > TESTVOL=$MOUNTPOINT/tmp > QUOTA=5 > PROG=btrfs > DD_BS="4k" > DD_COUNT="256" > RUN_TIMES=5000 > > mkfs.btrfs -f $DEVICE > mount -o commit=240 $DEVICE $MOUNTPOINT > $PROG subvolume create $TESTVOL > $PROG quota enable $TESTVOL > $PROG qgroup limit ${QUOTA}G $TESTVOL > > typeset -i DD_RUN_GOOD > typeset -i QUOTA > > function _check_cmd() { > if [[ ${?} > 0 ]]; then > echo -n "$(date) E: Running previous command" > echo ${*} > echo "Without sync" > $PROG qgroup show -pcreFf ${TESTVOL} > echo "With sync" > $PROG qgroup show -pcreFf --sync ${TESTVOL} > exit 1 > fi > } > > while true; do > DD_RUN_GOOD=$RUN_TIMES > > while (( ${DD_RUN_GOOD} != 0 )); do > dd if=/dev/zero of=${TESTVOL}/quotatest${DD_RUN_GOOD} bs=${DD_BS} count=${DD_COUNT} > _check_cmd "dd if=/dev/zero of=${TESTVOL}/quotatest${DD_RUN_GOOD} bs=${DD_BS} count=${DD_COUNT}" > DD_RUN_GOOD=(${DD_RUN_GOOD}-1) > done > > $PROG qgroup show -pcref $TESTVOL > echo "----------- Cleanup ---------- " > rm $TESTVOL/quotatest* > > done > > Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com> > --- > fs/btrfs/qgroup.c | 28 +++++++++++++++++----------- > 1 file changed, 17 insertions(+), 11 deletions(-) > > diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c > index 4700cac..9ace407 100644 > --- a/fs/btrfs/qgroup.c > +++ b/fs/btrfs/qgroup.c > @@ -2093,6 +2093,7 @@ static int qgroup_reserve(struct btrfs_root *root, u64 num_bytes) > struct btrfs_fs_info *fs_info = root->fs_info; > u64 ref_root = root->root_key.objectid; > int ret = 0; > + int retried = 0; > struct ulist_node *unode; > struct ulist_iterator uiter; > > @@ -2101,7 +2102,7 @@ static int qgroup_reserve(struct btrfs_root *root, u64 num_bytes) > > if (num_bytes == 0) > return 0; > - > +retry: > spin_lock(&fs_info->qgroup_lock); > quota_root = fs_info->quota_root; > if (!quota_root) > @@ -2127,16 +2128,21 @@ static int qgroup_reserve(struct btrfs_root *root, u64 num_bytes) > > qg = u64_to_ptr(unode->aux); > > - if ((qg->lim_flags & BTRFS_QGROUP_LIMIT_MAX_RFER) && > - qg->reserved + (s64)qg->rfer + num_bytes > > - qg->max_rfer) { > - ret = -EDQUOT; > - goto out; > - } > - > - if ((qg->lim_flags & BTRFS_QGROUP_LIMIT_MAX_EXCL) && > - qg->reserved + (s64)qg->excl + num_bytes > > - qg->max_excl) { > + if (((qg->lim_flags & BTRFS_QGROUP_LIMIT_MAX_RFER) && > + qg->reserved + (s64)qg->rfer + num_bytes > qg->max_rfer) || > + ((qg->lim_flags & BTRFS_QGROUP_LIMIT_MAX_EXCL) && > + qg->reserved + (s64)qg->excl + num_bytes > qg->max_excl)) { > + if (!retried && qg->reserved > 0) { > + struct btrfs_trans_handle *trans; > + spin_unlock(&fs_info->qgroup_lock); > + btrfs_start_delalloc_roots(fs_info, 0, -1); > + trans = btrfs_join_transaction(root); > + if (IS_ERR(trans)) > + return PTR_ERR(trans); > + btrfs_commit_transaction(trans, root); > + retried++; > + goto retry; > + } > ret = -EDQUOT; > goto out; > } > ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] qgroup: Retry after commit on getting EDQUOT 2017-02-20 3:07 ` Qu Wenruo @ 2017-02-20 4:35 ` Goldwyn Rodrigues 2017-02-20 5:35 ` Qu Wenruo 0 siblings, 1 reply; 10+ messages in thread From: Goldwyn Rodrigues @ 2017-02-20 4:35 UTC (permalink / raw) To: Qu Wenruo, linux-btrfs; +Cc: Goldwyn Rodrigues Hi Qu, On 02/19/2017 09:07 PM, Qu Wenruo wrote: > > > At 02/19/2017 09:30 PM, Goldwyn Rodrigues wrote: >> From: Goldwyn Rodrigues <rgoldwyn@suse.com> >> >> We are facing the same problem with EDQUOT which was experienced with >> ENOSPC. Not sure if we require a full ticketing system such as ENOSPC, >> but >> here is a fix. Let me know if it is too big a hammer. >> >> Quotas are reserved during the start of an operation, incrementing >> qg->reserved. However, it is written to disk in a commit_transaction >> which could take as long as commit_interval. In the meantime there >> could be deletions which are not accounted for because deletions are >> accounted for only while committed (free_refroot). So, when we get >> a EDQUOT flush the data to disk and try again. > > IIRC Jeff submitted a qgroup patch to allow unlink to be done even we > already hit EDQUOT. > https://patchwork.kernel.org/patch/9537193/ > > I think that's a better solution, which is quite like what we do to > handle ENOSPC. > These are two separate issues. This issue is where qg->reserved gets over-inflated because of repeated deletions and recreations within a commit_interval. Jeff's patch deals with allowing removal even if the quota is exceeded so that eventually there can be space freed. I would suggest you apply Jeff's patch and try to run the script I have presented. > Thanks, > Qu > >> >> I combined the conditions of rfer and excl to reduce code lines, though >> the condition looks uglier. >> >> Here is a sample script which shows this issue. >> >> DEVICE=/dev/vdb >> MOUNTPOINT=/mnt >> TESTVOL=$MOUNTPOINT/tmp >> QUOTA=5 >> PROG=btrfs >> DD_BS="4k" >> DD_COUNT="256" >> RUN_TIMES=5000 >> >> mkfs.btrfs -f $DEVICE >> mount -o commit=240 $DEVICE $MOUNTPOINT >> $PROG subvolume create $TESTVOL >> $PROG quota enable $TESTVOL >> $PROG qgroup limit ${QUOTA}G $TESTVOL >> >> typeset -i DD_RUN_GOOD >> typeset -i QUOTA >> >> function _check_cmd() { >> if [[ ${?} > 0 ]]; then >> echo -n "$(date) E: Running previous command" >> echo ${*} >> echo "Without sync" >> $PROG qgroup show -pcreFf ${TESTVOL} >> echo "With sync" >> $PROG qgroup show -pcreFf --sync ${TESTVOL} >> exit 1 >> fi >> } >> >> while true; do >> DD_RUN_GOOD=$RUN_TIMES >> >> while (( ${DD_RUN_GOOD} != 0 )); do >> dd if=/dev/zero of=${TESTVOL}/quotatest${DD_RUN_GOOD} bs=${DD_BS} >> count=${DD_COUNT} >> _check_cmd "dd if=/dev/zero of=${TESTVOL}/quotatest${DD_RUN_GOOD} >> bs=${DD_BS} count=${DD_COUNT}" >> DD_RUN_GOOD=(${DD_RUN_GOOD}-1) >> done >> >> $PROG qgroup show -pcref $TESTVOL >> echo "----------- Cleanup ---------- " >> rm $TESTVOL/quotatest* >> >> done >> >> Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com> >> --- >> fs/btrfs/qgroup.c | 28 +++++++++++++++++----------- >> 1 file changed, 17 insertions(+), 11 deletions(-) >> >> diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c >> index 4700cac..9ace407 100644 >> --- a/fs/btrfs/qgroup.c >> +++ b/fs/btrfs/qgroup.c >> @@ -2093,6 +2093,7 @@ static int qgroup_reserve(struct btrfs_root >> *root, u64 num_bytes) >> struct btrfs_fs_info *fs_info = root->fs_info; >> u64 ref_root = root->root_key.objectid; >> int ret = 0; >> + int retried = 0; >> struct ulist_node *unode; >> struct ulist_iterator uiter; >> >> @@ -2101,7 +2102,7 @@ static int qgroup_reserve(struct btrfs_root >> *root, u64 num_bytes) >> >> if (num_bytes == 0) >> return 0; >> - >> +retry: >> spin_lock(&fs_info->qgroup_lock); >> quota_root = fs_info->quota_root; >> if (!quota_root) >> @@ -2127,16 +2128,21 @@ static int qgroup_reserve(struct btrfs_root >> *root, u64 num_bytes) >> >> qg = u64_to_ptr(unode->aux); >> >> - if ((qg->lim_flags & BTRFS_QGROUP_LIMIT_MAX_RFER) && >> - qg->reserved + (s64)qg->rfer + num_bytes > >> - qg->max_rfer) { >> - ret = -EDQUOT; >> - goto out; >> - } >> - >> - if ((qg->lim_flags & BTRFS_QGROUP_LIMIT_MAX_EXCL) && >> - qg->reserved + (s64)qg->excl + num_bytes > >> - qg->max_excl) { >> + if (((qg->lim_flags & BTRFS_QGROUP_LIMIT_MAX_RFER) && >> + qg->reserved + (s64)qg->rfer + num_bytes > >> qg->max_rfer) || >> + ((qg->lim_flags & BTRFS_QGROUP_LIMIT_MAX_EXCL) && >> + qg->reserved + (s64)qg->excl + num_bytes > qg->max_excl)) { >> + if (!retried && qg->reserved > 0) { >> + struct btrfs_trans_handle *trans; >> + spin_unlock(&fs_info->qgroup_lock); >> + btrfs_start_delalloc_roots(fs_info, 0, -1); >> + trans = btrfs_join_transaction(root); >> + if (IS_ERR(trans)) >> + return PTR_ERR(trans); >> + btrfs_commit_transaction(trans, root); >> + retried++; >> + goto retry; >> + } >> ret = -EDQUOT; >> goto out; >> } >> > > -- Goldwyn ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] qgroup: Retry after commit on getting EDQUOT 2017-02-20 4:35 ` Goldwyn Rodrigues @ 2017-02-20 5:35 ` Qu Wenruo 2017-02-20 12:51 ` Goldwyn Rodrigues 0 siblings, 1 reply; 10+ messages in thread From: Qu Wenruo @ 2017-02-20 5:35 UTC (permalink / raw) To: Goldwyn Rodrigues, linux-btrfs; +Cc: Goldwyn Rodrigues At 02/20/2017 12:35 PM, Goldwyn Rodrigues wrote: > Hi Qu, > > On 02/19/2017 09:07 PM, Qu Wenruo wrote: >> >> >> At 02/19/2017 09:30 PM, Goldwyn Rodrigues wrote: >>> From: Goldwyn Rodrigues <rgoldwyn@suse.com> >>> >>> We are facing the same problem with EDQUOT which was experienced with >>> ENOSPC. Not sure if we require a full ticketing system such as ENOSPC, >>> but >>> here is a fix. Let me know if it is too big a hammer. >>> >>> Quotas are reserved during the start of an operation, incrementing >>> qg->reserved. However, it is written to disk in a commit_transaction >>> which could take as long as commit_interval. In the meantime there >>> could be deletions which are not accounted for because deletions are >>> accounted for only while committed (free_refroot). So, when we get >>> a EDQUOT flush the data to disk and try again. >> >> IIRC Jeff submitted a qgroup patch to allow unlink to be done even we >> already hit EDQUOT. >> https://patchwork.kernel.org/patch/9537193/ >> >> I think that's a better solution, which is quite like what we do to >> handle ENOSPC. >> > > These are two separate issues. This issue is where qg->reserved gets > over-inflated because of repeated deletions and recreations within a > commit_interval. My fault, that's indeed two different bugs. And I succeeded to reproduce the bug. > > Jeff's patch deals with allowing removal even if the quota is exceeded > so that eventually there can be space freed. > > I would suggest you apply Jeff's patch and try to run the script I have > presented. > > >> Thanks, >> Qu >> >>> >>> I combined the conditions of rfer and excl to reduce code lines, though >>> the condition looks uglier. >>> >>> Here is a sample script which shows this issue. >>> >>> DEVICE=/dev/vdb >>> MOUNTPOINT=/mnt >>> TESTVOL=$MOUNTPOINT/tmp >>> QUOTA=5 >>> PROG=btrfs >>> DD_BS="4k" >>> DD_COUNT="256" >>> RUN_TIMES=5000 >>> >>> mkfs.btrfs -f $DEVICE >>> mount -o commit=240 $DEVICE $MOUNTPOINT >>> $PROG subvolume create $TESTVOL >>> $PROG quota enable $TESTVOL >>> $PROG qgroup limit ${QUOTA}G $TESTVOL >>> >>> typeset -i DD_RUN_GOOD >>> typeset -i QUOTA >>> >>> function _check_cmd() { >>> if [[ ${?} > 0 ]]; then >>> echo -n "$(date) E: Running previous command" >>> echo ${*} >>> echo "Without sync" >>> $PROG qgroup show -pcreFf ${TESTVOL} >>> echo "With sync" >>> $PROG qgroup show -pcreFf --sync ${TESTVOL} >>> exit 1 >>> fi >>> } >>> >>> while true; do >>> DD_RUN_GOOD=$RUN_TIMES >>> >>> while (( ${DD_RUN_GOOD} != 0 )); do >>> dd if=/dev/zero of=${TESTVOL}/quotatest${DD_RUN_GOOD} bs=${DD_BS} >>> count=${DD_COUNT} >>> _check_cmd "dd if=/dev/zero of=${TESTVOL}/quotatest${DD_RUN_GOOD} >>> bs=${DD_BS} count=${DD_COUNT}" >>> DD_RUN_GOOD=(${DD_RUN_GOOD}-1) >>> done >>> >>> $PROG qgroup show -pcref $TESTVOL >>> echo "----------- Cleanup ---------- " >>> rm $TESTVOL/quotatest* >>> >>> done It would be better if we can reduce the reproduce time and submit it as a fstest test case. >>> >>> Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com> >>> --- >>> fs/btrfs/qgroup.c | 28 +++++++++++++++++----------- >>> 1 file changed, 17 insertions(+), 11 deletions(-) >>> >>> diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c >>> index 4700cac..9ace407 100644 >>> --- a/fs/btrfs/qgroup.c >>> +++ b/fs/btrfs/qgroup.c >>> @@ -2093,6 +2093,7 @@ static int qgroup_reserve(struct btrfs_root >>> *root, u64 num_bytes) >>> struct btrfs_fs_info *fs_info = root->fs_info; >>> u64 ref_root = root->root_key.objectid; >>> int ret = 0; >>> + int retried = 0; >>> struct ulist_node *unode; >>> struct ulist_iterator uiter; >>> >>> @@ -2101,7 +2102,7 @@ static int qgroup_reserve(struct btrfs_root >>> *root, u64 num_bytes) >>> >>> if (num_bytes == 0) >>> return 0; >>> - >>> +retry: >>> spin_lock(&fs_info->qgroup_lock); >>> quota_root = fs_info->quota_root; >>> if (!quota_root) >>> @@ -2127,16 +2128,21 @@ static int qgroup_reserve(struct btrfs_root >>> *root, u64 num_bytes) >>> >>> qg = u64_to_ptr(unode->aux); >>> >>> - if ((qg->lim_flags & BTRFS_QGROUP_LIMIT_MAX_RFER) && >>> - qg->reserved + (s64)qg->rfer + num_bytes > >>> - qg->max_rfer) { >>> - ret = -EDQUOT; >>> - goto out; >>> - } >>> - >>> - if ((qg->lim_flags & BTRFS_QGROUP_LIMIT_MAX_EXCL) && >>> - qg->reserved + (s64)qg->excl + num_bytes > >>> - qg->max_excl) { >>> + if (((qg->lim_flags & BTRFS_QGROUP_LIMIT_MAX_RFER) && >>> + qg->reserved + (s64)qg->rfer + num_bytes > >>> qg->max_rfer) || >>> + ((qg->lim_flags & BTRFS_QGROUP_LIMIT_MAX_EXCL) && >>> + qg->reserved + (s64)qg->excl + num_bytes > qg->max_excl)) { >>> + if (!retried && qg->reserved > 0) { Does the qg->reserved > 0 check has any extra meaning here? If qg->reserved equals to 0, IIRC it won't goes here anyway. Or did I miss something? Thanks, Qu >>> + struct btrfs_trans_handle *trans; >>> + spin_unlock(&fs_info->qgroup_lock); >>> + btrfs_start_delalloc_roots(fs_info, 0, -1); >>> + trans = btrfs_join_transaction(root); >>> + if (IS_ERR(trans)) >>> + return PTR_ERR(trans); >>> + btrfs_commit_transaction(trans, root); >>> + retried++; >>> + goto retry; >>> + } >>> ret = -EDQUOT; >>> goto out; >>> } >>> >> >> > ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] qgroup: Retry after commit on getting EDQUOT 2017-02-20 5:35 ` Qu Wenruo @ 2017-02-20 12:51 ` Goldwyn Rodrigues 2017-02-21 0:48 ` Qu Wenruo 0 siblings, 1 reply; 10+ messages in thread From: Goldwyn Rodrigues @ 2017-02-20 12:51 UTC (permalink / raw) To: Qu Wenruo, Goldwyn Rodrigues, linux-btrfs On 02/19/2017 11:35 PM, Qu Wenruo wrote: > > > At 02/20/2017 12:35 PM, Goldwyn Rodrigues wrote: >> Hi Qu, >> >> On 02/19/2017 09:07 PM, Qu Wenruo wrote: >>> >>> >>> At 02/19/2017 09:30 PM, Goldwyn Rodrigues wrote: >>>> From: Goldwyn Rodrigues <rgoldwyn@suse.com> >>>> >>>> We are facing the same problem with EDQUOT which was experienced with >>>> ENOSPC. Not sure if we require a full ticketing system such as ENOSPC, >>>> but >>>> here is a fix. Let me know if it is too big a hammer. >>>> >>>> Quotas are reserved during the start of an operation, incrementing >>>> qg->reserved. However, it is written to disk in a commit_transaction >>>> which could take as long as commit_interval. In the meantime there >>>> could be deletions which are not accounted for because deletions are >>>> accounted for only while committed (free_refroot). So, when we get >>>> a EDQUOT flush the data to disk and try again. >>> >>> IIRC Jeff submitted a qgroup patch to allow unlink to be done even we >>> already hit EDQUOT. >>> https://patchwork.kernel.org/patch/9537193/ >>> >>> I think that's a better solution, which is quite like what we do to >>> handle ENOSPC. >>> >> >> These are two separate issues. This issue is where qg->reserved gets >> over-inflated because of repeated deletions and recreations within a >> commit_interval. > > My fault, that's indeed two different bugs. > > And I succeeded to reproduce the bug. >> >> Jeff's patch deals with allowing removal even if the quota is exceeded >> so that eventually there can be space freed. >> >> I would suggest you apply Jeff's patch and try to run the script I have >> presented. >> >> >>> Thanks, >>> Qu >>> >>>> >>>> I combined the conditions of rfer and excl to reduce code lines, though >>>> the condition looks uglier. >>>> >>>> Here is a sample script which shows this issue. >>>> >>>> DEVICE=/dev/vdb >>>> MOUNTPOINT=/mnt >>>> TESTVOL=$MOUNTPOINT/tmp >>>> QUOTA=5 >>>> PROG=btrfs >>>> DD_BS="4k" >>>> DD_COUNT="256" >>>> RUN_TIMES=5000 >>>> >>>> mkfs.btrfs -f $DEVICE >>>> mount -o commit=240 $DEVICE $MOUNTPOINT >>>> $PROG subvolume create $TESTVOL >>>> $PROG quota enable $TESTVOL >>>> $PROG qgroup limit ${QUOTA}G $TESTVOL >>>> >>>> typeset -i DD_RUN_GOOD >>>> typeset -i QUOTA >>>> >>>> function _check_cmd() { >>>> if [[ ${?} > 0 ]]; then >>>> echo -n "$(date) E: Running previous command" >>>> echo ${*} >>>> echo "Without sync" >>>> $PROG qgroup show -pcreFf ${TESTVOL} >>>> echo "With sync" >>>> $PROG qgroup show -pcreFf --sync ${TESTVOL} >>>> exit 1 >>>> fi >>>> } >>>> >>>> while true; do >>>> DD_RUN_GOOD=$RUN_TIMES >>>> >>>> while (( ${DD_RUN_GOOD} != 0 )); do >>>> dd if=/dev/zero of=${TESTVOL}/quotatest${DD_RUN_GOOD} bs=${DD_BS} >>>> count=${DD_COUNT} >>>> _check_cmd "dd if=/dev/zero of=${TESTVOL}/quotatest${DD_RUN_GOOD} >>>> bs=${DD_BS} count=${DD_COUNT}" >>>> DD_RUN_GOOD=(${DD_RUN_GOOD}-1) >>>> done >>>> >>>> $PROG qgroup show -pcref $TESTVOL >>>> echo "----------- Cleanup ---------- " >>>> rm $TESTVOL/quotatest* >>>> >>>> done > > It would be better if we can reduce the reproduce time and submit it as > a fstest test case. Yes, unfortunately, it is more probabilistic than deterministic. But yes, reducing the size and increasing the commit_interval can improve the time. > >>>> >>>> Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com> >>>> --- >>>> fs/btrfs/qgroup.c | 28 +++++++++++++++++----------- >>>> 1 file changed, 17 insertions(+), 11 deletions(-) >>>> >>>> diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c >>>> index 4700cac..9ace407 100644 >>>> --- a/fs/btrfs/qgroup.c >>>> +++ b/fs/btrfs/qgroup.c >>>> @@ -2093,6 +2093,7 @@ static int qgroup_reserve(struct btrfs_root >>>> *root, u64 num_bytes) >>>> struct btrfs_fs_info *fs_info = root->fs_info; >>>> u64 ref_root = root->root_key.objectid; >>>> int ret = 0; >>>> + int retried = 0; >>>> struct ulist_node *unode; >>>> struct ulist_iterator uiter; >>>> >>>> @@ -2101,7 +2102,7 @@ static int qgroup_reserve(struct btrfs_root >>>> *root, u64 num_bytes) >>>> >>>> if (num_bytes == 0) >>>> return 0; >>>> - >>>> +retry: >>>> spin_lock(&fs_info->qgroup_lock); >>>> quota_root = fs_info->quota_root; >>>> if (!quota_root) >>>> @@ -2127,16 +2128,21 @@ static int qgroup_reserve(struct btrfs_root >>>> *root, u64 num_bytes) >>>> >>>> qg = u64_to_ptr(unode->aux); >>>> >>>> - if ((qg->lim_flags & BTRFS_QGROUP_LIMIT_MAX_RFER) && >>>> - qg->reserved + (s64)qg->rfer + num_bytes > >>>> - qg->max_rfer) { >>>> - ret = -EDQUOT; >>>> - goto out; >>>> - } >>>> - >>>> - if ((qg->lim_flags & BTRFS_QGROUP_LIMIT_MAX_EXCL) && >>>> - qg->reserved + (s64)qg->excl + num_bytes > >>>> - qg->max_excl) { >>>> + if (((qg->lim_flags & BTRFS_QGROUP_LIMIT_MAX_RFER) && >>>> + qg->reserved + (s64)qg->rfer + num_bytes > >>>> qg->max_rfer) || >>>> + ((qg->lim_flags & BTRFS_QGROUP_LIMIT_MAX_EXCL) && >>>> + qg->reserved + (s64)qg->excl + num_bytes > >>>> qg->max_excl)) { >>>> + if (!retried && qg->reserved > 0) { > > Does the qg->reserved > 0 check has any extra meaning here? > qg->reserved greater than zero means the qgroup tree is dirty and needs to be committed to disk. It may or may not have quota reductions which can only be confirmed after free_ref_root is called in commit. While it may be a small probability, it will save I/O processing when the quotas have been reached and more is demanded again and again. > If qg->reserved equals to 0, IIRC it won't goes here anyway. > > Or did I miss something? In this loop the checks are performed. qg->reserved is incremented in the next loop once we find all the qgroups to be incremented. Why do you say the instruction pointer will not be in the area if qg->reserved is zero. What if the qgroup tree is read from the disk or committed to disk and the quotas have been met/exceeded? > > Thanks, > Qu > >>>> + struct btrfs_trans_handle *trans; >>>> + spin_unlock(&fs_info->qgroup_lock); >>>> + btrfs_start_delalloc_roots(fs_info, 0, -1); >>>> + trans = btrfs_join_transaction(root); >>>> + if (IS_ERR(trans)) >>>> + return PTR_ERR(trans); >>>> + btrfs_commit_transaction(trans, root); >>>> + retried++; >>>> + goto retry; >>>> + } >>>> ret = -EDQUOT; >>>> goto out; >>>> } >>>> >>> >>> >> > > > -- Goldwyn ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] qgroup: Retry after commit on getting EDQUOT 2017-02-20 12:51 ` Goldwyn Rodrigues @ 2017-02-21 0:48 ` Qu Wenruo 0 siblings, 0 replies; 10+ messages in thread From: Qu Wenruo @ 2017-02-21 0:48 UTC (permalink / raw) To: Goldwyn Rodrigues, Goldwyn Rodrigues, linux-btrfs At 02/20/2017 08:51 PM, Goldwyn Rodrigues wrote: > > > On 02/19/2017 11:35 PM, Qu Wenruo wrote: >> >> >> At 02/20/2017 12:35 PM, Goldwyn Rodrigues wrote: >>> Hi Qu, >>> >>> On 02/19/2017 09:07 PM, Qu Wenruo wrote: >>>> >>>> >>>> At 02/19/2017 09:30 PM, Goldwyn Rodrigues wrote: >>>>> From: Goldwyn Rodrigues <rgoldwyn@suse.com> >>>>> >>>>> We are facing the same problem with EDQUOT which was experienced with >>>>> ENOSPC. Not sure if we require a full ticketing system such as ENOSPC, >>>>> but >>>>> here is a fix. Let me know if it is too big a hammer. >>>>> >>>>> Quotas are reserved during the start of an operation, incrementing >>>>> qg->reserved. However, it is written to disk in a commit_transaction >>>>> which could take as long as commit_interval. In the meantime there >>>>> could be deletions which are not accounted for because deletions are >>>>> accounted for only while committed (free_refroot). So, when we get >>>>> a EDQUOT flush the data to disk and try again. >>>> >>>> IIRC Jeff submitted a qgroup patch to allow unlink to be done even we >>>> already hit EDQUOT. >>>> https://patchwork.kernel.org/patch/9537193/ >>>> >>>> I think that's a better solution, which is quite like what we do to >>>> handle ENOSPC. >>>> >>> >>> These are two separate issues. This issue is where qg->reserved gets >>> over-inflated because of repeated deletions and recreations within a >>> commit_interval. >> >> My fault, that's indeed two different bugs. >> >> And I succeeded to reproduce the bug. >>> >>> Jeff's patch deals with allowing removal even if the quota is exceeded >>> so that eventually there can be space freed. >>> >>> I would suggest you apply Jeff's patch and try to run the script I have >>> presented. >>> >>> >>>> Thanks, >>>> Qu >>>> >>>>> >>>>> I combined the conditions of rfer and excl to reduce code lines, though >>>>> the condition looks uglier. >>>>> >>>>> Here is a sample script which shows this issue. >>>>> >>>>> DEVICE=/dev/vdb >>>>> MOUNTPOINT=/mnt >>>>> TESTVOL=$MOUNTPOINT/tmp >>>>> QUOTA=5 >>>>> PROG=btrfs >>>>> DD_BS="4k" >>>>> DD_COUNT="256" >>>>> RUN_TIMES=5000 >>>>> >>>>> mkfs.btrfs -f $DEVICE >>>>> mount -o commit=240 $DEVICE $MOUNTPOINT >>>>> $PROG subvolume create $TESTVOL >>>>> $PROG quota enable $TESTVOL >>>>> $PROG qgroup limit ${QUOTA}G $TESTVOL >>>>> >>>>> typeset -i DD_RUN_GOOD >>>>> typeset -i QUOTA >>>>> >>>>> function _check_cmd() { >>>>> if [[ ${?} > 0 ]]; then >>>>> echo -n "$(date) E: Running previous command" >>>>> echo ${*} >>>>> echo "Without sync" >>>>> $PROG qgroup show -pcreFf ${TESTVOL} >>>>> echo "With sync" >>>>> $PROG qgroup show -pcreFf --sync ${TESTVOL} >>>>> exit 1 >>>>> fi >>>>> } >>>>> >>>>> while true; do >>>>> DD_RUN_GOOD=$RUN_TIMES >>>>> >>>>> while (( ${DD_RUN_GOOD} != 0 )); do >>>>> dd if=/dev/zero of=${TESTVOL}/quotatest${DD_RUN_GOOD} bs=${DD_BS} >>>>> count=${DD_COUNT} >>>>> _check_cmd "dd if=/dev/zero of=${TESTVOL}/quotatest${DD_RUN_GOOD} >>>>> bs=${DD_BS} count=${DD_COUNT}" >>>>> DD_RUN_GOOD=(${DD_RUN_GOOD}-1) >>>>> done >>>>> >>>>> $PROG qgroup show -pcref $TESTVOL >>>>> echo "----------- Cleanup ---------- " >>>>> rm $TESTVOL/quotatest* >>>>> >>>>> done >> >> It would be better if we can reduce the reproduce time and submit it as >> a fstest test case. > > Yes, unfortunately, it is more probabilistic than deterministic. But > yes, reducing the size and increasing the commit_interval can improve > the time. > >> >>>>> >>>>> Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com> >>>>> --- >>>>> fs/btrfs/qgroup.c | 28 +++++++++++++++++----------- >>>>> 1 file changed, 17 insertions(+), 11 deletions(-) >>>>> >>>>> diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c >>>>> index 4700cac..9ace407 100644 >>>>> --- a/fs/btrfs/qgroup.c >>>>> +++ b/fs/btrfs/qgroup.c >>>>> @@ -2093,6 +2093,7 @@ static int qgroup_reserve(struct btrfs_root >>>>> *root, u64 num_bytes) >>>>> struct btrfs_fs_info *fs_info = root->fs_info; >>>>> u64 ref_root = root->root_key.objectid; >>>>> int ret = 0; >>>>> + int retried = 0; >>>>> struct ulist_node *unode; >>>>> struct ulist_iterator uiter; >>>>> >>>>> @@ -2101,7 +2102,7 @@ static int qgroup_reserve(struct btrfs_root >>>>> *root, u64 num_bytes) >>>>> >>>>> if (num_bytes == 0) >>>>> return 0; >>>>> - >>>>> +retry: >>>>> spin_lock(&fs_info->qgroup_lock); >>>>> quota_root = fs_info->quota_root; >>>>> if (!quota_root) >>>>> @@ -2127,16 +2128,21 @@ static int qgroup_reserve(struct btrfs_root >>>>> *root, u64 num_bytes) >>>>> >>>>> qg = u64_to_ptr(unode->aux); >>>>> >>>>> - if ((qg->lim_flags & BTRFS_QGROUP_LIMIT_MAX_RFER) && >>>>> - qg->reserved + (s64)qg->rfer + num_bytes > >>>>> - qg->max_rfer) { >>>>> - ret = -EDQUOT; >>>>> - goto out; >>>>> - } >>>>> - >>>>> - if ((qg->lim_flags & BTRFS_QGROUP_LIMIT_MAX_EXCL) && >>>>> - qg->reserved + (s64)qg->excl + num_bytes > >>>>> - qg->max_excl) { >>>>> + if (((qg->lim_flags & BTRFS_QGROUP_LIMIT_MAX_RFER) && >>>>> + qg->reserved + (s64)qg->rfer + num_bytes > >>>>> qg->max_rfer) || >>>>> + ((qg->lim_flags & BTRFS_QGROUP_LIMIT_MAX_EXCL) && >>>>> + qg->reserved + (s64)qg->excl + num_bytes > >>>>> qg->max_excl)) { >>>>> + if (!retried && qg->reserved > 0) { >> >> Does the qg->reserved > 0 check has any extra meaning here? >> > > qg->reserved greater than zero means the qgroup tree is dirty and needs > to be committed to disk. It may or may not have quota reductions which > can only be confirmed after free_ref_root is called in commit. While it > may be a small probability, it will save I/O processing when the quotas > have been reached and more is demanded again and again. > >> If qg->reserved equals to 0, IIRC it won't goes here anyway. >> >> Or did I miss something? > > In this loop the checks are performed. qg->reserved is incremented in > the next loop once we find all the qgroups to be incremented. Why do you > say the instruction pointer will not be in the area if qg->reserved is > zero. What if the qgroup tree is read from the disk or committed to disk > and the quotas have been met/exceeded? Right, I forgot that reserve is increased in next loop. Reviewed-by: Qu Wenruo <quwenruo@cn.fujitsu.com> Thanks, Qu > >> >> Thanks, >> Qu >> >>>>> + struct btrfs_trans_handle *trans; >>>>> + spin_unlock(&fs_info->qgroup_lock); >>>>> + btrfs_start_delalloc_roots(fs_info, 0, -1); >>>>> + trans = btrfs_join_transaction(root); >>>>> + if (IS_ERR(trans)) >>>>> + return PTR_ERR(trans); >>>>> + btrfs_commit_transaction(trans, root); >>>>> + retried++; >>>>> + goto retry; >>>>> + } >>>>> ret = -EDQUOT; >>>>> goto out; >>>>> } >>>>> >>>> >>>> >>> >> >> >> > ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] qgroup: Retry after commit on getting EDQUOT 2017-02-19 13:30 [PATCH] qgroup: Retry after commit on getting EDQUOT Goldwyn Rodrigues 2017-02-20 3:07 ` Qu Wenruo @ 2017-03-09 16:54 ` David Sterba 2017-03-10 15:56 ` Goldwyn Rodrigues 2017-03-10 16:06 ` Filipe Manana 2 siblings, 1 reply; 10+ messages in thread From: David Sterba @ 2017-03-09 16:54 UTC (permalink / raw) To: Goldwyn Rodrigues; +Cc: linux-btrfs, Goldwyn Rodrigues On Sun, Feb 19, 2017 at 07:30:59AM -0600, Goldwyn Rodrigues wrote: > From: Goldwyn Rodrigues <rgoldwyn@suse.com> > > We are facing the same problem with EDQUOT which was experienced with > ENOSPC. Not sure if we require a full ticketing system such as ENOSPC, but > here is a fix. Let me know if it is too big a hammer. On first look it does seem too heavy. Do you have numbers what's the slowdown? If it's bearable we can stick with it and implement the ticketed approach. ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] qgroup: Retry after commit on getting EDQUOT 2017-03-09 16:54 ` David Sterba @ 2017-03-10 15:56 ` Goldwyn Rodrigues 0 siblings, 0 replies; 10+ messages in thread From: Goldwyn Rodrigues @ 2017-03-10 15:56 UTC (permalink / raw) To: dsterba, linux-btrfs, Goldwyn Rodrigues On 03/09/2017 10:54 AM, David Sterba wrote: > On Sun, Feb 19, 2017 at 07:30:59AM -0600, Goldwyn Rodrigues wrote: >> From: Goldwyn Rodrigues <rgoldwyn@suse.com> >> >> We are facing the same problem with EDQUOT which was experienced with >> ENOSPC. Not sure if we require a full ticketing system such as ENOSPC, but >> here is a fix. Let me know if it is too big a hammer. > > On first look it does seem too heavy. Do you have numbers what's the > slowdown? If it's bearable we can stick with it and implement the > ticketed approach. > No, I don't have numbers because it is the case of when it reaches low on space only. When it meets the condition to return EDQUOUT, tt starts the commit in order to get the data to disk, or more importantly the discards, and get the true value to check. OTOH, a ticketing system will also do the same, but it will restart the process when it is assured that there is enough to continue. The current ticketing system is designed for ENOSPC or the whole filesystem tree. -- Goldwyn ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] qgroup: Retry after commit on getting EDQUOT 2017-02-19 13:30 [PATCH] qgroup: Retry after commit on getting EDQUOT Goldwyn Rodrigues 2017-02-20 3:07 ` Qu Wenruo 2017-03-09 16:54 ` David Sterba @ 2017-03-10 16:06 ` Filipe Manana 2017-03-10 19:31 ` Goldwyn Rodrigues 2 siblings, 1 reply; 10+ messages in thread From: Filipe Manana @ 2017-03-10 16:06 UTC (permalink / raw) To: Goldwyn Rodrigues; +Cc: linux-btrfs, Goldwyn Rodrigues On Sun, Feb 19, 2017 at 1:30 PM, Goldwyn Rodrigues <rgoldwyn@suse.de> wrote: > From: Goldwyn Rodrigues <rgoldwyn@suse.com> > > We are facing the same problem with EDQUOT which was experienced with > ENOSPC. Not sure if we require a full ticketing system such as ENOSPC, but > here is a fix. Let me know if it is too big a hammer. > > Quotas are reserved during the start of an operation, incrementing > qg->reserved. However, it is written to disk in a commit_transaction > which could take as long as commit_interval. In the meantime there > could be deletions which are not accounted for because deletions are > accounted for only while committed (free_refroot). So, when we get > a EDQUOT flush the data to disk and try again. > > I combined the conditions of rfer and excl to reduce code lines, though > the condition looks uglier. > > Here is a sample script which shows this issue. > > DEVICE=/dev/vdb > MOUNTPOINT=/mnt > TESTVOL=$MOUNTPOINT/tmp > QUOTA=5 > PROG=btrfs > DD_BS="4k" > DD_COUNT="256" > RUN_TIMES=5000 > > mkfs.btrfs -f $DEVICE > mount -o commit=240 $DEVICE $MOUNTPOINT > $PROG subvolume create $TESTVOL > $PROG quota enable $TESTVOL > $PROG qgroup limit ${QUOTA}G $TESTVOL > > typeset -i DD_RUN_GOOD > typeset -i QUOTA > > function _check_cmd() { > if [[ ${?} > 0 ]]; then > echo -n "$(date) E: Running previous command" > echo ${*} > echo "Without sync" > $PROG qgroup show -pcreFf ${TESTVOL} > echo "With sync" > $PROG qgroup show -pcreFf --sync ${TESTVOL} > exit 1 > fi > } > > while true; do > DD_RUN_GOOD=$RUN_TIMES > > while (( ${DD_RUN_GOOD} != 0 )); do > dd if=/dev/zero of=${TESTVOL}/quotatest${DD_RUN_GOOD} bs=${DD_BS} count=${DD_COUNT} > _check_cmd "dd if=/dev/zero of=${TESTVOL}/quotatest${DD_RUN_GOOD} bs=${DD_BS} count=${DD_COUNT}" > DD_RUN_GOOD=(${DD_RUN_GOOD}-1) > done > > $PROG qgroup show -pcref $TESTVOL > echo "----------- Cleanup ---------- " > rm $TESTVOL/quotatest* > > done > > Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com> > --- > fs/btrfs/qgroup.c | 28 +++++++++++++++++----------- > 1 file changed, 17 insertions(+), 11 deletions(-) > > diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c > index 4700cac..9ace407 100644 > --- a/fs/btrfs/qgroup.c > +++ b/fs/btrfs/qgroup.c > @@ -2093,6 +2093,7 @@ static int qgroup_reserve(struct btrfs_root *root, u64 num_bytes) > struct btrfs_fs_info *fs_info = root->fs_info; > u64 ref_root = root->root_key.objectid; > int ret = 0; > + int retried = 0; > struct ulist_node *unode; > struct ulist_iterator uiter; > > @@ -2101,7 +2102,7 @@ static int qgroup_reserve(struct btrfs_root *root, u64 num_bytes) > > if (num_bytes == 0) > return 0; > - > +retry: > spin_lock(&fs_info->qgroup_lock); > quota_root = fs_info->quota_root; > if (!quota_root) > @@ -2127,16 +2128,21 @@ static int qgroup_reserve(struct btrfs_root *root, u64 num_bytes) > > qg = u64_to_ptr(unode->aux); > > - if ((qg->lim_flags & BTRFS_QGROUP_LIMIT_MAX_RFER) && > - qg->reserved + (s64)qg->rfer + num_bytes > > - qg->max_rfer) { > - ret = -EDQUOT; > - goto out; > - } > - > - if ((qg->lim_flags & BTRFS_QGROUP_LIMIT_MAX_EXCL) && > - qg->reserved + (s64)qg->excl + num_bytes > > - qg->max_excl) { > + if (((qg->lim_flags & BTRFS_QGROUP_LIMIT_MAX_RFER) && > + qg->reserved + (s64)qg->rfer + num_bytes > qg->max_rfer) || > + ((qg->lim_flags & BTRFS_QGROUP_LIMIT_MAX_EXCL) && > + qg->reserved + (s64)qg->excl + num_bytes > qg->max_excl)) { > + if (!retried && qg->reserved > 0) { > + struct btrfs_trans_handle *trans; > + spin_unlock(&fs_info->qgroup_lock); > + btrfs_start_delalloc_roots(fs_info, 0, -1); Only seeing this now, because the thread was bumped. But do you really need to flush delayed writes for all subvolumes? Why isn't it enough to do it just for the subvolume we are attempting to write to? Just flushing the data for that subvolume (root) should be enough, as we do when creating snapshots for example (ioctl.c:create_snapshot()). thanks > + trans = btrfs_join_transaction(root); > + if (IS_ERR(trans)) > + return PTR_ERR(trans); > + btrfs_commit_transaction(trans, root); > + retried++; > + goto retry; > + } > ret = -EDQUOT; > goto out; > } > -- > 2.10.2 > > -- > To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- Filipe David Manana, "People will forget what you said, people will forget what you did, but people will never forget how you made them feel." ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] qgroup: Retry after commit on getting EDQUOT 2017-03-10 16:06 ` Filipe Manana @ 2017-03-10 19:31 ` Goldwyn Rodrigues 0 siblings, 0 replies; 10+ messages in thread From: Goldwyn Rodrigues @ 2017-03-10 19:31 UTC (permalink / raw) To: fdmanana; +Cc: linux-btrfs, Goldwyn Rodrigues On 03/10/2017 10:06 AM, Filipe Manana wrote: > On Sun, Feb 19, 2017 at 1:30 PM, Goldwyn Rodrigues <rgoldwyn@suse.de> wrote: >> From: Goldwyn Rodrigues <rgoldwyn@suse.com> >> >> We are facing the same problem with EDQUOT which was experienced with >> ENOSPC. Not sure if we require a full ticketing system such as ENOSPC, but >> here is a fix. Let me know if it is too big a hammer. >> >> Quotas are reserved during the start of an operation, incrementing >> qg->reserved. However, it is written to disk in a commit_transaction >> which could take as long as commit_interval. In the meantime there >> could be deletions which are not accounted for because deletions are >> accounted for only while committed (free_refroot). So, when we get >> a EDQUOT flush the data to disk and try again. >> >> I combined the conditions of rfer and excl to reduce code lines, though >> the condition looks uglier. >> >> Here is a sample script which shows this issue. >> >> DEVICE=/dev/vdb >> MOUNTPOINT=/mnt >> TESTVOL=$MOUNTPOINT/tmp >> QUOTA=5 >> PROG=btrfs >> DD_BS="4k" >> DD_COUNT="256" >> RUN_TIMES=5000 >> >> mkfs.btrfs -f $DEVICE >> mount -o commit=240 $DEVICE $MOUNTPOINT >> $PROG subvolume create $TESTVOL >> $PROG quota enable $TESTVOL >> $PROG qgroup limit ${QUOTA}G $TESTVOL >> >> typeset -i DD_RUN_GOOD >> typeset -i QUOTA >> >> function _check_cmd() { >> if [[ ${?} > 0 ]]; then >> echo -n "$(date) E: Running previous command" >> echo ${*} >> echo "Without sync" >> $PROG qgroup show -pcreFf ${TESTVOL} >> echo "With sync" >> $PROG qgroup show -pcreFf --sync ${TESTVOL} >> exit 1 >> fi >> } >> >> while true; do >> DD_RUN_GOOD=$RUN_TIMES >> >> while (( ${DD_RUN_GOOD} != 0 )); do >> dd if=/dev/zero of=${TESTVOL}/quotatest${DD_RUN_GOOD} bs=${DD_BS} count=${DD_COUNT} >> _check_cmd "dd if=/dev/zero of=${TESTVOL}/quotatest${DD_RUN_GOOD} bs=${DD_BS} count=${DD_COUNT}" >> DD_RUN_GOOD=(${DD_RUN_GOOD}-1) >> done >> >> $PROG qgroup show -pcref $TESTVOL >> echo "----------- Cleanup ---------- " >> rm $TESTVOL/quotatest* >> >> done >> >> Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com> >> --- >> fs/btrfs/qgroup.c | 28 +++++++++++++++++----------- >> 1 file changed, 17 insertions(+), 11 deletions(-) >> >> diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c >> index 4700cac..9ace407 100644 >> --- a/fs/btrfs/qgroup.c >> +++ b/fs/btrfs/qgroup.c >> @@ -2093,6 +2093,7 @@ static int qgroup_reserve(struct btrfs_root *root, u64 num_bytes) >> struct btrfs_fs_info *fs_info = root->fs_info; >> u64 ref_root = root->root_key.objectid; >> int ret = 0; >> + int retried = 0; >> struct ulist_node *unode; >> struct ulist_iterator uiter; >> >> @@ -2101,7 +2102,7 @@ static int qgroup_reserve(struct btrfs_root *root, u64 num_bytes) >> >> if (num_bytes == 0) >> return 0; >> - >> +retry: >> spin_lock(&fs_info->qgroup_lock); >> quota_root = fs_info->quota_root; >> if (!quota_root) >> @@ -2127,16 +2128,21 @@ static int qgroup_reserve(struct btrfs_root *root, u64 num_bytes) >> >> qg = u64_to_ptr(unode->aux); >> >> - if ((qg->lim_flags & BTRFS_QGROUP_LIMIT_MAX_RFER) && >> - qg->reserved + (s64)qg->rfer + num_bytes > >> - qg->max_rfer) { >> - ret = -EDQUOT; >> - goto out; >> - } >> - >> - if ((qg->lim_flags & BTRFS_QGROUP_LIMIT_MAX_EXCL) && >> - qg->reserved + (s64)qg->excl + num_bytes > >> - qg->max_excl) { >> + if (((qg->lim_flags & BTRFS_QGROUP_LIMIT_MAX_RFER) && >> + qg->reserved + (s64)qg->rfer + num_bytes > qg->max_rfer) || >> + ((qg->lim_flags & BTRFS_QGROUP_LIMIT_MAX_EXCL) && >> + qg->reserved + (s64)qg->excl + num_bytes > qg->max_excl)) { >> + if (!retried && qg->reserved > 0) { >> + struct btrfs_trans_handle *trans; >> + spin_unlock(&fs_info->qgroup_lock); >> + btrfs_start_delalloc_roots(fs_info, 0, -1); > > Only seeing this now, because the thread was bumped. > > But do you really need to flush delayed writes for all subvolumes? > Why isn't it enough to do it just for the subvolume we are attempting > to write to? > Just flushing the data for that subvolume (root) should be enough, as > we do when creating snapshots for example (ioctl.c:create_snapshot()). > Yes, you're right. That will be much better and will reduce the amount of time taken in performing the flush. Thanks! > >> + trans = btrfs_join_transaction(root); >> + if (IS_ERR(trans)) >> + return PTR_ERR(trans); >> + btrfs_commit_transaction(trans, root); >> + retried++; >> + goto retry; >> + } >> ret = -EDQUOT; >> goto out; >> } >> -- >> 2.10.2 >> >> -- >> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in >> the body of a message to majordomo@vger.kernel.org >> More majordomo info at http://vger.kernel.org/majordomo-info.html > > > -- Goldwyn ^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2017-03-10 19:31 UTC | newest] Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2017-02-19 13:30 [PATCH] qgroup: Retry after commit on getting EDQUOT Goldwyn Rodrigues 2017-02-20 3:07 ` Qu Wenruo 2017-02-20 4:35 ` Goldwyn Rodrigues 2017-02-20 5:35 ` Qu Wenruo 2017-02-20 12:51 ` Goldwyn Rodrigues 2017-02-21 0:48 ` Qu Wenruo 2017-03-09 16:54 ` David Sterba 2017-03-10 15:56 ` Goldwyn Rodrigues 2017-03-10 16:06 ` Filipe Manana 2017-03-10 19:31 ` Goldwyn Rodrigues
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.