[PATCH 0/3] ext4: don't use quota reservation for speculative metadata blocks

All of lore.kernel.org
 help / color / mirror / Atom feed

* [PATCH 0/3] ext4: don't use quota reservation for speculative metadata blocks
@ 2010-04-07 21:45 Eric Sandeen
  2010-04-07 21:49 ` [PATCH 1/3] quota: use flags interface for dquot alloc/free space Eric Sandeen
                   ` (4 more replies)
  0 siblings, 5 replies; 26+ messages in thread
From: Eric Sandeen @ 2010-04-07 21:45 UTC (permalink / raw)
  To: ext4 development

Because we can badly over-reserve metadata when we
calculate worst-case, it complicates things for quota, since
we must reserve and then claim later, retry on EDQUOT, etc.
Quota is also a generally smaller pool than fs free blocks,
so this over-reservation hurts more, and more often.

I'm of the opinion that it's not the worst thing to allow
metadata to push a user slightly over quota.  This simplifies
the code and avoids the false quota rejections that result
from worst-case speculation.

This patch series stops the speculative quota-charging for
worst-case metadata requirements, and just charges quota
when the blocks are allocated at writeout.  It also is
able to remove the try-again loop on EDQUOT.

The first 2 patches are quota infrastructure changes,
to change __dquot_alloc/free_space to take a flags argument,
and then add a NOFAIL option, so that metadata writes which
tip us over quota can proceed.

The last patch makes the ext4 changes.  The whole batch
has been tested indirectly by running the xfstests suite
with a hack to mount & enable quota prior to the test.

I also did a more specific test of fragmenting freespace
and then doing a large delalloc write under quota; quota
stopped me at the right amount of file IO, and then the
writeout generated enough metadata (due to the fragmentation)
that it put me slightly over quota, as expected.

Thanks,
-Eric

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH 1/3] quota: use flags interface for dquot alloc/free space
  2010-04-07 21:45 [PATCH 0/3] ext4: don't use quota reservation for speculative metadata blocks Eric Sandeen
@ 2010-04-07 21:49 ` Eric Sandeen
  2010-04-12 13:22   ` Jan Kara
  2010-05-20 22:29   ` tytso
  2010-04-07 21:52 ` [PATCH 2/3] quota: add the option to not fail with EDQUOT in block allocation Eric Sandeen
                   ` (3 subsequent siblings)
  4 siblings, 2 replies; 26+ messages in thread
From: Eric Sandeen @ 2010-04-07 21:49 UTC (permalink / raw)
  To: ext4 development

Switch __dquot_alloc_space and __dquot_free_space to take flags
to indicate whether to warn and/or to reserve (or free reserve).

This is slightly more readable at the callpoints, and makes it
cleaner to add a "nofail" option in the next patch.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
---

diff --git a/fs/quota/dquot.c b/fs/quota/dquot.c
index e0b870f..8436d9b 100644
--- a/fs/quota/dquot.c
+++ b/fs/quota/dquot.c
@@ -1484,11 +1484,12 @@ static void inode_decr_space(struct inode *inode, qsize_t number, int reserve)
 /*
  * This operation can block, but only after everything is updated
  */
-int __dquot_alloc_space(struct inode *inode, qsize_t number,
-		int warn, int reserve)
+int __dquot_alloc_space(struct inode *inode, qsize_t number, int flags)
 {
 	int cnt, ret = 0;
 	char warntype[MAXQUOTAS];
+	int warn = flags & DQUOT_SPACE_WARN;
+	int reserve = flags & DQUOT_SPACE_RESERVE;
 
 	/*
 	 * First test before acquiring mutex - solves deadlocks when we
@@ -1608,10 +1609,11 @@ EXPORT_SYMBOL(dquot_claim_space_nodirty);
 /*
  * This operation can block, but only after everything is updated
  */
-void __dquot_free_space(struct inode *inode, qsize_t number, int reserve)
+void __dquot_free_space(struct inode *inode, qsize_t number, int flags)
 {
 	unsigned int cnt;
 	char warntype[MAXQUOTAS];
+	int reserve = flags & DQUOT_SPACE_RESERVE;
 
 	/* First test before acquiring mutex - solves deadlocks when we
          * re-enter the quota code and are already holding the mutex */
diff --git a/include/linux/quotaops.h b/include/linux/quotaops.h
index e6fa7ac..daa106f 100644
--- a/include/linux/quotaops.h
+++ b/include/linux/quotaops.h
@@ -14,6 +14,9 @@ static inline struct quota_info *sb_dqopt(struct super_block *sb)
 	return &sb->s_dquot;
 }
 
+#define DQUOT_SPACE_WARN	0x1
+#define DQUOT_SPACE_RESERVE	0x2
+
 #if defined(CONFIG_QUOTA)
 
 /*
@@ -33,9 +36,8 @@ int dquot_scan_active(struct super_block *sb,
 struct dquot *dquot_alloc(struct super_block *sb, int type);
 void dquot_destroy(struct dquot *dquot);
 
-int __dquot_alloc_space(struct inode *inode, qsize_t number,
-		int warn, int reserve);
-void __dquot_free_space(struct inode *inode, qsize_t number, int reserve);
+int __dquot_alloc_space(struct inode *inode, qsize_t number, int flags);
+void __dquot_free_space(struct inode *inode, qsize_t number, int flags);
 
 int dquot_alloc_inode(const struct inode *inode);
 
@@ -231,17 +233,17 @@ static inline int dquot_transfer(struct inode *inode, struct iattr *iattr)
 }
 
 static inline int __dquot_alloc_space(struct inode *inode, qsize_t number,
-		int warn, int reserve)
+		int flags)
 {
-	if (!reserve)
+	if (!(flags & DQUOT_SPACE_RESERVE))
 		inode_add_bytes(inode, number);
 	return 0;
 }
 
 static inline void __dquot_free_space(struct inode *inode, qsize_t number,
-		int reserve)
+		int flags)
 {
-	if (!reserve)
+	if (!(flags & DQUOT_SPACE_RESERVE))
 		inode_sub_bytes(inode, number);
 }
 
@@ -257,7 +259,7 @@ static inline int dquot_claim_space_nodirty(struct inode *inode, qsize_t number)
 
 static inline int dquot_alloc_space_nodirty(struct inode *inode, qsize_t nr)
 {
-	return __dquot_alloc_space(inode, nr, 1, 0);
+	return __dquot_alloc_space(inode, nr, DQUOT_SPACE_WARN);
 }
 
 static inline int dquot_alloc_space(struct inode *inode, qsize_t nr)
@@ -282,7 +284,7 @@ static inline int dquot_alloc_block(struct inode *inode, qsize_t nr)
 
 static inline int dquot_prealloc_block_nodirty(struct inode *inode, qsize_t nr)
 {
-	return __dquot_alloc_space(inode, nr << inode->i_blkbits, 0, 0);
+	return __dquot_alloc_space(inode, nr << inode->i_blkbits, 0);
 }
 
 static inline int dquot_prealloc_block(struct inode *inode, qsize_t nr)
@@ -297,7 +299,8 @@ static inline int dquot_prealloc_block(struct inode *inode, qsize_t nr)
 
 static inline int dquot_reserve_block(struct inode *inode, qsize_t nr)
 {
-	return __dquot_alloc_space(inode, nr << inode->i_blkbits, 1, 1);
+	return __dquot_alloc_space(inode, nr << inode->i_blkbits,
+				DQUOT_SPACE_WARN|DQUOT_SPACE_RESERVE);
 }
 
 static inline int dquot_claim_block(struct inode *inode, qsize_t nr)
@@ -334,7 +337,7 @@ static inline void dquot_free_block(struct inode *inode, qsize_t nr)
 static inline void dquot_release_reservation_block(struct inode *inode,
 		qsize_t nr)
 {
-	__dquot_free_space(inode, nr << inode->i_blkbits, 1);
+	__dquot_free_space(inode, nr << inode->i_blkbits, DQUOT_SPACE_RESERVE);
 }
 
 #endif /* _LINUX_QUOTAOPS_ */


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 2/3] quota: add the option to not fail with EDQUOT in block allocation
  2010-04-07 21:45 [PATCH 0/3] ext4: don't use quota reservation for speculative metadata blocks Eric Sandeen
  2010-04-07 21:49 ` [PATCH 1/3] quota: use flags interface for dquot alloc/free space Eric Sandeen
@ 2010-04-07 21:52 ` Eric Sandeen
  2010-04-12 13:23   ` Jan Kara
  2010-05-20 22:32   ` tytso
  2010-04-07 21:55 ` [PATCH 3/3] ext4: don't use quota reservation for speculative metadata blocks Eric Sandeen
                   ` (2 subsequent siblings)
  4 siblings, 2 replies; 26+ messages in thread
From: Eric Sandeen @ 2010-04-07 21:52 UTC (permalink / raw)
  To: ext4 development

To simplify metadata tracking for delalloc writes, ext4
will simply claim metadata blocks at allocation time, without
first speculatively reserving the worst case and then freeing
what was not used.

To do this, we need a mechanism to track allocations in
the quota subsystem, but potentially allow that allocation
to actually go over quota.

This patch adds a DQUOT_SPACE_NOFAIL flag and function
variants for this purpose.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
---

diff --git a/fs/quota/dquot.c b/fs/quota/dquot.c
index 8436d9b..c38b137 100644
--- a/fs/quota/dquot.c
+++ b/fs/quota/dquot.c
@@ -1490,6 +1490,7 @@ int __dquot_alloc_space(struct inode *inode, qsize_t number, int flags)
 	char warntype[MAXQUOTAS];
 	int warn = flags & DQUOT_SPACE_WARN;
 	int reserve = flags & DQUOT_SPACE_RESERVE;
+	int nofail = flags & DQUOT_SPACE_NOFAIL;
 
 	/*
 	 * First test before acquiring mutex - solves deadlocks when we
@@ -1510,7 +1511,7 @@ int __dquot_alloc_space(struct inode *inode, qsize_t number, int flags)
 			continue;
 		ret = check_bdq(inode->i_dquot[cnt], number, !warn,
 				warntype+cnt);
-		if (ret) {
+		if (ret && !nofail) {
 			spin_unlock(&dq_data_lock);
 			goto out_flush_warn;
 		}
diff --git a/include/linux/quotaops.h b/include/linux/quotaops.h
index daa106f..8f858ce 100644
--- a/include/linux/quotaops.h
+++ b/include/linux/quotaops.h
@@ -16,6 +16,7 @@ static inline struct quota_info *sb_dqopt(struct super_block *sb)
 
 #define DQUOT_SPACE_WARN	0x1
 #define DQUOT_SPACE_RESERVE	0x2
+#define DQUOT_SPACE_NOFAIL	0x4
 
 #if defined(CONFIG_QUOTA)
 
@@ -262,6 +263,12 @@ static inline int dquot_alloc_space_nodirty(struct inode *inode, qsize_t nr)
 	return __dquot_alloc_space(inode, nr, DQUOT_SPACE_WARN);
 }
 
+static inline void dquot_alloc_space_nofail(struct inode *inode, qsize_t nr)
+{
+	__dquot_alloc_space(inode, nr, DQUOT_SPACE_WARN|DQUOT_SPACE_NOFAIL);
+	mark_inode_dirty(inode);
+}
+
 static inline int dquot_alloc_space(struct inode *inode, qsize_t nr)
 {
 	int ret;
@@ -277,6 +284,11 @@ static inline int dquot_alloc_block_nodirty(struct inode *inode, qsize_t nr)
 	return dquot_alloc_space_nodirty(inode, nr << inode->i_blkbits);
 }
 
+static inline void dquot_alloc_block_nofail(struct inode *inode, qsize_t nr)
+{
+	dquot_alloc_space_nofail(inode, nr << inode->i_blkbits);
+}
+
 static inline int dquot_alloc_block(struct inode *inode, qsize_t nr)
 {
 	return dquot_alloc_space(inode, nr << inode->i_blkbits);


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 3/3] ext4: don't use quota reservation for speculative metadata blocks
  2010-04-07 21:45 [PATCH 0/3] ext4: don't use quota reservation for speculative metadata blocks Eric Sandeen
  2010-04-07 21:49 ` [PATCH 1/3] quota: use flags interface for dquot alloc/free space Eric Sandeen
  2010-04-07 21:52 ` [PATCH 2/3] quota: add the option to not fail with EDQUOT in block allocation Eric Sandeen
@ 2010-04-07 21:55 ` Eric Sandeen
  2010-04-12 13:36   ` Jan Kara
  2010-05-20 23:10   ` tytso
  2010-04-08  8:20 ` [PATCH 0/3] " Dmitry Monakhov
  2010-04-12  2:20 ` tytso
  4 siblings, 2 replies; 26+ messages in thread
From: Eric Sandeen @ 2010-04-07 21:55 UTC (permalink / raw)
  To: ext4 development

Because we can badly over-reserve metadata when we
calculate worst-case, it complicates things for quota, since
we must reserve and then claim later, retry on EDQUOT, etc.
Quota is also a generally smaller pool than fs free blocks,
so this over-reservation hurts more, and more often.

I'm of the opinion that it's not the worst thing to allow
metadata to push a user slightly over quota.  This simplifies
the code and avoids the false quota rejections that result
from worst-case speculation.

This patch stops the speculative quota-charging for
worst-case metadata requirements, and just charges quota
when the blocks are allocated at writeout.  It also is
able to remove the try-again loop on EDQUOT.

This patch has been tested indirectly by running the xfstests
suite with a hack to mount & enable quota prior to the test.

I also did a more specific test of fragmenting freespace
and then doing a large delalloc write under quota; quota
stopped me at the right amount of file IO, and then the
writeout generated enough metadata (due to the fragmentation)
that it put me slightly over quota, as expected.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
---

 balloc.c |    5 ++--
 inode.c  |   64 ++++++++++++++++++++-------------------------------------------
 2 files changed, 24 insertions(+), 45 deletions(-)


diff --git a/fs/ext4/balloc.c b/fs/ext4/balloc.c
index d2f37a5..95b7594 100644
--- a/fs/ext4/balloc.c
+++ b/fs/ext4/balloc.c
@@ -591,14 +591,15 @@ ext4_fsblk_t ext4_new_meta_blocks(handle_t *handle, struct inode *inode,
 	ret = ext4_mb_new_blocks(handle, &ar, errp);
 	if (count)
 		*count = ar.len;
-
 	/*
-	 * Account for the allocated meta blocks
+	 * Account for the allocated meta blocks.  We will never
+	 * fail EDQUOT for metdata, but we do account for it.
 	 */
 	if (!(*errp) && EXT4_I(inode)->i_delalloc_reserved_flag) {
 		spin_lock(&EXT4_I(inode)->i_block_reservation_lock);
 		EXT4_I(inode)->i_allocated_meta_blocks += ar.len;
 		spin_unlock(&EXT4_I(inode)->i_block_reservation_lock);
+		dquot_alloc_block_nofail(inode, ar.len);
 	}
 	return ret;
 }
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 5381802..dba674a 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -1076,7 +1076,6 @@ void ext4_da_update_reserve_space(struct inode *inode,
 {
 	struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
 	struct ext4_inode_info *ei = EXT4_I(inode);
-	int mdb_free = 0, allocated_meta_blocks = 0;
 
 	spin_lock(&ei->i_block_reservation_lock);
 	trace_ext4_da_update_reserve_space(inode, used);
@@ -1091,11 +1090,10 @@ void ext4_da_update_reserve_space(struct inode *inode,
 
 	/* Update per-inode reservations */
 	ei->i_reserved_data_blocks -= used;
-	used += ei->i_allocated_meta_blocks;
 	ei->i_reserved_meta_blocks -= ei->i_allocated_meta_blocks;
-	allocated_meta_blocks = ei->i_allocated_meta_blocks;
+	percpu_counter_sub(&sbi->s_dirtyblocks_counter,
+			   used + ei->i_allocated_meta_blocks);
 	ei->i_allocated_meta_blocks = 0;
-	percpu_counter_sub(&sbi->s_dirtyblocks_counter, used);
 
 	if (ei->i_reserved_data_blocks == 0) {
 		/*
@@ -1103,30 +1101,23 @@ void ext4_da_update_reserve_space(struct inode *inode,
 		 * only when we have written all of the delayed
 		 * allocation blocks.
 		 */
-		mdb_free = ei->i_reserved_meta_blocks;
+		percpu_counter_sub(&sbi->s_dirtyblocks_counter,
+				   ei->i_reserved_meta_blocks);
 		ei->i_reserved_meta_blocks = 0;
 		ei->i_da_metadata_calc_len = 0;
-		percpu_counter_sub(&sbi->s_dirtyblocks_counter, mdb_free);
 	}
 	spin_unlock(&EXT4_I(inode)->i_block_reservation_lock);
 
-	/* Update quota subsystem */
+	/* Update quota subsystem for data blocks */
 	if (quota_claim) {
 		dquot_claim_block(inode, used);
-		if (mdb_free)
-			dquot_release_reservation_block(inode, mdb_free);
 	} else {
 		/*
 		 * We did fallocate with an offset that is already delayed
 		 * allocated. So on delayed allocated writeback we should
-		 * not update the quota for allocated blocks. But then
-		 * converting an fallocate region to initialized region would
-		 * have caused a metadata allocation. So claim quota for
-		 * that
+		 * not re-claim the quota for fallocated blocks.
 		 */
-		if (allocated_meta_blocks)
-			dquot_claim_block(inode, allocated_meta_blocks);
-		dquot_release_reservation_block(inode, mdb_free + used);
+		dquot_release_reservation_block(inode, used);
 	}
 
 	/*
@@ -1860,7 +1851,7 @@ static int ext4_da_reserve_space(struct inode *inode, sector_t lblock)
 	int retries = 0;
 	struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
 	struct ext4_inode_info *ei = EXT4_I(inode);
-	unsigned long md_needed, md_reserved;
+	unsigned long md_needed;
 	int ret;
 
 	/*
@@ -1870,22 +1861,24 @@ static int ext4_da_reserve_space(struct inode *inode, sector_t lblock)
 	 */
 repeat:
 	spin_lock(&ei->i_block_reservation_lock);
-	md_reserved = ei->i_reserved_meta_blocks;
 	md_needed = ext4_calc_metadata_amount(inode, lblock);
 	trace_ext4_da_reserve_space(inode, md_needed);
 	spin_unlock(&ei->i_block_reservation_lock);
 
 	/*
-	 * Make quota reservation here to prevent quota overflow
-	 * later. Real quota accounting is done at pages writeout
-	 * time.
+	 * We will charge metadata quota at writeout time; this saves
+	 * us from metadata over-estimation, though we may go over by
+	 * a small amount in the end.  Here we just reserve for data.
 	 */
-	ret = dquot_reserve_block(inode, md_needed + 1);
+	ret = dquot_reserve_block(inode, 1);
 	if (ret)
 		return ret;
-
+	/*
+	 * We do still charge estimated metadata to the sb though;
+	 * we cannot afford to run out of free blocks.
+	 */
 	if (ext4_claim_free_blocks(sbi, md_needed + 1)) {
-		dquot_release_reservation_block(inode, md_needed + 1);
+		dquot_release_reservation_block(inode, 1);
 		if (ext4_should_retry_alloc(inode->i_sb, &retries)) {
 			yield();
 			goto repeat;
@@ -1932,12 +1925,13 @@ static void ext4_da_release_space(struct inode *inode, int to_free)
 		 * only when we have written all of the delayed
 		 * allocation blocks.
 		 */
-		to_free += ei->i_reserved_meta_blocks;
+		percpu_counter_sub(&sbi->s_dirtyblocks_counter,
+				   ei->i_reserved_meta_blocks);
 		ei->i_reserved_meta_blocks = 0;
 		ei->i_da_metadata_calc_len = 0;
 	}
 
-	/* update fs dirty blocks counter */
+	/* update fs dirty data blocks counter */
 	percpu_counter_sub(&sbi->s_dirtyblocks_counter, to_free);
 
 	spin_unlock(&EXT4_I(inode)->i_block_reservation_lock);
@@ -3076,7 +3070,7 @@ static int ext4_da_write_begin(struct file *file, struct address_space *mapping,
 			       loff_t pos, unsigned len, unsigned flags,
 			       struct page **pagep, void **fsdata)
 {
-	int ret, retries = 0, quota_retries = 0;
+	int ret, retries = 0;
 	struct page *page;
 	pgoff_t index;
 	unsigned from, to;
@@ -3135,22 +3129,6 @@ retry:
 
 	if (ret == -ENOSPC && ext4_should_retry_alloc(inode->i_sb, &retries))
 		goto retry;

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH 0/3] ext4: don't use quota reservation for speculative metadata blocks
  2010-04-07 21:45 [PATCH 0/3] ext4: don't use quota reservation for speculative metadata blocks Eric Sandeen
                   ` (2 preceding siblings ...)
  2010-04-07 21:55 ` [PATCH 3/3] ext4: don't use quota reservation for speculative metadata blocks Eric Sandeen
@ 2010-04-08  8:20 ` Dmitry Monakhov
  2010-04-08 15:28   ` Eric Sandeen
  2010-04-12  2:20 ` tytso
  4 siblings, 1 reply; 26+ messages in thread
From: Dmitry Monakhov @ 2010-04-08  8:20 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: ext4 development

Eric Sandeen <sandeen@redhat.com> writes:

> Because we can badly over-reserve metadata when we
> calculate worst-case, it complicates things for quota, since
> we must reserve and then claim later, retry on EDQUOT, etc.
> Quota is also a generally smaller pool than fs free blocks,
> so this over-reservation hurts more, and more often.
>
> I'm of the opinion that it's not the worst thing to allow
> metadata to push a user slightly over quota.  This simplifies
> the code and avoids the false quota rejections that result
> from worst-case speculation.
Hm.. Totally agree with issue description. And seem there is no another
solution except yours.
ASAIU alloc_nofail is called from places where it is impossible to fail
an allocation even if something goes wrong.
I ask because currently i'm working on EIO handling in alloc/free calls.
I've found that it is useless to fail claim/free procedures because
caller is unable to handle it properly.
It is impossible to fail following operation
->writepage
 ->dquot_claim_space (what to do if EIO happens?)
Seems that your nofail operations has same semantics because it is just
an equivalent of claim, but without reservation.
>
> This patch series stops the speculative quota-charging for
> worst-case metadata requirements, and just charges quota
> when the blocks are allocated at writeout.  It also is
> able to remove the try-again loop on EDQUOT.
>
> The first 2 patches are quota infrastructure changes,
> to change __dquot_alloc/free_space to take a flags argument,
> and then add a NOFAIL option, so that metadata writes which
> tip us over quota can proceed.
>
> The last patch makes the ext4 changes.  The whole batch
> has been tested indirectly by running the xfstests suite
> with a hack to mount & enable quota prior to the test.
>
> I also did a more specific test of fragmenting freespace
> and then doing a large delalloc write under quota; quota
> stopped me at the right amount of file IO, and then the
> writeout generated enough metadata (due to the fragmentation)
> that it put me slightly over quota, as expected.
>
> Thanks,
> -Eric
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 0/3] ext4: don't use quota reservation for speculative metadata blocks
  2010-04-08  8:20 ` [PATCH 0/3] " Dmitry Monakhov
@ 2010-04-08 15:28   ` Eric Sandeen
  2010-04-11  9:37     ` Dmitry Monakhov
  0 siblings, 1 reply; 26+ messages in thread
From: Eric Sandeen @ 2010-04-08 15:28 UTC (permalink / raw)
  To: Dmitry Monakhov; +Cc: ext4 development

Dmitry Monakhov wrote:
> Eric Sandeen <sandeen@redhat.com> writes:
> 
>> Because we can badly over-reserve metadata when we
>> calculate worst-case, it complicates things for quota, since
>> we must reserve and then claim later, retry on EDQUOT, etc.
>> Quota is also a generally smaller pool than fs free blocks,
>> so this over-reservation hurts more, and more often.
>>
>> I'm of the opinion that it's not the worst thing to allow
>> metadata to push a user slightly over quota.  This simplifies
>> the code and avoids the false quota rejections that result
>> from worst-case speculation.
> Hm.. Totally agree with issue description. And seem there is no another
> solution except yours.
> ASAIU alloc_nofail is called from places where it is impossible to fail
> an allocation even if something goes wrong.
> I ask because currently i'm working on EIO handling in alloc/free calls.
> I've found that it is useless to fail claim/free procedures because
> caller is unable to handle it properly.
> It is impossible to fail following operation
> ->writepage
>  ->dquot_claim_space (what to do if EIO happens?)

Hm, if these start returning EIO then maybe my patch should be modified
to treat EDQUOT differently than EIO ... assuming callers can handle
the return at all.

In other words, make NOFAIL really just mean "don't fail for EDQUOT"

-Eric

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 0/3] ext4: don't use quota reservation for speculative metadata blocks
  2010-04-08 15:28   ` Eric Sandeen
@ 2010-04-11  9:37     ` Dmitry Monakhov
  2010-04-12 13:15       ` Jan Kara
  0 siblings, 1 reply; 26+ messages in thread
From: Dmitry Monakhov @ 2010-04-11  9:37 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: ext4 development

Eric Sandeen <sandeen@redhat.com> writes:

> Dmitry Monakhov wrote:
>> Eric Sandeen <sandeen@redhat.com> writes:
>> 
>>> Because we can badly over-reserve metadata when we
>>> calculate worst-case, it complicates things for quota, since
>>> we must reserve and then claim later, retry on EDQUOT, etc.
>>> Quota is also a generally smaller pool than fs free blocks,
>>> so this over-reservation hurts more, and more often.
>>>
>>> I'm of the opinion that it's not the worst thing to allow
>>> metadata to push a user slightly over quota.  This simplifies
>>> the code and avoids the false quota rejections that result
>>> from worst-case speculation.
>> Hm.. Totally agree with issue description. And seem there is no another
>> solution except yours.
>> ASAIU alloc_nofail is called from places where it is impossible to fail
>> an allocation even if something goes wrong.
>> I ask because currently i'm working on EIO handling in alloc/free calls.
>> I've found that it is useless to fail claim/free procedures because
>> caller is unable to handle it properly.
>> It is impossible to fail following operation
>> ->writepage
>>  ->dquot_claim_space (what to do if EIO happens?)
>
> Hm, if these start returning EIO then maybe my patch should be modified
> to treat EDQUOT differently than EIO ... assuming callers can handle
> the return at all.
>
> In other words, make NOFAIL really just mean "don't fail for EDQUOT"
Yes. agree So we have two types of errors
1) expected errors: EDQUOT
2) fatal errors: (EIO/ENOSPC/ENOMEM)
So we need two types of flags:
1)FORCE (IMHO it is better name than you proposed) to allow exceed a
  quota limit
2)NOFAIL to allow ignore fatal errors.

We still need NOFAIL, because for example if something is happens in
->write_page()
 ->dquot_claim()
     update_quota() -> EIO  /* update disk quota */
     uptage_bytes() /* update i_bytes count */
It is obvious that write_page should fail because it is too late to
return the error to userspace, so data will probably lost which
is much more dramatic bug than quota inconsistency.
So the only options we have is to:
1) Do not modify inode->i_bytes and return error which caller will
   probably ignore. IMHO this is not good because result in
   incorrect stat()

1) do as much as we can (as it happens for now), modify inode->i_bytes
   and return positive error code to caller.(which signal what error
   result in quota inconsystency only)

This fatal errors handling logic i'll post on top of your patch-set.
But please change flag name from NOFAIL to FORCE.


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 0/3] ext4: don't use quota reservation for speculative metadata blocks
  2010-04-07 21:45 [PATCH 0/3] ext4: don't use quota reservation for speculative metadata blocks Eric Sandeen
                   ` (3 preceding siblings ...)
  2010-04-08  8:20 ` [PATCH 0/3] " Dmitry Monakhov
@ 2010-04-12  2:20 ` tytso
  2010-04-12 14:03   ` Jan Kara
  4 siblings, 1 reply; 26+ messages in thread
From: tytso @ 2010-04-12  2:20 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: ext4 development, Jan Kara

On Wed, Apr 07, 2010 at 04:45:52PM -0500, Eric Sandeen wrote:
> Because we can badly over-reserve metadata when we
> calculate worst-case, it complicates things for quota, since
> we must reserve and then claim later, retry on EDQUOT, etc.
> Quota is also a generally smaller pool than fs free blocks,
> so this over-reservation hurts more, and more often.
> 
> I'm of the opinion that it's not the worst thing to allow
> metadata to push a user slightly over quota.  This simplifies
> the code and avoids the false quota rejections that result
> from worst-case speculation.

This patch series looks good to me in general; Jan, it requires
relatively minor changes to the quota system, so it would be good to
get your Acked-by for the first two patches.  Since the changes to the
ext4 layer are more in-depth, any objections if I carry all three
patches in the ext4 tree?

					- Ted

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 0/3] ext4: don't use quota reservation for speculative metadata blocks
  2010-04-11  9:37     ` Dmitry Monakhov
@ 2010-04-12 13:15       ` Jan Kara
  2010-04-12 13:22         ` Jan Kara
  2010-04-12 13:33         ` Dmitry Monakhov
  0 siblings, 2 replies; 26+ messages in thread
From: Jan Kara @ 2010-04-12 13:15 UTC (permalink / raw)
  To: Dmitry Monakhov; +Cc: Eric Sandeen, ext4 development

> Eric Sandeen <sandeen@redhat.com> writes:
> 
> > Dmitry Monakhov wrote:
> >> Eric Sandeen <sandeen@redhat.com> writes:
> >> 
> >>> Because we can badly over-reserve metadata when we
> >>> calculate worst-case, it complicates things for quota, since
> >>> we must reserve and then claim later, retry on EDQUOT, etc.
> >>> Quota is also a generally smaller pool than fs free blocks,
> >>> so this over-reservation hurts more, and more often.
> >>>
> >>> I'm of the opinion that it's not the worst thing to allow
> >>> metadata to push a user slightly over quota.  This simplifies
> >>> the code and avoids the false quota rejections that result
> >>> from worst-case speculation.
> >> Hm.. Totally agree with issue description. And seem there is no another
> >> solution except yours.
> >> ASAIU alloc_nofail is called from places where it is impossible to fail
> >> an allocation even if something goes wrong.
> >> I ask because currently i'm working on EIO handling in alloc/free calls.
> >> I've found that it is useless to fail claim/free procedures because
> >> caller is unable to handle it properly.
> >> It is impossible to fail following operation
> >> ->writepage
> >>  ->dquot_claim_space (what to do if EIO happens?)
> >
> > Hm, if these start returning EIO then maybe my patch should be modified
> > to treat EDQUOT differently than EIO ... assuming callers can handle
> > the return at all.
> >
> > In other words, make NOFAIL really just mean "don't fail for EDQUOT"
> Yes. agree So we have two types of errors
> 1) expected errors: EDQUOT
> 2) fatal errors: (EIO/ENOSPC/ENOMEM)
> So we need two types of flags:
> 1)FORCE (IMHO it is better name than you proposed) to allow exceed a
>   quota limit
> 2)NOFAIL to allow ignore fatal errors.
> 
> We still need NOFAIL, because for example if something is happens in
> ->write_page()
>  ->dquot_claim()
>      update_quota() -> EIO  /* update disk quota */
>      update_bytes() /* update i_bytes count */
> It is obvious that write_page should fail because it is too late to
> return the error to userspace, so data will probably lost which
> is much more dramatic bug than quota inconsistency.
> So the only options we have is to:
> 1) Do not modify inode->i_bytes and return error which caller will
>    probably ignore. IMHO this is not good because result in
>    incorrect stat()
> 
> 2) do as much as we can (as it happens for now), modify inode->i_bytes
>    and return positive error code to caller.(which signal what error
>    result in quota inconsystency only)
  Yes, agreed that 2) is a better solution.

> This fatal errors handling logic i'll post on top of your patch-set.
> But please change flag name from NOFAIL to FORCE.
  Hmm, do we really need to distinguish between your NOFAIL and FORCE?
I mean there are places where we can handle quota failures (both EDQUOT
or others) and places where we cannot and then we just want to go on as
seamlessly as possible. So NOFAIL flag seems to be enough...
  Now I agree that in theory there can be some caller which might wish
to seamlessly continue on EDQUOT and bail out on EIO but I'm not aware
of such callsite currently so there's no immediate need for the flag.
So Eric's patches seem to be fine to me as they are. What do you think?

								Honza
-- 
Jan Kara <jack@suse.cz>
SuSE CR Labs

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 0/3] ext4: don't use quota reservation for speculative metadata blocks
  2010-04-12 13:15       ` Jan Kara
@ 2010-04-12 13:22         ` Jan Kara
  2010-04-12 13:40           ` Dmitry Monakhov
  2010-04-12 13:33         ` Dmitry Monakhov
  1 sibling, 1 reply; 26+ messages in thread
From: Jan Kara @ 2010-04-12 13:22 UTC (permalink / raw)
  To: Dmitry Monakhov; +Cc: Eric Sandeen, ext4 development

> > Eric Sandeen <sandeen@redhat.com> writes:
> > 
> > > Dmitry Monakhov wrote:
> > >> Eric Sandeen <sandeen@redhat.com> writes:
> > >> 
> > >>> Because we can badly over-reserve metadata when we
> > >>> calculate worst-case, it complicates things for quota, since
> > >>> we must reserve and then claim later, retry on EDQUOT, etc.
> > >>> Quota is also a generally smaller pool than fs free blocks,
> > >>> so this over-reservation hurts more, and more often.
> > >>>
> > >>> I'm of the opinion that it's not the worst thing to allow
> > >>> metadata to push a user slightly over quota.  This simplifies
> > >>> the code and avoids the false quota rejections that result
> > >>> from worst-case speculation.
> > >> Hm.. Totally agree with issue description. And seem there is no another
> > >> solution except yours.
> > >> ASAIU alloc_nofail is called from places where it is impossible to fail
> > >> an allocation even if something goes wrong.
> > >> I ask because currently i'm working on EIO handling in alloc/free calls.
> > >> I've found that it is useless to fail claim/free procedures because
> > >> caller is unable to handle it properly.
> > >> It is impossible to fail following operation
> > >> ->writepage
> > >>  ->dquot_claim_space (what to do if EIO happens?)
> > >
> > > Hm, if these start returning EIO then maybe my patch should be modified
> > > to treat EDQUOT differently than EIO ... assuming callers can handle
> > > the return at all.
> > >
> > > In other words, make NOFAIL really just mean "don't fail for EDQUOT"
> > Yes. agree So we have two types of errors
> > 1) expected errors: EDQUOT
> > 2) fatal errors: (EIO/ENOSPC/ENOMEM)
> > So we need two types of flags:
> > 1)FORCE (IMHO it is better name than you proposed) to allow exceed a
> >   quota limit
> > 2)NOFAIL to allow ignore fatal errors.
> > 
> > We still need NOFAIL, because for example if something is happens in
> > ->write_page()
> >  ->dquot_claim()
> >      update_quota() -> EIO  /* update disk quota */
> >      update_bytes() /* update i_bytes count */
> > It is obvious that write_page should fail because it is too late to
> > return the error to userspace, so data will probably lost which
> > is much more dramatic bug than quota inconsistency.
> > So the only options we have is to:
> > 1) Do not modify inode->i_bytes and return error which caller will
> >    probably ignore. IMHO this is not good because result in
> >    incorrect stat()
> > 
> > 2) do as much as we can (as it happens for now), modify inode->i_bytes
> >    and return positive error code to caller.(which signal what error
> >    result in quota inconsystency only)
>   Yes, agreed that 2) is a better solution.
> 
> > This fatal errors handling logic i'll post on top of your patch-set.
> > But please change flag name from NOFAIL to FORCE.
>   Hmm, do we really need to distinguish between your NOFAIL and FORCE?
> I mean there are places where we can handle quota failures (both EDQUOT
> or others) and places where we cannot and then we just want to go on as
> seamlessly as possible. So NOFAIL flag seems to be enough...
>   Now I agree that in theory there can be some caller which might wish
> to seamlessly continue on EDQUOT and bail out on EIO but I'm not aware
> of such callsite currently so there's no immediate need for the flag.
> So Eric's patches seem to be fine to me as they are. What do you think?
  Maybe one more point - the callsite in ext4 that uses _nofail variant
really needs full NOFAIL semantics because we have no way of handling
an error there.

								Honza
-- 
Jan Kara <jack@suse.cz>
SuSE CR Labs

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/3] quota: use flags interface for dquot alloc/free space
  2010-04-07 21:49 ` [PATCH 1/3] quota: use flags interface for dquot alloc/free space Eric Sandeen
@ 2010-04-12 13:22   ` Jan Kara
  2010-05-20 22:29   ` tytso
  1 sibling, 0 replies; 26+ messages in thread
From: Jan Kara @ 2010-04-12 13:22 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: ext4 development

> Switch __dquot_alloc_space and __dquot_free_space to take flags
> to indicate whether to warn and/or to reserve (or free reserve).
> 
> This is slightly more readable at the callpoints, and makes it
> cleaner to add a "nofail" option in the next patch.
> 
> Signed-off-by: Eric Sandeen <sandeen@redhat.com>
  Looks good. Acked-by: Jan Kara <jack@suse.cz>

> ---
> 
> diff --git a/fs/quota/dquot.c b/fs/quota/dquot.c
> index e0b870f..8436d9b 100644
> --- a/fs/quota/dquot.c
> +++ b/fs/quota/dquot.c
> @@ -1484,11 +1484,12 @@ static void inode_decr_space(struct inode *inode, qsize_t number, int reserve)
>  /*
>   * This operation can block, but only after everything is updated
>   */
> -int __dquot_alloc_space(struct inode *inode, qsize_t number,
> -		int warn, int reserve)
> +int __dquot_alloc_space(struct inode *inode, qsize_t number, int flags)
>  {
>  	int cnt, ret = 0;
>  	char warntype[MAXQUOTAS];
> +	int warn = flags & DQUOT_SPACE_WARN;
> +	int reserve = flags & DQUOT_SPACE_RESERVE;
>  
>  	/*
>  	 * First test before acquiring mutex - solves deadlocks when we
> @@ -1608,10 +1609,11 @@ EXPORT_SYMBOL(dquot_claim_space_nodirty);
>  /*
>   * This operation can block, but only after everything is updated
>   */
> -void __dquot_free_space(struct inode *inode, qsize_t number, int reserve)
> +void __dquot_free_space(struct inode *inode, qsize_t number, int flags)
>  {
>  	unsigned int cnt;
>  	char warntype[MAXQUOTAS];
> +	int reserve = flags & DQUOT_SPACE_RESERVE;
>  
>  	/* First test before acquiring mutex - solves deadlocks when we
>           * re-enter the quota code and are already holding the mutex */
> diff --git a/include/linux/quotaops.h b/include/linux/quotaops.h
> index e6fa7ac..daa106f 100644
> --- a/include/linux/quotaops.h
> +++ b/include/linux/quotaops.h
> @@ -14,6 +14,9 @@ static inline struct quota_info *sb_dqopt(struct super_block *sb)
>  	return &sb->s_dquot;
>  }
>  
> +#define DQUOT_SPACE_WARN	0x1
> +#define DQUOT_SPACE_RESERVE	0x2
> +
>  #if defined(CONFIG_QUOTA)
>  
>  /*
> @@ -33,9 +36,8 @@ int dquot_scan_active(struct super_block *sb,
>  struct dquot *dquot_alloc(struct super_block *sb, int type);
>  void dquot_destroy(struct dquot *dquot);
>  
> -int __dquot_alloc_space(struct inode *inode, qsize_t number,
> -		int warn, int reserve);
> -void __dquot_free_space(struct inode *inode, qsize_t number, int reserve);
> +int __dquot_alloc_space(struct inode *inode, qsize_t number, int flags);
> +void __dquot_free_space(struct inode *inode, qsize_t number, int flags);
>  
>  int dquot_alloc_inode(const struct inode *inode);
>  
> @@ -231,17 +233,17 @@ static inline int dquot_transfer(struct inode *inode, struct iattr *iattr)
>  }
>  
>  static inline int __dquot_alloc_space(struct inode *inode, qsize_t number,
> -		int warn, int reserve)
> +		int flags)
>  {
> -	if (!reserve)
> +	if (!(flags & DQUOT_SPACE_RESERVE))
>  		inode_add_bytes(inode, number);
>  	return 0;
>  }
>  
>  static inline void __dquot_free_space(struct inode *inode, qsize_t number,
> -		int reserve)
> +		int flags)
>  {
> -	if (!reserve)
> +	if (!(flags & DQUOT_SPACE_RESERVE))
>  		inode_sub_bytes(inode, number);
>  }
>  
> @@ -257,7 +259,7 @@ static inline int dquot_claim_space_nodirty(struct inode *inode, qsize_t number)
>  
>  static inline int dquot_alloc_space_nodirty(struct inode *inode, qsize_t nr)
>  {
> -	return __dquot_alloc_space(inode, nr, 1, 0);
> +	return __dquot_alloc_space(inode, nr, DQUOT_SPACE_WARN);
>  }
>  
>  static inline int dquot_alloc_space(struct inode *inode, qsize_t nr)
> @@ -282,7 +284,7 @@ static inline int dquot_alloc_block(struct inode *inode, qsize_t nr)
>  
>  static inline int dquot_prealloc_block_nodirty(struct inode *inode, qsize_t nr)
>  {
> -	return __dquot_alloc_space(inode, nr << inode->i_blkbits, 0, 0);
> +	return __dquot_alloc_space(inode, nr << inode->i_blkbits, 0);
>  }
>  
>  static inline int dquot_prealloc_block(struct inode *inode, qsize_t nr)
> @@ -297,7 +299,8 @@ static inline int dquot_prealloc_block(struct inode *inode, qsize_t nr)
>  
>  static inline int dquot_reserve_block(struct inode *inode, qsize_t nr)
>  {
> -	return __dquot_alloc_space(inode, nr << inode->i_blkbits, 1, 1);
> +	return __dquot_alloc_space(inode, nr << inode->i_blkbits,
> +				DQUOT_SPACE_WARN|DQUOT_SPACE_RESERVE);
>  }
>  
>  static inline int dquot_claim_block(struct inode *inode, qsize_t nr)
> @@ -334,7 +337,7 @@ static inline void dquot_free_block(struct inode *inode, qsize_t nr)
>  static inline void dquot_release_reservation_block(struct inode *inode,
>  		qsize_t nr)
>  {
> -	__dquot_free_space(inode, nr << inode->i_blkbits, 1);
> +	__dquot_free_space(inode, nr << inode->i_blkbits, DQUOT_SPACE_RESERVE);
>  }
>  
>  #endif /* _LINUX_QUOTAOPS_ */
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
-- 
Jan Kara <jack@suse.cz>
SuSE CR Labs

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 2/3] quota: add the option to not fail with EDQUOT in block allocation
  2010-04-07 21:52 ` [PATCH 2/3] quota: add the option to not fail with EDQUOT in block allocation Eric Sandeen
@ 2010-04-12 13:23   ` Jan Kara
  2010-05-20 22:32   ` tytso
  1 sibling, 0 replies; 26+ messages in thread
From: Jan Kara @ 2010-04-12 13:23 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: ext4 development

> To simplify metadata tracking for delalloc writes, ext4
> will simply claim metadata blocks at allocation time, without
> first speculatively reserving the worst case and then freeing
> what was not used.
> 
> To do this, we need a mechanism to track allocations in
> the quota subsystem, but potentially allow that allocation
> to actually go over quota.
> 
> This patch adds a DQUOT_SPACE_NOFAIL flag and function
> variants for this purpose.
> 
> Signed-off-by: Eric Sandeen <sandeen@redhat.com>
   Looks fine. Acked-by: Jan Kara <jack@suse.cz>

								Honza
> ---
> 
> diff --git a/fs/quota/dquot.c b/fs/quota/dquot.c
> index 8436d9b..c38b137 100644
> --- a/fs/quota/dquot.c
> +++ b/fs/quota/dquot.c
> @@ -1490,6 +1490,7 @@ int __dquot_alloc_space(struct inode *inode, qsize_t number, int flags)
>  	char warntype[MAXQUOTAS];
>  	int warn = flags & DQUOT_SPACE_WARN;
>  	int reserve = flags & DQUOT_SPACE_RESERVE;
> +	int nofail = flags & DQUOT_SPACE_NOFAIL;
>  
>  	/*
>  	 * First test before acquiring mutex - solves deadlocks when we
> @@ -1510,7 +1511,7 @@ int __dquot_alloc_space(struct inode *inode, qsize_t number, int flags)
>  			continue;
>  		ret = check_bdq(inode->i_dquot[cnt], number, !warn,
>  				warntype+cnt);
> -		if (ret) {
> +		if (ret && !nofail) {
>  			spin_unlock(&dq_data_lock);
>  			goto out_flush_warn;
>  		}
> diff --git a/include/linux/quotaops.h b/include/linux/quotaops.h
> index daa106f..8f858ce 100644
> --- a/include/linux/quotaops.h
> +++ b/include/linux/quotaops.h
> @@ -16,6 +16,7 @@ static inline struct quota_info *sb_dqopt(struct super_block *sb)
>  
>  #define DQUOT_SPACE_WARN	0x1
>  #define DQUOT_SPACE_RESERVE	0x2
> +#define DQUOT_SPACE_NOFAIL	0x4
>  
>  #if defined(CONFIG_QUOTA)
>  
> @@ -262,6 +263,12 @@ static inline int dquot_alloc_space_nodirty(struct inode *inode, qsize_t nr)
>  	return __dquot_alloc_space(inode, nr, DQUOT_SPACE_WARN);
>  }
>  
> +static inline void dquot_alloc_space_nofail(struct inode *inode, qsize_t nr)
> +{
> +	__dquot_alloc_space(inode, nr, DQUOT_SPACE_WARN|DQUOT_SPACE_NOFAIL);
> +	mark_inode_dirty(inode);
> +}
> +
>  static inline int dquot_alloc_space(struct inode *inode, qsize_t nr)
>  {
>  	int ret;
> @@ -277,6 +284,11 @@ static inline int dquot_alloc_block_nodirty(struct inode *inode, qsize_t nr)
>  	return dquot_alloc_space_nodirty(inode, nr << inode->i_blkbits);
>  }
>  
> +static inline void dquot_alloc_block_nofail(struct inode *inode, qsize_t nr)
> +{
> +	dquot_alloc_space_nofail(inode, nr << inode->i_blkbits);
> +}
> +
>  static inline int dquot_alloc_block(struct inode *inode, qsize_t nr)
>  {
>  	return dquot_alloc_space(inode, nr << inode->i_blkbits);
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
-- 
Jan Kara <jack@suse.cz>
SuSE CR Labs

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 0/3] ext4: don't use quota reservation for speculative metadata blocks
  2010-04-12 13:15       ` Jan Kara
  2010-04-12 13:22         ` Jan Kara
@ 2010-04-12 13:33         ` Dmitry Monakhov
  2010-04-12 13:37           ` Dmitry Monakhov
                             ` (2 more replies)
  1 sibling, 3 replies; 26+ messages in thread
From: Dmitry Monakhov @ 2010-04-12 13:33 UTC (permalink / raw)
  To: Jan Kara; +Cc: Eric Sandeen, ext4 development

Jan Kara <jack@suse.cz> writes:

>> Eric Sandeen <sandeen@redhat.com> writes:
>> 
>> > Dmitry Monakhov wrote:
>> >> Eric Sandeen <sandeen@redhat.com> writes:
>> >> 
>> >>> Because we can badly over-reserve metadata when we
>> >>> calculate worst-case, it complicates things for quota, since
>> >>> we must reserve and then claim later, retry on EDQUOT, etc.
>> >>> Quota is also a generally smaller pool than fs free blocks,
>> >>> so this over-reservation hurts more, and more often.
>> >>>
>> >>> I'm of the opinion that it's not the worst thing to allow
>> >>> metadata to push a user slightly over quota.  This simplifies
>> >>> the code and avoids the false quota rejections that result
>> >>> from worst-case speculation.
>> >> Hm.. Totally agree with issue description. And seem there is no another
>> >> solution except yours.
>> >> ASAIU alloc_nofail is called from places where it is impossible to fail
>> >> an allocation even if something goes wrong.
>> >> I ask because currently i'm working on EIO handling in alloc/free calls.
>> >> I've found that it is useless to fail claim/free procedures because
>> >> caller is unable to handle it properly.
>> >> It is impossible to fail following operation
>> >> ->writepage
>> >>  ->dquot_claim_space (what to do if EIO happens?)
>> >
>> > Hm, if these start returning EIO then maybe my patch should be modified
>> > to treat EDQUOT differently than EIO ... assuming callers can handle
>> > the return at all.
>> >
>> > In other words, make NOFAIL really just mean "don't fail for EDQUOT"
>> Yes. agree So we have two types of errors
>> 1) expected errors: EDQUOT
>> 2) fatal errors: (EIO/ENOSPC/ENOMEM)
>> So we need two types of flags:
>> 1)FORCE (IMHO it is better name than you proposed) to allow exceed a
>>   quota limit
>> 2)NOFAIL to allow ignore fatal errors.
>> 
>> We still need NOFAIL, because for example if something is happens in
>> ->write_page()
>>  ->dquot_claim()
>>      update_quota() -> EIO  /* update disk quota */
>>      update_bytes() /* update i_bytes count */
>> It is obvious that write_page should fail because it is too late to
>> return the error to userspace, so data will probably lost which
>> is much more dramatic bug than quota inconsistency.
>> So the only options we have is to:
>> 1) Do not modify inode->i_bytes and return error which caller will
>>    probably ignore. IMHO this is not good because result in
>>    incorrect stat()
>> 
>> 2) do as much as we can (as it happens for now), modify inode->i_bytes
>>    and return positive error code to caller.(which signal what error
>>    result in quota inconsystency only)
>   Yes, agreed that 2) is a better solution.
>
>> This fatal errors handling logic i'll post on top of your patch-set.
>> But please change flag name from NOFAIL to FORCE.
>   Hmm, do we really need to distinguish between your NOFAIL and FORCE?
> I mean there are places where we can handle quota failures (both EDQUOT
> or others) and places where we cannot and then we just want to go on as
> seamlessly as possible. So NOFAIL flag seems to be enough...
>   Now I agree that in theory there can be some caller which might wish
> to seamlessly continue on EDQUOT and bail out on EIO but I'm not aware
> of such callsite currently so there's no immediate need for the flag.
> So Eric's patches seem to be fine to me as they are. What do you think?
FORCE but without NOFAIL will be useful for example in
write to fallocated area
extent split/convert (uninitialized=>initialized conversion)
It is not good idea to return EDQUOT from write to reserved area
due to metadata overhead, but it is easy to handle EIO from that method.
So IMHO two flags is not a fancy option, but reasonable design solution.
This design confirms to right for openvz's userbean counters.
The only thing i ask is to rename NOFAIL => FORCE.

BTW I'm too familiar with cross-devel-tree process
If tytso@ will get the patchset will you get an quota-related patches
to linux-fs tree too? Otherwise everybody have to wait for ext4-tree
push to linus's tree and when to linux-fs.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 3/3] ext4: don't use quota reservation for speculative metadata blocks
  2010-04-07 21:55 ` [PATCH 3/3] ext4: don't use quota reservation for speculative metadata blocks Eric Sandeen
@ 2010-04-12 13:36   ` Jan Kara
  2010-05-20 23:10   ` tytso
  1 sibling, 0 replies; 26+ messages in thread
From: Jan Kara @ 2010-04-12 13:36 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: ext4 development

> Because we can badly over-reserve metadata when we
> calculate worst-case, it complicates things for quota, since
> we must reserve and then claim later, retry on EDQUOT, etc.
> Quota is also a generally smaller pool than fs free blocks,
> so this over-reservation hurts more, and more often.
> 
> I'm of the opinion that it's not the worst thing to allow
> metadata to push a user slightly over quota.  This simplifies
> the code and avoids the false quota rejections that result
> from worst-case speculation.
> 
> This patch stops the speculative quota-charging for
> worst-case metadata requirements, and just charges quota
> when the blocks are allocated at writeout.  It also is
> able to remove the try-again loop on EDQUOT.
> 
> This patch has been tested indirectly by running the xfstests
> suite with a hack to mount & enable quota prior to the test.
> 
> I also did a more specific test of fragmenting freespace
> and then doing a large delalloc write under quota; quota
> stopped me at the right amount of file IO, and then the
> writeout generated enough metadata (due to the fragmentation)
> that it put me slightly over quota, as expected.
> 
> Signed-off-by: Eric Sandeen <sandeen@redhat.com>
  This patch looks good as well. Acked-by: Jan Kara <jack@suse.cz>

								Honza

> ---
> 
>  balloc.c |    5 ++--
>  inode.c  |   64 ++++++++++++++++++++-------------------------------------------
>  2 files changed, 24 insertions(+), 45 deletions(-)
> 
> 
> diff --git a/fs/ext4/balloc.c b/fs/ext4/balloc.c
> index d2f37a5..95b7594 100644
> --- a/fs/ext4/balloc.c
> +++ b/fs/ext4/balloc.c
> @@ -591,14 +591,15 @@ ext4_fsblk_t ext4_new_meta_blocks(handle_t *handle, struct inode *inode,
>  	ret = ext4_mb_new_blocks(handle, &ar, errp);
>  	if (count)
>  		*count = ar.len;
> -
>  	/*
> -	 * Account for the allocated meta blocks
> +	 * Account for the allocated meta blocks.  We will never
> +	 * fail EDQUOT for metdata, but we do account for it.
>  	 */
>  	if (!(*errp) && EXT4_I(inode)->i_delalloc_reserved_flag) {
>  		spin_lock(&EXT4_I(inode)->i_block_reservation_lock);
>  		EXT4_I(inode)->i_allocated_meta_blocks += ar.len;
>  		spin_unlock(&EXT4_I(inode)->i_block_reservation_lock);
> +		dquot_alloc_block_nofail(inode, ar.len);
>  	}
>  	return ret;
>  }
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index 5381802..dba674a 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -1076,7 +1076,6 @@ void ext4_da_update_reserve_space(struct inode *inode,
>  {
>  	struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
>  	struct ext4_inode_info *ei = EXT4_I(inode);
> -	int mdb_free = 0, allocated_meta_blocks = 0;
>  
>  	spin_lock(&ei->i_block_reservation_lock);
>  	trace_ext4_da_update_reserve_space(inode, used);
> @@ -1091,11 +1090,10 @@ void ext4_da_update_reserve_space(struct inode *inode,
>  
>  	/* Update per-inode reservations */
>  	ei->i_reserved_data_blocks -= used;
> -	used += ei->i_allocated_meta_blocks;
>  	ei->i_reserved_meta_blocks -= ei->i_allocated_meta_blocks;
> -	allocated_meta_blocks = ei->i_allocated_meta_blocks;
> +	percpu_counter_sub(&sbi->s_dirtyblocks_counter,
> +			   used + ei->i_allocated_meta_blocks);
>  	ei->i_allocated_meta_blocks = 0;
> -	percpu_counter_sub(&sbi->s_dirtyblocks_counter, used);
>  
>  	if (ei->i_reserved_data_blocks == 0) {
>  		/*
> @@ -1103,30 +1101,23 @@ void ext4_da_update_reserve_space(struct inode *inode,
>  		 * only when we have written all of the delayed
>  		 * allocation blocks.
>  		 */
> -		mdb_free = ei->i_reserved_meta_blocks;
> +		percpu_counter_sub(&sbi->s_dirtyblocks_counter,
> +				   ei->i_reserved_meta_blocks);
>  		ei->i_reserved_meta_blocks = 0;
>  		ei->i_da_metadata_calc_len = 0;
> -		percpu_counter_sub(&sbi->s_dirtyblocks_counter, mdb_free);
>  	}
>  	spin_unlock(&EXT4_I(inode)->i_block_reservation_lock);
>  
> -	/* Update quota subsystem */
> +	/* Update quota subsystem for data blocks */
>  	if (quota_claim) {
>  		dquot_claim_block(inode, used);
> -		if (mdb_free)
> -			dquot_release_reservation_block(inode, mdb_free);
>  	} else {
>  		/*
>  		 * We did fallocate with an offset that is already delayed
>  		 * allocated. So on delayed allocated writeback we should
> -		 * not update the quota for allocated blocks. But then
> -		 * converting an fallocate region to initialized region would
> -		 * have caused a metadata allocation. So claim quota for
> -		 * that
> +		 * not re-claim the quota for fallocated blocks.
>  		 */
> -		if (allocated_meta_blocks)
> -			dquot_claim_block(inode, allocated_meta_blocks);
> -		dquot_release_reservation_block(inode, mdb_free + used);
> +		dquot_release_reservation_block(inode, used);
>  	}
>  
>  	/*
> @@ -1860,7 +1851,7 @@ static int ext4_da_reserve_space(struct inode *inode, sector_t lblock)
>  	int retries = 0;
>  	struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
>  	struct ext4_inode_info *ei = EXT4_I(inode);
> -	unsigned long md_needed, md_reserved;
> +	unsigned long md_needed;
>  	int ret;
>  
>  	/*
> @@ -1870,22 +1861,24 @@ static int ext4_da_reserve_space(struct inode *inode, sector_t lblock)
>  	 */
>  repeat:
>  	spin_lock(&ei->i_block_reservation_lock);
> -	md_reserved = ei->i_reserved_meta_blocks;
>  	md_needed = ext4_calc_metadata_amount(inode, lblock);
>  	trace_ext4_da_reserve_space(inode, md_needed);
>  	spin_unlock(&ei->i_block_reservation_lock);
>  
>  	/*
> -	 * Make quota reservation here to prevent quota overflow
> -	 * later. Real quota accounting is done at pages writeout
> -	 * time.
> +	 * We will charge metadata quota at writeout time; this saves
> +	 * us from metadata over-estimation, though we may go over by
> +	 * a small amount in the end.  Here we just reserve for data.
>  	 */
> -	ret = dquot_reserve_block(inode, md_needed + 1);
> +	ret = dquot_reserve_block(inode, 1);
>  	if (ret)
>  		return ret;
> -
> +	/*
> +	 * We do still charge estimated metadata to the sb though;
> +	 * we cannot afford to run out of free blocks.
> +	 */
>  	if (ext4_claim_free_blocks(sbi, md_needed + 1)) {
> -		dquot_release_reservation_block(inode, md_needed + 1);
> +		dquot_release_reservation_block(inode, 1);
>  		if (ext4_should_retry_alloc(inode->i_sb, &retries)) {
>  			yield();
>  			goto repeat;
> @@ -1932,12 +1925,13 @@ static void ext4_da_release_space(struct inode *inode, int to_free)
>  		 * only when we have written all of the delayed
>  		 * allocation blocks.
>  		 */
> -		to_free += ei->i_reserved_meta_blocks;
> +		percpu_counter_sub(&sbi->s_dirtyblocks_counter,
> +				   ei->i_reserved_meta_blocks);
>  		ei->i_reserved_meta_blocks = 0;
>  		ei->i_da_metadata_calc_len = 0;
>  	}
>  
> -	/* update fs dirty blocks counter */
> +	/* update fs dirty data blocks counter */
>  	percpu_counter_sub(&sbi->s_dirtyblocks_counter, to_free);
>  
>  	spin_unlock(&EXT4_I(inode)->i_block_reservation_lock);
> @@ -3076,7 +3070,7 @@ static int ext4_da_write_begin(struct file *file, struct address_space *mapping,
>  			       loff_t pos, unsigned len, unsigned flags,
>  			       struct page **pagep, void **fsdata)
>  {
> -	int ret, retries = 0, quota_retries = 0;
> +	int ret, retries = 0;
>  	struct page *page;
>  	pgoff_t index;
>  	unsigned from, to;
> @@ -3135,22 +3129,6 @@ retry:
>  
>  	if (ret == -ENOSPC && ext4_should_retry_alloc(inode->i_sb, &retries))
>  		goto retry;
> -
> -	if ((ret == -EDQUOT) &&
> -	    EXT4_I(inode)->i_reserved_meta_blocks &&
> -	    (quota_retries++ < 3)) {
> -		/*
> -		 * Since we often over-estimate the number of meta
> -		 * data blocks required, we may sometimes get a
> -		 * spurios out of quota error even though there would
> -		 * be enough space once we write the data blocks and
> -		 * find out how many meta data blocks were _really_
> -		 * required.  So try forcing the inode write to see if
> -		 * that helps.
> -		 */
> -		write_inode_now(inode, (quota_retries == 3));
> -		goto retry;
> -	}
>  out:
>  	return ret;
>  }
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
-- 
Jan Kara <jack@suse.cz>
SuSE CR Labs

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 0/3] ext4: don't use quota reservation for speculative metadata blocks
  2010-04-12 13:33         ` Dmitry Monakhov
@ 2010-04-12 13:37           ` Dmitry Monakhov
  2010-04-12 13:42           ` tytso
  2010-04-12 13:55           ` Jan Kara
  2 siblings, 0 replies; 26+ messages in thread
From: Dmitry Monakhov @ 2010-04-12 13:37 UTC (permalink / raw)
  To: Jan Kara; +Cc: Eric Sandeen, ext4 development

Dmitry Monakhov <dmonakhov@openvz.org> writes:

> Jan Kara <jack@suse.cz> writes:
>
>>> Eric Sandeen <sandeen@redhat.com> writes:
>>> 
>>> > Dmitry Monakhov wrote:
>>> >> Eric Sandeen <sandeen@redhat.com> writes:
>>> >> 
>>> >>> Because we can badly over-reserve metadata when we
>>> >>> calculate worst-case, it complicates things for quota, since
>>> >>> we must reserve and then claim later, retry on EDQUOT, etc.
>>> >>> Quota is also a generally smaller pool than fs free blocks,
>>> >>> so this over-reservation hurts more, and more often.
>>> >>>
>>> >>> I'm of the opinion that it's not the worst thing to allow
>>> >>> metadata to push a user slightly over quota.  This simplifies
>>> >>> the code and avoids the false quota rejections that result
>>> >>> from worst-case speculation.
>>> >> Hm.. Totally agree with issue description. And seem there is no another
>>> >> solution except yours.
>>> >> ASAIU alloc_nofail is called from places where it is impossible to fail
>>> >> an allocation even if something goes wrong.
>>> >> I ask because currently i'm working on EIO handling in alloc/free calls.
>>> >> I've found that it is useless to fail claim/free procedures because
>>> >> caller is unable to handle it properly.
>>> >> It is impossible to fail following operation
>>> >> ->writepage
>>> >>  ->dquot_claim_space (what to do if EIO happens?)
>>> >
>>> > Hm, if these start returning EIO then maybe my patch should be modified
>>> > to treat EDQUOT differently than EIO ... assuming callers can handle
>>> > the return at all.
>>> >
>>> > In other words, make NOFAIL really just mean "don't fail for EDQUOT"
>>> Yes. agree So we have two types of errors
>>> 1) expected errors: EDQUOT
>>> 2) fatal errors: (EIO/ENOSPC/ENOMEM)
>>> So we need two types of flags:
>>> 1)FORCE (IMHO it is better name than you proposed) to allow exceed a
>>>   quota limit
>>> 2)NOFAIL to allow ignore fatal errors.
>>> 
>>> We still need NOFAIL, because for example if something is happens in
>>> ->write_page()
>>>  ->dquot_claim()
>>>      update_quota() -> EIO  /* update disk quota */
>>>      update_bytes() /* update i_bytes count */
>>> It is obvious that write_page should fail because it is too late to
>>> return the error to userspace, so data will probably lost which
>>> is much more dramatic bug than quota inconsistency.
>>> So the only options we have is to:
>>> 1) Do not modify inode->i_bytes and return error which caller will
>>>    probably ignore. IMHO this is not good because result in
>>>    incorrect stat()
>>> 
>>> 2) do as much as we can (as it happens for now), modify inode->i_bytes
>>>    and return positive error code to caller.(which signal what error
>>>    result in quota inconsystency only)
>>   Yes, agreed that 2) is a better solution.
>>
>>> This fatal errors handling logic i'll post on top of your patch-set.
>>> But please change flag name from NOFAIL to FORCE.
>>   Hmm, do we really need to distinguish between your NOFAIL and FORCE?
>> I mean there are places where we can handle quota failures (both EDQUOT
>> or others) and places where we cannot and then we just want to go on as
>> seamlessly as possible. So NOFAIL flag seems to be enough...
>>   Now I agree that in theory there can be some caller which might wish
>> to seamlessly continue on EDQUOT and bail out on EIO but I'm not aware
>> of such callsite currently so there's no immediate need for the flag.
>> So Eric's patches seem to be fine to me as they are. What do you think?
> FORCE but without NOFAIL will be useful for example in
> write to fallocated area
> extent split/convert (uninitialized=>initialized conversion)
> It is not good idea to return EDQUOT from write to reserved area
> due to metadata overhead, but it is easy to handle EIO from that method.
> So IMHO two flags is not a fancy option, but reasonable design solution.
> This design confirms to right for openvz's userbean counters.
> The only thing i ask is to rename NOFAIL => FORCE.
>
> BTW I'm too familiar with cross-devel-tree process
OOps i meant to say i'm not familiar :)
> If tytso@ will get the patchset will you get an quota-related patches
> to linux-fs tree too? Otherwise everybody have to wait for ext4-tree
> push to linus's tree and when to linux-fs.
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 0/3] ext4: don't use quota reservation for speculative metadata blocks
  2010-04-12 13:22         ` Jan Kara
@ 2010-04-12 13:40           ` Dmitry Monakhov
  0 siblings, 0 replies; 26+ messages in thread
From: Dmitry Monakhov @ 2010-04-12 13:40 UTC (permalink / raw)
  To: Jan Kara; +Cc: Eric Sandeen, ext4 development

Jan Kara <jack@suse.cz> writes:

>> > Eric Sandeen <sandeen@redhat.com> writes:
>> > 
>> > > Dmitry Monakhov wrote:
>> > >> Eric Sandeen <sandeen@redhat.com> writes:
>> > >> 
>> > >>> Because we can badly over-reserve metadata when we
>> > >>> calculate worst-case, it complicates things for quota, since
>> > >>> we must reserve and then claim later, retry on EDQUOT, etc.
>> > >>> Quota is also a generally smaller pool than fs free blocks,
>> > >>> so this over-reservation hurts more, and more often.
>> > >>>
>> > >>> I'm of the opinion that it's not the worst thing to allow
>> > >>> metadata to push a user slightly over quota.  This simplifies
>> > >>> the code and avoids the false quota rejections that result
>> > >>> from worst-case speculation.
>> > >> Hm.. Totally agree with issue description. And seem there is no another
>> > >> solution except yours.
>> > >> ASAIU alloc_nofail is called from places where it is impossible to fail
>> > >> an allocation even if something goes wrong.
>> > >> I ask because currently i'm working on EIO handling in alloc/free calls.
>> > >> I've found that it is useless to fail claim/free procedures because
>> > >> caller is unable to handle it properly.
>> > >> It is impossible to fail following operation
>> > >> ->writepage
>> > >>  ->dquot_claim_space (what to do if EIO happens?)
>> > >
>> > > Hm, if these start returning EIO then maybe my patch should be modified
>> > > to treat EDQUOT differently than EIO ... assuming callers can handle
>> > > the return at all.
>> > >
>> > > In other words, make NOFAIL really just mean "don't fail for EDQUOT"
>> > Yes. agree So we have two types of errors
>> > 1) expected errors: EDQUOT
>> > 2) fatal errors: (EIO/ENOSPC/ENOMEM)
>> > So we need two types of flags:
>> > 1)FORCE (IMHO it is better name than you proposed) to allow exceed a
>> >   quota limit
>> > 2)NOFAIL to allow ignore fatal errors.
>> > 
>> > We still need NOFAIL, because for example if something is happens in
>> > ->write_page()
>> >  ->dquot_claim()
>> >      update_quota() -> EIO  /* update disk quota */
>> >      update_bytes() /* update i_bytes count */
>> > It is obvious that write_page should fail because it is too late to
>> > return the error to userspace, so data will probably lost which
>> > is much more dramatic bug than quota inconsistency.
>> > So the only options we have is to:
>> > 1) Do not modify inode->i_bytes and return error which caller will
>> >    probably ignore. IMHO this is not good because result in
>> >    incorrect stat()
>> > 
>> > 2) do as much as we can (as it happens for now), modify inode->i_bytes
>> >    and return positive error code to caller.(which signal what error
>> >    result in quota inconsystency only)
>>   Yes, agreed that 2) is a better solution.
>> 
>> > This fatal errors handling logic i'll post on top of your patch-set.
>> > But please change flag name from NOFAIL to FORCE.
>>   Hmm, do we really need to distinguish between your NOFAIL and FORCE?
>> I mean there are places where we can handle quota failures (both EDQUOT
>> or others) and places where we cannot and then we just want to go on as
>> seamlessly as possible. So NOFAIL flag seems to be enough...
>>   Now I agree that in theory there can be some caller which might wish
>> to seamlessly continue on EDQUOT and bail out on EIO but I'm not aware
>> of such callsite currently so there's no immediate need for the flag.
>> So Eric's patches seem to be fine to me as they are. What do you think?
>   Maybe one more point - the callsite in ext4 that uses _nofail variant
> really needs full NOFAIL semantics because we have no way of handling
> an error there.
Yes. All places which Eric changed has (NOFAIL|FORCE) semantics.
But his patch spotted other places where we can use only FORCE.


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 0/3] ext4: don't use quota reservation for speculative metadata blocks
  2010-04-12 13:33         ` Dmitry Monakhov
  2010-04-12 13:37           ` Dmitry Monakhov
@ 2010-04-12 13:42           ` tytso
  2010-04-12 13:52             ` Dmitry Monakhov
  2010-04-12 13:55           ` Jan Kara
  2 siblings, 1 reply; 26+ messages in thread
From: tytso @ 2010-04-12 13:42 UTC (permalink / raw)
  To: Dmitry Monakhov; +Cc: Jan Kara, Eric Sandeen, ext4 development

On Mon, Apr 12, 2010 at 05:33:14PM +0400, Dmitry Monakhov wrote:
> BTW I'm too familiar with cross-devel-tree process
> If tytso@ will get the patchset will you get an quota-related patches
> to linux-fs tree too? Otherwise everybody have to wait for ext4-tree
> push to linus's tree and when to linux-fs.

I've already asked Jan if he would mind my carrying the quota patches
in the ext4 tree, since I believe there's less chance of patch
collisions with upcoming changes in the quota tree than if they were
carried in the quota tree and we had to worry about changes to the
ext4 tree.

These patches are also low-risk enough (they'll either work or they
won't, and it's not hard to desk-check them for correctness) that we
could push them to Linus now before the merge window, and see if he's
willing to take them.  I have some data corruption bugfixes I need to
push to Linus anyway....  I dunno if Linus will be willing to take
them, but that's the other approach.

					- Ted

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 0/3] ext4: don't use quota reservation for speculative metadata blocks
  2010-04-12 13:42           ` tytso
@ 2010-04-12 13:52             ` Dmitry Monakhov
  0 siblings, 0 replies; 26+ messages in thread
From: Dmitry Monakhov @ 2010-04-12 13:52 UTC (permalink / raw)
  To: tytso; +Cc: Jan Kara, Eric Sandeen, ext4 development

tytso@mit.edu writes:

> On Mon, Apr 12, 2010 at 05:33:14PM +0400, Dmitry Monakhov wrote:
>> BTW I'm too familiar with cross-devel-tree process
>> If tytso@ will get the patchset will you get an quota-related patches
>> to linux-fs tree too? Otherwise everybody have to wait for ext4-tree
>> push to linus's tree and when to linux-fs.
>
> I've already asked Jan if he would mind my carrying the quota patches
> in the ext4 tree, since I believe there's less chance of patch
> collisions with upcoming changes in the quota tree than if they were
> carried in the quota tree and we had to worry about changes to the
> ext4 tree.
Yess. i do understand that both interesting in common quota-related peace.
>
> These patches are also low-risk enough (they'll either work or they
> won't, and it's not hard to desk-check them for correctness) that we
> could push them to Linus now before the merge window, and see if he's
> willing to take them.  I have some data corruption bugfixes I need to
> push to Linus anyway....  I dunno if Linus will be willing to take
> them, but that's the other approach.
>
> 					- Ted
Both options is too complex (as least for my tiny brain)
Is is possible to accept quota related patches in both(linux-fs and
ext4) tires at least in for-next branches. Seems that git is able to
detect equivalent patches via cherry-picking-like mechanisms.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 0/3] ext4: don't use quota reservation for speculative metadata blocks
  2010-04-12 13:33         ` Dmitry Monakhov
  2010-04-12 13:37           ` Dmitry Monakhov
  2010-04-12 13:42           ` tytso
@ 2010-04-12 13:55           ` Jan Kara
  2010-04-12 14:08             ` Dmitry Monakhov
  2 siblings, 1 reply; 26+ messages in thread
From: Jan Kara @ 2010-04-12 13:55 UTC (permalink / raw)
  To: Dmitry Monakhov; +Cc: Jan Kara, Eric Sandeen, ext4 development

> Jan Kara <jack@suse.cz> writes:
> 
> >> Eric Sandeen <sandeen@redhat.com> writes:
> >> 
> >> > Dmitry Monakhov wrote:
> >> >> Eric Sandeen <sandeen@redhat.com> writes:
> >> >> 
> >> >>> Because we can badly over-reserve metadata when we
> >> >>> calculate worst-case, it complicates things for quota, since
> >> >>> we must reserve and then claim later, retry on EDQUOT, etc.
> >> >>> Quota is also a generally smaller pool than fs free blocks,
> >> >>> so this over-reservation hurts more, and more often.
> >> >>>
> >> >>> I'm of the opinion that it's not the worst thing to allow
> >> >>> metadata to push a user slightly over quota.  This simplifies
> >> >>> the code and avoids the false quota rejections that result
> >> >>> from worst-case speculation.
> >> >> Hm.. Totally agree with issue description. And seem there is no another
> >> >> solution except yours.
> >> >> ASAIU alloc_nofail is called from places where it is impossible to fail
> >> >> an allocation even if something goes wrong.
> >> >> I ask because currently i'm working on EIO handling in alloc/free calls.
> >> >> I've found that it is useless to fail claim/free procedures because
> >> >> caller is unable to handle it properly.
> >> >> It is impossible to fail following operation
> >> >> ->writepage
> >> >>  ->dquot_claim_space (what to do if EIO happens?)
> >> >
> >> > Hm, if these start returning EIO then maybe my patch should be modified
> >> > to treat EDQUOT differently than EIO ... assuming callers can handle
> >> > the return at all.
> >> >
> >> > In other words, make NOFAIL really just mean "don't fail for EDQUOT"
> >> Yes. agree So we have two types of errors
> >> 1) expected errors: EDQUOT
> >> 2) fatal errors: (EIO/ENOSPC/ENOMEM)
> >> So we need two types of flags:
> >> 1)FORCE (IMHO it is better name than you proposed) to allow exceed a
> >>   quota limit
> >> 2)NOFAIL to allow ignore fatal errors.
> >> 
> >> We still need NOFAIL, because for example if something is happens in
> >> ->write_page()
> >>  ->dquot_claim()
> >>      update_quota() -> EIO  /* update disk quota */
> >>      update_bytes() /* update i_bytes count */
> >> It is obvious that write_page should fail because it is too late to
> >> return the error to userspace, so data will probably lost which
> >> is much more dramatic bug than quota inconsistency.
> >> So the only options we have is to:
> >> 1) Do not modify inode->i_bytes and return error which caller will
> >>    probably ignore. IMHO this is not good because result in
> >>    incorrect stat()
> >> 
> >> 2) do as much as we can (as it happens for now), modify inode->i_bytes
> >>    and return positive error code to caller.(which signal what error
> >>    result in quota inconsystency only)
> >   Yes, agreed that 2) is a better solution.
> >
> >> This fatal errors handling logic i'll post on top of your patch-set.
> >> But please change flag name from NOFAIL to FORCE.
> >   Hmm, do we really need to distinguish between your NOFAIL and FORCE?
> > I mean there are places where we can handle quota failures (both EDQUOT
> > or others) and places where we cannot and then we just want to go on as
> > seamlessly as possible. So NOFAIL flag seems to be enough...
> >   Now I agree that in theory there can be some caller which might wish
> > to seamlessly continue on EDQUOT and bail out on EIO but I'm not aware
> > of such callsite currently so there's no immediate need for the flag.
> > So Eric's patches seem to be fine to me as they are. What do you think?
> FORCE but without NOFAIL will be useful for example in
> write to fallocated area
> extent split/convert (uninitialized=>initialized conversion)
> It is not good idea to return EDQUOT from write to reserved area
> due to metadata overhead, but it is easy to handle EIO from that method.
> So IMHO two flags is not a fancy option, but reasonable design solution.
> This design confirms to right for openvz's userbean counters.
> The only thing i ask is to rename NOFAIL => FORCE.
  But as I wrote in my other email that callsite in ext4 really needs NOFAIL,
not only FORCE as far as I can see. So Eric's patch is actually right, isn't
it?

> BTW I'm too familiar with cross-devel-tree process
> If tytso@ will get the patchset will you get an quota-related patches
> to linux-fs tree too? Otherwise everybody have to wait for ext4-tree
> push to linus's tree and when to linux-fs.
  Yes, I can carry them in my tree as well (I think git is clever enough to
recognize identical patches in two different trees and do not conflict).
Only I can push the changes dependent on them only after Ted pushes his
tree to Linus.

								Honza
-- 
Jan Kara <jack@suse.cz>
SuSE CR Labs

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 0/3] ext4: don't use quota reservation for speculative metadata blocks
  2010-04-12  2:20 ` tytso
@ 2010-04-12 14:03   ` Jan Kara
  0 siblings, 0 replies; 26+ messages in thread
From: Jan Kara @ 2010-04-12 14:03 UTC (permalink / raw)
  To: tytso; +Cc: Eric Sandeen, ext4 development, Jan Kara

> On Wed, Apr 07, 2010 at 04:45:52PM -0500, Eric Sandeen wrote:
> > Because we can badly over-reserve metadata when we
> > calculate worst-case, it complicates things for quota, since
> > we must reserve and then claim later, retry on EDQUOT, etc.
> > Quota is also a generally smaller pool than fs free blocks,
> > so this over-reservation hurts more, and more often.
> > 
> > I'm of the opinion that it's not the worst thing to allow
> > metadata to push a user slightly over quota.  This simplifies
> > the code and avoids the false quota rejections that result
> > from worst-case speculation.
> 
> This patch series looks good to me in general; Jan, it requires
> relatively minor changes to the quota system, so it would be good to
> get your Acked-by for the first two patches.  Since the changes to the
> ext4 layer are more in-depth, any objections if I carry all three
> patches in the ext4 tree?
  Yes, I'm fine with you carrying the quota patches. I've already sent
my acks.

								Honza
-- 
Jan Kara <jack@suse.cz>
SuSE CR Labs

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 0/3] ext4: don't use quota reservation for speculative metadata blocks
  2010-04-12 13:55           ` Jan Kara
@ 2010-04-12 14:08             ` Dmitry Monakhov
  2010-04-12 14:24               ` Jan Kara
  0 siblings, 1 reply; 26+ messages in thread
From: Dmitry Monakhov @ 2010-04-12 14:08 UTC (permalink / raw)
  To: Jan Kara; +Cc: Eric Sandeen, ext4 development

Jan Kara <jack@suse.cz> writes:

>> Jan Kara <jack@suse.cz> writes:
>> 
>> >> Eric Sandeen <sandeen@redhat.com> writes:
>> >> 
>> >> > Dmitry Monakhov wrote:
>> >> >> Eric Sandeen <sandeen@redhat.com> writes:
>> >> >> 
>> >> >>> Because we can badly over-reserve metadata when we
>> >> >>> calculate worst-case, it complicates things for quota, since
>> >> >>> we must reserve and then claim later, retry on EDQUOT, etc.
>> >> >>> Quota is also a generally smaller pool than fs free blocks,
>> >> >>> so this over-reservation hurts more, and more often.
>> >> >>>
>> >> >>> I'm of the opinion that it's not the worst thing to allow
>> >> >>> metadata to push a user slightly over quota.  This simplifies
>> >> >>> the code and avoids the false quota rejections that result
>> >> >>> from worst-case speculation.
>> >> >> Hm.. Totally agree with issue description. And seem there is no another
>> >> >> solution except yours.
>> >> >> ASAIU alloc_nofail is called from places where it is impossible to fail
>> >> >> an allocation even if something goes wrong.
>> >> >> I ask because currently i'm working on EIO handling in alloc/free calls.
>> >> >> I've found that it is useless to fail claim/free procedures because
>> >> >> caller is unable to handle it properly.
>> >> >> It is impossible to fail following operation
>> >> >> ->writepage
>> >> >>  ->dquot_claim_space (what to do if EIO happens?)
>> >> >
>> >> > Hm, if these start returning EIO then maybe my patch should be modified
>> >> > to treat EDQUOT differently than EIO ... assuming callers can handle
>> >> > the return at all.
>> >> >
>> >> > In other words, make NOFAIL really just mean "don't fail for EDQUOT"
>> >> Yes. agree So we have two types of errors
>> >> 1) expected errors: EDQUOT
>> >> 2) fatal errors: (EIO/ENOSPC/ENOMEM)
>> >> So we need two types of flags:
>> >> 1)FORCE (IMHO it is better name than you proposed) to allow exceed a
>> >>   quota limit
>> >> 2)NOFAIL to allow ignore fatal errors.
>> >> 
>> >> We still need NOFAIL, because for example if something is happens in
>> >> ->write_page()
>> >>  ->dquot_claim()
>> >>      update_quota() -> EIO  /* update disk quota */
>> >>      update_bytes() /* update i_bytes count */
>> >> It is obvious that write_page should fail because it is too late to
>> >> return the error to userspace, so data will probably lost which
>> >> is much more dramatic bug than quota inconsistency.
>> >> So the only options we have is to:
>> >> 1) Do not modify inode->i_bytes and return error which caller will
>> >>    probably ignore. IMHO this is not good because result in
>> >>    incorrect stat()
>> >> 
>> >> 2) do as much as we can (as it happens for now), modify inode->i_bytes
>> >>    and return positive error code to caller.(which signal what error
>> >>    result in quota inconsystency only)
>> >   Yes, agreed that 2) is a better solution.
>> >
>> >> This fatal errors handling logic i'll post on top of your patch-set.
>> >> But please change flag name from NOFAIL to FORCE.
>> >   Hmm, do we really need to distinguish between your NOFAIL and FORCE?
>> > I mean there are places where we can handle quota failures (both EDQUOT
>> > or others) and places where we cannot and then we just want to go on as
>> > seamlessly as possible. So NOFAIL flag seems to be enough...
>> >   Now I agree that in theory there can be some caller which might wish
>> > to seamlessly continue on EDQUOT and bail out on EIO but I'm not aware
>> > of such callsite currently so there's no immediate need for the flag.
>> > So Eric's patches seem to be fine to me as they are. What do you think?
>> FORCE but without NOFAIL will be useful for example in
>> write to fallocated area
>> extent split/convert (uninitialized=>initialized conversion)
>> It is not good idea to return EDQUOT from write to reserved area
>> due to metadata overhead, but it is easy to handle EIO from that method.
>> So IMHO two flags is not a fancy option, but reasonable design solution.
>> This design confirms to right for openvz's userbean counters.
>> The only thing i ask is to rename NOFAIL => FORCE.
>   But as I wrote in my other email that callsite in ext4 really needs NOFAIL,
> not only FORCE as far as I can see. So Eric's patch is actually right, isn't
> it?
Ohh. Seems we are talking about the same thing but my explanation is
not clear (sorry for my crappy English)
The whole patch is definitely right. All places changed has NOFAIL
(it must succeed in 100% of cases)semantics. And it is reasonable to
accept it as is.
Later i'll add new FORCE flag and:
1) convert all NOFAIL callers to (FORCE|NOFAIL)
2) add new callers of FORCE.
>
>> BTW I'm too familiar with cross-devel-tree process
>> If tytso@ will get the patchset will you get an quota-related patches
>> to linux-fs tree too? Otherwise everybody have to wait for ext4-tree
>> push to linus's tree and when to linux-fs.
>   Yes, I can carry them in my tree as well (I think git is clever enough to
> recognize identical patches in two different trees and do not conflict).
> Only I can push the changes dependent on them only after Ted pushes his
> tree to Linus.
>
> 								Honza

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 0/3] ext4: don't use quota reservation for speculative metadata blocks
  2010-04-12 14:08             ` Dmitry Monakhov
@ 2010-04-12 14:24               ` Jan Kara
  2010-04-12 15:50                 ` Eric Sandeen
  0 siblings, 1 reply; 26+ messages in thread
From: Jan Kara @ 2010-04-12 14:24 UTC (permalink / raw)
  To: Dmitry Monakhov; +Cc: Jan Kara, Eric Sandeen, ext4 development

On Mon 12-04-10 18:08:10, Dmitry Monakhov wrote:
> Jan Kara <jack@suse.cz> writes:
> 
> >> Jan Kara <jack@suse.cz> writes:
> >> 
> >> >> Eric Sandeen <sandeen@redhat.com> writes:
> >> >> 
> >> >> > Dmitry Monakhov wrote:
> >> >> >> Eric Sandeen <sandeen@redhat.com> writes:
> >> >> >> 
> >> >> >>> Because we can badly over-reserve metadata when we
> >> >> >>> calculate worst-case, it complicates things for quota, since
> >> >> >>> we must reserve and then claim later, retry on EDQUOT, etc.
> >> >> >>> Quota is also a generally smaller pool than fs free blocks,
> >> >> >>> so this over-reservation hurts more, and more often.
> >> >> >>>
> >> >> >>> I'm of the opinion that it's not the worst thing to allow
> >> >> >>> metadata to push a user slightly over quota.  This simplifies
> >> >> >>> the code and avoids the false quota rejections that result
> >> >> >>> from worst-case speculation.
> >> >> >> Hm.. Totally agree with issue description. And seem there is no another
> >> >> >> solution except yours.
> >> >> >> ASAIU alloc_nofail is called from places where it is impossible to fail
> >> >> >> an allocation even if something goes wrong.
> >> >> >> I ask because currently i'm working on EIO handling in alloc/free calls.
> >> >> >> I've found that it is useless to fail claim/free procedures because
> >> >> >> caller is unable to handle it properly.
> >> >> >> It is impossible to fail following operation
> >> >> >> ->writepage
> >> >> >>  ->dquot_claim_space (what to do if EIO happens?)
> >> >> >
> >> >> > Hm, if these start returning EIO then maybe my patch should be modified
> >> >> > to treat EDQUOT differently than EIO ... assuming callers can handle
> >> >> > the return at all.
> >> >> >
> >> >> > In other words, make NOFAIL really just mean "don't fail for EDQUOT"
> >> >> Yes. agree So we have two types of errors
> >> >> 1) expected errors: EDQUOT
> >> >> 2) fatal errors: (EIO/ENOSPC/ENOMEM)
> >> >> So we need two types of flags:
> >> >> 1)FORCE (IMHO it is better name than you proposed) to allow exceed a
> >> >>   quota limit
> >> >> 2)NOFAIL to allow ignore fatal errors.
> >> >> 
> >> >> We still need NOFAIL, because for example if something is happens in
> >> >> ->write_page()
> >> >>  ->dquot_claim()
> >> >>      update_quota() -> EIO  /* update disk quota */
> >> >>      update_bytes() /* update i_bytes count */
> >> >> It is obvious that write_page should fail because it is too late to
> >> >> return the error to userspace, so data will probably lost which
> >> >> is much more dramatic bug than quota inconsistency.
> >> >> So the only options we have is to:
> >> >> 1) Do not modify inode->i_bytes and return error which caller will
> >> >>    probably ignore. IMHO this is not good because result in
> >> >>    incorrect stat()
> >> >> 
> >> >> 2) do as much as we can (as it happens for now), modify inode->i_bytes
> >> >>    and return positive error code to caller.(which signal what error
> >> >>    result in quota inconsystency only)
> >> >   Yes, agreed that 2) is a better solution.
> >> >
> >> >> This fatal errors handling logic i'll post on top of your patch-set.
> >> >> But please change flag name from NOFAIL to FORCE.
> >> >   Hmm, do we really need to distinguish between your NOFAIL and FORCE?
> >> > I mean there are places where we can handle quota failures (both EDQUOT
> >> > or others) and places where we cannot and then we just want to go on as
> >> > seamlessly as possible. So NOFAIL flag seems to be enough...
> >> >   Now I agree that in theory there can be some caller which might wish
> >> > to seamlessly continue on EDQUOT and bail out on EIO but I'm not aware
> >> > of such callsite currently so there's no immediate need for the flag.
> >> > So Eric's patches seem to be fine to me as they are. What do you think?
> >> FORCE but without NOFAIL will be useful for example in
> >> write to fallocated area
> >> extent split/convert (uninitialized=>initialized conversion)
> >> It is not good idea to return EDQUOT from write to reserved area
> >> due to metadata overhead, but it is easy to handle EIO from that method.
> >> So IMHO two flags is not a fancy option, but reasonable design solution.
> >> This design confirms to right for openvz's userbean counters.
> >> The only thing i ask is to rename NOFAIL => FORCE.
> >   But as I wrote in my other email that callsite in ext4 really needs NOFAIL,
> > not only FORCE as far as I can see. So Eric's patch is actually right, isn't
> > it?
> Ohh. Seems we are talking about the same thing but my explanation is
> not clear (sorry for my crappy English)
> The whole patch is definitely right. All places changed has NOFAIL
> (it must succeed in 100% of cases)semantics. And it is reasonable to
> accept it as is.
> Later i'll add new FORCE flag and:
> 1) convert all NOFAIL callers to (FORCE|NOFAIL)
> 2) add new callers of FORCE.
  OK :) This is fine with me. I'd just slightly favour "NOLIMIT" instead
of "FORCE" since it explains better what the flag does.

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 0/3] ext4: don't use quota reservation for speculative metadata blocks
  2010-04-12 14:24               ` Jan Kara
@ 2010-04-12 15:50                 ` Eric Sandeen
  0 siblings, 0 replies; 26+ messages in thread
From: Eric Sandeen @ 2010-04-12 15:50 UTC (permalink / raw)
  To: Jan Kara; +Cc: Dmitry Monakhov, ext4 development

Jan Kara wrote:
> On Mon 12-04-10 18:08:10, Dmitry Monakhov wrote:
>> Jan Kara <jack@suse.cz> writes:
>>
>>>> Jan Kara <jack@suse.cz> writes:
>>>>
>>>>>> Eric Sandeen <sandeen@redhat.com> writes:
>>>>>>
>>>>>>> Dmitry Monakhov wrote:
>>>>>>>> Eric Sandeen <sandeen@redhat.com> writes:
>>>>>>>>
>>>>>>>>> Because we can badly over-reserve metadata when we
>>>>>>>>> calculate worst-case, it complicates things for quota, since
>>>>>>>>> we must reserve and then claim later, retry on EDQUOT, etc.
>>>>>>>>> Quota is also a generally smaller pool than fs free blocks,
>>>>>>>>> so this over-reservation hurts more, and more often.
>>>>>>>>>
>>>>>>>>> I'm of the opinion that it's not the worst thing to allow
>>>>>>>>> metadata to push a user slightly over quota.  This simplifies
>>>>>>>>> the code and avoids the false quota rejections that result
>>>>>>>>> from worst-case speculation.
>>>>>>>> Hm.. Totally agree with issue description. And seem there is no another
>>>>>>>> solution except yours.
>>>>>>>> ASAIU alloc_nofail is called from places where it is impossible to fail
>>>>>>>> an allocation even if something goes wrong.
>>>>>>>> I ask because currently i'm working on EIO handling in alloc/free calls.
>>>>>>>> I've found that it is useless to fail claim/free procedures because
>>>>>>>> caller is unable to handle it properly.
>>>>>>>> It is impossible to fail following operation
>>>>>>>> ->writepage
>>>>>>>>  ->dquot_claim_space (what to do if EIO happens?)
>>>>>>> Hm, if these start returning EIO then maybe my patch should be modified
>>>>>>> to treat EDQUOT differently than EIO ... assuming callers can handle
>>>>>>> the return at all.
>>>>>>>
>>>>>>> In other words, make NOFAIL really just mean "don't fail for EDQUOT"
>>>>>> Yes. agree So we have two types of errors
>>>>>> 1) expected errors: EDQUOT
>>>>>> 2) fatal errors: (EIO/ENOSPC/ENOMEM)
>>>>>> So we need two types of flags:
>>>>>> 1)FORCE (IMHO it is better name than you proposed) to allow exceed a
>>>>>>   quota limit
>>>>>> 2)NOFAIL to allow ignore fatal errors.
>>>>>>
>>>>>> We still need NOFAIL, because for example if something is happens in
>>>>>> ->write_page()
>>>>>>  ->dquot_claim()
>>>>>>      update_quota() -> EIO  /* update disk quota */
>>>>>>      update_bytes() /* update i_bytes count */
>>>>>> It is obvious that write_page should fail because it is too late to
>>>>>> return the error to userspace, so data will probably lost which
>>>>>> is much more dramatic bug than quota inconsistency.
>>>>>> So the only options we have is to:
>>>>>> 1) Do not modify inode->i_bytes and return error which caller will
>>>>>>    probably ignore. IMHO this is not good because result in
>>>>>>    incorrect stat()
>>>>>>
>>>>>> 2) do as much as we can (as it happens for now), modify inode->i_bytes
>>>>>>    and return positive error code to caller.(which signal what error
>>>>>>    result in quota inconsystency only)
>>>>>   Yes, agreed that 2) is a better solution.
>>>>>
>>>>>> This fatal errors handling logic i'll post on top of your patch-set.
>>>>>> But please change flag name from NOFAIL to FORCE.
>>>>>   Hmm, do we really need to distinguish between your NOFAIL and FORCE?
>>>>> I mean there are places where we can handle quota failures (both EDQUOT
>>>>> or others) and places where we cannot and then we just want to go on as
>>>>> seamlessly as possible. So NOFAIL flag seems to be enough...
>>>>>   Now I agree that in theory there can be some caller which might wish
>>>>> to seamlessly continue on EDQUOT and bail out on EIO but I'm not aware
>>>>> of such callsite currently so there's no immediate need for the flag.
>>>>> So Eric's patches seem to be fine to me as they are. What do you think?
>>>> FORCE but without NOFAIL will be useful for example in
>>>> write to fallocated area
>>>> extent split/convert (uninitialized=>initialized conversion)
>>>> It is not good idea to return EDQUOT from write to reserved area
>>>> due to metadata overhead, but it is easy to handle EIO from that method.
>>>> So IMHO two flags is not a fancy option, but reasonable design solution.
>>>> This design confirms to right for openvz's userbean counters.
>>>> The only thing i ask is to rename NOFAIL => FORCE.
>>>   But as I wrote in my other email that callsite in ext4 really needs NOFAIL,
>>> not only FORCE as far as I can see. So Eric's patch is actually right, isn't
>>> it?
>> Ohh. Seems we are talking about the same thing but my explanation is
>> not clear (sorry for my crappy English)
>> The whole patch is definitely right. All places changed has NOFAIL
>> (it must succeed in 100% of cases)semantics. And it is reasonable to
>> accept it as is.
>> Later i'll add new FORCE flag and:
>> 1) convert all NOFAIL callers to (FORCE|NOFAIL)
>> 2) add new callers of FORCE.
>   OK :) This is fine with me. I'd just slightly favour "NOLIMIT" instead
> of "FORCE" since it explains better what the flag does.

Chiming in at the end of a long thread, I'll just say I like NOLIMIT better, too :)

-Eric

> 								Honza


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/3] quota: use flags interface for dquot alloc/free space
  2010-04-07 21:49 ` [PATCH 1/3] quota: use flags interface for dquot alloc/free space Eric Sandeen
  2010-04-12 13:22   ` Jan Kara
@ 2010-05-20 22:29   ` tytso
  1 sibling, 0 replies; 26+ messages in thread
From: tytso @ 2010-05-20 22:29 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: ext4 development

On Wed, Apr 07, 2010 at 04:49:05PM -0500, Eric Sandeen wrote:
> Switch __dquot_alloc_space and __dquot_free_space to take flags
> to indicate whether to warn and/or to reserve (or free reserve).
> 
> This is slightly more readable at the callpoints, and makes it
> cleaner to add a "nofail" option in the next patch.
> 
> Signed-off-by: Eric Sandeen <sandeen@redhat.com>

Applied to the ext4 patch queue, apologies for the delay.

	       	    	  	 - Ted

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 2/3] quota: add the option to not fail with EDQUOT in block allocation
  2010-04-07 21:52 ` [PATCH 2/3] quota: add the option to not fail with EDQUOT in block allocation Eric Sandeen
  2010-04-12 13:23   ` Jan Kara
@ 2010-05-20 22:32   ` tytso
  1 sibling, 0 replies; 26+ messages in thread
From: tytso @ 2010-05-20 22:32 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: ext4 development

On Wed, Apr 07, 2010 at 04:52:28PM -0500, Eric Sandeen wrote:
> To simplify metadata tracking for delalloc writes, ext4
> will simply claim metadata blocks at allocation time, without
> first speculatively reserving the worst case and then freeing
> what was not used.
> 
> To do this, we need a mechanism to track allocations in
> the quota subsystem, but potentially allow that allocation
> to actually go over quota.
> 
> This patch adds a DQUOT_SPACE_NOFAIL flag and function
> variants for this purpose.
> 
> Signed-off-by: Eric Sandeen <sandeen@redhat.com>

Applied to the ext4 patch queue.

					- Ted

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 3/3] ext4: don't use quota reservation for speculative metadata blocks
  2010-04-07 21:55 ` [PATCH 3/3] ext4: don't use quota reservation for speculative metadata blocks Eric Sandeen
  2010-04-12 13:36   ` Jan Kara
@ 2010-05-20 23:10   ` tytso
  1 sibling, 0 replies; 26+ messages in thread
From: tytso @ 2010-05-20 23:10 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: ext4 development

On Wed, Apr 07, 2010 at 04:55:52PM -0500, Eric Sandeen wrote:
> Because we can badly over-reserve metadata when we
> calculate worst-case, it complicates things for quota, since
> we must reserve and then claim later, retry on EDQUOT, etc.
> Quota is also a generally smaller pool than fs free blocks,
> so this over-reservation hurts more, and more often.
> 
> I'm of the opinion that it's not the worst thing to allow
> metadata to push a user slightly over quota.  This simplifies
> the code and avoids the false quota rejections that result
> from worst-case speculation.
> 
> This patch stops the speculative quota-charging for
> worst-case metadata requirements, and just charges quota
> when the blocks are allocated at writeout.  It also is
> able to remove the try-again loop on EDQUOT.
> 
> This patch has been tested indirectly by running the xfstests
> suite with a hack to mount & enable quota prior to the test.
> 
> I also did a more specific test of fragmenting freespace
> and then doing a large delalloc write under quota; quota
> stopped me at the right amount of file IO, and then the
> writeout generated enough metadata (due to the fragmentation)
> that it put me slightly over quota, as expected.
> 
> Signed-off-by: Eric Sandeen <sandeen@redhat.com>

Applied to the ext4 patch queue; there were slight adjustments to make
the patch apply.

Note: I did some testing of this patch, and found the following
warnings getting emitted when I ran fsstress:

fsstress -d /scratch/fsstress -n 100 -p 10

[   93.338647] JBD: Spotted dirty metadata buffer (dev = sdc1, blocknr = 594944). There's a risk of filesystem corruption in case of system crash.
[   93.390107] JBD: Spotted dirty metadata buffer (dev = sdc1, blocknr = 594944). There's a risk of filesystem corruption in case of system crash.
[  149.249765] JBD: Spotted dirty metadata buffer (dev = sdc1, blocknr = 594142). There's a risk of filesystem corruption in case of system crash.
[  149.305717] JBD: Spotted dirty metadata buffer (dev = sdc1, blocknr = 594142). There's a risk of filesystem corruption in case of system crash.
[  152.523048] JBD: Spotted dirty metadata buffer (dev = sdc1, blocknr = 590048). There's a risk of filesystem corruption in case of system crash.
[  152.544641] JBD: Spotted dirty metadata buffer (dev = sdc1, blocknr = 590048). There's a risk of filesystem corruption in case of system crash.
[  154.147876] JBD: Spotted dirty metadata buffer (dev = sdc1, blocknr = 590049). There's a risk of filesystem corruption in case of system crash.
[  154.192645] JBD: Spotted dirty metadata buffer (dev = sdc1, blocknr = 590049). There's a risk of filesystem corruption in case of system crash.

... but I found the same warning messages after I removed your patch,
so presumably this is a pre-existing condition with quotas and ext4.
Has anyone looked into this?

					- Ted

^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2010-05-20 23:10 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-04-07 21:45 [PATCH 0/3] ext4: don't use quota reservation for speculative metadata blocks Eric Sandeen
2010-04-07 21:49 ` [PATCH 1/3] quota: use flags interface for dquot alloc/free space Eric Sandeen
2010-04-12 13:22   ` Jan Kara
2010-05-20 22:29   ` tytso
2010-04-07 21:52 ` [PATCH 2/3] quota: add the option to not fail with EDQUOT in block allocation Eric Sandeen
2010-04-12 13:23   ` Jan Kara
2010-05-20 22:32   ` tytso
2010-04-07 21:55 ` [PATCH 3/3] ext4: don't use quota reservation for speculative metadata blocks Eric Sandeen
2010-04-12 13:36   ` Jan Kara
2010-05-20 23:10   ` tytso
2010-04-08  8:20 ` [PATCH 0/3] " Dmitry Monakhov
2010-04-08 15:28   ` Eric Sandeen
2010-04-11  9:37     ` Dmitry Monakhov
2010-04-12 13:15       ` Jan Kara
2010-04-12 13:22         ` Jan Kara
2010-04-12 13:40           ` Dmitry Monakhov
2010-04-12 13:33         ` Dmitry Monakhov
2010-04-12 13:37           ` Dmitry Monakhov
2010-04-12 13:42           ` tytso
2010-04-12 13:52             ` Dmitry Monakhov
2010-04-12 13:55           ` Jan Kara
2010-04-12 14:08             ` Dmitry Monakhov
2010-04-12 14:24               ` Jan Kara
2010-04-12 15:50                 ` Eric Sandeen
2010-04-12  2:20 ` tytso
2010-04-12 14:03   ` Jan Kara

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.