All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] tmpfs: truncate prealloc blocks past i_size
@ 2014-10-29 17:10 ` Josef Bacik
  0 siblings, 0 replies; 8+ messages in thread
From: Josef Bacik @ 2014-10-29 17:10 UTC (permalink / raw)
  To: hughd, linux-mm, linux-fsdevel

One of the rocksdb people noticed that when you do something like this

fallocate(fd, FALLOC_FL_KEEP_SIZE, 0, 10M)
pwrite(fd, buf, 5M, 0)
ftruncate(5M)

on tmpfs the file would still take up 10M, which lead to super fun issues
because we were getting ENOSPC before we thought we should be getting ENOSPC.
This patch fixes the problem, and mirrors what all the other fs'es do.  I tested
it locally to make sure it worked properly with the following

xfs_io -f -c "falloc -k 0 10M" -c "pwrite 0 5M" -c "truncate 5M" file

Without the patch we have "Blocks: 20480", with the patch we have the correct
value of "Blocks: 10240".  Thanks,

Signed-off-by: Josef Bacik <jbacik@fb.com>
---
 mm/shmem.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/shmem.c b/mm/shmem.c
index 185836b..79b7fb5 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -574,7 +574,7 @@ static int shmem_setattr(struct dentry *dentry, struct iattr *attr)
 			i_size_write(inode, newsize);
 			inode->i_ctime = inode->i_mtime = CURRENT_TIME;
 		}
-		if (newsize < oldsize) {
+		if (newsize <= oldsize) {
 			loff_t holebegin = round_up(newsize, PAGE_SIZE);
 			unmap_mapping_range(inode->i_mapping, holebegin, 0, 1);
 			shmem_truncate_range(inode, newsize, (loff_t)-1);
-- 
1.8.3.1

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH] tmpfs: truncate prealloc blocks past i_size
@ 2014-10-29 17:10 ` Josef Bacik
  0 siblings, 0 replies; 8+ messages in thread
From: Josef Bacik @ 2014-10-29 17:10 UTC (permalink / raw)
  To: hughd, linux-mm, linux-fsdevel

One of the rocksdb people noticed that when you do something like this

fallocate(fd, FALLOC_FL_KEEP_SIZE, 0, 10M)
pwrite(fd, buf, 5M, 0)
ftruncate(5M)

on tmpfs the file would still take up 10M, which lead to super fun issues
because we were getting ENOSPC before we thought we should be getting ENOSPC.
This patch fixes the problem, and mirrors what all the other fs'es do.  I tested
it locally to make sure it worked properly with the following

xfs_io -f -c "falloc -k 0 10M" -c "pwrite 0 5M" -c "truncate 5M" file

Without the patch we have "Blocks: 20480", with the patch we have the correct
value of "Blocks: 10240".  Thanks,

Signed-off-by: Josef Bacik <jbacik@fb.com>
---
 mm/shmem.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/shmem.c b/mm/shmem.c
index 185836b..79b7fb5 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -574,7 +574,7 @@ static int shmem_setattr(struct dentry *dentry, struct iattr *attr)
 			i_size_write(inode, newsize);
 			inode->i_ctime = inode->i_mtime = CURRENT_TIME;
 		}
-		if (newsize < oldsize) {
+		if (newsize <= oldsize) {
 			loff_t holebegin = round_up(newsize, PAGE_SIZE);
 			unmap_mapping_range(inode->i_mapping, holebegin, 0, 1);
 			shmem_truncate_range(inode, newsize, (loff_t)-1);
-- 
1.8.3.1

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH] tmpfs: truncate prealloc blocks past i_size
  2014-10-29 17:10 ` Josef Bacik
@ 2014-11-04  1:30   ` Hugh Dickins
  -1 siblings, 0 replies; 8+ messages in thread
From: Hugh Dickins @ 2014-11-04  1:30 UTC (permalink / raw)
  To: Josef Bacik; +Cc: hughd, linux-mm, linux-fsdevel

On Wed, 29 Oct 2014, Josef Bacik wrote:

> One of the rocksdb people noticed that when you do something like this
> 
> fallocate(fd, FALLOC_FL_KEEP_SIZE, 0, 10M)
> pwrite(fd, buf, 5M, 0)
> ftruncate(5M)
> 
> on tmpfs the file would still take up 10M, which lead to super fun issues
> because we were getting ENOSPC before we thought we should be getting ENOSPC.
> This patch fixes the problem, and mirrors what all the other fs'es do.  I tested
> it locally to make sure it worked properly with the following
> 
> xfs_io -f -c "falloc -k 0 10M" -c "pwrite 0 5M" -c "truncate 5M" file
> 
> Without the patch we have "Blocks: 20480", with the patch we have the correct
> value of "Blocks: 10240".  Thanks,
> 
> Signed-off-by: Josef Bacik <jbacik@fb.com>

That is a very good catch, and thank you for the patch.  But I am not
convinced that the patch is correct - even if it does happen to end
up doing what other filesystems do here (I haven't checked).

Your patch makes it look like a fix to an off-by-one, but that is
not really the case.  What if you change your final ftruncate(5M)
to ftruncate(6M): what should happen then?

My intuition says that what should happen is that i_size is set to 6M,
and the fallocated excess blocks beyond 6M be trimmed off: so that
it's both an extending and a shrinking truncate at the same time.
And I think that behavior would be served by removing the
"if (newsize < oldsize)" condition completely.

But perhaps I'm wrong: can you or anyone shed more light on this,
or point to documentation of what should happen in these cases?

Thanks,
Hugh

> ---
>  mm/shmem.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/mm/shmem.c b/mm/shmem.c
> index 185836b..79b7fb5 100644
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -574,7 +574,7 @@ static int shmem_setattr(struct dentry *dentry, struct iattr *attr)
>  			i_size_write(inode, newsize);
>  			inode->i_ctime = inode->i_mtime = CURRENT_TIME;
>  		}
> -		if (newsize < oldsize) {
> +		if (newsize <= oldsize) {
>  			loff_t holebegin = round_up(newsize, PAGE_SIZE);
>  			unmap_mapping_range(inode->i_mapping, holebegin, 0, 1);
>  			shmem_truncate_range(inode, newsize, (loff_t)-1);
> -- 
> 1.8.3.1

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] tmpfs: truncate prealloc blocks past i_size
@ 2014-11-04  1:30   ` Hugh Dickins
  0 siblings, 0 replies; 8+ messages in thread
From: Hugh Dickins @ 2014-11-04  1:30 UTC (permalink / raw)
  To: Josef Bacik; +Cc: hughd, linux-mm, linux-fsdevel

On Wed, 29 Oct 2014, Josef Bacik wrote:

> One of the rocksdb people noticed that when you do something like this
> 
> fallocate(fd, FALLOC_FL_KEEP_SIZE, 0, 10M)
> pwrite(fd, buf, 5M, 0)
> ftruncate(5M)
> 
> on tmpfs the file would still take up 10M, which lead to super fun issues
> because we were getting ENOSPC before we thought we should be getting ENOSPC.
> This patch fixes the problem, and mirrors what all the other fs'es do.  I tested
> it locally to make sure it worked properly with the following
> 
> xfs_io -f -c "falloc -k 0 10M" -c "pwrite 0 5M" -c "truncate 5M" file
> 
> Without the patch we have "Blocks: 20480", with the patch we have the correct
> value of "Blocks: 10240".  Thanks,
> 
> Signed-off-by: Josef Bacik <jbacik@fb.com>

That is a very good catch, and thank you for the patch.  But I am not
convinced that the patch is correct - even if it does happen to end
up doing what other filesystems do here (I haven't checked).

Your patch makes it look like a fix to an off-by-one, but that is
not really the case.  What if you change your final ftruncate(5M)
to ftruncate(6M): what should happen then?

My intuition says that what should happen is that i_size is set to 6M,
and the fallocated excess blocks beyond 6M be trimmed off: so that
it's both an extending and a shrinking truncate at the same time.
And I think that behavior would be served by removing the
"if (newsize < oldsize)" condition completely.

But perhaps I'm wrong: can you or anyone shed more light on this,
or point to documentation of what should happen in these cases?

Thanks,
Hugh

> ---
>  mm/shmem.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/mm/shmem.c b/mm/shmem.c
> index 185836b..79b7fb5 100644
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -574,7 +574,7 @@ static int shmem_setattr(struct dentry *dentry, struct iattr *attr)
>  			i_size_write(inode, newsize);
>  			inode->i_ctime = inode->i_mtime = CURRENT_TIME;
>  		}
> -		if (newsize < oldsize) {
> +		if (newsize <= oldsize) {
>  			loff_t holebegin = round_up(newsize, PAGE_SIZE);
>  			unmap_mapping_range(inode->i_mapping, holebegin, 0, 1);
>  			shmem_truncate_range(inode, newsize, (loff_t)-1);
> -- 
> 1.8.3.1

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] tmpfs: truncate prealloc blocks past i_size
  2014-11-04  1:30   ` Hugh Dickins
@ 2014-11-04 15:55     ` Josef Bacik
  -1 siblings, 0 replies; 8+ messages in thread
From: Josef Bacik @ 2014-11-04 15:55 UTC (permalink / raw)
  To: Hugh Dickins; +Cc: linux-mm, linux-fsdevel, dave Chinner

On 11/03/2014 08:30 PM, Hugh Dickins wrote:
> On Wed, 29 Oct 2014, Josef Bacik wrote:
>
>> One of the rocksdb people noticed that when you do something like this
>>
>> fallocate(fd, FALLOC_FL_KEEP_SIZE, 0, 10M)
>> pwrite(fd, buf, 5M, 0)
>> ftruncate(5M)
>>
>> on tmpfs the file would still take up 10M, which lead to super fun issues
>> because we were getting ENOSPC before we thought we should be getting ENOSPC.
>> This patch fixes the problem, and mirrors what all the other fs'es do.  I tested
>> it locally to make sure it worked properly with the following
>>
>> xfs_io -f -c "falloc -k 0 10M" -c "pwrite 0 5M" -c "truncate 5M" file
>>
>> Without the patch we have "Blocks: 20480", with the patch we have the correct
>> value of "Blocks: 10240".  Thanks,
>>
>> Signed-off-by: Josef Bacik <jbacik@fb.com>
>
> That is a very good catch, and thank you for the patch.  But I am not
> convinced that the patch is correct - even if it does happen to end
> up doing what other filesystems do here (I haven't checked).
>
> Your patch makes it look like a fix to an off-by-one, but that is
> not really the case.  What if you change your final ftruncate(5M)
> to ftruncate(6M): what should happen then?
>
> My intuition says that what should happen is that i_size is set to 6M,
> and the fallocated excess blocks beyond 6M be trimmed off: so that
> it's both an extending and a shrinking truncate at the same time.
> And I think that behavior would be served by removing the
> "if (newsize < oldsize)" condition completely.
>
> But perhaps I'm wrong: can you or anyone shed more light on this,
> or point to documentation of what should happen in these cases?
>

Yup there's a section in the ftruncate manpage that specifically says 
"expanding truncate is for losers."

Dave you want to weigh in here?  Looking at both btrfs and xfs we only 
do the trimming if newsize <= oldsize.  So if you falloc up to 10M, 
write 5M, and truncate up to 6M there is no trimming.  I'd say this is 
ok since it's an expanding truncate, people doing this are probably 
going to want to keep the extra space, as opposed to those who falloc a 
chunk and then truncate down to the amount they actually wrote.  Thoughts?

Josef

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] tmpfs: truncate prealloc blocks past i_size
@ 2014-11-04 15:55     ` Josef Bacik
  0 siblings, 0 replies; 8+ messages in thread
From: Josef Bacik @ 2014-11-04 15:55 UTC (permalink / raw)
  To: Hugh Dickins; +Cc: linux-mm, linux-fsdevel, dave Chinner

On 11/03/2014 08:30 PM, Hugh Dickins wrote:
> On Wed, 29 Oct 2014, Josef Bacik wrote:
>
>> One of the rocksdb people noticed that when you do something like this
>>
>> fallocate(fd, FALLOC_FL_KEEP_SIZE, 0, 10M)
>> pwrite(fd, buf, 5M, 0)
>> ftruncate(5M)
>>
>> on tmpfs the file would still take up 10M, which lead to super fun issues
>> because we were getting ENOSPC before we thought we should be getting ENOSPC.
>> This patch fixes the problem, and mirrors what all the other fs'es do.  I tested
>> it locally to make sure it worked properly with the following
>>
>> xfs_io -f -c "falloc -k 0 10M" -c "pwrite 0 5M" -c "truncate 5M" file
>>
>> Without the patch we have "Blocks: 20480", with the patch we have the correct
>> value of "Blocks: 10240".  Thanks,
>>
>> Signed-off-by: Josef Bacik <jbacik@fb.com>
>
> That is a very good catch, and thank you for the patch.  But I am not
> convinced that the patch is correct - even if it does happen to end
> up doing what other filesystems do here (I haven't checked).
>
> Your patch makes it look like a fix to an off-by-one, but that is
> not really the case.  What if you change your final ftruncate(5M)
> to ftruncate(6M): what should happen then?
>
> My intuition says that what should happen is that i_size is set to 6M,
> and the fallocated excess blocks beyond 6M be trimmed off: so that
> it's both an extending and a shrinking truncate at the same time.
> And I think that behavior would be served by removing the
> "if (newsize < oldsize)" condition completely.
>
> But perhaps I'm wrong: can you or anyone shed more light on this,
> or point to documentation of what should happen in these cases?
>

Yup there's a section in the ftruncate manpage that specifically says 
"expanding truncate is for losers."

Dave you want to weigh in here?  Looking at both btrfs and xfs we only 
do the trimming if newsize <= oldsize.  So if you falloc up to 10M, 
write 5M, and truncate up to 6M there is no trimming.  I'd say this is 
ok since it's an expanding truncate, people doing this are probably 
going to want to keep the extra space, as opposed to those who falloc a 
chunk and then truncate down to the amount they actually wrote.  Thoughts?

Josef

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH] tmpfs: truncate prealloc blocks past i_size
  2015-06-16 20:02 ` Hugh Dickins
@ 2015-06-16 20:07     ` Hugh Dickins
  0 siblings, 0 replies; 8+ messages in thread
From: Hugh Dickins @ 2015-06-16 20:07 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Josef Bacik, linux-mm, linux-kernel, linux-fsdevel

From: Josef Bacik <jbacik@fb.com>

One of the rocksdb people noticed that when you do something like this

fallocate(fd, FALLOC_FL_KEEP_SIZE, 0, 10M)
pwrite(fd, buf, 5M, 0)
ftruncate(5M)

on tmpfs, the file would still take up 10M: which led to super fun issues
because we were getting ENOSPC before we thought we should be getting
ENOSPC.  This patch fixes the problem, and mirrors what all the other
fs'es do (and was agreed to be the correct behaviour at LSF).

I tested it locally to make sure it worked properly with the following

xfs_io -f -c "falloc -k 0 10M" -c "pwrite 0 5M" -c "truncate 5M" file

Without the patch we have "Blocks: 20480", with the patch we have the
correct value of "Blocks: 10240".

Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Hugh Dickins <hughd@google.com>
---
 mm/shmem.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -569,7 +569,7 @@ static int shmem_setattr(struct dentry *dentry, struct iattr *attr)
 			i_size_write(inode, newsize);
 			inode->i_ctime = inode->i_mtime = CURRENT_TIME;
 		}
-		if (newsize < oldsize) {
+		if (newsize <= oldsize) {
 			loff_t holebegin = round_up(newsize, PAGE_SIZE);
 			unmap_mapping_range(inode->i_mapping, holebegin, 0, 1);
 			shmem_truncate_range(inode, newsize, (loff_t)-1);
-- 
1.8.3.1

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH] tmpfs: truncate prealloc blocks past i_size
@ 2015-06-16 20:07     ` Hugh Dickins
  0 siblings, 0 replies; 8+ messages in thread
From: Hugh Dickins @ 2015-06-16 20:07 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Josef Bacik, linux-mm, linux-kernel, linux-fsdevel

From: Josef Bacik <jbacik@fb.com>

One of the rocksdb people noticed that when you do something like this

fallocate(fd, FALLOC_FL_KEEP_SIZE, 0, 10M)
pwrite(fd, buf, 5M, 0)
ftruncate(5M)

on tmpfs, the file would still take up 10M: which led to super fun issues
because we were getting ENOSPC before we thought we should be getting
ENOSPC.  This patch fixes the problem, and mirrors what all the other
fs'es do (and was agreed to be the correct behaviour at LSF).

I tested it locally to make sure it worked properly with the following

xfs_io -f -c "falloc -k 0 10M" -c "pwrite 0 5M" -c "truncate 5M" file

Without the patch we have "Blocks: 20480", with the patch we have the
correct value of "Blocks: 10240".

Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Hugh Dickins <hughd@google.com>
---
 mm/shmem.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -569,7 +569,7 @@ static int shmem_setattr(struct dentry *dentry, struct iattr *attr)
 			i_size_write(inode, newsize);
 			inode->i_ctime = inode->i_mtime = CURRENT_TIME;
 		}
-		if (newsize < oldsize) {
+		if (newsize <= oldsize) {
 			loff_t holebegin = round_up(newsize, PAGE_SIZE);
 			unmap_mapping_range(inode->i_mapping, holebegin, 0, 1);
 			shmem_truncate_range(inode, newsize, (loff_t)-1);
-- 
1.8.3.1

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2015-06-16 20:10 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-10-29 17:10 [PATCH] tmpfs: truncate prealloc blocks past i_size Josef Bacik
2014-10-29 17:10 ` Josef Bacik
2014-11-04  1:30 ` Hugh Dickins
2014-11-04  1:30   ` Hugh Dickins
2014-11-04 15:55   ` Josef Bacik
2014-11-04 15:55     ` Josef Bacik
2015-05-19 15:27 [PATCH] tmpfs: truncate at i_size Josef Bacik
2015-06-16 20:02 ` Hugh Dickins
2015-06-16 20:07   ` [PATCH] tmpfs: truncate prealloc blocks past i_size Hugh Dickins
2015-06-16 20:07     ` Hugh Dickins

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.