linux-kernel.vger.kernel.org archive mirror
* [RFC][PATCH 0/3] add FALLOC_FL_NO_HIDE_STALE flag in fallocate
@ 2012-04-17 16:53 Zheng Liu
  2012-04-17 16:53 ` [RFC][PATCH 1/3] vfs: " Zheng Liu
                   ` (4 more replies)
  0 siblings, 5 replies; 29+ messages in thread
From: Zheng Liu @ 2012-04-17 16:53 UTC (permalink / raw)
  To: linux-kernel, linux-fsdevel, linux-ext4; +Cc: Zheng Liu

Hi list,

fallocate is a useful system call because it can preallocate disk blocks
for a file and keep them contiguous.  However, it has a drawback: a file
system such as ext4 creates an uninitialized extent when it preallocates
blocks in fallocate, so it must convert that extent to an initialized one
when the user later writes data to the file.  This causes severe performance
degradation when the user does random writes, because each conversion
modifies the metadata of the file.  We hit this problem in our production
systems at Taobao.  Last month, at the ext4 workshop, we discussed this
problem and learned that Google faces it too.  So a new flag,
FALLOC_FL_NO_HIDE_STALE, is added to solve it.  When this flag is set, the
file system creates an initialized extent for the file, which avoids the
conversion from uninitialized to initialized.  Users of this flag must
guarantee that they have initialized the file themselves before it is read
at the same offset.  This flag is added in the VFS so that other file
systems can also support it to improve performance.

I have added support for this new flag to ext4 and run a simple test on my
own desktop to verify it.  The machine has an Intel(R) Core(TM)2 Duo CPU
E8400, 4G memory and a WDC WD1600AAJS-75M0A0 160G SATA disk.  I used the
following script to test the performance.

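#!/bin/bash
# for((...)) and $((...)) below are bash syntax, hence bash rather than sh
mkfs.ext4 ${DEVICE}
mount -t ext4 ${DEVICE} ${TARGET}
fallocate -l 268435456 ${TARGET}/test_256M # the size of the file is 256M (*)
# 2000 synchronous direct 4k writes, one every 16 blocks (64k stride)
time for((i=0;i<2000;i++)); do dd if=/dev/zero of=${TARGET}/test_256M \
	conv=notrunc bs=4k count=1 seek=$((i * 16)) oflag=sync,direct \
	2>/dev/null; done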

* I wrote a wrapper program to call fallocate(2) with the
  FALLOC_FL_NO_HIDE_STALE flag because the userspace tool doesn't support
  the new flag.
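
A wrapper of the kind described in the footnote might look roughly like the sketch below. It is not the program used for the test. The function name prealloc_no_hide_stale is hypothetical, and the fallback value 0x04 is an assumption (it is the codepoint that mainline headers reserve for this name; the RFC patch may define it differently). On a kernel or file system without support, the call is expected to fail with EOPNOTSUPP.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Assumption: 0x04 is the codepoint reserved for this flag in mainline
 * headers; the RFC patch may use a different value. */
#ifndef FALLOC_FL_NO_HIDE_STALE
#define FALLOC_FL_NO_HIDE_STALE 0x04
#endif

/* Hypothetical helper: preallocate `len` bytes from offset 0 of `path`
 * with the proposed flag.  Returns 0 on success or a negative errno. */
int prealloc_no_hide_stale(const char *path, off_t len)
{
    int fd = open(path, O_CREAT | O_WRONLY, 0644);
    if (fd < 0)
        return -errno;

    int ret = 0;
    if (fallocate(fd, FALLOC_FL_NO_HIDE_STALE, 0, len) < 0)
        ret = -errno;   /* e.g. -EOPNOTSUPP without kernel/fs support */

    close(fd);
    return ret;
}
```

A caller would pass the target file and the size in bytes, for example prealloc_no_hide_stale("/mnt/test_256M", 268435456), and treat any negative return value as a failure.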

The result (without vs. with the flag):
	w/o 		w/		change
real	1m16.043s	0m17.946s	-76.4%
user	0m0.195s	0m0.192s	-1.54%
sys	0m0.468s	0m0.462s	-1.28%
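
The percentages in the last column are the relative change from the run without the flag to the run with it. As a small worked check, the sketch below converts the `time` output to seconds and computes that change; the helper names time_to_seconds and percent_change are hypothetical.

```c
#include <stdio.h>

/* Convert a bash `time` value such as "1m16.043s" to seconds.
 * Returns -1.0 if the string does not match that format. */
double time_to_seconds(const char *s)
{
    int minutes;
    double seconds;

    if (sscanf(s, "%dm%lfs", &minutes, &seconds) != 2)
        return -1.0;
    return minutes * 60.0 + seconds;
}

/* Relative change from `before` to `after`, in percent. */
double percent_change(double before, double after)
{
    return (after - before) / before * 100.0;
}
```

For the "real" row this gives (17.946 - 76.043) / 76.043, or about -76.4%, which matches the table.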

Obviously, this flag introduces a security issue, because a malicious user
could use it to read other users' stale data by reading the file without
initializing it first.  Thus, a sysctl parameter 'fs.falloc_no_hide_stale'
is defined to let the administrator determine whether or not this flag is
enabled.  Currently, this flag is disabled by default.  I am not sure
whether this is enough.  Another option is to add a new Kconfig entry so
that this flag can be removed when the kernel is compiled.  Any suggestions
or comments are appreciated.
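
An application could check the administrator's setting before relying on the flag. The sketch below assumes that the sysctl appears at /proc/sys/fs/falloc_no_hide_stale (the usual mapping for a name such as 'fs.falloc_no_hide_stale') and that a nonzero value means "enabled"; both are assumptions, and the helper names are hypothetical.

```c
#include <stdio.h>

/* Read an integer sysctl from its /proc/sys path.
 * Returns the value, or -1 if the file is missing or unreadable. */
int read_sysctl_int(const char *path)
{
    FILE *f = fopen(path, "r");
    int value = -1;

    if (!f)
        return -1;
    if (fscanf(f, "%d", &value) != 1)
        value = -1;
    fclose(f);
    return value;
}

/* Assumption: a nonzero value means the flag is enabled.
 * Returns 1 if enabled, 0 if disabled or if the sysctl does not exist. */
int no_hide_stale_enabled(void)
{
    return read_sysctl_int("/proc/sys/fs/falloc_no_hide_stale") > 0;
}
```

Treating a missing file as "disabled" matches the default described above and keeps the check safe on kernels without the patch.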

Regards,
Zheng

Zheng Liu (3):
      vfs: add FALLOC_FL_NO_HIDE_STALE flag in fallocate
      vfs: add security check for _NO_HIDE_STALE flag
      ext4: add FALLOC_FL_NO_HIDE_STALE support

 fs/ext4/extents.c      |    7 +++++--
 fs/open.c              |   12 +++++++++++-
 include/linux/falloc.h |    5 +++++
 include/linux/sysctl.h |    1 +
 kernel/sysctl.c        |   10 ++++++++++
 5 files changed, 32 insertions(+), 3 deletions(-)

* Re: [RFC][PATCH 0/3] add FALLOC_FL_NO_HIDE_STALE flag in fallocate
@ 2012-04-23  1:55 Szabolcs Szakacsits
  0 siblings, 0 replies; 29+ messages in thread
From: Szabolcs Szakacsits @ 2012-04-23  1:55 UTC (permalink / raw)
  To: Zheng Liu; +Cc: linux-kernel, linux-fsdevel, linux-ext4


On 4/17/12 11:53 AM, Zheng Liu wrote:

> fallocate is a useful system call because it can preallocate disk blocks
> for a file and keep them contiguous.  However, it has a drawback: a file
> system such as ext4 creates an uninitialized extent when it preallocates
> blocks in fallocate, so it must convert that extent to an initialized one
> when the user later writes data to the file.  This causes severe
> performance degradation when the user does random writes, because each
> conversion modifies the metadata of the file.  We hit this problem in our
> production systems at Taobao.  Last month, at the ext4 workshop, we
> discussed this problem and learned that Google faces it too.  So a new
> flag, FALLOC_FL_NO_HIDE_STALE, is added to solve it.

I think a more explicit name would be better, such as FALLOC_FL_EXPOSE_DATA,
FALLOC_FL_EXPOSE_STALE_DATA, FALLOC_FL_EXPOSE_UNINITIALIZED_DATA, etc.

> When this flag is set, the file system creates an initialized extent for
> the file, which avoids the conversion from uninitialized to initialized.
> Users of this flag must guarantee that they have initialized the file
> themselves before it is read at the same offset.  This flag is added in
> the VFS so that other file systems can also support it to improve
> performance.

This flag could indeed be helpful for filesystems which, unlike XFS and
ext4, can't efficiently support uninitialized allocated blocks. We support
several such interoperable filesystems (NTFS, exFAT, FAT), where changing
the specification is unfortunately not possible.

There is real user demand for this, even after the potential security
consequences are explained. A typical usage scenario is a large file used
as a container by an application which tracks free/used blocks itself.
Windows supports this feature through SetFileValidData() if an extra
privilege is granted.
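
To illustrate the container scenario: an application that tracks its own blocks can keep a bitmap of which blocks it has written and never read a block whose bit is clear, so stale on-disk data is never returned to its users. The sketch below is a hypothetical minimal tracker, not code from any of the filesystems mentioned; all names in it are made up for illustration.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical tracker: one bit per block, set once the application
 * has written that block of the container file. */
struct block_map {
    uint8_t *bits;
    size_t nblocks;
};

/* Allocate a map for `nblocks` blocks, all initially unwritten. */
int block_map_init(struct block_map *m, size_t nblocks)
{
    m->bits = calloc((nblocks + 7) / 8, 1);
    m->nblocks = nblocks;
    return m->bits ? 0 : -1;
}

/* Record that block `blk` now holds application data. */
void block_map_mark_written(struct block_map *m, size_t blk)
{
    if (blk < m->nblocks)
        m->bits[blk / 8] |= (uint8_t)(1u << (blk % 8));
}

/* 1 if block `blk` has been written, 0 otherwise (or out of range). */
int block_map_is_written(const struct block_map *m, size_t blk)
{
    if (blk >= m->nblocks)
        return 0;
    return (m->bits[blk / 8] >> (blk % 8)) & 1;
}

void block_map_free(struct block_map *m)
{
    free(m->bits);
    m->bits = NULL;
    m->nblocks = 0;
}
```

A read path would consult block_map_is_written() first and return zeros (or an error) for unwritten blocks instead of reading them from disk. In a real application the map would also have to be persisted safely, which this sketch leaves out.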

The performance gain can be fairly large on embedded systems with low-end
storage and CPUs. In one of our cases it took 5 days versus 12 minutes to
fully set up a large file for use.

Regards,
	   Szaka


Thread overview: 29+ messages
2012-04-17 16:53 [RFC][PATCH 0/3] add FALLOC_FL_NO_HIDE_STALE flag in fallocate Zheng Liu
2012-04-17 16:53 ` [RFC][PATCH 1/3] vfs: " Zheng Liu
2012-04-17 16:53 ` [RFC][PATCH 2/3] vfs: add security check for _NO_HIDE_STALE flag Zheng Liu
2012-04-17 16:53 ` [RFC][PATCH 3/3] ext4: add FALLOC_FL_NO_HIDE_STALE support Zheng Liu
2012-04-17 17:40 ` [RFC][PATCH 0/3] add FALLOC_FL_NO_HIDE_STALE flag in fallocate Eric Sandeen
2012-04-18  4:08   ` Zheng Liu
2012-04-18  7:48     ` Lukas Czerner
2012-04-18 12:03       ` Zheng Liu
2012-04-18 12:07         ` Lukas Czerner
2012-04-20  9:52           ` Zheng Liu
2012-04-18  4:59   ` Andreas Dilger
2012-04-18  8:19     ` Lukas Czerner
2012-04-18 12:48       ` Zheng Liu
2012-04-18 15:09         ` Andreas Dilger
2012-04-20  9:59           ` Zheng Liu
2012-04-18 11:38     ` Zheng Liu
2012-04-18 11:39       ` Lukas Czerner
2012-04-18 12:06         ` Zheng Liu
2012-04-18 14:57     ` Eric Sandeen
2012-04-17 17:59 ` Ric Wheeler
2012-04-17 18:43   ` Ted Ts'o
2012-04-17 18:52     ` Ric Wheeler
2012-04-17 18:53     ` Eric Sandeen
2012-04-17 19:04       ` Ted Ts'o
2012-04-18  3:02       ` Dave Chinner
2012-04-18 16:07         ` Ted Ts'o
2012-04-18 23:37           ` Dave Chinner
2012-04-18  8:04     ` Lukas Czerner
2012-04-23  1:55 Szabolcs Szakacsits
