* GFS2 file system does not invalidate page cache after direct IO write
From: Gang He @ 2017-05-04 3:33 UTC
To: cluster-devel, linux-fsdevel
Hello Guys,
I found an interesting thing on the GFS2 file system: after doing a direct I/O write of a whole file, I still see some cached pages for that inode.
This GFS2 behavior does not seem to follow the expected direct I/O semantics. Is this a known issue, or is it something we can fix?
By the way, I ran the same test on the EXT4 and OCFS2 file systems, and their results look correct.
My test commands and outputs are pasted below.
For EXT4 file system,
tb-nd1:/mnt/ext4 # rm -rf f3
tb-nd1:/mnt/ext4 # dd if=/dev/urandom of=./f3 bs=1M count=4 oflag=direct
4+0 records in
4+0 records out
4194304 bytes (4.2 MB, 4.0 MiB) copied, 0.0393563 s, 107 MB/s
tb-nd1:/mnt/ext4 # vmtouch -v f3
f3
[ ] 0/1024
Files: 1
Directories: 0
Resident Pages: 0/1024 0/4M 0%
Elapsed: 0.000424 seconds
tb-nd1:/mnt/ext4 #
For OCFS2 file system,
tb-nd1:/mnt/ocfs2 # rm -rf f3
tb-nd1:/mnt/ocfs2 # dd if=/dev/urandom of=./f3 bs=1M count=4 oflag=direct
4+0 records in
4+0 records out
4194304 bytes (4.2 MB, 4.0 MiB) copied, 0.0592058 s, 70.8 MB/s
tb-nd1:/mnt/ocfs2 # vmtouch -v f3
f3
[ ] 0/1024
Files: 1
Directories: 0
Resident Pages: 0/1024 0/4M 0%
Elapsed: 0.000226 seconds
For GFS2 file system,
tb-nd1:/mnt/gfs2 # rm -rf f3
tb-nd1:/mnt/gfs2 # dd if=/dev/urandom of=./f3 bs=1M count=4 oflag=direct
4+0 records in
4+0 records out
4194304 bytes (4.2 MB, 4.0 MiB) copied, 0.0579509 s, 72.4 MB/s
tb-nd1:/mnt/gfs2 # vmtouch -v f3
f3
[ oo oOo ] 48/1024
Files: 1
Directories: 0
Resident Pages: 48/1024 192K/4M 4.69%
Elapsed: 0.000287 seconds
The vmtouch tool's source code is available at https://github.com/hoytech/vmtouch
I also added a printk of the inode's address_space after a full-file direct I/O write in kernel space;
the nrpages value in the inode's address_space is always greater than zero.
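As a cross-check that does not depend on vmtouch, the same residency count can be sketched in Python by calling mincore(2) through ctypes. This is an illustration only (Linux-specific, not part of the original thread); the `resident_pages` helper name is made up here.

```python
import ctypes
import ctypes.util
import mmap
import os

def resident_pages(path):
    """Return (resident, total) page-cache page counts for *path*,
    using mincore(2) via ctypes -- a rough stand-in for what vmtouch
    reports. Linux-only sketch."""
    size = os.path.getsize(path)
    pagesz = os.sysconf("SC_PAGESIZE")
    total = (size + pagesz - 1) // pagesz
    if total == 0:
        return 0, 0

    libc = ctypes.CDLL(ctypes.util.find_library("c") or None, use_errno=True)
    libc.mincore.argtypes = (ctypes.c_void_p, ctypes.c_size_t,
                             ctypes.POINTER(ctypes.c_ubyte))

    fd = os.open(path, os.O_RDWR)
    try:
        # MAP_SHARED file mapping: its pages are the page-cache pages,
        # and mapping alone does not fault them in.
        mm = mmap.mmap(fd, size)
        buf = (ctypes.c_ubyte * size).from_buffer(mm)
        try:
            vec = (ctypes.c_ubyte * total)()
            if libc.mincore(ctypes.addressof(buf), size, vec) != 0:
                raise OSError(ctypes.get_errno(), "mincore failed")
            resident = sum(b & 1 for b in vec)
        finally:
            del buf          # release the exported pointer before close
            mm.close()
    finally:
        os.close(fd)
    return resident, total
```

After a pure direct I/O write, `resident_pages("f3")` should report 0 resident pages if the filesystem invalidated the cache, matching the vmtouch output above.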
Thanks
Gang
* Re: [Cluster-devel] GFS2 file system does not invalidate page cache after direct IO write
From: Andreas Gruenbacher @ 2017-05-04 16:06 UTC
To: Gang He; +Cc: cluster-devel, linux-fsdevel
Gang,
On Thu, May 4, 2017 at 5:33 AM, Gang He <ghe@suse.com> wrote:
> Hello Guys,
>
> I found an interesting thing on the GFS2 file system: after doing a direct I/O write of a whole file, I still see some cached pages for that inode.
> This GFS2 behavior does not seem to follow the expected direct I/O semantics. Is this a known issue, or is it something we can fix?
> By the way, I ran the same test on the EXT4 and OCFS2 file systems, and their results look correct.
> My test commands and outputs are pasted below.
>
> For EXT4 file system,
> tb-nd1:/mnt/ext4 # rm -rf f3
> tb-nd1:/mnt/ext4 # dd if=/dev/urandom of=./f3 bs=1M count=4 oflag=direct
> 4+0 records in
> 4+0 records out
> 4194304 bytes (4.2 MB, 4.0 MiB) copied, 0.0393563 s, 107 MB/s
> tb-nd1:/mnt/ext4 # vmtouch -v f3
> f3
> [ ] 0/1024
>
> Files: 1
> Directories: 0
> Resident Pages: 0/1024 0/4M 0%
> Elapsed: 0.000424 seconds
> tb-nd1:/mnt/ext4 #
>
> For OCFS2 file system,
> tb-nd1:/mnt/ocfs2 # rm -rf f3
> tb-nd1:/mnt/ocfs2 # dd if=/dev/urandom of=./f3 bs=1M count=4 oflag=direct
> 4+0 records in
> 4+0 records out
> 4194304 bytes (4.2 MB, 4.0 MiB) copied, 0.0592058 s, 70.8 MB/s
> tb-nd1:/mnt/ocfs2 # vmtouch -v f3
> f3
> [ ] 0/1024
>
> Files: 1
> Directories: 0
> Resident Pages: 0/1024 0/4M 0%
> Elapsed: 0.000226 seconds
>
> For GFS2 file system,
> tb-nd1:/mnt/gfs2 # rm -rf f3
> tb-nd1:/mnt/gfs2 # dd if=/dev/urandom of=./f3 bs=1M count=4 oflag=direct
> 4+0 records in
> 4+0 records out
> 4194304 bytes (4.2 MB, 4.0 MiB) copied, 0.0579509 s, 72.4 MB/s
> tb-nd1:/mnt/gfs2 # vmtouch -v f3
> f3
> [ oo oOo ] 48/1024
I cannot reproduce, at least not so easily. What kernel version is
this? If it's not a mainline kernel, can you reproduce on mainline?
Thanks,
Andreas
* Re: [Cluster-devel] GFS2 file system does not invalidate page cache after direct IO write
From: Gang He @ 2017-05-05 3:09 UTC
To: agruenba; +Cc: cluster-devel, linux-fsdevel
Hello Andreas,
> Gang,
>
> On Thu, May 4, 2017 at 5:33 AM, Gang He <ghe@suse.com> wrote:
>> Hello Guys,
>>
>> I found an interesting thing on the GFS2 file system: after doing a direct
>> I/O write of a whole file, I still see some cached pages for that inode.
>> This GFS2 behavior does not seem to follow the expected direct I/O
>> semantics. Is this a known issue, or is it something we can fix?
>> By the way, I ran the same test on the EXT4 and OCFS2 file systems, and
>> their results look correct.
>> My test commands and outputs are pasted below.
>>
>> For EXT4 file system,
>> tb-nd1:/mnt/ext4 # rm -rf f3
>> tb-nd1:/mnt/ext4 # dd if=/dev/urandom of=./f3 bs=1M count=4 oflag=direct
>> 4+0 records in
>> 4+0 records out
>> 4194304 bytes (4.2 MB, 4.0 MiB) copied, 0.0393563 s, 107 MB/s
>> tb-nd1:/mnt/ext4 # vmtouch -v f3
>> f3
>> [ ] 0/1024
>>
>> Files: 1
>> Directories: 0
>> Resident Pages: 0/1024 0/4M 0%
>> Elapsed: 0.000424 seconds
>> tb-nd1:/mnt/ext4 #
>>
>> For OCFS2 file system,
>> tb-nd1:/mnt/ocfs2 # rm -rf f3
>> tb-nd1:/mnt/ocfs2 # dd if=/dev/urandom of=./f3 bs=1M count=4 oflag=direct
>> 4+0 records in
>> 4+0 records out
>> 4194304 bytes (4.2 MB, 4.0 MiB) copied, 0.0592058 s, 70.8 MB/s
>> tb-nd1:/mnt/ocfs2 # vmtouch -v f3
>> f3
>> [ ] 0/1024
>>
>> Files: 1
>> Directories: 0
>> Resident Pages: 0/1024 0/4M 0%
>> Elapsed: 0.000226 seconds
>>
>> For GFS2 file system,
>> tb-nd1:/mnt/gfs2 # rm -rf f3
>> tb-nd1:/mnt/gfs2 # dd if=/dev/urandom of=./f3 bs=1M count=4 oflag=direct
>> 4+0 records in
>> 4+0 records out
>> 4194304 bytes (4.2 MB, 4.0 MiB) copied, 0.0579509 s, 72.4 MB/s
>> tb-nd1:/mnt/gfs2 # vmtouch -v f3
>> f3
>> [ oo oOo ] 48/1024
>
> I cannot reproduce, at least not so easily. What kernel version is
> this? If it's not a mainline kernel, can you reproduce on mainline?
I can reproduce it every time. I am using kernel version 4.11.0-rc4-2-default; although that is not the latest version, it is recent enough.
By the way, I added some printk calls to the GFS2 and OCFS2 kernel modules, and I found that GFS2 direct I/O always falls back to buffered I/O. I am not sure whether this behavior is by design.
Of course, even when GFS2 falls back to buffered I/O, the code should still invalidate the related page cache, but the test result is not as expected. I need to look into the code more deeply.
The printk output looks like this:
[ 198.176774] gfs2_file_write_iter: enter ino 132419 0 - 1048576
[ 198.176785] gfs2_direct_IO: enter ino 132419 pages 0 0 - 1048576
[ 198.176787] gfs2_direct_IO: exit ino 132419 - (0) <<== here, gfs2_direct_IO always returns 0 and then falls back to buffered IO; is this behavior by design?
[ 198.184640] gfs2_file_write_iter: exit ino 132419 - (1048576) <<== write_iter appears to return the right byte count.
[ 198.189151] gfs2_file_write_iter: enter ino 132419 1048576 - 1048576
[ 198.189163] gfs2_direct_IO: enter ino 132419 pages 8 1048576 - 1048576 <<== here, the inode's page number is greater than zero.
[ 198.189165] gfs2_direct_IO: exit ino 132419 - (0)
[ 198.195901] gfs2_file_write_iter: exit ino 132419 - (1048576)
But for OCFS2:
[ 120.331053] ocfs2_file_write_iter: enter ino 297475 0 - 1048576
[ 120.331065] ocfs2_direct_IO: enter ino 297475 pages 0 0 - 1048576
[ 120.343129] ocfs2_direct_IO: exit ino 297475 (1048576) <<== here, ocfs2_direct_IO can return the right bytes.
[ 120.343132] ocfs2_file_write_iter: exit ino 297475 - (1048576)
[ 120.347705] ocfs2_file_write_iter: enter ino 297475 1048576 - 1048576
[ 120.347713] ocfs2_direct_IO: enter ino 297475 pages 0 1048576 - 1048576 <<== here, the inode's page number is always zero.
[ 120.354096] ocfs2_direct_IO: exit ino 297475 (1048576)
[ 120.354099] ocfs2_file_write_iter: exit ino 297475 - (1048576)
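For reference, the end state the fallback path is expected to leave behind — data written through the page cache, then the written range invalidated — has a userspace analogue in posix_fadvise(2). The helper below is a hypothetical illustration, not the kernel's actual code path; it just shows the write-sync-drop sequence that should leave zero resident pages for the range.

```python
import os

def write_and_drop(path, data):
    """Buffered write followed by an explicit page-cache drop of the
    written range. Illustrative only: this mimics in userspace the end
    state a direct-I/O-falls-back-to-buffered path is expected to reach."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        os.write(fd, data)
        # Pages must be clean before they can be dropped; dirty pages
        # are skipped by POSIX_FADV_DONTNEED.
        os.fsync(fd)
        os.posix_fadvise(fd, 0, len(data), os.POSIX_FADV_DONTNEED)
    finally:
        os.close(fd)
```

Running this and then checking the file with vmtouch should show 0 resident pages, which is the result GFS2 fails to produce above.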
Thanks
Gang
>
> Thanks,
> Andreas