* GFS2 file system does not invalidate page cache after direct IO write
From: Gang He @ 2017-05-04  3:33 UTC
  To: cluster-devel, linux-fsdevel

Hello Guys,

I found an interesting behavior on the GFS2 file system: after I did a direct I/O write covering a whole file, some page-cache pages were still resident for that inode.
This does not look like the expected direct I/O semantics. Is this a known issue, or is it something we can fix?
By the way, I ran the same test on EXT4 and OCFS2, and both behave as expected.
My test commands and their output are pasted below.

For EXT4 file system,
tb-nd1:/mnt/ext4 # rm -rf f3
tb-nd1:/mnt/ext4 # dd if=/dev/urandom of=./f3 bs=1M count=4 oflag=direct
4+0 records in
4+0 records out
4194304 bytes (4.2 MB, 4.0 MiB) copied, 0.0393563 s, 107 MB/s
tb-nd1:/mnt/ext4 # vmtouch -v f3
f3
[                                                         ] 0/1024

           Files: 1
     Directories: 0
  Resident Pages: 0/1024  0/4M  0%
         Elapsed: 0.000424 seconds
tb-nd1:/mnt/ext4 #

For OCFS2 file system,
tb-nd1:/mnt/ocfs2 # rm -rf f3
tb-nd1:/mnt/ocfs2 # dd if=/dev/urandom of=./f3 bs=1M count=4 oflag=direct
4+0 records in
4+0 records out
4194304 bytes (4.2 MB, 4.0 MiB) copied, 0.0592058 s, 70.8 MB/s
tb-nd1:/mnt/ocfs2 # vmtouch -v f3
f3
[                                                         ] 0/1024

           Files: 1
     Directories: 0
  Resident Pages: 0/1024  0/4M  0%
         Elapsed: 0.000226 seconds

For GFS2 file system,
tb-nd1:/mnt/gfs2 # rm -rf f3
tb-nd1:/mnt/gfs2 # dd if=/dev/urandom of=./f3 bs=1M count=4 oflag=direct
4+0 records in
4+0 records out
4194304 bytes (4.2 MB, 4.0 MiB) copied, 0.0579509 s, 72.4 MB/s
tb-nd1:/mnt/gfs2 # vmtouch -v f3
f3
[             oo                         oOo              ] 48/1024

           Files: 1
     Directories: 0
  Resident Pages: 48/1024  192K/4M  4.69%
         Elapsed: 0.000287 seconds


The vmtouch tool's source code is available at https://github.com/hoytech/vmtouch
I also added a printk of the inode's address_space in kernel space after the full-file direct-I/O write;
its nrpages value is always greater than zero.
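If vmtouch is not at hand, a small mincore(2) program gives the same residency information. A minimal sketch (error handling abbreviated, the file name is just an example):

/* Page-cache residency check, roughly what vmtouch reports (sketch only). */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(int argc, char **argv)
{
	const char *path = argc > 1 ? argv[1] : "f3";	/* example file name */
	struct stat st;
	int fd = open(path, O_RDONLY);

	if (fd < 0 || fstat(fd, &st) < 0 || st.st_size == 0) {
		fprintf(stderr, "cannot check %s\n", path);
		return 1;
	}

	long psize = sysconf(_SC_PAGESIZE);
	size_t npages = (st.st_size + psize - 1) / psize;
	void *map = mmap(NULL, st.st_size, PROT_READ, MAP_SHARED, fd, 0);
	unsigned char *vec = malloc(npages);

	if (map == MAP_FAILED || !vec || mincore(map, st.st_size, vec)) {
		perror("mincore");
		return 1;
	}

	size_t resident = 0;
	for (size_t i = 0; i < npages; i++)
		resident += vec[i] & 1;		/* bit 0: page resident in RAM */

	printf("%s: %zu/%zu pages resident\n", path, resident, npages);
	return 0;
}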

Thanks
Gang



* Re: [Cluster-devel] GFS2 file system does not invalidate page cache after direct IO write
From: Andreas Gruenbacher @ 2017-05-04 16:06 UTC
  To: Gang He; +Cc: cluster-devel, linux-fsdevel

Gang,

On Thu, May 4, 2017 at 5:33 AM, Gang He <ghe@suse.com> wrote:
> Hello Guys,
>
> I found an interesting behavior on the GFS2 file system: after I did a direct I/O write covering a whole file, some page-cache pages were still resident for that inode.
> This does not look like the expected direct I/O semantics. Is this a known issue, or is it something we can fix?
> By the way, I ran the same test on EXT4 and OCFS2, and both behave as expected.
> My test commands and their output are pasted below.
>
> For EXT4 file system,
> tb-nd1:/mnt/ext4 # rm -rf f3
> tb-nd1:/mnt/ext4 # dd if=/dev/urandom of=./f3 bs=1M count=4 oflag=direct
> 4+0 records in
> 4+0 records out
> 4194304 bytes (4.2 MB, 4.0 MiB) copied, 0.0393563 s, 107 MB/s
> tb-nd1:/mnt/ext4 # vmtouch -v f3
> f3
> [                                                         ] 0/1024
>
>            Files: 1
>      Directories: 0
>   Resident Pages: 0/1024  0/4M  0%
>          Elapsed: 0.000424 seconds
> tb-nd1:/mnt/ext4 #
>
> For OCFS2 file system,
> tb-nd1:/mnt/ocfs2 # rm -rf f3
> tb-nd1:/mnt/ocfs2 # dd if=/dev/urandom of=./f3 bs=1M count=4 oflag=direct
> 4+0 records in
> 4+0 records out
> 4194304 bytes (4.2 MB, 4.0 MiB) copied, 0.0592058 s, 70.8 MB/s
> tb-nd1:/mnt/ocfs2 # vmtouch -v f3
> f3
> [                                                         ] 0/1024
>
>            Files: 1
>      Directories: 0
>   Resident Pages: 0/1024  0/4M  0%
>          Elapsed: 0.000226 seconds
>
> For GFS2 file system,
> tb-nd1:/mnt/gfs2 # rm -rf f3
> tb-nd1:/mnt/gfs2 # dd if=/dev/urandom of=./f3 bs=1M count=4 oflag=direct
> 4+0 records in
> 4+0 records out
> 4194304 bytes (4.2 MB, 4.0 MiB) copied, 0.0579509 s, 72.4 MB/s
> tb-nd1:/mnt/gfs2 # vmtouch -v f3
> f3
> [             oo                         oOo              ] 48/1024

I cannot reproduce, at least not so easily. What kernel version is
this? If it's not a mainline kernel, can you reproduce on mainline?

Thanks,
Andreas



* Re: [Cluster-devel] GFS2 file system does not invalidate page cache after direct IO write
From: Gang He @ 2017-05-05  3:09 UTC
  To: agruenba; +Cc: cluster-devel, linux-fsdevel

Hello Andreas,


> Gang,
> 
> On Thu, May 4, 2017 at 5:33 AM, Gang He <ghe@suse.com> wrote:
>> Hello Guys,
>>
>> I found an interesting behavior on the GFS2 file system: after I did a direct I/O write covering a whole file, some page-cache pages were still resident for that inode.
>> This does not look like the expected direct I/O semantics. Is this a known issue, or is it something we can fix?
>> By the way, I ran the same test on EXT4 and OCFS2, and both behave as expected.
>> My test commands and their output are pasted below.
>>
>> For EXT4 file system,
>> tb-nd1:/mnt/ext4 # rm -rf f3
>> tb-nd1:/mnt/ext4 # dd if=/dev/urandom of=./f3 bs=1M count=4 oflag=direct
>> 4+0 records in
>> 4+0 records out
>> 4194304 bytes (4.2 MB, 4.0 MiB) copied, 0.0393563 s, 107 MB/s
>> tb-nd1:/mnt/ext4 # vmtouch -v f3
>> f3
>> [                                                         ] 0/1024
>>
>>            Files: 1
>>      Directories: 0
>>   Resident Pages: 0/1024  0/4M  0%
>>          Elapsed: 0.000424 seconds
>> tb-nd1:/mnt/ext4 #
>>
>> For OCFS2 file system,
>> tb-nd1:/mnt/ocfs2 # rm -rf f3
>> tb-nd1:/mnt/ocfs2 # dd if=/dev/urandom of=./f3 bs=1M count=4 oflag=direct
>> 4+0 records in
>> 4+0 records out
>> 4194304 bytes (4.2 MB, 4.0 MiB) copied, 0.0592058 s, 70.8 MB/s
>> tb-nd1:/mnt/ocfs2 # vmtouch -v f3
>> f3
>> [                                                         ] 0/1024
>>
>>            Files: 1
>>      Directories: 0
>>   Resident Pages: 0/1024  0/4M  0%
>>          Elapsed: 0.000226 seconds
>>
>> For GFS2 file system,
>> tb-nd1:/mnt/gfs2 # rm -rf f3
>> tb-nd1:/mnt/gfs2 # dd if=/dev/urandom of=./f3 bs=1M count=4 oflag=direct
>> 4+0 records in
>> 4+0 records out
>> 4194304 bytes (4.2 MB, 4.0 MiB) copied, 0.0579509 s, 72.4 MB/s
>> tb-nd1:/mnt/gfs2 # vmtouch -v f3
>> f3
>> [             oo                         oOo              ] 48/1024
> 
> I cannot reproduce, at least not so easily. What kernel version is
> this? If it's not a mainline kernel, can you reproduce on mainline?
I can reproduce it every time. I am using kernel 4.11.0-rc4-2-default; it is not the latest mainline, but it is recent enough.
By the way, I added some printk calls to the GFS2 and OCFS2 kernel modules and found that GFS2 direct I/O always falls back to buffered I/O. I am not sure whether that behavior is by design.
Of course, even when GFS2 falls back to buffered I/O, the code should still make sure the related page cache is invalidated, but the test result is not as expected, so I need to look at the code more deeply.
The printk output looks like this:
[  198.176774] gfs2_file_write_iter: enter ino 132419 0 - 1048576
[  198.176785] gfs2_direct_IO: enter ino 132419 pages 0 0 - 1048576
[  198.176787] gfs2_direct_IO: exit ino 132419 - (0)   <<== gfs2_direct_IO always returns 0 and the write falls back to buffered I/O; is this behavior by design?
[  198.184640] gfs2_file_write_iter: exit ino 132419 - (1048576) <<== write_iter appears to return the right byte count.
[  198.189151] gfs2_file_write_iter: enter ino 132419 1048576 - 1048576
[  198.189163] gfs2_direct_IO: enter ino 132419 pages 8 1048576 - 1048576 <<== here the inode's page-cache page count is greater than zero.
[  198.189165] gfs2_direct_IO: exit ino 132419 - (0)
[  198.195901] gfs2_file_write_iter: exit ino 132419 - (1048576)
But for OCFS2:
[  120.331053] ocfs2_file_write_iter: enter ino 297475 0 - 1048576
[  120.331065] ocfs2_direct_IO: enter ino 297475 pages 0 0 - 1048576
[  120.343129] ocfs2_direct_IO: exit ino 297475 (1048576) <<== here ocfs2_direct_IO returns the right byte count.
[  120.343132] ocfs2_file_write_iter: exit ino 297475 - (1048576)
[  120.347705] ocfs2_file_write_iter: enter ino 297475 1048576 - 1048576
[  120.347713] ocfs2_direct_IO: enter ino 297475 pages 0 1048576 - 1048576  <<== here the inode's page-cache page count is always zero.
[  120.354096] ocfs2_direct_IO: exit ino 297475 (1048576)
[  120.354099] ocfs2_file_write_iter: exit ino 297475 - (1048576)
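
For reference, my reading of the buffered fall-back in __generic_file_write_iter() (mm/filemap.c, around 4.11) is roughly the following; this is a simplified sketch, not the verbatim kernel code:

	written = generic_file_direct_write(iocb, from);
	if (written < 0 || !iov_iter_count(from))
		goto out;	/* the whole request went through ->direct_IO */

	/* ->direct_IO wrote less than requested (GFS2 returned 0 above),
	 * so the remainder goes through the page cache. */
	status = generic_perform_write(file, from, pos = iocb->ki_pos);
	endbyte = pos + status - 1;
	err = filemap_write_and_wait_range(mapping, pos, endbyte);
	if (err == 0) {
		iocb->ki_pos = endbyte + 1;
		written += status;
		/* Best-effort only: invalidate_mapping_pages() skips pages
		 * it cannot release immediately. */
		invalidate_mapping_pages(mapping, pos >> PAGE_SHIFT,
					 endbyte >> PAGE_SHIFT);
	}

If that reading is correct, the invalidation on this fall-back path is only best-effort, which might be where the leftover resident pages come from; I still need to confirm that against the actual code.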

Thanks
Gang
 

> 
> Thanks,
> Andreas


