* XFS direct IO problem
@ 2015-04-08  4:21 YeYin
  2015-04-08  4:49 ` Dave Chinner
  0 siblings, 1 reply; 6+ messages in thread
From: YeYin @ 2015-04-08  4:21 UTC (permalink / raw)
  To: xfs



Hi,
About two months ago, I asked about a problem in XFS; see here (http://oss.sgi.com/archives/xfs/2015-02/msg00197.html).


After that, I enabled direct IO in MySQL; see here (https://dev.mysql.com/doc/refman/5.5/en/innodb-parameters.html#sysvar_innodb_flush_method).
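
For reference, the relevant setting is innodb_flush_method; a minimal my.cnf
sketch (parameter name and value per the linked MySQL 5.5 docs) looks like:

[mysqld]
innodb_flush_method = O_DIRECT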


However, I found that MySQL performance is still poor sometimes. I used some tools (https://github.com/brendangregg/perf-tools) to trace the kernel and found some problems:


# ./funccount -i 1  "xfs_f*"
Tracing "xfs_f*"... Ctrl-C to end.
FUNC                              COUNT
xfs_file_aio_read                 15591
xfs_flushinval_pages              15591
xfs_find_bdev_for_inode           31182



As we can see, every xfs_file_aio_read call also calls xfs_flushinval_pages.
Note that I used direct IO!!!


xfs_flushinval_pages calls truncate_inode_pages_range; see here (https://bitbucket.org/hustcat/kernel-2.6.32/src/0e5d90ed6f3ef8a3b5fe62a04cc6766a721c70f8/fs/xfs/linux-2.6/xfs_fs_subr.c?at=master#cl-56).
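
One way to confirm that call chain with the same perf-tools is to run funcgraph
on xfs_flushinval_pages itself (a sketch; this command is not in the original
mail and the exact options may differ):

# ./funcgraph xfs_flushinval_pages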


Indeed:
# ./funccount -i 1  "truncate_inode_page*"
Tracing "truncate_inode_page*"... Ctrl-C to end.
FUNC                              COUNT
truncate_inode_page                   4
truncate_inode_pages                176
truncate_inode_pages_range        15474
FUNC                              COUNT
truncate_inode_page                   1
truncate_inode_pages                  5
truncate_inode_pages_range        15566



As we can see, truncate_inode_pages_range is called as many times as xfs_flushinval_pages.
However, I found that truncate_inode_pages_range did not call truncate_inode_page:


# ./funcgraph truncate_inode_pages_range
Tracing "truncate_inode_pages_range"... Ctrl-C to end.
  2)   1.020 us    |  finish_task_switch();
  2)               |  truncate_inode_pages_range() {
  2)               |    pagevec_lookup() {
  2)   0.413 us    |      find_get_pages();
  2)   1.033 us    |    }
  2)   0.238 us    |    _cond_resched();
  2)               |    pagevec_lookup() {
  2)   0.234 us    |      find_get_pages();
  2)   0.690 us    |    }
  2)   3.362 us    |  }
  2)               |  truncate_inode_pages_range() {
  2)               |    pagevec_lookup() {
  2)   0.266 us    |      find_get_pages();
  2)   0.745 us    |    }
  2)   0.238 us    |    _cond_resched();
  2)               |    pagevec_lookup() {
  2)   0.248 us    |      find_get_pages();
  2)   0.701 us    |    }
  2)   2.844 us    |  }
  2)               |  truncate_inode_pages_range() {
  2)               |    pagevec_lookup() {
  2)   0.262 us    |      find_get_pages();
  2)   0.740 us    |    }
  2)   0.238 us    |    _cond_resched();
  2)               |    pagevec_lookup() {
  2)   0.251 us    |      find_get_pages();
  2)   0.705 us    |    }
  2)   2.767 us    |  }



This will cause inode->i_mapping->nrpages > 0 to remain true, so xfs_file_aio_read/xfs_file_dio_aio_write will always call xfs_flushinval_pages. Even worse, xfs_file_dio_aio_write will then take the EXCL lock:


	if (mapping->nrpages && iolock == XFS_IOLOCK_SHARED) {
		xfs_rw_iunlock(ip, iolock);
		iolock = XFS_IOLOCK_EXCL;
		xfs_rw_ilock(ip, iolock);
	}


see here (https://bitbucket.org/hustcat/kernel-2.6.32/src/0e5d90ed6f3ef8a3b5fe62a04cc6766a721c70f8/fs/xfs/linux-2.6/xfs_file.c?at=master#cl-659).


This causes bad performance even with direct IO. I still don't understand why truncate_inode_page is never called.


Every time, after I run this:

echo 1 > /proc/sys/vm/drop_caches

performance immediately improves.
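
A quick way to see which pages of a given file are still resident is a tool
such as vmtouch (a hypothetical check, not part of the original report; it
assumes vmtouch is installed and the path is only illustrative):

vmtouch -v /path/to/datafile.ibd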


Thanks,
Ye


* Re: XFS direct IO problem
  2015-04-08  4:21 XFS direct IO problem YeYin
@ 2015-04-08  4:49 ` Dave Chinner
  2015-04-08  7:05   ` Re: " YeYin
  0 siblings, 1 reply; 6+ messages in thread
From: Dave Chinner @ 2015-04-08  4:49 UTC (permalink / raw)
  To: YeYin; +Cc: xfs

On Wed, Apr 08, 2015 at 12:21:45PM +0800, YeYin wrote:
> Hi, About 2 months ago, I asked one problem in XFS, see
> here(http://oss.sgi.com/archives/xfs/2015-02/msg00197.html).
> 
> 
> After that, I use direct IO in MySQL, see
> here(https://dev.mysql.com/doc/refman/5.5/en/innodb-parameters.html#sysvar_innodb_flush_method).
> 
> 
> However, I found that MySQL performance is still poor sometimes. I
> use some tools(https://github.com/brendangregg/perf-tools) to
> trace the kernel, I found some problems:

<snip>

> This will cause bad performance, even direct IO. I still don't
> understand why not truncate_inode_page called?

Because the cached page must be outside the range of the direct IO
that is in progress - direct IO only tries to flush pages over the
range it is doing the IO over.

> Every time, after I run this: echo 1 > /proc/sys/vm/drop_caches
> 
> Immediately enhance performance.

Because that flushes whatever page is in the cache. Can you identify
what offset that cached page is at? Tracing the xfs events will tell
you what pages that operation invalidates on each inode, and knowing
the offset may tell us why that page is not getting flushed.
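
A minimal sketch of enabling that tracing via ftrace (assuming debugfs is
mounted at /sys/kernel/debug and that this kernel exposes the
xfs_pagecache_inval event; event names may differ by kernel version):

# echo 1 > /sys/kernel/debug/tracing/events/xfs/xfs_pagecache_inval/enable
# cat /sys/kernel/debug/tracing/trace_pipe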

Alternatively, write a simple C program that demonstrates the same
problem so we can reproduce it easily, fix the problem and turn it
into a regression test....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: XFS direct IO problem
  2015-04-08  4:49 ` Dave Chinner
@ 2015-04-08  7:05   ` YeYin
  2015-04-08 21:14     ` Dave Chinner
  0 siblings, 1 reply; 6+ messages in thread
From: YeYin @ 2015-04-08  7:05 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs



Dave,
Thank you for your explanation. I understand the reason now, and I wrote some code to simulate MySQL. It reproduces the process:


open file without direct flag
read file   //cause kernel readahead 4 pages, and inode->i_mapping->nrpages > 0
close file


open file with direct flag
lseek 4*4096 // skip 4 readahead pages
read  file //cause xfs_flushinval_pages to do nothing
...


I'd like to ask how XFS could resolve this problem?


Attached code:

/* gcc -o test_read test_read.c
 * dd if=/dev/zero of=/data1/fo.dat bs=4096 count=10
 * ./test_read /data1/fo.dat 2 direct
 */
#define _GNU_SOURCE
#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <errno.h>
#include <string.h>
#include <stdlib.h>
#include <malloc.h>	/* memalign() */

#define BUFSIZE 4096

int read_count = 2;

int main(int argc, char *argv[]){
	if(argc < 3){
		fprintf(stderr, "usage: %s <file> <count> [buffer|direct]\n", argv[0]);
		exit(1);
	}
	/* O_DIRECT needs an aligned buffer; the alignment must be a power of two */
	char *buf = memalign(BUFSIZE, BUFSIZE);
	if(buf == NULL){
		fprintf(stderr, "memory allocate failed\n");
		exit(1);
	}
	char *file = argv[1];
	read_count = atoi(argv[2]);
	int ret = 0, sum = 0, i = 0, fd = -1;
	if(argc == 4 && strncmp(argv[3], "direct", 6) == 0){
		/* first open without O_DIRECT: this buffered read triggers
		 * readahead and leaves pages in the page cache */
		//fd = open(file, O_RDONLY|O_DIRECT);
		fd = open(file, O_RDONLY);
		if(fd < 0){
			fprintf(stderr, "open read only file failed\n");
			exit(1);
		}
		ret = read(fd, buf, BUFSIZE);
		if(ret < 0){
			fprintf(stderr, "buffer read error\n");
		}
		close(fd);

		/* reopen and switch to O_DIRECT, the way MySQL does */
		fd = open(file, O_RDWR);
		if(fd < 0){
			fprintf(stderr, "open read write file failed\n");
			exit(1);
		}
		if (fcntl(fd, F_SETFL, O_RDONLY|O_DIRECT) == -1) {
			fprintf(stderr, "set direct error\n");
			exit(1);
		}
	}else{
		fd = open(file, O_RDONLY);
		if(fd < 0){
			fprintf(stderr, "open buf file failed\n");
			exit(1);
		}
	}

	while(i++ < read_count){
		//memset(buf, 0, BUFSIZE);
		/* seek past the 4 readahead pages so the direct read range
		 * never overlaps the cached pages */
		if(lseek(fd, 4*4096, SEEK_SET) < 0){
			fprintf(stderr, "seek error!\n");
			break;
		}
		ret = read(fd, buf, BUFSIZE);
		if(ret > 0){
			sum += ret;
		}else if(ret == 0){
			printf("read end\n");
			break;
		}else{
			printf("error:%d\n", errno);
			break;
		}
		sleep(1);
	}
	printf("read sum: %d\n", sum);
	close(fd);
	free(buf);
	return 0;
}






* Re: Re: XFS direct IO problem
  2015-04-08  7:05   ` Re: " YeYin
@ 2015-04-08 21:14     ` Dave Chinner
  2015-04-09  2:37       ` Re: " YeYin
  0 siblings, 1 reply; 6+ messages in thread
From: Dave Chinner @ 2015-04-08 21:14 UTC (permalink / raw)
  To: YeYin; +Cc: xfs

On Wed, Apr 08, 2015 at 03:05:57PM +0800, YeYin wrote:
> Dave,
> Thank you for your explanation. I got the reason, and I write some code to  simulate the MySQL.It will reproduce the progress:
> 
> 
> open file without direct flag
> read file   //cause kernel readahead 4 pages, and inode->i_mapping->nrpages > 0
> close file
> 
> 
> open file with direct flag
> lseek 4*4096 // skip 4 readahead pages
> read  file //cause xfs_flushinval_pages to do nothing
> ...
> 

Yes, you can cause it that way, but any application mixing buffered
IO and direct IO like that is broken.  I'll point you at the open(2)
man page, in the section about O_DIRECT:

	"Applications should avoid mixing O_DIRECT and normal I/O to
	the same file, and especially to overlapping byte regions in
	the  same file.   Even  when  the filesystem correctly
	handles the coherency issues in this situation, overall I/O
	throughput is likely to be slower than using either mode
	alone.  Likewise, applications should avoid mixing mmap(2)
	of files with direct I/O to the same files."

IOWs, your test program is behaving as documented for a program
that mixes buffered and direct IO....

AFAIK, MySQL does not do mixed buffered/direct IO like this and so
this is extremely unlikely to be the source of the problem.  I need
to understand how MySQL is generating cached pages on its database
files when it is supposed to be using direct IO, and the reproducer
program needs to do what MySQL does to generate cached pages.

Can you please find the location of the cached pages (as I
suggested via tracing in my last email) in the MySQL files that are
causing the problem?

> I'd like to ask XFS how to resovle this problem?

Applications that need to mix buffered and direct IO can invalidate
the cached pages by using POSIX_FADV_DONTNEED before doing direct
IO.
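
For example (a minimal sketch, not MySQL's actual code; it drops the whole
file's cached pages before the direct IO phase, error handling omitted):

	#include <fcntl.h>     /* posix_fadvise, POSIX_FADV_DONTNEED */
	#include <unistd.h>    /* fsync */

	/* after the buffered phase, before switching to O_DIRECT: */
	fsync(fd);                                      /* write back any dirty pages    */
	posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);   /* drop cached pages for this fd */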

FWIW, you must be looking at quite old kernel code if
xfs_flushinval_pages() exists in your kernel. Does MySQL on a
current upstream kernel have the same problem?

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: Re: XFS direct IO problem
  2015-04-08 21:14     ` Dave Chinner
@ 2015-04-09  2:37       ` YeYin
  2015-04-09  3:48         ` YeYin
  0 siblings, 1 reply; 6+ messages in thread
From: YeYin @ 2015-04-09  2:37 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs



I traced MySQL:


[pid 13478] open("./test/big_tb.ibd", O_RDONLY) = 37
[pid 13478] pread(37, "W\346\203@\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\v\37c\225\263\0\10\0\0\0\0\0\0"..., 16384, 0) = 16384
[pid 13478] close(37)                   = 0
[pid 13478] open("./test/big_tb.ibd", O_RDWR) = 37
[pid 13478] fcntl(37, F_SETFL, O_RDONLY|O_DIRECT) = 0
[pid 13478] fcntl(37, F_SETLK, {type=F_WRLCK, whence=SEEK_SET, start=0, len=0}) = 0
[pid 13478] pread(37, "\350\301\270\271\0\0\0\3\377\377\377\377\377\377\377\377\0\0\0\v\37c\225\263E\277\0\0\0\0\0\0"..., 16384, 49152) = 16384
[pid 13478] pread(37, "e\251|m\0\0\0\1\0\0\0\0\0\0\0\0\0\0\0\0\0\31\245\243\0\5\0\0\0\0\0\0"..., 16384, 16384) = 16384
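
Output like the above can be captured with something along these lines (the
exact command is not shown in the thread, so this is only a sketch):

# strace -f -e trace=file,desc -p $(pidof mysqld)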



As we can see, MySQL opens the data file twice when opening a table, and the first open, without the O_DIRECT flag, does a buffered read that populates the page cache. I traced the kernel:


Tracing "sys_pread64"... Ctrl-C to end.
  3)               |  sys_pread64() {
  3)   0.362 us    |    fget_light();
  3)               |    vfs_read() {
  3)               |      rw_verify_area() {
  3)               |        security_file_permission() {
  3)   0.251 us    |          cap_file_permission();
  3)   0.817 us    |        }
  3)   1.377 us    |      }
  3)               |      do_sync_read() {
  3)               |        xfs_file_aio_read() {
  3)   0.259 us    |          generic_segment_checks();
  3)               |          xfs_rw_ilock() {
  3)               |            xfs_ilock() {
  3)               |              down_read() {
  3)   0.233 us    |                _cond_resched();
  3)   0.713 us    |              }
  3)   1.433 us    |            }
  3)   2.097 us    |          }
  3)               |          generic_file_aio_read() {
  3)   0.229 us    |            generic_segment_checks();
  3)   0.227 us    |            _cond_resched();
  3)   0.261 us    |            find_get_page();
  3)               |            page_cache_sync_readahead() {
  3)               |              ondemand_readahead() {

...


I run MySQL 5.5.24 on CentOS 6.5, with kernel 2.6.32-431.
Later I will test with a newer kernel.


Thanks,
Ye



* Re: Re: XFS direct IO problem
  2015-04-09  2:37       ` Re: " YeYin
@ 2015-04-09  3:48         ` YeYin
  0 siblings, 0 replies; 6+ messages in thread
From: YeYin @ 2015-04-09  3:48 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs



I have reported this problem to MySQL. See here:
http://bugs.mysql.com/bug.php?id=76627


Thanks,
Ye

Thread overview: 6+ messages
2015-04-08  4:21 XFS direct IO problem YeYin
2015-04-08  4:49 ` Dave Chinner
2015-04-08  7:05   ` Re: " YeYin
2015-04-08 21:14     ` Dave Chinner
2015-04-09  2:37       ` Re: " YeYin
2015-04-09  3:48         ` YeYin
