All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] generic/465: just check the actual read data under dio read/write
@ 2017-11-21  7:10 Xiao Yang
  2017-11-23  6:14 ` Eryu Guan
  0 siblings, 1 reply; 6+ messages in thread
From: Xiao Yang @ 2017-11-21  7:10 UTC (permalink / raw)
  To: fstests; +Cc: Xiao Yang

I got the following message when running generic/465 on RHEL7.4
---------------------------------------------------------------
QA output created by 465
non-aio dio test
encounter an error: block 43 offset 0, content 0
encounter an error: block 0 offset 4096, content 62
encounter an error: block 1 offset 0, content 0
encounter an error: block 17 offset 0, content 0
aio-dio test
encounter an error: block 9 offset 0, content 0
encounter an error: block 2 offset 0, content 0
encounter an error: block 0 offset 4096, content 62
encounter an error: block 12 offset 0, content 0
---------------------------------------------------------------

It is possible that dio read reads less than 1M data while dio write
is writing 1M data into file, because concurrent dio read/write to the
same file cannot guarantee atomicity and may split specified size.
Please see this URL for detailed explanation:
https://marc.info/?l=linux-fsdevel&m=151053542926457&w=2

We can just check the actual read data instead of the whole read buffer.

Signed-off-by: Xiao Yang <yangx.jy@cn.fujitsu.com>
---
 src/aio-dio-regress/aio-dio-append-write-read-race.c | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/src/aio-dio-regress/aio-dio-append-write-read-race.c b/src/aio-dio-regress/aio-dio-append-write-read-race.c
index 8f94d50..848755c 100644
--- a/src/aio-dio-regress/aio-dio-append-write-read-race.c
+++ b/src/aio-dio-regress/aio-dio-append-write-read-race.c
@@ -46,6 +46,7 @@ struct io_data {
 };
 
 int reader_ready = 0;
+static volatile int act_rd_sz;
 
 static void usage(const char *prog)
 {
@@ -57,15 +58,15 @@ static void usage(const char *prog)
 static void *reader(void *arg)
 {
 	struct io_data *data = (struct io_data *)arg;
-	int ret;
 
+	act_rd_sz = 0;
 	memset(data->buf, 'b', data->blksize);
 	reader_ready = 1;
 	do {
-		ret = pread(data->fd, data->buf, data->blksize, data->offset);
-		if (ret < 0)
+		act_rd_sz = pread(data->fd, data->buf, data->blksize, data->offset);
+		if (act_rd_sz < 0)
 			perror("read file");
-	} while (ret <= 0);
+	} while (act_rd_sz <= 0);
 
 	return NULL;
 }
@@ -203,7 +204,7 @@ int main(int argc, char *argv[])
 			goto err;
 		}
 
-		for (j = 0; j < blksize; j++) {
+		for (j = 0; j < act_rd_sz; j++) {
 			if (rdata.buf[j] != 'a') {
 				fail("encounter an error: "
 					"block %d offset %d, content %x\n",
-- 
1.8.3.1




^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH] generic/465: just check the actual read data under dio read/write
  2017-11-21  7:10 [PATCH] generic/465: just check the actual read data under dio read/write Xiao Yang
@ 2017-11-23  6:14 ` Eryu Guan
  2017-11-27  2:34   ` [PATCH v2] " Xiao Yang
  0 siblings, 1 reply; 6+ messages in thread
From: Eryu Guan @ 2017-11-23  6:14 UTC (permalink / raw)
  To: Xiao Yang; +Cc: fstests

On Tue, Nov 21, 2017 at 03:10:40PM +0800, Xiao Yang wrote:
> I got the following message when running generic/465 on RHEL7.4
> ---------------------------------------------------------------
> QA output created by 465
> non-aio dio test
> encounter an error: block 43 offset 0, content 0
> encounter an error: block 0 offset 4096, content 62
> encounter an error: block 1 offset 0, content 0
> encounter an error: block 17 offset 0, content 0
> aio-dio test
> encounter an error: block 9 offset 0, content 0
> encounter an error: block 2 offset 0, content 0
> encounter an error: block 0 offset 4096, content 62
> encounter an error: block 12 offset 0, content 0
> ---------------------------------------------------------------
> 
> It is possible that dio read reads less than 1M data while dio write
> is writing 1M data into file, because concurrent dio read/write to the
> same file cannot guarantee atomicity and may split specified size.
> Please see this URL for detailed explanation:
> https://marc.info/?l=linux-fsdevel&m=151053542926457&w=2
> 
> We can just check the actual read data instead of the whole read buffer.

I think the bug in the test is real, but it has nothing to do with the
atomicity of direct I/O. (Sorry I misled you previously offline.)

The short version of why you see reader reads less than 1M data is
simply because RHEL7.4 doesn't have the referenced fix in the test,
commit 9fe55eea7 ("Fix race when checking i_size on direct i/o read").

The long version follows. This test is doing append direct write to the
file, and ext4 falls back to buffer write in this case. So the writer
takes exclusive inode lock and does buffer write, which updates i_size
by at most page size in each write cycle (generic_perform_write()). And
on the reader side, due to commit 9fe55eea7, direct read won't fall back
to buffer read, and direct read will block on taking shared inode lock
until the writer releases the lock, so direct read always sees the final
inode size.

But on RHEL7.4, on the other hand, the reader will also fall back to
buffer read, due to lack of the fix, and buffer read doesn't take inode
lock, so it doesn't need to wait for the writer to finish first and sees
the intermediate inode size and returns data less than 1M.

The test needs fix as well (but in a different way), because in ext4
data=journal mode, all direct I/O are falling back to buffer I/O, so you
still could hit the short read case.

> 
> Signed-off-by: Xiao Yang <yangx.jy@cn.fujitsu.com>
> ---
>  src/aio-dio-regress/aio-dio-append-write-read-race.c | 11 ++++++-----
>  1 file changed, 6 insertions(+), 5 deletions(-)
> 
> diff --git a/src/aio-dio-regress/aio-dio-append-write-read-race.c b/src/aio-dio-regress/aio-dio-append-write-read-race.c
> index 8f94d50..848755c 100644
> --- a/src/aio-dio-regress/aio-dio-append-write-read-race.c
> +++ b/src/aio-dio-regress/aio-dio-append-write-read-race.c
> @@ -46,6 +46,7 @@ struct io_data {
>  };
>  
>  int reader_ready = 0;
> +static volatile int act_rd_sz;

You can introduce a new field in struct io_data to record the return
value of pread in reader thread and do check based on that value in
main().

Thanks,
Eryu

>  
>  static void usage(const char *prog)
>  {
> @@ -57,15 +58,15 @@ static void usage(const char *prog)
>  static void *reader(void *arg)
>  {
>  	struct io_data *data = (struct io_data *)arg;
> -	int ret;
>  
> +	act_rd_sz = 0;
>  	memset(data->buf, 'b', data->blksize);
>  	reader_ready = 1;
>  	do {
> -		ret = pread(data->fd, data->buf, data->blksize, data->offset);
> -		if (ret < 0)
> +		act_rd_sz = pread(data->fd, data->buf, data->blksize, data->offset);
> +		if (act_rd_sz < 0)
>  			perror("read file");
> -	} while (ret <= 0);
> +	} while (act_rd_sz <= 0);
>  
>  	return NULL;
>  }
> @@ -203,7 +204,7 @@ int main(int argc, char *argv[])
>  			goto err;
>  		}
>  
> -		for (j = 0; j < blksize; j++) {
> +		for (j = 0; j < act_rd_sz; j++) {
>  			if (rdata.buf[j] != 'a') {
>  				fail("encounter an error: "
>  					"block %d offset %d, content %x\n",
> -- 
> 1.8.3.1
> 
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe fstests" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH v2] generic/465: just check the actual read data under dio read/write
  2017-11-23  6:14 ` Eryu Guan
@ 2017-11-27  2:34   ` Xiao Yang
  2017-11-28  7:20     ` Eryu Guan
  0 siblings, 1 reply; 6+ messages in thread
From: Xiao Yang @ 2017-11-27  2:34 UTC (permalink / raw)
  To: eguan; +Cc: fstests, Xiao Yang

I got the following message when running generic/465 in ext4 data=journal mode
---------------------------------------------------------------
QA output created by 465
non-aio dio test
encounter an error: block 0 offset 4096, content 62
encounter an error: block 0 offset 122880, content 62
encounter an error: block 0 offset 274432, content 62
encounter an error: block 0 offset 86016, content 62
aio-dio test
encounter an error: block 0 offset 28672, content 62
encounter an error: block 0 offset 12288, content 62
encounter an error: block 2 offset 16384, content 62
encounter an error: block 1 offset 565248, content 62
---------------------------------------------------------------

In ext4 data=journal mode, direct read will fall back to buffer read, and buffer
read doesn't take inode lock, so it doesn't need to wait for the writer to finish
first and sees the intermediate inode size and returns data less than 1M.

We can just check the actual read data instead of the whole read buffer.

Signed-off-by: Xiao Yang <yangx.jy@cn.fujitsu.com>
---
 src/aio-dio-regress/aio-dio-append-write-read-race.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/src/aio-dio-regress/aio-dio-append-write-read-race.c b/src/aio-dio-regress/aio-dio-append-write-read-race.c
index 8f94d50..eaec784 100644
--- a/src/aio-dio-regress/aio-dio-append-write-read-race.c
+++ b/src/aio-dio-regress/aio-dio-append-write-read-race.c
@@ -43,6 +43,7 @@ struct io_data {
 	off_t offset;
 	char *buf;
 	int use_aio;
+	size_t act_sz;
 };
 
 int reader_ready = 0;
@@ -57,15 +58,14 @@ static void usage(const char *prog)
 static void *reader(void *arg)
 {
 	struct io_data *data = (struct io_data *)arg;
-	int ret;
 
 	memset(data->buf, 'b', data->blksize);
 	reader_ready = 1;
 	do {
-		ret = pread(data->fd, data->buf, data->blksize, data->offset);
-		if (ret < 0)
+		data->act_sz = pread(data->fd, data->buf, data->blksize, data->offset);
+		if (data->act_sz < 0)
 			perror("read file");
-	} while (ret <= 0);
+	} while (data->act_sz <= 0);
 
 	return NULL;
 }
@@ -203,7 +203,7 @@ int main(int argc, char *argv[])
 			goto err;
 		}
 
-		for (j = 0; j < blksize; j++) {
+		for (j = 0; j < rdata.act_sz; j++) {
 			if (rdata.buf[j] != 'a') {
 				fail("encounter an error: "
 					"block %d offset %d, content %x\n",
-- 
1.8.3.1




^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH v2] generic/465: just check the actual read data under dio read/write
  2017-11-27  2:34   ` [PATCH v2] " Xiao Yang
@ 2017-11-28  7:20     ` Eryu Guan
  2017-11-29  1:42       ` Xiao Yang
  2017-11-29  1:44       ` [PATCH v3] " Xiao Yang
  0 siblings, 2 replies; 6+ messages in thread
From: Eryu Guan @ 2017-11-28  7:20 UTC (permalink / raw)
  To: Xiao Yang; +Cc: fstests

On Mon, Nov 27, 2017 at 10:34:27AM +0800, Xiao Yang wrote:
> I got the following message when running generic/465 in ext4 data=journal mode
> ---------------------------------------------------------------
> QA output created by 465
> non-aio dio test
> encounter an error: block 0 offset 4096, content 62
> encounter an error: block 0 offset 122880, content 62
> encounter an error: block 0 offset 274432, content 62
> encounter an error: block 0 offset 86016, content 62
> aio-dio test
> encounter an error: block 0 offset 28672, content 62
> encounter an error: block 0 offset 12288, content 62
> encounter an error: block 2 offset 16384, content 62
> encounter an error: block 1 offset 565248, content 62
> ---------------------------------------------------------------
> 
> In ext4 data=journal mode, direct read will fall back to buffer read, and buffer
> read doesn't take inode lock, so it doesn't need to wait for the writer to finish
> first and sees the intermediate inode size and returns data less than 1M.
> 
> We can just check the actual read data instead of the whole read buffer.
> 
> Signed-off-by: Xiao Yang <yangx.jy@cn.fujitsu.com>
> ---
>  src/aio-dio-regress/aio-dio-append-write-read-race.c | 10 +++++-----
>  1 file changed, 5 insertions(+), 5 deletions(-)
> 
> diff --git a/src/aio-dio-regress/aio-dio-append-write-read-race.c b/src/aio-dio-regress/aio-dio-append-write-read-race.c
> index 8f94d50..eaec784 100644
> --- a/src/aio-dio-regress/aio-dio-append-write-read-race.c
> +++ b/src/aio-dio-regress/aio-dio-append-write-read-race.c
> @@ -43,6 +43,7 @@ struct io_data {
>  	off_t offset;
>  	char *buf;
>  	int use_aio;
> +	size_t act_sz;

I think this should be a signed value, as pread returns ssize_t. And
maybe rename the variable to read_sz? Otherwise looks fine to me. And I
can do the updates on commit, if there's no further review comments.

Thanks,
Eryu

>  };
>  
>  int reader_ready = 0;
> @@ -57,15 +58,14 @@ static void usage(const char *prog)
>  static void *reader(void *arg)
>  {
>  	struct io_data *data = (struct io_data *)arg;
> -	int ret;
>  
>  	memset(data->buf, 'b', data->blksize);
>  	reader_ready = 1;
>  	do {
> -		ret = pread(data->fd, data->buf, data->blksize, data->offset);
> -		if (ret < 0)
> +		data->act_sz = pread(data->fd, data->buf, data->blksize, data->offset);
> +		if (data->act_sz < 0)
>  			perror("read file");
> -	} while (ret <= 0);
> +	} while (data->act_sz <= 0);
>  
>  	return NULL;
>  }
> @@ -203,7 +203,7 @@ int main(int argc, char *argv[])
>  			goto err;
>  		}
>  
> -		for (j = 0; j < blksize; j++) {
> +		for (j = 0; j < rdata.act_sz; j++) {
>  			if (rdata.buf[j] != 'a') {
>  				fail("encounter an error: "
>  					"block %d offset %d, content %x\n",
> -- 
> 1.8.3.1
> 
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe fstests" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v2] generic/465: just check the actual read data under dio read/write
  2017-11-28  7:20     ` Eryu Guan
@ 2017-11-29  1:42       ` Xiao Yang
  2017-11-29  1:44       ` [PATCH v3] " Xiao Yang
  1 sibling, 0 replies; 6+ messages in thread
From: Xiao Yang @ 2017-11-29  1:42 UTC (permalink / raw)
  To: Eryu Guan; +Cc: fstests

On 2017/11/28 15:20, Eryu Guan wrote:
> On Mon, Nov 27, 2017 at 10:34:27AM +0800, Xiao Yang wrote:
>> I got the following message when running generic/465 in ext4 data=journal mode
>> ---------------------------------------------------------------
>> QA output created by 465
>> non-aio dio test
>> encounter an error: block 0 offset 4096, content 62
>> encounter an error: block 0 offset 122880, content 62
>> encounter an error: block 0 offset 274432, content 62
>> encounter an error: block 0 offset 86016, content 62
>> aio-dio test
>> encounter an error: block 0 offset 28672, content 62
>> encounter an error: block 0 offset 12288, content 62
>> encounter an error: block 2 offset 16384, content 62
>> encounter an error: block 1 offset 565248, content 62
>> ---------------------------------------------------------------
>>
>> In ext4 data=journal mode, direct read will fall back to buffer read, and buffer
>> read doesn't take inode lock, so it doesn't need to wait for the writer to finish
>> first and sees the intermediate inode size and returns data less than 1M.
>>
>> We can just check the actual read data instead of the whole read buffer.
>>
>> Signed-off-by: Xiao Yang<yangx.jy@cn.fujitsu.com>
>> ---
>>   src/aio-dio-regress/aio-dio-append-write-read-race.c | 10 +++++-----
>>   1 file changed, 5 insertions(+), 5 deletions(-)
>>
>> diff --git a/src/aio-dio-regress/aio-dio-append-write-read-race.c b/src/aio-dio-regress/aio-dio-append-write-read-race.c
>> index 8f94d50..eaec784 100644
>> --- a/src/aio-dio-regress/aio-dio-append-write-read-race.c
>> +++ b/src/aio-dio-regress/aio-dio-append-write-read-race.c
>> @@ -43,6 +43,7 @@ struct io_data {
>>   	off_t offset;
>>   	char *buf;
>>   	int use_aio;
>> +	size_t act_sz;
> I think this should be a signed value, as pread returns ssize_t. And
> maybe rename the variable to read_sz? Otherwise looks fine to me. And I
> can do the updates on commit, if there's no further review comments.
Hi Eryu,

Thanks for your comment, i can send v3 patch as you suggested. :-)

Thanks,
Xiao Yang
> Thanks,
> Eryu
>
>>   };
>>
>>   int reader_ready = 0;
>> @@ -57,15 +58,14 @@ static void usage(const char *prog)
>>   static void *reader(void *arg)
>>   {
>>   	struct io_data *data = (struct io_data *)arg;
>> -	int ret;
>>
>>   	memset(data->buf, 'b', data->blksize);
>>   	reader_ready = 1;
>>   	do {
>> -		ret = pread(data->fd, data->buf, data->blksize, data->offset);
>> -		if (ret<  0)
>> +		data->act_sz = pread(data->fd, data->buf, data->blksize, data->offset);
>> +		if (data->act_sz<  0)
>>   			perror("read file");
>> -	} while (ret<= 0);
>> +	} while (data->act_sz<= 0);
>>
>>   	return NULL;
>>   }
>> @@ -203,7 +203,7 @@ int main(int argc, char *argv[])
>>   			goto err;
>>   		}
>>
>> -		for (j = 0; j<  blksize; j++) {
>> +		for (j = 0; j<  rdata.act_sz; j++) {
>>   			if (rdata.buf[j] != 'a') {
>>   				fail("encounter an error: "
>>   					"block %d offset %d, content %x\n",
>> -- 
>> 1.8.3.1
>>
>>
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe fstests" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
> .
>




^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH v3] generic/465: just check the actual read data under dio read/write
  2017-11-28  7:20     ` Eryu Guan
  2017-11-29  1:42       ` Xiao Yang
@ 2017-11-29  1:44       ` Xiao Yang
  1 sibling, 0 replies; 6+ messages in thread
From: Xiao Yang @ 2017-11-29  1:44 UTC (permalink / raw)
  To: eguan; +Cc: fstests, Xiao Yang

I got the following message when running generic/465 in ext4 data=journal mode
---------------------------------------------------------------
QA output created by 465
non-aio dio test
encounter an error: block 0 offset 4096, content 62
encounter an error: block 0 offset 122880, content 62
encounter an error: block 0 offset 274432, content 62
encounter an error: block 0 offset 86016, content 62
aio-dio test
encounter an error: block 0 offset 28672, content 62
encounter an error: block 0 offset 12288, content 62
encounter an error: block 2 offset 16384, content 62
encounter an error: block 1 offset 565248, content 62
---------------------------------------------------------------

In ext4 data=journal mode, direct read will fall back to buffer read, and buffer
read doesn't take inode lock, so it doesn't need to wait for the writer to finish
first and sees the intermediate inode size and returns data less than 1M.

We can just check the actual read data instead of the whole read buffer.

Signed-off-by: Xiao Yang <yangx.jy@cn.fujitsu.com>
---
 src/aio-dio-regress/aio-dio-append-write-read-race.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/src/aio-dio-regress/aio-dio-append-write-read-race.c b/src/aio-dio-regress/aio-dio-append-write-read-race.c
index 8f94d50..7805af1 100644
--- a/src/aio-dio-regress/aio-dio-append-write-read-race.c
+++ b/src/aio-dio-regress/aio-dio-append-write-read-race.c
@@ -43,6 +43,7 @@ struct io_data {
 	off_t offset;
 	char *buf;
 	int use_aio;
+	int read_sz;
 };
 
 int reader_ready = 0;
@@ -57,15 +58,14 @@ static void usage(const char *prog)
 static void *reader(void *arg)
 {
 	struct io_data *data = (struct io_data *)arg;
-	int ret;
 
 	memset(data->buf, 'b', data->blksize);
 	reader_ready = 1;
 	do {
-		ret = pread(data->fd, data->buf, data->blksize, data->offset);
-		if (ret < 0)
+		data->read_sz = pread(data->fd, data->buf, data->blksize, data->offset);
+		if (data->read_sz < 0)
 			perror("read file");
-	} while (ret <= 0);
+	} while (data->read_sz <= 0);
 
 	return NULL;
 }
@@ -203,7 +203,7 @@ int main(int argc, char *argv[])
 			goto err;
 		}
 
-		for (j = 0; j < blksize; j++) {
+		for (j = 0; j < rdata.read_sz; j++) {
 			if (rdata.buf[j] != 'a') {
 				fail("encounter an error: "
 					"block %d offset %d, content %x\n",
-- 
1.8.3.1




^ permalink raw reply related	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2017-11-29  1:44 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-11-21  7:10 [PATCH] generic/465: just check the actual read data under dio read/write Xiao Yang
2017-11-23  6:14 ` Eryu Guan
2017-11-27  2:34   ` [PATCH v2] " Xiao Yang
2017-11-28  7:20     ` Eryu Guan
2017-11-29  1:42       ` Xiao Yang
2017-11-29  1:44       ` [PATCH v3] " Xiao Yang

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.