linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v1] fs/fuse: Fix missing FOLL_PIN for direct-io
@ 2023-08-29 18:36 Lei Huang
  2023-08-29 21:57 ` Bernd Schubert
  2024-03-06 10:01 ` Miklos Szeredi
  0 siblings, 2 replies; 11+ messages in thread
From: Lei Huang @ 2023-08-29 18:36 UTC (permalink / raw)
  To: linux-kernel; +Cc: lei.huang, miklos, linux-fsdevel

Our user space filesystem relies on fuse to provide POSIX interface.
In our test, a known string is written into a file and the content
is read back later to verify correct data returned. We observed wrong
data returned in read buffer in rare cases although correct data are
stored in our filesystem.

Fuse kernel module calls iov_iter_get_pages2() to get the physical
pages of the user-space read buffer passed in read(). The pages are
not pinned to avoid page migration. When page migration occurs, the
consequence are two-folds.

1) Applications do not receive correct data in read buffer.
2) fuse kernel writes data into a wrong place.

Using iov_iter_extract_pages() to pin pages fixes the issue in our
test.

An auxiliary variable "struct page **pt_pages" is used in the patch
to prepare the 2nd parameter for iov_iter_extract_pages() since
iov_iter_get_pages2() uses a different type for the 2nd parameter.

Signed-off-by: Lei Huang <lei.huang@linux.intel.com>
---
 fs/fuse/file.c | 13 ++++++++-----
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index bc41152..715de3b 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -670,7 +670,7 @@ static void fuse_release_user_pages(struct fuse_args_pages *ap,
 	for (i = 0; i < ap->num_pages; i++) {
 		if (should_dirty)
 			set_page_dirty_lock(ap->pages[i]);
-		put_page(ap->pages[i]);
+		unpin_user_page(ap->pages[i]);
 	}
 }
 
@@ -1428,10 +1428,13 @@ static int fuse_get_user_pages(struct fuse_args_pages *ap, struct iov_iter *ii,
 	while (nbytes < *nbytesp && ap->num_pages < max_pages) {
 		unsigned npages;
 		size_t start;
-		ret = iov_iter_get_pages2(ii, &ap->pages[ap->num_pages],
-					*nbytesp - nbytes,
-					max_pages - ap->num_pages,
-					&start);
+		struct page **pt_pages;
+
+		pt_pages = &ap->pages[ap->num_pages];
+		ret = iov_iter_extract_pages(ii, &pt_pages,
+					     *nbytesp - nbytes,
+					     max_pages - ap->num_pages,
+					     0, &start);
 		if (ret < 0)
 			break;
 
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH v1] fs/fuse: Fix missing FOLL_PIN for direct-io
  2023-08-29 18:36 [PATCH v1] fs/fuse: Fix missing FOLL_PIN for direct-io Lei Huang
@ 2023-08-29 21:57 ` Bernd Schubert
  2023-08-30  1:03   ` Lei Huang
  2024-03-06 10:01 ` Miklos Szeredi
  1 sibling, 1 reply; 11+ messages in thread
From: Bernd Schubert @ 2023-08-29 21:57 UTC (permalink / raw)
  To: Lei Huang, linux-kernel; +Cc: miklos, linux-fsdevel, David Howells



On 8/29/23 20:36, Lei Huang wrote:
> Our user space filesystem relies on fuse to provide POSIX interface.
> In our test, a known string is written into a file and the content
> is read back later to verify correct data returned. We observed wrong
> data returned in read buffer in rare cases although correct data are
> stored in our filesystem.
> 
> Fuse kernel module calls iov_iter_get_pages2() to get the physical
> pages of the user-space read buffer passed in read(). The pages are
> not pinned to avoid page migration. When page migration occurs, the
> consequence are two-folds.
> 
> 1) Applications do not receive correct data in read buffer.
> 2) fuse kernel writes data into a wrong place.
> 
> Using iov_iter_extract_pages() to pin pages fixes the issue in our
> test.

Hmm, iov_iter_extract_pages does not exists for a long time and the code 
in fuse_get_user_pages didn't change much. So if you are right, there 
would be a long term data corruption for page migrations? And a back 
port to old kernels would not be obvious?

What confuses me further is that
commit 85dd2c8ff368 does not mention migration or corruption, although 
lists several other advantages for iov_iter_extract_pages. Other commits 
using iov_iter_extract_pages point to fork - i.e. would your data 
corruption be possibly related that?


Thanks,
Bernd


> 
> An auxiliary variable "struct page **pt_pages" is used in the patch
> to prepare the 2nd parameter for iov_iter_extract_pages() since
> iov_iter_get_pages2() uses a different type for the 2nd parameter.
> 
> Signed-off-by: Lei Huang <lei.huang@linux.intel.com>
> ---
>   fs/fuse/file.c | 13 ++++++++-----
>   1 file changed, 8 insertions(+), 5 deletions(-)
> 
> diff --git a/fs/fuse/file.c b/fs/fuse/file.c
> index bc41152..715de3b 100644
> --- a/fs/fuse/file.c
> +++ b/fs/fuse/file.c
> @@ -670,7 +670,7 @@ static void fuse_release_user_pages(struct fuse_args_pages *ap,
>   	for (i = 0; i < ap->num_pages; i++) {
>   		if (should_dirty)
>   			set_page_dirty_lock(ap->pages[i]);
> -		put_page(ap->pages[i]);
> +		unpin_user_page(ap->pages[i]);
>   	}
>   }
>   
> @@ -1428,10 +1428,13 @@ static int fuse_get_user_pages(struct fuse_args_pages *ap, struct iov_iter *ii,
>   	while (nbytes < *nbytesp && ap->num_pages < max_pages) {
>   		unsigned npages;
>   		size_t start;
> -		ret = iov_iter_get_pages2(ii, &ap->pages[ap->num_pages],
> -					*nbytesp - nbytes,
> -					max_pages - ap->num_pages,
> -					&start);
> +		struct page **pt_pages;
> +
> +		pt_pages = &ap->pages[ap->num_pages];
> +		ret = iov_iter_extract_pages(ii, &pt_pages,
> +					     *nbytesp - nbytes,
> +					     max_pages - ap->num_pages,
> +					     0, &start);
>   		if (ret < 0)
>   			break;
>   

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v1] fs/fuse: Fix missing FOLL_PIN for direct-io
  2023-08-29 21:57 ` Bernd Schubert
@ 2023-08-30  1:03   ` Lei Huang
  2023-09-01 17:56     ` Bernd Schubert
  2023-09-01 18:05     ` Bernd Schubert
  0 siblings, 2 replies; 11+ messages in thread
From: Lei Huang @ 2023-08-30  1:03 UTC (permalink / raw)
  To: Bernd Schubert, linux-kernel; +Cc: miklos, linux-fsdevel, David Howells

Hi Bernd,

Thank you very much for your reply!

 > Hmm, iov_iter_extract_pages does not exists for a long time and the code
 > in fuse_get_user_pages didn't change much. So if you are right, there
 > would be a long term data corruption for page migrations? And a back
 > port to old kernels would not be obvious?

Right. The issue has been reproduced under various versions of kernels, 
ranging from 3.10.0 to 6.3.6 in my tests. It would be different to make 
a patch under older kernels like 3.10.0. One way I tested, one can query
the physical pages associated with read buffer after data is ready 
(right before writing the data into read buffer). This seems resolving 
the issue in my tests.


 > What confuses me further is that
 > commit 85dd2c8ff368 does not mention migration or corruption, although
 > lists several other advantages for iov_iter_extract_pages. Other commits
 > using iov_iter_extract_pages point to fork - i.e. would your data
 > corruption be possibly related that?

As I mentioned above, the issue seems resolved if we query the physical 
pages as late as right before writing data into read buffer. I think the 
root cause is page migration.

Best Regards,
-lei

On 8/29/23 17:57, Bernd Schubert wrote:
> 
> 
> On 8/29/23 20:36, Lei Huang wrote:
>> Our user space filesystem relies on fuse to provide POSIX interface.
>> In our test, a known string is written into a file and the content
>> is read back later to verify correct data returned. We observed wrong
>> data returned in read buffer in rare cases although correct data are
>> stored in our filesystem.
>>
>> Fuse kernel module calls iov_iter_get_pages2() to get the physical
>> pages of the user-space read buffer passed in read(). The pages are
>> not pinned to avoid page migration. When page migration occurs, the
>> consequence are two-folds.
>>
>> 1) Applications do not receive correct data in read buffer.
>> 2) fuse kernel writes data into a wrong place.
>>
>> Using iov_iter_extract_pages() to pin pages fixes the issue in our
>> test.
> 
> Hmm, iov_iter_extract_pages does not exists for a long time and the code 
> in fuse_get_user_pages didn't change much. So if you are right, there 
> would be a long term data corruption for page migrations? And a back 
> port to old kernels would not be obvious?
> 
> What confuses me further is that
> commit 85dd2c8ff368 does not mention migration or corruption, although 
> lists several other advantages for iov_iter_extract_pages. Other commits 
> using iov_iter_extract_pages point to fork - i.e. would your data 
> corruption be possibly related that?
> 
> 
> Thanks,
> Bernd
> 
> 
>>
>> An auxiliary variable "struct page **pt_pages" is used in the patch
>> to prepare the 2nd parameter for iov_iter_extract_pages() since
>> iov_iter_get_pages2() uses a different type for the 2nd parameter.
>>
>> Signed-off-by: Lei Huang <lei.huang@linux.intel.com>
>> ---
>>   fs/fuse/file.c | 13 ++++++++-----
>>   1 file changed, 8 insertions(+), 5 deletions(-)
>>
>> diff --git a/fs/fuse/file.c b/fs/fuse/file.c
>> index bc41152..715de3b 100644
>> --- a/fs/fuse/file.c
>> +++ b/fs/fuse/file.c
>> @@ -670,7 +670,7 @@ static void fuse_release_user_pages(struct 
>> fuse_args_pages *ap,
>>       for (i = 0; i < ap->num_pages; i++) {
>>           if (should_dirty)
>>               set_page_dirty_lock(ap->pages[i]);
>> -        put_page(ap->pages[i]);
>> +        unpin_user_page(ap->pages[i]);
>>       }
>>   }
>> @@ -1428,10 +1428,13 @@ static int fuse_get_user_pages(struct 
>> fuse_args_pages *ap, struct iov_iter *ii,
>>       while (nbytes < *nbytesp && ap->num_pages < max_pages) {
>>           unsigned npages;
>>           size_t start;
>> -        ret = iov_iter_get_pages2(ii, &ap->pages[ap->num_pages],
>> -                    *nbytesp - nbytes,
>> -                    max_pages - ap->num_pages,
>> -                    &start);
>> +        struct page **pt_pages;
>> +
>> +        pt_pages = &ap->pages[ap->num_pages];
>> +        ret = iov_iter_extract_pages(ii, &pt_pages,
>> +                         *nbytesp - nbytes,
>> +                         max_pages - ap->num_pages,
>> +                         0, &start);
>>           if (ret < 0)
>>               break;

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v1] fs/fuse: Fix missing FOLL_PIN for direct-io
  2023-08-30  1:03   ` Lei Huang
@ 2023-09-01 17:56     ` Bernd Schubert
  2023-09-01 18:05     ` Bernd Schubert
  1 sibling, 0 replies; 11+ messages in thread
From: Bernd Schubert @ 2023-09-01 17:56 UTC (permalink / raw)
  To: Lei Huang, Bernd Schubert, linux-kernel
  Cc: miklos, linux-fsdevel, David Howells

Hi Lei,

On 8/30/23 03:03, Lei Huang wrote:
> Hi Bernd,
> 
> Thank you very much for your reply!
> 
>  > Hmm, iov_iter_extract_pages does not exists for a long time and the code
>  > in fuse_get_user_pages didn't change much. So if you are right, there
>  > would be a long term data corruption for page migrations? And a back
>  > port to old kernels would not be obvious?
> 
> Right. The issue has been reproduced under various versions of kernels, 
> ranging from 3.10.0 to 6.3.6 in my tests. It would be different to make 
> a patch under older kernels like 3.10.0. One way I tested, one can query
> the physical pages associated with read buffer after data is ready 
> (right before writing the data into read buffer). This seems resolving 
> the issue in my tests.
> 
> 
>  > What confuses me further is that
>  > commit 85dd2c8ff368 does not mention migration or corruption, although
>  > lists several other advantages for iov_iter_extract_pages. Other commits
>  > using iov_iter_extract_pages point to fork - i.e. would your data
>  > corruption be possibly related that?
> 
> As I mentioned above, the issue seems resolved if we query the physical 
> pages as late as right before writing data into read buffer. I think the 
> root cause is page migration.
> 

out of interest, what is your exact reproducer and how much time does i 
take? I'm just trying passthrough_hp(*) and ql-fstest (**) and don't get 
and issue after about 1h run time. I let it continue over the weekend. 
The system is an older dual socket xeon.

(*) with slight modification for passthrough_hp to disable O_DIRECT for 
the underlying file system. It is running on xfs on an nvme.

(**) https://github.com/bsbernd/ql-fstest


Pinning the pages is certainly a good idea, I would just like to 
understand how severe the issue is. And would like to test 
backports/different patch on older kernels.


Thanks,
Bernd



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v1] fs/fuse: Fix missing FOLL_PIN for direct-io
  2023-08-30  1:03   ` Lei Huang
  2023-09-01 17:56     ` Bernd Schubert
@ 2023-09-01 18:05     ` Bernd Schubert
  2023-09-01 22:02       ` Lei Huang
  2023-09-01 22:18       ` Lei Huang
  1 sibling, 2 replies; 11+ messages in thread
From: Bernd Schubert @ 2023-09-01 18:05 UTC (permalink / raw)
  To: Lei Huang, Bernd Schubert, linux-kernel
  Cc: miklos, linux-fsdevel, David Howells

Hi Lei,

On 8/30/23 03:03, Lei Huang wrote:
> Hi Bernd,
> 
> Thank you very much for your reply!
> 
>  > Hmm, iov_iter_extract_pages does not exists for a long time and the code
>  > in fuse_get_user_pages didn't change much. So if you are right, there
>  > would be a long term data corruption for page migrations? And a back
>  > port to old kernels would not be obvious?
> 
> Right. The issue has been reproduced under various versions of kernels, 
> ranging from 3.10.0 to 6.3.6 in my tests. It would be different to make 
> a patch under older kernels like 3.10.0. One way I tested, one can query
> the physical pages associated with read buffer after data is ready 
> (right before writing the data into read buffer). This seems resolving 
> the issue in my tests.
> 
> 
>  > What confuses me further is that
>  > commit 85dd2c8ff368 does not mention migration or corruption, although
>  > lists several other advantages for iov_iter_extract_pages. Other commits
>  > using iov_iter_extract_pages point to fork - i.e. would your data
>  > corruption be possibly related that?
> 
> As I mentioned above, the issue seems resolved if we query the physical 
> pages as late as right before writing data into read buffer. I think the 
> root cause is page migration.
> 

out of interest, what is your exact reproducer and how much time does i 
take? I'm just trying passthrough_hp(*) and ql-fstest (**) and don't get 
and issue after about 1h run time. I let it continue over the weekend. 
The system is an older dual socket xeon.

(*) with slight modification for passthrough_hp to disable O_DIRECT for 
the underlying file system. It is running on xfs on an nvme.

(**) https://github.com/bsbernd/ql-fstest


Pinning the pages is certainly a good idea, I would just like to 
understand how severe the issue is. And would like to test 
backports/different patch on older kernels.


Thanks,
Bernd



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v1] fs/fuse: Fix missing FOLL_PIN for direct-io
  2023-09-01 18:05     ` Bernd Schubert
@ 2023-09-01 22:02       ` Lei Huang
  2023-09-01 22:18       ` Lei Huang
  1 sibling, 0 replies; 11+ messages in thread
From: Lei Huang @ 2023-09-01 22:02 UTC (permalink / raw)
  To: Bernd Schubert, Bernd Schubert, linux-kernel
  Cc: miklos, linux-fsdevel, David Howells

Hi Bernd,

Actually, we do not have a simple code to reproduce this issue yet. We 
observed the issues in a suite of tests. There are up to 8 tests 
scheduled to run concurrently with many small tests. I could observe 
about 5~10 cases with wrong data in one day. I tried repeating a single 
test for very long time and did not observe the issue. I guess it would 
help to run simultaneously multiple processes causing bad memory 
fragmentation. Please let me know if you need more information. Thank 
you very much for looking into this issue. Have a great weekend!

Best Regards,
-lei

On 9/1/23 14:05, Bernd Schubert wrote:
> Hi Lei,
> 
> On 8/30/23 03:03, Lei Huang wrote:
>> Hi Bernd,
>>
>> Thank you very much for your reply!
>>
>>  > Hmm, iov_iter_extract_pages does not exists for a long time and the 
>> code
>>  > in fuse_get_user_pages didn't change much. So if you are right, there
>>  > would be a long term data corruption for page migrations? And a back
>>  > port to old kernels would not be obvious?
>>
>> Right. The issue has been reproduced under various versions of 
>> kernels, ranging from 3.10.0 to 6.3.6 in my tests. It would be 
>> different to make a patch under older kernels like 3.10.0. One way I 
>> tested, one can query
>> the physical pages associated with read buffer after data is ready 
>> (right before writing the data into read buffer). This seems resolving 
>> the issue in my tests.
>>
>>
>>  > What confuses me further is that
>>  > commit 85dd2c8ff368 does not mention migration or corruption, although
>>  > lists several other advantages for iov_iter_extract_pages. Other 
>> commits
>>  > using iov_iter_extract_pages point to fork - i.e. would your data
>>  > corruption be possibly related that?
>>
>> As I mentioned above, the issue seems resolved if we query the 
>> physical pages as late as right before writing data into read buffer. 
>> I think the root cause is page migration.
>>
> 
> out of interest, what is your exact reproducer and how much time does i 
> take? I'm just trying passthrough_hp(*) and ql-fstest (**) and don't get 
> and issue after about 1h run time. I let it continue over the weekend. 
> The system is an older dual socket xeon.
> 
> (*) with slight modification for passthrough_hp to disable O_DIRECT for 
> the underlying file system. It is running on xfs on an nvme.
> 
> (**) https://github.com/bsbernd/ql-fstest
> 
> 
> Pinning the pages is certainly a good idea, I would just like to 
> understand how severe the issue is. And would like to test 
> backports/different patch on older kernels.
> 
> 
> Thanks,
> Bernd
> 
> 

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v1] fs/fuse: Fix missing FOLL_PIN for direct-io
  2023-09-01 18:05     ` Bernd Schubert
  2023-09-01 22:02       ` Lei Huang
@ 2023-09-01 22:18       ` Lei Huang
  1 sibling, 0 replies; 11+ messages in thread
From: Lei Huang @ 2023-09-01 22:18 UTC (permalink / raw)
  To: Bernd Schubert, Bernd Schubert, linux-kernel
  Cc: miklos, linux-fsdevel, David Howells

Hi Bernd,

Actually, we do not have a simple code to reproduce this issue yet. We 
observed the issues in a suite of tests which shedules multiple (6~8) 
tests concurrently. I could observe about 5~10 cases with wrong data in 
one day. I tried repeating a single test for a couple of days and did 
not observe the issue. I think concurrently running multiple processes 
which can make bad memory fragmentation could help to observe the issue. 
Please let me know if you need more information. Thank you very much for 
looking into this issue. Have a great weekend!

Best Regards,
-lei

On 9/1/23 14:05, Bernd Schubert wrote:
> Hi Lei,
> 
> On 8/30/23 03:03, Lei Huang wrote:
>> Hi Bernd,
>>
>> Thank you very much for your reply!
>>
>>  > Hmm, iov_iter_extract_pages does not exists for a long time and the 
>> code
>>  > in fuse_get_user_pages didn't change much. So if you are right, there
>>  > would be a long term data corruption for page migrations? And a back
>>  > port to old kernels would not be obvious?
>>
>> Right. The issue has been reproduced under various versions of 
>> kernels, ranging from 3.10.0 to 6.3.6 in my tests. It would be 
>> different to make a patch under older kernels like 3.10.0. One way I 
>> tested, one can query
>> the physical pages associated with read buffer after data is ready 
>> (right before writing the data into read buffer). This seems resolving 
>> the issue in my tests.
>>
>>
>>  > What confuses me further is that
>>  > commit 85dd2c8ff368 does not mention migration or corruption, although
>>  > lists several other advantages for iov_iter_extract_pages. Other 
>> commits
>>  > using iov_iter_extract_pages point to fork - i.e. would your data
>>  > corruption be possibly related that?
>>
>> As I mentioned above, the issue seems resolved if we query the 
>> physical pages as late as right before writing data into read buffer. 
>> I think the root cause is page migration.
>>
> 
> out of interest, what is your exact reproducer and how much time does i 
> take? I'm just trying passthrough_hp(*) and ql-fstest (**) and don't get 
> and issue after about 1h run time. I let it continue over the weekend. 
> The system is an older dual socket xeon.
> 
> (*) with slight modification for passthrough_hp to disable O_DIRECT for 
> the underlying file system. It is running on xfs on an nvme.
> 
> (**) https://github.com/bsbernd/ql-fstest
> 
> 
> Pinning the pages is certainly a good idea, I would just like to 
> understand how severe the issue is. And would like to test 
> backports/different patch on older kernels.
> 
> 
> Thanks,
> Bernd
> 
> 

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v1] fs/fuse: Fix missing FOLL_PIN for direct-io
  2023-08-29 18:36 [PATCH v1] fs/fuse: Fix missing FOLL_PIN for direct-io Lei Huang
  2023-08-29 21:57 ` Bernd Schubert
@ 2024-03-06 10:01 ` Miklos Szeredi
  2024-03-06 11:16   ` Bernd Schubert
  1 sibling, 1 reply; 11+ messages in thread
From: Miklos Szeredi @ 2024-03-06 10:01 UTC (permalink / raw)
  To: Lei Huang; +Cc: linux-kernel, linux-fsdevel

On Tue, 29 Aug 2023 at 20:37, Lei Huang <lei.huang@linux.intel.com> wrote:
>
> Our user space filesystem relies on fuse to provide POSIX interface.
> In our test, a known string is written into a file and the content
> is read back later to verify correct data returned. We observed wrong
> data returned in read buffer in rare cases although correct data are
> stored in our filesystem.
>
> Fuse kernel module calls iov_iter_get_pages2() to get the physical
> pages of the user-space read buffer passed in read(). The pages are
> not pinned to avoid page migration. When page migration occurs, the
> consequence are two-folds.
>
> 1) Applications do not receive correct data in read buffer.
> 2) fuse kernel writes data into a wrong place.
>
> Using iov_iter_extract_pages() to pin pages fixes the issue in our
> test.
>
> An auxiliary variable "struct page **pt_pages" is used in the patch
> to prepare the 2nd parameter for iov_iter_extract_pages() since
> iov_iter_get_pages2() uses a different type for the 2nd parameter.
>
> Signed-off-by: Lei Huang <lei.huang@linux.intel.com>

Applied, with a modification to only unpin if
iov_iter_extract_will_pin() returns true.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v1] fs/fuse: Fix missing FOLL_PIN for direct-io
  2024-03-06 10:01 ` Miklos Szeredi
@ 2024-03-06 11:16   ` Bernd Schubert
  2024-03-06 12:05     ` Miklos Szeredi
  0 siblings, 1 reply; 11+ messages in thread
From: Bernd Schubert @ 2024-03-06 11:16 UTC (permalink / raw)
  To: Miklos Szeredi, Lei Huang; +Cc: linux-kernel, linux-fsdevel



On 3/6/24 11:01, Miklos Szeredi wrote:
> On Tue, 29 Aug 2023 at 20:37, Lei Huang <lei.huang@linux.intel.com> wrote:
>>
>> Our user space filesystem relies on fuse to provide POSIX interface.
>> In our test, a known string is written into a file and the content
>> is read back later to verify correct data returned. We observed wrong
>> data returned in read buffer in rare cases although correct data are
>> stored in our filesystem.
>>
>> Fuse kernel module calls iov_iter_get_pages2() to get the physical
>> pages of the user-space read buffer passed in read(). The pages are
>> not pinned to avoid page migration. When page migration occurs, the
>> consequence are two-folds.
>>
>> 1) Applications do not receive correct data in read buffer.
>> 2) fuse kernel writes data into a wrong place.
>>
>> Using iov_iter_extract_pages() to pin pages fixes the issue in our
>> test.
>>
>> An auxiliary variable "struct page **pt_pages" is used in the patch
>> to prepare the 2nd parameter for iov_iter_extract_pages() since
>> iov_iter_get_pages2() uses a different type for the 2nd parameter.
>>
>> Signed-off-by: Lei Huang <lei.huang@linux.intel.com>
> 
> Applied, with a modification to only unpin if
> iov_iter_extract_will_pin() returns true.

Hi Miklos,

do you have an idea if this needs to be back ported and to which kernel
version?
I had tried to reproduce data corruption with 4.18 - Lei wrote that he
could see issues with older kernels as well, but I never managed to
trigger anything on 4.18-RHEL. Typically I use ql-fstest
(https://github.com/bsbernd/ql-fstest) and even added random DIO as an
option - nothing report with weeks of run time. I could try again with
more recent kernels that have folios.

Thanks,
Bernd

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v1] fs/fuse: Fix missing FOLL_PIN for direct-io
  2024-03-06 11:16   ` Bernd Schubert
@ 2024-03-06 12:05     ` Miklos Szeredi
  2024-03-10  2:59       ` Lei Huang
  0 siblings, 1 reply; 11+ messages in thread
From: Miklos Szeredi @ 2024-03-06 12:05 UTC (permalink / raw)
  To: Bernd Schubert; +Cc: Lei Huang, linux-kernel, linux-fsdevel

On Wed, 6 Mar 2024 at 12:16, Bernd Schubert <bernd.schubert@fastmail.fm> wrote:
>
>
>
> On 3/6/24 11:01, Miklos Szeredi wrote:
> > On Tue, 29 Aug 2023 at 20:37, Lei Huang <lei.huang@linux.intel.com> wrote:
> >>
> >> Our user space filesystem relies on fuse to provide POSIX interface.
> >> In our test, a known string is written into a file and the content
> >> is read back later to verify correct data returned. We observed wrong
> >> data returned in read buffer in rare cases although correct data are
> >> stored in our filesystem.
> >>
> >> Fuse kernel module calls iov_iter_get_pages2() to get the physical
> >> pages of the user-space read buffer passed in read(). The pages are
> >> not pinned to avoid page migration. When page migration occurs, the
> >> consequence are two-folds.
> >>
> >> 1) Applications do not receive correct data in read buffer.
> >> 2) fuse kernel writes data into a wrong place.
> >>
> >> Using iov_iter_extract_pages() to pin pages fixes the issue in our
> >> test.
> >>
> >> An auxiliary variable "struct page **pt_pages" is used in the patch
> >> to prepare the 2nd parameter for iov_iter_extract_pages() since
> >> iov_iter_get_pages2() uses a different type for the 2nd parameter.
> >>
> >> Signed-off-by: Lei Huang <lei.huang@linux.intel.com>
> >
> > Applied, with a modification to only unpin if
> > iov_iter_extract_will_pin() returns true.
>
> Hi Miklos,
>
> do you have an idea if this needs to be back ported and to which kernel
> version?
> I had tried to reproduce data corruption with 4.18 - Lei wrote that he
> could see issues with older kernels as well, but I never managed to
> trigger anything on 4.18-RHEL. Typically I use ql-fstest
> (https://github.com/bsbernd/ql-fstest) and even added random DIO as an
> option - nothing report with weeks of run time. I could try again with
> more recent kernels that have folios.

I don't think that corruption will happen in real life.  So I'm not
sure we need to bother with backporting, and definitely not before
when the infrastructure was introduced.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v1] fs/fuse: Fix missing FOLL_PIN for direct-io
  2024-03-06 12:05     ` Miklos Szeredi
@ 2024-03-10  2:59       ` Lei Huang
  0 siblings, 0 replies; 11+ messages in thread
From: Lei Huang @ 2024-03-10  2:59 UTC (permalink / raw)
  To: Miklos Szeredi, Bernd Schubert; +Cc: linux-kernel, linux-fsdevel

Thank you very much, Miklos!

Yes. It is not easy to reproduce the issues in real applications. We 
only observed the issue in our own testing tool which runs multiple 
tests concurrently. We have not been able reproduce it with simple code yet.

-lei

On 3/6/24 07:05, Miklos Szeredi wrote:
> On Wed, 6 Mar 2024 at 12:16, Bernd Schubert <bernd.schubert@fastmail.fm> wrote:
>>
>>
>>
>> On 3/6/24 11:01, Miklos Szeredi wrote:
>>> On Tue, 29 Aug 2023 at 20:37, Lei Huang <lei.huang@linux.intel.com> wrote:
>>>>
>>>> Our user space filesystem relies on fuse to provide POSIX interface.
>>>> In our test, a known string is written into a file and the content
>>>> is read back later to verify correct data returned. We observed wrong
>>>> data returned in read buffer in rare cases although correct data are
>>>> stored in our filesystem.
>>>>
>>>> Fuse kernel module calls iov_iter_get_pages2() to get the physical
>>>> pages of the user-space read buffer passed in read(). The pages are
>>>> not pinned to avoid page migration. When page migration occurs, the
>>>> consequence are two-folds.
>>>>
>>>> 1) Applications do not receive correct data in read buffer.
>>>> 2) fuse kernel writes data into a wrong place.
>>>>
>>>> Using iov_iter_extract_pages() to pin pages fixes the issue in our
>>>> test.
>>>>
>>>> An auxiliary variable "struct page **pt_pages" is used in the patch
>>>> to prepare the 2nd parameter for iov_iter_extract_pages() since
>>>> iov_iter_get_pages2() uses a different type for the 2nd parameter.
>>>>
>>>> Signed-off-by: Lei Huang <lei.huang@linux.intel.com>
>>>
>>> Applied, with a modification to only unpin if
>>> iov_iter_extract_will_pin() returns true.
>>
>> Hi Miklos,
>>
>> do you have an idea if this needs to be back ported and to which kernel
>> version?
>> I had tried to reproduce data corruption with 4.18 - Lei wrote that he
>> could see issues with older kernels as well, but I never managed to
>> trigger anything on 4.18-RHEL. Typically I use ql-fstest
>> (https://github.com/bsbernd/ql-fstest) and even added random DIO as an
>> option - nothing report with weeks of run time. I could try again with
>> more recent kernels that have folios.
> 
> I don't think that corruption will happen in real life.  So I'm not
> sure we need to bother with backporting, and definitely not before
> when the infrastructure was introduced.
> 
> Thanks,
> Miklos

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2024-03-10  2:59 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-08-29 18:36 [PATCH v1] fs/fuse: Fix missing FOLL_PIN for direct-io Lei Huang
2023-08-29 21:57 ` Bernd Schubert
2023-08-30  1:03   ` Lei Huang
2023-09-01 17:56     ` Bernd Schubert
2023-09-01 18:05     ` Bernd Schubert
2023-09-01 22:02       ` Lei Huang
2023-09-01 22:18       ` Lei Huang
2024-03-06 10:01 ` Miklos Szeredi
2024-03-06 11:16   ` Bernd Schubert
2024-03-06 12:05     ` Miklos Szeredi
2024-03-10  2:59       ` Lei Huang

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).