* [PATCH v1] fs/fuse: Fix missing FOLL_PIN for direct-io
@ 2023-08-29 18:36 Lei Huang
2023-08-29 21:57 ` Bernd Schubert
2024-03-06 10:01 ` Miklos Szeredi
0 siblings, 2 replies; 11+ messages in thread
From: Lei Huang @ 2023-08-29 18:36 UTC (permalink / raw)
To: linux-kernel; +Cc: lei.huang, miklos, linux-fsdevel
Our user space filesystem relies on fuse to provide POSIX interface.
In our test, a known string is written into a file and the content
is read back later to verify correct data returned. We observed wrong
data returned in read buffer in rare cases although correct data are
stored in our filesystem.
Fuse kernel module calls iov_iter_get_pages2() to get the physical
pages of the user-space read buffer passed in read(). The pages are
not pinned to avoid page migration. When page migration occurs, the
consequence are two-folds.
1) Applications do not receive correct data in read buffer.
2) fuse kernel writes data into a wrong place.
Using iov_iter_extract_pages() to pin pages fixes the issue in our
test.
An auxiliary variable "struct page **pt_pages" is used in the patch
to prepare the 2nd parameter for iov_iter_extract_pages() since
iov_iter_get_pages2() uses a different type for the 2nd parameter.
Signed-off-by: Lei Huang <lei.huang@linux.intel.com>
---
fs/fuse/file.c | 13 ++++++++-----
1 file changed, 8 insertions(+), 5 deletions(-)
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index bc41152..715de3b 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -670,7 +670,7 @@ static void fuse_release_user_pages(struct fuse_args_pages *ap,
for (i = 0; i < ap->num_pages; i++) {
if (should_dirty)
set_page_dirty_lock(ap->pages[i]);
- put_page(ap->pages[i]);
+ unpin_user_page(ap->pages[i]);
}
}
@@ -1428,10 +1428,13 @@ static int fuse_get_user_pages(struct fuse_args_pages *ap, struct iov_iter *ii,
while (nbytes < *nbytesp && ap->num_pages < max_pages) {
unsigned npages;
size_t start;
- ret = iov_iter_get_pages2(ii, &ap->pages[ap->num_pages],
- *nbytesp - nbytes,
- max_pages - ap->num_pages,
- &start);
+ struct page **pt_pages;
+
+ pt_pages = &ap->pages[ap->num_pages];
+ ret = iov_iter_extract_pages(ii, &pt_pages,
+ *nbytesp - nbytes,
+ max_pages - ap->num_pages,
+ 0, &start);
if (ret < 0)
break;
--
1.8.3.1
^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [PATCH v1] fs/fuse: Fix missing FOLL_PIN for direct-io
2023-08-29 18:36 [PATCH v1] fs/fuse: Fix missing FOLL_PIN for direct-io Lei Huang
@ 2023-08-29 21:57 ` Bernd Schubert
2023-08-30 1:03 ` Lei Huang
2024-03-06 10:01 ` Miklos Szeredi
1 sibling, 1 reply; 11+ messages in thread
From: Bernd Schubert @ 2023-08-29 21:57 UTC (permalink / raw)
To: Lei Huang, linux-kernel; +Cc: miklos, linux-fsdevel, David Howells
On 8/29/23 20:36, Lei Huang wrote:
> Our user space filesystem relies on fuse to provide POSIX interface.
> In our test, a known string is written into a file and the content
> is read back later to verify correct data returned. We observed wrong
> data returned in read buffer in rare cases although correct data are
> stored in our filesystem.
>
> Fuse kernel module calls iov_iter_get_pages2() to get the physical
> pages of the user-space read buffer passed in read(). The pages are
> not pinned to avoid page migration. When page migration occurs, the
> consequence are two-folds.
>
> 1) Applications do not receive correct data in read buffer.
> 2) fuse kernel writes data into a wrong place.
>
> Using iov_iter_extract_pages() to pin pages fixes the issue in our
> test.
Hmm, iov_iter_extract_pages does not exists for a long time and the code
in fuse_get_user_pages didn't change much. So if you are right, there
would be a long term data corruption for page migrations? And a back
port to old kernels would not be obvious?
What confuses me further is that
commit 85dd2c8ff368 does not mention migration or corruption, although
lists several other advantages for iov_iter_extract_pages. Other commits
using iov_iter_extract_pages point to fork - i.e. would your data
corruption be possibly related that?
Thanks,
Bernd
>
> An auxiliary variable "struct page **pt_pages" is used in the patch
> to prepare the 2nd parameter for iov_iter_extract_pages() since
> iov_iter_get_pages2() uses a different type for the 2nd parameter.
>
> Signed-off-by: Lei Huang <lei.huang@linux.intel.com>
> ---
> fs/fuse/file.c | 13 ++++++++-----
> 1 file changed, 8 insertions(+), 5 deletions(-)
>
> diff --git a/fs/fuse/file.c b/fs/fuse/file.c
> index bc41152..715de3b 100644
> --- a/fs/fuse/file.c
> +++ b/fs/fuse/file.c
> @@ -670,7 +670,7 @@ static void fuse_release_user_pages(struct fuse_args_pages *ap,
> for (i = 0; i < ap->num_pages; i++) {
> if (should_dirty)
> set_page_dirty_lock(ap->pages[i]);
> - put_page(ap->pages[i]);
> + unpin_user_page(ap->pages[i]);
> }
> }
>
> @@ -1428,10 +1428,13 @@ static int fuse_get_user_pages(struct fuse_args_pages *ap, struct iov_iter *ii,
> while (nbytes < *nbytesp && ap->num_pages < max_pages) {
> unsigned npages;
> size_t start;
> - ret = iov_iter_get_pages2(ii, &ap->pages[ap->num_pages],
> - *nbytesp - nbytes,
> - max_pages - ap->num_pages,
> - &start);
> + struct page **pt_pages;
> +
> + pt_pages = &ap->pages[ap->num_pages];
> + ret = iov_iter_extract_pages(ii, &pt_pages,
> + *nbytesp - nbytes,
> + max_pages - ap->num_pages,
> + 0, &start);
> if (ret < 0)
> break;
>
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH v1] fs/fuse: Fix missing FOLL_PIN for direct-io
2023-08-29 21:57 ` Bernd Schubert
@ 2023-08-30 1:03 ` Lei Huang
2023-09-01 17:56 ` Bernd Schubert
2023-09-01 18:05 ` Bernd Schubert
0 siblings, 2 replies; 11+ messages in thread
From: Lei Huang @ 2023-08-30 1:03 UTC (permalink / raw)
To: Bernd Schubert, linux-kernel; +Cc: miklos, linux-fsdevel, David Howells
Hi Bernd,
Thank you very much for your reply!
> Hmm, iov_iter_extract_pages does not exists for a long time and the code
> in fuse_get_user_pages didn't change much. So if you are right, there
> would be a long term data corruption for page migrations? And a back
> port to old kernels would not be obvious?
Right. The issue has been reproduced under various versions of kernels,
ranging from 3.10.0 to 6.3.6 in my tests. It would be different to make
a patch under older kernels like 3.10.0. One way I tested, one can query
the physical pages associated with read buffer after data is ready
(right before writing the data into read buffer). This seems resolving
the issue in my tests.
> What confuses me further is that
> commit 85dd2c8ff368 does not mention migration or corruption, although
> lists several other advantages for iov_iter_extract_pages. Other commits
> using iov_iter_extract_pages point to fork - i.e. would your data
> corruption be possibly related that?
As I mentioned above, the issue seems resolved if we query the physical
pages as late as right before writing data into read buffer. I think the
root cause is page migration.
Best Regards,
-lei
On 8/29/23 17:57, Bernd Schubert wrote:
>
>
> On 8/29/23 20:36, Lei Huang wrote:
>> Our user space filesystem relies on fuse to provide POSIX interface.
>> In our test, a known string is written into a file and the content
>> is read back later to verify correct data returned. We observed wrong
>> data returned in read buffer in rare cases although correct data are
>> stored in our filesystem.
>>
>> Fuse kernel module calls iov_iter_get_pages2() to get the physical
>> pages of the user-space read buffer passed in read(). The pages are
>> not pinned to avoid page migration. When page migration occurs, the
>> consequence are two-folds.
>>
>> 1) Applications do not receive correct data in read buffer.
>> 2) fuse kernel writes data into a wrong place.
>>
>> Using iov_iter_extract_pages() to pin pages fixes the issue in our
>> test.
>
> Hmm, iov_iter_extract_pages does not exists for a long time and the code
> in fuse_get_user_pages didn't change much. So if you are right, there
> would be a long term data corruption for page migrations? And a back
> port to old kernels would not be obvious?
>
> What confuses me further is that
> commit 85dd2c8ff368 does not mention migration or corruption, although
> lists several other advantages for iov_iter_extract_pages. Other commits
> using iov_iter_extract_pages point to fork - i.e. would your data
> corruption be possibly related that?
>
>
> Thanks,
> Bernd
>
>
>>
>> An auxiliary variable "struct page **pt_pages" is used in the patch
>> to prepare the 2nd parameter for iov_iter_extract_pages() since
>> iov_iter_get_pages2() uses a different type for the 2nd parameter.
>>
>> Signed-off-by: Lei Huang <lei.huang@linux.intel.com>
>> ---
>> fs/fuse/file.c | 13 ++++++++-----
>> 1 file changed, 8 insertions(+), 5 deletions(-)
>>
>> diff --git a/fs/fuse/file.c b/fs/fuse/file.c
>> index bc41152..715de3b 100644
>> --- a/fs/fuse/file.c
>> +++ b/fs/fuse/file.c
>> @@ -670,7 +670,7 @@ static void fuse_release_user_pages(struct
>> fuse_args_pages *ap,
>> for (i = 0; i < ap->num_pages; i++) {
>> if (should_dirty)
>> set_page_dirty_lock(ap->pages[i]);
>> - put_page(ap->pages[i]);
>> + unpin_user_page(ap->pages[i]);
>> }
>> }
>> @@ -1428,10 +1428,13 @@ static int fuse_get_user_pages(struct
>> fuse_args_pages *ap, struct iov_iter *ii,
>> while (nbytes < *nbytesp && ap->num_pages < max_pages) {
>> unsigned npages;
>> size_t start;
>> - ret = iov_iter_get_pages2(ii, &ap->pages[ap->num_pages],
>> - *nbytesp - nbytes,
>> - max_pages - ap->num_pages,
>> - &start);
>> + struct page **pt_pages;
>> +
>> + pt_pages = &ap->pages[ap->num_pages];
>> + ret = iov_iter_extract_pages(ii, &pt_pages,
>> + *nbytesp - nbytes,
>> + max_pages - ap->num_pages,
>> + 0, &start);
>> if (ret < 0)
>> break;
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH v1] fs/fuse: Fix missing FOLL_PIN for direct-io
2023-08-30 1:03 ` Lei Huang
@ 2023-09-01 17:56 ` Bernd Schubert
2023-09-01 18:05 ` Bernd Schubert
1 sibling, 0 replies; 11+ messages in thread
From: Bernd Schubert @ 2023-09-01 17:56 UTC (permalink / raw)
To: Lei Huang, Bernd Schubert, linux-kernel
Cc: miklos, linux-fsdevel, David Howells
Hi Lei,
On 8/30/23 03:03, Lei Huang wrote:
> Hi Bernd,
>
> Thank you very much for your reply!
>
> > Hmm, iov_iter_extract_pages does not exists for a long time and the code
> > in fuse_get_user_pages didn't change much. So if you are right, there
> > would be a long term data corruption for page migrations? And a back
> > port to old kernels would not be obvious?
>
> Right. The issue has been reproduced under various versions of kernels,
> ranging from 3.10.0 to 6.3.6 in my tests. It would be different to make
> a patch under older kernels like 3.10.0. One way I tested, one can query
> the physical pages associated with read buffer after data is ready
> (right before writing the data into read buffer). This seems resolving
> the issue in my tests.
>
>
> > What confuses me further is that
> > commit 85dd2c8ff368 does not mention migration or corruption, although
> > lists several other advantages for iov_iter_extract_pages. Other commits
> > using iov_iter_extract_pages point to fork - i.e. would your data
> > corruption be possibly related that?
>
> As I mentioned above, the issue seems resolved if we query the physical
> pages as late as right before writing data into read buffer. I think the
> root cause is page migration.
>
out of interest, what is your exact reproducer and how much time does i
take? I'm just trying passthrough_hp(*) and ql-fstest (**) and don't get
and issue after about 1h run time. I let it continue over the weekend.
The system is an older dual socket xeon.
(*) with slight modification for passthrough_hp to disable O_DIRECT for
the underlying file system. It is running on xfs on an nvme.
(**) https://github.com/bsbernd/ql-fstest
Pinning the pages is certainly a good idea, I would just like to
understand how severe the issue is. And would like to test
backports/different patch on older kernels.
Thanks,
Bernd
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH v1] fs/fuse: Fix missing FOLL_PIN for direct-io
2023-08-30 1:03 ` Lei Huang
2023-09-01 17:56 ` Bernd Schubert
@ 2023-09-01 18:05 ` Bernd Schubert
2023-09-01 22:02 ` Lei Huang
2023-09-01 22:18 ` Lei Huang
1 sibling, 2 replies; 11+ messages in thread
From: Bernd Schubert @ 2023-09-01 18:05 UTC (permalink / raw)
To: Lei Huang, Bernd Schubert, linux-kernel
Cc: miklos, linux-fsdevel, David Howells
Hi Lei,
On 8/30/23 03:03, Lei Huang wrote:
> Hi Bernd,
>
> Thank you very much for your reply!
>
> > Hmm, iov_iter_extract_pages does not exists for a long time and the code
> > in fuse_get_user_pages didn't change much. So if you are right, there
> > would be a long term data corruption for page migrations? And a back
> > port to old kernels would not be obvious?
>
> Right. The issue has been reproduced under various versions of kernels,
> ranging from 3.10.0 to 6.3.6 in my tests. It would be different to make
> a patch under older kernels like 3.10.0. One way I tested, one can query
> the physical pages associated with read buffer after data is ready
> (right before writing the data into read buffer). This seems resolving
> the issue in my tests.
>
>
> > What confuses me further is that
> > commit 85dd2c8ff368 does not mention migration or corruption, although
> > lists several other advantages for iov_iter_extract_pages. Other commits
> > using iov_iter_extract_pages point to fork - i.e. would your data
> > corruption be possibly related that?
>
> As I mentioned above, the issue seems resolved if we query the physical
> pages as late as right before writing data into read buffer. I think the
> root cause is page migration.
>
out of interest, what is your exact reproducer and how much time does i
take? I'm just trying passthrough_hp(*) and ql-fstest (**) and don't get
and issue after about 1h run time. I let it continue over the weekend.
The system is an older dual socket xeon.
(*) with slight modification for passthrough_hp to disable O_DIRECT for
the underlying file system. It is running on xfs on an nvme.
(**) https://github.com/bsbernd/ql-fstest
Pinning the pages is certainly a good idea, I would just like to
understand how severe the issue is. And would like to test
backports/different patch on older kernels.
Thanks,
Bernd
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH v1] fs/fuse: Fix missing FOLL_PIN for direct-io
2023-09-01 18:05 ` Bernd Schubert
@ 2023-09-01 22:02 ` Lei Huang
2023-09-01 22:18 ` Lei Huang
1 sibling, 0 replies; 11+ messages in thread
From: Lei Huang @ 2023-09-01 22:02 UTC (permalink / raw)
To: Bernd Schubert, Bernd Schubert, linux-kernel
Cc: miklos, linux-fsdevel, David Howells
Hi Bernd,
Actually, we do not have a simple code to reproduce this issue yet. We
observed the issues in a suite of tests. There are up to 8 tests
scheduled to run concurrently with many small tests. I could observe
about 5~10 cases with wrong data in one day. I tried repeating a single
test for very long time and did not observe the issue. I guess it would
help to run simultaneously multiple processes causing bad memory
fragmentation. Please let me know if you need more information. Thank
you very much for looking into this issue. Have a great weekend!
Best Regards,
-lei
On 9/1/23 14:05, Bernd Schubert wrote:
> Hi Lei,
>
> On 8/30/23 03:03, Lei Huang wrote:
>> Hi Bernd,
>>
>> Thank you very much for your reply!
>>
>> > Hmm, iov_iter_extract_pages does not exists for a long time and the
>> code
>> > in fuse_get_user_pages didn't change much. So if you are right, there
>> > would be a long term data corruption for page migrations? And a back
>> > port to old kernels would not be obvious?
>>
>> Right. The issue has been reproduced under various versions of
>> kernels, ranging from 3.10.0 to 6.3.6 in my tests. It would be
>> different to make a patch under older kernels like 3.10.0. One way I
>> tested, one can query
>> the physical pages associated with read buffer after data is ready
>> (right before writing the data into read buffer). This seems resolving
>> the issue in my tests.
>>
>>
>> > What confuses me further is that
>> > commit 85dd2c8ff368 does not mention migration or corruption, although
>> > lists several other advantages for iov_iter_extract_pages. Other
>> commits
>> > using iov_iter_extract_pages point to fork - i.e. would your data
>> > corruption be possibly related that?
>>
>> As I mentioned above, the issue seems resolved if we query the
>> physical pages as late as right before writing data into read buffer.
>> I think the root cause is page migration.
>>
>
> out of interest, what is your exact reproducer and how much time does i
> take? I'm just trying passthrough_hp(*) and ql-fstest (**) and don't get
> and issue after about 1h run time. I let it continue over the weekend.
> The system is an older dual socket xeon.
>
> (*) with slight modification for passthrough_hp to disable O_DIRECT for
> the underlying file system. It is running on xfs on an nvme.
>
> (**) https://github.com/bsbernd/ql-fstest
>
>
> Pinning the pages is certainly a good idea, I would just like to
> understand how severe the issue is. And would like to test
> backports/different patch on older kernels.
>
>
> Thanks,
> Bernd
>
>
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH v1] fs/fuse: Fix missing FOLL_PIN for direct-io
2023-09-01 18:05 ` Bernd Schubert
2023-09-01 22:02 ` Lei Huang
@ 2023-09-01 22:18 ` Lei Huang
1 sibling, 0 replies; 11+ messages in thread
From: Lei Huang @ 2023-09-01 22:18 UTC (permalink / raw)
To: Bernd Schubert, Bernd Schubert, linux-kernel
Cc: miklos, linux-fsdevel, David Howells
Hi Bernd,
Actually, we do not have a simple code to reproduce this issue yet. We
observed the issues in a suite of tests which shedules multiple (6~8)
tests concurrently. I could observe about 5~10 cases with wrong data in
one day. I tried repeating a single test for a couple of days and did
not observe the issue. I think concurrently running multiple processes
which can make bad memory fragmentation could help to observe the issue.
Please let me know if you need more information. Thank you very much for
looking into this issue. Have a great weekend!
Best Regards,
-lei
On 9/1/23 14:05, Bernd Schubert wrote:
> Hi Lei,
>
> On 8/30/23 03:03, Lei Huang wrote:
>> Hi Bernd,
>>
>> Thank you very much for your reply!
>>
>> > Hmm, iov_iter_extract_pages does not exists for a long time and the
>> code
>> > in fuse_get_user_pages didn't change much. So if you are right, there
>> > would be a long term data corruption for page migrations? And a back
>> > port to old kernels would not be obvious?
>>
>> Right. The issue has been reproduced under various versions of
>> kernels, ranging from 3.10.0 to 6.3.6 in my tests. It would be
>> different to make a patch under older kernels like 3.10.0. One way I
>> tested, one can query
>> the physical pages associated with read buffer after data is ready
>> (right before writing the data into read buffer). This seems resolving
>> the issue in my tests.
>>
>>
>> > What confuses me further is that
>> > commit 85dd2c8ff368 does not mention migration or corruption, although
>> > lists several other advantages for iov_iter_extract_pages. Other
>> commits
>> > using iov_iter_extract_pages point to fork - i.e. would your data
>> > corruption be possibly related that?
>>
>> As I mentioned above, the issue seems resolved if we query the
>> physical pages as late as right before writing data into read buffer.
>> I think the root cause is page migration.
>>
>
> out of interest, what is your exact reproducer and how much time does i
> take? I'm just trying passthrough_hp(*) and ql-fstest (**) and don't get
> and issue after about 1h run time. I let it continue over the weekend.
> The system is an older dual socket xeon.
>
> (*) with slight modification for passthrough_hp to disable O_DIRECT for
> the underlying file system. It is running on xfs on an nvme.
>
> (**) https://github.com/bsbernd/ql-fstest
>
>
> Pinning the pages is certainly a good idea, I would just like to
> understand how severe the issue is. And would like to test
> backports/different patch on older kernels.
>
>
> Thanks,
> Bernd
>
>
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH v1] fs/fuse: Fix missing FOLL_PIN for direct-io
2023-08-29 18:36 [PATCH v1] fs/fuse: Fix missing FOLL_PIN for direct-io Lei Huang
2023-08-29 21:57 ` Bernd Schubert
@ 2024-03-06 10:01 ` Miklos Szeredi
2024-03-06 11:16 ` Bernd Schubert
1 sibling, 1 reply; 11+ messages in thread
From: Miklos Szeredi @ 2024-03-06 10:01 UTC (permalink / raw)
To: Lei Huang; +Cc: linux-kernel, linux-fsdevel
On Tue, 29 Aug 2023 at 20:37, Lei Huang <lei.huang@linux.intel.com> wrote:
>
> Our user space filesystem relies on fuse to provide POSIX interface.
> In our test, a known string is written into a file and the content
> is read back later to verify correct data returned. We observed wrong
> data returned in read buffer in rare cases although correct data are
> stored in our filesystem.
>
> Fuse kernel module calls iov_iter_get_pages2() to get the physical
> pages of the user-space read buffer passed in read(). The pages are
> not pinned to avoid page migration. When page migration occurs, the
> consequence are two-folds.
>
> 1) Applications do not receive correct data in read buffer.
> 2) fuse kernel writes data into a wrong place.
>
> Using iov_iter_extract_pages() to pin pages fixes the issue in our
> test.
>
> An auxiliary variable "struct page **pt_pages" is used in the patch
> to prepare the 2nd parameter for iov_iter_extract_pages() since
> iov_iter_get_pages2() uses a different type for the 2nd parameter.
>
> Signed-off-by: Lei Huang <lei.huang@linux.intel.com>
Applied, with a modification to only unpin if
iov_iter_extract_will_pin() returns true.
Thanks,
Miklos
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH v1] fs/fuse: Fix missing FOLL_PIN for direct-io
2024-03-06 10:01 ` Miklos Szeredi
@ 2024-03-06 11:16 ` Bernd Schubert
2024-03-06 12:05 ` Miklos Szeredi
0 siblings, 1 reply; 11+ messages in thread
From: Bernd Schubert @ 2024-03-06 11:16 UTC (permalink / raw)
To: Miklos Szeredi, Lei Huang; +Cc: linux-kernel, linux-fsdevel
On 3/6/24 11:01, Miklos Szeredi wrote:
> On Tue, 29 Aug 2023 at 20:37, Lei Huang <lei.huang@linux.intel.com> wrote:
>>
>> Our user space filesystem relies on fuse to provide POSIX interface.
>> In our test, a known string is written into a file and the content
>> is read back later to verify correct data returned. We observed wrong
>> data returned in read buffer in rare cases although correct data are
>> stored in our filesystem.
>>
>> Fuse kernel module calls iov_iter_get_pages2() to get the physical
>> pages of the user-space read buffer passed in read(). The pages are
>> not pinned to avoid page migration. When page migration occurs, the
>> consequence are two-folds.
>>
>> 1) Applications do not receive correct data in read buffer.
>> 2) fuse kernel writes data into a wrong place.
>>
>> Using iov_iter_extract_pages() to pin pages fixes the issue in our
>> test.
>>
>> An auxiliary variable "struct page **pt_pages" is used in the patch
>> to prepare the 2nd parameter for iov_iter_extract_pages() since
>> iov_iter_get_pages2() uses a different type for the 2nd parameter.
>>
>> Signed-off-by: Lei Huang <lei.huang@linux.intel.com>
>
> Applied, with a modification to only unpin if
> iov_iter_extract_will_pin() returns true.
Hi Miklos,
do you have an idea if this needs to be back ported and to which kernel
version?
I had tried to reproduce data corruption with 4.18 - Lei wrote that he
could see issues with older kernels as well, but I never managed to
trigger anything on 4.18-RHEL. Typically I use ql-fstest
(https://github.com/bsbernd/ql-fstest) and even added random DIO as an
option - nothing report with weeks of run time. I could try again with
more recent kernels that have folios.
Thanks,
Bernd
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH v1] fs/fuse: Fix missing FOLL_PIN for direct-io
2024-03-06 11:16 ` Bernd Schubert
@ 2024-03-06 12:05 ` Miklos Szeredi
2024-03-10 2:59 ` Lei Huang
0 siblings, 1 reply; 11+ messages in thread
From: Miklos Szeredi @ 2024-03-06 12:05 UTC (permalink / raw)
To: Bernd Schubert; +Cc: Lei Huang, linux-kernel, linux-fsdevel
On Wed, 6 Mar 2024 at 12:16, Bernd Schubert <bernd.schubert@fastmail.fm> wrote:
>
>
>
> On 3/6/24 11:01, Miklos Szeredi wrote:
> > On Tue, 29 Aug 2023 at 20:37, Lei Huang <lei.huang@linux.intel.com> wrote:
> >>
> >> Our user space filesystem relies on fuse to provide POSIX interface.
> >> In our test, a known string is written into a file and the content
> >> is read back later to verify correct data returned. We observed wrong
> >> data returned in read buffer in rare cases although correct data are
> >> stored in our filesystem.
> >>
> >> Fuse kernel module calls iov_iter_get_pages2() to get the physical
> >> pages of the user-space read buffer passed in read(). The pages are
> >> not pinned to avoid page migration. When page migration occurs, the
> >> consequence are two-folds.
> >>
> >> 1) Applications do not receive correct data in read buffer.
> >> 2) fuse kernel writes data into a wrong place.
> >>
> >> Using iov_iter_extract_pages() to pin pages fixes the issue in our
> >> test.
> >>
> >> An auxiliary variable "struct page **pt_pages" is used in the patch
> >> to prepare the 2nd parameter for iov_iter_extract_pages() since
> >> iov_iter_get_pages2() uses a different type for the 2nd parameter.
> >>
> >> Signed-off-by: Lei Huang <lei.huang@linux.intel.com>
> >
> > Applied, with a modification to only unpin if
> > iov_iter_extract_will_pin() returns true.
>
> Hi Miklos,
>
> do you have an idea if this needs to be back ported and to which kernel
> version?
> I had tried to reproduce data corruption with 4.18 - Lei wrote that he
> could see issues with older kernels as well, but I never managed to
> trigger anything on 4.18-RHEL. Typically I use ql-fstest
> (https://github.com/bsbernd/ql-fstest) and even added random DIO as an
> option - nothing report with weeks of run time. I could try again with
> more recent kernels that have folios.
I don't think that corruption will happen in real life. So I'm not
sure we need to bother with backporting, and definitely not before
when the infrastructure was introduced.
Thanks,
Miklos
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH v1] fs/fuse: Fix missing FOLL_PIN for direct-io
2024-03-06 12:05 ` Miklos Szeredi
@ 2024-03-10 2:59 ` Lei Huang
0 siblings, 0 replies; 11+ messages in thread
From: Lei Huang @ 2024-03-10 2:59 UTC (permalink / raw)
To: Miklos Szeredi, Bernd Schubert; +Cc: linux-kernel, linux-fsdevel
Thank you very much, Miklos!
Yes. It is not easy to reproduce the issues in real applications. We
only observed the issue in our own testing tool which runs multiple
tests concurrently. We have not been able reproduce it with simple code yet.
-lei
On 3/6/24 07:05, Miklos Szeredi wrote:
> On Wed, 6 Mar 2024 at 12:16, Bernd Schubert <bernd.schubert@fastmail.fm> wrote:
>>
>>
>>
>> On 3/6/24 11:01, Miklos Szeredi wrote:
>>> On Tue, 29 Aug 2023 at 20:37, Lei Huang <lei.huang@linux.intel.com> wrote:
>>>>
>>>> Our user space filesystem relies on fuse to provide POSIX interface.
>>>> In our test, a known string is written into a file and the content
>>>> is read back later to verify correct data returned. We observed wrong
>>>> data returned in read buffer in rare cases although correct data are
>>>> stored in our filesystem.
>>>>
>>>> Fuse kernel module calls iov_iter_get_pages2() to get the physical
>>>> pages of the user-space read buffer passed in read(). The pages are
>>>> not pinned to avoid page migration. When page migration occurs, the
>>>> consequence are two-folds.
>>>>
>>>> 1) Applications do not receive correct data in read buffer.
>>>> 2) fuse kernel writes data into a wrong place.
>>>>
>>>> Using iov_iter_extract_pages() to pin pages fixes the issue in our
>>>> test.
>>>>
>>>> An auxiliary variable "struct page **pt_pages" is used in the patch
>>>> to prepare the 2nd parameter for iov_iter_extract_pages() since
>>>> iov_iter_get_pages2() uses a different type for the 2nd parameter.
>>>>
>>>> Signed-off-by: Lei Huang <lei.huang@linux.intel.com>
>>>
>>> Applied, with a modification to only unpin if
>>> iov_iter_extract_will_pin() returns true.
>>
>> Hi Miklos,
>>
>> do you have an idea if this needs to be back ported and to which kernel
>> version?
>> I had tried to reproduce data corruption with 4.18 - Lei wrote that he
>> could see issues with older kernels as well, but I never managed to
>> trigger anything on 4.18-RHEL. Typically I use ql-fstest
>> (https://github.com/bsbernd/ql-fstest) and even added random DIO as an
>> option - nothing report with weeks of run time. I could try again with
>> more recent kernels that have folios.
>
> I don't think that corruption will happen in real life. So I'm not
> sure we need to bother with backporting, and definitely not before
> when the infrastructure was introduced.
>
> Thanks,
> Miklos
^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2024-03-10 2:59 UTC | newest]
Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-08-29 18:36 [PATCH v1] fs/fuse: Fix missing FOLL_PIN for direct-io Lei Huang
2023-08-29 21:57 ` Bernd Schubert
2023-08-30 1:03 ` Lei Huang
2023-09-01 17:56 ` Bernd Schubert
2023-09-01 18:05 ` Bernd Schubert
2023-09-01 22:02 ` Lei Huang
2023-09-01 22:18 ` Lei Huang
2024-03-06 10:01 ` Miklos Szeredi
2024-03-06 11:16 ` Bernd Schubert
2024-03-06 12:05 ` Miklos Szeredi
2024-03-10 2:59 ` Lei Huang
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).