* [PATCH 0/2] ceph: misc fix size truncate for fscrypt
@ 2022-04-07 14:41 xiubli
  2022-04-07 14:41 ` [PATCH 1/2] ceph: flush small range instead of the whole map for truncate xiubli
  2022-04-07 14:41 ` [PATCH 2/2] ceph: fix coherency issue when truncating file size for fscrypt xiubli
  0 siblings, 2 replies; 10+ messages in thread
From: xiubli @ 2022-04-07 14:41 UTC
  To: jlayton; +Cc: idryomov, vshankar, lhenriques, ceph-devel, Xiubo Li

From: Xiubo Li <xiubli@redhat.com>

Xiubo Li (2):
  ceph: flush small range instead of the whole map for truncate
  ceph: fix coherency issue when truncating file size for fscrypt

 fs/ceph/inode.c | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)

-- 
2.27.0


* [PATCH 1/2] ceph: flush small range instead of the whole map for truncate
  2022-04-07 14:41 [PATCH 0/2] ceph: misc fix size truncate for fscrypt xiubli
@ 2022-04-07 14:41 ` xiubli
  2022-04-07 15:45   ` Jeff Layton
  2022-04-07 14:41 ` [PATCH 2/2] ceph: fix coherency issue when truncating file size for fscrypt xiubli
  1 sibling, 1 reply; 10+ messages in thread
From: xiubli @ 2022-04-07 14:41 UTC
  To: jlayton; +Cc: idryomov, vshankar, lhenriques, ceph-devel, Xiubo Li

From: Xiubo Li <xiubli@redhat.com>

Signed-off-by: Xiubo Li <xiubli@redhat.com>
---
 fs/ceph/inode.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/fs/ceph/inode.c b/fs/ceph/inode.c
index 45ca4e598ef0..f4059d73edd5 100644
--- a/fs/ceph/inode.c
+++ b/fs/ceph/inode.c
@@ -2275,8 +2275,12 @@ static int fill_fscrypt_truncate(struct inode *inode,
 	     ceph_cap_string(issued));
 
 	/* Try to writeback the dirty pagecaches */
-	if (issued & (CEPH_CAP_FILE_BUFFER))
-		filemap_write_and_wait(inode->i_mapping);
+	if (issued & (CEPH_CAP_FILE_BUFFER)) {
+		ret = filemap_write_and_wait_range(inode->i_mapping,
+						   orig_pos, LLONG_MAX);
+		if (ret < 0)
+			goto out;
+	}
 
 	page = __page_cache_alloc(GFP_KERNEL);
 	if (page == NULL) {
-- 
2.27.0


* [PATCH 2/2] ceph: fix coherency issue when truncating file size for fscrypt
  2022-04-07 14:41 [PATCH 0/2] ceph: misc fix size truncate for fscrypt xiubli
  2022-04-07 14:41 ` [PATCH 1/2] ceph: flush small range instead of the whole map for truncate xiubli
@ 2022-04-07 14:41 ` xiubli
  2022-04-07 15:33   ` Jeff Layton
  1 sibling, 1 reply; 10+ messages in thread
From: xiubli @ 2022-04-07 14:41 UTC
  To: jlayton; +Cc: idryomov, vshankar, lhenriques, ceph-devel, Xiubo Li

From: Xiubo Li <xiubli@redhat.com>

When truncating the file size, the MDS will help update the last
encrypted block, and while that is in progress we need to make sure
the client won't fill the pagecache.

Signed-off-by: Xiubo Li <xiubli@redhat.com>
---
 fs/ceph/inode.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/fs/ceph/inode.c b/fs/ceph/inode.c
index f4059d73edd5..cc1829ab497d 100644
--- a/fs/ceph/inode.c
+++ b/fs/ceph/inode.c
@@ -2647,9 +2647,12 @@ int __ceph_setattr(struct inode *inode, struct iattr *attr, struct ceph_iattr *c
 		req->r_num_caps = 1;
 		req->r_stamp = attr->ia_ctime;
 		if (fill_fscrypt) {
+			filemap_invalidate_lock(inode->i_mapping);
 			err = fill_fscrypt_truncate(inode, req, attr);
-			if (err)
+			if (err) {
+				filemap_invalidate_unlock(inode->i_mapping);
 				goto out;
+			}
 		}
 
 		/*
@@ -2660,6 +2663,8 @@ int __ceph_setattr(struct inode *inode, struct iattr *attr, struct ceph_iattr *c
 		 * it.
 		 */
 		err = ceph_mdsc_do_request(mdsc, NULL, req);
+		if (fill_fscrypt)
+			filemap_invalidate_unlock(inode->i_mapping);
 		if (err == -EAGAIN && truncate_retry--) {
 			dout("setattr %p result=%d (%s locally, %d remote), retry it!\n",
 			     inode, err, ceph_cap_string(dirtied), mask);
-- 
2.27.0


* Re: [PATCH 2/2] ceph: fix coherency issue when truncating file size for fscrypt
  2022-04-07 14:41 ` [PATCH 2/2] ceph: fix coherency issue when truncating file size for fscrypt xiubli
@ 2022-04-07 15:33   ` Jeff Layton
  2022-04-07 15:38     ` Jeff Layton
  0 siblings, 1 reply; 10+ messages in thread
From: Jeff Layton @ 2022-04-07 15:33 UTC
  To: xiubli; +Cc: idryomov, vshankar, lhenriques, ceph-devel

On Thu, 2022-04-07 at 22:41 +0800, xiubli@redhat.com wrote:
> From: Xiubo Li <xiubli@redhat.com>
> 
> When truncating the file size, the MDS will help update the last
> encrypted block, and while that is in progress we need to make sure
> the client won't fill the pagecache.
> 
> Signed-off-by: Xiubo Li <xiubli@redhat.com>
> ---
>  fs/ceph/inode.c | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/ceph/inode.c b/fs/ceph/inode.c
> index f4059d73edd5..cc1829ab497d 100644
> --- a/fs/ceph/inode.c
> +++ b/fs/ceph/inode.c
> @@ -2647,9 +2647,12 @@ int __ceph_setattr(struct inode *inode, struct iattr *attr, struct ceph_iattr *c
>  		req->r_num_caps = 1;
>  		req->r_stamp = attr->ia_ctime;
>  		if (fill_fscrypt) {
> +			filemap_invalidate_lock(inode->i_mapping);
>  			err = fill_fscrypt_truncate(inode, req, attr);
> -			if (err)
> +			if (err) {
> +				filemap_invalidate_unlock(inode->i_mapping);
>  				goto out;
> +			}
>  		}
>  
>  		/*
> @@ -2660,6 +2663,8 @@ int __ceph_setattr(struct inode *inode, struct iattr *attr, struct ceph_iattr *c
>  		 * it.
>  		 */
>  		err = ceph_mdsc_do_request(mdsc, NULL, req);
> +		if (fill_fscrypt)
> +			filemap_invalidate_unlock(inode->i_mapping);
>  		if (err == -EAGAIN && truncate_retry--) {
>  			dout("setattr %p result=%d (%s locally, %d remote), retry it!\n",
>  			     inode, err, ceph_cap_string(dirtied), mask);

Looks reasonable. Is there any reason we shouldn't do this in the non-
encrypted case too? I suppose it doesn't make as much difference in that
case.

I'll plan to pull this and the other patch into the wip-fscrypt branch.
Should I just fold them into your earlier patches?
-- 
Jeff Layton <jlayton@kernel.org>

* Re: [PATCH 2/2] ceph: fix coherency issue when truncating file size for fscrypt
  2022-04-07 15:33   ` Jeff Layton
@ 2022-04-07 15:38     ` Jeff Layton
  2022-04-07 19:14       ` Xiubo Li
  0 siblings, 1 reply; 10+ messages in thread
From: Jeff Layton @ 2022-04-07 15:38 UTC
  To: xiubli; +Cc: idryomov, vshankar, lhenriques, ceph-devel

On Thu, 2022-04-07 at 11:33 -0400, Jeff Layton wrote:
> On Thu, 2022-04-07 at 22:41 +0800, xiubli@redhat.com wrote:
> > From: Xiubo Li <xiubli@redhat.com>
> > 
> > When truncating the file size, the MDS will help update the last
> > encrypted block, and while that is in progress we need to make sure
> > the client won't fill the pagecache.
> > 
> > Signed-off-by: Xiubo Li <xiubli@redhat.com>
> > ---
> >  fs/ceph/inode.c | 7 ++++++-
> >  1 file changed, 6 insertions(+), 1 deletion(-)
> > 
> > diff --git a/fs/ceph/inode.c b/fs/ceph/inode.c
> > index f4059d73edd5..cc1829ab497d 100644
> > --- a/fs/ceph/inode.c
> > +++ b/fs/ceph/inode.c
> > @@ -2647,9 +2647,12 @@ int __ceph_setattr(struct inode *inode, struct iattr *attr, struct ceph_iattr *c
> >  		req->r_num_caps = 1;
> >  		req->r_stamp = attr->ia_ctime;
> >  		if (fill_fscrypt) {
> > +			filemap_invalidate_lock(inode->i_mapping);
> >  			err = fill_fscrypt_truncate(inode, req, attr);
> > -			if (err)
> > +			if (err) {
> > +				filemap_invalidate_unlock(inode->i_mapping);
> >  				goto out;
> > +			}
> >  		}
> >  
> >  		/*
> > @@ -2660,6 +2663,8 @@ int __ceph_setattr(struct inode *inode, struct iattr *attr, struct ceph_iattr *c
> >  		 * it.
> >  		 */
> >  		err = ceph_mdsc_do_request(mdsc, NULL, req);
> > +		if (fill_fscrypt)
> > +			filemap_invalidate_unlock(inode->i_mapping);
> >  		if (err == -EAGAIN && truncate_retry--) {
> >  			dout("setattr %p result=%d (%s locally, %d remote), retry it!\n",
> >  			     inode, err, ceph_cap_string(dirtied), mask);
> 
> Looks reasonable. Is there any reason we shouldn't do this in the non-
> encrypted case too? I suppose it doesn't make as much difference in that
> case.
> 
> I'll plan to pull this and the other patch into the wip-fscrypt branch.
> Should I just fold them into your earlier patches?

OTOH...do we really need this? I'm not sure I understand the race you're
trying to prevent. Can you lay it out for me?

Thanks,
-- 
Jeff Layton <jlayton@kernel.org>

* Re: [PATCH 1/2] ceph: flush small range instead of the whole map for truncate
  2022-04-07 14:41 ` [PATCH 1/2] ceph: flush small range instead of the whole map for truncate xiubli
@ 2022-04-07 15:45   ` Jeff Layton
  2022-04-07 19:06     ` Xiubo Li
  0 siblings, 1 reply; 10+ messages in thread
From: Jeff Layton @ 2022-04-07 15:45 UTC
  To: xiubli; +Cc: idryomov, vshankar, lhenriques, ceph-devel

On Thu, 2022-04-07 at 22:41 +0800, xiubli@redhat.com wrote:
> From: Xiubo Li <xiubli@redhat.com>
> 
> Signed-off-by: Xiubo Li <xiubli@redhat.com>
> ---
>  fs/ceph/inode.c | 8 ++++++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/ceph/inode.c b/fs/ceph/inode.c
> index 45ca4e598ef0..f4059d73edd5 100644
> --- a/fs/ceph/inode.c
> +++ b/fs/ceph/inode.c
> @@ -2275,8 +2275,12 @@ static int fill_fscrypt_truncate(struct inode *inode,
>  	     ceph_cap_string(issued));
>  
>  	/* Try to writeback the dirty pagecaches */
> -	if (issued & (CEPH_CAP_FILE_BUFFER))
> -		filemap_write_and_wait(inode->i_mapping);
> +	if (issued & (CEPH_CAP_FILE_BUFFER)) {
> +		ret = filemap_write_and_wait_range(inode->i_mapping,
> +						   orig_pos, LLONG_MAX);
> +		if (ret < 0)
> +			goto out;
> +	}
> 
> 


Not much point in writing back blocks we're just going to truncate away
anyhow. Maybe this should be writing with this range?

    orig_pos, orig_pos + CEPH_FSCRYPT_BLOCK_SIZE - 1

>  
>  	page = __page_cache_alloc(GFP_KERNEL);
>  	if (page == NULL) {

-- 
Jeff Layton <jlayton@kernel.org>
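
To make the difference between the two writeback ranges concrete, here is a small userspace sketch. It assumes orig_pos is the new size rounded down to CEPH_FSCRYPT_BLOCK_SIZE and that this block size is 4096 bytes; FSCRYPT_BLOCK_SIZE, block_start(), range_to_eof() and range_last_block() are hypothetical names. Both ranges are inclusive, matching the lstart/lend arguments of filemap_write_and_wait_range().

```c
#include <assert.h>
#include <limits.h>

/* Assumed value of CEPH_FSCRYPT_BLOCK_SIZE (a power of two). */
#define FSCRYPT_BLOCK_SIZE 4096LL

/* Inclusive byte range, like lstart/lend of filemap_write_and_wait_range(). */
struct wb_range {
	long long start;
	long long end;
};

/* Assumed definition of orig_pos: new size rounded down to a block. */
static long long block_start(long long new_size)
{
	assert(new_size >= 0);
	return new_size & ~(FSCRYPT_BLOCK_SIZE - 1);
}

/* Range in the posted patch: from the last block to the end of the file. */
struct wb_range range_to_eof(long long new_size)
{
	struct wb_range r = { block_start(new_size), LLONG_MAX };
	return r;
}

/* Range suggested in review: only the block that the RMW will read. */
struct wb_range range_last_block(long long new_size)
{
	long long pos = block_start(new_size);
	struct wb_range r = { pos, pos + FSCRYPT_BLOCK_SIZE - 1 };
	return r;
}
```

For a new size of 10000 bytes, the posted patch writes back [8192, LLONG_MAX], while the suggested range is [8192, 12287], i.e. only the block that the RMW reads; everything past it is about to be truncated away anyway.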

* Re: [PATCH 1/2] ceph: flush small range instead of the whole map for truncate
  2022-04-07 15:45   ` Jeff Layton
@ 2022-04-07 19:06     ` Xiubo Li
  0 siblings, 0 replies; 10+ messages in thread
From: Xiubo Li @ 2022-04-07 19:06 UTC
  To: Jeff Layton; +Cc: idryomov, vshankar, lhenriques, ceph-devel


On 4/7/22 11:45 PM, Jeff Layton wrote:
> On Thu, 2022-04-07 at 22:41 +0800, xiubli@redhat.com wrote:
>> From: Xiubo Li <xiubli@redhat.com>
>>
>> Signed-off-by: Xiubo Li <xiubli@redhat.com>
>> ---
>>   fs/ceph/inode.c | 8 ++++++--
>>   1 file changed, 6 insertions(+), 2 deletions(-)
>>
>> diff --git a/fs/ceph/inode.c b/fs/ceph/inode.c
>> index 45ca4e598ef0..f4059d73edd5 100644
>> --- a/fs/ceph/inode.c
>> +++ b/fs/ceph/inode.c
>> @@ -2275,8 +2275,12 @@ static int fill_fscrypt_truncate(struct inode *inode,
>>   	     ceph_cap_string(issued));
>>   
>>   	/* Try to writeback the dirty pagecaches */
>> -	if (issued & (CEPH_CAP_FILE_BUFFER))
>> -		filemap_write_and_wait(inode->i_mapping);
>> +	if (issued & (CEPH_CAP_FILE_BUFFER)) {
>> +		ret = filemap_write_and_wait_range(inode->i_mapping,
>> +						   orig_pos, LLONG_MAX);
>> +		if (ret < 0)
>> +			goto out;
>> +	}
>>
>>
>
> Not much point in writing back blocks we're just going to truncate away
> anyhow. Maybe this should be writing with this range?
>
>      orig_pos, orig_pos + CEPH_FSCRYPT_BLOCK_SIZE - 1

We need to make sure the last block is not buffered in the pagecache,
because we will always do a sync read of it from Rados.

Yeah, this looks much better.

I will fix and test it again.

Thanks


>>   
>>   	page = __page_cache_alloc(GFP_KERNEL);
>>   	if (page == NULL) {


* Re: [PATCH 2/2] ceph: fix coherency issue when truncating file size for fscrypt
  2022-04-07 15:38     ` Jeff Layton
@ 2022-04-07 19:14       ` Xiubo Li
  2022-04-07 20:32         ` Jeff Layton
  0 siblings, 1 reply; 10+ messages in thread
From: Xiubo Li @ 2022-04-07 19:14 UTC
  To: Jeff Layton; +Cc: idryomov, vshankar, lhenriques, ceph-devel


On 4/7/22 11:38 PM, Jeff Layton wrote:
> On Thu, 2022-04-07 at 11:33 -0400, Jeff Layton wrote:
>> On Thu, 2022-04-07 at 22:41 +0800, xiubli@redhat.com wrote:
>>> From: Xiubo Li <xiubli@redhat.com>
>>>
>>> When truncating the file size, the MDS will help update the last
>>> encrypted block, and while that is in progress we need to make sure
>>> the client won't fill the pagecache.
>>>
>>> Signed-off-by: Xiubo Li <xiubli@redhat.com>
>>> ---
>>>   fs/ceph/inode.c | 7 ++++++-
>>>   1 file changed, 6 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/fs/ceph/inode.c b/fs/ceph/inode.c
>>> index f4059d73edd5..cc1829ab497d 100644
>>> --- a/fs/ceph/inode.c
>>> +++ b/fs/ceph/inode.c
>>> @@ -2647,9 +2647,12 @@ int __ceph_setattr(struct inode *inode, struct iattr *attr, struct ceph_iattr *c
>>>   		req->r_num_caps = 1;
>>>   		req->r_stamp = attr->ia_ctime;
>>>   		if (fill_fscrypt) {
>>> +			filemap_invalidate_lock(inode->i_mapping);
>>>   			err = fill_fscrypt_truncate(inode, req, attr);
>>> -			if (err)
>>> +			if (err) {
>>> +				filemap_invalidate_unlock(inode->i_mapping);
>>>   				goto out;
>>> +			}
>>>   		}
>>>   
>>>   		/*
>>> @@ -2660,6 +2663,8 @@ int __ceph_setattr(struct inode *inode, struct iattr *attr, struct ceph_iattr *c
>>>   		 * it.
>>>   		 */
>>>   		err = ceph_mdsc_do_request(mdsc, NULL, req);
>>> +		if (fill_fscrypt)
>>> +			filemap_invalidate_unlock(inode->i_mapping);
>>>   		if (err == -EAGAIN && truncate_retry--) {
>>>   			dout("setattr %p result=%d (%s locally, %d remote), retry it!\n",
>>>   			     inode, err, ceph_cap_string(dirtied), mask);
>> Looks reasonable. Is there any reason we shouldn't do this in the non-
>> encrypted case too? I suppose it doesn't make as much difference in that
>> case.

We only need this in the encrypted case, which does an RMW of the last
block.


>> I'll plan to pull this and the other patch into the wip-fscrypt branch.
>> Should I just fold them into your earlier patches?
Yeah, certainly.
> OTOH...do we really need this? I'm not sure I understand the race you're
> trying to prevent. Can you lay it out for me?

I am worried that a page fault could still happen during the RMW of the
last block, because the page fault handler doesn't prevent that.

We should block it while the RMW is in progress.

-- Xiubo

>
> Thanks,


* Re: [PATCH 2/2] ceph: fix coherency issue when truncating file size for fscrypt
  2022-04-07 19:14       ` Xiubo Li
@ 2022-04-07 20:32         ` Jeff Layton
  2022-04-07 23:58           ` Xiubo Li
  0 siblings, 1 reply; 10+ messages in thread
From: Jeff Layton @ 2022-04-07 20:32 UTC
  To: Xiubo Li; +Cc: idryomov, vshankar, lhenriques, ceph-devel

On Fri, 2022-04-08 at 03:14 +0800, Xiubo Li wrote:
> On 4/7/22 11:38 PM, Jeff Layton wrote:
> > On Thu, 2022-04-07 at 11:33 -0400, Jeff Layton wrote:
> > > On Thu, 2022-04-07 at 22:41 +0800, xiubli@redhat.com wrote:
> > > > From: Xiubo Li <xiubli@redhat.com>
> > > > 
> > > > When truncating the file size, the MDS will help update the last
> > > > encrypted block, and while that is in progress we need to make sure
> > > > the client won't fill the pagecache.
> > > > 
> > > > Signed-off-by: Xiubo Li <xiubli@redhat.com>
> > > > ---
> > > >   fs/ceph/inode.c | 7 ++++++-
> > > >   1 file changed, 6 insertions(+), 1 deletion(-)
> > > > 
> > > > diff --git a/fs/ceph/inode.c b/fs/ceph/inode.c
> > > > index f4059d73edd5..cc1829ab497d 100644
> > > > --- a/fs/ceph/inode.c
> > > > +++ b/fs/ceph/inode.c
> > > > @@ -2647,9 +2647,12 @@ int __ceph_setattr(struct inode *inode, struct iattr *attr, struct ceph_iattr *c
> > > >   		req->r_num_caps = 1;
> > > >   		req->r_stamp = attr->ia_ctime;
> > > >   		if (fill_fscrypt) {
> > > > +			filemap_invalidate_lock(inode->i_mapping);
> > > >   			err = fill_fscrypt_truncate(inode, req, attr);
> > > > -			if (err)
> > > > +			if (err) {
> > > > +				filemap_invalidate_unlock(inode->i_mapping);
> > > >   				goto out;
> > > > +			}
> > > >   		}
> > > >   
> > > >   		/*
> > > > @@ -2660,6 +2663,8 @@ int __ceph_setattr(struct inode *inode, struct iattr *attr, struct ceph_iattr *c
> > > >   		 * it.
> > > >   		 */
> > > >   		err = ceph_mdsc_do_request(mdsc, NULL, req);
> > > > +		if (fill_fscrypt)
> > > > +			filemap_invalidate_unlock(inode->i_mapping);
> > > >   		if (err == -EAGAIN && truncate_retry--) {
> > > >   			dout("setattr %p result=%d (%s locally, %d remote), retry it!\n",
> > > >   			     inode, err, ceph_cap_string(dirtied), mask);
> > > Looks reasonable. Is there any reason we shouldn't do this in the non-
> > > encrypted case too? I suppose it doesn't make as much difference in that
> > > case.
> 
> We only need this in the encrypted case, which does an RMW of the last
> block.
> 
> 
> > > I'll plan to pull this and the other patch into the wip-fscrypt branch.
> > > Should I just fold them into your earlier patches?
> Yeah, certainly.
> > OTOH...do we really need this? I'm not sure I understand the race you're
> > trying to prevent. Can you lay it out for me?
> 
> I am worried that a page fault could still happen during the RMW of the
> last block, because the page fault handler doesn't prevent that.
> 
> We should block it while the RMW is in progress.
> 

Right, but the RMW is being done using an anonymous page, and at this
point in the process we haven't really touched the pagecache yet. That
doesn't happen until __ceph_do_pending_vmtruncate.

Most of the callers for filemap_invalidate_lock/_unlock are in the hole
punching codepaths, and not so much in truncate. What outcome are you
trying to prevent with this? Can you lay out the potential race and why
it would be harmful?

-- 
Jeff Layton <jlayton@kernel.org>
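
The point that the RMW works on an anonymous page can be illustrated with a sketch of the modify step: the old last block is copied into a private buffer and everything past the new EOF is zeroed, without touching any pagecache page. This is a simplified model under stated assumptions: encryption and decryption are omitted, the block size is assumed to be 4096, and rmw_last_block() is a hypothetical helper, not the actual fill_fscrypt_truncate() code.

```c
#include <assert.h>
#include <string.h>

#define FSCRYPT_BLOCK_SIZE 4096	/* assumed CEPH_FSCRYPT_BLOCK_SIZE */

/*
 * Simplified RMW of the last block for a truncate to @new_size.
 * @file is the old file content as read from the OSDs (plaintext here,
 * decryption and encryption are omitted) and must cover the whole block.
 * The work is done in the private buffer @out; no pagecache page is used.
 * Returns the number of bytes kept in the block, or 0 when @new_size is
 * block-aligned (no partial block, so @out is left untouched).
 */
int rmw_last_block(const unsigned char *file, long long new_size,
		   unsigned char out[FSCRYPT_BLOCK_SIZE])
{
	assert(new_size >= 0);

	long long pos = new_size - new_size % FSCRYPT_BLOCK_SIZE;
	int boff = (int)(new_size % FSCRYPT_BLOCK_SIZE);

	if (boff == 0)
		return 0;
	/* Read: copy the whole old block into the private buffer. */
	memcpy(out, file + pos, FSCRYPT_BLOCK_SIZE);
	/* Modify: zero everything past the new EOF within the block. */
	memset(out + boff, 0, FSCRYPT_BLOCK_SIZE - boff);
	/* Write: the caller would encrypt @out and send it along with the
	 * setattr request (not modeled here). */
	return boff;
}
```

Because the buffer is private, a concurrent fault cannot observe it; the open question in the thread is whether stale or newly faulted pagecache pages for that block can coexist with the RMW.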

* Re: [PATCH 2/2] ceph: fix coherency issue when truncating file size for fscrypt
  2022-04-07 20:32         ` Jeff Layton
@ 2022-04-07 23:58           ` Xiubo Li
  0 siblings, 0 replies; 10+ messages in thread
From: Xiubo Li @ 2022-04-07 23:58 UTC
  To: Jeff Layton; +Cc: idryomov, vshankar, lhenriques, ceph-devel


On 4/8/22 4:32 AM, Jeff Layton wrote:
> On Fri, 2022-04-08 at 03:14 +0800, Xiubo Li wrote:
>> On 4/7/22 11:38 PM, Jeff Layton wrote:
>>> On Thu, 2022-04-07 at 11:33 -0400, Jeff Layton wrote:
>>>> On Thu, 2022-04-07 at 22:41 +0800, xiubli@redhat.com wrote:
>>>>> From: Xiubo Li <xiubli@redhat.com>
>>>>>
>>>>> When truncating the file size, the MDS will help update the last
>>>>> encrypted block, and while that is in progress we need to make sure
>>>>> the client won't fill the pagecache.
>>>>>
>>>>> Signed-off-by: Xiubo Li <xiubli@redhat.com>
>>>>> ---
>>>>>    fs/ceph/inode.c | 7 ++++++-
>>>>>    1 file changed, 6 insertions(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/fs/ceph/inode.c b/fs/ceph/inode.c
>>>>> index f4059d73edd5..cc1829ab497d 100644
>>>>> --- a/fs/ceph/inode.c
>>>>> +++ b/fs/ceph/inode.c
>>>>> @@ -2647,9 +2647,12 @@ int __ceph_setattr(struct inode *inode, struct iattr *attr, struct ceph_iattr *c
>>>>>    		req->r_num_caps = 1;
>>>>>    		req->r_stamp = attr->ia_ctime;
>>>>>    		if (fill_fscrypt) {
>>>>> +			filemap_invalidate_lock(inode->i_mapping);
>>>>>    			err = fill_fscrypt_truncate(inode, req, attr);
>>>>> -			if (err)
>>>>> +			if (err) {
>>>>> +				filemap_invalidate_unlock(inode->i_mapping);
>>>>>    				goto out;
>>>>> +			}
>>>>>    		}
>>>>>    
>>>>>    		/*
>>>>> @@ -2660,6 +2663,8 @@ int __ceph_setattr(struct inode *inode, struct iattr *attr, struct ceph_iattr *c
>>>>>    		 * it.
>>>>>    		 */
>>>>>    		err = ceph_mdsc_do_request(mdsc, NULL, req);
>>>>> +		if (fill_fscrypt)
>>>>> +			filemap_invalidate_unlock(inode->i_mapping);
>>>>>    		if (err == -EAGAIN && truncate_retry--) {
>>>>>    			dout("setattr %p result=%d (%s locally, %d remote), retry it!\n",
>>>>>    			     inode, err, ceph_cap_string(dirtied), mask);
>>>> Looks reasonable. Is there any reason we shouldn't do this in the non-
>>>> encrypted case too? I suppose it doesn't make as much difference in that
>>>> case.
>> We only need this in the encrypted case, which does an RMW of the last
>> block.
>>
>>
>>>> I'll plan to pull this and the other patch into the wip-fscrypt branch.
>>>> Should I just fold them into your earlier patches?
>> Yeah, certainly.
>>> OTOH...do we really need this? I'm not sure I understand the race you're
>>> trying to prevent. Can you lay it out for me?
>> I am worried that a page fault could still happen during the RMW of the
>> last block, because the page fault handler doesn't prevent that.
>>
>> We should block it while the RMW is in progress.
>>
> Right, but the RMW is being done using an anonymous page, and at this
> point in the process we haven't really touched the pagecache yet. That
> doesn't happen until __ceph_do_pending_vmtruncate.
>
> Most of the callers for filemap_invalidate_lock/_unlock are in the hole
> punching codepaths, and not so much in truncate. What outcome are you
> trying to prevent with this? Can you lay out the potential race and why
> it would be harmful?

Yeah, I forgot to invalidate the mapping here. After writing the dirty
pagecache back, we should invalidate the mapping and drop the related
pages too.

It should be:

filemap_invalidate_lock(inode->i_mapping);
write the pagecache back;
invalidate the mapping and drop the pages;
do the RMW;
filemap_invalidate_unlock(inode->i_mapping);


As you mentioned in another mail, other processes could be reading
through a mapping at the same time. So while we are truncating the size,
shouldn't we block those mapped reads, so that they just trigger a page
fault and the fault handler waits for our truncate to finish?

-- Xiubo

