* [PATCH v2 1/5] fat: allocate persistent inode numbers
@ 2012-09-04 15:57 Namjae Jeon
  2012-09-04 16:17 ` Al Viro
  0 siblings, 1 reply; 33+ messages in thread
From: Namjae Jeon @ 2012-09-04 15:57 UTC (permalink / raw)
  To: hirofumi, akpm, bfields, viro
  Cc: linux-kernel, Namjae Jeon, Namjae Jeon, Ravishankar N, Amit Sahrawat

From: Namjae Jeon <namjae.jeon@samsung.com>

All the files on a FAT partition have an on-disk directory entry.
The location of these entries, i_pos, is unique and is constructed by the
fat_make_i_pos() function. We can use this as the inode number, making it
persistent across remounts.
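
For reference, fat_make_i_pos() (fs/fat/fat.h) derives this value from the
on-disk location of the directory entry, roughly:

static inline loff_t fat_make_i_pos(struct super_block *sb,
				    struct buffer_head *bh,
				    struct msdos_dir_entry *de)
{
	/* block number of the directory block holding the entry, combined
	 * with the entry's slot index within that block */
	return ((loff_t)bh->b_blocknr << MSDOS_SB(sb)->dir_per_block_bits)
		| (de - (struct msdos_dir_entry *)bh->b_data);
}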

Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Ravishankar N <ravi.n1@samsung.com>
Signed-off-by: Amit Sahrawat <a.sahrawat@samsung.com>
---
 fs/fat/inode.c      |    5 ++++-
 fs/fat/namei_vfat.c |    2 ++
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/fs/fat/inode.c b/fs/fat/inode.c
index c7531b2..0811339 100644
--- a/fs/fat/inode.c
+++ b/fs/fat/inode.c
@@ -458,7 +458,10 @@ struct inode *fat_build_inode(struct super_block *sb,
 		inode = ERR_PTR(-ENOMEM);
 		goto out;
 	}
-	inode->i_ino = iunique(sb, MSDOS_ROOT_INO);
+	if (MSDOS_SB(sb)->options.nfs)
+		inode->i_ino = i_pos;
+	else
+		inode->i_ino = iunique(sb, MSDOS_ROOT_INO);
 	inode->i_version = 1;
 	err = fat_fill_inode(inode, de);
 	if (err) {
diff --git a/fs/fat/namei_vfat.c b/fs/fat/namei_vfat.c
index e535dd7..00d58ea 100644
--- a/fs/fat/namei_vfat.c
+++ b/fs/fat/namei_vfat.c
@@ -954,6 +954,8 @@ static int vfat_rename(struct inode *old_dir, struct dentry *old_dentry,
 	new_dir->i_version++;
 
 	fat_detach(old_inode);
+	if (MSDOS_SB(sb)->options.nfs)
+		old_inode->i_ino = new_i_pos;
 	fat_attach(old_inode, new_i_pos);
 	if (IS_DIRSYNC(new_dir)) {
 		err = fat_sync_inode(old_inode);
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 1/5] fat: allocate persistent inode numbers
  2012-09-04 15:57 [PATCH v2 1/5] fat: allocate persistent inode numbers Namjae Jeon
@ 2012-09-04 16:17 ` Al Viro
  2012-09-05 14:08   ` Namjae Jeon
  0 siblings, 1 reply; 33+ messages in thread
From: Al Viro @ 2012-09-04 16:17 UTC (permalink / raw)
  To: Namjae Jeon
  Cc: hirofumi, akpm, bfields, linux-kernel, Namjae Jeon,
	Ravishankar N, Amit Sahrawat

On Wed, Sep 05, 2012 at 12:57:44AM +0900, Namjae Jeon wrote:
> From: Namjae Jeon <namjae.jeon@samsung.com>
> 
> All the files on a FAT partition have an on-disk directory entry.
> The location of these entries, i_pos, is unique and is constructed by the
> fat_make_i_pos() function. We can use this as the inode number, making it
> persistent across remounts.

> --- a/fs/fat/namei_vfat.c
> +++ b/fs/fat/namei_vfat.c
> @@ -954,6 +954,8 @@ static int vfat_rename(struct inode *old_dir, struct dentry *old_dentry,
>  	new_dir->i_version++;
>  
>  	fat_detach(old_inode);
> +	if (MSDOS_SB(sb)->options.nfs)
> +		old_inode->i_ino = new_i_pos;

Sigh...  Inode numbers are reported by fstat() in stat.st_ino.  They must
	* remain constant from open() to close(), even if file gets
unlinked or renamed.
	* be equal for two simultaneously opened descriptors with the same
st_dev *ONLY* if those descriptors refer to the same file (i.e. if writing
through one of those would change the data read through another, etc.)

And yes, the userland code does depend on those properties.  There's a damn
good reason why we had gone for all those convolutions with separate hash,
etc.

inode->i_ino on a live struct inode is _never_ changed.  Period.
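
A minimal userland sketch of the first property (the paths here are
hypothetical):

#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/stat.h>

int main(void)
{
	struct stat before, after;
	int fd = open("/mnt/vfat/foo", O_RDONLY);

	if (fd < 0)
		return 1;
	fstat(fd, &before);
	rename("/mnt/vfat/foo", "/mnt/vfat/bar");	/* file stays open */
	fstat(fd, &after);

	/* userland is entitled to rely on this staying true */
	printf("st_ino %s\n",
	       before.st_ino == after.st_ino ? "unchanged" : "CHANGED");
	close(fd);
	return 0;
}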

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 1/5] fat: allocate persistent inode numbers
  2012-09-04 16:17 ` Al Viro
@ 2012-09-05 14:08   ` Namjae Jeon
  2012-09-05 14:56     ` OGAWA Hirofumi
  0 siblings, 1 reply; 33+ messages in thread
From: Namjae Jeon @ 2012-09-05 14:08 UTC (permalink / raw)
  To: Al Viro, hirofumi
  Cc: akpm, bfields, linux-kernel, Namjae Jeon, Ravishankar N, Amit Sahrawat

2012/9/5, Al Viro <viro@zeniv.linux.org.uk>:
> On Wed, Sep 05, 2012 at 12:57:44AM +0900, Namjae Jeon wrote:
>> From: Namjae Jeon <namjae.jeon@samsung.com>
>>
>> All the files on a FAT partition have an on-disk directory entry.
>> The location of these entries, i_pos, is unique and is constructed by the
>> fat_make_i_pos() function. We can use this as the inode number, making it
>> persistent across remounts.
>
>> --- a/fs/fat/namei_vfat.c
>> +++ b/fs/fat/namei_vfat.c
>> @@ -954,6 +954,8 @@ static int vfat_rename(struct inode *old_dir, struct
>> dentry *old_dentry,
>>  	new_dir->i_version++;
>>
>>  	fat_detach(old_inode);
>> +	if (MSDOS_SB(sb)->options.nfs)
>> +		old_inode->i_ino = new_i_pos;
>
A brief background to this patch-set: we are using FAT over NFS in our
environment. While an application was just browsing files (ls -lR, etc.) on
the client, we observed numerous occasions when the server cache was evicted.

Our purpose is the same as the one Neil Brown also mentioned -> "it is
important to maintain support for NFS export of VFAT on a best-effort basis."

With the patch series we are able to provide a 100% safe solution ->
there are no ESTALE issues, at least for the normal user scenarios (i.e.
rename and unlink do not happen while the file is open). After the review
comments from you, we proposed a new solution which takes care of 'unlink'
as well.

> Sigh...  Inode numbers are reported by fstat() in stat.st_ino.  They must
> 	* remain constant from open() to close(), even if file gets
> unlinked or renamed.
> 	* be equal for two simultaneously opened descriptors with the same
> st_dev *ONLY* if those descriptors refer to the same file (i.e. if writing
> through one of those would change the data read through another, etc.)
>
> And yes, the userland code does depend on those properties.  There's a damn
> good reason why we had gone for all those convolutions with separate hash,
> etc.
>
> inode->i_ino on a live struct inode is _never_ changed.  Period.
Hi Al.
Even without these patches, when a file is opened at the client and is
still 'live', it is possible for the inode number to change due to cache
eviction at the server (because each read/write NFS transaction from the
client is translated into an open->read/write->close operation on the
server, and in between such transactions the server cache may be evicted).
This is why we updated i_ino in vfat_rename() immediately, as it helps
rebuild the inode in such cases.
But to comply with your explanation (constant i_ino from open() until
close()), I can remove this change.

And hi, OGAWA.
In this long discussion about the FAT acceptance over NFS, our belief is
still that the objective should be to reduce errors as much as possible,
and then, if certain scenarios remain, at least those could be highlighted
as a limitation in Documentation instead of completely discarding the
usage of FAT over NFS.
So how about putting the rename scenario as a limitation? In an ideal
scenario, how many times is this going to happen?

Thanks.
>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 1/5] fat: allocate persistent inode numbers
  2012-09-05 14:08   ` Namjae Jeon
@ 2012-09-05 14:56     ` OGAWA Hirofumi
  2012-09-06  6:46       ` Namjae Jeon
  0 siblings, 1 reply; 33+ messages in thread
From: OGAWA Hirofumi @ 2012-09-05 14:56 UTC (permalink / raw)
  To: Namjae Jeon
  Cc: Al Viro, akpm, bfields, linux-kernel, Namjae Jeon, Ravishankar N,
	Amit Sahrawat

Namjae Jeon <linkinjeon@gmail.com> writes:

> In this long discussion about the FAT acceptance over NFS, our belief
> is still that the objective should be to reduce errors as much as
> possible and then if there are certain scenarios - at least that could
> be highlighted as a limitation in Documentation instead of completely
> discarding the usage of FAT over NFS.  So how about putting rename
> scenario as a limitation ? In ideal scenario how many times this is
> going to happen ?

My understanding of your patches is that they introduce a silent corruption
bug, by getting the wrong location via ino in some cases, instead of
ESTALE. And they surprise userland by changing ino.

Why is it safe to change ino? If you are saying you will remove the change
of ino on rename, how do you handle the case of collision?
-- 
OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 1/5] fat: allocate persistent inode numbers
  2012-09-05 14:56     ` OGAWA Hirofumi
@ 2012-09-06  6:46       ` Namjae Jeon
  2012-09-06 12:19         ` OGAWA Hirofumi
  0 siblings, 1 reply; 33+ messages in thread
From: Namjae Jeon @ 2012-09-06  6:46 UTC (permalink / raw)
  To: OGAWA Hirofumi
  Cc: Al Viro, akpm, bfields, linux-kernel, Namjae Jeon, Ravishankar N,
	Amit Sahrawat

2012/9/5 OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>:
> Namjae Jeon <linkinjeon@gmail.com> writes:
>
>> In this long discussion about the FAT acceptance over NFS, our belief
>> is still that the objective should be to reduce errors as much as
>> possible and then if there are certain scenarios - at least that could
>> be highlighted as a limitation in Documentation instead of completely
>> discarding the usage of FAT over NFS.  So how about putting rename
>> scenario as a limitation ? In ideal scenario how many times this is
>> going to happen ?
>
> My understanding of your patches is to introduce the silent corruption
> bug by getting wrong location via ino on some cases, instead of
> ESTALE. And make surprise to userland by change ino.
>
> Why is it safe to change ino? If you are saying to remove the changing
> ino on rename, how handle the case of collision?
Yes, agreed, this would lead to a collision. So, if we are choosing 'i_pos'
as the inode number, we need to have a mechanism to avoid this 'i_pos'
being reused.

We can do one thing here, as a way of avoiding such scenarios: we can
return EBUSY for the rename operation. I.e., if the inode is still being
referenced, then in such cases it makes sense to return EBUSY over NFS and
focus on the larger part of the solution, which is making FAT stable.
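
A rough sketch of the idea (the i_count check is only illustrative, not a
final design):

	/* early in vfat_rename(), before any directory entries are touched:
	 * refuse to move a file whose inode is still referenced elsewhere,
	 * so its i_pos (and therefore its NFS-visible ino) cannot change
	 * under an open file. */
	if (MSDOS_SB(sb)->options.nfs &&
	    atomic_read(&old_inode->i_count) > 1)
		return -EBUSY;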

Let me know your opinion.

Thanks OGAWA.

> --
> OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 1/5] fat: allocate persistent inode numbers
  2012-09-06  6:46       ` Namjae Jeon
@ 2012-09-06 12:19         ` OGAWA Hirofumi
  2012-09-06 13:39           ` Namjae Jeon
  0 siblings, 1 reply; 33+ messages in thread
From: OGAWA Hirofumi @ 2012-09-06 12:19 UTC (permalink / raw)
  To: Namjae Jeon
  Cc: Al Viro, akpm, bfields, linux-kernel, Namjae Jeon, Ravishankar N,
	Amit Sahrawat

Namjae Jeon <linkinjeon@gmail.com> writes:

>>> In this long discussion about the FAT acceptance over NFS, our belief
>>> is still that the objective should be to reduce errors as much as
>>> possible and then if there are certain scenarios - at least that could
>>> be highlighted as a limitation in Documentation instead of completely
>>> discarding the usage of FAT over NFS.  So how about putting rename
>>> scenario as a limitation ? In ideal scenario how many times this is
>>> going to happen ?
>>
>> My understanding of your patches is to introduce the silent corruption
>> bug by getting wrong location via ino on some cases, instead of
>> ESTALE. And make surprise to userland by change ino.
>>
>> Why is it safe to change ino? If you are saying to remove the changing
>> ino on rename, how handle the case of collision?
> Yes, agreed this would lead to collision. So, If we are choosing
> 'i_pos' as inode
> number, We need to have a mechanism to avoid this 'i_pos' being reused.
>
> We can have one thing over here. As a part of avoidance for such scenarios -
> We can return EBUSY for this rename operation. i.e., If the inode is being
> referenced then in such cases it makes sense to return EBUSY over NFS and
>>  focus on the large part of the solution which is making FAT stable.
>
> Let me know your opinion.

It sounds sane to provide the limited but stable behavior, behind an
option. But at first, I'd like to see a read-only but clean export
solution, and make it stable. (I'm not thinking about the implementation
detail. It may be the stable ino solution, or may not be.)

After that, we can make it writable incrementally.
-- 
OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 1/5] fat: allocate persistent inode numbers
  2012-09-06 12:19         ` OGAWA Hirofumi
@ 2012-09-06 13:39           ` Namjae Jeon
  2012-09-07  7:01             ` Namjae Jeon
  0 siblings, 1 reply; 33+ messages in thread
From: Namjae Jeon @ 2012-09-06 13:39 UTC (permalink / raw)
  To: OGAWA Hirofumi
  Cc: Al Viro, akpm, bfields, linux-kernel, Namjae Jeon, Ravishankar N,
	Amit Sahrawat

2012/9/6 OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>:
> Namjae Jeon <linkinjeon@gmail.com> writes:
>
>>>> In this long discussion about the FAT acceptance over NFS, our belief
>>>> is still that the objective should be to reduce errors as much as
>>>> possible and then if there are certain scenarios - at least that could
>>>> be highlighted as a limitation in Documentation instead of completely
>>>> discarding the usage of FAT over NFS.  So how about putting rename
>>>> scenario as a limitation ? In ideal scenario how many times this is
>>>> going to happen ?
>>>
>>> My understanding of your patches is to introduce the silent corruption
>>> bug by getting wrong location via ino on some cases, instead of
>>> ESTALE. And make surprise to userland by change ino.
>>>
>>> Why is it safe to change ino? If you are saying to remove the changing
>>> ino on rename, how handle the case of collision?
>> Yes, agreed this would lead to collision. So, If we are choosing
>> 'i_pos' as inode
>> number, We need to have a mechanism to avoid this 'i_pos' being reused.
>>
>> We can have one thing over here. As a part of avoidance for such scenarios -
>> We can return EBUSY for this rename operation. i.e., If the inode is being
>> referenced then in such cases it makes sense to return EBUSY over NFS and
>>  focus on the large part of the solution which is making FAT stable.
>>
>> Let me know your opinion.
>
> It sounds like sane to provide the limited but stable behavior, with the
> option. But at first, I'd like to see the read-only export but clean
> solution, and make it stable. (I'm not thinking about the implementation
> detail. It may be the stable ino solution, or may not be.)
Hi. OGAWA.
I agree. First, I will check whether it is possible to implement
read-only export or not.
And if I face some problem or have some concern, I will discuss it
with you.

>
> After that, we can make it writable incrementally.
Okay.
Thanks a lot.
> --
> OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 1/5] fat: allocate persistent inode numbers
  2012-09-06 13:39           ` Namjae Jeon
@ 2012-09-07  7:01             ` Namjae Jeon
  2012-09-07 12:15               ` Steven J. Magnani
  0 siblings, 1 reply; 33+ messages in thread
From: Namjae Jeon @ 2012-09-07  7:01 UTC (permalink / raw)
  To: OGAWA Hirofumi
  Cc: Al Viro, akpm, bfields, linux-kernel, Namjae Jeon, Ravishankar N,
	Amit Sahrawat

Hi. OGAWA.

I checked the read-only option for export on FAT.
I think that there are a few approaches, as mentioned below.

1. As per the current scenario – the user already has the option of
setting ‘ro’ in /etc/exports – so that can also be used to make it
read-only.

2. Forcefully set “read-only” while executing the FAT export operation.
-> As you know, we can set a read-only (ro) export in /etc/exports.
If we force a read-only export regardless of /etc/exports, this is a
"HACK", but it will work regardless of the user setting.

3. When FAT is mounted with the -o nfs option, make it ‘ro’ at mount
time itself (see the sketch after this list).
-> It is simple to implement, but VFAT will be read-only on the NFS
server as well as on the NFS client.

4. As already suggested – the only problem left now is ‘RENAME’; all
other points can be taken care of in the current solution. So, how about
putting RENAME as a limitation? Or we can return EBUSY for the rename
case to avoid collision.
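
For (3), a minimal sketch of what I mean (assuming the 'nfs' mount option
from this series; untested):

	/* e.g. near the end of fat_fill_super(): if mounted with -o nfs,
	 * force the superblock read-only on the server side. */
	if (MSDOS_SB(sb)->options.nfs)
		sb->s_flags |= MS_RDONLY;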

Let me know which approach you prefer, especially with more insight on the
‘rename’ part, as all through the discussion this has been the remaining
bottleneck. A fruitful discussion on ‘rename’ might result in stable FAT
over NFS.

Please let me know your opinion.

Thanks.
2012/9/6, Namjae Jeon <linkinjeon@gmail.com>:
> 2012/9/6 OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>:
>> Namjae Jeon <linkinjeon@gmail.com> writes:
>>
>>>>> In this long discussion about the FAT acceptance over NFS, our belief
>>>>> is still that the objective should be to reduce errors as much as
>>>>> possible and then if there are certain scenarios - at least that could
>>>>> be highlighted as a limitation in Documentation instead of completely
>>>>> discarding the usage of FAT over NFS.  So how about putting rename
>>>>> scenario as a limitation ? In ideal scenario how many times this is
>>>>> going to happen ?
>>>>
>>>> My understanding of your patches is to introduce the silent corruption
>>>> bug by getting wrong location via ino on some cases, instead of
>>>> ESTALE. And make surprise to userland by change ino.
>>>>
>>>> Why is it safe to change ino? If you are saying to remove the changing
>>>> ino on rename, how handle the case of collision?
>>> Yes, agreed this would lead to collision. So, If we are choosing
>>> 'i_pos' as inode
>>> number, We need to have a mechanism to avoid this 'i_pos' being reused.
>>>
>>> We can have one thing over here. As a part of avoidance for such
>>> scenarios -
>>> We can return EBUSY for this rename operation. i.e., If the inode is
>>> being
>>> referenced then in such cases it makes sense to return EBUSY over NFS
>>> and
>>>  focus on the large part of the solution which is making FAT stable.
>>>
>>> Let me know your opinion.
>>
>> It sounds like sane to provide the limited but stable behavior, with the
>> option. But at first, I'd like to see the read-only export but clean
>> solution, and make it stable. (I'm not thinking about the implementation
>> detail. It may be the stable ino solution, or may not be.)
> Hi. OGAWA.
> I Agree. Once, I will check whether it is possible to implement
> read-only export or not.
> And If I face some problem or have some concern, I will discuss about
> it with you.
>
>>
>> After that, we can make it writable incrementally.
> Okay.
> Thanks a lot.
>> --
>> OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 1/5] fat: allocate persistent inode numbers
  2012-09-07  7:01             ` Namjae Jeon
@ 2012-09-07 12:15               ` Steven J. Magnani
  2012-09-09  9:32                 ` OGAWA Hirofumi
  0 siblings, 1 reply; 33+ messages in thread
From: Steven J. Magnani @ 2012-09-07 12:15 UTC (permalink / raw)
  To: Namjae Jeon
  Cc: OGAWA Hirofumi, Al Viro, akpm, bfields, linux-kernel,
	Namjae Jeon, Ravishankar N, Amit Sahrawat

On Fri, 2012-09-07 at 16:01 +0900, Namjae Jeon wrote: 
> Hi. OGAWA.
> 
> I checked read-only option for export on FAT.
> I think that there are 3 approaches as mentioned below.
> 
> 1. As per the current scenario – user already has the option of
> setting ‘ro’ in /etc/exports – so that can also be used to make it
> read-only.
> 	
> 2. Forcefully set to “read-only” while executing FAT export operation.
> -> As you know, we can set read-only(ro) export in /etc/exports.
> If we set read-only export regardless of /etc/exports, This is "HACK"
> and it will work regardless of user setting.
> 
> 3. When FAT is mounted with -onfs option,-> Make it ‘ro’ at the mount
> time itself.
> -> It is simple to implement, but VFAT of NFS Server will be set to
> read-only as well as NFS client.

I argue against (2) and (3). A change that drops any possibility of
NFS-mounting VFAT filesystems read-write will break my use case. Where
ESTALE is an issue, there are client-side solutions, either mounting
with lookupcache=none (which admittedly has a severe performance impact)
or the VFS patches to handle ESTALE that are working their way towards
mainline. I recognize that not everyone can take advantage of
client-side features, but options (2) and (3) make life worse for those
who can.
------------------------------------------------------------------------
Steven J. Magnani               "I claim this network for MARS!
www.digidescorp.com              Earthling, return my space modulator!"

#include <standard.disclaimer>



^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 1/5] fat: allocate persistent inode numbers
  2012-09-07 12:15               ` Steven J. Magnani
@ 2012-09-09  9:32                 ` OGAWA Hirofumi
  2012-09-09 11:29                   ` OGAWA Hirofumi
  2012-09-10 12:28                   ` Steven J. Magnani
  0 siblings, 2 replies; 33+ messages in thread
From: OGAWA Hirofumi @ 2012-09-09  9:32 UTC (permalink / raw)
  To: Steven J. Magnani
  Cc: Namjae Jeon, Al Viro, akpm, bfields, linux-kernel, Namjae Jeon,
	Ravishankar N, Amit Sahrawat

"Steven J. Magnani" <steve@digidescorp.com> writes:

> On Fri, 2012-09-07 at 16:01 +0900, Namjae Jeon wrote: 
>> Hi. OGAWA.
>> 
>> I checked read-only option for export on FAT.
>> I think that there are 3 approaches as mentioned below.
>> 
>> 1. As per the current scenario – user already has the option of
>> setting ‘ro’ in /etc/exports – so that can also be used to make it
>> read-only.
>> 	
>> 2. Forcefully set to “read-only” while executing FAT export operation.
>> -> As you know, we can set read-only(ro) export in /etc/exports.
>> If we set read-only export regardless of /etc/exports, This is "HACK"
>> and it will work regardless of user setting.
>> 
>> 3. When FAT is mounted with -onfs option,-> Make it ‘ro’ at the mount
>> time itself.
>> -> It is simple to implement, but VFAT of NFS Server will be set to
>> read-only as well as NFS client.
>
> I argue against (2) and (3). A change that drops any possibility of
> NFS-mounting VFAT filesystems read-write will break my use case. Where
> ESTALE is an issue, there are client-side solutions, either mounting
> with lookupcache=none (which admittedly has a severe performance impact)
> or the VFS patches to handle ESTALE that are working their way towards
> mainline. I recognize that not everyone can take advantage of
> client-side features, but options (2) and (3) make life worse for those
> who can.

What is your use case?  I'm assuming current NFS support of FAT is not
unstable behavior even with your patches. Is this true?

Well, this plan is to provide the stable/clean read-only behavior at
first.  After that, make it writable with some limitations (e.g. rename
may be unsupported).

If your patches in -mm are enough for now, we will not need to do those.
Namjae, have you tested it? Or what are you thinking?
-- 
OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 1/5] fat: allocate persistent inode numbers
  2012-09-09  9:32                 ` OGAWA Hirofumi
@ 2012-09-09 11:29                   ` OGAWA Hirofumi
  2012-09-10 12:03                     ` Namjae Jeon
  2012-09-10 12:28                   ` Steven J. Magnani
  1 sibling, 1 reply; 33+ messages in thread
From: OGAWA Hirofumi @ 2012-09-09 11:29 UTC (permalink / raw)
  To: Steven J. Magnani
  Cc: Namjae Jeon, Al Viro, akpm, bfields, linux-kernel, Namjae Jeon,
	Ravishankar N, Amit Sahrawat

OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> writes:

> What is your use case?  I'm assuming current NFS support of FAT is not
> unstable behavior even with your patches. Is this true?

s/is not unstable/is still unstable/
-- 
OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 1/5] fat: allocate persistent inode numbers
  2012-09-09 11:29                   ` OGAWA Hirofumi
@ 2012-09-10 12:03                     ` Namjae Jeon
  2012-09-10 14:00                       ` OGAWA Hirofumi
  0 siblings, 1 reply; 33+ messages in thread
From: Namjae Jeon @ 2012-09-10 12:03 UTC (permalink / raw)
  To: OGAWA Hirofumi
  Cc: Steven J. Magnani, Al Viro, akpm, bfields, linux-kernel,
	Namjae Jeon, Ravishankar N, Amit Sahrawat

2012/9/9, OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>:
> OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> writes:
>
>> What is your use case?  I'm assuming current NFS support of FAT
>> is still unstable behavior even with your patches. Is this true?
Hi. OGAWA.

Yes, it is true (the current VFAT of the -mm tree is not stable). Although we
set lookupcache=none while mounting, an ESTALE error can still occur in
the rename case.
So the ESTALE error from the rename case still remains on the current -mm tree.
Please see the steps as follows:
1. on the client, write to a file.
2. on the client, move/rename the file.
3. on the server, do drop_caches etc. to somehow evict the inode so
that it gets a new inode number.
4. on the client, resume the program writing to the file. The write will fail
(write: Stale NFS file handle).
-----

As far as I know, Steve's patches tried to fix the ESTALE error on a
best-effort basis. And Steve also said his patches cannot perfectly avoid
the ESTALE error; their aim is just to improve the behavior under memory
pressure (dentry eviction) on a best-effort basis.
------------------------------------------------------------------------------
Under memory pressure, the system may evict dentries from cache.
When the FAT driver receives a NFS request involving an evicted dentry,
it is unable to reconnect it to the filesystem root.
This causes the request to fail, often with ENOENT.
.....
Note that while this patch set improves FAT's NFS support, it does not
eliminate ESTALE errors completely.
The following should be considered for NFS clients who are sensitive to ESTALE:
* Mounting with lookupcache=none
  Unfortunately this can degrade performance severely, particularly for deep
  filesystems.
* Incorporating VFS patches to retry ESTALE failures on the client-side,
  such as https://lkml.org/lkml/2012/6/29/381
* Handling ESTALE errors in client application code
-------------------------------------------------------------------------------------

And ......
If we mount NFS with lookupcache=none, FAT file lookup performance
drops severely.
LOOKUP performance is very poor on a slow network and a slow device; I do
not recommend disabling the lookup cache on NFS.
That is why reconstructing the inode is already implemented in other
filesystems (e.g. ext4, XFS, etc.).
Since lookupcache is enabled by default in NFS, users have already been
exposed to and have experienced ESTALE issues on NFS over VFAT.

I agree with you on making NFS over VFAT a read-only filesystem to avoid all
issues.
Eventually we can make it writable with the rename limitation, once we
decide that it is pretty stable in mainline.
So, I suggest adding an 'nfs_ro' mount option instead of the 'nfs' option.
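
Roughly what I have in mind (a sketch only; Opt_nfs_ro and the nfs_ro flag
are hypothetical names for this proposal):

	{Opt_nfs_ro, "nfs_ro"},		/* added to the fat_tokens table */
	...
	case Opt_nfs_ro:		/* in parse_options() */
		opts->nfs = 1;
		opts->nfs_ro = 1;	/* fat_fill_super() would then do
					 * sb->s_flags |= MS_RDONLY */
		break;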

Let me know your opinion.
Thanks.

>
> s/is not unstable/is still unstable/
> --
> OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 1/5] fat: allocate persistent inode numbers
  2012-09-09  9:32                 ` OGAWA Hirofumi
  2012-09-09 11:29                   ` OGAWA Hirofumi
@ 2012-09-10 12:28                   ` Steven J. Magnani
  1 sibling, 0 replies; 33+ messages in thread
From: Steven J. Magnani @ 2012-09-10 12:28 UTC (permalink / raw)
  To: OGAWA Hirofumi
  Cc: Namjae Jeon, Al Viro, akpm, bfields, linux-kernel, Namjae Jeon,
	Ravishankar N, Amit Sahrawat

On Sun, 2012-09-09 at 18:32 +0900, OGAWA Hirofumi wrote:

> What is your use case?  

What Neil Brown refers to as a "general file access protocol" -
basically making a flash disk available to a small embedded network for
random-access file I/O.

The flash disk is required to interoperate with Windoze, which forces us
into VFAT-backed NFS.

> I'm assuming current NFS support of FAT is still unstable behavior even with your 
> patches. Is this true?

We use lookupcache=none, which I thought had stabilized things. Based on
what Namjae has found with ESTALE on rename/drop_caches I guess there is
a hole.
------------------------------------------------------------------------
Steven J. Magnani               "I claim this network for MARS!
www.digidescorp.com              Earthling, return my space modulator!"

#include <standard.disclaimer>



^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 1/5] fat: allocate persistent inode numbers
  2012-09-10 12:03                     ` Namjae Jeon
@ 2012-09-10 14:00                       ` OGAWA Hirofumi
  2012-09-11 12:00                         ` Namjae Jeon
  0 siblings, 1 reply; 33+ messages in thread
From: OGAWA Hirofumi @ 2012-09-10 14:00 UTC (permalink / raw)
  To: Namjae Jeon
  Cc: Steven J. Magnani, Al Viro, akpm, bfields, linux-kernel,
	Namjae Jeon, Ravishankar N, Amit Sahrawat

Namjae Jeon <linkinjeon@gmail.com> writes:

> Yes, It is  true(current VFAT of -mm tree is not stable). Although we
> set lookupcache=none while mounting, ESTALE error can still occur in
> rename case.
> So there still remain ESTALE error issue from rename case on current -mm tree.
> plz See the step as the following
> 1. on client write to file.
> 2. on client, move/rename file.
> 3. on server, do drop_caches. etc to somehow evict inode number so
> that it gets new inode number
> 4. on client, resume the program to write to file. write will fail
> (write: Stale NFS file handle)

Since rename() will be disabled in the stable ino patches, this will be
unfixable, so maybe it is rather worse.

Did you check why it returns -ESTALE?  Or is the rename() issue also
unfixable on -mm?

> And ......
> If we mount NFS with lookupcache=none, FAT file lookup performance is
> severely dropped.
> LOOKUP performance is very poor on slow network and slow device. I do
> not recommend to disable lookup cache on NFS.
> And that is why reconstructing inode is already implemented in other
> filesystem (e.g. EXT4, XFS etc..)
> Currently lookupcache is enabled by default in NFS, it means users
> already have disclosed and experienced ESTALE issues on NFS over VFAT.
>
> I agree with you to make NFS over VFAT read-only filesystem to avoid all issues.
> Eventually we can make it writable with rename limitation when we
> decide that it is pretty stable in mainline.
> So, I suggest to add 'nfs_ro' mount option instead of 'nfs' option.

-mm seems to be more stable than I thought. As he said, it sounds like
rename() is the only known issue on -mm, true?

And have you tried the https://lkml.org/lkml/2012/6/29/381 patches? They
sound like they improve performance by keeping the lookupcache enabled. I'd
like to know the critical reason we have to replace that approach.

Thanks.
-- 
OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 1/5] fat: allocate persistent inode numbers
  2012-09-10 14:00                       ` OGAWA Hirofumi
@ 2012-09-11 12:00                         ` Namjae Jeon
  2012-09-11 12:31                           ` OGAWA Hirofumi
  0 siblings, 1 reply; 33+ messages in thread
From: Namjae Jeon @ 2012-09-11 12:00 UTC (permalink / raw)
  To: OGAWA Hirofumi
  Cc: Steven J. Magnani, Al Viro, akpm, bfields, linux-kernel,
	Namjae Jeon, Ravishankar N, Amit Sahrawat

2012/9/10, OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>:
> Namjae Jeon <linkinjeon@gmail.com> writes:
>
>> Yes, It is  true(current VFAT of -mm tree is not stable). Although we
>> set lookupcache=none while mounting, ESTALE error can still occur in
>> rename case.
>> So there still remain ESTALE error issue from rename case on current -mm
>> tree.
>> plz See the step as the following
>> 1. on client write to file.
>> 2. on client, move/rename file.
>> 3. on server, do drop_caches. etc to somehow evict inode number so
>> that it gets new inode number
>> 4. on client, resume the program to write to file. write will fail
>> (write: Stale NFS file handle)
>
Hi OGAWA.
> Since rename() will be disabled on stable ino patches, this will be
> unfixable, so rather maybe it is worse.
Currently, with our patchset, only the rename issue remains (we could not
find any correct approach to avoid it; if we do not update i_ino immediately
when i_pos changes, we are just delaying the problem). And, with the rename
limitation, we can return EBUSY when rename is called while a process has
the file open. Even without our patchset, the rename issue can occur over
NFS file access when the inode is evicted from the SERVER cache.
>
> Did you checked why it returns -ESTALE?  Or rename() issue also is
> unfixable on -mm?
It is reproducible regardless of whether lookupcache is enabled or disabled.
The inode is not found in the server inode cache, so when
d_obtain_alias(inode) is called, it returns ESTALE.
The call path is like this:
fh_verify()-->nfsd_set_fh_dentry()-->exportfs_decode_fh()-->nop->fh_to_dentry()-->fat_fh_to_dentry()-->generic_fh_to_dentry()-->get_inode()-->fat_nfs_get_inode()

static struct inode *fat_nfs_get_inode(struct super_block *sb,
                                       u64 ino, u32 generation)
{
......
        inode = ilookup(sb, ino);       /* looks up the inode cache only;
                                         * returns NULL here because the
                                         * inode was already evicted */
        if (inode && generation && (inode->i_generation != generation)) {
                iput(inode);
                inode = NULL;
        }
        return inode;
}

I think that it is unfixable because we cannot know the i_pos of an inode
changed by rename.
And even if we knew it, there is no rebuild-inode routine in -mm.
And it cannot be fixed even in our patches.
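
To illustrate, such a rebuild routine would have to decode i_pos back to a
directory block and slot and re-read the on-disk entry, roughly like the
sketch below (this assumes the stable-ino scheme where the handle's ino
equals i_pos; the names and the missing error handling are illustrative
only):

	struct msdos_sb_info *sbi = MSDOS_SB(sb);
	struct inode *inode = ilookup(sb, ino);

	if (!inode) {
		struct buffer_head *bh = sb_bread(sb, ino >> sbi->dir_per_block_bits);

		if (bh) {
			struct msdos_dir_entry *de =
				(struct msdos_dir_entry *)bh->b_data
				+ (ino & (sbi->dir_per_block - 1));
			inode = fat_build_inode(sb, de, ino);
			brelse(bh);
		}
	}

Without a stable ino, the ino in the file handle says nothing about the
entry's new location after rename, which is why -mm cannot do this.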

>
>> And ......
>> If we mount NFS with lookupcache=none, FAT file lookup performance is
>> severely dropped.
>> LOOKUP performance is very poor on slow network and slow device. I do
>> not recommend to disable lookup cache on NFS.
>> And that is why reconstructing inode is already implemented in other
>> filesystem (e.g. EXT4, XFS etc..)
>> Currently lookupcache is enabled by default in NFS, it means users
>> already have disclosed and experienced ESTALE issues on NFS over VFAT.
>>
>> I agree with you to make NFS over VFAT read-only filesystem to avoid all
>> issues.
>> Eventually we can make it writable with rename limitation when we
>> decide that it is pretty stable in mainline.
>> So, I suggest to add 'nfs_ro' mount option instead of 'nfs' option.
>
> -mm seems to be more stable than I thought. As he said, sounds like
> rename() is an only known issue on -mm, true?
Yes, there is only the rename issue in terms of stability if the lookup
cache is disabled, but performance will be severely dropped.
If the lookup cache is enabled, there are both ESTALE and rename issues in -mm.
>
> And are you tried https://lkml.org/lkml/2012/6/29/381 patches? It sounds
> like to improve performance by enabling lookupcache.
We checked these patches when facing the ESTALE issue in -mm.
But they are of no use; these patches just retry the system call one more
time on an ESTALE error.

> I'd like to be knowing the critical reason we have to replace it.
I will summarize the situation to help your decision, as follows.

1. The lookup cache is enabled by default in NFS, so the ESTALE error can
easily occur in -mm.
2. If the lookup cache is disabled, there is the rename issue and file lookup
performance drops in -mm.
3. If we use our patches, there is the rename issue, but we can use VFAT
over NFS with the lookup cache enabled.
4. If we use read-only with our patches, there is no issue.

Thanks.
>
> Thanks.
> --
> OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 1/5] fat: allocate persistent inode numbers
  2012-09-11 12:00                         ` Namjae Jeon
@ 2012-09-11 12:31                           ` OGAWA Hirofumi
  2012-09-11 15:13                             ` Namjae Jeon
  0 siblings, 1 reply; 33+ messages in thread
From: OGAWA Hirofumi @ 2012-09-11 12:31 UTC (permalink / raw)
  To: Namjae Jeon
  Cc: Steven J. Magnani, Al Viro, akpm, bfields, linux-kernel,
	Namjae Jeon, Ravishankar N, Amit Sahrawat

Namjae Jeon <linkinjeon@gmail.com> writes:

>> Since rename() will be disabled on stable ino patches, this will be
>> unfixable, so rather maybe it is worse.
> Currently with our patchset : only rename issue (could not find any
> correct approach to ignore this. If we do not update this immediately
> at i_pos change – it is just delaying the problem). And we can return
> EBUSY when rename is called while process is opening file with rename
> limitation. Without our patchset also - the rename issue can occur
> over NFS file access - when the inode is evicted from the SERVER
> cache.

The important difference is whether the rename issue is unfixable or not.

> I think that it is unfixable because we can not know i_pos of inode
> changed by rename.
> And even though we know it, there is no rebuild inode routine in -mm.
> And It even can not fix in our patches.

>> And are you tried https://lkml.org/lkml/2012/6/29/381 patches? It sounds
>> like to improve performance by enabling lookupcache.
> We checked this patches when facing estale issue in -mm.
> But It is no use, these patches just retry system call one more when
> estale error.

What happens if the client retries from lookup() after -ESTALE? (Does the
client NFS not have the name of the entry anymore?)

I'm assuming the retry means it restarts from building the NFS file
handle. I might just be wrong here, though.

>> I'd like to be knowing the critical reason we have to replace it.
> I arrange to help your decision as the following.
>
> 1. lookup cache is enable at default in NFS. So estale error can be
> easily occurred in -mm.
> 2. If lookup cache is disable, there is rename issue and file lookup
> performance is dropped in -mm.
> 4. If we use our patches, there is rename issue. but we can use VFAT
> over NFS with lookup cache enable.
> 5. If we use read-only with our patches, there is no issue.

Again, what I care about is whether the rename issue is unfixable or not. In
the stable ino patches, it will never be fixable.


What do you think about this rename issue, Steven?
-- 
OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 1/5] fat: allocate persistent inode numbers
  2012-09-11 12:31                           ` OGAWA Hirofumi
@ 2012-09-11 15:13                             ` Namjae Jeon
  2012-09-11 15:47                               ` OGAWA Hirofumi
  0 siblings, 1 reply; 33+ messages in thread
From: Namjae Jeon @ 2012-09-11 15:13 UTC (permalink / raw)
  To: OGAWA Hirofumi
  Cc: Steven J. Magnani, Al Viro, akpm, bfields, linux-kernel,
	Namjae Jeon, Ravishankar N, Amit Sahrawat

2012/9/11 OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>:
> Namjae Jeon <linkinjeon@gmail.com> writes:
>
>>> Since rename() will be disabled on stable ino patches, this will be
>>> unfixable, so rather maybe it is worse.
>> Currently with our patchset : only rename issue (could not find any
>> correct approach to ignore this. If we do not update this immediately
>> at i_pos change – it is just delaying the problem). And we can return
>> EBUSY when rename is called while process is opening file with rename
>> limitation. Without our patchset also - the rename issue can occur
>> over NFS file access - when the inode is evicted from the SERVER
>> cache.
>
> Important difference is whether rename issue is unfixable or not.
Currently I don't have any idea how to fix the rename issue in -mm.
>
>> I think that it is unfixable because we can not know i_pos of inode
>> changed by rename.
>> And even though we know it, there is no rebuild inode routine in -mm.
>> And It even can not fix in our patches.
>
>>> And are you tried https://lkml.org/lkml/2012/6/29/381 patches? It sounds
>>> like to improve performance by enabling lookupcache.
>> We checked this patches when facing estale issue in -mm.
>> But It is no use, these patches just retry system call one more when
>> estale error.
>
> What happens if client retried from lookup() after -ESTALE? (client NFS
> doesn't have the name of entry anymore?)
We would need a rebuild-inode routine, because the inode has already been
evicted from the cache on the server.
>
> I'm assuming the retry means - it restarts from building the NFS file
> handle. I might be just wrong here though.
As I remember, it just retries in the VFS of the NFS client. I heard this
patch is needed for a very specific set of circumstances where an entry goes
stale once between the lookup and the actual operation(s).
It is not related to the current issue (inode cache eviction on the server).
>
>>> I'd like to be knowing the critical reason we have to replace it.
>> I arrange to help your decision as the following.
>>
>> 1. lookup cache is enable at default in NFS. So estale error can be
>> easily occurred in -mm.
>> 2. If lookup cache is disable, there is rename issue and file lookup
>> performance is dropped in -mm.
>> 4. If we use our patches, there is rename issue. but we can use VFAT
>> over NFS with lookup cache enable.
>> 5. If we use read-only with our patches, there is no issue.
>
> Again, I'm care about whether rename issue is unfixable or not.
I think that it is unfixable in -mm.
>In stable ino patches, it will never be fixable.
>
>
> What do you think about this rename issue, Steven?
> --
> OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 1/5] fat: allocate persistent inode numbers
  2012-09-11 15:13                             ` Namjae Jeon
@ 2012-09-11 15:47                               ` OGAWA Hirofumi
  2012-09-12 14:12                                 ` Namjae Jeon
  0 siblings, 1 reply; 33+ messages in thread
From: OGAWA Hirofumi @ 2012-09-11 15:47 UTC (permalink / raw)
  To: Namjae Jeon
  Cc: Steven J. Magnani, Al Viro, akpm, bfields, linux-kernel,
	Namjae Jeon, Ravishankar N, Amit Sahrawat

Namjae Jeon <linkinjeon@gmail.com> writes:

>>> I think that it is unfixable because we can not know i_pos of inode
>>> changed by rename.
>>> And even though we know it, there is no rebuild inode routine in -mm.
>>> And It even can not fix in our patches.
>>
>>>> And are you tried https://lkml.org/lkml/2012/6/29/381 patches? It sounds
>>>> like to improve performance by enabling lookupcache.
>>> We checked this patches when facing estale issue in -mm.
>>> But It is no use, these patches just retry system call one more when
>>> estale error.
>>
>> What happens if client retried from lookup() after -ESTALE? (client NFS
>> doesn't have the name of entry anymore?)
> Need to rebuild inode routine because inode cache is already evicted on Server.
>>
>> I'm assuming the retry means - it restarts from building the NFS file
>> handle. I might be just wrong here though.
> As I remember, just retry in VFS of NFS client..I heard this patch is
> needed for
> a very specific set of circumstances where an entry goes stale once
> between the lookup and the actual operation(s).
> It is not related with current issues(inode cache eviction on server).

Suppose the server/client state is right after a cold boot, and the client
tries to rename first, without any cache on the client/server.

Even in this state, does the server return ESTALE? If it doesn't return
ESTALE, I can't understand why it is really unfixable.

If it returns ESTALE, why does it return it? I'm assuming the previous code
path is the cached-FH path.
-- 
OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 1/5] fat: allocate persistent inode numbers
  2012-09-11 15:47                               ` OGAWA Hirofumi
@ 2012-09-12 14:12                                 ` Namjae Jeon
  2012-09-12 14:32                                   ` J. Bruce Fields
  0 siblings, 1 reply; 33+ messages in thread
From: Namjae Jeon @ 2012-09-12 14:12 UTC (permalink / raw)
  To: OGAWA Hirofumi
  Cc: Steven J. Magnani, Al Viro, akpm, bfields, linux-kernel,
	Namjae Jeon, Ravishankar N, Amit Sahrawat

2012/9/12 OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>:
> Namjae Jeon <linkinjeon@gmail.com> writes:
>
>>>> I think that it is unfixable because we can not know i_pos of inode
>>>> changed by rename.
>>>> And even though we know it, there is no rebuild inode routine in -mm.
>>>> And It even can not fix in our patches.
>>>
>>>>> And are you tried https://lkml.org/lkml/2012/6/29/381 patches? It sounds
>>>>> like to improve performance by enabling lookupcache.
>>>> We checked this patches when facing estale issue in -mm.
>>>> But It is no use, these patches just retry system call one more when
>>>> estale error.
>>>
>>> What happens if client retried from lookup() after -ESTALE? (client NFS
>>> doesn't have the name of entry anymore?)
>> Need to rebuild inode routine because inode cache is already evicted on Server.
>>>
>>> I'm assuming the retry means - it restarts from building the NFS file
>>> handle. I might be just wrong here though.
>> As I remember, just retry in VFS of NFS client..I heard this patch is
>> needed for
>> a very specific set of circumstances where an entry goes stale once
>> between the lookup and the actual operation(s).
>> It is not related with current issues(inode cache eviction on server).
>
> Supposing, the server/client state is after cold boot, and client try to
> rename at first without any cache on client/server.
>
> Even if this state, does the server return ESTALE? If it doesn't return
> ESTALE, I can't understand why it is really unfixable.
Hi OGAWA.
The server will not return ESTALE in this case, because the client does
not have any information for the files yet.
I mean the NFS client does not have any old NFS FH (containing the old
inode number) for this.

>
> If it returns ESTALE, why does it return? I'm assuming the previous code
> path is the cached FH path.
The main point to observe is the file handle, which is used for
all the NFS operations.
So for any NFS operation (read/write, ...) which makes use of the
NFS file handle, if there is a change in the inode number in between,
it will result in ESTALE.
The inode number change on rename happened at the NFS server because of
inode cache eviction under memory pressure.

lookupcache is used at the NFS client to reduce the number of LOOKUP
operations. But we can still get ESTALE if the inode number at the NFS
server changes after LOOKUP, even if lookupcache is disabled.

LOOKUP returns an NFS FH -> [inode number changes at the NFS server] ->
but we still use the old NFS FH returned from LOOKUP for any file
operation (write, read, etc.)
-> ESTALE will be returned.
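
(For reference, the decode path I mentioned earlier boils down to the sketch
below: the exported handle carries only the inode number and generation, so
a changed ino can only make the lookup fail; there is nothing to re-map it
with.)

	static struct dentry *fat_fh_to_dentry(struct super_block *sb,
				struct fid *fid, int fh_len, int fh_type)
	{
		return generic_fh_to_dentry(sb, fid, fh_len, fh_type,
					    fat_nfs_get_inode);
	}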

Thanks!
> --
> OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 1/5] fat: allocate persistent inode numbers
  2012-09-12 14:12                                 ` Namjae Jeon
@ 2012-09-12 14:32                                   ` J. Bruce Fields
  2012-09-12 17:03                                     ` OGAWA Hirofumi
  0 siblings, 1 reply; 33+ messages in thread
From: J. Bruce Fields @ 2012-09-12 14:32 UTC (permalink / raw)
  To: Namjae Jeon
  Cc: OGAWA Hirofumi, Steven J. Magnani, Al Viro, akpm, linux-kernel,
	Namjae Jeon, Ravishankar N, Amit Sahrawat

On Wed, Sep 12, 2012 at 11:12:56PM +0900, Namjae Jeon wrote:
> 2012/9/12 OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>:
> > Namjae Jeon <linkinjeon@gmail.com> writes:
> >
> >>>> I think that it is unfixable because we can not know i_pos of inode
> >>>> changed by rename.
> >>>> And even though we know it, there is no rebuild inode routine in -mm.
> >>>> And It even can not fix in our patches.
> >>>
> >>>>> And are you tried https://lkml.org/lkml/2012/6/29/381 patches? It sounds
> >>>>> like to improve performance by enabling lookupcache.
> >>>> We checked this patches when facing estale issue in -mm.
> >>>> But It is no use, these patches just retry system call one more when
> >>>> estale error.
> >>>
> >>> What happens if client retried from lookup() after -ESTALE? (client NFS
> >>> doesn't have the name of entry anymore?)
> >> Need to rebuild inode routine because inode cache is already evicted on Server.
> >>>
> >>> I'm assuming the retry means - it restarts from building the NFS file
> >>> handle. I might be just wrong here though.
> >> As I remember, just retry in VFS of NFS client..I heard this patch is
> >> needed for
> >> a very specific set of circumstances where an entry goes stale once
> >> between the lookup and the actual operation(s).
> >> It is not related with current issues(inode cache eviction on server).
> >
> > Supposing, the server/client state is after cold boot, and client try to
> > rename at first without any cache on client/server.
> >
> > Even if this state, does the server return ESTALE? If it doesn't return
> > ESTALE, I can't understand why it is really unfixable.
> Hi OGAWA.
> Server will not return ESTALE in this case. because the client does
> not have any information for files yet.

It does if the client mounted before the server rebooted.  NFS is
designed so that servers can reboot without causing clients to fail.
(Applications will just see a delay during the reboot.)

It probably isn't possible to make this work in the case of fat.

But from fat's point of view there probably isn't much difference
between a filehandle lookup after a reboot and a filehandle lookup after
the inode's gone from cache.


I really don't see what you can do to help here.  Won't anything that
allows looking up an uncached inode by filehandle also risk finding the
wrong file?

(If looking up the same filehandle ever results in finding a *different*
file from before, that's a bug.  Probably a more dangerous bug than an
ESTALE--in the ESTALE case the failure is obvious whereas in the case
where you get the wrong file, you may silently corrupt data.)

--b.

> I mean NFS client does not have any old NFS FH(containing old inode
> number) for this.
> 
> >
> > If it returns ESTALE, why does it return? I'm assuming the previous code
> > path is the cached FH path.
> The main point for observation is the file handle-which is used for
> all the NFS operation.
> So for all the NFS operation(read/write....) which makes use of the
> NFS file handle in between if there is a change in inode number
> It will result in ESTALE.
> Changing inode number on rename happened at NFS server by inode cache
> eviction with memory pressure.
> 
> lookupcache is used at NFS client to reduce number of LOOKUP operations.
> But , we can still get ESTALE if inode number at NFS Server change
> after LOOKUP, although lookupcache is disable.
> 
> LOOKUP return NFS FH->[inode number changed at NFS Server] ->
> But we still use old NFS FH returned from LOOKUP for any file
> operation(write,read,etc..)
> -> ESTALE will be returned.
> 
> Thanks!
> > --
> > OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 1/5] fat: allocate persistent inode numbers
  2012-09-12 14:32                                   ` J. Bruce Fields
@ 2012-09-12 17:03                                     ` OGAWA Hirofumi
  2012-09-12 17:11                                       ` J. Bruce Fields
  0 siblings, 1 reply; 33+ messages in thread
From: OGAWA Hirofumi @ 2012-09-12 17:03 UTC (permalink / raw)
  To: J. Bruce Fields
  Cc: Namjae Jeon, Steven J. Magnani, Al Viro, akpm, linux-kernel,
	Namjae Jeon, Ravishankar N, Amit Sahrawat

"J. Bruce Fields" <bfields@fieldses.org> writes:

>> > Supposing, the server/client state is after cold boot, and client try to
>> > rename at first without any cache on client/server.
>> >
>> > Even if this state, does the server return ESTALE? If it doesn't return
>> > ESTALE, I can't understand why it is really unfixable.
>> Hi OGAWA.
>> Server will not return ESTALE in this case. because the client does
>> not have any information for files yet.
>
> It does if the client mounted before the server rebooted.  NFS is
> designed so that servers can reboot without causing clients to fail.
> (Applications will just see a delay during the reboot.)
>
> It probably isn't possible to this work in the case of fat.
>
> But from fat's point of view there probably isn't much difference
> between a filehandle lookup after a reboot and a filehandle lookup after
> the inode's gone from cache.

This is talking about retrying on the client side, not a server-side solution.
What happens if the client retries after getting ESTALE? (Yes, this would not
be the solution for all NFS clients. But I guess this solution can work for
the Linux NFS client.)

> I really don't see what you can do to help here.  Won't anything that
> allows looking up an uncached inode by filehandle also risk finding the
> wrong file?

From another point of view (as a server-side solution), we think it is
possible to make the file handle stable on FAT if we disable some
operations (e.g. rename(), unlink()) which change the inode location in FAT.

Yes, it would be stable, but only a limited set of operations would be
supported.

This is the server-side solution, and we are comparing it with the client
solution.

>> > If it returns ESTALE, why does it return? I'm assuming the previous code
>> > path is the cached FH path.
>> The main point for observation is the file handle-which is used for
>> all the NFS operation.
>> So for all the NFS operation(read/write....) which makes use of the
>> NFS file handle in between if there is a change in inode number
>> It will result in ESTALE.
>> Changing inode number on rename happened at NFS server by inode cache
>> eviction with memory pressure.
>> 
>> lookupcache is used at NFS client to reduce number of LOOKUP operations.
>> But , we can still get ESTALE if inode number at NFS Server change
>> after LOOKUP, although lookupcache is disable.
>> 
>> LOOKUP return NFS FH->[inode number changed at NFS Server] ->
>> But we still use old NFS FH returned from LOOKUP for any file
>> operation(write,read,etc..)
>> -> ESTALE will be returned.

Yes.  And as the client-side solution I'm expecting:

-> ESTALE will be returned -> discard the old FH -> restart from LOOKUP ->
make a cached inode -> use the returned new FH.

Yeah, I know this is unstable (there is no perfect solution for now),
but if this works, it doesn't limit any operations.

We want to compare the client solution (-mm) and the server solution
(stable ino). Or I'd like to know which parts of my knowledge/understanding
are wrong here.
-- 
OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 1/5] fat: allocate persistent inode numbers
  2012-09-12 17:03                                     ` OGAWA Hirofumi
@ 2012-09-12 17:11                                       ` J. Bruce Fields
  2012-09-12 17:38                                         ` OGAWA Hirofumi
  0 siblings, 1 reply; 33+ messages in thread
From: J. Bruce Fields @ 2012-09-12 17:11 UTC (permalink / raw)
  To: OGAWA Hirofumi
  Cc: Namjae Jeon, Steven J. Magnani, Al Viro, akpm, linux-kernel,
	Namjae Jeon, Ravishankar N, Amit Sahrawat

On Thu, Sep 13, 2012 at 02:03:51AM +0900, OGAWA Hirofumi wrote:
> "J. Bruce Fields" <bfields@fieldses.org> writes:
> 
> >> > Supposing, the server/client state is after cold boot, and client try to
> >> > rename at first without any cache on client/server.
> >> >
> >> > Even if this state, does the server return ESTALE? If it doesn't return
> >> > ESTALE, I can't understand why it is really unfixable.
> >> Hi OGAWA.
> >> Server will not return ESTALE in this case. because the client does
> >> not have any information for files yet.
> >
> > It does if the client mounted before the server rebooted.  NFS is
> > designed so that servers can reboot without causing clients to fail.
> > (Applications will just see a delay during the reboot.)
> >
> > It probably isn't possible to this work in the case of fat.
> >
> > But from fat's point of view there probably isn't much difference
> > between a filehandle lookup after a reboot and a filehandle lookup after
> > the inode's gone from cache.
> 
> This is talking about to retry by client side, not server side solution.
> What happens if client retry after got ESTALE? (Yes, this would not be
> the solution for all NFS clients. But, I guess this solution can be for
> linux NFS client.)
> 
> > I really don't see what you can do to help here.  Won't anything that
> > allows looking up an uncached inode by filehandle also risk finding the
> > wrong file?
> 
> On other view (as server side solution), we are thinking there is
> possible to make the stable filehandle on FAT if we disabled some
> operations (e.g. rename(), unlink()) which change inode location in FAT.
> 
> Yes, it would be stable, but supporting limited operations.
> 
> This is server side solution, and we comparing it with client solution.

Is that useful to anyone?

> >> > If it returns ESTALE, why does it return? I'm assuming the previous code
> >> > path is the cached FH path.
> >> The main point for observation is the file handle-which is used for
> >> all the NFS operation.
> >> So for all the NFS operation(read/write....) which makes use of the
> >> NFS file handle in between if there is a change in inode number
> >> It will result in ESTALE.
> >> Changing inode number on rename happened at NFS server by inode cache
> >> eviction with memory pressure.
> >> 
> >> lookupcache is used at NFS client to reduce number of LOOKUP operations.
> >> But , we can still get ESTALE if inode number at NFS Server change
> >> after LOOKUP, although lookupcache is disable.
> >> 
> >> LOOKUP return NFS FH->[inode number changed at NFS Server] ->
> >> But we still use old NFS FH returned from LOOKUP for any file
> >> operation(write,read,etc..)
> >> -> ESTALE will be returned.
> 
> Yes.  And I'm expecting as client side solution,
> 
> -> ESTALE will be returned -> discard old FH -> restart from LOOKUP ->
> make cached inode -> use returned new FH.
> 
> Yeah, I know this is unstable (there is no perfect solution for now),

You may end up with a totally different file, of course:

	client:			server:

	open "/foo/bar"
				rename "/foo/baz"->"/foo/bar"
	write to file

And now we're writing to the file that was originally named /foo/baz
when we should have gotten ESTALE.

--b.

> but if this works, this doesn't limit any operations.
> 
> We would want to compare client solution (-mm) and server solution
> (stable ino). Or I'd like to know which my knowledges/understanding are
> wrong here.
> -- 
> OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 1/5] fat: allocate persistent inode numbers
  2012-09-12 17:11                                       ` J. Bruce Fields
@ 2012-09-12 17:38                                         ` OGAWA Hirofumi
  2012-09-12 17:45                                           ` J. Bruce Fields
  0 siblings, 1 reply; 33+ messages in thread
From: OGAWA Hirofumi @ 2012-09-12 17:38 UTC (permalink / raw)
  To: J. Bruce Fields
  Cc: Namjae Jeon, Steven J. Magnani, Al Viro, akpm, linux-kernel,
	Namjae Jeon, Ravishankar N, Amit Sahrawat

"J. Bruce Fields" <bfields@fieldses.org> writes:

>> On other view (as server side solution), we are thinking there is
>> possible to make the stable filehandle on FAT if we disabled some
>> operations (e.g. rename(), unlink()) which change inode location in FAT.
>> 
>> Yes, it would be stable, but supporting limited operations.
>> 
>> This is server side solution, and we comparing it with client solution.
>
> Is that useful to anyone?

Good question. I'm not sure, though; Namjae is asking. And I was asked
about a stable read-only export in the past.

>> >> LOOKUP return NFS FH->[inode number changed at NFS Server] ->
>> >> But we still use old NFS FH returned from LOOKUP for any file
>> >> operation(write,read,etc..)
>> >> -> ESTALE will be returned.
>> 
>> Yes.  And I'm expecting as client side solution,
>> 
>> -> ESTALE will be returned -> discard old FH -> restart from LOOKUP ->
>> make cached inode -> use returned new FH.
>> 
>> Yeah, I know this is unstable (there is no perfect solution for now),
>
> You may end up with a totally different file, of course:
>
> 	client:			server:
>
> 	open "/foo/bar"
> 				rename "/foo/baz"->"/foo/bar"
> 	write to file
>
> And now we're writing to the file that was originally named /foo/baz
> when we should have gotten ESTALE.

I see. So, client can't solve the ESTALE if inode cache was evicted,
right? (without application changes)
-- 
OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 1/5] fat: allocate persistent inode numbers
  2012-09-12 17:38                                         ` OGAWA Hirofumi
@ 2012-09-12 17:45                                           ` J. Bruce Fields
  2012-09-12 18:49                                             ` OGAWA Hirofumi
  0 siblings, 1 reply; 33+ messages in thread
From: J. Bruce Fields @ 2012-09-12 17:45 UTC (permalink / raw)
  To: OGAWA Hirofumi
  Cc: Namjae Jeon, Steven J. Magnani, Al Viro, akpm, linux-kernel,
	Namjae Jeon, Ravishankar N, Amit Sahrawat

On Thu, Sep 13, 2012 at 02:38:11AM +0900, OGAWA Hirofumi wrote:
> "J. Bruce Fields" <bfields@fieldses.org> writes:
> 
> >> On other view (as server side solution), we are thinking there is
> >> possible to make the stable filehandle on FAT if we disabled some
> >> operations (e.g. rename(), unlink()) which change inode location in FAT.
> >> 
> >> Yes, it would be stable, but supporting limited operations.
> >> 
> >> This is server side solution, and we comparing it with client solution.
> >
> > Is that useful to anyone?
> 
> Good question. I'm not sure though, Namjae is asking. And I was asked
> about stable read-only export in past.
> 
> >> >> LOOKUP return NFS FH->[inode number changed at NFS Server] ->
> >> >> But we still use old NFS FH returned from LOOKUP for any file
> >> >> operation(write,read,etc..)
> >> >> -> ESTALE will be returned.
> >> 
> >> Yes.  And I'm expecting as client side solution,
> >> 
> >> -> ESTALE will be returned -> discard old FH -> restart from LOOKUP ->
> >> make cached inode -> use returned new FH.
> >> 
> >> Yeah, I know this is unstable (there is no perfect solution for now),
> >
> > You may end up with a totally different file, of course:
> >
> > 	client:			server:
> >
> > 	open "/foo/bar"
> > 				rename "/foo/baz"->"/foo/bar"
> > 	write to file
> >
> > And now we're writing to the file that was originally named /foo/baz
> > when we should have gotten ESTALE.
> 
> I see. So, client can't solve the ESTALE if inode cache was evicted,
> right? (without application changes)

I don't see how.


As another server-side workaround: maybe they could try tuning the inode
caching to make eviction less likely?

Grepping around...  Documentation/sysctl/vm.txt mentions a
vfs_cache_pressure parameter.
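
Something like the below could set it from an init program on the embedded
box (the value 50 is only an example; writing the sysctl with echo or
sysctl(8) does the same thing):

-------------------------------
/* Example only: lower vm.vfs_cache_pressure so the VM prefers to keep
 * dentry/inode caches around (default is 100; 50 is an arbitrary choice). */
#include <stdio.h>

int main(void)
{
	FILE *f = fopen("/proc/sys/vm/vfs_cache_pressure", "w");

	if (!f)
		return 1;
	fprintf(f, "50\n");
	return fclose(f) ? 1 : 0;
}
-------------------------------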

--b.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 1/5] fat: allocate persistent inode numbers
  2012-09-12 17:45                                           ` J. Bruce Fields
@ 2012-09-12 18:49                                             ` OGAWA Hirofumi
  2012-09-13  8:11                                               ` Namjae Jeon
  0 siblings, 1 reply; 33+ messages in thread
From: OGAWA Hirofumi @ 2012-09-12 18:49 UTC (permalink / raw)
  To: J. Bruce Fields
  Cc: Namjae Jeon, Steven J. Magnani, Al Viro, akpm, linux-kernel,
	Namjae Jeon, Ravishankar N, Amit Sahrawat

"J. Bruce Fields" <bfields@fieldses.org> writes:

> As another server-side workaround: maybe they could try tuning the inode
> caching to make eviction less likely?
>
> Grepping around...  Documentation/sysctl/vm.txt mentions a
> vfs_cache_pressure parameter.

Yeah.  And as a dirty hack, it would also be possible to adjust
sb->s_shrink.batch.
-- 
OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 1/5] fat: allocate persistent inode numbers
  2012-09-12 18:49                                             ` OGAWA Hirofumi
@ 2012-09-13  8:11                                               ` Namjae Jeon
  2012-09-13  8:33                                                 ` OGAWA Hirofumi
  0 siblings, 1 reply; 33+ messages in thread
From: Namjae Jeon @ 2012-09-13  8:11 UTC (permalink / raw)
  To: OGAWA Hirofumi
  Cc: J. Bruce Fields, Steven J. Magnani, Al Viro, akpm, linux-kernel,
	Namjae Jeon, Ravishankar N, Amit Sahrawat

> >> -> ESTALE will be returned -> discard old FH -> restart from LOOKUP ->
> >> make cached inode -> use returned new FH.
> >>
> >> Yeah, I know this is unstable (there is no perfect solution for now),
> >
> > You may end up with a totally different file, of course:
> >
> >     client:                 server:
> >
> >     open "/foo/bar"
> >                             rename "/foo/baz"->"/foo/bar"
> >     write to file
> >
> > And now we're writing to the file that was originally named /foo/baz
> > when we should have gotten ESTALE.
>
> I see. So, client can't solve the ESTALE if inode cache was evicted,
> right? (without application changes)

There can be situations where we may get not only ESTALE but EIO as well.

For example,
-------------------------------
fd = open("foo.txt");
while (1) {
       sleep(1);
       write(fd, ...);
}
-------------------------------
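
(For reference, a standalone version of the same loop; the file name, open
flags and record size here are placeholders, not the exact test program:)

-------------------------------
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
	char buf[4096];
	int fd;

	memset(buf, 'a', sizeof(buf));
	fd = open("foo.txt", O_CREAT | O_WRONLY, 0644);
	if (fd < 0)
		return 1;

	while (1) {
		sleep(1);
		if (write(fd, buf, sizeof(buf)) < 0) {
			/* On an NFS mount of the FAT export this eventually
			 * fails once the server-side inode number changes. */
			fprintf(stderr, "write failed: errno = %d (%s)\n",
				errno, strerror(errno));
			return 1;
		}
	}
}
-------------------------------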

Here "write" may fail when the inode number of "foo.txt" changes at the
server due to inode cache eviction under memory pressure.
When we tried a similar test, we found that "write" was returning "EIO"
instead of "ESTALE":

---------------------------------------------------------------------------------------------------------
#> ./write_test_dbg bbb 1000 0
FILE : bbb, SIZE : 1048576000 , FSYNC : OFF , RECORD_SIZE = 4096
106264 -rwxr-xr-x 1 root 0 0 Jan 1 00:14 bbb
write failed after 60080128 bytes:, errno = 5: Input/output error
---------------------------------------------------------------------------------------------------------

As we get EIO instead of ESTALE, it may be difficult to decide when to
"restart from LOOKUP" in such a situation.
Also, as per Bruce's opinion, we cannot avoid ESTALE caused by an inode
number change in the rebooted-server case.
The reboot case is the worst one, as the client may end up writing to a
different file if the NFS handle held by the NFS client matches the inode
number of some other file on the NFS server.

> We would want to compare client solution (-mm) and server solution
> (stable ino). Or I'd like to know which my knowledges/understanding are
> wrong here.
> I see. So, client can't solve the ESTALE if inode cache was evicted,
> right? (without application changes)
> I don't see how.

Yes, I think we cannot fix the inode-number-change issue in these two
situations (reboot, inode cache eviction).
And the inode number can change because FAT currently does not allocate
stable inode numbers in those situations.

> Grepping around... Documentation/sysctl/vm.txt mentions a
> vfs_cache_pressure parameter.
> Yeah. And dirty hack will be possible to adjust sb->s_shrink.batch.
I am worried that it could lead to an OOM condition on an embedded
system (small DRAM, while supporting a big 3TB HDD).

Please let me know if there are any issues or queries.

Thanks.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 1/5] fat: allocate persistent inode numbers
  2012-09-13  8:11                                               ` Namjae Jeon
@ 2012-09-13  8:33                                                 ` OGAWA Hirofumi
  2012-09-13 11:20                                                   ` J. Bruce Fields
  0 siblings, 1 reply; 33+ messages in thread
From: OGAWA Hirofumi @ 2012-09-13  8:33 UTC (permalink / raw)
  To: Namjae Jeon
  Cc: J. Bruce Fields, Steven J. Magnani, Al Viro, akpm, linux-kernel,
	Namjae Jeon, Ravishankar N, Amit Sahrawat

Namjae Jeon <linkinjeon@gmail.com> writes:

>> I see. So, client can't solve the ESTALE if inode cache was evicted,
>> right? (without application changes)
>
> There can be situation where we may get not only ESTALE but EIO also.
>
> For example,
> -------------------------------
> fd = open(“foo.txt”);
> while (1) {
>        sleep(1);
>        write(fd..);
> }
> --------------------------------
>
> Here “write” may fail when inode number of “foo.txt” is changed at
> server due to cache eviction under memory pressure.
> When we tried a similar test, we found that “write” is retuning “EIO”
> instead of “ESTALE”
>
> ---------------------------------------------------------------------------------------------------------
> #> ./write_test_dbg bbb 1000 0
> FILE : bbb, SIZE : 1048576000 , FSYNC : OFF , RECORD_SIZE = 4096
> 106264 -rwxr-xr-x 1 root 0 0 Jan 1 00:14 bbb
> write failed after 60080128 bytes:, errno = 5: Input/output error
> ---------------------------------------------------------------------------------------------------------
>
>  As we get EIO instead of ESTALE, it may be difficult to decide when
> "restart from LOOKUP” in such situation.
> Also, as per Bruce opinion, we can not avoid ESTALE from inode number
> change in rebooted server case.
> In reboot case, it is worst as it may attempt to write in a different
> file if NFS handle at NFS client match with inode number of some other
> file at NFS server.

I see.

>> Grepping around... Documentation/sysctl/vm.txt mentions a
>> vfs_cache_pressure parameter.
>> Yeah. And dirty hack will be possible to adjust sb->s_shrink.batch.
> I am worrying if it could lead to OOM condition on embedded
> system(short memory(DRAM) and support 3TB HDD disk of big size.)
>
> Please let me know if any issues or queries.

So, now I think a stable inode number may be useful if there are users
for it.  And I guess that functionality has no collisions with -mm.  And I
suppose we can add two modes for the "nfs" option (e.g. nfs=1 and nfs=2):

If nfs=1, it works like the current -mm behavior, with no limited operations.
If nfs=2, it tries to make a stable FH and limits some operations.

(The option name doesn't matter here.)

Does this work fine?
-- 
OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 1/5] fat: allocate persistent inode numbers
  2012-09-13  8:33                                                 ` OGAWA Hirofumi
@ 2012-09-13 11:20                                                   ` J. Bruce Fields
  2012-09-13 12:17                                                     ` OGAWA Hirofumi
  0 siblings, 1 reply; 33+ messages in thread
From: J. Bruce Fields @ 2012-09-13 11:20 UTC (permalink / raw)
  To: OGAWA Hirofumi
  Cc: Namjae Jeon, Steven J. Magnani, Al Viro, akpm, linux-kernel,
	Namjae Jeon, Ravishankar N, Amit Sahrawat

On Thu, Sep 13, 2012 at 05:33:02PM +0900, OGAWA Hirofumi wrote:
> Namjae Jeon <linkinjeon@gmail.com> writes:
> 
> >> I see. So, client can't solve the ESTALE if inode cache was evicted,
> >> right? (without application changes)
> >
> > There can be situation where we may get not only ESTALE but EIO also.
> >
> > For example,
> > -------------------------------
> > fd = open(“foo.txt”);
> > while (1) {
> >        sleep(1);
> >        write(fd..);
> > }
> > --------------------------------
> >
> > Here “write” may fail when inode number of “foo.txt” is changed at
> > server due to cache eviction under memory pressure.
> > When we tried a similar test, we found that “write” is retuning “EIO”
> > instead of “ESTALE”
> >
> > ---------------------------------------------------------------------------------------------------------
> > #> ./write_test_dbg bbb 1000 0
> > FILE : bbb, SIZE : 1048576000 , FSYNC : OFF , RECORD_SIZE = 4096
> > 106264 -rwxr-xr-x 1 root 0 0 Jan 1 00:14 bbb
> > write failed after 60080128 bytes:, errno = 5: Input/output error
> > ---------------------------------------------------------------------------------------------------------
> >
> >  As we get EIO instead of ESTALE, it may be difficult to decide when
> > "restart from LOOKUP” in such situation.
> > Also, as per Bruce opinion, we can not avoid ESTALE from inode number
> > change in rebooted server case.
> > In reboot case, it is worst as it may attempt to write in a different
> > file if NFS handle at NFS client match with inode number of some other
> > file at NFS server.
> 
> I see.
> 
> >> Grepping around... Documentation/sysctl/vm.txt mentions a
> >> vfs_cache_pressure parameter.
> >> Yeah. And dirty hack will be possible to adjust sb->s_shrink.batch.
> > I am worrying if it could lead to OOM condition on embedded
> > system(short memory(DRAM) and support 3TB HDD disk of big size.)
> >
> > Please let me know if any issues or queries.
> 
> So, now I think stable inode number may be useful if there are users of
> it. And I guess those functionality is no collisions with -mm. And I
> suppose we can add two modes for "nfs" option (e.g. nfs=1 and nfs=2).
> 
> If nfs=1, works like current -mm without no limited operations.

Apologies, I haven't been following the conversation carefully: remind
me what "works like current -mm" means?

--b.

> If nfs=2, try to make stable FH and limit some operations
> 
> (option name doesn't matter here.)
> 
> Does this work fine?
> -- 
> OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 1/5] fat: allocate persistent inode numbers
  2012-09-13 11:20                                                   ` J. Bruce Fields
@ 2012-09-13 12:17                                                     ` OGAWA Hirofumi
  2012-09-13 14:24                                                       ` Namjae Jeon
  0 siblings, 1 reply; 33+ messages in thread
From: OGAWA Hirofumi @ 2012-09-13 12:17 UTC (permalink / raw)
  To: J. Bruce Fields
  Cc: Namjae Jeon, Steven J. Magnani, Al Viro, akpm, linux-kernel,
	Namjae Jeon, Ravishankar N, Amit Sahrawat

"J. Bruce Fields" <bfields@fieldses.org> writes:

>> >> Grepping around... Documentation/sysctl/vm.txt mentions a
>> >> vfs_cache_pressure parameter.
>> >> Yeah. And dirty hack will be possible to adjust sb->s_shrink.batch.
>> > I am worrying if it could lead to OOM condition on embedded
>> > system(short memory(DRAM) and support 3TB HDD disk of big size.)
>> >
>> > Please let me know if any issues or queries.
>> 
>> So, now I think stable inode number may be useful if there are users of
>> it. And I guess those functionality is no collisions with -mm. And I
>> suppose we can add two modes for "nfs" option (e.g. nfs=1 and nfs=2).
>> 
>> If nfs=1, works like current -mm without no limited operations.
>
> Apologies, I haven't been following the conversation carefully: remind
> me what "works like current -mm" means?

Current -mm means best-effort: it works only while the inode cache is not
evicted.  I.e. if the inode is no longer cached on the server, the server
would return ESTALE.  So I guess the behavior would be relatively
unstable.

Thanks.

>> If nfs=2, try to make stable FH and limit some operations
>> 
>> (option name doesn't matter here.)
>> 
>> Does this work fine?
-- 
OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 1/5] fat: allocate persistent inode numbers
  2012-09-13 12:17                                                     ` OGAWA Hirofumi
@ 2012-09-13 14:24                                                       ` Namjae Jeon
  2012-09-13 14:46                                                         ` J. Bruce Fields
  0 siblings, 1 reply; 33+ messages in thread
From: Namjae Jeon @ 2012-09-13 14:24 UTC (permalink / raw)
  To: OGAWA Hirofumi
  Cc: J. Bruce Fields, Steven J. Magnani, Al Viro, akpm, linux-kernel,
	Namjae Jeon, Ravishankar N, Amit Sahrawat

2012/9/13, OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>:
> "J. Bruce Fields" <bfields@fieldses.org> writes:
>
>>> >> Grepping around... Documentation/sysctl/vm.txt mentions a
>>> >> vfs_cache_pressure parameter.
>>> >> Yeah. And dirty hack will be possible to adjust sb->s_shrink.batch.
>>> > I am worrying if it could lead to OOM condition on embedded
>>> > system(short memory(DRAM) and support 3TB HDD disk of big size.)
>>> >
>>> > Please let me know if any issues or queries.
>>>
>>> So, now I think stable inode number may be useful if there are users of
>>> it. And I guess those functionality is no collisions with -mm. And I
>>> suppose we can add two modes for "nfs" option (e.g. nfs=1 and nfs=2).
>>>
>>> If nfs=1, works like current -mm without no limited operations.
>>
>> Apologies, I haven't been following the conversation carefully: remind
>> me what "works like current -mm" means?
>
> Current -mm means the best-effort work only if inode cache is not
> evicted.  I.e. if there is no inode cache anymore on server, server
> would return ESTALE. So I guess the behavior would not be stable
> relatively.
Hi OGAWA.
Sorry for the late response.
Okay, I will resend the patchset including your suggestion (-o nfs=2).
Do you mind adding the busy-list patch to avoid the unlink issue?
And in the case of rename, FAT returns EBUSY while the file is open,
so we can limit only rename.
Let me know your opinion.

Thanks OGAWA!

>
> Thanks.
>
>>> If nfs=2, try to make stable FH and limit some operations
>>>
>>> (option name doesn't matter here.)
>>>
>>> Does this work fine?
> --
> OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 1/5] fat: allocate persistent inode numbers
  2012-09-13 14:24                                                       ` Namjae Jeon
@ 2012-09-13 14:46                                                         ` J. Bruce Fields
  2012-09-13 15:34                                                           ` OGAWA Hirofumi
  0 siblings, 1 reply; 33+ messages in thread
From: J. Bruce Fields @ 2012-09-13 14:46 UTC (permalink / raw)
  To: Namjae Jeon
  Cc: OGAWA Hirofumi, Steven J. Magnani, Al Viro, akpm, linux-kernel,
	Namjae Jeon, Ravishankar N, Amit Sahrawat

On Thu, Sep 13, 2012 at 11:24:30PM +0900, Namjae Jeon wrote:
> 2012/9/13, OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>:
> > "J. Bruce Fields" <bfields@fieldses.org> writes:
> >
> >>> >> Grepping around... Documentation/sysctl/vm.txt mentions a
> >>> >> vfs_cache_pressure parameter.
> >>> >> Yeah. And dirty hack will be possible to adjust sb->s_shrink.batch.
> >>> > I am worrying if it could lead to OOM condition on embedded
> >>> > system(short memory(DRAM) and support 3TB HDD disk of big size.)
> >>> >
> >>> > Please let me know if any issues or queries.
> >>>
> >>> So, now I think stable inode number may be useful if there are users of
> >>> it. And I guess those functionality is no collisions with -mm. And I
> >>> suppose we can add two modes for "nfs" option (e.g. nfs=1 and nfs=2).
> >>>
> >>> If nfs=1, works like current -mm without no limited operations.
> >>
> >> Apologies, I haven't been following the conversation carefully: remind
> >> me what "works like current -mm" means?
> >
> > Current -mm means the best-effort work only if inode cache is not
> > evicted.  I.e. if there is no inode cache anymore on server, server
> > would return ESTALE. So I guess the behavior would not be stable
> > relatively.
> Hi OGAWA.
> Sorry for late response.
> Okay, I will resend patchset include your suggeston.(-o nfs=2)
> Do you mind adding busy list patch to avoid unlink issue ?
> And in case of rename, FAT retrun EBUSY while opening file.
> We can limit only rename.

The server doesn't necessarily know whether a client has the file open,
so does that really help?

--b.

> Let me know your opinion.
> 
> Thanks OGAWA!
> 
> >
> > Thanks.
> >
> >>> If nfs=2, try to make stable FH and limit some operations
> >>>
> >>> (option name doesn't matter here.)
> >>>
> >>> Does this work fine?
> > --
> > OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
> >

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 1/5] fat: allocate persistent inode numbers
  2012-09-13 14:46                                                         ` J. Bruce Fields
@ 2012-09-13 15:34                                                           ` OGAWA Hirofumi
  2012-09-14  8:51                                                             ` Namjae Jeon
  0 siblings, 1 reply; 33+ messages in thread
From: OGAWA Hirofumi @ 2012-09-13 15:34 UTC (permalink / raw)
  To: J. Bruce Fields
  Cc: Namjae Jeon, Steven J. Magnani, Al Viro, akpm, linux-kernel,
	Namjae Jeon, Ravishankar N, Amit Sahrawat

"J. Bruce Fields" <bfields@fieldses.org> writes:

>> > Current -mm means the best-effort work only if inode cache is not
>> > evicted.  I.e. if there is no inode cache anymore on server, server
>> > would return ESTALE. So I guess the behavior would not be stable
>> > relatively.
>> Hi OGAWA.
>> Sorry for late response.
>> Okay, I will resend patchset include your suggeston.(-o nfs=2)
>> Do you mind adding busy list patch to avoid unlink issue ?
>> And in case of rename, FAT retrun EBUSY while opening file.
>> We can limit only rename.
>
> The server doesn't necessarily know whether a client has the file open,
> so does that really help?

If you are assuming rename() can still be done somehow, it would not be
true.  And if so, you might want to rethink the patch.  (BTW, I haven't
thought about it deeply yet, but I guess we would have to limit unlink()
too.)

Well, anyway, I'd like to see stable/clean read-only support at first
(with the new nfs option, and with MS_RDONLY).  After that, we can
enable write support.
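
I.e. usage would look roughly like this (nfs=2 is only the option value
proposed earlier in this thread, and the device/mountpoint are
placeholders):

-------------------------------
/* Sketch of the proposed read-only, stable-FH export setup.
 * "nfs=2" is only the value proposed in this thread; the device and
 * mount point are placeholders. */
#include <sys/mount.h>

int mount_fat_for_export(void)
{
	return mount("/dev/sdb1", "/mnt/fat", "vfat", MS_RDONLY, "nfs=2");
}
-------------------------------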
-- 
OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 1/5] fat: allocate persistent inode numbers
  2012-09-13 15:34                                                           ` OGAWA Hirofumi
@ 2012-09-14  8:51                                                             ` Namjae Jeon
  0 siblings, 0 replies; 33+ messages in thread
From: Namjae Jeon @ 2012-09-14  8:51 UTC (permalink / raw)
  To: OGAWA Hirofumi
  Cc: J. Bruce Fields, Steven J. Magnani, Al Viro, akpm, linux-kernel,
	Namjae Jeon, Ravishankar N, Amit Sahrawat

2012/9/14, OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>:
> "J. Bruce Fields" <bfields@fieldses.org> writes:
>
>>> > Current -mm means the best-effort work only if inode cache is not
>>> > evicted.  I.e. if there is no inode cache anymore on server, server
>>> > would return ESTALE. So I guess the behavior would not be stable
>>> > relatively.
>>> Hi OGAWA.
>>> Sorry for late response.
>>> Okay, I will resend patchset include your suggeston.(-o nfs=2)
>>> Do you mind adding busy list patch to avoid unlink issue ?
>>> And in case of rename, FAT retrun EBUSY while opening file.
>>> We can limit only rename.
>>
>> The server doesn't necessarily know whether a client has the file open,
>> so does that really help?
>
> If you are assuming to do rename() somehow, it would not be true. And if
> so, you might want to rethink about the patch. (btw, I'm not thinking
> deeply yet though, I guess we have to limit unlink() too)
Hi OGAWA.
Sorry, there are some unclear things in the unlink solution.
When we tested it before, it did really help.
So we will check the unlink solution further and, once fully convinced,
will send it later as a separate patch, with changelog updates, when
enabling write support.
>
> Well, anyway, I'd like to see stable/clean read-only support at first.
> (with new nfs option, and with MS_RDONLY).  After that, we can
> enable write support.
Okay, I will send the stable-ino (read-only) patch soon.

Thanks a lot!
> --
> OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
>

^ permalink raw reply	[flat|nested] 33+ messages in thread

end of thread, other threads:[~2012-09-14  8:51 UTC | newest]

Thread overview: 33+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-09-04 15:57 [PATCH v2 1/5] fat: allocate persistent inode numbers Namjae Jeon
2012-09-04 16:17 ` Al Viro
2012-09-05 14:08   ` Namjae Jeon
2012-09-05 14:56     ` OGAWA Hirofumi
2012-09-06  6:46       ` Namjae Jeon
2012-09-06 12:19         ` OGAWA Hirofumi
2012-09-06 13:39           ` Namjae Jeon
2012-09-07  7:01             ` Namjae Jeon
2012-09-07 12:15               ` Steven J. Magnani
2012-09-09  9:32                 ` OGAWA Hirofumi
2012-09-09 11:29                   ` OGAWA Hirofumi
2012-09-10 12:03                     ` Namjae Jeon
2012-09-10 14:00                       ` OGAWA Hirofumi
2012-09-11 12:00                         ` Namjae Jeon
2012-09-11 12:31                           ` OGAWA Hirofumi
2012-09-11 15:13                             ` Namjae Jeon
2012-09-11 15:47                               ` OGAWA Hirofumi
2012-09-12 14:12                                 ` Namjae Jeon
2012-09-12 14:32                                   ` J. Bruce Fields
2012-09-12 17:03                                     ` OGAWA Hirofumi
2012-09-12 17:11                                       ` J. Bruce Fields
2012-09-12 17:38                                         ` OGAWA Hirofumi
2012-09-12 17:45                                           ` J. Bruce Fields
2012-09-12 18:49                                             ` OGAWA Hirofumi
2012-09-13  8:11                                               ` Namjae Jeon
2012-09-13  8:33                                                 ` OGAWA Hirofumi
2012-09-13 11:20                                                   ` J. Bruce Fields
2012-09-13 12:17                                                     ` OGAWA Hirofumi
2012-09-13 14:24                                                       ` Namjae Jeon
2012-09-13 14:46                                                         ` J. Bruce Fields
2012-09-13 15:34                                                           ` OGAWA Hirofumi
2012-09-14  8:51                                                             ` Namjae Jeon
2012-09-10 12:28                   ` Steven J. Magnani

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).