From: Namjae Jeon <linkinjeon@gmail.com>
To: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Cc: "Steven J. Magnani" <steve@digidescorp.com>,
	Al Viro <viro@zeniv.linux.org.uk>,
	akpm@linux-foundation.org, bfields@fieldses.org,
	linux-kernel@vger.kernel.org,
	Namjae Jeon <namjae.jeon@samsung.com>,
	Ravishankar N <ravi.n1@samsung.com>,
	Amit Sahrawat <a.sahrawat@samsung.com>
Subject: Re: [PATCH v2 1/5] fat: allocate persistent inode numbers
Date: Tue, 11 Sep 2012 21:00:07 +0900
Message-ID: <CAKYAXd-ZNmCbHqmFX=YXti7XprhD_Hgvmnd=TnKPVoFc88Sc4A@mail.gmail.com>
In-Reply-To: <87har6kmfx.fsf@devron.myhome.or.jp>

2012/9/10, OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>:
> Namjae Jeon <linkinjeon@gmail.com> writes:
>
>> Yes, it is true (current VFAT in the -mm tree is not stable). Although we
>> set lookupcache=none while mounting, an ESTALE error can still occur in
>> the rename case.
>> So the ESTALE issue from the rename case still remains on the current -mm
>> tree.
>> Please see the following steps:
>> 1. On the client, write to a file.
>> 2. On the client, move/rename the file.
>> 3. On the server, run drop_caches etc. to evict the inode so
>> that it gets a new inode number.
>> 4. On the client, resume the program writing to the file. The write will
>> fail (write: Stale NFS file handle)
>
Hi OGAWA.
> Since rename() will be disabled on stable ino patches, this will be
> unfixable, so rather maybe it is worse.
With our patchset, the only remaining issue is rename. We could not
find a correct way to avoid it: if we do not update it immediately
when i_pos changes, we are only delaying the problem. As a rename
limitation, we can return EBUSY when rename is called while a process
has the file open. Without our patchset, the rename issue can also
occur on file access over NFS, when the inode is evicted from the
server's inode cache.
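
A minimal userspace sketch of the rename limitation described above,
assuming an open count is tracked per file. struct model_file and
model_rename() are hypothetical names used only for illustration; they
are not part of the patchset or the kernel.

```c
#include <errno.h>

/* Hypothetical model of a file: only the number of openers matters here. */
struct model_file {
	unsigned int open_count;
};

/*
 * Refuse to rename a file that some process still has open.
 * Returns 0 if the rename may proceed, -EBUSY otherwise.
 */
static int model_rename(const struct model_file *f)
{
	if (f->open_count > 0)
		return -EBUSY;
	return 0;
}
```

The trade-off is that a client holding the file open never sees its
handle go stale because of a rename, at the cost of rename failing for
open files.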
>
> Did you check why it returns -ESTALE?  Or is the rename() issue also
> unfixable on -mm?
It is reproducible regardless of whether lookupcache is enabled or
disabled. The inode is not found in the server's inode cache, so
d_obtain_alias(inode) is called with a NULL inode and returns ESTALE.
The call path is as follows:

fh_verify()
  --> nfsd_set_fh_dentry()
  --> exportfs_decode_fh()
  --> nop->fh_to_dentry()
  --> fat_fh_to_dentry()
  --> generic_fh_to_dentry()
  --> get_inode()
  --> fat_nfs_get_inode()

static struct inode *fat_nfs_get_inode(struct super_block *sb,
                                       u64 ino, u32 generation)
{
......
        /* Looks up the inode cache only; returns NULL after eviction. */
        inode = ilookup(sb, ino);
        if (inode && generation && (inode->i_generation != generation)) {
                iput(inode);
                inode = NULL;
        }
        return inode;
}

I think it is unfixable because we cannot know the new i_pos of an
inode that was changed by rename.
Even if we knew it, there is no routine to rebuild the inode in -mm.
Our patches cannot fix it either.
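
To make the failure mode concrete, here is a small userspace model of a
cache-only file handle decode. It is only a sketch: every model_* name
is hypothetical, the fixed-size array stands in for the server inode
cache, and model_cache_evict_all() stands in for eviction (for example
through drop_caches).

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
#define MODEL_CACHE_SLOTS 8

struct model_inode {
	uint64_t ino;
	uint32_t generation;
	int in_use;
};

static struct model_inode model_cache[MODEL_CACHE_SLOTS];

/* Put an inode into the model cache; returns 0, or -ENOSPC if full. */
static int model_cache_add(uint64_t ino, uint32_t generation)
{
	for (size_t i = 0; i < MODEL_CACHE_SLOTS; i++) {
		if (!model_cache[i].in_use) {
			model_cache[i].ino = ino;
			model_cache[i].generation = generation;
			model_cache[i].in_use = 1;
			return 0;
		}
	}
	return -ENOSPC;
}

/* Drop every cached inode, standing in for eviction on the server. */
static void model_cache_evict_all(void)
{
	for (size_t i = 0; i < MODEL_CACHE_SLOTS; i++)
		model_cache[i].in_use = 0;
}

/* Cache-only lookup: returns NULL when the inode is not cached. */
static struct model_inode *model_ilookup(uint64_t ino)
{
	for (size_t i = 0; i < MODEL_CACHE_SLOTS; i++) {
		if (model_cache[i].in_use && model_cache[i].ino == ino)
			return &model_cache[i];
	}
	return NULL;
}

/*
 * Decode an (ino, generation) handle using the cache only.
 * Returns 0 if the inode is found, -ESTALE otherwise.
 */
static int model_decode_fh(uint64_t ino, uint32_t generation)
{
	struct model_inode *inode = model_ilookup(ino);

	if (inode && generation && inode->generation != generation)
		inode = NULL;
	return inode ? 0 : -ESTALE;
}
```

In this model, a handle that decoded successfully before
model_cache_evict_all() fails with -ESTALE afterwards, because nothing
rebuilds the inode from the information in the handle.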

>
>> And ......
>> If we mount NFS with lookupcache=none, FAT file lookup performance
>> drops severely.
>> LOOKUP performance is very poor on a slow network and a slow device. I
>> do not recommend disabling the lookup cache on NFS.
>> That is why reconstructing the inode is already implemented in other
>> filesystems (e.g. EXT4, XFS etc.).
>> Currently lookupcache is enabled by default in NFS, which means users
>> have already been exposed to ESTALE issues on NFS over VFAT.
>>
>> I agree with you on making NFS over VFAT a read-only filesystem to
>> avoid all issues.
>> Eventually we can make it writable with the rename limitation when we
>> decide that it is stable enough in mainline.
>> So, I suggest adding an 'nfs_ro' mount option instead of the 'nfs' option.
>
> -mm seems to be more stable than I thought. As he said, it sounds like
> rename() is the only known issue on -mm, true?
Yes. With the lookup cache disabled, rename is the only stability issue,
but performance drops severely.
With the lookup cache enabled, -mm has both the ESTALE and the rename issues.
>
> And have you tried the https://lkml.org/lkml/2012/6/29/381 patches? They
> sound like they improve performance by enabling lookupcache.
We tried these patches when we hit the ESTALE issue in -mm.
They do not help: they just retry the system call one more time when
an ESTALE error is returned.
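
A sketch of why a single retry is not enough, based only on the
description above and not on the actual patches. model_op_fn,
model_call_retry_estale() and the two example operations are
hypothetical names for illustration.

```c
#include <errno.h>

/* Hypothetical operation: returns 0 on success or a negative errno. */
typedef int (*model_op_fn)(void *ctx);

/* Call op; if it fails with -ESTALE, retry exactly once. */
static int model_call_retry_estale(model_op_fn op, void *ctx)
{
	int ret = op(ctx);

	if (ret == -ESTALE)
		ret = op(ctx);
	return ret;
}

/* Example: the handle stays stale on every attempt. ctx counts attempts. */
static int model_always_stale(void *ctx)
{
	int *attempts = ctx;

	(*attempts)++;
	return -ESTALE;
}

/* Example: stale on the first attempt only. ctx counts attempts. */
static int model_stale_once(void *ctx)
{
	int *attempts = ctx;

	return (*attempts)++ == 0 ? -ESTALE : 0;
}
```

The retry helps with model_stale_once(), but with model_always_stale()
the caller still gets -ESTALE after two attempts. The second case
corresponds to an evicted inode that the server cannot find again.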

> I'd like to know the critical reason we have to replace it.
To help your decision, I summarize the situation as follows.

1. The lookup cache is enabled by default in NFS, so ESTALE errors can
easily occur in -mm.
2. If the lookup cache is disabled, -mm still has the rename issue, and
file lookup performance drops.
3. If we use our patches, the rename issue remains, but we can use VFAT
over NFS with the lookup cache enabled.
4. If we use read-only with our patches, there is no issue.

Thanks.
>
> Thanks.
> --
> OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
>
