linux-fsdevel.vger.kernel.org archive mirror
* 9p: fscache duplicate cookie
@ 2021-05-06 10:03 Luis Henriques
  2021-05-06 10:45 ` Dominique Martinet
  2021-05-11 12:38 ` David Howells
  0 siblings, 2 replies; 23+ messages in thread
From: Luis Henriques @ 2021-05-06 10:03 UTC (permalink / raw)
  To: Eric Van Hensbergen, Latchesar Ionkov, Dominique Martinet
  Cc: David Howells, linux-fsdevel, v9fs-developer

Hi!

I've been seeing fscache complaining about duplicate cookies in 9p:

 FS-Cache: Duplicate cookie detected
 FS-Cache: O-cookie c=00000000ba929e80 [p=000000002e706df1 fl=226 nc=0 na=1]
 FS-Cache: O-cookie d=0000000000000000 n=0000000000000000
 FS-Cache: O-key=[8] '0312710100000000'
 FS-Cache: N-cookie c=00000000274050fe [p=000000002e706df1 fl=2 nc=0 na=1]
 FS-Cache: N-cookie d=0000000037368b65 n=000000004047ed1f
 FS-Cache: N-key=[8] '0312710100000000'

It's quite easy to reproduce in my environment by running xfstests using
the virtme scripts to boot a test kernel.  A quick look seems to indicate
the warning comes from the v9fs_vfs_atomic_open_dotl() path:

[  383.688975]  __fscache_acquire_cookie+0xd1/0x2d0
[  383.688983]  v9fs_cache_inode_get_cookie.part.0+0xd1/0x100
[  383.688988]  ? v9fs_vfs_link+0x160/0x160
[  383.688994]  v9fs_inode_from_fid_dotl+0x141/0x180 
[  383.688998]  v9fs_vfs_lookup.part.0+0x1ab/0x1f0
[  383.689003]  ? v9fs_vfs_create+0x80/0x80 
[  383.689007]  v9fs_vfs_atomic_open_dotl+0x14f/0x4f0 
[  383.689011]  ? do_raw_spin_unlock+0xa3/0x130
[  383.689016]  ? v9fs_inode_from_fid_dotl+0x180/0x180
[  383.689021]  ? __d_lookup_rcu+0x270/0x270
[  383.689029]  ? down_read+0x13b/0x2c0
[  383.689044]  ? rwsem_down_read_slowpath+0x5b0/0x5b0
[  383.689052]  ? __d_lookup+0xb7/0x220
[  383.689059]  path_openat+0xb33/0x15a0
[  383.689068]  ? path_lookupat.isra.0+0x230/0x230
[  383.689073]  ? orc_find.part.0+0x200/0x200
[  383.689078]  ? ftrace_ops_trampoline+0x51/0x80
[  383.689083]  ? __kernel_text_address+0xe/0x30
[  383.689087]  ? unwind_get_return_address+0x2f/0x50
[  383.689091]  ? set_syscall_user_dispatch+0xb0/0xb0
[  383.689096]  ? arch_stack_walk+0x9e/0xf0
[  383.689101]  do_filp_open+0x136/0x1c0
[  383.689105]  ? may_open_dev+0x50/0x50
[  383.689109]  ? simple_attr_release+0x30/0x30
[  383.689113]  ? do_raw_spin_lock+0x119/0x1d0
[  383.689116]  ? rwlock_bug.part.0+0x60/0x60
[  383.689119]  ? do_raw_spin_unlock+0xa3/0x130
[  383.689123]  ? _raw_spin_unlock+0xa/0x10
[  383.689128]  ? alloc_fd+0x12e/0x290
[  383.689132]  do_sys_openat2+0x2f6/0x420
[  383.689137]  ? file_open_root+0x200/0x200
[  383.689141]  ? rwsem_wake.isra.0+0x100/0x100
[  383.689146]  do_sys_open+0x8a/0xe0
[  383.689150]  ? filp_open+0x50/0x50
[  383.689153]  ? asm_exc_page_fault+0x8/0x30
[  383.689156]  ? __x64_sys_openat+0x3e/0x60
[  383.689159]  do_syscall_64+0x45/0x80
[  383.689163]  entry_SYSCALL_64_after_hwframe+0x44/0xae

Is this a known issue?

Cheers,
-- 
Luis

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: 9p: fscache duplicate cookie
  2021-05-06 10:03 9p: fscache duplicate cookie Luis Henriques
@ 2021-05-06 10:45 ` Dominique Martinet
  2021-05-06 12:18   ` Luis Henriques
  2021-05-11 12:38 ` David Howells
  1 sibling, 1 reply; 23+ messages in thread
From: Dominique Martinet @ 2021-05-06 10:45 UTC (permalink / raw)
  To: Luis Henriques
  Cc: Eric Van Hensbergen, Latchesar Ionkov, David Howells,
	linux-fsdevel, v9fs-developer

Hi,

Luis Henriques wrote on Thu, May 06, 2021 at 11:03:31AM +0100:
> I've been seeing fscache complaining about duplicate cookies in 9p:
> 
>  FS-Cache: Duplicate cookie detected
>  FS-Cache: O-cookie c=00000000ba929e80 [p=000000002e706df1 fl=226 nc=0 na=1]
>  FS-Cache: O-cookie d=0000000000000000 n=0000000000000000
>  FS-Cache: O-key=[8] '0312710100000000'
>  FS-Cache: N-cookie c=00000000274050fe [p=000000002e706df1 fl=2 nc=0 na=1]
>  FS-Cache: N-cookie d=0000000037368b65 n=000000004047ed1f
>  FS-Cache: N-key=[8] '0312710100000000'

> It's quite easy to reproduce in my environment by running xfstests using
> the virtme scripts to boot a test kernel.  A quick look seems to indicate
> the warning comes from the v9fs_vfs_atomic_open_dotl() path:
> 
> [...]
> 
> Is this a known issue?

I normally don't use fscache so I never really looked into it.  I saw it
again recently when looking at David's fscache/netfs work; it didn't
seem to cause real trouble without a server, but I bet it would if there
were one.  I just never had the time to look further.

From a quick look v9fs uses the 'qid path' of the inode that is
supposed to be a unique identifier; in practice there are various
heuristics to it depending on the server but qemu takes the st_dev of
the underlying filesystem and chops the higher bits of the inode number
to make it up -- see qid_path_suffixmap() in hw/9pfs/9p.c in qemu
sources.

(protocol description can be found here:
https://github.com/chaos/diod/blob/master/protocol.md
)


In this case if there is a cookie collision there are two possibilities
I can see: either a previously hashed inode somehow got cleaned up
without the associated fscache cleanup or qemu dished out the same qid
path for two different files -- old filesystems used to have predictable
inode numbers but that is far from true anymore so it's quite possible
some files would have the same lower bits for their inode number on the
host...
If you have the time to investigate further that would be appreciated, I
have confirmed the fscache rework David suggested did not fix it so the
work will not be lost.


That's going to be very verbose but if you're not scared of digging at
logs a possible way to confirm qid identity would be to mount with -o
debug=5 (P9_DEBUG_9P + ERROR), all qid paths are logged to dmesg, but
that might not be viable if there is a lot of traffic -- it depends on
how fast and reliable your 'quite easy to reproduce' setup is...


Thanks,
-- 
Dominique


* Re: 9p: fscache duplicate cookie
  2021-05-06 10:45 ` Dominique Martinet
@ 2021-05-06 12:18   ` Luis Henriques
  2021-05-07 16:36     ` Luis Henriques
  0 siblings, 1 reply; 23+ messages in thread
From: Luis Henriques @ 2021-05-06 12:18 UTC (permalink / raw)
  To: Dominique Martinet
  Cc: Eric Van Hensbergen, Latchesar Ionkov, David Howells,
	linux-fsdevel, v9fs-developer

Dominique Martinet <asmadeus@codewreck.org> writes:

> Hi,
>
> Luis Henriques wrote on Thu, May 06, 2021 at 11:03:31AM +0100:
>> I've been seeing fscache complaining about duplicate cookies in 9p:
>> 
>>  FS-Cache: Duplicate cookie detected
>>  FS-Cache: O-cookie c=00000000ba929e80 [p=000000002e706df1 fl=226 nc=0 na=1]
>>  FS-Cache: O-cookie d=0000000000000000 n=0000000000000000
>>  FS-Cache: O-key=[8] '0312710100000000'
>>  FS-Cache: N-cookie c=00000000274050fe [p=000000002e706df1 fl=2 nc=0 na=1]
>>  FS-Cache: N-cookie d=0000000037368b65 n=000000004047ed1f
>>  FS-Cache: N-key=[8] '0312710100000000'
>
>> It's quite easy to reproduce in my environment by running xfstests using
>> the virtme scripts to boot a test kernel.  A quick look seems to indicate
>> the warning comes from the v9fs_vfs_atomic_open_dotl() path:
>> 
>> [...]
>> 
>> Is this a known issue?
>
> I normally don't use fscache so I never really looked into it.  I saw it
> again recently when looking at David's fscache/netfs work; it didn't
> seem to cause real trouble without a server, but I bet it would if there
> were one.  I just never had the time to look further.
>
> From a quick look v9fs uses the 'qid path' of the inode that is
> supposed to be a unique identifier; in practice there are various
> heuristics to it depending on the server but qemu takes the st_dev of
> the underlying filesystem and chops the higher bits of the inode number
> to make it up -- see qid_path_suffixmap() in hw/9pfs/9p.c in qemu
> sources.
>
> (protocol description can be found here:
> https://github.com/chaos/diod/blob/master/protocol.md
> )
>
>
> In this case if there is a cookie collision there are two possibilities
> I can see: either a previously hashed inode somehow got cleaned up
> without the associated fscache cleanup or qemu dished out the same qid
> path for two different files -- old filesystems used to have predictable
> inode numbers but that is far from true anymore so it's quite possible
> some files would have the same lower bits for their inode number on the
> host...
> If you have the time to investigate further that would be appreciated, I
> have confirmed the fscache rework David suggested did not fix it so the
> work will not be lost.
>
>
> That's going to be very verbose but if you're not scared of digging at
> logs a possible way to confirm qid identity would be to mount with -o
> debug=5 (P9_DEBUG_9P + ERROR), all qid paths are logged to dmesg, but
> that might not be viable if there is a lot of traffic -- it depends on
> how fast and reliable your 'quite easy to reproduce' setup is...

Thanks a lot for the quick reply, Dominique.  I'll definitely allocate
some time to try to find out a bit more about this issue (although I may end
up just hacking the code to print out the qids instead of turning on all
the debug).  I just wanted to make sure I wasn't hitting some known
fundamental problem that simply couldn't be fixed without major changes.

Cheers,
-- 
Luis


* Re: 9p: fscache duplicate cookie
  2021-05-06 12:18   ` Luis Henriques
@ 2021-05-07 16:36     ` Luis Henriques
  2021-05-08  0:47       ` Dominique Martinet
  0 siblings, 1 reply; 23+ messages in thread
From: Luis Henriques @ 2021-05-07 16:36 UTC (permalink / raw)
  To: Dominique Martinet
  Cc: Eric Van Hensbergen, Latchesar Ionkov, David Howells,
	linux-fsdevel, v9fs-developer

Luis Henriques <lhenriques@suse.de> writes:

> Dominique Martinet <asmadeus@codewreck.org> writes:
>
>> Hi,
>>
>> Luis Henriques wrote on Thu, May 06, 2021 at 11:03:31AM +0100:
>>> I've been seeing fscache complaining about duplicate cookies in 9p:
>>> 
>>>  FS-Cache: Duplicate cookie detected
>>>  FS-Cache: O-cookie c=00000000ba929e80 [p=000000002e706df1 fl=226 nc=0 na=1]
>>>  FS-Cache: O-cookie d=0000000000000000 n=0000000000000000
>>>  FS-Cache: O-key=[8] '0312710100000000'
>>>  FS-Cache: N-cookie c=00000000274050fe [p=000000002e706df1 fl=2 nc=0 na=1]
>>>  FS-Cache: N-cookie d=0000000037368b65 n=000000004047ed1f
>>>  FS-Cache: N-key=[8] '0312710100000000'
>>
>>> It's quite easy to reproduce in my environment by running xfstests using
>>> the virtme scripts to boot a test kernel.  A quick look seems to indicate
>>> the warning comes from the v9fs_vfs_atomic_open_dotl() path:
>>> 
>>> [...]
>>> 
>>> Is this a known issue?
>>
>> I normally don't use fscache so I never really looked into it.  I saw it
>> again recently when looking at David's fscache/netfs work; it didn't
>> seem to cause real trouble without a server, but I bet it would if there
>> were one.  I just never had the time to look further.
>>
>> From a quick look v9fs uses the 'qid path' of the inode that is
>> supposed to be a unique identifier; in practice there are various
>> heuristics to it depending on the server but qemu takes the st_dev of
>> the underlying filesystem and chops the higher bits of the inode number
>> to make it up -- see qid_path_suffixmap() in hw/9pfs/9p.c in qemu
>> sources.
>>
>> (protocol description can be found here:
>> https://github.com/chaos/diod/blob/master/protocol.md
>> )
>>
>>
>> In this case if there is a cookie collision there are two possibilities
>> I can see: either a previously hashed inode somehow got cleaned up
>> without the associated fscache cleanup or qemu dished out the same qid
>> path for two different files -- old filesystems used to have predictable
>> inode numbers but that is far from true anymore so it's quite possible
>> some files would have the same lower bits for their inode number on the
>> host...
>> If you have the time to investigate further that would be appreciated, I
>> have confirmed the fscache rework David suggested did not fix it so the
>> work will not be lost.
>>
>>
>> That's going to be very verbose but if you're not scared of digging at
>> logs a possible way to confirm qid identity would be to mount with -o
>> debug=5 (P9_DEBUG_9P + ERROR), all qid paths are logged to dmesg, but
>> that might not be viable if there is a lot of traffic -- it depends on
>> how fast and reliable your 'quite easy to reproduce' setup is...
>
> Thanks a lot for the quick reply, Dominique.  I'll definitely allocate
> some time to try to find out a bit more about this issue (although I may end
> up just hacking the code to print out the qids instead of turning on all
> the debug).  I just wanted to make sure I wasn't hitting some known
> fundamental problem that simply couldn't be fixed without major changes.

Ok, I spent some more time on this issue today.  I've hacked a bit of code
to keep track of new inodes' qids and I'm convinced there are no
duplicates when this issue happens.

OTOH, I've done another quick test: in v9fs_cache_inode_get_cookie(), I do
an fscache_acquire_cookie() retry when it fails (due to the dup error),
and this retry *does* succeed.  Which means, I guess, there's a race going
on.  I didn't manage to look too deep yet, but my current theory is that
the inode is being evicted while an open is triggered.  A new inode is
allocated but the old inode fscache cookie hasn't yet been relinquished.
Does this make any sense?

If this theory is correct, I'm not sure what's the best way to close this
race because the v9inode->fscache_lock can't really be used.  But I still
need to prove this is really what's happening.

Cheers,
-- 
Luis


* Re: 9p: fscache duplicate cookie
  2021-05-07 16:36     ` Luis Henriques
@ 2021-05-08  0:47       ` Dominique Martinet
  2021-05-10 10:54         ` Luis Henriques
  2021-05-11 12:53         ` David Howells
  0 siblings, 2 replies; 23+ messages in thread
From: Dominique Martinet @ 2021-05-08  0:47 UTC (permalink / raw)
  To: Luis Henriques
  Cc: Eric Van Hensbergen, Latchesar Ionkov, David Howells,
	linux-fsdevel, v9fs-developer

Luis Henriques wrote on Fri, May 07, 2021 at 05:36:29PM +0100:
> Ok, I spent some more time on this issue today.  I've hacked a bit of code
> to keep track of new inodes' qids and I'm convinced there are no
> duplicates when this issue happens.

Ok, that's good.
Just to make sure: what did you look at, aside from the qid, to make sure
it's unique? i_ino comes straight from qid->path too so we don't have
any great key available, which is why I hadn't suggested building a
debug table.
(... well, actually that means we'll never try to allocate two inodes
with the same inode number because of how v9fs_qid_iget_dotl() works, so
if there is a collision in qid paths we wouldn't see it through a cookie
collision in the first place. I'm not sure that's good? But at least
that clears up that theory, sorry for the bad suggestion)

> OTOH, I've done another quick test: in v9fs_cache_inode_get_cookie(), I do
> an fscache_acquire_cookie() retry when it fails (due to the dup error),
> and this retry *does* succeed.  Which means, I guess, there's a race going
> on.  I didn't manage to look too deep yet, but my current theory is that
> the inode is being evicted while an open is triggered.  A new inode is
> allocated but the old inode fscache cookie hasn't yet been relinquished.
> Does this make any sense?

hm, if the retry goes through I guess that'd make sense; if they both
were used in parallel the second call should fail all the same so that's
definitely a likely explanation.

It wouldn't hurt to check with v9fs_evict_inode if that's correct...
There definitely is a window where inode is no longer findable (thus
leading to allocation of a new one) and the call to the
fscache_relinquish_cookie() at eviction, but looking at e.g. afs they
are doing exactly the same as 9p here (iget5_locked, if that gets a new
inode then call fscache_acquire_cookie // fscache_relinquish_cookie in
evict op) so I'm not sure what we're missing.


David, do you have an idea?

> If this theory is correct, I'm not sure what's the best way to close this
> race because the v9inode->fscache_lock can't really be used.  But I still
> need to prove this is really what's happening.

Yes, I think we're going to need proof before thinking of a solution, I
can't think of anything simple either.


Thanks again for looking into it,
-- 
Dominique


* Re: 9p: fscache duplicate cookie
  2021-05-08  0:47       ` Dominique Martinet
@ 2021-05-10 10:54         ` Luis Henriques
  2021-05-10 11:47           ` Luis Henriques
  2021-05-11 12:44           ` David Howells
  2021-05-11 12:53         ` David Howells
  1 sibling, 2 replies; 23+ messages in thread
From: Luis Henriques @ 2021-05-10 10:54 UTC (permalink / raw)
  To: Dominique Martinet
  Cc: Eric Van Hensbergen, Latchesar Ionkov, David Howells,
	linux-fsdevel, v9fs-developer

Dominique Martinet <asmadeus@codewreck.org> writes:

> Luis Henriques wrote on Fri, May 07, 2021 at 05:36:29PM +0100:
>> Ok, I spent some more time on this issue today.  I've hacked a bit of code
>> to keep track of new inodes' qids and I'm convinced there are no
>> duplicates when this issue happens.
>
> Ok, that's good.
> Just to make sure: what did you look at, aside from the qid, to make sure
> it's unique? i_ino comes straight from qid->path too so we don't have
> any great key available, which is why I hadn't suggested building a
> debug table.
> (... well, actually that means we'll never try to allocate two inodes
> with the same inode number because of how v9fs_qid_iget_dotl() works, so
> if there is a collision in qid paths we wouldn't see it through a cookie
> collision in the first place. I'm not sure that's good? But at least
> that clears up that theory, sorry for the bad suggestion)
>

Ok, I should probably have added some more details in my email.  So,
here's what I did:

I've created a list (actually a tree...) of objects that keep track of
each new inode in v9fs_qid_iget_dotl().  These objects contain the inode
number, and the qid (type, version, path).  These are then removed from
the list in v9fs_evict_inode().

Whenever there's an error in v9fs_cache_inode_get_cookie(), i.e. when
v9inode->fscache == NULL, I'll search this list to see if that inode
number was there (or if I can find the qid.path and qid.version).  

I have never seen a hit in this search, hence my claim that there are no
duplicate qids involved.  Of course, my hack may be buggy and I may be
completely missing it ;-)

>> OTOH, I've done another quick test: in v9fs_cache_inode_get_cookie(), I do
>> an fscache_acquire_cookie() retry when it fails (due to the dup error),
>> and this retry *does* succeed.  Which means, I guess, there's a race going
>> on.  I didn't manage to look too deep yet, but my current theory is that
>> the inode is being evicted while an open is triggered.  A new inode is
>> allocated but the old inode fscache cookie hasn't yet been relinquished.
>> Does this make any sense?
>
> hm, if the retry goes through I guess that'd make sense; if they both
> were used in parallel the second call should fail all the same so that's
> definitely a likely explanation.
>
> It wouldn't hurt to check with v9fs_evict_inode if that's correct...
> There definitely is a window where inode is no longer findable (thus
> leading to allocation of a new one) and the call to the
> fscache_relinquish_cookie() at eviction, but looking at e.g. afs they
> are doing exactly the same as 9p here (iget5_locked, if that gets a new
> inode then call fscache_acquire_cookie // fscache_relinquish_cookie in
> evict op) so I'm not sure what we're missing.
>
>
> David, do you have an idea?

I've just done a quick experiment: moving the call to function
v9fs_cache_inode_put_cookie() in v9fs_evict_inode() to the beginning
(before truncate_inode_pages_final()) and it seems to, at least, narrow
the window -- I'm not able to reproduce the issue anymore.  But I'll need
to look closer.

And yeah, I had checked other filesystems too and they seem to follow this
pattern.  So either I'm looking at the wrong place or this is something
that is much more likely to be triggered by 9p.

Cheers,
-- 
Luis

>> If this theory is correct, I'm not sure what's the best way to close this
>> race because the v9inode->fscache_lock can't really be used.  But I still
>> need to prove this is really what's happening.
>
> Yes, I think we're going to need proof before thinking of a solution, I
> can't think of anything simple either.
>
>
> Thanks again for looking into it,
> -- 
> Dominique


* Re: 9p: fscache duplicate cookie
  2021-05-10 10:54         ` Luis Henriques
@ 2021-05-10 11:47           ` Luis Henriques
  2021-05-11 12:44           ` David Howells
  1 sibling, 0 replies; 23+ messages in thread
From: Luis Henriques @ 2021-05-10 11:47 UTC (permalink / raw)
  To: Dominique Martinet
  Cc: Eric Van Hensbergen, Latchesar Ionkov, David Howells,
	linux-fsdevel, v9fs-developer

[-- Attachment #1: Type: text/plain, Size: 5029 bytes --]

Luis Henriques <lhenriques@suse.de> writes:

> Dominique Martinet <asmadeus@codewreck.org> writes:
>
>> Luis Henriques wrote on Fri, May 07, 2021 at 05:36:29PM +0100:
>>> Ok, I spent some more time on this issue today.  I've hacked a bit of code
>>> to keep track of new inodes' qids and I'm convinced there are no
>>> duplicates when this issue happens.
>>
>> Ok, that's good.
>> Just to make sure: what did you look at, aside from the qid, to make sure
>> it's unique? i_ino comes straight from qid->path too so we don't have
>> any great key available, which is why I hadn't suggested building a
>> debug table.
>> (... well, actually that means we'll never try to allocate two inodes
>> with the same inode number because of how v9fs_qid_iget_dotl() works, so
>> if there is a collision in qid paths we wouldn't see it through a cookie
>> collision in the first place. I'm not sure that's good? But at least
>> that clears up that theory, sorry for the bad suggestion)
>>
>
> Ok, I should probably have added some more details in my email.  So,
> here's what I did:
>
> I've created a list (actually a tree...) of objects that keep track of
> each new inode in v9fs_qid_iget_dotl().  These objects contain the inode
> number, and the qid (type, version, path).  These are then removed from
> the list in v9fs_evict_inode().
>
> Whenever there's an error in v9fs_cache_inode_get_cookie(), i.e. when
> v9inode->fscache == NULL, I'll search this list to see if that inode
> number was there (or if I can find the qid.path and qid.version).  
>
> I have never seen a hit in this search, hence my claim that there are no
> duplicate qids involved.  Of course, my hack may be buggy and I may be
> completely missing it ;-)
>
>>> OTOH, I've done another quick test: in v9fs_cache_inode_get_cookie(), I do
>>> an fscache_acquire_cookie() retry when it fails (due to the dup error),
>>> and this retry *does* succeed.  Which means, I guess, there's a race going
>>> on.  I didn't manage to look too deep yet, but my current theory is that
>>> the inode is being evicted while an open is triggered.  A new inode is
>>> allocated but the old inode fscache cookie hasn't yet been relinquished.
>>> Does this make any sense?
>>
>> hm, if the retry goes through I guess that'd make sense; if they both
>> were used in parallel the second call should fail all the same so that's
>> definitely a likely explanation.
>>
>> It wouldn't hurt to check with v9fs_evict_inode if that's correct...
>> There definitely is a window where inode is no longer findable (thus
>> leading to allocation of a new one) and the call to the
>> fscache_relinquish_cookie() at eviction, but looking at e.g. afs they
>> are doing exactly the same as 9p here (iget5_locked, if that gets a new
>> inode then call fscache_acquire_cookie // fscache_relinquish_cookie in
>> evict op) so I'm not sure what we're missing.
>>
>>
>> David, do you have an idea?
>
> I've just done a quick experiment: moving the call to function
> v9fs_cache_inode_put_cookie() in v9fs_evict_inode() to the beginning
> (before truncate_inode_pages_final()) and it seems to, at least, narrow
> the window -- I'm not able to reproduce the issue anymore.  But I'll need
> to look closer.
>

And right after sending this email I decided to try a different
experiment.  Here's the code I had in v9fs_evict_inode():

void v9fs_evict_inode(struct inode *inode)
{
	struct v9fs_inode *v9inode = V9FS_I(inode);

	v9fs_debug_remove(inode->i_ino); /* <------------------------- */
	truncate_inode_pages_final(&inode->i_data);
	clear_inode(inode);
	filemap_fdatawrite(&inode->i_data);

	v9fs_cache_inode_put_cookie(inode);
	/* clunk the fid stashed in writeback_fid */
	if (v9inode->writeback_fid) {
		p9_client_clunk(v9inode->writeback_fid);
		v9inode->writeback_fid = NULL;
	}
}

v9fs_debug_remove() is the function that would remove the inode from my
debug tree.  In my new experiment, I changed this with:

void v9fs_evict_inode(struct inode *inode)
{
	struct v9fs_inode *v9inode = V9FS_I(inode);

	v9fs_debug_tag(inode->i_ino);
	(...)
	v9fs_debug_remove(inode->i_ino);
}

So, I effectively "tag" the inode for removing it from the list but only
remove it in the end.  The result is that I actually started seeing this
inode tagged for removing in v9fs_cache_inode_get_cookie() when fscache
detects the duplicate cookie!

I'm attaching the debug patch I'm using.  Obviously, I'm not really proud
of this code and it is not for merging (it's *really* hacky!), but maybe it
helps clarify what I tried to explain above, i.e. that
fscache_relinquish_cookie() should probably be invoked earlier when evicting
an inode.

FTR, with this patch applied I occasionally (but not always!) see the
following:

  [DEBUG] inode: 24187397 quid: 0.1711203.633864d4
  [DEBUG] found in tree qid: 0.1711203.633864d4 removing: 1
  [DEBUG] found dup ino: 24187397 0.1711203.633864d4 removing: 1

(Sometimes I do not see the duplicate being found, which probably means I
didn't hit the race.)

David, does this make sense?

Cheers
-- 
Luis


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: debug-code.patch --]
[-- Type: text/x-diff, Size: 5633 bytes --]

diff --git a/fs/9p/cache.c b/fs/9p/cache.c
index eb2151fb6049..59b59291609c 100644
--- a/fs/9p/cache.c
+++ b/fs/9p/cache.c
@@ -122,6 +122,30 @@ void v9fs_cache_inode_get_cookie(struct inode *inode)
 						  i_size_read(&v9inode->vfs_inode),
 						  true);
 
+	if (!v9inode->fscache) {
+		struct v9fs_debug *db, *next;
+
+		mutex_lock(&v9fs_debug_tree_mutex);
+		db = v9fs_debug_search(inode->i_ino);
+		printk("[DEBUG] inode: %ld quid: %x.%llx.%x\n",
+		       inode->i_ino,
+		       v9inode->qid.type,
+		       (unsigned long long)v9inode->qid.path,
+		       v9inode->qid.version);
+		if (db)
+			printk("[DEBUG] found in tree qid: %x.%llx.%x removing: %d\n",
+			       db->type, (unsigned long long)db->path,
+			       db->version, db->removing);
+		rbtree_postorder_for_each_entry_safe(db, next, &v9fs_debug_tree, node) {
+			if (db->path == v9inode->qid.path &&
+			    db->version == v9inode->qid.version)
+				printk("[DEBUG] found dup ino: %ld %x.%llx.%x removing: %d\n",
+				       db->inode,
+				       db->type, (unsigned long long)db->path,
+				       db->version, db->removing);
+		}
+		mutex_unlock(&v9fs_debug_tree_mutex);
+	}
 	p9_debug(P9_DEBUG_FSC, "inode %p get cookie %p\n",
 		 inode, v9inode->fscache);
 }
diff --git a/fs/9p/v9fs.h b/fs/9p/v9fs.h
index 4ca56c5dd637..7935ab56453c 100644
--- a/fs/9p/v9fs.h
+++ b/fs/9p/v9fs.h
@@ -119,6 +119,22 @@ struct v9fs_inode {
 	struct inode vfs_inode;
 };
 
+struct v9fs_debug {
+	struct rb_node node;
+	unsigned long inode;
+	u8 type;
+	u32 version;
+	u64 path;
+	bool removing;
+};
+
+extern struct rb_root v9fs_debug_tree;
+extern struct mutex v9fs_debug_tree_mutex;
+extern void v9fs_debug_remove(unsigned long inode);
+extern void v9fs_debug_tag(unsigned long inode);
+extern struct v9fs_debug *v9fs_debug_search(unsigned long inode);
+extern void v9fs_debug_insert(unsigned long inode, u8 type, u32 version, u64 path);
+
 static inline struct v9fs_inode *V9FS_I(const struct inode *inode)
 {
 	return container_of(inode, struct v9fs_inode, vfs_inode);
diff --git a/fs/9p/vfs_inode.c b/fs/9p/vfs_inode.c
index 795706520b5e..e5ebec787c28 100644
--- a/fs/9p/vfs_inode.c
+++ b/fs/9p/vfs_inode.c
@@ -377,6 +377,7 @@ void v9fs_evict_inode(struct inode *inode)
 {
 	struct v9fs_inode *v9inode = V9FS_I(inode);
 
+	v9fs_debug_tag(inode->i_ino);
 	truncate_inode_pages_final(&inode->i_data);
 	clear_inode(inode);
 	filemap_fdatawrite(&inode->i_data);
@@ -387,6 +388,7 @@ void v9fs_evict_inode(struct inode *inode)
 		p9_client_clunk(v9inode->writeback_fid);
 		v9inode->writeback_fid = NULL;
 	}
+	v9fs_debug_remove(inode->i_ino);
 }
 
 static int v9fs_test_inode(struct inode *inode, void *data)
diff --git a/fs/9p/vfs_inode_dotl.c b/fs/9p/vfs_inode_dotl.c
index e1c0240b51c0..56a7bc0c627b 100644
--- a/fs/9p/vfs_inode_dotl.c
+++ b/fs/9p/vfs_inode_dotl.c
@@ -32,6 +32,9 @@
 #include "xattr.h"
 #include "acl.h"
 
+struct rb_root v9fs_debug_tree = RB_ROOT;
+DEFINE_MUTEX(v9fs_debug_tree_mutex);
+
 static int
 v9fs_vfs_mknod_dotl(struct user_namespace *mnt_userns, struct inode *dir,
 		    struct dentry *dentry, umode_t omode, dev_t rdev);
@@ -94,6 +97,94 @@ static int v9fs_set_inode_dotl(struct inode *inode,  void *data)
 	return 0;
 }
 
+struct v9fs_debug *v9fs_debug_search(unsigned long inode)
+{
+	struct rb_node *node = v9fs_debug_tree.rb_node;
+
+	while (node) {
+		struct v9fs_debug *data = container_of(node, struct v9fs_debug, node);
+
+		if (data->inode < inode)
+			node = node->rb_left;
+		else if (data->inode > inode)
+			node = node->rb_right;
+		else
+			return data;
+	}
+	return NULL;
+}
+
+void v9fs_debug_tag(unsigned long inode)
+{
+	struct v9fs_debug *node;
+
+	mutex_lock(&v9fs_debug_tree_mutex);
+	node = v9fs_debug_search(inode);
+	if (node)
+		node->removing = true;
+	else
+		printk("[DEBUG] can't find %ld for tagging\n", inode);
+	mutex_unlock(&v9fs_debug_tree_mutex);
+}
+
+void v9fs_debug_remove(unsigned long inode)
+{
+	struct v9fs_debug *node;
+
+	mutex_lock(&v9fs_debug_tree_mutex);
+	node = v9fs_debug_search(inode);
+	if (node) {
+		rb_erase(&node->node, &v9fs_debug_tree);
+		kfree(node);
+	} else
+		printk("[DEBUG] can't find %ld\n", inode);
+	mutex_unlock(&v9fs_debug_tree_mutex);
+}
+
+void v9fs_debug_insert(unsigned long inode, u8 type, u32 version, u64 path)
+{
+	struct v9fs_debug *db = NULL;
+	struct rb_node **node, *parent = NULL;
+
+	mutex_lock(&v9fs_debug_tree_mutex);
+	node = &(v9fs_debug_tree.rb_node);
+	while (*node) {
+		parent = *node;
+		db = container_of(*node, struct v9fs_debug, node);
+
+		if (db->inode < inode)
+			node = &((*node)->rb_left);
+		else if (db->inode > inode)
+			node = &((*node)->rb_right);
+		else
+			break;
+	}
+	if (!db || (db->inode != inode)) {
+		db = kmalloc(sizeof(*db), GFP_KERNEL);
+		if (db) {
+			db->inode = inode;
+			db->type = type;
+			db->version = version;
+			db->path = path;
+			db->removing = false;
+			rb_link_node(&db->node, parent, node);
+			rb_insert_color(&db->node, &v9fs_debug_tree);
+		} else
+			printk("[DEBUG] Failed to alloc memory!\n");
+	}
+	if (db) {
+		if (type != db->type ||
+		    version != db->version ||
+		    path != db->path) {
+			db->inode = inode;
+			db->type = type;
+			db->version = version;
+			db->path = path;
+		}
+	}
+	mutex_unlock(&v9fs_debug_tree_mutex);
+}
+
 static struct inode *v9fs_qid_iget_dotl(struct super_block *sb,
 					struct p9_qid *qid,
 					struct p9_fid *fid,
@@ -134,6 +225,7 @@ static struct inode *v9fs_qid_iget_dotl(struct super_block *sb,
 	if (retval)
 		goto error;
 
+	v9fs_debug_insert(inode->i_ino, qid->type, qid->version, qid->path);
 	unlock_new_inode(inode);
 	return inode;
 error:

[-- Attachment #3: Type: text/plain, Size: 620 bytes --]


> And yeah, I had checked other filesystems too and they seem to follow this
> pattern.  So either I'm looking at the wrong place or this is something
> that is much more likely to be triggered by 9p.
>
> Cheers,
> -- 
> Luis
>
>>> If this theory is correct, I'm not sure what's the best way to close this
>>> race because the v9inode->fscache_lock can't really be used.  But I still
>>> need to prove this is really what's happening.
>>
>> Yes, I think we're going to need proof before thinking of a solution, I
>> can't think of anything simple either.
>>
>>
>> Thanks again for looking into it,
>> -- 
>> Dominique


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: 9p: fscache duplicate cookie
  2021-05-06 10:03 9p: fscache duplicate cookie Luis Henriques
  2021-05-06 10:45 ` Dominique Martinet
@ 2021-05-11 12:38 ` David Howells
  2021-05-12 10:07   ` Luis Henriques
                     ` (2 more replies)
  1 sibling, 3 replies; 23+ messages in thread
From: David Howells @ 2021-05-11 12:38 UTC (permalink / raw)
  To: Luis Henriques
  Cc: dhowells, Eric Van Hensbergen, Latchesar Ionkov,
	Dominique Martinet, linux-fsdevel, v9fs-developer

Hi Luis,

> I've been seeing fscache complaining about duplicate cookies in 9p:
> 
>  FS-Cache: Duplicate cookie detected
>  FS-Cache: O-cookie c=00000000ba929e80 [p=000000002e706df1 fl=226 nc=0 na=1]

This cookie is marked acquired (fl=2xx), but not relinquished (fl=4xx), so it
would still seem to be active :-/.  Pretty much one of the first things
__fscache_relinquish_cookie() does is to set that bit (should be bit 10).

One thing that might be useful is if you can turn on a couple of fscache
tracepoints:

echo 1 >/sys/kernel/debug/tracing/events/fscache/fscache_acquire/enable
echo 1 >/sys/kernel/debug/tracing/events/fscache/fscache_relinquish/enable

The cookie pointers in the duplicate cookie report should match the entries in
the trace output.

Note that my fscache-iter branch[1] improves the situation where the disk I/O
required to effect the destruction of a cache object delays the completion of
relinquishment by inserting waits, but that oughtn't to help here.

David

[1] https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git/log/?h=fscache-iter



* Re: 9p: fscache duplicate cookie
  2021-05-10 10:54         ` Luis Henriques
  2021-05-10 11:47           ` Luis Henriques
@ 2021-05-11 12:44           ` David Howells
  2021-05-12 10:10             ` Luis Henriques
  1 sibling, 1 reply; 23+ messages in thread
From: David Howells @ 2021-05-11 12:44 UTC (permalink / raw)
  To: Luis Henriques
  Cc: dhowells, Dominique Martinet, Eric Van Hensbergen,
	Latchesar Ionkov, linux-fsdevel, v9fs-developer

Luis Henriques <lhenriques@suse.de> wrote:

> +		if (data->inode < inode)
> +			node = node->rb_left;
> +		else if (data->inode > inode)
> +			node = node->rb_right;

If you're just using a plain integer as the key into your debug tree, an
xarray, IDA or IDR might be easier to use.
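
A minimal sketch of what the same debug store could look like on top of an
xarray, as suggested (untested kernel code, not from the thread; names are
taken from Luis's debug patch, and callers are assumed to serialise
insert/remove for a given inode as the original mutex did):

```c
/* Sketch only: the xarray is keyed directly by inode number, so the
 * manual rbtree walk goes away and the xarray's internal lock protects
 * the structure itself. */
static DEFINE_XARRAY(v9fs_debug_xa);

void v9fs_debug_insert(unsigned long inode, u8 type, u32 version, u64 path)
{
	struct v9fs_debug *db = xa_load(&v9fs_debug_xa, inode);

	if (!db) {
		db = kmalloc(sizeof(*db), GFP_KERNEL);
		if (!db)
			return;
		db->inode = inode;
		db->removing = false;
		xa_store(&v9fs_debug_xa, inode, db, GFP_KERNEL);
	}
	/* Record the latest qid seen for this inode. */
	db->type = type;
	db->version = version;
	db->path = path;
}

void v9fs_debug_remove(unsigned long inode)
{
	/* xa_erase() returns the removed entry (or NULL), so lookup,
	 * unlink and free collapse into one call. */
	kfree(xa_erase(&v9fs_debug_xa, inode));
}
```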

David



* Re: 9p: fscache duplicate cookie
  2021-05-08  0:47       ` Dominique Martinet
  2021-05-10 10:54         ` Luis Henriques
@ 2021-05-11 12:53         ` David Howells
  1 sibling, 0 replies; 23+ messages in thread
From: David Howells @ 2021-05-11 12:53 UTC (permalink / raw)
  To: Luis Henriques
  Cc: dhowells, Dominique Martinet, Eric Van Hensbergen,
	Latchesar Ionkov, linux-fsdevel, v9fs-developer

Luis Henriques <lhenriques@suse.de> wrote:

> I've just done a quick experiment: moving the call to function
> v9fs_cache_inode_put_cookie() in v9fs_evict_inode() to the beginning
> (before truncate_inode_pages_final()) and it seems to, at least, narrow
> the window -- I'm not able to reproduce the issue anymore.  But I'll need
> to look closer.

The cookie needs to be relinquished after the pagecache has been cleared so
that any outstanding writes have been dealt with, as ->invalidatepage() may
refer to the cookie in order to ditch references to pages.

David



* Re: 9p: fscache duplicate cookie
  2021-05-11 12:38 ` David Howells
@ 2021-05-12 10:07   ` Luis Henriques
  2021-05-12 11:04   ` David Howells
  2021-05-12 11:09   ` 9p: fscache duplicate cookie David Howells
  2 siblings, 0 replies; 23+ messages in thread
From: Luis Henriques @ 2021-05-12 10:07 UTC (permalink / raw)
  To: David Howells
  Cc: Eric Van Hensbergen, Latchesar Ionkov, Dominique Martinet,
	linux-fsdevel, v9fs-developer

David Howells <dhowells@redhat.com> writes:

> Hi Luis,

Hi David,

(thanks for the feedback and sorry for the delay replying -- in the
meantime I had problems with my test environment that I had to fix first.)

>> I've been seeing fscache complaining about duplicate cookies in 9p:
>> 
>>  FS-Cache: Duplicate cookie detected
>>  FS-Cache: O-cookie c=00000000ba929e80 [p=000000002e706df1 fl=226 nc=0 na=1]
>
> This cookie is marked acquired (fl=2xx), but not relinquished (fl=4xx), so it
> would still seem to be active:-/.  Pretty much one of the first things
> __fscache_relinquish_cookie() does is to set that bit (should be bit 10).
>
> One thing that might be useful is if you can turn on a couple of fscache
> tracepoints:
>
> echo 1 >/sys/kernel/debug/tracing/events/fscache/fscache_acquire/enable
> echo 1 >/sys/kernel/debug/tracing/events/fscache/fscache_relinquish/enable
>
> The cookie pointers in the duplicate cookie report should match the entries in
> the trace output.

Awesome, thanks for the suggestion.  So, here's what I did: I enabled the
two tracepoints you suggested *and* the 'fscache_cookie' tracepoint as
well, so that I would see the collision in the trace.  I then started my
test and stopped it once I saw the duplicate cookie warning.  And here's
what I got in the logs:

[   29.702428] FS-Cache: Duplicate cookie detected
[   29.703859] FS-Cache: O-cookie c=0000000046a8b5b3 [p=00000000b2d5d5a9 fl=216 nc=0 na=0]
[   29.705922] FS-Cache: O-cookie d=0000000000000000 n=0000000000000000
[   29.706996] FS-Cache: O-key=[8] '0312710100000000'
[   29.707597] FS-Cache: N-cookie c=000000002b6a2db1 [p=00000000b2d5d5a9 fl=2 nc=0 na=1]
[   29.708712] FS-Cache: N-cookie d=000000006a7abbdd n=00000000188a0c4c
[   29.709568] FS-Cache: N-key=[8] '0312710100000000'

and in the trace ('=>' identifies the old cookie):

=>         <...>-2712    [000] ...1    29.551257: fscache_acquire: c=0000000046a8b5b3 p=00000000b2d5d5a9 pu=50 pc=49 pf=22 n=9p.inod
           <...>-2820    [000] ...1    29.707083: fscache_relinquish: c=000000003ca97471 u=1 p=00000000b2d5d5a9 Nc=0 Na=1 f=26 r=0
           <...>-2820    [000] ...1    29.707087: fscache_cookie: PUT rlq c=000000003ca97471 u=0 p=00000000b2d5d5a9 Nc=0 Na=0 f=16
           <...>-2820    [000] ...1    29.707090: fscache_cookie: PUT prn c=00000000b2d5d5a9 u=49 p=000000005cf9f76e Nc=48 Na=1 f=22
           <...>-2821    [002] ...2    29.709010: fscache_cookie: GET prn c=00000000b2d5d5a9 u=50 p=000000005cf9f76e Nc=48 Na=1 f=22
           <...>-2821    [002] ...1    29.709011: fscache_acquire: c=00000000ca586ae6 p=00000000b2d5d5a9 pu=50 pc=49 pf=22 n=9p.inod
           <...>-2847    [002] ...1    29.750224: fscache_relinquish: c=00000000ca586ae6 u=1 p=00000000b2d5d5a9 Nc=0 Na=1 f=26 r=0
           <...>-2847    [002] ...1    29.750228: fscache_cookie: PUT rlq c=00000000ca586ae6 u=0 p=00000000b2d5d5a9 Nc=0 Na=0 f=16
           <...>-2847    [002] ...1    29.750230: fscache_cookie: PUT prn c=00000000b2d5d5a9 u=49 p=000000005cf9f76e Nc=48 Na=1 f=22
           <...>-2851    [003] ...2    29.751561: fscache_cookie: GET prn c=00000000b2d5d5a9 u=50 p=000000005cf9f76e Nc=48 Na=1 f=22
           <...>-2851    [003] ...1    29.751562: fscache_acquire: c=0000000019a5a5e8 p=00000000b2d5d5a9 pu=50 pc=49 pf=22 n=9p.inod
           <...>-2863    [001] ...1    29.771136: fscache_relinquish: c=0000000019a5a5e8 u=1 p=00000000b2d5d5a9 Nc=0 Na=1 f=26 r=0
           <...>-2863    [001] ...1    29.771140: fscache_cookie: PUT rlq c=0000000019a5a5e8 u=0 p=00000000b2d5d5a9 Nc=0 Na=0 f=16
           <...>-2863    [001] ...1    29.771143: fscache_cookie: PUT prn c=00000000b2d5d5a9 u=49 p=000000005cf9f76e Nc=48 Na=1 f=22
           <...>-2865    [002] ...2    29.773439: fscache_cookie: GET prn c=00000000b2d5d5a9 u=50 p=000000005cf9f76e Nc=48 Na=1 f=22
           <...>-2865    [002] ...1    29.773440: fscache_acquire: c=00000000ca586ae6 p=00000000b2d5d5a9 pu=50 pc=49 pf=22 n=9p.inod
=>         <...>-2869    [003] ...2    29.779784: fscache_cookie: *COLLISION* c=0000000046a8b5b3 u=1 p=00000000b2d5d5a9 Nc=0 Na=1 f=26
           <...>-2876    [002] ...1    29.779966: fscache_relinquish: c=0000000046a8b5b3 u=1 p=00000000b2d5d5a9 Nc=0 Na=1 f=26 r=0
           <...>-2876    [002] .N.1    29.779969: fscache_cookie: PUT rlq c=0000000046a8b5b3 u=0 p=00000000b2d5d5a9 Nc=0 Na=0 f=16
           <...>-2869    [003] .N.1    29.787568: fscache_cookie: DISCARD c=000000002b6a2db1 u=1 p=00000000b2d5d5a9 Nc=0 Na=1 f=02

[ I wonder why the timestamps don't match between the traces and the
  kernel log... ]

So, can we infer from this trace that an evict could actually be ongoing,
with the old cookie not yet relinquished, and hence the collision?

> Note that my fscache-iter branch[1] improves the situation where the disk I/O
> required to effect the destruction of a cache object delays the completion of
> relinquishment by inserting waits, but that oughtn't to help here.

Right, I haven't looked at it yet (I'll try to) but that could make things
even worse, right?

Cheers,
-- 
Luis


* Re: 9p: fscache duplicate cookie
  2021-05-11 12:44           ` David Howells
@ 2021-05-12 10:10             ` Luis Henriques
  0 siblings, 0 replies; 23+ messages in thread
From: Luis Henriques @ 2021-05-12 10:10 UTC (permalink / raw)
  To: David Howells
  Cc: Dominique Martinet, Eric Van Hensbergen, Latchesar Ionkov,
	linux-fsdevel, v9fs-developer

David Howells <dhowells@redhat.com> writes:

> Luis Henriques <lhenriques@suse.de> wrote:
>
>> +		if (data->inode < inode)
>> +			node = node->rb_left;
>> +		else if (data->inode > inode)
>> +			node = node->rb_right;
>
> If you're just using a plain integer as the key into your debug tree, an
> xarray, IDA or IDR might be easier to use.

Yep, xarray actually crossed my mind but rbtrees were still fresh in my
memory.  I'll look into the xarray API next time (which is likely to be
much simpler, I know).

Cheers,
-- 
Luis


* Re: 9p: fscache duplicate cookie
  2021-05-11 12:38 ` David Howells
  2021-05-12 10:07   ` Luis Henriques
@ 2021-05-12 11:04   ` David Howells
  2021-05-12 11:58     ` Luis Henriques
  2021-05-12 11:09   ` 9p: fscache duplicate cookie David Howells
  2 siblings, 1 reply; 23+ messages in thread
From: David Howells @ 2021-05-12 11:04 UTC (permalink / raw)
  To: Luis Henriques
  Cc: dhowells, Eric Van Hensbergen, Latchesar Ionkov,
	Dominique Martinet, linux-fsdevel, v9fs-developer

Luis Henriques <lhenriques@suse.de> wrote:

> [ I wonder why the timestamps don't match between the traces and the
>   kernel log... ]

I've seen that.  I wonder if the timestamping of printk lines is delayed by
the serial driver outputting things.

> So, can we infer from this trace that an evict could actually be on-going
> but the old cookie wasn't relinquished yet and hence the collision?

It might be illuminating if you can make it print a traceline at the beginning
of v9fs_evict_inode() and in v9fs_drop_inode().  Print the cookie pointer in
both.
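
A hedged sketch of that instrumentation (untested; it assumes the
v9fs_inode layout of this era with the cookie stashed in ->fscache, and
uses trace_printk() rather than defining a real tracepoint):

```c
/* Debug-only: emit a traceline carrying the inode's cookie pointer so it
 * can be correlated with the fscache_acquire/fscache_relinquish entries. */
static int v9fs_drop_inode(struct inode *inode)
{
	struct v9fs_inode *v9inode = V9FS_I(inode);

	if (v9inode->fscache)
		trace_printk("v9fs_drop_inode cookie: %p\n", v9inode->fscache);
	return generic_drop_inode(inode);
}

void v9fs_evict_inode(struct inode *inode)
{
	struct v9fs_inode *v9inode = V9FS_I(inode);

	if (v9inode->fscache)
		trace_printk("v9fs_evict_inode cookie: %p\n", v9inode->fscache);
	/* ... original eviction body unchanged ... */
}
```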

David



* Re: 9p: fscache duplicate cookie
  2021-05-11 12:38 ` David Howells
  2021-05-12 10:07   ` Luis Henriques
  2021-05-12 11:04   ` David Howells
@ 2021-05-12 11:09   ` David Howells
  2 siblings, 0 replies; 23+ messages in thread
From: David Howells @ 2021-05-12 11:09 UTC (permalink / raw)
  To: Luis Henriques
  Cc: dhowells, Eric Van Hensbergen, Latchesar Ionkov,
	Dominique Martinet, linux-fsdevel, v9fs-developer

Luis Henriques <lhenriques@suse.de> wrote:

> > Note that my fscache-iter branch[1] improves the situation where the disk
> > I/O required to effect the destruction of a cache object delays the
> > completion of relinquishment by inserting waits, but that oughtn't to help
> > here.
> 
> Right, I haven't looked at it yet (I'll try to) but that could make things
> even worse, right?

It forces fscache_acquire_cookie() to wait for cookie relinquishment to
complete sufficiently; however, that doesn't seem to be the problem here since
the cookie hasn't been relinquished yet.

David



* Re: 9p: fscache duplicate cookie
  2021-05-12 11:04   ` David Howells
@ 2021-05-12 11:58     ` Luis Henriques
  2021-05-12 12:26       ` Dominique Martinet
  2021-05-12 12:57       ` What sort of inode state does ->evict_inode() expect to see? [was Re: 9p: fscache duplicate cookie] David Howells
  0 siblings, 2 replies; 23+ messages in thread
From: Luis Henriques @ 2021-05-12 11:58 UTC (permalink / raw)
  To: David Howells
  Cc: Eric Van Hensbergen, Latchesar Ionkov, Dominique Martinet,
	linux-fsdevel, v9fs-developer

David Howells <dhowells@redhat.com> writes:

> Luis Henriques <lhenriques@suse.de> wrote:
>
>> [ I wonder why the timestamps don't match between the traces and the
>>   kernel log... ]
>
> I've seen that.  I wonder if the timestamping of printk lines is delayed by
> the serial driver outputting things.
>
>> So, can we infer from this trace that an evict could actually be on-going
>> but the old cookie wasn't relinquished yet and hence the collision?
>
> It might be illuminating if you can make it print a traceline at the beginning
> of v9fs_evict_inode() and in v9fs_drop_inode().  Print the cookie pointer in
> both.

Ok, here's what I'm getting:

...
<...>-20575   [002] ...1    67.519214: fscache_acquire: c=0000000097476aaa p=000000003080d900 pu=50 pc=49 pf=22 n=9p.inod
<...>-20585   [003] ...2    67.535091: 9p_fscache_cookie: v9fs_drop_inode cookie: 00000000cd0099b3
<...>-20585   [003] ...1    67.535093: 9p_fscache_cookie: v9fs_evict_inode cookie: 00000000cd0099b3
<...>-20585   [003] ...1    67.535115: fscache_relinquish: c=00000000cd0099b3 u=1 p=000000003080d900 Nc=0 Na=1 f=26 r=0
<...>-20585   [003] ...1    67.535118: fscache_cookie: PUT rlq c=00000000cd0099b3 u=0 p=000000003080d900 Nc=0 Na=0 f=16
<...>-20585   [003] ...1    67.535120: fscache_cookie: PUT prn c=000000003080d900 u=49 p=0000000042542ee5 Nc=48 Na=1 f=22
<...>-20591   [000] ...2    67.538644: fscache_cookie: GET prn c=000000003080d900 u=50 p=0000000042542ee5 Nc=48 Na=1 f=22
<...>-20591   [000] ...1    67.538645: fscache_acquire: c=0000000011fa06b1 p=000000003080d900 pu=50 pc=49 pf=22 n=9p.inod
<...>-20599   [003] .N.2    67.542180: 9p_fscache_cookie: v9fs_drop_inode cookie: 0000000097476aaa
<...>-20599   [003] .N.1    67.542181: 9p_fscache_cookie: v9fs_evict_inode cookie: 0000000097476aaa
<...>-20591   [000] ...2    67.542980: fscache_cookie: *COLLISION* c=0000000097476aaa u=1 p=000000003080d900 Nc=0 Na=1 f=26
<...>-20599   [003] ...1    67.543098: fscache_relinquish: c=0000000097476aaa u=1 p=000000003080d900 Nc=0 Na=1 f=26 r=0
<...>-20599   [003] ...1    67.543100: fscache_cookie: PUT rlq c=0000000097476aaa u=0 p=000000003080d900 Nc=0 Na=0 f=16

(Note that I'm only tracing v9fs_{drop,evict}_inode if we have a cookie
for the inode; there are a bunch of drop/evict calls where the cookie is
NULL.)

So, this is... annoying, I guess.

Cheers,
-- 
Luis


* Re: 9p: fscache duplicate cookie
  2021-05-12 11:58     ` Luis Henriques
@ 2021-05-12 12:26       ` Dominique Martinet
  2021-05-12 12:57       ` What sort of inode state does ->evict_inode() expect to see? [was Re: 9p: fscache duplicate cookie] David Howells
  1 sibling, 0 replies; 23+ messages in thread
From: Dominique Martinet @ 2021-05-12 12:26 UTC (permalink / raw)
  To: Luis Henriques
  Cc: David Howells, Eric Van Hensbergen, Latchesar Ionkov,
	linux-fsdevel, v9fs-developer

Luis Henriques wrote on Wed, May 12, 2021 at 12:58:58PM +0100:
> <...>-20591   [000] ...2    67.538644: fscache_cookie: GET prn c=000000003080d900 u=50 p=0000000042542ee5 Nc=48 Na=1 f=22
> <...>-20591   [000] ...1    67.538645: fscache_acquire: c=0000000011fa06b1 p=000000003080d900 pu=50 pc=49 pf=22 n=9p.inod
> <...>-20599   [003] .N.2    67.542180: 9p_fscache_cookie: v9fs_drop_inode cookie: 0000000097476aaa
> [...]
>
> So, this is... annoying, I guess.

Oh, this actually looks different from what I had in mind.

So if I'm reading this right, the dup acquire happens before the drop on
another thread, meaning iget5_locked() somehow returned an inode with
I_NEW for the same i_ino as that of the inode that is dropped later?...

How much trust can we actually put in trace ordering across different CPUs?
My theory would really have wanted just that drop before the acquire :D



Anyway, I think there's no room for doubt that it's possible to get a
new inode for the same underlying file before the evict has finished,
which leaves a few questions:
 - as David brought up on IRC (#linuxfs@OFTC), what about the flushing
of dirty data that happens in evict()? Wouldn't it be possible for
operations on the new inode to read stale data while the old inode is
being flushed? I think that warrants asking someone who understands this
better than I do, as it's probably not 9p specific even if 9p makes it
easier to get a new inode in such a racy way...

 - for 9p in particular, Christian Schoenebeck (helping with 9p in qemu)
brought up that we evict inodes too fast too often, so I think it'd help
to have some sort of inode lifetime management and keep inodes alive for
a bit.
As a network filesystem with no coherency built into the protocol, I don't
think we can afford to keep inodes cached too long, and I know some
servers have trouble if we keep too many fids open, but it would be
nice to have a few knobs to just keep inodes around a bit longer... This
won't solve the fundamental problem but if the inode isn't evicted at a
point where it's likely to be used again then this particular problem
should be much harder to hit (like other filesystems, actually :P)

I'm not sure how that works though, and won't have much time to work on
it short term anyway, but it's an idea :/

-- 
Dominique


* What sort of inode state does ->evict_inode() expect to see? [was Re: 9p: fscache duplicate cookie]
  2021-05-12 11:58     ` Luis Henriques
  2021-05-12 12:26       ` Dominique Martinet
@ 2021-05-12 12:57       ` David Howells
  2021-05-12 13:45         ` Al Viro
  2021-05-12 14:12         ` David Howells
  1 sibling, 2 replies; 23+ messages in thread
From: David Howells @ 2021-05-12 12:57 UTC (permalink / raw)
  To: Alexander Viro
  Cc: dhowells, Dominique Martinet, Luis Henriques,
	Eric Van Hensbergen, Latchesar Ionkov, linux-fsdevel,
	v9fs-developer

Hi Al,

We're seeing cases where fscache is reporting cookie collisions that appear
to be due to ->evict_inode() running in parallel with a new inode for the
same filesystem object being set up.

What's happening is that in all of 9p, afs, ceph, cifs and nfs, the fscache
cookie is being relinquished in ->evict_inode(), but that appears to be too
late because by that time, a new inode can be being allocated and a new cookie
get acquired.

evict_inode is potentially a slow process because it has to dismantle the
pagecache and wait for any outstanding DMA to the cache, then seal the cache
object - which involves a synchronous journalled op (setxattr) - so there's
a lot of scope for a race.

Is there a better place to do this?  drop_inode() maybe?  And is there a good
place to wait on all the DMAs that might be in progress to/from the cache?
(This might involve waiting for PG_locked or PG_fscache to be cleared on each
page).

David



* Re: What sort of inode state does ->evict_inode() expect to see? [was Re: 9p: fscache duplicate cookie]
  2021-05-12 12:57       ` What sort of inode state does ->evict_inode() expect to see? [was Re: 9p: fscache duplicate cookie] David Howells
@ 2021-05-12 13:45         ` Al Viro
  2021-05-12 14:12         ` David Howells
  1 sibling, 0 replies; 23+ messages in thread
From: Al Viro @ 2021-05-12 13:45 UTC (permalink / raw)
  To: David Howells
  Cc: Dominique Martinet, Luis Henriques, Eric Van Hensbergen,
	Latchesar Ionkov, linux-fsdevel, v9fs-developer

On Wed, May 12, 2021 at 01:57:11PM +0100, David Howells wrote:
> Hi Al,
> 
> We're seeing cases where fscache is reporting cookie collisions that appears
> to be due to ->evict_inode() running parallel with a new inode for the same
> filesystem object getting set up.

Huh?  Details, please.  What we are guaranteed is that iget{,5}_locked() et.al.
on the same object will either prevent the call of ->evict_inode() (if they
manage to grab the sucker before I_FREEING is set) or will wait until after
->evict_inode() returns.


* Re: What sort of inode state does ->evict_inode() expect to see? [was Re: 9p: fscache duplicate cookie]
  2021-05-12 12:57       ` What sort of inode state does ->evict_inode() expect to see? [was Re: 9p: fscache duplicate cookie] David Howells
  2021-05-12 13:45         ` Al Viro
@ 2021-05-12 14:12         ` David Howells
  2021-05-14 16:10           ` Luis Henriques
  1 sibling, 1 reply; 23+ messages in thread
From: David Howells @ 2021-05-12 14:12 UTC (permalink / raw)
  To: Al Viro
  Cc: dhowells, Dominique Martinet, Luis Henriques,
	Eric Van Hensbergen, Latchesar Ionkov, linux-fsdevel,
	v9fs-developer

Al Viro <viro@zeniv.linux.org.uk> wrote:

> > We're seeing cases where fscache is reporting cookie collisions that appears
> > to be due to ->evict_inode() running parallel with a new inode for the same
> > filesystem object getting set up.
> 
> Huh?  Details, please.  What we are guaranteed is that iget{,5}_locked() et.al.
> on the same object will either prevent the call of ->evict_inode() (if they
> manage to grab the sucker before I_FREEING is set) or will wait until after
> ->evict_inode() returns.

See the trace from Luis in:

	https://lore.kernel.org/linux-fsdevel/87fsysyxh9.fsf@suse.de/

It appears that process 20591 manages to set up a new inode that has the same
key parameters as the one process 20585 is tearing down.

0000000097476aaa is the cookie pointer used by the old inode.
0000000011fa06b1 is the cookie pointer used by the new inode.
000000003080d900 is the cookie pointer for the parent superblock.

The fscache_acquire traceline emission is caused by one of:

 (*) v9fs_qid_iget() or v9fs_qid_iget_dotl() calling
     v9fs_cache_inode_get_cookie().

 (*) v9fs_file_open*(O_RDONLY) or v9fs_vfs_atomic_open*(O_RDONLY) calling
     v9fs_cache_inode_set_cookie().

 (*) v9fs_cache_inode_reset_cookie(), which appears unused.

The fscache_relinquish traceline emission is caused by one of:

 (*) v9fs_file_open(O_RDWR/O_WRONLY) or v9fs_vfs_atomic_open(O_RDWR/O_WRONLY)
     calling v9fs_cache_inode_set_cookie().

 (*) v9fs_evict_inode() calling v9fs_cache_inode_put_cookie().

 (*) v9fs_cache_inode_reset_cookie(), which appears unused.

From the backtrace in:

	https://lore.kernel.org/linux-fsdevel/87czu45gcs.fsf@suse.de/

the acquisition is being triggered in v9fs_vfs_atomic_open_dotl(), so it seems
v9fs_qid_iget_dotl() already happened - which *should* have created the
cookie.

So it seems more complicated than I thought.

David



* Re: What sort of inode state does ->evict_inode() expect to see? [was Re: 9p: fscache duplicate cookie]
  2021-05-12 14:12         ` David Howells
@ 2021-05-14 16:10           ` Luis Henriques
  2021-05-14 21:16             ` Dominique Martinet
  0 siblings, 1 reply; 23+ messages in thread
From: Luis Henriques @ 2021-05-14 16:10 UTC (permalink / raw)
  To: David Howells
  Cc: Al Viro, Dominique Martinet, Eric Van Hensbergen,
	Latchesar Ionkov, linux-fsdevel, v9fs-developer

David Howells <dhowells@redhat.com> writes:

> Al Viro <viro@zeniv.linux.org.uk> wrote:
>
>> > We're seeing cases where fscache is reporting cookie collisions that appears
>> > to be due to ->evict_inode() running parallel with a new inode for the same
>> > filesystem object getting set up.
>> 
>> Huh?  Details, please.  What we are guaranteed is that iget{,5}_locked() et.al.
>> on the same object will either prevent the call of ->evict_inode() (if they
>> manage to grab the sucker before I_FREEING is set) or will wait until after
>> ->evict_inode() returns.
>
> See the trace from Luis in:
>
> 	https://lore.kernel.org/linux-fsdevel/87fsysyxh9.fsf@suse.de/
>
> It appears that process 20591 manages to set up a new inode that has the same
> key parameters as the one process 20585 is tearing down.
>
> 0000000097476aaa is the cookie pointer used by the old inode.
> 0000000011fa06b1 is the cookie pointer used by the new inode.
> 000000003080d900 is the cookie pointer for the parent superblock.
>
> The fscache_acquire traceline emission is caused by one of:
>
>  (*) v9fs_qid_iget() or v9fs_qid_iget_dotl() calling
>      v9fs_cache_inode_get_cookie().
>
>  (*) v9fs_file_open*(O_RDONLY) or v9fs_vfs_atomic_open*(O_RDONLY) calling
>      v9fs_cache_inode_set_cookie().
>
>  (*) v9fs_cache_inode_reset_cookie(), which appears unused.
>
> The fscache_relinquish traceline emission is caused by one of:
>
>  (*) v9fs_file_open(O_RDWR/O_WRONLY) or v9fs_vfs_atomic_open(O_RDWR/O_WRONLY)
>      calling v9fs_cache_inode_set_cookie().
>
>  (*) v9fs_evict_inode() calling v9fs_cache_inode_put_cookie().
>
>  (*) v9fs_cache_inode_reset_cookie(), which appears unused.
>
> From the backtrace in:
>
> 	https://lore.kernel.org/linux-fsdevel/87czu45gcs.fsf@suse.de/
>
> the acquisition is being triggered in v9fs_vfs_atomic_open_dotl(), so it seems
> v9fs_qid_iget_dotl() already happened - which *should* have created the
> cookie.

So, from our last chat on IRC, we have the following happening:

v9fs_vfs_atomic_open_dotl
  v9fs_vfs_lookup
    v9fs_get_new_inode_from_fid
      v9fs_inode_from_fid_dotl
        v9fs_qid_iget_dotl

At this point, iget5_locked() gets called with the test function set to
v9fs_test_new_inode_dotl(), which *always* returns 0.  It's still not
clear to me why commit ed80fcfac256 ("fs/9p: Always ask new inode in
create") introduced this behavior, but even if that's not correct, we
still have a race in the cookie handling, right?
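
The matching callback in question is tiny; in fs/9p/vfs_inode_dotl.c of
this era it is essentially:

```c
/* Passed to iget5_locked() on the create path.  Returning 0 means "no
 * existing inode ever matches", so iget5_locked() always allocates a
 * fresh inode - even while an old inode with the same i_ino is still
 * being evicted, which is the race discussed here. */
static int v9fs_test_new_inode_dotl(struct inode *inode, void *data)
{
	/* Don't match even if the qid matches; always get a new inode. */
	return 0;
}
```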

I'm still seeing:

CPU0                     CPU1
v9fs_drop_inode          ...
v9fs_evict_inode         /* atomic_open */
                         v9fs_cache_inode_get_cookie <= COLLISION
fscache_relinquish

So, the question remains: would it be possible to do the relinquish
earlier (->drop_inode)?  Or is 9p really shooting itself in the foot by
forcing iget5_locked() to always create a new inode here?

Cheers,
-- 
Luis


* Re: What sort of inode state does ->evict_inode() expect to see? [was Re: 9p: fscache duplicate cookie]
  2021-05-14 16:10           ` Luis Henriques
@ 2021-05-14 21:16             ` Dominique Martinet
  2021-05-17 15:56               ` Luis Henriques
  2021-05-17 17:39               ` Aneesh Kumar K.V
  0 siblings, 2 replies; 23+ messages in thread
From: Dominique Martinet @ 2021-05-14 21:16 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: Luis Henriques, David Howells, Al Viro, Eric Van Hensbergen,
	Latchesar Ionkov, linux-fsdevel, v9fs-developer

Hi Aneesh,

I'm going to rely on your memory here... A long, long time ago (2011!),
you've authored this commit:
-------
commit ed80fcfac2565fa866d93ba14f0e75de17a8223e
Author: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Date:   Wed Jul 6 16:32:31 2011 +0530

    fs/9p: Always ask new inode in create
    
    This make sure we don't end up reusing the unlinked inode object.
    The ideal way is to use inode i_generation. But i_generation is
    not available in userspace always.
    
    Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
    Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
-------

Do you happen to remember or know *why* you wanted to make sure we don't
reuse the unlinked inode object?

I'm asking because that's causing problems with (at least) the fscache
cookie: iget5_locked() gets us a new inode in v9fs_qid_iget_dotl(),
which tries to acquire a new cookie before the evict has happened and
relinquished the former inode's cookie.
There are also coherency problems in sight -- evict is also in charge of
flushing all dirty pages, so the new inode can in theory issue IO and
read from the server data which has been written by another process on
the same client; and while 9p isn't known for coherency with multiple
clients, it's a different story with a single one! (didn't reproduce
that one)

Anyway, it'd be great to know why you did that so we can try another
workaround.
In theory I'd love to have qemu and others export fsid + a fhandle from
name_to_handle_at that includes i_generation and full inode number in
the qid path, but we're limited by the 64 bits of the protocol so it's a
tough one... In practice I don't see i_generation being used all that much
by filesystems to reuse inode numbers though, so I'm wondering which is
the most problematic?



You can find the rest of the thread here if you're not subscribed to
v9fs-developer or linux-fsdevel:
https://lkml.kernel.org/r/87czu45gcs.fsf@suse.de



Luis Henriques wrote on Fri, May 14, 2021 at 05:10:56PM +0100:
> So, from our last chat on IRC, we have the following happening:
> 
> v9fs_vfs_atomic_open_dotl
>   v9fs_vfs_lookup
>     v9fs_get_new_inode_from_fid
>       v9fs_inode_from_fid_dotl
>         v9fs_qid_iget_dotl
> 
> At this point, iget5_locked() gets called with the test function set to
> v9fs_test_new_inode_dotl(), which *always* returns 0.  It's still not
> clear to me why commit ed80fcfac256 ("fs/9p: Always ask new inode in
> create") has introduced this behavior but even if that's not correct, we
> still have a race regarding cookies handling, right?
> 
> I'm still seeing:
> 
> CPU0                     CPU1
> v9fs_drop_inode          ...
> v9fs_evict_inode         /* atomic_open */
>                          v9fs_cache_inode_get_cookie <= COLLISION
> fscache_relinquish

Do you mean you still have that problem after ed80fcfac256 has been
reverted?

> So, the question remains: would it be possible to do the relinquish
> earlier (->drop_inode)?  Or is 9p really shooting itself in the foot by
> forcing iget5_locked() to always create a new inode here?

Ugh, that is the kind of thing I don't want to experiment with...
->drop_inode() seems to be called with i_lock taken and is meant to be
just a test, not something that can wait, but from what I'm reading it
might be possible to set I_WILL_FREE, drop the lock, do our stuff and
reacquire the lock at the end... perhaps? It looks like inode lookup will
just loop around in ilookup5_nowait() while I_WILL_FREE is set, so new
inodes can't be taken at this point; it's a de-facto spin lock with
iget5_locked() and friends.
I have no idea what will break though; I'd really rather leave it to the
VFS and have 9p do the right thing with inode recycling.
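
To make the idea concrete, a very rough sketch of that ->drop_inode()
dance (untested and quite possibly wrong - exactly the kind of experiment
being warned against above; the locking expectations of ->drop_inode()
would need careful checking):

```c
/* i_lock is held on entry and must be held again on return.  The hope is
 * that I_WILL_FREE makes concurrent iget5_locked() callers spin in
 * ilookup5_nowait() until the cookie is gone, closing the
 * duplicate-cookie window before a new inode can acquire one. */
static int v9fs_drop_inode(struct inode *inode)
{
	inode->i_state |= I_WILL_FREE;
	spin_unlock(&inode->i_lock);

	v9fs_cache_inode_put_cookie(inode);	/* may sleep */

	spin_lock(&inode->i_lock);
	inode->i_state &= ~I_WILL_FREE;
	return 1;	/* evict now, as 9p does when not caching */
}
```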

-- 
Dominique


* Re: What sort of inode state does ->evict_inode() expect to see? [was Re: 9p: fscache duplicate cookie]
  2021-05-14 21:16             ` Dominique Martinet
@ 2021-05-17 15:56               ` Luis Henriques
  2021-05-17 17:39               ` Aneesh Kumar K.V
  1 sibling, 0 replies; 23+ messages in thread
From: Luis Henriques @ 2021-05-17 15:56 UTC (permalink / raw)
  To: Dominique Martinet
  Cc: Aneesh Kumar K.V, David Howells, Al Viro, Eric Van Hensbergen,
	Latchesar Ionkov, linux-fsdevel, v9fs-developer

Dominique Martinet <asmadeus@codewreck.org> writes:

> Hi Aneesh,
>
> I'm going to rely on your memory here... A long, long time ago (2011!),
> you authored this commit:
> -------
> commit ed80fcfac2565fa866d93ba14f0e75de17a8223e
> Author: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
> Date:   Wed Jul 6 16:32:31 2011 +0530
>
>     fs/9p: Always ask new inode in create
>     
>     This make sure we don't end up reusing the unlinked inode object.
>     The ideal way is to use inode i_generation. But i_generation is
>     not available in userspace always.
>     
>     Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
>     Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
> -------
>
> Do you happen to remember or know *why* you wanted to make sure we don't
> reuse the unlinked inode object?
>
> I'm asking because that's causing problems with (at least) fscache
> cookie, iget5_locked() gets us a new inode in v9fs_qid_iget_dotl()
> and tries to get a new cookie before the evict has happened and
> relinquished the former inode's.
> There's also problems with coherency in sight -- evict is also in charge
> of flushing all dirty pages, so the new inode can in theory issue IO and
> read from server data which has been written by another process on the
> same client and while 9p isn't known for coherency with multiple clients
> it's a different story with a single one! (didn't reproduce that one)
>
> Anyway, it'd be great to know why you did that so we can try another
> workaround.
> In theory I'd love to have qemu and others export fsid + a fhandle from
> name_to_handle_at that includes i_generation and full inode number in
> the qid path, but we're limited by the 64bits of the protocol so it's a
> tough one... In practice I don't see generation being used all that much
> by filesystems to reuse inode numbers though, so wondering which is the
> most problematic?
>
>
>
> You can find the rest of the thread here if you're not subscribed to
> v9fs-developer or linux-fsdevel:
> https://lkml.kernel.org/r/87czu45gcs.fsf@suse.de
>
>
>
> Luis Henriques wrote on Fri, May 14, 2021 at 05:10:56PM +0100:
>> So, from our last chat on IRC, we have the following happening:
>> 
>> v9fs_vfs_atomic_open_dotl
>>   v9fs_vfs_lookup
>>     v9fs_get_new_inode_from_fid
>>       v9fs_inode_from_fid_dotl
>>         v9fs_qid_iget_dotl
>> 
>> At this point, iget5_locked() gets called with the test function set to
>> v9fs_test_new_inode_dotl(), which *always* returns 0.  It's still not
>> clear to me why commit ed80fcfac256 ("fs/9p: Always ask new inode in
>> create") has introduced this behavior but even if that's not correct, we
>> still have a race regarding cookies handling, right?
>> 
>> I'm still seeing:
>> 
>> CPU0                     CPU1
>> v9fs_drop_inode          ...
>> v9fs_evict_inode         /* atomic_open */
>>                          v9fs_cache_inode_get_cookie <= COLLISION
>> fscache_relinquish
>
> Do you mean you still have that problem after ed80fcfac256 has been
> reverted?

No, I couldn't reproduce the issue after changing v9fs_qid_iget_dotl() to
never use v9fs_test_new_inode_dotl in the iget5_locked() test.  (So,
technically I didn't revert that commit, but the effect should be the
same.)
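
To illustrate what that changes, here's a toy userspace model of the
iget5_locked() hash walk (simplified made-up structures, not the kernel
API): the test callback decides whether an existing inode counts as a
match, so a test that always returns 0 means a fresh inode is always
allocated, even while the old one for the same qid still holds its
fscache cookie.

```c
#include <stddef.h>
#include <stdlib.h>

/* Toy inode hash bucket; not the kernel's real structures. */
struct m_inode {
	unsigned long qid_path;		/* stands in for the 9p qid.path key */
	struct m_inode *next;
};

typedef int (*test_fn)(struct m_inode *, void *);

/* Roughly like v9fs_test_inode_dotl(): match on the key. */
static int test_match(struct m_inode *inode, void *data)
{
	return inode->qid_path == *(unsigned long *)data;
}

/* Roughly like v9fs_test_new_inode_dotl(): never match. */
static int test_never(struct m_inode *inode, void *data)
{
	(void)inode; (void)data;
	return 0;
}

/* Crude iget5_locked(): return an existing match or allocate a new
 * inode (which would be I_NEW in the kernel). */
static struct m_inode *m_iget5(struct m_inode **bucket, test_fn test,
			       unsigned long key)
{
	struct m_inode *ip;

	for (ip = *bucket; ip; ip = ip->next)
		if (test(ip, &key))
			return ip;	/* reuse existing inode */

	ip = calloc(1, sizeof(*ip));	/* always a new inode otherwise */
	ip->qid_path = key;
	ip->next = *bucket;
	*bucket = ip;
	return ip;
}
```

With test_match, two lookups for the same qid return the same object;
with test_never the second lookup always allocates a duplicate, and that
duplicate is exactly what races with the old inode's still-live cookie.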

>> So, the question remains: would it be possible to do the relinquish
>> earlier (->drop_inode)?  Or is 9p really shooting itself in the foot by
>> forcing iget5_locked() to always create a new inode here?
>
> Ugh, that is the kind of thing I don't want to experiment with...
> ->drop_inode() seems to be called with i_lock taken and is meant to be
> just a test, not something that can wait, but from what I'm reading it
> might be possible to set I_WILL_FREE, drop the lock, do our stuff and
> reacquire the lock at the end... perhaps? It looks like inode lookup
> will just loop around on ilookup5_nowait while I_WILL_FREE is set, so
> new inodes can't be taken at this point; it's a de-facto spin lock with
> iget5_locked and friends.
> I have no idea what will break though, I'd really rather leave it to the
> vfs and have 9p do the right thing with inode recycling.

Right, the fscache code would definitely need to be changed in this case.
Which makes me wonder if the following would be possible/acceptable:

Add a function fscache_relinquish_cookie_begin() that would set
FSCACHE_COOKIE_RELINQUISHING (currently unused AFAICS) in the cookie
flags.  This would allow ->drop_inode() to signal that a cookie is about
to be relinquished.

Then, fscache_hash_cookie() could do something like:

 	hlist_bl_lock(h);
 	hlist_bl_for_each_entry(cursor, p, h, hash_link) {
-		if (fscache_compare_cookie(candidate, cursor) == 0)
+		if ((fscache_compare_cookie(candidate, cursor) == 0) &&
+		    (!test_bit(FSCACHE_COOKIE_RELINQUISHING, &cursor->flags)))
 			goto collision;
 	}

Yeah, I'm sure the logic in fscache_acquire_cookie() would need to be
reworked and the old cookie would need to be removed from the list so that
the new one could be inserted.  And probably other places in the fscache
code would need to take the _RELINQUISHING flag into account.  But I guess
something like this could fix the race (although I'm not familiar with the
fscache code and this idea may be utterly wrong).
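
Just to make the idea concrete, here is a toy userspace model of that
hash walk (hypothetical types and flag name, keeping only the bits the
duplicate check looks at):

```c
#include <stddef.h>

#define M_COOKIE_RELINQUISHING 0x1UL

/* Toy cookie: just a key, flags and a hash-chain link. */
struct m_cookie {
	unsigned long key;
	unsigned long flags;
	struct m_cookie *next;
};

/* Model of fscache_hash_cookie() with the proposed check: insert the
 * candidate unless a *live* duplicate exists; a cookie that is being
 * relinquished no longer counts as a collision. */
static int m_hash_cookie(struct m_cookie **chain, struct m_cookie *candidate)
{
	struct m_cookie *cursor;

	for (cursor = *chain; cursor; cursor = cursor->next) {
		if (cursor->key == candidate->key &&
		    !(cursor->flags & M_COOKIE_RELINQUISHING))
			return -1;	/* genuine duplicate: collision */
	}
	candidate->next = *chain;
	*chain = candidate;
	return 0;
}
```

i.e. a cookie that compares equal but is on its way out would be
skipped, so the new inode could acquire its cookie before the old one
has finished being relinquished.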

Cheers,
-- 
Luis


* Re: What sort of inode state does ->evict_inode() expect to see? [was Re: 9p: fscache duplicate cookie]
  2021-05-14 21:16             ` Dominique Martinet
  2021-05-17 15:56               ` Luis Henriques
@ 2021-05-17 17:39               ` Aneesh Kumar K.V
  1 sibling, 0 replies; 23+ messages in thread
From: Aneesh Kumar K.V @ 2021-05-17 17:39 UTC (permalink / raw)
  To: Dominique Martinet
  Cc: Luis Henriques, David Howells, Al Viro, Eric Van Hensbergen,
	Latchesar Ionkov, linux-fsdevel, v9fs-developer

On 5/15/21 2:46 AM, Dominique Martinet wrote:
> Hi Aneesh,
> 
> I'm going to rely on your memory here... A long, long time ago (2011!),
> you authored this commit:
> -------
> commit ed80fcfac2565fa866d93ba14f0e75de17a8223e
> Author: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
> Date:   Wed Jul 6 16:32:31 2011 +0530
> 
>      fs/9p: Always ask new inode in create
>      
>      This make sure we don't end up reusing the unlinked inode object.
>      The ideal way is to use inode i_generation. But i_generation is
>      not available in userspace always.
>      
>      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
>      Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
> -------
> 
> Do you happen to remember or know *why* you wanted to make sure we don't
> reuse the unlinked inode object?
> 
>

Sorry, I don't recall all the details, so some of what follows may not 
be correct. I did look at the archives to see whether there was any 
email discussion around the change, and found the related thread below.

https://lore.kernel.org/lkml/1310402460-5098-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com/


IIRC, this was to avoid picking up the wrong inode from the hash? So a 
create is a new object and we want to actually avoid any comparison? 
Hence pass a test function that always returns false?

-aneesh


end of thread, other threads:[~2021-05-17 17:40 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-05-06 10:03 9p: fscache duplicate cookie Luis Henriques
2021-05-06 10:45 ` Dominique Martinet
2021-05-06 12:18   ` Luis Henriques
2021-05-07 16:36     ` Luis Henriques
2021-05-08  0:47       ` Dominique Martinet
2021-05-10 10:54         ` Luis Henriques
2021-05-10 11:47           ` Luis Henriques
2021-05-11 12:44           ` David Howells
2021-05-12 10:10             ` Luis Henriques
2021-05-11 12:53         ` David Howells
2021-05-11 12:38 ` David Howells
2021-05-12 10:07   ` Luis Henriques
2021-05-12 11:04   ` David Howells
2021-05-12 11:58     ` Luis Henriques
2021-05-12 12:26       ` Dominique Martinet
2021-05-12 12:57       ` What sort of inode state does ->evict_inode() expect to see? [was Re: 9p: fscache duplicate cookie] David Howells
2021-05-12 13:45         ` Al Viro
2021-05-12 14:12         ` David Howells
2021-05-14 16:10           ` Luis Henriques
2021-05-14 21:16             ` Dominique Martinet
2021-05-17 15:56               ` Luis Henriques
2021-05-17 17:39               ` Aneesh Kumar K.V
2021-05-12 11:09   ` 9p: fscache duplicate cookie David Howells
