All of lore.kernel.org
 help / color / mirror / Atom feed
* Client: what is supposed to protect racing readdirs and unlinks?
@ 2015-07-14 17:11 Gregory Farnum
  2015-07-14 18:36 ` Sage Weil
  2015-07-15 12:57 ` Yan, Zheng
  0 siblings, 2 replies; 4+ messages in thread
From: Gregory Farnum @ 2015-07-14 17:11 UTC (permalink / raw)
  To: Sage Weil, 严正; +Cc: ceph-devel

I spent a bunch of today looking at http://tracker.ceph.com/issues/12297.

Long story short: the workload is doing a readdir at the same time as
it's unlinking files. The readdir functions (in this case,
_readdir_cache_cb) drop the client_lock each time they invoke the
callback (for obvious reasons). There is some effort in
_readdir_cache_cb to try and keep the iterator valid (we check on each
loop that we aren't at end; we increment the iterator before dropping
the lock), but it's not sufficient.

Is there supposed to be something preventing this kind of race? If not
I can work something out in the code but I've not done much work in
that bit and there are enough pieces that I wonder if I'm missing some
other issue.
-Greg

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: Client: what is supposed to protect racing readdirs and unlinks?
  2015-07-14 17:11 Client: what is supposed to protect racing readdirs and unlinks? Gregory Farnum
@ 2015-07-14 18:36 ` Sage Weil
       [not found]   ` <CAC6JEv9i2J_XAyDdWZRe1RsN9VNaikaKtBiAYMb_YEvRSV3w0A@mail.gmail.com>
  2015-07-15 12:57 ` Yan, Zheng
  1 sibling, 1 reply; 4+ messages in thread
From: Sage Weil @ 2015-07-14 18:36 UTC (permalink / raw)
  To: Gregory Farnum; +Cc: 严正, ceph-devel

On Tue, 14 Jul 2015, Gregory Farnum wrote:
> I spent a bunch of today looking at http://tracker.ceph.com/issues/12297.
> 
> Long story short: the workload is doing a readdir at the same time as
> it's unlinking files. The readdir functions (in this case,
> _readdir_cache_cb) drop the client_lock each time they invoke the
> callback (for obvious reasons). There is some effort in
> _readdir_cache_cb to try and keep the iterator valid (we check on each
> loop that we aren't at end; we increment the iterator before dropping
> the lock), but it's not sufficient.
> 
> Is there supposed to be something preventing this kind of race? If not
> I can work something out in the code but I've not done much work in
> that bit and there are enough pieces that I wonder if I'm missing some
> other issue.

What is the race you're worried about?  Unlinking the file that we're 
doing the callback on, or the one that follows it (where the iterator now 
points)?

My guess is that in this case unlink should see that there is a reference 
on the dentry and should make it NULL instead of unlinking it from the 
directory entirely...

sage

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: Client: what is supposed to protect racing readdirs and unlinks?
       [not found]   ` <CAC6JEv9i2J_XAyDdWZRe1RsN9VNaikaKtBiAYMb_YEvRSV3w0A@mail.gmail.com>
@ 2015-07-14 19:18     ` Sage Weil
  0 siblings, 0 replies; 4+ messages in thread
From: Sage Weil @ 2015-07-14 19:18 UTC (permalink / raw)
  To: Gregory Farnum; +Cc: ceph-devel, 严正

[-- Attachment #1: Type: TEXT/PLAIN, Size: 2277 bytes --]

On Tue, 14 Jul 2015, Gregory Farnum wrote:
> The following dentry. When it gets unlinked it also gets removed from the
> xlist we're using to traverse the directory contents. We could add some kind
> of refcounting to try and make it do a NULL or similat but I don't think
> anything like that exists yet.

Yeah that sounds likely.  I think that's the fix, though.. and I have some 
recollection of conditions where we favor making the dentry null instead 
of removing it, so hopefully it's not to involved.

It's probably a better behavior anyway for caching reasons, in case the 
directory isn't complete?

> (Sorry this is probably going to get bounced from the list, I'm on my phone
> and haven't found an html-free way to send from it...)

My reply should go through at least :)

sage

> 
> 
> On Tue, Jul 14, 2015, 7:36 PM Sage Weil <sweil@redhat.com> wrote:
>       On Tue, 14 Jul 2015, Gregory Farnum wrote:
>       > I spent a bunch of today looking at
>       http://tracker.ceph.com/issues/12297.
>       >
>       > Long story short: the workload is doing a readdir at the same
>       time as
>       > it's unlinking files. The readdir functions (in this case,
>       > _readdir_cache_cb) drop the client_lock each time they invoke
>       the
>       > callback (for obvious reasons). There is some effort in
>       > _readdir_cache_cb to try and keep the iterator valid (we check
>       on each
>       > loop that we aren't at end; we increment the iterator before
>       dropping
>       > the lock), but it's not sufficient.
>       >
>       > Is there supposed to be something preventing this kind of
>       race? If not
>       > I can work something out in the code but I've not done much
>       work in
>       > that bit and there are enough pieces that I wonder if I'm
>       missing some
>       > other issue.
> 
>       What is the race you're worried about?  Unlinking the file that
>       we're
>       doing the callback on, or the one that follows it (where the
>       iterator now
>       points)?
> 
>       My guess is that in this case unlink should see that there is a
>       reference
>       on the dentry and should make it NULL instead of unlinking it
>       from the
>       directory entirely...
> 
>       sage
> 
> 
> 

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: Client: what is supposed to protect racing readdirs and unlinks?
  2015-07-14 17:11 Client: what is supposed to protect racing readdirs and unlinks? Gregory Farnum
  2015-07-14 18:36 ` Sage Weil
@ 2015-07-15 12:57 ` Yan, Zheng
  1 sibling, 0 replies; 4+ messages in thread
From: Yan, Zheng @ 2015-07-15 12:57 UTC (permalink / raw)
  To: Gregory Farnum; +Cc: Sage Weil, 严正, ceph-devel

On Wed, Jul 15, 2015 at 1:11 AM, Gregory Farnum <greg@gregs42.com> wrote:
> I spent a bunch of today looking at http://tracker.ceph.com/issues/12297.
>
> Long story short: the workload is doing a readdir at the same time as
> it's unlinking files. The readdir functions (in this case,
> _readdir_cache_cb) drop the client_lock each time they invoke the
> callback (for obvious reasons). There is some effort in
> _readdir_cache_cb to try and keep the iterator valid (we check on each
> loop that we aren't at end; we increment the iterator before dropping
> the lock), but it's not sufficient.
>
> Is there supposed to be something preventing this kind of race? If not
> I can work something out in the code but I've not done much work in
> that bit and there are enough pieces that I wonder if I'm missing some
> other issue.

I think calling (*pd)->get() before release the client_lock should work.

Regards
Yan, Zheng

> -Greg
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2015-07-15 12:57 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-07-14 17:11 Client: what is supposed to protect racing readdirs and unlinks? Gregory Farnum
2015-07-14 18:36 ` Sage Weil
     [not found]   ` <CAC6JEv9i2J_XAyDdWZRe1RsN9VNaikaKtBiAYMb_YEvRSV3w0A@mail.gmail.com>
2015-07-14 19:18     ` Sage Weil
2015-07-15 12:57 ` Yan, Zheng

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.