From: Gregory Farnum <greg@gregs42.com>
To: Sage Weil <sweil@redhat.com>
Cc: "ceph-devel@vger.kernel.org" <ceph-devel@vger.kernel.org>
Subject: Re: idempotent op (esp delete)
Date: Fri, 23 Jan 2015 14:13:02 -0800
Message-ID: <CAC6JEv8iKG=uBcojtZ-KwWVGjd75SdYm05YOwdtm_H2TZi1-2g@mail.gmail.com>
In-Reply-To: <alpine.DEB.2.00.1501231327570.10173@cobra.newdream.net>

On Fri, Jan 23, 2015 at 1:43 PM, Sage Weil <sweil@redhat.com> wrote:
> Background:
>
> 1) Way back when, we made a task that would thrash the cache modes by
> adding and removing the cache tier while ceph_test_rados was running.
> This mostly worked, but would occasionally fail because we would
>
>  - delete an object from the cache tier
>  - a network failure injection would lose the reply
>  - we'd disable the cache
>  - the delete would resend to the base tier, not get recognized as a dup
> (different pool, different pg log)
>    -> -ENOENT instead of 0
>
> 2) The proxy write code hits a similar problem:
>
>  - delete gets proxied
>  - we initiate async promote
>  - a network failure injection loses the delete reply
>  - delete resends and blocks on promote (or arrives after it finishes)
>  - promote finishes
>  - delete is handled
>   -> -ENOENT instead of 0
>
> The ticket is http://tracker.ceph.com/issues/8935
>
> The problem is partially addressed by
>
>         https://github.com/ceph/ceph/pull/3447
>
> by logging a few request ids on every object_info_t and preserving that on
> promote and flush.
>
> However, it doesn't solve the problem for delete because we
> throw out object_info_t so that reqid_t is lost.
>
> I think we have two options, not necessarily mutually exclusive:
>
> 1) When promoting an object that doesn't exist (to create a whiteout),
> pull reqids out of the base tier's pg log so that the whiteout is primed
> with request ids.
>
> 1.5) When flushing... well, that is harder because we have nowhere to put
> the reqids.  Unless we make a way to cram a list of reqid's into a single
> PG log entry...?  In that case, we wouldn't strictly need the per-object
> list since we could pile the base tier's reqids into the promote log entry
> in the cache tier.
>
> 2) Make delete idempotent (0 instead of ENOENT if the object doesn't
> exist).  This will require a delicate compat transition (let's ignore that
> a moment) but you can preserve the old behavior for callers that care by
> preceding the delete with an assert_exists op.  Most callers don't care,
> but a handful do.  This simplifies the semantics we need to support going
> forward.
>
> Of course, it's all a bit delicate.  The idempotent op semantics have a
> time horizon so it's all a bit wishy-washy... :/
>
> Thoughts?
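
To make failure case (1) above concrete, here is a minimal toy model in
Python, not Ceph code. The Pool class, its pg_log dict, and the reqid
string are all hypothetical stand-ins; the only point is that dup
detection is per-pool, so a resend that lands in a different pool is
treated as a brand new op. It assumes the delete has already been
flushed, so the base tier no longer holds the object.

```python
from dataclasses import dataclass, field

ENOENT = 2


@dataclass
class Pool:
    """Toy stand-in for one pool: its objects plus a log of completed reqids."""
    objects: set = field(default_factory=set)
    pg_log: dict = field(default_factory=dict)  # reqid -> result already returned

    def delete(self, reqid, name):
        # A resend is only recognized if THIS pool's log has seen the reqid.
        if reqid in self.pg_log:
            return self.pg_log[reqid]
        result = 0 if name in self.objects else -ENOENT
        self.objects.discard(name)
        self.pg_log[reqid] = result
        return result


cache = Pool(objects={"foo"})
base = Pool()  # assumption: the object is already gone from the base tier

first = cache.delete("client.1:42", "foo")       # 0, but the reply is lost
same_pool = cache.delete("client.1:42", "foo")   # 0, dup found in cache's log
other_pool = base.delete("client.1:42", "foo")   # -ENOENT, base never saw the reqid
```

Carrying the reqids across the tier boundary (options 1 and 1.5) amounts
to copying entries from one pool's log into the other's before the
resend arrives.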

Do we have other cases that we're worried about which would be
improved by maintaining reqids across pool cache transitions? I'm not
a big fan of maintaining those per-op lists (they sound really
expensive?), but if we need them for something else that's a point in
their favor.

We could make delete idempotent instead, and that's what I initially
favor, but it also seems a bit scary (it's not like our operations can
be made idempotent in general; lots of them invoke classes whose results
will differ, or whatever!), and I can't think of which callers might
care, so I'm having trouble formulating the bounds of this solution.
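
For reference, a minimal sketch of what option 2 would mean for callers,
again as a toy Python model rather than Ceph code. The run_ops helper
and the string op names are hypothetical; they only mimic a compound op
that stops at the first failing step. A bare delete returns 0 whether or
not the object exists, and a caller that wants the old -ENOENT puts
assert_exists in front of it.

```python
ENOENT = 2


def run_ops(objects, name, ops):
    """Apply a list of ops to one object, stopping at the first failure."""
    for op in ops:
        if op == "assert_exists":
            if name not in objects:
                return -ENOENT
        elif op == "delete":
            # Idempotent delete: a missing object is not an error.
            objects.discard(name)
        else:
            raise ValueError(f"unknown op: {op}")
    return 0


objects = {"foo"}
r1 = run_ops(objects, "foo", ["delete"])                   # 0, object removed
r2 = run_ops(objects, "foo", ["delete"])                   # 0 again on a resend
r3 = run_ops(objects, "foo", ["assert_exists", "delete"])  # -ENOENT, old behavior
```

In this toy model the bare delete no longer depends on dup detection at
all, while the assert_exists form still returns -ENOENT on a replay, so
callers that opt in to the old semantics keep the old problem.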
-Greg

