From: Gregory Farnum <gfarnum@redhat.com>
To: Jeff Layton <jlayton@kernel.org>
Cc: Xiubo Li <xiubli@redhat.com>, Ilya Dryomov <idryomov@gmail.com>,
	Patrick Donnelly <pdonnell@redhat.com>,
	ceph-devel <ceph-devel@vger.kernel.org>
Subject: Re: [PATCH] ceph: try to reconnect to the export targets
Date: Tue, 17 Aug 2021 11:48:13 -0700	[thread overview]
Message-ID: <CAJ4mKGZPvHCmda5=8vB+N5YU37zGGXwHTukGwAGaQrBRiAiPRA@mail.gmail.com> (raw)
In-Reply-To: <6b29b8b523fa3e45635b02c455dfecd5a0cf9b88.camel@kernel.org>

On Tue, Aug 17, 2021 at 11:14 AM Jeff Layton <jlayton@kernel.org> wrote:
> On Tue, 2021-08-17 at 10:56 -0700, Gregory Farnum wrote:
> > While I agree we don’t want to crash on unexpected input from the
> > network, if we are tossing out a map we need to shut down the mount as
> > well. If we think the system metadata is invalid, that’s not really a
> > recoverable condition and continuing to do IO is a mistake from the
> > whole-system perspective — either the server has failed horribly or
> > there’s something the client doesn’t understand which may be critical
> > to correctness; either way there's a big problem with the basic system
> > operation. (I mean, if we hit this point obviously the server has
> > failed horribly since we should have gated it, but it may have failed
> > horribly in some non-code-logic fashion.)
> > -Greg
> >
>
> I see this as essentially the same as any other parsing error in the
> mdsmap. When we hit one of those, we currently just do this:
>
>     pr_err("error decoding fsmap\n");
>
> ...and soldier on. It's been this way since the beginning, afaict.

Oh. That's, uh, interesting.
I mean, you're right, this case isn't any more special. I just didn't
know that's how the kernel client handles it. (The userspace client
inherits the usual userspace decode logic and any accompanying
asserts.)
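
The two decode-failure policies being contrasted here can be sketched roughly as follows. This is hypothetical userspace C, not the actual ceph code; `decode_mdsmap`, `handle_map`, and the policy enum are illustrative stand-ins.

```c
#include <stdio.h>
#include <stddef.h>

/* Illustrative sketch: contrast the two decode-failure policies
 * described above.  A userspace client can treat bad metadata as
 * fatal (assert); a kernel client must not crash, so it logs the
 * error and carries on with its old map.
 */
enum decode_policy { POLICY_ASSERT, POLICY_LOG_AND_CONTINUE };

/* Pretend decoder: returns -1 on a malformed map. */
static int decode_mdsmap(const unsigned char *buf, size_t len)
{
	(void)buf;
	if (len < 4)	/* stand-in for a real version/length check */
		return -1;
	return 0;
}

/* Returns 0 if the caller may keep using its old map,
 * -1 if the failure is treated as fatal. */
static int handle_map(const unsigned char *buf, size_t len,
		      enum decode_policy policy)
{
	if (decode_mdsmap(buf, len) == 0)
		return 0;
	if (policy == POLICY_ASSERT)
		return -1;	/* userspace: assert/abort path */
	fprintf(stderr, "error decoding fsmap\n");	/* kernel: pr_err() */
	return 0;		/* ...and soldier on */
}
```
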

> If we want to do something more involved there, then that could probably
> be done, but it's not as simple as throwing a switch. We may have open
> files and dirty data to deal with. We do have some code to deal with
> attempting to reconnect after a blacklist event, so you might be able to
> treat this similarly.

Hmm, my guess is this only happens if the MDS is spewing nonsense out
over its pipe, or we've made a logic error and let a client join
across a non-backwards-compatible encoding/feature change. I think we
probably just start throwing EIO and don't try to remount, rather than
going for anything more polite. *shrug*
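
A minimal sketch of that "start throwing EIO" idea: once the metadata is deemed invalid, set a sticky flag on the client and fail every subsequent operation fast, with no recovery attempt. The struct and function names here are illustrative, not the real ceph ones.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical sketch: fail all I/O with -EIO once the system
 * metadata has been judged invalid, rather than trying to remount
 * or otherwise recover. */
struct client_state {
	bool metadata_invalid;	/* set once a map fails to decode */
};

static void mark_metadata_invalid(struct client_state *cl)
{
	cl->metadata_invalid = true;
}

/* Every operation checks the flag first; after it is set, nothing
 * gets through. */
static int client_do_io(struct client_state *cl,
			const void *buf, size_t len)
{
	if (cl->metadata_invalid)
		return -EIO;
	(void)buf;
	return (int)len;	/* pretend the I/O succeeded */
}
```

The point of the sticky flag is that there is no path back: unlike the blocklist-reconnect code Jeff mentions, nothing ever clears it short of unmounting.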

> In any case, this would be a pretty unusual situation, and I don't see
> us having the manpower to spend on coding up an elegant solution to this
> potential problem anytime soon. It might be worth opening a tracker for
> it though if that changes in the future.

Makes sense. Ticket done: https://tracker.ceph.com/issues/52303
-Greg

>
>
> > >
> > > > +                     export_targets[m_info->export_targets[i]] = 1;
> > > > +             }
> > > > +     }
> > > > +
> > > >       for (i = 0; i < oldmap->possible_max_rank && i < mdsc->max_sessions; i++) {
> > > >               if (!mdsc->sessions[i])
> > > >                       continue;
> > > > @@ -4242,6 +4253,8 @@ static void check_new_map(struct ceph_mds_client *mdsc,
> > > >               if (s->s_state == CEPH_MDS_SESSION_RESTARTING &&
> > > >                   newstate >= CEPH_MDS_STATE_RECONNECT) {
> > > >                       mutex_unlock(&mdsc->mutex);
> > > > +                     if (export_targets)
> > > > +                             export_targets[i] = 0;
> > > >                       send_mds_reconnect(mdsc, s);
> > > >                       mutex_lock(&mdsc->mutex);
> > > >               }
> > > > @@ -4264,6 +4277,47 @@ static void check_new_map(struct ceph_mds_client *mdsc,
> > > >               }
> > > >       }
> > > >
> > > > +     for (i = 0; i < newmap->possible_max_rank; i++) {
> > >
> > > The condition on this loop is slightly different from the one below it,
> > > and I'm not sure why. Should this also be checking this?
> > >
> > >     i < newmap->possible_max_rank && i < mdsc->max_sessions
> > >
> > > ...do we need to look at export targets where i >= mdsc->max_sessions ?
> > >
> > > > +             if (!export_targets)
> > > > +                     break;
> > > > +
> > > > +             /*
> > > > +              * Only open and reconnect sessions that don't
> > > > +              * exist yet.
> > > > +              */
> > > > +             if (!export_targets[i] || __have_session(mdsc, i))
> > > > +                     continue;
> > > > +
> > > > +             /*
> > > > +              * If the exporting MDS crashed just after
> > > > +              * flushing the EImportStart journal entry,
> > > > +              * then when a standby MDS takes over and
> > > > +              * replays that journal, the new MDS will
> > > > +              * wait for the client to reconnect, but the
> > > > +              * client may never have registered/opened a
> > > > +              * session with it yet.
> > > > +              *
> > > > +              * Try to reconnect to such an MDS if it is
> > > > +              * listed in the export targets and is in the
> > > > +              * RECONNECT state.
> > > > +              */
> > > > +             newstate = ceph_mdsmap_get_state(newmap, i);
> > > > +             if (newstate != CEPH_MDS_STATE_RECONNECT)
> > > > +                     continue;
> > > > +             s = __open_export_target_session(mdsc, i);
> > > > +             if (IS_ERR(s)) {
> > > > +                     err = PTR_ERR(s);
> > > > +                     pr_err("failed to open export target session, err %d\n",
> > > > +                            err);
> > > > +                     continue;
> > > > +             }
> > > > +             dout("send reconnect to target mds.%d\n", i);
> > > > +             mutex_unlock(&mdsc->mutex);
> > > > +             send_mds_reconnect(mdsc, s);
> > > > +             mutex_lock(&mdsc->mutex);
> > > > +             ceph_put_mds_session(s);
> > >
> > > Suppose we end up in this part of the code, and we have to drop the
> > > mdsc->mutex like this. What ensures that an earlier session in the array
> > > won't end up going back into CEPH_MDS_STATE_RECONNECT before we can get
> > > into the loop below? This looks racy.
> > >
> > > > +     }
> > > > +
> > > >       for (i = 0; i < newmap->possible_max_rank && i < mdsc->max_sessions; i++) {
> > > >               s = mdsc->sessions[i];
> > > >               if (!s)
> > > > @@ -4278,6 +4332,8 @@ static void check_new_map(struct ceph_mds_client *mdsc,
> > > >                       __open_export_target_sessions(mdsc, s);
> > > >               }
> > > >       }
> > > > +
> > > > +     kfree(export_targets);
> > > >  }
> > > >
> > > >
> > >
> > > --
> > > Jeff Layton <jlayton@kernel.org>
> > >
> >
>
> --
> Jeff Layton <jlayton@kernel.org>
>


Thread overview: 6+ messages
2021-08-12  4:10 [PATCH] ceph: try to reconnect to the export targets xiubli
2021-08-16 12:05 ` Jeff Layton
2021-08-17  3:03   ` Xiubo Li
2021-08-17 17:56   ` Gregory Farnum
2021-08-17 18:14     ` Jeff Layton
2021-08-17 18:48       ` Gregory Farnum [this message]
