From: Mark Fasheh via Ocfs2-devel <ocfs2-devel@oss.oracle.com>
To: "heming.zhao@suse.com" <heming.zhao@suse.com>
Cc: ocfs2-devel@oss.oracle.com
Subject: Re: [Ocfs2-devel] [PATCH 3/4] re-enable "ocfs2: mount shared volume without ha stack"
Date: Thu, 4 Aug 2022 21:11:53 -0700
Message-ID: <CAGe7X7=F16yaowhTrLgSGcS4gsGUbtjBgEamfur5n1C0hgRA=Q@mail.gmail.com>
In-Reply-To: <CAGe7X7n7J52w3Jn5Sz5=f2AQohLMK5xK2Pjsd9oPAMSn52+SrQ@mail.gmail.com>

On Thu, Aug 4, 2022 at 4:53 PM Mark Fasheh <mark@fasheh.com> wrote:
> 2) Should we allow the user to bypass our cluster checks?
>
> On this question I'm still a 'no'. I simply haven't seen enough
> evidence to warrant such a drastic change in policy. Allowing it via
> mount option too just feels extremely error-prone. I think we need to
> explore alternative avenues to helping the user out here. As you noted
> in your followup, a single node config is entirely possible in
> pacemaker (I've run that config myself). Why not provide an easy way
> for the user to drop down to that sort of a config? I know that's kind
> of pushing responsibility for this to the cluster stack, but that's
> where it belongs in the first place.
>
> Another option might be an 'observer mode' mount, where the node
> participates in the cluster (and the file system locking) but purely
> in a read-only fashion.

Thinking about this some more... The only way that this works without
potential corruptions is if we always write a periodic mmp sequence,
even in clustered mode (which might mean each node writes to its own
sector). That way tunefs can always check the disk for a mounted node,
even without a cluster stack up. If tunefs sees anyone writing
sequences to the disk, it can safely fail the operation. Tunefs would
also have to write an mmp sequence of its own once it has determined that
the disk is not mounted. It could also write some flag alongside the
sequence that says 'tunefs is working on this disk'. If a cluster
mount comes up and sees a live sequence with that flag, it will know
to fail the mount request as the disk is being modified. Local mounts
can also use this to ensure that they are the only mounted node.
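
To make the checks above concrete, here is a rough sketch of the
decision logic in Python. It is not ocfs2 or tunefs.ocfs2 code: the
Slot layout, TUNEFS_FLAG, and the read/wait callables are all
hypothetical names made up for illustration, and it assumes a writer
counts as "live" when its sequence changes between two reads taken more
than one heartbeat interval apart.

```python
from dataclasses import dataclass
from typing import Callable, Dict

TUNEFS_FLAG = 0x1  # hypothetical flag: "tunefs is working on this disk"


@dataclass(frozen=True)
class Slot:
    seq: int        # sequence number that a live writer keeps bumping
    flags: int = 0  # hypothetical per-slot flags


Snapshot = Dict[int, Slot]  # slot number -> slot contents


def live_slots(read: Callable[[], Snapshot],
               wait: Callable[[], None]) -> Snapshot:
    """Return slots whose sequence changed (or appeared) between two reads."""
    before = read()
    wait()  # assumed to be longer than the writers' heartbeat interval
    after = read()
    return {n: s for n, s in after.items()
            if n not in before or s.seq != before[n].seq}


def tunefs_may_proceed(read, wait) -> bool:
    """tunefs fails the operation if anyone is writing sequences."""
    return not live_slots(read, wait)


def mount_may_proceed(read, wait, local: bool) -> bool:
    """A mount fails on a live tunefs flag; a local mount needs no live slots."""
    live = live_slots(read, wait)
    if any(s.flags & TUNEFS_FLAG for s in live.values()):
        return False  # disk is being modified by tunefs
    if local and live:
        return False  # a local mount must be the only mounted node
    return True
```

A stale slot (flag set, but sequence not advancing) is deliberately
ignored here, since only a live sequence indicates an active writer.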

As it turns out, we already do pretty much all of the sequence writing
for the o2cb cluster stack - check out cluster/heartbeat.c.
If memory serves, tunefs.ocfs2 has code to write to this heartbeat
area as well. For o2cb, we use the disk heartbeat to detect node
liveness, and to kill our local node if we see disk timeouts. For
pcmk, we shouldn't take either of these actions, as neither is our
responsibility. Under pcmk, the heartbeating would be purely for mount
protection checks.
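
The split in responsibilities could be written down roughly as follows.
This is only an illustration in Python; Stack, HeartbeatDuties, and
duties_for are hypothetical names, not anything in the ocfs2 tree.

```python
from dataclasses import dataclass
from enum import Enum


class Stack(Enum):
    O2CB = "o2cb"
    PCMK = "pcmk"


@dataclass(frozen=True)
class HeartbeatDuties:
    write_sequence: bool         # periodic sequence write for mount protection
    track_node_liveness: bool    # use the heartbeat to decide who is alive
    self_fence_on_timeout: bool  # kill the local node on disk timeouts


def duties_for(stack: Stack) -> HeartbeatDuties:
    # Every stack writes sequences; only o2cb acts on what it reads back,
    # because under pcmk membership and fencing belong to the cluster stack.
    owns_membership = stack is Stack.O2CB
    return HeartbeatDuties(
        write_sequence=True,
        track_node_liveness=owns_membership,
        self_fence_on_timeout=owns_membership,
    )
```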

The downside to this is that all nodes would be heartbeating to the
disk on a regular interval, not just one. To be fair, this is exactly
how o2cb works, and with the correct timeout choices we were able to
avoid a measurable performance impact. In any case, this might
have to be a small price the user pays for cluster-aware mount
protection.

Let me know what you think.

Thanks,
  --Mark


