All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] enable mds rejoin with active inodes' old parent xattrs
@ 2013-08-22  6:40 Alexandre Oliva
  2013-08-22  8:27 ` Yan, Zheng
  0 siblings, 1 reply; 6+ messages in thread
From: Alexandre Oliva @ 2013-08-22  6:40 UTC (permalink / raw)
  To: ceph-devel

When the parent xattrs of active inodes that the mds attempts to open
during rejoin lack pool info (struct_v < 5), this field will be filled
in with -1, causing the mds to retry fetching a backtrace with a pool
number that matches the expected value, which fails and causes the
err==-ENOENT branch to be taken and retry pool 1, which succeeds, but
with pool -1, and so keeps on bouncing between the two retry cases
forever.

This patch arranges for the mds to go along with pool -1 instead of
insisting that it be refetched, enabling it to complete recovery
instead of eating cpu, network bandwidth and metadata osd's resources
like there's no tomorrow, in what AFAICT is an infinite and very busy
loop.

This is not a new problem: I've had it even before upgrading from
Cuttlefish to Dumpling, I'd just never managed to track it down, and
force-unmounting the filesystem and then restarting the mds was an
easier (if inconvenient) work-around, particularly because it always
hit when the filesystem was under active, heavy-ish use (or there
wouldn't be much reason for caps recovery ;-)


There are two issues not addressed in this patch, however.  One is
that nothing seems to proactively update the parent xattr when it is
found to be outdated, so it remains out of date forever.  Not even
renaming top-level directories causes the xattrs to be recursively
rewritten.  AFAICT that's a bug.

The other is that inodes that don't have a parent xattr (created by
even older versions of ceph) are reported as non-existing in the mds
rejoin message, because the absence of the parent xattr is signaled as
a missing inode (“failed to reconnect caps for missing inodes”).  I
suppose this may cause more serious recovery problems.

I suppose a global pass over the filesystem tree updating parent
xattrs that are out-of-date would be desirable, if we find any parent
xattrs still lacking current information; it might make sense to
activate it as a background thread from the backtrace decoding
function, when it finds a parent xattr that's too out-of-date, or as a
separate client (ceph-fsck?).

Signed-off-by: Alexandre Oliva <oliva@gnu.org>
---
 src/mds/MDCache.cc |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/mds/MDCache.cc b/src/mds/MDCache.cc
index e592dde..b6c37aec 100644
--- a/src/mds/MDCache.cc
+++ b/src/mds/MDCache.cc
@@ -7940,7 +7940,7 @@ void MDCache::_open_ino_backtrace_fetched(inodeno_t ino, bufferlist& bl, int err
   inode_backtrace_t backtrace;
   if (err == 0) {
     ::decode(backtrace, bl);
-    if (backtrace.pool != info.pool) {
+    if (backtrace.pool != info.pool && backtrace.pool != -1) {
       dout(10) << " old object in pool " << info.pool
 	       << ", retrying pool " << backtrace.pool << dendl;
       info.pool = backtrace.pool;

-- 
Alexandre Oliva, freedom fighter    http://FSFLA.org/~lxoliva/
You must be the change you wish to see in the world. -- Gandhi
Be Free! -- http://FSFLA.org/   FSF Latin America board member
Free Software Evangelist      Red Hat Brazil Compiler Engineer
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2013-08-24 10:38 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-08-22  6:40 [PATCH] enable mds rejoin with active inodes' old parent xattrs Alexandre Oliva
2013-08-22  8:27 ` Yan, Zheng
2013-08-22 15:16   ` Sage Weil
2013-08-23 11:00   ` Alexandre Oliva
2013-08-23 20:11     ` Gregory Farnum
2013-08-24 10:38       ` Alexandre Oliva

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.