All of lore.kernel.org
 help / color / mirror / Atom feed
From: Al Viro <viro@ZenIV.linux.org.uk>
To: Mike Marshall <hubcap@omnibond.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	Stephen Rothwell <sfr@canb.auug.org.au>
Subject: Re: Orangefs ABI documentation
Date: Mon, 8 Feb 2016 23:35:35 +0000	[thread overview]
Message-ID: <20160208233535.GC17997@ZenIV.linux.org.uk> (raw)
In-Reply-To: <CAOg9mSR1-7xOZCC=8dbx+zfHDep4NyUZm0e5XBDQzpCFaCNH7Q@mail.gmail.com>

On Mon, Feb 08, 2016 at 05:26:53PM -0500, Mike Marshall wrote:

> My impression is that dbench is more likely to fail ungracefully
> if I signal the client-core to abort during the execute phase, and
> more likely to complete normally if I signal the client-core to abort
> during warmup (or cleanup, which removes the directory tree
> built during warmup).
> 
> I'll do more tests tomorrow with more debug turned on, and see if
> I can get some idea of what makes dbench so ill... the most important
> thing is that the kernel doesn't crash, but it would be gravy if user
> processes could better withstand a client-core recycle.
> 
> Here's the bufmap debug output, I didn't want to send 700k
> of mostly "orangefs_bufmap_copy_from_iovec" to the list:
> 
> http://myweb.clemson.edu/~hubcap/out

>From the look of it, quite a few are of "getattr after create has failed
due to restart" - AFAICS, that has nothing to do with slot-related issues,
but readv/writev failures with -EINTR *are* slot-related (in addition to
the problem with EINTR handling in callers of wait_for_direct_io() mentioned
upthread)...

> grepping for "finalize" in all that noise is a good way to see
> where client-core restarts happened. I ran dbench numerous
> times, and managed to signal the client-core to restart during
> the same run several times...

FWIW, restart treatment is definitely still not right - we *do* want to
wait for bufmap to run down before trying to claim the slot on restart.
The problems with the current code are:
	* more than one process holding slots during the daemon restart =>
possible for the first one to get through dropping the slot and into
requesting the new one before the second one manages to drop its slot.
Restarted daemon won't be able to insert its bufmap in that case.
	* one process holding a slot during restart => very likely EIO,
since it ends up dropping the last reference to old bufmap and finds
NULL __orangefs_bufmap if it gets there before the new daemon.
	* lack of wait for daemon restart anyplace reachable (the one in
service_operation() can be reached only if we are about to do double-free,
already having bufmap refcounts buggered; with that crap fixed, you are
not hitting that logics anymore).

I think I have a solution for the slot side of that, but I'm still not happy
with the use of low-level waitqueue primitives in that sucker, which is almost
always a Very Bad Sign(tm).

I'm not sure what to do with operations that go into restart directly in
service_operation() - it looks like it ought to wait for op_timeout_secs
and if the damn things isn't back *and* managed to service the operation
by that time, we are screwed.  Note that this
                if (op_state_purged(op)) {      
                        ret = (op->attempts < ORANGEFS_PURGE_RETRY_COUNT) ?
                                 -EAGAIN :
                                 -EIO;
                        gossip_debug(GOSSIP_WAIT_DEBUG,
                                     "*** %s:"
                                     " operation purged (tag "
                                     "%llu, %p, att %d)\n",
                                     __func__,
                                     llu(op->tag),
                                     op,
                                     op->attempts);
                        orangefs_clean_up_interrupted_operation(op);
                        break;
                }
is hit only if the daemon got stopped after we'd submitted the operation;
if it had been stopped before entering service_operation() and new one hadn't
managed to catch up with requests in that interval, no attempts to resubmit
are made.

And AFAICS
        if (is_daemon_in_service() < 0) {
                /*
                 * By incrementing the per-operation attempt counter, we
                 * directly go into the timeout logic while waiting for
                 * the matching downcall to be read
                 */
                gossip_debug(GOSSIP_WAIT_DEBUG,
                             "%s:client core is NOT in service(%d).\n",
                             __func__,
                             is_daemon_in_service());
                op->attempts++;
        }
is inherently racy.  Suppose the daemon gets stopped during the window
between that check and the moment request is added to the queue.  If it had
been stopped past that window, op would've been purged; if it had been
stopped before the check, we get ->attempts bumped, but if it happens _within_
that window, in wait_for_matching_downcall() you'll be getting to
                /*
                 * if this was our first attempt and client-core
                 * has not purged our operation, we are happy to
                 * simply wait
                 */
                if (op->attempts == 0 && !op_state_purged(op)) {
                        spin_unlock(&op->lock);
                        schedule();
with nothing to wake you up until the new daemon gets started.  Sure, it's
an interruptible sleep, so you can always kill -9 the sucker, but it still
looks wrong.

  reply	other threads:[~2016-02-08 23:35 UTC|newest]

Thread overview: 111+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-01-15 21:46 Orangefs ABI documentation Mike Marshall
2016-01-22  7:11 ` Al Viro
2016-01-22 11:09   ` Mike Marshall
2016-01-22 16:59     ` Mike Marshall
2016-01-22 17:08       ` Al Viro
2016-01-22 17:40         ` Mike Marshall
2016-01-22 17:43         ` Al Viro
2016-01-22 18:17           ` Mike Marshall
2016-01-22 18:37             ` Al Viro
2016-01-22 19:07               ` Mike Marshall
2016-01-22 19:21                 ` Mike Marshall
2016-01-22 20:04                   ` Al Viro
2016-01-22 20:30                     ` Mike Marshall
2016-01-23  0:12                       ` Al Viro
2016-01-23  1:28                         ` Al Viro
2016-01-23  2:54                           ` Mike Marshall
2016-01-23 19:10                             ` Al Viro
2016-01-23 19:24                               ` Mike Marshall
2016-01-23 21:35                                 ` Mike Marshall
2016-01-23 22:05                                   ` Al Viro
2016-01-23 21:40                                 ` Al Viro
2016-01-23 22:36                                   ` Mike Marshall
2016-01-24  0:16                                     ` Al Viro
2016-01-24  4:05                                       ` Al Viro
2016-01-24 22:12                                         ` Mike Marshall
2016-01-30 17:22                                           ` Al Viro
2016-01-26 19:52                                         ` Martin Brandenburg
2016-01-30 17:34                                           ` Al Viro
2016-01-30 18:27                                             ` Al Viro
2016-02-04 23:30                                               ` Mike Marshall
2016-02-06 19:42                                                 ` Al Viro
2016-02-07  1:38                                                   ` Al Viro
2016-02-07  3:53                                                     ` Al Viro
2016-02-07 20:01                                                       ` [RFC] bufmap-related wait logics (Re: Orangefs ABI documentation) Al Viro
2016-02-08 22:26                                                       ` Orangefs ABI documentation Mike Marshall
2016-02-08 23:35                                                         ` Al Viro [this message]
2016-02-09  3:32                                                           ` Al Viro
2016-02-09 14:34                                                             ` Mike Marshall
2016-02-09 17:40                                                               ` Al Viro
2016-02-09 21:06                                                                 ` Al Viro
2016-02-09 22:25                                                                   ` Mike Marshall
2016-02-11 23:36                                                                   ` Mike Marshall
2016-02-09 22:02                                                                 ` Mike Marshall
2016-02-09 22:16                                                                   ` Al Viro
2016-02-09 22:40                                                                     ` Al Viro
2016-02-09 23:13                                                                       ` Al Viro
2016-02-10 16:44                                                                         ` Al Viro
2016-02-10 21:26                                                                           ` Al Viro
2016-02-11 23:54                                                                           ` Mike Marshall
2016-02-12  0:55                                                                             ` Al Viro
2016-02-12 12:13                                                                               ` Mike Marshall
2016-02-11  0:44                                                                         ` Al Viro
2016-02-11  3:22                                                                           ` Mike Marshall
2016-02-12  4:27                                                                             ` Al Viro
2016-02-12 12:26                                                                               ` Mike Marshall
2016-02-12 18:00                                                                                 ` Martin Brandenburg
2016-02-13 17:18                                                                                   ` Mike Marshall
2016-02-13 17:47                                                                                     ` Al Viro
2016-02-14  2:56                                                                                       ` Al Viro
2016-02-14  3:46                                                                                         ` [RFC] slot allocator - waitqueue use review needed (Re: Orangefs ABI documentation) Al Viro
2016-02-14  4:06                                                                                           ` Al Viro
2016-02-16  2:12                                                                                           ` Al Viro
2016-02-16 19:28                                                                                             ` Al Viro
2016-02-14 22:31                                                                                         ` Orangefs ABI documentation Mike Marshall
2016-02-14 23:43                                                                                           ` Al Viro
2016-02-15 17:46                                                                                             ` Mike Marshall
2016-02-15 18:45                                                                                               ` Al Viro
2016-02-15 22:32                                                                                                 ` Martin Brandenburg
2016-02-15 23:04                                                                                                   ` Al Viro
2016-02-16 23:15                                                                                                     ` Mike Marshall
2016-02-16 23:36                                                                                                       ` Al Viro
2016-02-16 23:54                                                                                                         ` Al Viro
2016-02-17 19:24                                                                                                           ` Mike Marshall
2016-02-17 20:11                                                                                                             ` Al Viro
2016-02-17 21:17                                                                                                               ` Al Viro
2016-02-17 22:24                                                                                                                 ` Mike Marshall
2016-02-17 22:40                                                                                                             ` Martin Brandenburg
2016-02-17 23:09                                                                                                               ` Al Viro
2016-02-17 23:15                                                                                                                 ` Al Viro
2016-02-18  0:04                                                                                                                   ` Al Viro
2016-02-18 11:11                                                                                                                     ` Al Viro
2016-02-18 18:58                                                                                                                       ` Mike Marshall
2016-02-18 19:20                                                                                                                         ` Al Viro
2016-02-18 19:49                                                                                                                         ` Martin Brandenburg
2016-02-18 20:08                                                                                                                           ` Mike Marshall
2016-02-18 20:22                                                                                                                             ` Mike Marshall
2016-02-18 20:38                                                                                                                               ` Mike Marshall
2016-02-18 20:52                                                                                                                                 ` Al Viro
2016-02-18 21:50                                                                                                                                   ` Mike Marshall
2016-02-19  0:25                                                                                                                                     ` Al Viro
2016-02-19 22:11                                                                                                                                       ` Mike Marshall
2016-02-19 22:22                                                                                                                                         ` Al Viro
2016-02-20 12:14                                                                                                                                           ` Mike Marshall
2016-02-20 13:36                                                                                                                                             ` Al Viro
2016-02-22 16:20                                                                                                                                               ` Mike Marshall
2016-02-22 21:22                                                                                                                                                 ` Mike Marshall
2016-02-23 21:58                                                                                                                                                   ` Mike Marshall
2016-02-26 20:21                                                                                                                                                     ` Mike Marshall
2016-02-19 22:32                                                                                                                                         ` Al Viro
2016-02-19 22:45                                                                                                                                           ` Martin Brandenburg
2016-02-19 22:50                                                                                                                                           ` Martin Brandenburg
2016-02-18 20:49                                                                                                                               ` Al Viro
2016-02-15 22:47                                                                                                 ` Mike Marshall
2016-01-23 22:46                                   ` write() semantics (Re: Orangefs ABI documentation) Al Viro
2016-01-23 23:35                                     ` Linus Torvalds
2016-03-03 22:25                                       ` Mike Marshall
2016-03-04 20:55                                         ` Mike Marshall
2016-01-22 20:51                     ` Orangefs ABI documentation Mike Marshall
2016-01-22 23:53                       ` Mike Marshall
2016-01-22 19:54                 ` Al Viro
2016-01-22 19:50             ` Al Viro

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20160208233535.GC17997@ZenIV.linux.org.uk \
    --to=viro@zeniv.linux.org.uk \
    --cc=hubcap@omnibond.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=sfr@canb.auug.org.au \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.