All of lore.kernel.org
 help / color / mirror / Atom feed
From: Al Viro <viro@ZenIV.linux.org.uk>
To: Mike Marshall <hubcap@omnibond.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	Stephen Rothwell <sfr@canb.auug.org.au>
Subject: Re: Orangefs ABI documentation
Date: Sat, 6 Feb 2016 19:42:10 +0000	[thread overview]
Message-ID: <20160206194210.GX17997@ZenIV.linux.org.uk> (raw)
In-Reply-To: <CAOg9mSQOZP155TnuoJG_YFb19oU35cj8fHTxn8BfJCw+asP-CQ@mail.gmail.com>

On Thu, Feb 04, 2016 at 06:30:26PM -0500, Mike Marshall wrote:
> As for the WARN_ONs, the waitqueue one is easy to hit when the
> client-core stops and restarts, you can see here where precopy_buffers
> started whining about the client-core, you can see that the client
> core restarted when the debug mask got sent back over, and then
> the WARN_ON in waitqueue gets hit:
> 
> [ 1239.198976] precopy_buffers: Failed to copy-in buffers. Please make
> sure that  the pvfs2-client is running. -14
> [ 1239.198979] precopy_buffers: Failed to copy-in buffers. Please make
> sure that  the pvfs2-client is running. -14
> [ 1239.198983] orangefs_file_write_iter: do_readv_writev failed, rc:-14:.
> [ 1239.199175] precopy_buffers: Failed to copy-in buffers. Please make
> sure that  the pvfs2-client is running. -14
> [ 1239.199177] precopy_buffers: Failed to copy-in buffers. Please make
> sure that  the pvfs2-client is running. -14
> [ 1239.199180] orangefs_file_write_iter: do_readv_writev failed, rc:-14:.
> [ 1239.199601] precopy_buffers: Failed to copy-in buffers. Please make
> sure that  the pvfs2-client is running. -14
> [ 1239.199602] precopy_buffers: Failed to copy-in buffers. Please make
> sure that  the pvfs2-client is running. -14
> [ 1239.199604] orangefs_file_write_iter: do_readv_writev failed, rc:-14:.
> [ 1239.248239] dispatch_ioctl_command: client debug mask has been been
> received  :0: :0:
> [ 1239.248257] dispatch_ioctl_command: client debug array string has
> been receiv ed.
> [ 1239.307842] ------------[ cut here ]------------
> [ 1239.307847] WARNING: CPU: 0 PID: 1347 at
> fs/orangefs/waitqueue.c:208 service_ operation+0x59f/0x9b0()
> [ 1239.307848] Modules linked in: bnep bluetooth ip6t_rpfilter rfkill
> ip6t_REJEC T nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6
> nf_conntrack_ipv4 nf_defrag_ip v4 xt_conntrack nf_conntrack
> ebtable_nat ebtable_broute bridge stp llc ebtable_f ilter ebtables
> ip6table_mangle ip6table_security ip6table_raw ip6table_filter ip
> 6_tables iptable_mangle iptable_security iptable_raw ppdev parport_pc
> virtio_bal loon virtio_console parport 8139too serio_raw pvpanic
> i2c_piix4 uinput qxl drm_k ms_helper ttm drm 8139cp i2c_core
> virtio_pci ata_generic virtio virtio_ring mii  pata_acpi
> [ 1239.307870] CPU: 0 PID: 1347 Comm: dbench Not tainted
> 4.4.0-161988-g237f828-d irty #49
> [ 1239.307871] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
> [ 1239.307872]  0000000000000000 0000000011eac412 ffff88003bf27cd0
> ffffffff8139c 84d
> [ 1239.307874]  ffff88003bf27d08 ffffffff8108e510 ffff880010968000
> ffff88001096c 1d0
> [ 1239.307876]  ffff88001096c188 00000000fffffff5 0000000000000000
> ffff88003bf27 d18
> [ 1239.307877] Call Trace:
> [ 1239.307881]  [<ffffffff8139c84d>] dump_stack+0x19/0x1c
> [ 1239.307884]  [<ffffffff8108e510>] warn_slowpath_common+0x80/0xc0
> [ 1239.307886]  [<ffffffff8108e65a>] warn_slowpath_null+0x1a/0x20
> [ 1239.307887]  [<ffffffff812fe73f>] service_operation+0x59f/0x9b0
> [ 1239.307889]  [<ffffffff810c28b0>] ? prepare_to_wait_event+0x100/0x100
> [ 1239.307891]  [<ffffffff810c28b0>] ? prepare_to_wait_event+0x100/0x100
> [ 1239.307893]  [<ffffffff812fbd12>] orangefs_readdir+0x172/0xd50
> [ 1239.307895]  [<ffffffff810c9a2d>] ? trace_hardirqs_on+0xd/0x10
> [ 1239.307898]  [<ffffffff8120e4bf>] iterate_dir+0x9f/0x130
> [ 1239.307899]  [<ffffffff8120e9d0>] SyS_getdents+0xa0/0x140
> [ 1239.307900]  [<ffffffff8120e550>] ? iterate_dir+0x130/0x130
> [ 1239.307903]  [<ffffffff8178302f>] entry_SYSCALL_64_fastpath+0x12/0x76
> [ 1239.307904] ---[ end trace 66a9a15ad78b3dea ]---

Bloody hell.  OK, something's seriously fishy here.  You have
service_operation() called from orangefs_readdir().  It had been
immediately preceded by
        ret = orangefs_readdir_index_get(&bufmap, &buffer_index);
returning a success.  Moreover, *all* paths after return from
service_operation() pass through some instance of
	orangefs_readdir_index_put(bufmap, buffer_index);
(possibly hidden behind readdir_handle_dtor(bufmap, &rhandle)).
We'd bloody better _not_ have had bufmap freed by that point, right?

OTOH, this service_operation() has just seen orangefs_get_bufmap_init()
returning 0.  Since
int orangefs_get_bufmap_init(void) 
{ 
        return __orangefs_bufmap ? 1 : 0;
}
we have observed NULL __orangefs_bufmap at that point.  __orangefs_bufmap
is static, its address is never taken and the only assignments to it are
static void orangefs_bufmap_unref(struct orangefs_bufmap *bufmap)
{
        if (atomic_dec_and_lock(&bufmap->refcnt, &orangefs_bufmap_lock)) {
                __orangefs_bufmap = NULL;
                spin_unlock(&orangefs_bufmap_lock);

                orangefs_bufmap_unmap(bufmap);
                orangefs_bufmap_free(bufmap);
        }
}
and
        bufmap = orangefs_bufmap_alloc(user_desc);
        if (!bufmap)
                goto out;

        ret = orangefs_bufmap_map(bufmap, user_desc);
        if (ret)
                goto out_free_bufmap;

        spin_lock(&orangefs_bufmap_lock);
        if (__orangefs_bufmap) {
                spin_unlock(&orangefs_bufmap_lock);
                gossip_err("orangefs: error: bufmap already initialized.\n");
                ret = -EALREADY;
                goto out_unmap_bufmap;
        }
        __orangefs_bufmap = bufmap;
        spin_unlock(&orangefs_bufmap_lock);

The latter is definitely *not* making it NULL - that's daemon asking to
install a new one.  And the former is unconditionally followed by
kfree(bufmap) (via orangefs_bufmap_free()).

IOW, if this WARN_ON() is triggered, you've already gotten to freeing the
damn thing.  And subsequent crash appears to match that theory.

The really weird thing is that orangefs_readdir_index_get() increments the
refcount, so having hit that __orangefs_bufmap = NULL; while we sat in
service_operation() means that refcounting of bufmap got broken somehow...

orangefs_bufmap_alloc() creates it with refcount 1 and is very shortly followed
either by orangefs_bufmap_free() or by __orangefs_bufmap changing from NULL to
the pointer to created instance.

orangefs_bufmap_ref() either returns NULL or non-NULL value equal to
__orangefs_bufmap, in the latter case having bumped its refcount.

orangefs_bufmap_unref() drops refcount on its argument and if it has
reached 0, clears __orangefs_bufmap and frees the argument.  That's
a bit of a red flag - either we are always called with argument equal to
__orangefs_bufmap, or we might end up with a leak...

Nothing else changes their refcounts.  In orangefs_bufmap_{size,shift}_query()
a successful orangefs_bufmap_ref() is immediately followed by
orangefs_bufmap_unref() (incidentally, both might as well have just
grabbed orangefs_bufmap_lock, checked __orangefs_bufmap and picked its
->desc_{size,shift} before dropping the lock - no need to mess with refcounts
in those two).

Remaining callers of orangefs_bufmap_ref() are orangefs_bufmap_get() and
orangefs_readdir_index_get().  Those begin with grabbing a reference (and
failing if __orangefs_bufmap was NULL) and eventually either returning a
zero and storing the acquired pointer in *mapp, or dropping the reference
and returning non-zero.  More serious red flag, BTW - *mapp is still set
in the latter case, which might end up with confused callers...

Let's see... bufmap_get is called in wait_for_direct_io() and returning
a negative is followed by buggering off to
        if (buffer_index >= 0) {
                orangefs_bufmap_put(bufmap, buffer_index);
                gossip_debug(GOSSIP_FILE_DEBUG,
                             "%s(%pU): PUT buffer_index %d\n",
                             __func__, handle, buffer_index);
                buffer_index = -1;
        }
Uh-oh...
	1) can that thing return a positive value?
	2) can it end up storing something in buffer_index and still
failing?
	3) can we get there with non-negative buffer_index?

OK, (1) and (2) translate into the same question for wait_for_a_slot(),
which returns 0, -EINTR or -ETIMEDOUT and only overwrites buffer_index in
case when it returns 0.  And (3)...  Fuck. (3) is possible.  Look:
        if (ret == -EAGAIN && op_state_purged(new_op)) {
                orangefs_bufmap_put(bufmap, buffer_index);
                gossip_debug(GOSSIP_FILE_DEBUG,
                             "%s:going to repopulate_shared_memory.\n",
                             __func__);
                goto populate_shared_memory;
        }
in wait_for_direct_io() sends us to populate_shared_memory with buffer_index
unchanged despite the fact that we'd done orangefs_bufmap_put() on it.  And
there we hit
        /* get a shared buffer index */
        ret = orangefs_bufmap_get(&bufmap, &buffer_index);
        if (ret < 0) {
                gossip_debug(GOSSIP_FILE_DEBUG,
                             "%s: orangefs_bufmap_get failure (%ld)\n",
                             __func__, (long)ret);
                goto out;
with failure followed by _another_ orangefs_bufmap_put() around out:

OK, there's your buggered refcounting.  The minimal fix is to slap
buffer_index = -1; right after the orangefs_bufmap_put() in there.

orangefs_readdir_index_get() caller simply buggers off on failure.  No problem
there, and longer term that's what I'd suggest doing in wait_for_direct_io()
as well.  Anyway, could you try this on top of your for-next and see if you
can reproduce either WARN_ON?

diff --git a/fs/orangefs/file.c b/fs/orangefs/file.c
index d865b58..40b3805 100644
--- a/fs/orangefs/file.c
+++ b/fs/orangefs/file.c
@@ -210,6 +210,7 @@ populate_shared_memory:
 	 */
 	if (ret == -EAGAIN && op_state_purged(new_op)) {
 		orangefs_bufmap_put(bufmap, buffer_index);
+		buffer_index = -1;
 		gossip_debug(GOSSIP_FILE_DEBUG,
 			     "%s:going to repopulate_shared_memory.\n",
 			     __func__);

  reply	other threads:[~2016-02-06 19:42 UTC|newest]

Thread overview: 111+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-01-15 21:46 Orangefs ABI documentation Mike Marshall
2016-01-22  7:11 ` Al Viro
2016-01-22 11:09   ` Mike Marshall
2016-01-22 16:59     ` Mike Marshall
2016-01-22 17:08       ` Al Viro
2016-01-22 17:40         ` Mike Marshall
2016-01-22 17:43         ` Al Viro
2016-01-22 18:17           ` Mike Marshall
2016-01-22 18:37             ` Al Viro
2016-01-22 19:07               ` Mike Marshall
2016-01-22 19:21                 ` Mike Marshall
2016-01-22 20:04                   ` Al Viro
2016-01-22 20:30                     ` Mike Marshall
2016-01-23  0:12                       ` Al Viro
2016-01-23  1:28                         ` Al Viro
2016-01-23  2:54                           ` Mike Marshall
2016-01-23 19:10                             ` Al Viro
2016-01-23 19:24                               ` Mike Marshall
2016-01-23 21:35                                 ` Mike Marshall
2016-01-23 22:05                                   ` Al Viro
2016-01-23 21:40                                 ` Al Viro
2016-01-23 22:36                                   ` Mike Marshall
2016-01-24  0:16                                     ` Al Viro
2016-01-24  4:05                                       ` Al Viro
2016-01-24 22:12                                         ` Mike Marshall
2016-01-30 17:22                                           ` Al Viro
2016-01-26 19:52                                         ` Martin Brandenburg
2016-01-30 17:34                                           ` Al Viro
2016-01-30 18:27                                             ` Al Viro
2016-02-04 23:30                                               ` Mike Marshall
2016-02-06 19:42                                                 ` Al Viro [this message]
2016-02-07  1:38                                                   ` Al Viro
2016-02-07  3:53                                                     ` Al Viro
2016-02-07 20:01                                                       ` [RFC] bufmap-related wait logics (Re: Orangefs ABI documentation) Al Viro
2016-02-08 22:26                                                       ` Orangefs ABI documentation Mike Marshall
2016-02-08 23:35                                                         ` Al Viro
2016-02-09  3:32                                                           ` Al Viro
2016-02-09 14:34                                                             ` Mike Marshall
2016-02-09 17:40                                                               ` Al Viro
2016-02-09 21:06                                                                 ` Al Viro
2016-02-09 22:25                                                                   ` Mike Marshall
2016-02-11 23:36                                                                   ` Mike Marshall
2016-02-09 22:02                                                                 ` Mike Marshall
2016-02-09 22:16                                                                   ` Al Viro
2016-02-09 22:40                                                                     ` Al Viro
2016-02-09 23:13                                                                       ` Al Viro
2016-02-10 16:44                                                                         ` Al Viro
2016-02-10 21:26                                                                           ` Al Viro
2016-02-11 23:54                                                                           ` Mike Marshall
2016-02-12  0:55                                                                             ` Al Viro
2016-02-12 12:13                                                                               ` Mike Marshall
2016-02-11  0:44                                                                         ` Al Viro
2016-02-11  3:22                                                                           ` Mike Marshall
2016-02-12  4:27                                                                             ` Al Viro
2016-02-12 12:26                                                                               ` Mike Marshall
2016-02-12 18:00                                                                                 ` Martin Brandenburg
2016-02-13 17:18                                                                                   ` Mike Marshall
2016-02-13 17:47                                                                                     ` Al Viro
2016-02-14  2:56                                                                                       ` Al Viro
2016-02-14  3:46                                                                                         ` [RFC] slot allocator - waitqueue use review needed (Re: Orangefs ABI documentation) Al Viro
2016-02-14  4:06                                                                                           ` Al Viro
2016-02-16  2:12                                                                                           ` Al Viro
2016-02-16 19:28                                                                                             ` Al Viro
2016-02-14 22:31                                                                                         ` Orangefs ABI documentation Mike Marshall
2016-02-14 23:43                                                                                           ` Al Viro
2016-02-15 17:46                                                                                             ` Mike Marshall
2016-02-15 18:45                                                                                               ` Al Viro
2016-02-15 22:32                                                                                                 ` Martin Brandenburg
2016-02-15 23:04                                                                                                   ` Al Viro
2016-02-16 23:15                                                                                                     ` Mike Marshall
2016-02-16 23:36                                                                                                       ` Al Viro
2016-02-16 23:54                                                                                                         ` Al Viro
2016-02-17 19:24                                                                                                           ` Mike Marshall
2016-02-17 20:11                                                                                                             ` Al Viro
2016-02-17 21:17                                                                                                               ` Al Viro
2016-02-17 22:24                                                                                                                 ` Mike Marshall
2016-02-17 22:40                                                                                                             ` Martin Brandenburg
2016-02-17 23:09                                                                                                               ` Al Viro
2016-02-17 23:15                                                                                                                 ` Al Viro
2016-02-18  0:04                                                                                                                   ` Al Viro
2016-02-18 11:11                                                                                                                     ` Al Viro
2016-02-18 18:58                                                                                                                       ` Mike Marshall
2016-02-18 19:20                                                                                                                         ` Al Viro
2016-02-18 19:49                                                                                                                         ` Martin Brandenburg
2016-02-18 20:08                                                                                                                           ` Mike Marshall
2016-02-18 20:22                                                                                                                             ` Mike Marshall
2016-02-18 20:38                                                                                                                               ` Mike Marshall
2016-02-18 20:52                                                                                                                                 ` Al Viro
2016-02-18 21:50                                                                                                                                   ` Mike Marshall
2016-02-19  0:25                                                                                                                                     ` Al Viro
2016-02-19 22:11                                                                                                                                       ` Mike Marshall
2016-02-19 22:22                                                                                                                                         ` Al Viro
2016-02-20 12:14                                                                                                                                           ` Mike Marshall
2016-02-20 13:36                                                                                                                                             ` Al Viro
2016-02-22 16:20                                                                                                                                               ` Mike Marshall
2016-02-22 21:22                                                                                                                                                 ` Mike Marshall
2016-02-23 21:58                                                                                                                                                   ` Mike Marshall
2016-02-26 20:21                                                                                                                                                     ` Mike Marshall
2016-02-19 22:32                                                                                                                                         ` Al Viro
2016-02-19 22:45                                                                                                                                           ` Martin Brandenburg
2016-02-19 22:50                                                                                                                                           ` Martin Brandenburg
2016-02-18 20:49                                                                                                                               ` Al Viro
2016-02-15 22:47                                                                                                 ` Mike Marshall
2016-01-23 22:46                                   ` write() semantics (Re: Orangefs ABI documentation) Al Viro
2016-01-23 23:35                                     ` Linus Torvalds
2016-03-03 22:25                                       ` Mike Marshall
2016-03-04 20:55                                         ` Mike Marshall
2016-01-22 20:51                     ` Orangefs ABI documentation Mike Marshall
2016-01-22 23:53                       ` Mike Marshall
2016-01-22 19:54                 ` Al Viro
2016-01-22 19:50             ` Al Viro

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20160206194210.GX17997@ZenIV.linux.org.uk \
    --to=viro@zeniv.linux.org.uk \
    --cc=hubcap@omnibond.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=sfr@canb.auug.org.au \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.