Date: Sat, 6 Feb 2016 19:42:10 +0000
From: Al Viro
To: Mike Marshall
Cc: Linus Torvalds, linux-fsdevel, Stephen Rothwell
Subject: Re: Orangefs ABI documentation

On Thu, Feb 04, 2016 at 06:30:26PM -0500, Mike Marshall wrote:

> As for the WARN_ONs, the waitqueue one is easy to hit when the
> client-core stops and restarts, you can see here where precopy_buffers
> started whining about the client-core, you can see that the client
> core restarted when the debug mask got sent back over, and then
> the WARN_ON in waitqueue gets hit:
>
> [ 1239.198976] precopy_buffers: Failed to copy-in buffers. Please make
> sure that the pvfs2-client is running. -14
> [ 1239.198979] precopy_buffers: Failed to copy-in buffers. Please make
> sure that the pvfs2-client is running. -14
> [ 1239.198983] orangefs_file_write_iter: do_readv_writev failed, rc:-14:.
> [ 1239.199175] precopy_buffers: Failed to copy-in buffers. Please make
> sure that the pvfs2-client is running. -14
> [ 1239.199177] precopy_buffers: Failed to copy-in buffers. Please make
> sure that the pvfs2-client is running. -14
> [ 1239.199180] orangefs_file_write_iter: do_readv_writev failed, rc:-14:.
> [ 1239.199601] precopy_buffers: Failed to copy-in buffers. Please make
> sure that the pvfs2-client is running. -14
> [ 1239.199602] precopy_buffers: Failed to copy-in buffers. Please make
> sure that the pvfs2-client is running. -14
> [ 1239.199604] orangefs_file_write_iter: do_readv_writev failed, rc:-14:.
> [ 1239.248239] dispatch_ioctl_command: client debug mask has been been
> received :0: :0:
> [ 1239.248257] dispatch_ioctl_command: client debug array string has
> been received.
> [ 1239.307842] ------------[ cut here ]------------
> [ 1239.307847] WARNING: CPU: 0 PID: 1347 at
> fs/orangefs/waitqueue.c:208 service_operation+0x59f/0x9b0()
> [ 1239.307848] Modules linked in: bnep bluetooth ip6t_rpfilter rfkill
> ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6
> nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack
> ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables
> ip6table_mangle ip6table_security ip6table_raw ip6table_filter
> ip6_tables iptable_mangle iptable_security iptable_raw ppdev parport_pc
> virtio_balloon virtio_console parport 8139too serio_raw pvpanic
> i2c_piix4 uinput qxl drm_kms_helper ttm drm 8139cp i2c_core
> virtio_pci ata_generic virtio virtio_ring mii pata_acpi
> [ 1239.307870] CPU: 0 PID: 1347 Comm: dbench Not tainted
> 4.4.0-161988-g237f828-dirty #49
> [ 1239.307871] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
> [ 1239.307872] 0000000000000000 0000000011eac412 ffff88003bf27cd0
> ffffffff8139c84d
> [ 1239.307874] ffff88003bf27d08 ffffffff8108e510 ffff880010968000
> ffff88001096c1d0
> [ 1239.307876] ffff88001096c188 00000000fffffff5 0000000000000000
> ffff88003bf27d18
> [ 1239.307877] Call Trace:
> [ 1239.307881] [] dump_stack+0x19/0x1c
> [ 1239.307884] [] warn_slowpath_common+0x80/0xc0
> [ 1239.307886] [] warn_slowpath_null+0x1a/0x20
> [ 1239.307887] [] service_operation+0x59f/0x9b0
> [ 1239.307889] [] ? prepare_to_wait_event+0x100/0x100
> [ 1239.307891] [] ? prepare_to_wait_event+0x100/0x100
> [ 1239.307893] [] orangefs_readdir+0x172/0xd50
> [ 1239.307895] [] ? trace_hardirqs_on+0xd/0x10
> [ 1239.307898] [] iterate_dir+0x9f/0x130
> [ 1239.307899] [] SyS_getdents+0xa0/0x140
> [ 1239.307900] [] ? iterate_dir+0x130/0x130
> [ 1239.307903] [] entry_SYSCALL_64_fastpath+0x12/0x76
> [ 1239.307904] ---[ end trace 66a9a15ad78b3dea ]---

Bloody hell.  OK, something's seriously fishy here.  You have
service_operation() called from orangefs_readdir().  It had been
immediately preceded by
	ret = orangefs_readdir_index_get(&bufmap, &buffer_index);
returning a success.  Moreover, *all* paths after return from
service_operation() pass through some instance of
	orangefs_readdir_index_put(bufmap, buffer_index);
(possibly hidden behind readdir_handle_dtor(bufmap, &rhandle)).
We'd bloody better _not_ have had bufmap freed by that point, right?

OTOH, this service_operation() has just seen orangefs_get_bufmap_init()
returning 0.  Since

int orangefs_get_bufmap_init(void)
{
	return __orangefs_bufmap ? 1 : 0;
}

we have observed NULL __orangefs_bufmap at that point.

__orangefs_bufmap is static, its address is never taken and the only
assignments to it are

static void
orangefs_bufmap_unref(struct orangefs_bufmap *bufmap)
{
	if (atomic_dec_and_lock(&bufmap->refcnt, &orangefs_bufmap_lock)) {
		__orangefs_bufmap = NULL;
		spin_unlock(&orangefs_bufmap_lock);

		orangefs_bufmap_unmap(bufmap);
		orangefs_bufmap_free(bufmap);
	}
}

and

	bufmap = orangefs_bufmap_alloc(user_desc);
	if (!bufmap)
		goto out;

	ret = orangefs_bufmap_map(bufmap, user_desc);
	if (ret)
		goto out_free_bufmap;

	spin_lock(&orangefs_bufmap_lock);
	if (__orangefs_bufmap) {
		spin_unlock(&orangefs_bufmap_lock);
		gossip_err("orangefs: error: bufmap already initialized.\n");
		ret = -EALREADY;
		goto out_unmap_bufmap;
	}
	__orangefs_bufmap = bufmap;
	spin_unlock(&orangefs_bufmap_lock);

The latter is definitely *not* making it NULL - that's daemon asking to
install a new one.  And the former is unconditionally followed by
kfree(bufmap) (via orangefs_bufmap_free()).
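(The ref side of that pair isn't quoted above; the body below is a
sketch of what it has to amount to, not a verbatim copy from
orangefs-bufmap.c, so the exact details may differ:

static struct orangefs_bufmap *orangefs_bufmap_ref(void)
{
	struct orangefs_bufmap *bufmap = NULL;

	/* grab the currently installed bufmap, if any, and bump its
	 * refcount while still under orangefs_bufmap_lock */
	spin_lock(&orangefs_bufmap_lock);
	if (__orangefs_bufmap) {
		bufmap = __orangefs_bufmap;
		atomic_inc(&bufmap->refcnt);
	}
	spin_unlock(&orangefs_bufmap_lock);
	return bufmap;
}

A non-NULL return means the caller holds a reference that is supposed
to keep the thing alive until the matching orangefs_bufmap_unref().)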
IOW, if this WARN_ON() is triggered, you've already gotten to freeing
the damn thing.  And subsequent crash appears to match that theory.
The really weird thing is that orangefs_readdir_index_get() increments
the refcount, so having hit that
	__orangefs_bufmap = NULL;
while we sat in service_operation() means that refcounting of bufmap
got broken somehow...

orangefs_bufmap_alloc() creates it with refcount 1 and is very shortly
followed either by orangefs_bufmap_free() or by __orangefs_bufmap
changing from NULL to the pointer to created instance.

orangefs_bufmap_ref() either returns NULL or non-NULL value equal to
__orangefs_bufmap, in the latter case having bumped its refcount.

orangefs_bufmap_unref() drops refcount on its argument and if it has
reached 0, clears __orangefs_bufmap and frees the argument.  That's a
bit of a red flag - either we are always called with argument equal to
__orangefs_bufmap, or we might end up with a leak...

Nothing else changes their refcounts.  In
orangefs_bufmap_{size,shift}_query() a successful orangefs_bufmap_ref()
is immediately followed by orangefs_bufmap_unref() (incidentally, both
might as well have just grabbed orangefs_bufmap_lock, checked
__orangefs_bufmap and picked its ->desc_{size,shift} before dropping
the lock - no need to mess with refcounts in those two; see the sketch
further down).

Remaining callers of orangefs_bufmap_ref() are orangefs_bufmap_get()
and orangefs_readdir_index_get().  Those begin with grabbing a
reference (and failing if __orangefs_bufmap was NULL) and eventually
either returning a zero and storing the acquired pointer in *mapp, or
dropping the reference and returning non-zero.  More serious red flag,
BTW - *mapp is still set in the latter case, which might end up with
confused callers...

Let's see... bufmap_get is called in wait_for_direct_io() and returning
a negative is followed by buggering off to

	if (buffer_index >= 0) {
		orangefs_bufmap_put(bufmap, buffer_index);
		gossip_debug(GOSSIP_FILE_DEBUG,
			     "%s(%pU): PUT buffer_index %d\n",
			     __func__, handle, buffer_index);
		buffer_index = -1;
	}

Uh-oh...
  1) can that thing return a positive value?
  2) can it end up storing something in buffer_index and still failing?
  3) can we get there with non-negative buffer_index?

OK, (1) and (2) translate into the same question for wait_for_a_slot(),
which returns 0, -EINTR or -ETIMEDOUT and only overwrites buffer_index
in case when it returns 0.  And (3)...  Fuck.  (3) is possible.  Look:

	if (ret == -EAGAIN && op_state_purged(new_op)) {
		orangefs_bufmap_put(bufmap, buffer_index);
		gossip_debug(GOSSIP_FILE_DEBUG,
			     "%s:going to repopulate_shared_memory.\n",
			     __func__);
		goto populate_shared_memory;
	}

in wait_for_direct_io() sends us to populate_shared_memory with
buffer_index unchanged despite the fact that we'd done
orangefs_bufmap_put() on it.  And there we hit

	/* get a shared buffer index */
	ret = orangefs_bufmap_get(&bufmap, &buffer_index);
	if (ret < 0) {
		gossip_debug(GOSSIP_FILE_DEBUG,
			     "%s: orangefs_bufmap_get failure (%ld)\n",
			     __func__, (long)ret);
		goto out;

with failure followed by _another_ orangefs_bufmap_put() around out:

OK, there's your buggered refcounting.  The minimal fix is to slap
	buffer_index = -1;
right after the orangefs_bufmap_put() in there.  The
orangefs_readdir_index_get() caller simply buggers off on failure - no
problem there, and longer term that's what I'd suggest doing in
wait_for_direct_io() as well.
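FWIW, the lock-based variant of the {size,shift} queries suggested in
the parenthesis above would be something along these lines (untested
sketch; it assumes the current int-returning signature and the
->desc_size field, with the shift variant being the obvious analogue):

int orangefs_bufmap_size_query(void)
{
	struct orangefs_bufmap *bufmap;
	int size = 0;

	/* no need to take and drop a reference just to read one field -
	 * peek at the installed bufmap under orangefs_bufmap_lock */
	spin_lock(&orangefs_bufmap_lock);
	bufmap = __orangefs_bufmap;
	if (bufmap)
		size = bufmap->desc_size;
	spin_unlock(&orangefs_bufmap_lock);
	return size;
}

That takes the refcount out of the picture for those two callers
entirely.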
Anyway, could you try this on top of your for-next and see if you can
reproduce either WARN_ON?

diff --git a/fs/orangefs/file.c b/fs/orangefs/file.c
index d865b58..40b3805 100644
--- a/fs/orangefs/file.c
+++ b/fs/orangefs/file.c
@@ -210,6 +210,7 @@ populate_shared_memory:
 	 */
 	if (ret == -EAGAIN && op_state_purged(new_op)) {
 		orangefs_bufmap_put(bufmap, buffer_index);
+		buffer_index = -1;
 		gossip_debug(GOSSIP_FILE_DEBUG,
 			     "%s:going to repopulate_shared_memory.\n",
 			     __func__);