From: Mike Marshall <hubcap@omnibond.com>
To: Al Viro <viro@zeniv.linux.org.uk>
Cc: Martin Brandenburg <martin@omnibond.com>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	Stephen Rothwell <sfr@canb.auug.org.au>
Subject: Re: Orangefs ABI documentation
Date: Mon, 15 Feb 2016 12:46:51 -0500
Message-ID: <CAOg9mSSsztHyRsQ=BAA9FoEMo7XEdgTwYHosqvh4ZjykDapj2A@mail.gmail.com>
In-Reply-To: <20160214234312.GX17997@ZenIV.linux.org.uk>

I pushed the list_del up to the kernel.org for-next branch...

And I've been running tests with the CRUDE bandaid... weird
results...

No oopses, no WARN_ONs... I was running dbench and ls -R
or find and kill-minus-nining different ones of them with no
perceived resulting problems, so I moved on to signalling
the client-core to abort... it restarted numerous times,
and then stuff wedged up differently than I've seen before.

Usually I kill the client-core and it comes back (gets restarted)
as seen by the different PID:

# ps -ef | grep pvfs
root      1292  1185  7 11:39 ?        00:00:01 pvfs2-client-core
--child -a 60000 -n 60000 --logtype file -L /var/log/client.log
# kill -6 1292
# ps -ef | grep pvfs
root      1299  1185  8 11:40 ?        00:00:00 pvfs2-client-core
--child -a 60000 -n 60000 --logtype file -L /var/log/client.log

Until once, it didn't die, and the gorked up unkillable left-over thing's
argv[0] (or wherever this string gets scraped from) was goofy:

# ps -ef | grep pvfs
root      1324  1185  1 11:41 ?        00:00:02 pvfs2-client-core
--child -a 60000 -n 60000 --logtype file -L /var/log/client.log
[root@be1 hubcap]# kill -6 1324
[root@be1 hubcap]# ps -ef | grep pvfs
root      1324  1185  2 11:41 ?        00:00:05 [pvfs2-client-co]

The virtual host was pretty wedged up after that, I couldn't look
at anything interesting, and got a bunch of terminal windows hung
trying:

# strace -f -p 1324
Process 1324 attached
^C

^C^C
                                     .
                     ls -R's output was flowing out here
/pvfsmnt/tdir/z_really_long_disgustingly_long_super_long_file_name52
/pvfsmnt/tdir/z_really_long_disgustingly_long_super_long_file_name53



^C^C^C


[root@logtruck hubcap]# ssh be1
root@be1's password:
Last login: Mon Feb 15 11:33:42 2016 from logtruck.clemson.edu
[root@be1 ~]# df


I still had one functioning window, and looked at dmesg from there,
nothing interesting there... a couple of expected tag WARNINGS while I was
killing finds and dbenches... ioctls that happened during the
successful restarts of the client-core...

[  809.520966] client-core: opening device
[  809.521031] pvfs2-client-core: open device complete (ret = 0)
[  809.521050] dispatch_ioctl_command: client debug mask has been been
received :0: :0:
[  809.521068] dispatch_ioctl_command: client debug array string has
been received.
[  809.521070] orangefs_prepare_debugfs_help_string: start
[  809.521071] orangefs_prepare_cdm_array: start
[  809.521104] orangefs_prepare_cdm_array: rc:50:
[  809.521106] orangefs_prepare_debugfs_help_string: cdm_element_count:50:
[  809.521239] debug_mask_to_string: start
[  809.521242] debug_mask_to_string: string:none:
[  809.521243] orangefs_client_debug_init: start
[  809.521249] orangefs_client_debug_init: rc:0:
[  809.566652] dispatch_ioctl_command: got ORANGEFS_DEV_REMOUNT_ALL
[  809.566667] dispatch_ioctl_command: priority remount in progress
[  809.566668] dispatch_ioctl_command: priority remount complete
[  812.454255] orangefs_debug_open: orangefs_debug_disabled: 0
[  812.454294] orangefs_debug_open: rc: 0
[  812.454320] orangefs_debug_write: kernel-debug
[  812.454323] debug_string_to_mask: start
[  896.410522] WARNING: No one's waiting for tag 15612
[ 1085.339948] WARNING: No one's waiting for tag 127943
[ 1146.820485] orangefs: please confirm that pvfs2-client daemon is running.
[ 1146.820488] fs/orangefs/dir.c line 264: orangefs_readdir:
orangefs_readdir_index_get() failure (-5)
[ 1146.866812] dispatch_ioctl_command: client debug mask has been been
received :0: :0:
[ 1146.866834] dispatch_ioctl_command: client debug array string has
been received.
[ 1175.906800] dispatch_ioctl_command: client debug mask has been been
received :0: :0:
[ 1175.906817] dispatch_ioctl_command: client debug array string has
been received.
[ 1223.915862] dispatch_ioctl_command: client debug mask has been been
received :0: :0:
[ 1223.915880] dispatch_ioctl_command: client debug array string has
been received.
[ 1274.458852] dispatch_ioctl_command: client debug mask has been been
received :0: :0:
[ 1274.458870] dispatch_ioctl_command: client debug array string has
been received.
[root@be1 hubcap]#


ps aux shows every process' state as S except for 1324 which is
racking up time:

[hubcap@be1 ~]$ ps aux | grep pvfs2-client
root      1324 92.4  0.0      0     0 ?        R    11:41  46:29
[pvfs2-client-co]
[hubcap@be1 ~]$ ps aux | grep pvfs2-client
root      1324 92.4  0.0      0     0 ?        R    11:41  46:30
[pvfs2-client-co]

I'll virsh destroy this thing now <g>...

-Mike



On Sun, Feb 14, 2016 at 6:43 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> On Sun, Feb 14, 2016 at 05:31:10PM -0500, Mike Marshall wrote:
>> I added the list_del...
>>
>> Everything is very resilient, I killed
>> the client-core over and over while dbench
>> was running at the same time as  ls -R
>> was running, and the client-core always
>> restarted... until finally, it didn't. I guess
>> related to the state of just what was going on
>> at the time... Hit the WARN_ON in service_operation,
>> and then oopsed on the orangefs_bufmap_put
>> down at the end of wait_for_direct_io...
>
> Bloody hell...  I think I see what's going on, and presumably the newer
> slot allocator would fix that.  Look: closing control device (== daemon
> death) checks if we have a bufmap installed and drops a reference to
> it in that case.  The reason why it's conditional is that we might have
> not gotten around to installing one (it's done via ioctl on control
> device).  But ->release() does *NOT* wait for all references to go away!
> In other words, it's possible to restart the daemon while the old bufmap
> is still there.  Then have it killed after it has opened control devices
> and before the old bufmap has run down.  For ->release() it looks like
> we *have* gotten around to installing bufmap, and need the reference dropped.
> In reality, the reference acquired when we were installing that one has
> already been dropped, so we get double put.  With expected results...
>
> If below ends up fixing the symptoms, analysis above has a good chance to
> be correct.  This is no way to wait for rundown, of course - I'm not
> suggesting it as the solution, just as a way to narrow down what's going
> on.
>
> Incidentally, could you fold the list_del() part into offending commit
> (orangefs: delay freeing slot until cancel completes) and repush your
> for-next?
>
> diff --git a/fs/orangefs/devorangefs-req.c b/fs/orangefs/devorangefs-req.c
> index 6a7df12..630246d 100644
> --- a/fs/orangefs/devorangefs-req.c
> +++ b/fs/orangefs/devorangefs-req.c
> @@ -529,6 +529,9 @@ static int orangefs_devreq_release(struct inode *inode, struct file *file)
>         purge_inprogress_ops();
>         gossip_debug(GOSSIP_DEV_DEBUG,
>                      "pvfs2-client-core: device close complete\n");
> +       /* VERY CRUDE, NOT FOR MERGE */
> +       while (orangefs_get_bufmap_init())
> +               schedule_timeout(HZ);
>         open_access_count = 0;
>         mutex_unlock(&devreq_mutex);
>         return 0;
> diff --git a/fs/orangefs/orangefs-kernel.h b/fs/orangefs/orangefs-kernel.h
> index 41f8bb1f..1e28555 100644
> --- a/fs/orangefs/orangefs-kernel.h
> +++ b/fs/orangefs/orangefs-kernel.h
> @@ -261,6 +261,7 @@ static inline void set_op_state_purged(struct orangefs_kernel_op_s *op)
>  {
>         spin_lock(&op->lock);
>         if (unlikely(op_is_cancel(op))) {
> +               list_del(&op->list);
>                 spin_unlock(&op->lock);
>                 put_cancel(op);
>         } else {
