From: Charles Hedrick <hedrick@rutgers.edu>
To: Timothy Pearson <tpearson@raptorengineering.com>
Cc: "J. Bruce Fields" <bfields@fieldses.org>,
	Chuck Lever <chuck.lever@oracle.com>,
	linux-nfs <linux-nfs@vger.kernel.org>
Subject: Re: CPU stall, eventual host hang with BTRFS + NFS under heavy load
Date: Mon, 9 Aug 2021 22:01:39 +0000
Message-ID: <EC809F03-464E-4F82-BB2B-DD62EF17C9AB@rutgers.edu>
In-Reply-To: <921953712.1074013.1628545770061.JavaMail.zimbra@raptorengineeringinc.com>

Yes, but the timing may be different: when a new file is created, inotify tells AMP about it, and AMP reads it immediately.
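To make the timing concern concrete, here is a minimal Python sketch of an inotify-driven reader that opens each file as soon as the kernel reports it created or closed after writing. It assumes Linux and calls inotify_init1/inotify_add_watch from libc via ctypes. The names open_watch, read_events and scan_new_files are hypothetical, and this is not a description of how AMP is actually implemented.

```python
import ctypes
import ctypes.util
import os
import struct

# Constants from <sys/inotify.h> (Linux). IN_NONBLOCK is defined as O_NONBLOCK.
IN_CLOSE_WRITE = 0x00000008
IN_CREATE = 0x00000100
IN_NONBLOCK = os.O_NONBLOCK

_libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)
# struct inotify_event header: int wd; uint32_t mask, cookie, len; then the name.
_EVENT_HEADER = struct.Struct("iIII")


def open_watch(directory, mask=IN_CREATE | IN_CLOSE_WRITE):
    """Hypothetical helper: return a non-blocking inotify fd watching `directory`."""
    fd = _libc.inotify_init1(IN_NONBLOCK)
    if fd < 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    if _libc.inotify_add_watch(fd, os.fsencode(directory), mask) < 0:
        err = ctypes.get_errno()
        os.close(fd)
        raise OSError(err, os.strerror(err))
    return fd


def read_events(fd):
    """Hypothetical helper: drain queued events as (mask, name) pairs."""
    try:
        buf = os.read(fd, 4096)
    except BlockingIOError:  # no events queued yet
        return []
    events, offset = [], 0
    while offset < len(buf):
        _wd, mask, _cookie, length = _EVENT_HEADER.unpack_from(buf, offset)
        offset += _EVENT_HEADER.size
        # The name field is NUL-padded to `length` bytes.
        name = os.fsdecode(buf[offset:offset + length].rstrip(b"\0"))
        offset += length
        events.append((mask, name))
    return events


def scan_new_files(directory, fd):
    """Hypothetical scanner step: read every file reported as created or written."""
    scanned = []
    for mask, name in read_events(fd):
        if not name or not mask & (IN_CREATE | IN_CLOSE_WRITE):
            continue
        try:
            with open(os.path.join(directory, name), "rb") as f:
                data = f.read()  # the "immediate read" right after creation
        except (FileNotFoundError, IsADirectoryError):
            continue
        scanned.append((name, len(data)))
    return scanned
```

On an NFS-exported directory, each file an nfsd thread creates would then be followed almost immediately by a local read of the same file, which is the mixed remote and local I/O pattern discussed in the quoted messages.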

> On Aug 9, 2021, at 5:49:30 PM, Timothy Pearson <tpearson@raptorengineering.com> wrote:
> 
> I'm not sure that is much different than the load patterns we end up generating, with mixed remote and local I/O.  I'd think that such a scenario is fairly typical, especially when factoring in backup processes.
> 
> ----- Original Message -----
>> From: "hedrick" <hedrick@rutgers.edu>
>> To: "Timothy Pearson" <tpearson@raptorengineering.com>
>> Cc: "J. Bruce Fields" <bfields@fieldses.org>, "Chuck Lever" <chuck.lever@oracle.com>, "linux-nfs"
>> <linux-nfs@vger.kernel.org>
>> Sent: Monday, August 9, 2021 3:54:17 PM
>> Subject: Re: CPU stall, eventual host hang with BTRFS + NFS under heavy load
> 
>> I just realized there’s one thing you should know. We run Cisco’s AMP for
>> Endpoints on the server. The goal is to detect malware that our users might put
>> on the file system. Typically one is worried about malware installed on clients,
>> but we’re concerned that developers may be using java and python libraries with
>> known issues, and those will commonly be stored on the server.
>> 
>> If AMP is doing its job, it will check most new files. I’m not sure whether that
>> creates atypical usage or not.
>> 
>>> On Aug 9, 2021, at 2:56:15 PM, Timothy Pearson <tpearson@raptorengineering.com>
>>> wrote:
>>> 
>>> Can confirm -- same general backtrace I sent in earlier.
>>> 
>>> That means the bug is:
>>> 1.) Not architecture specific
>>> 2.) Not filesystem specific
>>> 
>>> I was originally concerned it was BTRFS- or POWER-specific; good to
>>> see it is not.
>>> 
>>> ----- Original Message -----
>>>> From: "hedrick" <hedrick@rutgers.edu>
>>>> To: "J. Bruce Fields" <bfields@fieldses.org>
>>>> Cc: "Timothy Pearson" <tpearson@raptorengineering.com>, "Chuck Lever"
>>>> <chuck.lever@oracle.com>, "linux-nfs"
>>>> <linux-nfs@vger.kernel.org>
>>>> Sent: Monday, August 9, 2021 1:51:05 PM
>>>> Subject: Re: CPU stall, eventual host hang with BTRFS + NFS under heavy load
>>> 
>>>> I have. I was trying to avoid a reboot.
>>>> 
>>>> By the way, after the first failure, during reboot, syslog showed the following.
>>>> I’m unclear what it means, but it looks like it might be from the failure.
>>>> 
>>>> 
>>>> 
>>>>> On Aug 9, 2021, at 2:49 PM, J. Bruce Fields <bfields@fieldses.org> wrote:
>>>>> 
>>>>> On Mon, Aug 09, 2021 at 02:38:33PM -0400, hedrick@rutgers.edu wrote:
>>>>>> Does setting /proc/sys/fs/leases-enable to 0 work while the system is
>>>>>> up? I was expecting to see lslocks | grep DELE | wc go down. It’s not.
>>>>>> It’s staying around 1850.
>>>>> 
>>>>> All it should do is prevent giving out *new* delegations.
>>>>> 
>>>>> Best is to set that sysctl on system startup before nfsd starts.
>>>>> 
>>>>>>> On Aug 9, 2021, at 2:30 PM, Timothy Pearson
>>>>>>> <tpearson@raptorengineering.com> wrote:
>>>>>>> 
>>>>>>> FWIW that's *exactly* what we see.  Eventually, if the server is
>>>>>>> left alone for enough time, even the login system stops responding
>>>>>>> -- it's as if the I/O subsystem degrades and eventually blocks
>>>>>>> entirely.
>>>>> 
>>>>> That's pretty common behavior across a variety of kernel bugs.  So on
>>>>> its own it doesn't mean the root cause is the same.
>>>>> 
>>>>> --b.
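A minimal Python sketch of the delegation check discussed in the quoted exchange, assuming a Linux host where /proc/locks marks NFS delegations with the DELEG type (which is what `grep DELE` matches in lslocks output, since lslocks reads the same table). The function names count_delegations, leases_enabled and disable_leases are hypothetical helpers, not part of any existing tool.

```python
# Paths from the thread. Writing the sysctl requires root.
PROC_LOCKS = "/proc/locks"
LEASES_ENABLE = "/proc/sys/fs/leases-enable"


def count_delegations(locks_text):
    """Count /proc/locks entries of type DELEG (leases nfsd holds as delegations).

    Rough equivalent of `lslocks | grep DELE | wc -l`: any line whose
    whitespace-separated fields include "DELEG" is counted, including
    blocked waiters, which the kernel prints with a "->" marker.
    """
    return sum(1 for line in locks_text.splitlines() if "DELEG" in line.split())


def leases_enabled(path=LEASES_ENABLE):
    """Return True unless the fs.leases-enable sysctl reads 0."""
    with open(path) as f:
        return f.read().strip() != "0"


def disable_leases(path=LEASES_ENABLE):
    """Write 0 to fs.leases-enable.

    Per the thread, this only stops *new* delegations from being handed
    out; existing delegations are left in place, so count_delegations()
    need not drop right away. Setting it before nfsd starts avoids
    handing any out in the first place.
    """
    with open(path, "w") as f:
        f.write("0\n")
```

On the server, `count_delegations(open(PROC_LOCKS).read())` should track the lslocks pipeline; as Bruce notes, calling disable_leases() on a running system will not make that number fall immediately.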


