From: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
To: Olga Kornievskaia <aglo@umich.edu>
Cc: Chuck Lever III <chuck.lever@oracle.com>,
	Thorsten Leemhuis <regressions@leemhuis.info>,
	Trond Myklebust <trondmy@hammerspace.com>,
	Linux NFS Mailing List <linux-nfs@vger.kernel.org>
Subject: Re: NFS regression between 5.17 and 5.18
Date: Tue, 21 Jun 2022 12:58:17 -0400
Message-ID: <31af1a7f-51a7-87eb-aba1-ad933a845423@cornelisnetworks.com>
In-Reply-To: <CAN-5tyFWse4YP8dCGtQMDnqm5s+WsK8HqbitD2dAF5PayJMsEw@mail.gmail.com>

On 6/21/22 12:04 PM, Olga Kornievskaia wrote:
> Hi Dennis,
> 
> Can I ask some basic questions? Have you tried any kind of profiling
> to see where the client is spending its time (using perf, perhaps)?
> 
> real    4m11.835s
> user    0m0.001s
> sys     0m0.277s
> 
> sounds like 4 minutes are spent sleeping somewhere? Did it take 4 minutes
> to do a network transfer (if we had a network trace we could see how long
> the network transfers were)? Do you have one (along with something that
> can tell us approximately when the request began from the cp's
> perspective, like a date beforehand)?
> 
> I see that no RDMA changes went into the 5.18 kernel, so whatever
> changed is either generic NFS behaviour or perhaps something in the
> RDMA core code (is a Mellanox card being used here?)
> 
> I wonder whether the slowdown only happens on RDMA or is visible on a
> TCP mount as well. Have you tried?
> 

Hi Olga,

I have opened a Kernel Bugzilla if you would rather log future responses there:
https://bugzilla.kernel.org/show_bug.cgi?id=216160

To answer your questions above: this is on Omni-Path hardware. I have not tried
a TCP mount, but I can. I don't have a network trace or a profile; we don't
support anything like tcpdump. However, I can tell you that nothing is going
over the network while it appears to be hung. I can monitor the packet
counters.
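
For reference, a minimal sketch of how packet counters could be polled to
confirm that the link is idle during the hang. It assumes the counters are
exposed as one integer per file under a sysfs-style directory. The helper
names and the directory in the usage note are illustrative assumptions, not
details taken from this setup.

```python
import os
import time


def read_counters(counter_dir):
    """Return {name: int} for every integer-valued file in counter_dir."""
    counters = {}
    for name in sorted(os.listdir(counter_dir)):
        path = os.path.join(counter_dir, name)
        if not os.path.isfile(path):
            continue
        try:
            with open(path) as f:
                counters[name] = int(f.read().strip())
        except (OSError, ValueError):
            # Skip files that cannot be read or do not hold a plain integer.
            continue
    return counters


def counter_deltas(before, after):
    """Return {name: after - before} for counters present in both samples."""
    return {name: after[name] - before[name] for name in before if name in after}


def is_idle(deltas):
    """True when no counter moved between two samples (also true if empty)."""
    return all(delta == 0 for delta in deltas.values())


def watch(counter_dir, interval=1.0, samples=5):
    """Print a timestamped line per interval: 'idle' or the counter deltas."""
    prev = read_counters(counter_dir)
    for _ in range(samples):
        time.sleep(interval)
        cur = read_counters(counter_dir)
        deltas = counter_deltas(prev, cur)
        print(time.strftime("%H:%M:%S"), "idle" if is_idle(deltas) else deltas)
        prev = cur
```

Running something like watch("/sys/class/infiniband/hfi1_0/ports/1/counters")
(path assumed, adjust for the actual device) alongside a timestamped cp would
show whether the stall lines up with a period of no traffic.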

If you have ideas about where I could put trace points that would tell us
something, I can certainly add them.

-Denny

> 
> 
> On Mon, Jun 20, 2022 at 1:06 PM Dennis Dalessandro
> <dennis.dalessandro@cornelisnetworks.com> wrote:
>>
>> On 6/20/22 10:40 AM, Chuck Lever III wrote:
>>> Hi Thorsten-
>>>
>>>> On Jun 20, 2022, at 10:29 AM, Thorsten Leemhuis <regressions@leemhuis.info> wrote:
>>>>
>>>> On 20.06.22 16:11, Chuck Lever III wrote:
>>>>>
>>>>>
>>>>>> On Jun 20, 2022, at 3:46 AM, Thorsten Leemhuis <regressions@leemhuis.info> wrote:
>>>>>>
>>>>>> Dennis, Chuck, I have below issue on the list of tracked regressions.
>>>>>> What's the status? Has any progress been made? Or is this not really a
>>>>>> regression and can be ignored?
>>>>>>
>>>>>> Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
>>>>>>
>>>>>> P.S.: As the Linux kernel's regression tracker I deal with a lot of
>>>>>> reports and sometimes miss something important when writing mails like
>>>>>> this. If that's the case here, don't hesitate to tell me in a public
>>>>>> reply, it's in everyone's interest to set the public record straight.
>>>>>>
>>>>>> #regzbot poke
>>>>>> ##regzbot unlink: https://bugzilla.kernel.org/show_bug.cgi?id=215890
>>>>>
>>>>> The above link points to an Apple trackpad bug.
>>>>
>>>> Yeah, I know, sorry, should have mentioned: either I or my bot did
>>>> something stupid and associated that report with this regression, that's
>>>> why I disassociated it with the "unlink" command.
>>>
>>> Is there an open bugzilla for the original regression?
>>>
>>>
>>>>> The bug described all the way at the bottom was the original problem
>>>>> report. I believe this is an NFS client issue. We are waiting for
>>>>> a response from the NFS client maintainers to help Dennis track
>>>>> this down.
>>>>
>>>> Many thx for the status update. Can anything be done to speed things up?
>>>> This has taken quite a long time already -- way longer than outlined in
>>>> "Prioritize work on fixing regressions" here:
>>>> https://docs.kernel.org/process/handling-regressions.html
>>>
>>> ENOTMYMONKEYS ;-)
>>>
>>> I was involved to help with the ^C issue that happened while
>>> Dennis was troubleshooting. It's not related to the original
>>> regression, which needs to be pursued by the NFS client
>>> maintainers.
>>>
>>> The correct people to poke are Trond, Olga (both cc'd) and
>>> Anna Schumaker.
>>
>> Perhaps I should open a bugzilla for the regression. The Ctrl+C issue was a
>> result of the test we were running taking too long. It times out after 10
>> minutes or so and kills the process. So a downstream effect of the regression.
>>
>> The test is still continuing to fail as of 5.19-rc2. I'll double check that it's
>> the same issue and open a bugzilla.
>>
>> Thanks for poking at this.
>>
>> -Denny
