linux-nfs.vger.kernel.org archive mirror
From: Krzysztof Kozlowski <krzk@kernel.org>
To: Chuck Lever <chuck.lever@oracle.com>
Cc: Sudeep Holla <sudeep.holla@arm.com>,
	Trond Myklebust <trond.myklebust@primarydata.com>,
	Linux NFS Mailing List <linux-nfs@vger.kernel.org>,
	Anna Schumaker <anna.schumaker@netapp.com>,
	Bruce Fields <bfields@fieldses.org>,
	Jeff Layton <jlayton@kernel.org>,
	"David S. Miller" <davem@davemloft.net>,
	netdev <netdev@vger.kernel.org>,
	open list <linux-kernel@vger.kernel.org>,
	"linux-samsung-soc@vger.kernel.org"
	<linux-samsung-soc@vger.kernel.org>
Subject: Re: [BUG BISECT] NFSv4 client fails on Flush Journal to Persistent Storage
Date: Fri, 15 Jun 2018 16:28:43 +0200
Message-ID: <CAJKOXPcw-i99_U-Ltuuf6w-WRK7V4gNqVSYkCMGFEXQG2DL2XQ@mail.gmail.com>
In-Reply-To: <BCD39D7E-EEEC-4EB9-824E-63323C333C88@oracle.com>

On Fri, Jun 15, 2018 at 4:23 PM, Chuck Lever <chuck.lever@oracle.com> wrote:
>
>
>> On Jun 15, 2018, at 10:07 AM, Krzysztof Kozlowski <krzk@kernel.org> wrote:
>>
>> On Fri, Jun 15, 2018 at 2:53 PM, Sudeep Holla <sudeep.holla@arm.com> wrote:
>>> Hi,
>>>
>>> On Thu, Jun 7, 2018 at 12:19 PM, Krzysztof Kozlowski <krzk@kernel.org> wrote:
>>>> Hi,
>>>>
>>>> When booting my boards under recent linux-next, I see failures of systemd:
>>>>
>>>> [FAILED] Failed to start Flush Journal to Persistent Storage.
>>>> See 'systemctl status systemd-journal-flush.service' for details.
>>>>         Starting Create Volatile Files and Directories...
>>>> [**    ] A start job is running for Create V… [  223.209289] nfs:
>>>> server 192.168.1.10 not responding, still trying
>>>> [  223.209377] nfs: server 192.168.1.10 not responding, still trying
>>>>
>>>> Effectively the boards fail to boot. An example is here:
>>>> https://krzk.eu/#/builders/1/builds/2157
>>>>
>>>
>>> I too encountered the same issue.
>>>
>>>> This was bisected to:
>>>> commit 37ac86c3a76c113619b7d9afe0251bbfc04cb80a
>>>> Author: Chuck Lever <chuck.lever@oracle.com>
>>>> Date:   Fri May 4 15:34:53 2018 -0400
>>>>
>>>>    SUNRPC: Initialize rpc_rqst outside of xprt->reserve_lock
>>>>
>>>>    alloc_slot is a transport-specific op, but initializing an rpc_rqst
>>>>    is common to all transports. In addition, the only part of initial-
>>>>    izing an rpc_rqst that needs serialization is getting a fresh XID.
>>>>
>>>>    Move rpc_rqst initialization to common code in preparation for
>>>>    adding a transport-specific alloc_slot to xprtrdma.
>>>>
>>>>    Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
>>>>    Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
>>>>
>>>
>>> Unfortunately, I spent time bisecting independently without seeing this
>>> report and arrived at the same culprit.
>>>
>>>>
>>>> Bisect log attached. Full configuration:
>>>> 1. exynos_defconfig
>>>> 2. ARMv7, octa-core, Exynos5422 and Exynos4412 (Odroid XU3, U3 and others)
>>>> 3. NFSv4 client (from Raspberry Pi)
>>>>
>>>
>>> Yes, the issue is seen only with the NFSv4 client and, I think, with the
>>> latest systemd. My Ubuntu 16.04 (32-bit FS) boots fine while 18.04 has the
>>> above issue.
>>> Passing nfsv3 in the kernel command line makes it work again.
>>
>> Thanks for the reply!
>>
>> I tested it on systemd versions 236 and 238 and it fails on both.
>> However, one board always passes: the Odroid HC1, with the same core
>> configuration as described before. It probably has a different software
>> package installed.
>>
>>>> Let me know if you need any more information.
>>>>
>>>
>>> Also, I was observing this issue with Linus' master branch from
>>> the time the above patch was merged until today. The issue
>>> is no longer seen since this morning; however, I just enabled lockdep
>>> and got these messages.
>>
>> All recent linux-next builds fail. Today's Linus' tree (4c5e8fc62d6a ("Merge
>> tag 'linux-kselftest-4.18-rc1-2' of
>> git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest"))
>> managed to boot on one board but got stuck on a different board with the
>> same issue.
>>
>> I am quite surprised that there is no response from the author of the
>> commit and this was just moved from next (while failing) to Linus'
>> tree... bringing the issue to mainline now.
>
> Sorry. This morning is the first time I've seen this report, which was
> not To: or Cc'd to me.

D'oh! That's my mistake. Apparently I forgot to put you on the CC list.
Sorry for that.


> Since I don't have access to this kind of hardware, I will have to ask
> for your help to perform basic troubleshooting.
>
> Can we start by capturing the network traffic that occurs while you
> reproduce the problem? Use tshark or tcpdump on your NFS server, filter
> on the IP of the client, and send me (or the list) the raw pcap file.

Sure, I'll send you the tcpdump capture without Cc-ing the list.
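
For reference, a capture of the kind requested above could be taken on the
NFS server roughly as follows. This is only a minimal sketch: the interface
name eth0, the client address 192.168.1.20, the output file name and the
helper build_capture_cmd are hypothetical placeholders, not values from this
thread. The helper just prints the tcpdump command line so it can be reviewed
before running it as root.

```shell
# Build a tcpdump command line that captures one client's traffic.
# Arguments (all hypothetical placeholders): interface, client IP, output file.
build_capture_cmd() {
    iface="$1"
    client_ip="$2"
    out_file="$3"
    # -i: capture interface, -s 0: keep full packets, -w: write a raw pcap file.
    # The "host" filter limits the capture to traffic to or from the client.
    printf 'tcpdump -i %s -s 0 -w %s host %s\n' "$iface" "$out_file" "$client_ip"
}

# Print the command; run the printed line as root on the NFS server
# while the client reproduces the hang, then stop it with Ctrl-C.
build_capture_cmd eth0 192.168.1.20 nfs-client.pcap
```

Writing the raw pcap with -w, instead of printing decoded packets, keeps the
complete RPC payloads available for later analysis.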

Best regards,
Krzysztof
