linux-nfs.vger.kernel.org archive mirror
From: Chuck Lever <chuck.lever@oracle.com>
To: Krzysztof Kozlowski <krzk@kernel.org>
Cc: Sudeep Holla <sudeep.holla@arm.com>,
	Trond Myklebust <trond.myklebust@primarydata.com>,
	Linux NFS Mailing List <linux-nfs@vger.kernel.org>,
	Anna Schumaker <anna.schumaker@netapp.com>,
	Bruce Fields <bfields@fieldses.org>,
	Jeff Layton <jlayton@kernel.org>,
	"David S. Miller" <davem@davemloft.net>,
	netdev <netdev@vger.kernel.org>,
	open list <linux-kernel@vger.kernel.org>,
	"linux-samsung-soc@vger.kernel.org"
	<linux-samsung-soc@vger.kernel.org>
Subject: Re: [BUG BISECT] NFSv4 client fails on Flush Journal to Persistent Storage
Date: Fri, 15 Jun 2018 10:23:04 -0400
Message-ID: <BCD39D7E-EEEC-4EB9-824E-63323C333C88@oracle.com>
In-Reply-To: <CAJKOXPfX0CytKcYDaDAYYuCQjk-mcGjFRHfZco-wPQsc4G1agA@mail.gmail.com>



> On Jun 15, 2018, at 10:07 AM, Krzysztof Kozlowski <krzk@kernel.org> wrote:
>
> On Fri, Jun 15, 2018 at 2:53 PM, Sudeep Holla <sudeep.holla@arm.com> wrote:
>> Hi,
>>
>> On Thu, Jun 7, 2018 at 12:19 PM, Krzysztof Kozlowski <krzk@kernel.org> wrote:
>>> Hi,
>>>
>>> When booting my boards under recent linux-next, I see failures of systemd:
>>>
>>> [FAILED] Failed to start Flush Journal to Persistent Storage.
>>> See 'systemctl status systemd-journal-flush.service' for details.
>>>         Starting Create Volatile Files and Directories...
>>> [**    ] A start job is running for Create V… [  223.209289] nfs:
>>> server 192.168.1.10 not responding, still trying
>>> [  223.209377] nfs: server 192.168.1.10 not responding, still trying
>>>
>>> Effectively the boards fail to boot. Example is here:
>>> https://krzk.eu/#/builders/1/builds/2157
>>>
>>
>> I too encountered the same issue.
>>
>>> This was bisected to:
>>> commit 37ac86c3a76c113619b7d9afe0251bbfc04cb80a
>>> Author: Chuck Lever <chuck.lever@oracle.com>
>>> Date:   Fri May 4 15:34:53 2018 -0400
>>>
>>>    SUNRPC: Initialize rpc_rqst outside of xprt->reserve_lock
>>>
>>>    alloc_slot is a transport-specific op, but initializing an rpc_rqst
>>>    is common to all transports. In addition, the only part of
>>>    initializing an rpc_rqst that needs serialization is getting a fresh XID.
>>>
>>>    Move rpc_rqst initialization to common code in preparation for
>>>    adding a transport-specific alloc_slot to xprtrdma.
>>>=20
>>>    Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
>>>    Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
>>>
>>
>> Unfortunately, spent time to bisect independently without seeing this
>> report and got the same culprit.
>>
>>>
>>> Bisect log attached. Full configuration:
>>> 1. exynos_defconfig
>>> 2. ARMv7, octa-core, Exynos5422 and Exynos4412 (Odroid XU3, U3 and others)
>>> 3. NFSv4 client (from Raspberry Pi)
>>>
>>
>> Yes, the issue is seen only with the NFSv4 client and with the latest systemd, I think.
>> My Ubuntu 16.04 (32-bit FS) boots fine while 18.04 has the above issue.
>> Passing nfsv3 on the kernel command line makes it work again.
>
> Thanks for reply!
>
> I test it on systemd versions 236 and 238... and it fails on both.
> However one board passes always - it is Odroid HC1 with same core
> configuration as described before. Probably there is some different SW
> package on it.
>
>>> Let me know if you need any more information.
>>>
>>
>> Also I was observing this issue with Linus master branch from
>> the time the above patch was merged until today. The issue
>> is no longer seen since this morning however I just enabled lockdep
>> and got these messages.
>
> All recent linux-next fail. Today's Linus' tree (4c5e8fc62d6a ("Merge
> tag 'linux-kselftest-4.18-rc1-2' of
> git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest"))
> managed to get up on one board but stuck on different board with the
> same issue.
>
> I am quite surprised that there is no response from the author of the
> commit and this was just moved from next (while failing) to Linus'
> tree... bringing the issue to mainline now.

Sorry. This morning is the first time I've seen this report, which was
not To: or Cc'd to me.

Since I don't have access to this kind of hardware, I will have to ask
for your help to perform basic troubleshooting.

Can we start by capturing the network traffic that occurs while you
reproduce the problem? Use tshark or tcpdump on your NFS server, filter
on the IP of the client, and send me (or the list) the raw pcap file.
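
For illustration only, a capture like the one requested above could be
started along the following lines. This is a rough sketch, not a command
taken from this thread: the client address 192.0.2.20, the output file
name, and the helper build_capture_cmd() are all placeholders. The
tcpdump options used (-i, -s, -w and the "host" capture filter) are the
standard ones; the "any" pseudo-interface is Linux-specific.

```python
import ipaddress
import shlex


def build_capture_cmd(client_ip, out_file="nfs-client.pcap", iface="any"):
    """Return a tcpdump argv that saves all traffic to/from client_ip.

    Hypothetical helper for illustration; names and defaults are assumptions.
    """
    # Validate the address so a typo fails early instead of producing
    # a capture filter that silently matches nothing.
    ip = ipaddress.ip_address(client_ip)
    return [
        "tcpdump",
        "-i", iface,      # capture interface; "any" is Linux-specific
        "-s", "0",        # full snap length, so RPC payloads are not truncated
        "-w", out_file,   # write raw packets to a pcap file
        "host", str(ip),  # capture filter: only this client's traffic
    ]


if __name__ == "__main__":
    # 192.0.2.20 is a placeholder from the documentation address range;
    # substitute the real IP of the NFS client board.
    print(shlex.join(build_capture_cmd("192.0.2.20")))
```

Run the printed command as root on the NFS server before booting the
client, stop it with Ctrl-C once the "not responding" messages appear,
and send the resulting pcap file as-is.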


--
Chuck Lever



