* PROBLEM: nfs I/O errors with sqlite applications
@ 2015-10-12 16:48 Nick Bowler
  2015-10-12 19:25 ` J. Bruce Fields
  0 siblings, 1 reply; 14+ messages in thread
From: Nick Bowler @ 2015-10-12 16:48 UTC (permalink / raw)
  To: linux-nfs

Hi,

I'm having a problem where, eventually, the nfs-mounted home directory
on one of my machines starts failing in a kind of weird way.  The issue
appears to affect only sqlite; I have two applications that I know of
which use it:

  - Firefox, where the symptom is that the browser just hangs randomly,
  - gmpc, which crashes immediately on startup with I/O error.

Once the issue occurs these applications remain permanently broken.
Since the latter is easier to test, I can run it in strace, and the
failing syscall seems to be:

  fcntl(7, F_SETLK, {type=F_RDLCK, whence=SEEK_SET, start=1073741824, len=1}) = -1 EIO (Input/output error)

When the issue occurs, the client dmesg log is full of messages of the form:

  [3441972.381211] NFS: v4 server returned a bad sequence-id error on an unconfirmed sequence ffff88007612ae20!

There are no unusual messages on the server.

Rebooting the client corrects the issue in the short term, but it seems
to re-occur after about 1 month of uptime.  This makes it difficult to
test anything.  So right now I have left the client in the broken state
in case there's something else I can try.

The client is running Linux 4.2, with approx. 38 days uptime.  The
server is running Linux 4.1.4, with 62 days uptime.

Let me know if you need any more info.

Thanks,
  Nick


* Re: PROBLEM: nfs I/O errors with sqlite applications
  2015-10-12 16:48 PROBLEM: nfs I/O errors with sqlite applications Nick Bowler
@ 2015-10-12 19:25 ` J. Bruce Fields
  2015-10-12 19:46   ` J. Bruce Fields
  0 siblings, 1 reply; 14+ messages in thread
From: J. Bruce Fields @ 2015-10-12 19:25 UTC (permalink / raw)
  To: Nick Bowler; +Cc: linux-nfs, jlayton

On Mon, Oct 12, 2015 at 12:48:56PM -0400, Nick Bowler wrote:
> Hi,
> 
> I'm having a problem where, eventually, the nfs-mounted home directory
> on one of my machines starts failing in a kind of weird way.  The issue
> appears to affect only sqlite; I have two applications that I know of
> which use it:
> 
>   - Firefox, where the symptom is that the browser just hangs randomly,
>   - gmpc, which crashes immediately on startup with I/O error.
> 
> Once the issue occurs these applications remain permanently broken.
> Since the latter is easier to test, I can run it in strace, and the
> failing syscall seems to be:
> 
>   fcntl(7, F_SETLK, {type=F_RDLCK, whence=SEEK_SET, start=1073741824, len=1}) = -1 EIO (Input/output error)
> 
> When the issue occurs, the client dmesg log is full of messages of the form:
> 
>   [3441972.381211] NFS: v4 server returned a bad sequence-id error on an unconfirmed sequence ffff88007612ae20!
> 
> There are no unusual messages on the server.
> 
> Rebooting the client corrects the issue in the short term, but it seems
> to re-occur after about 1 month of uptime.  This makes it difficult to
> test anything.  So right now I have left the client in the broken state
> in case there's something else I can try.
> 
> The client is running Linux 4.2, with approx. 38 days uptime.  The
> server is running Linux 4.1.4, with 62 days uptime.
> 
> Let me know if you need any more info.

That does sound like a pain to debug.

I don't *think* this could be explained by the problem Jeff's seqid
locking patches fixed, but maybe I'm wrong; cc'ing him to confirm.

I wonder if there's some way to make this reproduce more quickly, for
example by running something that makes more aggressive use of sqlite,
or running multiple copies of such a thing simultaneously.  Might be
interesting to know what the pattern of file opens and locking looks
like (so stracing one of those applications might help).
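
Something like the following completely untested sketch (paths and
iteration counts are arbitrary) is the sort of thing I have in mind --
several concurrent sqlite3 clients doing lots of small transactions on
the NFS mount, so the open/lock/unlock cycle happens as often as
possible:

  #!/bin/sh
  # Untested sketch: hammer sqlite3 (and hence POSIX byte-range locking)
  # on the NFS-mounted home directory with several workers in parallel.
  DIR=$HOME/sqlite-stress          # assumes $HOME is on the nfs mount
  mkdir -p "$DIR"
  for i in 1 2 3 4; do
      (
          j=0
          while [ "$j" -lt 1000 ]; do
              sqlite3 "$DIR/stress-$i.sqlite" \
                  "CREATE TABLE IF NOT EXISTS t(x); INSERT INTO t VALUES($j); VACUUM;" \
                  || echo "worker $i: error at iteration $j"
              j=$((j + 1))
          done
      ) &
  done
  wait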

--b.


* Re: PROBLEM: nfs I/O errors with sqlite applications
  2015-10-12 19:25 ` J. Bruce Fields
@ 2015-10-12 19:46   ` J. Bruce Fields
  2015-10-13  3:01     ` Nick Bowler
  0 siblings, 1 reply; 14+ messages in thread
From: J. Bruce Fields @ 2015-10-12 19:46 UTC (permalink / raw)
  To: Nick Bowler; +Cc: linux-nfs, jlayton

On Mon, Oct 12, 2015 at 03:25:38PM -0400, bfields wrote:
> On Mon, Oct 12, 2015 at 12:48:56PM -0400, Nick Bowler wrote:
> > Hi,
> > 
> > I'm having a problem where, eventually, the nfs-mounted home directory
> > on one of my machines starts failing in a kind of weird way.  The issue
> > appears to affect only sqlite; I have two applications that I know of
> > which use it:
> > 
> >   - Firefox, where the symptom is that the browser just hangs randomly,
> >   - gmpc, which crashes immediately on startup with I/O error.
> > 
> > Once the issue occurs these applications remain permanently broken.
> > Since the latter is easier to test, I can run it in strace, and the
> > failing syscall seems to be:
> > 
> >   fcntl(7, F_SETLK, {type=F_RDLCK, whence=SEEK_SET, start=1073741824, len=1}) = -1 EIO (Input/output error)
> > 
> > When the issue occurs, the client dmesg log is full of messages of the form:
> > 
> >   [3441972.381211] NFS: v4 server returned a bad sequence-id error on an unconfirmed sequence ffff88007612ae20!
> > 
> > There are no unusual messages on the server.
> > 
> > Rebooting the client corrects the issue in the short term, but it seems
> > to re-occur after about 1 month of uptime.  This makes it difficult to
> > test anything.  So right now I have left the client in the broken state
> > in case there's something else I can try.
> > 
> > The client is running Linux 4.2, with approx. 38 days uptime.  The
> > server is running Linux 4.1.4, with 62 days uptime.
> > 
> > Let me know if you need any more info.
> 
> That does sound like a pain to debug.
> 
> I don't *think* this could be explained by the problem Jeff's seqid
> locking patches fixed, but maybe I'm wrong; cc'ing him to confirm.
> 
> I wonder if there's some way to make this reproduce more quickly, for
> example by running something that makes more aggressive use of sqlite,
> or running multiple copies of such a thing simultaneously.  Might be
> interesting to know what the pattern of file opens and locking looks
> like (so stracing one of those applications might help).

Oh, also I forgot to ask what version of the NFS protocol you're using
(4.0, 4.1, or 4.2).
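
If you're not sure, something like either of these on the client should
show the vers= option that was actually negotiated:

  nfsstat -m
  grep ' nfs' /proc/mounts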

--b.


* Re: PROBLEM: nfs I/O errors with sqlite applications
  2015-10-12 19:46   ` J. Bruce Fields
@ 2015-10-13  3:01     ` Nick Bowler
  2015-10-13 10:52       ` Jeff Layton
  0 siblings, 1 reply; 14+ messages in thread
From: Nick Bowler @ 2015-10-13  3:01 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: linux-nfs, jlayton

On 2015-10-12 15:46 -0400, J. Bruce Fields wrote:
> On Mon, Oct 12, 2015 at 03:25:38PM -0400, bfields wrote:
> > On Mon, Oct 12, 2015 at 12:48:56PM -0400, Nick Bowler wrote:
> > > I'm having a problem where, eventually, the nfs-mounted home directory
> > > on one of my machines starts failing in a kind of weird way.  The issue
> > > appears to affect only sqlite; I have two applications that I know of
> > > which use it:
> > > 
> > >   - Firefox, where the symptom is that the browser just hangs randomly,
> > >   - gmpc, which crashes immediately on startup with I/O error.
> > > 
> > > Once the issue occurs these applications remain permanently broken.
> > > Since the latter is easier to test, I can run it in strace, and the
> > > failing syscall seems to be:
> > > 
> > >   fcntl(7, F_SETLK, {type=F_RDLCK, whence=SEEK_SET, start=1073741824, len=1}) = -1 EIO (Input/output error)
> > > 
> > > When the issue occurs, the client dmesg log is full of messages of the form:
> > > 
> > >   [3441972.381211] NFS: v4 server returned a bad sequence-id error on an unconfirmed sequence ffff88007612ae20!
> > > 
> > > There are no unusual messages on the server.
[...]
> > I wonder if there's some way to make this reproduce more quickly, for
> > example by running something that makes more aggressive use of sqlite,
> > or running multiple copies of such a thing simultaneously.  Might be
> > interesting to know what the pattern of file opens and locking looks
> > like (so stracing one of those applications might help).

I could try doing something like using the sqlite3 command-line tool to
do a lot of database operations, and hope I can reproduce.  I'd have to
reboot to test though.

I attached a full strace log (gzipped) from a failing process.  The
command run is:

  sqlite3 newfile.sqlite vacuum

which fails in a similar manner to gmpc.

> Oh, also I forgot to ask what version of the NFS protocol you're using
> (4.0, 4.1, or 4.2).

Looks like 4.0:

  athena:/home on /home type nfs4 (rw,relatime,vers=4.0,rsize=524288,wsize=524288,namlen=255,hard,proto=tcp,port=0,timeo=600,retrans=2,sec=krb5,clientaddr=192.168.0.207,local_lock=none,addr=192.168.0.10)

Cheers,
  Nick

[-- Attachment #2: sqlite3-vacuum-strace.log.gz --]
[-- Type: application/octet-stream, Size: 2458 bytes --]


* Re: PROBLEM: nfs I/O errors with sqlite applications
  2015-10-13  3:01     ` Nick Bowler
@ 2015-10-13 10:52       ` Jeff Layton
  2015-10-13 12:54         ` Nick Bowler
  0 siblings, 1 reply; 14+ messages in thread
From: Jeff Layton @ 2015-10-13 10:52 UTC (permalink / raw)
  To: Nick Bowler; +Cc: J. Bruce Fields, linux-nfs

On Mon, 12 Oct 2015 23:01:36 -0400
Nick Bowler <nbowler@draconx.ca> wrote:

> On 2015-10-12 15:46 -0400, J. Bruce Fields wrote:
> > On Mon, Oct 12, 2015 at 03:25:38PM -0400, bfields wrote:
> > > On Mon, Oct 12, 2015 at 12:48:56PM -0400, Nick Bowler wrote:
> > > > I'm having a problem where, eventually, the nfs-mounted home directory
> > > > on one of my machines starts failing in a kind of weird way.  The issue
> > > > appears to affect only sqlite; I have two applications that I know of
> > > > which use it:
> > > > 
> > > >   - Firefox, where the symptom is that the browser just hangs randomly,
> > > >   - gmpc, which crashes immediately on startup with I/O error.
> > > > 
> > > > Once the issue occurs these applications remain permanently broken.
> > > > Since the latter is easier to test, I can run it in strace, and the
> > > > failing syscall seems to be:
> > > > 
> > > >   fcntl(7, F_SETLK, {type=F_RDLCK, whence=SEEK_SET, start=1073741824, len=1}) = -1 EIO (Input/output error)
> > > > 
> > > > When the issue occurs, the client dmesg log is full of messages of the form:
> > > > 
> > > >   [3441972.381211] NFS: v4 server returned a bad sequence-id error on an unconfirmed sequence ffff88007612ae20!
> > > > 
> > > > There are no unusual messages on the server.
> [...]
> > > I wonder if there's some way to make this reproduce more quickly, for
> > > example by running something that makes more aggressive use of sqlite,
> > > or running multiple copies of such a thing simultaneously.  Might be
> > > interesting to know what the pattern of file opens and locking looks
> > > like (so stracing one of those applications might help).
> 
> I could try doing something like using the sqlite3 command-line tool to
> do a lot of database operations, and hope I can reproduce.  I'd have to
> reboot to test though.
> 
> I attached a full strace log (gzipped) from a failing process.  The
> command run is:
> 
>   sqlite3 newfile.sqlite vacuum
> 
> which fails in a similar manner to gmpc.
> 
> > Oh, also I forgot to ask what version of the NFS protocol you're using
> > (4.0, 4.1, or 4.2).
> 
> Looks like 4.0:
> 
>   athena:/home on /home type nfs4 (rw,relatime,vers=4.0,rsize=524288,wsize=524288,namlen=255,hard,proto=tcp,port=0,timeo=600,retrans=2,sec=krb5,clientaddr=192.168.0.207,local_lock=none,addr=192.168.0.10)
> 
> Cheers,
>   Nick

Ok, makes sense. The log shows that it occurred in a fcntl call, so
it's probably this from lookup_or_create_lock_state:

        lo = find_lockowner_str(cl, &lock->lk_new_owner);
        if (!lo) {
                strhashval = ownerstr_hashval(&lock->lk_new_owner);
                lo = alloc_init_lock_stateowner(strhashval, cl, ost, lock);
                if (lo == NULL)
                        return nfserr_jukebox;
        } else {
                /* with an existing lockowner, seqids must be the same */
                status = nfserr_bad_seqid;
                if (!cstate->minorversion &&
                    lock->lk_new_lock_seqid != lo->lo_owner.so_seqid)
                        goto out;
        }

...so we found an existing lockowner, but the seqid in the call is
wrong. It seems like the client ought to try to recover in this case,
but I don't see where it handles BAD_SEQID errors in the locking code.
What kernel versions are the client and server running here?

In any case, the question now is whether this is a client or server
bug. What would tell us that is a network capture of the NFS traffic
between client and server at the time that this occurs. Would it be
possible to collect one? If so, then let Bruce and me know and we can
figure out a way to share it privately.
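
For v4.0 all of the interesting traffic (including locking) is on port
2049, so a rough sketch of a capture on the client would be something
like this (interface, filename and the $SERVER placeholder are just
examples):

  tcpdump -s 0 -i eth0 -w /tmp/nfs-badseqid.pcap host $SERVER and port 2049 &
  # reproduce the failure (e.g. start gmpc), then:
  killall tcpdump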

In the meantime, you may want to consider switching to NFSv4.1+. It
really is a superior protocol to v4.0 as it allows more stateful
operations to run in parallel and would likely sidestep this problem.
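
(Assuming both ends support it, that should just be a matter of
mounting with vers=4.1 while keeping the rest of your existing options,
e.g. something along the lines of:

  mount -t nfs -o vers=4.1,sec=krb5 athena:/home /home

...or the equivalent entry in fstab. Older nfs-utils spells it
"-t nfs4 -o minorversion=1" instead.)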

-- 
Jeff Layton <jlayton@poochiereds.net>


* Re: PROBLEM: nfs I/O errors with sqlite applications
  2015-10-13 10:52       ` Jeff Layton
@ 2015-10-13 12:54         ` Nick Bowler
  2016-07-29 16:43           ` Nick Bowler
  0 siblings, 1 reply; 14+ messages in thread
From: Nick Bowler @ 2015-10-13 12:54 UTC (permalink / raw)
  To: Jeff Layton; +Cc: J. Bruce Fields, linux-nfs

On 2015-10-13, Jeff Layton <jlayton@poochiereds.net> wrote:
> On Mon, 12 Oct 2015 23:01:36 -0400
> Nick Bowler <nbowler@draconx.ca> wrote:
>> On 2015-10-12 15:46 -0400, J. Bruce Fields wrote:
>> > On Mon, Oct 12, 2015 at 03:25:38PM -0400, bfields wrote:
>> > > On Mon, Oct 12, 2015 at 12:48:56PM -0400, Nick Bowler wrote:
>> > > > I'm having a problem where, eventually, the nfs-mounted home
>> > > > directory on one of my machines starts failing in a kind of weird
>> > > > way.  The issue appears to affect only sqlite; I have two
>> > > > applications that I know of which use it:
>> > > >
>> > > >   - Firefox, where the symptom is that the browser just hangs
>> > > >     randomly,
>> > > >   - gmpc, which crashes immediately on startup with I/O error.
>> > > >
>> > > > Once the issue occurs these applications remain permanently broken.
>> > > > Since the latter is easier to test, I can run it in strace, and the
>> > > > failing syscall seems to be:
>> > > >
>> > > >   fcntl(7, F_SETLK, {type=F_RDLCK, whence=SEEK_SET,
>> > > > start=1073741824, len=1}) = -1 EIO (Input/output error)
>> > > >
>> > > > When the issue occurs, the client dmesg log is full of messages of
>> > > > the form:
>> > > >
>> > > >   [3441972.381211] NFS: v4 server returned a bad sequence-id error
>> > > > on an unconfirmed sequence ffff88007612ae20!
>> > > >
>> > > > There are no unusual messages on the server.
>> [...]
> Ok, makes sense. The log shows that it occurred in a fcntl call, so
> it's probably this from lookup_or_create_lock_state:
>
>         lo = find_lockowner_str(cl, &lock->lk_new_owner);
>         if (!lo) {
>                 strhashval = ownerstr_hashval(&lock->lk_new_owner);
>                 lo = alloc_init_lock_stateowner(strhashval, cl, ost, lock);
>                 if (lo == NULL)
>                         return nfserr_jukebox;
>         } else {
>                 /* with an existing lockowner, seqids must be the same */
>                 status = nfserr_bad_seqid;
>                 if (!cstate->minorversion &&
>                     lock->lk_new_lock_seqid != lo->lo_owner.so_seqid)
>                         goto out;
>         }
>
> ...so we found an existing lockowner, but the seqid in the call is
> wrong. It seems like the client ought to try to recover in this case,
> but I don't see where it handles BAD_SEQID errors in the locking code.
> What kernel versions are the client and server running here?

It was in my original mail but got snipped (by me).  The client is
running Linux 4.2.  The server is running Linux 4.1.4.  But that's
just what they're running right now; I've been seeing this issue
for a while now and both machines have been updated several times.

> In any case, the question now is whether this is a client or server
> bug. What would tell us that is a network capture of the NFS traffic
> between client and server at the time that this occurs. Would it be
> possible to collect one? If so, then let Bruce and me know and we can
> figure out a way to share it privately.

This should be possible.

> In the meantime, you may want to consider switching to NFSv4.1+. It
> really is a superior protocol to v4.0 as it allows more stateful
> operations to run in parallel and would likely sidestep this problem.

Certainly something to look into!


* Re: PROBLEM: nfs I/O errors with sqlite applications
  2015-10-13 12:54         ` Nick Bowler
@ 2016-07-29 16:43           ` Nick Bowler
  2016-07-29 17:52             ` Jeff Layton
  0 siblings, 1 reply; 14+ messages in thread
From: Nick Bowler @ 2016-07-29 16:43 UTC (permalink / raw)
  To: Jeff Layton; +Cc: J. Bruce Fields, linux-nfs

Hi guys,

On 2015-10-13, Nick Bowler <nbowler@draconx.ca> wrote:
> On 2015-10-13, Jeff Layton <jlayton@poochiereds.net> wrote:
>> On Mon, 12 Oct 2015 23:01:36 -0400
>> Nick Bowler <nbowler@draconx.ca> wrote:
>>> On 2015-10-12 15:46 -0400, J. Bruce Fields wrote:
>>> > On Mon, Oct 12, 2015 at 03:25:38PM -0400, bfields wrote:
>>> > > On Mon, Oct 12, 2015 at 12:48:56PM -0400, Nick Bowler wrote:
[...]
>>> > > > the failing syscall seems to be:
>>> > > >
>>> > > >   fcntl(7, F_SETLK, {type=F_RDLCK, whence=SEEK_SET,
>>> > > > start=1073741824, len=1}) = -1 EIO (Input/output error)
>>> > > >
>>> > > > When the issue occurs, the client dmesg log is full of messages of
>>> > > > the form:
>>> > > >
>>> > > >   [3441972.381211] NFS: v4 server returned a bad sequence-id error
>>> > > > on an unconfirmed sequence ffff88007612ae20!
>>> > > >
>>> > > > There are no unusual messages on the server.
>>> [...]
>> Ok, makes sense. The log shows that it occurred in a fcntl call, so
>> it's probably this from lookup_or_create_lock_state:
>>
>>         lo = find_lockowner_str(cl, &lock->lk_new_owner);
>>         if (!lo) {
>>                 strhashval = ownerstr_hashval(&lock->lk_new_owner);
>>                 lo = alloc_init_lock_stateowner(strhashval, cl, ost,
>> lock);
>>                 if (lo == NULL)
>>                         return nfserr_jukebox;
>>         } else {
>>                 /* with an existing lockowner, seqids must be the same */
>>                 status = nfserr_bad_seqid;
>>                 if (!cstate->minorversion &&
>>                     lock->lk_new_lock_seqid != lo->lo_owner.so_seqid)
>>                         goto out;
>>         }
>>
>> ...so we found an existing lockowner, but the seqid in the call is
>> wrong. It seems like the client ought to try to recover in this case,
>> but I don't see where it handles BAD_SEQID errors in the locking code.
[...]
>> In any case, the question now is whether this is a client or server
>> bug. What would tell us that is a network capture of the NFS traffic
>> between client and server at the time that this occurs. Would it be
>> possible to collect one? If so, then let Bruce and me know and we can
>> figure out a way to share it privately.

Hi guys,

Unfortunately I did not manage to perform a network capture last time
due to power loss.  I did not hit this issue again until yesterday (~9
months later), this time after 45 days of uptime.

Kernel versions now are: 4.5.1 on the server, and 4.4.3 on the client.

Since it's now in a failing state again (this situation persists until
a reboot of the client), I captured with strace and tcpdump (on both
client and server) when attempting to start gmpc; the result is quite
small (just 30 packets).  Will that be helpful?

Thanks,
  Nick


* Re: PROBLEM: nfs I/O errors with sqlite applications
  2016-07-29 16:43           ` Nick Bowler
@ 2016-07-29 17:52             ` Jeff Layton
  2017-06-06 16:46               ` Lutz Vieweg
  0 siblings, 1 reply; 14+ messages in thread
From: Jeff Layton @ 2016-07-29 17:52 UTC (permalink / raw)
  To: Nick Bowler; +Cc: J. Bruce Fields, linux-nfs

On Fri, 2016-07-29 at 12:43 -0400, Nick Bowler wrote:
> Hi guys,
> 
> On 2015-10-13, Nick Bowler <nbowler@draconx.ca> wrote:
> > On 2015-10-13, Jeff Layton <jlayton@poochiereds.net> wrote:
> > > On Mon, 12 Oct 2015 23:01:36 -0400
> > > Nick Bowler <nbowler@draconx.ca> wrote:
> > > > On 2015-10-12 15:46 -0400, J. Bruce Fields wrote:
> > > > > On Mon, Oct 12, 2015 at 03:25:38PM -0400, bfields wrote:
> > > > > > On Mon, Oct 12, 2015 at 12:48:56PM -0400, Nick Bowler wrote:
> [...]
> > > > > > > the failing syscall seems to be:
> > > > > > > 
> > > > > > >   fcntl(7, F_SETLK, {type=F_RDLCK, whence=SEEK_SET,
> > > > > > > start=1073741824, len=1}) = -1 EIO (Input/output error)
> > > > > > > 
> > > > > > > When the issue occurs, the client dmesg log is full of messages of
> > > > > > > the form:
> > > > > > > 
> > > > > > >   [3441972.381211] NFS: v4 server returned a bad sequence-id error
> > > > > > > on an unconfirmed sequence ffff88007612ae20!
> > > > > > > 
> > > > > > > There are no unusual messages on the server.
> > > > [...]
> > > Ok, makes sense. The log shows that it occurred in a fcntl call, so
> > > it's probably this from lookup_or_create_lock_state:
> > > 
> > >         lo = find_lockowner_str(cl, &lock->lk_new_owner);
> > >         if (!lo) {
> > >                 strhashval = ownerstr_hashval(&lock->lk_new_owner);
> > >                 lo = alloc_init_lock_stateowner(strhashval, cl, ost,
> > > lock);
> > >                 if (lo == NULL)
> > >                         return nfserr_jukebox;
> > >         } else {
> > >                 /* with an existing lockowner, seqids must be the same */
> > >                 status = nfserr_bad_seqid;
> > >                 if (!cstate->minorversion &&
> > >                     lock->lk_new_lock_seqid != lo->lo_owner.so_seqid)
> > >                         goto out;
> > >         }
> > > 
> > > ...so we found an existing lockowner, but the seqid in the call is
> > > wrong. It seems like the client ought to try to recover in this case,
> > > but I don't see where it handles BAD_SEQID errors in the locking code.
> [...]
> > 
> > > 
> > > In any case, the question now is whether this is a client or server
> > > bug. What would tell us that is a network capture of the NFS traffic
> > > between client and server at the time that this occurs. Would it be
> > > possible to collect one? If so, then let Bruce and me know and we can
> > > figure out a way to share it privately.
> 
> Hi guys,
> 
> Unfortunately I did not manage to perform a network capture last time
> due to power loss.  I did not hit this issue again until yesterday (~9
> months later), this time after 45 days of uptime.
> 
> Kernel versions now are: 4.5.1 on the server, and 4.4.3 on the client.
> 
> Since it's now in a failing state again (this situation persists until
> a reboot of the client), I captured with strace and tcpdump (on both
> client and server) when attempting to start gmpc; the result is quite
> small (just 30 packets).  Will that be helpful?
> 
> Thanks,
>   Nick

I doubt we'd be able to tell much after the fact, but feel free to send it along.

Thanks,
-- 
Jeff Layton <jlayton@poochiereds.net>


* Re: PROBLEM: nfs I/O errors with sqlite applications
  2016-07-29 17:52             ` Jeff Layton
@ 2017-06-06 16:46               ` Lutz Vieweg
  2017-06-07  3:08                 ` NeilBrown
  0 siblings, 1 reply; 14+ messages in thread
From: Lutz Vieweg @ 2017-06-06 16:46 UTC (permalink / raw)
  To: linux-nfs

On 07/29/2016 07:52 PM, Jeff Layton wrote:
>>>>>>>>    fcntl(7, F_SETLK, {type=F_RDLCK, whence=SEEK_SET,
>>>>>>>> start=1073741824, len=1}) = -1 EIO (Input/output error)
>>
>> Unfortunately I did not manage to perform a network capture last time
>> due to power loss.  I did not hit this issue again until yesterday (~9
>> months later), this time after 45 days of uptime.
>>
>> Kernel versions now are: 4.5.1 on the server, and 4.4.3 on the client.

I wanted to add that I, too, have one NFS client and server
(running linux-4.11.0 on both the server and the client)
currently in the same kind of state:

I can reproduce in 100% of the cases that the following commands:

> rm -f x.sqlite
> sqlite3 x.sqlite "PRAGMA case_sensitive_like=1;PRAGMA synchronous=OFF;PRAGMA recursive_triggers=ON;PRAGMA foreign_keys=OFF;PRAGMA locking_mode = NORMAL;PRAGMA journal_mode =  TRUNCATE;"

result in:

>  "Error: disk I/O error"

on the client - while working fine on the NFS server - with the same kind
of strace output:

>  fcntl(3, F_SETLK, {type=F_RDLCK, whence=SEEK_SET, start=1073741824, len=1}) = -1 EIO (Input/output error)
>  write(2, "Error: disk I/O error\n", 22Error: disk I/O error

But unlike the original reporter, we use the NFS v3 protocol:
> server:/data on /data type nfs (rw,relatime,vers=3,rsize=1048576,wsize=1048576,namlen=255,soft,proto=tcp,timeo=600,retrans=2,sec=sys,mountvers=3,mountport=20048,mountproto=udp,local_lock=none)

If you want me to try or trace something on the client,
I'm willing to help.

Regards,

Lutz Vieweg




* Re: PROBLEM: nfs I/O errors with sqlite applications
  2017-06-06 16:46               ` Lutz Vieweg
@ 2017-06-07  3:08                 ` NeilBrown
  2017-06-08 18:36                   ` Lutz Vieweg
  0 siblings, 1 reply; 14+ messages in thread
From: NeilBrown @ 2017-06-07  3:08 UTC (permalink / raw)
  To: Lutz Vieweg, linux-nfs

On Tue, Jun 06 2017, Lutz Vieweg wrote:

> On 07/29/2016 07:52 PM, Jeff Layton wrote:
>>>>>>>>>    fcntl(7, F_SETLK, {type=F_RDLCK, whence=SEEK_SET,
>>>>>>>>> start=1073741824, len=1}) = -1 EIO (Input/output error)
>>>
>>> Unfortunately I did not manage to perform a network capture last time
>>> due to power loss.  I did not hit this issue again until yesterday (~9
>>> months later), this time after 45 days of uptime.
>>>
>>> Kernel versions now are: 4.5.1 on the server, and 4.4.3 on the client.
>
> I wanted to add that I, too, have one NFS client and server
> (running linux-4.11.0 on both the server and the client)
> currently in the same kind of state:
>
> I can reproduce in 100% of the cases that the following commands:
>
>> rm -f x.sqlite
>> sqlite3 x.sqlite "PRAGMA case_sensitive_like=1;PRAGMA synchronous=OFF;PRAGMA recursive_triggers=ON;PRAGMA foreign_keys=OFF;PRAGMA locking_mode = NORMAL;PRAGMA journal_mode =  TRUNCATE;"
>
> result in:
>
>>  "Error: disk I/O error"
>
> on the client - while working fine on the NFS server - with the same kind
> of strace output:
>
>>  fcntl(3, F_SETLK, {type=F_RDLCK, whence=SEEK_SET, start=1073741824, len=1}) = -1 EIO (Input/output error)
>>  write(2, "Error: disk I/O error\n", 22Error: disk I/O error
>
> But unlike the original reporter, we use the NFS v3 protocol:
>> server:/data on /data type nfs (rw,relatime,vers=3,rsize=1048576,wsize=1048576,namlen=255,soft,proto=tcp,timeo=600,retrans=2,sec=sys,mountvers=3,mountport=20048,mountproto=udp,local_lock=none)
>
> If you want me to try or trace something on the client,
> I'm willing to help.

Using "soft" is not a good idea.  It could be the cause, but it isn't very
likely if NFS is otherwise working OK.

It might help to run
  rpcdebug -m nfs -s all; rpcdebug -m nlm -s all ;rpcdebug -m rpc -s all
  #repeat your test
  rpcdebug -m nfs -c all; rpcdebug -m nlm -c all ;rpcdebug -m rpc -c all

then collect the kernel logs (possibly just run "dmesg") and post all
the messages which happened at that time.

It might also help to find the port number that lockd is running on
   rpcinfo -p $SERVER | grep 'tcp.*nlockmgr'

(use the 4th column) and

  tcpdump -s 0 -w /tmp/trace.pcap port 2049 or port $LOCKD_PORT &
  # run test
  killall tcpdump

gzip /tmp/trace.pcap and put it somewhere it can be fetched from - or
maybe post as an attachment if it isn't too big.

NeilBrown


* Re: PROBLEM: nfs I/O errors with sqlite applications
  2017-06-07  3:08                 ` NeilBrown
@ 2017-06-08 18:36                   ` Lutz Vieweg
  2017-06-08 22:07                     ` NeilBrown
  0 siblings, 1 reply; 14+ messages in thread
From: Lutz Vieweg @ 2017-06-08 18:36 UTC (permalink / raw)
  To: NeilBrown, linux-nfs

On 06/07/2017 05:08 AM, NeilBrown wrote:
>>>   fcntl(3, F_SETLK, {type=F_RDLCK, whence=SEEK_SET, start=1073741824, len=1}) = -1 EIO (Input/output error)
>>>   write(2, "Error: disk I/O error\n", 22Error: disk I/O error
>>
>> But unlike the original reporter, we use the NFS v3 protocol:
>>> myserver:/data on /data type nfs (rw,relatime,vers=3,rsize=1048576,wsize=1048576,namlen=255,soft,proto=tcp,timeo=600,retrans=2,sec=sys,mountvers=3,mountport=20048,mountproto=udp,local_lock=none)
>
> Using "soft" is not a good idea.  It could be the cause, but it isn't very
> likely if NFS is otherwise working OK.

NFS v3 has been working very well for us for many years.
When we upgraded those two servers ~3 years ago, we did try NFS v4 first, but
that had caused frequent occurrences of "un-killable processes in D state",
so we had to revert to v3 to allow for stable operation.

> It might help to run
>    rpcdebug -m nfs -s all; rpcdebug -m nlm -s all ;rpcdebug -m rpc -s all
>    #repeat your test
>    rpcdebug -m nfs -c all; rpcdebug -m nlm -c all ;rpcdebug -m rpc -c all
>
> then collect the kernel logs (possibly just run "dmesg") and post all
> the messages which happened at that time.

Ok, attaching a log generated like this while running:

sqlite3 x.sqlite "PRAGMA case_sensitive_like=1;PRAGMA synchronous=OFF;PRAGMA 
recursive_triggers=ON;PRAGMA foreign_keys=OFF;PRAGMA locking_mode = NORMAL;PRAGMA journal_mode = 
TRUNCATE;"

> It might also help to find the port number that lockd is running on
>     rpcinfo -p $SERVER | grep 'tcp.*nlockmgr'

None of the ports reported this way contains the string "nlockmgr":
> rpcinfo -p myserver
>    program vers proto   port  service
>     100000    4   tcp    111  portmapper
>     100000    3   tcp    111  portmapper
>     100000    2   tcp    111  portmapper
>     100000    4   udp    111  portmapper
>     100000    3   udp    111  portmapper
>     100000    2   udp    111  portmapper

Regards,

Lutz Vieweg

[-- Attachment #2: log.txt.xz --]
[-- Type: application/x-xz, Size: 7544 bytes --]


* Re: PROBLEM: nfs I/O errors with sqlite applications
  2017-06-08 18:36                   ` Lutz Vieweg
@ 2017-06-08 22:07                     ` NeilBrown
  2017-06-09 11:01                       ` Lutz Vieweg
  0 siblings, 1 reply; 14+ messages in thread
From: NeilBrown @ 2017-06-08 22:07 UTC (permalink / raw)
  To: Lutz Vieweg, linux-nfs

On Thu, Jun 08 2017, Lutz Vieweg wrote:

> On 06/07/2017 05:08 AM, NeilBrown wrote:
>>>>   fcntl(3, F_SETLK, {type=F_RDLCK, whence=SEEK_SET, start=1073741824, len=1}) = -1 EIO (Input/output error)
>>>>   write(2, "Error: disk I/O error\n", 22Error: disk I/O error
>>>
>>> But unlike the original reporter, we use the NFS v3 protocol:
>>>> myserver:/data on /data type nfs (rw,relatime,vers=3,rsize=1048576,wsize=1048576,namlen=255,soft,proto=tcp,timeo=600,retrans=2,sec=sys,mountvers=3,mountport=20048,mountproto=udp,local_lock=none)
>>
>> Using "soft" is not a good idea.  It could be the cause, but it isn't very
>> likely if NFS is otherwise working OK.
>
> NFS v3 has been working very well for us for many years.
> When we upgraded those two servers ~3 years ago, we did try NFS v4 first, but
> that had caused frequent occurrences of "un-killable processes in D state",
> so we had to revert to v3 to allow for stable operation.

I queried the use of "soft" - as opposed to "hard".
You defend the use of v3 as opposed to v4.
I think there is some miscommunication happening here.

If v3 works better for you than v4, then certainly use it.
You could try reporting details of the problems with v4, but I cannot
promise a helpful response, so it is totally up to you.

But "soft" is generally a bad idea.  It can lead to data corruption in
various ways as it reports errors to user-space which user-space is often
not expecting.

These days, the processes in D state are (usually) killable.

>
>> It might help to run
>>    rpcdebug -m nfs -s all; rpcdebug -m nlm -s all ;rpcdebug -m rpc -s all
>>    #repeat your test
>>    rpcdebug -m nfs -c all; rpcdebug -m nlm -c all ;rpcdebug -m rpc -c all
>>
>> then collect the kernel logs (possibly just run "dmesg") and post all
>> the messages which happened at that time.
>
> Ok, attaching a log generated like this while running:
>
> sqlite3 x.sqlite "PRAGMA case_sensitive_like=1;PRAGMA synchronous=OFF;PRAGMA 
> recursive_triggers=ON;PRAGMA foreign_keys=OFF;PRAGMA locking_mode = NORMAL;PRAGMA journal_mode = 
> TRUNCATE;"

Thanks. Probably the key line is

[2339904.695240] RPC: 46702 remote rpcbind: RPC program/version unavailable

The client is trying to talk to lockd on the server, and lockd doesn't
seem to be there.


>
>> It might also help to find the port number that lockd is running on
>>     rpcinfo -p $SERVER | grep 'tcp.*nlockmgr'
>
> None of the ports reported this way contains the string "nlockmgr":

This agrees with the line from the log.  If nlockmgr isn't listed, then
locking cannot work.  This is the cause of your problem.

>> rpcinfo -p myserver
>>    program vers proto   port  service
>>     100000    4   tcp    111  portmapper
>>     100000    3   tcp    111  portmapper
>>     100000    2   tcp    111  portmapper
>>     100000    4   udp    111  portmapper
>>     100000    3   udp    111  portmapper
>>     100000    2   udp    111  portmapper

Even "nfs" isn't listed - but clearly the nfs server is running.

My guess is that rpcbind was restarted with the "-w" flag, so it lost
all the state that it previously had.
If you stop and restart NFS service on the server, it might start
working again.  Otherwise just reboot the nfs server.

NeilBrown


>
> Regards,
>
> Lutz Vieweg


* Re: PROBLEM: nfs I/O errors with sqlite applications
  2017-06-08 22:07                     ` NeilBrown
@ 2017-06-09 11:01                       ` Lutz Vieweg
  2017-06-09 22:01                         ` NeilBrown
  0 siblings, 1 reply; 14+ messages in thread
From: Lutz Vieweg @ 2017-06-09 11:01 UTC (permalink / raw)
  To: NeilBrown, linux-nfs

On 06/09/2017 12:07 AM, NeilBrown wrote:
> But "soft" is generally a bad idea.  It can lead to data corruption in
> various ways as it reports errors to user-space which user-space is often
> not expecting.

From reading "man 5 nfs" I understood the one situation in which this
option makes a difference is when the NFS server becomes unavailable/unreachable.

With "hard" user-space applications will wait indefinitely in the hope
that the NFS service will become available again.

I see that if there was only some temporary glitch with connectivity
to the NFS server, this waiting might yield a better outcome - but that
should be covered by the timeout grace periods anyway.

But if:

- An unreachability of the service persists for a very long time,
   it is bad that it will take a very long time for any monitoring
   of the applications on the server to notice that this is no longer
   a tolerable situation, so some sort of fail-over to different application
   instances need to be triggered

- The unavailability/unreachability of the service is resolved by rebooting
   the NFS server, chances are that the files are then in a different state
   than before (due to reverting to the last known consistent state of
   the local filesystem on the server), and in that situation I don't
   want to fool the client into thinking that everything I/O-wise is fine -
   better signal an error to make the application aware of the situation

- The unavailability/unreachability of the service is unresolvable, because
   the primary NFS server died completely, then the files will clearly be
   in a different state once a secondary service is brought up - and a
   "kill -9" on all the processes waiting for NFS-I/O seems equally likely
   to me to cause the applications trouble than returning an error on
   the pending I/O operations.

> These days, the processes in D state are (usually) killable.

If that is true for processes waiting on (hard) mounted NFS services,
that is really appreciated and good to know. It would certainly help
us next time we try a newer NFS protocol release :-)

(BTW: I recently had to reboot a machine because processes who
waited for access to a long-removed USB device persisted in D-state...
and were immune to "kill -9". So at least the USB driver subsystem
seems to still contain such pitfalls.)

> Thanks. Probably the key line is
>
> [2339904.695240] RPC: 46702 remote rpcbind: RPC program/version unavailable
>
> The client is trying to talk to lockd on the server, and lockd doesn't
> seem to be there.

"ps" however says there is a process of that name running on that server:=

> USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAN=
D
> root      3753  0.0  0.0      0     0 ?        S    May26   0:02  \_ [l=
ockd]

Your assumption:
> My guess is that rpcbind was restarted with the "-w" flag, so it lost
> all the state that it previously had.
seems to be right:

> > systemctl status rpcbind
> ● rpcbind.service - RPC bind service
>    Loaded: loaded (/usr/lib/systemd/system/rpcbind.service; enabled; vendor preset: enabled)
>    Active: active (running) since Wed 2017-05-31 10:06:05 CEST; 1 weeks 2 days ago
>   Process: 14043 ExecStart=/sbin/rpcbind -w $RPCBIND_ARGS (code=exited, status=0/SUCCESS)
>  Main PID: 14044 (rpcbind)
>    CGroup: /system.slice/rpcbind.service
>            └─14044 /sbin/rpcbind -w
>
> May 31 10:06:05 myserver systemd[1]: Starting RPC bind service...
> May 31 10:06:05 myserver systemd[1]: Started RPC bind service.

If that kind of invocation is known to cause trouble, I wonder why
RedHat/CentOS chose to make it what seems to be their default...

> If you stop and restart NFS service on the server, it might start
> working again.  Otherwise just reboot the nfs server.

A "systemctl stop nfs ; systemctl start nfs" was not sufficent, only chan=
ged the symptom:
> sqlite3 x.sqlite "PRAGMA case_sensitive_like=3D1;PRAGMA synchronous=3DO=
FF;PRAGMA recursive_triggers=3DON;PRAGMA foreign_keys=3DOFF;PRAGMA lockin=
g_mode =3D NORMAL;PRAGMA journal_mode =3D TRUNCATE;"
> Error: database is locked

On the server, at the same time, the following message is emitted to the system log:
> Jun  9 12:53:57 myserver kernel: lockd: cannot monitor myclient

What did help, however, was running:
> systemctl stop rpc-statd ; systemctl start rpc-statd
on the server.

So thanks for your analysis! - We now know a way to remove the symptom
with relatively little disturbance of services.

Should we somehow try to get rid of that "-w" to rpcbind, in an attempt
to not re-trigger the symptom at a later time?

Regards,

Lutz Vieweg



* Re: PROBLEM: nfs I/O errors with sqlite applications
  2017-06-09 11:01                       ` Lutz Vieweg
@ 2017-06-09 22:01                         ` NeilBrown
  0 siblings, 0 replies; 14+ messages in thread
From: NeilBrown @ 2017-06-09 22:01 UTC (permalink / raw)
  To: Lutz Vieweg, linux-nfs

On Fri, Jun 09 2017, Lutz Vieweg wrote:

> On 06/09/2017 12:07 AM, NeilBrown wrote:
>> But "soft" is generally a bad idea.  It can lead to data corruption in
>> various ways as it reports errors to user-space which user-space is often
>> not expecting.
>
>  From reading "man 5 nfs" I understood the one situation in which this
> option makes a difference is when the NFS server becomes unavailable/unreachable.

Exactly - which should be independent of whether you use NFSv3 or
NFSv4...
The only case where NFSv3 vs NFSv4 would make a difference is if the
server starts misbehaving in some way that only affects one protocol.
This is exactly what happened to you.  The misbehaviour of rpcbind only
affects NFSv3.  NFSv4 wouldn't have noticed :-)

>
> With "hard" user-space applications will wait indefinitely in the hope
> that the NFS service will become available again.
>
> I see that if there was only some temporary glitch with connectivity
> to the NFS server, this waiting might yield a better outcome - but that
> should be covered by the timeout grace periods anyway.

"should be".  Servers and networks can get congested and take longer to
reply than you would expect.  Unless the total timeout is long enough
to notice and get annoyed and frustrated about, it probably isn't long
enough to cover all transient conditions.
>
> But if:
>
> - An unreachability of the service persists for a very long time,
>    it is bad that it will take a very long time for any monitoring
>    of the applications on the server to notice that this is no longer
>    a tolerable situation, so some sort of fail-over to different application
>    instances need to be triggered
>
> - The unavailability/unreachability of the service is resolved by rebooting
>    the NFS server, chances are that the files are then in a different state
>    than before (due to reverting to the last known consistent state of
>    the local filesystem on the server), and in that situation I don't
>    want to fool the client into thinking that everything I/O-wise is fine -
>    better signal an error to make the application aware of the
>    situation

This isn't (or shouldn't be) a valid concern.  Any changes that the
client isn't certain are stable and consistent on the server, will be
resent after a server reboot.
If the server catches fire and you restore from yesterday's backups,
then you might have an issue here - but in that case you'd almost
certainly want to restart all client services anyway.

>
> - The unavailability/unreachability of the service is unresolvable, because
>    the primary NFS server died completely, then the files will clearly be
>    in a different state once a secondary service is brought up - and a
>    "kill -9" on all the processes waiting for NFS-I/O seems equally likely
>    to me to cause the applications trouble than returning an error on
>    the pending I/O operations.

A "kill -9" cannot be ignored, while IO errors can.  If your application
cannot cope with kill -9, it needs to be fixed or replaced.

>
>> These days, the processes in D state are (usually) killable.
>
> If that is true for processes waiting on (hard) mounted NFS services,
> that is really appreciated and good to know. It would certainly help
> us next time we try a newer NFS protocol release :-)

You mean "next time we try with the 'hard' mount option".

>
> (BTW: I recently had to reboot a machine because processes who
> waited for access to a long-removed USB device persisted in D-state...
> and were immune to "kill -9". So at least the USB driver subsystem
> seems to still contain such pitfalls.)

This isn't surprising.  It is easy to trigger NFS related problems, so
developers get annoyed and eventually something gets fixed.
It is much less common to hit these problems with USB device, so
developers don't get annoyed.
A concrete bug report might result in improvements, but I cannot promise.

>
>> Thanks. Probably the key line is
>>
>> [2339904.695240] RPC: 46702 remote rpcbind: RPC program/version unavailable
>>
>> The client is trying to talk to lockd on the server, and lockd doesn't
>> seem to be there.
>
> "ps" however says there is a process of that name running on that server:
>> USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
>> root      3753  0.0  0.0      0     0 ?        S    May26   0:02  \_ [lockd]
>
> Your assumption:
>> My guess is that rpcbind was restarted with the "-w" flag, so it lost
>> all the state that it previously had.
> seems to be right:
>
>> > systemctl status rpcbind
>> ● rpcbind.service - RPC bind service
>>    Loaded: loaded (/usr/lib/systemd/system/rpcbind.service; enabled; vendor preset: enabled)
>>    Active: active (running) since Wed 2017-05-31 10:06:05 CEST; 1 weeks 2 days ago
>>   Process: 14043 ExecStart=/sbin/rpcbind -w $RPCBIND_ARGS (code=exited, status=0/SUCCESS)
>>  Main PID: 14044 (rpcbind)
>>    CGroup: /system.slice/rpcbind.service
>>            └─14044 /sbin/rpcbind -w
>>
>> May 31 10:06:05 myserver systemd[1]: Starting RPC bind service...
>> May 31 10:06:05 myserver systemd[1]: Started RPC bind service.
>
> If that kind of invocation is known to cause trouble, I wonder why
> RedHat/CentOS chose to make it what seems to be their default...

Sorry - typo on my part.  I should have said "was restarted withOUT the
-w flag".  This configuration of rpcbind appears to be correct.

However.....
rpcbind stores its state in a file.  Until about 6 months ago, the
upstream rpcbind would use a file in /tmp.  Late last year we changed
the code to use a file in /var/run.

When a distro updates to the newer version with a different location,
they *should*
 - stop the running rpcbind
 - copy the state file from /tmp/ to /var/run
 - start rpcbind

If this sequence isn't followed, you will get exactly the symptoms you
report.  That might be what happened.
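
Roughly, as a sketch only (the exact warm-start file names and the
state directory differ between rpcbind versions and distros, so check
what your packages actually use -- rpcbind.xdr/portmap.xdr under /tmp
vs. /var/run/rpcbind are the common ones):

  systemctl stop rpcbind      # with socket activation, also rpcbind.socket
  mkdir -p /var/run/rpcbind
  cp /tmp/rpcbind.xdr /tmp/portmap.xdr /var/run/rpcbind/
  systemctl start rpcbind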

>
>> If you stop and restart NFS service on the server, it might start
>> working again.  Otherwise just reboot the nfs server.
>
> A "systemctl stop nfs ; systemctl start nfs" was not sufficent, only changed the symptom:
>> sqlite3 x.sqlite "PRAGMA case_sensitive_like=1;PRAGMA synchronous=OFF;PRAGMA recursive_triggers=ON;PRAGMA foreign_keys=OFF;PRAGMA locking_mode = NORMAL;PRAGMA journal_mode = TRUNCATE;"
>> Error: database is locked

By "stop NFS service on the server" I meant
  systemctl restart nfs-server
or something like that.  "nfs" is more client-side than server-side.

However you seem to have got things working again, and that is the
important thing.

You might like to report the (possible) upgrade bug to Fedora, though
maybe someone responsible is listening on the list.
(Hm... I should probably go make sure that openSUSE does the right thing
here...).

NeilBrown



>
> On the server, at the same time, the following message is emitted to the system log:
>> Jun  9 12:53:57 myserver kernel: lockd: cannot monitor myclient
>
> What did help, however, was running:
>> systemctl stop rpc-statd ; systemctl start rpc-statd
> on the server.
>
> So thanks for your analysis! - We now know a way to remove the symptom
> with relatively little disturbance of services.
>
> Should we somehow try to get rid of that "-w" to rpcbind, in an attempt
> to not re-trigger the symptom at a later time?
>
> Regards,
>
> Lutz Vieweg

