* NFS 4.2 client support broken on 5.10.5
From: Kurt Garloff @ 2021-01-08 11:41 UTC
  To: Neil Brown, Anna Schumaker, Trond Myklebust; +Cc: LKML

[-- Attachment #1: Type: text/plain, Size: 1023 bytes --]

Hi Neil, Anna, Trond,

While compiling a kernel, I suddenly started getting errors from objtool orc.
(This first occurs on init/main.o.)

I looked at all kinds of things before I noticed that this was neither a
toolchain issue (self-compiled gcc-10.2.1), nor a gcc plugin issue (I use
structleak and stackleak), nor an issue with objtool or libelf,
but that there was an -EIO error.

The kernel tree is on an NFS share, and I run 5.10.5 client kernel
against the kernel NFS (4.2) server, running a 5.10.3 kernel.

The issue does NOT occur on a 5.10.3 client kernel, but is easily
reproducible on 5.10.5. Note that 5.10.5 on a local file system or
against an NFSv3 server does not show the issue.

Test program that reproduces this on the first pwrite64() is attached.
Note that the call to ftruncate() is required to make the problem happen.

I could go on bisecting this to a particular patch, but you'll
probably be able to see right away what's wrong.

Best,

-- 
Kurt Garloff <kurt@garloff.de>
Cologne, Germany


[-- Attachment #2: testpwrite.c --]
[-- Type: text/x-csrc, Size: 1579 bytes --]

/* testpwrite.c
 * reproduces issue on NFS 4.2 client on Linux 5.10.5
 * (c) Kurt Garloff <scs@garloff.de>, 1/2021
 * License: GNU GPL 2 or later 
 */

#include <stdlib.h>
#include <stdio.h>
#include <errno.h>
#include <unistd.h>
#include <string.h>
#include <fcntl.h>
#include <time.h>
#include <sys/types.h>	/* off_t, ssize_t */
#include <sys/stat.h>	/* S_IRUSR and friends */

#define MAXBUF 1048576
#define MAXSIZE (128*MAXBUF)
#define REP 16384

long randomwrite(int fd)
{
	char *buf = malloc(MAXBUF);
	long written = 0;
	int rep = REP;
	ssize_t ret;
	if (!buf) {
		perror("malloc");
		return -3;
	}

	memset(buf, 0, MAXBUF);
	while (--rep) {
#if MAXSIZE >= RAND_MAX
		off_t offset = rand() * (MAXSIZE/RAND_MAX);
#else
		off_t offset = rand() / (RAND_MAX/MAXSIZE);
#endif
#if MAXBUF >= RAND_MAX
		size_t len = rand() * (MAXBUF/RAND_MAX);
#else
		size_t len = rand() / (RAND_MAX/MAXBUF);
#endif
		/* The integer division above rounds the divisor down, so len
		 * can slightly exceed MAXBUF; clamp it to the buffer size. */
		if (len > MAXBUF)
			len = MAXBUF;
		//ret = pread(fd, buf+len/4, len/2, offset);
		/* Slowly fill in some random non-0 values,
		 * but never store past the end of the buffer */
		size_t pos = len - len%8;
		if (pos + sizeof(int) <= MAXBUF)
			*(int*)(buf + pos) = rand();
		ret = pwrite(fd, buf, len, offset);
		if (ret < 0) {
			/* errno is read before fprintf() can overwrite it */
			fprintf(stderr, "pwrite(%i,%p,%zu,%lld): %s\n", fd,
				(void *)buf, len, (long long)offset, strerror(errno));
			free(buf);
			return -1;
		} else
			written += ret;
	}
	free(buf);
	return written;
}

int main(void)
{
	srand(time(0));
	int fd = open("testfile", O_RDWR|O_CREAT, S_IRUSR|S_IWUSR|S_IRGRP);
	/* 0 is a valid descriptor; only negative values signal failure */
	if (fd < 0) {
		perror("open(\"testfile\")");
		return 2;
	}
	/* The ftruncate() call is required to trigger the problem */
	if (ftruncate(fd, MAXSIZE) < 0) {
		perror("ftruncate");
		close(fd);
		return 4;
	}
	unlink("testfile");
	long rc = randomwrite(fd);
	close(fd);
	if (rc < 0)
		return -rc;
	printf("%li bytes written successfully\n", rc);
	return 0;
}
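
The report says the failure shows up only when the file lives on an NFSv4.2 mount, not on a local file system. A test harness may therefore want to confirm that its working directory is on NFS before interpreting a result. The following is a minimal sketch, assuming Linux and glibc; the helper name path_is_on_nfs() is hypothetical and not part of the attached program. It cannot tell NFSv3 from NFSv4.2, since both report the same file system magic number.

```c
#include <sys/vfs.h>      /* statfs() */
#include <linux/magic.h>  /* NFS_SUPER_MAGIC */

/* Hypothetical helper: returns 1 if path is on an NFS mount,
 * 0 if it is on some other file system, -1 if statfs() fails. */
int path_is_on_nfs(const char *path)
{
	struct statfs sfs;

	if (statfs(path, &sfs) < 0)
		return -1;
	return sfs.f_type == NFS_SUPER_MAGIC;
}
```

Calling path_is_on_nfs(".") before running the reproducer lets a harness skip or flag runs that would not exercise the NFS client at all.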


* Re: NFS 4.2 client support broken on 5.10.5
From: Trond Myklebust @ 2021-01-08 11:58 UTC
  To: kurt, NeilB, Anna.Schumaker; +Cc: linux-kernel

On Fri, 2021-01-08 at 12:41 +0100, Kurt Garloff wrote:
> Hi Neil, Anna, Trond,
> 
> compiling a kernel, I suddenly started getting errors from objtool
> orc.
> (This first occurs on init/main.o.)
> 
> I looked at all kind of things, before I noticed that this was not a
> toolchain issue (gcc-10.2.1 self compiled), gcc plugins (I use
> structleak and stackleak) nor an issue with objtool or libelf,
> but that there was an -EIO error.
> 
> The kernel tree is on an NFS share, and I run 5.10.5 client kernel
> against the kernel NFS (4.2) server, running a 5.10.3 kernel.
> 
> The issue does NOT occur on a 5.10.3 client kernel, but is easily
> reproducible on 5.10.5. Note that 5.10.5 on a local file system or
> against an NFSv3 server does not show the issue.
> 
> Test program that reproduces this on the first pwrite64() is
> attached.
> Note that the call to ftruncate() is required to make the problem
> happen.
> 
> I could go on bisecting this to a particular patch, but you'll
> probably be able to see right away what's wrong.
> 
> Best,
> 

Hmm... If this is NFSv4.2 do you have READ_PLUS turned on or off in
.config? It really is not safe to enable READ_PLUS on 5.10 kernels
since that can cause random memory corruption.

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@hammerspace.com
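
To answer the question above for a given build, one can look for the option in the kernel's .config. The following is a minimal sketch; the helper name config_option_enabled() is hypothetical, and the symbol name CONFIG_NFS_V4_2_READ_PLUS is an assumption about how the READ_PLUS option mentioned above is spelled in .config.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: returns 1 if `option` is set to y or m in the
 * given .config file, 0 if it is unset or absent, -1 if the file
 * cannot be opened. */
int config_option_enabled(const char *config_path, const char *option)
{
	char line[512];
	size_t optlen = strlen(option);
	int enabled = 0;
	FILE *f = fopen(config_path, "r");

	if (!f)
		return -1;
	while (fgets(line, sizeof(line), f)) {
		/* Enabled options look like "CONFIG_FOO=y"; disabled ones
		 * appear as "# CONFIG_FOO is not set" and do not match. */
		if (strncmp(line, option, optlen) == 0 && line[optlen] == '=') {
			enabled = (line[optlen + 1] == 'y' || line[optlen + 1] == 'm');
			break;
		}
	}
	fclose(f);
	return enabled;
}
```

For example, config_option_enabled(".config", "CONFIG_NFS_V4_2_READ_PLUS") would return 1 for a tree configured like the one in the report, under the naming assumption stated above.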



* Re: NFS 4.2 client support broken on 5.10.5
From: Kurt Garloff @ 2021-01-08 14:39 UTC
  To: Trond Myklebust, NeilB, Anna.Schumaker; +Cc: linux-kernel

Hi Trond,

On 08/01/2021 12:58, Trond Myklebust wrote:
> On Fri, 2021-01-08 at 12:41 +0100, Kurt Garloff wrote:
>> Hi Neil, Anna, Trond,
>>
>> compiling a kernel, I suddenly started getting errors from objtool
>> orc.
>> (This first occurs on init/main.o.)
>>
>> I looked at all kind of things, before I noticed that this was not a
>> toolchain issue (gcc-10.2.1 self compiled), gcc plugins (I use
>> structleak and stackleak) nor an issue with objtool or libelf,
>> but that there was an -EIO error.
>>
>> The kernel tree is on an NFS share, and I run 5.10.5 client kernel
>> against the kernel NFS (4.2) server, running a 5.10.3 kernel.
>>
>> The issue does NOT occur on a 5.10.3 client kernel, but is easily
>> reproducible on 5.10.5. Note that 5.10.5 on a local file system or
>> against an NFSv3 server does not show the issue.
>>
>> Test program that reproduces this on the first pwrite64() is
>> attached.
>> Note that the call to ftruncate() is required to make the problem
>> happen.
>>
>> I could go on bisecting this to a particular patch, but you'll
>> probably be able to see right away what's wrong.
>>
> Hmm... If this is NFSv4.2 do you have READ_PLUS turned on or off in
> .config? It really is not safe to enable READ_PLUS on 5.10 kernels
> since that can cause random memory corruption.
OK, it is turned on in my kernel -- looks like I have not read the
warning in the config option help text carefully enough ...

I'll test what happens if I switch it off and report back.

Thanks for the quick response

---

Kurt Garloff <kurt@garloff.de>
Cologne, Germany



* Re: NFS 4.2 client support broken on 5.10.5
From: Kurt Garloff @ 2021-01-08 15:49 UTC
  To: Trond Myklebust, NeilB, Anna.Schumaker; +Cc: linux-kernel

Hi Trond,

On 08.01.21 at 15:39, Kurt Garloff wrote:
> Hi Trond,
>
> On 08/01/2021 12:58, Trond Myklebust wrote:
>> On Fri, 2021-01-08 at 12:41 +0100, Kurt Garloff wrote:
>>> [...]
>>> The kernel tree is on an NFS share, and I run 5.10.5 client kernel
>>> against the kernel NFS (4.2) server, running a 5.10.3 kernel.
>>>
>>> The issue does NOT occur on a 5.10.3 client kernel, but is easily
>>> reproducible on 5.10.5. Note that 5.10.5 on a local file system or
>>> against an NFSv3 server does not show the issue.
>>>
>>> Test program that reproduces this on the first pwrite64() is
>>> attached.
>>> Note that the call to ftruncate() is required to make the problem
>>> happen.
>>>
>>> I could go on bisecting this to a particular patch, but you'll
>>> probably be able to see right away what's wrong.
>>>
>> Hmm... If this is NFSv4.2 do you have READ_PLUS turned on or off in
>> .config? It really is not safe to enable READ_PLUS on 5.10 kernels
>> since that can cause random memory corruption.
> OK, it is turned on in my kernel -- looks like I have not read the
> warning in the config option help text carefully enough ...
>
> I'll test what happens if I switch it off and report back.

OK, I compiled a kernel without support for READ_PLUS
and the test program magically succeeds.

So take my report as input to the developers that work
on making READ_PLUS work. Maybe they want to add
my little program to their CI suite.

Thanks,

-- 
Kurt Garloff <kurt@garloff.de>, Cologne, Germany
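
For a CI suite, the random loop of the attached program could be reduced to a single deterministic check, since the report says the error already appears on the first pwrite64() after ftruncate(). The following is a minimal sketch under that assumption; the function name write_after_truncate() and its parameters are hypothetical, and whether one fixed-size write triggers the problem on every affected setup has not been verified here.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical one-shot check: create `path`, extend it to `size` with
 * ftruncate(), unlink it, then write one zeroed 4 KiB block at `offset`.
 * Returns 0 on success or the errno value of the first failing call. */
int write_after_truncate(const char *path, off_t size, off_t offset)
{
	char buf[4096];
	int err = 0;
	int fd = open(path, O_RDWR | O_CREAT, S_IRUSR | S_IWUSR);

	if (fd < 0)
		return errno;
	memset(buf, 0, sizeof(buf));
	if (ftruncate(fd, size) < 0)
		err = errno;
	/* Same order as the attached program: unlink before writing. */
	unlink(path);
	if (!err && pwrite(fd, buf, sizeof(buf), offset) < 0)
		err = errno;
	if (close(fd) < 0 && !err)
		err = errno;
	return err;
}
```

On an affected client, a call on a file inside the NFSv4.2 mount would be expected to return EIO, matching the -EIO in the original report; on a healthy setup it returns 0.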

