linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Christian Brauner <christian.brauner@ubuntu.com>
To: CGEL <cgel.zte@gmail.com>
Cc: keescook@chromium.org, ktkhai@virtuozzo.com,
	jamorris@linux.microsoft.com, varad.gautam@suse.com,
	legion@kernel.org, dbueso@suse.de, linux-kernel@vger.kernel.org,
	Ran Xiaokai <ran.xiaokai@zte.com.cn>
Subject: Re: [PATCH] ipc: add set_ownership() and permissions() callbacks for posix mqueue sysctl
Date: Tue, 3 Aug 2021 16:01:33 +0200	[thread overview]
Message-ID: <20210803140133.vksebmgqhlbqipla@wittgenstein> (raw)
In-Reply-To: <20210803103150.GA607784@www>

On Tue, Aug 03, 2021 at 03:31:50AM -0700, CGEL wrote:
> O Thu, Jul 29, 2021 at 04:53:48PM +0200, Christian Brauner wrote:
> > 
> > Yeah, we did that work specifically for the network namespace but knew
> > there were quite a few places that would need fix up. This makes sense
> > to me.
> > 
> > Please add tests for this patch though. Also make sure to run them in a
> > tight loop on a kernel with memory and log debugging enabled.
>   For now i have rebuilt the kernel turning on the config items you
>   suggested and some other kernel hacking and locking debug configs.
>   I tested this by a shell script concurrently writing the mqueue sysctl
>   files and checking the value. Do you mean that i should add some test code in this patch?
>   Can you give some examples for this tests code?

Yep, I'd prefer to see tests with this. This should be fairly easy to
test. There are a bunch of ways. A really trivial skeleton for a test in
tools/testing/selftstes/mqueue/ could be:

- Create a new mount and ipc namespace and mount mqueue in there.
  Read and remember the /proc/sys/fs/mqueue/queues_max value.
- Now create a new user + mount namespace pair in a child process.
- Mount mqueue filesystem in there.
- Set /proc/sys/fs/mqueue/queues_max to 1.
- Call mq_open with O_CREAT in the child process the first time and
  expect success keeping the fd open.
- Call mq_open with O_CREAT in the child process a second time and
  expect failure because of:

	if (ipc_ns->mq_queues_count >= ipc_ns->mq_queues_max &&
	    !capable(CAP_SYS_RESOURCE)) {
		error = -ENOSPC;
		goto out_unlock;
	}
	ipc_ns->mq_queues_count++;
	spin_unlock(&mq_lock);

- Reap the child in the parent expecting success.
- Verify that the /proc/sys/fs/mqueue/queues_max value in the parent is
  identical to the value you read before creating the child.

This should be a very rough test that at least the fundamentals of this
change are sane.

This doesn't have to be complex.

Here's the code from a test I've written for mount_setattr() to create
an unpriv mount + userns which might be the annoying part:

#include "../../kselftest_harness.h"
#include "../pidfd.h" /* for wait_for_pid() */

static ssize_t write_nointr(int fd, const void *buf, size_t count)
{
	ssize_t ret;

	do {
		ret = write(fd, buf, count);
	} while (ret < 0 && errno == EINTR);

	return ret;
}

static int write_file(const char *path, const void *buf, size_t count)
{
	int fd;
	ssize_t ret;

	fd = open(path, O_WRONLY | O_CLOEXEC | O_NOCTTY | O_NOFOLLOW);
	if (fd < 0)
		return -1;

	ret = write_nointr(fd, buf, count);
	close(fd);
	if (ret < 0 || (size_t)ret != count)
		return -1;

	return 0;
}

static int create_and_enter_userns(void)
{
	uid_t uid;
	gid_t gid;
	char map[100];

	uid = getuid();
	gid = getgid();

	if (unshare(CLONE_NEWUSER))
		return -1;

	if (write_file("/proc/self/setgroups", "deny", sizeof("deny") - 1) &&
	    errno != ENOENT)
		return -1;

	snprintf(map, sizeof(map), "0 %d 1", uid);
	if (write_file("/proc/self/uid_map", map, strlen(map)))
		return -1;


	snprintf(map, sizeof(map), "0 %d 1", gid);
	if (write_file("/proc/self/gid_map", map, strlen(map)))
		return -1;

	if (setgid(0))
		return -1;

	if (setuid(0))
		return -1;

	return 0;
}

static int prepare_unpriv_mountns(void)
{
	if (create_and_enter_userns())
		return -1;

	if (unshare(CLONE_NEWNS))
		return -1;

	if (mount(NULL, "/", NULL, MS_REC | MS_PRIVATE, 0))
		return -1;

	if (unshare(CLONE_NEWIPC))
		return -1;

	return 0;
}

TEST(mqueue_sysctls)
{
	int ret;
	pid_t pid;

	if (unshare(CLONE_NEWNS))
		return -1;

	if (mount(NULL, "/", NULL, MS_REC | MS_PRIVATE, 0))
		return -1;

	if (unshare(CLONE_NEWIPC))
		return -1;

	/* Mount new mqueue instance, read and stash sysctl value. */

	pid = fork();
	ASSERT_GE(pid, 0) {
		TH_LOG("%s - Failed to fork", strerror(errno));
	}

	if (pid == 0) {
		ASSERT_EQ(prepare_unpriv_mountns(), 0);

		/* mount new mqueue instance, setup sysctls and perform mq_open() tests. */

		exit(EXIT_SUCCESS);
	}

	ASSERT_EQ(wait_for_pid(pid), 0);


	/* Read sysctl value again making sure it's still the same as before. */
}

TEST_HARNESS_MAIN

> 
> > The whole sysctl retire stuff can't be called from rcu contexts and that's easy to
> > miss. 
>   for this patch, retire_mq_sysctls() is called by free_ipc_ns(), and free_ipc_ns() is 
>   triggered by schedule_work(&free_ipc_work) in kthread context.
>   Can you give some comments on the chance this code running on rcu
>   context?

I really just prefer to see sysctl namespacing changes compiled with all
manner of debugging options enabled as they can quite easily uncover
bugs where the sysctl cleanup helpers are called from invalid contexts.
That has happened to me when I worked on some sysctl changes a while
back and thought I had followed all codepaths diligently to make sure
that nothing calls the sysctl cleanup code from invalid contexts. I only
caught the codepath that did because I had all of the options turned on.

  reply	other threads:[~2021-08-03 14:02 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-07-29  3:06 [PATCH] ipc: add set_ownership() and permissions() callbacks for posix mqueue sysctl cgel.zte
2021-07-29 14:53 ` Christian Brauner
2021-08-03 10:31   ` CGEL
2021-08-03 14:01     ` Christian Brauner [this message]
2021-08-11 15:51       ` CGEL
2021-08-23  3:29       ` [PATCH] tests: add mqueue sysctl tests for user namespace Ran Xiaokai
2021-08-23 15:26         ` Davidlohr Bueso
2021-08-24 12:05         ` Christian Brauner
2021-08-27  9:50           ` [PATCH V2] " CGEL
2021-08-27 10:12           ` [PATCH V2] ipc: add set_ownership() and permissions() callbacks for posix mqueue sysctl CGEL
2021-09-13 14:40             ` Christian Brauner
2021-09-13 19:42               ` Davidlohr Bueso
2021-09-16  1:49               ` CGEL
2021-10-04 10:53                 ` Christian Brauner
2021-12-01  7:14                   ` CGEL
2021-12-01 12:53                     ` Christian Brauner
2022-04-06  7:59                       ` cgel.zte
2021-07-30 15:09 ` [PATCH] " Davidlohr Bueso
2021-08-03 10:34   ` CGEL

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210803140133.vksebmgqhlbqipla@wittgenstein \
    --to=christian.brauner@ubuntu.com \
    --cc=cgel.zte@gmail.com \
    --cc=dbueso@suse.de \
    --cc=jamorris@linux.microsoft.com \
    --cc=keescook@chromium.org \
    --cc=ktkhai@virtuozzo.com \
    --cc=legion@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=ran.xiaokai@zte.com.cn \
    --cc=varad.gautam@suse.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).