linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Rongwei Wang <rongwei.wang@linux.alibaba.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Matthew Wilcox <willy@infradead.org>,
	Linux MM <linux-mm@kvack.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	song@kernel.org, william.kucharski@oracle.com,
	Hugh Dickins <hughd@google.com>
Subject: Re: [PATCH v2 1/2] mm, thp: check page mapping when truncating page cache
Date: Fri, 24 Sep 2021 15:12:50 +0800	[thread overview]
Message-ID: <9e41661d-9919-d556-8c49-610dae157553@linux.alibaba.com> (raw)
In-Reply-To: <20210923194343.ca0f29e1c4d361170343a6f2@linux-foundation.org>

[-- Attachment #1: Type: text/plain, Size: 2582 bytes --]



On 9/24/21 10:43 AM, Andrew Morton wrote:
> On Thu, 23 Sep 2021 01:04:54 +0800 Rongwei Wang <rongwei.wang@linux.alibaba.com> wrote:
> 
>>
>>
>>> On Sep 22, 2021, at 7:37 PM, Matthew Wilcox <willy@infradead.org> wrote:
>>>
>>> On Wed, Sep 22, 2021 at 03:06:44PM +0800, Rongwei Wang wrote:
>>>> Transparent huge page has supported read-only non-shmem files. The file-
>>>> backed THP is collapsed by khugepaged and truncated when written (for
>>>> shared libraries).
>>>>
>>>> However, there is race in two possible places.
>>>>
>>>> 1) multiple writers truncate the same page cache concurrently;
>>>> 2) collapse_file rolls back when writer truncates the page cache;
>>>
>>> As I've said before, the bug here is that somehow there is a writable fd
>>> to a file with THPs.  That's what we need to track down and fix.
>> Hi, Matthew
>> I am not sure get your means. We know “mm, thp: relax the VM_DENYWRITE constraint on file-backed THPs"
>> Introduced file-backed THPs for DSO. It is possible {very rarely} for DSO to be opened in writeable way.
>>
>> ...
>>
>>> https://lore.kernel.org/linux-mm/YUdL3lFLFHzC80Wt@casper.infradead.org/
>> All in all, what you mean is that we should solve this race at the source?
> 
> Matthew is being pretty clear here: we shouldn't be permitting
> userspace to get a writeable fd for a thp-backed file.
> 
> Why are we permitting the DSO to be opened writeably?  If there's a
> legitimate case for doing this then presumably "mm, thp: relax the
There is a use case to stress file-backed THP within attachment.
I test this case in a system which has enabled CONFIG_READ_ONLY_THP_FOR_FS:

$ gcc -Wall -g -o stress_madvise_dso stress_madvise_dso.c
$ ulimit -s unlimited
$ ./stress_madvise_dso 10000 <libtest.so>

the meaning of above parameters:
10000: the max test time;
<libtest.so>: the DSO that will been mapped into file-backed THP by 
madvise. It recommended that the text segment of DSO to be tested is 
greater than 2M.

The crash will been triggered at once in the latest kernel. And this
case also can used to trigger the bug that mentioned in our another patch.

> VM_DENYWRITE constraint on file-backed THPs: should be fixed or
> reverted.
> 
> If there is no legitimate use case for returning a writeable fd for a
> thp-backed file then we should fail such an attempt at open().  This
> approach has back-compatibility issues which need to be thought about.
> Perhaps we should permit the open-writeably attempt to appear to
> succeed, but to really return a read-only fd?
> 

[-- Attachment #2: stress_madvise_dso.c --]
[-- Type: text/plain, Size: 6957 bytes --]

/*
 * case: stress file-backed THP
 */
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif

#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <sched.h>
#include <time.h>
#include <string.h>
#include <fcntl.h>
#include <signal.h> /* for signal */
#include <sys/mman.h>
#include <sys/time.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <errno.h>

#define PATH_MAX	1024
#define BUFF_MAX	1024
#define TIME_DFL	180	/* seconds */

void signal_handler(int signo)
{
	/* Restore env */
	system("echo never > /sys/kernel/mm/transparent_hugepage/enabled");
	system("echo 10000 > /sys/kernel/mm/transparent_hugepage/khugepaged/scan_sleep_millisecs");

	printf("\nrestore env:\n");
	printf("	echo never > /sys/kernel/mm/transparent_hugepage/enabled\n");
	printf("	echo 10000 > /sys/kernel/mm/transparent_hugepage/khugepaged/scan_sleep_millisecs\n");
	exit(-1);
}

/* in KB */
#define text_size	(14UL << 10)

#define PROCMAP_SZ	8
struct procmap {
	uint64_t vm_start;
	uint64_t vm_end;
	uint64_t pgoff;
	uint32_t maj;
	uint32_t min;
	uint32_t ino;
#define PROT_SZ		5
	char     prot[PROT_SZ];
	char     fname[PATH_MAX];
};

unsigned long sleep_secs = 0;

/*
 * Routines of procmap, i.e., /proc/pid/(s)maps
 */
static int get_memory_map(pid_t pid, struct procmap *procmap,
		const char *fname)
{
	char path[PATH_MAX];
	char line[BUFF_MAX];
	FILE *fp = NULL;
	char *end = NULL;
	char *pos, *sp = NULL, *in[PROCMAP_SZ];
	char dlm[] = "-   :   ";
	uint64_t counter;
	int i;

	snprintf(path, PATH_MAX, "/proc/%u/maps", pid);

	fp = fopen(path, "r");
	if (fp == NULL) {
		printf("fopen: %s: %s\n", path, strerror(errno));
		return -1;
	}

	if (procmap == NULL || fname == NULL) {
		perror("fail: procmap or fname is NULL");
		goto failed;
	}

	while (fgets(line, BUFF_MAX, fp)) {
		/* Split line into fields */
		pos = line;
		for (i = 0; i < PROCMAP_SZ; i++) {
			in[i] = strtok_r(pos, &dlm[i], &sp);
			if (in[i] == NULL)
				break;
			pos = NULL;
		}

		/* Check this line is procmap item header */
		if (i != PROCMAP_SZ)
			continue;

		memcpy(procmap->prot, in[2], PROT_SZ);
		memcpy(procmap->fname, in[7], PATH_MAX);

		/* Find the target entry */
		if (strcmp(procmap->prot, "r-xp") ||
				!strstr(procmap->fname, fname))
			continue;

		/* Convert/Copy each field as needed */
		errno = 0;
		procmap->vm_start = strtoull(in[0], &end, 16);
		if ((in[0] == '\0') || (end == NULL) || (*end != '\0') ||
				(errno != 0))
			goto failed;

		procmap->vm_end = strtoull(in[1], &end, 16);
		if ((in[1] == '\0') || (end == NULL) || (*end != '\0') ||
				(errno != 0))
			goto failed;

		procmap->pgoff = strtoull(in[3], &end, 16);
		if ((in[3] == '\0') || (end == NULL) || (*end != '\0') ||
				(errno != 0))
			goto failed;

		procmap->ino = strtoul(in[6], &end, 16);
		if ((in[6] == '\0') || (end == NULL) || (*end != '\0') ||
				(errno != 0))
			goto failed;
	}

	if (fp)
		fclose(fp);
	return 0;

failed:
	if (fp)
		fclose(fp);
	printf("fail: exit\n");

	return -1;
}

#define NR_CPU 32
uint64_t gettimeofday_sec(void);
inline uint64_t gettimeofday_sec(void)
{
	struct timeval tv;

	gettimeofday(&tv, NULL);
	return tv.tv_sec;
}

void thread_read(int cpu, char *args)
{
	int fd;
	char *dso_path = args;
	char buf[0x800000];
	struct procmap maps;
	pid_t pid = getpid();

	cpu_set_t mask;
	CPU_ZERO(&mask);
	CPU_SET(cpu, &mask);
	if (sched_setaffinity(0, sizeof(cpu_set_t), &mask) == -1) {
		printf("warning: can not set CPU affinity\n");
	}

	printf("read %s\n", dso_path);
	fd = open(dso_path, O_RDONLY);
	/* The start addr must be alignment with 2M */
	void *p = mmap((void *)0x40000dc00000UL, 0x800000, PROT_READ | PROT_EXEC,
			MAP_PRIVATE, fd, 0);
	if (p == MAP_FAILED) {
		perror("mmap");
		goto out;
	}

	/* get the mapping address (ONLY r-xp) of the DSO */
	get_memory_map(pid, &maps, dso_path);
	printf("pid: %d\n", pid);
	printf("text vm_start: 0x%lx\n", maps.vm_start);
	printf("text vm_end: 0x%lx\n", maps.vm_end);
	madvise((void *)maps.vm_start, maps.vm_end - maps.vm_start, MADV_HUGEPAGE);
	lseek(fd, 0, SEEK_SET);
	for(;;) {
		memcpy(buf, p, 0x800000 - 1);
		sleep(1);
	}

	sleep(100);

out:
	/* Restore env */
	system("echo never > /sys/kernel/mm/transparent_hugepage/enabled");
	system("echo 10000 > /sys/kernel/mm/transparent_hugepage/khugepaged/scan_sleep_millisecs");

	printf("read exit %s\n", dso_path);
}

void thread_write(int cpu, char *args)
{
	void *p = NULL;
	char buf[32];
	uint64_t sec = 1;
	uint64_t count = 0;
	char *dso_path = args;

	cpu_set_t mask;
	CPU_ZERO(&mask);
	CPU_SET(cpu, &mask);
	if (sched_setaffinity(0, sizeof(cpu_set_t), &mask) == -1) {
		printf("warning: can not set CPU affinity\n");
	}

	sleep(3);
	printf("write %s\n", dso_path);
	for (;;) {
		sec = gettimeofday_sec();
		while ((sec % 10) >= 3) {
			sec = gettimeofday_sec();
		}

		int fd = open(dso_path, O_RDWR);
		p = mmap(NULL, 0x800000, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
		if (p == MAP_FAILED) {
			perror("mmap");
			goto out; /* fail */
		}

		lseek(fd, 0x1600, SEEK_SET);
		for(long i=1; i <= 2; i++){
			memcpy(p + 0x10, buf, 16);
		}

		munmap(p, 0x800000);
		close(fd);

		sleep(2);
		count++;
		if (count >= sleep_secs) {
			printf("test finish: %ld\n", count);
			break;
		}
	} /* end for */

out:
	printf("write exit %s\n", dso_path);
}

/*
 * usage:
 *		stress_madvise_dso <test time> <libtest.so>
 */
int main(int argc, char *argv[])
{
	struct timeval start, end;
	char dso_path[80];
	int ret = 0;
	pid_t pid;

	if (argc > 2) {
		sleep_secs = strtoul(argv[1], NULL, 10);
		realpath(argv[2], dso_path);
	}
	else {
		printf("usage error:\n"\
				"	stress_madvise_dso <test time> <libtest.so>\n"\
				"	e.g. stress_madvise_dso 10000 libtest.so\n");
		exit(-1);
	}

	/* Set env */
	system("ulimit -s unlimited");
	system("echo madvise > /sys/kernel/mm/transparent_hugepage/enabled");
	system("echo 1000 > /sys/kernel/mm/transparent_hugepage/khugepaged/scan_sleep_millisecs");

	gettimeofday(&start, NULL);

	/*
	 * fork 32 task to read and write the same DSO:
	 *		task 0: read dso;
	 *		task 1 - 31: write dso;
	 */
	for (int i = 0; i < NR_CPU; ++i) {
		pid = fork();
		if (pid == 0) {
			if (i == 0)
				thread_read(i, dso_path);
			else
				thread_write(i, dso_path);
			break; /* forbid child fork */
		}
		else {
			/* parent */
		}
	}

	if (pid != 0) {
		signal(SIGINT, signal_handler);
		signal(SIGSEGV, signal_handler);
		signal(SIGABRT, signal_handler);
		/* wait */
		while (1) {
			int status;

			pid_t done = wait(&status);
			if (done == -1) {
				if (errno == ECHILD)
					break; /* No more child processes */
			} else {
				if (!WIFEXITED(status) || WEXITSTATUS(status) != 0) {
					printf("Pid:%d failed\n", done);
					goto out;
				}
			}
		}
	}

out:
	if (ret == 0) {
		gettimeofday(&end, NULL);
		printf("time to collapse file thp: %ld ms\n",
				1000 * (end.tv_sec - start.tv_sec) +
				(end.tv_usec - start.tv_usec) / 1000);
	}

	return ret;
}

  parent reply	other threads:[~2021-09-24  7:12 UTC|newest]

Thread overview: 65+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-09-06 12:11 [PATCH 0/2] mm, thp: fix file-backed THP race in collapse_file Rongwei Wang
2021-09-06 12:11 ` [PATCH 1/2] mm, thp: check page mapping when truncating page cache Rongwei Wang
2021-09-07  2:49   ` Yu Xu
2021-09-07 18:08   ` Yang Shi
2021-09-08  2:35     ` Rongwei Wang
2021-09-08 21:48       ` Yang Shi
2021-09-09  1:25         ` Rongwei Wang
2021-09-13 14:49   ` [mm, thp] 20753096b6: BUG:unable_to_handle_page_fault_for_address kernel test robot
2021-09-06 12:12 ` [PATCH 2/2] mm, thp: bail out early in collapse_file for writeback page Rongwei Wang
2021-09-07 16:56   ` Yang Shi
2021-09-08  2:16     ` Rongwei Wang
2021-09-08 21:51       ` Yang Shi
2021-09-09  1:33         ` Rongwei Wang
2021-09-22  7:06 ` [PATCH v2 0/2] mm, thp: fix file-backed THP race in collapse_file and truncate pagecache Rongwei Wang
2021-09-22  7:06 ` [PATCH v2 1/2] mm, thp: check page mapping when truncating page cache Rongwei Wang
2021-09-22 11:37   ` Matthew Wilcox
2021-09-22 17:04     ` Rongwei Wang
2021-09-24  2:43       ` Andrew Morton
2021-09-24  3:08         ` Yang Shi
2021-09-24  3:35         ` Rongwei Wang
2021-09-24  7:12         ` Rongwei Wang [this message]
2021-09-27 22:24           ` Song Liu
2021-09-28 12:06             ` Matthew Wilcox
2021-09-28 16:59               ` Song Liu
2021-09-28 16:20             ` Rongwei Wang
2021-09-29  7:14               ` Song Liu
2021-09-29  7:50                 ` Rongwei Wang
2021-09-29 16:59                   ` Song Liu
2021-09-29 17:55                     ` Matthew Wilcox
2021-09-29 23:41                       ` Song Liu
2021-09-30  0:00                         ` Matthew Wilcox
2021-09-30  0:41                           ` Song Liu
2021-09-30  2:14                             ` Rongwei Wang
2021-10-04 17:26                             ` Rongwei Wang
2021-10-04 19:05                               ` Matthew Wilcox
2021-10-05  1:58                                 ` Rongwei Wang
2021-10-04 20:26                               ` Song Liu
2021-10-05  2:58                               ` Hugh Dickins
2021-10-05  3:07                                 ` Matthew Wilcox
2021-10-05  9:03                                 ` Rongwei Wang
2021-09-30  1:54                         ` Rongwei Wang
2021-09-30  3:26                           ` Song Liu
2021-09-30  5:24                             ` Hugh Dickins
2021-09-30 15:28                               ` Matthew Wilcox
2021-09-30 16:49                                 ` Hugh Dickins
2021-09-30 17:39                                   ` Yang Shi
2021-10-02 17:08                                     ` Matthew Wilcox
2021-10-04 18:28                                       ` Yang Shi
2021-10-04 19:31                                         ` Matthew Wilcox
2021-10-05  2:26                                           ` Hugh Dickins
2021-10-02  2:22                                   ` Rongwei Wang
2021-09-22  7:06 ` [PATCH v2 2/2] mm, thp: bail out early in collapse_file for writeback page Rongwei Wang
2021-10-06  2:18 ` [PATCH v3 v3 0/2] mm, thp: fix file-backed THP race in collapse_file and truncate pagecache Rongwei Wang
2021-10-06  2:18   ` [PATCH v3 v3 1/2] mm, thp: lock filemap when truncating page cache Rongwei Wang
2021-10-06  2:18   ` [PATCH v3 v3 2/2] mm, thp: bail out early in collapse_file for writeback page Rongwei Wang
2021-10-06  2:41     ` Matthew Wilcox
2021-10-06  8:39       ` Rongwei Wang
2021-10-06 17:58     ` Yang Shi
2021-10-11  2:22 ` [PATCH v4 0/2] mm, thp: fix file-backed THP race in collapse_file and truncate pagecache Rongwei Wang
2021-10-11  2:22   ` [PATCH v4 1/2] mm, thp: lock filemap when truncating page cache Rongwei Wang
2021-10-13  7:55     ` Rongwei Wang
2021-10-11  2:22   ` [PATCH v4 2/2] mm, thp: bail out early in collapse_file for writeback page Rongwei Wang
2021-10-11  3:08     ` Matthew Wilcox
2021-10-11  3:22       ` Rongwei Wang
2021-10-11  5:08     ` [PATCH v4 RESEND " Rongwei Wang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=9e41661d-9919-d556-8c49-610dae157553@linux.alibaba.com \
    --to=rongwei.wang@linux.alibaba.com \
    --cc=akpm@linux-foundation.org \
    --cc=hughd@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=song@kernel.org \
    --cc=william.kucharski@oracle.com \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).