* [PATCH 0/9] Support running DPDK without hugetlbfs mountpoint
@ 2018-06-01 17:15 Anatoly Burakov
  2018-06-01 17:15 ` [PATCH 1/9] fbarray: support no-shconf mode Anatoly Burakov
                   ` (18 more replies)
  0 siblings, 19 replies; 35+ messages in thread
From: Anatoly Burakov @ 2018-06-01 17:15 UTC (permalink / raw)
  To: dev
  Cc: ray.kinsella, kuralamudhan.ramakrishnan, louise.m.daly,
	bruce.richardson, ferruh.yigit, konstantin.ananyev, thomas

This patchset adds a new command-line option "--in-memory",
which takes the old debug options "--huge-unlink" and
"--no-shconf" and extends them with additional
functionality. It allows DPDK to reserve hugepages
anonymously instead of using hugetlbfs mountpoints. Because
this option also implies both "--no-shconf" and
"--huge-unlink", DPDK will be able to run entirely in
memory without creating any shared files while running:
neither hugepage files nor any runtime data.

This will, of course, disable secondary processes, but for
the use cases this option targets (containers, etc.), this
is not a problem.

Older revisions required kernel 4.14+ and a fairly new
glibc. Because this revision uses anonymous mmap() instead
of memfd, the minimum supported kernel version has dropped
to 3.8.

RFC->v1 changes:
- Dropped memfd, using anonymous mmap() instead
- Do not deprecate old command-line parameters, instead
  use them as they are, and add a deprecation notice to
  remove them in the next release.

Anatoly Burakov (9):
  fbarray: support no-shconf mode
  ipc: add support for no-shconf mode
  eal: add support for no-shconf for hugepage info
  eal: add support for no-shconf in hugepage data file
  eal: do not create runtime dir in no-shconf mode
  mem: add support for hugepage-unlink mode
  eal: add --in-memory option
  doc: add deprecation notice for EAL command line options
  mem: support in-memory mode

 doc/guides/rel_notes/deprecation.rst          |   5 +
 lib/librte_eal/bsdapp/eal/eal.c               |   3 +-
 lib/librte_eal/bsdapp/eal/eal_hugepage_info.c |   4 +
 lib/librte_eal/common/eal_common_fbarray.c    |  71 +++++----
 lib/librte_eal/common/eal_common_options.c    |  21 ++-
 lib/librte_eal/common/eal_common_proc.c       |  25 ++++
 lib/librte_eal/common/eal_internal_cfg.h      |   4 +
 lib/librte_eal/common/eal_options.h           |   2 +
 lib/librte_eal/linuxapp/eal/eal.c             |   3 +-
 .../linuxapp/eal/eal_hugepage_info.c          |  95 ++++++++----
 lib/librte_eal/linuxapp/eal/eal_memalloc.c    | 140 ++++++++++++------
 lib/librte_eal/linuxapp/eal/eal_memory.c      |  16 +-
 12 files changed, 272 insertions(+), 117 deletions(-)

-- 
2.17.0

* [PATCH 1/9] fbarray: support no-shconf mode
  2018-06-01 17:15 [PATCH 0/9] Support running DPDK without hugetlbfs mountpoint Anatoly Burakov
@ 2018-06-01 17:15 ` Anatoly Burakov
  2018-06-01 17:15 ` [PATCH 2/9] ipc: add support for " Anatoly Burakov
                   ` (17 subsequent siblings)
  18 siblings, 0 replies; 35+ messages in thread
From: Anatoly Burakov @ 2018-06-01 17:15 UTC (permalink / raw)
  To: dev
  Cc: ray.kinsella, kuralamudhan.ramakrishnan, louise.m.daly,
	bruce.richardson, ferruh.yigit, konstantin.ananyev, thomas

When the --no-shconf option is used, multiprocess support is not
expected, because no shared files are created. However, fbarray still
creates some shared files, which prevent multiple processes with the
same prefix from starting.

Fix this by not creating shared files whenever --no-shconf is
specified. Since the virtual areas returned by eal_get_virtual_area()
are read-only, remap them as writable.
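The remap step above can be illustrated with a minimal standalone sketch. It assumes a hypothetical reserve_readonly_area() standing in for eal_get_virtual_area(); the point it shows is that mmap() with MAP_FIXED | MAP_PRIVATE | MAP_ANONYMOUS replaces the existing read-only reservation in place with writable memory, so the address stays the same and no file is involved.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <sys/mman.h>

/*
 * Hypothetical stand-in for eal_get_virtual_area(): reserve address
 * space that is read-only and not backed by any file.
 */
static void *reserve_readonly_area(size_t len)
{
	void *addr = mmap(NULL, len, PROT_READ,
			MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	return addr == MAP_FAILED ? NULL : addr;
}

/*
 * Replace the reservation in place with writable anonymous memory.
 * MAP_FIXED makes the kernel reuse exactly [addr, addr + len), so
 * pointers to the area stay valid; no shared file is created.
 */
static int remap_writable(void *addr, size_t len)
{
	void *p = mmap(addr, len, PROT_READ | PROT_WRITE,
			MAP_FIXED | MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return -1;
	return p == addr ? 0 : -1;
}
```

Writing to the area before remap_writable() would fault; afterwards it behaves like ordinary private memory, which is all fbarray needs when nothing is shared.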

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---

Notes:
    RFC->v1:
    - Use --no-shconf only

 lib/librte_eal/common/eal_common_fbarray.c | 71 +++++++++++++---------
 1 file changed, 42 insertions(+), 29 deletions(-)

diff --git a/lib/librte_eal/common/eal_common_fbarray.c b/lib/librte_eal/common/eal_common_fbarray.c
index 019f84c18..2c8b2c218 100644
--- a/lib/librte_eal/common/eal_common_fbarray.c
+++ b/lib/librte_eal/common/eal_common_fbarray.c
@@ -434,39 +434,52 @@ rte_fbarray_init(struct rte_fbarray *arr, const char *name, unsigned int len,
 	if (data == NULL)
 		goto fail;
 
-	eal_get_fbarray_path(path, sizeof(path), name);
+	if (internal_config.no_shconf) {
+		/* remap virtual area as writable */
+		void *new_data = mmap(data, mmap_len, PROT_READ | PROT_WRITE,
+				MAP_FIXED | MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
+		if (new_data == MAP_FAILED) {
+			RTE_LOG(DEBUG, EAL, "%s(): couldn't remap anonymous memory: %s\n",
+					__func__, strerror(errno));
+			goto fail;
+		}
+	} else {
+		eal_get_fbarray_path(path, sizeof(path), name);
 
-	/*
-	 * Each fbarray is unique to process namespace, i.e. the filename
-	 * depends on process prefix. Try to take out a lock and see if we
-	 * succeed. If we don't, someone else is using it already.
-	 */
-	fd = open(path, O_CREAT | O_RDWR, 0600);
-	if (fd < 0) {
-		RTE_LOG(DEBUG, EAL, "%s(): couldn't open %s: %s\n", __func__,
-				path, strerror(errno));
-		rte_errno = errno;
-		goto fail;
-	} else if (flock(fd, LOCK_EX | LOCK_NB)) {
-		RTE_LOG(DEBUG, EAL, "%s(): couldn't lock %s: %s\n", __func__,
-				path, strerror(errno));
-		rte_errno = EBUSY;
-		goto fail;
-	}
+		/*
+		 * Each fbarray is unique to process namespace, i.e. the
+		 * filename depends on process prefix. Try to take out a lock
+		 * and see if we succeed. If we don't, someone else is using it
+		 * already.
+		 */
+		fd = open(path, O_CREAT | O_RDWR, 0600);
+		if (fd < 0) {
+			RTE_LOG(DEBUG, EAL, "%s(): couldn't open %s: %s\n",
+					__func__, path, strerror(errno));
+			rte_errno = errno;
+			goto fail;
+		} else if (flock(fd, LOCK_EX | LOCK_NB)) {
+			RTE_LOG(DEBUG, EAL, "%s(): couldn't lock %s: %s\n",
+					__func__, path, strerror(errno));
+			rte_errno = EBUSY;
+			goto fail;
+		}
 
-	/* take out a non-exclusive lock, so that other processes could still
-	 * attach to it, but no other process could reinitialize it.
-	 */
-	if (flock(fd, LOCK_SH | LOCK_NB)) {
-		rte_errno = errno;
-		goto fail;
-	}
+		/* take out a non-exclusive lock, so that other processes could
+		 * still attach to it, but no other process could reinitialize
+		 * it.
+		 */
+		if (flock(fd, LOCK_SH | LOCK_NB)) {
+			rte_errno = errno;
+			goto fail;
+		}
 
-	if (resize_and_map(fd, data, mmap_len))
-		goto fail;
+		if (resize_and_map(fd, data, mmap_len))
+			goto fail;
 
-	/* we've mmap'ed the file, we can now close the fd */
-	close(fd);
+		/* we've mmap'ed the file, we can now close the fd */
+		close(fd);
+	}
 
 	/* initialize the data */
 	memset(data, 0, mmap_len);
-- 
2.17.0

* [PATCH 2/9] ipc: add support for no-shconf mode
  2018-06-01 17:15 [PATCH 0/9] Support running DPDK without hugetlbfs mountpoint Anatoly Burakov
  2018-06-01 17:15 ` [PATCH 1/9] fbarray: support no-shconf mode Anatoly Burakov
@ 2018-06-01 17:15 ` Anatoly Burakov
  2018-06-01 17:15 ` [PATCH 3/9] eal: add support for no-shconf for hugepage info Anatoly Burakov
                   ` (16 subsequent siblings)
  18 siblings, 0 replies; 35+ messages in thread
From: Anatoly Burakov @ 2018-06-01 17:15 UTC (permalink / raw)
  To: dev
  Cc: ray.kinsella, kuralamudhan.ramakrishnan, louise.m.daly,
	bruce.richardson, ferruh.yigit, konstantin.ananyev, thomas

IPC exists so that primary and secondary processes can communicate.
Since secondary processes can never run in no-shconf mode, IPC would be
useless, so do not initialize it in the first place. For API
convenience, registering callbacks is still allowed, but those callbacks
will never be triggered.
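The gating described above can be modelled in a few lines. This is a toy sketch, not DPDK code: struct ipc_model, ipc_init(), ipc_register() and ipc_request() are hypothetical names, and "delivery" is simulated by calling the callback directly in place of a peer process. It shows the intended behaviour: init and send succeed as no-ops in no-shconf mode, registration still works, and the callback never fires.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical callback type; not DPDK's rte_mp_t. */
typedef int (*mp_action_t)(const char *msg);

/* Toy model of the IPC state; all names here are illustrative. */
struct ipc_model {
	bool no_shconf;     /* mirrors internal_config.no_shconf */
	bool channel_ready; /* true once the IPC channel is initialized */
	mp_action_t action; /* registered callback, if any */
};

/* Counts how often the demo callback below fires. */
static int calls;

static int count_calls(const char *msg)
{
	(void)msg;
	calls++;
	return 0;
}

static int ipc_init(struct ipc_model *ipc)
{
	/* no-shconf: report success but never set up a channel */
	if (ipc->no_shconf)
		return 0;
	ipc->channel_ready = true;
	return 0;
}

/* Registration always succeeds, for API convenience. */
static int ipc_register(struct ipc_model *ipc, mp_action_t action)
{
	ipc->action = action;
	return 0;
}

/*
 * Sending is a successful no-op in no-shconf mode. Otherwise the toy
 * model "delivers" the message by invoking the callback directly,
 * standing in for a peer process handling it.
 */
static int ipc_request(struct ipc_model *ipc, const char *msg)
{
	if (ipc->no_shconf || !ipc->channel_ready)
		return 0;
	return ipc->action != NULL ? ipc->action(msg) : 0;
}
```

Returning 0 rather than an error keeps callers that unconditionally use IPC working unchanged, which matches the "API usage convenience" rationale in the commit message.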

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---

Notes:
    RFC->v1:
    - Use --no-shconf only

 lib/librte_eal/common/eal_common_proc.c | 25 +++++++++++++++++++++++++
 1 file changed, 25 insertions(+)

diff --git a/lib/librte_eal/common/eal_common_proc.c b/lib/librte_eal/common/eal_common_proc.c
index 707d8ab30..31b5394cc 100644
--- a/lib/librte_eal/common/eal_common_proc.c
+++ b/lib/librte_eal/common/eal_common_proc.c
@@ -626,6 +626,14 @@ rte_mp_channel_init(void)
 	int dir_fd;
 	pthread_t mp_handle_tid, async_reply_handle_tid;
 
+	/* in no shared files mode, we do not have secondary processes support,
+	 * so no need to initialize IPC.
+	 */
+	if (internal_config.no_shconf) {
+		RTE_LOG(DEBUG, EAL, "No shared files mode enabled, IPC will be disabled\n");
+		return 0;
+	}
+
 	/* create filter path */
 	create_socket_path("*", path, sizeof(path));
 	strlcpy(mp_filter, basename(path), sizeof(mp_filter));
@@ -988,6 +996,12 @@ rte_mp_request_sync(struct rte_mp_msg *req, struct rte_mp_reply *reply,
 
 	if (check_input(req) == false)
 		return -1;
+
+	if (internal_config.no_shconf) {
+		RTE_LOG(DEBUG, EAL, "No shared files mode enabled, IPC is disabled\n");
+		return 0;
+	}
+
 	if (gettimeofday(&now, NULL) < 0) {
 		RTE_LOG(ERR, EAL, "Faile to get current time\n");
 		rte_errno = errno;
@@ -1072,6 +1086,12 @@ rte_mp_request_async(struct rte_mp_msg *req, const struct timespec *ts,
 
 	if (check_input(req) == false)
 		return -1;
+
+	if (internal_config.no_shconf) {
+		RTE_LOG(DEBUG, EAL, "No shared files mode enabled, IPC is disabled\n");
+		return 0;
+	}
+
 	if (gettimeofday(&now, NULL) < 0) {
 		RTE_LOG(ERR, EAL, "Faile to get current time\n");
 		rte_errno = errno;
@@ -1213,5 +1233,10 @@ rte_mp_reply(struct rte_mp_msg *msg, const char *peer)
 		return -1;
 	}
 
+	if (internal_config.no_shconf) {
+		RTE_LOG(DEBUG, EAL, "No shared files mode enabled, IPC is disabled\n");
+		return 0;
+	}
+
 	return mp_send(msg, peer, MP_REP);
 }
-- 
2.17.0

* [PATCH 3/9] eal: add support for no-shconf for hugepage info
  2018-06-01 17:15 [PATCH 0/9] Support running DPDK without hugetlbfs mountpoint Anatoly Burakov
  2018-06-01 17:15 ` [PATCH 1/9] fbarray: support no-shconf mode Anatoly Burakov
  2018-06-01 17:15 ` [PATCH 2/9] ipc: add support for " Anatoly Burakov
@ 2018-06-01 17:15 ` Anatoly Burakov
  2018-06-01 17:15 ` [PATCH 4/9] eal: add support for no-shconf in hugepage data file Anatoly Burakov
                   ` (15 subsequent siblings)
  18 siblings, 0 replies; 35+ messages in thread
From: Anatoly Burakov @ 2018-06-01 17:15 UTC (permalink / raw)
  To: dev
  Cc: Bruce Richardson, ray.kinsella, kuralamudhan.ramakrishnan,
	louise.m.daly, ferruh.yigit, konstantin.ananyev, thomas

Do not create any shared hugepage size info files if we were
asked to not create any shared files.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---

Notes:
    RFC->v1:
    - Use --no-shconf only

 lib/librte_eal/bsdapp/eal/eal_hugepage_info.c   | 4 ++++
 lib/librte_eal/linuxapp/eal/eal_hugepage_info.c | 4 ++++
 2 files changed, 8 insertions(+)

diff --git a/lib/librte_eal/bsdapp/eal/eal_hugepage_info.c b/lib/librte_eal/bsdapp/eal/eal_hugepage_info.c
index 836feb672..1e8f5df23 100644
--- a/lib/librte_eal/bsdapp/eal/eal_hugepage_info.c
+++ b/lib/librte_eal/bsdapp/eal/eal_hugepage_info.c
@@ -101,6 +101,10 @@ eal_hugepage_info_init(void)
 	hpi->num_pages[0] = num_buffers;
 	hpi->lock_descriptor = fd;
 
+	/* for no shared files mode, do not create shared memory config */
+	if (internal_config.no_shconf)
+		return 0;
+
 	tmp_hpi = create_shared_memory(eal_hugepage_info_path(),
 			sizeof(internal_config.hugepage_info));
 	if (tmp_hpi == NULL ) {
diff --git a/lib/librte_eal/linuxapp/eal/eal_hugepage_info.c b/lib/librte_eal/linuxapp/eal/eal_hugepage_info.c
index 7eca711ba..7f8e2fd9c 100644
--- a/lib/librte_eal/linuxapp/eal/eal_hugepage_info.c
+++ b/lib/librte_eal/linuxapp/eal/eal_hugepage_info.c
@@ -446,6 +446,10 @@ eal_hugepage_info_init(void)
 	if (hugepage_info_init() < 0)
 		return -1;
 
+	/* for no shared files mode, we're done */
+	if (internal_config.no_shconf)
+		return 0;
+
 	hpi = &internal_config.hugepage_info[0];
 
 	tmp_hpi = create_shared_memory(eal_hugepage_info_path(),
-- 
2.17.0

* [PATCH 4/9] eal: add support for no-shconf in hugepage data file
  2018-06-01 17:15 [PATCH 0/9] Support running DPDK without hugetlbfs mountpoint Anatoly Burakov
                   ` (2 preceding siblings ...)
  2018-06-01 17:15 ` [PATCH 3/9] eal: add support for no-shconf for hugepage info Anatoly Burakov
@ 2018-06-01 17:15 ` Anatoly Burakov
  2018-06-01 17:15 ` [PATCH 5/9] eal: do not create runtime dir in no-shconf mode Anatoly Burakov
                   ` (14 subsequent siblings)
  18 siblings, 0 replies; 35+ messages in thread
From: Anatoly Burakov @ 2018-06-01 17:15 UTC (permalink / raw)
  To: dev
  Cc: ray.kinsella, kuralamudhan.ramakrishnan, louise.m.daly,
	bruce.richardson, ferruh.yigit, konstantin.ananyev, thomas

Do not create a shared hugepage data file if we were asked to
not create any shared files.
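A minimal sketch of the two paths in create_shared_memory(), written as a standalone function. The no_shconf parameter is a hypothetical stand-in for internal_config.no_shconf, and the function name create_memory() is illustrative. In no-shconf mode the memory comes from a private anonymous mapping, so nothing is ever created on the filesystem; otherwise a file is created, sized and mapped shared.

```c
#define _DEFAULT_SOURCE
#include <fcntl.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Sketch of create_shared_memory() with the no-shconf fallback.
 * The no_shconf parameter is a hypothetical stand-in for
 * internal_config.no_shconf.
 */
static void *create_memory(const char *filename, size_t mem_size,
		bool no_shconf)
{
	void *retval;
	int fd;

	if (no_shconf) {
		/* private anonymous memory: no file is ever created */
		retval = mmap(NULL, mem_size, PROT_READ | PROT_WRITE,
				MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
		return retval == MAP_FAILED ? NULL : retval;
	}

	/* file-backed shared memory that other processes could map */
	fd = open(filename, O_CREAT | O_RDWR, 0666);
	if (fd < 0)
		return NULL;
	if (ftruncate(fd, (off_t)mem_size) < 0) {
		close(fd);
		return NULL;
	}
	retval = mmap(NULL, mem_size, PROT_READ | PROT_WRITE,
			MAP_SHARED, fd, 0);
	close(fd); /* an established mapping does not need the fd */
	return retval == MAP_FAILED ? NULL : retval;
}
```

Note that the filename is simply ignored in the anonymous path, so callers such as the hugepage data code do not need to change.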

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---

Notes:
    RFC->v1:
    - Use --no-shconf only

 lib/librte_eal/linuxapp/eal/eal_memory.c | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal_memory.c b/lib/librte_eal/linuxapp/eal/eal_memory.c
index c917de1c2..cb784e1c3 100644
--- a/lib/librte_eal/linuxapp/eal/eal_memory.c
+++ b/lib/librte_eal/linuxapp/eal/eal_memory.c
@@ -521,7 +521,18 @@ static void *
 create_shared_memory(const char *filename, const size_t mem_size)
 {
 	void *retval;
-	int fd = open(filename, O_CREAT | O_RDWR, 0666);
+	int fd;
+
+	/* if no shared files mode is used, create anonymous memory instead */
+	if (internal_config.no_shconf) {
+		retval = mmap(NULL, mem_size, PROT_READ | PROT_WRITE,
+				MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
+		if (retval == MAP_FAILED)
+			return NULL;
+		return retval;
+	}
+
+	fd = open(filename, O_CREAT | O_RDWR, 0666);
 	if (fd < 0)
 		return NULL;
 	if (ftruncate(fd, mem_size) < 0) {
-- 
2.17.0

* [PATCH 5/9] eal: do not create runtime dir in no-shconf mode
  2018-06-01 17:15 [PATCH 0/9] Support running DPDK without hugetlbfs mountpoint Anatoly Burakov
                   ` (3 preceding siblings ...)
  2018-06-01 17:15 ` [PATCH 4/9] eal: add support for no-shconf in hugepage data file Anatoly Burakov
@ 2018-06-01 17:15 ` Anatoly Burakov
  2018-06-01 17:15 ` [PATCH 6/9] mem: add support for hugepage-unlink mode Anatoly Burakov
                   ` (13 subsequent siblings)
  18 siblings, 0 replies; 35+ messages in thread
From: Anatoly Burakov @ 2018-06-01 17:15 UTC (permalink / raw)
  To: dev
  Cc: Bruce Richardson, ray.kinsella, kuralamudhan.ramakrishnan,
	louise.m.daly, ferruh.yigit, konstantin.ananyev, thomas

Now that the rest of the EAL has been adjusted to not create any shared
files, prevent the runtime directory from being created at all.
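The change relies on short-circuit evaluation of `&&`: when no_shconf is set, eal_create_runtime_dir() is never called, so no directory is created and no error can be reported. A small sketch, where create_runtime_dir_stub() and maybe_create_runtime_dir() are hypothetical names used only to count calls:

```c
#include <stdbool.h>

/* Hypothetical counter standing in for the side effect of creating
 * the runtime directory. */
static int runtime_dir_calls;

static int create_runtime_dir_stub(void)
{
	runtime_dir_calls++; /* the real function would mkdir() here */
	return 0;
}

/* Same shape as the check in rte_eal_init(); returns -1 on failure. */
static int maybe_create_runtime_dir(bool no_shconf)
{
	/* && short-circuits: the stub is skipped when no_shconf is set */
	if (!no_shconf && create_runtime_dir_stub() < 0)
		return -1;
	return 0;
}
```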

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---

Notes:
    RFC->v1:
    - Use --no-shconf only

 lib/librte_eal/bsdapp/eal/eal.c   | 3 ++-
 lib/librte_eal/linuxapp/eal/eal.c | 3 ++-
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/lib/librte_eal/bsdapp/eal/eal.c b/lib/librte_eal/bsdapp/eal/eal.c
index dc279542d..13b6f8ae1 100644
--- a/lib/librte_eal/bsdapp/eal/eal.c
+++ b/lib/librte_eal/bsdapp/eal/eal.c
@@ -601,7 +601,8 @@ rte_eal_init(int argc, char **argv)
 	}
 
 	/* create runtime data directory */
-	if (eal_create_runtime_dir() < 0) {
+	if (internal_config.no_shconf == 0 &&
+			eal_create_runtime_dir() < 0) {
 		rte_eal_init_alert("Cannot create runtime directory\n");
 		rte_errno = EACCES;
 		return -1;
diff --git a/lib/librte_eal/linuxapp/eal/eal.c b/lib/librte_eal/linuxapp/eal/eal.c
index 8655b8691..a8d291520 100644
--- a/lib/librte_eal/linuxapp/eal/eal.c
+++ b/lib/librte_eal/linuxapp/eal/eal.c
@@ -818,7 +818,8 @@ rte_eal_init(int argc, char **argv)
 	}
 
 	/* create runtime data directory */
-	if (eal_create_runtime_dir() < 0) {
+	if (internal_config.no_shconf == 0 &&
+			eal_create_runtime_dir() < 0) {
 		rte_eal_init_alert("Cannot create runtime directory\n");
 		rte_errno = EACCES;
 		return -1;
-- 
2.17.0

* [PATCH 6/9] mem: add support for hugepage-unlink mode
  2018-06-01 17:15 [PATCH 0/9] Support running DPDK without hugetlbfs mountpoint Anatoly Burakov
                   ` (4 preceding siblings ...)
  2018-06-01 17:15 ` [PATCH 5/9] eal: do not create runtime dir in no-shconf mode Anatoly Burakov
@ 2018-06-01 17:15 ` Anatoly Burakov
  2018-06-01 17:15 ` [PATCH 7/9] eal: add --in-memory option Anatoly Burakov
                   ` (12 subsequent siblings)
  18 siblings, 0 replies; 35+ messages in thread
From: Anatoly Burakov @ 2018-06-01 17:15 UTC (permalink / raw)
  To: dev
  Cc: ray.kinsella, kuralamudhan.ramakrishnan, louise.m.daly,
	bruce.richardson, ferruh.yigit, konstantin.ananyev, thomas

Unlink hugepage files right after creating them, to honor the
--huge-unlink mode. We cannot resize files that no longer exist, so
make single-file segments mode explicitly unsupported.
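The create, map, unlink sequence can be sketched with an ordinary file standing in for a hugetlbfs file (an assumption made so the example runs anywhere). map_then_unlink() is a hypothetical helper: once the file is mapped, removing its name does not affect the mapping, and the kernel frees the pages when the mapping goes away. That is why free_seg() has nothing left to clean up beyond the unmap.

```c
#define _DEFAULT_SOURCE
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Create a file, size it, map it, then unlink it right away. The
 * mapping keeps the memory alive after the name is gone, and the kernel
 * releases the pages once the last mapping is removed, so freeing needs
 * only munmap(). A regular file stands in for a hugetlbfs file here.
 */
static void *map_then_unlink(const char *path, size_t len)
{
	void *va;
	int fd = open(path, O_CREAT | O_EXCL | O_RDWR, 0600);

	if (fd < 0)
		return NULL;
	if (ftruncate(fd, (off_t)len) < 0)
		goto fail_unlink;
	va = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (va == MAP_FAILED)
		goto fail_unlink;
	if (unlink(path) < 0) {
		munmap(va, len);
		close(fd);
		return NULL;
	}
	close(fd); /* neither the name nor the fd is needed any more */
	return va;

fail_unlink:
	close(fd);
	unlink(path);
	return NULL;
}
```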

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---

Notes:
    RFC->v1:
    - Use --huge-unlink only

 lib/librte_eal/linuxapp/eal/eal_memalloc.c | 16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal_memalloc.c b/lib/librte_eal/linuxapp/eal/eal_memalloc.c
index 8c11f98c9..f1b6d9744 100644
--- a/lib/librte_eal/linuxapp/eal/eal_memalloc.c
+++ b/lib/librte_eal/linuxapp/eal/eal_memalloc.c
@@ -512,6 +512,13 @@ alloc_seg(struct rte_memseg *ms, void *addr, int socket_id,
 				__func__, strerror(errno));
 			goto resized;
 		}
+		if (internal_config.hugepage_unlink) {
+			if (unlink(path)) {
+				RTE_LOG(DEBUG, EAL, "%s(): unlink() failed: %s\n",
+					__func__, strerror(errno));
+				goto resized;
+			}
+		}
 	}
 
 	/*
@@ -592,7 +599,8 @@ alloc_seg(struct rte_memseg *ms, void *addr, int socket_id,
 		/* ignore failure, can't make it any worse */
 	} else {
 		/* only remove file if we can take out a write lock */
-		if (lock(fd, LOCK_EX) == 1)
+		if (internal_config.hugepage_unlink == 0 &&
+				lock(fd, LOCK_EX) == 1)
 			unlink(path);
 		close(fd);
 	}
@@ -617,6 +625,12 @@ free_seg(struct rte_memseg *ms, struct hugepage_info *hi,
 		return -1;
 	}
 
+	/* if we've already unlinked the page, nothing needs to be done */
+	if (internal_config.hugepage_unlink) {
+		memset(ms, 0, sizeof(*ms));
+		return 0;
+	}
+
 	/* if we are not in single file segments mode, we're going to unmap the
 	 * segment and thus drop the lock on original fd, but hugepage dir is
 	 * now locked so we can take out another one without races.
-- 
2.17.0

* [PATCH 7/9] eal: add --in-memory option
  2018-06-01 17:15 [PATCH 0/9] Support running DPDK without hugetlbfs mountpoint Anatoly Burakov
                   ` (5 preceding siblings ...)
  2018-06-01 17:15 ` [PATCH 6/9] mem: add support for hugepage-unlink mode Anatoly Burakov
@ 2018-06-01 17:15 ` Anatoly Burakov
  2018-06-01 17:15 ` [PATCH 8/9] doc: add deprecation notice for EAL command line options Anatoly Burakov
                   ` (11 subsequent siblings)
  18 siblings, 0 replies; 35+ messages in thread
From: Anatoly Burakov @ 2018-06-01 17:15 UTC (permalink / raw)
  To: dev
  Cc: ray.kinsella, kuralamudhan.ramakrishnan, louise.m.daly,
	bruce.richardson, ferruh.yigit, konstantin.ananyev, thomas

This command-line option causes DPDK to operate entirely in memory
and not create any shared files at runtime, including shared
configuration and hugetlbfs files. This is useful for debugging, as
well as for certain use cases such as containers or automatic memory
cleanup.

Currently, this option is a strict superset of the --no-shconf and
--huge-unlink options.
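The option semantics can be summarized in a small model. struct cfg, the ARG_* values, parse_opt() and check_opts() are hypothetical names that mirror, in simplified form, the parsing and validation added to eal_common_options.c below: --in-memory sets both older flags, the --huge-unlink/--no-huge conflict is waived when the flag was implied by --in-memory, and single-file segments are rejected whenever hugepage unlinking is active.

```c
#include <stdbool.h>

/* Hypothetical subset of struct internal_config. */
struct cfg {
	bool in_memory, no_shconf, hugepage_unlink;
	bool no_hugetlbfs, single_file_segments;
};

/* Hypothetical option identifiers. */
enum arg {
	ARG_NO_SHCONF, ARG_HUGE_UNLINK, ARG_IN_MEMORY,
	ARG_NO_HUGE, ARG_SINGLE_FILE_SEGMENTS
};

static void parse_opt(struct cfg *c, enum arg a)
{
	switch (a) {
	case ARG_NO_SHCONF:
		c->no_shconf = true;
		break;
	case ARG_HUGE_UNLINK:
		c->hugepage_unlink = true;
		break;
	case ARG_IN_MEMORY:
		/* superset: implies both older options */
		c->in_memory = c->no_shconf = c->hugepage_unlink = true;
		break;
	case ARG_NO_HUGE:
		c->no_hugetlbfs = true;
		break;
	case ARG_SINGLE_FILE_SEGMENTS:
		c->single_file_segments = true;
		break;
	}
}

/* Returns -1 on conflicting options, 0 otherwise. */
static int check_opts(const struct cfg *c)
{
	/* explicit --huge-unlink with --no-huge is rejected */
	if (c->no_hugetlbfs && c->hugepage_unlink && !c->in_memory)
		return -1;
	/* unlinked files cannot be resized */
	if (c->single_file_segments && c->hugepage_unlink)
		return -1;
	return 0;
}
```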

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---

Notes:
    RFC->v1:
    - Do not deprecate old options; instead, co-opt them

 lib/librte_eal/common/eal_common_options.c | 21 +++++++++++++++++++--
 lib/librte_eal/common/eal_internal_cfg.h   |  4 ++++
 lib/librte_eal/common/eal_options.h        |  2 ++
 3 files changed, 25 insertions(+), 2 deletions(-)

diff --git a/lib/librte_eal/common/eal_common_options.c b/lib/librte_eal/common/eal_common_options.c
index ecebb2923..b175b1446 100644
--- a/lib/librte_eal/common/eal_common_options.c
+++ b/lib/librte_eal/common/eal_common_options.c
@@ -66,6 +66,7 @@ eal_long_options[] = {
 	{OPT_NO_HUGE,           0, NULL, OPT_NO_HUGE_NUM          },
 	{OPT_NO_PCI,            0, NULL, OPT_NO_PCI_NUM           },
 	{OPT_NO_SHCONF,         0, NULL, OPT_NO_SHCONF_NUM        },
+	{OPT_IN_MEMORY,         0, NULL, OPT_IN_MEMORY_NUM        },
 	{OPT_PCI_BLACKLIST,     1, NULL, OPT_PCI_BLACKLIST_NUM    },
 	{OPT_PCI_WHITELIST,     1, NULL, OPT_PCI_WHITELIST_NUM    },
 	{OPT_PROC_TYPE,         1, NULL, OPT_PROC_TYPE_NUM        },
@@ -1165,6 +1166,13 @@ eal_parse_common_option(int opt, const char *optarg,
 		conf->no_shconf = 1;
 		break;
 
+	case OPT_IN_MEMORY_NUM:
+		conf->in_memory = 1;
+		/* in-memory is a superset of noshconf and huge-unlink */
+		conf->no_shconf = 1;
+		conf->hugepage_unlink = 1;
+		break;
+
 	case OPT_PROC_TYPE_NUM:
 		conf->process_type = eal_parse_proc_type(optarg);
 		break;
@@ -1316,12 +1324,19 @@ eal_check_common_options(struct internal_config *internal_cfg)
 			"be specified together with --"OPT_NO_HUGE"\n");
 		return -1;
 	}
-
-	if (internal_cfg->no_hugetlbfs && internal_cfg->hugepage_unlink) {
+	if (internal_cfg->no_hugetlbfs && internal_cfg->hugepage_unlink &&
+			!internal_cfg->in_memory) {
 		RTE_LOG(ERR, EAL, "Option --"OPT_HUGE_UNLINK" cannot "
 			"be specified together with --"OPT_NO_HUGE"\n");
 		return -1;
 	}
+	if (internal_cfg->single_file_segments &&
+			internal_cfg->hugepage_unlink) {
+		RTE_LOG(ERR, EAL, "Option --"OPT_SINGLE_FILE_SEGMENTS" is "
+			"not compatible with either --"OPT_IN_MEMORY" or "
+			"--"OPT_HUGE_UNLINK"\n");
+		return -1;
+	}
 
 	return 0;
 }
@@ -1370,6 +1385,8 @@ eal_common_usage(void)
 	       "                      Set specific log level\n"
 	       "  -v                  Display version information on startup\n"
 	       "  -h, --help          This help\n"
+	       "  --"OPT_IN_MEMORY"   Operate entirely in memory. This will \n"
+	       "                      disable secondary process support\n"
 	       "\nEAL options for DEBUG use only:\n"
 	       "  --"OPT_HUGE_UNLINK"       Unlink hugepage files after init\n"
 	       "  --"OPT_NO_HUGE"           Use malloc instead of hugetlbfs\n"
diff --git a/lib/librte_eal/common/eal_internal_cfg.h b/lib/librte_eal/common/eal_internal_cfg.h
index c4cbf3acd..f90d94206 100644
--- a/lib/librte_eal/common/eal_internal_cfg.h
+++ b/lib/librte_eal/common/eal_internal_cfg.h
@@ -41,6 +41,10 @@ struct internal_config {
 	volatile unsigned vmware_tsc_map; /**< true to use VMware TSC mapping
 										* instead of native TSC */
 	volatile unsigned no_shconf;      /**< true if there is no shared config */
+	volatile unsigned in_memory;
+	/**< true if DPDK should operate entirely in-memory and not create any
+	 * shared files or runtime data.
+	 */
 	volatile unsigned create_uio_dev; /**< true to create /dev/uioX devices */
 	volatile enum rte_proc_type_t process_type; /**< multi-process proc type */
 	/** true to try allocating memory on specific sockets */
diff --git a/lib/librte_eal/common/eal_options.h b/lib/librte_eal/common/eal_options.h
index 211ae06ae..dcde4054e 100644
--- a/lib/librte_eal/common/eal_options.h
+++ b/lib/librte_eal/common/eal_options.h
@@ -45,6 +45,8 @@ enum {
 	OPT_NO_PCI_NUM,
 #define OPT_NO_SHCONF         "no-shconf"
 	OPT_NO_SHCONF_NUM,
+#define OPT_IN_MEMORY         "in-memory"
+	OPT_IN_MEMORY_NUM,
 #define OPT_SOCKET_MEM        "socket-mem"
 	OPT_SOCKET_MEM_NUM,
 #define OPT_SYSLOG            "syslog"
-- 
2.17.0

* [PATCH 8/9] doc: add deprecation notice for EAL command line options
  2018-06-01 17:15 [PATCH 0/9] Support running DPDK without hugetlbfs mountpoint Anatoly Burakov
                   ` (6 preceding siblings ...)
  2018-06-01 17:15 ` [PATCH 7/9] eal: add --in-memory option Anatoly Burakov
@ 2018-06-01 17:15 ` Anatoly Burakov
  2018-06-01 17:15 ` [PATCH 9/9] mem: support in-memory mode Anatoly Burakov
                   ` (10 subsequent siblings)
  18 siblings, 0 replies; 35+ messages in thread
From: Anatoly Burakov @ 2018-06-01 17:15 UTC (permalink / raw)
  To: dev
  Cc: Neil Horman, John McNamara, Marko Kovacevic, ray.kinsella,
	kuralamudhan.ramakrishnan, louise.m.daly, bruce.richardson,
	ferruh.yigit, konstantin.ananyev, thomas

Options --no-shconf and --huge-unlink will be removed and
replaced with the --in-memory option, which will be a superset
of the two and an officially supported method of running DPDK
entirely in memory.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---

Notes:
    RFC->v1:
    - Add this patch

 doc/guides/rel_notes/deprecation.rst | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
index 1ce692eac..c8344f42f 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -8,6 +8,11 @@ API and ABI deprecation notices are to be posted here.
 Deprecation Notices
 -------------------
 
+* eal: command-line options ``--no-shconf`` and ``--huge-unlink`` will be
+  removed, and replaced with a single option ``--in-memory``, which will
+  enable DPDK to operate entirely in memory, without creating any files on any
+  filesystems.
+
 * eal: DPDK runtime configuration file (located at
   ``/var/run/.<prefix>_config``) will be moved. The new path will be as follows:
 
-- 
2.17.0

* [PATCH 9/9] mem: support in-memory mode
  2018-06-01 17:15 [PATCH 0/9] Support running DPDK without hugetlbfs mountpoint Anatoly Burakov
                   ` (7 preceding siblings ...)
  2018-06-01 17:15 ` [PATCH 8/9] doc: add deprecation notice for EAL command line options Anatoly Burakov
@ 2018-06-01 17:15 ` Anatoly Burakov
  2018-07-13 10:27 ` [PATCH v2 0/9] Support running DPDK without hugetlbfs mountpoint Anatoly Burakov
                   ` (9 subsequent siblings)
  18 siblings, 0 replies; 35+ messages in thread
From: Anatoly Burakov @ 2018-06-01 17:15 UTC (permalink / raw)
  To: dev
  Cc: ray.kinsella, kuralamudhan.ramakrishnan, louise.m.daly,
	bruce.richardson, ferruh.yigit, konstantin.ananyev, thomas

Implement the final piece of the in-memory mode puzzle - enable running
DPDK entirely in memory, without creating any files.

To do this, use mmap() with MAP_HUGETLB and page-size flags, so that DPDK
can work without hugetlbfs mountpoints. Enabling this required a few
changes.

First of all, we need to allow empty hugetlbfs mountpoints in
hugepage_info, and handle them correctly (by not trying to create any
files and lock any directories).

Next, we need to reorder the mapping sequence: a page is not actually
allocated until it is first faulted in, and we cannot get its IOVA
before that page fault has been triggered.
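The "not allocated until the page fault" behaviour can be observed with a small sketch. It uses a regular anonymous page instead of a hugepage, and mincore() as a proxy for "is this page backed by physical memory yet". fault_in_page() is a hypothetical helper. The real code then looks up the IOVA of the page (for physical addressing, via /proc/self/pagemap), which only makes sense after the first write has faulted the page in.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Map one anonymous page and report whether it is resident before and
 * after the first write. Regular pages stand in for hugepages here.
 * Returns 0 on success, -1 on error.
 */
static int fault_in_page(int *before, int *after)
{
	size_t pgsz = (size_t)sysconf(_SC_PAGESIZE);
	unsigned char vec = 0;
	void *addr;
	volatile char *va;

	addr = mmap(NULL, pgsz, PROT_READ | PROT_WRITE,
			MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (addr == MAP_FAILED)
		return -1;
	va = addr;

	/* mmap() only reserved the range; normally nothing backs it yet */
	if (mincore(addr, pgsz, &vec) < 0)
		goto fail;
	*before = vec & 1;

	/* the first write triggers the fault that allocates the page; only
	 * from here on does it have a physical address (IOVA) to look up */
	va[0] = 1;

	if (mincore(addr, pgsz, &vec) < 0)
		goto fail;
	*after = vec & 1;

	munmap(addr, pgsz);
	return 0;
fail:
	munmap(addr, pgsz);
	return -1;
}
```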

Finally, decide at compile time whether anonymous hugepages are
supported, because this cannot be checked at runtime.
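A minimal sketch of the compile-time decision and of how the page size is encoded into the mmap() flags, assuming Linux headers. As in the patch, the presence of MAP_HUGE_SHIFT decides support and 26 is used as the fallback shift; the names SKETCH_MAP_HUGE_SHIFT, huge_size_flags() and alloc_anon_hugepage() are hypothetical. The size field holds log2 of the page size, so 2 MB pages are encoded as 21 << MAP_HUGE_SHIFT.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#ifdef __linux__
#include <linux/mman.h> /* MAP_HUGE_SHIFT on newer kernel headers */
#endif

/* Decided when compiling: there is no reliable runtime check. */
#ifdef MAP_HUGE_SHIFT
#define SKETCH_MAP_HUGE_SHIFT MAP_HUGE_SHIFT
static const int anon_huge_supported = 1;
#else
#define SKETCH_MAP_HUGE_SHIFT 26 /* same fallback value the patch uses */
static const int anon_huge_supported = 0;
#endif

/* Encode log2(page size) into the bits starting at the huge shift. */
static int huge_size_flags(uint64_t page_sz)
{
	int order = 0;

	while (page_sz > 1) { /* hugepage sizes are powers of two */
		page_sz >>= 1;
		order++;
	}
	return order << SKETCH_MAP_HUGE_SHIFT;
}

/* Try to get one anonymous hugepage of the given size, no hugetlbfs. */
static void *alloc_anon_hugepage(uint64_t page_sz)
{
#ifdef MAP_HUGETLB
	void *va;

	if (!anon_huge_supported)
		return NULL;
	va = mmap(NULL, (size_t)page_sz, PROT_READ | PROT_WRITE,
			MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB |
			huge_size_flags(page_sz), -1, 0);
	return va == MAP_FAILED ? NULL : va;
#else
	(void)page_sz;
	return NULL;
#endif
}
```

alloc_anon_hugepage() returns NULL when no hugepages of that size are reserved. Because the check happens at compile time, a binary built against new headers but run on an older kernel may not get the page size it asked for; this is the caveat raised in the notes below.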

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---

Notes:
    RFC->v1:
    - Drop memfd and instead use mmap() with MAP_HUGETLB. This lowers the
      kernel requirement to 3.8 and, as far as I know, does not impose any
      restrictions on glibc.
    
      Unfortunately, there is an issue with this approach: mmap() silently
      ignores unsupported flags. This means that if the binary were compiled
      against 3.8+ kernel headers but run on a pre-3.8 kernel (such as the
      currently supported minimum of 3.2), the memory would most likely be
      allocated using regular pages, causing severe performance degradation.
      I currently know of no solution to this problem.

 .../linuxapp/eal/eal_hugepage_info.c          |  91 +++++++-----
 lib/librte_eal/linuxapp/eal/eal_memalloc.c    | 130 +++++++++++-------
 lib/librte_eal/linuxapp/eal/eal_memory.c      |   3 +-
 3 files changed, 139 insertions(+), 85 deletions(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal_hugepage_info.c b/lib/librte_eal/linuxapp/eal/eal_hugepage_info.c
index 7f8e2fd9c..3a7d4b222 100644
--- a/lib/librte_eal/linuxapp/eal/eal_hugepage_info.c
+++ b/lib/librte_eal/linuxapp/eal/eal_hugepage_info.c
@@ -18,6 +18,8 @@
 #include <sys/queue.h>
 #include <sys/stat.h>
 
+#include <linux/mman.h> /* for hugetlb-related flags */
+
 #include <rte_memory.h>
 #include <rte_eal.h>
 #include <rte_launch.h>
@@ -313,11 +315,49 @@ compare_hpi(const void *a, const void *b)
 	return hpi_b->hugepage_sz - hpi_a->hugepage_sz;
 }
 
+static void
+calc_num_pages(struct hugepage_info *hpi, struct dirent *dirent)
+{
+	uint64_t total_pages = 0;
+	unsigned int i;
+
+	/*
+	 * first, try to put all hugepages into relevant sockets, but
+	 * if first attempts fails, fall back to collecting all pages
+	 * in one socket and sorting them later
+	 */
+	total_pages = 0;
+	/* we also don't want to do this for legacy init */
+	if (!internal_config.legacy_mem)
+		for (i = 0; i < rte_socket_count(); i++) {
+			int socket = rte_socket_id_by_idx(i);
+			unsigned int num_pages =
+					get_num_hugepages_on_node(
+						dirent->d_name, socket);
+			hpi->num_pages[socket] = num_pages;
+			total_pages += num_pages;
+		}
+	/*
+	 * we failed to sort memory from the get go, so fall
+	 * back to old way
+	 */
+	if (total_pages == 0) {
+		hpi->num_pages[0] = get_num_hugepages(dirent->d_name);
+
+#ifndef RTE_ARCH_64
+		/* for 32-bit systems, limit number of hugepages to
+		 * 1GB per page size */
+		hpi->num_pages[0] = RTE_MIN(hpi->num_pages[0],
+				RTE_PGSIZE_1G / hpi->hugepage_sz);
+#endif
+	}
+}
+
 static int
 hugepage_info_init(void)
 {	const char dirent_start_text[] = "hugepages-";
 	const size_t dirent_start_len = sizeof(dirent_start_text) - 1;
-	unsigned int i, total_pages, num_sizes = 0;
+	unsigned int i, num_sizes = 0;
 	DIR *dir;
 	struct dirent *dirent;
 
@@ -355,6 +395,22 @@ hugepage_info_init(void)
 					"%" PRIu64 " reserved, but no mounted "
 					"hugetlbfs found for that size\n",
 					num_pages, hpi->hugepage_sz);
+			/* if we have kernel support for reserving hugepages
+			 * through mmap, and we're in in-memory mode, treat this
+			 * page size as valid. we cannot be in legacy mode at
+			 * this point because we've checked this earlier in the
+			 * init process.
+			 */
+#ifdef MAP_HUGE_SHIFT
+			if (internal_config.in_memory) {
+				RTE_LOG(DEBUG, EAL, "In-memory mode enabled, "
+					"hugepages of size %" PRIu64 " bytes "
+					"will be allocated anonymously\n",
+					hpi->hugepage_sz);
+				calc_num_pages(hpi, dirent);
+				num_sizes++;
+			}
+#endif
 			continue;
 		}
 
@@ -371,35 +427,7 @@ hugepage_info_init(void)
 		if (clear_hugedir(hpi->hugedir) == -1)
 			break;
 
-		/*
-		 * first, try to put all hugepages into relevant sockets, but
-		 * if first attempts fails, fall back to collecting all pages
-		 * in one socket and sorting them later
-		 */
-		total_pages = 0;
-		/* we also don't want to do this for legacy init */
-		if (!internal_config.legacy_mem)
-			for (i = 0; i < rte_socket_count(); i++) {
-				int socket = rte_socket_id_by_idx(i);
-				unsigned int num_pages =
-						get_num_hugepages_on_node(
-							dirent->d_name, socket);
-				hpi->num_pages[socket] = num_pages;
-				total_pages += num_pages;
-			}
-		/*
-		 * we failed to sort memory from the get go, so fall
-		 * back to old way
-		 */
-		if (total_pages == 0)
-			hpi->num_pages[0] = get_num_hugepages(dirent->d_name);
-
-#ifndef RTE_ARCH_64
-		/* for 32-bit systems, limit number of hugepages to
-		 * 1GB per page size */
-		hpi->num_pages[0] = RTE_MIN(hpi->num_pages[0],
-					    RTE_PGSIZE_1G / hpi->hugepage_sz);
-#endif
+		calc_num_pages(hpi, dirent);
 
 		num_sizes++;
 	}
@@ -423,8 +451,7 @@ hugepage_info_init(void)
 
 		for (j = 0; j < RTE_MAX_NUMA_NODES; j++)
 			num_pages += hpi->num_pages[j];
-		if (strnlen(hpi->hugedir, sizeof(hpi->hugedir)) != 0 &&
-				num_pages > 0)
+		if (num_pages > 0)
 			return 0;
 	}
 
diff --git a/lib/librte_eal/linuxapp/eal/eal_memalloc.c b/lib/librte_eal/linuxapp/eal/eal_memalloc.c
index f1b6d9744..19c53e7af 100644
--- a/lib/librte_eal/linuxapp/eal/eal_memalloc.c
+++ b/lib/librte_eal/linuxapp/eal/eal_memalloc.c
@@ -28,6 +28,7 @@
 #include <numaif.h>
 #endif
 #include <linux/falloc.h>
+#include <linux/mman.h> /* for hugetlb-related mmap flags */
 
 #include <rte_common.h>
 #include <rte_log.h>
@@ -40,6 +41,15 @@
 #include "eal_internal_cfg.h"
 #include "eal_memalloc.h"
 
+const int anonymous_hugepages_supported =
+#ifdef MAP_HUGE_SHIFT
+		1;
+#define RTE_MAP_HUGE_SHIFT MAP_HUGE_SHIFT
+#else
+		0;
+#define RTE_MAP_HUGE_SHIFT 26
+#endif
+
 /*
  * not all kernel version support fallocate on hugetlbfs, so fall back to
  * ftruncate and disallow deallocation if fallocate is not supported.
@@ -486,47 +496,63 @@ alloc_seg(struct rte_memseg *ms, void *addr, int socket_id,
 	int cur_socket_id = 0;
 #endif
 	uint64_t map_offset;
+	rte_iova_t iova;
+	void *va;
 	char path[PATH_MAX];
 	int ret = 0;
 	int fd;
 	size_t alloc_sz;
 
-	/* takes out a read lock on segment or segment list */
-	fd = get_seg_fd(path, sizeof(path), hi, list_idx, seg_idx);
-	if (fd < 0) {
-		RTE_LOG(ERR, EAL, "Couldn't get fd on hugepage file\n");
-		return -1;
-	}
-
 	alloc_sz = hi->hugepage_sz;
-	if (internal_config.single_file_segments) {
-		map_offset = seg_idx * alloc_sz;
-		ret = resize_hugefile(fd, path, list_idx, seg_idx, map_offset,
-				alloc_sz, true);
-		if (ret < 0)
-			goto resized;
+	if (internal_config.in_memory && anonymous_hugepages_supported) {
+		int log2, flags;
+
+		log2 = rte_log2_u32(alloc_sz);
+		/* as per mmap() manpage, all page sizes are log2 of page size
+		 * shifted by MAP_HUGE_SHIFT
+		 */
+		flags = (log2 << RTE_MAP_HUGE_SHIFT) | MAP_HUGETLB | MAP_FIXED |
+				MAP_PRIVATE | MAP_ANONYMOUS;
+		fd = -1;
+		va = mmap(addr, alloc_sz, PROT_READ | PROT_WRITE, flags, -1, 0);
 	} else {
-		map_offset = 0;
-		if (ftruncate(fd, alloc_sz) < 0) {
-			RTE_LOG(DEBUG, EAL, "%s(): ftruncate() failed: %s\n",
-				__func__, strerror(errno));
-			goto resized;
+		/* takes out a read lock on segment or segment list */
+		fd = get_seg_fd(path, sizeof(path), hi, list_idx, seg_idx);
+		if (fd < 0) {
+			RTE_LOG(ERR, EAL, "Couldn't get fd on hugepage file\n");
+			return -1;
 		}
-		if (internal_config.hugepage_unlink) {
-			if (unlink(path)) {
-				RTE_LOG(DEBUG, EAL, "%s(): unlink() failed: %s\n",
+
+		if (internal_config.single_file_segments) {
+			map_offset = seg_idx * alloc_sz;
+			ret = resize_hugefile(fd, path, list_idx, seg_idx,
+					map_offset, alloc_sz, true);
+			if (ret < 0)
+				goto resized;
+		} else {
+			map_offset = 0;
+			if (ftruncate(fd, alloc_sz) < 0) {
+				RTE_LOG(DEBUG, EAL, "%s(): ftruncate() failed: %s\n",
 					__func__, strerror(errno));
 				goto resized;
 			}
+			if (internal_config.hugepage_unlink) {
+				if (unlink(path)) {
+					RTE_LOG(DEBUG, EAL, "%s(): unlink() failed: %s\n",
+						__func__, strerror(errno));
+					goto resized;
+				}
+			}
 		}
-	}
 
-	/*
-	 * map the segment, and populate page tables, the kernel fills this
-	 * segment with zeros if it's a new page.
-	 */
-	void *va = mmap(addr, alloc_sz, PROT_READ | PROT_WRITE,
-			MAP_SHARED | MAP_POPULATE | MAP_FIXED, fd, map_offset);
+		/*
+		 * map the segment, and populate page tables, the kernel fills
+		 * this segment with zeros if it's a new page.
+		 */
+		va = mmap(addr, alloc_sz, PROT_READ | PROT_WRITE,
+				MAP_SHARED | MAP_POPULATE | MAP_FIXED, fd,
+				map_offset);
+	}
 
 	if (va == MAP_FAILED) {
 		RTE_LOG(DEBUG, EAL, "%s(): mmap() failed: %s\n", __func__,
@@ -539,24 +565,6 @@ alloc_seg(struct rte_memseg *ms, void *addr, int socket_id,
 		goto resized;
 	}
 
-	rte_iova_t iova = rte_mem_virt2iova(addr);
-	if (iova == RTE_BAD_PHYS_ADDR) {
-		RTE_LOG(DEBUG, EAL, "%s(): can't get IOVA addr\n",
-			__func__);
-		goto mapped;
-	}
-
-#ifdef RTE_EAL_NUMA_AWARE_HUGEPAGES
-	move_pages(getpid(), 1, &addr, NULL, &cur_socket_id, 0);
-
-	if (cur_socket_id != socket_id) {
-		RTE_LOG(DEBUG, EAL,
-				"%s(): allocation happened on wrong socket (wanted %d, got %d)\n",
-			__func__, socket_id, cur_socket_id);
-		goto mapped;
-	}
-#endif
-
 	/* In linux, hugetlb limitations, like cgroup, are
 	 * enforced at fault time instead of mmap(), even
 	 * with the option of MAP_POPULATE. Kernel will send
@@ -569,9 +577,6 @@ alloc_seg(struct rte_memseg *ms, void *addr, int socket_id,
 			(unsigned int)(alloc_sz >> 20));
 		goto mapped;
 	}
-	/* for non-single file segments, we can close fd here */
-	if (!internal_config.single_file_segments)
-		close(fd);
 
 	/* we need to trigger a write to the page to enforce page fault and
 	 * ensure that page is accessible to us, but we can't overwrite value
@@ -580,6 +585,28 @@ alloc_seg(struct rte_memseg *ms, void *addr, int socket_id,
 	 */
 	*(volatile int *)addr = *(volatile int *)addr;
 
+	iova = rte_mem_virt2iova(addr);
+	if (iova == RTE_BAD_PHYS_ADDR) {
+		RTE_LOG(DEBUG, EAL, "%s(): can't get IOVA addr\n",
+			__func__);
+		goto mapped;
+	}
+
+#ifdef RTE_EAL_NUMA_AWARE_HUGEPAGES
+	move_pages(getpid(), 1, &addr, NULL, &cur_socket_id, 0);
+
+	if (cur_socket_id != socket_id) {
+		RTE_LOG(DEBUG, EAL,
+				"%s(): allocation happened on wrong socket (wanted %d, got %d)\n",
+			__func__, socket_id, cur_socket_id);
+		goto mapped;
+	}
+#endif
+	/* for non-single file segments that aren't in-memory, we can close fd
+	 * here */
+	if (!internal_config.single_file_segments && !internal_config.in_memory)
+		close(fd);
+
 	ms->addr = addr;
 	ms->hugepage_sz = alloc_sz;
 	ms->len = alloc_sz;
@@ -600,6 +627,7 @@ alloc_seg(struct rte_memseg *ms, void *addr, int socket_id,
 	} else {
 		/* only remove file if we can take out a write lock */
 		if (internal_config.hugepage_unlink == 0 &&
+				internal_config.in_memory == 0 &&
 				lock(fd, LOCK_EX) == 1)
 			unlink(path);
 		close(fd);
@@ -709,7 +737,7 @@ alloc_seg_walk(const struct rte_memseg_list *msl, void *arg)
 	 * during init, we already hold a write lock, so don't try to take out
 	 * another one.
 	 */
-	if (wa->hi->lock_descriptor == -1) {
+	if (wa->hi->lock_descriptor == -1 && !internal_config.in_memory) {
 		dir_fd = open(wa->hi->hugedir, O_RDONLY);
 		if (dir_fd < 0) {
 			RTE_LOG(ERR, EAL, "%s(): Cannot open '%s': %s\n",
@@ -813,7 +841,7 @@ free_seg_walk(const struct rte_memseg_list *msl, void *arg)
 	 * during init, we already hold a write lock, so don't try to take out
 	 * another one.
 	 */
-	if (wa->hi->lock_descriptor == -1) {
+	if (wa->hi->lock_descriptor == -1 && !internal_config.in_memory) {
 		dir_fd = open(wa->hi->hugedir, O_RDONLY);
 		if (dir_fd < 0) {
 			RTE_LOG(ERR, EAL, "%s(): Cannot open '%s': %s\n",
diff --git a/lib/librte_eal/linuxapp/eal/eal_memory.c b/lib/librte_eal/linuxapp/eal/eal_memory.c
index cb784e1c3..a98d8c036 100644
--- a/lib/librte_eal/linuxapp/eal/eal_memory.c
+++ b/lib/librte_eal/linuxapp/eal/eal_memory.c
@@ -1060,8 +1060,7 @@ get_socket_mem_size(int socket)
 
 	for (i = 0; i < internal_config.num_hugepage_sizes; i++){
 		struct hugepage_info *hpi = &internal_config.hugepage_info[i];
-		if (strnlen(hpi->hugedir, sizeof(hpi->hugedir)) != 0)
-			size += hpi->hugepage_sz * hpi->num_pages[socket];
+		size += hpi->hugepage_sz * hpi->num_pages[socket];
 	}
 
 	return size;
-- 
2.17.0

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH v2 0/9] Support running DPDK without hugetlbfs mountpoint
  2018-06-01 17:15 [PATCH 0/9] Support running DPDK without hugetlbfs mountpoint Anatoly Burakov
                   ` (8 preceding siblings ...)
  2018-06-01 17:15 ` [PATCH 9/9] mem: support in-memory mode Anatoly Burakov
@ 2018-07-13 10:27 ` Anatoly Burakov
  2018-07-13 12:47   ` [PATCH v3 0/8] " Anatoly Burakov
                     ` (8 more replies)
  2018-07-13 10:27 ` [PATCH v2 1/9] fbarray: support no-shconf mode Anatoly Burakov
                   ` (8 subsequent siblings)
  18 siblings, 9 replies; 35+ messages in thread
From: Anatoly Burakov @ 2018-07-13 10:27 UTC (permalink / raw)
  To: dev
  Cc: ray.kinsella, kuralamudhan.ramakrishnan, louise.m.daly,
	bruce.richardson, ferruh.yigit, konstantin.ananyev, thomas

This patchset adds a new command-line option "--in-memory",
which combines the existing debug options "--huge-unlink"
and "--no-shconf" and extends them with additional
functionality. This allows DPDK to reserve hugepages
anonymously instead of using hugetlbfs mountpoints. Since
this option also enables both "--no-shconf" and
"--huge-unlink" modes, DPDK will be able to run entirely in
memory and not create any shared files while running:
neither hugepage files nor any runtime data.

This disables secondary process support, but that is not a
problem for the use cases this option targets (containers,
etc.).

Older revisions required kernel 4.14+ and a fairly recent
glibc. Because this revision uses anonymous mmap() instead
of memfd, the minimum supported kernel version has dropped
to 3.8.
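A minimal sketch of how a hugepage can be reserved anonymously, assuming Linux 3.8 or later and headers that define MAP_HUGETLB (MAP_HUGE_SHIFT gets a fallback of 26, its Linux UAPI value, mirroring the fallback in the v1 eal_memalloc.c change). As mmap(2) describes, the page size is encoded as log2(size) shifted left by MAP_HUGE_SHIFT. The helper names are hypothetical, not DPDK API, and unlike the patch this sketch lets the kernel choose the address instead of using MAP_FIXED.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/* Fallback for older headers; 26 is the Linux UAPI value. */
#ifndef MAP_HUGE_SHIFT
#define MAP_HUGE_SHIFT 26
#endif

/* Return floor(log2(v)) for v > 0. */
static unsigned int log2_u64(uint64_t v)
{
	unsigned int r = 0;

	while (v >>= 1)
		r++;
	return r;
}

/* Build mmap() flags for a private, anonymous hugepage mapping of the
 * given page size: log2(page size) is placed at bit MAP_HUGE_SHIFT. */
static int hugepage_mmap_flags(uint64_t page_sz)
{
	return (int)(log2_u64(page_sz) << MAP_HUGE_SHIFT) |
		MAP_HUGETLB | MAP_PRIVATE | MAP_ANONYMOUS;
}

/* Try to reserve one hugepage without any hugetlbfs mountpoint.
 * Returns NULL if the kernel pool has no free page of that size. */
static void *alloc_anon_hugepage(uint64_t page_sz)
{
	void *va = mmap(NULL, page_sz, PROT_READ | PROT_WRITE,
			hugepage_mmap_flags(page_sz), -1, 0);

	return va == MAP_FAILED ? NULL : va;
}
```

For example, `alloc_anon_hugepage(2097152)` only succeeds if the 2 MB pool has free pages (e.g. configured through /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages); no hugetlbfs mount is involved.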

v1->v2 changes:
- Rebase on latest master
- Fix patch 5 to include check from patch 6 as commit message
  states

RFC->v1 changes:
- Dropped memfd, using anonymous mmap() instead
- Do not deprecate old command-line parameters, instead
  use them as they are, and add a deprecation notice to
  remove them in the next release.

Anatoly Burakov (9):
  fbarray: support no-shconf mode
  ipc: add support for no-shconf mode
  eal: add support for no-shconf for hugepage info
  eal: add support for no-shconf in hugepage data file
  eal: do not create runtime dir in no-shconf mode
  mem: add support for hugepage-unlink mode
  eal: add --in-memory option
  doc: add deprecation notice for EAL command line options
  mem: support in-memory mode

 doc/guides/rel_notes/deprecation.rst          |   5 +
 lib/librte_eal/bsdapp/eal/eal.c               |   3 +-
 lib/librte_eal/bsdapp/eal/eal_hugepage_info.c |   4 +
 lib/librte_eal/common/eal_common_fbarray.c    |  71 +++++----
 lib/librte_eal/common/eal_common_options.c    |  20 ++-
 lib/librte_eal/common/eal_common_proc.c       |  25 ++++
 lib/librte_eal/common/eal_internal_cfg.h      |   4 +
 lib/librte_eal/common/eal_options.h           |   2 +
 lib/librte_eal/linuxapp/eal/eal.c             |   3 +-
 .../linuxapp/eal/eal_hugepage_info.c          |  95 ++++++++----
 lib/librte_eal/linuxapp/eal/eal_memalloc.c    | 140 ++++++++++++------
 lib/librte_eal/linuxapp/eal/eal_memory.c      |  16 +-
 12 files changed, 271 insertions(+), 117 deletions(-)

-- 
2.17.1

* [PATCH v2 1/9] fbarray: support no-shconf mode
  2018-06-01 17:15 [PATCH 0/9] Support running DPDK without hugetlbfs mountpoint Anatoly Burakov
                   ` (9 preceding siblings ...)
  2018-07-13 10:27 ` [PATCH v2 0/9] Support running DPDK without hugetlbfs mountpoint Anatoly Burakov
@ 2018-07-13 10:27 ` Anatoly Burakov
  2018-07-13 10:27 ` [PATCH v2 2/9] ipc: add support for " Anatoly Burakov
                   ` (7 subsequent siblings)
  18 siblings, 0 replies; 35+ messages in thread
From: Anatoly Burakov @ 2018-07-13 10:27 UTC (permalink / raw)
  To: dev
  Cc: ray.kinsella, kuralamudhan.ramakrishnan, louise.m.daly,
	bruce.richardson, ferruh.yigit, konstantin.ananyev, thomas

When using the --no-shconf option, the expectation is that
multiprocess will not be supported, as no shared files are created.
However, fbarray still creates some shared files, which prevent
multiple processes with the same prefix from starting.

Fix this by not creating shared files whenever the --no-shconf
option is specified. Since the virtual areas we get from
eal_get_virtual_area() are read-only, remap them as writable.
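The remap step can be sketched in isolation as below. This is a minimal, hypothetical helper, assuming a read-only anonymous reservation is a fair stand-in for what eal_get_virtual_area() returns; the real fbarray code only remaps the area it was handed and does not create the reservation itself. Mapping with MAP_FIXED over an existing range replaces the old mapping at the same address.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/* Reserve address space read-only (standing in for the area returned by
 * eal_get_virtual_area()), then replace it in place with private,
 * writable anonymous memory, so no backing file is ever created. */
static void *reserve_then_remap_writable(size_t len)
{
	void *data = mmap(NULL, len, PROT_READ,
			MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (data == MAP_FAILED)
		return NULL;

	/* MAP_FIXED discards the old mapping and maps at the same address. */
	void *new_data = mmap(data, len, PROT_READ | PROT_WRITE,
			MAP_FIXED | MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (new_data == MAP_FAILED) {
		munmap(data, len);
		return NULL;
	}

	/* Initialize the now-writable area, as rte_fbarray_init() does. */
	memset(new_data, 0, len);
	return new_data;
}
```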

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---

Notes:
    RFC->v1:
    - Use --no-shconf only

 lib/librte_eal/common/eal_common_fbarray.c | 71 +++++++++++++---------
 1 file changed, 42 insertions(+), 29 deletions(-)

diff --git a/lib/librte_eal/common/eal_common_fbarray.c b/lib/librte_eal/common/eal_common_fbarray.c
index 977174c4f..43caf3ced 100644
--- a/lib/librte_eal/common/eal_common_fbarray.c
+++ b/lib/librte_eal/common/eal_common_fbarray.c
@@ -705,39 +705,52 @@ rte_fbarray_init(struct rte_fbarray *arr, const char *name, unsigned int len,
 	if (data == NULL)
 		goto fail;
 
-	eal_get_fbarray_path(path, sizeof(path), name);
+	if (internal_config.no_shconf) {
+		/* remap virtual area as writable */
+		void *new_data = mmap(data, mmap_len, PROT_READ | PROT_WRITE,
+				MAP_FIXED | MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
+		if (new_data == MAP_FAILED) {
+			RTE_LOG(DEBUG, EAL, "%s(): couldn't remap anonymous memory: %s\n",
+					__func__, strerror(errno));
+			goto fail;
+		}
+	} else {
+		eal_get_fbarray_path(path, sizeof(path), name);
 
-	/*
-	 * Each fbarray is unique to process namespace, i.e. the filename
-	 * depends on process prefix. Try to take out a lock and see if we
-	 * succeed. If we don't, someone else is using it already.
-	 */
-	fd = open(path, O_CREAT | O_RDWR, 0600);
-	if (fd < 0) {
-		RTE_LOG(DEBUG, EAL, "%s(): couldn't open %s: %s\n", __func__,
-				path, strerror(errno));
-		rte_errno = errno;
-		goto fail;
-	} else if (flock(fd, LOCK_EX | LOCK_NB)) {
-		RTE_LOG(DEBUG, EAL, "%s(): couldn't lock %s: %s\n", __func__,
-				path, strerror(errno));
-		rte_errno = EBUSY;
-		goto fail;
-	}
+		/*
+		 * Each fbarray is unique to process namespace, i.e. the
+		 * filename depends on process prefix. Try to take out a lock
+		 * and see if we succeed. If we don't, someone else is using it
+		 * already.
+		 */
+		fd = open(path, O_CREAT | O_RDWR, 0600);
+		if (fd < 0) {
+			RTE_LOG(DEBUG, EAL, "%s(): couldn't open %s: %s\n",
+					__func__, path, strerror(errno));
+			rte_errno = errno;
+			goto fail;
+		} else if (flock(fd, LOCK_EX | LOCK_NB)) {
+			RTE_LOG(DEBUG, EAL, "%s(): couldn't lock %s: %s\n",
+					__func__, path, strerror(errno));
+			rte_errno = EBUSY;
+			goto fail;
+		}
 
-	/* take out a non-exclusive lock, so that other processes could still
-	 * attach to it, but no other process could reinitialize it.
-	 */
-	if (flock(fd, LOCK_SH | LOCK_NB)) {
-		rte_errno = errno;
-		goto fail;
-	}
+		/* take out a non-exclusive lock, so that other processes could
+		 * still attach to it, but no other process could reinitialize
+		 * it.
+		 */
+		if (flock(fd, LOCK_SH | LOCK_NB)) {
+			rte_errno = errno;
+			goto fail;
+		}
 
-	if (resize_and_map(fd, data, mmap_len))
-		goto fail;
+		if (resize_and_map(fd, data, mmap_len))
+			goto fail;
 
-	/* we've mmap'ed the file, we can now close the fd */
-	close(fd);
+		/* we've mmap'ed the file, we can now close the fd */
+		close(fd);
+	}
 
 	/* initialize the data */
 	memset(data, 0, mmap_len);
-- 
2.17.1

* [PATCH v2 2/9] ipc: add support for no-shconf mode
  2018-06-01 17:15 [PATCH 0/9] Support running DPDK without hugetlbfs mountpoint Anatoly Burakov
                   ` (10 preceding siblings ...)
  2018-07-13 10:27 ` [PATCH v2 1/9] fbarray: support no-shconf mode Anatoly Burakov
@ 2018-07-13 10:27 ` Anatoly Burakov
  2018-07-13 10:27 ` [PATCH v2 3/9] eal: add support for no-shconf for hugepage info Anatoly Burakov
                   ` (6 subsequent siblings)
  18 siblings, 0 replies; 35+ messages in thread
From: Anatoly Burakov @ 2018-07-13 10:27 UTC (permalink / raw)
  To: dev
  Cc: ray.kinsella, kuralamudhan.ramakrishnan, louise.m.daly,
	bruce.richardson, ferruh.yigit, konstantin.ananyev, thomas

IPC is an inter-process communication mechanism. Since secondary
processes can never run in no-shconf mode, IPC would be useless,
so do not initialize it in the first place. For API convenience,
registering callbacks is still allowed, but the callbacks will
never be triggered. Requests and replies return success without
sending anything.
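The resulting behavior can be modeled with a small, self-contained sketch. Everything here (mini_config, mini_register(), mini_send() and the counters) is hypothetical and only illustrates the pattern used in the patch: registration keeps working, while sending becomes a successful no-op when no_shconf is set.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for internal_config; not the DPDK struct. */
struct mini_config {
	bool no_shconf;
};

typedef int (*mini_action_t)(const char *msg);

#define MAX_ACTIONS 8

static struct {
	const char *name;
	mini_action_t cb;
} actions[MAX_ACTIONS];
static size_t n_actions;
static size_t n_sent; /* messages that would have gone out on a socket */

/* Registration is always permitted, even when IPC is disabled;
 * the callback simply never fires in that case. */
static int mini_register(const char *name, mini_action_t cb)
{
	if (n_actions == MAX_ACTIONS)
		return -1;
	actions[n_actions].name = name;
	actions[n_actions].cb = cb;
	n_actions++;
	return 0;
}

/* Sending reports success but does nothing when no peers can exist. */
static int mini_send(const struct mini_config *cfg, const char *msg)
{
	(void)msg;
	if (cfg->no_shconf)
		return 0; /* IPC disabled: nothing to send to */
	n_sent++; /* placeholder for a real socket send */
	return 0;
}
```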

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---

Notes:
    RFC->v1:
    - Use --no-shconf only

 lib/librte_eal/common/eal_common_proc.c | 25 +++++++++++++++++++++++++
 1 file changed, 25 insertions(+)

diff --git a/lib/librte_eal/common/eal_common_proc.c b/lib/librte_eal/common/eal_common_proc.c
index f010ef59e..c19b4b406 100644
--- a/lib/librte_eal/common/eal_common_proc.c
+++ b/lib/librte_eal/common/eal_common_proc.c
@@ -626,6 +626,14 @@ rte_mp_channel_init(void)
 	int dir_fd;
 	pthread_t mp_handle_tid, async_reply_handle_tid;
 
+	/* in no shared files mode, we do not have secondary processes support,
+	 * so no need to initialize IPC.
+	 */
+	if (internal_config.no_shconf) {
+		RTE_LOG(DEBUG, EAL, "No shared files mode enabled, IPC will be disabled\n");
+		return 0;
+	}
+
 	/* create filter path */
 	create_socket_path("*", path, sizeof(path));
 	strlcpy(mp_filter, basename(path), sizeof(mp_filter));
@@ -988,6 +996,12 @@ rte_mp_request_sync(struct rte_mp_msg *req, struct rte_mp_reply *reply,
 
 	if (check_input(req) == false)
 		return -1;
+
+	if (internal_config.no_shconf) {
+		RTE_LOG(DEBUG, EAL, "No shared files mode enabled, IPC is disabled\n");
+		return 0;
+	}
+
 	if (gettimeofday(&now, NULL) < 0) {
 		RTE_LOG(ERR, EAL, "Faile to get current time\n");
 		rte_errno = errno;
@@ -1072,6 +1086,12 @@ rte_mp_request_async(struct rte_mp_msg *req, const struct timespec *ts,
 
 	if (check_input(req) == false)
 		return -1;
+
+	if (internal_config.no_shconf) {
+		RTE_LOG(DEBUG, EAL, "No shared files mode enabled, IPC is disabled\n");
+		return 0;
+	}
+
 	if (gettimeofday(&now, NULL) < 0) {
 		RTE_LOG(ERR, EAL, "Faile to get current time\n");
 		rte_errno = errno;
@@ -1213,5 +1233,10 @@ rte_mp_reply(struct rte_mp_msg *msg, const char *peer)
 		return -1;
 	}
 
+	if (internal_config.no_shconf) {
+		RTE_LOG(DEBUG, EAL, "No shared files mode enabled, IPC is disabled\n");
+		return 0;
+	}
+
 	return mp_send(msg, peer, MP_REP);
 }
-- 
2.17.1

* [PATCH v2 3/9] eal: add support for no-shconf for hugepage info
  2018-06-01 17:15 [PATCH 0/9] Support running DPDK without hugetlbfs mountpoint Anatoly Burakov
                   ` (11 preceding siblings ...)
  2018-07-13 10:27 ` [PATCH v2 2/9] ipc: add support for " Anatoly Burakov
@ 2018-07-13 10:27 ` Anatoly Burakov
  2018-07-13 10:27 ` [PATCH v2 4/9] eal: add support for no-shconf in hugepage data file Anatoly Burakov
                   ` (5 subsequent siblings)
  18 siblings, 0 replies; 35+ messages in thread
From: Anatoly Burakov @ 2018-07-13 10:27 UTC (permalink / raw)
  To: dev
  Cc: Bruce Richardson, ray.kinsella, kuralamudhan.ramakrishnan,
	louise.m.daly, ferruh.yigit, konstantin.ananyev, thomas

Do not create any shared hugepage size info files if we were
asked to not create any shared files.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---

Notes:
    RFC->v1:
    - Use --no-shconf only

 lib/librte_eal/bsdapp/eal/eal_hugepage_info.c   | 4 ++++
 lib/librte_eal/linuxapp/eal/eal_hugepage_info.c | 4 ++++
 2 files changed, 8 insertions(+)

diff --git a/lib/librte_eal/bsdapp/eal/eal_hugepage_info.c b/lib/librte_eal/bsdapp/eal/eal_hugepage_info.c
index 836feb672..1e8f5df23 100644
--- a/lib/librte_eal/bsdapp/eal/eal_hugepage_info.c
+++ b/lib/librte_eal/bsdapp/eal/eal_hugepage_info.c
@@ -101,6 +101,10 @@ eal_hugepage_info_init(void)
 	hpi->num_pages[0] = num_buffers;
 	hpi->lock_descriptor = fd;
 
+	/* for no shared files mode, do not create shared memory config */
+	if (internal_config.no_shconf)
+		return 0;
+
 	tmp_hpi = create_shared_memory(eal_hugepage_info_path(),
 			sizeof(internal_config.hugepage_info));
 	if (tmp_hpi == NULL ) {
diff --git a/lib/librte_eal/linuxapp/eal/eal_hugepage_info.c b/lib/librte_eal/linuxapp/eal/eal_hugepage_info.c
index 7eca711ba..7f8e2fd9c 100644
--- a/lib/librte_eal/linuxapp/eal/eal_hugepage_info.c
+++ b/lib/librte_eal/linuxapp/eal/eal_hugepage_info.c
@@ -446,6 +446,10 @@ eal_hugepage_info_init(void)
 	if (hugepage_info_init() < 0)
 		return -1;
 
+	/* for no shared files mode, we're done */
+	if (internal_config.no_shconf)
+		return 0;
+
 	hpi = &internal_config.hugepage_info[0];
 
 	tmp_hpi = create_shared_memory(eal_hugepage_info_path(),
-- 
2.17.1

* [PATCH v2 4/9] eal: add support for no-shconf in hugepage data file
  2018-06-01 17:15 [PATCH 0/9] Support running DPDK without hugetlbfs mountpoint Anatoly Burakov
                   ` (12 preceding siblings ...)
  2018-07-13 10:27 ` [PATCH v2 3/9] eal: add support for no-shconf for hugepage info Anatoly Burakov
@ 2018-07-13 10:27 ` Anatoly Burakov
  2018-07-13 10:27 ` [PATCH v2 5/9] eal: do not create runtime dir in no-shconf mode Anatoly Burakov
                   ` (4 subsequent siblings)
  18 siblings, 0 replies; 35+ messages in thread
From: Anatoly Burakov @ 2018-07-13 10:27 UTC (permalink / raw)
  To: dev
  Cc: ray.kinsella, kuralamudhan.ramakrishnan, louise.m.daly,
	bruce.richardson, ferruh.yigit, konstantin.ananyev, thomas

Do not create a shared hugepage data file if we were asked to
not create any shared files.
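A minimal sketch of the same idea, assuming a boolean parameter stands in for internal_config.no_shconf. create_config_area() is a hypothetical name, and it uses file mode 0600 where the patch keeps the existing 0666. With no_shconf set, the area is private anonymous memory, so nothing is created on disk.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Back a config area with a shared file, or, in no-shconf mode, with
 * private anonymous memory that never touches the filesystem. */
static void *create_config_area(const char *filename, size_t mem_size,
		bool no_shconf)
{
	void *retval;
	int fd;

	if (no_shconf) {
		retval = mmap(NULL, mem_size, PROT_READ | PROT_WRITE,
				MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
		return retval == MAP_FAILED ? NULL : retval;
	}

	fd = open(filename, O_CREAT | O_RDWR, 0600);
	if (fd < 0)
		return NULL;
	if (ftruncate(fd, (off_t)mem_size) < 0) {
		close(fd);
		return NULL;
	}
	retval = mmap(NULL, mem_size, PROT_READ | PROT_WRITE,
			MAP_SHARED, fd, 0);
	close(fd); /* the mapping keeps the file referenced */
	return retval == MAP_FAILED ? NULL : retval;
}
```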

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---

Notes:
    RFC->v1:
    - Use --no-shconf only

 lib/librte_eal/linuxapp/eal/eal_memory.c | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal_memory.c b/lib/librte_eal/linuxapp/eal/eal_memory.c
index 5d3c8831b..ddfa8b133 100644
--- a/lib/librte_eal/linuxapp/eal/eal_memory.c
+++ b/lib/librte_eal/linuxapp/eal/eal_memory.c
@@ -521,7 +521,18 @@ static void *
 create_shared_memory(const char *filename, const size_t mem_size)
 {
 	void *retval;
-	int fd = open(filename, O_CREAT | O_RDWR, 0666);
+	int fd;
+
+	/* if no shared files mode is used, create anonymous memory instead */
+	if (internal_config.no_shconf) {
+		retval = mmap(NULL, mem_size, PROT_READ | PROT_WRITE,
+				MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
+		if (retval == MAP_FAILED)
+			return NULL;
+		return retval;
+	}
+
+	fd = open(filename, O_CREAT | O_RDWR, 0666);
 	if (fd < 0)
 		return NULL;
 	if (ftruncate(fd, mem_size) < 0) {
-- 
2.17.1

* [PATCH v2 5/9] eal: do not create runtime dir in no-shconf mode
  2018-06-01 17:15 [PATCH 0/9] Support running DPDK without hugetlbfs mountpoint Anatoly Burakov
                   ` (13 preceding siblings ...)
  2018-07-13 10:27 ` [PATCH v2 4/9] eal: add support for no-shconf in hugepage data file Anatoly Burakov
@ 2018-07-13 10:27 ` Anatoly Burakov
  2018-07-13 10:27 ` [PATCH v2 6/9] mem: add support for hugepage-unlink mode Anatoly Burakov
                   ` (3 subsequent siblings)
  18 siblings, 0 replies; 35+ messages in thread
From: Anatoly Burakov @ 2018-07-13 10:27 UTC (permalink / raw)
  To: dev
  Cc: Bruce Richardson, ray.kinsella, kuralamudhan.ramakrishnan,
	louise.m.daly, ferruh.yigit, konstantin.ananyev, thomas

Now that the rest of the EAL has been adjusted to not create any
shared files, do not create the runtime directory either when in
no-shconf mode.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---

Notes:
    RFC->v1:
    - Use --no-shconf only

 lib/librte_eal/bsdapp/eal/eal.c   | 3 ++-
 lib/librte_eal/linuxapp/eal/eal.c | 3 ++-
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/lib/librte_eal/bsdapp/eal/eal.c b/lib/librte_eal/bsdapp/eal/eal.c
index dc279542d..13b6f8ae1 100644
--- a/lib/librte_eal/bsdapp/eal/eal.c
+++ b/lib/librte_eal/bsdapp/eal/eal.c
@@ -601,7 +601,8 @@ rte_eal_init(int argc, char **argv)
 	}
 
 	/* create runtime data directory */
-	if (eal_create_runtime_dir() < 0) {
+	if (internal_config.no_shconf == 0 &&
+			eal_create_runtime_dir() < 0) {
 		rte_eal_init_alert("Cannot create runtime directory\n");
 		rte_errno = EACCES;
 		return -1;
diff --git a/lib/librte_eal/linuxapp/eal/eal.c b/lib/librte_eal/linuxapp/eal/eal.c
index ec7cea55d..191960caa 100644
--- a/lib/librte_eal/linuxapp/eal/eal.c
+++ b/lib/librte_eal/linuxapp/eal/eal.c
@@ -832,7 +832,8 @@ rte_eal_init(int argc, char **argv)
 	}
 
 	/* create runtime data directory */
-	if (eal_create_runtime_dir() < 0) {
+	if (internal_config.no_shconf == 0 &&
+			eal_create_runtime_dir() < 0) {
 		rte_eal_init_alert("Cannot create runtime directory\n");
 		rte_errno = EACCES;
 		return -1;
-- 
2.17.1

* [PATCH v2 6/9] mem: add support for hugepage-unlink mode
  2018-06-01 17:15 [PATCH 0/9] Support running DPDK without hugetlbfs mountpoint Anatoly Burakov
                   ` (14 preceding siblings ...)
  2018-07-13 10:27 ` [PATCH v2 5/9] eal: do not create runtime dir in no-shconf mode Anatoly Burakov
@ 2018-07-13 10:27 ` Anatoly Burakov
  2018-07-13 10:27 ` [PATCH v2 7/9] eal: add --in-memory option Anatoly Burakov
                   ` (2 subsequent siblings)
  18 siblings, 0 replies; 35+ messages in thread
From: Anatoly Burakov @ 2018-07-13 10:27 UTC (permalink / raw)
  To: dev
  Cc: ray.kinsella, kuralamudhan.ramakrishnan, louise.m.daly,
	bruce.richardson, ferruh.yigit, konstantin.ananyev, thomas

Unlink hugepage files right after creating them, to honor the
hugepage-unlink mode. Files that no longer exist cannot be resized,
so explicitly reject combining single-file segments mode with
hugepage-unlink mode.
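The create, size and unlink sequence can be tried on an ordinary temporary file, as sketched below. It assumes a regular filesystem such as /tmp rather than hugetlbfs, and map_unlinked_file() is a hypothetical helper. After unlink(), the file can only be reached through the existing mapping, not by its path.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Create a file, size it, unlink its name, then map it. The memory stays
 * valid after unlink(); only the directory entry is gone. */
static void *map_unlinked_file(char *path_template, size_t len)
{
	int fd = mkstemp(path_template);
	if (fd < 0)
		return NULL;
	if (ftruncate(fd, (off_t)len) < 0) {
		close(fd);
		unlink(path_template);
		return NULL;
	}
	/* Unlink right away, as hugepage-unlink mode does after ftruncate(). */
	if (unlink(path_template) < 0) {
		close(fd);
		return NULL;
	}
	void *va = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	close(fd); /* the mapping keeps the nameless file alive */
	return va == MAP_FAILED ? NULL : va;
}
```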

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---

Notes:
    v1->v2:
    - Move check for hugepage unlink into this patch, to be
      consistent with commit message
    
    RFC->v1:
    - Use --huge-unlink only

 lib/librte_eal/common/eal_common_options.c |  6 ++++++
 lib/librte_eal/linuxapp/eal/eal_memalloc.c | 16 +++++++++++++++-
 2 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/lib/librte_eal/common/eal_common_options.c b/lib/librte_eal/common/eal_common_options.c
index 45ea01a8b..df5d53648 100644
--- a/lib/librte_eal/common/eal_common_options.c
+++ b/lib/librte_eal/common/eal_common_options.c
@@ -1332,6 +1332,12 @@ eal_check_common_options(struct internal_config *internal_cfg)
 			" is only supported in non-legacy memory mode\n");
 		return -1;
 	}
+	if (internal_cfg->single_file_segments &&
+			internal_cfg->hugepage_unlink) {
+		RTE_LOG(ERR, EAL, "Option --"OPT_SINGLE_FILE_SEGMENTS" is "
+			"not compatible with --"OPT_HUGE_UNLINK"\n");
+		return -1;
+	}
 
 	return 0;
 }
diff --git a/lib/librte_eal/linuxapp/eal/eal_memalloc.c b/lib/librte_eal/linuxapp/eal/eal_memalloc.c
index 69604f823..d610923b8 100644
--- a/lib/librte_eal/linuxapp/eal/eal_memalloc.c
+++ b/lib/librte_eal/linuxapp/eal/eal_memalloc.c
@@ -489,6 +489,13 @@ alloc_seg(struct rte_memseg *ms, void *addr, int socket_id,
 				__func__, strerror(errno));
 			goto resized;
 		}
+		if (internal_config.hugepage_unlink) {
+			if (unlink(path)) {
+				RTE_LOG(DEBUG, EAL, "%s(): unlink() failed: %s\n",
+					__func__, strerror(errno));
+				goto resized;
+			}
+		}
 	}
 
 	/*
@@ -587,7 +594,8 @@ alloc_seg(struct rte_memseg *ms, void *addr, int socket_id,
 		/* ignore failure, can't make it any worse */
 	} else {
 		/* only remove file if we can take out a write lock */
-		if (lock(fd, LOCK_EX) == 1)
+		if (internal_config.hugepage_unlink == 0 &&
+				lock(fd, LOCK_EX) == 1)
 			unlink(path);
 		close(fd);
 	}
@@ -612,6 +620,12 @@ free_seg(struct rte_memseg *ms, struct hugepage_info *hi,
 		return -1;
 	}
 
+	/* if we've already unlinked the page, nothing needs to be done */
+	if (internal_config.hugepage_unlink) {
+		memset(ms, 0, sizeof(*ms));
+		return 0;
+	}
+
 	/* if we are not in single file segments mode, we're going to unmap the
 	 * segment and thus drop the lock on original fd, but hugepage dir is
 	 * now locked so we can take out another one without races.
-- 
2.17.1

* [PATCH v2 7/9] eal: add --in-memory option
  2018-06-01 17:15 [PATCH 0/9] Support running DPDK without hugetlbfs mountpoint Anatoly Burakov
                   ` (15 preceding siblings ...)
  2018-07-13 10:27 ` [PATCH v2 6/9] mem: add support for hugepage-unlink mode Anatoly Burakov
@ 2018-07-13 10:27 ` Anatoly Burakov
  2018-07-13 12:13   ` Thomas Monjalon
  2018-07-13 10:27 ` [PATCH v2 8/9] doc: add deprecation notice for EAL command line options Anatoly Burakov
  2018-07-13 10:27 ` [PATCH v2 9/9] mem: support in-memory mode Anatoly Burakov
  18 siblings, 1 reply; 35+ messages in thread
From: Anatoly Burakov @ 2018-07-13 10:27 UTC (permalink / raw)
  To: dev
  Cc: ray.kinsella, kuralamudhan.ramakrishnan, louise.m.daly,
	bruce.richardson, ferruh.yigit, konstantin.ananyev, thomas

This command-line option causes DPDK to operate entirely in
memory and not create any shared files at runtime, including any
shared configuration or hugetlbfs files. This is useful for
debugging, as well as for use cases such as containers or
automatic memory cleanup.

Currently, this option acts as a strict superset of the
--no-shconf and --huge-unlink options.
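How a superset option can be wired into getopt_long() is sketched below. The struct, option IDs and parse_opts() are hypothetical simplifications, not the EAL option table; only the option names match the patch.

```c
#define _GNU_SOURCE
#include <getopt.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical subset of EAL config flags, for illustration only. */
struct opts {
	bool no_shconf;
	bool hugepage_unlink;
	bool in_memory;
};

/* Hypothetical option IDs, above the range of short-option chars. */
enum { ID_NO_SHCONF = 256, ID_HUGE_UNLINK, ID_IN_MEMORY };

static const struct option long_opts[] = {
	{"no-shconf",   no_argument, NULL, ID_NO_SHCONF},
	{"huge-unlink", no_argument, NULL, ID_HUGE_UNLINK},
	{"in-memory",   no_argument, NULL, ID_IN_MEMORY},
	{NULL, 0, NULL, 0},
};

/* Parse argv; --in-memory also turns on the two modes it subsumes. */
static int parse_opts(int argc, char **argv, struct opts *o)
{
	int opt;

	optind = 1; /* start scanning at argv[1] */
	while ((opt = getopt_long(argc, argv, "", long_opts, NULL)) != -1) {
		switch (opt) {
		case ID_NO_SHCONF:
			o->no_shconf = true;
			break;
		case ID_HUGE_UNLINK:
			o->hugepage_unlink = true;
			break;
		case ID_IN_MEMORY:
			o->in_memory = true;
			o->no_shconf = true;
			o->hugepage_unlink = true;
			break;
		default:
			return -1; /* unknown option */
		}
	}
	return 0;
}
```

Because `--in-memory` leaves hugepage_unlink set, the later checks in eal_check_common_options() need the explicit `!internal_cfg->in_memory` exception seen in the diff.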

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---

Notes:
    RFC->v1:
    - Do not deprecate old options; instead, co-opt them

 lib/librte_eal/common/eal_common_options.c | 18 ++++++++++++++----
 lib/librte_eal/common/eal_internal_cfg.h   |  4 ++++
 lib/librte_eal/common/eal_options.h        |  2 ++
 3 files changed, 20 insertions(+), 4 deletions(-)

diff --git a/lib/librte_eal/common/eal_common_options.c b/lib/librte_eal/common/eal_common_options.c
index df5d53648..f308b57c3 100644
--- a/lib/librte_eal/common/eal_common_options.c
+++ b/lib/librte_eal/common/eal_common_options.c
@@ -66,6 +66,7 @@ eal_long_options[] = {
 	{OPT_NO_HUGE,           0, NULL, OPT_NO_HUGE_NUM          },
 	{OPT_NO_PCI,            0, NULL, OPT_NO_PCI_NUM           },
 	{OPT_NO_SHCONF,         0, NULL, OPT_NO_SHCONF_NUM        },
+	{OPT_IN_MEMORY,         0, NULL, OPT_IN_MEMORY_NUM        },
 	{OPT_PCI_BLACKLIST,     1, NULL, OPT_PCI_BLACKLIST_NUM    },
 	{OPT_PCI_WHITELIST,     1, NULL, OPT_PCI_WHITELIST_NUM    },
 	{OPT_PROC_TYPE,         1, NULL, OPT_PROC_TYPE_NUM        },
@@ -1170,6 +1171,13 @@ eal_parse_common_option(int opt, const char *optarg,
 		conf->no_shconf = 1;
 		break;
 
+	case OPT_IN_MEMORY_NUM:
+		conf->in_memory = 1;
+		/* in-memory is a superset of noshconf and huge-unlink */
+		conf->no_shconf = 1;
+		conf->hugepage_unlink = 1;
+		break;
+
 	case OPT_PROC_TYPE_NUM:
 		conf->process_type = eal_parse_proc_type(optarg);
 		break;
@@ -1321,8 +1329,8 @@ eal_check_common_options(struct internal_config *internal_cfg)
 			"be specified together with --"OPT_NO_HUGE"\n");
 		return -1;
 	}
-
-	if (internal_cfg->no_hugetlbfs && internal_cfg->hugepage_unlink) {
+	if (internal_cfg->no_hugetlbfs && internal_cfg->hugepage_unlink &&
+			!internal_cfg->in_memory) {
 		RTE_LOG(ERR, EAL, "Option --"OPT_HUGE_UNLINK" cannot "
 			"be specified together with --"OPT_NO_HUGE"\n");
 		return -1;
@@ -1330,12 +1338,12 @@ eal_check_common_options(struct internal_config *internal_cfg)
 	if (internal_config.force_socket_limits && internal_config.legacy_mem) {
 		RTE_LOG(ERR, EAL, "Option --"OPT_SOCKET_LIMIT
 			" is only supported in non-legacy memory mode\n");
-		return -1;
 	}
 	if (internal_cfg->single_file_segments &&
 			internal_cfg->hugepage_unlink) {
 		RTE_LOG(ERR, EAL, "Option --"OPT_SINGLE_FILE_SEGMENTS" is "
-			"not compatible with --"OPT_HUGE_UNLINK"\n");
+			"not compatible with neither --"OPT_IN_MEMORY" nor "
+			"--"OPT_HUGE_UNLINK"\n");
 		return -1;
 	}
 
@@ -1386,6 +1394,8 @@ eal_common_usage(void)
 	       "                      Set specific log level\n"
 	       "  -v                  Display version information on startup\n"
 	       "  -h, --help          This help\n"
+	       "  --"OPT_IN_MEMORY"         Operate entirely in memory. This will\n"
+	       "                      disable secondary process support\n"
 	       "\nEAL options for DEBUG use only:\n"
 	       "  --"OPT_HUGE_UNLINK"       Unlink hugepage files after init\n"
 	       "  --"OPT_NO_HUGE"           Use malloc instead of hugetlbfs\n"
diff --git a/lib/librte_eal/common/eal_internal_cfg.h b/lib/librte_eal/common/eal_internal_cfg.h
index d66cd0313..00ee6e06e 100644
--- a/lib/librte_eal/common/eal_internal_cfg.h
+++ b/lib/librte_eal/common/eal_internal_cfg.h
@@ -41,6 +41,10 @@ struct internal_config {
 	volatile unsigned vmware_tsc_map; /**< true to use VMware TSC mapping
 										* instead of native TSC */
 	volatile unsigned no_shconf;      /**< true if there is no shared config */
+	volatile unsigned in_memory;
+	/**< true if DPDK should operate entirely in-memory and not create any
+	 * shared files or runtime data.
+	 */
 	volatile unsigned create_uio_dev; /**< true to create /dev/uioX devices */
 	volatile enum rte_proc_type_t process_type; /**< multi-process proc type */
 	/** true to try allocating memory on specific sockets */
diff --git a/lib/librte_eal/common/eal_options.h b/lib/librte_eal/common/eal_options.h
index 6d92f64a8..96e166787 100644
--- a/lib/librte_eal/common/eal_options.h
+++ b/lib/librte_eal/common/eal_options.h
@@ -45,6 +45,8 @@ enum {
 	OPT_NO_PCI_NUM,
 #define OPT_NO_SHCONF         "no-shconf"
 	OPT_NO_SHCONF_NUM,
+#define OPT_IN_MEMORY         "in-memory"
+	OPT_IN_MEMORY_NUM,
 #define OPT_SOCKET_MEM        "socket-mem"
 	OPT_SOCKET_MEM_NUM,
 #define OPT_SOCKET_LIMIT        "socket-limit"
-- 
2.17.1

* [PATCH v2 8/9] doc: add deprecation notice for EAL command line options
  2018-06-01 17:15 [PATCH 0/9] Support running DPDK without hugetlbfs mountpoint Anatoly Burakov
                   ` (16 preceding siblings ...)
  2018-07-13 10:27 ` [PATCH v2 7/9] eal: add --in-memory option Anatoly Burakov
@ 2018-07-13 10:27 ` Anatoly Burakov
  2018-07-13 12:13   ` Thomas Monjalon
  2018-07-13 10:27 ` [PATCH v2 9/9] mem: support in-memory mode Anatoly Burakov
  18 siblings, 1 reply; 35+ messages in thread
From: Anatoly Burakov @ 2018-07-13 10:27 UTC (permalink / raw)
  To: dev
  Cc: Neil Horman, John McNamara, Marko Kovacevic, ray.kinsella,
	kuralamudhan.ramakrishnan, louise.m.daly, bruce.richardson,
	ferruh.yigit, konstantin.ananyev, thomas

Options --no-shconf and --huge-unlink will be removed and
replaced with the --in-memory option, which will be a superset
of these two and an officially supported method to run DPDK
entirely in memory.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---

Notes:
    RFC->v1:
    - Add this patch

 doc/guides/rel_notes/deprecation.rst | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
index 5de59833d..dd1b5c5d8 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -8,6 +8,11 @@ API and ABI deprecation notices are to be posted here.
 Deprecation Notices
 -------------------
 
+* eal: command-line options ``--no-shconf`` and ``--huge-unlink`` will be
+  removed and replaced with a single option, ``--in-memory``, which will
+  enable DPDK to operate entirely in memory, without creating any files on any
+  filesystem.
+
 * eal: DPDK runtime configuration file (located at
   ``/var/run/.<prefix>_config``) will be moved. The new path will be as follows:
 
-- 
2.17.1

* [PATCH v2 9/9] mem: support in-memory mode
  2018-06-01 17:15 [PATCH 0/9] Support running DPDK without hugetlbfs mountpoint Anatoly Burakov
                   ` (17 preceding siblings ...)
  2018-07-13 10:27 ` [PATCH v2 8/9] doc: add deprecation notice for EAL command line options Anatoly Burakov
@ 2018-07-13 10:27 ` Anatoly Burakov
  2018-07-13 12:15   ` Thomas Monjalon
  18 siblings, 1 reply; 35+ messages in thread
From: Anatoly Burakov @ 2018-07-13 10:27 UTC (permalink / raw)
  To: dev
  Cc: ray.kinsella, kuralamudhan.ramakrishnan, louise.m.daly,
	bruce.richardson, ferruh.yigit, konstantin.ananyev, thomas

Implement the final piece of the in-memory mode puzzle - enable running
DPDK entirely in memory, without creating any files.

To do it, use mmap with MAP_HUGETLB and size flags to enable DPDK to work
without hugetlbfs mountpoints. In order to enable this, a few things needed
to be changed.

First of all, we need to allow empty hugetlbfs mountpoints in
hugepage_info and handle them correctly (by not trying to create any
files or lock any directories).

Next, we need to reorder the mapping sequence, because the page is not
really allocated until the page fault, and we cannot get its IOVA
address before we trigger the page fault.

Finally, decide at compile time whether we are going to be supporting
anonymous hugepages or not, because we cannot check for it at runtime.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---

Notes:
    RFC->v1:
    - Drop memfd and instead use mmap() with MAP_HUGETLB. This will drop the
      kernel requirements down to 3.8, and does not impose any restrictions
      glibc (as far as i known).
    
      Unfortunately, there's a bit of an issue with this approach, because
      mmap() is stupid and will happily ignore unsupported arguments. This
      means that if the binary were to be compiled on a 3.8+ kernel but run
      on a pre-3.8 kernel (such as currently supported minimum of 3.2), then
      most likely the memory would be allocated using regular pages, causing
      unthinkable performance degradation. No solution to this problem is
      currently known to me.

 .../linuxapp/eal/eal_hugepage_info.c          |  91 +++++++-----
 lib/librte_eal/linuxapp/eal/eal_memalloc.c    | 130 +++++++++++-------
 lib/librte_eal/linuxapp/eal/eal_memory.c      |   3 +-
 3 files changed, 139 insertions(+), 85 deletions(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal_hugepage_info.c b/lib/librte_eal/linuxapp/eal/eal_hugepage_info.c
index 7f8e2fd9c..3a7d4b222 100644
--- a/lib/librte_eal/linuxapp/eal/eal_hugepage_info.c
+++ b/lib/librte_eal/linuxapp/eal/eal_hugepage_info.c
@@ -18,6 +18,8 @@
 #include <sys/queue.h>
 #include <sys/stat.h>
 
+#include <linux/mman.h> /* for hugetlb-related flags */
+
 #include <rte_memory.h>
 #include <rte_eal.h>
 #include <rte_launch.h>
@@ -313,11 +315,49 @@ compare_hpi(const void *a, const void *b)
 	return hpi_b->hugepage_sz - hpi_a->hugepage_sz;
 }
 
+static void
+calc_num_pages(struct hugepage_info *hpi, struct dirent *dirent)
+{
+	uint64_t total_pages = 0;
+	unsigned int i;
+
+	/*
+	 * first, try to put all hugepages into relevant sockets, but
+	 * if first attempts fails, fall back to collecting all pages
+	 * in one socket and sorting them later
+	 */
+	total_pages = 0;
+	/* we also don't want to do this for legacy init */
+	if (!internal_config.legacy_mem)
+		for (i = 0; i < rte_socket_count(); i++) {
+			int socket = rte_socket_id_by_idx(i);
+			unsigned int num_pages =
+					get_num_hugepages_on_node(
+						dirent->d_name, socket);
+			hpi->num_pages[socket] = num_pages;
+			total_pages += num_pages;
+		}
+	/*
+	 * we failed to sort memory from the get go, so fall
+	 * back to old way
+	 */
+	if (total_pages == 0) {
+		hpi->num_pages[0] = get_num_hugepages(dirent->d_name);
+
+#ifndef RTE_ARCH_64
+		/* for 32-bit systems, limit number of hugepages to
+		 * 1GB per page size */
+		hpi->num_pages[0] = RTE_MIN(hpi->num_pages[0],
+				RTE_PGSIZE_1G / hpi->hugepage_sz);
+#endif
+	}
+}
+
 static int
 hugepage_info_init(void)
 {	const char dirent_start_text[] = "hugepages-";
 	const size_t dirent_start_len = sizeof(dirent_start_text) - 1;
-	unsigned int i, total_pages, num_sizes = 0;
+	unsigned int i, num_sizes = 0;
 	DIR *dir;
 	struct dirent *dirent;
 
@@ -355,6 +395,22 @@ hugepage_info_init(void)
 					"%" PRIu64 " reserved, but no mounted "
 					"hugetlbfs found for that size\n",
 					num_pages, hpi->hugepage_sz);
+			/* if we have kernel support for reserving hugepages
+			 * through mmap, and we're in in-memory mode, treat this
+			 * page size as valid. we cannot be in legacy mode at
+			 * this point because we've checked this earlier in the
+			 * init process.
+			 */
+#ifdef MAP_HUGE_SHIFT
+			if (internal_config.in_memory) {
+				RTE_LOG(DEBUG, EAL, "In-memory mode enabled, "
+					"hugepages of size %" PRIu64 " bytes "
+					"will be allocated anonymously\n",
+					hpi->hugepage_sz);
+				calc_num_pages(hpi, dirent);
+				num_sizes++;
+			}
+#endif
 			continue;
 		}
 
@@ -371,35 +427,7 @@ hugepage_info_init(void)
 		if (clear_hugedir(hpi->hugedir) == -1)
 			break;
 
-		/*
-		 * first, try to put all hugepages into relevant sockets, but
-		 * if first attempts fails, fall back to collecting all pages
-		 * in one socket and sorting them later
-		 */
-		total_pages = 0;
-		/* we also don't want to do this for legacy init */
-		if (!internal_config.legacy_mem)
-			for (i = 0; i < rte_socket_count(); i++) {
-				int socket = rte_socket_id_by_idx(i);
-				unsigned int num_pages =
-						get_num_hugepages_on_node(
-							dirent->d_name, socket);
-				hpi->num_pages[socket] = num_pages;
-				total_pages += num_pages;
-			}
-		/*
-		 * we failed to sort memory from the get go, so fall
-		 * back to old way
-		 */
-		if (total_pages == 0)
-			hpi->num_pages[0] = get_num_hugepages(dirent->d_name);
-
-#ifndef RTE_ARCH_64
-		/* for 32-bit systems, limit number of hugepages to
-		 * 1GB per page size */
-		hpi->num_pages[0] = RTE_MIN(hpi->num_pages[0],
-					    RTE_PGSIZE_1G / hpi->hugepage_sz);
-#endif
+		calc_num_pages(hpi, dirent);
 
 		num_sizes++;
 	}
@@ -423,8 +451,7 @@ hugepage_info_init(void)
 
 		for (j = 0; j < RTE_MAX_NUMA_NODES; j++)
 			num_pages += hpi->num_pages[j];
-		if (strnlen(hpi->hugedir, sizeof(hpi->hugedir)) != 0 &&
-				num_pages > 0)
+		if (num_pages > 0)
 			return 0;
 	}
 
diff --git a/lib/librte_eal/linuxapp/eal/eal_memalloc.c b/lib/librte_eal/linuxapp/eal/eal_memalloc.c
index d610923b8..10c959da4 100644
--- a/lib/librte_eal/linuxapp/eal/eal_memalloc.c
+++ b/lib/librte_eal/linuxapp/eal/eal_memalloc.c
@@ -28,6 +28,7 @@
 #include <numaif.h>
 #endif
 #include <linux/falloc.h>
+#include <linux/mman.h> /* for hugetlb-related mmap flags */
 
 #include <rte_common.h>
 #include <rte_log.h>
@@ -41,6 +42,15 @@
 #include "eal_memalloc.h"
 #include "eal_private.h"
 
+const int anonymous_hugepages_supported =
+#ifdef MAP_HUGE_SHIFT
+		1;
+#define RTE_MAP_HUGE_SHIFT MAP_HUGE_SHIFT
+#else
+		0;
+#define RTE_MAP_HUGE_SHIFT 26
+#endif
+
 /*
  * not all kernel version support fallocate on hugetlbfs, so fall back to
  * ftruncate and disallow deallocation if fallocate is not supported.
@@ -461,6 +471,8 @@ alloc_seg(struct rte_memseg *ms, void *addr, int socket_id,
 	int cur_socket_id = 0;
 #endif
 	uint64_t map_offset;
+	rte_iova_t iova;
+	void *va;
 	char path[PATH_MAX];
 	int ret = 0;
 	int fd;
@@ -468,43 +480,57 @@ alloc_seg(struct rte_memseg *ms, void *addr, int socket_id,
 	int flags;
 	void *new_addr;
 
-	/* takes out a read lock on segment or segment list */
-	fd = get_seg_fd(path, sizeof(path), hi, list_idx, seg_idx);
-	if (fd < 0) {
-		RTE_LOG(ERR, EAL, "Couldn't get fd on hugepage file\n");
-		return -1;
-	}
-
 	alloc_sz = hi->hugepage_sz;
-	if (internal_config.single_file_segments) {
-		map_offset = seg_idx * alloc_sz;
-		ret = resize_hugefile(fd, path, list_idx, seg_idx, map_offset,
-				alloc_sz, true);
-		if (ret < 0)
-			goto resized;
+	if (internal_config.in_memory && anonymous_hugepages_supported) {
+		int log2, flags;
+
+		log2 = rte_log2_u32(alloc_sz);
+		/* as per mmap() manpage, all page sizes are log2 of page size
+		 * shifted by MAP_HUGE_SHIFT
+		 */
+		flags = (log2 << RTE_MAP_HUGE_SHIFT) | MAP_HUGETLB | MAP_FIXED |
+				MAP_PRIVATE | MAP_ANONYMOUS;
+		fd = -1;
+		va = mmap(addr, alloc_sz, PROT_READ | PROT_WRITE, flags, -1, 0);
 	} else {
-		map_offset = 0;
-		if (ftruncate(fd, alloc_sz) < 0) {
-			RTE_LOG(DEBUG, EAL, "%s(): ftruncate() failed: %s\n",
-				__func__, strerror(errno));
-			goto resized;
+		/* takes out a read lock on segment or segment list */
+		fd = get_seg_fd(path, sizeof(path), hi, list_idx, seg_idx);
+		if (fd < 0) {
+			RTE_LOG(ERR, EAL, "Couldn't get fd on hugepage file\n");
+			return -1;
 		}
-		if (internal_config.hugepage_unlink) {
-			if (unlink(path)) {
-				RTE_LOG(DEBUG, EAL, "%s(): unlink() failed: %s\n",
+
+		if (internal_config.single_file_segments) {
+			map_offset = seg_idx * alloc_sz;
+			ret = resize_hugefile(fd, path, list_idx, seg_idx,
+					map_offset, alloc_sz, true);
+			if (ret < 0)
+				goto resized;
+		} else {
+			map_offset = 0;
+			if (ftruncate(fd, alloc_sz) < 0) {
+				RTE_LOG(DEBUG, EAL, "%s(): ftruncate() failed: %s\n",
 					__func__, strerror(errno));
 				goto resized;
 			}
+			if (internal_config.hugepage_unlink) {
+				if (unlink(path)) {
+					RTE_LOG(DEBUG, EAL, "%s(): unlink() failed: %s\n",
+						__func__, strerror(errno));
+					goto resized;
+				}
+			}
 		}
+
+		/*
+		 * map the segment, and populate page tables, the kernel fills
+		 * this segment with zeros if it's a new page.
+		 */
+		va = mmap(addr, alloc_sz, PROT_READ | PROT_WRITE,
+				MAP_SHARED | MAP_POPULATE | MAP_FIXED, fd,
+				map_offset);
 	}
 
-	/*
-	 * map the segment, and populate page tables, the kernel fills this
-	 * segment with zeros if it's a new page.
-	 */
-	void *va = mmap(addr, alloc_sz, PROT_READ | PROT_WRITE,
-			MAP_SHARED | MAP_POPULATE | MAP_FIXED, fd, map_offset);
-
 	if (va == MAP_FAILED) {
 		RTE_LOG(DEBUG, EAL, "%s(): mmap() failed: %s\n", __func__,
 			strerror(errno));
@@ -519,24 +545,6 @@ alloc_seg(struct rte_memseg *ms, void *addr, int socket_id,
 		goto resized;
 	}
 
-	rte_iova_t iova = rte_mem_virt2iova(addr);
-	if (iova == RTE_BAD_PHYS_ADDR) {
-		RTE_LOG(DEBUG, EAL, "%s(): can't get IOVA addr\n",
-			__func__);
-		goto mapped;
-	}
-
-#ifdef RTE_EAL_NUMA_AWARE_HUGEPAGES
-	move_pages(getpid(), 1, &addr, NULL, &cur_socket_id, 0);
-
-	if (cur_socket_id != socket_id) {
-		RTE_LOG(DEBUG, EAL,
-				"%s(): allocation happened on wrong socket (wanted %d, got %d)\n",
-			__func__, socket_id, cur_socket_id);
-		goto mapped;
-	}
-#endif
-
 	/* In linux, hugetlb limitations, like cgroup, are
 	 * enforced at fault time instead of mmap(), even
 	 * with the option of MAP_POPULATE. Kernel will send
@@ -549,9 +557,6 @@ alloc_seg(struct rte_memseg *ms, void *addr, int socket_id,
 			(unsigned int)(alloc_sz >> 20));
 		goto mapped;
 	}
-	/* for non-single file segments, we can close fd here */
-	if (!internal_config.single_file_segments)
-		close(fd);
 
 	/* we need to trigger a write to the page to enforce page fault and
 	 * ensure that page is accessible to us, but we can't overwrite value
@@ -560,6 +565,28 @@ alloc_seg(struct rte_memseg *ms, void *addr, int socket_id,
 	 */
 	*(volatile int *)addr = *(volatile int *)addr;
 
+	iova = rte_mem_virt2iova(addr);
+	if (iova == RTE_BAD_PHYS_ADDR) {
+		RTE_LOG(DEBUG, EAL, "%s(): can't get IOVA addr\n",
+			__func__);
+		goto mapped;
+	}
+
+#ifdef RTE_EAL_NUMA_AWARE_HUGEPAGES
+	move_pages(getpid(), 1, &addr, NULL, &cur_socket_id, 0);
+
+	if (cur_socket_id != socket_id) {
+		RTE_LOG(DEBUG, EAL,
+				"%s(): allocation happened on wrong socket (wanted %d, got %d)\n",
+			__func__, socket_id, cur_socket_id);
+		goto mapped;
+	}
+#endif
+	/* for non-single file segments that aren't in-memory, we can close fd
+	 * here */
+	if (!internal_config.single_file_segments && !internal_config.in_memory)
+		close(fd);
+
 	ms->addr = addr;
 	ms->hugepage_sz = alloc_sz;
 	ms->len = alloc_sz;
@@ -595,6 +622,7 @@ alloc_seg(struct rte_memseg *ms, void *addr, int socket_id,
 	} else {
 		/* only remove file if we can take out a write lock */
 		if (internal_config.hugepage_unlink == 0 &&
+				internal_config.in_memory == 0 &&
 				lock(fd, LOCK_EX) == 1)
 			unlink(path);
 		close(fd);
@@ -705,7 +733,7 @@ alloc_seg_walk(const struct rte_memseg_list *msl, void *arg)
 	 * during init, we already hold a write lock, so don't try to take out
 	 * another one.
 	 */
-	if (wa->hi->lock_descriptor == -1) {
+	if (wa->hi->lock_descriptor == -1 && !internal_config.in_memory) {
 		dir_fd = open(wa->hi->hugedir, O_RDONLY);
 		if (dir_fd < 0) {
 			RTE_LOG(ERR, EAL, "%s(): Cannot open '%s': %s\n",
@@ -809,7 +837,7 @@ free_seg_walk(const struct rte_memseg_list *msl, void *arg)
 	 * during init, we already hold a write lock, so don't try to take out
 	 * another one.
 	 */
-	if (wa->hi->lock_descriptor == -1) {
+	if (wa->hi->lock_descriptor == -1 && !internal_config.in_memory) {
 		dir_fd = open(wa->hi->hugedir, O_RDONLY);
 		if (dir_fd < 0) {
 			RTE_LOG(ERR, EAL, "%s(): Cannot open '%s': %s\n",
diff --git a/lib/librte_eal/linuxapp/eal/eal_memory.c b/lib/librte_eal/linuxapp/eal/eal_memory.c
index ddfa8b133..dbf19499e 100644
--- a/lib/librte_eal/linuxapp/eal/eal_memory.c
+++ b/lib/librte_eal/linuxapp/eal/eal_memory.c
@@ -1088,8 +1088,7 @@ get_socket_mem_size(int socket)
 
 	for (i = 0; i < internal_config.num_hugepage_sizes; i++){
 		struct hugepage_info *hpi = &internal_config.hugepage_info[i];
-		if (strnlen(hpi->hugedir, sizeof(hpi->hugedir)) != 0)
-			size += hpi->hugepage_sz * hpi->num_pages[socket];
+		size += hpi->hugepage_sz * hpi->num_pages[socket];
 	}
 
 	return size;
-- 
2.17.1

* Re: [PATCH v2 7/9] eal: add --in-memory option
  2018-07-13 10:27 ` [PATCH v2 7/9] eal: add --in-memory option Anatoly Burakov
@ 2018-07-13 12:13   ` Thomas Monjalon
  2018-07-13 12:27     ` Burakov, Anatoly
  0 siblings, 1 reply; 35+ messages in thread
From: Thomas Monjalon @ 2018-07-13 12:13 UTC (permalink / raw)
  To: Anatoly Burakov
  Cc: dev, ray.kinsella, kuralamudhan.ramakrishnan, louise.m.daly,
	bruce.richardson, ferruh.yigit, konstantin.ananyev

13/07/2018 12:27, Anatoly Burakov:
> This command-line option will cause DPDK to operate entirely in
> memory and not create any shared files at runtime, including any
> shared configuration or hugetlbfs files. This is useful for debug
> purposes, as well as for certain use cases like containers or
> automatic memory cleanup.
> 
> Currently, this option acts as a strict superset of --no-shconf and
> --huge-unlink commands.
> 
> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>

I would like to see some support or review for this feature.

* Re: [PATCH v2 8/9] doc: add deprecation notice for EAL command line options
  2018-07-13 10:27 ` [PATCH v2 8/9] doc: add deprecation notice for EAL command line options Anatoly Burakov
@ 2018-07-13 12:13   ` Thomas Monjalon
  2018-07-13 12:29     ` Burakov, Anatoly
  0 siblings, 1 reply; 35+ messages in thread
From: Thomas Monjalon @ 2018-07-13 12:13 UTC (permalink / raw)
  To: Anatoly Burakov
  Cc: dev, Neil Horman, John McNamara, Marko Kovacevic, ray.kinsella,
	kuralamudhan.ramakrishnan, louise.m.daly, bruce.richardson,
	ferruh.yigit, konstantin.ananyev

13/07/2018 12:27, Anatoly Burakov:
> Options --no-shconf and --huge-unlink will be removed, and
> replaced with --in-memory option, which will be a superset
> of these two, and an officially supported method to run DPDK
> entirely in memory.
> 
> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>

The deprecation notice should be sent separately in order to wait
for enough agreement.

* Re: [PATCH v2 9/9] mem: support in-memory mode
  2018-07-13 10:27 ` [PATCH v2 9/9] mem: support in-memory mode Anatoly Burakov
@ 2018-07-13 12:15   ` Thomas Monjalon
  0 siblings, 0 replies; 35+ messages in thread
From: Thomas Monjalon @ 2018-07-13 12:15 UTC (permalink / raw)
  To: Anatoly Burakov
  Cc: dev, ray.kinsella, kuralamudhan.ramakrishnan, louise.m.daly,
	bruce.richardson, ferruh.yigit, konstantin.ananyev

There is a compilation error:

../lib/librte_eal/linuxapp/eal/eal_memalloc.c: In function ‘alloc_seg’:
../lib/librte_eal/linuxapp/eal/eal_memalloc.c:619:3: error:
‘map_offset’ may be used uninitialized in this function


13/07/2018 12:27, Anatoly Burakov:
> Implement the final piece of the in-memory mode puzzle - enable running
> DPDK entirely in memory, without creating any files.
> 
> To do it, use mmap with MAP_HUGETLB and size flags to enable DPDK to work
> without hugetlbfs mountpoints. In order to enable this, a few things needed
> to be changed.
> 
> First of all, we need to allow empty hugetlbfs mountpoints in
> hugepage_info, and handle them correctly (by not trying to create any
> files and lock any directories).
> 
> Next, we need to reorder the mapping sequence, because the page is not
> really allocated until the page fault, and we cannot get its IOVA
> address before we trigger the page fault.
> 
> Finally, decide at compile time whether we are going to be supporting
> anonymous hugepages or not, because we cannot check for it at runtime.
> 
> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
> ---
> 
> Notes:
>     RFC->v1:
>     - Drop memfd and instead use mmap() with MAP_HUGETLB. This lowers the
>       kernel requirement to 3.8 and does not impose any restrictions on
>       glibc (as far as I know).
>     
>       Unfortunately, there is an issue with this approach: mmap() silently
>       ignores flags it does not support. This means that if the binary were
>       compiled on a 3.8+ kernel but run on a pre-3.8 kernel (such as the
>       currently supported minimum of 3.2), the memory would most likely be
>       allocated using regular pages, causing severe performance
>       degradation. No solution to this problem is currently known to me.

* Re: [PATCH v2 7/9] eal: add --in-memory option
  2018-07-13 12:13   ` Thomas Monjalon
@ 2018-07-13 12:27     ` Burakov, Anatoly
  0 siblings, 0 replies; 35+ messages in thread
From: Burakov, Anatoly @ 2018-07-13 12:27 UTC (permalink / raw)
  To: Thomas Monjalon
  Cc: dev, ray.kinsella, kuralamudhan.ramakrishnan, louise.m.daly,
	bruce.richardson, ferruh.yigit, konstantin.ananyev

On 13-Jul-18 1:13 PM, Thomas Monjalon wrote:
> 13/07/2018 12:27, Anatoly Burakov:
>> This command-line option will cause DPDK to operate entirely in
>> memory and not create any shared files at runtime, including any
>> shared configuration or hugetlbfs files. This is useful for debug
>> purposes, as well as for certain use cases like containers or
>> automatic memory cleanup.
>>
>> Currently, this option acts as a strict superset of --no-shconf and
>> --huge-unlink commands.
>>
>> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
> 
> I would like to see some support or review for this feature.
> 

While it can be justified by use cases such as running a DPDK
process without having to clean up its hugepages afterwards (less of a
concern since 18.05, but still a concern if the primary crashes), what
this really does is fix the no-shconf/huge-unlink options so that they
are no longer half-measures.

Both of these options effectively disable secondary processes, but
they don't do it consistently: huge-unlink cleans up hugepages after
allocating them but leaves the shared config on, while no-shconf
disables the shared config but leaves hugepages in place. Since 18.05,
huge-unlink hasn't worked anyway (it wasn't implemented, which was my
omission), and because EAL now relies on fbarrays to store some data,
no-shconf wasn't working correctly either: even though the shared config
wasn't created, two primaries still couldn't share a prefix with
--no-shconf (see the first patch).

So, this patchset is really an acknowledgement of the fact that both
the huge-unlink and no-shconf options are there to disable secondary
processes and stop leaving files on the filesystem. I just went one
step further: instead of allocating hugepages and then removing them,
we don't create them in the first place and map them anonymously.

-- 
Thanks,
Anatoly

* Re: [PATCH v2 8/9] doc: add deprecation notice for EAL command line options
  2018-07-13 12:13   ` Thomas Monjalon
@ 2018-07-13 12:29     ` Burakov, Anatoly
  0 siblings, 0 replies; 35+ messages in thread
From: Burakov, Anatoly @ 2018-07-13 12:29 UTC (permalink / raw)
  To: Thomas Monjalon
  Cc: dev, Neil Horman, John McNamara, Marko Kovacevic, ray.kinsella,
	kuralamudhan.ramakrishnan, louise.m.daly, bruce.richardson,
	ferruh.yigit, konstantin.ananyev

On 13-Jul-18 1:13 PM, Thomas Monjalon wrote:
> 13/07/2018 12:27, Anatoly Burakov:
>> Options --no-shconf and --huge-unlink will be removed, and
>> replaced with --in-memory option, which will be a superset
>> of these two, and an officially supported method to run DPDK
>> entirely in memory.
>>
>> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
> 
> The deprecation notice should be sent separately in order to wait
> for enough agreement.
> 

Really, we don't have to deprecate the old options. It would be nice to
remove them as half-measures and replace them with a proper
implementation, but it's not strictly necessary, so I'll make it a
separate patch.

-- 
Thanks,
Anatoly

* [PATCH v3 0/8] Support running DPDK without hugetlbfs mountpoint
  2018-07-13 10:27 ` [PATCH v2 0/9] Support running DPDK without hugetlbfs mountpoint Anatoly Burakov
@ 2018-07-13 12:47   ` Anatoly Burakov
  2018-07-13 13:41     ` Thomas Monjalon
  2018-07-13 12:47   ` [PATCH v3 1/8] fbarray: support no-shconf mode Anatoly Burakov
                     ` (7 subsequent siblings)
  8 siblings, 1 reply; 35+ messages in thread
From: Anatoly Burakov @ 2018-07-13 12:47 UTC (permalink / raw)
  To: dev
  Cc: ray.kinsella, kuralamudhan.ramakrishnan, louise.m.daly,
	bruce.richardson, ferruh.yigit, konstantin.ananyev, thomas

This patchset adds a new command-line option "--in-memory",
which takes old debug options "--huge-unlink" and
"--no-shconf", and enhances them with additional
functionality. This will allow DPDK to reserve hugepages
anonymously instead of using hugetlbfs mountpoints. Coupled
with the fact that this option also effectively enables both
"--no-shconf" and "--huge-unlink" modes, DPDK will be able
to run entirely in memory and not create any shared files
while running - neither hugepages nor any runtime data.

This will, of course, disable secondary processes, but for
use-cases this is targeted at (containers etc.), this is
not a problem.

Older revisions required kernel 4.14+ and a fairly new glibc, but
since memfd is no longer used (anonymous mmap() is used instead), the
minimum supported kernel version has dropped to 3.8.

v2->v3 changes:
- Fix compile issue in patch 9 (now 8)
- Drop deprecation notice (will be sent separately)

v1->v2 changes:
- Rebase on latest master
- Fix patch 5 to include the check from patch 6, as its commit
  message states

RFC->v1 changes:
- Dropped memfd, using anonymous mmap() instead
- Do not deprecate old command-line parameters, instead
  use them as they are, and add a deprecation notice to
  remove them in the next release.

Anatoly Burakov (8):
  fbarray: support no-shconf mode
  ipc: add support for no-shconf mode
  eal: add support for no-shconf for hugepage info
  eal: add support for no-shconf in hugepage data file
  eal: do not create runtime dir in no-shconf mode
  mem: add support for hugepage-unlink mode
  eal: add --in-memory option
  mem: support in-memory mode

 lib/librte_eal/bsdapp/eal/eal.c               |   3 +-
 lib/librte_eal/bsdapp/eal/eal_hugepage_info.c |   4 +
 lib/librte_eal/common/eal_common_fbarray.c    |  71 +++++----
 lib/librte_eal/common/eal_common_options.c    |  20 ++-
 lib/librte_eal/common/eal_common_proc.c       |  25 +++
 lib/librte_eal/common/eal_internal_cfg.h      |   4 +
 lib/librte_eal/common/eal_options.h           |   2 +
 lib/librte_eal/linuxapp/eal/eal.c             |   3 +-
 .../linuxapp/eal/eal_hugepage_info.c          |  95 +++++++----
 lib/librte_eal/linuxapp/eal/eal_memalloc.c    | 150 ++++++++++++------
 lib/librte_eal/linuxapp/eal/eal_memory.c      |  16 +-
 11 files changed, 276 insertions(+), 117 deletions(-)

-- 
2.17.1

* [PATCH v3 1/8] fbarray: support no-shconf mode
  2018-07-13 10:27 ` [PATCH v2 0/9] Support running DPDK without hugetlbfs mountpoint Anatoly Burakov
  2018-07-13 12:47   ` [PATCH v3 0/8] " Anatoly Burakov
@ 2018-07-13 12:47   ` Anatoly Burakov
  2018-07-13 12:47   ` [PATCH v3 2/8] ipc: add support for " Anatoly Burakov
                     ` (6 subsequent siblings)
  8 siblings, 0 replies; 35+ messages in thread
From: Anatoly Burakov @ 2018-07-13 12:47 UTC (permalink / raw)
  To: dev
  Cc: ray.kinsella, kuralamudhan.ramakrishnan, louise.m.daly,
	bruce.richardson, ferruh.yigit, konstantin.ananyev, thomas

When using the --no-shconf option, the expectation is that multiprocess
will not be supported, as no shared files are created. However, fbarray
still creates some shared files that prevent multiple processes with the
same prefix from starting.

Fix this by not creating shared files whenever the --no-shconf option is
specified. Since the virtual areas we get from eal_get_virtual_area() are
read-only, remap them as writable.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---

Notes:
    RFC->v1:
    - Use --no-shconf only

 lib/librte_eal/common/eal_common_fbarray.c | 71 +++++++++++++---------
 1 file changed, 42 insertions(+), 29 deletions(-)

diff --git a/lib/librte_eal/common/eal_common_fbarray.c b/lib/librte_eal/common/eal_common_fbarray.c
index 977174c4f..43caf3ced 100644
--- a/lib/librte_eal/common/eal_common_fbarray.c
+++ b/lib/librte_eal/common/eal_common_fbarray.c
@@ -705,39 +705,52 @@ rte_fbarray_init(struct rte_fbarray *arr, const char *name, unsigned int len,
 	if (data == NULL)
 		goto fail;
 
-	eal_get_fbarray_path(path, sizeof(path), name);
+	if (internal_config.no_shconf) {
+		/* remap virtual area as writable */
+		void *new_data = mmap(data, mmap_len, PROT_READ | PROT_WRITE,
+				MAP_FIXED | MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
+		if (new_data == MAP_FAILED) {
+			RTE_LOG(DEBUG, EAL, "%s(): couldn't remap anonymous memory: %s\n",
+					__func__, strerror(errno));
+			goto fail;
+		}
+	} else {
+		eal_get_fbarray_path(path, sizeof(path), name);
 
-	/*
-	 * Each fbarray is unique to process namespace, i.e. the filename
-	 * depends on process prefix. Try to take out a lock and see if we
-	 * succeed. If we don't, someone else is using it already.
-	 */
-	fd = open(path, O_CREAT | O_RDWR, 0600);
-	if (fd < 0) {
-		RTE_LOG(DEBUG, EAL, "%s(): couldn't open %s: %s\n", __func__,
-				path, strerror(errno));
-		rte_errno = errno;
-		goto fail;
-	} else if (flock(fd, LOCK_EX | LOCK_NB)) {
-		RTE_LOG(DEBUG, EAL, "%s(): couldn't lock %s: %s\n", __func__,
-				path, strerror(errno));
-		rte_errno = EBUSY;
-		goto fail;
-	}
+		/*
+		 * Each fbarray is unique to process namespace, i.e. the
+		 * filename depends on process prefix. Try to take out a lock
+		 * and see if we succeed. If we don't, someone else is using it
+		 * already.
+		 */
+		fd = open(path, O_CREAT | O_RDWR, 0600);
+		if (fd < 0) {
+			RTE_LOG(DEBUG, EAL, "%s(): couldn't open %s: %s\n",
+					__func__, path, strerror(errno));
+			rte_errno = errno;
+			goto fail;
+		} else if (flock(fd, LOCK_EX | LOCK_NB)) {
+			RTE_LOG(DEBUG, EAL, "%s(): couldn't lock %s: %s\n",
+					__func__, path, strerror(errno));
+			rte_errno = EBUSY;
+			goto fail;
+		}
 
-	/* take out a non-exclusive lock, so that other processes could still
-	 * attach to it, but no other process could reinitialize it.
-	 */
-	if (flock(fd, LOCK_SH | LOCK_NB)) {
-		rte_errno = errno;
-		goto fail;
-	}
+		/* take out a non-exclusive lock, so that other processes could
+		 * still attach to it, but no other process could reinitialize
+		 * it.
+		 */
+		if (flock(fd, LOCK_SH | LOCK_NB)) {
+			rte_errno = errno;
+			goto fail;
+		}
 
-	if (resize_and_map(fd, data, mmap_len))
-		goto fail;
+		if (resize_and_map(fd, data, mmap_len))
+			goto fail;
 
-	/* we've mmap'ed the file, we can now close the fd */
-	close(fd);
+		/* we've mmap'ed the file, we can now close the fd */
+		close(fd);
+	}
 
 	/* initialize the data */
 	memset(data, 0, mmap_len);
-- 
2.17.1

* [PATCH v3 2/8] ipc: add support for no-shconf mode
  2018-07-13 10:27 ` [PATCH v2 0/9] Support running DPDK without hugetlbfs mountpoint Anatoly Burakov
  2018-07-13 12:47   ` [PATCH v3 0/8] " Anatoly Burakov
  2018-07-13 12:47   ` [PATCH v3 1/8] fbarray: support no-shconf mode Anatoly Burakov
@ 2018-07-13 12:47   ` Anatoly Burakov
  2018-07-13 12:47   ` [PATCH v3 3/8] eal: add support for no-shconf for hugepage info Anatoly Burakov
                     ` (5 subsequent siblings)
  8 siblings, 0 replies; 35+ messages in thread
From: Anatoly Burakov @ 2018-07-13 12:47 UTC (permalink / raw)
  To: dev
  Cc: ray.kinsella, kuralamudhan.ramakrishnan, louise.m.daly,
	bruce.richardson, ferruh.yigit, konstantin.ananyev, thomas

IPC is an inter-process communication mechanism. Since no secondary
processes can ever run in no-shconf mode, IPC would be useless, so do
not initialize it in the first place. For API convenience, registering
callbacks is still allowed, although they will never be triggered, and
sending requests or replies becomes a no-op that returns success.
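
The early-return guard added to rte_mp_channel_init(), rte_mp_request_sync(),
rte_mp_request_async() and rte_mp_reply() can be modelled in isolation as
below. This is only a sketch: `no_shconf`, `sent_count` and
`mp_send_sketch()` are hypothetical stand-ins for internal_config.no_shconf
and the real send path, not EAL symbols.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for internal_config.no_shconf. */
static bool no_shconf;

/* Counts messages that were actually "sent" (illustration only). */
static int sent_count;

/*
 * Sketch of the guard: in no-shconf mode there can be no peer process,
 * so report success without touching any socket.
 */
static int
mp_send_sketch(const char *msg)
{
	if (no_shconf)
		return 0;

	/* a real implementation would write msg to the IPC socket here */
	sent_count++;
	printf("sent: %s\n", msg);
	return 0;
}
```

Returning 0 instead of an error keeps callers that unconditionally send
messages working unchanged when the option is enabled.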

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---

Notes:
    RFC->v1:
    - Use --no-shconf only

 lib/librte_eal/common/eal_common_proc.c | 25 +++++++++++++++++++++++++
 1 file changed, 25 insertions(+)

diff --git a/lib/librte_eal/common/eal_common_proc.c b/lib/librte_eal/common/eal_common_proc.c
index f010ef59e..c19b4b406 100644
--- a/lib/librte_eal/common/eal_common_proc.c
+++ b/lib/librte_eal/common/eal_common_proc.c
@@ -626,6 +626,14 @@ rte_mp_channel_init(void)
 	int dir_fd;
 	pthread_t mp_handle_tid, async_reply_handle_tid;
 
+	/* in no shared files mode, we do not have secondary processes support,
+	 * so no need to initialize IPC.
+	 */
+	if (internal_config.no_shconf) {
+		RTE_LOG(DEBUG, EAL, "No shared files mode enabled, IPC will be disabled\n");
+		return 0;
+	}
+
 	/* create filter path */
 	create_socket_path("*", path, sizeof(path));
 	strlcpy(mp_filter, basename(path), sizeof(mp_filter));
@@ -988,6 +996,12 @@ rte_mp_request_sync(struct rte_mp_msg *req, struct rte_mp_reply *reply,
 
 	if (check_input(req) == false)
 		return -1;
+
+	if (internal_config.no_shconf) {
+		RTE_LOG(DEBUG, EAL, "No shared files mode enabled, IPC is disabled\n");
+		return 0;
+	}
+
 	if (gettimeofday(&now, NULL) < 0) {
 		RTE_LOG(ERR, EAL, "Faile to get current time\n");
 		rte_errno = errno;
@@ -1072,6 +1086,12 @@ rte_mp_request_async(struct rte_mp_msg *req, const struct timespec *ts,
 
 	if (check_input(req) == false)
 		return -1;
+
+	if (internal_config.no_shconf) {
+		RTE_LOG(DEBUG, EAL, "No shared files mode enabled, IPC is disabled\n");
+		return 0;
+	}
+
 	if (gettimeofday(&now, NULL) < 0) {
 		RTE_LOG(ERR, EAL, "Faile to get current time\n");
 		rte_errno = errno;
@@ -1213,5 +1233,10 @@ rte_mp_reply(struct rte_mp_msg *msg, const char *peer)
 		return -1;
 	}
 
+	if (internal_config.no_shconf) {
+		RTE_LOG(DEBUG, EAL, "No shared files mode enabled, IPC is disabled\n");
+		return 0;
+	}
+
 	return mp_send(msg, peer, MP_REP);
 }
-- 
2.17.1

* [PATCH v3 3/8] eal: add support for no-shconf for hugepage info
  2018-07-13 10:27 ` [PATCH v2 0/9] Support running DPDK without hugetlbfs mountpoint Anatoly Burakov
                     ` (2 preceding siblings ...)
  2018-07-13 12:47   ` [PATCH v3 2/8] ipc: add support for " Anatoly Burakov
@ 2018-07-13 12:47   ` Anatoly Burakov
  2018-07-13 12:48   ` [PATCH v3 4/8] eal: add support for no-shconf in hugepage data file Anatoly Burakov
                     ` (4 subsequent siblings)
  8 siblings, 0 replies; 35+ messages in thread
From: Anatoly Burakov @ 2018-07-13 12:47 UTC (permalink / raw)
  To: dev
  Cc: Bruce Richardson, ray.kinsella, kuralamudhan.ramakrishnan,
	louise.m.daly, ferruh.yigit, konstantin.ananyev, thomas

Do not create any shared hugepage size info files if we were
asked to not create any shared files.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---

Notes:
    RFC->v1:
    - Use --no-shconf only

 lib/librte_eal/bsdapp/eal/eal_hugepage_info.c   | 4 ++++
 lib/librte_eal/linuxapp/eal/eal_hugepage_info.c | 4 ++++
 2 files changed, 8 insertions(+)

diff --git a/lib/librte_eal/bsdapp/eal/eal_hugepage_info.c b/lib/librte_eal/bsdapp/eal/eal_hugepage_info.c
index 836feb672..1e8f5df23 100644
--- a/lib/librte_eal/bsdapp/eal/eal_hugepage_info.c
+++ b/lib/librte_eal/bsdapp/eal/eal_hugepage_info.c
@@ -101,6 +101,10 @@ eal_hugepage_info_init(void)
 	hpi->num_pages[0] = num_buffers;
 	hpi->lock_descriptor = fd;
 
+	/* for no shared files mode, do not create shared memory config */
+	if (internal_config.no_shconf)
+		return 0;
+
 	tmp_hpi = create_shared_memory(eal_hugepage_info_path(),
 			sizeof(internal_config.hugepage_info));
 	if (tmp_hpi == NULL ) {
diff --git a/lib/librte_eal/linuxapp/eal/eal_hugepage_info.c b/lib/librte_eal/linuxapp/eal/eal_hugepage_info.c
index 7eca711ba..7f8e2fd9c 100644
--- a/lib/librte_eal/linuxapp/eal/eal_hugepage_info.c
+++ b/lib/librte_eal/linuxapp/eal/eal_hugepage_info.c
@@ -446,6 +446,10 @@ eal_hugepage_info_init(void)
 	if (hugepage_info_init() < 0)
 		return -1;
 
+	/* for no shared files mode, we're done */
+	if (internal_config.no_shconf)
+		return 0;
+
 	hpi = &internal_config.hugepage_info[0];
 
 	tmp_hpi = create_shared_memory(eal_hugepage_info_path(),
-- 
2.17.1

* [PATCH v3 4/8] eal: add support for no-shconf in hugepage data file
  2018-07-13 10:27 ` [PATCH v2 0/9] Support running DPDK without hugetlbfs mountpoint Anatoly Burakov
                     ` (3 preceding siblings ...)
  2018-07-13 12:47   ` [PATCH v3 3/8] eal: add support for no-shconf for hugepage info Anatoly Burakov
@ 2018-07-13 12:48   ` Anatoly Burakov
  2018-07-13 12:48   ` [PATCH v3 5/8] eal: do not create runtime dir in no-shconf mode Anatoly Burakov
                     ` (3 subsequent siblings)
  8 siblings, 0 replies; 35+ messages in thread
From: Anatoly Burakov @ 2018-07-13 12:48 UTC (permalink / raw)
  To: dev
  Cc: ray.kinsella, kuralamudhan.ramakrishnan, louise.m.daly,
	bruce.richardson, ferruh.yigit, konstantin.ananyev, thomas

Do not create a shared hugepage data file if we were asked to
not create any shared files.
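
The anonymous fallback added to create_shared_memory() can be tried outside
EAL with a sketch like the one below. `create_shared_memory_sketch()` and its
`no_shconf` parameter are hypothetical (the real function reads
internal_config.no_shconf), and the file branch uses mode 0600 rather than
the 0666 used in the patch.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Sketch of a create_shared_memory()-style helper. Returns a writable
 * mapping of mem_size bytes, or NULL on failure.
 */
static void *
create_shared_memory_sketch(const char *filename, size_t mem_size,
		bool no_shconf)
{
	void *retval;
	int fd;

	if (no_shconf) {
		/* private anonymous memory: nothing is created on disk */
		retval = mmap(NULL, mem_size, PROT_READ | PROT_WRITE,
				MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
		return retval == MAP_FAILED ? NULL : retval;
	}

	fd = open(filename, O_CREAT | O_RDWR, 0600);
	if (fd < 0)
		return NULL;
	if (ftruncate(fd, (off_t)mem_size) < 0) {
		close(fd);
		return NULL;
	}
	retval = mmap(NULL, mem_size, PROT_READ | PROT_WRITE, MAP_SHARED,
			fd, 0);
	/* the mapping keeps the file contents reachable after close() */
	close(fd);
	return retval == MAP_FAILED ? NULL : retval;
}
```

The anonymous branch is private to the process, which is acceptable here
precisely because no secondary process will ever try to attach to it.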

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---

Notes:
    RFC->v1:
    - Use --no-shconf only

 lib/librte_eal/linuxapp/eal/eal_memory.c | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal_memory.c b/lib/librte_eal/linuxapp/eal/eal_memory.c
index 5d3c8831b..ddfa8b133 100644
--- a/lib/librte_eal/linuxapp/eal/eal_memory.c
+++ b/lib/librte_eal/linuxapp/eal/eal_memory.c
@@ -521,7 +521,18 @@ static void *
 create_shared_memory(const char *filename, const size_t mem_size)
 {
 	void *retval;
-	int fd = open(filename, O_CREAT | O_RDWR, 0666);
+	int fd;
+
+	/* if no shared files mode is used, create anonymous memory instead */
+	if (internal_config.no_shconf) {
+		retval = mmap(NULL, mem_size, PROT_READ | PROT_WRITE,
+				MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
+		if (retval == MAP_FAILED)
+			return NULL;
+		return retval;
+	}
+
+	fd = open(filename, O_CREAT | O_RDWR, 0666);
 	if (fd < 0)
 		return NULL;
 	if (ftruncate(fd, mem_size) < 0) {
-- 
2.17.1

* [PATCH v3 5/8] eal: do not create runtime dir in no-shconf mode
  2018-07-13 10:27 ` [PATCH v2 0/9] Support running DPDK without hugetlbfs mountpoint Anatoly Burakov
                     ` (4 preceding siblings ...)
  2018-07-13 12:48   ` [PATCH v3 4/8] eal: add support for no-shconf in hugepage data file Anatoly Burakov
@ 2018-07-13 12:48   ` Anatoly Burakov
  2018-07-13 12:48   ` [PATCH v3 6/8] mem: add support for hugepage-unlink mode Anatoly Burakov
                     ` (2 subsequent siblings)
  8 siblings, 0 replies; 35+ messages in thread
From: Anatoly Burakov @ 2018-07-13 12:48 UTC (permalink / raw)
  To: dev
  Cc: Bruce Richardson, ray.kinsella, kuralamudhan.ramakrishnan,
	louise.m.daly, ferruh.yigit, konstantin.ananyev, thomas

Now that the rest of the EAL is adjusted to not create any shared
files, prevent the runtime directory from ever being created.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---

Notes:
    RFC->v1:
    - Use --no-shconf only

 lib/librte_eal/bsdapp/eal/eal.c   | 3 ++-
 lib/librte_eal/linuxapp/eal/eal.c | 3 ++-
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/lib/librte_eal/bsdapp/eal/eal.c b/lib/librte_eal/bsdapp/eal/eal.c
index dc279542d..13b6f8ae1 100644
--- a/lib/librte_eal/bsdapp/eal/eal.c
+++ b/lib/librte_eal/bsdapp/eal/eal.c
@@ -601,7 +601,8 @@ rte_eal_init(int argc, char **argv)
 	}
 
 	/* create runtime data directory */
-	if (eal_create_runtime_dir() < 0) {
+	if (internal_config.no_shconf == 0 &&
+			eal_create_runtime_dir() < 0) {
 		rte_eal_init_alert("Cannot create runtime directory\n");
 		rte_errno = EACCES;
 		return -1;
diff --git a/lib/librte_eal/linuxapp/eal/eal.c b/lib/librte_eal/linuxapp/eal/eal.c
index ec7cea55d..191960caa 100644
--- a/lib/librte_eal/linuxapp/eal/eal.c
+++ b/lib/librte_eal/linuxapp/eal/eal.c
@@ -832,7 +832,8 @@ rte_eal_init(int argc, char **argv)
 	}
 
 	/* create runtime data directory */
-	if (eal_create_runtime_dir() < 0) {
+	if (internal_config.no_shconf == 0 &&
+			eal_create_runtime_dir() < 0) {
 		rte_eal_init_alert("Cannot create runtime directory\n");
 		rte_errno = EACCES;
 		return -1;
-- 
2.17.1

* [PATCH v3 6/8] mem: add support for hugepage-unlink mode
  2018-07-13 10:27 ` [PATCH v2 0/9] Support running DPDK without hugetlbfs mountpoint Anatoly Burakov
                     ` (5 preceding siblings ...)
  2018-07-13 12:48   ` [PATCH v3 5/8] eal: do not create runtime dir in no-shconf mode Anatoly Burakov
@ 2018-07-13 12:48   ` Anatoly Burakov
  2018-07-13 12:48   ` [PATCH v3 7/8] eal: add --in-memory option Anatoly Burakov
  2018-07-13 12:48   ` [PATCH v3 8/8] mem: support in-memory mode Anatoly Burakov
  8 siblings, 0 replies; 35+ messages in thread
From: Anatoly Burakov @ 2018-07-13 12:48 UTC (permalink / raw)
  To: dev
  Cc: ray.kinsella, kuralamudhan.ramakrishnan, louise.m.daly,
	bruce.richardson, ferruh.yigit, konstantin.ananyev, thomas

Unlink hugepage files right after creating them, to honor the
hugepage-unlink mode. Files that no longer exist cannot be resized, so
explicitly reject single-file segments mode when hugepage-unlink is
enabled.
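
The hugepage-unlink sequence relies on standard POSIX behaviour: an unlinked
file stays alive for as long as a descriptor or mapping still references it.
A minimal sketch, assuming an ordinary file under /tmp in place of a
hugetlbfs file so that it runs without a hugetlbfs mount;
`map_unlinked_file()` is a hypothetical helper, not an EAL function.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Create the backing file, size it, unlink it, then map it through the
 * still-open descriptor (same order as alloc_seg() in hugepage-unlink
 * mode). Returns the mapping, or NULL on failure.
 */
static void *
map_unlinked_file(const char *path, size_t len)
{
	void *va;
	int fd = open(path, O_CREAT | O_EXCL | O_RDWR, 0600);

	if (fd < 0)
		return NULL;
	if (ftruncate(fd, (off_t)len) < 0) {
		unlink(path);
		close(fd);
		return NULL;
	}
	/* after this the file has no name, but fd still refers to it */
	if (unlink(path) < 0) {
		close(fd);
		return NULL;
	}
	va = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	/* the mapping holds its own reference, so fd can be closed now */
	close(fd);
	return va == MAP_FAILED ? NULL : va;
}
```

Once the last mapping is gone (munmap() or process exit, including a crash),
the kernel frees the pages and no file is left behind. With no name left,
there is also nothing to resize later by path.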

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---

Notes:
    v1->v2:
    - Move check for hugepage unlink into this patch, to be
      consistent with commit message
    
    RFC->v1:
    - Use --huge-unlink only

 lib/librte_eal/common/eal_common_options.c |  6 ++++++
 lib/librte_eal/linuxapp/eal/eal_memalloc.c | 16 +++++++++++++++-
 2 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/lib/librte_eal/common/eal_common_options.c b/lib/librte_eal/common/eal_common_options.c
index 45ea01a8b..df5d53648 100644
--- a/lib/librte_eal/common/eal_common_options.c
+++ b/lib/librte_eal/common/eal_common_options.c
@@ -1332,6 +1332,12 @@ eal_check_common_options(struct internal_config *internal_cfg)
 			" is only supported in non-legacy memory mode\n");
 		return -1;
 	}
+	if (internal_cfg->single_file_segments &&
+			internal_cfg->hugepage_unlink) {
+		RTE_LOG(ERR, EAL, "Option --"OPT_SINGLE_FILE_SEGMENTS" is "
+			"not compatible with --"OPT_HUGE_UNLINK"\n");
+		return -1;
+	}
 
 	return 0;
 }
diff --git a/lib/librte_eal/linuxapp/eal/eal_memalloc.c b/lib/librte_eal/linuxapp/eal/eal_memalloc.c
index 69604f823..d610923b8 100644
--- a/lib/librte_eal/linuxapp/eal/eal_memalloc.c
+++ b/lib/librte_eal/linuxapp/eal/eal_memalloc.c
@@ -489,6 +489,13 @@ alloc_seg(struct rte_memseg *ms, void *addr, int socket_id,
 				__func__, strerror(errno));
 			goto resized;
 		}
+		if (internal_config.hugepage_unlink) {
+			if (unlink(path)) {
+				RTE_LOG(DEBUG, EAL, "%s(): unlink() failed: %s\n",
+					__func__, strerror(errno));
+				goto resized;
+			}
+		}
 	}
 
 	/*
@@ -587,7 +594,8 @@ alloc_seg(struct rte_memseg *ms, void *addr, int socket_id,
 		/* ignore failure, can't make it any worse */
 	} else {
 		/* only remove file if we can take out a write lock */
-		if (lock(fd, LOCK_EX) == 1)
+		if (internal_config.hugepage_unlink == 0 &&
+				lock(fd, LOCK_EX) == 1)
 			unlink(path);
 		close(fd);
 	}
@@ -612,6 +620,12 @@ free_seg(struct rte_memseg *ms, struct hugepage_info *hi,
 		return -1;
 	}
 
+	/* if we've already unlinked the page, nothing needs to be done */
+	if (internal_config.hugepage_unlink) {
+		memset(ms, 0, sizeof(*ms));
+		return 0;
+	}
+
 	/* if we are not in single file segments mode, we're going to unmap the
 	 * segment and thus drop the lock on original fd, but hugepage dir is
 	 * now locked so we can take out another one without races.
-- 
2.17.1

* [PATCH v3 7/8] eal: add --in-memory option
  2018-07-13 10:27 ` [PATCH v2 0/9] Support running DPDK without hugetlbfs mountpoint Anatoly Burakov
                     ` (6 preceding siblings ...)
  2018-07-13 12:48   ` [PATCH v3 6/8] mem: add support for hugepage-unlink mode Anatoly Burakov
@ 2018-07-13 12:48   ` Anatoly Burakov
  2018-07-13 12:48   ` [PATCH v3 8/8] mem: support in-memory mode Anatoly Burakov
  8 siblings, 0 replies; 35+ messages in thread
From: Anatoly Burakov @ 2018-07-13 12:48 UTC (permalink / raw)
  To: dev
  Cc: ray.kinsella, kuralamudhan.ramakrishnan, louise.m.daly,
	bruce.richardson, ferruh.yigit, konstantin.ananyev, thomas

This command-line option will cause DPDK to operate entirely in
memory and not create any shared files at runtime, including shared
configuration and hugetlbfs files. This is useful for debugging, as
well as for use cases such as containers or automatic memory cleanup.

Currently, this option acts as a strict superset of the --no-shconf
and --huge-unlink options.
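
The option handling can be modelled as follows. `struct cfg_sketch`,
`parse_opt_sketch()` and `check_opts_sketch()` are hypothetical, trimmed-down
stand-ins for struct internal_config, eal_parse_common_option() and
eal_check_common_options(); only the interplay between the flags is meant to
mirror the patch.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical subset of struct internal_config. */
struct cfg_sketch {
	bool no_shconf;
	bool hugepage_unlink;
	bool in_memory;
	bool single_file_segments;
};

/* Apply one long option; returns 0 on success, -1 if unknown. */
static int
parse_opt_sketch(struct cfg_sketch *c, const char *opt)
{
	if (strcmp(opt, "no-shconf") == 0) {
		c->no_shconf = true;
	} else if (strcmp(opt, "huge-unlink") == 0) {
		c->hugepage_unlink = true;
	} else if (strcmp(opt, "single-file-segments") == 0) {
		c->single_file_segments = true;
	} else if (strcmp(opt, "in-memory") == 0) {
		/* in-memory implies both of the older debug options */
		c->in_memory = true;
		c->no_shconf = true;
		c->hugepage_unlink = true;
	} else {
		return -1;
	}
	return 0;
}

/* Cross-option check: unlinked hugepage files cannot be resized. */
static int
check_opts_sketch(const struct cfg_sketch *c)
{
	if (c->single_file_segments && c->hugepage_unlink) {
		fprintf(stderr, "--single-file-segments is not compatible "
				"with either --in-memory or --huge-unlink\n");
		return -1;
	}
	return 0;
}
```

Because --in-memory sets hugepage_unlink, the existing single-file segments
check rejects it without needing a separate test.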

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---

Notes:
    RFC->v1:
    - Do not deprecate old options, instead just coopt them

 lib/librte_eal/common/eal_common_options.c | 18 ++++++++++++++----
 lib/librte_eal/common/eal_internal_cfg.h   |  4 ++++
 lib/librte_eal/common/eal_options.h        |  2 ++
 3 files changed, 20 insertions(+), 4 deletions(-)

diff --git a/lib/librte_eal/common/eal_common_options.c b/lib/librte_eal/common/eal_common_options.c
index df5d53648..f308b57c3 100644
--- a/lib/librte_eal/common/eal_common_options.c
+++ b/lib/librte_eal/common/eal_common_options.c
@@ -66,6 +66,7 @@ eal_long_options[] = {
 	{OPT_NO_HUGE,           0, NULL, OPT_NO_HUGE_NUM          },
 	{OPT_NO_PCI,            0, NULL, OPT_NO_PCI_NUM           },
 	{OPT_NO_SHCONF,         0, NULL, OPT_NO_SHCONF_NUM        },
+	{OPT_IN_MEMORY,         0, NULL, OPT_IN_MEMORY_NUM        },
 	{OPT_PCI_BLACKLIST,     1, NULL, OPT_PCI_BLACKLIST_NUM    },
 	{OPT_PCI_WHITELIST,     1, NULL, OPT_PCI_WHITELIST_NUM    },
 	{OPT_PROC_TYPE,         1, NULL, OPT_PROC_TYPE_NUM        },
@@ -1170,6 +1171,13 @@ eal_parse_common_option(int opt, const char *optarg,
 		conf->no_shconf = 1;
 		break;
 
+	case OPT_IN_MEMORY_NUM:
+		conf->in_memory = 1;
+		/* in-memory is a superset of noshconf and huge-unlink */
+		conf->no_shconf = 1;
+		conf->hugepage_unlink = 1;
+		break;
+
 	case OPT_PROC_TYPE_NUM:
 		conf->process_type = eal_parse_proc_type(optarg);
 		break;
@@ -1321,8 +1329,8 @@ eal_check_common_options(struct internal_config *internal_cfg)
 			"be specified together with --"OPT_NO_HUGE"\n");
 		return -1;
 	}
-
-	if (internal_cfg->no_hugetlbfs && internal_cfg->hugepage_unlink) {
+	if (internal_cfg->no_hugetlbfs && internal_cfg->hugepage_unlink &&
+			!internal_cfg->in_memory) {
 		RTE_LOG(ERR, EAL, "Option --"OPT_HUGE_UNLINK" cannot "
 			"be specified together with --"OPT_NO_HUGE"\n");
 		return -1;
@@ -1330,12 +1338,12 @@ eal_check_common_options(struct internal_config *internal_cfg)
 	if (internal_config.force_socket_limits && internal_config.legacy_mem) {
 		RTE_LOG(ERR, EAL, "Option --"OPT_SOCKET_LIMIT
 			" is only supported in non-legacy memory mode\n");
-		return -1;
 	}
 	if (internal_cfg->single_file_segments &&
 			internal_cfg->hugepage_unlink) {
 		RTE_LOG(ERR, EAL, "Option --"OPT_SINGLE_FILE_SEGMENTS" is "
-			"not compatible with --"OPT_HUGE_UNLINK"\n");
+			"not compatible with either --"OPT_IN_MEMORY" or "
+			"--"OPT_HUGE_UNLINK"\n");
 		return -1;
 	}
 
@@ -1386,6 +1394,8 @@ eal_common_usage(void)
 	       "                      Set specific log level\n"
 	       "  -v                  Display version information on startup\n"
 	       "  -h, --help          This help\n"
+	       "  --"OPT_IN_MEMORY"   Operate entirely in memory. This will \n"
+	       "                      disable secondary process support\n"
 	       "\nEAL options for DEBUG use only:\n"
 	       "  --"OPT_HUGE_UNLINK"       Unlink hugepage files after init\n"
 	       "  --"OPT_NO_HUGE"           Use malloc instead of hugetlbfs\n"
diff --git a/lib/librte_eal/common/eal_internal_cfg.h b/lib/librte_eal/common/eal_internal_cfg.h
index d66cd0313..00ee6e06e 100644
--- a/lib/librte_eal/common/eal_internal_cfg.h
+++ b/lib/librte_eal/common/eal_internal_cfg.h
@@ -41,6 +41,10 @@ struct internal_config {
 	volatile unsigned vmware_tsc_map; /**< true to use VMware TSC mapping
 										* instead of native TSC */
 	volatile unsigned no_shconf;      /**< true if there is no shared config */
+	volatile unsigned in_memory;
+	/**< true if DPDK should operate entirely in-memory and not create any
+	 * shared files or runtime data.
+	 */
 	volatile unsigned create_uio_dev; /**< true to create /dev/uioX devices */
 	volatile enum rte_proc_type_t process_type; /**< multi-process proc type */
 	/** true to try allocating memory on specific sockets */
diff --git a/lib/librte_eal/common/eal_options.h b/lib/librte_eal/common/eal_options.h
index 6d92f64a8..96e166787 100644
--- a/lib/librte_eal/common/eal_options.h
+++ b/lib/librte_eal/common/eal_options.h
@@ -45,6 +45,8 @@ enum {
 	OPT_NO_PCI_NUM,
 #define OPT_NO_SHCONF         "no-shconf"
 	OPT_NO_SHCONF_NUM,
+#define OPT_IN_MEMORY         "in-memory"
+	OPT_IN_MEMORY_NUM,
 #define OPT_SOCKET_MEM        "socket-mem"
 	OPT_SOCKET_MEM_NUM,
 #define OPT_SOCKET_LIMIT        "socket-limit"
-- 
2.17.1

* [PATCH v3 8/8] mem: support in-memory mode
  2018-07-13 10:27 ` [PATCH v2 0/9] Support running DPDK without hugetlbfs mountpoint Anatoly Burakov
                     ` (7 preceding siblings ...)
  2018-07-13 12:48   ` [PATCH v3 7/8] eal: add --in-memory option Anatoly Burakov
@ 2018-07-13 12:48   ` Anatoly Burakov
  8 siblings, 0 replies; 35+ messages in thread
From: Anatoly Burakov @ 2018-07-13 12:48 UTC (permalink / raw)
  To: dev
  Cc: ray.kinsella, kuralamudhan.ramakrishnan, louise.m.daly,
	bruce.richardson, ferruh.yigit, konstantin.ananyev, thomas

Implement the final piece of the in-memory mode puzzle: enable running
DPDK entirely in memory, without creating any files.

To do this, use mmap() with MAP_HUGETLB and the hugepage size flags, so
that DPDK can work without hugetlbfs mountpoints. A few things need to
change to make this possible.
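
The size flags encode log2 of the page size shifted left by MAP_HUGE_SHIFT.
A minimal sketch of building them, including a compile-time fallback when
the headers do not define MAP_HUGE_SHIFT (the patch additionally records
that anonymous hugepages are unsupported in that case and never takes this
path). `log2_u64()` and `hugepage_mmap_flags()` are hypothetical helpers;
the patch uses rte_log2_u32() and also passes MAP_FIXED with a preselected
address. Actually calling mmap() with these flags only succeeds if hugepages
of that size have been reserved.

```c
#define _GNU_SOURCE
#include <stdint.h>
#include <sys/mman.h>

#ifndef MAP_HUGE_SHIFT
/* headers predate hugepage size flags: use the same fallback value as
 * the patch so that the code still compiles */
#define MAP_HUGE_SHIFT 26
#endif

/* log2 of a power-of-two value */
static unsigned int
log2_u64(uint64_t v)
{
	unsigned int r = 0;

	while (v >>= 1)
		r++;
	return r;
}

/* mmap() flags requesting private anonymous hugepages of page_sz bytes */
static int
hugepage_mmap_flags(uint64_t page_sz)
{
	int shift = (int)log2_u64(page_sz);

	return (shift << MAP_HUGE_SHIFT) | MAP_HUGETLB | MAP_PRIVATE |
			MAP_ANONYMOUS;
}
```

For 2 MB pages the value 21 lands in the bits starting at MAP_HUGE_SHIFT;
for 1 GB pages it is 30.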

First of all, we need to allow empty hugetlbfs mountpoints in
hugepage_info, and handle them correctly (by not trying to create any
files or lock any directories).

Next, we need to reorder the mapping sequence, because the page is not
really allocated until the page fault, and we cannot get its IOVA
address before we trigger the page fault.
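
The reordered sequence (map without MAP_POPULATE, force the fault, only then
look up the address) can be sketched on ordinary anonymous memory, an
assumption made so that no hugepages are needed. Here mincore() stands in
for rte_mem_virt2iova(), since reading physical addresses from
/proc/self/pagemap typically requires privileges; `touch_page()` and
`page_resident()` are hypothetical helpers.

```c
#define _GNU_SOURCE
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Fault in the page at addr without changing its contents, using the same
 * read-then-write trick as alloc_seg().
 */
static void
touch_page(void *addr)
{
	volatile int *p = addr;

	*p = *p;
}

/* Ask the kernel whether the page containing addr is resident. */
static bool
page_resident(void *addr)
{
	uintptr_t pg = (uintptr_t)sysconf(_SC_PAGESIZE);
	/* mincore() requires a page-aligned start address */
	void *base = (void *)((uintptr_t)addr & ~(pg - 1));
	unsigned char vec = 0;

	if (mincore(base, (size_t)pg, &vec) != 0)
		return false;
	return (vec & 1) != 0;
}
```

Querying before touch_page() would describe a page the kernel has not
backed yet, which is why the lookup moved after the write.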

Finally, decide at compile time whether we are going to be supporting
anonymous hugepages or not, because we cannot check for it at runtime.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---

Notes:
    RFC->v1:
    - Drop memfd and instead use mmap() with MAP_HUGETLB. This lowers the
      kernel requirement to 3.8, and does not impose any restrictions on
      glibc (as far as I know).
    
      Unfortunately, there is an issue with this approach: mmap() silently
      ignores unsupported flags. This means that if the binary were built
      against 3.8+ kernel headers but run on a pre-3.8 kernel (such as the
      currently supported minimum of 3.2), the memory would most likely be
      allocated using regular pages, causing severe performance
      degradation. I currently know of no solution to this problem.
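
    One possible runtime sanity check for the silent-fallback concern,
    which is not part of this series, would be to allocate one page through
    the anonymous path and then read the KernelPageSize field that
    /proc/self/smaps reports for that mapping (the field predates the 3.2
    minimum mentioned above). `kernel_page_size_kb()` is a hypothetical
    sketch of the lookup only; whether this is practical during EAL init is
    untested.

```c
#define _GNU_SOURCE
#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Return the KernelPageSize (in kB) that /proc/self/smaps reports for the
 * mapping containing addr, or -1 if it cannot be determined.
 */
static long
kernel_page_size_kb(const void *addr)
{
	char line[8192];
	uintptr_t a = (uintptr_t)addr;
	bool in_vma = false;
	long kb = -1;
	FILE *f = fopen("/proc/self/smaps", "r");

	if (f == NULL)
		return -1;
	while (fgets(line, sizeof(line), f) != NULL) {
		uintptr_t start, end;
		long val;

		/* mapping header lines start with "start-end" in hex */
		if (sscanf(line, "%" SCNxPTR "-%" SCNxPTR, &start, &end) == 2) {
			in_vma = (a >= start && a < end);
			continue;
		}
		if (in_vma &&
				sscanf(line, "KernelPageSize: %ld kB", &val) == 1) {
			kb = val;
			break;
		}
	}
	fclose(f);
	return kb;
}
```

    A result equal to the base page size (for example 4 kB on x86) for a
    mapping that was requested with MAP_HUGETLB would indicate that the
    size flags were not honored.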

 .../linuxapp/eal/eal_hugepage_info.c          |  91 ++++++++----
 lib/librte_eal/linuxapp/eal/eal_memalloc.c    | 140 +++++++++++-------
 lib/librte_eal/linuxapp/eal/eal_memory.c      |   3 +-
 3 files changed, 149 insertions(+), 85 deletions(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal_hugepage_info.c b/lib/librte_eal/linuxapp/eal/eal_hugepage_info.c
index 7f8e2fd9c..3a7d4b222 100644
--- a/lib/librte_eal/linuxapp/eal/eal_hugepage_info.c
+++ b/lib/librte_eal/linuxapp/eal/eal_hugepage_info.c
@@ -18,6 +18,8 @@
 #include <sys/queue.h>
 #include <sys/stat.h>
 
+#include <linux/mman.h> /* for hugetlb-related flags */
+
 #include <rte_memory.h>
 #include <rte_eal.h>
 #include <rte_launch.h>
@@ -313,11 +315,49 @@ compare_hpi(const void *a, const void *b)
 	return hpi_b->hugepage_sz - hpi_a->hugepage_sz;
 }
 
+static void
+calc_num_pages(struct hugepage_info *hpi, struct dirent *dirent)
+{
+	uint64_t total_pages = 0;
+	unsigned int i;
+
+	/*
+	 * first, try to put all hugepages into relevant sockets, but
+	 * if first attempts fails, fall back to collecting all pages
+	 * in one socket and sorting them later
+	 */
+	total_pages = 0;
+	/* we also don't want to do this for legacy init */
+	if (!internal_config.legacy_mem)
+		for (i = 0; i < rte_socket_count(); i++) {
+			int socket = rte_socket_id_by_idx(i);
+			unsigned int num_pages =
+					get_num_hugepages_on_node(
+						dirent->d_name, socket);
+			hpi->num_pages[socket] = num_pages;
+			total_pages += num_pages;
+		}
+	/*
+	 * we failed to sort memory from the get go, so fall
+	 * back to old way
+	 */
+	if (total_pages == 0) {
+		hpi->num_pages[0] = get_num_hugepages(dirent->d_name);
+
+#ifndef RTE_ARCH_64
+		/* for 32-bit systems, limit number of hugepages to
+		 * 1GB per page size */
+		hpi->num_pages[0] = RTE_MIN(hpi->num_pages[0],
+				RTE_PGSIZE_1G / hpi->hugepage_sz);
+#endif
+	}
+}
+
 static int
 hugepage_info_init(void)
 {	const char dirent_start_text[] = "hugepages-";
 	const size_t dirent_start_len = sizeof(dirent_start_text) - 1;
-	unsigned int i, total_pages, num_sizes = 0;
+	unsigned int i, num_sizes = 0;
 	DIR *dir;
 	struct dirent *dirent;
 
@@ -355,6 +395,22 @@ hugepage_info_init(void)
 					"%" PRIu64 " reserved, but no mounted "
 					"hugetlbfs found for that size\n",
 					num_pages, hpi->hugepage_sz);
+			/* if we have kernel support for reserving hugepages
+			 * through mmap, and we're in in-memory mode, treat this
+			 * page size as valid. we cannot be in legacy mode at
+			 * this point because we've checked this earlier in the
+			 * init process.
+			 */
+#ifdef MAP_HUGE_SHIFT
+			if (internal_config.in_memory) {
+				RTE_LOG(DEBUG, EAL, "In-memory mode enabled, "
+					"hugepages of size %" PRIu64 " bytes "
+					"will be allocated anonymously\n",
+					hpi->hugepage_sz);
+				calc_num_pages(hpi, dirent);
+				num_sizes++;
+			}
+#endif
 			continue;
 		}
 
@@ -371,35 +427,7 @@ hugepage_info_init(void)
 		if (clear_hugedir(hpi->hugedir) == -1)
 			break;
 
-		/*
-		 * first, try to put all hugepages into relevant sockets, but
-		 * if first attempts fails, fall back to collecting all pages
-		 * in one socket and sorting them later
-		 */
-		total_pages = 0;
-		/* we also don't want to do this for legacy init */
-		if (!internal_config.legacy_mem)
-			for (i = 0; i < rte_socket_count(); i++) {
-				int socket = rte_socket_id_by_idx(i);
-				unsigned int num_pages =
-						get_num_hugepages_on_node(
-							dirent->d_name, socket);
-				hpi->num_pages[socket] = num_pages;
-				total_pages += num_pages;
-			}
-		/*
-		 * we failed to sort memory from the get go, so fall
-		 * back to old way
-		 */
-		if (total_pages == 0)
-			hpi->num_pages[0] = get_num_hugepages(dirent->d_name);
-
-#ifndef RTE_ARCH_64
-		/* for 32-bit systems, limit number of hugepages to
-		 * 1GB per page size */
-		hpi->num_pages[0] = RTE_MIN(hpi->num_pages[0],
-					    RTE_PGSIZE_1G / hpi->hugepage_sz);
-#endif
+		calc_num_pages(hpi, dirent);
 
 		num_sizes++;
 	}
@@ -423,8 +451,7 @@ hugepage_info_init(void)
 
 		for (j = 0; j < RTE_MAX_NUMA_NODES; j++)
 			num_pages += hpi->num_pages[j];
-		if (strnlen(hpi->hugedir, sizeof(hpi->hugedir)) != 0 &&
-				num_pages > 0)
+		if (num_pages > 0)
 			return 0;
 	}
 
diff --git a/lib/librte_eal/linuxapp/eal/eal_memalloc.c b/lib/librte_eal/linuxapp/eal/eal_memalloc.c
index d610923b8..79443c56a 100644
--- a/lib/librte_eal/linuxapp/eal/eal_memalloc.c
+++ b/lib/librte_eal/linuxapp/eal/eal_memalloc.c
@@ -28,6 +28,7 @@
 #include <numaif.h>
 #endif
 #include <linux/falloc.h>
+#include <linux/mman.h> /* for hugetlb-related mmap flags */
 
 #include <rte_common.h>
 #include <rte_log.h>
@@ -41,6 +42,15 @@
 #include "eal_memalloc.h"
 #include "eal_private.h"
 
+const int anonymous_hugepages_supported =
+#ifdef MAP_HUGE_SHIFT
+		1;
+#define RTE_MAP_HUGE_SHIFT MAP_HUGE_SHIFT
+#else
+		0;
+#define RTE_MAP_HUGE_SHIFT 26
+#endif
+
 /*
  * not all kernel version support fallocate on hugetlbfs, so fall back to
  * ftruncate and disallow deallocation if fallocate is not supported.
@@ -461,6 +471,8 @@ alloc_seg(struct rte_memseg *ms, void *addr, int socket_id,
 	int cur_socket_id = 0;
 #endif
 	uint64_t map_offset;
+	rte_iova_t iova;
+	void *va;
 	char path[PATH_MAX];
 	int ret = 0;
 	int fd;
@@ -468,43 +480,66 @@ alloc_seg(struct rte_memseg *ms, void *addr, int socket_id,
 	int flags;
 	void *new_addr;
 
-	/* takes out a read lock on segment or segment list */
-	fd = get_seg_fd(path, sizeof(path), hi, list_idx, seg_idx);
-	if (fd < 0) {
-		RTE_LOG(ERR, EAL, "Couldn't get fd on hugepage file\n");
-		return -1;
-	}
-
 	alloc_sz = hi->hugepage_sz;
-	if (internal_config.single_file_segments) {
-		map_offset = seg_idx * alloc_sz;
-		ret = resize_hugefile(fd, path, list_idx, seg_idx, map_offset,
-				alloc_sz, true);
-		if (ret < 0)
-			goto resized;
-	} else {
+	if (internal_config.in_memory && anonymous_hugepages_supported) {
+		int log2, flags;
+
+		log2 = rte_log2_u32(alloc_sz);
+		/* as per mmap() manpage, all page sizes are log2 of page size
+		 * shifted by MAP_HUGE_SHIFT
+		 */
+		flags = (log2 << RTE_MAP_HUGE_SHIFT) | MAP_HUGETLB | MAP_FIXED |
+				MAP_PRIVATE | MAP_ANONYMOUS;
+		fd = -1;
+		va = mmap(addr, alloc_sz, PROT_READ | PROT_WRITE, flags, -1, 0);
+
+		/* single-file segments codepath will never be active because
+		 * in-memory mode is incompatible with it and it's stopped at
+		 * EAL initialization stage, however the compiler doesn't know
+		 * that and complains about map_offset being used uninitialized
+		 * on failure codepaths while having in-memory mode enabled. so,
+		 * assign a value here.
+		 */
 		map_offset = 0;
-		if (ftruncate(fd, alloc_sz) < 0) {
-			RTE_LOG(DEBUG, EAL, "%s(): ftruncate() failed: %s\n",
-				__func__, strerror(errno));
-			goto resized;
+	} else {
+		/* takes out a read lock on segment or segment list */
+		fd = get_seg_fd(path, sizeof(path), hi, list_idx, seg_idx);
+		if (fd < 0) {
+			RTE_LOG(ERR, EAL, "Couldn't get fd on hugepage file\n");
+			return -1;
 		}
-		if (internal_config.hugepage_unlink) {
-			if (unlink(path)) {
-				RTE_LOG(DEBUG, EAL, "%s(): unlink() failed: %s\n",
+
+		if (internal_config.single_file_segments) {
+			map_offset = seg_idx * alloc_sz;
+			ret = resize_hugefile(fd, path, list_idx, seg_idx,
+					map_offset, alloc_sz, true);
+			if (ret < 0)
+				goto resized;
+		} else {
+			map_offset = 0;
+			if (ftruncate(fd, alloc_sz) < 0) {
+				RTE_LOG(DEBUG, EAL, "%s(): ftruncate() failed: %s\n",
 					__func__, strerror(errno));
 				goto resized;
 			}
+			if (internal_config.hugepage_unlink) {
+				if (unlink(path)) {
+					RTE_LOG(DEBUG, EAL, "%s(): unlink() failed: %s\n",
+						__func__, strerror(errno));
+					goto resized;
+				}
+			}
 		}
+
+		/*
+		 * map the segment, and populate page tables, the kernel fills
+		 * this segment with zeros if it's a new page.
+		 */
+		va = mmap(addr, alloc_sz, PROT_READ | PROT_WRITE,
+				MAP_SHARED | MAP_POPULATE | MAP_FIXED, fd,
+				map_offset);
 	}
 
-	/*
-	 * map the segment, and populate page tables, the kernel fills this
-	 * segment with zeros if it's a new page.
-	 */
-	void *va = mmap(addr, alloc_sz, PROT_READ | PROT_WRITE,
-			MAP_SHARED | MAP_POPULATE | MAP_FIXED, fd, map_offset);
-
 	if (va == MAP_FAILED) {
 		RTE_LOG(DEBUG, EAL, "%s(): mmap() failed: %s\n", __func__,
 			strerror(errno));
@@ -519,24 +554,6 @@ alloc_seg(struct rte_memseg *ms, void *addr, int socket_id,
 		goto resized;
 	}
 
-	rte_iova_t iova = rte_mem_virt2iova(addr);
-	if (iova == RTE_BAD_PHYS_ADDR) {
-		RTE_LOG(DEBUG, EAL, "%s(): can't get IOVA addr\n",
-			__func__);
-		goto mapped;
-	}
-
-#ifdef RTE_EAL_NUMA_AWARE_HUGEPAGES
-	move_pages(getpid(), 1, &addr, NULL, &cur_socket_id, 0);
-
-	if (cur_socket_id != socket_id) {
-		RTE_LOG(DEBUG, EAL,
-				"%s(): allocation happened on wrong socket (wanted %d, got %d)\n",
-			__func__, socket_id, cur_socket_id);
-		goto mapped;
-	}
-#endif
-
 	/* In linux, hugetlb limitations, like cgroup, are
 	 * enforced at fault time instead of mmap(), even
 	 * with the option of MAP_POPULATE. Kernel will send
@@ -549,9 +566,6 @@ alloc_seg(struct rte_memseg *ms, void *addr, int socket_id,
 			(unsigned int)(alloc_sz >> 20));
 		goto mapped;
 	}
-	/* for non-single file segments, we can close fd here */
-	if (!internal_config.single_file_segments)
-		close(fd);
 
 	/* we need to trigger a write to the page to enforce page fault and
 	 * ensure that page is accessible to us, but we can't overwrite value
@@ -560,6 +574,28 @@ alloc_seg(struct rte_memseg *ms, void *addr, int socket_id,
 	 */
 	*(volatile int *)addr = *(volatile int *)addr;
 
+	iova = rte_mem_virt2iova(addr);
+	if (iova == RTE_BAD_PHYS_ADDR) {
+		RTE_LOG(DEBUG, EAL, "%s(): can't get IOVA addr\n",
+			__func__);
+		goto mapped;
+	}
+
+#ifdef RTE_EAL_NUMA_AWARE_HUGEPAGES
+	move_pages(getpid(), 1, &addr, NULL, &cur_socket_id, 0);
+
+	if (cur_socket_id != socket_id) {
+		RTE_LOG(DEBUG, EAL,
+				"%s(): allocation happened on wrong socket (wanted %d, got %d)\n",
+			__func__, socket_id, cur_socket_id);
+		goto mapped;
+	}
+#endif
+	/* for non-single file segments that aren't in-memory, we can close fd
+	 * here */
+	if (!internal_config.single_file_segments && !internal_config.in_memory)
+		close(fd);
+
 	ms->addr = addr;
 	ms->hugepage_sz = alloc_sz;
 	ms->len = alloc_sz;
@@ -588,6 +624,7 @@ alloc_seg(struct rte_memseg *ms, void *addr, int socket_id,
 		RTE_LOG(CRIT, EAL, "Can't mmap holes in our virtual address space\n");
 	}
 resized:
+	/* in-memory mode will never be single-file-segments mode */
 	if (internal_config.single_file_segments) {
 		resize_hugefile(fd, path, list_idx, seg_idx, map_offset,
 				alloc_sz, false);
@@ -595,6 +632,7 @@ alloc_seg(struct rte_memseg *ms, void *addr, int socket_id,
 	} else {
 		/* only remove file if we can take out a write lock */
 		if (internal_config.hugepage_unlink == 0 &&
+				internal_config.in_memory == 0 &&
 				lock(fd, LOCK_EX) == 1)
 			unlink(path);
 		close(fd);
@@ -705,7 +743,7 @@ alloc_seg_walk(const struct rte_memseg_list *msl, void *arg)
 	 * during init, we already hold a write lock, so don't try to take out
 	 * another one.
 	 */
-	if (wa->hi->lock_descriptor == -1) {
+	if (wa->hi->lock_descriptor == -1 && !internal_config.in_memory) {
 		dir_fd = open(wa->hi->hugedir, O_RDONLY);
 		if (dir_fd < 0) {
 			RTE_LOG(ERR, EAL, "%s(): Cannot open '%s': %s\n",
@@ -809,7 +847,7 @@ free_seg_walk(const struct rte_memseg_list *msl, void *arg)
 	 * during init, we already hold a write lock, so don't try to take out
 	 * another one.
 	 */
-	if (wa->hi->lock_descriptor == -1) {
+	if (wa->hi->lock_descriptor == -1 && !internal_config.in_memory) {
 		dir_fd = open(wa->hi->hugedir, O_RDONLY);
 		if (dir_fd < 0) {
 			RTE_LOG(ERR, EAL, "%s(): Cannot open '%s': %s\n",
diff --git a/lib/librte_eal/linuxapp/eal/eal_memory.c b/lib/librte_eal/linuxapp/eal/eal_memory.c
index ddfa8b133..dbf19499e 100644
--- a/lib/librte_eal/linuxapp/eal/eal_memory.c
+++ b/lib/librte_eal/linuxapp/eal/eal_memory.c
@@ -1088,8 +1088,7 @@ get_socket_mem_size(int socket)
 
 	for (i = 0; i < internal_config.num_hugepage_sizes; i++){
 		struct hugepage_info *hpi = &internal_config.hugepage_info[i];
-		if (strnlen(hpi->hugedir, sizeof(hpi->hugedir)) != 0)
-			size += hpi->hugepage_sz * hpi->num_pages[socket];
+		size += hpi->hugepage_sz * hpi->num_pages[socket];
 	}
 
 	return size;
-- 
2.17.1
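The reordering in alloc_seg() above moves the IOVA lookup and the move_pages() NUMA check so that they run after the `*(volatile int *)addr = *(volatile int *)addr;` touch. A plausible reading, not stated in the patch itself, is that an anonymous mapping has no physical page behind it until it is faulted in, so asking where it lives before the touch says nothing useful. The sketch below illustrates that ordering under these assumptions: it uses mincore() as a stand-in for the IOVA/NUMA queries, and the helper names first_page_resident() and map_and_fault() are hypothetical, not DPDK functions.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: report whether the first base page at addr has a
 * physical page behind it. This stands in for the IOVA / NUMA queries that
 * alloc_seg() performs, which are only meaningful once the page is faulted in.
 */
static bool first_page_resident(void *addr)
{
	unsigned char vec = 0;
	long pg = sysconf(_SC_PAGESIZE);

	/* Query a single base page so one byte of vec is enough. */
	if (pg <= 0 || mincore(addr, (size_t)pg, &vec) != 0)
		return false;
	return (vec & 1) != 0;
}

/*
 * Hypothetical helper: reserve len bytes of anonymous memory (no hugetlbfs
 * file involved) and record residency before and after touching it.
 * With huge = true, MAP_HUGETLB is requested and len should be a multiple
 * of the default hugepage size. Returns the mapping, or NULL on failure.
 */
static void *map_and_fault(size_t len, bool huge, bool *before, bool *after)
{
	int flags = MAP_PRIVATE | MAP_ANONYMOUS;
	void *addr;

	assert(len > 0);
	if (huge)
		flags |= MAP_HUGETLB;

	addr = mmap(NULL, len, PROT_READ | PROT_WRITE, flags, -1, 0);
	if (addr == MAP_FAILED)
		return NULL;

	/* Nothing has touched the mapping yet, so no page backs it. */
	*before = first_page_resident(addr);

	/* Read and write back the same value: this forces a write fault
	 * without clobbering existing data, mirroring alloc_seg(). */
	*(volatile int *)addr = *(volatile int *)addr;

	/* Only now would an IOVA or NUMA-node query return real data. */
	*after = first_page_resident(addr);
	return addr;
}
```

On a typical Linux system, mapping a single base page with `huge = false` should leave `before` false and `after` true. The same touch is also what surfaces hugetlb cgroup limits, which the patch comment notes are enforced at fault time rather than at mmap().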

* Re: [PATCH v3 0/8] Support running DPDK without hugetlbfs mountpoint
  2018-07-13 12:47   ` [PATCH v3 0/8] " Anatoly Burakov
@ 2018-07-13 13:41     ` Thomas Monjalon
  0 siblings, 0 replies; 35+ messages in thread
From: Thomas Monjalon @ 2018-07-13 13:41 UTC (permalink / raw)
  To: Anatoly Burakov
  Cc: dev, ray.kinsella, kuralamudhan.ramakrishnan, louise.m.daly,
	bruce.richardson, ferruh.yigit, konstantin.ananyev

> Anatoly Burakov (8):
>   fbarray: support no-shconf mode
>   ipc: add support for no-shconf mode
>   eal: add support for no-shconf for hugepage info
>   eal: add support for no-shconf in hugepage data file
>   eal: do not create runtime dir in no-shconf mode
>   mem: add support for hugepage-unlink mode
>   eal: add --in-memory option
>   mem: support in-memory mode

Applied, thanks.

Thread overview: 35+ messages
2018-06-01 17:15 [PATCH 0/9] Support running DPDK without hugetlbfs mountpoint Anatoly Burakov
2018-06-01 17:15 ` [PATCH 1/9] fbarray: support no-shconf mode Anatoly Burakov
2018-06-01 17:15 ` [PATCH 2/9] ipc: add support for " Anatoly Burakov
2018-06-01 17:15 ` [PATCH 3/9] eal: add support for no-shconf for hugepage info Anatoly Burakov
2018-06-01 17:15 ` [PATCH 4/9] eal: add support for no-shconf in hugepage data file Anatoly Burakov
2018-06-01 17:15 ` [PATCH 5/9] eal: do not create runtime dir in no-shconf mode Anatoly Burakov
2018-06-01 17:15 ` [PATCH 6/9] mem: add support for hugepage-unlink mode Anatoly Burakov
2018-06-01 17:15 ` [PATCH 7/9] eal: add --in-memory option Anatoly Burakov
2018-06-01 17:15 ` [PATCH 8/9] doc: add deprecation notice for EAL command line options Anatoly Burakov
2018-06-01 17:15 ` [PATCH 9/9] mem: support in-memory mode Anatoly Burakov
2018-07-13 10:27 ` [PATCH v2 0/9] Support running DPDK without hugetlbfs mountpoint Anatoly Burakov
2018-07-13 12:47   ` [PATCH v3 0/8] " Anatoly Burakov
2018-07-13 13:41     ` Thomas Monjalon
2018-07-13 12:47   ` [PATCH v3 1/8] fbarray: support no-shconf mode Anatoly Burakov
2018-07-13 12:47   ` [PATCH v3 2/8] ipc: add support for " Anatoly Burakov
2018-07-13 12:47   ` [PATCH v3 3/8] eal: add support for no-shconf for hugepage info Anatoly Burakov
2018-07-13 12:48   ` [PATCH v3 4/8] eal: add support for no-shconf in hugepage data file Anatoly Burakov
2018-07-13 12:48   ` [PATCH v3 5/8] eal: do not create runtime dir in no-shconf mode Anatoly Burakov
2018-07-13 12:48   ` [PATCH v3 6/8] mem: add support for hugepage-unlink mode Anatoly Burakov
2018-07-13 12:48   ` [PATCH v3 7/8] eal: add --in-memory option Anatoly Burakov
2018-07-13 12:48   ` [PATCH v3 8/8] mem: support in-memory mode Anatoly Burakov
2018-07-13 10:27 ` [PATCH v2 1/9] fbarray: support no-shconf mode Anatoly Burakov
2018-07-13 10:27 ` [PATCH v2 2/9] ipc: add support for " Anatoly Burakov
2018-07-13 10:27 ` [PATCH v2 3/9] eal: add support for no-shconf for hugepage info Anatoly Burakov
2018-07-13 10:27 ` [PATCH v2 4/9] eal: add support for no-shconf in hugepage data file Anatoly Burakov
2018-07-13 10:27 ` [PATCH v2 5/9] eal: do not create runtime dir in no-shconf mode Anatoly Burakov
2018-07-13 10:27 ` [PATCH v2 6/9] mem: add support for hugepage-unlink mode Anatoly Burakov
2018-07-13 10:27 ` [PATCH v2 7/9] eal: add --in-memory option Anatoly Burakov
2018-07-13 12:13   ` Thomas Monjalon
2018-07-13 12:27     ` Burakov, Anatoly
2018-07-13 10:27 ` [PATCH v2 8/9] doc: add deprecation notice for EAL command line options Anatoly Burakov
2018-07-13 12:13   ` Thomas Monjalon
2018-07-13 12:29     ` Burakov, Anatoly
2018-07-13 10:27 ` [PATCH v2 9/9] mem: support in-memory mode Anatoly Burakov
2018-07-13 12:15   ` Thomas Monjalon