[PATCH v3 0/3] shmem: Allow userspace monitoring of tmpfs for lack of space.


* [PATCH v3 0/3] shmem: Allow userspace monitoring of tmpfs for lack of space.
@ 2022-04-18 21:37 Gabriel Krisman Bertazi
  2022-04-18 21:37 ` [PATCH v3 1/3] shmem: Keep track of out-of-memory and out-of-space errors Gabriel Krisman Bertazi
                   ` (3 more replies)
  0 siblings, 4 replies; 15+ messages in thread
From: Gabriel Krisman Bertazi @ 2022-04-18 21:37 UTC (permalink / raw)
  To: hughd, akpm, amir73il
  Cc: viro, Gabriel Krisman Bertazi, kernel, Khazhismel Kumykov,
	Linux MM, linux-fsdevel

The only difference from v2 is applying Viro's comment on how the life of
the sbinfo should now be tied to the kobject.  I hope it is correct the
way I did it.  Tested by mount/umount while holding a reference.
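
For readers unfamiliar with the pattern, the lifetime rule described above
can be modelled in userspace.  This is only a sketch under stated
assumptions: struct ref, ref_get() and ref_put() are hypothetical stand-ins
for struct kobject, kobject_get() and kobject_put(), and the model is not
thread-safe, unlike the real kobject reference count.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct kobject: a count plus a release hook. */
struct ref {
	int count;
	void (*release)(struct ref *ref);
};

/* Hypothetical stand-in for shmem_sb_info with an embedded kobject. */
struct sb_info {
	unsigned long space_errors;
	struct ref kobj;
};

static int released;	/* number of release calls, for demonstration only */

/* Recover the containing sb_info from its embedded ref, then free it. */
void sb_info_release(struct ref *ref)
{
	struct sb_info *sbi =
		(struct sb_info *)((char *)ref - offsetof(struct sb_info, kobj));

	free(sbi);
	released++;
}

struct sb_info *sb_info_alloc(void)
{
	struct sb_info *sbi = calloc(1, sizeof(*sbi));

	assert(sbi != NULL);
	sbi->kobj.count = 1;	/* reference owned by the superblock */
	sbi->kobj.release = sb_info_release;
	return sbi;
}

void ref_get(struct ref *ref)
{
	ref->count++;
}

/* Drop one reference; the object is freed only when the last one goes. */
void ref_put(struct ref *ref)
{
	if (--ref->count == 0)
		ref->release(ref);
}

int release_count(void)
{
	return released;
}
```

In this model, an extra ref_get() plays the role of an open sysfs reader:
the "umount" ref_put() no longer frees the structure, and the release hook
runs only once the reader drops its reference too.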

* v2 cover:

The only difference from v1 is addressing Amir's comment about
generating the directory in sysfs using the minor number.

* Original cover letter

When provisioning containerized applications, multiple very small tmpfs
are used, for which one cannot always predict the proper file system
size ahead of time.  We want to be able to reliably monitor filesystems
for ENOSPC errors, without depending on the application being executed
reporting the ENOSPC after a failure.  It is also not enough to watch
statfs since that information might be ephemeral (say the application
recovers by deleting data, the issue can get lost).  For this use case,
it is also interesting to differentiate IO errors caused by lack of
virtual memory from lack of FS space.

This patch exposes two counters on sysfs that log the two conditions
that are interesting to observe for container provisioning.  They are
recorded per tmpfs superblock, and can be polled by a monitoring
application.
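
As an illustration of what "polled by a monitoring application" could look
like, here is a minimal userspace sketch.  It assumes only that each counter
file holds a single unsigned decimal number followed by a newline, as the
sysfs_emit() format in patch 3 suggests; the helper name read_counter() is
hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Read one unsigned decimal counter (e.g. "12\n") from a sysfs-style file.
 * Returns 0 on success and -1 on any failure.
 */
int read_counter(const char *path, unsigned long *value)
{
	char buf[32];
	char *end;
	FILE *f;

	assert(path != NULL && value != NULL);

	f = fopen(path, "r");
	if (!f)
		return -1;
	if (!fgets(buf, sizeof(buf), f)) {
		fclose(f);
		return -1;
	}
	fclose(f);

	errno = 0;
	*value = strtoul(buf, &end, 10);
	if (errno != 0 || end == buf)
		return -1;	/* overflow or no digits at all */
	return 0;
}
```

A monitor would call this periodically on both counter files of each tmpfs
directory and compare the values with the previous sample.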

I proposed a more general approach [1] using fsnotify, but considering
the specificity of this use case, people seemed to agree that a simpler
solution in sysfs is more than enough.

[1] https://lore.kernel.org/linux-mm/20211116220742.584975-3-krisman@collabora.com/T/#mee338d25b0e1e07cbe0861f9a5ca8cc439b3edb8

To: Hugh Dickins <hughd@google.com>
To: Andrew Morton <akpm@linux-foundation.org>
To: Amir Goldstein <amir73il@gmail.com>
Cc: Khazhismel Kumykov <khazhy@google.com>
Cc: Linux MM <linux-mm@kvack.org>
Cc: linux-fsdevel <linux-fsdevel@vger.kernel.org>

Gabriel Krisman Bertazi (3):
  shmem: Keep track of out-of-memory and out-of-space errors
  shmem: Introduce /sys/fs/tmpfs support
  shmem: Expose space and accounting error count

 Documentation/ABI/testing/sysfs-fs-tmpfs | 13 ++++
 include/linux/shmem_fs.h                 |  5 ++
 mm/shmem.c                               | 76 ++++++++++++++++++++++--
 3 files changed, 90 insertions(+), 4 deletions(-)
 create mode 100644 Documentation/ABI/testing/sysfs-fs-tmpfs

-- 
2.35.1


* [PATCH v3 1/3] shmem: Keep track of out-of-memory and out-of-space errors
  2022-04-18 21:37 [PATCH v3 0/3] shmem: Allow userspace monitoring of tmpfs for lack of space Gabriel Krisman Bertazi
@ 2022-04-18 21:37 ` Gabriel Krisman Bertazi
  2022-04-18 21:37 ` [PATCH v3 2/3] shmem: Introduce /sys/fs/tmpfs support Gabriel Krisman Bertazi
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 15+ messages in thread
From: Gabriel Krisman Bertazi @ 2022-04-18 21:37 UTC (permalink / raw)
  To: hughd, akpm, amir73il
  Cc: viro, Gabriel Krisman Bertazi, kernel, Khazhismel Kumykov,
	Linux MM, linux-fsdevel

Keep a per-sb counter of failed shmem allocations for ENOMEM/ENOSPC to
be reported on sysfs.  The sysfs support is done separately in a later
patch.

Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
---
 include/linux/shmem_fs.h | 3 +++
 mm/shmem.c               | 5 ++++-
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/include/linux/shmem_fs.h b/include/linux/shmem_fs.h
index e65b80ed09e7..1a7cd9ea9107 100644
--- a/include/linux/shmem_fs.h
+++ b/include/linux/shmem_fs.h
@@ -44,6 +44,9 @@ struct shmem_sb_info {
 	spinlock_t shrinklist_lock;   /* Protects shrinklist */
 	struct list_head shrinklist;  /* List of shinkable inodes */
 	unsigned long shrinklist_len; /* Length of shrinklist */
+
+	unsigned long acct_errors;
+	unsigned long space_errors;
 };
 
 static inline struct shmem_inode_info *SHMEM_I(struct inode *inode)
diff --git a/mm/shmem.c b/mm/shmem.c
index a09b29ec2b45..c350fa0a0fff 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -212,8 +212,10 @@ static inline bool shmem_inode_acct_block(struct inode *inode, long pages)
 	struct shmem_inode_info *info = SHMEM_I(inode);
 	struct shmem_sb_info *sbinfo = SHMEM_SB(inode->i_sb);
 
-	if (shmem_acct_block(info->flags, pages))
+	if (shmem_acct_block(info->flags, pages)) {
+		sbinfo->acct_errors += 1;
 		return false;
+	}
 
 	if (sbinfo->max_blocks) {
 		if (percpu_counter_compare(&sbinfo->used_blocks,
@@ -225,6 +227,7 @@ static inline bool shmem_inode_acct_block(struct inode *inode, long pages)
 	return true;
 
 unacct:
+	sbinfo->space_errors += 1;
 	shmem_unacct_blocks(info->flags, pages);
 	return false;
 }
-- 
2.35.1



* [PATCH v3 2/3] shmem: Introduce /sys/fs/tmpfs support
  2022-04-18 21:37 [PATCH v3 0/3] shmem: Allow userspace monitoring of tmpfs for lack of space Gabriel Krisman Bertazi
  2022-04-18 21:37 ` [PATCH v3 1/3] shmem: Keep track of out-of-memory and out-of-space errors Gabriel Krisman Bertazi
@ 2022-04-18 21:37 ` Gabriel Krisman Bertazi
  2022-04-18 21:37 ` [PATCH v3 3/3] shmem: Expose space and accounting error count Gabriel Krisman Bertazi
  2022-04-19  3:42 ` [PATCH v3 0/3] shmem: Allow userspace monitoring of tmpfs for lack of space Andrew Morton
  3 siblings, 0 replies; 15+ messages in thread
From: Gabriel Krisman Bertazi @ 2022-04-18 21:37 UTC (permalink / raw)
  To: hughd, akpm, amir73il
  Cc: viro, Gabriel Krisman Bertazi, kernel, Khazhismel Kumykov,
	Linux MM, linux-fsdevel

In order to expose tmpfs statistics on sysfs, add the boilerplate code
to create the /sys/fs/tmpfs structure.  As suggested on a previous
review, this uses the minor as the volume directory in /sys/fs/.

This takes care of not exposing SB_NOUSER mounts.  I don't think we have
a usecase for showing them and, since they don't appear elsewhere, they
might be confusing to users.

Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>

---
Changes since v2:
  - Use kobject to release sbinfo (Viro)

Changes since v1:
  - Use minor instead of fsid for directory in sysfs. (Amir)
---
 include/linux/shmem_fs.h |  2 ++
 mm/shmem.c               | 46 +++++++++++++++++++++++++++++++++++++---
 2 files changed, 45 insertions(+), 3 deletions(-)

diff --git a/include/linux/shmem_fs.h b/include/linux/shmem_fs.h
index 1a7cd9ea9107..6c1f3a4b8c46 100644
--- a/include/linux/shmem_fs.h
+++ b/include/linux/shmem_fs.h
@@ -47,6 +47,8 @@ struct shmem_sb_info {
 
 	unsigned long acct_errors;
 	unsigned long space_errors;
+
+	struct kobject kobj;
 };
 
 static inline struct shmem_inode_info *SHMEM_I(struct inode *inode)
diff --git a/mm/shmem.c b/mm/shmem.c
index c350fa0a0fff..8fe4a22e83a6 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -250,6 +250,7 @@ static const struct inode_operations shmem_dir_inode_operations;
 static const struct inode_operations shmem_special_inode_operations;
 static const struct vm_operations_struct shmem_vm_ops;
 static struct file_system_type shmem_fs_type;
+static struct kobject *shmem_root;
 
 bool vma_is_shmem(struct vm_area_struct *vma)
 {
@@ -3582,17 +3583,44 @@ static int shmem_show_options(struct seq_file *seq, struct dentry *root)
 	return 0;
 }
 
+#if defined(CONFIG_SYSFS)
+#define TMPFS_SB_ATTR_RO(name)	\
+	static struct kobj_attribute tmpfs_sb_attr_##name = __ATTR_RO(name)
+
+static struct attribute *tmpfs_attrs[] = {
+	NULL
+};
+ATTRIBUTE_GROUPS(tmpfs);
+#endif /* CONFIG_SYSFS */
+
 #endif /* CONFIG_TMPFS */
 
-static void shmem_put_super(struct super_block *sb)
+static void tmpfs_sb_release(struct kobject *kobj)
 {
-	struct shmem_sb_info *sbinfo = SHMEM_SB(sb);
+	struct shmem_sb_info *sbinfo =
+		container_of(kobj, struct shmem_sb_info, kobj);
 
 	free_percpu(sbinfo->ino_batch);
 	percpu_counter_destroy(&sbinfo->used_blocks);
 	mpol_put(sbinfo->mpol);
 	kfree(sbinfo);
+}
+
+static struct kobj_type tmpfs_sb_ktype = {
+#if defined(CONFIG_TMPFS) && defined(CONFIG_SYSFS)
+	.default_groups = tmpfs_groups,
+#endif
+	.sysfs_ops	= &kobj_sysfs_ops,
+	.release	= tmpfs_sb_release,
+};
+
+static void shmem_put_super(struct super_block *sb)
+{
+	struct shmem_sb_info *sbinfo = SHMEM_SB(sb);
+
 	sb->s_fs_info = NULL;
+
+	kobject_put(&sbinfo->kobj);
 }
 
 static int shmem_fill_super(struct super_block *sb, struct fs_context *fc)
@@ -3608,6 +3636,7 @@ static int shmem_fill_super(struct super_block *sb, struct fs_context *fc)
 		return -ENOMEM;
 
 	sb->s_fs_info = sbinfo;
+	kobject_init(&sbinfo->kobj, &tmpfs_sb_ktype);
 
 #ifdef CONFIG_TMPFS
 	/*
@@ -3673,6 +3702,11 @@ static int shmem_fill_super(struct super_block *sb, struct fs_context *fc)
 	sb->s_root = d_make_root(inode);
 	if (!sb->s_root)
 		goto failed;
+
+	if (!(sb->s_flags & SB_NOUSER) &&
+	    kobject_add(&sbinfo->kobj, shmem_root, "%d", MINOR(sb->s_dev)))
+		goto failed;
+
 	return 0;
 
 failed:
@@ -3889,11 +3923,15 @@ int __init shmem_init(void)
 		goto out2;
 	}
 
+	shmem_root = kobject_create_and_add("tmpfs", fs_kobj);
+	if (!shmem_root)
+		goto out1;
+
 	shm_mnt = kern_mount(&shmem_fs_type);
 	if (IS_ERR(shm_mnt)) {
 		error = PTR_ERR(shm_mnt);
 		pr_err("Could not kern_mount tmpfs\n");
-		goto out1;
+		goto put_kobj;
 	}
 
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
@@ -3904,6 +3942,8 @@ int __init shmem_init(void)
 #endif
 	return 0;
 
+put_kobj:
+	kobject_put(shmem_root);
 out1:
 	unregister_filesystem(&shmem_fs_type);
 out2:
-- 
2.35.1
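
Since the directory name is the minor number of the superblock's device, a
monitor can presumably map a mount point to its sysfs directory with stat().
The sketch below assumes the /sys/fs/tmpfs/<minor> layout proposed in this
series and that the path given is on a tmpfs mount that is not SB_NOUSER;
both helper names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/sysmacros.h>
#include <sys/types.h>

/*
 * Format the sysfs directory for a device number, following the layout
 * proposed in this series.  Returns 0 on success, -1 if 'buf' is too small.
 */
int tmpfs_sysfs_dir_for_dev(dev_t dev, char *buf, size_t len)
{
	int n;

	assert(buf != NULL && len > 0);
	n = snprintf(buf, len, "/sys/fs/tmpfs/%u", minor(dev));
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/*
 * Same, starting from any path inside the mount.  The caller is assumed
 * to have checked that the path really is on a tmpfs.
 */
int tmpfs_sysfs_dir_for_path(const char *mnt, char *buf, size_t len)
{
	struct stat st;

	if (stat(mnt, &st) != 0)
		return -1;
	return tmpfs_sysfs_dir_for_dev(st.st_dev, buf, len);
}
```

For example, a tmpfs whose st_dev decodes to major 0 and minor 35 would map
to /sys/fs/tmpfs/35 under this scheme.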



* [PATCH v3 3/3] shmem: Expose space and accounting error count
  2022-04-18 21:37 [PATCH v3 0/3] shmem: Allow userspace monitoring of tmpfs for lack of space Gabriel Krisman Bertazi
  2022-04-18 21:37 ` [PATCH v3 1/3] shmem: Keep track of out-of-memory and out-of-space errors Gabriel Krisman Bertazi
  2022-04-18 21:37 ` [PATCH v3 2/3] shmem: Introduce /sys/fs/tmpfs support Gabriel Krisman Bertazi
@ 2022-04-18 21:37 ` Gabriel Krisman Bertazi
  2022-04-19  3:42 ` [PATCH v3 0/3] shmem: Allow userspace monitoring of tmpfs for lack of space Andrew Morton
  3 siblings, 0 replies; 15+ messages in thread
From: Gabriel Krisman Bertazi @ 2022-04-18 21:37 UTC (permalink / raw)
  To: hughd, akpm, amir73il
  Cc: viro, Gabriel Krisman Bertazi, kernel, Khazhismel Kumykov,
	Linux MM, linux-fsdevel

Exposing these shmem counters through sysfs is particularly useful for
container provisioning, to allow administrators to differentiate between
insufficiently provisioned fs size vs. running out of memory.

Suggested-by: Amir Goldstein <amir73il@gmail.com>
Suggested-by: Khazhy Kumykov <khazhy@google.com>
Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
---
 Documentation/ABI/testing/sysfs-fs-tmpfs | 13 ++++++++++++
 mm/shmem.c                               | 25 ++++++++++++++++++++++++
 2 files changed, 38 insertions(+)
 create mode 100644 Documentation/ABI/testing/sysfs-fs-tmpfs

diff --git a/Documentation/ABI/testing/sysfs-fs-tmpfs b/Documentation/ABI/testing/sysfs-fs-tmpfs
new file mode 100644
index 000000000000..d32b90949710
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-fs-tmpfs
@@ -0,0 +1,13 @@
+What:		/sys/fs/tmpfs/<disk>/acct_errors
+Date:		March 2022
+Contact:	"Gabriel Krisman Bertazi" <krisman@collabora.com>
+Description:
+		Track the number of IO errors caused by lack of memory to
+		perform the allocation of a tmpfs block.
+
+What:		/sys/fs/tmpfs/<disk>/space_errors
+Date:		March 2022
+Contact:	"Gabriel Krisman Bertazi" <krisman@collabora.com>
+Description:
+		Track the number of IO errors caused by lack of space
+		in the filesystem to perform the allocation of a tmpfs block.
diff --git a/mm/shmem.c b/mm/shmem.c
index 8fe4a22e83a6..5c665b955ceb 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -214,6 +214,7 @@ static inline bool shmem_inode_acct_block(struct inode *inode, long pages)
 
 	if (shmem_acct_block(info->flags, pages)) {
 		sbinfo->acct_errors += 1;
+		sysfs_notify(&sbinfo->kobj, NULL, "acct_errors");
 		return false;
 	}
 
@@ -228,6 +229,7 @@ static inline bool shmem_inode_acct_block(struct inode *inode, long pages)
 
 unacct:
 	sbinfo->space_errors += 1;
+	sysfs_notify(&sbinfo->kobj, NULL, "space_errors");
 	shmem_unacct_blocks(info->flags, pages);
 	return false;
 }
@@ -3584,10 +3586,33 @@ static int shmem_show_options(struct seq_file *seq, struct dentry *root)
 }
 
 #if defined(CONFIG_SYSFS)
+static ssize_t acct_errors_show(struct kobject *kobj,
+				struct kobj_attribute *attr, char *page)
+{
+	struct shmem_sb_info *sbinfo =
+		container_of(kobj, struct shmem_sb_info, kobj);
+
+	return sysfs_emit(page, "%lu\n", sbinfo->acct_errors);
+}
+
+static ssize_t space_errors_show(struct kobject *kobj,
+				 struct kobj_attribute *attr, char *page)
+{
+	struct shmem_sb_info *sbinfo =
+		container_of(kobj, struct shmem_sb_info, kobj);
+
+	return sysfs_emit(page, "%lu\n", sbinfo->space_errors);
+}
+
 #define TMPFS_SB_ATTR_RO(name)	\
 	static struct kobj_attribute tmpfs_sb_attr_##name = __ATTR_RO(name)
 
+TMPFS_SB_ATTR_RO(acct_errors);
+TMPFS_SB_ATTR_RO(space_errors);
+
 static struct attribute *tmpfs_attrs[] = {
+	&tmpfs_sb_attr_acct_errors.attr,
+	&tmpfs_sb_attr_space_errors.attr,
 	NULL
 };
 ATTRIBUTE_GROUPS(tmpfs);
-- 
2.35.1
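
Because the patch calls sysfs_notify() on each error, a userspace monitor
should be able to sleep in poll() instead of busy-polling.  The sketch below
follows the usual sysfs convention (read the attribute once, wait for
POLLPRI | POLLERR, then seek back to offset 0 and read again); the helper
names are hypothetical and error handling is kept minimal.

```c
#include <assert.h>
#include <fcntl.h>
#include <poll.h>
#include <stdlib.h>
#include <unistd.h>

/* Re-read an already open attribute from offset 0 and parse it. */
int read_counter_fd(int fd, unsigned long *value)
{
	char buf[32];
	ssize_t n;

	assert(value != NULL);
	if (lseek(fd, 0, SEEK_SET) < 0)
		return -1;
	n = read(fd, buf, sizeof(buf) - 1);
	if (n <= 0)
		return -1;
	buf[n] = '\0';
	*value = strtoul(buf, NULL, 10);
	return 0;
}

/*
 * Sleep until the attribute is signalled or 'timeout_ms' expires.
 * Returns 1 if signalled, 0 on timeout and -1 on error.  The attribute
 * must have been read at least once before calling this.
 */
int wait_for_change(int fd, int timeout_ms)
{
	struct pollfd pfd = { .fd = fd, .events = POLLPRI | POLLERR };
	int ret = poll(&pfd, 1, timeout_ms);

	if (ret < 0)
		return -1;
	return ret > 0 ? 1 : 0;
}
```

A monitor would open the space_errors file once, call read_counter_fd(),
then loop on wait_for_change() followed by another read_counter_fd().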



* Re: [PATCH v3 0/3] shmem: Allow userspace monitoring of tmpfs for lack of space.
  2022-04-18 21:37 [PATCH v3 0/3] shmem: Allow userspace monitoring of tmpfs for lack of space Gabriel Krisman Bertazi
                   ` (2 preceding siblings ...)
  2022-04-18 21:37 ` [PATCH v3 3/3] shmem: Expose space and accounting error count Gabriel Krisman Bertazi
@ 2022-04-19  3:42 ` Andrew Morton
  2022-04-19 15:28   ` Gabriel Krisman Bertazi
  3 siblings, 1 reply; 15+ messages in thread
From: Andrew Morton @ 2022-04-19  3:42 UTC (permalink / raw)
  To: Gabriel Krisman Bertazi
  Cc: hughd, amir73il, viro, kernel, Khazhismel Kumykov, Linux MM,
	linux-fsdevel

On Mon, 18 Apr 2022 17:37:10 -0400 Gabriel Krisman Bertazi <krisman@collabora.com> wrote:

> When provisioning containerized applications, multiple very small tmpfs

"files"?

> are used, for which one cannot always predict the proper file system
> size ahead of time.  We want to be able to reliably monitor filesystems
> for ENOSPC errors, without depending on the application being executed
> reporting the ENOSPC after a failure.

Well that sucks.  We need a kernel-side workaround for applications
that fail to check and report storage errors?

We could do this for every syscall in the kernel.  What's special about
tmpfs in this regard?  

Please provide additional justification and usage examples for such an
extraordinary thing.

>  It is also not enough to watch
> statfs since that information might be ephemeral (say the application
> recovers by deleting data, the issue can get lost).

We could fix the apps?  Heck, you could patch libc's write() to the same
effect.

>  For this use case,
> it is also interesting to differentiate IO errors caused by lack of
> virtual memory from lack of FS space.

More details, please.  Why interesting?  What actions can the system
operator take based upon this information?

Whatever that action is, I see no user-facing documentation which
guides the user on how to take advantage of this?




* Re: [PATCH v3 0/3] shmem: Allow userspace monitoring of tmpfs for lack of space.
  2022-04-19  3:42 ` [PATCH v3 0/3] shmem: Allow userspace monitoring of tmpfs for lack of space Andrew Morton
@ 2022-04-19 15:28   ` Gabriel Krisman Bertazi
  2022-04-21  5:33     ` Amir Goldstein
  0 siblings, 1 reply; 15+ messages in thread
From: Gabriel Krisman Bertazi @ 2022-04-19 15:28 UTC (permalink / raw)
  To: Andrew Morton
  Cc: hughd, amir73il, viro, kernel, Khazhismel Kumykov, Linux MM,
	linux-fsdevel

Andrew Morton <akpm@linux-foundation.org> writes:

Hi Andrew,

> On Mon, 18 Apr 2022 17:37:10 -0400 Gabriel Krisman Bertazi <krisman@collabora.com> wrote:
>
>> When provisioning containerized applications, multiple very small tmpfs
>
> "files"?

Actually, filesystems.  In cloud environments, we have several small
tmpfs associated with containerized tasks.

>> are used, for which one cannot always predict the proper file system
>> size ahead of time.  We want to be able to reliably monitor filesystems
>> for ENOSPC errors, without depending on the application being executed
>> reporting the ENOSPC after a failure.
>
> Well that sucks.  We need a kernel-side workaround for applications
> that fail to check and report storage errors?
>
> We could do this for every syscall in the kernel.  What's special about
> tmpfs in this regard?
>
> Please provide additional justification and usage examples for such an
> extraordinary thing.

For a cloud provider deploying containerized applications, they might
not control the application, so patching userspace wouldn't be a
solution.  More importantly - and why this is shmem specific -
they want to differentiate between a user getting ENOSPC due to
insufficiently provisioned fs size, vs. due to running out of memory in
a container, both of which return ENOSPC to the process.

A system administrator can then use this feature to monitor a fleet of
containerized applications in a uniform way, detect provisioning issues
caused by different reasons and address the deployment.

I originally submitted this as a new fanotify event, but given the
specificity of shmem, Amir suggested the interface I'm implementing
here.  We've raised this discussion originally here:

https://lore.kernel.org/linux-mm/CACGdZYLLCqzS4VLUHvzYG=rX3SEJaG7Vbs8_Wb_iUVSvXsqkxA@mail.gmail.com/

> Whatever that action is, I see no user-facing documentation which
> guides the user on how to take advantage of this?

I can follow up with a new version with documentation, if we agree this
feature makes sense.

Thanks,

-- 
Gabriel Krisman Bertazi


* Re: [PATCH v3 2/3] shmem: Introduce /sys/fs/tmpfs support
  2022-04-18 21:37 ` [PATCH v3 2/3] shmem: Introduce /sys/fs/tmpfs support Gabriel Krisman Bertazi
  (?)
@ 2022-04-22  9:54 ` Dan Carpenter
  -1 siblings, 0 replies; 15+ messages in thread
From: kernel test robot @ 2022-04-20  0:10 UTC (permalink / raw)
  To: kbuild

[-- Attachment #1: Type: text/plain, Size: 5071 bytes --]

CC: kbuild-all(a)lists.01.org
BCC: lkp(a)intel.com
In-Reply-To: <20220418213713.273050-3-krisman@collabora.com>
References: <20220418213713.273050-3-krisman@collabora.com>
TO: Gabriel Krisman Bertazi <krisman@collabora.com>
TO: hughd(a)google.com
TO: akpm(a)linux-foundation.org
TO: amir73il(a)gmail.com
CC: viro(a)zeniv.linux.org.uk
CC: Gabriel Krisman Bertazi <krisman@collabora.com>
CC: kernel(a)collabora.com
CC: Khazhismel Kumykov <khazhy@google.com>
CC: Linux MM <linux-mm@kvack.org>
CC: "linux-fsdevel" <linux-fsdevel@vger.kernel.org>

Hi Gabriel,

I love your patch! Perhaps something to improve:

[auto build test WARNING on linus/master]
[also build test WARNING on v5.18-rc3 next-20220419]
[cannot apply to hnaz-mm/master]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/intel-lab-lkp/linux/commits/Gabriel-Krisman-Bertazi/shmem-Allow-userspace-monitoring-of-tmpfs-for-lack-of-space/20220419-054011
base:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git b2d229d4ddb17db541098b83524d901257e93845
:::::: branch date: 26 hours ago
:::::: commit date: 26 hours ago
config: ia64-randconfig-m031-20220418 (https://download.01.org/0day-ci/archive/20220420/202204200819.72S8HjcF-lkp(a)intel.com/config)
compiler: ia64-linux-gcc (GCC) 11.2.0

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>

smatch warnings:
mm/shmem.c:3965 shmem_init() warn: passing zero to 'ERR_PTR'

vim +/ERR_PTR +3965 mm/shmem.c

^1da177e4c3f41 Linus Torvalds          2005-04-16  3927  
41ffe5d5ceef7f Hugh Dickins            2011-08-03  3928  int __init shmem_init(void)
^1da177e4c3f41 Linus Torvalds          2005-04-16  3929  {
^1da177e4c3f41 Linus Torvalds          2005-04-16  3930  	int error;
^1da177e4c3f41 Linus Torvalds          2005-04-16  3931  
9a8ec03ed022b7 weiping zhang           2017-11-15  3932  	shmem_init_inodecache();
^1da177e4c3f41 Linus Torvalds          2005-04-16  3933  
41ffe5d5ceef7f Hugh Dickins            2011-08-03  3934  	error = register_filesystem(&shmem_fs_type);
^1da177e4c3f41 Linus Torvalds          2005-04-16  3935  	if (error) {
1170532bb49f94 Joe Perches             2016-03-17  3936  		pr_err("Could not register tmpfs\n");
^1da177e4c3f41 Linus Torvalds          2005-04-16  3937  		goto out2;
^1da177e4c3f41 Linus Torvalds          2005-04-16  3938  	}
95dc112a5770dc Greg Kroah-Hartman      2005-06-20  3939  
e43933b9793ad3 Gabriel Krisman Bertazi 2022-04-18  3940  	shmem_root = kobject_create_and_add("tmpfs", fs_kobj);
e43933b9793ad3 Gabriel Krisman Bertazi 2022-04-18  3941  	if (!shmem_root)
e43933b9793ad3 Gabriel Krisman Bertazi 2022-04-18  3942  		goto out1;
e43933b9793ad3 Gabriel Krisman Bertazi 2022-04-18  3943  
ca4e05195dbc25 Al Viro                 2013-08-31  3944  	shm_mnt = kern_mount(&shmem_fs_type);
^1da177e4c3f41 Linus Torvalds          2005-04-16  3945  	if (IS_ERR(shm_mnt)) {
^1da177e4c3f41 Linus Torvalds          2005-04-16  3946  		error = PTR_ERR(shm_mnt);
1170532bb49f94 Joe Perches             2016-03-17  3947  		pr_err("Could not kern_mount tmpfs\n");
e43933b9793ad3 Gabriel Krisman Bertazi 2022-04-18  3948  		goto put_kobj;
^1da177e4c3f41 Linus Torvalds          2005-04-16  3949  	}
5a6e75f8110c97 Kirill A. Shutemov      2016-07-26  3950  
396bcc5299c281 Matthew Wilcox (Oracle  2020-04-06  3951) #ifdef CONFIG_TRANSPARENT_HUGEPAGE
435c0b87d661da Kirill A. Shutemov      2017-08-25  3952  	if (has_transparent_hugepage() && shmem_huge > SHMEM_HUGE_DENY)
5a6e75f8110c97 Kirill A. Shutemov      2016-07-26  3953  		SHMEM_SB(shm_mnt->mnt_sb)->huge = shmem_huge;
5a6e75f8110c97 Kirill A. Shutemov      2016-07-26  3954  	else
5e6e5a12a44ca5 Hugh Dickins            2021-09-02  3955  		shmem_huge = SHMEM_HUGE_NEVER; /* just in case it was patched */
5a6e75f8110c97 Kirill A. Shutemov      2016-07-26  3956  #endif
^1da177e4c3f41 Linus Torvalds          2005-04-16  3957  	return 0;
^1da177e4c3f41 Linus Torvalds          2005-04-16  3958  
e43933b9793ad3 Gabriel Krisman Bertazi 2022-04-18  3959  put_kobj:
e43933b9793ad3 Gabriel Krisman Bertazi 2022-04-18  3960  	kobject_put(shmem_root);
^1da177e4c3f41 Linus Torvalds          2005-04-16  3961  out1:
41ffe5d5ceef7f Hugh Dickins            2011-08-03  3962  	unregister_filesystem(&shmem_fs_type);
^1da177e4c3f41 Linus Torvalds          2005-04-16  3963  out2:
41ffe5d5ceef7f Hugh Dickins            2011-08-03  3964  	shmem_destroy_inodecache();
^1da177e4c3f41 Linus Torvalds          2005-04-16 @3965  	shm_mnt = ERR_PTR(error);
^1da177e4c3f41 Linus Torvalds          2005-04-16  3966  	return error;
^1da177e4c3f41 Linus Torvalds          2005-04-16  3967  }
853ac43ab194f5 Matt Mackall            2009-01-06  3968  
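
The warning can be reproduced with a small userspace model.  The helpers
below mirror the well-known definitions in include/linux/err.h but are
reimplemented here under lowercase, hypothetical names; init_model() is a
simplified stand-in for the flagged path in shmem_init(), and assigning
-ENOMEM before the jump is only one possible fix, not something proposed in
this thread.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_ERRNO 4095

/* Userspace copies of the kernel's ERR_PTR(), PTR_ERR() and IS_ERR(). */
void *err_ptr(long error)
{
	return (void *)error;
}

long ptr_err(const void *ptr)
{
	return (long)ptr;
}

bool is_err(const void *ptr)
{
	return (unsigned long)ptr >= (unsigned long)-MAX_ERRNO;
}

static int dummy_mount;	/* stands in for a successfully created mount */

/*
 * Model of the flagged path: 'error' is 0 after a successful
 * register_filesystem(), so a failed kobject creation jumps to the
 * cleanup code with error == 0 unless it is assigned first.
 */
int init_model(bool kobject_created, bool assign_enomem, void **mnt)
{
	int error = 0;

	assert(mnt != NULL);
	if (kobject_created) {
		*mnt = &dummy_mount;
		return 0;
	}
	if (assign_enomem)
		error = -ENOMEM;	/* possible fix, an assumption */
	*mnt = err_ptr(error);
	return error;
}
```

With error left at 0, the model returns success and stores a NULL pointer
that is_err() does not recognise as an error, which is the situation smatch
is pointing at.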

-- 
0-DAY CI Kernel Test Service
https://01.org/lkp


* Re: [PATCH v3 0/3] shmem: Allow userspace monitoring of tmpfs for lack of space.
  2022-04-19 15:28   ` Gabriel Krisman Bertazi
@ 2022-04-21  5:33     ` Amir Goldstein
  2022-04-21 22:37       ` Gabriel Krisman Bertazi
  2022-04-21 23:19       ` Khazhy Kumykov
  0 siblings, 2 replies; 15+ messages in thread
From: Amir Goldstein @ 2022-04-21  5:33 UTC (permalink / raw)
  To: Gabriel Krisman Bertazi
  Cc: Andrew Morton, Hugh Dickins, Al Viro, kernel, Khazhismel Kumykov,
	Linux MM, linux-fsdevel, Theodore Tso

On Tue, Apr 19, 2022 at 6:29 PM Gabriel Krisman Bertazi
<krisman@collabora.com> wrote:
>
> Andrew Morton <akpm@linux-foundation.org> writes:
>
> Hi Andrew,
>
> > On Mon, 18 Apr 2022 17:37:10 -0400 Gabriel Krisman Bertazi <krisman@collabora.com> wrote:
> >
> >> When provisioning containerized applications, multiple very small tmpfs
> >
> > "files"?
>
> Actually, filesystems.  In cloud environments, we have several small
> tmpfs associated with containerized tasks.
>
> >> are used, for which one cannot always predict the proper file system
> >> size ahead of time.  We want to be able to reliably monitor filesystems
> >> for ENOSPC errors, without depending on the application being executed
> >> reporting the ENOSPC after a failure.
> >
> > Well that sucks.  We need a kernel-side workaround for applications
> > that fail to check and report storage errors?
> >
> > We could do this for every syscall in the kernel.  What's special about
> > tmpfs in this regard?
> >
> > Please provide additional justification and usage examples for such an
> > extraordinary thing.
>
> For a cloud provider deploying containerized applications, they might
> not control the application, so patching userspace wouldn't be a
> solution.  More importantly - and why this is shmem specific -
> they want to differentiate between a user getting ENOSPC due to
> insufficiently provisioned fs size, vs. due to running out of memory in
> a container, both of which return ENOSPC to the process.
>

Isn't there already a per memcg OOM handler that could be used by
orchestrator to detect the latter?

> A system administrator can then use this feature to monitor a fleet of
> containerized applications in a uniform way, detect provisioning issues
> caused by different reasons and address the deployment.
>
> I originally submitted this as a new fanotify event, but given the
> specificity of shmem, Amir suggested the interface I'm implementing
> here.  We've raised this discussion originally here:
>
> https://lore.kernel.org/linux-mm/CACGdZYLLCqzS4VLUHvzYG=rX3SEJaG7Vbs8_Wb_iUVSvXsqkxA@mail.gmail.com/
>

To put things in context, the points I was trying to make in this
discussion are:

1. Why isn't monitoring with statfs() a sufficient solution? And more
    specifically, the shared disk space provisioning problem does not sound
    very tmpfs specific to me.
    It is a well known issue for thin provisioned storage in environments
    with shared resources such as the ones that you describe.
2. OTOH, exporting internal fs stats via /sys/fs for debugging, health
    monitoring or whatever seems legit to me and is widely practiced by
    other fs, so exposing those tmpfs stats as this patch set is doing
    seems fine to me.

Another point worth considering in favor of /sys/fs/tmpfs -
since tmpfs is FS_USERNS_MOUNT, the ability of sysadmin to monitor all
tmpfs mounts in the system and their usage is limited.

Therefore, having a central way to enumerate all tmpfs instances in the
system like blockdev fs instances and like fuse fs instances, does not
sound like a terrible idea in general.

> > Whatever that action is, I see no user-facing documentation which
> > guides the user on how to take advantage of this?
>
> I can follow up with a new version with documentation, if we agree this
> feature makes sense.
>

Given the time of year and participants involved, shall we continue
this discussion in LSFMM?

I am not sure if this even requires a shared FS/MM session, but I
don't mind trying to allocate a shared FS/MM slot if Andrew and MM guys
are interested to take part in the discussion.

As long as memcg is able to report OOM to the orchestrator, the problem does not
sound very tmpfs specific to me.

As Ted explained, cloud providers (for some reason) charge by disk size and not
by disk usage, so also for non-tmpfs, online growing the fs on demand could
prove to be a rewarding practice for cloud applications.

Thanks,
Amir.


* Re: [PATCH v3 0/3] shmem: Allow userspace monitoring of tmpfs for lack of space.
  2022-04-21  5:33     ` Amir Goldstein
@ 2022-04-21 22:37       ` Gabriel Krisman Bertazi
  2022-04-21 23:19       ` Khazhy Kumykov
  1 sibling, 0 replies; 15+ messages in thread
From: Gabriel Krisman Bertazi @ 2022-04-21 22:37 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Andrew Morton, Hugh Dickins, Al Viro, kernel, Khazhismel Kumykov,
	Linux MM, linux-fsdevel, Theodore Tso

Amir Goldstein <amir73il@gmail.com> writes:

> On Tue, Apr 19, 2022 at 6:29 PM Gabriel Krisman Bertazi
> <krisman@collabora.com> wrote:
>> > Well that sucks.  We need a kernel-side workaround for applications
>> > that fail to check and report storage errors?
>> >
>> > We could do this for every syscall in the kernel.  What's special about
>> > tmpfs in this regard?
>> >
>> > Please provide additional justification and usage examples for such an
>> > extraordinary thing.
>>
>> For a cloud provider deploying containerized applications, they might
>> not control the application, so patching userspace wouldn't be a
>> solution.  More importantly - and why this is shmem specific -
>> they want to differentiate between a user getting ENOSPC due to
>> insufficiently provisioned fs size, vs. due to running out of memory in
>> a container, both of which return ENOSPC to the process.
>>
>
> Isn't there already a per memcg OOM handler that could be used by
> orchestrator to detect the latter?

Hi Amir,

Thanks for the added context.  I'm actually not sure if an OOM handler
completely solves the latter case.  If shmem_inode_acct_block fails, it
happens before the allocation.  The OOM won't trigger and we won't know
about it, as far as I understand.  I'm not sure it's a real problem for
Google's use case.  Khazhy is the expert on their implementation and
might be able to better discuss it.

I wanna mention that, for the insufficiently-provisioned-fs-size case,
we still can't rely just on statfs.  We need a polling interface -
generic or tmpfs specific - to make sure we don't miss these events, I
think.

Thanks,

-- 
Gabriel Krisman Bertazi


* Re: [PATCH v3 0/3] shmem: Allow userspace monitoring of tmpfs for lack of space.
  2022-04-21  5:33     ` Amir Goldstein
  2022-04-21 22:37       ` Gabriel Krisman Bertazi
@ 2022-04-21 23:19       ` Khazhy Kumykov
  2022-04-22  9:02         ` Amir Goldstein
  1 sibling, 1 reply; 15+ messages in thread
From: Khazhy Kumykov @ 2022-04-21 23:19 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Gabriel Krisman Bertazi, Andrew Morton, Hugh Dickins, Al Viro,
	kernel, Linux MM, linux-fsdevel, Theodore Tso

[-- Attachment #1: Type: text/plain, Size: 5399 bytes --]

On Wed, Apr 20, 2022 at 10:34 PM Amir Goldstein <amir73il@gmail.com> wrote:
>
> On Tue, Apr 19, 2022 at 6:29 PM Gabriel Krisman Bertazi
> <krisman@collabora.com> wrote:
> >
> > Andrew Morton <akpm@linux-foundation.org> writes:
> >
> > Hi Andrew,
> >
> > > On Mon, 18 Apr 2022 17:37:10 -0400 Gabriel Krisman Bertazi <krisman@collabora.com> wrote:
> > >
> > >> When provisioning containerized applications, multiple very small tmpfs
> > >
> > > "files"?
> >
> > Actually, filesystems.  In cloud environments, we have several small
> > tmpfs associated with containerized tasks.
> >
> > >> are used, for which one cannot always predict the proper file system
> > >> size ahead of time.  We want to be able to reliably monitor filesystems
> > >> for ENOSPC errors, without depending on the application being executed
> > >> reporting the ENOSPC after a failure.
> > >
> > > Well that sucks.  We need a kernel-side workaround for applications
> > > that fail to check and report storage errors?
> > >
> > > We could do this for every syscall in the kernel.  What's special about
> > > tmpfs in this regard?
> > >
> > > Please provide additional justification and usage examples for such an
> > > extraordinary thing.
> >
> > For a cloud provider deploying containerized applications, they might
> > not control the application, so patching userspace wouldn't be a
> > solution.  More importantly - and why this is shmem specific -
> > they want to differentiate between a user getting ENOSPC due to
> > insufficiently provisioned fs size, vs. due to running out of memory in
> > a container, both of which return ENOSPC to the process.
> >
>
> Isn't there already a per memcg OOM handler that could be used by
> orchestrator to detect the latter?
>
> > A system administrator can then use this feature to monitor a fleet of
> > containerized applications in a uniform way, detect provisioning issues
> > caused by different reasons and address the deployment.
> >
> > I originally submitted this as a new fanotify event, but given the
> > specificity of shmem, Amir suggested the interface I'm implementing
> > here.  We've raised this discussion originally here:
> >
> > https://lore.kernel.org/linux-mm/CACGdZYLLCqzS4VLUHvzYG=rX3SEJaG7Vbs8_Wb_iUVSvXsqkxA@mail.gmail.com/
> >
>
> To put things in context, the points I was trying to make in this
> discussion are:
>
> 1. Why isn't monitoring with statfs() a sufficient solution? and more
>     specifically, the shared disk space provisioning problem does not sound
>     very tmpfs specific to me.
>     It is a well known issue for thin provisioned storage in environments
>     with shared resources as the ones that you describe

I think this solves a different problem: to my understanding statfs
polling is useful for determining if a long lived, slowly growing FS
is approaching its limits - the tmpfs here are generally short lived,
and may be intentionally running close to limits (e.g. if they "know"
exactly how much they need, and don't expect to write any more than
that). In this case, the limits are there to guard against runaway
(and assist with scheduling), so "monitor and increase limits
periodically" isn't appropriate.

It's meant just to make it easier to distinguish between "tmpfs write
failed due to OOM" and "tmpfs write failed because you exceeded tmpfs'
max size" (what makes tmpfs "special" is that tmpfs, for good reason,
returns ENOSPC for both of these situations to the user). For a small
task a user could easily go from 0% to full, or OOM, rather quickly,
so statfs polling would likely miss the event. The orchestrator can,
when the task fails, easily (and reliably) look at this statistic to
determine if a user exceeded the tmpfs limit.
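
To illustrate the limitation, here is a minimal userspace sketch of what a statfs-style poll can observe. The helper name fs_used_percent() is hypothetical and not part of this series; it only reports usage at the instant it is called, so a task that fills its tmpfs and then fails or frees space between two polls leaves no trace.

```c
#include <sys/statvfs.h>

/*
 * Hypothetical helper: return the used space of the filesystem that
 * contains 'path' as a percentage in the range 0..100, or -1 if
 * statvfs() fails.  This is a point-in-time snapshot only.
 */
static int fs_used_percent(const char *path)
{
	struct statvfs st;

	if (statvfs(path, &st) != 0)
		return -1;

	/* Avoid dividing by zero on filesystems that report no blocks. */
	if (st.f_blocks == 0)
		return 0;

	return (int)(((st.f_blocks - st.f_bfree) * 100) / st.f_blocks);
}
```

A monitor would have to call this in a loop, and any ENOSPC that happens and clears between two calls would go unnoticed, which is why an event counter or notification is being discussed instead.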

(I do see the parallel here to thin provisioned storage - "exceeded
your individual budget" vs. "underlying overcommitted system ran out
of bytes")

> 2. OTOH, exporting internal fs stats via /sys/fs for debugging, health
>     monitoring or whatever seems legit to me and is widely practiced by
>     other fs, so exposing those tmpfs stats as this patch set is doing
>     seems fine to me.
>
> Another point worth considering in favor of /sys/fs/tmpfs -
> since tmpfs is FS_USERNS_MOUNT, the ability of sysadmin to monitor all
> tmpfs mounts in the system and their usage is limited.
>
> Therefore, having a central way to enumerate all tmpfs instances in the
> system like blockdev fs instances and like fuse fs instances, does not
> sound like a terrible idea in general.
>
> > > Whatever that action is, I see no user-facing documentation which
> > > guides the user info how to take advantage of this?
> >
> > I can follow up with a new version with documentation, if we agree this
> > feature makes sense.
> >
>
> Given the time of year and participants involved, shall we continue
> this discussion in LSFMM?
>
> I am not sure if this even requires a shared FS/MM session, but I
> don't mind trying to allocate a shared FS/MM slot if Andrew and MM guys
> are interested to take part in the discussion.
>
> As long as memcg is able to report OOM to the orchestrator, the problem does not
> sound very tmpfs specific to me.
>
> As Ted explained, cloud providers (for some reason) charge by disk size and not
> by disk usage, so also for non-tmpfs, online growing the fs on demand could
> prove to be a rewarding practice for cloud applications.
>
> Thanks,
> Amir.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v3 0/3] shmem: Allow userspace monitoring of tmpfs for lack of space.
  2022-04-21 23:19       ` Khazhy Kumykov
@ 2022-04-22  9:02         ` Amir Goldstein
  2022-05-05 21:16           ` Gabriel Krisman Bertazi
  0 siblings, 1 reply; 15+ messages in thread
From: Amir Goldstein @ 2022-04-22  9:02 UTC (permalink / raw)
  To: Khazhy Kumykov
  Cc: Gabriel Krisman Bertazi, Andrew Morton, Hugh Dickins, Al Viro,
	kernel, Linux MM, linux-fsdevel, Theodore Tso

On Fri, Apr 22, 2022 at 2:19 AM Khazhy Kumykov <khazhy@google.com> wrote:
>
> On Wed, Apr 20, 2022 at 10:34 PM Amir Goldstein <amir73il@gmail.com> wrote:
> >
> > On Tue, Apr 19, 2022 at 6:29 PM Gabriel Krisman Bertazi
> > <krisman@collabora.com> wrote:
> > >
> > > Andrew Morton <akpm@linux-foundation.org> writes:
> > >
> > > Hi Andrew,
> > >
> > > > On Mon, 18 Apr 2022 17:37:10 -0400 Gabriel Krisman Bertazi <krisman@collabora.com> wrote:
> > > >
> > > >> When provisioning containerized applications, multiple very small tmpfs
> > > >
> > > > "files"?
> > >
> > > Actually, filesystems.  In cloud environments, we have several small
> > > tmpfs associated with containerized tasks.
> > >
> > > >> are used, for which one cannot always predict the proper file system
> > > >> size ahead of time.  We want to be able to reliably monitor filesystems
> > > >> for ENOSPC errors, without depending on the application being executed
> > > >> reporting the ENOSPC after a failure.
> > > >
> > > > Well that sucks.  We need a kernel-side workaround for applications
> > > > that fail to check and report storage errors?
> > > >
> > > > We could do this for every syscall in the kernel.  What's special about
> > > > tmpfs in this regard?
> > > >
> > > > Please provide additional justification and usage examples for such an
> > > > extraordinary thing.
> > >
> > > For a cloud provider deploying containerized applications, they might
> > > not control the application, so patching userspace wouldn't be a
> > > solution.  More importantly - and why this is shmem specific -
> > > they want to differentiate between a user getting ENOSPC due to
> > > insufficiently provisioned fs size, vs. due to running out of memory in
> > > a container, both of which return ENOSPC to the process.
> > >
> >
> > Isn't there already a per memcg OOM handler that could be used by
> > orchestrator to detect the latter?
> >
> > > A system administrator can then use this feature to monitor a fleet of
> > > containerized applications in a uniform way, detect provisioning issues
> > > caused by different reasons and address the deployment.
> > >
> > > I originally submitted this as a new fanotify event, but given the
> > > specificity of shmem, Amir suggested the interface I'm implementing
> > > here.  We've raised this discussion originally here:
> > >
> > > https://lore.kernel.org/linux-mm/CACGdZYLLCqzS4VLUHvzYG=rX3SEJaG7Vbs8_Wb_iUVSvXsqkxA@mail.gmail.com/
> > >
> >
> > To put things in context, the points I was trying to make in this
> > discussion are:
> >
> > 1. Why isn't monitoring with statfs() a sufficient solution? and more
> >     specifically, the shared disk space provisioning problem does not sound
> >     very tmpfs specific to me.
> >     It is a well known issue for thin provisioned storage in environments
> >     with shared resources as the ones that you describe
>
> I think this solves a different problem: to my understanding statfs
> polling is useful for determining if a long lived, slowly growing FS
> is approaching its limits - the tmpfs here are generally short lived,
> and may be intentionally running close to limits (e.g. if they "know"
> exactly how much they need, and don't expect to write any more than
> that). In this case, the limits are there to guard against runaway
> (and assist with scheduling), so "monitor and increase limits
> periodically" isn't appropriate.
>
> It's meant just to make it easier to distinguish between "tmpfs write
> failed due to OOM" and "tmpfs write failed because you exceeded tmpfs'
> max size" (what makes tmpfs "special" is that tmpfs, for good reason,
> returns ENOSPC for both of these situations to the user). For a small

Maybe it's for a good reason, but it clearly is not the desired behavior
in your use case. Perhaps what is needed here is a way for the user to
opt in to a different OOM behavior from shmem using a mount option?
Would that be enough to cover your use case?

> task a user could easily go from 0% to full, or OOM, rather quickly,
> so statfs polling would likely miss the event. The orchestrator can,
> when the task fails, easily (and reliably) look at this statistic to
> determine if a user exceeded the tmpfs limit.
>
> (I do see the parallel here to thin provisioned storage - "exceeded
> your individual budget" vs. "underlying overcommitted system ran out
> of bytes")

Right, and in this case, the application gets a different error in case
of "underlying space overcommitted", usually EIO. That's why I think that
opting in to this same behavior could make sense for tmpfs.

We can even consider shutdown behavior for shmem in that case, but
that is up to whoever may be interested in that kind of behavior.
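
As a rough sketch of what this would look like from the application side, assuming a tmpfs mounted with such an opt-in option (no such option exists at this point in the thread), a write failure could be classified by errno alone. The enum and function names are hypothetical, and EIO can of course have other causes on other filesystems.

```c
#include <errno.h>

/* Hypothetical classification of a failed tmpfs write. */
enum tmpfs_failure {
	TMPFS_FS_FULL,		/* ENOSPC: the tmpfs size limit was hit */
	TMPFS_OVERCOMMIT,	/* EIO: assumed to mean memory ran out */
	TMPFS_OTHER,		/* anything else */
};

/*
 * Map an errno value from a failed write() on an opted-in tmpfs to a
 * failure class.  Without the opt-in, both conditions report ENOSPC
 * and cannot be told apart this way.
 */
static enum tmpfs_failure classify_tmpfs_error(int err)
{
	switch (err) {
	case ENOSPC:
		return TMPFS_FS_FULL;
	case EIO:
		return TMPFS_OVERCOMMIT;
	default:
		return TMPFS_OTHER;
	}
}
```

This only helps the process that issued the write; an external orchestrator would still need some out-of-band signal.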

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v3 2/3] shmem: Introduce /sys/fs/tmpfs support
@ 2022-04-22  9:54 ` Dan Carpenter
  0 siblings, 0 replies; 15+ messages in thread
From: Dan Carpenter @ 2022-04-22  9:54 UTC (permalink / raw)
  To: kbuild, Gabriel Krisman Bertazi, hughd, akpm, amir73il
  Cc: lkp, kbuild-all, viro, Gabriel Krisman Bertazi, kernel,
	Khazhismel Kumykov, Linux MM, linux-fsdevel

Hi Gabriel,

url:    https://github.com/intel-lab-lkp/linux/commits/Gabriel-Krisman-Bertazi/shmem-Allow-userspace-monitoring-of-tmpfs-for-lack-of-space/20220419-054011
base:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git b2d229d4ddb17db541098b83524d901257e93845
config: ia64-randconfig-m031-20220418 (https://download.01.org/0day-ci/archive/20220420/202204200819.72S8HjcF-lkp@intel.com/config)
compiler: ia64-linux-gcc (GCC) 11.2.0

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>

smatch warnings:
mm/shmem.c:3965 shmem_init() warn: passing zero to 'ERR_PTR'

vim +/ERR_PTR +3965 mm/shmem.c

41ffe5d5ceef7f Hugh Dickins            2011-08-03  3928  int __init shmem_init(void)
^1da177e4c3f41 Linus Torvalds          2005-04-16  3929  {
^1da177e4c3f41 Linus Torvalds          2005-04-16  3930  	int error;
^1da177e4c3f41 Linus Torvalds          2005-04-16  3931  
9a8ec03ed022b7 weiping zhang           2017-11-15  3932  	shmem_init_inodecache();
^1da177e4c3f41 Linus Torvalds          2005-04-16  3933  
41ffe5d5ceef7f Hugh Dickins            2011-08-03  3934  	error = register_filesystem(&shmem_fs_type);
^1da177e4c3f41 Linus Torvalds          2005-04-16  3935  	if (error) {
1170532bb49f94 Joe Perches             2016-03-17  3936  		pr_err("Could not register tmpfs\n");
^1da177e4c3f41 Linus Torvalds          2005-04-16  3937  		goto out2;
^1da177e4c3f41 Linus Torvalds          2005-04-16  3938  	}
95dc112a5770dc Greg Kroah-Hartman      2005-06-20  3939  
e43933b9793ad3 Gabriel Krisman Bertazi 2022-04-18  3940  	shmem_root = kobject_create_and_add("tmpfs", fs_kobj);
e43933b9793ad3 Gabriel Krisman Bertazi 2022-04-18  3941  	if (!shmem_root)
e43933b9793ad3 Gabriel Krisman Bertazi 2022-04-18  3942  		goto out1;

error = -ENOMEM;

e43933b9793ad3 Gabriel Krisman Bertazi 2022-04-18  3943  
ca4e05195dbc25 Al Viro                 2013-08-31  3944  	shm_mnt = kern_mount(&shmem_fs_type);
^1da177e4c3f41 Linus Torvalds          2005-04-16  3945  	if (IS_ERR(shm_mnt)) {
^1da177e4c3f41 Linus Torvalds          2005-04-16  3946  		error = PTR_ERR(shm_mnt);
1170532bb49f94 Joe Perches             2016-03-17  3947  		pr_err("Could not kern_mount tmpfs\n");
e43933b9793ad3 Gabriel Krisman Bertazi 2022-04-18  3948  		goto put_kobj;
^1da177e4c3f41 Linus Torvalds          2005-04-16  3949  	}
5a6e75f8110c97 Kirill A. Shutemov      2016-07-26  3950  
396bcc5299c281 Matthew Wilcox (Oracle  2020-04-06  3951) #ifdef CONFIG_TRANSPARENT_HUGEPAGE
435c0b87d661da Kirill A. Shutemov      2017-08-25  3952  	if (has_transparent_hugepage() && shmem_huge > SHMEM_HUGE_DENY)
5a6e75f8110c97 Kirill A. Shutemov      2016-07-26  3953  		SHMEM_SB(shm_mnt->mnt_sb)->huge = shmem_huge;
5a6e75f8110c97 Kirill A. Shutemov      2016-07-26  3954  	else
5e6e5a12a44ca5 Hugh Dickins            2021-09-02  3955  		shmem_huge = SHMEM_HUGE_NEVER; /* just in case it was patched */
5a6e75f8110c97 Kirill A. Shutemov      2016-07-26  3956  #endif
^1da177e4c3f41 Linus Torvalds          2005-04-16  3957  	return 0;
^1da177e4c3f41 Linus Torvalds          2005-04-16  3958  
e43933b9793ad3 Gabriel Krisman Bertazi 2022-04-18  3959  put_kobj:
e43933b9793ad3 Gabriel Krisman Bertazi 2022-04-18  3960  	kobject_put(shmem_root);
^1da177e4c3f41 Linus Torvalds          2005-04-16  3961  out1:
41ffe5d5ceef7f Hugh Dickins            2011-08-03  3962  	unregister_filesystem(&shmem_fs_type);
^1da177e4c3f41 Linus Torvalds          2005-04-16  3963  out2:
41ffe5d5ceef7f Hugh Dickins            2011-08-03  3964  	shmem_destroy_inodecache();
^1da177e4c3f41 Linus Torvalds          2005-04-16 @3965  	shm_mnt = ERR_PTR(error);
^1da177e4c3f41 Linus Torvalds          2005-04-16  3966  	return error;
^1da177e4c3f41 Linus Torvalds          2005-04-16  3967  }
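
The warning can be reproduced in a small userspace model. The ERR_PTR(), PTR_ERR() and IS_ERR() helpers below are re-implementations that mirror the kernel's include/linux/err.h, and model_shmem_init() is a hypothetical stand-in for the error path above, not the real function. It shows that when kobject_create_and_add() fails while error is still 0, the function returns success and shm_mnt becomes NULL rather than an error pointer.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Userspace copies of the kernel's error-pointer helpers. */
static inline void *ERR_PTR(long error)
{
	return (void *)(intptr_t)error;
}

static inline long PTR_ERR(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

static inline bool IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

static int dummy_mount;	/* stands in for a successful kern_mount() */

/*
 * Hypothetical model of the shmem_init() error path.  'kobj_ok' says
 * whether kobject_create_and_add() succeeded; 'set_enomem' selects the
 * variant that assigns -ENOMEM before jumping to the cleanup label.
 */
static int model_shmem_init(bool kobj_ok, bool set_enomem, void **mnt)
{
	int error = 0;	/* register_filesystem() succeeded */

	if (!kobj_ok) {
		if (set_enomem)
			error = -ENOMEM;
		goto out;
	}

	*mnt = &dummy_mount;
	return 0;

out:
	*mnt = ERR_PTR(error);	/* ERR_PTR(0) is just NULL */
	return error;
}
```

With set_enomem false, the model returns 0 and leaves a NULL mount pointer that IS_ERR() does not catch; with it true, the caller sees -ENOMEM and a proper error pointer.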

-- 
0-DAY CI Kernel Test Service
https://01.org/lkp


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v3 0/3] shmem: Allow userspace monitoring of tmpfs for lack of space.
  2022-04-22  9:02         ` Amir Goldstein
@ 2022-05-05 21:16           ` Gabriel Krisman Bertazi
  2022-05-12 20:00             ` Gabriel Krisman Bertazi
  0 siblings, 1 reply; 15+ messages in thread
From: Gabriel Krisman Bertazi @ 2022-05-05 21:16 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Khazhy Kumykov, Andrew Morton, Hugh Dickins, Al Viro, kernel,
	Linux MM, linux-fsdevel, Theodore Tso

Amir Goldstein <amir73il@gmail.com> writes:

>> task a user could easily go from 0% to full, or OOM, rather quickly,
>> so statfs polling would likely miss the event. The orchestrator can,
>> when the task fails, easily (and reliably) look at this statistic to
>> determine if a user exceeded the tmpfs limit.
>>
>> (I do see the parallel here to thin provisioned storage - "exceeded
>> your individual budget" vs. "underlying overcommitted system ran out
>> of bytes")
>
> Right, and in this case, the application gets a different error in case
> of "underlying space overcommitted", usually EIO, that's why I think that
> opting-in for this same behavior could make sense for tmpfs.

Amir,

If I understand correctly, that would allow the application to tell
lack of memory from lack of fs space, but it wouldn't make life easier
for an orchestrator trying to detect the condition.  Still, it seems like
a step in the right direction.  For the orchestrator, it seems necessary
that we expose this in some out-of-band mechanism, such as a WB_ERROR
notification or sysfs.

As a first step:

>8
Subject: [PATCH] shmem: Differentiate overcommit failure from lack of fs space

When provisioning user applications in cloud environments, it is common
to allocate containers with very small tmpfs and little available
memory.  In such scenarios, it is hard for an application to
differentiate whether its tmpfs IO failed due to insufficient
provisioned filesystem space, or due to running out of memory in the
container, because both situations will return ENOSPC in shmem.

This patch modifies the behavior of shmem failure due to overcommit to
return EIO instead of ENOSPC in this scenario.  In order to preserve the
existing interface, this feature must be enabled through a new
shmem-specific mount option.

Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
---
 Documentation/filesystems/tmpfs.rst | 16 +++++++++++++++
 include/linux/shmem_fs.h            |  3 +++
 mm/shmem.c                          | 30 ++++++++++++++++++++---------
 3 files changed, 40 insertions(+), 9 deletions(-)

diff --git a/Documentation/filesystems/tmpfs.rst b/Documentation/filesystems/tmpfs.rst
index 0408c245785e..83278d2b15a3 100644
--- a/Documentation/filesystems/tmpfs.rst
+++ b/Documentation/filesystems/tmpfs.rst
@@ -171,6 +171,22 @@ will give you tmpfs instance on /mytmpfs which can allocate 10GB
 RAM/SWAP in 10240 inodes and it is only accessible by root.
 
 
+When provisioning containerized applications, it is common to allocate
+the system with a very small tmpfs and little total memory.  In such
+scenarios, it is sometimes useful for an application to differentiate
+whether an IO operation failed due to insufficient provisioned
+filesystem space or due to running out of container memory.  tmpfs
+includes a mount parameter to treat a memory overcommit limit error
+differently from a lack of filesystem space error, allowing the
+application to differentiate these two scenarios.  If the following
+mount option is specified, surpassing memory overcommit limits on a
+tmpfs will return EIO.  ENOSPC is then only used to report lack of
+filesystem space.
+
+=================   ===================================================
+report_overcommit   Report overcommit issues with EIO instead of ENOSPC
+=================   ===================================================
+
 :Author:
    Christoph Rohland <cr@sap.com>, 1.12.01
 :Updated:
diff --git a/include/linux/shmem_fs.h b/include/linux/shmem_fs.h
index e65b80ed09e7..1be57531b257 100644
--- a/include/linux/shmem_fs.h
+++ b/include/linux/shmem_fs.h
@@ -44,6 +44,9 @@ struct shmem_sb_info {
 	spinlock_t shrinklist_lock;   /* Protects shrinklist */
 	struct list_head shrinklist;  /* List of shinkable inodes */
 	unsigned long shrinklist_len; /* Length of shrinklist */
+
+	/* Assist userspace with detecting overcommit errors */
+	bool report_overcommit;
 };
 
 static inline struct shmem_inode_info *SHMEM_I(struct inode *inode)
diff --git a/mm/shmem.c b/mm/shmem.c
index a09b29ec2b45..23f2780678df 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -112,6 +112,7 @@ struct shmem_options {
 	kgid_t gid;
 	umode_t mode;
 	bool full_inums;
+	bool report_overcommit;
 	int huge;
 	int seen;
 #define SHMEM_SEEN_BLOCKS 1
@@ -207,13 +208,16 @@ static inline void shmem_unacct_blocks(unsigned long flags, long pages)
 		vm_unacct_memory(pages * VM_ACCT(PAGE_SIZE));
 }
 
-static inline bool shmem_inode_acct_block(struct inode *inode, long pages)
+static inline int shmem_inode_acct_block(struct inode *inode, long pages)
 {
 	struct shmem_inode_info *info = SHMEM_I(inode);
 	struct shmem_sb_info *sbinfo = SHMEM_SB(inode->i_sb);
 
-	if (shmem_acct_block(info->flags, pages))
-		return false;
+	if (shmem_acct_block(info->flags, pages)) {
+		if (sbinfo->report_overcommit)
+			return -EIO;
+		return -ENOSPC;
+	}
 
 	if (sbinfo->max_blocks) {
 		if (percpu_counter_compare(&sbinfo->used_blocks,
@@ -222,11 +226,11 @@ static inline bool shmem_inode_acct_block(struct inode *inode, long pages)
 		percpu_counter_add(&sbinfo->used_blocks, pages);
 	}
 
-	return true;
+	return 0;
 
 unacct:
 	shmem_unacct_blocks(info->flags, pages);
-	return false;
+	return -ENOSPC;
 }
 
 static inline void shmem_inode_unacct_blocks(struct inode *inode, long pages)
@@ -372,7 +376,7 @@ bool shmem_charge(struct inode *inode, long pages)
 	struct shmem_inode_info *info = SHMEM_I(inode);
 	unsigned long flags;
 
-	if (!shmem_inode_acct_block(inode, pages))
+	if (shmem_inode_acct_block(inode, pages))
 		return false;
 
 	/* nrpages adjustment first, then shmem_recalc_inode() when balanced */
@@ -1555,13 +1559,14 @@ static struct page *shmem_alloc_and_acct_page(gfp_t gfp,
 	struct shmem_inode_info *info = SHMEM_I(inode);
 	struct page *page;
 	int nr;
-	int err = -ENOSPC;
+	int err;
 
 	if (!IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE))
 		huge = false;
 	nr = huge ? HPAGE_PMD_NR : 1;
 
-	if (!shmem_inode_acct_block(inode, nr))
+	err = shmem_inode_acct_block(inode, nr);
+	if (err)
 		goto failed;
 
 	if (huge)
@@ -2324,7 +2329,7 @@ int shmem_mfill_atomic_pte(struct mm_struct *dst_mm,
 	int ret;
 	pgoff_t max_off;
 
-	if (!shmem_inode_acct_block(inode, 1)) {
+	if (shmem_inode_acct_block(inode, 1)) {
 		/*
 		 * We may have got a page, returned -ENOENT triggering a retry,
 		 * and now we find ourselves with -ENOMEM. Release the page, to
@@ -3301,6 +3306,7 @@ enum shmem_param {
 	Opt_uid,
 	Opt_inode32,
 	Opt_inode64,
+	Opt_report_overcommit,
 };
 
 static const struct constant_table shmem_param_enums_huge[] = {
@@ -3322,6 +3328,7 @@ const struct fs_parameter_spec shmem_fs_parameters[] = {
 	fsparam_u32   ("uid",		Opt_uid),
 	fsparam_flag  ("inode32",	Opt_inode32),
 	fsparam_flag  ("inode64",	Opt_inode64),
+	fsparam_flag  ("report_overcommit", Opt_report_overcommit),
 	{}
 };
 
@@ -3405,6 +3412,9 @@ static int shmem_parse_one(struct fs_context *fc, struct fs_parameter *param)
 		ctx->full_inums = true;
 		ctx->seen |= SHMEM_SEEN_INUMS;
 		break;
+	case Opt_report_overcommit:
+		ctx->report_overcommit = true;
+		break;
 	}
 	return 0;
 
@@ -3513,6 +3523,7 @@ static int shmem_reconfigure(struct fs_context *fc)
 		sbinfo->max_inodes  = ctx->inodes;
 		sbinfo->free_inodes = ctx->inodes - inodes;
 	}
+	sbinfo->report_overcommit = ctx->report_overcommit;
 
 	/*
 	 * Preserve previous mempolicy unless mpol remount option was specified.
@@ -3640,6 +3651,7 @@ static int shmem_fill_super(struct super_block *sb, struct fs_context *fc)
 	sbinfo->mode = ctx->mode;
 	sbinfo->huge = ctx->huge;
 	sbinfo->mpol = ctx->mpol;
+	sbinfo->report_overcommit = ctx->report_overcommit;
 	ctx->mpol = NULL;
 
 	raw_spin_lock_init(&sbinfo->stat_lock);
-- 
2.35.1
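
For readers who want to check the return-value logic outside the kernel, here is a minimal userspace model of the patched shmem_inode_acct_block(). The struct model_sb type, the plain long counter (in place of the percpu counter) and the 'overcommit_fails' flag (in place of the shmem_acct_block() result) are simplifications assumed for illustration only.

```c
#include <errno.h>
#include <stdbool.h>

/* Simplified stand-in for the fields of struct shmem_sb_info used here. */
struct model_sb {
	long max_blocks;	/* 0 means no size limit */
	long used_blocks;
	bool report_overcommit;	/* the new mount option */
};

/*
 * Model of the patched accounting helper.  'overcommit_fails' plays the
 * role of shmem_acct_block() refusing the charge.
 */
static int model_acct_block(struct model_sb *sb, bool overcommit_fails,
			    long pages)
{
	/* Memory overcommit failure: EIO only if the option is set. */
	if (overcommit_fails)
		return sb->report_overcommit ? -EIO : -ENOSPC;

	/* Filesystem size limit: always ENOSPC. */
	if (sb->max_blocks) {
		if (sb->used_blocks > sb->max_blocks - pages)
			return -ENOSPC;
		sb->used_blocks += pages;
	}

	return 0;
}
```

The model makes the intended split visible: the size-limit path keeps returning -ENOSPC regardless of the option, while the overcommit path switches to -EIO only when report_overcommit is set.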




^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH v3 0/3] shmem: Allow userspace monitoring of tmpfs for lack of space.
  2022-05-05 21:16           ` Gabriel Krisman Bertazi
@ 2022-05-12 20:00             ` Gabriel Krisman Bertazi
  0 siblings, 0 replies; 15+ messages in thread
From: Gabriel Krisman Bertazi @ 2022-05-12 20:00 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Khazhy Kumykov, Andrew Morton, Hugh Dickins, Al Viro, kernel,
	Linux MM, linux-fsdevel, Theodore Tso

Gabriel Krisman Bertazi <krisman@collabora.com> writes:

> Amir Goldstein <amir73il@gmail.com> writes:
>
>>> task a user could easily go from 0% to full, or OOM, rather quickly,
>>> so statfs polling would likely miss the event. The orchestrator can,
>>> when the task fails, easily (and reliably) look at this statistic to
>>> determine if a user exceeded the tmpfs limit.
>>>
>>> (I do see the parallel here to thin provisioned storage - "exceeded
>>> your individual budget" vs. "underlying overcommitted system ran out
>>> of bytes")
>>
>> Right, and in this case, the application gets a different error in case
>> of "underlying space overcommitted", usually EIO, that's why I think that
>> opting-in for this same behavior could make sense for tmpfs.
>
> Amir,
>
> If I understand correctly, that would allow the application to catch the
> lack of memory vs. lack of fs space, but it wouldn't facilitate life for
> an orchestrator trying to detect the condition.  Still it seems like a
> step in the right direction.  For the orchestrator, it seems necessary
> that we expose this is some out-of-band mechanism, a WB_ERROR
> notification or sysfs.

Amir,

Regarding allowing an orchestrator to catch this situation, I'd like to
go back to the original proposal and create a new tmpfs
"thin-provisioned" option that will return a different error code (as in
the patch below, which I sent last week) and also issue a special
FAN_FS_ERROR/WB_ERROR to notify the orchestrator of this situation.
This would completely solve the use case, I believe.  Since this is
quite specific to tmpfs, it is reasonable to implement the notification
at FS level, similar to how other FS_ERRORs are implemented.

> As a first step:
>
>>8
> Subject: [PATCH] shmem: Differentiate overcommit failure from lack of fs space
>
> When provisioning user applications in cloud environments, it is common
> to allocate containers with very small tmpfs and little available
> memory.  In such scenarios, it is hard for an application to
> differentiate whether its tmpfs IO failed due do insufficient
> provisioned filesystem space, or due to running out of memory in the
> container, because both situations will return ENOSPC in shmem.
>
> This patch modifies the behavior of shmem failure due to overcommit to
> return EIO instead of ENOSPC in this scenario.  In order to preserve the
> existing interface, this feature must be enabled through a new
> shmem-specific mount option.
>
> Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
> ---
>  Documentation/filesystems/tmpfs.rst | 16 +++++++++++++++
>  include/linux/shmem_fs.h            |  3 +++
>  mm/shmem.c                          | 30 ++++++++++++++++++++---------
>  3 files changed, 40 insertions(+), 9 deletions(-)
>
> diff --git a/Documentation/filesystems/tmpfs.rst b/Documentation/filesystems/tmpfs.rst
> index 0408c245785e..83278d2b15a3 100644
> --- a/Documentation/filesystems/tmpfs.rst
> +++ b/Documentation/filesystems/tmpfs.rst
> @@ -171,6 +171,22 @@ will give you tmpfs instance on /mytmpfs which can allocate 10GB
>  RAM/SWAP in 10240 inodes and it is only accessible by root.
>  
>  
> +When provisioning containerized applications, it is common to allocate
> +the system with a very small tmpfs and little total memory.  In such
> +scenarios, it is sometimes useful for an application to differentiate
> +whether an IO operation failed due to insufficient provisioned
> +filesystem space or due to running out of container memory.  tmpfs
> +includes a mount parameter to treat a memory overcommit limit error
> +differently from a lack of filesystem space error, allowing the
> +application to differentiate these two scenarios.  If the following
> +mount option is specified, surpassing memory overcommit limits on a
> +tmpfs will return EIO.  ENOSPC is then only used to report lack of
> +filesystem space.
> +
> +=================   ===================================================
> +report_overcommit   Report overcommit issues with EIO instead of ENOSPC
> +=================   ===================================================
> +
>  :Author:
>     Christoph Rohland <cr@sap.com>, 1.12.01
>  :Updated:
> diff --git a/include/linux/shmem_fs.h b/include/linux/shmem_fs.h
> index e65b80ed09e7..1be57531b257 100644
> --- a/include/linux/shmem_fs.h
> +++ b/include/linux/shmem_fs.h
> @@ -44,6 +44,9 @@ struct shmem_sb_info {
>  	spinlock_t shrinklist_lock;   /* Protects shrinklist */
>  	struct list_head shrinklist;  /* List of shinkable inodes */
>  	unsigned long shrinklist_len; /* Length of shrinklist */
> +
> +	/* Assist userspace with detecting overcommit errors */
> +	bool report_overcommit;
>  };
>  
>  static inline struct shmem_inode_info *SHMEM_I(struct inode *inode)
> diff --git a/mm/shmem.c b/mm/shmem.c
> index a09b29ec2b45..23f2780678df 100644
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -112,6 +112,7 @@ struct shmem_options {
>  	kgid_t gid;
>  	umode_t mode;
>  	bool full_inums;
> +	bool report_overcommit;
>  	int huge;
>  	int seen;
>  #define SHMEM_SEEN_BLOCKS 1
> @@ -207,13 +208,16 @@ static inline void shmem_unacct_blocks(unsigned long flags, long pages)
>  		vm_unacct_memory(pages * VM_ACCT(PAGE_SIZE));
>  }
>  
> -static inline bool shmem_inode_acct_block(struct inode *inode, long pages)
> +static inline int shmem_inode_acct_block(struct inode *inode, long pages)
>  {
>  	struct shmem_inode_info *info = SHMEM_I(inode);
>  	struct shmem_sb_info *sbinfo = SHMEM_SB(inode->i_sb);
>  
> -	if (shmem_acct_block(info->flags, pages))
> -		return false;
> +	if (shmem_acct_block(info->flags, pages)) {
> +		if (sbinfo->report_overcommit)
> +			return -EIO;
> +		return -ENOSPC;
> +	}
>  
>  	if (sbinfo->max_blocks) {
>  		if (percpu_counter_compare(&sbinfo->used_blocks,
> @@ -222,11 +226,11 @@ static inline bool shmem_inode_acct_block(struct inode *inode, long pages)
>  		percpu_counter_add(&sbinfo->used_blocks, pages);
>  	}
>  
> -	return true;
> +	return 0;
>  
>  unacct:
>  	shmem_unacct_blocks(info->flags, pages);
> -	return false;
> +	return -ENOSPC;
>  }
>  
>  static inline void shmem_inode_unacct_blocks(struct inode *inode, long pages)
> @@ -372,7 +376,7 @@ bool shmem_charge(struct inode *inode, long pages)
>  	struct shmem_inode_info *info = SHMEM_I(inode);
>  	unsigned long flags;
>  
> -	if (!shmem_inode_acct_block(inode, pages))
> +	if (shmem_inode_acct_block(inode, pages))
>  		return false;
>  
>  	/* nrpages adjustment first, then shmem_recalc_inode() when balanced */
> @@ -1555,13 +1559,14 @@ static struct page *shmem_alloc_and_acct_page(gfp_t gfp,
>  	struct shmem_inode_info *info = SHMEM_I(inode);
>  	struct page *page;
>  	int nr;
> -	int err = -ENOSPC;
> +	int err;
>  
>  	if (!IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE))
>  		huge = false;
>  	nr = huge ? HPAGE_PMD_NR : 1;
>  
> -	if (!shmem_inode_acct_block(inode, nr))
> +	err = shmem_inode_acct_block(inode, nr);
> +	if (err)
>  		goto failed;
>  
>  	if (huge)
> @@ -2324,7 +2329,7 @@ int shmem_mfill_atomic_pte(struct mm_struct *dst_mm,
>  	int ret;
>  	pgoff_t max_off;
>  
> -	if (!shmem_inode_acct_block(inode, 1)) {
> +	if (shmem_inode_acct_block(inode, 1)) {
>  		/*
>  		 * We may have got a page, returned -ENOENT triggering a retry,
>  		 * and now we find ourselves with -ENOMEM. Release the page, to
> @@ -3301,6 +3306,7 @@ enum shmem_param {
>  	Opt_uid,
>  	Opt_inode32,
>  	Opt_inode64,
> +	Opt_report_overcommit,
>  };
>  
>  static const struct constant_table shmem_param_enums_huge[] = {
> @@ -3322,6 +3328,7 @@ const struct fs_parameter_spec shmem_fs_parameters[] = {
>  	fsparam_u32   ("uid",		Opt_uid),
>  	fsparam_flag  ("inode32",	Opt_inode32),
>  	fsparam_flag  ("inode64",	Opt_inode64),
> +	fsparam_flag  ("report_overcommit", Opt_report_overcommit),
>  	{}
>  };
>  
> @@ -3405,6 +3412,9 @@ static int shmem_parse_one(struct fs_context *fc, struct fs_parameter *param)
>  		ctx->full_inums = true;
>  		ctx->seen |= SHMEM_SEEN_INUMS;
>  		break;
> +	case Opt_report_overcommit:
> +		ctx->report_overcommit = true;
> +		break;
>  	}
>  	return 0;
>  
> @@ -3513,6 +3523,7 @@ static int shmem_reconfigure(struct fs_context *fc)
>  		sbinfo->max_inodes  = ctx->inodes;
>  		sbinfo->free_inodes = ctx->inodes - inodes;
>  	}
> +	sbinfo->report_overcommit = ctx->report_overcommit;
>  
>  	/*
>  	 * Preserve previous mempolicy unless mpol remount option was specified.
> @@ -3640,6 +3651,7 @@ static int shmem_fill_super(struct super_block *sb, struct fs_context *fc)
>  	sbinfo->mode = ctx->mode;
>  	sbinfo->huge = ctx->huge;
>  	sbinfo->mpol = ctx->mpol;
> +	sbinfo->report_overcommit = ctx->report_overcommit;
>  	ctx->mpol = NULL;
>  
>  	raw_spin_lock_init(&sbinfo->stat_lock);
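
For reference, a minimal userspace sketch of how an application might
consume the distinction this patch introduces. It assumes a tmpfs
mounted with the proposed report_overcommit option, so that ENOSPC means
the filesystem size limit was hit and EIO means memory accounting
failed. The enum and function names are hypothetical and exist only for
this illustration; without the option, or on another filesystem, EIO
can have unrelated causes.

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical classification, used only in this sketch. */
enum tmpfs_write_status {
	TMPFS_WRITE_OK,
	TMPFS_OUT_OF_SPACE,	/* ENOSPC: filesystem size limit reached */
	TMPFS_OVERCOMMIT,	/* EIO: assumed to mean overcommit failure
				 * (only valid with report_overcommit) */
	TMPFS_OTHER_ERROR,
};

/* Map an errno value from a failed write to one of the cases above. */
enum tmpfs_write_status classify_errno(int err)
{
	switch (err) {
	case 0:
		return TMPFS_WRITE_OK;
	case ENOSPC:
		return TMPFS_OUT_OF_SPACE;
	case EIO:
		return TMPFS_OVERCOMMIT;
	default:
		return TMPFS_OTHER_ERROR;
	}
}

/* Write the whole buffer, retrying on EINTR and on short writes. */
enum tmpfs_write_status write_all(int fd, const char *buf, size_t len)
{
	while (len > 0) {
		ssize_t n = write(fd, buf, len);

		if (n < 0) {
			if (errno == EINTR)
				continue;
			return classify_errno(errno);
		}
		buf += n;
		len -= (size_t)n;
	}
	return TMPFS_WRITE_OK;
}
```

A caller would open a file on the tmpfs mount, pass the descriptor to
write_all(), and act on the result: for example, raise the size= limit
on TMPFS_OUT_OF_SPACE, or raise the container memory limit on
TMPFS_OVERCOMMIT.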

-- 
Gabriel Krisman Bertazi


end of thread, other threads:[~2022-05-12 20:01 UTC | newest]

Thread overview: 15+ messages
2022-04-18 21:37 [PATCH v3 0/3] shmem: Allow userspace monitoring of tmpfs for lack of space Gabriel Krisman Bertazi
2022-04-18 21:37 ` [PATCH v3 1/3] shmem: Keep track of out-of-memory and out-of-space errors Gabriel Krisman Bertazi
2022-04-18 21:37 ` [PATCH v3 2/3] shmem: Introduce /sys/fs/tmpfs support Gabriel Krisman Bertazi
2022-04-18 21:37 ` [PATCH v3 3/3] shmem: Expose space and accounting error count Gabriel Krisman Bertazi
2022-04-19  3:42 ` [PATCH v3 0/3] shmem: Allow userspace monitoring of tmpfs for lack of space Andrew Morton
2022-04-19 15:28   ` Gabriel Krisman Bertazi
2022-04-21  5:33     ` Amir Goldstein
2022-04-21 22:37       ` Gabriel Krisman Bertazi
2022-04-21 23:19       ` Khazhy Kumykov
2022-04-22  9:02         ` Amir Goldstein
2022-05-05 21:16           ` Gabriel Krisman Bertazi
2022-05-12 20:00             ` Gabriel Krisman Bertazi
2022-04-20  0:10 [PATCH v3 2/3] shmem: Introduce /sys/fs/tmpfs support kernel test robot
2022-04-22  9:54 ` Dan Carpenter
2022-04-22  9:54 ` Dan Carpenter
