All of lore.kernel.org
 help / color / mirror / Atom feed
* [patch 15/15] mm: add strictlimit knob
@ 2017-11-30 22:15 akpm
  2017-12-01 12:29 ` Jan Kara
  0 siblings, 1 reply; 8+ messages in thread
From: akpm @ 2017-11-30 22:15 UTC (permalink / raw)
  To: linux-mm, akpm, MPatlasov, fengguang.wu, hmh, jack, mel, t.artem, tytso

From: Maxim Patlasov <MPatlasov@parallels.com>
Subject: mm: add strictlimit knob

The "strictlimit" feature was introduced to enforce per-bdi dirty limits
for FUSE which sets bdi max_ratio to 1% by default:

http://article.gmane.org/gmane.linux.kernel.mm/105809

However the feature can be useful for other relatively slow or untrusted
BDIs like USB flash drives and DVD+RW.  The patch adds a knob to enable
the feature:

echo 1 > /sys/class/bdi/X:Y/strictlimit

Being enabled, the feature enforces bdi max_ratio limit even if global
(10%) dirty limit is not reached.  Of course, the effect is not visible
until /sys/class/bdi/X:Y/max_ratio is decreased to some reasonable value.

Signed-off-by: Maxim Patlasov <MPatlasov@parallels.com>
Cc: Henrique de Moraes Holschuh <hmh@hmh.eng.br>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: "Artem S. Tashkinov" <t.artem@lycos.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Jan Kara <jack@suse.cz>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 Documentation/ABI/testing/sysfs-class-bdi |    8 ++++
 mm/backing-dev.c                          |   35 ++++++++++++++++++++
 2 files changed, 43 insertions(+)

diff -puN Documentation/ABI/testing/sysfs-class-bdi~mm-add-strictlimit-knob-v2 Documentation/ABI/testing/sysfs-class-bdi
--- a/Documentation/ABI/testing/sysfs-class-bdi~mm-add-strictlimit-knob-v2
+++ a/Documentation/ABI/testing/sysfs-class-bdi
@@ -53,3 +53,11 @@ stable_pages_required (read-only)
 
 	If set, the backing device requires that all pages comprising a write
 	request must not be changed until writeout is complete.
+
+strictlimit (read-write)
+
+	Forces per-BDI checks for the share of given device in the write-back
+	cache even before the global background dirty limit is reached. This
+	is useful in situations where the global limit is much higher than
+	affordable for given relatively slow (or untrusted) device. Turning
+	strictlimit on has no visible effect if max_ratio is equal to 100%.
diff -puN mm/backing-dev.c~mm-add-strictlimit-knob-v2 mm/backing-dev.c
--- a/mm/backing-dev.c~mm-add-strictlimit-knob-v2
+++ a/mm/backing-dev.c
@@ -218,11 +218,46 @@ static ssize_t stable_pages_required_sho
 }
 static DEVICE_ATTR_RO(stable_pages_required);
 
+static ssize_t strictlimit_store(struct device *dev,
+		struct device_attribute *attr, const char *buf, size_t count)
+{
+	struct backing_dev_info *bdi = dev_get_drvdata(dev);
+	unsigned int val;
+	ssize_t ret;
+
+	ret = kstrtouint(buf, 10, &val);
+	if (ret < 0)
+		return ret;
+
+	switch (val) {
+	case 0:
+		bdi->capabilities &= ~BDI_CAP_STRICTLIMIT;
+		break;
+	case 1:
+		bdi->capabilities |= BDI_CAP_STRICTLIMIT;
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	return count;
+}
+static ssize_t strictlimit_show(struct device *dev,
+		struct device_attribute *attr, char *page)
+{
+	struct backing_dev_info *bdi = dev_get_drvdata(dev);
+
+	return snprintf(page, PAGE_SIZE-1, "%d\n",
+			!!(bdi->capabilities & BDI_CAP_STRICTLIMIT));
+}
+static DEVICE_ATTR_RW(strictlimit);
+
 static struct attribute *bdi_dev_attrs[] = {
 	&dev_attr_read_ahead_kb.attr,
 	&dev_attr_min_ratio.attr,
 	&dev_attr_max_ratio.attr,
 	&dev_attr_stable_pages_required.attr,
+	&dev_attr_strictlimit.attr,
 	NULL,
 };
 ATTRIBUTE_GROUPS(bdi_dev);
_

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [patch 15/15] mm: add strictlimit knob
  2017-11-30 22:15 [patch 15/15] mm: add strictlimit knob akpm
@ 2017-12-01 12:29 ` Jan Kara
  2017-12-07  1:09   ` Andrew Morton
  0 siblings, 1 reply; 8+ messages in thread
From: Jan Kara @ 2017-12-01 12:29 UTC (permalink / raw)
  To: akpm; +Cc: linux-mm, MPatlasov, fengguang.wu, hmh, jack, mel, t.artem, tytso

On Thu 30-11-17 14:15:58, Andrew Morton wrote:
> From: Maxim Patlasov <MPatlasov@parallels.com>
> Subject: mm: add strictlimit knob
> 
> The "strictlimit" feature was introduced to enforce per-bdi dirty limits
> for FUSE which sets bdi max_ratio to 1% by default:
> 
> http://article.gmane.org/gmane.linux.kernel.mm/105809
> 
> However the feature can be useful for other relatively slow or untrusted
> BDIs like USB flash drives and DVD+RW.  The patch adds a knob to enable
> the feature:
> 
> echo 1 > /sys/class/bdi/X:Y/strictlimit
> 
> Being enabled, the feature enforces bdi max_ratio limit even if global
> (10%) dirty limit is not reached.  Of course, the effect is not visible
> until /sys/class/bdi/X:Y/max_ratio is decreased to some reasonable value.

In principle I have nothing against this and the usecase sounds reasonable
(in fact I believe the lack of a feature like this is one of reasons why
desktop automounters usually mount USB devices with 'sync' mount option).
So feel free to add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza

> Signed-off-by: Maxim Patlasov <MPatlasov@parallels.com>
> Cc: Henrique de Moraes Holschuh <hmh@hmh.eng.br>
> Cc: Theodore Ts'o <tytso@mit.edu>
> Cc: "Artem S. Tashkinov" <t.artem@lycos.com>
> Cc: Mel Gorman <mel@csn.ul.ie>
> Cc: Jan Kara <jack@suse.cz>
> Cc: Wu Fengguang <fengguang.wu@intel.com>
> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
> ---
> 
>  Documentation/ABI/testing/sysfs-class-bdi |    8 ++++
>  mm/backing-dev.c                          |   35 ++++++++++++++++++++
>  2 files changed, 43 insertions(+)
> 
> diff -puN Documentation/ABI/testing/sysfs-class-bdi~mm-add-strictlimit-knob-v2 Documentation/ABI/testing/sysfs-class-bdi
> --- a/Documentation/ABI/testing/sysfs-class-bdi~mm-add-strictlimit-knob-v2
> +++ a/Documentation/ABI/testing/sysfs-class-bdi
> @@ -53,3 +53,11 @@ stable_pages_required (read-only)
>  
>  	If set, the backing device requires that all pages comprising a write
>  	request must not be changed until writeout is complete.
> +
> +strictlimit (read-write)
> +
> +	Forces per-BDI checks for the share of given device in the write-back
> +	cache even before the global background dirty limit is reached. This
> +	is useful in situations where the global limit is much higher than
> +	affordable for given relatively slow (or untrusted) device. Turning
> +	strictlimit on has no visible effect if max_ratio is equal to 100%.
> diff -puN mm/backing-dev.c~mm-add-strictlimit-knob-v2 mm/backing-dev.c
> --- a/mm/backing-dev.c~mm-add-strictlimit-knob-v2
> +++ a/mm/backing-dev.c
> @@ -218,11 +218,46 @@ static ssize_t stable_pages_required_sho
>  }
>  static DEVICE_ATTR_RO(stable_pages_required);
>  
> +static ssize_t strictlimit_store(struct device *dev,
> +		struct device_attribute *attr, const char *buf, size_t count)
> +{
> +	struct backing_dev_info *bdi = dev_get_drvdata(dev);
> +	unsigned int val;
> +	ssize_t ret;
> +
> +	ret = kstrtouint(buf, 10, &val);
> +	if (ret < 0)
> +		return ret;
> +
> +	switch (val) {
> +	case 0:
> +		bdi->capabilities &= ~BDI_CAP_STRICTLIMIT;
> +		break;
> +	case 1:
> +		bdi->capabilities |= BDI_CAP_STRICTLIMIT;
> +		break;
> +	default:
> +		return -EINVAL;
> +	}
> +
> +	return count;
> +}
> +static ssize_t strictlimit_show(struct device *dev,
> +		struct device_attribute *attr, char *page)
> +{
> +	struct backing_dev_info *bdi = dev_get_drvdata(dev);
> +
> +	return snprintf(page, PAGE_SIZE-1, "%d\n",
> +			!!(bdi->capabilities & BDI_CAP_STRICTLIMIT));
> +}
> +static DEVICE_ATTR_RW(strictlimit);
> +
>  static struct attribute *bdi_dev_attrs[] = {
>  	&dev_attr_read_ahead_kb.attr,
>  	&dev_attr_min_ratio.attr,
>  	&dev_attr_max_ratio.attr,
>  	&dev_attr_stable_pages_required.attr,
> +	&dev_attr_strictlimit.attr,
>  	NULL,
>  };
>  ATTRIBUTE_GROUPS(bdi_dev);
> _
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [patch 15/15] mm: add strictlimit knob
  2017-12-01 12:29 ` Jan Kara
@ 2017-12-07  1:09   ` Andrew Morton
  2017-12-07  4:14     ` Fengguang Wu
  0 siblings, 1 reply; 8+ messages in thread
From: Andrew Morton @ 2017-12-07  1:09 UTC (permalink / raw)
  To: Jan Kara
  Cc: linux-mm, MPatlasov, fengguang.wu, hmh, mel, t.artem, tytso, Jens Axboe

On Fri, 1 Dec 2017 13:29:28 +0100 Jan Kara <jack@suse.cz> wrote:

> On Thu 30-11-17 14:15:58, Andrew Morton wrote:
> > From: Maxim Patlasov <MPatlasov@parallels.com>
> > Subject: mm: add strictlimit knob
> > 
> > The "strictlimit" feature was introduced to enforce per-bdi dirty limits
> > for FUSE which sets bdi max_ratio to 1% by default:
> > 
> > http://article.gmane.org/gmane.linux.kernel.mm/105809
> > 
> > However the feature can be useful for other relatively slow or untrusted
> > BDIs like USB flash drives and DVD+RW.  The patch adds a knob to enable
> > the feature:
> > 
> > echo 1 > /sys/class/bdi/X:Y/strictlimit
> > 
> > Being enabled, the feature enforces bdi max_ratio limit even if global
> > (10%) dirty limit is not reached.  Of course, the effect is not visible
> > until /sys/class/bdi/X:Y/max_ratio is decreased to some reasonable value.
> 
> In principle I have nothing against this and the usecase sounds reasonable
> (in fact I believe the lack of a feature like this is one of reasons why
> desktop automounters usually mount USB devices with 'sync' mount option).
> So feel free to add:
> 
> Reviewed-by: Jan Kara <jack@suse.cz>
> 

Cc Jens, who may be vaguely interested in plans to finally merge this
three-year-old patch?



From: Maxim Patlasov <MPatlasov@parallels.com>
Subject: mm: add strictlimit knob

The "strictlimit" feature was introduced to enforce per-bdi dirty limits
for FUSE which sets bdi max_ratio to 1% by default:

http://article.gmane.org/gmane.linux.kernel.mm/105809

However the feature can be useful for other relatively slow or untrusted
BDIs like USB flash drives and DVD+RW.  The patch adds a knob to enable
the feature:

echo 1 > /sys/class/bdi/X:Y/strictlimit

Being enabled, the feature enforces bdi max_ratio limit even if global
(10%) dirty limit is not reached.  Of course, the effect is not visible
until /sys/class/bdi/X:Y/max_ratio is decreased to some reasonable value.

Jan said:

: In principle I have nothing against this and the usecase sounds reasonable
: (in fact I believe the lack of a feature like this is one of reasons why
: desktop automounters usually mount USB devices with 'sync' mount option). 
: So feel free to add:

Signed-off-by: Maxim Patlasov <MPatlasov@parallels.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Henrique de Moraes Holschuh <hmh@hmh.eng.br>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: "Artem S. Tashkinov" <t.artem@lycos.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Jan Kara <jack@suse.cz>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 Documentation/ABI/testing/sysfs-class-bdi |    8 ++++
 mm/backing-dev.c                          |   35 ++++++++++++++++++++
 2 files changed, 43 insertions(+)

diff -puN Documentation/ABI/testing/sysfs-class-bdi~mm-add-strictlimit-knob-v2 Documentation/ABI/testing/sysfs-class-bdi
--- a/Documentation/ABI/testing/sysfs-class-bdi~mm-add-strictlimit-knob-v2
+++ a/Documentation/ABI/testing/sysfs-class-bdi
@@ -53,3 +53,11 @@ stable_pages_required (read-only)
 
 	If set, the backing device requires that all pages comprising a write
 	request must not be changed until writeout is complete.
+
+strictlimit (read-write)
+
+	Forces per-BDI checks for the share of given device in the write-back
+	cache even before the global background dirty limit is reached. This
+	is useful in situations where the global limit is much higher than
+	affordable for given relatively slow (or untrusted) device. Turning
+	strictlimit on has no visible effect if max_ratio is equal to 100%.
diff -puN mm/backing-dev.c~mm-add-strictlimit-knob-v2 mm/backing-dev.c
--- a/mm/backing-dev.c~mm-add-strictlimit-knob-v2
+++ a/mm/backing-dev.c
@@ -231,11 +231,46 @@ static ssize_t stable_pages_required_sho
 }
 static DEVICE_ATTR_RO(stable_pages_required);
 
+static ssize_t strictlimit_store(struct device *dev,
+		struct device_attribute *attr, const char *buf, size_t count)
+{
+	struct backing_dev_info *bdi = dev_get_drvdata(dev);
+	unsigned int val;
+	ssize_t ret;
+
+	ret = kstrtouint(buf, 10, &val);
+	if (ret < 0)
+		return ret;
+
+	switch (val) {
+	case 0:
+		bdi->capabilities &= ~BDI_CAP_STRICTLIMIT;
+		break;
+	case 1:
+		bdi->capabilities |= BDI_CAP_STRICTLIMIT;
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	return count;
+}
+static ssize_t strictlimit_show(struct device *dev,
+		struct device_attribute *attr, char *page)
+{
+	struct backing_dev_info *bdi = dev_get_drvdata(dev);
+
+	return snprintf(page, PAGE_SIZE-1, "%d\n",
+			!!(bdi->capabilities & BDI_CAP_STRICTLIMIT));
+}
+static DEVICE_ATTR_RW(strictlimit);
+
 static struct attribute *bdi_dev_attrs[] = {
 	&dev_attr_read_ahead_kb.attr,
 	&dev_attr_min_ratio.attr,
 	&dev_attr_max_ratio.attr,
 	&dev_attr_stable_pages_required.attr,
+	&dev_attr_strictlimit.attr,
 	NULL,
 };
 ATTRIBUTE_GROUPS(bdi_dev);
_


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [patch 15/15] mm: add strictlimit knob
  2017-12-07  1:09   ` Andrew Morton
@ 2017-12-07  4:14     ` Fengguang Wu
  2017-12-07  8:50       ` Miklos Szeredi
  0 siblings, 1 reply; 8+ messages in thread
From: Fengguang Wu @ 2017-12-07  4:14 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Jan Kara, linux-mm, MPatlasov, hmh, mel, t.artem, tytso,
	Jens Axboe, Miklos Szeredi, linux-fsdevel

CC fuse maintainer, too.

On Wed, Dec 06, 2017 at 05:09:27PM -0800, Andrew Morton wrote:
>On Fri, 1 Dec 2017 13:29:28 +0100 Jan Kara <jack@suse.cz> wrote:
>
>> On Thu 30-11-17 14:15:58, Andrew Morton wrote:
>> > From: Maxim Patlasov <MPatlasov@parallels.com>
>> > Subject: mm: add strictlimit knob
>> >
>> > The "strictlimit" feature was introduced to enforce per-bdi dirty limits
>> > for FUSE which sets bdi max_ratio to 1% by default:
>> >
>> > http://article.gmane.org/gmane.linux.kernel.mm/105809
>> >
>> > However the feature can be useful for other relatively slow or untrusted
>> > BDIs like USB flash drives and DVD+RW.  The patch adds a knob to enable
>> > the feature:
>> >
>> > echo 1 > /sys/class/bdi/X:Y/strictlimit
>> >
>> > Being enabled, the feature enforces bdi max_ratio limit even if global
>> > (10%) dirty limit is not reached.  Of course, the effect is not visible
>> > until /sys/class/bdi/X:Y/max_ratio is decreased to some reasonable value.
>>
>> In principle I have nothing against this and the usecase sounds reasonable
>> (in fact I believe the lack of a feature like this is one of reasons why
>> desktop automounters usually mount USB devices with 'sync' mount option).
>> So feel free to add:
>>
>> Reviewed-by: Jan Kara <jack@suse.cz>
>>
>
>Cc Jens, who may be vaguely interested in plans to finally merge this
>three-year-old patch?
>
>
>
>From: Maxim Patlasov <MPatlasov@parallels.com>
>Subject: mm: add strictlimit knob
>
>The "strictlimit" feature was introduced to enforce per-bdi dirty limits
>for FUSE which sets bdi max_ratio to 1% by default:
>
>http://article.gmane.org/gmane.linux.kernel.mm/105809

That link is invalid for now, possibly due to the gmane site rebuild.
I find an email thread here which looks relevant:

https://sourceforge.net/p/fuse/mailman/message/35254883/

Where Maxim has an interesting point:

        > Did any one try increasing the limit and did see any better/worse 
        > performance ?

        We've used 20% as default value in OpenVZ kernel for a long while (1% 
        was not enough to saturate our distributed parallel storage).

So the knob will also enable people to _disable_ the 1% fuse limit to
increase performance.

So people can use the exposed knob in 2 ways to fit their needs, which
is in general a good thing.

However the comment in wb_position_ratio() says

                        Without strictlimit feature, fuse writeback may
	 * consume arbitrary amount of RAM because it is accounted in
	 * NR_WRITEBACK_TEMP which is not involved in calculating "nr_dirty".

How dangerous would that be if some user disabled the 1% fuse limit
through the exposed knob? Will the NR_WRITEBACK_TEMP effect go far
beyond the user's expectation (20% max dirty limit)?

Looking at the fuse code, NR_WRITEBACK_TEMP will grow proportional to
WB_WRITEBACK, which should be throttled when bdi_write_congested().
The congested flag will be set on

        fuse_conn.num_background >= fuse_conn.congestion_threshold
        
So it looks NR_WRITEBACK_TEMP will somehow be throttled. Just that
it's not included in the 20% dirty limit.

Other than that concern, the patch looks good to me.

Thanks,
Fengguang

>However the feature can be useful for other relatively slow or untrusted
>BDIs like USB flash drives and DVD+RW.  The patch adds a knob to enable
>the feature:
>
>echo 1 > /sys/class/bdi/X:Y/strictlimit
>
>Being enabled, the feature enforces bdi max_ratio limit even if global
>(10%) dirty limit is not reached.  Of course, the effect is not visible
>until /sys/class/bdi/X:Y/max_ratio is decreased to some reasonable value.
>
>Jan said:
>
>: In principle I have nothing against this and the usecase sounds reasonable
>: (in fact I believe the lack of a feature like this is one of reasons why
>: desktop automounters usually mount USB devices with 'sync' mount option).
>: So feel free to add:
>
>Signed-off-by: Maxim Patlasov <MPatlasov@parallels.com>
>Reviewed-by: Jan Kara <jack@suse.cz>
>Cc: Henrique de Moraes Holschuh <hmh@hmh.eng.br>
>Cc: Theodore Ts'o <tytso@mit.edu>
>Cc: "Artem S. Tashkinov" <t.artem@lycos.com>
>Cc: Mel Gorman <mel@csn.ul.ie>
>Cc: Jan Kara <jack@suse.cz>
>Cc: Wu Fengguang <fengguang.wu@intel.com>
>Cc: Jens Axboe <axboe@kernel.dk>
>Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
>---
>
> Documentation/ABI/testing/sysfs-class-bdi |    8 ++++
> mm/backing-dev.c                          |   35 ++++++++++++++++++++
> 2 files changed, 43 insertions(+)
>
>diff -puN Documentation/ABI/testing/sysfs-class-bdi~mm-add-strictlimit-knob-v2 Documentation/ABI/testing/sysfs-class-bdi
>--- a/Documentation/ABI/testing/sysfs-class-bdi~mm-add-strictlimit-knob-v2
>+++ a/Documentation/ABI/testing/sysfs-class-bdi
>@@ -53,3 +53,11 @@ stable_pages_required (read-only)
>
> 	If set, the backing device requires that all pages comprising a write
> 	request must not be changed until writeout is complete.
>+
>+strictlimit (read-write)
>+
>+	Forces per-BDI checks for the share of given device in the write-back
>+	cache even before the global background dirty limit is reached. This
>+	is useful in situations where the global limit is much higher than
>+	affordable for given relatively slow (or untrusted) device. Turning
>+	strictlimit on has no visible effect if max_ratio is equal to 100%.
>diff -puN mm/backing-dev.c~mm-add-strictlimit-knob-v2 mm/backing-dev.c
>--- a/mm/backing-dev.c~mm-add-strictlimit-knob-v2
>+++ a/mm/backing-dev.c
>@@ -231,11 +231,46 @@ static ssize_t stable_pages_required_sho
> }
> static DEVICE_ATTR_RO(stable_pages_required);
>
>+static ssize_t strictlimit_store(struct device *dev,
>+		struct device_attribute *attr, const char *buf, size_t count)
>+{
>+	struct backing_dev_info *bdi = dev_get_drvdata(dev);
>+	unsigned int val;
>+	ssize_t ret;
>+
>+	ret = kstrtouint(buf, 10, &val);
>+	if (ret < 0)
>+		return ret;
>+
>+	switch (val) {
>+	case 0:
>+		bdi->capabilities &= ~BDI_CAP_STRICTLIMIT;
>+		break;
>+	case 1:
>+		bdi->capabilities |= BDI_CAP_STRICTLIMIT;
>+		break;
>+	default:
>+		return -EINVAL;
>+	}
>+
>+	return count;
>+}
>+static ssize_t strictlimit_show(struct device *dev,
>+		struct device_attribute *attr, char *page)
>+{
>+	struct backing_dev_info *bdi = dev_get_drvdata(dev);
>+
>+	return snprintf(page, PAGE_SIZE-1, "%d\n",
>+			!!(bdi->capabilities & BDI_CAP_STRICTLIMIT));
>+}
>+static DEVICE_ATTR_RW(strictlimit);
>+
> static struct attribute *bdi_dev_attrs[] = {
> 	&dev_attr_read_ahead_kb.attr,
> 	&dev_attr_min_ratio.attr,
> 	&dev_attr_max_ratio.attr,
> 	&dev_attr_stable_pages_required.attr,
>+	&dev_attr_strictlimit.attr,
> 	NULL,
> };
> ATTRIBUTE_GROUPS(bdi_dev);
>_
>
>

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [patch 15/15] mm: add strictlimit knob
  2017-12-07  4:14     ` Fengguang Wu
@ 2017-12-07  8:50       ` Miklos Szeredi
  2017-12-07 10:15         ` Fengguang Wu
  0 siblings, 1 reply; 8+ messages in thread
From: Miklos Szeredi @ 2017-12-07  8:50 UTC (permalink / raw)
  To: Fengguang Wu
  Cc: Andrew Morton, Jan Kara, linux-mm, Maxim Patlasov, hmh, mel,
	t.artem, Theodore Ts'o, Jens Axboe, linux-fsdevel

On Thu, Dec 7, 2017 at 5:14 AM, Fengguang Wu <fengguang.wu@intel.com> wrote:
> CC fuse maintainer, too.
>
> On Wed, Dec 06, 2017 at 05:09:27PM -0800, Andrew Morton wrote:
>>
>> On Fri, 1 Dec 2017 13:29:28 +0100 Jan Kara <jack@suse.cz> wrote:
>>
>>> On Thu 30-11-17 14:15:58, Andrew Morton wrote:
>>> > From: Maxim Patlasov <MPatlasov@parallels.com>
>>> > Subject: mm: add strictlimit knob
>>> >
>>> > The "strictlimit" feature was introduced to enforce per-bdi dirty
>>> > limits
>>> > for FUSE which sets bdi max_ratio to 1% by default:
>>> >
>>> > http://article.gmane.org/gmane.linux.kernel.mm/105809
>>> >
>>> > However the feature can be useful for other relatively slow or
>>> > untrusted
>>> > BDIs like USB flash drives and DVD+RW.  The patch adds a knob to enable
>>> > the feature:
>>> >
>>> > echo 1 > /sys/class/bdi/X:Y/strictlimit
>>> >
>>> > Being enabled, the feature enforces bdi max_ratio limit even if global
>>> > (10%) dirty limit is not reached.  Of course, the effect is not visible
>>> > until /sys/class/bdi/X:Y/max_ratio is decreased to some reasonable
>>> > value.
>>>
>>> In principle I have nothing against this and the usecase sounds
>>> reasonable
>>> (in fact I believe the lack of a feature like this is one of reasons why
>>> desktop automounters usually mount USB devices with 'sync' mount option).
>>> So feel free to add:
>>>
>>> Reviewed-by: Jan Kara <jack@suse.cz>
>>>
>>
>> Cc Jens, who may be vaguely interested in plans to finally merge this
>> three-year-old patch?
>>
>>
>>
>> From: Maxim Patlasov <MPatlasov@parallels.com>
>> Subject: mm: add strictlimit knob
>>
>> The "strictlimit" feature was introduced to enforce per-bdi dirty limits
>> for FUSE which sets bdi max_ratio to 1% by default:
>>
>> http://article.gmane.org/gmane.linux.kernel.mm/105809
>
>
> That link is invalid for now, possibly due to the gmane site rebuild.
> I find an email thread here which looks relevant:
>
> https://sourceforge.net/p/fuse/mailman/message/35254883/
>
> Where Maxim has an interesting point:
>
>        > Did any one try increasing the limit and did see any better/worse
>> performance ?
>
>        We've used 20% as default value in OpenVZ kernel for a long while (1%
> was not enough to saturate our distributed parallel storage).
>
> So the knob will also enable people to _disable_ the 1% fuse limit to
> increase performance.
>
> So people can use the exposed knob in 2 ways to fit their needs, which
> is in general a good thing.
>
> However the comment in wb_position_ratio() says
>
>                        Without strictlimit feature, fuse writeback may
>          * consume arbitrary amount of RAM because it is accounted in
>          * NR_WRITEBACK_TEMP which is not involved in calculating
> "nr_dirty".
>
> How dangerous would that be if some user disabled the 1% fuse limit
> through the exposed knob? Will the NR_WRITEBACK_TEMP effect go far
> beyond the user's expectation (20% max dirty limit)?
>
> Looking at the fuse code, NR_WRITEBACK_TEMP will grow proportional to
> WB_WRITEBACK, which should be throttled when bdi_write_congested().
> The congested flag will be set on
>
>        fuse_conn.num_background >= fuse_conn.congestion_threshold
>        So it looks NR_WRITEBACK_TEMP will somehow be throttled. Just that
> it's not included in the 20% dirty limit.

Only balance_dirty_pages_ratelimited() is going to limit the
generation of dirty pages, I don't think congestion flags will do
that.  And (AFAICS) for fuse only  BDI_CAP_STRICTLIMIT will allow
accounting temp writeback pages when throttling dirty page generation.
So without BDI_CAP_STRICTLIMIT kernel memory use of fuse may explode.
So we probably need a way to force BDI_CAP_STRICTLIMIT (i.e. do not
permit disabling it for fuse).

Please correct me if I'm wrong in any of the above statements, it's
been a long time I've taken a detailed look at the page writeback
mechanisms.

Thanks,
Miklos

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [patch 15/15] mm: add strictlimit knob
  2017-12-07  8:50       ` Miklos Szeredi
@ 2017-12-07 10:15         ` Fengguang Wu
  2017-12-07 10:32           ` Miklos Szeredi
  0 siblings, 1 reply; 8+ messages in thread
From: Fengguang Wu @ 2017-12-07 10:15 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Andrew Morton, Jan Kara, linux-mm, Maxim Patlasov, hmh, mel,
	t.artem, Theodore Ts'o, Jens Axboe, linux-fsdevel

On Thu, Dec 07, 2017 at 09:50:23AM +0100, Miklos Szeredi wrote:
>On Thu, Dec 7, 2017 at 5:14 AM, Fengguang Wu <fengguang.wu@intel.com> wrote:
>> CC fuse maintainer, too.
>>
>> On Wed, Dec 06, 2017 at 05:09:27PM -0800, Andrew Morton wrote:
>>>
>>> On Fri, 1 Dec 2017 13:29:28 +0100 Jan Kara <jack@suse.cz> wrote:
>>>
>>>> On Thu 30-11-17 14:15:58, Andrew Morton wrote:
>>>> > From: Maxim Patlasov <MPatlasov@parallels.com>
>>>> > Subject: mm: add strictlimit knob
>>>> >
>>>> > The "strictlimit" feature was introduced to enforce per-bdi dirty
>>>> > limits
>>>> > for FUSE which sets bdi max_ratio to 1% by default:
>>>> >
>>>> > http://article.gmane.org/gmane.linux.kernel.mm/105809
>>>> >
>>>> > However the feature can be useful for other relatively slow or
>>>> > untrusted
>>>> > BDIs like USB flash drives and DVD+RW.  The patch adds a knob to enable
>>>> > the feature:
>>>> >
>>>> > echo 1 > /sys/class/bdi/X:Y/strictlimit
>>>> >
>>>> > Being enabled, the feature enforces bdi max_ratio limit even if global
>>>> > (10%) dirty limit is not reached.  Of course, the effect is not visible
>>>> > until /sys/class/bdi/X:Y/max_ratio is decreased to some reasonable
>>>> > value.
>>>>
>>>> In principle I have nothing against this and the usecase sounds
>>>> reasonable
>>>> (in fact I believe the lack of a feature like this is one of reasons why
>>>> desktop automounters usually mount USB devices with 'sync' mount option).
>>>> So feel free to add:
>>>>
>>>> Reviewed-by: Jan Kara <jack@suse.cz>
>>>>
>>>
>>> Cc Jens, who may be vaguely interested in plans to finally merge this
>>> three-year-old patch?
>>>
>>>
>>>
>>> From: Maxim Patlasov <MPatlasov@parallels.com>
>>> Subject: mm: add strictlimit knob
>>>
>>> The "strictlimit" feature was introduced to enforce per-bdi dirty limits
>>> for FUSE which sets bdi max_ratio to 1% by default:
>>>
>>> http://article.gmane.org/gmane.linux.kernel.mm/105809
>>
>>
>> That link is invalid for now, possibly due to the gmane site rebuild.
>> I find an email thread here which looks relevant:
>>
>> https://sourceforge.net/p/fuse/mailman/message/35254883/
>>
>> Where Maxim has an interesting point:
>>
>>        > Did any one try increasing the limit and did see any better/worse
>>> performance ?
>>
>>        We've used 20% as default value in OpenVZ kernel for a long while (1%
>> was not enough to saturate our distributed parallel storage).
>>
>> So the knob will also enable people to _disable_ the 1% fuse limit to
>> increase performance.
>>
>> So people can use the exposed knob in 2 ways to fit their needs, which
>> is in general a good thing.
>>
>> However the comment in wb_position_ratio() says
>>
>>                        Without strictlimit feature, fuse writeback may
>>          * consume arbitrary amount of RAM because it is accounted in
>>          * NR_WRITEBACK_TEMP which is not involved in calculating
>> "nr_dirty".
>>
>> How dangerous would that be if some user disabled the 1% fuse limit
>> through the exposed knob? Will the NR_WRITEBACK_TEMP effect go far
>> beyond the user's expectation (20% max dirty limit)?
>>
>> Looking at the fuse code, NR_WRITEBACK_TEMP will grow proportional to
>> WB_WRITEBACK, which should be throttled when bdi_write_congested().
>> The congested flag will be set on
>>
>>        fuse_conn.num_background >= fuse_conn.congestion_threshold
>>        So it looks NR_WRITEBACK_TEMP will somehow be throttled. Just that
>> it's not included in the 20% dirty limit.
>
>Only balance_dirty_pages_ratelimited() is going to limit the
>generation of dirty pages, I don't think congestion flags will do
>that.

Right. However my concern is something to limit the generation of
fuse's _writeback_ pages.

The normal writeback pages are limited in 2 ways:

- balance_dirty_pages_ratelimited()'s dirty throttling:

  nr_dirty + nr_writeback + nr_unstable < global and/or bdi dirty limit

- block layer's nr_requests queue limit

However fuse's NR_WRITEBACK_TEMP looks special and has none of such
limits. The congested bit merely affect the vmscan pageout path.

        pageout
          may_write_to_inode
            inode_write_congested
              wb_congested

I wonder if fuse has its own approach to limit NR_WRITEBACK_TEMP?
Either explicitly or implicitly, there has to be some hard limit.

>And (AFAICS) for fuse only  BDI_CAP_STRICTLIMIT will allow
>accounting temp writeback pages when throttling dirty page generation.
>So without BDI_CAP_STRICTLIMIT kernel memory use of fuse may explode.
>So we probably need a way to force BDI_CAP_STRICTLIMIT (i.e. do not
>permit disabling it for fuse).

So fuse relies on small nr_dirty. Does fuse impose any explicit or
implicit rule that NR_WRITEBACK_TEMP will never exceed (N * nr_dirty)?
Otherwise the size of NR_WRITEBACK_TEMP cannot be guaranteed.

For example, is it possible for some process (eg. dd) to dirty pages
as fast as possible while some other kernel logic to convert PG_dirty
to NR_WRITEBACK_TEMP as fast as possible, so that even the 1% bdi
strictlimit (which limits PG_dirty rather than NR_WRITEBACK_TEMP)
cannot stop all memory being eat up by ever growing NR_WRITEBACK_TEMP?

Thanks,
Fengguang

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [patch 15/15] mm: add strictlimit knob
  2017-12-07 10:15         ` Fengguang Wu
@ 2017-12-07 10:32           ` Miklos Szeredi
  2018-01-31 22:58             ` Andrew Morton
  0 siblings, 1 reply; 8+ messages in thread
From: Miklos Szeredi @ 2017-12-07 10:32 UTC (permalink / raw)
  To: Fengguang Wu
  Cc: Andrew Morton, Jan Kara, linux-mm, Maxim Patlasov, hmh, mel,
	t.artem, Theodore Ts'o, Jens Axboe, linux-fsdevel

On Thu, Dec 7, 2017 at 11:15 AM, Fengguang Wu <fengguang.wu@intel.com> wrote:
> On Thu, Dec 07, 2017 at 09:50:23AM +0100, Miklos Szeredi wrote:
>>
>> On Thu, Dec 7, 2017 at 5:14 AM, Fengguang Wu <fengguang.wu@intel.com>
>> wrote:
>>>
>>> CC fuse maintainer, too.
>>>
>>> On Wed, Dec 06, 2017 at 05:09:27PM -0800, Andrew Morton wrote:
>>>>
>>>>
>>>> On Fri, 1 Dec 2017 13:29:28 +0100 Jan Kara <jack@suse.cz> wrote:
>>>>
>>>>> On Thu 30-11-17 14:15:58, Andrew Morton wrote:
>>>>> > From: Maxim Patlasov <MPatlasov@parallels.com>
>>>>> > Subject: mm: add strictlimit knob
>>>>> >
>>>>> > The "strictlimit" feature was introduced to enforce per-bdi dirty
>>>>> > limits
>>>>> > for FUSE which sets bdi max_ratio to 1% by default:
>>>>> >
>>>>> > http://article.gmane.org/gmane.linux.kernel.mm/105809
>>>>> >
>>>>> > However the feature can be useful for other relatively slow or
>>>>> > untrusted
>>>>> > BDIs like USB flash drives and DVD+RW.  The patch adds a knob to
>>>>> > enable
>>>>> > the feature:
>>>>> >
>>>>> > echo 1 > /sys/class/bdi/X:Y/strictlimit
>>>>> >
>>>>> > Being enabled, the feature enforces bdi max_ratio limit even if
>>>>> > global
>>>>> > (10%) dirty limit is not reached.  Of course, the effect is not
>>>>> > visible
>>>>> > until /sys/class/bdi/X:Y/max_ratio is decreased to some reasonable
>>>>> > value.
>>>>>
>>>>> In principle I have nothing against this and the usecase sounds
>>>>> reasonable
>>>>> (in fact I believe the lack of a feature like this is one of reasons
>>>>> why
>>>>> desktop automounters usually mount USB devices with 'sync' mount
>>>>> option).
>>>>> So feel free to add:
>>>>>
>>>>> Reviewed-by: Jan Kara <jack@suse.cz>
>>>>>
>>>>
>>>> Cc Jens, who may be vaguely interested in plans to finally merge this
>>>> three-year-old patch?
>>>>
>>>>
>>>>
>>>> From: Maxim Patlasov <MPatlasov@parallels.com>
>>>> Subject: mm: add strictlimit knob
>>>>
>>>> The "strictlimit" feature was introduced to enforce per-bdi dirty limits
>>>> for FUSE which sets bdi max_ratio to 1% by default:
>>>>
>>>> http://article.gmane.org/gmane.linux.kernel.mm/105809
>>>
>>>
>>>
>>> That link is invalid for now, possibly due to the gmane site rebuild.
>>> I find an email thread here which looks relevant:
>>>
>>> https://sourceforge.net/p/fuse/mailman/message/35254883/
>>>
>>> Where Maxim has an interesting point:
>>>
>>>        > Did any one try increasing the limit and did see any
>>> better/worse
>>>>
>>>> performance ?
>>>
>>>
>>>        We've used 20% as default value in OpenVZ kernel for a long while
>>> (1%
>>> was not enough to saturate our distributed parallel storage).
>>>
>>> So the knob will also enable people to _disable_ the 1% fuse limit to
>>> increase performance.
>>>
>>> So people can use the exposed knob in 2 ways to fit their needs, which
>>> is in general a good thing.
>>>
>>> However the comment in wb_position_ratio() says
>>>
>>>                        Without strictlimit feature, fuse writeback may
>>>          * consume arbitrary amount of RAM because it is accounted in
>>>          * NR_WRITEBACK_TEMP which is not involved in calculating
>>> "nr_dirty".
>>>
>>> How dangerous would that be if some user disabled the 1% fuse limit
>>> through the exposed knob? Will the NR_WRITEBACK_TEMP effect go far
>>> beyond the user's expectation (20% max dirty limit)?
>>>
>>> Looking at the fuse code, NR_WRITEBACK_TEMP will grow proportional to
>>> WB_WRITEBACK, which should be throttled when bdi_write_congested().
>>> The congested flag will be set on
>>>
>>>        fuse_conn.num_background >= fuse_conn.congestion_threshold
>>>        So it looks NR_WRITEBACK_TEMP will somehow be throttled. Just that
>>> it's not included in the 20% dirty limit.
>>
>>
>> Only balance_dirty_pages_ratelimited() is going to limit the
>> generation of dirty pages, I don't think congestion flags will do
>> that.
>
>
> Right. However my concern is something to limit the generation of
> fuse's _writeback_ pages.
>
> The normal writeback pages are limited in 2 ways:
>
> - balance_dirty_pages_ratelimited()'s dirty throttling:
>
>  nr_dirty + nr_writeback + nr_unstable < global and/or bdi dirty limit
>
> - block layer's nr_requests queue limit
>
> However fuse's NR_WRITEBACK_TEMP looks special and has none of such
> limits. The congested bit merely affect the vmscan pageout path.
>
>        pageout
>          may_write_to_inode
>            inode_write_congested
>              wb_congested
>
> I wonder if fuse has its own approach to limit NR_WRITEBACK_TEMP?
> Either explicitly or implicitly, there has to be some hard limit.
>
>> And (AFAICS) for fuse only  BDI_CAP_STRICTLIMIT will allow
>> accounting temp writeback pages when throttling dirty page generation.
>> So without BDI_CAP_STRICTLIMIT kernel memory use of fuse may explode.
>> So we probably need a way to force BDI_CAP_STRICTLIMIT (i.e. do not
>> permit disabling it for fuse).
>
>
> So fuse relies on small nr_dirty. Does fuse impose any explicit or
> implicit rule that NR_WRITEBACK_TEMP will never exceed (N * nr_dirty)?
> Otherwise the size of NR_WRITEBACK_TEMP cannot be guaranteed.
>
> For example, is it possible for some process (eg. dd) to dirty pages
> as fast as possible while some other kernel logic to convert PG_dirty
> to NR_WRITEBACK_TEMP as fast as possible, so that even the 1% bdi
> strictlimit (which limits PG_dirty rather than NR_WRITEBACK_TEMP)
> cannot stop all memory being eat up by ever growing NR_WRITEBACK_TEMP?

Hmm,  temp pages are still accounted as WB_WRITEBACK until writeback
finishes.  Does that not count towards the dirty limit?

Thanks,
Miklos

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [patch 15/15] mm: add strictlimit knob
  2017-12-07 10:32           ` Miklos Szeredi
@ 2018-01-31 22:58             ` Andrew Morton
  0 siblings, 0 replies; 8+ messages in thread
From: Andrew Morton @ 2018-01-31 22:58 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Fengguang Wu, Jan Kara, linux-mm, Maxim Patlasov, hmh, mel,
	t.artem, Theodore Ts'o, Jens Axboe, linux-fsdevel

On Thu, 7 Dec 2017 11:32:43 +0100 Miklos Szeredi <miklos@szeredi.hu> wrote:

> On Thu, Dec 7, 2017 at 11:15 AM, Fengguang Wu <fengguang.wu@intel.com> wrote:
> > On Thu, Dec 07, 2017 at 09:50:23AM +0100, Miklos Szeredi wrote:
> >>
> >> On Thu, Dec 7, 2017 at 5:14 AM, Fengguang Wu <fengguang.wu@intel.com>
> >> wrote:
> >>>
> >>> CC fuse maintainer, too.
> >>>
> >>> On Wed, Dec 06, 2017 at 05:09:27PM -0800, Andrew Morton wrote:
> >>>>
> >>>>
> >>>> On Fri, 1 Dec 2017 13:29:28 +0100 Jan Kara <jack@suse.cz> wrote:
> >>>>
> >>>>> On Thu 30-11-17 14:15:58, Andrew Morton wrote:
> >>>>> > From: Maxim Patlasov <MPatlasov@parallels.com>
> >>>>> > Subject: mm: add strictlimit knob
> >>>>> >
> >>>>> > The "strictlimit" feature was introduced to enforce per-bdi dirty
> >>>>> > limits
> >>>>> > for FUSE which sets bdi max_ratio to 1% by default:
> >>>>> >
> >>>>> > http://article.gmane.org/gmane.linux.kernel.mm/105809
> >>>>> >
> >>>>> > However the feature can be useful for other relatively slow or
> >>>>> > untrusted
> >>>>> > BDIs like USB flash drives and DVD+RW.  The patch adds a knob to
> >>>>> > enable
> >>>>> > the feature:
> >>>>> >
> >>>>> > echo 1 > /sys/class/bdi/X:Y/strictlimit
> >>>>> >
> >>>>> > Being enabled, the feature enforces bdi max_ratio limit even if
> >>>>> > global
> >>>>> > (10%) dirty limit is not reached.  Of course, the effect is not
> >>>>> > visible
> >>>>> > until /sys/class/bdi/X:Y/max_ratio is decreased to some reasonable
> >>>>> > value.
> >>>>>
> >>>>> In principle I have nothing against this and the usecase sounds
> >>>>> reasonable
> >>>>> (in fact I believe the lack of a feature like this is one of reasons
> >>>>> why
> >>>>> desktop automounters usually mount USB devices with 'sync' mount
> >>>>> option).
> >>>>> So feel free to add:
> >>>>>
> >>>>> Reviewed-by: Jan Kara <jack@suse.cz>
> >>>>>
> >>>>
> >>>> Cc Jens, who may be vaguely interested in plans to finally merge this
> >>>> three-year-old patch?
> >>>>
> >>>>
> >>>>
> >>>> From: Maxim Patlasov <MPatlasov@parallels.com>
> >>>> Subject: mm: add strictlimit knob
> >>>>
> >>>> The "strictlimit" feature was introduced to enforce per-bdi dirty limits
> >>>> for FUSE which sets bdi max_ratio to 1% by default:
> >>>>
> >>>> http://article.gmane.org/gmane.linux.kernel.mm/105809
> >>>
> >>>
> >>>
> >>> That link is invalid for now, possibly due to the gmane site rebuild.
> >>> I find an email thread here which looks relevant:
> >>>
> >>> https://sourceforge.net/p/fuse/mailman/message/35254883/
> >>>
> >>> Where Maxim has an interesting point:
> >>>
> >>>        > Did any one try increasing the limit and did see any
> >>> better/worse
> >>>>
> >>>> performance ?
> >>>
> >>>
> >>>        We've used 20% as default value in OpenVZ kernel for a long while
> >>> (1%
> >>> was not enough to saturate our distributed parallel storage).
> >>>
> >>> So the knob will also enable people to _disable_ the 1% fuse limit to
> >>> increase performance.
> >>>
> >>> So people can use the exposed knob in 2 ways to fit their needs, which
> >>> is in general a good thing.
> >>>
> >>> However the comment in wb_position_ratio() says
> >>>
> >>>                        Without strictlimit feature, fuse writeback may
> >>>          * consume arbitrary amount of RAM because it is accounted in
> >>>          * NR_WRITEBACK_TEMP which is not involved in calculating
> >>> "nr_dirty".
> >>>
> >>> How dangerous would that be if some user disabled the 1% fuse limit
> >>> through the exposed knob? Will the NR_WRITEBACK_TEMP effect go far
> >>> beyond the user's expectation (20% max dirty limit)?
> >>>
> >>> Looking at the fuse code, NR_WRITEBACK_TEMP will grow proportional to
> >>> WB_WRITEBACK, which should be throttled when bdi_write_congested().
> >>> The congested flag will be set on
> >>>
> >>>        fuse_conn.num_background >= fuse_conn.congestion_threshold
> >>>        So it looks NR_WRITEBACK_TEMP will somehow be throttled. Just that
> >>> it's not included in the 20% dirty limit.
> >>
> >>
> >> Only balance_dirty_pages_ratelimited() is going to limit the
> >> generation of dirty pages, I don't think congestion flags will do
> >> that.
> >
> >
> > Right. However my concern is something to limit the generation of
> > fuse's _writeback_ pages.
> >
> > The normal writeback pages are limited in 2 ways:
> >
> > - balance_dirty_pages_ratelimited()'s dirty throttling:
> >
> >  nr_dirty + nr_writeback + nr_unstable < global and/or bdi dirty limit
> >
> > - block layer's nr_requests queue limit
> >
> > However fuse's NR_WRITEBACK_TEMP looks special and has none of such
> > limits. The congested bit merely affect the vmscan pageout path.
> >
> >        pageout
> >          may_write_to_inode
> >            inode_write_congested
> >              wb_congested
> >
> > I wonder if fuse has its own approach to limit NR_WRITEBACK_TEMP?
> > Either explicitly or implicitly, there has to be some hard limit.
> >
> >> And (AFAICS) for fuse only  BDI_CAP_STRICTLIMIT will allow
> >> accounting temp writeback pages when throttling dirty page generation.
> >> So without BDI_CAP_STRICTLIMIT kernel memory use of fuse may explode.
> >> So we probably need a way to force BDI_CAP_STRICTLIMIT (i.e. do not
> >> permit disabling it for fuse).
> >
> >
> > So fuse relies on small nr_dirty. Does fuse impose any explicit or
> > implicit rule that NR_WRITEBACK_TEMP will never exceed (N * nr_dirty)?
> > Otherwise the size of NR_WRITEBACK_TEMP cannot be guaranteed.
> >
> > For example, is it possible for some process (eg. dd) to dirty pages
> > as fast as possible while some other kernel logic to convert PG_dirty
> > to NR_WRITEBACK_TEMP as fast as possible, so that even the 1% bdi
> > strictlimit (which limits PG_dirty rather than NR_WRITEBACK_TEMP)
> > cannot stop all memory being eat up by ever growing NR_WRITEBACK_TEMP?
> 
> Hmm,  temp pages are still accounted as WB_WRITEBACK until writeback
> finishes.  Does that not count towards the dirty limit?
> 

This discussion died out and the patch is still "stuck" :(

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2018-01-31 22:58 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-11-30 22:15 [patch 15/15] mm: add strictlimit knob akpm
2017-12-01 12:29 ` Jan Kara
2017-12-07  1:09   ` Andrew Morton
2017-12-07  4:14     ` Fengguang Wu
2017-12-07  8:50       ` Miklos Szeredi
2017-12-07 10:15         ` Fengguang Wu
2017-12-07 10:32           ` Miklos Szeredi
2018-01-31 22:58             ` Andrew Morton

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.