All of lore.kernel.org
 help / color / mirror / Atom feed
* [REGRESSION] RLIMIT_DATA crashes named
@ 2016-09-16 15:16 ` Laura Abbott
  0 siblings, 0 replies; 25+ messages in thread
From: Laura Abbott @ 2016-09-16 15:16 UTC (permalink / raw)
  To: Konstantin Khlebnikov
  Cc: Linus Torvalds, Andrew Morton, Cyrill Gorcunov,
	Christian Borntraeger, linux-mm, Linux Kernel Mailing List

Hi,

Fedora received a bug report[1] after pushing 4.7.2 that named
was segfaulting with named-chroot. With some help (thank you
tibbs!), it was noted that on older kernels named was spitting
out

mmap: named (671): VmData 27566080 exceed data ulimit 23068672.
Will be forbidden soon.

and with f4fcd55841fc ("mm: enable RLIMIT_DATA by default with
workaround for valgrind") it now spits out

mmap: named (593): VmData 27566080 exceed data ulimit 20971520.
Update limits or use boot option ignore_rlimit_data.

Apparently the segfault goes away when dropping datasize=size.
I haven't looked into the named code yet but what I'm
suspecting is named is not setting its limits correctly and
then corrupting itself. This may have existed for much longer
but the rlimit is only now exposing it.

I'd like to propose reverting f4fcd55841fc ("mm: enable RLIMIT_DATA
by default with workaround for valgrind") or default to setting
ignore_rlimit_data to true and spitting out a warning until
named can be fixed.

Thanks,
Laura


[1] https://bugzilla.redhat.com/show_bug.cgi?id=1374917

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [REGRESSION] RLIMIT_DATA crashes named
@ 2016-09-16 15:16 ` Laura Abbott
  0 siblings, 0 replies; 25+ messages in thread
From: Laura Abbott @ 2016-09-16 15:16 UTC (permalink / raw)
  To: Konstantin Khlebnikov
  Cc: Linus Torvalds, Andrew Morton, Cyrill Gorcunov,
	Christian Borntraeger, linux-mm, Linux Kernel Mailing List

Hi,

Fedora received a bug report[1] after pushing 4.7.2 that named
was segfaulting with named-chroot. With some help (thank you
tibbs!), it was noted that on older kernels named was spitting
out

mmap: named (671): VmData 27566080 exceed data ulimit 23068672.
Will be forbidden soon.

and with f4fcd55841fc ("mm: enable RLIMIT_DATA by default with
workaround for valgrind") it now spits out

mmap: named (593): VmData 27566080 exceed data ulimit 20971520.
Update limits or use boot option ignore_rlimit_data.

Apparently the segfault goes away when dropping datasize=size.
I haven't looked into the named code yet but what I'm
suspecting is named is not setting its limits correctly and
then corrupting itself. This may have existed for much longer
but the rlimit is only now exposing it.

I'd like to propose reverting f4fcd55841fc ("mm: enable RLIMIT_DATA
by default with workaround for valgrind") or default to setting
ignore_rlimit_data to true and spitting out a warning until
named can be fixed.

Thanks,
Laura


[1] https://bugzilla.redhat.com/show_bug.cgi?id=1374917

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [REGRESSION] RLIMIT_DATA crashes named
  2016-09-16 15:16 ` Laura Abbott
@ 2016-09-16 17:46   ` Linus Torvalds
  -1 siblings, 0 replies; 25+ messages in thread
From: Linus Torvalds @ 2016-09-16 17:46 UTC (permalink / raw)
  To: Laura Abbott, Sam Varshavchik, Brent
  Cc: Konstantin Khlebnikov, Andrew Morton, Cyrill Gorcunov,
	Christian Borntraeger, linux-mm, Linux Kernel Mailing List

On Fri, Sep 16, 2016 at 8:16 AM, Laura Abbott <labbott@redhat.com> wrote:
>
> Fedora received a bug report[1] after pushing 4.7.2 that named
> was segfaulting with named-chroot. With some help (thank you
> tibbs!), it was noted that on older kernels named was spitting
> out
>
> mmap: named (671): VmData 27566080 exceed data ulimit 23068672.
> Will be forbidden soon.
>
> and with f4fcd55841fc ("mm: enable RLIMIT_DATA by default with
> workaround for valgrind") it now spits out
>
> mmap: named (593): VmData 27566080 exceed data ulimit 20971520.
> Update limits or use boot option ignore_rlimit_data.

Ok, we can certainly revert, but before we do that I'd like to
understand a few more things.

For example, where the data limit came from, and how likely this is to
hit others that have a much harder time fixing it. Adding Sam
Varshavchik and Brent to the participants list...

In particular, this is clearly trivially fixable as noted by Brent in
that bugzilla entry:

  'remove the "datasize 20M;" directive in named.conf'

along with the (much worse) option of "use boot option
ignore_rlimit_data" that the kernel dmesg itself suggests as an
option.

So for example, if that "datasize 20M;" is coming from just the Fedora
named package, it would be much nicer to just get that fixed instead.
Because RLIMIT_DATA the old way was just meaningless noise.

We definitely don't want to break peoples existing setups, but as this
is *so* easy to fix in other ways (even at runtime without even
updating a kernel), and since this commit is already four months old
by now with this single bugzilla being the only report since then that
I'm aware of, my reaction is just that there are better ways to fix it
than reverting a commit that can be worked around trivially.

                 Linus

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [REGRESSION] RLIMIT_DATA crashes named
@ 2016-09-16 17:46   ` Linus Torvalds
  0 siblings, 0 replies; 25+ messages in thread
From: Linus Torvalds @ 2016-09-16 17:46 UTC (permalink / raw)
  To: Laura Abbott, Sam Varshavchik, Brent
  Cc: Konstantin Khlebnikov, Andrew Morton, Cyrill Gorcunov,
	Christian Borntraeger, linux-mm, Linux Kernel Mailing List

On Fri, Sep 16, 2016 at 8:16 AM, Laura Abbott <labbott@redhat.com> wrote:
>
> Fedora received a bug report[1] after pushing 4.7.2 that named
> was segfaulting with named-chroot. With some help (thank you
> tibbs!), it was noted that on older kernels named was spitting
> out
>
> mmap: named (671): VmData 27566080 exceed data ulimit 23068672.
> Will be forbidden soon.
>
> and with f4fcd55841fc ("mm: enable RLIMIT_DATA by default with
> workaround for valgrind") it now spits out
>
> mmap: named (593): VmData 27566080 exceed data ulimit 20971520.
> Update limits or use boot option ignore_rlimit_data.

Ok, we can certainly revert, but before we do that I'd like to
understand a few more things.

For example, where the data limit came from, and how likely this is to
hit others that have a much harder time fixing it. Adding Sam
Varshavchik and Brent to the participants list...

In particular, this is clearly trivially fixable as noted by Brent in
that bugzilla entry:

  'remove the "datasize 20M;" directive in named.conf'

along with the (much worse) option of "use boot option
ignore_rlimit_data" that the kernel dmesg itself suggests as an
option.

So for example, if that "datasize 20M;" is coming from just the Fedora
named package, it would be much nicer to just get that fixed instead.
Because RLIMIT_DATA the old way was just meaningless noise.

We definitely don't want to break peoples existing setups, but as this
is *so* easy to fix in other ways (even at runtime without even
updating a kernel), and since this commit is already four months old
by now with this single bugzilla being the only report since then that
I'm aware of, my reaction is just that there are better ways to fix it
than reverting a commit that can be worked around trivially.

                 Linus

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [REGRESSION] RLIMIT_DATA crashes named
  2016-09-16 17:46   ` Linus Torvalds
@ 2016-09-16 20:10     ` Laura Abbott
  -1 siblings, 0 replies; 25+ messages in thread
From: Laura Abbott @ 2016-09-16 20:10 UTC (permalink / raw)
  To: Linus Torvalds, Sam Varshavchik, Brent
  Cc: Konstantin Khlebnikov, Andrew Morton, Cyrill Gorcunov,
	Christian Borntraeger, linux-mm, Linux Kernel Mailing List

On 09/16/2016 10:46 AM, Linus Torvalds wrote:
> On Fri, Sep 16, 2016 at 8:16 AM, Laura Abbott <labbott@redhat.com> wrote:
>>
>> Fedora received a bug report[1] after pushing 4.7.2 that named
>> was segfaulting with named-chroot. With some help (thank you
>> tibbs!), it was noted that on older kernels named was spitting
>> out
>>
>> mmap: named (671): VmData 27566080 exceed data ulimit 23068672.
>> Will be forbidden soon.
>>
>> and with f4fcd55841fc ("mm: enable RLIMIT_DATA by default with
>> workaround for valgrind") it now spits out
>>
>> mmap: named (593): VmData 27566080 exceed data ulimit 20971520.
>> Update limits or use boot option ignore_rlimit_data.
>
> Ok, we can certainly revert, but before we do that I'd like to
> understand a few more things.
>
> For example, where the data limit came from, and how likely this is to
> hit others that have a much harder time fixing it. Adding Sam
> Varshavchik and Brent to the participants list...
>
> In particular, this is clearly trivially fixable as noted by Brent in
> that bugzilla entry:
>
>   'remove the "datasize 20M;" directive in named.conf'
>
> along with the (much worse) option of "use boot option
> ignore_rlimit_data" that the kernel dmesg itself suggests as an
> option.
>
> So for example, if that "datasize 20M;" is coming from just the Fedora
> named package, it would be much nicer to just get that fixed instead.
> Because RLIMIT_DATA the old way was just meaningless noise.
>

As far as I can tell this isn't Fedora specific.

> We definitely don't want to break peoples existing setups, but as this
> is *so* easy to fix in other ways (even at runtime without even
> updating a kernel), and since this commit is already four months old
> by now with this single bugzilla being the only report since then that
> I'm aware of, my reaction is just that there are better ways to fix it
> than reverting a commit that can be worked around trivially.

I was debating the merits of a revert. My concern is that this bugzilla
just represents the people who are reporting the bug and able to
correlate it to named. The actual number of people who are seeing
problems may be higher and anyone mucking with their config
could hit this and then have to go through troubleshooting steps again.
Add a config, get a segfault is a pretty terrible experience even
by Linux standards. I'd feel better about not reverting if there
were a proposed patch for named

I would like to see RLIMIT_DATA actually do something useful so worse
case I'll figure out something to carry in Fedora and this thread
can be an FYI for people googling.

>
>                  Linus
>

Thanks,
Laura

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [REGRESSION] RLIMIT_DATA crashes named
@ 2016-09-16 20:10     ` Laura Abbott
  0 siblings, 0 replies; 25+ messages in thread
From: Laura Abbott @ 2016-09-16 20:10 UTC (permalink / raw)
  To: Linus Torvalds, Sam Varshavchik, Brent
  Cc: Konstantin Khlebnikov, Andrew Morton, Cyrill Gorcunov,
	Christian Borntraeger, linux-mm, Linux Kernel Mailing List

On 09/16/2016 10:46 AM, Linus Torvalds wrote:
> On Fri, Sep 16, 2016 at 8:16 AM, Laura Abbott <labbott@redhat.com> wrote:
>>
>> Fedora received a bug report[1] after pushing 4.7.2 that named
>> was segfaulting with named-chroot. With some help (thank you
>> tibbs!), it was noted that on older kernels named was spitting
>> out
>>
>> mmap: named (671): VmData 27566080 exceed data ulimit 23068672.
>> Will be forbidden soon.
>>
>> and with f4fcd55841fc ("mm: enable RLIMIT_DATA by default with
>> workaround for valgrind") it now spits out
>>
>> mmap: named (593): VmData 27566080 exceed data ulimit 20971520.
>> Update limits or use boot option ignore_rlimit_data.
>
> Ok, we can certainly revert, but before we do that I'd like to
> understand a few more things.
>
> For example, where the data limit came from, and how likely this is to
> hit others that have a much harder time fixing it. Adding Sam
> Varshavchik and Brent to the participants list...
>
> In particular, this is clearly trivially fixable as noted by Brent in
> that bugzilla entry:
>
>   'remove the "datasize 20M;" directive in named.conf'
>
> along with the (much worse) option of "use boot option
> ignore_rlimit_data" that the kernel dmesg itself suggests as an
> option.
>
> So for example, if that "datasize 20M;" is coming from just the Fedora
> named package, it would be much nicer to just get that fixed instead.
> Because RLIMIT_DATA the old way was just meaningless noise.
>

As far as I can tell this isn't Fedora specific.

> We definitely don't want to break peoples existing setups, but as this
> is *so* easy to fix in other ways (even at runtime without even
> updating a kernel), and since this commit is already four months old
> by now with this single bugzilla being the only report since then that
> I'm aware of, my reaction is just that there are better ways to fix it
> than reverting a commit that can be worked around trivially.

I was debating the merits of a revert. My concern is that this bugzilla
just represents the people who are reporting the bug and able to
correlate it to named. The actual number of people who are seeing
problems may be higher and anyone mucking with their config
could hit this and then have to go through troubleshooting steps again.
Add a config, get a segfault is a pretty terrible experience even
by Linux standards. I'd feel better about not reverting if there
were a proposed patch for named

I would like to see RLIMIT_DATA actually do something useful so worse
case I'll figure out something to carry in Fedora and this thread
can be an FYI for people googling.

>
>                  Linus
>

Thanks,
Laura

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [REGRESSION] RLIMIT_DATA crashes named
  2016-09-16 20:10     ` Laura Abbott
@ 2016-09-16 20:32       ` Linus Torvalds
  -1 siblings, 0 replies; 25+ messages in thread
From: Linus Torvalds @ 2016-09-16 20:32 UTC (permalink / raw)
  To: Laura Abbott
  Cc: Sam Varshavchik, Brent, Konstantin Khlebnikov, Andrew Morton,
	Cyrill Gorcunov, Christian Borntraeger, linux-mm,
	Linux Kernel Mailing List

On Fri, Sep 16, 2016 at 1:10 PM, Laura Abbott <labbott@redhat.com> wrote:
>
> As far as I can tell this isn't Fedora specific.

Some googling does seem to say that "datalimit 20M" and "named.conf"
ends up being some really old default that just gets endlessly copied.

So no, it's not Fedora-specific per se.

But I suspect most people with a named.conf did either

 (a) get it from their distro and didn't change it and so if the
distro just updates theirs, things will automatically "just work"

 (b) actually did write their own (or at least edited it), and knows
what they are doing, and have absolutely no problem removing or
updating that datalimit thing.

> I would like to see RLIMIT_DATA actually do something useful so worse
> case I'll figure out something to carry in Fedora and this thread
> can be an FYI for people googling.

Yeah, even if we only get a good hit for "named segmentation fault", I
guess that will help people a lot.

The really annoying thing seems to be that the kernel message has been
hidden too much. IOW, Sam in his bugzilla report clearly found the
system messages with

    Sep 10 07:38:23 shorty systemd-coredump: Process 1651 (named) of
user 25 dumped core.

but for some reason never noticed the kernel saying (quoting Jason):

   mmap: named (593): VmData 27566080 exceed data ulimit 20971520.
Update limits or use boot option ignore_rlimit_data

at the same time.

Ok, the kernel only says it *once*. Maybe Sam had it in his logs, but
didn't notice the initial failure (which would have had the kernel
message too), and he then looked at the logs for when he tried to
re-start.

Or maybe the system logs don't have those kernel messages, which would
be a disaster.

So maybe we should just change the "pr_warn_once()" into
"pr_warn_ratelimited()", except the default rate limits for that are
wrong (we'd perhaps want something like "at most once every minute" or
similar, while the default rate limits are along the lines of "max 10
lines every 5 _seconds_").

Sam, do you end up seeing the kernel warning in your logs if you just
go back earlier in the boot?

                    Linus

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [REGRESSION] RLIMIT_DATA crashes named
@ 2016-09-16 20:32       ` Linus Torvalds
  0 siblings, 0 replies; 25+ messages in thread
From: Linus Torvalds @ 2016-09-16 20:32 UTC (permalink / raw)
  To: Laura Abbott
  Cc: Sam Varshavchik, Brent, Konstantin Khlebnikov, Andrew Morton,
	Cyrill Gorcunov, Christian Borntraeger, linux-mm,
	Linux Kernel Mailing List

On Fri, Sep 16, 2016 at 1:10 PM, Laura Abbott <labbott@redhat.com> wrote:
>
> As far as I can tell this isn't Fedora specific.

Some googling does seem to say that "datalimit 20M" and "named.conf"
ends up being some really old default that just gets endlessly copied.

So no, it's not Fedora-specific per se.

But I suspect most people with a named.conf did either

 (a) get it from their distro and didn't change it and so if the
distro just updates theirs, things will automatically "just work"

 (b) actually did write their own (or at least edited it), and knows
what they are doing, and have absolutely no problem removing or
updating that datalimit thing.

> I would like to see RLIMIT_DATA actually do something useful so worse
> case I'll figure out something to carry in Fedora and this thread
> can be an FYI for people googling.

Yeah, even if we only get a good hit for "named segmentation fault", I
guess that will help people a lot.

The really annoying thing seems to be that the kernel message has been
hidden too much. IOW, Sam in his bugzilla report clearly found the
system messages with

    Sep 10 07:38:23 shorty systemd-coredump: Process 1651 (named) of
user 25 dumped core.

but for some reason never noticed the kernel saying (quoting Jason):

   mmap: named (593): VmData 27566080 exceed data ulimit 20971520.
Update limits or use boot option ignore_rlimit_data

at the same time.

Ok, the kernel only says it *once*. Maybe Sam had it in his logs, but
didn't notice the initial failure (which would have had the kernel
message too), and he then looked at the logs for when he tried to
re-start.

Or maybe the system logs don't have those kernel messages, which would
be a disaster.

So maybe we should just change the "pr_warn_once()" into
"pr_warn_ratelimited()", except the default rate limits for that are
wrong (we'd perhaps want something like "at most once every minute" or
similar, while the default rate limits are along the lines of "max 10
lines every 5 _seconds_").

Sam, do you end up seeing the kernel warning in your logs if you just
go back earlier in the boot?

                    Linus

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [REGRESSION] RLIMIT_DATA crashes named
  2016-09-16 20:32       ` Linus Torvalds
@ 2016-09-16 22:30         ` Sam Varshavchik
  -1 siblings, 0 replies; 25+ messages in thread
From: Sam Varshavchik @ 2016-09-16 22:30 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Laura Abbott, Brent, Konstantin Khlebnikov, Andrew Morton,
	Cyrill Gorcunov, Christian Borntraeger, linux-mm,
	Linux Kernel Mailing List

Linus Torvalds writes:

> On Fri, Sep 16, 2016 at 1:10 PM, Laura Abbott <labbott@redhat.com> wrote:
> >
> > As far as I can tell this isn't Fedora specific.
>
> Some googling does seem to say that "datalimit 20M" and "named.conf"
> ends up being some really old default that just gets endlessly copied.
>
> So no, it's not Fedora-specific per se.

I'll confirm that.

It's been sitting in my named.conf for at least ten years. I don't remember  
where it came from. The Google sources are very likely. I probably copied  
it, from some tutorial.

> But I suspect most people with a named.conf did either
>
>  (a) get it from their distro and didn't change it and so if the
> distro just updates theirs, things will automatically "just work"
>
>  (b) actually did write their own (or at least edited it), and knows
> what they are doing, and have absolutely no problem removing or
> updating that datalimit thing.

(b) in my case. Now that the root cause is mostly known, I'll just bump it  
up.

> The really annoying thing seems to be that the kernel message has been
> hidden too much. IOW, Sam in his bugzilla report clearly found the
> system messages with
>
>     Sep 10 07:38:23 shorty systemd-coredump: Process 1651 (named) of
> user 25 dumped core.
>
> but for some reason never noticed the kernel saying (quoting Jason):
>
>    mmap: named (593): VmData 27566080 exceed data ulimit 20971520.
> Update limits or use boot option ignore_rlimit_data
>
> at the same time.
>
> Ok, the kernel only says it *once*. Maybe Sam had it in his logs, but
> didn't notice the initial failure (which would have had the kernel
> message too), and he then looked at the logs for when he tried to
> re-start.

I still have this log file. Looking over it, this is indeed what happened.

> Or maybe the system logs don't have those kernel messages, which would
> be a disaster.
>
> So maybe we should just change the "pr_warn_once()" into
> "pr_warn_ratelimited()", except the default rate limits for that are
> wrong (we'd perhaps want something like "at most once every minute" or
> similar, while the default rate limits are along the lines of "max 10
> lines every 5 _seconds_").
>
> Sam, do you end up seeing the kernel warning in your logs if you just
> go back earlier in the boot?

Yes, I found it.

Sep 10 07:36:29 shorty kernel: mmap: named (1108): VmData 52588544 exceed  
data ulimit 20971520. Update limits or use boot option ignore_rlimit_data.

Now that I know what to search for: this appeared about 300 lines earlier in  
/var/log/messages.

When trying to figure out what's going on with named, searching backwards in  
time, and finding the logged segault @07:38:23, IIRC I only looked as far  
back until the @07:38:23 timestamp started, and did not see anything other  
the apparent segfault. Before that, /var/log/messages was full of other  
noise. The original named that was launched two minutes earlier was ancient  
history, by then.

All I saw was that named was apparently segfaulting after booting a new  
kernel. Ok, boot back to the previous kernel, search bugzilla to see if it  
was reported already, and, if not, create it yourself. That's what happened.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [REGRESSION] RLIMIT_DATA crashes named
@ 2016-09-16 22:30         ` Sam Varshavchik
  0 siblings, 0 replies; 25+ messages in thread
From: Sam Varshavchik @ 2016-09-16 22:30 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Laura Abbott, Brent, Konstantin Khlebnikov, Andrew Morton,
	Cyrill Gorcunov, Christian Borntraeger, linux-mm,
	Linux Kernel Mailing List

Linus Torvalds writes:

> On Fri, Sep 16, 2016 at 1:10 PM, Laura Abbott <labbott@redhat.com> wrote:
> >
> > As far as I can tell this isn't Fedora specific.
>
> Some googling does seem to say that "datalimit 20M" and "named.conf"
> ends up being some really old default that just gets endlessly copied.
>
> So no, it's not Fedora-specific per se.

I'll confirm that.

It's been sitting in my named.conf for at least ten years. I don't remember  
where it came from. The Google sources are very likely. I probably copied  
it, from some tutorial.

> But I suspect most people with a named.conf did either
>
>  (a) get it from their distro and didn't change it and so if the
> distro just updates theirs, things will automatically "just work"
>
>  (b) actually did write their own (or at least edited it), and knows
> what they are doing, and have absolutely no problem removing or
> updating that datalimit thing.

(b) in my case. Now that the root cause is mostly known, I'll just bump it  
up.

> The really annoying thing seems to be that the kernel message has been
> hidden too much. IOW, Sam in his bugzilla report clearly found the
> system messages with
>
>     Sep 10 07:38:23 shorty systemd-coredump: Process 1651 (named) of
> user 25 dumped core.
>
> but for some reason never noticed the kernel saying (quoting Jason):
>
>    mmap: named (593): VmData 27566080 exceed data ulimit 20971520.
> Update limits or use boot option ignore_rlimit_data
>
> at the same time.
>
> Ok, the kernel only says it *once*. Maybe Sam had it in his logs, but
> didn't notice the initial failure (which would have had the kernel
> message too), and he then looked at the logs for when he tried to
> re-start.

I still have this log file. Looking over it, this is indeed what happened.

> Or maybe the system logs don't have those kernel messages, which would
> be a disaster.
>
> So maybe we should just change the "pr_warn_once()" into
> "pr_warn_ratelimited()", except the default rate limits for that are
> wrong (we'd perhaps want something like "at most once every minute" or
> similar, while the default rate limits are along the lines of "max 10
> lines every 5 _seconds_").
>
> Sam, do you end up seeing the kernel warning in your logs if you just
> go back earlier in the boot?

Yes, I found it.

Sep 10 07:36:29 shorty kernel: mmap: named (1108): VmData 52588544 exceed  
data ulimit 20971520. Update limits or use boot option ignore_rlimit_data.

Now that I know what to search for: this appeared about 300 lines earlier in  
/var/log/messages.

When trying to figure out what's going on with named, searching backwards in  
time, and finding the logged segault @07:38:23, IIRC I only looked as far  
back until the @07:38:23 timestamp started, and did not see anything other  
the apparent segfault. Before that, /var/log/messages was full of other  
noise. The original named that was launched two minutes earlier was ancient  
history, by then.

All I saw was that named was apparently segfaulting after booting a new  
kernel. Ok, boot back to the previous kernel, search bugzilla to see if it  
was reported already, and, if not, create it yourself. That's what happened.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [REGRESSION] RLIMIT_DATA crashes named
  2016-09-16 22:30         ` Sam Varshavchik
  (?)
@ 2016-09-16 23:58         ` Linus Torvalds
  2016-09-17  0:04             ` Linus Torvalds
  -1 siblings, 1 reply; 25+ messages in thread
From: Linus Torvalds @ 2016-09-16 23:58 UTC (permalink / raw)
  To: Sam Varshavchik, Ingo Molnar, Joe Perches
  Cc: Laura Abbott, Brent, Konstantin Khlebnikov, Andrew Morton,
	Cyrill Gorcunov, Christian Borntraeger, linux-mm,
	Linux Kernel Mailing List

[-- Attachment #1: Type: text/plain, Size: 1548 bytes --]

On Fri, Sep 16, 2016 at 3:30 PM, Sam Varshavchik <mrsam@courier-mta.com> wrote:
>>
>> Sam, do you end up seeing the kernel warning in your logs if you just
>> go back earlier in the boot?
>
> Yes, I found it.
>
> Sep 10 07:36:29 shorty kernel: mmap: named (1108): VmData 52588544 exceed
> data ulimit 20971520. Update limits or use boot option ignore_rlimit_data.
>
> Now that I know what to search for: this appeared about 300 lines earlier in
> /var/log/messages.

Ok, so that's a pretty strong argument that we shouldn't just warn once.

Maybe the warning happened at bootup, and it is now three months
later, and somebody notices that something doesn't work. It might not
be *critical* (three months without working implies it isn't), but it
sure is silly for the kernel to say "yeah, I already warned you, I'm
not going to tell you why it's not working any more".

So it sounds like if the kernel had just had a that warning be
rate-limited instead of happening only once, there would never have
been any confusion about the RLIMIT_DATA change.

Doing a grep for "pr_warn_once()", I get the feeling that we could
just change the definition of "once" to be "at most once per minute"
and everybody would be happy.

Maybe we could change all the "pr_xyz_once()" to consider "once" to be
a softer "at most once per minute" thing. After all, these things are
*supposed* to be very uncommon to begin with, but when they do happen
we do want the user to be aware of them.

Here's a totally untested patch. What do people say?

                 Linus

[-- Attachment #2: patch.diff --]
[-- Type: text/plain, Size: 1579 bytes --]

 include/linux/printk.h | 38 +++++++++++++++-----------------------
 1 file changed, 15 insertions(+), 23 deletions(-)

diff --git a/include/linux/printk.h b/include/linux/printk.h
index 696a56be7d3e..ae98c388a377 100644
--- a/include/linux/printk.h
+++ b/include/linux/printk.h
@@ -316,31 +316,23 @@ extern asmlinkage void dump_stack(void) __cold;
 
 /*
  * Print a one-time message (analogous to WARN_ONCE() et al):
+ *
+ * "once" here is a misnomer. It's shorthand for "at most once a minute".
  */
-
 #ifdef CONFIG_PRINTK
-#define printk_once(fmt, ...)					\
-({								\
-	static bool __print_once __read_mostly;			\
-	bool __ret_print_once = !__print_once;			\
-								\
-	if (!__print_once) {					\
-		__print_once = true;				\
-		printk(fmt, ##__VA_ARGS__);			\
-	}							\
-	unlikely(__ret_print_once);				\
-})
-#define printk_deferred_once(fmt, ...)				\
-({								\
-	static bool __print_once __read_mostly;			\
-	bool __ret_print_once = !__print_once;			\
-								\
-	if (!__print_once) {					\
-		__print_once = true;				\
-		printk_deferred(fmt, ##__VA_ARGS__);		\
-	}							\
-	unlikely(__ret_print_once);				\
-})
+
+#define do_just_once(stmt)	({			\
+	static DEFINE_RATELIMIT_STATE(_rs, HZ*60, 1);	\
+	bool __do_it = __ratelimit(&_rs);		\
+	if (unlikely(__do_it))				\
+		stmt;					\
+	unlikely(__do_it); })
+
+#define printk_once(fmt, ...) \
+	do_just_once(printk(fmt, ##__VA_ARGS__))
+#define printk_deferred_once(fmt, ...) \
+	do_just_once(printk_deferred(fmt, ##__VA_ARGS__))
+
 #else
 #define printk_once(fmt, ...)					\
 	no_printk(fmt, ##__VA_ARGS__)

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [REGRESSION] RLIMIT_DATA crashes named
  2016-09-16 23:58         ` Linus Torvalds
@ 2016-09-17  0:04             ` Linus Torvalds
  0 siblings, 0 replies; 25+ messages in thread
From: Linus Torvalds @ 2016-09-17  0:04 UTC (permalink / raw)
  To: Sam Varshavchik, Ingo Molnar, Joe Perches
  Cc: Laura Abbott, Brent, Konstantin Khlebnikov, Andrew Morton,
	Cyrill Gorcunov, Christian Borntraeger, linux-mm,
	Linux Kernel Mailing List

On Fri, Sep 16, 2016 at 4:58 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> Here's a totally untested patch. What do people say?

Heh. It looks like "pr_xyz_once()" is used in places that haven't
included "ratelimit.h", so this doesn't actually build for everything.

But I guess as a concept patch it's not hard to understand, even if
the implementation needs a bit of tweaking.

             Linus

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [REGRESSION] RLIMIT_DATA crashes named
@ 2016-09-17  0:04             ` Linus Torvalds
  0 siblings, 0 replies; 25+ messages in thread
From: Linus Torvalds @ 2016-09-17  0:04 UTC (permalink / raw)
  To: Sam Varshavchik, Ingo Molnar, Joe Perches
  Cc: Laura Abbott, Brent, Konstantin Khlebnikov, Andrew Morton,
	Cyrill Gorcunov, Christian Borntraeger, linux-mm,
	Linux Kernel Mailing List

On Fri, Sep 16, 2016 at 4:58 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> Here's a totally untested patch. What do people say?

Heh. It looks like "pr_xyz_once()" is used in places that haven't
included "ratelimit.h", so this doesn't actually build for everything.

But I guess as a concept patch it's not hard to understand, even if
the implementation needs a bit of tweaking.

             Linus

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [REGRESSION] RLIMIT_DATA crashes named
  2016-09-17  0:04             ` Linus Torvalds
@ 2016-09-17  4:08               ` Joe Perches
  -1 siblings, 0 replies; 25+ messages in thread
From: Joe Perches @ 2016-09-17  4:08 UTC (permalink / raw)
  To: Linus Torvalds, Sam Varshavchik, Ingo Molnar
  Cc: Laura Abbott, Brent, Konstantin Khlebnikov, Andrew Morton,
	Cyrill Gorcunov, Christian Borntraeger, linux-mm,
	Linux Kernel Mailing List

On Fri, 2016-09-16 at 17:040700, Linus Torvalds wrote:
> On Fri, Sep 16, 2016 at 4:58 PM, Linus Torvalds <torvalds@linux-foundation.org> wrote:
> > Here's a totally untested patch. What do people say?
> Heh. It looks like "pr_xyz_once()" is used in places that haven't
> included "ratelimit.h", so this doesn't actually build for everything.
> But I guess as a concept patch it's not hard to understand, even if
> the implementation needs a bit of tweaking.

do_just_once just isn't a good name for a global
rate limited mechanism that does something very
different than the name.

Maybe allow_once_per_ratelimit or the like

There could be an equivalent do_once

https://lkml.org/lkml/2009/5/22/3

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [REGRESSION] RLIMIT_DATA crashes named
@ 2016-09-17  4:08               ` Joe Perches
  0 siblings, 0 replies; 25+ messages in thread
From: Joe Perches @ 2016-09-17  4:08 UTC (permalink / raw)
  To: Linus Torvalds, Sam Varshavchik, Ingo Molnar
  Cc: Laura Abbott, Brent, Konstantin Khlebnikov, Andrew Morton,
	Cyrill Gorcunov, Christian Borntraeger, linux-mm,
	Linux Kernel Mailing List

On Fri, 2016-09-16 at 17:040700, Linus Torvalds wrote:
> On Fri, Sep 16, 2016 at 4:58 PM, Linus Torvalds <torvalds@linux-foundation.org> wrote:
> > Here's a totally untested patch. What do people say?
> Heh. It looks like "pr_xyz_once()" is used in places that haven't
> included "ratelimit.h", so this doesn't actually build for everything.
> But I guess as a concept patch it's not hard to understand, even if
> the implementation needs a bit of tweaking.

do_just_once just isn't a good name for a global
rate limited mechanism that does something very
different than the name.

Maybe allow_once_per_ratelimit or the like

There could be an equivalent do_once

https://lkml.org/lkml/2009/5/22/3

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [REGRESSION] RLIMIT_DATA crashes named
  2016-09-17  4:08               ` Joe Perches
  (?)
@ 2016-09-17  8:33               ` Konstantin Khlebnikov
  2016-09-17  9:09                   ` Cyrill Gorcunov
  -1 siblings, 1 reply; 25+ messages in thread
From: Konstantin Khlebnikov @ 2016-09-17  8:33 UTC (permalink / raw)
  To: Joe Perches
  Cc: Linus Torvalds, Sam Varshavchik, Ingo Molnar, Laura Abbott,
	Brent, Andrew Morton, Cyrill Gorcunov, Christian Borntraeger,
	linux-mm, Linux Kernel Mailing List

[-- Attachment #1: Type: text/plain, Size: 959 bytes --]

On Sat, Sep 17, 2016 at 7:08 AM, Joe Perches <joe@perches.com> wrote:
> On Fri, 2016-09-16 at 17:040700, Linus Torvalds wrote:
>> On Fri, Sep 16, 2016 at 4:58 PM, Linus Torvalds <torvalds@linux-foundation.org> wrote:
>> > Here's a totally untested patch. What do people say?
>> Heh. It looks like "pr_xyz_once()" is used in places that haven't
>> included "ratelimit.h", so this doesn't actually build for everything.
>> But I guess as a concept patch it's not hard to understand, even if
>> the implementation needs a bit of tweaking.
>
> do_just_once just isn't a good name for a global
> rate limited mechanism that does something very
> different than the name.
>
> Maybe allow_once_per_ratelimit or the like
>
> There could be an equivalent do_once
>
> https://lkml.org/lkml/2009/5/22/3
>

What about this printk_reriodic() and pr_warn_once_per_minute()?

It simply remembers next jiffies to print rather than using that
complicated ratelimiting engine.

[-- Attachment #2: printk-add-pr_warn_once_per_minute --]
[-- Type: application/octet-stream, Size: 2273 bytes --]

printk: add pr_warn_once_per_minute

From: Konstantin Khlebnikov <koct9i@gmail.com>

Signed-off-by: Konstantin Khlebnikov <koct9i@gmail.com>
---
 include/linux/printk.h |   16 ++++++++++++++++
 mm/mmap.c              |    2 +-
 2 files changed, 17 insertions(+), 1 deletion(-)

diff --git a/include/linux/printk.h b/include/linux/printk.h
index 696a56be7d3e..af1646483dbb 100644
--- a/include/linux/printk.h
+++ b/include/linux/printk.h
@@ -341,11 +341,24 @@ extern asmlinkage void dump_stack(void) __cold;
 	}							\
 	unlikely(__ret_print_once);				\
 })
+#define printk_periodic(period, fmt, ...)			\
+({								\
+	static unsigned long __print_next __read_mostly = INITIAL_JIFFIES; \
+	bool __do_print = time_after_eq(jiffies, __print_next); \
+								\
+	if (__do_print) {					\
+		__print_next = jiffies + (period);		\
+		printk(fmt, ##__VA_ARGS__);			\
+	}							\
+	unlikely(__do_print);					\
+})
 #else
 #define printk_once(fmt, ...)					\
 	no_printk(fmt, ##__VA_ARGS__)
 #define printk_deferred_once(fmt, ...)				\
 	no_printk(fmt, ##__VA_ARGS__)
+#define printk_periodic(period, fmt, ...)			\
+	no_printk(fmt, ##__VA_ARGS__)
 #endif
 
 #define pr_emerg_once(fmt, ...)					\
@@ -365,6 +378,9 @@ extern asmlinkage void dump_stack(void) __cold;
 #define pr_cont_once(fmt, ...)					\
 	printk_once(KERN_CONT pr_fmt(fmt), ##__VA_ARGS__)
 
+#define pr_warn_once_per_minute(fmt, ...)			\
+	printk_periodic(HZ * 60, KERN_WARNING pr_fmt(fmt), ##__VA_ARGS__)
+
 #if defined(DEBUG)
 #define pr_devel_once(fmt, ...)					\
 	printk_once(KERN_DEBUG pr_fmt(fmt), ##__VA_ARGS__)
diff --git a/mm/mmap.c b/mm/mmap.c
index ca9d91bca0d6..34f9fb2adcab 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -2935,7 +2935,7 @@ bool may_expand_vm(struct mm_struct *mm, vm_flags_t flags, unsigned long npages)
 		    mm->data_vm + npages <= rlimit_max(RLIMIT_DATA) >> PAGE_SHIFT)
 			return true;
 		if (!ignore_rlimit_data) {
-			pr_warn_once("%s (%d): VmData %lu exceed data ulimit %lu. Update limits or use boot option ignore_rlimit_data.\n",
+			pr_warn_once_per_minute("%s (%d): VmData %lu exceed data ulimit %lu. Update limits or use boot option ignore_rlimit_data.\n",
 				     current->comm, current->pid,
 				     (mm->data_vm + npages) << PAGE_SHIFT,
 				     rlimit(RLIMIT_DATA));

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [REGRESSION] RLIMIT_DATA crashes named
  2016-09-17  8:33               ` Konstantin Khlebnikov
@ 2016-09-17  9:09                   ` Cyrill Gorcunov
  0 siblings, 0 replies; 25+ messages in thread
From: Cyrill Gorcunov @ 2016-09-17  9:09 UTC (permalink / raw)
  To: Konstantin Khlebnikov
  Cc: Joe Perches, Linus Torvalds, Sam Varshavchik, Ingo Molnar,
	Laura Abbott, Brent, Andrew Morton, Christian Borntraeger,
	linux-mm, Linux Kernel Mailing List

On Sat, Sep 17, 2016 at 11:33:56AM +0300, Konstantin Khlebnikov wrote:
> >
> > do_just_once just isn't a good name for a global
> > rate limited mechanism that does something very
> > different than the name.
> >
> > Maybe allow_once_per_ratelimit or the like
> >
> > There could be an equivalent do_once
> >
> > https://lkml.org/lkml/2009/5/22/3
> >
> 
> What about this printk_reriodic() and pr_warn_once_per_minute()?
> 
> It simply remembers next jiffies to print rather than using that
> complicated ratelimiting engine.

+#define printk_periodic(period, fmt, ...)                      \
+({                                                             \
+	static unsigned long __print_next __read_mostly = INITIAL_JIFFIES; \
+	bool __do_print = time_after_eq(jiffies, __print_next); \
+                                                               \
+	if (__do_print) {                                       \
+               __print_next = jiffies + (period);              \
+               printk(fmt, ##__VA_ARGS__);                     \
+	}                                                       \
+	unlikely(__do_print);                                   \
+})

Seems I don't understand the bottom unlikely...

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [REGRESSION] RLIMIT_DATA crashes named
@ 2016-09-17  9:09                   ` Cyrill Gorcunov
  0 siblings, 0 replies; 25+ messages in thread
From: Cyrill Gorcunov @ 2016-09-17  9:09 UTC (permalink / raw)
  To: Konstantin Khlebnikov
  Cc: Joe Perches, Linus Torvalds, Sam Varshavchik, Ingo Molnar,
	Laura Abbott, Brent, Andrew Morton, Christian Borntraeger,
	linux-mm, Linux Kernel Mailing List

On Sat, Sep 17, 2016 at 11:33:56AM +0300, Konstantin Khlebnikov wrote:
> >
> > do_just_once just isn't a good name for a global
> > rate limited mechanism that does something very
> > different than the name.
> >
> > Maybe allow_once_per_ratelimit or the like
> >
> > There could be an equivalent do_once
> >
> > https://lkml.org/lkml/2009/5/22/3
> >
> 
> What about this printk_reriodic() and pr_warn_once_per_minute()?
> 
> It simply remembers next jiffies to print rather than using that
> complicated ratelimiting engine.

+#define printk_periodic(period, fmt, ...)                      \
+({                                                             \
+	static unsigned long __print_next __read_mostly = INITIAL_JIFFIES; \
+	bool __do_print = time_after_eq(jiffies, __print_next); \
+                                                               \
+	if (__do_print) {                                       \
+               __print_next = jiffies + (period);              \
+               printk(fmt, ##__VA_ARGS__);                     \
+	}                                                       \
+	unlikely(__do_print);                                   \
+})

Seems I don't understand the bottom unlikely...

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [REGRESSION] RLIMIT_DATA crashes named
  2016-09-17  9:09                   ` Cyrill Gorcunov
@ 2016-09-17 12:09                     ` Konstantin Khlebnikov
  -1 siblings, 0 replies; 25+ messages in thread
From: Konstantin Khlebnikov @ 2016-09-17 12:09 UTC (permalink / raw)
  To: Cyrill Gorcunov
  Cc: Joe Perches, Linus Torvalds, Sam Varshavchik, Ingo Molnar,
	Laura Abbott, Brent, Andrew Morton, Christian Borntraeger,
	linux-mm, Linux Kernel Mailing List

On Sat, Sep 17, 2016 at 12:09 PM, Cyrill Gorcunov <gorcunov@gmail.com> wrote:
> On Sat, Sep 17, 2016 at 11:33:56AM +0300, Konstantin Khlebnikov wrote:
>> >
>> > do_just_once just isn't a good name for a global
>> > rate limited mechanism that does something very
>> > different than the name.
>> >
>> > Maybe allow_once_per_ratelimit or the like
>> >
>> > There could be an equivalent do_once
>> >
>> > https://lkml.org/lkml/2009/5/22/3
>> >
>>
>> What about this printk_reriodic() and pr_warn_once_per_minute()?
>>
>> It simply remembers next jiffies to print rather than using that
>> complicated ratelimiting engine.
>
> +#define printk_periodic(period, fmt, ...)                      \
> +({                                                             \
> +       static unsigned long __print_next __read_mostly = INITIAL_JIFFIES; \
> +       bool __do_print = time_after_eq(jiffies, __print_next); \
> +                                                               \
> +       if (__do_print) {                                       \
> +               __print_next = jiffies + (period);              \
> +               printk(fmt, ##__VA_ARGS__);                     \
> +       }                                                       \
> +       unlikely(__do_print);                                   \
> +})
>
> Seems I don't understand the bottom unlikely...

This is gcc extrension:  https://gcc.gnu.org/onlinedocs/gcc/Statement-Exprs.html
Here macro works as a function which returns bool

After second though macro should update __print_next if it's too far
if first warning happens too late here will long period of silence
untill next jiffies overlap.

something like

#define printk_periodic(period, fmt, ...)
({
static unsigned long __print_next = INITIAL_JIFFIES;
unsigned long __print_jiffies = jiffies;
bool __do_print = time_after_eq(__print_jiffies, __print_next);

if (__do_print) {
        __print_next = __print_jiffies + (period);
        printk(fmt, ##__VA_ARGS__);
} else if (time_after(__print_next, __print_jiffies + (period))
        __print_next = __print_jiffies + (period);
unlikely(__do_print);
})

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [REGRESSION] RLIMIT_DATA crashes named
@ 2016-09-17 12:09                     ` Konstantin Khlebnikov
  0 siblings, 0 replies; 25+ messages in thread
From: Konstantin Khlebnikov @ 2016-09-17 12:09 UTC (permalink / raw)
  To: Cyrill Gorcunov
  Cc: Joe Perches, Linus Torvalds, Sam Varshavchik, Ingo Molnar,
	Laura Abbott, Brent, Andrew Morton, Christian Borntraeger,
	linux-mm, Linux Kernel Mailing List

On Sat, Sep 17, 2016 at 12:09 PM, Cyrill Gorcunov <gorcunov@gmail.com> wrote:
> On Sat, Sep 17, 2016 at 11:33:56AM +0300, Konstantin Khlebnikov wrote:
>> >
>> > do_just_once just isn't a good name for a global
>> > rate limited mechanism that does something very
>> > different than the name.
>> >
>> > Maybe allow_once_per_ratelimit or the like
>> >
>> > There could be an equivalent do_once
>> >
>> > https://lkml.org/lkml/2009/5/22/3
>> >
>>
>> What about this printk_reriodic() and pr_warn_once_per_minute()?
>>
>> It simply remembers next jiffies to print rather than using that
>> complicated ratelimiting engine.
>
> +#define printk_periodic(period, fmt, ...)                      \
> +({                                                             \
> +       static unsigned long __print_next __read_mostly = INITIAL_JIFFIES; \
> +       bool __do_print = time_after_eq(jiffies, __print_next); \
> +                                                               \
> +       if (__do_print) {                                       \
> +               __print_next = jiffies + (period);              \
> +               printk(fmt, ##__VA_ARGS__);                     \
> +       }                                                       \
> +       unlikely(__do_print);                                   \
> +})
>
> Seems I don't understand the bottom unlikely...

This is gcc extrension:  https://gcc.gnu.org/onlinedocs/gcc/Statement-Exprs.html
Here macro works as a function which returns bool

After second though macro should update __print_next if it's too far
if first warning happens too late here will long period of silence
untill next jiffies overlap.

something like

#define printk_periodic(period, fmt, ...)
({
static unsigned long __print_next = INITIAL_JIFFIES;
unsigned long __print_jiffies = jiffies;
bool __do_print = time_after_eq(__print_jiffies, __print_next);

if (__do_print) {
        __print_next = __print_jiffies + (period);
        printk(fmt, ##__VA_ARGS__);
} else if (time_after(__print_next, __print_jiffies + (period))
        __print_next = __print_jiffies + (period);
unlikely(__do_print);
})

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [REGRESSION] RLIMIT_DATA crashes named
  2016-09-17 12:09                     ` Konstantin Khlebnikov
@ 2016-09-17 12:20                       ` Cyrill Gorcunov
  -1 siblings, 0 replies; 25+ messages in thread
From: Cyrill Gorcunov @ 2016-09-17 12:20 UTC (permalink / raw)
  To: Konstantin Khlebnikov
  Cc: Joe Perches, Linus Torvalds, Sam Varshavchik, Ingo Molnar,
	Laura Abbott, Brent, Andrew Morton, Christian Borntraeger,
	linux-mm, Linux Kernel Mailing List

On Sat, Sep 17, 2016 at 03:09:09PM +0300, Konstantin Khlebnikov wrote:
> >
> > Seems I don't understand the bottom unlikely...
> 
> This is gcc extrension:  https://gcc.gnu.org/onlinedocs/gcc/Statement-Exprs.html
> Here macro works as a function which returns bool

no, no, I know for what unlikely extension stand for.
it was just hard to obtain from without the context.
this extension implies someone calls for
if (printk_periodic()) right?

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [REGRESSION] RLIMIT_DATA crashes named
@ 2016-09-17 12:20                       ` Cyrill Gorcunov
  0 siblings, 0 replies; 25+ messages in thread
From: Cyrill Gorcunov @ 2016-09-17 12:20 UTC (permalink / raw)
  To: Konstantin Khlebnikov
  Cc: Joe Perches, Linus Torvalds, Sam Varshavchik, Ingo Molnar,
	Laura Abbott, Brent, Andrew Morton, Christian Borntraeger,
	linux-mm, Linux Kernel Mailing List

On Sat, Sep 17, 2016 at 03:09:09PM +0300, Konstantin Khlebnikov wrote:
> >
> > Seems I don't understand the bottom unlikely...
> 
> This is gcc extrension:  https://gcc.gnu.org/onlinedocs/gcc/Statement-Exprs.html
> Here macro works as a function which returns bool

no, no, I know for what unlikely extension stand for.
it was just hard to obtain from without the context.
this extension implies someone calls for
if (printk_periodic()) right?

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [REGRESSION] RLIMIT_DATA crashes named
  2016-09-17 12:20                       ` Cyrill Gorcunov
  (?)
@ 2016-09-17 21:40                       ` Konstantin Khlebnikov
  2016-09-17 21:52                           ` Joe Perches
  -1 siblings, 1 reply; 25+ messages in thread
From: Konstantin Khlebnikov @ 2016-09-17 21:40 UTC (permalink / raw)
  To: Cyrill Gorcunov
  Cc: Joe Perches, Linus Torvalds, Sam Varshavchik, Ingo Molnar,
	Laura Abbott, Brent, Andrew Morton, Christian Borntraeger,
	linux-mm, Linux Kernel Mailing List

[-- Attachment #1: Type: text/plain, Size: 1163 bytes --]

On Sat, Sep 17, 2016 at 3:20 PM, Cyrill Gorcunov <gorcunov@gmail.com> wrote:
> On Sat, Sep 17, 2016 at 03:09:09PM +0300, Konstantin Khlebnikov wrote:
>> >
>> > Seems I don't understand the bottom unlikely...
>>
>> This is gcc extrension:  https://gcc.gnu.org/onlinedocs/gcc/Statement-Exprs.html
>> Here macro works as a function which returns bool
>
> no, no, I know for what unlikely extension stand for.
> it was just hard to obtain from without the context.
> this extension implies someone calls for
> if (printk_periodic()) right?

Yep.


Here is perfect macro for that jiffies check: time_in_range_open.

/*
 * Calculate whether a is in the range of [b, c).
 */
#define time_in_range_open(a,b,c) \
(time_after_eq(a,b) && \
time_before(a,c))

So... better version looks like

#define printk_periodic(period, fmt, ...)
({
        static unsigned long __prev __read_mostly = INITIAL_JIFFIES - (period);
        unsigned long __now = jiffies;
        bool __print = !time_in_range_open(__now, __prev, __prev + (period));

        if (__print) {
                __prev = __now;
                printk(fmt, ##__VA_ARGS__);
        }
        unlikely(__print);
})

[-- Attachment #2: printk-add-pr_warn_once_per_minute --]
[-- Type: application/octet-stream, Size: 2310 bytes --]

printk: add pr_warn_once_per_minute

From: Konstantin Khlebnikov <koct9i@gmail.com>

Signed-off-by: Konstantin Khlebnikov <koct9i@gmail.com>
---
 include/linux/printk.h |   17 +++++++++++++++++
 mm/mmap.c              |    2 +-
 2 files changed, 18 insertions(+), 1 deletion(-)

diff --git a/include/linux/printk.h b/include/linux/printk.h
index 696a56be7d3e..372984b6645b 100644
--- a/include/linux/printk.h
+++ b/include/linux/printk.h
@@ -341,11 +341,25 @@ extern asmlinkage void dump_stack(void) __cold;
 	}							\
 	unlikely(__ret_print_once);				\
 })
+#define printk_periodic(period, fmt, ...)			\
+({								\
+	static unsigned long __prev __read_mostly = INITIAL_JIFFIES - (period); \
+	unsigned long __now = jiffies;				\
+	bool __print = !time_in_range_open(__now, __prev, __prev + (period)); \
+								\
+	if (__print) {						\
+		__prev = __now;					\
+		printk(fmt, ##__VA_ARGS__);			\
+	}							\
+	unlikely(__print);					\
+})
 #else
 #define printk_once(fmt, ...)					\
 	no_printk(fmt, ##__VA_ARGS__)
 #define printk_deferred_once(fmt, ...)				\
 	no_printk(fmt, ##__VA_ARGS__)
+#define printk_periodic(period, fmt, ...)			\
+	no_printk(fmt, ##__VA_ARGS__)
 #endif
 
 #define pr_emerg_once(fmt, ...)					\
@@ -365,6 +379,9 @@ extern asmlinkage void dump_stack(void) __cold;
 #define pr_cont_once(fmt, ...)					\
 	printk_once(KERN_CONT pr_fmt(fmt), ##__VA_ARGS__)
 
+#define pr_warn_once_per_minute(fmt, ...)			\
+	printk_periodic(HZ * 60, KERN_WARNING pr_fmt(fmt), ##__VA_ARGS__)
+
 #if defined(DEBUG)
 #define pr_devel_once(fmt, ...)					\
 	printk_once(KERN_DEBUG pr_fmt(fmt), ##__VA_ARGS__)
diff --git a/mm/mmap.c b/mm/mmap.c
index ca9d91bca0d6..34f9fb2adcab 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -2935,7 +2935,7 @@ bool may_expand_vm(struct mm_struct *mm, vm_flags_t flags, unsigned long npages)
 		    mm->data_vm + npages <= rlimit_max(RLIMIT_DATA) >> PAGE_SHIFT)
 			return true;
 		if (!ignore_rlimit_data) {
-			pr_warn_once("%s (%d): VmData %lu exceed data ulimit %lu. Update limits or use boot option ignore_rlimit_data.\n",
+			pr_warn_once_per_minute("%s (%d): VmData %lu exceed data ulimit %lu. Update limits or use boot option ignore_rlimit_data.\n",
 				     current->comm, current->pid,
 				     (mm->data_vm + npages) << PAGE_SHIFT,
 				     rlimit(RLIMIT_DATA));

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [REGRESSION] RLIMIT_DATA crashes named
  2016-09-17 21:40                       ` Konstantin Khlebnikov
@ 2016-09-17 21:52                           ` Joe Perches
  0 siblings, 0 replies; 25+ messages in thread
From: Joe Perches @ 2016-09-17 21:52 UTC (permalink / raw)
  To: Konstantin Khlebnikov, Cyrill Gorcunov
  Cc: Linus Torvalds, Sam Varshavchik, Ingo Molnar, Laura Abbott,
	Brent, Andrew Morton, Christian Borntraeger, linux-mm,
	Linux Kernel Mailing List

On Sun, 2016-09-18 at 00:40 +0300, Konstantin Khlebnikov wrote:
> #define printk_periodic(period, fmt, ...)
> ({
>         static unsigned long __prev __read_mostly = INITIAL_JIFFIES - (period);
>         unsigned long __now = jiffies;
>         bool __print = !time_in_range_open(__now, __prev, __prev + (period));
> 
>         if (__print) {
>                 __prev = __now;
>                 printk(fmt, ##__VA_ARGS__);
>         }
>         unlikely(__print);
> })

printk_periodic reads like a thing that would create a
thread to printk a message every period.

And trivially, period should be copied to a temporary
and not be reused (use your choice of # of underscores)

	unsigned long _period = period;
	unsigned long _now = now;
	static unsigned long _prev __read_mostly = etc...

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [REGRESSION] RLIMIT_DATA crashes named
@ 2016-09-17 21:52                           ` Joe Perches
  0 siblings, 0 replies; 25+ messages in thread
From: Joe Perches @ 2016-09-17 21:52 UTC (permalink / raw)
  To: Konstantin Khlebnikov, Cyrill Gorcunov
  Cc: Linus Torvalds, Sam Varshavchik, Ingo Molnar, Laura Abbott,
	Brent, Andrew Morton, Christian Borntraeger, linux-mm,
	Linux Kernel Mailing List

On Sun, 2016-09-18 at 00:40 +0300, Konstantin Khlebnikov wrote:
> #define printk_periodic(period, fmt, ...)
> ({
>         static unsigned long __prev __read_mostly = INITIAL_JIFFIES - (period);
>         unsigned long __now = jiffies;
>         bool __print = !time_in_range_open(__now, __prev, __prev + (period));
> 
>         if (__print) {
>                 __prev = __now;
>                 printk(fmt, ##__VA_ARGS__);
>         }
>         unlikely(__print);
> })

printk_periodic reads like a thing that would create a
thread to printk a message every period.

And trivially, period should be copied to a temporary
and not be reused (use your choice of # of underscores)

	unsigned long _period = period;
	unsigned long _now = now;
	static unsigned long _prev __read_mostly = etc...

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2016-09-17 21:52 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-09-16 15:16 [REGRESSION] RLIMIT_DATA crashes named Laura Abbott
2016-09-16 15:16 ` Laura Abbott
2016-09-16 17:46 ` Linus Torvalds
2016-09-16 17:46   ` Linus Torvalds
2016-09-16 20:10   ` Laura Abbott
2016-09-16 20:10     ` Laura Abbott
2016-09-16 20:32     ` Linus Torvalds
2016-09-16 20:32       ` Linus Torvalds
2016-09-16 22:30       ` Sam Varshavchik
2016-09-16 22:30         ` Sam Varshavchik
2016-09-16 23:58         ` Linus Torvalds
2016-09-17  0:04           ` Linus Torvalds
2016-09-17  0:04             ` Linus Torvalds
2016-09-17  4:08             ` Joe Perches
2016-09-17  4:08               ` Joe Perches
2016-09-17  8:33               ` Konstantin Khlebnikov
2016-09-17  9:09                 ` Cyrill Gorcunov
2016-09-17  9:09                   ` Cyrill Gorcunov
2016-09-17 12:09                   ` Konstantin Khlebnikov
2016-09-17 12:09                     ` Konstantin Khlebnikov
2016-09-17 12:20                     ` Cyrill Gorcunov
2016-09-17 12:20                       ` Cyrill Gorcunov
2016-09-17 21:40                       ` Konstantin Khlebnikov
2016-09-17 21:52                         ` Joe Perches
2016-09-17 21:52                           ` Joe Perches

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.