linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* sysctl_writes_strict documentation  + an oddity?
@ 2015-05-09  8:54 Michael Kerrisk (man-pages)
  2015-05-20 14:08 ` Michael Kerrisk (man-pages)
  2015-06-04 19:36 ` Kees Cook
  0 siblings, 2 replies; 6+ messages in thread
From: Michael Kerrisk (man-pages) @ 2015-05-09  8:54 UTC (permalink / raw)
  To: Kees Cook; +Cc: mtk.manpages, lkml, Randy Dunlap, Andrew Morton, linux-man

Hi Kees,

I discovered that you added /proc/sys/kernel/sysctl_writes_strict in
Linux 3.16. In passing, I'll just mention that was an API change that
should have been CCed to linux-api@vger.kernel.org.

Anyway, I've tried to write this file up for the proc(5) man page, 
and I have two requests:

1) Could you review this text?
2) I've found some behavior that surprised me, and I am wondering if it
   is intended. Could you let me know your thoughts?

===== 1) man-page text =====

The man-page text, heavily based on your text in
Documentation/sysctl/kernel.txt, is as follows:

       /proc/sys/kernel/sysctl_writes_strict (since Linux 3.16)
              The  value  in  this  file  determines how the file offset
              affects the behavior of updating entries  in  files  under
              /proc/sys.  The file has three possible values:

              -1  This  provides  legacy  handling, with no printk warn‐
                  ings.  Each write(2) must fully contain the  value  to
                  be  written,  and  multiple  writes  on  the same file
                  descriptor will overwrite the entire value, regardless
                  of the file position.

              0   (default)  This  provides the same behavior as for -1,
                  but printk warnings are  written  for  processes  that
                  perform writes when the file offset is not 0.

              1   Respect  the  file  offset  when  writing strings into
                  /proc/sys files.  Multiple writes will append  to  the
                  value  buffer.   Anything  written  beyond the maximum
                  length of the value buffer will be ignored.  Writes to
                  numeric  /proc/sys entries must always be at file off‐
                  set 0 and the value must be  fully  contained  in  the
                  buffer provided to write(2).

===== 2) Behavior puzzle (a) =====

The last sentence quoted from the man page was based on your sentence

    Writes to numeric sysctl entries must always be at file position 0 
    and the value must be fully contained in the buffer sent in the write 
    syscall.

So, I had interpreted /proc/sys/kernel/sysctl_writes_strict==1 to
mean that if one writes into a numeric /proc/sys file at an offset
other than zero, the write() will fail with some kind of error.
But this seems not to be the case. Instead, the write() succeeds, 
but the file is left unmodified. That's surprising, I find. So, I'm
wondering whether the implementation deviates from your intention.

There's a test program below, which takes arguments as follows

    ./a.out pathname offset string

And here's a test run that demonstrates the behavior:

$ sudo sh -c "echo 1 > /proc/sys/kernel/sysctl_writes_strict"
$ cat /proc/sys/kernel/pid_max
32768
$ sudo dmesg --clear
$ sudo ./a.out /proc/sys/kernel/pid_max 1 3000
write() succeeded (return value 4)
$ cat /proc/sys/kernel/pid_max
32768
$ dmesg

As you can see above, an attempt was made to write into the
/proc/sys/kernel/pid_max file at offset 1.
The write() returned successfully (reporting 4 bytes written)
but the file contents were unchanged, and no printk() warning
was issued. Is this intended behavior?

===== 2) Behavior puzzle (b) =====

In commit f88083005ab319abba5d0b2e4e997558245493c8, there is this note:

    This adds the sysctl kernel.sysctl_writes_strict to control the write
    behavior.  The default (0) reports when VFS position is non-0 on a
    write, but retains legacy behavior, -1 disables the warning, and 1
    enables the position-respecting behavior.
    
    The long-term plan here is to wait for userspace to be fixed in response
    to the new warning and to then switch the default kernel behavior to the
    new position-respecting behavior.

(That last para was added to the commit message by AKPM, I see.)

But, I wonder here whether /proc/sys/kernel/sysctl_writes_strict==0
is going to help with the long-term plan. The problem is that in 
warn_sysctl_write(), pr_warn_once() is used. This means that only 
the first offending user-space application that writes to *any* 
/proc/sys file will generate the printk warning. If that application 
isn't fixed, then none of the other "broken" applications will be 
discovered. It therefore seems possible that it could be a very long
time before we could "switch the default kernel behavior to the
new position-respecting behavior".

Looking over old mails
(http://thread.gmane.org/gmane.linux.kernel/1695177/focus=23240),
I see that you're aware of the problem, but it seems to me that
the switch to pr_warn_once() (for fear of spamming the log) likely
dooms the long-term plan to failure. Your thoughts?

Cheers,

Michael


8x--8x--8x--8x--8x--8x--8x--8x--8x--8x--8x--8x--8x--8x--8x--

#include <sys/stat.h>
#include <fcntl.h>
#include <sys/types.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>

#define errExit(msg) do { perror(msg); exit(EXIT_FAILURE); } while (0)

int
main(int argc, char *argv[])
{
    char *pathname;
    off_t offset;
    char *string;
    int fd;
    ssize_t numWritten;

    if (argc != 4) {
        fprintf(stderr, "Usage: %s pathname offset string\n", argv[0]);
        exit(EXIT_FAILURE);
    }

    pathname = argv[1];
    offset = strtoll(argv[2], NULL, 0);
    string = argv[3];

    fd = open(pathname, O_RDWR);
    if (fd == -1)
        errExit("open");

    if (lseek(fd, offset, SEEK_SET) == -1)
        errExit("lseek");

    numWritten = write(fd, string, strlen(string));
    if (numWritten == -1)
        errExit("write");

    printf("write() succeeded (return value %zd)\n", numWritten);

    exit(EXIT_SUCCESS);
}

-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: sysctl_writes_strict documentation + an oddity?
  2015-05-09  8:54 sysctl_writes_strict documentation + an oddity? Michael Kerrisk (man-pages)
@ 2015-05-20 14:08 ` Michael Kerrisk (man-pages)
  2015-06-04 19:36 ` Kees Cook
  1 sibling, 0 replies; 6+ messages in thread
From: Michael Kerrisk (man-pages) @ 2015-05-20 14:08 UTC (permalink / raw)
  To: Kees Cook; +Cc: Michael Kerrisk, lkml, Andrew Morton, linux-man, Randy Dunlap

Hello Kees,

Ping on the below!

Cheers,

Michael


On 9 May 2015 at 10:54, Michael Kerrisk (man-pages)
<mtk.manpages@gmail.com> wrote:
> Hi Kees,
>
> I discovered that you added /proc/sys/kernel/sysctl_writes_strict in
> Linux 3.16. In passing, I'll just mention that was an API change that
> should have been CCed to linux-api@vger.kernel.org.
>
> Anyway, I've tried to write this file up for the proc(5) man page,
> and I have two requests:
>
> 1) Could you review this text?
> 2) I've found some behavior that surprised me, and I am wondering if it
>    is intended. Could you let me know your thoughts?
>
> ===== 1) man-page text =====
>
> The man-page text, heavily based on your text in
> Documentation/sysctl/kernel.txt, is as follows:
>
>        /proc/sys/kernel/sysctl_writes_strict (since Linux 3.16)
>               The  value  in  this  file  determines how the file offset
>               affects the behavior of updating entries  in  files  under
>               /proc/sys.  The file has three possible values:
>
>               -1  This  provides  legacy  handling, with no printk warn‐
>                   ings.  Each write(2) must fully contain the  value  to
>                   be  written,  and  multiple  writes  on  the same file
>                   descriptor will overwrite the entire value, regardless
>                   of the file position.
>
>               0   (default)  This  provides the same behavior as for -1,
>                   but printk warnings are  written  for  processes  that
>                   perform writes when the file offset is not 0.
>
>               1   Respect  the  file  offset  when  writing strings into
>                   /proc/sys files.  Multiple writes will append  to  the
>                   value  buffer.   Anything  written  beyond the maximum
>                   length of the value buffer will be ignored.  Writes to
>                   numeric  /proc/sys entries must always be at file off‐
>                   set 0 and the value must be  fully  contained  in  the
>                   buffer provided to write(2).
>
> ===== 2) Behavior puzzle (a) =====
>
> The last sentence quoted from the man page was based on your sentence
>
>     Writes to numeric sysctl entries must always be at file position 0
>     and the value must be fully contained in the buffer sent in the write
>     syscall.
>
> So, I had interpreted /proc/sys/kernel/sysctl_writes_strict==1 to
> mean that if one writes into a numeric /proc/sys file at an offset
> other than zero, the write() will fail with some kind of error.
> But this seems not to be the case. Instead, the write() succeeds,
> but the file is left unmodified. That's surprising, I find. So, I'm
> wondering whether the implementation deviates from your intention.
>
> There's a test program below, which takes arguments as follows
>
>     ./a.out pathname offset string
>
> And here's a test run that demonstrates the behavior:
>
> $ sudo sh -c "echo 1 > /proc/sys/kernel/sysctl_writes_strict"
> $ cat /proc/sys/kernel/pid_max
> 32768
> $ sudo dmesg --clear
> $ sudo ./a.out /proc/sys/kernel/pid_max 1 3000
> write() succeeded (return value 4)
> $ cat /proc/sys/kernel/pid_max
> 32768
> $ dmesg
>
> As you can see above, an attempt was made to write into the
> /proc/sys/kernel/pid_max file at offset 1.
> The write() returned successfully (reporting 4 bytes written)
> but the file contents were unchanged, and no printk() warning
> was issued. Is this intended behavior?
>
> ===== 2) Behavior puzzle (b) =====
>
> In commit f88083005ab319abba5d0b2e4e997558245493c8, there is this note:
>
>     This adds the sysctl kernel.sysctl_writes_strict to control the write
>     behavior.  The default (0) reports when VFS position is non-0 on a
>     write, but retains legacy behavior, -1 disables the warning, and 1
>     enables the position-respecting behavior.
>
>     The long-term plan here is to wait for userspace to be fixed in response
>     to the new warning and to then switch the default kernel behavior to the
>     new position-respecting behavior.
>
> (That last para was added to the commit message by AKPM, I see.)
>
> But, I wonder here whether /proc/sys/kernel/sysctl_writes_strict==0
> is going to help with the long-term plan. The problem is that in
> warn_sysctl_write(), pr_warn_once() is used. This means that only
> the first offending user-space application that writes to *any*
> /proc/sys file will generate the printk warning. If that application
> isn't fixed, then none of the other "broken" applications will be
> discovered. It therefore seems possible that it could be a very long
> time before we could "switch the default kernel behavior to the
> new position-respecting behavior".
>
> Looking over old mails
> (http://thread.gmane.org/gmane.linux.kernel/1695177/focus=23240),
> I see that you're aware of the problem, but it seems to me that
> the switch to pr_warn_once() (for fear of spamming the log) likely
> dooms the long-term plan to failure. Your thoughts?
>
> Cheers,
>
> Michael
>
>
> 8x--8x--8x--8x--8x--8x--8x--8x--8x--8x--8x--8x--8x--8x--8x--
>
> #include <sys/stat.h>
> #include <fcntl.h>
> #include <sys/types.h>
> #include <stdio.h>
> #include <stdlib.h>
> #include <unistd.h>
> #include <string.h>
>
> #define errExit(msg) do { perror(msg); exit(EXIT_FAILURE); } while (0)
>
> int
> main(int argc, char *argv[])
> {
>     char *pathname;
>     off_t offset;
>     char *string;
>     int fd;
>     ssize_t numWritten;
>
>     if (argc != 4) {
>         fprintf(stderr, "Usage: %s pathname offset string\n", argv[0]);
>         exit(EXIT_FAILURE);
>     }
>
>     pathname = argv[1];
>     offset = strtoll(argv[2], NULL, 0);
>     string = argv[3];
>
>     fd = open(pathname, O_RDWR);
>     if (fd == -1)
>         errExit("open");
>
>     if (lseek(fd, offset, SEEK_SET) == -1)
>         errExit("lseek");
>
>     numWritten = write(fd, string, strlen(string));
>     if (numWritten == -1)
>         errExit("write");
>
>     printf("write() succeeded (return value %zd)\n", numWritten);
>
>     exit(EXIT_SUCCESS);
> }
>
> --
> Michael Kerrisk
> Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
> Linux/UNIX System Programming Training: http://man7.org/training/



-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: sysctl_writes_strict documentation + an oddity?
  2015-05-09  8:54 sysctl_writes_strict documentation + an oddity? Michael Kerrisk (man-pages)
  2015-05-20 14:08 ` Michael Kerrisk (man-pages)
@ 2015-06-04 19:36 ` Kees Cook
  2015-06-16 10:03   ` Michael Kerrisk (man-pages)
  1 sibling, 1 reply; 6+ messages in thread
From: Kees Cook @ 2015-06-04 19:36 UTC (permalink / raw)
  To: Michael Kerrisk (man-pages); +Cc: lkml, Randy Dunlap, Andrew Morton, linux-man

On Sat, May 9, 2015 at 1:54 AM, Michael Kerrisk (man-pages)
<mtk.manpages@gmail.com> wrote:
> Hi Kees,
>
> I discovered that you added /proc/sys/kernel/sysctl_writes_strict in
> Linux 3.16. In passing, I'll just mention that was an API change that
> should have been CCed to linux-api@vger.kernel.org.

Sorry about that! I'm trying to get better. I think my main trigger
for this is "if I'm adding a file to Documentation/ I should probably
CC linux-api" now. :)

> Anyway, I've tried to write this file up for the proc(5) man page,
> and I have two requests:
>
> 1) Could you review this text?
> 2) I've found some behavior that surprised me, and I am wondering if it
>    is intended. Could you let me know your thoughts?
>
> ===== 1) man-page text =====
>
> The man-page text, heavily based on your text in
> Documentation/sysctl/kernel.txt, is as follows:
>
>        /proc/sys/kernel/sysctl_writes_strict (since Linux 3.16)
>               The  value  in  this  file  determines how the file offset
>               affects the behavior of updating entries  in  files  under
>               /proc/sys.  The file has three possible values:
>
>               -1  This  provides  legacy  handling, with no printk warn‐
>                   ings.  Each write(2) must fully contain the  value  to
>                   be  written,  and  multiple  writes  on  the same file
>                   descriptor will overwrite the entire value, regardless
>                   of the file position.
>
>               0   (default)  This  provides the same behavior as for -1,
>                   but printk warnings are  written  for  processes  that
>                   perform writes when the file offset is not 0.
>
>               1   Respect  the  file  offset  when  writing strings into
>                   /proc/sys files.  Multiple writes will append  to  the
>                   value  buffer.   Anything  written  beyond the maximum
>                   length of the value buffer will be ignored.  Writes to
>                   numeric  /proc/sys entries must always be at file off‐
>                   set 0 and the value must be  fully  contained  in  the
>                   buffer provided to write(2).

That looks correct, yes. Thanks!

>
> ===== 2) Behavior puzzle (a) =====
>
> The last sentence quoted from the man page was based on your sentence
>
>     Writes to numeric sysctl entries must always be at file position 0
>     and the value must be fully contained in the buffer sent in the write
>     syscall.
>
> So, I had interpreted /proc/sys/kernel/sysctl_writes_strict==1 to
> mean that if one writes into a numeric /proc/sys file at an offset
> other than zero, the write() will fail with some kind of error.

Reporting back an error wasn't something I'd tested before. Looking at
the code again now, it should be possible make this change.
Regardless, in the case of the numeric value error condition, it's the
same as the "past the end" string error condition: "Anything written
beyond the maximum length of the value buffer will be ignored." i.e.
anything other than file offset 0 is considered "past the end of the
buffer" for a numeric value and is ignored.

> But this seems not to be the case. Instead, the write() succeeds,
> but the file is left unmodified. That's surprising, I find. So, I'm
> wondering whether the implementation deviates from your intention.
>
> There's a test program below, which takes arguments as follows
>
>     ./a.out pathname offset string

I have tests in tools/testing/selftests/sysctl for checking the
various behaviors too. They don't actually examine any error
conditions from the sysctl writing itself. It should be simple to make
sysctl_writes_strict failures return an error, though.

>
> And here's a test run that demonstrates the behavior:
>
> $ sudo sh -c "echo 1 > /proc/sys/kernel/sysctl_writes_strict"
> $ cat /proc/sys/kernel/pid_max
> 32768
> $ sudo dmesg --clear
> $ sudo ./a.out /proc/sys/kernel/pid_max 1 3000
> write() succeeded (return value 4)
> $ cat /proc/sys/kernel/pid_max
> 32768
> $ dmesg
>
> As you can see above, an attempt was made to write into the
> /proc/sys/kernel/pid_max file at offset 1.
> The write() returned successfully (reporting 4 bytes written)
> but the file contents were unchanged, and no printk() warning
> was issued. Is this intended behavior?
>
> ===== 2) Behavior puzzle (b) =====
>
> In commit f88083005ab319abba5d0b2e4e997558245493c8, there is this note:
>
>     This adds the sysctl kernel.sysctl_writes_strict to control the write
>     behavior.  The default (0) reports when VFS position is non-0 on a
>     write, but retains legacy behavior, -1 disables the warning, and 1
>     enables the position-respecting behavior.
>
>     The long-term plan here is to wait for userspace to be fixed in response
>     to the new warning and to then switch the default kernel behavior to the
>     new position-respecting behavior.
>
> (That last para was added to the commit message by AKPM, I see.)
>
> But, I wonder here whether /proc/sys/kernel/sysctl_writes_strict==0
> is going to help with the long-term plan. The problem is that in
> warn_sysctl_write(), pr_warn_once() is used. This means that only
> the first offending user-space application that writes to *any*
> /proc/sys file will generate the printk warning. If that application
> isn't fixed, then none of the other "broken" applications will be
> discovered. It therefore seems possible that it could be a very long
> time before we could "switch the default kernel behavior to the
> new position-respecting behavior".
>
> Looking over old mails
> (http://thread.gmane.org/gmane.linux.kernel/1695177/focus=23240),
> I see that you're aware of the problem, but it seems to me that
> the switch to pr_warn_once() (for fear of spamming the log) likely
> dooms the long-term plan to failure. Your thoughts?

In actual regular use, the situation that triggers the warning should
be vanishingly rare, but the condition can be trivially met by someone
intending to hit it for the purposes of filling log files. As such, it
makes sense to me to use _once to avoid spamming, but still catch a
rare usage under normal conditions.

>
> Cheers,
>
> Michael
>
>
> 8x--8x--8x--8x--8x--8x--8x--8x--8x--8x--8x--8x--8x--8x--8x--
>
> #include <sys/stat.h>
> #include <fcntl.h>
> #include <sys/types.h>
> #include <stdio.h>
> #include <stdlib.h>
> #include <unistd.h>
> #include <string.h>
>
> #define errExit(msg) do { perror(msg); exit(EXIT_FAILURE); } while (0)
>
> int
> main(int argc, char *argv[])
> {
>     char *pathname;
>     off_t offset;
>     char *string;
>     int fd;
>     ssize_t numWritten;
>
>     if (argc != 4) {
>         fprintf(stderr, "Usage: %s pathname offset string\n", argv[0]);
>         exit(EXIT_FAILURE);
>     }
>
>     pathname = argv[1];
>     offset = strtoll(argv[2], NULL, 0);
>     string = argv[3];
>
>     fd = open(pathname, O_RDWR);
>     if (fd == -1)
>         errExit("open");
>
>     if (lseek(fd, offset, SEEK_SET) == -1)
>         errExit("lseek");
>
>     numWritten = write(fd, string, strlen(string));
>     if (numWritten == -1)
>         errExit("write");
>
>     printf("write() succeeded (return value %zd)\n", numWritten);
>
>     exit(EXIT_SUCCESS);
> }
>
> --
> Michael Kerrisk
> Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
> Linux/UNIX System Programming Training: http://man7.org/training/



-- 
Kees Cook
Chrome OS Security

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: sysctl_writes_strict documentation + an oddity?
  2015-06-04 19:36 ` Kees Cook
@ 2015-06-16 10:03   ` Michael Kerrisk (man-pages)
  2015-06-16 16:32     ` Kees Cook
  0 siblings, 1 reply; 6+ messages in thread
From: Michael Kerrisk (man-pages) @ 2015-06-16 10:03 UTC (permalink / raw)
  To: Kees Cook; +Cc: mtk.manpages, lkml, Randy Dunlap, Andrew Morton, linux-man

Hi Kees,

On 06/04/2015 09:36 PM, Kees Cook wrote:
> On Sat, May 9, 2015 at 1:54 AM, Michael Kerrisk (man-pages)
> <mtk.manpages@gmail.com> wrote:
>> Hi Kees,
>>
>> I discovered that you added /proc/sys/kernel/sysctl_writes_strict in
>> Linux 3.16. In passing, I'll just mention that was an API change that
>> should have been CCed to linux-api@vger.kernel.org.
> 
> Sorry about that! I'm trying to get better. I think my main trigger
> for this is "if I'm adding a file to Documentation/ I should probably
> CC linux-api" now. :)
> 
>> Anyway, I've tried to write this file up for the proc(5) man page,
>> and I have two requests:
>>
>> 1) Could you review this text?
>> 2) I've found some behavior that surprised me, and I am wondering if it
>>    is intended. Could you let me know your thoughts?
>>
>> ===== 1) man-page text =====
>>
>> The man-page text, heavily based on your text in
>> Documentation/sysctl/kernel.txt, is as follows:
>>
>>        /proc/sys/kernel/sysctl_writes_strict (since Linux 3.16)
>>               The  value  in  this  file  determines how the file offset
>>               affects the behavior of updating entries  in  files  under
>>               /proc/sys.  The file has three possible values:
>>
>>               -1  This  provides  legacy  handling, with no printk warn‐
>>                   ings.  Each write(2) must fully contain the  value  to
>>                   be  written,  and  multiple  writes  on  the same file
>>                   descriptor will overwrite the entire value, regardless
>>                   of the file position.
>>
>>               0   (default)  This  provides the same behavior as for -1,
>>                   but printk warnings are  written  for  processes  that
>>                   perform writes when the file offset is not 0.
>>
>>               1   Respect  the  file  offset  when  writing strings into
>>                   /proc/sys files.  Multiple writes will append  to  the
>>                   value  buffer.   Anything  written  beyond the maximum
>>                   length of the value buffer will be ignored.  Writes to
>>                   numeric  /proc/sys entries must always be at file off‐
>>                   set 0 and the value must be  fully  contained  in  the
>>                   buffer provided to write(2).
> 
> That looks correct, yes. Thanks!

Okay. Thanks.

>> ===== 2) Behavior puzzle (a) =====
>>
>> The last sentence quoted from the man page was based on your sentence
>>
>>     Writes to numeric sysctl entries must always be at file position 0
>>     and the value must be fully contained in the buffer sent in the write
>>     syscall.
>>
>> So, I had interpreted /proc/sys/kernel/sysctl_writes_strict==1 to
>> mean that if one writes into a numeric /proc/sys file at an offset
>> other than zero, the write() will fail with some kind of error.
> 
> Reporting back an error wasn't something I'd tested before. Looking at
> the code again now, it should be possible make this change.
> Regardless, in the case of the numeric value error condition, it's the
> same as the "past the end" string error condition: "Anything written
> beyond the maximum length of the value buffer will be ignored." i.e.
> anything other than file offset 0 is considered "past the end of the
> buffer" for a numeric value and is ignored.
> 
>> But this seems not to be the case. Instead, the write() succeeds,
>> but the file is left unmodified. That's surprising, I find. So, I'm
>> wondering whether the implementation deviates from your intention.
>>
>> There's a test program below, which takes arguments as follows
>>
>>     ./a.out pathname offset string
> 
> I have tests in tools/testing/selftests/sysctl for checking the
> various behaviors too. They don't actually examine any error
> conditions from the sysctl writing itself. It should be simple to make
> sysctl_writes_strict failures return an error, though.

So, what do you think: is it *desirable* to make sysctl_writes_strict
failures return an error?

>> And here's a test run that demonstrates the behavior:
>>
>> $ sudo sh -c "echo 1 > /proc/sys/kernel/sysctl_writes_strict"
>> $ cat /proc/sys/kernel/pid_max
>> 32768
>> $ sudo dmesg --clear
>> $ sudo ./a.out /proc/sys/kernel/pid_max 1 3000
>> write() succeeded (return value 4)
>> $ cat /proc/sys/kernel/pid_max
>> 32768
>> $ dmesg
>>
>> As you can see above, an attempt was made to write into the
>> /proc/sys/kernel/pid_max file at offset 1.
>> The write() returned successfully (reporting 4 bytes written)
>> but the file contents were unchanged, and no printk() warning
>> was issued. Is this intended behavior?
>>
>> ===== 2) Behavior puzzle (b) =====
>>
>> In commit f88083005ab319abba5d0b2e4e997558245493c8, there is this note:
>>
>>     This adds the sysctl kernel.sysctl_writes_strict to control the write
>>     behavior.  The default (0) reports when VFS position is non-0 on a
>>     write, but retains legacy behavior, -1 disables the warning, and 1
>>     enables the position-respecting behavior.
>>
>>     The long-term plan here is to wait for userspace to be fixed in response
>>     to the new warning and to then switch the default kernel behavior to the
>>     new position-respecting behavior.
>>
>> (That last para was added to the commit message by AKPM, I see.)
>>
>> But, I wonder here whether /proc/sys/kernel/sysctl_writes_strict==0
>> is going to help with the long-term plan. The problem is that in
>> warn_sysctl_write(), pr_warn_once() is used. This means that only
>> the first offending user-space application that writes to *any*
>> /proc/sys file will generate the printk warning. If that application
>> isn't fixed, then none of the other "broken" applications will be
>> discovered. It therefore seems possible that it could be a very long
>> time before we could "switch the default kernel behavior to the
>> new position-respecting behavior".
>>
>> Looking over old mails
>> (http://thread.gmane.org/gmane.linux.kernel/1695177/focus=23240),
>> I see that you're aware of the problem, but it seems to me that
>> the switch to pr_warn_once() (for fear of spamming the log) likely
>> dooms the long-term plan to failure. Your thoughts?
> 
> In actual regular use, the situation that triggers the warning should
> be vanishingly rare, but the condition can be trivially met by someone
> intending to hit it for the purposes of filling log files. As such, it
> makes sense to me to use _once to avoid spamming, but still catch a
> rare usage under normal conditions.

So, I'm not clear whether you think I'm wrong or not ;-). 
Do you disagree with my point that this approach may doom
the long-term project to failure? (That was my main point.)

Cheers,

Michael


>> 8x--8x--8x--8x--8x--8x--8x--8x--8x--8x--8x--8x--8x--8x--8x--
>>
>> #include <sys/stat.h>
>> #include <fcntl.h>
>> #include <sys/types.h>
>> #include <stdio.h>
>> #include <stdlib.h>
>> #include <unistd.h>
>> #include <string.h>
>>
>> #define errExit(msg) do { perror(msg); exit(EXIT_FAILURE); } while (0)
>>
>> int
>> main(int argc, char *argv[])
>> {
>>     char *pathname;
>>     off_t offset;
>>     char *string;
>>     int fd;
>>     ssize_t numWritten;
>>
>>     if (argc != 4) {
>>         fprintf(stderr, "Usage: %s pathname offset string\n", argv[0]);
>>         exit(EXIT_FAILURE);
>>     }
>>
>>     pathname = argv[1];
>>     offset = strtoll(argv[2], NULL, 0);
>>     string = argv[3];
>>
>>     fd = open(pathname, O_RDWR);
>>     if (fd == -1)
>>         errExit("open");
>>
>>     if (lseek(fd, offset, SEEK_SET) == -1)
>>         errExit("lseek");
>>
>>     numWritten = write(fd, string, strlen(string));
>>     if (numWritten == -1)
>>         errExit("write");
>>
>>     printf("write() succeeded (return value %zd)\n", numWritten);
>>
>>     exit(EXIT_SUCCESS);
>> }
>>
>> --
>> Michael Kerrisk
>> Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
>> Linux/UNIX System Programming Training: http://man7.org/training/
> 
> 
> 


-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: sysctl_writes_strict documentation + an oddity?
  2015-06-16 10:03   ` Michael Kerrisk (man-pages)
@ 2015-06-16 16:32     ` Kees Cook
  2015-06-17  9:09       ` Michael Kerrisk (man-pages)
  0 siblings, 1 reply; 6+ messages in thread
From: Kees Cook @ 2015-06-16 16:32 UTC (permalink / raw)
  To: Michael Kerrisk (man-pages); +Cc: lkml, Randy Dunlap, Andrew Morton, linux-man

On Tue, Jun 16, 2015 at 3:03 AM, Michael Kerrisk (man-pages)
<mtk.manpages@gmail.com> wrote:
> On 06/04/2015 09:36 PM, Kees Cook wrote:
>> On Sat, May 9, 2015 at 1:54 AM, Michael Kerrisk (man-pages)
>> <mtk.manpages@gmail.com> wrote:
>>> ===== 2) Behavior puzzle (a) =====
>>>
>>> The last sentence quoted from the man page was based on your sentence
>>>
>>>     Writes to numeric sysctl entries must always be at file position 0
>>>     and the value must be fully contained in the buffer sent in the write
>>>     syscall.
>>>
>>> So, I had interpreted /proc/sys/kernel/sysctl_writes_strict==1 to
>>> mean that if one writes into a numeric /proc/sys file at an offset
>>> other than zero, the write() will fail with some kind of error.
>>
>> Reporting back an error wasn't something I'd tested before. Looking at
>> the code again now, it should be possible make this change.
>> Regardless, in the case of the numeric value error condition, it's the
>> same as the "past the end" string error condition: "Anything written
>> beyond the maximum length of the value buffer will be ignored." i.e.
>> anything other than file offset 0 is considered "past the end of the
>> buffer" for a numeric value and is ignored.
>>
>>> But this seems not to be the case. Instead, the write() succeeds,
>>> but the file is left unmodified. That's surprising, I find. So, I'm
>>> wondering whether the implementation deviates from your intention.
>>>
>>> There's a test program below, which takes arguments as follows
>>>
>>>     ./a.out pathname offset string
>>
>> I have tests in tools/testing/selftests/sysctl for checking the
>> various behaviors too. They don't actually examine any error
>> conditions from the sysctl writing itself. It should be simple to make
>> sysctl_writes_strict failures return an error, though.
>
> So, what do you think: is it *desirable* to make sysctl_writes_strict
> failures return an error?

I think it would be desirable, yes. I want to improve the tests to add
error checking first, so I can make sure the change doesn't introduce
anything unexpected. The fix is simple, but since the code is a little
twisty, I want to be careful.

>>> ===== 2) Behavior puzzle (b) =====
>>>
>>> In commit f88083005ab319abba5d0b2e4e997558245493c8, there is this note:
>>>
>>>     This adds the sysctl kernel.sysctl_writes_strict to control the write
>>>     behavior.  The default (0) reports when VFS position is non-0 on a
>>>     write, but retains legacy behavior, -1 disables the warning, and 1
>>>     enables the position-respecting behavior.
>>>
>>>     The long-term plan here is to wait for userspace to be fixed in response
>>>     to the new warning and to then switch the default kernel behavior to the
>>>     new position-respecting behavior.
>>>
>>> (That last para was added to the commit message by AKPM, I see.)
>>>
>>> But, I wonder here whether /proc/sys/kernel/sysctl_writes_strict==0
>>> is going to help with the long-term plan. The problem is that in
>>> warn_sysctl_write(), pr_warn_once() is used. This means that only
>>> the first offending user-space application that writes to *any*
>>> /proc/sys file will generate the printk warning. If that application
>>> isn't fixed, then none of the other "broken" applications will be
>>> discovered. It therefore seems possible that it could be a very long
>>> time before we could "switch the default kernel behavior to the
>>> new position-respecting behavior".
>>>
>>> Looking over old mails
>>> (http://thread.gmane.org/gmane.linux.kernel/1695177/focus=23240),
>>> I see that you're aware of the problem, but it seems to me that
>>> the switch to pr_warn_once() (for fear of spamming the log) likely
>>> dooms the long-term plan to failure. Your thoughts?
>>
>> In actual regular use, the situation that triggers the warning should
>> be vanishingly rare, but the condition can be trivially met by someone
>> intending to hit it for the purposes of filling log files. As such, it
>> makes sense to me to use _once to avoid spamming, but still catch a
>> rare usage under normal conditions.
>
> So, I'm not clear whether you think I'm wrong or not ;-).
> Do you disagree with my point that this approach may doom
> the long-term project to failure? (That was my main point.)

Sorry! No, I don't think using pr_warn_once() will doom the
transition. I think that if we see the warning, we need to investigate
what's using sysctl that way. If we never see it, then we can switch
the default. (Using _once protects against log spamming.) I would
basically expect to never see this warning, but akpm wanted to be very
cautious, which I can't argue with. :)

-Kees

-- 
Kees Cook
Chrome OS Security

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: sysctl_writes_strict documentation + an oddity?
  2015-06-16 16:32     ` Kees Cook
@ 2015-06-17  9:09       ` Michael Kerrisk (man-pages)
  0 siblings, 0 replies; 6+ messages in thread
From: Michael Kerrisk (man-pages) @ 2015-06-17  9:09 UTC (permalink / raw)
  To: Kees Cook; +Cc: mtk.manpages, lkml, Randy Dunlap, Andrew Morton, linux-man

Hi Kees,

On 06/16/2015 06:32 PM, Kees Cook wrote:
> On Tue, Jun 16, 2015 at 3:03 AM, Michael Kerrisk (man-pages)
> <mtk.manpages@gmail.com> wrote:
>> On 06/04/2015 09:36 PM, Kees Cook wrote:
>>> On Sat, May 9, 2015 at 1:54 AM, Michael Kerrisk (man-pages)
>>> <mtk.manpages@gmail.com> wrote:
>>>> ===== 2) Behavior puzzle (a) =====
>>>>
>>>> The last sentence quoted from the man page was based on your sentence
>>>>
>>>>     Writes to numeric sysctl entries must always be at file position 0
>>>>     and the value must be fully contained in the buffer sent in the write
>>>>     syscall.
>>>>
>>>> So, I had interpreted /proc/sys/kernel/sysctl_writes_strict==1 to
>>>> mean that if one writes into a numeric /proc/sys file at an offset
>>>> other than zero, the write() will fail with some kind of error.
>>>
>>> Reporting back an error wasn't something I'd tested before. Looking at
>>> the code again now, it should be possible make this change.
>>> Regardless, in the case of the numeric value error condition, it's the
>>> same as the "past the end" string error condition: "Anything written
>>> beyond the maximum length of the value buffer will be ignored." i.e.
>>> anything other than file offset 0 is considered "past the end of the
>>> buffer" for a numeric value and is ignored.
>>>
>>>> But this seems not to be the case. Instead, the write() succeeds,
>>>> but the file is left unmodified. That's surprising, I find. So, I'm
>>>> wondering whether the implementation deviates from your intention.
>>>>
>>>> There's a test program below, which takes arguments as follows
>>>>
>>>>     ./a.out pathname offset string
>>>
>>> I have tests in tools/testing/selftests/sysctl for checking the
>>> various behaviors too. They don't actually examine any error
>>> conditions from the sysctl writing itself. It should be simple to make
>>> sysctl_writes_strict failures return an error, though.
>>
>> So, what do you think: is it *desirable* to make sysctl_writes_strict
>> failures return an error?
> 
> I think it would be desirable, yes. I want to improve the tests to add
> error checking first, so I can make sure the change doesn't introduce
> anything unexpected. The fix is simple, but since the code is a little
> twisty, I want to be careful.

Okay.

>>>> ===== 2) Behavior puzzle (b) =====
>>>>
>>>> In commit f88083005ab319abba5d0b2e4e997558245493c8, there is this note:
>>>>
>>>>     This adds the sysctl kernel.sysctl_writes_strict to control the write
>>>>     behavior.  The default (0) reports when VFS position is non-0 on a
>>>>     write, but retains legacy behavior, -1 disables the warning, and 1
>>>>     enables the position-respecting behavior.
>>>>
>>>>     The long-term plan here is to wait for userspace to be fixed in response
>>>>     to the new warning and to then switch the default kernel behavior to the
>>>>     new position-respecting behavior.
>>>>
>>>> (That last para was added to the commit message by AKPM, I see.)
>>>>
>>>> But, I wonder here whether /proc/sys/kernel/sysctl_writes_strict==0
>>>> is going to help with the long-term plan. The problem is that in
>>>> warn_sysctl_write(), pr_warn_once() is used. This means that only
>>>> the first offending user-space application that writes to *any*
>>>> /proc/sys file will generate the printk warning. If that application
>>>> isn't fixed, then none of the other "broken" applications will be
>>>> discovered. It therefore seems possible that it could be a very long
>>>> time before we could "switch the default kernel behavior to the
>>>> new position-respecting behavior".
>>>>
>>>> Looking over old mails
>>>> (http://thread.gmane.org/gmane.linux.kernel/1695177/focus=23240),
>>>> I see that you're aware of the problem, but it seems to me that
>>>> the switch to pr_warn_once() (for fear of spamming the log) likely
>>>> dooms the long-term plan to failure. Your thoughts?
>>>
>>> In actual regular use, the situation that triggers the warning should
>>> be vanishingly rare, but the condition can be trivially met by someone
>>> intending to hit it for the purposes of filling log files. As such, it
>>> makes sense to me to use _once to avoid spamming, but still catch a
>>> rare usage under normal conditions.
>>
>> So, I'm not clear whether you think I'm wrong or not ;-).
>> Do you disagree with my point that this approach may doom
>> the long-term project to failure? (That was my main point.)
> 
> Sorry! No, I don't think using pr_warn_once() will doom the
> transition. I think that if we see the warning, we need to investigate
> what's using sysctl that way. If we never see it, then we can switch
> the default. (Using _once protects against log spamming.) I would
> basically expect to never see this warning, but akpm wanted to be very
> cautious, which I can't argue with. :)

Okay.

The man-pages text above will go out with the next release. Thanks
for your help!

Cheers,

Michael

-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2015-06-17  9:09 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-05-09  8:54 sysctl_writes_strict documentation + an oddity? Michael Kerrisk (man-pages)
2015-05-20 14:08 ` Michael Kerrisk (man-pages)
2015-06-04 19:36 ` Kees Cook
2015-06-16 10:03   ` Michael Kerrisk (man-pages)
2015-06-16 16:32     ` Kees Cook
2015-06-17  9:09       ` Michael Kerrisk (man-pages)

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).