Document that chcpu -g is not supported on IBM z/VM because the detach cpu
would CLEAR the running zVM guest memory.

References:
https://www.ibm.com/docs/en/linux-on-z?topic=mc-changing-state-1
https://www.ibm.com/docs/en/zvm/7.3?topic=commands-detach-cpu

Reported-by: Heikki Ylipiessa <heikki.ylipiessa@suse.com>
Signed-off-by: Stanislav Brabec <sbrabec@suse.cz>
---
 sys-utils/chcpu.8.adoc | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/sys-utils/chcpu.8.adoc b/sys-utils/chcpu.8.adoc
index c5797dfb3..5b28ef8d2 100644
--- a/sys-utils/chcpu.8.adoc
+++ b/sys-utils/chcpu.8.adoc
@@ -37,6 +37,8 @@ Enable the specified CPUs. Enabling a CPU means that the kernel sets it online.
 
 *-g*, *--deconfigure* _cpu-list_::
 Deconfigure the specified CPUs. Deconfiguring a CPU means that the hypervisor removes the CPU from the virtual hardware on which the Linux instance runs and returns it to the CPU pool. A CPU must be offline, see *-d*, before it can be deconfigured.
++
+*chcpu -g* is not supported on IBM z/VM, CPUs are always in a configured state there.
 
 *-p*, *--dispatch* _mode_::
 Set the CPU dispatching _mode_ (polarization). This option has an effect only if your hardware architecture and hypervisor support CPU polarization. Available _modes_ are:
-- 
2.43.0

-- 
Best Regards / S pozdravem,

Stanislav Brabec
software developer
---------------------------------------------------------------------
SUSE LINUX, s. r. o.                         e-mail: sbrabec@suse.com
Křižíkova 148/34 (Corso IIa)                    tel: +420 284 084 060
186 00 Praha 8-Karlín                           fax: +420 284 084 001
Czech Republic                                    http://www.suse.cz/
PGP: 830B 40D5 9E05 35D8 5E27 6FA3 717C 209F A04F CD76

On 17/03/2024 11:32, Pádraig Brady wrote:
> On 17/03/2024 06:10, Paul Eggert wrote:
>> On 2024-03-05 06:16, Pádraig Brady wrote:
>>> I think I'll remove the as yet unreleased mv --swap from coreutils,
>>> given that
>>> util-linux is as widely available as coreutils on GNU/Linux platforms.
>>
>> Although removing that "mv --swap" implementation was a win, I don't
>> think we can simply delegate this to util-linux's exch command.
>> Exchanging files via a renameat-like call is not limited to the Linux
>> kernel; it's also possible on macOS via renameatx_np with RENAME_SWAP,
>> and there have been noises about adding similar things to other
>> operating systems.
>>
>> I just now added support for macOS renameatx_np to Gnulib, so coreutils
>> does not need to worry about the macOS details; it can simply use
>> renameatu with the Linux flags. See:
>>
>> https://git.savannah.gnu.org/cgit/gnulib.git/commit/?id=af32ee824ee18255839f9812b8ed61aa5257a82b
>>
>> Even with Linux it's dicey. People may have older util-linux installed
>> and so lack the 'exch' utility; this is true for both Fedora 39 and
>> Ubuntu 23.10, the current releases. Ubuntu is also odd in that it
>> doesn't install all the util-linux utilities as part of the util-linux
>> package, so it's not clear what they will do with 'exch'.
>>
>> So I propose that we implement the idea in coreutils in a better way,
>> that interacts more nicely with -t, -T, etc. Also, I suggest using the
>> Linuxish name "--exchange" instead of the macOSish name "--swap", and
>> (for now at least) not giving the option a single-letter equivalent as I
>> expect it to be useful from scripts, not interactively.
>>
>> After looking at various ways to do it I came up with the attached
>> proposed patch. This should work on both GNU/Linux and macOS, if your OS
>> is recent enough and the file system supports atomic exchange.
>
> The implementation looks good.
>
> Re exch(1) on macos, I see util-linux is on homebrew,
> so it would be relatively easy to ifdef renameatx_np in util-linux also.
>
> I think the --no-copy situation is brittle, as scripts not using it now
> would be atomic, but then if we ever supported cross fs swaps
> it may become non atomic. I'd at least doc with a line in the --exchange
> description in usage() to say something like:
> "Use --no-copy to enforce atomic operation"
>
> While the most flexible, it's also quite awkward to need
> `mv --exchange --no-copy --no-target-directory` for most uses.
> I.e. it's tempting to imply the --no-... options with --exchange,
> but I suppose since scripting is the primary use case for this
> flexibility trumps conciseness, so I'm ok with the verbosity I think.

Oh also, in the texinfo I think it's important to mention that the swap
will "exchange all data and metadata". That's not obvious otherwise.
For example, users may wonder whether only data is being exchanged,
as with the macOS exchangedata(2) or equivalent.

cheers,
Pádraig
On 17/03/2024 06:10, Paul Eggert wrote:
> On 2024-03-05 06:16, Pádraig Brady wrote:
>> I think I'll remove the as yet unreleased mv --swap from coreutils,
>> given that
>> util-linux is as widely available as coreutils on GNU/Linux platforms.
>
> Although removing that "mv --swap" implementation was a win, I don't
> think we can simply delegate this to util-linux's exch command.
> Exchanging files via a renameat-like call is not limited to the Linux
> kernel; it's also possible on macOS via renameatx_np with RENAME_SWAP,
> and there have been noises about adding similar things to other
> operating systems.
>
> I just now added support for macOS renameatx_np to Gnulib, so coreutils
> does not need to worry about the macOS details; it can simply use
> renameatu with the Linux flags. See:
>
> https://git.savannah.gnu.org/cgit/gnulib.git/commit/?id=af32ee824ee18255839f9812b8ed61aa5257a82b
>
> Even with Linux it's dicey. People may have older util-linux installed
> and so lack the 'exch' utility; this is true for both Fedora 39 and
> Ubuntu 23.10, the current releases. Ubuntu is also odd in that it
> doesn't install all the util-linux utilities as part of the util-linux
> package, so it's not clear what they will do with 'exch'.
>
> So I propose that we implement the idea in coreutils in a better way,
> that interacts more nicely with -t, -T, etc. Also, I suggest using the
> Linuxish name "--exchange" instead of the macOSish name "--swap", and
> (for now at least) not giving the option a single-letter equivalent as I
> expect it to be useful from scripts, not interactively.
>
> After looking at various ways to do it I came up with the attached
> proposed patch. This should work on both GNU/Linux and macOS, if your OS
> is recent enough and the file system supports atomic exchange.

The implementation looks good.

Re exch(1) on macos, I see util-linux is on homebrew,
so it would be relatively easy to ifdef renameatx_np in util-linux also.

I think the --no-copy situation is brittle, as scripts not using it now
would be atomic, but then if we ever supported cross fs swaps
it may become non atomic. I'd at least doc with a line in the --exchange
description in usage() to say something like:
"Use --no-copy to enforce atomic operation"

While the most flexible, it's also quite awkward to need
`mv --exchange --no-copy --no-target-directory` for most uses.
I.e. it's tempting to imply the --no-... options with --exchange,
but I suppose since scripting is the primary use case for this
flexibility trumps conciseness, so I'm ok with the verbosity I think.

thanks,
Pádraig

[-- Attachment #1: Type: text/plain, Size: 1760 bytes --]

On 2024-03-05 06:16, Pádraig Brady wrote:
> I think I'll remove the as yet unreleased mv --swap from coreutils,
> given that
> util-linux is as widely available as coreutils on GNU/Linux platforms.

Although removing that "mv --swap" implementation was a win, I don't
think we can simply delegate this to util-linux's exch command.
Exchanging files via a renameat-like call is not limited to the Linux
kernel; it's also possible on macOS via renameatx_np with RENAME_SWAP,
and there have been noises about adding similar things to other
operating systems.

I just now added support for macOS renameatx_np to Gnulib, so coreutils
does not need to worry about the macOS details; it can simply use
renameatu with the Linux flags. See:

https://git.savannah.gnu.org/cgit/gnulib.git/commit/?id=af32ee824ee18255839f9812b8ed61aa5257a82b

Even with Linux it's dicey. People may have older util-linux installed
and so lack the 'exch' utility; this is true for both Fedora 39 and
Ubuntu 23.10, the current releases. Ubuntu is also odd in that it
doesn't install all the util-linux utilities as part of the util-linux
package, so it's not clear what they will do with 'exch'.

So I propose that we implement the idea in coreutils in a better way,
that interacts more nicely with -t, -T, etc. Also, I suggest using the
Linuxish name "--exchange" instead of the macOSish name "--swap", and
(for now at least) not giving the option a single-letter equivalent as I
expect it to be useful from scripts, not interactively.

After looking at various ways to do it I came up with the attached
proposed patch. This should work on both GNU/Linux and macOS, if your OS
is recent enough and the file system supports atomic exchange.

[-- Attachment #2: 0001-mv-new-option-exchange.patch --]
[-- Type: text/x-patch, Size: 12415 bytes --]

From d522aba06107d3532ad6103470727bf9057f8d2c Mon Sep 17 00:00:00 2001
From: Paul Eggert <eggert@cs.ucla.edu>
Date: Sat, 16 Mar 2024 22:50:17 -0700
Subject: [PATCH] mv: new option --exchange

* src/copy.h (struct cp_options): New member 'exchange'.
* src/copy.c (copy_internal): Support the new member.
* src/mv.c (EXCHANGE_OPTION): New constant.
(long_options): Add --exchange.
(usage): Document --exchange.
(main): Support --exchange.
* tests/mv/mv-exchange.sh: New test case.
* tests/local.mk (all_tests): Add it.
---
 NEWS                    |  7 ++++++
 doc/coreutils.texi      | 18 ++++++++++++++
 src/copy.c              | 54 +++++++++++++++++++++++------------------
 src/copy.h              |  4 +++
 src/mv.c                | 16 +++++++++---
 tests/local.mk          |  1 +
 tests/mv/mv-exchange.sh | 41 +++++++++++++++++++++++++++++++
 7 files changed, 114 insertions(+), 27 deletions(-)
 create mode 100755 tests/mv/mv-exchange.sh

diff --git a/NEWS b/NEWS
index f21efc7c0..67bb27ebb 100644
--- a/NEWS
+++ b/NEWS
@@ -81,6 +81,13 @@ GNU coreutils NEWS -*- outline -*-
   and the command exits with failure status if existing files.
   The -n,--no-clobber option is best avoided due to platform differences.
 
+  mv now accepts an --exchange option, which causes the source and
+  destination to be exchanged. It should be combined with
+  --no-target-directory (-T) if the destination is a directory.
+  The exchange is atomic if source and destination are on a single
+  file system that supports atomic exchange; --exchange is not yet
+  supported in other situations.
+
   od now supports printing IEEE half precision floating point with -t fH,
   or brain 16 bit floating point with -t fB, where supported by the compiler.
 
diff --git a/doc/coreutils.texi b/doc/coreutils.texi
index d07ed7e76..c456a03d9 100644
--- a/doc/coreutils.texi
+++ b/doc/coreutils.texi
@@ -10269,6 +10269,24 @@ skip existing files but not fail.
 If a file cannot be renamed because the destination file system differs,
 fail with a diagnostic instead of copying and then removing the file.
 
+@item --exchange
+@opindex --exchange
+Exchange source and destination instead of renaming source to destination.
+Both files must exist; they need not be the same type.
+The exchange is atomic if the source and destination are both in a
+single file system that supports atomic exchange;
+exchanges are not yet supported in other situations.
+
+This option can be used to replace one directory with another, atomically.
+When used this way, it should be combined with
+@code{--no-target-directory} (@option{-T})
+to avoid confusion about the destination location.
+Also, if the two directories might not be on the same file system,
+using @code{--no-copy} will prevent future
+versions of @command{mv} from implementing the exchange by copying.
+For example, you might use @samp{mv -T --exchange --no-copy
+@var{d1} @var{d2}} to exchange the directories @var{d1} and @var{d2}.
+
 @item -u
 @itemx --update
 @opindex -u
diff --git a/src/copy.c b/src/copy.c
index 8d99f8562..e7bf6022f 100644
--- a/src/copy.c
+++ b/src/copy.c
@@ -2223,9 +2223,11 @@ copy_internal (char const *src_name, char const *dst_name,
     {
       if (rename_errno < 0)
         rename_errno = (renameatu (AT_FDCWD, src_name, dst_dirfd, drelname,
-                                   RENAME_NOREPLACE)
+                                   (x->exchange
+                                    ? RENAME_EXCHANGE : RENAME_NOREPLACE))
                         ? errno : 0);
-      nonexistent_dst = *rename_succeeded = rename_errno == 0;
+      *rename_succeeded = rename_errno == 0;
+      nonexistent_dst = *rename_succeeded && !x->exchange;
     }
 
   if (rename_errno == 0
@@ -2246,7 +2248,7 @@ copy_internal (char const *src_name, char const *dst_name,
 
   src_mode = src_sb.st_mode;
 
-  if (S_ISDIR (src_mode) && !x->recursive)
+  if (S_ISDIR (src_mode) && !x->recursive && !x->exchange)
     {
       error (0, 0, ! x->install_mode /* cp */
              ? _("-r not specified; omitting directory %s")
@@ -2289,7 +2291,7 @@ copy_internal (char const *src_name, char const *dst_name,
      treated the same as nonexistent files. */
   bool new_dst = 0 < nonexistent_dst;
 
-  if (! new_dst)
+  if (! new_dst && ! x->exchange)
     {
       /* Normally, fill in DST_SB or set NEW_DST so that later code
          can use DST_SB if NEW_DST is false. However, don't bother
@@ -2657,7 +2659,7 @@ skip:
      Also, with --recursive, record dev/ino of each command-line directory.
      We'll use that info to detect this problem: cp -R dir dir. */
 
-  if (rename_errno == 0)
+  if (rename_errno == 0 || x->exchange)
     earlier_file = nullptr;
   else if (x->recursive && S_ISDIR (src_mode))
     {
@@ -2752,7 +2754,7 @@ skip:
 
   if (x->move_mode)
     {
-      if (rename_errno == EEXIST)
+      if (rename_errno == EEXIST && !x->exchange)
         rename_errno = (renameat (AT_FDCWD, src_name, dst_dirfd, drelname) == 0
                         ? 0 : errno);
 
@@ -2781,7 +2783,7 @@ skip:
                  _destination_ dev/ino, since the rename above can't have
                  changed those, and 'mv' always uses lstat.
                  We could limit it further by operating
-                 only on non-directories. */
+                 only on non-directories when !x->exchange. */
               record_file (x->dest_info, dst_relname, &src_sb);
             }
 
@@ -2828,7 +2830,7 @@ skip:
          where you'd replace '18' with the integer in parentheses that
          was output from the perl one-liner above.
          If necessary, of course, change '/tmp' to some other directory. */
-      if (rename_errno != EXDEV || x->no_copy)
+      if (rename_errno != EXDEV || x->no_copy || x->exchange)
         {
           /* There are many ways this can happen due to a race condition.
              When something happens between the initial follow_fstatat and the
@@ -2841,25 +2843,29 @@ skip:
              destination file are made too restrictive, the rename will
              fail. Etc. */
           char const *quoted_dst_name = quoteaf_n (1, dst_name);
-          switch (rename_errno)
-            {
-            case EDQUOT: case EEXIST: case EISDIR: case EMLINK:
-            case ENOSPC: case ETXTBSY:
+          if (x->exchange)
+            error (0, rename_errno, _("cannot exchange %s and %s"),
+                   quoteaf_n (0, src_name), quoted_dst_name);
+          else
+            switch (rename_errno)
+              {
+              case EDQUOT: case EEXIST: case EISDIR: case EMLINK:
+              case ENOSPC: case ETXTBSY:
 #if ENOTEMPTY != EEXIST
-            case ENOTEMPTY:
+              case ENOTEMPTY:
 #endif
-              /* The destination must be the problem. Don't mention
-                 the source as that is more likely to confuse the user
-                 than be helpful. */
-              error (0, rename_errno, _("cannot overwrite %s"),
-                     quoted_dst_name);
-              break;
+                /* The destination must be the problem. Don't mention
+                   the source as that is more likely to confuse the user
+                   than be helpful. */
+                error (0, rename_errno, _("cannot overwrite %s"),
+                       quoted_dst_name);
+                break;
 
-            default:
-              error (0, rename_errno, _("cannot move %s to %s"),
-                     quoteaf_n (0, src_name), quoted_dst_name);
-              break;
-            }
+              default:
+                error (0, rename_errno, _("cannot move %s to %s"),
+                       quoteaf_n (0, src_name), quoted_dst_name);
+                break;
+              }
           forget_created (src_sb.st_ino, src_sb.st_dev);
           return false;
         }
diff --git a/src/copy.h b/src/copy.h
index dfa9435b3..ab89c75fd 100644
--- a/src/copy.h
+++ b/src/copy.h
@@ -155,6 +155,10 @@ struct cp_options
      If that fails and NO_COPY, fail instead of copying. */
   bool move_mode, no_copy;
 
+  /* Exchange instead of renaming. Valid only if MOVE_MODE and if
+     BACKUP_TYPE == no_backups. */
+  bool exchange;
+
   /* If true, install(1) is the caller. */
   bool install_mode;
 
diff --git a/src/mv.c b/src/mv.c
index 9dc40fe3e..692943a70 100644
--- a/src/mv.c
+++ b/src/mv.c
@@ -48,6 +48,7 @@
 enum
 {
   DEBUG_OPTION = CHAR_MAX + 1,
+  EXCHANGE_OPTION,
   NO_COPY_OPTION,
   STRIP_TRAILING_SLASHES_OPTION
 };
@@ -67,6 +68,7 @@ static struct option const long_options[] =
   {"backup", optional_argument, nullptr, 'b'},
   {"context", no_argument, nullptr, 'Z'},
   {"debug", no_argument, nullptr, DEBUG_OPTION},
+  {"exchange", no_argument, nullptr, EXCHANGE_OPTION},
   {"force", no_argument, nullptr, 'f'},
   {"interactive", no_argument, nullptr, 'i'},
   {"no-clobber", no_argument, nullptr, 'n'}, /* Deprecated. */
@@ -271,6 +273,9 @@ Rename SOURCE to DEST, or move SOURCE(s) to DIRECTORY.\n\
 "), stdout);
       fputs (_("\
       --debug                  explain how a file is copied. Implies -v\n\
+"), stdout);
+      fputs (_("\
+      --exchange               exchange source and destination\n\
 "), stdout);
       fputs (_("\
   -f, --force                  do not prompt before overwriting\n\
@@ -361,6 +366,9 @@ main (int argc, char **argv)
         case DEBUG_OPTION:
           x.debug = x.verbose = true;
           break;
+        case EXCHANGE_OPTION:
+          x.exchange = true;
+          break;
         case NO_COPY_OPTION:
           x.no_copy = true;
           break;
@@ -469,7 +477,7 @@ main (int argc, char **argv)
   else
     {
       char const *lastfile = file[n_files - 1];
-      if (n_files == 2)
+      if (n_files == 2 && !x.exchange)
         x.rename_errno = (renameatu (AT_FDCWD, file[0], AT_FDCWD, lastfile,
                                      RENAME_NOREPLACE)
                           ? errno : 0);
@@ -514,11 +522,13 @@ main (int argc, char **argv)
       strip_trailing_slashes (file[i]);
 
   if (make_backups
-      && (x.interactive == I_ALWAYS_SKIP
+      && (x.exchange
+          || x.interactive == I_ALWAYS_SKIP
           || x.interactive == I_ALWAYS_NO))
     {
       error (0, 0,
-             _("--backup is mutually exclusive with -n or --update=none-fail"));
+             _("cannot combine --backup with "
+               "--exchange, -n, or --update=none-fail"));
       usage (EXIT_FAILURE);
     }
 
diff --git a/tests/local.mk b/tests/local.mk
index 7cd1ef7b5..f0ac0386f 100644
--- a/tests/local.mk
+++ b/tests/local.mk
@@ -698,6 +698,7 @@ all_tests = \
   tests/mv/into-self-3.sh \
   tests/mv/into-self-4.sh \
   tests/mv/leak-fd.sh \
+  tests/mv/mv-exchange.sh \
   tests/mv/mv-n.sh \
   tests/mv/mv-special-1.sh \
   tests/mv/no-copy.sh \
diff --git a/tests/mv/mv-exchange.sh b/tests/mv/mv-exchange.sh
new file mode 100755
index 000000000..485403a1d
--- /dev/null
+++ b/tests/mv/mv-exchange.sh
@@ -0,0 +1,41 @@
+#!/bin/sh
+# Test mv --exchange.
+
+# Copyright (C) 2024 Free Software Foundation, Inc.
+
+# This program is free software: you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU General Public License for more details.
+
+# You should have received a copy of the GNU General Public License
+# along with this program. If not, see <https://www.gnu.org/licenses/>.
+
+. "${srcdir=.}/tests/init.sh"; path_prepend_ ./src
+print_ver_ mv
+
+
+# Test exchanging files.
+touch a || framework_failure_
+mkdir b || framework_failure_
+if ! mv -T --exchange a b 2>exchange_err; then
+  grep 'not supported' exchange_err || { cat exchange_err; fail=1; }
+else
+  test -d a || fail=1
+  test -f b || fail=1
+fi
+
+# Test wrong number of arguments.
+touch c || framework_failure_
+returns_ 1 mv --exchange a 2>/dev/null || fail=1
+returns_ 1 mv --exchange a b c 2>/dev/null || fail=1
+
+# Both files must exist.
+returns_ 1 mv --exchange a d 2>/dev/null || fail=1
+
+Exit $fail
-- 
2.40.1

Hi all,

I'm looking for any comments on coresched, a program that allows you to
manage core scheduling cookies for tasks.

=== What is Core Scheduling ===
Core Scheduling can be used to ensure that certain tasks will never be
scheduled on the same physical core. This can be a useful alternative
mitigation for hardware vulnerabilities like L1TF or MDS. The full software
mitigation for these vulnerabilities would be to disable
SMT/Hyper-Threading. However, this can be prohibitively expensive and is
therefore often not done in practice. With Core Scheduling you can mitigate
these issues in some scenarios, while keeping SMT enabled.

Core Scheduling works by adding a random "cookie" to a process. Only
processes with the same core scheduling cookie are allowed to run
simultaneously on the SMT siblings of a core. Tasks that trust each other
can be given the same cookie and untrusted tasks are given a different
cookie. This is important when running VMs that don't trust each other, as
it prevents a guest VM from leaking data from another guest VM via L1TF or
MDS.

=== Motivation ===
The kernel exposes a prctl uapi to manage core scheduling cookies (see
https://www.kernel.org/doc/html/latest/admin-guide/hw-vuln/core-scheduling.html)
Last week, I wanted to use core scheduling on some programs. Adding the
prctl calls and recompiling felt a bit inconvenient, so I looked for a
program like taskset that could do the job without having to modify the
target program. I couldn't find any, and so I wrote a small program that
does this. Hopefully it saves the next person some time :)

=== RFC ===
I'm looking forward to any comments that you might have on the patch!
Please note that I haven't written the manpage and the bash completion
script yet. I first wanted to get some feedback on the program before I
start documenting it in more detail.

I'm particularly curious about your thoughts on the following things:

- General comments about interacting with the program: Do the options make
  sense? Are there any necessary functions missing? Are the error messages
  helpful? Is the output too verbose/not verbose enough?
- How should the program behave if the prctl core scheduling API is not
  available? It has been in Linus' tree since November 2021 (commit
  a41b74451b35f7a6529689760eb8c05241feecbc) but it can be disabled with
  CONFIG_SCHED_CORE=n
- Most of the options require the user to have the CAP_SYS_PTRACE
  capability. Should the program notify the user that the capability is
  missing when the prctl call returns -EPERM, or does a mention in the man
  page suffice?
- I've currently licensed it under the EUPL v1.2, which is easier to
  enforce in my jurisdiction than the GPL. It is GPL compatible so it
  shouldn't be an issue, but if anybody has any remarks on this, please let
  me know.

Thanks for taking the time!

Best regards,
Thijs Raymakers

Signed-off-by: Thijs Raymakers <thijs@raymakers.nl>
---
 .gitignore                  |   1 +
 bash-completion/coresched   |   0
 configure.ac                |  12 +-
 meson.build                 |  16 +-
 meson_options.txt           |   2 +-
 schedutils/Makemodule.am    |   8 +
 schedutils/coresched.1.adoc |  16 ++
 schedutils/coresched.c      | 340 ++++++++++++++++++++++++++++++++++++
 8 files changed, 389 insertions(+), 6 deletions(-)
 create mode 100644 bash-completion/coresched
 create mode 100644 schedutils/coresched.1.adoc
 create mode 100644 schedutils/coresched.c

diff --git a/.gitignore b/.gitignore
index 6ecbfa7fe..316f3cdcc 100644
--- a/.gitignore
+++ b/.gitignore
@@ -94,6 +94,7 @@ ylwrap
 /colcrt
 /colrm
 /column
+/coresched
 /ctrlaltdel
 /delpart
 /dmesg
diff --git a/bash-completion/coresched b/bash-completion/coresched
new file mode 100644
index 000000000..e69de29bb
diff --git a/configure.ac b/configure.ac
index ab7c98636..3a189a075 100644
--- a/configure.ac
+++ b/configure.ac
@@ -2500,9 +2500,9 @@ UL_REQUIRES_HAVE([setterm], [ncursesw, ncurses], [ncursesw or ncurses library])
 AM_CONDITIONAL([BUILD_SETTERM], [test "x$build_setterm" = xyes])
 
 # build_schedutils= is just configure-only variable to control
-# ionice, taskset and chrt
+# ionice, taskset, coresched and chrt
 AC_ARG_ENABLE([schedutils],
-  AS_HELP_STRING([--disable-schedutils], [do not build chrt, ionice, taskset]),
+  AS_HELP_STRING([--disable-schedutils], [do not build chrt, ionice, taskset, coresched]),
   [], [UL_DEFAULT_ENABLE([schedutils], [check])]
 )
 
@@ -2545,6 +2545,14 @@ UL_REQUIRES_SYSCALL_CHECK([taskset],
 AM_CONDITIONAL([BUILD_TASKSET], [test "x$build_taskset" = xyes])
 
 
+UL_ENABLE_ALIAS([coresched], [schedutils])
+UL_BUILD_INIT([coresched])
+UL_REQUIRES_SYSCALL_CHECK([coresched],
+	[UL_CHECK_SYSCALL([prctl])],
+	[prctl])
+AM_CONDITIONAL([BUILD_CORESCHED], [test "x$build_coresched" = xyes])
+
+
 have_schedsetter=no
 AS_IF([test "x$ac_cv_func_sched_setscheduler" = xyes], [have_schedsetter=yes],
       [test "x$ac_cv_func_sched_setattr" = xyes], [have_schedsetter=yes])
diff --git a/meson.build b/meson.build
index f7baab7a2..8244c43a9 100644
--- a/meson.build
+++ b/meson.build
@@ -3107,13 +3107,23 @@ exe4 = executable(
   install : opt,
   build_by_default : opt)
 
+exe5 = executable(
+  'coresched',
+  'schedutils/coresched.c',
+  include_directories : includes,
+  link_with : lib_common,
+  install_dir : usrbin_exec_dir,
+  install : opt,
+  build_by_default : opt)
+
 if opt and not is_disabler(exe)
-  exes += [exe, exe2, exe3, exe4]
+  exes += [exe, exe2, exe3, exe4, exe5]
   manadocs += ['schedutils/chrt.1.adoc',
                'schedutils/ionice.1.adoc',
                'schedutils/taskset.1.adoc',
-               'schedutils/uclampset.1.adoc']
-  bashcompletions += ['chrt', 'ionice', 'taskset', 'uclampset']
+               'schedutils/uclampset.1.adoc',
+               'schedutils/coresched.1.adoc']
+  bashcompletions += ['chrt', 'ionice', 'taskset', 'uclampset', 'coresched']
 endif
 
 ############################################################
diff --git a/meson_options.txt b/meson_options.txt
index 7b8cf3f35..3405c1b73 100644
--- a/meson_options.txt
+++ b/meson_options.txt
@@ -162,7 +162,7 @@ option('build-pipesz', type : 'feature',
 option('build-setterm', type : 'feature',
        description : 'build setterm')
 option('build-schedutils', type : 'feature',
-       description : 'build chrt, ionice, taskset')
+       description : 'build chrt, ionice, taskset, coresched')
 option('build-wall', type : 'feature',
        description : 'build wall')
 option('build-write', type : 'feature',
diff --git a/schedutils/Makemodule.am b/schedutils/Makemodule.am
index 1040da85f..0cb655401 100644
--- a/schedutils/Makemodule.am
+++ b/schedutils/Makemodule.am
@@ -29,3 +29,11 @@ dist_noinst_DATA += schedutils/uclampset.1.adoc
 uclampset_SOURCES = schedutils/uclampset.c schedutils/sched_attr.h
 uclampset_LDADD = $(LDADD) libcommon.la
 endif
+
+if BUILD_CORESCHED
+usrbin_exec_PROGRAMS += coresched
+MANPAGES += schedutils/coresched.1
+dist_noinst_DATA += schedutils/coresched.1.adoc
+coresched_SOURCES = schedutils/coresched.c
+coresched_LDADD = $(LDADD) libcommon.la
+endif
diff --git a/schedutils/coresched.1.adoc b/schedutils/coresched.1.adoc
new file mode 100644
index 000000000..60a21cd01
--- /dev/null
+++ b/schedutils/coresched.1.adoc
@@ -0,0 +1,16 @@
+//po4a: entry man manual
+////
+coresched(1) manpage
+////
+= coresched(1)
+:doctype: manpage
+:man manual: User Commands
+:man source: util-linux {release-version}
+:page-layout: base
+:command: coresched
+:colon: :
+:copyright: ©
+
+== NAME
+
+coresched - manage core scheduling cookies for tasks
diff --git a/schedutils/coresched.c b/schedutils/coresched.c
new file mode 100644
index 000000000..4be8f9fda
--- /dev/null
+++ b/schedutils/coresched.c
@@ -0,0 +1,340 @@
+/**
+ * SPDX-License-Identifier: EUPL-1.2
+ *
+ * coresched.c - manage core scheduling cookies for tasks
+ *
+ * Copyright (C) 2024 Thijs Raymakers
+ * Licensed under the EUPL v1.2
+ */
+
+#include <getopt.h>
+#include <stdbool.h>
+#include <stdio.h>
+#include <sys/prctl.h>
+#include <sys/wait.h>
+
+#include "c.h"
+#include "closestream.h"
+#include "nls.h"
+#include "strutils.h"
+
+typedef enum {
+	SCHED_CORE_SCOPE_PID = PR_SCHED_CORE_SCOPE_THREAD,
+	SCHED_CORE_SCOPE_TGID = PR_SCHED_CORE_SCOPE_THREAD_GROUP,
+	SCHED_CORE_SCOPE_PGID = PR_SCHED_CORE_SCOPE_PROCESS_GROUP,
+} core_sched_type_t;
+
+typedef enum {
+	SCHED_CORE_CMD_NONE = 0,
+	SCHED_CORE_CMD_GET = 1,
+	SCHED_CORE_CMD_CREATE = 2,
+	SCHED_CORE_CMD_COPY = 4,
+	SCHED_CORE_CMD_EXEC = 8,
+} core_sched_cmd_t;
+
+struct args {
+	pid_t from_pid;
+	pid_t to_pid;
+	core_sched_type_t type;
+	core_sched_cmd_t cmd;
+	int exec_argv_offset;
+};
+
+unsigned long core_sched_get_cookie(struct args *args);
+void core_sched_create_cookie(struct args *args);
+void core_sched_pull_cookie(pid_t from);
+void core_sched_push_cookie(pid_t to, core_sched_type_t type);
+void core_sched_copy_cookie(struct args *args);
+void core_sched_exec_with_cookie(struct args *args, char **argv);
+
+core_sched_type_t parse_core_sched_type(char *str);
+bool verify_arguments(struct args *args);
+void parse_arguments(int argc, char **argv, struct args *args);
+
+unsigned long core_sched_get_cookie(struct args *args)
+{
+	unsigned long cookie = 0;
+	int prctl_errno = prctl(PR_SCHED_CORE, PR_SCHED_CORE_GET,
+				args->from_pid, SCHED_CORE_SCOPE_PID, &cookie);
+	if (prctl_errno) {
+		errx(-prctl_errno, "Failed to get cookie from PID %d",
+		     args->from_pid);
+	}
+	return cookie;
+}
+
+void core_sched_create_cookie(struct args *args)
+{
+	int prctl_errno = prctl(PR_SCHED_CORE, PR_SCHED_CORE_CREATE,
+				args->from_pid, args->type, 0);
+	if (prctl_errno) {
+		errx(-prctl_errno, "Failed to create cookie for PID %d",
+		     args->from_pid);
+	}
+}
+
+void core_sched_pull_cookie(pid_t from)
+{
+	int prctl_errno = prctl(PR_SCHED_CORE, PR_SCHED_CORE_SHARE_FROM, from,
+				SCHED_CORE_SCOPE_PID, 0);
+	if (prctl_errno) {
+		errx(-prctl_errno, "Failed to pull cookie from PID %d", from);
+	}
+}
+
+void core_sched_push_cookie(pid_t to, core_sched_type_t type)
+{
+	int prctl_errno =
+		prctl(PR_SCHED_CORE, PR_SCHED_CORE_SHARE_TO, to, type, 0);
+	if (prctl_errno) {
+		errx(-prctl_errno, "Failed to push cookie to PID %d", to);
+	}
+}
+
+void core_sched_copy_cookie(struct args *args)
+{
+	core_sched_pull_cookie(args->from_pid);
+	core_sched_push_cookie(args->to_pid, args->type);
+}
+
+void core_sched_exec_with_cookie(struct args *args, char **argv)
+{
+	if (!args->exec_argv_offset) {
+		errx(EINVAL, "when --exec is provided, a program name "
+			     "has to be given.");
+	}
+
+	// Move the argument list to the first argument of the program
+	argv = &argv[args->exec_argv_offset];
+
+	pid_t pid = fork();
+	if (pid == -1) {
+		errx(errno, "Failed to spawn new process");
+	}
+
+	if (!pid) {
+		// If a source PID is provided, try to copy the cookie from
+		// that PID. Otherwise, create a brand new cookie with the
+		// provided type.
+		if (args->from_pid) {
+			core_sched_pull_cookie(args->from_pid);
+		} else {
+			args->from_pid = getpid();
+			core_sched_create_cookie(args);
+		}
+		if (execvp(argv[0], argv)) {
+			errexec(argv[0]);
+		}
+	} else {
+		int status = 0;
+		waitpid(pid, &status, 0);
+		exit(status);
+	}
+}
+
+core_sched_type_t parse_core_sched_type(char *str)
+{
+	if (!strncmp(str, "pid\0", 4)) {
+		return SCHED_CORE_SCOPE_PID;
+	} else if (!strncmp(str, "tgid\0", 5)) {
+		return SCHED_CORE_SCOPE_TGID;
+	} else if (!strncmp(str, "pgid\0", 5)) {
+		return SCHED_CORE_SCOPE_PGID;
+	}
+
+	errx(EINVAL, "'%s' is an invalid option. Must be one of pid/tgid/pgid",
+	     str);
+	__builtin_unreachable();
+}
+
+static void __attribute__((__noreturn__)) usage(void)
+{
+	fputs(USAGE_HEADER, stdout);
+	fprintf(stdout, _(" %s --get <PID>\n"), program_invocation_short_name);
+	fprintf(stdout, _(" %s --new <PID> [-t <TYPE>]\n"),
+		program_invocation_short_name);
+	fprintf(stdout, _(" %s --copy -s <PID> -d <PID> [-t <TYPE>]\n"),
+		program_invocation_short_name);
+	fprintf(stdout, _(" %s --exec [-s <PID>] -- PROGRAM ARGS...\n"),
+		program_invocation_short_name);
+
+	fputs(USAGE_SEPARATOR, stdout);
+	fputsln(_("Manage core scheduling cookies for tasks."), stdout);
+
+	fputs(USAGE_FUNCTIONS, stdout);
+	fputsln(_(" -g, --get <PID>         get the core scheduling cookie of a PID"),
+		stdout);
+	fputsln(_(" -n, --new <PID>         assign a new core scheduling cookie to PID"),
+		stdout);
+	fputsln(_(" -c, --copy              copy the core scheduling cookie from PID to\n"
+		  "                         another PID, requires the --source and --dest option"),
+		stdout);
+	fputsln(_(" -e, --exec              execute a program with a new core scheduling\n"
+		  "                         cookie."),
+		stdout);
+
+	fputs(USAGE_OPTIONS, stdout);
+	fputsln(_(" -s, --source <PID>      where to copy the core scheduling cookie from."),
+		stdout);
+	fputsln(_(" -d, --dest <PID>        where to copy the core scheduling cookie to."),
+		stdout);
+	fputsln(_(" -t, --type              type of the destination PID, or the type of\n"
+		  "                         the PID when a new core scheduling cookie\n"
+		  "                         is created. Can be one of the following:\n"
+		  "                         pid, tgid or pgid. Defaults to tgid."),
+		stdout);
+	fputs(USAGE_SEPARATOR, stdout);
+	fprintf(stdout,
+		USAGE_HELP_OPTIONS(
+			25)); /* char offset to align option descriptions */
+	fprintf(stdout, USAGE_MAN_TAIL("coresched(1)"));
+	exit(EXIT_SUCCESS);
+}
+
+bool verify_arguments(struct args *args)
+{
+	if (args->cmd == SCHED_CORE_CMD_NONE) {
+		usage();
+	}
+
+	// Check if the value of args->cmd is a power of 2
+	// In that case, only a single function option was set.
+	if (!(args->cmd && !(args->cmd & (args->cmd - 1)))) {
+		errx(EINVAL, "Cannot do more than one function at a time.");
+	}
+
+	if (args->from_pid < 0) {
+		errx(EINVAL, "source PID cannot be negative");
+	}
+
+	if (args->to_pid < 0) {
+		errx(EINVAL, "destination PID cannot be negative");
+	}
+
+	if (args->from_pid == 0 && args->cmd == SCHED_CORE_CMD_COPY) {
+		errx(EINVAL, "valid argument to --source is required");
+	}
+
+	if (args->to_pid == 0 && args->cmd == SCHED_CORE_CMD_COPY) {
+		errx(EINVAL, "valid argument to --dest is required");
+	}
+
+	if (args->from_pid == 0 && args->cmd != SCHED_CORE_CMD_EXEC) {
+		errx(EINVAL, "PID cannot be zero");
+	}
+
+	return true;
+}
+
+void parse_arguments(int argc, char **argv, struct args *args)
+{
+	int c;
+
+	enum {
+		OPT_GET = 'g',
+		OPT_NEW = 'n',
+		OPT_COPY = 'c',
+		OPT_EXEC = 'e',
+		OPT_SRC = 's',
+		OPT_DEST = 'd',
+		OPT_TYPE = 't',
+		OPT_VERSION = 'V',
+		OPT_HELP = 'h'
+	};
+
+	static const struct option longopts[] = {
+		{ "get", required_argument, NULL, OPT_GET },
+		{ "new", required_argument, NULL, OPT_NEW },
+		{ "copy", no_argument, NULL, OPT_COPY },
+		{ "exec", no_argument, NULL, OPT_EXEC },
+		{ "source", required_argument, NULL, OPT_SRC },
+		{ "destination", required_argument, NULL, OPT_DEST },
+		{ "type", required_argument, NULL, OPT_TYPE },
+		{ "version", no_argument, NULL, OPT_VERSION },
+		{ "help", no_argument, NULL, OPT_HELP },
+		{ NULL, 0, NULL, 0 }
+	};
+
+	while ((c = getopt_long(argc, argv, "g:n:ces:d:t:Vh", longopts,
+				NULL)) != -1)
+		switch (c) {
+		case OPT_GET:
+			args->cmd |= SCHED_CORE_CMD_GET;
+			args->from_pid = strtos32_or_err(
+				optarg, "Failed to parse PID for --get");
+			break;
+		case OPT_NEW:
+			args->cmd |= SCHED_CORE_CMD_CREATE;
+			args->from_pid = strtos32_or_err(
+				optarg, "Failed to parse PID for --new");
+			break;
+		case OPT_COPY:
+			args->cmd |= SCHED_CORE_CMD_COPY;
+			break;
+		case OPT_EXEC:
+			args->cmd |= SCHED_CORE_CMD_EXEC;
+			break;
+		case OPT_SRC:
+			args->from_pid = strtos32_or_err(
+				optarg,
"Failed to parse PID for --source"); + break; + case OPT_DEST: + args->to_pid = strtos32_or_err( + optarg, "Failed to parse PID for --dest"); + break; + case OPT_TYPE: + args->type = parse_core_sched_type(optarg); + break; + case OPT_VERSION: + print_version(EXIT_SUCCESS); + case OPT_HELP: + usage(); + default: + errtryhelp(EXIT_FAILURE); + } + + if (argc > optind) { + args->exec_argv_offset = optind; + } + verify_arguments(args); +} + +int main(int argc, char **argv) +{ + struct args arguments = { 0 }; + arguments.type = SCHED_CORE_SCOPE_TGID; + + setlocale(LC_ALL, ""); + bindtextdomain(PACKAGE, LOCALEDIR); + textdomain(PACKAGE); + close_stdout_atexit(); + + parse_arguments(argc, argv, &arguments); + + unsigned long cookie = 0; + switch (arguments.cmd) { + case SCHED_CORE_CMD_GET: + cookie = core_sched_get_cookie(&arguments); + if (cookie) { + printf("core scheduling cookie of pid %d is 0x%lx\n", + arguments.from_pid, cookie); + } else { + printf("pid %d doesn't have a core scheduling cookie\n", + arguments.from_pid); + exit(1); + } + break; + case SCHED_CORE_CMD_CREATE: + core_sched_create_cookie(&arguments); + break; + case SCHED_CORE_CMD_COPY: + core_sched_copy_cookie(&arguments); + break; + case SCHED_CORE_CMD_EXEC: + core_sched_exec_with_cookie(&arguments, argv); + break; + default: + usage(); + exit(1); + } +} -- 2.44.0
When I learned about RENAME_EXCHANGE, I thought we should extend the mv
command as you did, by adding --swap. However, after researching the
past attempts, I decided not to propose the feature to coreutils.

https://www.gnu.org/software/coreutils/rejected_requests.html
https://lists.gnu.org/archive/html/coreutils/2018-12/msg00004.html
https://www.mail-archive.com/coreutils@gnu.org/msg10276.html

Masatake YAMATO

> On 05/03/2024 04:10, Paul Eggert wrote:
>> On 3/4/24 16:43, Dominique Martinet wrote:
>>> Adding Rob to the loop because this impacts compatibility with
>>> toybox/maybe busybox implementations
>> Busybox does not use RENAME_EXCHANGE, so this isn't a Busybox issue.
>> Toybox mv added -x to its development version yesterday:
>> https://github.com/landley/toybox/commit/a2419ad52d489bf1a84a9f3aa73afb351642c765
>> so there's little prior art there, and there's still plenty of time to
>> fix its problems before exposing it to the world.
>>
>>> I also see --swap mostly used by scripts and this actually feels a bit
>>> dangerous to me -- I'd *always* use this with -T.
>> Yes, it's a problem.
>> By "see --swap mostly used by scripts" I assume you mean scripts that
>> haven't been written yet, assuming that nobody had -x until
>> yesterday....
>>
>>> (by the way, what's this "rename" command you speak of?
>> https://mirrors.edge.kernel.org/pub/linux/utils/util-linux/
>> Now that I've looked into it further, util-linux already has an "exch"
>> command that does exactly what you want. This is the command that
>> toybox should implement rather than try to simulate it with "mv -x"
>> (which causes all sorts of problems).
>> That is, toybox should revert yesterday's change to "mv", and should
>> implement "exch" instead.
>
> I think having the functionality in mv(1) is better than in rename(1),
> but since exch(1) is already released that's probably
> the best place for this functionality now.
>
> A separate exch command may be overkill for just this,
> but perhaps related functionality might be added to that command in
> future.
> For e.g. some of the discussed functionality for a "replace" command
> might reside there.
>
> So I think I'll remove the as yet unreleased mv --swap from coreutils,
> given that
> util-linux is as widely available as coreutils on GNU/Linux platforms.
>
> cheers,
> Pádraig
On Tue, Mar 05, 2024 at 02:16:05PM +0000, Pádraig Brady wrote:
> I think having the functionality in mv(1) is better than in rename(1),
> but since exch(1) is already released that's probably
> the best place for this functionality now.
>
> A separate exch command may be overkill for just this,

rename(1) was also my initial idea, but it's too complex and rarely
used by users for simple tasks like those we can now achieve with the
new simple command exch(1).

> but perhaps related functionality might be added to that command in future.
> For e.g. some of the discussed functionality for a "replace" command
> might reside there.
>
> So I think I'll remove the as yet unreleased mv --swap from coreutils, given that
> util-linux is as widely available as coreutils on GNU/Linux platforms.

Yes, it seems better to have this Linux-specific feature in util-linux.

We should discuss such changes early next time ;-) Thanks for CC:

    Karel

--
 Karel Zak <kzak@redhat.com>
 http://karelzak.blogspot.com
On 05/03/2024 04:10, Paul Eggert wrote:
> On 3/4/24 16:43, Dominique Martinet wrote:
>> Adding Rob to the loop because this impacts compatibility with
>> toybox/maybe busybox implementations
>
> Busybox does not use RENAME_EXCHANGE, so this isn't a Busybox issue.
>
> Toybox mv added -x to its development version yesterday:
>
> https://github.com/landley/toybox/commit/a2419ad52d489bf1a84a9f3aa73afb351642c765
>
> so there's little prior art there, and there's still plenty of time to
> fix its problems before exposing it to the world.
>
>
>> I also see --swap mostly used by scripts and this actually feels a bit
>> dangerous to me -- I'd *always* use this with -T.
>
> Yes, it's a problem.
>
> By "see --swap mostly used by scripts" I assume you mean scripts that
> haven't been written yet, assuming that nobody had -x until yesterday....
>
>
>> (by the way, what's this "rename" command you speak of?
>
> https://mirrors.edge.kernel.org/pub/linux/utils/util-linux/
>
> Now that I've looked into it further, util-linux already has an "exch"
> command that does exactly what you want. This is the command that toybox
> should implement rather than try to simulate it with "mv -x" (which
> causes all sorts of problems).
>
> That is, toybox should revert yesterday's change to "mv", and should
> implement "exch" instead.

I think having the functionality in mv(1) is better than in rename(1),
but since exch(1) is already released that's probably
the best place for this functionality now.

A separate exch command may be overkill for just this,
but perhaps related functionality might be added to that command in future.
For e.g. some of the discussed functionality for a "replace" command
might reside there.

So I think I'll remove the as yet unreleased mv --swap from coreutils, given that
util-linux is as widely available as coreutils on GNU/Linux platforms.

cheers,
Pádraig
On Tue, Mar 05, 2024 at 12:51:41AM +0530, Tanish Yadav wrote:
>  login-utils/su-common.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)

Applied, thanks.

--
 Karel Zak <kzak@redhat.com>
 http://karelzak.blogspot.com
Do not free tmp for non login branch as basename may return a pointer to
some part of it.

Signed-off-by: Tanish Yadav <devtany@gmail.com>
---
 login-utils/su-common.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/login-utils/su-common.c b/login-utils/su-common.c
index 242b6ce4e..8cb54e1c1 100644
--- a/login-utils/su-common.c
+++ b/login-utils/su-common.c
@@ -851,10 +851,10 @@ static void run_shell(
 		arg0[0] = '-';
 		strcpy(arg0 + 1, shell_basename);
 		args[0] = arg0;
+		free(tmp);
 	} else {
-		args[0] = basename(tmp);
-	}
-	free(tmp);
+		args[0] = basename(tmp);
+	}
 
 	if (su->fast_startup)
 		args[argno++] = "-f";
--
2.44.0
[TO += Skyler]

Hi Karel, Skyler,

On Mon, Mar 04, 2024 at 01:33:59PM +0100, Karel Zak wrote:
> On Sun, Mar 03, 2024 at 11:59:51AM +0100, Alejandro Colomar wrote:
> > This seems to be a bug in util-linux, not shadow, so I've added
> > util-linux@ to the thread.
>
>  Fixed. Thanks for your report.

Thank you.

Skyler, it's been fixed here:
<https://git.kernel.org/pub/scm/utils/util-linux/util-linux.git/commit/?id=677a3168b261f3289e282a02dfd85d7f37de0447>

Have a lovely day!
Alex

>     Karel
>
>
> --
>  Karel Zak <kzak@redhat.com>
>  http://karelzak.blogspot.com

--
<https://www.alejandro-colomar.es/>
Looking for a remote C programming job at the moment.
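
The report behind this thread is that an unprivileged caller can put terminal escape sequences into argv[0], which then reach the auth log. The original reporter suggested either refusing such an argv[0] or using a fixed program name. The sketch below illustrates that suggestion only; it is not taken from the referenced util-linux commit, and is_printable_ascii() and safe_progname() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* True when s is non-empty and contains only printable ASCII
 * (0x20..0x7e), i.e. no ESC, no other control bytes, no 8-bit data. */
static bool is_printable_ascii(const char *s)
{
	assert(s != NULL);
	if (*s == '\0')
		return false;
	for (; *s; s++) {
		unsigned char c = (unsigned char) *s;

		if (c < 0x20 || c > 0x7e)
			return false;
	}
	return true;
}

/* Hypothetical helper: choose the name used in log messages.
 * Falls back to a fixed string when argv[0] looks suspicious. */
static const char *safe_progname(const char *argv0, const char *fallback)
{
	if (argv0 != NULL && is_printable_ascii(argv0))
		return argv0;
	return fallback;
}
```

A call such as safe_progname(argv[0], "su") would then be used wherever the program name is passed to the logging code. Restricting to ASCII is a deliberately strict assumption; a real fix might prefer to escape unsafe bytes instead of rejecting them.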
On Fri, Mar 01, 2024 at 06:30:15PM +0000, Eric Curtin wrote:
> We are looking into optimizing the boot sequence of a device with many
> partitions.

Nice topic :-)

> On boot in the default systemd implementation, all the block devices
> are queried via libblkid and the various symlinks are set up in
> /dev/disk/* from the results of those queries. The problem is on a
> device with many partitions this can delay the boot by hundreds of
> milliseconds, which is not ideal, especially when in many cases all
> you really care about is mounting the block device that represents the
> rootfs partition.

It's a little bit more complex. It's not just about rootfs. The reason
why udevd scans all the devices is to create a udev database and fill
it with data usable for running the system. For example, when you type
'lsblk --fs', it reads the data from the database (as do findmnt,
mount, etc.), which is also used by udev rules and for dependency
evaluations in systemd, etc. This is the reason why the system gathers
all the data about a new device when it's detected.

If you want to avoid all of this, you need to customize udev rules
where you can filter out what and how to scan.

> We can sort of guess "/dev/sde38" is the correct
> one, but that's not deterministic.
>
> So we started digging and came across blkid_find_dev_with_tag and
> blkid_dev_devname, which you can call like this:
>
> blkid_dev_devname(blkid_find_dev_with_tag(cache, "PARTLABEL", "system_a")))
>
> blkid_dev_devname(blkid_find_dev_with_tag(cache, "PARTLABEL", "system_b")))

You're on the right track. PARTLABEL and PARTUUID are stored in the
partition table, so it's unnecessary to scan partitions for their
content (filesystems).

> On first glance this looks useful as you don't have to loop through
> all the devices to use.
>
> But this function only seems to work if the data is already cached, so
> it's not so useful on boot.

Yes, using blkid_dev_devname() is not advisable. It's part of the old
high-level libblkid API from the pre-udev era.

If you truly need to read data from the device, then utilizing the
low-level probing API is recommended. This can be done from the command
line with 'blkid -p', but you'll need to disable scanning for all
unwanted data (using '--usages no*'). For instance:

    blkid -o udev -p --usages nofilesystem,raid,crypto,others /dev/sda1

This command will only return ID_PART_ENTRY_* data from the partition
table. You can use the LIBBLKID_DEBUG=all environment variable to see
the library's operations.

The question arises whether using blkid is the ideal solution if you
only require PARTLABELs and PARTUUIDs. For example, sfdisk could be a
more efficient approach:

    sfdisk -l /dev/sda -o+NAME,UUID

However, a potential issue is that sfdisk only provides the guessed
partition names (paths); the name used by the kernel might be
different.

> Has anyone any ideas on how we can optimize the identification of a
> block device via UUID, LABEL, PARTUUID, PARTLABEL, etc.? Because the
> current implementations don't scale well when you have many block
> devices.

It depends on your goal. You can heavily customize your system to speed
up boot (all the necessary tools are available for this purpose).
However, the problem I see is the issue of portability and the
maintenance overhead.

    Karel

--
 Karel Zak <kzak@redhat.com>
 http://karelzak.blogspot.com
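
As an illustration of the point that partition labels come from the partition table rather than from filesystem contents, here is a sketch of a different technique from the ones named in this mail: reading the PARTNAME field that the kernel exports in /sys/class/block/<name>/uevent for labeled partitions. This is neither what blkid nor what udev does, it assumes a kernel that exports PARTNAME, and uevent_get() and partition_has_label() are hypothetical helper names.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Copy the value of "KEY=value" from uevent-style text into out.
 * Returns 0 when the key is found, -1 otherwise. */
static int uevent_get(const char *text, const char *key,
		      char *out, size_t outlen)
{
	size_t keylen = strlen(key);
	const char *line = text;

	assert(outlen > 0);
	while (line && *line) {
		const char *eol = strchr(line, '\n');
		size_t len = eol ? (size_t) (eol - line) : strlen(line);

		if (len > keylen && !strncmp(line, key, keylen) &&
		    line[keylen] == '=') {
			size_t vlen = len - keylen - 1;

			if (vlen >= outlen)
				vlen = outlen - 1;   /* truncate to fit */
			memcpy(out, line + keylen + 1, vlen);
			out[vlen] = '\0';
			return 0;
		}
		line = eol ? eol + 1 : NULL;
	}
	return -1;
}

/* Returns 1 when the kernel reports the given partition label for the
 * block device NAME (e.g. "sde38"), 0 otherwise. */
static int partition_has_label(const char *name, const char *label)
{
	char path[256], text[4096], value[128];
	size_t n;
	FILE *f;

	snprintf(path, sizeof(path), "/sys/class/block/%s/uevent", name);
	f = fopen(path, "r");
	if (!f)
		return 0;
	n = fread(text, 1, sizeof(text) - 1, f);
	fclose(f);
	text[n] = '\0';

	return uevent_get(text, "PARTNAME", value, sizeof(value)) == 0 &&
	       strcmp(value, label) == 0;
}
```

Because the device name comes from sysfs, it is the name the kernel actually uses, which sidesteps the guessed-name caveat mentioned for sfdisk. It does not populate the udev database, so the portability and maintenance concerns raised above still apply.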
On Sun, Mar 03, 2024 at 11:59:51AM +0100, Alejandro Colomar wrote:
> This seems to be a bug in util-linux, not shadow, so I've added
> util-linux@ to the thread.

Fixed. Thanks for your report.

    Karel

--
 Karel Zak <kzak@redhat.com>
 http://karelzak.blogspot.com
Hi,

This seems to be a bug in util-linux, not shadow, so I've added
util-linux@ to the thread. The discussion started in the email below,
and was later continued in
<https://github.com/shadow-maint/shadow/pull/960>.

Have a lovely day!
Alex

On Sat, Mar 02, 2024 at 11:33:16AM -0600, Serge E. Hallyn wrote:
> On Sat, Mar 02, 2024 at 11:34:07AM -0500, Skyler Ferrante (RIT Student) wrote:
> > Hi Serge,
> >
> > I was playing around with some of the shadow-utils binaries and I
> > realized that an unprivileged user can set argv[0] to contain escape
> > sequences, and then cause it to be logged in /var/log/auth.log.
> >
> > PoC
> > ```
> > #include<stdio.h>
> > #include<unistd.h>
> > int main(int argc, char** my_argv){
> >     char* prog = "/usr/bin/su";
> >     char* argv[] = {"\033[33mYellow", "root", NULL};
> >     char* envp[] = {NULL};
> >
> >     execve(prog, argv, envp);
> >     printf("Failed to exec\n");
> > }
> > ```
> > Run the binary, and type an incorrect password for root. Now run `tail
> > /var/log/auth.log`. It should contain Yellow text. This can be used to
> > hide log contents (move the cursor/delete characters). Some terminals
> > also allow setting clipboard contents through escape sequences (my
> > terminal, windows-terminal, supports this).
> >
> > It may be a good idea to refuse argv[0] if it contains binary data.
> > You could also prevent this bug by not allowing an attacker to choose
> > Prog (e.g. su could just use "su" as Prog).
> >
> > If you don't think this is a bad enough security issue to hide, I can
> > post an issue on github. I would argue that you shouldn't cat auth.log
> > or view it from tail, but I know a lot of people do.
> >
> > Cheers,
> > Skyler
>
> Terminals can be a nuisance :)
>
> I don't think we need to hide this issue, but of course definitely address
> it. I'm Cc:ing the other maintainers in case they feel differently.
>
> Did you want to send a PR to fix it?
>
> Thanks,
> -serge

--
<https://www.alejandro-colomar.es/>
Looking for a remote C programming job at the moment.
Hi Guys,

We are looking into optimizing the boot sequence of a device with many
partitions.

On boot in the default systemd implementation, all the block devices
are queried via libblkid and the various symlinks are set up in
/dev/disk/* from the results of those queries. The problem is on a
device with many partitions this can delay the boot by hundreds of
milliseconds, which is not ideal, especially when in many cases all
you really care about is mounting the block device that represents the
rootfs partition.

We can sort of guess "/dev/sde38" is the correct one, but that's not
deterministic.

So we started digging and came across blkid_find_dev_with_tag and
blkid_dev_devname, which you can call like this:

blkid_dev_devname(blkid_find_dev_with_tag(cache, "PARTLABEL", "system_a")))

blkid_dev_devname(blkid_find_dev_with_tag(cache, "PARTLABEL", "system_b")))

On first glance this looks useful as you don't have to loop through
all the devices to use.

But this function only seems to work if the data is already cached, so
it's not so useful on boot.

Has anyone any ideas on how we can optimize the identification of a
block device via UUID, LABEL, PARTUUID, PARTLABEL, etc.? Because the
current implementations don't scale well when you have many block
devices.

I suspect we may not be the first to encounter this, so just probing
to see if anyone had ideas on how to solve this in the past.

Is mise le meas/Regards,

Eric Curtin
The util-linux release v2.40-rc2 is available at

  http://www.kernel.org/pub/linux/utils/util-linux/v2.40/

Feedback and bug reports, as always, are welcomed.

    Karel

--
 Karel Zak <kzak@redhat.com>
 http://karelzak.blogspot.com
On Wed, Feb 28, 2024 at 03:06:14PM +0100, Stanislav Brabec wrote:
>  term-utils/setterm.1.adoc | 4 ++++
>  1 file changed, 4 insertions(+)

Applied, thanks.

--
 Karel Zak <kzak@redhat.com>
 http://karelzak.blogspot.com
Debugging an error of setterm, I realized that setterm --powerdown
operates on stdout but setterm --powersave operates on stdin.

Such unexpected behavior should be documented.

I prefer a less accurate generic "always redirect both stdin and stdout"
over recommending of the correct I/O stream for each option separately.

Signed-off-by: Stanislav Brabec <sbrabec@suse.cz>
---
 term-utils/setterm.1.adoc | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/term-utils/setterm.1.adoc b/term-utils/setterm.1.adoc
index 880fe10d6..365c4bb00 100644
--- a/term-utils/setterm.1.adoc
+++ b/term-utils/setterm.1.adoc
@@ -156,6 +156,10 @@ Turns underline mode on or off.
 
 include::man-common/help-version.adoc[]
 
+== WARNING
+
+Use of *setterm* in combination with stdout redirection can have unexpected results, as some options operate on stdin. To prevent problems, always redirect both stdin and stdout to the same device.
+
 == COMPATIBILITY
 
 Since version 2.25 *setterm* has support for long options with two hyphens, for example *--help*, beside the historical long options with a single hyphen, for example *-help*. In scripts it is better to use the backward-compatible single hyphen rather than the double hyphen. Currently there are no plans nor good reasons to discontinue single-hyphen compatibility.
--
2.43.0

--
Best Regards / S pozdravem,

Stanislav Brabec
software developer
---------------------------------------------------------------------
SUSE LINUX, s. r. o.                         e-mail: sbrabec@suse.com
Křižíkova 148/34 (Corso IIa)                  tel: +420 284 084 060
186 00 Praha 8-Karlín                          fax: +420 284 084 001
Czech Republic                                    http://www.suse.cz/
PGP: 830B 40D5 9E05 35D8 5E27 6FA3 717C 209F A04F CD76
On Wed, Feb 21, 2024 at 06:30:50PM +0100, Jan Kara wrote:
>  libmount/src/hook_loopdev.c | 16 ++++++++++------
>  1 file changed, 10 insertions(+), 6 deletions(-)

Applied, thanks.

--
 Karel Zak <kzak@redhat.com>
 http://karelzak.blogspot.com
On Wed, Feb 21, 2024 at 06:30:50PM +0100, Jan Kara wrote:
> Avoid holding writeable fd to a loop device that is being mounted. In
> the hardened configurations (CONFIG_BLK_DEV_WRITE_MOUNTED = n) the
> kernel wants to make sure nobody else has the block device writeably
> open when mounting so this makes the mount fail.
>
> Reported-by: JunChao Sun <sunjunchao2870@gmail.com>
> Signed-off-by: Jan Kara <jack@suse.cz>
> ---

Sounds good!
Acked-by: Christian Brauner <brauner@kernel.org>

>  libmount/src/hook_loopdev.c | 16 ++++++++++------
>  1 file changed, 10 insertions(+), 6 deletions(-)
>
> diff --git a/libmount/src/hook_loopdev.c b/libmount/src/hook_loopdev.c
> index 8c8f7f218732..e2114b0cbebe 100644
> --- a/libmount/src/hook_loopdev.c
> +++ b/libmount/src/hook_loopdev.c
> @@ -356,15 +356,19 @@ success:
>  		 */
>  		mnt_optlist_append_flags(ol, MS_RDONLY, cxt->map_linux);
>
> -		/* we have to keep the device open until mount(1),
> -		 * otherwise it will be auto-cleared by kernel
> +		/*
> +		 * We have to keep the device open until mount(1), otherwise it
> +		 * will be auto-cleared by kernel. However we don't want to
> +		 * keep writeable fd as kernel wants to block all writers to
> +		 * the device being mounted (in the more hardened
> +		 * configurations). So grab read-only fd instead.
>  		 */
> -		hd->loopdev_fd = loopcxt_get_fd(&lc);
> +		hd->loopdev_fd = open(lc.device, O_RDONLY | O_CLOEXEC);
>  		if (hd->loopdev_fd < 0) {
> -			DBG(LOOP, ul_debugobj(cxt, "failed to get loopdev FD"));
> +			DBG(LOOP,
> +			    ul_debugobj(cxt, "failed to reopen loopdev FD"));
>  			rc = -errno;
> -		} else
> -			loopcxt_set_fd(&lc, -1, 0);
> +		}
>  	}
>  done:
>  	loopcxt_deinit(&lc);
> --
> 2.35.3
>
Avoid holding writeable fd to a loop device that is being mounted. In
the hardened configurations (CONFIG_BLK_DEV_WRITE_MOUNTED = n) the
kernel wants to make sure nobody else has the block device writeably
open when mounting so this makes the mount fail.

Reported-by: JunChao Sun <sunjunchao2870@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
---
 libmount/src/hook_loopdev.c | 16 ++++++++++------
 1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/libmount/src/hook_loopdev.c b/libmount/src/hook_loopdev.c
index 8c8f7f218732..e2114b0cbebe 100644
--- a/libmount/src/hook_loopdev.c
+++ b/libmount/src/hook_loopdev.c
@@ -356,15 +356,19 @@ success:
 		 */
 		mnt_optlist_append_flags(ol, MS_RDONLY, cxt->map_linux);
 
-		/* we have to keep the device open until mount(1),
-		 * otherwise it will be auto-cleared by kernel
+		/*
+		 * We have to keep the device open until mount(1), otherwise it
+		 * will be auto-cleared by kernel. However we don't want to
+		 * keep writeable fd as kernel wants to block all writers to
+		 * the device being mounted (in the more hardened
+		 * configurations). So grab read-only fd instead.
 		 */
-		hd->loopdev_fd = loopcxt_get_fd(&lc);
+		hd->loopdev_fd = open(lc.device, O_RDONLY | O_CLOEXEC);
 		if (hd->loopdev_fd < 0) {
-			DBG(LOOP, ul_debugobj(cxt, "failed to get loopdev FD"));
+			DBG(LOOP,
+			    ul_debugobj(cxt, "failed to reopen loopdev FD"));
 			rc = -errno;
-		} else
-			loopcxt_set_fd(&lc, -1, 0);
+		}
 	}
 done:
 	loopcxt_deinit(&lc);
--
2.35.3
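
The idea of the patch, holding a descriptor that keeps the device alive without counting as a writer, can be sketched in isolation. This is an illustration under assumptions, not libmount code: open_readonly_cloexec() and fd_is_readonly() are hypothetical helpers, and any path can stand in for the loop device.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Open PATH read-only and close-on-exec, as the patch does for the
 * loop device. Returns the fd, or -1 with errno set. */
static int open_readonly_cloexec(const char *path)
{
	assert(path != NULL);
	return open(path, O_RDONLY | O_CLOEXEC);
}

/* Returns 1 when FD was opened without write access, 0 when it allows
 * writing, -1 when the access mode cannot be queried. */
static int fd_is_readonly(int fd)
{
	int fl = fcntl(fd, F_GETFL);

	if (fl < 0)
		return -1;
	return (fl & O_ACCMODE) == O_RDONLY;
}
```

A caller could use fd_is_readonly() as a sanity check before mount(2) on kernels built with CONFIG_BLK_DEV_WRITE_MOUNTED=n, where the commit message says a writable open of the device makes the mount fail.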
On 15. 02. 24 at 10:37, Karel Zak wrote:
> On Wed, Feb 14, 2024 at 03:23:45PM +0100, Karel Zak wrote:
>> What about:
>>
>>    systemd does not invoke fsck -A to check all devices; instead, it
>>    calls fsck individually for devices selected based on the logic
>>    implemented in systemd-fsck.
> I had short discussion about it with Lennart, and he suggested not to
> document anywhere systemd-fsck as it's private systemd stuff. I have
> pushed to repository:
>
> https://github.com/util-linux/util-linux/commit/9cb7b7671d903573d6c3b9d8112ec13953cdcdc6

It sounds clear. I didn't want to document any details of the
implementation in the third-party package, and this wording will not
need an update in case of a systemd change.

--
Best Regards / S pozdravem,

Stanislav Brabec
software developer
---------------------------------------------------------------------
SUSE LINUX, s. r. o.                         e-mail: sbrabec@suse.com
Křižíkova 148/34 (Corso IIa)                  tel: +420 284 084 060
186 00 Praha 8-Karlín                          fax: +420 284 084 001
Czech Republic                                    http://www.suse.cz/
PGP: 830B 40D5 9E05 35D8 5E27 6FA3 717C 209F A04F CD76
On Wed, Feb 14, 2024 at 03:23:45PM +0100, Karel Zak wrote:
> On Wed, Feb 14, 2024 at 01:17:46PM +0100, Stanislav Brabec wrote:
> > systemd uses its own implementation of fsck with a slightly different
> > behavior (e. g. fsck -A checks noauto volumes, systemd-fsck does not).
>
> systemd calls fsck from util-linux, but a new instance (with -l) for each device.
> It would be nice to be more explicit and explain it, because "it has its
> own implementation" sounds like fsck from util-linux is completely out
> of game :-)
>
> > +== NOTES
> > +*systemd* does not call *fsck -A*, but it has its own implementation
> > *systemd-fsck*(8).
>
> What about:
>
>    systemd does not invoke fsck -A to check all devices; instead, it
>    calls fsck individually for devices selected based on the logic
>    implemented in systemd-fsck.

I had a short discussion about it with Lennart, and he suggested not
documenting systemd-fsck anywhere, as it's private systemd stuff. I have
pushed to the repository:

https://github.com/util-linux/util-linux/commit/9cb7b7671d903573d6c3b9d8112ec13953cdcdc6

Hope it's good enough :-)

    Karel

--
 Karel Zak <kzak@redhat.com>
 http://karelzak.blogspot.com
On Wed, Feb 14, 2024 at 01:17:46PM +0100, Stanislav Brabec wrote:
> systemd uses its own implementation of fsck with a slightly different
> behavior (e. g. fsck -A checks noauto volumes, systemd-fsck does not).

systemd calls fsck from util-linux, but a new instance (with -l) for
each device. It would be nice to be more explicit and explain it,
because "it has its own implementation" sounds like fsck from
util-linux is completely out of the game :-)

> +== NOTES
> +*systemd* does not call *fsck -A*, but it has its own implementation
> *systemd-fsck*(8).

What about:

   systemd does not invoke fsck -A to check all devices; instead, it
   calls fsck individually for devices selected based on the logic
   implemented in systemd-fsck.

    Karel

--
 Karel Zak <kzak@redhat.com>
 http://karelzak.blogspot.com
systemd uses its own implementation of fsck with a slightly different
behavior (e. g. fsck -A checks noauto volumes, systemd-fsck does not).
Refer to it.

It is a complementary change to
https://github.com/systemd/systemd/commit/000680a68d.

Signed-off-by: Stanislav Brabec <sbrabec@suse.cz>
---
 disk-utils/fsck.8.adoc | 4 ++++
 sys-utils/fstab.5.adoc | 2 ++
 2 files changed, 6 insertions(+)

diff --git a/disk-utils/fsck.8.adoc b/disk-utils/fsck.8.adoc
index 976e7ff08..4ba6f4cc1 100644
--- a/disk-utils/fsck.8.adoc
+++ b/disk-utils/fsck.8.adoc
@@ -151,6 +151,9 @@ enables libmount debug output.
 
 _/etc/fstab_
 
+== NOTES
+*systemd* does not call *fsck -A*, but it has its own implementation *systemd-fsck*(8).
+
 == AUTHORS
 
 mailto:tytso@mit.edu>[Theodore Ts'o],
@@ -169,6 +172,7 @@ mailto:kzak@redhat.com[Karel Zak]
 *fsck.vfat*(8),
 *fsck.xfs*(8),
 *reiserfsck*(8)
+*systemd-fsck*(8)
 
 include::man-common/bugreports.adoc[]
 
diff --git a/sys-utils/fstab.5.adoc b/sys-utils/fstab.5.adoc
index 1b972ef3b..0f12560e3 100644
--- a/sys-utils/fstab.5.adoc
+++ b/sys-utils/fstab.5.adoc
@@ -132,6 +132,8 @@ The proper way to read records from *fstab* is to use the routines *getmntent*(3
 
 The keyword *ignore* as a filesystem type (3rd field) is no longer supported by the pure libmount based mount utility (since util-linux v2.22).
 
+This document describes handling of *fstab* by *util-linux* and *libmount*. For *systemd*, read *systemd* documentation. There are slight differences.
+
 == HISTORY
 
 The ancestor of this *fstab* file format appeared in 4.0BSD.
--
2.43.0

--
Best Regards / S pozdravem,

Stanislav Brabec
software developer
---------------------------------------------------------------------
SUSE LINUX, s. r. o.                         e-mail: sbrabec@suse.com
Křižíkova 148/34 (Corso IIa)
186 00 Praha 8-Karlín
Czech Republic                                    http://www.suse.cz/
PGP: 830B 40D5 9E05 35D8 5E27 6FA3 717C 209F A04F CD76