* dd PATCH: add conv=direct @ 2004-04-06 22:03 Andy Isaacson 2004-04-07 0:33 ` Andrew Morton 2004-04-07 22:02 ` Nathan Straz 0 siblings, 2 replies; 32+ messages in thread From: Andy Isaacson @ 2004-04-06 22:03 UTC (permalink / raw) To: bug-coreutils; +Cc: linux-kernel Linux-kernel: is this patch horribly wrong? On modern Linux, apparently the correct way to bypass the buffer cache when writing to a block device is to open the block device with O_DIRECT. This enables, for example, the user to more easily force a reallocation of a single sector of an IDE disk with a media error (without overwriting anything but the 1k "sector pair" containing the error). dd(1) is convenient for this purpose, but is lacking a method to force O_DIRECT. The enclosed patch adds a "conv=direct" flag to enable this usage. Alas, I don't do autoconf, so I hope someone else can add the appropriate stanza to autoconfiscate HAS_O_DIRECT. -andy diff -ur coreutils-5.0.91/doc/coreutils.texi coreutils-5.0.91-adi/doc/coreutils.texi --- coreutils-5.0.91/doc/coreutils.texi 2003-09-04 16:26:51.000000000 -0500 +++ coreutils-5.0.91-adi/doc/coreutils.texi 2004-04-06 13:48:54.000000000 -0500 @@ -6373,6 +6373,11 @@ Pad every input block to size of @samp{ibs} with trailing zero bytes. When used with @samp{block} or @samp{unblock}, pad with spaces instead of zero bytes. + +@item direct +@opindex direct +Open the output file with O_DIRECT, avoiding (on Linux) using the buffer +cache. @end table @end table diff -ur coreutils-5.0.91/src/dd.c coreutils-5.0.91-adi/src/dd.c --- coreutils-5.0.91/src/dd.c 2003-07-25 02:43:09.000000000 -0500 +++ coreutils-5.0.91-adi/src/dd.c 2004-04-06 13:44:09.000000000 -0500 @@ -66,6 +66,9 @@ /* Default input and output blocksize. */ #define DEFAULT_BLOCKSIZE 512 +/* XXX hack */ +#define HAVE_O_DIRECT 1 + /* Conversions bit masks. */ #define C_ASCII 01 #define C_EBCDIC 02 @@ -78,6 +81,7 @@ #define C_NOERROR 0400 #define C_NOTRUNC 01000 #define C_SYNC 02000 +#define C_DIRECT 010000 /* Use separate input and output buffers, and combine partial input blocks. */ #define C_TWOBUFS 04000 @@ -162,6 +166,9 @@ {"noerror", C_NOERROR}, /* Ignore i/o errors. */ {"notrunc", C_NOTRUNC}, /* Do not truncate output file. */ {"sync", C_SYNC}, /* Pad input records to ibs with NULs. */ +#ifdef HAVE_O_DIRECT + {"direct", C_DIRECT}, /* open files with O_DIRECT */ +#endif {NULL, 0} }; @@ -1190,6 +1197,11 @@ = (O_CREAT | (seek_records || (conversions_mask & C_NOTRUNC) ? 0 : O_TRUNC)); +#if HAVE_O_DIRECT + if (conversions_mask & C_DIRECT) + opts |= O_DIRECT; +#endif + /* Open the output file with *read* access only if we might need to read to satisfy a `seek=' request. If we can't read the file, go ahead with write-only access; it might work. */ ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: dd PATCH: add conv=direct 2004-04-06 22:03 dd PATCH: add conv=direct Andy Isaacson @ 2004-04-07 0:33 ` Andrew Morton 2004-04-07 16:21 ` Bruce Allen 2004-04-07 17:31 ` Andy Isaacson 2004-04-07 22:02 ` Nathan Straz 1 sibling, 2 replies; 32+ messages in thread From: Andrew Morton @ 2004-04-07 0:33 UTC (permalink / raw) To: Andy Isaacson; +Cc: bug-coreutils, linux-kernel Andy Isaacson <adi@hexapodia.org> wrote: > > On modern Linux, apparently the correct way to bypass the buffer cache > when writing to a block device is to open the block device with > O_DIRECT. This enables, for example, the user to more easily force a > reallocation of a single sector of an IDE disk with a media error > (without overwriting anything but the 1k "sector pair" containing the > error). dd(1) is convenient for this purpose, but is lacking a method > to force O_DIRECT. The enclosed patch adds a "conv=direct" flag to > enable this usage. This would be rather nice to have. You'll need to ensure that the data is page-aligned in memory. While you're there, please add an fsync-before-closing option. ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: dd PATCH: add conv=direct 2004-04-07 0:33 ` Andrew Morton @ 2004-04-07 16:21 ` Bruce Allen 2004-04-07 16:42 ` Andrew Morton 2004-04-07 17:31 ` Andy Isaacson 1 sibling, 1 reply; 32+ messages in thread From: Bruce Allen @ 2004-04-07 16:21 UTC (permalink / raw) To: Andrew Morton; +Cc: Andy Isaacson, bug-coreutils, linux-kernel > > On modern Linux, apparently the correct way to bypass the buffer cache > > when writing to a block device is to open the block device with > > O_DIRECT. This enables, for example, the user to more easily force a > > reallocation of a single sector of an IDE disk with a media error > > (without overwriting anything but the 1k "sector pair" containing the > > error). dd(1) is convenient for this purpose, but is lacking a method > > to force O_DIRECT. The enclosed patch adds a "conv=direct" flag to > > enable this usage. > > This would be rather nice to have. You'll need to ensure that the data > is page-aligned in memory. > > While you're there, please add an fsync-before-closing option. Andrew, am I right that this is NOT needed for the proposed O_DIRECT option, since open(2) says: "The I/O is synchronous, i.e., at the completion of the read(2) or write(2) system call, data is guaranteed to have been transferred." so the write will block until data is physically on the disk. Cheers, Bruce ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: dd PATCH: add conv=direct 2004-04-07 16:21 ` Bruce Allen @ 2004-04-07 16:42 ` Andrew Morton 0 siblings, 0 replies; 32+ messages in thread From: Andrew Morton @ 2004-04-07 16:42 UTC (permalink / raw) To: Bruce Allen; +Cc: adi, bug-coreutils, linux-kernel Bruce Allen <ballen@gravity.phys.uwm.edu> wrote: > > > > On modern Linux, apparently the correct way to bypass the buffer cache > > > when writing to a block device is to open the block device with > > > O_DIRECT. This enables, for example, the user to more easily force a > > > reallocation of a single sector of an IDE disk with a media error > > > (without overwriting anything but the 1k "sector pair" containing the > > > error). dd(1) is convenient for this purpose, but is lacking a method > > > to force O_DIRECT. The enclosed patch adds a "conv=direct" flag to > > > enable this usage. > > > > This would be rather nice to have. You'll need to ensure that the data > > is page-aligned in memory. > > > > While you're there, please add an fsync-before-closing option. > > Andrew, am I right that this is NOT needed for the proposed O_DIRECT > option, since open(2) says: > "The I/O is synchronous, i.e., at the completion of the read(2) or > write(2) system call, data is guaranteed to have been transferred." > so the write will block until data is physically on the disk. A conv=fsync option is an irrelevant wishlist item. If you were to provide conv=fsync then dd should still fsync the file after performing the direct-io read or write. You're correct that an fsync after a direct-io read or write should not need to perform any file data I/O. But it will often need to perform file metadata I/O - file allocation tables, inode, etc. direct-io only syncs the file data, not the file metadata. ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: dd PATCH: add conv=direct 2004-04-07 0:33 ` Andrew Morton 2004-04-07 16:21 ` Bruce Allen @ 2004-04-07 17:31 ` Andy Isaacson 2004-04-07 18:18 ` Andrew Morton 2004-04-07 19:12 ` Miquel van Smoorenburg 1 sibling, 2 replies; 32+ messages in thread From: Andy Isaacson @ 2004-04-07 17:31 UTC (permalink / raw) To: Andrew Morton; +Cc: bug-coreutils, linux-kernel On Tue, Apr 06, 2004 at 05:33:26PM -0700, Andrew Morton wrote: > Andy Isaacson <adi@hexapodia.org> wrote: > > dd(1) is convenient for this purpose, but is lacking a method > > to force O_DIRECT. The enclosed patch adds a "conv=direct" flag to > > enable this usage. > > This would be rather nice to have. You'll need to ensure that the data > is page-aligned in memory. So, some confusion on my part about O_DIRECT: I can't get O_DIRECT to work on ext3, at all, on 2.4.25 -- open(O_DIRECT) succeeds, but the write returns EINVAL. Same code works fine when writing to a block device. If the problem is that ext3 can't support O_DIRECT, why does the open succeed? > While you're there, please add an fsync-before-closing option. Easy enough. How does this look? Note that C_TWOBUFS ensures the output buffer is getpagesize()-aligned. The next feature to add would be OpenBSD-style "KB/s" reporting. I'm not going there. -andy 2004-04-07 Andy Isaacson <adi@hexapodia.org> * add conv=direct and conv=fsync options. diff -ur coreutils-5.0.91/doc/coreutils.texi coreutils-5.0.91-adi/doc/coreutils.texi --- coreutils-5.0.91/doc/coreutils.texi 2003-09-04 16:26:51.000000000 -0500 +++ coreutils-5.0.91-adi/doc/coreutils.texi 2004-04-07 11:26:36.000000000 -0500 @@ -6373,6 +6373,16 @@ Pad every input block to size of @samp{ibs} with trailing zero bytes. When used with @samp{block} or @samp{unblock}, pad with spaces instead of zero bytes. + +@item fsync +@opindex fsync +Call @samp{fsync(2)} on the output file before exiting. This ensures +that the file data is written to permanent store. + +@item direct +@opindex direct +Open the output file with O_DIRECT, avoiding (on Linux) using the buffer +cache. @end table @end table diff -ur coreutils-5.0.91/src/dd.c coreutils-5.0.91-adi/src/dd.c --- coreutils-5.0.91/src/dd.c 2003-07-25 02:43:09.000000000 -0500 +++ coreutils-5.0.91-adi/src/dd.c 2004-04-07 11:45:23.000000000 -0500 @@ -66,6 +66,9 @@ /* Default input and output blocksize. */ #define DEFAULT_BLOCKSIZE 512 +/* XXX hack */ +#define HAVE_O_DIRECT 1 + /* Conversions bit masks. */ #define C_ASCII 01 #define C_EBCDIC 02 @@ -78,6 +81,8 @@ #define C_NOERROR 0400 #define C_NOTRUNC 01000 #define C_SYNC 02000 +#define C_FSYNC 010000 +#define C_DIRECT 020000 /* Use separate input and output buffers, and combine partial input blocks. */ #define C_TWOBUFS 04000 @@ -162,6 +167,10 @@ {"noerror", C_NOERROR}, /* Ignore i/o errors. */ {"notrunc", C_NOTRUNC}, /* Do not truncate output file. */ {"sync", C_SYNC}, /* Pad input records to ibs with NULs. */ + {"fsync", C_FSYNC}, /* call fsync(2) before closing output file */ +#ifdef HAVE_O_DIRECT + {"direct", C_DIRECT | C_TWOBUFS}, /* open files with O_DIRECT */ +#endif {NULL, 0} }; @@ -1190,6 +1199,11 @@ = (O_CREAT | (seek_records || (conversions_mask & C_NOTRUNC) ? 0 : O_TRUNC)); +#if HAVE_O_DIRECT + if (conversions_mask & C_DIRECT) + opts |= O_DIRECT; +#endif + /* Open the output file with *read* access only if we might need to read to satisfy a `seek=' request. If we can't read the file, go ahead with write-only access; it might work. */ @@ -1240,5 +1254,11 @@ exit_status = dd_copy (); + if (conversions_mask & C_FSYNC) + { + if (fsync (STDOUT_FILENO) != 0) + error(0, errno, _("cannot fsync %s"), quote (output_file)); + } + quit (exit_status); } ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: dd PATCH: add conv=direct 2004-04-07 17:31 ` Andy Isaacson @ 2004-04-07 18:18 ` Andrew Morton 2004-04-07 19:24 ` Andy Isaacson 2004-04-07 20:46 ` Paul Eggert 2004-04-07 19:12 ` Miquel van Smoorenburg 1 sibling, 2 replies; 32+ messages in thread From: Andrew Morton @ 2004-04-07 18:18 UTC (permalink / raw) To: Andy Isaacson; +Cc: bug-coreutils, linux-kernel Andy Isaacson <adi@hexapodia.org> wrote: > > On Tue, Apr 06, 2004 at 05:33:26PM -0700, Andrew Morton wrote: > > Andy Isaacson <adi@hexapodia.org> wrote: > > > dd(1) is convenient for this purpose, but is lacking a method > > > to force O_DIRECT. The enclosed patch adds a "conv=direct" flag to > > > enable this usage. > > > > This would be rather nice to have. You'll need to ensure that the data > > is page-aligned in memory. > > So, some confusion on my part about O_DIRECT: I can't get O_DIRECT to > work on ext3, at all, on 2.4.25 ext3 doesn't support O_DIRECT in 2.4 kernels. I did a patch once and I think it's in 2.4-aa kernels. ext3 supports O_DIRECT in 2.6 kernels. Quite a number of filesystems do. > -- open(O_DIRECT) succeeds, but the write > returns EINVAL. Yup that's a bit silly. In 2.6 we do the check at open() and fcntl() time. In 2.4 we don't fail until the actual I/O attempt. > Same code works fine when writing to a block device. > If the problem is that ext3 can't support O_DIRECT, why does the open > succeed? We have been insufficiently assiduous in merging externally-supported patches into the mainline 2.4 tree. > > While you're there, please add an fsync-before-closing option. > > Easy enough. How does this look? Note that C_TWOBUFS ensures the > output buffer is getpagesize()-aligned. Looks nice and simple. You'll need an ext2 filesystem to test it under 2.4. Be aware that it's rather a challenge to actually get the O_DIRECT #define in scope under some glibc versions. I think you need to define _GNU_SOURCE or something like that. ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: dd PATCH: add conv=direct 2004-04-07 18:18 ` Andrew Morton @ 2004-04-07 19:24 ` Andy Isaacson 2004-04-07 19:34 ` Andrew Morton 2004-04-07 20:46 ` Paul Eggert 1 sibling, 1 reply; 32+ messages in thread From: Andy Isaacson @ 2004-04-07 19:24 UTC (permalink / raw) To: Andrew Morton; +Cc: bug-coreutils, linux-kernel On Wed, Apr 07, 2004 at 11:18:41AM -0700, Andrew Morton wrote: > Andy Isaacson <adi@hexapodia.org> wrote: > > > While you're there, please add an fsync-before-closing option. > > > > Easy enough. How does this look? Note that C_TWOBUFS ensures the > > output buffer is getpagesize()-aligned. > > Looks nice and simple. You'll need an ext2 filesystem to test it under 2.4. I tested against a block device. Easier to just "swapoff /dev/hda2" than to find space to make a new filesystem. > Be aware that it's rather a challenge to actually get the O_DIRECT #define > in scope under some glibc versions. I think you need to define _GNU_SOURCE > or something like that. Yeah, I had to -D_GNU_SOURCE to build standalone, but coreutils' makefile seems to do OK. Would there be any reason to allow O_DIRECT on the read side? -andy ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: dd PATCH: add conv=direct 2004-04-07 19:24 ` Andy Isaacson @ 2004-04-07 19:34 ` Andrew Morton 2004-04-07 19:47 ` Andy Isaacson 0 siblings, 1 reply; 32+ messages in thread From: Andrew Morton @ 2004-04-07 19:34 UTC (permalink / raw) To: Andy Isaacson; +Cc: bug-coreutils, linux-kernel Andy Isaacson <adi@hexapodia.org> wrote: > > Would there be any reason to allow O_DIRECT on the read side? Sure. It saves CPU, avoids blowing pagecache, just as with O_DIRECT writes. ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: dd PATCH: add conv=direct 2004-04-07 19:34 ` Andrew Morton @ 2004-04-07 19:47 ` Andy Isaacson 2004-04-07 20:03 ` Andrew Morton 2004-04-09 0:37 ` dd PATCH: add conv=direct Anton Blanchard 0 siblings, 2 replies; 32+ messages in thread From: Andy Isaacson @ 2004-04-07 19:47 UTC (permalink / raw) To: Andrew Morton; +Cc: bug-coreutils, linux-kernel On Wed, Apr 07, 2004 at 12:34:55PM -0700, Andrew Morton wrote: > Andy Isaacson <adi@hexapodia.org> wrote: > > Would there be any reason to allow O_DIRECT on the read side? > > Sure. It saves CPU, OK, I can see that one. But it seems like a pretty small benefit to me -- CPU utilization is already really low. > avoids blowing pagecache, Um, that sounds like a bad idea to me. It seems to me it's the kernel's responsibility to figure out "hey, looks like a streaming read - let's not blow out the buffer cache trying to hold 20GB on a 512M system." If you're saying that the kernel guys have given up and the established wisdom is now "you gotta use O_DIRECT if you don't want to throw everything else out due to streaming data", well... I'm disappointed. > just as with O_DIRECT writes. Wouldn't opening both if= and of= with O_DIRECT turn dd into a synchronous copier? That would really suck in the "dd if=/dev/hda1 of=/dev/hdc1" case. With buffer cache doing readahead, that command can get, say, 40MB/s read and 40MB/s write; with synch read and synch write, it would drop to 40MB/s read+write, assuming that block sizes are big enough to amortize seek overhead. Having O_DIRECT on just of=, I think you can get back to 40MB/s+40MB/s. I claim that O_DIRECT on of= is important because you just plain *can not* do the minimal-sized IDE block scrub without it. I don't yet see a similar benefit to O_DIRECT on if= side. -andy ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: dd PATCH: add conv=direct 2004-04-07 19:47 ` Andy Isaacson @ 2004-04-07 20:03 ` Andrew Morton 2004-04-07 20:43 ` Andy Isaacson 2004-04-09 0:37 ` dd PATCH: add conv=direct Anton Blanchard 1 sibling, 1 reply; 32+ messages in thread From: Andrew Morton @ 2004-04-07 20:03 UTC (permalink / raw) To: Andy Isaacson; +Cc: bug-coreutils, linux-kernel Andy Isaacson <adi@hexapodia.org> wrote: > > I claim that O_DIRECT on of= is important because you just plain *can > not* do the minimal-sized IDE block scrub without it. I don't yet see a > similar benefit to O_DIRECT on if= side. If you want a block scrubber then write a block scrubber. If you want to add O_DIRECT support to dd then it should be implemented properly, and that means implementing it for both read and write. In fact the user should be able to specify the read-O_DIRECT and the write-O_DIRECT independently - if for no other reason than that the source and dest filesytems may not both support O_DIRECT. ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: dd PATCH: add conv=direct 2004-04-07 20:03 ` Andrew Morton @ 2004-04-07 20:43 ` Andy Isaacson 2004-04-07 21:00 ` Valdis.Kletnieks 2004-04-07 21:35 ` Bruce Allen 0 siblings, 2 replies; 32+ messages in thread From: Andy Isaacson @ 2004-04-07 20:43 UTC (permalink / raw) To: Andrew Morton; +Cc: bug-coreutils, linux-kernel On Wed, Apr 07, 2004 at 01:03:08PM -0700, Andrew Morton wrote: > Andy Isaacson <adi@hexapodia.org> wrote: > > I claim that O_DIRECT on of= is important because you just plain *can > > not* do the minimal-sized IDE block scrub without it. I don't yet see a > > similar benefit to O_DIRECT on if= side. > > If you want a block scrubber then write a block scrubber. They exist. They're a pain in the ass to find when you need one. dd is almost there; a small patch that might have a chance of getting accepted upstream seemed like a reasonable time investment. > If you want to add O_DIRECT support to dd then it should be implemented > properly, and that means implementing it for both read and write. > > In fact the user should be able to specify the read-O_DIRECT and the > write-O_DIRECT independently - if for no other reason than that the source > and dest filesytems may not both support O_DIRECT. Of course if both directions are supported they must be independently specifiable. I just don't see a compelling use case for the input side. Nevertheless, here you go. Still needs some autoconfiscation. 2004-04-07 Andy Isaacson <adi@hexapodia.org> * add conv=direct, conv=idirect, and conv=fsync options. diff -u'rx*.1' coreutils-5.0.91/doc/coreutils.texi coreutils-5.0.91-adi/doc/coreutils.texi --- coreutils-5.0.91/doc/coreutils.texi 2003-09-04 16:26:51.000000000 -0500 +++ coreutils-5.0.91-adi/doc/coreutils.texi 2004-04-07 15:24:17.000000000 -0500 @@ -6373,6 +6373,20 @@ Pad every input block to size of @samp{ibs} with trailing zero bytes. When used with @samp{block} or @samp{unblock}, pad with spaces instead of zero bytes. + +@item fsync +@opindex fsync +Call @samp{fsync(2)} on the output file before exiting. This ensures +that the file data is written to permanent store. + +@item direct +@opindex direct +Open the output file with O_DIRECT, avoiding (on Linux) using the buffer +cache. + +@item idirect +@opindex idirect +Open the input file with O_DIRECT, avoiding (on Linux) using the buffer @end table @end table diff -u'rx*.1' coreutils-5.0.91/src/dd.c coreutils-5.0.91-adi/src/dd.c --- coreutils-5.0.91/src/dd.c 2003-07-25 02:43:09.000000000 -0500 +++ coreutils-5.0.91-adi/src/dd.c 2004-04-07 15:23:24.000000000 -0500 @@ -66,6 +66,9 @@ /* Default input and output blocksize. */ #define DEFAULT_BLOCKSIZE 512 +/* XXX hack */ +#define HAVE_O_DIRECT 1 + /* Conversions bit masks. */ #define C_ASCII 01 #define C_EBCDIC 02 @@ -78,6 +81,9 @@ #define C_NOERROR 0400 #define C_NOTRUNC 01000 #define C_SYNC 02000 +#define C_FSYNC 010000 +#define C_DIRECT 020000 +#define C_IDIRECT 040000 /* Use separate input and output buffers, and combine partial input blocks. */ #define C_TWOBUFS 04000 @@ -162,6 +168,11 @@ {"noerror", C_NOERROR}, /* Ignore i/o errors. */ {"notrunc", C_NOTRUNC}, /* Do not truncate output file. */ {"sync", C_SYNC}, /* Pad input records to ibs with NULs. */ + {"fsync", C_FSYNC}, /* call fsync(2) before closing output file */ +#ifdef HAVE_O_DIRECT + {"direct", C_DIRECT | C_TWOBUFS}, /* open output with O_DIRECT */ + {"idirect", C_IDIRECT | C_TWOBUFS}, /* open input with O_DIRECT */ +#endif {NULL, 0} }; @@ -1177,7 +1188,14 @@ if (input_file != NULL) { - if (open_fd (STDIN_FILENO, input_file, O_RDONLY, 0) < 0) + int opts = O_RDONLY; + +#if HAVE_O_DIRECT + if (conversions_mask & C_IDIRECT) + opts |= O_DIRECT; +#endif + + if (open_fd (STDIN_FILENO, input_file, opts, 0) < 0) error (EXIT_FAILURE, errno, _("opening %s"), quote (input_file)); } else @@ -1190,6 +1208,11 @@ = (O_CREAT | (seek_records || (conversions_mask & C_NOTRUNC) ? 0 : O_TRUNC)); +#if HAVE_O_DIRECT + if (conversions_mask & C_DIRECT) + opts |= O_DIRECT; +#endif + /* Open the output file with *read* access only if we might need to read to satisfy a `seek=' request. If we can't read the file, go ahead with write-only access; it might work. */ @@ -1240,5 +1263,11 @@ exit_status = dd_copy (); + if (conversions_mask & C_FSYNC) + { + if (fsync (STDOUT_FILENO) != 0) + error(0, errno, _("cannot fsync %s"), quote (output_file)); + } + quit (exit_status); } ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: dd PATCH: add conv=direct 2004-04-07 20:43 ` Andy Isaacson @ 2004-04-07 21:00 ` Valdis.Kletnieks 2004-04-07 21:35 ` Bruce Allen 1 sibling, 0 replies; 32+ messages in thread From: Valdis.Kletnieks @ 2004-04-07 21:00 UTC (permalink / raw) To: Andy Isaacson; +Cc: Andrew Morton, bug-coreutils, linux-kernel [-- Attachment #1: Type: text/plain, Size: 474 bytes --] On Wed, 07 Apr 2004 15:43:41 CDT, Andy Isaacson said: > On Wed, Apr 07, 2004 at 01:03:08PM -0700, Andrew Morton wrote: > > If you want a block scrubber then write a block scrubber. > They exist. They're a pain in the ass to find when you need one. dd is And finding a block scrubber that actually works *properly* is even more of a challenge. I've seen more than a few that do stupid things like truncating the file and then writing the appropriate number of bytes.... [-- Attachment #2: Type: application/pgp-signature, Size: 226 bytes --] ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: dd PATCH: add conv=direct 2004-04-07 20:43 ` Andy Isaacson 2004-04-07 21:00 ` Valdis.Kletnieks @ 2004-04-07 21:35 ` Bruce Allen 2004-04-08 6:56 ` Paul Eggert 1 sibling, 1 reply; 32+ messages in thread From: Bruce Allen @ 2004-04-07 21:35 UTC (permalink / raw) To: Andy Isaacson; +Cc: Andrew Morton, bug-coreutils, linux-kernel > > If you want to add O_DIRECT support to dd then it should be implemented > > properly, and that means implementing it for both read and write. > > > > In fact the user should be able to specify the read-O_DIRECT and the > > write-O_DIRECT independently - if for no other reason than that the source > > and dest filesytems may not both support O_DIRECT. > > Of course if both directions are supported they must be independently > specifiable. I just don't see a compelling use case for the input side. Andy, since I was the one who suggested adding a conv=odirect flag to the output side, so that we could use a standard tool (dd) to force reallocation of bad sectors by writing to them, let me argue that it ALSO makes sense to allow O_DIRECT (as an independent option) on the input side. At least one obvious use is to help FIND unreadable disk sectors (or to verify that some sector is indeed unreadable). One would like to be able to do: dd if=/dev/hda of=/dev/null bs=512 count=1 skip=LBA conv=idirect and see if the dd suceeds or fails. If dd fails, then without having an O_DIRECT option (idirect) at the input side, there is no way of telling if the failure was because of an unreadable sector at LBA, or an unreadable sector somewhere else in the block containing LBA. With the O_DIRECT input option, you can be sure that if this command fails it's because the sector at LBA was unreadable, not some nearby sector. Cheers, Bruce ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: dd PATCH: add conv=direct 2004-04-07 21:35 ` Bruce Allen @ 2004-04-08 6:56 ` Paul Eggert 2004-04-08 11:07 ` Jim Meyering 2004-04-08 16:23 ` Philippe Troin 0 siblings, 2 replies; 32+ messages in thread From: Paul Eggert @ 2004-04-08 6:56 UTC (permalink / raw) To: bug-coreutils; +Cc: Andy Isaacson, Andrew Morton, Bruce Allen, linux-kernel Bruce Allen <ballen@gravity.phys.uwm.edu> writes: > let me argue that it ALSO makes sense to allow O_DIRECT (as an > independent option) on the input side. At the end of this message is a proposed patch to implement everything people other than myself have asked for so far, along with several other things since I was in the neighborhood. This patch adds the following NEWS entry, which I'll repeat here to make it easy to see what's going on: dd has new conversions for the conv= option: nocreat do not create the output file excl fail if the output file already exists fdatasync physically write output file data before finishing fsync likewise, but also write metadata dd has new iflags= and oflags= options with the following flags: append append mode (makes sense for output file only) direct use direct I/O for data dsync use synchronized I/O for data sync likewise, but also for metadata nonblock use non-blocking I/O nofollow do not follow symlinks noctty do not assign controlling terminal from file This patch is relative to coreutils CVS, so your line numbers may differ. I didn't add hooks for Solaris-style directio, because that turns out to be a systemwide property and perhaps should be in a different command. 2004-04-07 Paul Eggert <eggert@twinsun.com> * NEWS: New dd conv= symbols nocreat, excl, fdatasync, fsync, and new dd options iflag= and oflag=. * doc/coreutils.texi (dd invocation): Likewise. * src/dd.c (usage): Likewise. * m4/jm-macros.m4 (jm_MACROS): Check for fdatasync within -lrt and -lposix4, so that it can be used in Solaris 2.5.1 and later. * src/Makefile.am (dd_LDADD, shred_LDADD): Add fdatasync's lib. * src/dd.c (fdatasync) [!HAVE_FDATASYNC]: New macro. (C_NOCREAT, C_EXCL, C_FDATASYNC, C_FSYNC): New macros. (input_flags, output_flags): New vars. (LONGEST_SYMBOL): New macro. (struct symbol_value): Renamed from struct conversion. Members symbol and value renamed from convname and conversion. The symbol value is now an array instead of a pointer; this saves a bit of space and time in practice. All uses changed. (conversions): Add nocreat, excl, fdatasync, fsync. Now const. (flags): New constant array. (iflag_error_msgid, oflag_error_msgid): New constants. (parse_symbols): Renamed from parse_conversion and generalized to handle either conversion or flag symbols. (scanargs): Adjust uses of parse_symbols accodingly. Add support for iflag= and oflag=. Reject attempts to use both excl and nocreat. (set_fd_flags): New function. (dd_copy): Just return X rather than calling quit (X), since our caller invokes quit with the returned value. Add support for fdatasync and fsync. (main): Add support for iflags=, oflags=, and new conv= symbols. * src/system.h (O_DIRECT, O_DSYNC, O_NDELAY, O_NOFOLLOW, O_RSYNC, O_SYNC): Define to 0 if not already defined. * NEWS: Remove duplicate mention of BLOCKSIZE. Index: NEWS =================================================================== RCS file: /home/meyering/coreutils/cu/NEWS,v retrieving revision 1.198 diff -p -u -r1.198 NEWS --- NEWS 7 Apr 2004 10:56:00 -0000 1.198 +++ NEWS 8 Apr 2004 04:01:34 -0000 @@ -9,14 +9,24 @@ GNU coreutils NEWS ** New features - stty now provides support (iutf8) for setting UTF-8 input mode. + dd has new conversions for the conv= option: - 'df', 'du', and 'ls' now take the default block size from the - BLOCKSIZE environment variable if the BLOCK_SIZE, DF_BLOCK_SIZE, - DU_BLOCK_SIZE, and LS_BLOCK_SIZE environment variables are not set. - Unlike the other variables, though, BLOCKSIZE does not affect - values like 'ls -l' sizes that are normally displayed as bytes. - This new behavior is for compatibility with BSD. + nocreat do not create the output file + excl fail if the output file already exists + fdatasync physically write output file data before finishing + fsync likewise, but also write metadata + + dd has new iflags= and oflags= options with the following flags: + + append append mode (makes sense for output file only) + direct use direct I/O for data + dsync use synchronized I/O for data + sync likewise, but also for metadata + nonblock use non-blocking I/O + nofollow do not follow symlinks + noctty do not assign controlling terminal from file + + stty now provides support (iutf8) for setting UTF-8 input mode. With stat, a specified format is no longer automatically newline terminated. If you want a newline at the end of your output, append `\n' to the format Index: doc/coreutils.texi =================================================================== RCS file: /home/meyering/coreutils/cu/doc/coreutils.texi,v retrieving revision 1.175 diff -p -u -r1.175 coreutils.texi --- doc/coreutils.texi 7 Apr 2004 09:50:48 -0000 1.175 +++ doc/coreutils.texi 8 Apr 2004 04:55:38 -0000 @@ -6580,6 +6580,17 @@ when an odd number of bytes are read---t @cindex read errors, ignoring Continue after read errors. +@item nocreat +@opindex nocreat +@cindex creating output file, avoiding +Do not create the output file; the output file must already exist. + +@item excl +@opindex excl +@cindex creating output file, requiring +Fail if the output file already exists; @command{dd} must create the +output file itself. + @item notrunc @opindex notrunc @cindex truncating output file, avoiding @@ -6590,7 +6601,85 @@ Do not truncate the output file. Pad every input block to size of @samp{ibs} with trailing zero bytes. When used with @samp{block} or @samp{unblock}, pad with spaces instead of zero bytes. + +@item fdatasync +@opindex fdatasync +@cindex synchronized data writes, before finishing +Synchronize output data just before finishing. This forces a physical +write of output data. + +@item fsync +@opindex fsync +@cindex synchronized data and metadata writes, before finishing +Synchronize output data and metadata just before finishing. This +forces a physical write of output data and metadata. + +@end table + +@item iflag=@var{flag}[,@var{flag}]@dots{} +@opindex iflag +Access the input file using the flags specified by the @var{flag} +argument(s). (No spaces around any comma(s).) + +@item oflag=@var{flag}[,@var{flag}]@dots{} +@opindex oflag +Access the output file using the flags specified by the @var{flag} +argument(s). (No spaces around any comma(s).) + +Flags: + +@table @samp + +@item append +@opindex append +@cindex appending to the output file +Write in append mode, so that even if some other process is writing to +this file, every @command{dd} write will append to the current +contents of the file. This flag makes sense only for output. + +@item direct +@opindex direct +@cindex direct I/O +Use direct I/O for data, avoiding the buffer cache. + +@item dsync +@opindex dsync +@cindex synchronized data reads +Use synchronized I/O for data. For the output file, this forces a +physical write of output data on each write. For the input file, +this flag can matter when reading from a remote file that has been +written to synchronously by some other process. Metadata (e.g., +last-access and last-modified time) is not necessarily synchronized. + +@item sync +@opindex sync +@cindex synchronized data and metadata I/O +Use synchronized I/O for both data and metadata. + +@item nonblock +@opindex nonblock +@cindex nonblocking I/O +Use non-blocking I/O. + +@item nofollow +@opindex nofollow +@cindex symbolic links, following +Do not follow symbolic links. + +@item noctty +@opindex noctty +@cindex controlling terminal +Do not assign the file to be a controlling terminal for @command{dd}. +This has no effect when the file is not a terminal. + @end table + +These flags are not supported on all systems, and @samp{dd} rejects +attempts to use them when they are not supported. When reading from +standard input or writing to standard output, the @samp{nofollow} and +@samp{noctty} flags should not be specified, and the other flags +(e.g., @samp{nonblock}) can affect how other processes behave with the +affected file descriptors, even after @command{dd} exits. @end table Index: m4/jm-macros.m4 =================================================================== RCS file: /home/meyering/coreutils/cu/m4/jm-macros.m4,v retrieving revision 1.184 diff -p -u -r1.184 jm-macros.m4 --- m4/jm-macros.m4 20 Dec 2003 17:57:30 -0000 1.184 +++ m4/jm-macros.m4 8 Apr 2004 05:30:05 -0000 @@ -81,7 +81,6 @@ AC_DEFUN([jm_MACROS], AC_CHECK_FUNCS( \ endgrent \ endpwent \ - fdatasync \ ftruncate \ gethrtime \ hasmntopt \ @@ -110,6 +109,15 @@ AC_DEFUN([jm_MACROS], AC_FUNC_STRTOD AC_REQUIRE([GL_FUNC_GETCWD_PATH_MAX]) AC_REQUIRE([GL_FUNC_READDIR]) + + # for dd.c and shred.c + fetish_saved_libs=$LIBS + AC_SEARCH_LIBS([fdatasync], [rt posix4], + [test "$ac_cv_search_fdatasync" = "none required" || + LIB_FDATASYNC=$ac_cv_search_fdatasync]) + AC_SUBST([LIB_FDATASYNC]) + AC_CHECK_FUNCS(fdatasync) + LIBS=$fetish_saved_libs # See if linking `seq' requires -lm. # It does on nearly every system. The single exception (so far) is Index: src/Makefile.am =================================================================== RCS file: /home/meyering/coreutils/cu/src/Makefile.am,v retrieving revision 1.34 diff -p -u -r1.34 Makefile.am --- src/Makefile.am 17 Mar 2004 10:14:17 -0000 1.34 +++ src/Makefile.am 8 Apr 2004 05:23:13 -0000 @@ -32,10 +32,11 @@ AM_CPPFLAGS = -I.. -I$(srcdir) -I$(top_s # replacement functions defined in libfetish.a. LDADD = ../lib/libfetish.a $(LIBINTL) ../lib/libfetish.a -# for clock_gettime +# for clock_gettime and fdatasync +dd_LDADD = $(LDADD) $(LIB_FDATASYNC) dir_LDADD = $(LDADD) $(LIB_CLOCK_GETTIME) ls_LDADD = $(LDADD) $(LIB_CLOCK_GETTIME) -shred_LDADD = $(LDADD) $(LIB_CLOCK_GETTIME) +shred_LDADD = $(LDADD) $(LIB_CLOCK_GETTIME) $(LIB_FDATASYNC) vdir_LDADD = $(LDADD) $(LIB_CLOCK_GETTIME) ## If necessary, add -lm to resolve use of pow in lib/strtod.c. Index: src/dd.c =================================================================== RCS file: /home/meyering/coreutils/cu/src/dd.c,v retrieving revision 1.154 diff -p -u -r1.154 dd.c --- src/dd.c 21 Jan 2004 22:53:49 -0000 1.154 +++ src/dd.c 8 Apr 2004 06:26:20 -0000 @@ -49,6 +49,10 @@ # define S_TYPEISSHM(Stat_ptr) 0 #endif +#if ! HAVE_FDATASYNC +# define fdatasync(fd) (errno = ENOSYS, -1) +#endif + #define ROUND_UP_OFFSET(X, M) ((M) - 1 - (((X) + (M) - 1) % (M))) #define PTR_ALIGN(Ptr, M) ((Ptr) \ + ROUND_UP_OFFSET ((char *)(Ptr) - (char *)0, (M))) @@ -80,6 +84,10 @@ #define C_SYNC 02000 /* Use separate input and output buffers, and combine partial input blocks. */ #define C_TWOBUFS 04000 +#define C_NOCREAT 010000 +#define C_EXCL 020000 +#define C_FDATASYNC 040000 +#define C_FSYNC 0100000 /* The name this program was run with. */ char *program_name; @@ -111,6 +119,10 @@ static uintmax_t max_records = (uintmax_ /* Bit vector of conversions to apply. */ static int conversions_mask = 0; +/* Open flags for the input and output files. */ +static int input_flags = 0; +static int output_flags = 0; + /* If nonzero, filter characters through the translation table. */ static int translation_needed = 0; @@ -143,13 +155,18 @@ static size_t oc = 0; /* Index into current line, for `conv=block' and `conv=unblock'. */ static size_t col = 0; -struct conversion +/* A longest symbol in the struct symbol_values table below. */ +#define LONGEST_SYMBOL "fdatasync" + +/* A symbol and the corresponding integer value. */ +struct symbol_value { - char *convname; - int conversion; + char symbol[sizeof LONGEST_SYMBOL]; + int value; }; -static struct conversion conversions[] = +/* Conversion symbols, for conv="...". */ +static struct symbol_value const conversions[] = { {"ascii", C_ASCII | C_TWOBUFS}, /* EBCDIC to ASCII. */ {"ebcdic", C_EBCDIC | C_TWOBUFS}, /* ASCII to EBCDIC. */ @@ -160,9 +177,26 @@ static struct conversion conversions[] = {"ucase", C_UCASE | C_TWOBUFS}, /* Translate lower to upper case. */ {"swab", C_SWAB | C_TWOBUFS}, /* Swap bytes of input. */ {"noerror", C_NOERROR}, /* Ignore i/o errors. */ + {"nocreat", C_NOCREAT}, /* Do not create output file. */ + {"excl", C_EXCL}, /* Fail if the output file already exists. */ {"notrunc", C_NOTRUNC}, /* Do not truncate output file. */ {"sync", C_SYNC}, /* Pad input records to ibs with NULs. */ - {NULL, 0} + {"fdatasync", C_FDATASYNC}, /* Synchronize output data before finishing. */ + {"fsync", C_FSYNC}, /* Also synchronize output metadata. */ + {"", 0} +}; + +/* Flags, for iflag="..." and oflag="...". */ +static struct symbol_value const flags[] = +{ + {"append", O_APPEND}, + {"direct", O_DIRECT}, + {"dsync", O_DSYNC}, + {"noctty", O_NOCTTY}, + {"nofollow", O_NOFOLLOW}, + {"nonblock", O_NONBLOCK}, + {"sync", O_SYNC}, + {"", 0} }; /* Translation table formed by applying successive transformations. */ @@ -290,14 +324,16 @@ Copy a file, converting and formatting a \n\ bs=BYTES force ibs=BYTES and obs=BYTES\n\ cbs=BYTES convert BYTES bytes at a time\n\ - conv=KEYWORDS convert the file as per the comma separated keyword list\n\ + conv=CONVS convert the file as per the comma separated symbol list\n\ count=BLOCKS copy only BLOCKS input blocks\n\ ibs=BYTES read BYTES bytes at a time\n\ "), stdout); fputs (_("\ if=FILE read from FILE instead of stdin\n\ + iflag=FLAGS read as per the comma separated symbol list\n\ obs=BYTES write BYTES bytes at a time\n\ of=FILE write to FILE instead of stdout\n\ + oflag=FLAGS write as per the comma separated symbol list\n\ seek=BLOCKS skip BLOCKS obs-sized blocks at start of output\n\ skip=BLOCKS skip BLOCKS ibs-sized blocks at start of input\n\ "), stdout); @@ -308,7 +344,8 @@ Copy a file, converting and formatting a BLOCKS and BYTES may be followed by the following multiplicative suffixes:\n\ xM M, c 1, w 2, b 512, kB 1000, K 1024, MB 1000*1000, M 1024*1024,\n\ GB 1000*1000*1000, G 1024*1024*1024, and so on for T, P, E, Z, Y.\n\ -Each KEYWORD may be:\n\ +\n\ +Each CONV symbol may be:\n\ \n\ "), stdout); fputs (_("\ @@ -320,15 +357,38 @@ Each KEYWORD may be:\n\ lcase change upper case to lower case\n\ "), stdout); fputs (_("\ + nocreat do not create the output file\n\ + excl fail if the output file already exists\n\ notrunc do not truncate the output file\n\ ucase change lower case to upper case\n\ swab swap every pair of input bytes\n\ noerror continue after read errors\n\ sync pad every input block with NULs to ibs-size; when used\n\ with block or unblock, pad with spaces rather than NULs\n\ + fdatasync physically write output file data before finishing\n\ + fsync likewise, but also write metadata\n\ "), stdout); fputs (_("\ \n\ +Each FLAG symbol may be:\n\ +\n\ + append append mode (makes sense only for output)\n\ +"), stdout); + if (O_DIRECT) + fputs (_(" direct use direct I/O for data\n"), stdout); + if (O_DSYNC) + fputs (_(" dsync use synchronized I/O for data\n"), stdout); + if (O_SYNC) + fputs (_(" sync likewise, but also for metadata\n"), stdout); + if (O_NONBLOCK) + fputs (_(" nonblock use non-blocking I/O\n"), stdout); + if (O_NOFOLLOW) + fputs (_(" nofollow do not follow symlinks\n"), stdout); + if (O_NOCTTY) + fputs (_(" noctty do not assign controlling terminal from file\n"), + stdout); + fputs (_("\ +\n\ Note that sending a SIGUSR1 signal to a running `dd' process makes it\n\ print to standard error the number of records read and written so far,\n\ then to resume copying.\n\ @@ -486,33 +546,47 @@ write_output (void) oc = 0; } -/* Interpret one "conv=..." option. +/* Diagnostics for invalid iflag="..." and oflag="..." symbols. */ +static char const iflag_error_msgid[] = N_("invalid input flag: %s"); +static char const oflag_error_msgid[] = N_("invalid output flag: %s"); + +/* Interpret one "conv=..." or similar option STR according to the + symbols in TABLE, returning the flags specified. If the option + cannot be parsed, use ERROR_MSGID to generate a diagnostic. As a by product, this function replaces each `,' in STR with a NUL byte. */ -static void -parse_conversion (char *str) +static int +parse_symbols (char *str, struct symbol_value const *table, + char const *error_msgid) { - char *new; - int i; + int value = 0; do { - new = strchr (str, ','); + struct symbol_value const *entry; + char *new = strchr (str, ','); if (new != NULL) *new++ = '\0'; - for (i = 0; conversions[i].convname != NULL; i++) - if (STREQ (conversions[i].convname, str)) - { - conversions_mask |= conversions[i].conversion; - break; - } - if (conversions[i].convname == NULL) + for (entry = table; ; entry++) { - error (0, 0, _("invalid conversion: %s"), quote (str)); - usage (EXIT_FAILURE); + if (! entry->symbol[0]) + { + error (0, 0, _(error_msgid), quote (str)); + usage (EXIT_FAILURE); + } + if (STREQ (entry->symbol, str)) + { + if (! entry->value) + error (EXIT_FAILURE, 0, _(error_msgid), quote (str)); + value |= entry->value; + break; + } } str = new; - } while (new != NULL); + } + while (str); + + return value; } /* Return the value of STR, interpreted as a non-negative decimal integer, @@ -574,7 +648,12 @@ scanargs (int argc, char **argv) else if (STREQ (name, "of")) output_file = val; else if (STREQ (name, "conv")) - parse_conversion (val); + conversions_mask |= parse_symbols (val, conversions, + N_("invalid conversion: %s")); + else if (STREQ (name, "iflag")) + input_flags |= parse_symbols (val, flags, iflag_error_msgid); + else if (STREQ (name, "oflag")) + output_flags |= parse_symbols (val, flags, oflag_error_msgid); else { int invalid = 0; @@ -638,6 +717,12 @@ scanargs (int argc, char **argv) output_blocksize = DEFAULT_BLOCKSIZE; if (conversion_blocksize == 0) conversions_mask &= ~(C_BLOCK | C_UNBLOCK); + + if (input_flags & (O_DSYNC | O_SYNC)) + input_flags |= O_RSYNC; + + if ((conversions_mask & (C_EXCL | C_NOCREAT)) == (C_EXCL | C_NOCREAT)) + error (EXIT_FAILURE, 0, _("cannot combine excl and nocreat")); } /* Fix up translation table. */ @@ -928,6 +1013,22 @@ copy_with_unblock (char const *buf, size } } +/* Set the file descriptor flags for FD that correspond to the nonzero bits + in FLAGS. The file's name is NAME. */ + +static void +set_fd_flags (int fd, int flags, char const *name) +{ + if (flags) + { + int old_flags = fcntl (fd, F_GETFL); + int new_flags = old_flags | flags; + if (old_flags < 0 + || (new_flags != old_flags && fcntl (fd, F_SETFL, new_flags) == -1)) + error (EXIT_FAILURE, errno, _("setting flags for %s"), quote (name)); + } +} + /* The main loop. */ static int @@ -994,7 +1095,7 @@ dd_copy (void) } if (max_records == 0) - quit (exit_status); + return exit_status; while (1) { @@ -1061,7 +1162,7 @@ dd_copy (void) if (nwritten != n_bytes_read) { error (0, errno, _("writing %s"), quote (output_file)); - quit (EXIT_FAILURE); + return EXIT_FAILURE; } else if (n_bytes_read == input_blocksize) w_full++; @@ -1122,7 +1223,7 @@ dd_copy (void) if (nwritten != oc) { error (0, errno, _("writing %s"), quote (output_file)); - quit (EXIT_FAILURE); + return EXIT_FAILURE; } } @@ -1130,6 +1231,24 @@ dd_copy (void) if (real_obuf) free (real_obuf); + if ((conversions_mask & C_FDATASYNC) && fdatasync (STDOUT_FILENO) != 0) + { + if (errno != ENOSYS && errno != EINVAL) + { + error (0, errno, "fdatasync %s", quote (output_file)); + exit_status = EXIT_FAILURE; + } + conversions_mask |= C_FSYNC; + } + + if (conversions_mask & C_FSYNC) + while (fsync (STDOUT_FILENO) != 0) + if (errno != EINTR) + { + error (0, errno, "fsync %s", quote (output_file)); + return EXIT_FAILURE; + } + return exit_status; } @@ -1174,19 +1293,29 @@ main (int argc, char **argv) apply_translations (); - if (input_file != NULL) + if (input_file == NULL) { - if (open_fd (STDIN_FILENO, input_file, O_RDONLY, 0) < 0) - error (EXIT_FAILURE, errno, _("opening %s"), quote (input_file)); + input_file = _("standard input"); + set_fd_flags (STDIN_FILENO, input_flags, input_file); } else - input_file = _("standard input"); + { + if (open_fd (STDIN_FILENO, input_file, O_RDONLY | input_flags, 0) < 0) + error (EXIT_FAILURE, errno, _("opening %s"), quote (input_file)); + } - if (output_file != NULL) + if (output_file == NULL) + { + output_file = _("standard output"); + set_fd_flags (STDOUT_FILENO, output_flags, output_file); + } + else { mode_t perms = S_IRUSR | S_IWUSR | S_IRGRP | S_IWGRP | S_IROTH | S_IWOTH; int opts - = (O_CREAT + = (output_flags + | (conversions_mask & C_NOCREAT ? 0 : O_CREAT) + | (conversions_mask & C_EXCL ? O_EXCL : 0) | (seek_records || (conversions_mask & C_NOTRUNC) ? 0 : O_TRUNC)); /* Open the output file with *read* access only if we might @@ -1210,8 +1339,8 @@ main (int argc, char **argv) quote (output_file)); /* Complain only when ftruncate fails on a regular file, a - directory, or a shared memory object, as the 2000-08 - POSIX draft specifies ftruncate's behavior only for these + directory, or a shared memory object, as + POSIX 1003.1-2003 specifies ftruncate's behavior only for these file types. For example, do not complain when Linux 2.4 ftruncate fails on /dev/fd0. */ if (ftruncate (STDOUT_FILENO, o) != 0 @@ -1226,10 +1355,6 @@ main (int argc, char **argv) } } #endif - } - else - { - output_file = _("standard output"); } install_handler (SIGINT, interrupt_handler); Index: src/system.h =================================================================== RCS file: /home/meyering/coreutils/cu/src/system.h,v retrieving revision 1.82 diff -p -u -r1.82 system.h --- src/system.h 6 Apr 2004 13:55:00 -0000 1.82 +++ src/system.h 8 Apr 2004 03:39:01 -0000 @@ -197,6 +197,14 @@ initialize_exit_failure (int status) # define O_TEXT _O_TEXT #endif +#if !defined O_DIRECT +# define O_DIRECT 0 +#endif + +#if !defined O_DSYNC +# define O_DSYNC 0 +#endif + #if !defined O_NDELAY # define O_NDELAY 0 #endif @@ -207,6 +215,18 @@ initialize_exit_failure (int status) #if !defined O_NOCTTY # define O_NOCTTY 0 +#endif + +#if !defined O_NOFOLLOW +# define O_NOFOLLOW 0 +#endif + +#if !defined O_RSYNC +# define O_RSYNC 0 +#endif + +#if !defined O_SYNC +# define O_SYNC 0 #endif #ifdef __BEOS__ ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: dd PATCH: add conv=direct 2004-04-08 6:56 ` Paul Eggert @ 2004-04-08 11:07 ` Jim Meyering 2004-04-08 19:32 ` Paul Eggert 2004-04-08 16:23 ` Philippe Troin 1 sibling, 1 reply; 32+ messages in thread From: Jim Meyering @ 2004-04-08 11:07 UTC (permalink / raw) To: Paul Eggert Cc: bug-coreutils, Andrew Morton, Bruce Allen, linux-kernel, Andy Isaacson Paul Eggert <eggert@CS.UCLA.EDU> wrote: > At the end of this message is a proposed patch to implement everything > people other than myself have asked for so far, along with several > other things since I was in the neighborhood. Thanks for that nice patch! I've applied it, and made these additional changes: 2004-04-08 Jim Meyering <jim@meyering.net> * src/dd.c (set_fd_flags): Don't OR in -1 when fcntl fails. Rename parameter, flags, to avoid shadowing global. Index: dd.c =================================================================== RCS file: /fetish/cu/src/dd.c,v retrieving revision 1.155 diff -u -p -r1.155 dd.c --- a/dd.c 8 Apr 2004 10:22:05 -0000 1.155 +++ b/dd.c 8 Apr 2004 11:02:20 -0000 @@ -1017,12 +1017,12 @@ copy_with_unblock (char const *buf, size in FLAGS. The file's name is NAME. */ static void -set_fd_flags (int fd, int flags, char const *name) +set_fd_flags (int fd, int add_flags, char const *name) { - if (flags) + if (add_flags) { int old_flags = fcntl (fd, F_GETFL); - int new_flags = old_flags | flags; + int new_flags = old_flags < 0 ? add_flags : (old_flags | add_flags); if (old_flags < 0 || (new_flags != old_flags && fcntl (fd, F_SETFL, new_flags) == -1)) error (EXIT_FAILURE, errno, _("setting flags for %s"), quote (name)); ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: dd PATCH: add conv=direct 2004-04-08 11:07 ` Jim Meyering @ 2004-04-08 19:32 ` Paul Eggert 2004-04-08 19:51 ` Paul Jarc 2004-04-08 21:34 ` Jim Meyering 0 siblings, 2 replies; 32+ messages in thread From: Paul Eggert @ 2004-04-08 19:32 UTC (permalink / raw) To: Jim Meyering Cc: bug-coreutils, Andrew Morton, Bruce Allen, linux-kernel, Andy Isaacson Jim Meyering <jim@meyering.net> writes: > 2004-04-08 Jim Meyering <jim@meyering.net> > > * src/dd.c (set_fd_flags): Don't OR in -1 when fcntl fails. Doesn't that fix generate worse code in the usual case, since it causes two conditional branches instead of one? How about this further patch? It relies on common subexpression elimination, but that's common these days. 2004-04-08 Paul Eggert <eggert@twinsun.com> * src/dd.c (set_fd_flags): Don't test old_flags < 0 twice. Index: src/dd.c =================================================================== RCS file: /home/meyering/coreutils/cu/src/dd.c,v retrieving revision 1.157 diff -p -u -r1.157 dd.c --- src/dd.c 8 Apr 2004 15:25:39 -0000 1.157 +++ src/dd.c 8 Apr 2004 19:17:02 -0000 @@ -1014,7 +1014,7 @@ copy_with_unblock (char const *buf, size } /* Set the file descriptor flags for FD that correspond to the nonzero bits - in FLAGS. The file's name is NAME. */ + in ADD_FLAGS. The file's name is NAME. */ static void set_fd_flags (int fd, int add_flags, char const *name) @@ -1022,9 +1022,9 @@ set_fd_flags (int fd, int add_flags, cha if (add_flags) { int old_flags = fcntl (fd, F_GETFL); - int new_flags = old_flags < 0 ? add_flags : (old_flags | add_flags); if (old_flags < 0 - || (new_flags != old_flags && fcntl (fd, F_SETFL, new_flags) == -1)) + || (old_flags != (old_flags | add_flags) + && fcntl (fd, F_SETFL, old_flags | add_flags) == -1)) error (EXIT_FAILURE, errno, _("setting flags for %s"), quote (name)); } } ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: dd PATCH: add conv=direct 2004-04-08 19:32 ` Paul Eggert @ 2004-04-08 19:51 ` Paul Jarc 2004-04-08 21:34 ` Jim Meyering 1 sibling, 0 replies; 32+ messages in thread From: Paul Jarc @ 2004-04-08 19:51 UTC (permalink / raw) To: Paul Eggert Cc: Jim Meyering, Andrew Morton, Bruce Allen, bug-coreutils, linux-kernel, Andy Isaacson Paul Eggert <eggert@CS.UCLA.EDU> wrote: > Jim Meyering <jim@meyering.net> writes: >> 2004-04-08 Jim Meyering <jim@meyering.net> >> >> * src/dd.c (set_fd_flags): Don't OR in -1 when fcntl fails. > > Doesn't that fix generate worse code in the usual case, since it > causes two conditional branches instead of one? I think it's unnecessary anyway - if old_flags is -1, then the conditional jumps to error() before looking at new_flags at all. paul ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: dd PATCH: add conv=direct 2004-04-08 19:32 ` Paul Eggert 2004-04-08 19:51 ` Paul Jarc @ 2004-04-08 21:34 ` Jim Meyering 1 sibling, 0 replies; 32+ messages in thread From: Jim Meyering @ 2004-04-08 21:34 UTC (permalink / raw) To: Paul Eggert Cc: bug-coreutils, Andrew Morton, Bruce Allen, linux-kernel, Andy Isaacson Paul Eggert <eggert@CS.UCLA.EDU> wrote: > Jim Meyering <jim@meyering.net> writes: > >> 2004-04-08 Jim Meyering <jim@meyering.net> >> >> * src/dd.c (set_fd_flags): Don't OR in -1 when fcntl fails. > > Doesn't that fix generate worse code in the usual case, since it > causes two conditional branches instead of one? > > How about this further patch? It relies on common subexpression > elimination, but that's common these days. > > 2004-04-08 Paul Eggert <eggert@twinsun.com> > > * src/dd.c (set_fd_flags): Don't test old_flags < 0 twice. I see your point. Thanks for the comment fix. I've gone ahead and reverted my change, since I think your original code is a little more readable than the more-efficient-when-fcntl-fails code that you're proposing. (set_fd_flags): Undo part of today's change: it's a little cleaner -- and more efficient in the common case -- to go ahead and OR in the -1 when fcntl fails. Index: src/dd.c =================================================================== RCS file: /fetish/cu/src/dd.c,v retrieving revision 1.158 retrieving revision 1.159 diff -u -p -u -r1.158 -r1.159 --- a/src/dd.c 8 Apr 2004 21:26:28 -0000 1.158 +++ b/src/dd.c 8 Apr 2004 21:30:18 -0000 1.159 @@ -1014,7 +1014,7 @@ copy_with_unblock (char const *buf, size } /* Set the file descriptor flags for FD that correspond to the nonzero bits - in FLAGS. The file's name is NAME. */ + in ADD_FLAGS. The file's name is NAME. */ static void set_fd_flags (int fd, int add_flags, char const *name) @@ -1022,7 +1022,7 @@ set_fd_flags (int fd, int add_flags, cha if (add_flags) { int old_flags = fcntl (fd, F_GETFL); - int new_flags = old_flags < 0 ? add_flags : (old_flags | add_flags); + int new_flags = old_flags | add_flags; if (old_flags < 0 || (new_flags != old_flags && fcntl (fd, F_SETFL, new_flags) == -1)) error (EXIT_FAILURE, errno, _("setting flags for %s"), quote (name)); ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: dd PATCH: add conv=direct 2004-04-08 6:56 ` Paul Eggert 2004-04-08 11:07 ` Jim Meyering @ 2004-04-08 16:23 ` Philippe Troin 2004-04-08 20:20 ` dd patch to remove noctty Paul Eggert 1 sibling, 1 reply; 32+ messages in thread From: Philippe Troin @ 2004-04-08 16:23 UTC (permalink / raw) To: Paul Eggert Cc: bug-coreutils, Andy Isaacson, Andrew Morton, Bruce Allen, linux-kernel Paul Eggert <eggert@CS.UCLA.EDU> writes: > dd has new iflags= and oflags= options with the following flags: > > append append mode (makes sense for output file only) > direct use direct I/O for data > dsync use synchronized I/O for data > sync likewise, but also for metadata > nonblock use non-blocking I/O > nofollow do not follow symlinks > noctty do not assign controlling terminal from file noctty definitely seems overkill... I can't see why dd would ever want to open a file without O_NOCTTY. On systems where O_NOCTTY makes sense (SvR4) that is. Phil. ^ permalink raw reply [flat|nested] 32+ messages in thread
* dd patch to remove noctty 2004-04-08 16:23 ` Philippe Troin @ 2004-04-08 20:20 ` Paul Eggert 2004-04-08 21:40 ` Jim Meyering 0 siblings, 1 reply; 32+ messages in thread From: Paul Eggert @ 2004-04-08 20:20 UTC (permalink / raw) To: bug-coreutils; +Cc: Andy Isaacson, Andrew Morton, Bruce Allen, linux-kernel Philippe Troin <phil@fifi.org> writes: > noctty definitely seems overkill... I can't see why dd would ever want > to open a file without O_NOCTTY. Good point; it's just a confusion for the user. Here's a patch to cause dd to always use O_NOCTTY. If someone ever needs it the other way (not likely) I suppose we can add a "ctty" flag. 2004-04-08 Paul Eggert <eggert@cs.ucla.edu> * NEWS: Remove noctty flag from dd. Suggested by Philippe Troin. * doc/coreutils.texi (dd invocation): Likewise. * src/shred.c (O_NOCTTY): Remove redundant decl. * src/dd.c (flags, usage): Remove noctty flag. (main): Always use O_NOCTTY when opening files. Index: NEWS =================================================================== RCS file: /home/meyering/coreutils/cu/NEWS,v retrieving revision 1.199 diff -p -u -r1.199 NEWS --- NEWS 8 Apr 2004 10:25:27 -0000 1.199 +++ NEWS 8 Apr 2004 20:04:42 -0000 @@ -24,7 +24,6 @@ GNU coreutils NEWS sync likewise, but also for metadata nonblock use non-blocking I/O nofollow do not follow symlinks - noctty do not assign controlling terminal from file stty now provides support (iutf8) for setting UTF-8 input mode. Index: doc/coreutils.texi =================================================================== RCS file: /home/meyering/coreutils/cu/doc/coreutils.texi,v retrieving revision 1.176 diff -p -u -r1.176 coreutils.texi --- doc/coreutils.texi 8 Apr 2004 15:35:40 -0000 1.176 +++ doc/coreutils.texi 8 Apr 2004 20:05:38 -0000 @@ -6666,18 +6666,12 @@ Use non-blocking I/O. @cindex symbolic links, following Do not follow symbolic links. -@item noctty -@opindex noctty -@cindex controlling terminal -Do not assign the file to be a controlling terminal for @command{dd}. -This has no effect when the file is not a terminal. - @end table These flags are not supported on all systems, and @samp{dd} rejects attempts to use them when they are not supported. When reading from -standard input or writing to standard output, the @samp{nofollow} and -@samp{noctty} flags should not be specified, and the other flags +standard input or writing to standard output, the @samp{nofollow} flag +should not be specified, and the other flags (e.g., @samp{nonblock}) can affect how other processes behave with the affected file descriptors, even after @command{dd} exits. Index: src/shred.c =================================================================== RCS file: /home/meyering/coreutils/cu/src/shred.c,v retrieving revision 1.84 diff -p -u -r1.84 shred.c --- src/shred.c 21 Jan 2004 23:39:34 -0000 1.84 +++ src/shred.c 8 Apr 2004 20:04:45 -0000 @@ -110,10 +110,6 @@ #include "quotearg.h" /* For quotearg_colon */ #include "quote.h" /* For quotearg_colon */ -#ifndef O_NOCTTY -# define O_NOCTTY 0 /* This is a very optional frill */ -#endif - #define DEFAULT_PASSES 25 /* Default */ /* How many seconds to wait before checking whether to output another --- src/dd.c-bak Thu Apr 8 12:17:02 2004 +++ src/dd.c Thu Apr 8 13:18:30 2004 @@ -192,7 +192,6 @@ static struct symbol_value const flags[] {"append", O_APPEND}, {"direct", O_DIRECT}, {"dsync", O_DSYNC}, - {"noctty", O_NOCTTY}, {"nofollow", O_NOFOLLOW}, {"nonblock", O_NONBLOCK}, {"sync", O_SYNC}, @@ -384,9 +383,6 @@ Each FLAG symbol may be:\n\ fputs (_(" nonblock use non-blocking I/O\n"), stdout); if (O_NOFOLLOW) fputs (_(" nofollow do not follow symlinks\n"), stdout); - if (O_NOCTTY) - fputs (_(" noctty do not assign controlling terminal from file\n"), - stdout); fputs (_("\ \n\ Note that sending a SIGUSR1 signal to a running `dd' process makes it\n\ @@ -1300,7 +1296,8 @@ main (int argc, char **argv) } else { - if (open_fd (STDIN_FILENO, input_file, O_RDONLY | input_flags, 0) < 0) + int opts = input_flags | O_NOCTTY; + if (open_fd (STDIN_FILENO, input_file, O_RDONLY | opts, 0) < 0) error (EXIT_FAILURE, errno, _("opening %s"), quote (input_file)); } @@ -1313,7 +1310,7 @@ main (int argc, char **argv) { mode_t perms = S_IRUSR | S_IWUSR | S_IRGRP | S_IWGRP | S_IROTH | S_IWOTH; int opts - = (output_flags + = (output_flags | O_NOCTTY | (conversions_mask & C_NOCREAT ? 0 : O_CREAT) | (conversions_mask & C_EXCL ? O_EXCL : 0) | (seek_records || (conversions_mask & C_NOTRUNC) ? 0 : O_TRUNC)); ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: dd patch to remove noctty 2004-04-08 20:20 ` dd patch to remove noctty Paul Eggert @ 2004-04-08 21:40 ` Jim Meyering 0 siblings, 0 replies; 32+ messages in thread From: Jim Meyering @ 2004-04-08 21:40 UTC (permalink / raw) To: Paul Eggert Cc: bug-coreutils, Andrew Morton, Bruce Allen, linux-kernel, Andy Isaacson Paul Eggert <eggert@CS.UCLA.EDU> wrote: > Philippe Troin <phil@fifi.org> writes: > >> noctty definitely seems overkill... I can't see why dd would ever want >> to open a file without O_NOCTTY. > > Good point; it's just a confusion for the user. Here's a patch to > cause dd to always use O_NOCTTY. If someone ever needs it the other > way (not likely) I suppose we can add a "ctty" flag. > > 2004-04-08 Paul Eggert <eggert@cs.ucla.edu> > > * NEWS: Remove noctty flag from dd. Suggested by Philippe Troin. > * doc/coreutils.texi (dd invocation): Likewise. > * src/shred.c (O_NOCTTY): Remove redundant decl. > * src/dd.c (flags, usage): Remove noctty flag. > (main): Always use O_NOCTTY when opening files. Applied. Thanks! ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: dd PATCH: add conv=direct 2004-04-07 19:47 ` Andy Isaacson 2004-04-07 20:03 ` Andrew Morton @ 2004-04-09 0:37 ` Anton Blanchard 2004-04-09 1:42 ` Wim Coekaerts 1 sibling, 1 reply; 32+ messages in thread From: Anton Blanchard @ 2004-04-09 0:37 UTC (permalink / raw) To: Andy Isaacson; +Cc: Andrew Morton, bug-coreutils, linux-kernel Hi, > OK, I can see that one. But it seems like a pretty small benefit to me > -- CPU utilization is already really low. Maybe not to you but it does make a big difference on our 500 disk setup. At the moment we use dd to do an initial sniff, then ext3 utils to do O_DIRECT reads/writes. With O_DIRECT read/write in dd we could use it instead. (We are basically interested in IO performance that a database would see). > Um, that sounds like a bad idea to me. It seems to me it's the kernel's > responsibility to figure out "hey, looks like a streaming read - let's > not blow out the buffer cache trying to hold 20GB on a 512M system." If > you're saying that the kernel guys have given up and the established > wisdom is now "you gotta use O_DIRECT if you don't want to throw > everything else out due to streaming data", well... I'm disappointed. When you start hitting memory bandwidth limits, O_DIRECT will help you. Sure it wont be an issue for your dd copy scenario, but I wanted to point out there are other valid uses for it. Anton ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: dd PATCH: add conv=direct 2004-04-09 0:37 ` dd PATCH: add conv=direct Anton Blanchard @ 2004-04-09 1:42 ` Wim Coekaerts 2004-04-10 21:28 ` Jim Meyering 0 siblings, 1 reply; 32+ messages in thread From: Wim Coekaerts @ 2004-04-09 1:42 UTC (permalink / raw) To: Anton Blanchard; +Cc: Andy Isaacson, Andrew Morton, bug-coreutils, linux-kernel philip copeland did a whole set of patches for coreutils to allow directio for both read write and mixed sizes even the rpm is at http://oss.oracle.com/projects/ocfs/files/source/RHAT/RHAS3/coreutils-4.5.3-33.src.rpm, I think he took it up with the maintainers but so far had no luck it makes sense to have this stuff, we use it a lot. Wim On Fri, Apr 09, 2004 at 10:37:37AM +1000, Anton Blanchard wrote: > > Hi, > > > OK, I can see that one. But it seems like a pretty small benefit to me > > -- CPU utilization is already really low. > > Maybe not to you but it does make a big difference on our 500 disk setup. > At the moment we use dd to do an initial sniff, then ext3 utils to do > O_DIRECT reads/writes. With O_DIRECT read/write in dd we could use it > instead. (We are basically interested in IO performance that a database > would see). > > > Um, that sounds like a bad idea to me. It seems to me it's the kernel's > > responsibility to figure out "hey, looks like a streaming read - let's > > not blow out the buffer cache trying to hold 20GB on a 512M system." If > > you're saying that the kernel guys have given up and the established > > wisdom is now "you gotta use O_DIRECT if you don't want to throw > > everything else out due to streaming data", well... I'm disappointed. > > When you start hitting memory bandwidth limits, O_DIRECT will help you. > Sure it wont be an issue for your dd copy scenario, but I wanted to point > out there are other valid uses for it. > > Anton > - > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: dd PATCH: add conv=direct 2004-04-09 1:42 ` Wim Coekaerts @ 2004-04-10 21:28 ` Jim Meyering 0 siblings, 0 replies; 32+ messages in thread From: Jim Meyering @ 2004-04-10 21:28 UTC (permalink / raw) To: Wim Coekaerts Cc: Anton Blanchard, Andrew Morton, bug-coreutils, linux-kernel, Andy Isaacson Wim Coekaerts <wim.coekaerts@oracle.com> wrote: > philip copeland did a whole set of patches for coreutils to allow > directio for both read write and mixed sizes even > the rpm is at > http://oss.oracle.com/projects/ocfs/files/source/RHAT/RHAS3/coreutils-4.5.3-33.src.rpm, Thanks for the pointer. FYI, that URL didn't work for me. This one did: http://oss.oracle.com/projects/ocfs/dist/files/source/RHAT/RHAS3/coreutils-4.5.3-33.src.rpm Whoever maintains that code should consider merging their changes with something newer. There are over 500 lines of NEWS entries alone for the coreutils releases that have been made since 4.5.3. Besides, those patches have portability and robustness problems, and they aren't even based on coreutils-4.5.3. Maybe they're based on some vendor's version of coreutils? > I think he took it up with the maintainers but so far had no luck As far as I know, no one ever submitted such O_DIRECT-based patches to me or any of the bug-*@gnu.org mailing lists. The only pending O_DIRECT-based patch is one for shred. Clean, complete, and well-justified patches usually go in pretty quickly. ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: dd PATCH: add conv=direct 2004-04-07 18:18 ` Andrew Morton 2004-04-07 19:24 ` Andy Isaacson @ 2004-04-07 20:46 ` Paul Eggert 2004-04-07 21:06 ` Andrew Morton 2004-04-07 21:09 ` Andy Isaacson 1 sibling, 2 replies; 32+ messages in thread From: Paul Eggert @ 2004-04-07 20:46 UTC (permalink / raw) To: Andrew Morton; +Cc: Andy Isaacson, bug-coreutils, linux-kernel Andrew Morton <akpm@osdl.org> writes: > In 2.6 we do the check at open() and fcntl() time. In 2.4 we don't > fail until the actual I/O attempt. This raises the issue of what "dd conv=direct" should do in 2.4 kernels. I propose that it should report an error and exit, when the write fails, since conv=direct can't be implemented. The basic idea is that on systems that lack direct I/O, conv=direct should fail. Another issue with this patch: in Solaris, direct I/O is done by invoking directio(DIRECTIO_ON); see <http://docs.sun.com/db/doc/816-0213/6m6ne37so?q=directio&a=view>. Is Solaris direct I/O a direct analog to Linux direct I/O, or are there subtle differences in semantics that should be made visible to the users of GNU "dd"? ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: dd PATCH: add conv=direct 2004-04-07 20:46 ` Paul Eggert @ 2004-04-07 21:06 ` Andrew Morton 2004-04-07 21:09 ` Andy Isaacson 1 sibling, 0 replies; 32+ messages in thread From: Andrew Morton @ 2004-04-07 21:06 UTC (permalink / raw) To: Paul Eggert; +Cc: adi, bug-coreutils, linux-kernel Paul Eggert <eggert@CS.UCLA.EDU> wrote: > > Andrew Morton <akpm@osdl.org> writes: > > > In 2.6 we do the check at open() and fcntl() time. In 2.4 we don't > > fail until the actual I/O attempt. > > This raises the issue of what "dd conv=direct" should do in 2.4 > kernels. I propose that it should report an error and exit, when the > write fails, And when the read fails. > since conv=direct can't be implemented. The basic idea > is that on systems that lack direct I/O, conv=direct should fail. I think that's best. It's a bit user-unfriendly, but a silent fallback is misleading. It might be acceptable to print a warning, then fall back. > Another issue with this patch: in Solaris, direct I/O is done by > invoking directio(DIRECTIO_ON); see > <http://docs.sun.com/db/doc/816-0213/6m6ne37so?q=directio&a=view>. > Is Solaris direct I/O a direct analog to Linux direct I/O, or are > there subtle differences in semantics that should be made visible to > the users of GNU "dd"? solaris directio(DIRECTIO_ON) is a hint only. If the system cannot perform the uncached zerocopy then it will fall back to buffered IO. Solaris will perform direct-IO "when the application's buffer is aligned on a two-byte (short) boundary, the offset into the file is on a device sector boundary, and the size of the operation is a multiple of device sectors." On Linux you set direct-io with open(O_DIRECT) or fcntl(F_SETFL, O_DIRECT). If the filesystem doesn't support direct-io we will fail the open/fcntl attempt up-front in 2.6 only. If O_DIRECT was successfully set and the IO is not correctly aligned Linux will fail the relevant I/O attempt. Alignment requirements are: 2.4: page aligned on-disk and in-memory 2.6: device sector aligned on-disk and in-memory. I'd recomment that dd use getpagesize() alignment on-disk and in-memory. ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: dd PATCH: add conv=direct 2004-04-07 20:46 ` Paul Eggert 2004-04-07 21:06 ` Andrew Morton @ 2004-04-07 21:09 ` Andy Isaacson 1 sibling, 0 replies; 32+ messages in thread From: Andy Isaacson @ 2004-04-07 21:09 UTC (permalink / raw) To: Paul Eggert; +Cc: Andrew Morton, bug-coreutils, linux-kernel On Wed, Apr 07, 2004 at 01:46:31PM -0700, Paul Eggert wrote: > Andrew Morton <akpm@osdl.org> writes: > > In 2.6 we do the check at open() and fcntl() time. In 2.4 we don't > > fail until the actual I/O attempt. > > This raises the issue of what "dd conv=direct" should do in 2.4 > kernels. I propose that it should report an error and exit, when the > write fails, since conv=direct can't be implemented. The basic idea > is that on systems that lack direct I/O, conv=direct should fail. I agree, this is the appropriate behavior. When "dd conv=direct" is used on a fd that cannot support direct I/O, and the kernel doesn't tell us until it's too late, erroring on the I/O is the only sane action. Fortuitously, that's what happens with the patches I posted. (Note that it's a function of the kernel and the underlying FS whether direct I/O is going to work. So some kind of heuristic on kernel version is a bad idea.) > Another issue with this patch: in Solaris, direct I/O is done by > invoking directio(DIRECTIO_ON); see > <http://docs.sun.com/db/doc/816-0213/6m6ne37so?q=directio&a=view>. > Is Solaris direct I/O a direct analog to Linux direct I/O, or are > there subtle differences in semantics that should be made visible to > the users of GNU "dd"? The Solaris semantics are more sane than the Linux ones, to be sure -- it always does the I/O, falling back to buffered I/O if conditions are not right for direct. I don't believe I have a test system for the Solaris case (when was directio(3C) added?) so someone else will have to add that support. -andy ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: dd PATCH: add conv=direct 2004-04-07 17:31 ` Andy Isaacson 2004-04-07 18:18 ` Andrew Morton @ 2004-04-07 19:12 ` Miquel van Smoorenburg 2004-04-07 20:14 ` Andy Isaacson 1 sibling, 1 reply; 32+ messages in thread From: Miquel van Smoorenburg @ 2004-04-07 19:12 UTC (permalink / raw) To: linux-kernel In article <20040407173116.GB2814@hexapodia.org>, Andy Isaacson <adi@hexapodia.org> wrote: >The next feature to add would be OpenBSD-style "KB/s" reporting. I'm >not going there. > >diff -ur coreutils-5.0.91/doc/coreutils.texi >coreutils-5.0.91-adi/doc/coreutils.texi Doesn't it already do that ? $ dd if=/dev/zero of=/tmp/file bs=4K count=100 100+0 records in 100+0 records out 409600 bytes transferred in 0.005108 seconds (80189583 bytes/sec) $ dpkg -S /bin/dd coreutils: /bin/dd $ dpkg -s coreutils Package: coreutils Version: 5.0.91-2 Mike. ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: dd PATCH: add conv=direct 2004-04-07 19:12 ` Miquel van Smoorenburg @ 2004-04-07 20:14 ` Andy Isaacson 0 siblings, 0 replies; 32+ messages in thread From: Andy Isaacson @ 2004-04-07 20:14 UTC (permalink / raw) To: Miquel van Smoorenburg; +Cc: linux-kernel On Wed, Apr 07, 2004 at 07:12:21PM +0000, Miquel van Smoorenburg wrote: > In article <20040407173116.GB2814@hexapodia.org>, > Andy Isaacson <adi@hexapodia.org> wrote: > >The next feature to add would be OpenBSD-style "KB/s" reporting. I'm > >not going there. > > > >diff -ur coreutils-5.0.91/doc/coreutils.texi > >coreutils-5.0.91-adi/doc/coreutils.texi > > Doesn't it already do that ? > > $ dd if=/dev/zero of=/tmp/file bs=4K count=100 > 100+0 records in > 100+0 records out > 409600 bytes transferred in 0.005108 seconds (80189583 bytes/sec) hmmm... debian/patches/05_dd-performance-counter.patch Not upstream. -andy ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: dd PATCH: add conv=direct 2004-04-06 22:03 dd PATCH: add conv=direct Andy Isaacson 2004-04-07 0:33 ` Andrew Morton @ 2004-04-07 22:02 ` Nathan Straz 2004-04-07 22:09 ` Andy Isaacson 1 sibling, 1 reply; 32+ messages in thread From: Nathan Straz @ 2004-04-07 22:02 UTC (permalink / raw) To: Andy Isaacson; +Cc: bug-coreutils, linux-kernel On Tue, Apr 06, 2004 at 05:03:58PM -0500, Andy Isaacson wrote: > Linux-kernel: is this patch horribly wrong? ... > to force O_DIRECT. The enclosed patch adds a "conv=direct" flag to > enable this usage. Adding the functionality to conv= doesn't seem right to me. conv= is for converting the data in some way. This is just changing the way data is written. Right? Have you looked at lmdd? It supports O_DIRECT and many other things not in the standard dd. -- Nate Straz nstraz@sgi.com sgi, inc http://www.sgi.com/ Linux Test Project http://ltp.sf.net/ ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: dd PATCH: add conv=direct 2004-04-07 22:02 ` Nathan Straz @ 2004-04-07 22:09 ` Andy Isaacson 2004-04-08 11:44 ` Miquel van Smoorenburg 0 siblings, 1 reply; 32+ messages in thread From: Andy Isaacson @ 2004-04-07 22:09 UTC (permalink / raw) To: bug-coreutils, linux-kernel On Wed, Apr 07, 2004 at 05:02:24PM -0500, Nathan Straz wrote: > On Tue, Apr 06, 2004 at 05:03:58PM -0500, Andy Isaacson wrote: > > Linux-kernel: is this patch horribly wrong? > ... > > to force O_DIRECT. The enclosed patch adds a "conv=direct" flag to > > enable this usage. > > Adding the functionality to conv= doesn't seem right to me. conv= is > for converting the data in some way. This is just changing the way data > is written. Right? There's already conv=notrunc which means "open without O_TRUNC". I agree that it's a nonintuitive interface, but then, the entire dd(1) command line is. > Have you looked at lmdd? It supports O_DIRECT and many other things not > in the standard dd. Wasn't aware of that one. Thanks for the pointer. -andy ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: dd PATCH: add conv=direct 2004-04-07 22:09 ` Andy Isaacson @ 2004-04-08 11:44 ` Miquel van Smoorenburg 0 siblings, 0 replies; 32+ messages in thread From: Miquel van Smoorenburg @ 2004-04-08 11:44 UTC (permalink / raw) To: linux-kernel In article <20040407220928.GI2814@hexapodia.org>, Andy Isaacson <adi@hexapodia.org> wrote: >On Wed, Apr 07, 2004 at 05:02:24PM -0500, Nathan Straz wrote: >> On Tue, Apr 06, 2004 at 05:03:58PM -0500, Andy Isaacson wrote: >> > Linux-kernel: is this patch horribly wrong? >> ... >> > to force O_DIRECT. The enclosed patch adds a "conv=direct" flag to >> > enable this usage. >> >> Adding the functionality to conv= doesn't seem right to me. conv= is >> for converting the data in some way. This is just changing the way data >> is written. Right? > >There's already conv=notrunc which means "open without O_TRUNC". I >agree that it's a nonintuitive interface, but then, the entire dd(1) >command line is. Well as you already added iflags and oflags, it makes more sense to use those. oflags=fsync makes more sense to me than conv=fsync. Likewise oflags=notrunc. Mike. ^ permalink raw reply [flat|nested] 32+ messages in thread
end of thread, other threads:[~2004-04-10 21:28 UTC | newest] Thread overview: 32+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2004-04-06 22:03 dd PATCH: add conv=direct Andy Isaacson 2004-04-07 0:33 ` Andrew Morton 2004-04-07 16:21 ` Bruce Allen 2004-04-07 16:42 ` Andrew Morton 2004-04-07 17:31 ` Andy Isaacson 2004-04-07 18:18 ` Andrew Morton 2004-04-07 19:24 ` Andy Isaacson 2004-04-07 19:34 ` Andrew Morton 2004-04-07 19:47 ` Andy Isaacson 2004-04-07 20:03 ` Andrew Morton 2004-04-07 20:43 ` Andy Isaacson 2004-04-07 21:00 ` Valdis.Kletnieks 2004-04-07 21:35 ` Bruce Allen 2004-04-08 6:56 ` Paul Eggert 2004-04-08 11:07 ` Jim Meyering 2004-04-08 19:32 ` Paul Eggert 2004-04-08 19:51 ` Paul Jarc 2004-04-08 21:34 ` Jim Meyering 2004-04-08 16:23 ` Philippe Troin 2004-04-08 20:20 ` dd patch to remove noctty Paul Eggert 2004-04-08 21:40 ` Jim Meyering 2004-04-09 0:37 ` dd PATCH: add conv=direct Anton Blanchard 2004-04-09 1:42 ` Wim Coekaerts 2004-04-10 21:28 ` Jim Meyering 2004-04-07 20:46 ` Paul Eggert 2004-04-07 21:06 ` Andrew Morton 2004-04-07 21:09 ` Andy Isaacson 2004-04-07 19:12 ` Miquel van Smoorenburg 2004-04-07 20:14 ` Andy Isaacson 2004-04-07 22:02 ` Nathan Straz 2004-04-07 22:09 ` Andy Isaacson 2004-04-08 11:44 ` Miquel van Smoorenburg
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).