* hardlink util -- files de-duplication From: Karel Zak @ 2018-06-01 11:38 UTC To: util-linux For the last 17 years the hardlink(1) util has been available in Red Hat based distros. From man hardlink: hardlink traverses one or more directories searching for duplicate files. When it finds duplicate files, it uses one of them as the master. It then removes all other duplicates and places a hardlink for each one pointing to the master file. This allows for conservation of disk space where multiple directories on a single filesystem contain many duplicate files. ... The util is a little bit orphaned. What about adding it to util-linux to make it available to other distros and keep it maintained in a serious way? ;-) https://src.fedoraproject.org/cgit/rpms/hardlink.git/ It's one .c file. Comments & objections? Karel -- Karel Zak <kzak@redhat.com> http://karelzak.blogspot.com
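As a rough illustration of the behaviour described in the man page excerpt above, here is a minimal Python sketch. It is not the code of hardlink.c: the function names (`file_digest`, `find_duplicates`, `link_duplicates`), the use of SHA-256 to compare contents, the rule that the first path in sorted order becomes the master, and the `.hardlink-tmp` suffix are all assumptions made for this example. It also ignores ownership, permissions and timestamps, which a real tool would probably have to take into account.

```python
import hashlib
import os
import stat
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(roots):
    """Return groups of (path, inode) entries whose contents are identical.

    Files are first bucketed by (device, size), so only files on the same
    filesystem with the same length are ever hashed.
    """
    by_key = defaultdict(list)
    for root in roots:
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                path = os.path.join(dirpath, name)
                st = os.lstat(path)
                # Skip symlinks, devices, etc., and empty files.
                if not stat.S_ISREG(st.st_mode) or st.st_size == 0:
                    continue
                by_key[(st.st_dev, st.st_size)].append((path, st.st_ino))

    groups = []
    for candidates in by_key.values():
        if len(candidates) < 2:
            continue
        by_digest = defaultdict(list)
        for path, ino in sorted(candidates):
            by_digest[file_digest(path)].append((path, ino))
        groups.extend(g for g in by_digest.values() if len(g) > 1)
    return groups


def link_duplicates(roots, dry_run=True):
    """Replace duplicates with hard links to a master file.

    Returns a list of (master, duplicate) pairs. With dry_run=True the
    pairs are only reported and nothing on disk is changed.
    """
    replaced = []
    for entries in find_duplicates(roots):
        master_path, master_ino = entries[0]  # assumption: first path wins
        for path, ino in entries[1:]:
            if ino == master_ino:
                continue  # already a hard link to the master
            if not dry_run:
                # Link under a temporary name, then rename over the
                # duplicate, so the path never disappears in between.
                tmp = path + ".hardlink-tmp"
                os.link(master_path, tmp)
                os.replace(tmp, path)
            replaced.append((master_path, path))
    return replaced
```

Calling `link_duplicates(["/some/dir"])` with the default `dry_run=True` only lists what would be linked; passing `dry_run=False` performs the replacement. Bucketing by device matters because hard links cannot cross filesystems, which matches the "single filesystem" wording of the man page.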
* Re: hardlink util -- files de-duplication From: Ruediger Meier @ 2018-06-01 13:08 UTC To: Karel Zak; +Cc: util-linux, Kevin Fenzi On Friday 01 June 2018, Karel Zak wrote: > For last 17 years in Red Hat based distros is available hardlink(1) > util, man hardlink: > > hardlink traverses one or more directories searching for duplicate > files. When it finds duplicate files, it uses one of them as the > master. It then removes all other duplicates and places a > hardlink for each one pointing to the master file. This allows for > conservation of disk space where multiple directories on a single > filesystem contain many duplicate files. > > ... > > the util is little bit orphaned, what about to add this util to > util-linux to make it available for another distros and keep it > maintained in serious way? ;-) +1 > > https://src.fedoraproject.org/cgit/rpms/hardlink.git/ The original and almost identical repo is this: https://pagure.io/hardlink.git I've CC'ed the project admin Kevin Fenzi. > It's one .c file. > > Comments & objections? > > Karel
* Re: hardlink util -- files de-duplication From: Kevin Fenzi @ 2018-06-01 20:20 UTC To: Ruediger Meier, Karel Zak; +Cc: util-linux, tsao On 06/01/2018 06:08 AM, Ruediger Meier wrote: > On Friday 01 June 2018, Karel Zak wrote: >> For last 17 years in Red Hat based distros is available hardlink(1) >> util, man hardlink: >> >> hardlink traverses one or more directories searching for duplicate >> files. When it finds duplicate files, it uses one of them as the >> master. It then removes all other duplicates and places a >> hardlink for each one pointing to the master file. This allows for >> conservation of disk space where multiple directories on a single >> filesystem contain many duplicate files. >> >> ... >> >> the util is little bit orphaned, what about to add this util to >> util-linux to make it available for another distros and keep it >> maintained in serious way? ;-) > > +1 > >> >> https://src.fedoraproject.org/cgit/rpms/hardlink.git/ > > The original and almost identical repo is this: > https://pagure.io/hardlink.git > > I've CC'ed the project admin Kevin Fenzi. I've also added my co-maintainer (tsao@fedoraproject.org). He's done most of the recent work on it; I haven't had time to do much with it at all. I'd personally be in favor of it moving into util-linux. Hopefully it would get more time and attention there and more widespread use. If tsao agrees, let's make it happen. kevin
* Re: hardlink util -- files de-duplication From: Francisco J. Tsao Santin @ 2018-06-02 0:00 UTC To: Kevin Fenzi; +Cc: Ruediger Meier, Karel Zak, util-linux, tsao On Fri, 1 Jun 2018, Kevin Fenzi wrote: > On 06/01/2018 06:08 AM, Ruediger Meier wrote: > > On Friday 01 June 2018, Karel Zak wrote: > >> the util is little bit orphaned, what about to add this util to > >> util-linux to make it available for another distros and keep it > >> maintained in serious way? ;-) > > > > I've also added my co-maintainer (tsao@fedoraproject.org). > He's done most of the recent work on it, I haven't had time to do much > with it at all. Yep, I would like to have more time to improve the tool a bit, but... I have only added some bug-fix patches. > > I'd personally be in favor of it moving into util-linux. Hopefully it > would get more time and attention there and more widespread use. > > If tsao agrees, lets make it happen. Good for me too :-) I only want to point out one small issue: as you probably know, there is another hardlink tool with similar functions, (re)written in Python by the Debian people[1][2]. In fact, the file name is the same (but they are installed in different paths, /usr/bin vs /usr/sbin). I hope it doesn't cause a distro-war ;-) [1] https://packages.debian.org/sid/utils/hardlink [2] https://jak-linux.org/projects/hardlink/ -- Francisco Javier Tsao Santín http://gattaca.es 1024D/71CF4D62 42 F1 53 35 EF 98 98 8A FC 6C 56 B3 4C A7 7D FB
* Re: hardlink util -- files de-duplication From: Aurélien Aptel @ 2018-06-01 13:25 UTC To: Karel Zak, util-linux Karel Zak <kzak@redhat.com> writes: > Comments & objections? Not objecting but I feel like I should mention there are multiple well-established alternatives: https://github.com/tobiasschulz/fdupes http://freedup.org/ https://rdfind.pauldreik.se/ https://github.com/markfasheh/duperemove -- Aurélien Aptel / SUSE Labs Samba Team GPG: 1839 CB5F 9F5B FB9B AA97 8C99 03C8 A49B 521B D5D3 SUSE Linux GmbH, Maxfeldstraße 5, 90409 Nürnberg, Germany GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
* Re: hardlink util -- files de-duplication From: Samuel Thibault @ 2018-06-01 13:45 UTC To: Aurélien Aptel; +Cc: Karel Zak, util-linux Aurélien Aptel wrote on Fri, 01 Jun 2018 15:25:48 +0200: > Karel Zak <kzak@redhat.com> writes: > > Comments & objections? > > Not objecting but I feel like I should mention there are multiple > well-established alternatives: > > https://github.com/tobiasschulz/fdupes > http://freedup.org/ > https://rdfind.pauldreik.se/ > https://github.com/markfasheh/duperemove Yes, in Debian it was mentioned that we'd need a de-duplication tool for de-duplication tools :) Samuel
* Re: hardlink util -- files de-duplication From: Carlos Santos @ 2018-06-06 12:02 UTC To: Aurélien Aptel; +Cc: Karel Zak, util-linux > From: "Aurélien Aptel" <aaptel@suse.com> > To: "Karel Zak" <kzak@redhat.com>, "util-linux" <util-linux@vger.kernel.org> > Sent: Friday, June 1, 2018 10:25:48 AM > Subject: Re: hardlink util -- files de-duplication > Karel Zak <kzak@redhat.com> writes: >> Comments & objections? > > Not objecting but I feel like I should mention there are multiple > well-established alternatives: > > https://github.com/tobiasschulz/fdupes > http://freedup.org/ > https://rdfind.pauldreik.se/ > https://github.com/markfasheh/duperemove Compared to hardlink, fdupes has a richer feature set (e.g. dedupsoft links). FreeDup has its own life and does not seem to be adoptable. Duperemove is complex software which requires glib2 and sqlite3. -- Carlos Santos (Casantos) - DATACOM, P&D “Marched towards the enemy, spear upright, armed with the certainty that only the ignorant can have.” -- Epitaph of a volunteer
* Re: hardlink util -- files de-duplication From: Dmitry V. Levin @ 2018-06-01 13:29 UTC To: Karel Zak; +Cc: Alexey Gladkov, util-linux On Fri, Jun 01, 2018 at 01:38:07PM +0200, Karel Zak wrote: > For last 17 years in Red Hat based distros is available hardlink(1) > util, man hardlink: > > hardlink traverses one or more directories searching for duplicate > files. When it finds duplicate files, it uses one of them as the > master. It then removes all other duplicates and places a hardlink > for each one pointing to the master file. This allows for > conservation of disk space where multiple directories on a single > filesystem contain many duplicate files. > > ... > > the util is little bit orphaned, what about to add this util to > util-linux to make it available for another distros and keep it > maintained in serious way? ;-) > > https://src.fedoraproject.org/cgit/rpms/hardlink.git/ > > It's one .c file. > > Comments & objections? Better late than never. BTW, our hardlink package has some Owl patches applied, please remind us to rebase and submit them. ;) -- ldv
* Re: hardlink util -- files de-duplication From: Karel Zak @ 2018-06-12 10:55 UTC To: Alexey Gladkov, util-linux On Fri, Jun 01, 2018 at 04:29:30PM +0300, Dmitry V. Levin wrote: > On Fri, Jun 01, 2018 at 01:38:07PM +0200, Karel Zak wrote: > > For last 17 years in Red Hat based distros is available hardlink(1) > > util, man hardlink: > > > > hardlink traverses one or more directories searching for duplicate > > files. When it finds duplicate files, it uses one of them as the > > master. It then removes all other duplicates and places a hardlink > > for each one pointing to the master file. This allows for > > conservation of disk space where multiple directories on a single > > filesystem contain many duplicate files. > > > > ... > > > > the util is little bit orphaned, what about to add this util to > > util-linux to make it available for another distros and keep it > > maintained in serious way? ;-) > > > > https://src.fedoraproject.org/cgit/rpms/hardlink.git/ > > > > It's one .c file. > > > > Comments & objections? > > Better late than never. > > BTW, our hardlink package has some Owl patches applied, > please remind us to rebase and submit them. ;) It seems there is no strong objection against hardlink. So, I think we can add it to util-linux as *optional* (--enable-hardlink). IMHO it's a good idea to have such a tool in the basic Linux toolset. The long-term goal should be to add new features to make it more attractive to users who currently have to use the alternatives :-) I won't have enough time in the next two weeks to work on this task (fix indentation, reuse some lib/ stuff, etc.), so any volunteer(s)? ;-) Karel -- Karel Zak <kzak@redhat.com> http://karelzak.blogspot.com
* Re: hardlink util -- files de-duplication From: Ruediger Meier @ 2018-06-12 11:22 UTC To: Karel Zak; +Cc: Alexey Gladkov, util-linux On Tuesday 12 June 2018, Karel Zak wrote: > I won't have enough time in next two weeks to work on this task (fix > indention, reuse some lib/ stuff, etc.), so any volunteer(s)? ;-) I would do this, maybe next week or so. Should it go to sys-utils or misc-utils? cu, Rudi
* Re: hardlink util -- files de-duplication From: Karel Zak @ 2018-06-12 12:12 UTC To: Ruediger Meier; +Cc: Alexey Gladkov, util-linux On Tue, Jun 12, 2018 at 01:22:45PM +0200, Ruediger Meier wrote: > On Tuesday 12 June 2018, Karel Zak wrote: > > I won't have enough time in next two weeks to work on this task (fix > > indention, reuse some lib/ stuff, etc.), so any volunteer(s)? ;-) > > I would do this, maybe next week or so. Should it go to sys-utils or > misc-utils? Thanks! I think misc-utils is better in this case. The sys-utils directory should be used for kernel API wrappers (ioctl, syscalls, sysfs, etc.) -- hmm... why do we have kill.c in misc-utils? :-) (But it's a "color of the bikeshed" topic, so better not to start this discussion :-) Note that the best way is to use "indent --linux" for the first patch and to make any other changes to the code in additional patches, so that we will be able to keep track of our local changes. Karel -- Karel Zak <kzak@redhat.com> http://karelzak.blogspot.com