* hardlink util -- files de-duplication
@ 2018-06-01 11:38 Karel Zak
2018-06-01 13:08 ` Ruediger Meier
` (2 more replies)
0 siblings, 3 replies; 11+ messages in thread
From: Karel Zak @ 2018-06-01 11:38 UTC (permalink / raw)
To: util-linux
For the last 17 years, Red Hat based distros have shipped the
hardlink(1) util. From man hardlink:
hardlink traverses one or more directories searching for duplicate
files. When it finds duplicate files, it uses one of them as the
master. It then removes all other duplicates and places a hardlink
for each one pointing to the master file. This allows for
conservation of disk space where multiple directories on a single
filesystem contain many duplicate files.
...
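The behavior described in the man page excerpt can be sketched roughly as follows. This is a minimal Python illustration under stated assumptions, not the actual hardlink.c: the function name `dedupe` and the temporary-name suffix are hypothetical, and the sketch compares file contents only, whereas a real tool may also take ownership, permissions and timestamps into account.

```python
import filecmp
import os
import stat
from collections import defaultdict


def dedupe(root):
    """Replace duplicate regular files under root with hard links.

    Hypothetical helper for illustration only. Returns the number of
    files that were replaced by a link to a master copy.
    """
    # Group candidates by (device, size): hard links cannot cross
    # filesystems, and files of different size cannot be identical.
    groups = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)  # lstat: do not follow symlinks
            if stat.S_ISREG(st.st_mode):
                groups[(st.st_dev, st.st_size)].append(path)

    linked = 0
    for paths in groups.values():
        masters = []  # one representative per distinct content
        for path in sorted(paths):
            for master in masters:
                if os.path.samefile(master, path):
                    break  # already the same inode, nothing to do
                if filecmp.cmp(master, path, shallow=False):
                    # Link under a temporary name (assumed unused),
                    # then rename over the duplicate so the path
                    # never disappears.
                    tmp = path + ".hardlink-tmp"
                    os.link(master, tmp)
                    os.replace(tmp, path)
                    linked += 1
                    break
            else:
                # No existing master matched: this file becomes one.
                masters.append(path)
    return linked
```

Calling `dedupe("/some/tree")` on a directory tree returns how many files were replaced. Grouping by size first keeps the number of byte-by-byte comparisons small, which is the main cost on large trees.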
The util is a little bit orphaned. What about adding it to
util-linux to make it available to other distros and keep it
seriously maintained? ;-)
https://src.fedoraproject.org/cgit/rpms/hardlink.git/
It's one .c file.
Comments & objections?
Karel
--
Karel Zak <kzak@redhat.com>
http://karelzak.blogspot.com
* Re: hardlink util -- files de-duplication
2018-06-01 11:38 hardlink util -- files de-duplication Karel Zak
@ 2018-06-01 13:08 ` Ruediger Meier
2018-06-01 20:20 ` Kevin Fenzi
2018-06-01 13:25 ` Aurélien Aptel
2018-06-01 13:29 ` Dmitry V. Levin
2 siblings, 1 reply; 11+ messages in thread
From: Ruediger Meier @ 2018-06-01 13:08 UTC (permalink / raw)
To: Karel Zak; +Cc: util-linux, Kevin Fenzi
On Friday 01 June 2018, Karel Zak wrote:
> For the last 17 years, Red Hat based distros have shipped the
> hardlink(1) util. From man hardlink:
>
> hardlink traverses one or more directories searching for duplicate
> files. When it finds duplicate files, it uses one of them as the
> master. It then removes all other duplicates and places a
> hardlink for each one pointing to the master file. This allows for
> conservation of disk space where multiple directories on a single
> filesystem contain many duplicate files.
>
> ...
>
> The util is a little bit orphaned. What about adding it to
> util-linux to make it available to other distros and keep it
> seriously maintained? ;-)
+1
>
> https://src.fedoraproject.org/cgit/rpms/hardlink.git/
The original and almost identical repo is this:
https://pagure.io/hardlink.git
I've CC'ed the project admin Kevin Fenzi.
> It's one .c file.
>
> Comments & objections?
>
> Karel
* Re: hardlink util -- files de-duplication
2018-06-01 11:38 hardlink util -- files de-duplication Karel Zak
2018-06-01 13:08 ` Ruediger Meier
@ 2018-06-01 13:25 ` Aurélien Aptel
2018-06-01 13:45 ` Samuel Thibault
2018-06-06 12:02 ` Carlos Santos
2018-06-01 13:29 ` Dmitry V. Levin
2 siblings, 2 replies; 11+ messages in thread
From: Aurélien Aptel @ 2018-06-01 13:25 UTC (permalink / raw)
To: Karel Zak, util-linux
Karel Zak <kzak@redhat.com> writes:
> Comments & objections?
Not objecting but I feel like I should mention there are multiple
well-established alternatives:
https://github.com/tobiasschulz/fdupes
http://freedup.org/
https://rdfind.pauldreik.se/
https://github.com/markfasheh/duperemove
--
Aurélien Aptel / SUSE Labs Samba Team
GPG: 1839 CB5F 9F5B FB9B AA97 8C99 03C8 A49B 521B D5D3
SUSE Linux GmbH, Maxfeldstraße 5, 90409 Nürnberg, Germany
GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
* Re: hardlink util -- files de-duplication
2018-06-01 11:38 hardlink util -- files de-duplication Karel Zak
2018-06-01 13:08 ` Ruediger Meier
2018-06-01 13:25 ` Aurélien Aptel
@ 2018-06-01 13:29 ` Dmitry V. Levin
2018-06-12 10:55 ` Karel Zak
2 siblings, 1 reply; 11+ messages in thread
From: Dmitry V. Levin @ 2018-06-01 13:29 UTC (permalink / raw)
To: Karel Zak; +Cc: Alexey Gladkov, util-linux
On Fri, Jun 01, 2018 at 01:38:07PM +0200, Karel Zak wrote:
> For the last 17 years, Red Hat based distros have shipped the
> hardlink(1) util. From man hardlink:
>
> hardlink traverses one or more directories searching for duplicate
> files. When it finds duplicate files, it uses one of them as the
> master. It then removes all other duplicates and places a hardlink
> for each one pointing to the master file. This allows for
> conservation of disk space where multiple directories on a single
> filesystem contain many duplicate files.
>
> ...
>
> The util is a little bit orphaned. What about adding it to
> util-linux to make it available to other distros and keep it
> seriously maintained? ;-)
>
> https://src.fedoraproject.org/cgit/rpms/hardlink.git/
>
> It's one .c file.
>
> Comments & objections?
Better late than never.
BTW, our hardlink package has some Owl patches applied,
please remind us to rebase and submit them. ;)
--
ldv
* Re: hardlink util -- files de-duplication
2018-06-01 13:25 ` Aurélien Aptel
@ 2018-06-01 13:45 ` Samuel Thibault
2018-06-06 12:02 ` Carlos Santos
1 sibling, 0 replies; 11+ messages in thread
From: Samuel Thibault @ 2018-06-01 13:45 UTC (permalink / raw)
To: Aurélien Aptel; +Cc: Karel Zak, util-linux
Aurélien Aptel wrote on Fri, 01 Jun 2018 15:25:48 +0200:
> Karel Zak <kzak@redhat.com> writes:
> > Comments & objections?
>
> Not objecting but I feel like I should mention there are multiple
> well-established alternatives:
>
> https://github.com/tobiasschulz/fdupes
> http://freedup.org/
> https://rdfind.pauldreik.se/
> https://github.com/markfasheh/duperemove
Yes, in Debian it was mentioned that we'd need a de-duplication tool for
de-duplication tools :)
Samuel
* Re: hardlink util -- files de-duplication
2018-06-01 13:08 ` Ruediger Meier
@ 2018-06-01 20:20 ` Kevin Fenzi
2018-06-02 0:00 ` Francisco J. Tsao Santin
0 siblings, 1 reply; 11+ messages in thread
From: Kevin Fenzi @ 2018-06-01 20:20 UTC (permalink / raw)
To: Ruediger Meier, Karel Zak; +Cc: util-linux, tsao
On 06/01/2018 06:08 AM, Ruediger Meier wrote:
> On Friday 01 June 2018, Karel Zak wrote:
>> For the last 17 years, Red Hat based distros have shipped the
>> hardlink(1) util. From man hardlink:
>>
>> hardlink traverses one or more directories searching for duplicate
>> files. When it finds duplicate files, it uses one of them as the
>> master. It then removes all other duplicates and places a
>> hardlink for each one pointing to the master file. This allows for
>> conservation of disk space where multiple directories on a single
>> filesystem contain many duplicate files.
>>
>> ...
>>
>> The util is a little bit orphaned. What about adding it to
>> util-linux to make it available to other distros and keep it
>> seriously maintained? ;-)
>
> +1
>
>>
>> https://src.fedoraproject.org/cgit/rpms/hardlink.git/
>
> The original and almost identical repo is this:
> https://pagure.io/hardlink.git
>
> I've CC'ed the project admin Kevin Fenzi.
I've also added my co-maintainer (tsao@fedoraproject.org).
He's done most of the recent work on it; I haven't had time to do much
with it at all.
I'd personally be in favor of it moving into util-linux. Hopefully it
would get more time and attention there and more widespread use.
If tsao agrees, let's make it happen.
kevin
* Re: hardlink util -- files de-duplication
2018-06-01 20:20 ` Kevin Fenzi
@ 2018-06-02 0:00 ` Francisco J. Tsao Santin
0 siblings, 0 replies; 11+ messages in thread
From: Francisco J. Tsao Santin @ 2018-06-02 0:00 UTC (permalink / raw)
To: Kevin Fenzi; +Cc: Ruediger Meier, Karel Zak, util-linux, tsao
On Fri, 1 Jun 2018, Kevin Fenzi wrote:
> On 06/01/2018 06:08 AM, Ruediger Meier wrote:
> > On Friday 01 June 2018, Karel Zak wrote:
> >> The util is a little bit orphaned. What about adding it to
> >> util-linux to make it available to other distros and keep it
> >> seriously maintained? ;-)
> >
>
> I've also added my co-maintainer (tsao@fedoraproject.org).
> He's done most of the recent work on it, I haven't had time to do much
> with it at all.
Yep, I would like to have more time to improve the tool a bit but... I
have only added some bug-fix patches.
>
> I'd personally be in favor of it moving into util-linux. Hopefully it
> would get more time and attention there and more widespread use.
>
> If tsao agrees, let's make it happen.
Fine with me too :-) I only want to point out one small issue: as you
probably know, there is another hardlink tool with similar functions,
(re)written in Python by the Debian people[1][2]. In fact, the file name
is the same (but they are installed in different paths, /usr/bin vs
/usr/sbin). I hope it doesn't cause a distro war ;-)
[1] https://packages.debian.org/sid/utils/hardlink
[2] https://jak-linux.org/projects/hardlink/
--
Francisco Javier Tsao Santín
http://gattaca.es
1024D/71CF4D62 42 F1 53 35 EF 98 98 8A FC 6C 56 B3 4C A7 7D FB
* Re: hardlink util -- files de-duplication
2018-06-01 13:25 ` Aurélien Aptel
2018-06-01 13:45 ` Samuel Thibault
@ 2018-06-06 12:02 ` Carlos Santos
1 sibling, 0 replies; 11+ messages in thread
From: Carlos Santos @ 2018-06-06 12:02 UTC (permalink / raw)
To: Aurélien Aptel; +Cc: Karel Zak, util-linux
> From: "Aurélien Aptel" <aaptel@suse.com>
> To: "Karel Zak" <kzak@redhat.com>, "util-linux" <util-linux@vger.kernel.org>
> Sent: Friday, June 1, 2018 10:25:48 AM
> Subject: Re: hardlink util -- files de-duplication
> Karel Zak <kzak@redhat.com> writes:
>> Comments & objections?
>
> Not objecting but I feel like I should mention there are multiple
> well-established alternatives:
>
> https://github.com/tobiasschulz/fdupes
> http://freedup.org/
> https://rdfind.pauldreik.se/
> https://github.com/markfasheh/duperemove
Compared to hardlink, fdupes has a richer feature set (e.g. dedupsoft
links). FreeDup has its own life and does not seem to be adoptable.
Duperemove is complex software which requires glib2 and sqlite3.
--
Carlos Santos (Casantos) - DATACOM, P&D
“Marched towards the enemy, spear upright, armed with the certainty
that only the ignorant can have.” -- Epitaph of a volunteer
* Re: hardlink util -- files de-duplication
2018-06-01 13:29 ` Dmitry V. Levin
@ 2018-06-12 10:55 ` Karel Zak
2018-06-12 11:22 ` Ruediger Meier
0 siblings, 1 reply; 11+ messages in thread
From: Karel Zak @ 2018-06-12 10:55 UTC (permalink / raw)
To: Alexey Gladkov, util-linux
On Fri, Jun 01, 2018 at 04:29:30PM +0300, Dmitry V. Levin wrote:
> On Fri, Jun 01, 2018 at 01:38:07PM +0200, Karel Zak wrote:
> > For the last 17 years, Red Hat based distros have shipped the
> > hardlink(1) util. From man hardlink:
> >
> > hardlink traverses one or more directories searching for duplicate
> > files. When it finds duplicate files, it uses one of them as the
> > master. It then removes all other duplicates and places a hardlink
> > for each one pointing to the master file. This allows for
> > conservation of disk space where multiple directories on a single
> > filesystem contain many duplicate files.
> >
> > ...
> >
> > The util is a little bit orphaned. What about adding it to
> > util-linux to make it available to other distros and keep it
> > seriously maintained? ;-)
> >
> > https://src.fedoraproject.org/cgit/rpms/hardlink.git/
> >
> > It's one .c file.
> >
> > Comments & objections?
>
> Better late than never.
>
> BTW, our hardlink package has some Owl patches applied,
> please remind us to rebase and submit them. ;)
It seems there are no strong objections to hardlink. So, I
think we can add it as *optional* (--enable-hardlink) to util-linux.
IMHO it's a good idea to have such a tool in the basic Linux toolset.
The long-term goal should be to add new features to make it more
attractive to users who currently have to use the alternatives :-)
I won't have enough time in the next two weeks to work on this task (fix
indentation, reuse some lib/ stuff, etc.), so any volunteer(s)? ;-)
Karel
--
Karel Zak <kzak@redhat.com>
http://karelzak.blogspot.com
* Re: hardlink util -- files de-duplication
2018-06-12 10:55 ` Karel Zak
@ 2018-06-12 11:22 ` Ruediger Meier
2018-06-12 12:12 ` Karel Zak
0 siblings, 1 reply; 11+ messages in thread
From: Ruediger Meier @ 2018-06-12 11:22 UTC (permalink / raw)
To: Karel Zak; +Cc: Alexey Gladkov, util-linux
On Tuesday 12 June 2018, Karel Zak wrote:
> I won't have enough time in the next two weeks to work on this task (fix
> indentation, reuse some lib/ stuff, etc.), so any volunteer(s)? ;-)
I would do this, maybe next week or so. Should it go to sys-utils or
misc-utils?
cu,
Rudi
* Re: hardlink util -- files de-duplication
2018-06-12 11:22 ` Ruediger Meier
@ 2018-06-12 12:12 ` Karel Zak
0 siblings, 0 replies; 11+ messages in thread
From: Karel Zak @ 2018-06-12 12:12 UTC (permalink / raw)
To: Ruediger Meier; +Cc: Alexey Gladkov, util-linux
On Tue, Jun 12, 2018 at 01:22:45PM +0200, Ruediger Meier wrote:
> On Tuesday 12 June 2018, Karel Zak wrote:
> > I won't have enough time in the next two weeks to work on this task (fix
> > indentation, reuse some lib/ stuff, etc.), so any volunteer(s)? ;-)
>
> I would do this, maybe next week or so. Should it go to sys-utils or
> misc-utils?
Thanks! I think misc-utils is better in this case.
The sys-utils directory should be used for kernel API wrappers (ioctl,
syscalls, sysfs, etc) -- hmm... why do we have kill.c in misc-utils? :-)
(But it's a "color of the bikeshed" topic, so better not to start this
discussion :-)
Note that the best way is to use "indent --linux" for the first patch
and to make any other changes to the code in additional patches. That
way we will be able to keep track of our local changes.
Karel
--
Karel Zak <kzak@redhat.com>
http://karelzak.blogspot.com
Thread overview: 11+ messages
2018-06-01 11:38 hardlink util -- files de-duplication Karel Zak
2018-06-01 13:08 ` Ruediger Meier
2018-06-01 20:20 ` Kevin Fenzi
2018-06-02 0:00 ` Francisco J. Tsao Santin
2018-06-01 13:25 ` Aurélien Aptel
2018-06-01 13:45 ` Samuel Thibault
2018-06-06 12:02 ` Carlos Santos
2018-06-01 13:29 ` Dmitry V. Levin
2018-06-12 10:55 ` Karel Zak
2018-06-12 11:22 ` Ruediger Meier
2018-06-12 12:12 ` Karel Zak