Filtering repeated lines of text
2003-08-05 00:00:00
I needed a program that would remove the repeated lines from a text, and since UNIX's uniq is practically useless, I had to roll my own (unique):
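#!/usr/bin/perl
# Read every line, then print each one only if it has not been seen before.
@lineas = <STDIN>;
foreach $linea (@lineas) {
    $estaba = "Pono";    # "Pono" = not seen yet, "Pozi" = already seen
    # Linear scan over the lines printed so far (quadratic overall).
    foreach $cadena (@repasadas) {
        if ($cadena eq $linea) {
            $estaba = "Pozi";
        }
    }
    if ($estaba eq "Pono") {
        push @repasadas, $linea;
        print "$linea";
    }
}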
UPDATE 23/07/2004: Using dictionaries (Perl hashes), _much_ faster.
#!/usr/bin/perl
# %lineas records every line already printed; a hash lookup replaces the inner loop.
while (<STDIN>) {
    if (!(exists $lineas{$_})) {
        print "$_";
        $lineas{$_} = "1";
    }
}
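As a rough shell counterpart to the hash-based script, here is a minimal sketch that uses awk's associative arrays instead of Perl. This is a swapped-in tool, not something from the original post, and the function name `dedupe_keep_order` is made up for illustration. It assumes a POSIX awk is available and, like the hash version, keeps one entry per distinct line in memory.

```shell
#!/bin/sh
# Hypothetical helper (not from the original post): print each distinct
# line once, in order of first appearance.
dedupe_keep_order() {
    # seen[$0]++ evaluates to 0 (false) the first time a line appears,
    # so !seen[$0]++ is true only for first occurrences, which awk prints.
    awk '!seen[$0]++'
}

# Sample input: c, a, a, b, a  ->  prints c, a, b (one per line)
printf 'c\na\na\nb\na\n' | dedupe_keep_order
```

Because it reads standard input, it can be dropped into a pipeline the same way as the Perl script, for example `dedupe_keep_order < file`.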
Keith Amling (25/09/2005, 13:22)
uniq is designed for use with sorted text. If you sort the input first it will handle it correctly, for example:
$ cat file
c
a
a
b
a
$ cat file | uniq
c
a
b
a
$ cat file | sort | uniq
a
b
c
If you need the lines sorted as they were initially then uniq is useless. While I'm on the subject of uniq, don't forget uniq -c.
Saiyine (25/09/2005, 23:41)
Yeah, that was exactly the problem: I didn't want sort to mess with the order. Thanks for all the comments!