Sunday, January 16, 2011

Strip text from some delimited fields in a list

Hello all!

Wondering what the fastest way would be to strip the text from just some delimited fields in a list.

My list looks like this:
text text:number:text:text:text:text:*:*:*:*:*:*:*:*:*:*:*:*:*:*:*:*

And I want it to look like this:
text text:*:*:*:*:text:*:*:*:*:*:*:*:*:*:*:*:*:*:*:*:*

So some fields that have data need to be replaced by asterisks, some fields need to stay untouched, and the delimiter isn't consistent (the first and second fields are separated by spaces). This is on a Linux filesystem, and a way to do it in place on the file is preferred.

Thanks much for the help!

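  • I would use a regular expression that matches the start of each line (parentheses capture text into a buffer designated by \1, \2, etc.; [^:]* matches a single field without running past a delimiter):

    ^([^:]*):[0-9]+:[^:]*:[^:]*:[^:]*:

    and a replacement string that keeps the first field and puts asterisks in the next four:

    \1:*:*:*:*:

    The pattern is not anchored at the end of the line, so everything from the sixth field onward is left untouched.

    with sed (-E enables unescaped parentheses and +, and -i.bak edits the file in place while keeping a backup):

    sed -E -i.bak 's/^([^:]*):[0-9]+:[^:]*:[^:]*:[^:]*:/\1:*:*:*:*:/' name-of-text-file
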
    

    You may have to fiddle with escape characters a little depending on your shell.

    Greeblesnort: don't forget sed -f (read the commands from a script file) if you've got multiple changes to make.
  • I'd probably do it in Perl:

    perl -pi.bak -e 's/^(\w+\s\w+):\d+:\w+:\w+:\w+:(.*)/$1:*:*:*:*:$2/' FILENAME
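
For comparison, here is a field-based sketch of the same change that nobody in the thread posted. It uses awk to address the fields by number instead of matching them with a regular expression. The function name mask_fields is made up for illustration, and the sketch assumes that fields 2 through 5 are always the ones to mask and that ":" never appears inside a field.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): replace colon-delimited
# fields 2-5 of each input line with "*" and leave the rest untouched.
mask_fields() {
    awk 'BEGIN { FS = OFS = ":" }
         # Only rewrite lines that actually have at least five fields;
         # shorter lines are passed through unchanged.
         NF >= 5 { for (i = 2; i <= 5; i++) $i = "*" }
         { print }'
}

# Sample line from the question, shortened to eight fields.
printf '%s\n' 'text text:number:text:text:text:text:*:*' | mask_fields
# prints: text text:*:*:*:*:text:*:*
```

Plain awk does not edit files in place, so one way to apply this to a file is to write to a temporary file and move it over the original, for example `mask_fields < FILENAME > FILENAME.tmp && mv FILENAME.tmp FILENAME`. Because the space in the first field is not a delimiter here, "text text" is treated as a single field, which matches the desired output in the question.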
