Thursday, April 14, 2011

How to create an identical gzip of the same file?

I have a file whose contents do not change between runs. It is passed through gzip and only the compressed form is stored. I'd like to be able to generate the gzip again and update my copy only if the two differ. As it stands, diffing tools (diff, xdelta, subversion) see the files as having changed.

Premise: I'm storing a mysqldump of an important database in a subversion repository. My intention is for a cron job to periodically dump the db, gzip it, and commit the file. Currently, every time the file is dumped and gzipped, it is considered different. I'd prefer not to have my revision numbers needlessly increase every 15 minutes.

I realize I could dump the file as just plain text, but I'd prefer not to, as it's rather large.

The command I am currently using to generate the dumps is:

mysqldump "$DB" --skip-extended-insert | sed '$d' | gzip -n > "$REPO/$DB.sql.gz"

The -n instructs gzip to remove the filename/timestamp information. The sed '$d' removes the last line of the file where mysqldump places a timestamp.
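To illustrate the role of sed '$d', here is a minimal sketch using a simulated dump. The fake_dump function is a hypothetical stand-in for mysqldump, and its trailing comment only mimics the completion line that mysqldump appends; the table and rows are made up.

```shell
# Simulated dump output; the last line imitates mysqldump's completion stamp.
fake_dump() {
  printf '%s\n' \
    'CREATE TABLE t (id INT);' \
    'INSERT INTO t VALUES (1);' \
    "-- Dump completed on $1"
}

# sed '$d' deletes only the final line, so two runs taken at different
# times yield the same text when nothing else has changed.
first=$(fake_dump '2011-04-14 10:00:00' | sed '$d')
second=$(fake_dump '2011-04-14 10:15:00' | sed '$d')

[ "$first" = "$second" ] && echo "dumps match once the last line is removed"
```

This only works as long as the timestamp is confined to the last line of the dump.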

At this point, I'm probably going to revert to storing it as plain text, but I'm curious what kind of solution there is.

Resolved: Mr. Bright was correct. I had mistakenly used a capital N when the correct argument was a lowercase one.
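With the lowercase -n in place, the original goal of updating the stored copy only when it differs could be handled along these lines. This is a rough sketch, not a tested backup script: update_if_changed is a hypothetical helper name, and the commented usage assumes the $DB and $REPO variables from the command above.

```shell
# update_if_changed NEW TARGET (hypothetical helper)
# Replaces TARGET with NEW only when their bytes differ.
# Returns 0 if TARGET was created or updated, 1 if it was left untouched.
update_if_changed() {
  new=$1
  target=$2
  if [ -f "$target" ] && cmp -s "$new" "$target"; then
    rm -f "$new"        # same bytes: keep the stored copy as it is
    return 1
  fi
  mv "$new" "$target"   # first run or a real change: install the new archive
}

# Possible use from the cron job (assumes $DB and $REPO are set):
#   tmp=$(mktemp)
#   mysqldump "$DB" --skip-extended-insert | sed '$d' | gzip -n > "$tmp"
#   update_if_changed "$tmp" "$REPO/$DB.sql.gz" && svn commit -m "db dump" "$REPO"
```

Comparing with cmp keeps the check independent of the version control tool, so the commit step runs only after the archive has actually been replaced.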

From stackoverflow
  • The -N instructs gzip to remove the filename/timestamp information.

    Actually, that does just the opposite. -n is what tells it to forget the original file name and time stamp.

    Danny : Always something friggin stupid. Sigh. Thank you for pointing out the silly mistake.
  • I think gzip is preserving the original timestamp of the file(s), which will cause it to produce a different archive.

    -N --name
              When compressing, always save the original file name and
              time stamp; this is the default. When decompressing,
              restore the original file name and time stamp if present.
              This option is useful on systems which have a limit on
              file name length or when the time stamp has been lost
              after a file transfer.
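The difference between the two flags can be seen in the gzip header itself, where bytes 4 through 7 hold the stored modification time. The following is a small sketch for checking this; the scratch directory, sample file, and the mtime_bytes helper are all illustrative names, not part of gzip.

```shell
# Work in a scratch directory with a small sample file.
workdir=$(mktemp -d)
printf 'hello\n' > "$workdir/sample.txt"

# Print bytes 4-7 of a gzip stream read from stdin (the mtime field) as hex.
mtime_bytes() {
  od -An -tx1 -j4 -N4 | tr -d ' \n'
}

with_name=$(gzip -N -c "$workdir/sample.txt" | mtime_bytes)
without_name=$(gzip -n -c "$workdir/sample.txt" | mtime_bytes)

echo "-N mtime field: $with_name"      # the file's mtime, so it varies
echo "-n mtime field: $without_name"   # all zeros
```

Since -n leaves the time field zeroed and omits the file name, identical input compressed with the same gzip build and level should yield identical bytes.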
    
