Tuesday, 18 June 2013

selective rsync cracked --precurse-parents

We all know that rsync is one of the elite Unix programs. It has no equal, and it is so well written and so powerful that, really, why would anyone try to better it?

So what is my problem?

I want to back up /var/lib/mysql/ and /etc/pki/, and I want to do it recursively so that the backup recreates the actual full path (none of that incestuous relative stuff for me!).

What I /think/ I'm after is:

rsync --precurse-parents -maPAX \
--filter='+ /var/lib/mysql/**' \
--filter='+ /var/www/sites/*.org/**' \
--filter='+ /var/www/sites/notice.*/**' \
--filter='- /**' \
--filter='- *' \
--rsync-path='sudo rsync' 'rsync@server:/' /var/backup/server

Where --precurse-parents (my invention, rsync has no such flag) would be like --prune-empty-dirs, but would include the parent dirs /var and /var/lib because of /var/lib/mysql, while still excluding /var/* and /var/lib/*.

It is something that I've fought with for over a decade. I've written Perl scripts to solve the problem. I've written bash scripts. I've even been crazy enough to read the documentation (man rsync), but it wasn't until today that I understood.

About 83% of the way through the man page is:

       Note  that,  when  using  the  --recursive  (-r)  option (which is implied by -a), every subcomponent of every path is visited from the top down, so
       include/exclude patterns get applied recursively to each subcomponent’s full name (e.g. to  include  "/foo/bar/baz"  the  subcomponents  "/foo"  and
       "/foo/bar" must not be excluded).  The exclude patterns actually short-circuit the directory traversal stage when rsync finds the files to send.  If
       a pattern excludes a particular parent directory, it can render a deeper include pattern ineffectual because rsync  did  not  descend  through  that
       excluded section of the hierarchy.  This is particularly important when using a trailing ’*’ rule.  For instance, this won’t work:

              + /some/path/this-file-will-not-be-found
              + /file-is-included
              - *

       This  fails  because  the  parent  directory "some" is excluded by the ’*’ rule, so rsync never visits any of the files in the "some" or "some/path"
       directories.  One solution is to ask for all directories in the hierarchy to be included by using a single rule: "+ */" (put it somewhere before the
       "-  *" rule), and perhaps use the --prune-empty-dirs option.  Another solution is to add specific include rules for all the parent dirs that need to
       be visited.  For instance, this set of rules works fine:

              + /some/
              + /some/path/
              + /some/path/this-file-is-found
              + /file-also-included
              - *

And that solved the problem for me:

rsync -maPAX \
--filter='- *.swp' \
--filter='- .git/' \
--filter='+ /var/' \
--filter='+ /var/lib/' \
--filter='+ /var/lib/mysql**' \
--filter='+ /var/www/' \
--filter='+ /var/www/sites/' \
--filter='+ /var/www/sites/*.org/' \
--filter='+ /var/www/sites/*.org/**' \
--filter='+ /var/www/sites/notice.*/' \
--filter='+ /var/www/sites/notice.*/**' \
--filter='- /var/www/sites/*' \
--filter='- /var/www/*' \
--filter='- /var/*/*' \
--filter='- /var/*' \
--filter='- /**' \
--filter='- /*' \
--prune-empty-dirs \
--rsync-path='sudo rsync' 'rsync@server:/' /var/backup/server

I think of this as, "include /var/ {so that rsync can see /var/www}"
"include /var/www/sites/*.org/ {include all of the .org sites}"
"include /var/www/sites/*.org/** {and the files+dirs of those .org sites}"

The mysql line includes the desired dir and everything in it, but it would also match /var/lib/mysql_archive_do_NOT_backup, so it is a little more risky; splitting it into '+ /var/lib/mysql/' and '+ /var/lib/mysql/**', as for the .org sites, avoids that.
So each time rsync has to choose, it runs through the filter list from the top down and acts on the first rule that matches; if it hasn't included /var/www, then /var/www/sites is _never_ going to match. The usual advice is to try the following:
rsync -maPAX \
--filter='+ */' \
--filter='+ /var/www/sites/*.org/' \
--filter='+ /var/www/sites/*.org/**' \
--filter='- /var/www/*' \
--filter='- /var/*/*' \
--filter='- /var/*' \
--filter='- /**' \
rsync@remote:/ ~rsync/backup/
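but I think that the first filter line, '+ */', has the hardest implication to comprehend: it includes every directory on the server, so rsync walks the whole tree and relies on -m (--prune-empty-dirs) to drop the directories that end up empty.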

rsync -mnavvPAX from to
is really helpful when debugging filters (-n makes it a dry run and -vv gives additional info).
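
Alongside a dry run, a scratch tree is a cheap way to see what a set of filters really does before pointing it at a server. The sketch below is only an illustration: the file names are made up, it runs rsync locally between two temporary directories, and it uses the stricter '+ /var/lib/mysql/' plus '+ /var/lib/mysql/**' pair rather than my mysql** line.

```shell
# Build a throwaway tree that mimics the layout discussed above.
# (All file names here are invented for the demonstration.)
src=$(mktemp -d)
dst=$(mktemp -d)
mkdir -p "$src/var/lib/mysql/db" "$src/var/lib/other" "$src/var/www" "$src/etc"
touch "$src/var/lib/mysql/db/table.frm" \
      "$src/var/lib/other/skip-me" \
      "$src/var/www/index.html" \
      "$src/etc/passwd"

# Each parent directory is included one level at a time,
# then "- *" excludes everything that no earlier rule matched.
rsync -am \
    --filter='+ /var/' \
    --filter='+ /var/lib/' \
    --filter='+ /var/lib/mysql/' \
    --filter='+ /var/lib/mysql/**' \
    --filter='- *' \
    "$src/" "$dst/"

# List what actually arrived in the destination.
(cd "$dst" && find . -type f)

# Remove the scratch directories when you are done:
#   rm -rf "$src" "$dst"
```

If the rules behave as the man page describes, only the file under var/lib/mysql/db should be listed. Dropping the '+ /var/lib/' line is a quick way to watch the deeper include rules stop working.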

What I would really like to type is:
rsync -dwim --filter='+ /var/www/sites/*.org/**' server /var/backup/server/
[dwim = Do What I Mean; not a real rsync flag]
I'm sure there is still a better way to get rsync to precurse-parents, as it were, but I'm happy with this solution (until some kind person adds a comment suggesting an even easier or quicker way to do this).
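
Until rsync grows such a flag, the parent rules can at least be generated instead of typed by hand. This is a rough sketch, not a polished tool: parent_rules and build_rules are names I made up, /tmp/backup.rules is an assumed file name, and the only rsync feature relied on is the 'merge' filter rule that reads rules from a file.

```shell
#!/bin/sh
# parent_rules PATH
# Print "+" filter rules for PATH, everything beneath it, and every parent
# directory rsync has to descend through to reach it.
# NOTE: parent_rules is a hypothetical helper name, not an rsync feature.
parent_rules() (
    set -f              # keep wildcards such as *.org literal while splitting
    path=${1%/}         # tolerate one trailing slash
    IFS=/
    set -- $path        # split the path into its components
    prefix=
    for part in "$@"; do
        [ -n "$part" ] || continue   # skip the empty field before the leading /
        prefix="$prefix/$part"
        printf '+ %s/\n' "$prefix"   # the directory itself
    done
    printf '+ %s/**\n' "$prefix"     # everything inside the last directory
)

# Example: build a rule list for two of the paths used above.
build_rules() {
    parent_rules /var/lib/mysql
    parent_rules '/var/www/sites/*.org'
    echo '- *'          # anything not included above is excluded
}

build_rules

# To use it (assumed file name), save the output and merge it:
#   build_rules > /tmp/backup.rules
#   rsync -maPAX --filter='merge /tmp/backup.rules' \
#       --rsync-path='sudo rsync' 'rsync@server:/' /var/backup/server
```

The repeated '+ /var/' line in the output is harmless, since rsync acts on the first rule that matches. Because the helper emits '/var/lib/mysql/' and '/var/lib/mysql/**' as separate rules, it does not pick up a sibling such as /var/lib/mysql_archive_do_NOT_backup.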
