Building and compressing archives with tar and process substitution

Created: 21 Jun 2018 18:53:34, in Host development

To a large degree, this article is the result of watching people struggle to compress files, and possibly directories, efficiently and safely with the tar command-line utility. Most of the time, using tar on its own causes no difficulty. Problems crop up when someone tries to use tar in tandem with another command, such as find, which supplies the list of files for tar to work on.

Sometimes things go wrong simply because file names are not handled safely on the command line. As a result, unusual file names get in the way and make tar choke or refuse to continue. At other times, one just cannot make two or more programs work together without one of them throwing errors or warnings. The end result is the same as in the previous case.

This text is an attempt to address most of these issues through a couple of easy-to-understand but robust examples. To do so, I use the aforementioned tar and find commands together with something called process substitution.

A compressing one-liner for the impatient

Some people just want to know how to build and compress their archives efficiently, quickly and safely. If you are one of them, here is a one-liner which is likely to work for you:


tar cfjv archive.bz2 --no-recursion --verbatim-files-from --null --files-from <(find . -print0 )

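Running it on the command line produces the file archive.bz2, a bzip2-compressed tar archive containing the files and directories under the current working directory. It needs a shell that supports process substitution, such as BASH, and GNU tar (the --verbatim-files-from option was added in version 1.29).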

You might want to change the name of the output file and the options passed to find to suit your goal better.
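Before pointing the one-liner at real data, it can be tried on throwaway files. The sketch below is only an illustration: the directory layout and file names are made up, it assumes BASH and GNU tar 1.29 or newer, and it swaps j (bzip2) for z (gzip) purely to depend on fewer external programs.

```shell
#!/usr/bin/env bash
# Sketch only: assumes bash and GNU tar 1.29+; all paths are throwaway.
work=$(mktemp -d)
mkdir -p "$work/src/sub dir" "$work/out"
printf 'a\n' > "$work/src/plain.txt"
printf 'b\n' > "$work/src/sub dir/with space.txt"
printf 'c\n' > "$work/src/sub dir/"$'line\nbreak.txt'

# Archive the tree. The archive is written outside src, so find never lists it.
# z (gzip) stands in for j (bzip2) purely to need fewer external programs.
( cd "$work/src" &&
  tar cfz "$work/demo.tar.gz" --no-recursion --verbatim-files-from --null \
      --files-from <(find . -print0) )

# Unpack elsewhere and compare the result with the original tree.
roundtrip=failed
tar xzf "$work/demo.tar.gz" -C "$work/out" &&
  diff -r "$work/src" "$work/out" && roundtrip=ok
echo "round trip: $roundtrip"
rm -rf "$work"
```

If the names with a space and a line break survive the round trip, the same options should cope with whatever your own directories contain.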

The one-liner breakdown

To understand how the one-liner above works, it is best to start analyzing it from the end.

First, find recursively collects the names of all files under the current working directory. Process substitution makes that output available under a file name, which tar receives as the argument of --files-from. Tar reads the names from it, creates a new archive and compresses it using its bzip2 filter.
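One way to picture what happens is to split the one-liner into two explicit steps with the name list stored in a real file. This is a sketch for comparison, not how the shell implements process substitution; names.list and the sample files are hypothetical, and gzip is used instead of bzip2 to keep dependencies minimal.

```shell
#!/usr/bin/env bash
# Sketch only: assumes bash and GNU tar 1.29+; names.list is a made-up name.
work=$(mktemp -d)
mkdir -p "$work/src"
printf 'one\n' > "$work/src/a.txt"
printf 'two\n' > "$work/src/b c.txt"
list="$work/names.list"

# Step 1: find writes the null-terminated names to a list file.
( cd "$work/src" && find . -print0 > "$list" )

# Step 2: tar reads the names from that list file.
( cd "$work/src" &&
  tar cfz "$work/two-step.tar.gz" --no-recursion --verbatim-files-from --null \
      --files-from "$list" )

# The same job in one step, with no list file to create or clean up.
( cd "$work/src" &&
  tar cfz "$work/one-step.tar.gz" --no-recursion --verbatim-files-from --null \
      --files-from <(find . -print0) )

# Both archives should list the same members.
two=$(tar tzf "$work/two-step.tar.gz" | sort)
one=$(tar tzf "$work/one-step.tar.gz" | sort)
[ "$two" = "$one" ] && echo "same members"
rm -rf "$work"
```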

A bit more on the find command

Find is the go-to program for searching for files and directories on the command line.


find . -print0

The line of code above searches the current working directory recursively. Each file name found is terminated by a null character when it is printed. This prevents file names containing white space (spaces, tabs or even newlines), which are not so common but crop up sometimes, from confusing other commands that receive them as input.
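The difference between newline-terminated and null-terminated output is easy to see on a small test. The sketch below assumes BASH and uses made-up file names, one of which contains a line break.

```shell
#!/usr/bin/env bash
# Sketch only: the file names are invented for the demonstration.
work=$(mktemp -d)
touch "$work/plain" "$work/two words" "$work/"$'new\nline'

# Newline-terminated output: the name containing a newline spans two lines,
# so a line-based reader sees one record too many.
lines=$(find "$work" -type f -print | wc -l)

# Null-terminated output: exactly one record per file, whatever the name holds.
count=0
while IFS= read -r -d '' name; do
  count=$((count + 1))
done < <(find "$work" -type f -print0)

echo "lines: $lines, files: $count"
rm -rf "$work"
```

Three files are created, yet the line count comes out one higher than the number of null-terminated records.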

Process substitution

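Process substitution in the form used in this text lets a command read the output of other command(s) as if it were a file, in an efficient and concise manner. The shell runs the command(s), connects their output to a named pipe or a /dev/fd entry, and passes the resulting file name to the reading command. No regular temporary file is written to disk.

Process substitution has a special syntax: <( ). The command(s) are placed in parentheses: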


<( command )

The above expands to a file name. The data behind it can be read like this:


cat <( command )

which is well known, but starts an external program (cat) just to read the data. BASH can do the reading itself:


$(< <( command ) ) 

or, in its general form, for any readable file:


$(< file ) 

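As you might have noticed, neither of the two expansions above uses an external program to read the file; the shell reads it and substitutes the contents. For plain command output, $( command ) gives the same result, so the $(< ) form pays off mostly when the data already sits in a file.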
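A quick experiment shows what the shell actually hands over. This is a sketch assuming BASH; the exact path printed depends on the system, so treat /dev/fd/63 in the comment as a typical value rather than a promise.

```shell
#!/usr/bin/env bash
# Sketch only: assumes bash on a system with /dev/fd or named pipes.

# The shell replaces <( ... ) with a path that the other command can open.
path=$(echo <(true))
echo "$path"                      # often /dev/fd/63 on Linux, not guaranteed

# Reading through cat (external program) and through $(< ...) (bash itself).
with_cat=$(cat <(printf 'a\nb\n\n'))
with_bash=$(< <(printf 'a\nb\n\n'))
[ "$with_cat" = "$with_bash" ] && echo "same text"

# Both are command substitutions, so trailing newlines are stripped:
# only "a", a newline and "b" remain.
bytes=$(printf '%s' "$with_bash" | wc -c)
echo "bytes kept: $bytes"
```

Because command substitution also cannot hold null characters, null-terminated lists are better passed straight to the consuming program, as the one-liner does with tar.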

Making process substitution and find work together

Here is a concrete example of the above. With it we look for files in the current user's home directory which were accessed within the last 3 days. The result of the search is exposed through process substitution, read using $(< ) and printed on the terminal.


echo "$(< <( find ~ -atime -3 ))"

Since the above is just an example, you might want to add the -maxdepth option to stop find from descending too deep into the directory tree.


echo "$(< <( find ~ -maxdepth 2 -atime -3 ))" 
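Printing the list is fine for a quick look. To work on the names one by one, a null-terminated list can be loaded into an array instead. The sketch below assumes BASH 4.4 or newer (for mapfile -d) and uses an invented directory tree in place of the home directory.

```shell
#!/usr/bin/env bash
# Sketch only: requires bash 4.4+ for mapfile -d; the tree is made up.
work=$(mktemp -d)
mkdir -p "$work/a/b/c"
touch "$work/top.txt" "$work/a/mid file.txt" "$work/a/b/c/deep.txt"

# Regular files at most two levels below the start directory,
# accessed within the last 3 days, one array element per name.
mapfile -d '' -t recent < <(find "$work" -maxdepth 2 -type f -atime -3 -print0)

printf 'found: %s\n' "${recent[@]##*/}"   # print base names only
echo "total: ${#recent[@]}"
rm -rf "$work"
```

The file three directories down is left out by -maxdepth 2, and the name containing a space stays in one piece.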

Tar options

In the one-liner given in the first listing, tar receives several options.

Long options are:

  • --no-recursion - tells tar not to descend into directories when creating the archive. Find does that already.

  • --verbatim-files-from - tells tar to treat every entry in the file list as a file name, even one starting with a dash, rather than as an extra option.

  • --files-from - tells tar to read the names of files for the new archive from a file.

  • --null - tells tar that each file name in the list is terminated by a null character.

The last option is important because find has been instructed to terminate each file name it returns with a null character. Tar needs to know this or it will misread the list. Both --null and --verbatim-files-from affect only the --files-from options that come after them on the command line, which is why they precede --files-from in the one-liner.

In general, terminating file names with a null character is a big step towards preventing these names from wreaking havoc among other programs which receive them as their input.
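To see what --null protects against, compare a newline-terminated list with a null-terminated one for a file whose name contains a line break. This is a sketch assuming BASH and GNU tar 1.29 or newer; the file name is invented and gzip is used only to limit dependencies.

```shell
#!/usr/bin/env bash
# Sketch only: assumes bash and GNU tar 1.29+; the odd file name is made up.
work=$(mktemp -d)
mkdir "$work/src"
printf 'x\n' > "$work/src/"$'odd\nname.txt'

# Newline-terminated list: tar reads two names, neither of which exists.
( cd "$work/src" &&
  tar cfz "$work/broken.tar.gz" --no-recursion \
      --files-from <(find . -type f -print) ) 2> /dev/null
broken_status=$?

# Null-terminated list: the name arrives intact.
( cd "$work/src" &&
  tar cfz "$work/good.tar.gz" --no-recursion --verbatim-files-from --null \
      --files-from <(find . -type f -print0) )
good_status=$?

echo "newline list: exit $broken_status, null list: exit $good_status"
good_members=$(tar tzf "$work/good.tar.gz" | wc -l)
rm -rf "$work"
```

The first tar run should report a failure status because it cannot find the two name fragments, while the second one archives the file.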

Short options are:

  • c for create,

  • f for file (the archive name follows the option letters),

  • j for tar to use bzip2 for archive compression,

  • v for the program to work in verbose mode.

Tar can use several different filters to compress an archive. The one-liner uses bzip2, which is slower than gzip but usually produces smaller archives.
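The trade-off between filters can be checked on your own data. The sketch below assumes BASH 4 or newer and GNU tar, generates made-up sample text, and skips any compressor that is not installed; -z, -j and -J are the GNU tar switches for gzip, bzip2 and xz.

```shell
#!/usr/bin/env bash
# Sketch only: assumes bash 4+ (associative arrays) and GNU tar.
work=$(mktemp -d)
for i in $(seq 1 2000); do
  echo "line $i of some repetitive sample text"
done > "$work/sample.txt"

size_of() { wc -c < "$1" | tr -d ' '; }   # file size in bytes

# Baseline: an uncompressed archive.
tar -c -f "$work/plain.tar" -C "$work" sample.txt
plain_size=$(size_of "$work/plain.tar")
echo "no filter: $plain_size bytes"

declare -A size
for pair in z:gzip j:bzip2 J:xz; do
  flag=${pair%%:*}; prog=${pair##*:}
  command -v "$prog" > /dev/null || continue   # skip filters not installed
  tar -c "-$flag" -f "$work/sample.tar.$prog" -C "$work" sample.txt
  size[$prog]=$(size_of "$work/sample.tar.$prog")
  echo "$prog (-$flag): ${size[$prog]} bytes"
done
rm -rf "$work"
```

Results on highly repetitive text like this exaggerate the savings, so it is worth repeating the test on a sample of the files you actually archive.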

Using the basic idea to deal with advanced cases

The one-liner given in the first part of this article combines two program invocations linked with process substitution. It is a simple piece of code, but a powerful one, thanks to the large number of options that tar and find accept.

For example, the line of code below creates an archive from files modified within the last 3 hours and found in the current user's home directory (including its sub-directories). The archive is compressed with gzip.


tar cfzv archive.tar.gz --no-recursion --verbatim-files-from --null --files-from <( find ~ -mmin -180 -print0 )
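A time-based filter like this is easy to check on test files with known ages. The sketch below assumes BASH, GNU tar 1.29 or newer and GNU touch (for the -d '5 hours ago' syntax); the .log files and the extra -type f and -name tests are additions made up for the demonstration.

```shell
#!/usr/bin/env bash
# Sketch only: assumes bash, GNU tar 1.29+ and GNU touch; file names are made up.
work=$(mktemp -d)
mkdir "$work/src"
printf 'new\n' > "$work/src/fresh.log"
printf 'old\n' > "$work/src/stale.log"
touch -d '5 hours ago' "$work/src/stale.log"   # push the mtime outside the window

# Archive only regular *.log files modified within the last 180 minutes.
( cd "$work/src" &&
  tar cfz "$work/recent.tar.gz" --no-recursion --verbatim-files-from --null \
      --files-from <(find . -type f -name '*.log' -mmin -180 -print0) )

members=$(tar tzf "$work/recent.tar.gz")
echo "$members"                                # prints ./fresh.log
rm -rf "$work"
```

Any other find test (size, owner, permissions and so on) can be slotted in the same way without touching the tar part of the command.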

Final thoughts

Tar and find have been around for decades and hardly require any introduction. Both have quality manuals and tons of examples of creative use available online. Process substitution, the crucial linchpin which in my examples makes the two commands work seamlessly together, is not a new thing either (it was made available to BASH users in the mid-90s), but for some reason it has received nowhere near the amount of publicity it deserves.

One goal of this article has been to shed some light on this form of inter-process communication through basic examples. Process substitution is an efficient technique: the producing and consuming commands run at the same time, so it can take advantage of multiprocessing. Last but not least, once understood, it is easy to incorporate in scripts and one-liners.
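The concurrency is easy to observe. In the sketch below, which assumes BASH and the usual seq, head, sort and comm utilities, the first consumer stops a producer long before it would finish on its own, and the second reads from two producers at once.

```shell
#!/usr/bin/env bash
# Sketch only: assumes bash plus the seq, head, sort and comm utilities.

# The producer would count to a billion, but head closes the pipe after
# three lines and the producer is stopped when it next tries to write.
first=$(head -n 3 <(seq 1 1000000000))
echo "$first"

# Two producers feed a single consumer at the same time.
only_left=$(comm -23 <(printf '%s\n' a b c | sort) <(printf '%s\n' b c d | sort))
echo "only in the first list: $only_left"
```

Nothing is buffered in a file first, so the consumer can start working as soon as the first bytes arrive.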

I hope that by this point you have not only a somewhat better understanding of how to create and compress archives with tar and find in a safe and efficient manner, but have also gained some knowledge of how process substitution works in shell code.

If you liked the article, please share it. If not, drop me a line to tell me why ;).

This post was updated on 21 Jun 2018 21:02:33

Tags:  BASH 


Author, Copyright and citation

Author

Sylwester Wojnowski

The author of the above article, Sylwester Wojnowski, is the sWWW admin and owner. He enjoys doing Maths and studying algorithms, writing code in scripting and command languages, Thrash Metal music and playing electric guitar.

Copyrights

©Copyright, 2018 Sylwester Wojnowski. This article may not be reproduced or published as a whole or in parts without permission from the author. If you share it, please give author credit and do not remove embedded links.

Computer code, if present in the article, is excluded from the above and licensed under GPLv3.

Citation

Cite this article as:

Wojnowski, Sylwester. "Building and compressing archives with tar and process substitution." From sWWW - Code For The Web. https://wojnowski.net.pl//main/index/builidng-and-compressing-archives-with-tar-and-process-substitution