
tar Command Builder

Build tar commands (create, extract, list) with gz/bz2/xz/zst compression and exclusion patterns.


  

tar in depth: archive modes, compression filters, and reproducible backups

The tar command, short for tape archive, first appeared in Version 7 Unix in 1979, originally to serialise files onto magnetic tape. Almost half a century later, it is still the de facto packaging format on Linux distributions: Docker image layers are tar archives, kernel sources ship as tarballs, and many nightly server backups are built on it. The reason it survived is its design: tar does not compress, it only packages. Compression is layered on top using interchangeable filters (gzip, bzip2, xz, zstd), and that decoupling lets you swap compression algorithms without changing the archive format itself.
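
The decoupling can be seen directly in a small sketch. It assumes a POSIX shell with tar and gzip installed; the directory and file names are made up for illustration. The same tree is archived once with -z and once by piping a plain archive through gzip, and both results list the same members.

```shell
# Sketch: tar packages, a separate compressor compresses.
work=$(mktemp -d)
mkdir -p "$work/demo"
printf 'hello\n' > "$work/demo/a.txt"
printf 'world\n' > "$work/demo/b.txt"

# One step: tar runs gzip itself because of -z.
tar -C "$work" -czf "$work/one-step.tar.gz" demo

# Two steps: tar writes a plain archive to stdout, gzip compresses the stream.
tar -C "$work" -cf - demo | gzip > "$work/two-step.tar.gz"

# Both files are gzip-compressed tar archives holding the same members.
one_step=$(tar -tzf "$work/one-step.tar.gz" | LC_ALL=C sort)
two_step=$(gzip -dc "$work/two-step.tar.gz" | tar -tf - | LC_ALL=C sort)
echo "$one_step"
```

Swapping gzip for another compressor in the pipe changes only the outer layer; the tar stream inside stays the same.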

Operating modes and required flags

Every tar invocation needs a mode flag and an archive file:

  • -c: create a new archive.
  • -x: extract from an existing archive.
  • -t: list (table of contents) without extracting.
  • -r: append files to an uncompressed archive.
  • -u: append only files newer than the copy already in the archive (uncompressed archives only).
  • -f <file>: the archive path (use - for stdin/stdout). Pass it explicitly every time, even when piping; without it tar falls back to a build-time or environment default, historically a tape device.
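
The modes above can be walked through in one short session. This is a minimal sketch with made-up file names, run in a throwaway temporary directory; it should behave the same on GNU tar and bsdtar.

```shell
work=$(mktemp -d)
mkdir -p "$work/src" "$work/out"
printf 'one\n' > "$work/src/one.txt"
printf 'two\n' > "$work/src/two.txt"

# -c: create an uncompressed archive containing a single file.
tar -C "$work/src" -cf "$work/demo.tar" one.txt

# -r: append another file (only possible on an uncompressed archive).
tar -C "$work/src" -rf "$work/demo.tar" two.txt

# -t: list the table of contents without extracting.
listing=$(tar -tf "$work/demo.tar")
echo "$listing"

# -x: extract into a different directory.
tar -xf "$work/demo.tar" -C "$work/out"
extracted=$(cat "$work/out/two.txt")
```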

Compression filters: gzip, bzip2, xz, zstd

Filters are flags that pipe the archive through a compressor:

-z   gzip   (.tar.gz / .tgz)   fast, modest ratio
-j   bzip2  (.tar.bz2)         slower, better ratio than gzip
-J   xz     (.tar.xz)          slow, ~30% smaller than gzip
--zstd      (.tar.zst)         modern sweet spot: ~xz size, 5-10x faster
--lz4       (.tar.lz4)         very fast, low ratio (HPC scratch); bsdtar flag, GNU tar uses -I lz4
--lzma      (.tar.lzma)        legacy xz predecessor

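For a 1 GB tree of source code, expect roughly the following (ballpark figures that vary with hardware, thread count, and data): gzip ~300 MB / 4 s, bzip2 ~230 MB / 25 s, xz ~180 MB / 90 s, zstd -19 ~185 MB / 12 s. Note that --zstd on its own runs zstd at its default level; to request a specific level with GNU tar, pass it through -I, for example -I 'zstd -19 -T0'. zstd is a common modern choice for both backups and container layers.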
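
Rather than relying on ballpark numbers, it is easy to measure on your own data. The sketch below generates a small, highly repetitive sample file (an assumption for illustration, so its ratios will be far better than on a real tree) and compresses the same tar stream with whichever of gzip, bzip2, xz and zstd happen to be installed.

```shell
work=$(mktemp -d)
mkdir -p "$work/tree"

# Repetitive text compresses very well; real source trees behave differently.
i=0
while [ "$i" -lt 2000 ]; do
  printf 'line %d: the quick brown fox jumps over the lazy dog\n' "$i"
  i=$((i + 1))
done > "$work/tree/sample.txt"

# Baseline: plain, uncompressed archive.
tar -C "$work" -cf "$work/plain.tar" tree
plain_size=$(wc -c < "$work/plain.tar")
echo "none: $plain_size bytes"

# Try each compressor only if it is installed on this machine.
# Output files are named after the program (tree.tar.gzip, ...) for simplicity.
for comp in gzip bzip2 xz zstd; do
  if command -v "$comp" >/dev/null 2>&1; then
    tar -C "$work" -cf - tree | "$comp" -c > "$work/tree.tar.$comp"
    size=$(wc -c < "$work/tree.tar.$comp")
    echo "$comp: $size bytes"
  fi
done
```

Each compressor runs at its default level here; wrap the loop body in time to compare speed as well as size.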

Common modifiers

  • -v: verbose, prints each entry as it is processed (GNU tar writes the list to stdout, or to stderr when the archive itself goes to stdout).
  • -C <dir>: change to <dir> before processing the arguments that follow.
  • --exclude=<glob>: skip matching paths (repeatable, or via --exclude-from=file).
  • --strip-components=N: drop N leading path components on extract.
  • -p / --preserve-permissions: restore the stored UNIX modes exactly on extract; this is the default only when running as root.
  • --owner=0 --group=0 --numeric-owner: force UID/GID for reproducible archives.
  • --mtime='@0' plus --sort=name: fixed timestamps and a stable member order, which together with the previous flags give bit-for-bit reproducible output.
  • -S: handle sparse files efficiently (good for VM disk images).
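
Three of these modifiers are often combined. The following sketch uses a hypothetical project layout to show -C keeping the temporary-directory prefix out of member names, --exclude dropping a dependency directory, and --strip-components removing the top-level directory on extract.

```shell
work=$(mktemp -d)
mkdir -p "$work/project/src" "$work/project/node_modules/dep" "$work/out"
printf 'code\n' > "$work/project/src/main.c"
printf 'junk\n' > "$work/project/node_modules/dep/index.js"

# -C: member names start at "project/", not at the temp directory.
# --exclude: the dependency directory is left out of the archive.
tar -C "$work" --exclude='node_modules' -czf "$work/project.tgz" project

members=$(tar -tzf "$work/project.tgz" | LC_ALL=C sort)
echo "$members"

# --strip-components=1: drop the leading "project/" while extracting.
tar -xzf "$work/project.tgz" --strip-components=1 -C "$work/out"
ls "$work/out"
```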

Worked examples

# Create a gzipped backup
tar -czvf backup.tar.gz folder/

# Extract into the current directory (add -C <dir> to extract elsewhere)
tar -xzvf backup.tar.gz

# Create, excluding heavy directories
tar --exclude='node_modules' --exclude='.git' \
    -czf src.tgz src/

# Preview the contents without extracting
tar -tzvf backup.tar.gz | head

# Extract dropping the top-level directory
tar -xJf big.tar.xz --strip-components=1 -C /tmp

# Reproducible archive (byte-identical across runs)
tar --sort=name --owner=0 --group=0 --numeric-owner \
    --mtime='@0' -cf src.tar src/

# Pipe across SSH (no temp file)
tar -czf - data/ | ssh host 'tar -xzf - -C /backup'
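
The reproducibility flags can be checked rather than taken on trust. This sketch assumes GNU tar 1.28 or newer (for --sort) and uses a hypothetical build helper: it archives the same tree twice, changes a file timestamp in between, and compares the two archives byte for byte.

```shell
work=$(mktemp -d)
mkdir -p "$work/src/lib"
printf 'a\n' > "$work/src/a.txt"
printf 'b\n' > "$work/src/lib/b.txt"

build() {
  # $1 is the output path; the flags mirror the reproducible example above.
  tar -C "$work" --sort=name --owner=0 --group=0 --numeric-owner \
      --mtime='@0' -cf "$1" src
}

build "$work/first.tar"
# Change a timestamp between runs; --mtime='@0' normalises it away.
touch "$work/src/a.txt"
build "$work/second.tar"

if cmp -s "$work/first.tar" "$work/second.tar"; then
  result=identical
else
  result=different
fi
echo "$result"
```

If the archive is then gzipped, gzip -n keeps the gzip header free of its own name and timestamp fields.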

Gotchas: GNU vs BSD tar, paths, and security

GNU vs BSD tar. macOS ships BSD tar (libarchive), which lacks some GNU long options and uses -s for substitution instead of --transform. Scripts that must run on both should stick to short flags and avoid GNU-only extensions like --anchored.
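
One way to cope in a portable script is to detect the implementation and branch. The sketch below is an assumption-laden illustration: tar_flavor is a hypothetical helper that inspects the first line of tar --version, and the rename from src to pkg is only there to exercise --transform on GNU tar and -s on bsdtar.

```shell
tar_flavor() {
  # Both implementations answer --version; the wording differs.
  case "$(tar --version 2>/dev/null | head -n 1)" in
    *"GNU tar"*) echo gnu ;;
    *bsdtar*)    echo bsd ;;
    *)           echo unknown ;;
  esac
}

flavor=$(tar_flavor)
echo "detected: $flavor"

work=$(mktemp -d)
mkdir -p "$work/src"
printf 'x\n' > "$work/src/file.txt"

# Rename the top-level directory inside the archive, per implementation.
case "$flavor" in
  gnu) tar -C "$work" --transform='s/^src/pkg/' -cf "$work/pkg.tar" src ;;
  bsd) tar -C "$work" -s '/^src/pkg/' -cf "$work/pkg.tar" src ;;
  *)   tar -C "$work" -cf "$work/pkg.tar" src ;;  # no rename available
esac
tar -tf "$work/pkg.tar"
```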

Absolute paths. Modern tar strips a leading / on create by default; pass --absolute-names (-P) only if you know what you are doing, because an archive with absolute paths can overwrite system files on extraction.

Path traversal. Older tar and naive Python tarfile users were vulnerable to entries like ../../etc/passwd. Modern tar refuses these by default; if you write your own extractor, validate every member before unpacking.
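
As a rough illustration of such validation, the hypothetical helper below reads member names on stdin and rejects absolute paths and any name with a .. component. It is a sketch only: it does not inspect symlink or hardlink targets and assumes member names contain no newlines.

```shell
members_are_safe() {
  # Reads member names on stdin, one per line; fails on the first risky one.
  while IFS= read -r name; do
    case "$name" in
      /*|..|../*|*/..|*/../*)
        echo "unsafe member: $name" >&2
        return 1
        ;;
    esac
  done
  return 0
}

# Usage (archive.tar and dest/ are placeholders):
#   tar -tf archive.tar | members_are_safe && tar -xf archive.tar -C dest/

if printf 'src/\nsrc/a.txt\n' | members_are_safe; then
  echo "clean listing accepted"
fi
```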

Windows. Windows 10 (since version 1803) and Windows 11 include a native tar.exe based on bsdtar; older versions need 7-Zip, WSL, or Git Bash. .tar.gz survives the round-trip fine, but be careful with UNIX permissions and symlinks, since NTFS handles both only partially.

FAQ

Is .tar.gz safe to share? The format itself is safe; what matters is whether the producer is trusted. Verify with a detached signature (gpg --verify) or a checksum from a separate channel.
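
The checksum half of that advice might look like the sketch below. It assumes GNU coreutils sha256sum (on macOS, shasum -a 256 is the usual substitute) and a made-up release.tar.gz; in practice the .sha256 file would travel over a separate channel from the archive.

```shell
work=$(mktemp -d)
mkdir -p "$work/data"
printf 'payload\n' > "$work/data/file.txt"
tar -C "$work" -czf "$work/release.tar.gz" data

# Producer side: record the digest next to the archive name.
(cd "$work" && sha256sum release.tar.gz > release.tar.gz.sha256)

# Consumer side: verification succeeds only if the bytes still match.
if (cd "$work" && sha256sum -c release.tar.gz.sha256 >/dev/null 2>&1); then
  verdict=ok
else
  verdict=mismatch
fi
echo "$verdict"
```

A checksum only detects modification; a detached signature checked with gpg --verify additionally ties the archive to a key you trust.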

Can I open .tar.gz on Windows? Yes: 7-Zip, WinRAR, the built-in tar on Windows 10 and 11, or WSL all work. Some graphical tools may open the archive in two steps (the .gz first, then the .tar inside it); command-line tar unwraps both at once with tar -xzf.

What is the best compressor for large backups? zstd --long=27 -19 is a strong default: roughly the same ratio as xz on big trees but 5-10x faster to compress and 3-5x faster to decompress.

How do I append a file to a .tar.gz? You cannot, because tar can only append to an uncompressed archive and a gzip stream cannot be updated in place. Either re-create the archive, or use plain .tar with -r and recompress at the end.
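
The decompress, append, recompress route can be sketched as follows, using made-up file names in a temporary directory.

```shell
work=$(mktemp -d)
mkdir -p "$work/files"
printf 'old\n' > "$work/files/old.txt"
printf 'new\n' > "$work/files/new.txt"
tar -C "$work/files" -czf "$work/bundle.tar.gz" old.txt

# 1. Decompress: gzip -d replaces bundle.tar.gz with bundle.tar.
gzip -d "$work/bundle.tar.gz"
# 2. Append with -r, which works on the uncompressed archive.
tar -C "$work/files" -rf "$work/bundle.tar" new.txt
# 3. Recompress: gzip replaces bundle.tar with bundle.tar.gz.
gzip "$work/bundle.tar"

contents=$(tar -tzf "$work/bundle.tar.gz")
echo "$contents"
```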

Why is my second tar smaller even though nothing changed? Filesystem inode order and metadata timestamps differ between runs. For reproducible builds use --sort=name --mtime='@0' --owner=0 --group=0 --numeric-owner.
