Skip to content

macOS tool to execute a declarative script of actions, aka a playlist

License

Notifications You must be signed in to change notification settings

abra-code/replay

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

74 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

replay

A macOS tool to execute a list of declared actions. Currently supported actions are:

  • file operations like clone, move, create, delete,
  • execution of a child tool,
  • text operations like echo

Key features:

  • concurrent operations for fastest execution with optional automatic dependency resolution
  • serial operations supported if a sequence is required
  • designed to improve performance of shell scripts running a series of slow tasks, which underutilize the CPU, storage or network resources
  • self contained code - not calling external tools to perform file operations
  • supports cloning on APFS so duplicates don't take unnecessary space on disk
  • companion "dispatch" tool helps with ad hoc task distribution without the need to create a playlist

The documentation is in both tools' help as below. Example usage in test scripts accompanying this code.
Example application can be found in Delta.app, where it is combined with find tool and delivers about 5-fold speed increase:
https://github.com/abra-code/DeltaApp

The content of replay --help:


replay -- execute a declarative script of actions, aka a playlist

Usage:

  replay [options] [playlist_file.json|plist]

Options:

  -k, --playlist-key KEY   Use a key in root dictionary of the playlist file for action steps array.
                     If absent, the playlist file root container is assumed to be an array of action steps.
                     The key may be specified multiple times to execute more than one playlist in the file.
  -s, --serial       Execute actions serially in the order specified in the playlist (slow).
                     Default behavior is to execute actions concurrently, if possible, after dependency analysis (fast).
  -p, --no-dependency   An option for concurrent execution to skip dependency analysis. Actions must be independent.
  -o, --ordered-output   In simple concurrent execution mode preserve the order of printed task outputs as specified
                     in the playlist. The tasks are still executed concurrently without order guarantee
                     but printing is ordered. Ignored in serial execution and concurrent execution with dependencies.
  -t, --max-tasks NUMBER   Maximum number of concurrently executed actions. Default is 0, which is treated as unbound.
                     Limiting the number of concurrent operations may sometimes result in faster execution.
                     With intensive file I/O tasks a low number like 4 may yield the best performance.
                     For CPU intensive tasks you may use logical CPU core count (or a multiplier) as obtained by:
                     sysctl -n hw.ncpu
  -e, --stop-on-error   Stop executing the remaining playlist actions on first error.
  -f, --force        If the file operation fails, delete destination and try again.
  -n, --dry-run      Show a log of actions which would be performed without running them.
  -v, --verbose      Show a log of actions while they are executed.
  -r, --start-server BATCH_NAME   Start server and listen for dispatch requests. "BATCH_NAME" must be a unique name
                     identifying a group of actions to be executed concurrently. Subsequent requests to add actions
                     with "dispatch" tool must refer to the same name. "replay" server listens to request messages
                     sent by "dispatch". If the server is not running for given batch name, the first request to add
                     an action starts "replay" in server mode. Therefore starting the server manually is not required
                     but it is possible if needed.
  -l, --stdout PATH  log standard output to provided file path.
  -m, --stderr PATH  log standard error to provided file path.
  -V, --version      Display version.
  -h, --help         Display this help.

Playlist format:

  Playlists can be composed in plist or JSON files.
  In the usual form the root container of a plist or JSON file is a dictionary,
  where you can put one or more playlists with unique keys.
  A playlist is an array of action steps.
  Each step is a dictionary with action type and parameters. See below for actions and examples.
  If you don't specify the playlist key, the root container is expected to be an array of action steps.
  More than one playlist may be present in a root dictionary. For example, you may want preparation steps
  in one playlist to be executed by "replay" invocation with --serial option
  and have another concurrent playlist with the bulk of work executed by a second "replay" invocation

Environment variables expansion:

  Environment variables in form of ${VARIABLE} are expanded in all paths.
  New file content may also contain environment variables in its body (with an option to turn off expansion).
  Missing environment variables or malformed text is considered an error and the action will not be executed.
  It is easy to make a mistake and allowing evironment variables resolved to empty would result in invalid paths,
  potentially leading to destructive file operations.

Dependency analysis:

  In default execution mode (without --serial or --no-dependency option) "replay" performs dependency analysis
  and constructs an execution graph based on files consumed and produced by actions.
  If a file produced by action A is needed by action B, action B will not be executed until action A is finished.
  For example: if your playlist contains an action to create a directory and other actions write files
  into this directory, all these file actions will wait for directory creation to be finished and then they will
  be executed concurrently if otherwise independent from each other.
  Concurrent execution imposes a couple of rules on actions:
  1. No two actions may produce the same output. With concurrent execution this would produce undeterministic results
     depending on which action happened to run first or fail if they accessed the same file for writing concurrently.
     "replay" will not run any actions when this condition is detected during dependency analysis.
  2. Action dependencies must not create a cycle. In other words the dependency graph must be a proper DAG.
     An example cycle is one action copying file A to B and another action copying file B to A.
     Replay algorithm tracks the number of unsatisifed dependencies for each action. When the number drops to zero,
     the action is dispatched for execution. For actions in a cycle that number never drops to zero and they can
     never be dispatched. After all dispatched tasks are done "replay" verifies all actions in the playlist
     were executed and reports a failure if they were not, listing the ones which were skipped.
  3. Deletion and creation of the same file or directory in one playlist will result in creation first and
     deletion second because the deletion consumes the output of creation. If deletion is a required preparation step
     it should be executed in a separate playlist before the main tasks are scheduled. You may pass --playlist-key
     multiple times as a parameter and the playlists will be executed one after another in the order specified.
  4. Moving or deleting an item makes it unusable for other actions at the original path. Such actions are exclusive
     consumers of given input paths and cannot share their inputs with other actions. Producing additional items under
     such exclusive input paths is also not allowed. "replay" will report an error during dependency analysis
     and will not execute an action graph with exclusive input violations.

Actions and parameters:

  clone       Copy file(s) from one location to another. Cloning is supported on APFS volumes.
              Source and destination for this action can be specified in 2 ways.
              One to one:
    from      Source item path.
    to        Destination item path.
              Or many items to destination directory:
    items     Array of source item paths.
    destination directory   Path to output folder.
  copy        Synonym for clone. Functionally identical.
  move        Move a file or directory.
              Source and destination for this action can be specified the same way as for "clone".
              Source path(s) are invalidated by "move" so they are marked as exclusive in concurrent execution.
  hardlink    Create a hardlink to source file.
              Source and destination for this action can be specified the same way as for "clone".
  symlink     Create a symlink pointing to original file.
              Source and destination for this action can be specified the same way as for "clone".
    validate   Bool value to indicate whether to check for the existence of source file. Default is true.
              It is usually a mistake if you try to create a symlink to nonexistent file,
              that is why "validate" is true by default but it is possible to create a dangling symlink.
              If you know what you are doing and really want that behavior, set "validate" to false.
  create      Create a file or a directory.
              You can create either a file with optional content or a directory but not both in one action step.
    file      New file path (only for files).
    content   New file content string (only for files).
    raw       Bool value to indicate whether environment variables should be expanded or not in content text.
              Default value is "false", meaning that environment variables are expanded.
              Pass "true" if you want to write a script with some ${VARIABLE} usage
    directory   New directory path. All directories leading to the deepest one are created if they don't exist.
  delete      Delete a file or directory (with its content).
              CAUTION: There is no warning or user confirmation requested before deletion.
    items     Array of item paths to delete (files or directories with their content).
              Item path(s) are invalidated by "delete" so they are marked as exclusive in concurrent execution.
  execute     Run an executable as a child process.
    tool      Full path to a tool to execute.
    arguments   Array of arguments to pass to the tool (optional).
    inputs    Array of file paths read by the tool during execution (optional).
    exclusive inputs    Array of file paths invalidated (items deleted or moved) by the tool (rare, optional).
    outputs   Array of file paths writen by the tool during execution (optional).
    stdout    Bool value to indicate whether the output of the tool should be printed to stdout (optional).
              Default value is "true", indicating the output from child process is printed to stdout.
  echo        Print a string to stdout.
    text      The text to print.
    raw       Bool value to indicate whether environment variable expansion should be suppressed. Default is "false".
    newline   Bool value to indicate whether the output string should be followed by newline. Default is "true".

Streaming actions through stdin pipe:

"replay" allows sending a stream of actions via stdin when the playlist file is not specified.
Actions may be executed serially or concurrently but without dependency analysis.
Dependency analysis requires a complete set of actions to create a graph, while streaming
starts execution immediately as the action requests arrive.
Concurrent execution is default, which does not guarantee the order of actions but an option:
--ordered-output has been added to ensure the output order is the same as action scheduling order.
For example, while streaming actions A, B, C in that order, the execution may happen like this: A, C, B
but the printed output will still be preserved as A, B, C. This implies that that output of C
will be delayed if B is taking long to finish.

The format of streamed/piped actions is one action per line (not plist or json!), as follows:
- ignore whitespace characters at the beginning of the line, if any
- action and options come first in square brackets, e.g.: [clone], [move], [delete], [create file] [create directory]
  some options can be added as key=value as described in "Actions and parmeters" section above with examples below
- the first character following the closing square bracket ']' is used as a field delimiter for the parameters to the action
- variable length parameters are following, separated by the same field separator, specific to given action
Param interpretation per action
(examples use "tab" as a separator)
1. [clone], [move], [hardlink], [symlink] allows only simple from-to specification,
with first param interpretted as "from" and second as "to" e.g.:
[clone]	/path/to/src/file.txt	/path/to/dest/file.txt
[symlink validate=false]	/path/to/src/file.txt	/path/to/symlink/file.txt
2. [delete] is followed by one or many paths to items, e.g.:
[delete]	/path/to/delete/file1.txt	/path/to/delete/file2.txt
3. [create] has 2 variants: [create file] and [create directory].
If "file" or "directory" option is not specified, it falls back to "file"
A. [create file] requires path followed by optional content, e.g.:
[create file]	/path/to/create/file.txt	Created by replay!
[create file raw=true]	/path/to/file.txt	Do not expand environment variables like ${HOME}
B. [create directory] requires just a single path, e.g.:
[create directory]	/path/to/create/directory
4. [execute] requires tool path and may have optional parameters separated with the same delimiter (not space delimited!), e.g.:
[execute]	/bin/echo	Hello from replay!
[execute stdout=false]	/bin/echo	This will not be printed
The following example uses a different separator: "+" to explicitly show delimited parameters:
[execute]+/bin/sh+-c+/bin/ls ${HOME} | /usr/bin/grep ".txt"
5. [echo] requires one string after separator. Supported modifiers are raw=true and newline=false

Example JSON playlist:

{
  "Shepherd Playlist": [
    {
      "action": "create",
      "directory": "${HOME}/Pen",
    },
    {
      "action": "clone",
      "from": "${HOME}/sheep.txt",
      "to": "${HOME}/Pen/clone.txt",
    },
    {
      "action": "move",
      "items": [
          "${HOME}/sheep1.txt",
          "${HOME}/sheep2.txt",
          "${HOME}/sheep3.txt",
          ],
      "destination directory": "${HOME}/Pen",
    },
    {
      "action": "delete",
      "items": [
          "${HOME}/Pen/clone.txt",
          ],
    },
  ],
}

Example plist playlist:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>Shepherd Playlist</key>
    <array>
        <dict>
            <key>action</key>
            <string>create</string>
            <key>directory</key>
            <string>${HOME}/Pen</string>
        </dict>
        <dict>
            <key>action</key>
            <string>clone</string>
            <key>from</key>
            <string>${HOME}/sheep.txt</string>
            <key>to</key>
            <string>${HOME}/Pen/clone.txt</string>
        </dict>
        <dict>
            <key>action</key>
            <string>move</string>
            <key>items</key>
            <array>
                <string>${HOME}/sheep1.txt</string>
                <string>${HOME}/sheep2.txt</string>
                <string>${HOME}/sheep3.txt</string>
            </array>
            <key>destination directory</key>
            <string>${HOME}/Pen</string>
        </dict>
        <dict>
            <key>action</key>
            <string>delete</string>
            <key>items</key>
            <array>
                <string>${HOME}/Pen/clone.txt</string>
            </array>
        </dict>
    </array>
</dict>
</plist>

Example execution:

  replay --dry-run --playlist-key "Shepherd Playlist" shepherd.plist

In the above example playlist some output files are inputs to later actions.
The dependency analysis will create an execution graph to run dependent actions after the required outputs are produced.

See also:

  dispatch --help

The content of dispatch --help:


dispatch -- companion tool for "replay" to simplify adding tasks for concurrent execution

Usage:

  dispatch batch-name [action-name] [action params]

Description:

  "dispatch" starts "replay" in server mode as a background process and sends tasks to it.
Batch name is a required user-provided parameter to all invocations identifying a task batch.
A batch can be understood as a single job with mutiple tasks. Each instance of "replay"
running in a server mode is associated with one uniqueley named batch/job.
"dispatch" is just a client-facing helper tool to send tasks to "replay" server.
It is intended for ad hoc execution of unstructured tasks when the rate of scheduling tasks
is higher than their execution time and they can be run concurrently.
Invoking "dispatch batch-name wait" at the end allows the client script to wait for all
scheduled tasks to finish.
A typical sequence of calls could be demonstrated by the following example:

   dispatch example-batch echo "Starting the batch job"
   dispatch example-batch create file ${HOME}/she-sells.txt 'she sells'
   dispatch example-batch execute /bin/sh -c "/bin/echo 'sea shells' > ${HOME}/shells.txt"
   dispatch example-batch execute /bin/sleep 10
   dispatch example-batch wait

The first invocation of "dispatch" for unique batch name starts a new instance of "replay"
server with default parameters. If you wish to control "replay" behavior you can start it
explicitly with "start" action and provide parameters to forward to "replay", for example:

   dispatch example-batch start --verbose --ordered-output --stop-on-error

Subsequent use of "start" action for the same batch name will not restart the server
but a warning will be printed about already running server instance.

Supported actions are the same as "replay" actions plus a couple of special control words:

   start [replay options]
   clone /from/item/path /to/item/path
   copy /from/item/path /to/item/path
   move /from/item/path /to/item/path
   hardlink /from/item/path /to/item/path
   symlink /from/item/path /to/item/path
   create file /path/to/new/file "New File Content"
   create directory /path/to/new/dir
   delete /path/to/item1 /path/to/item2 /path/to/itemN
   execute /path/to/tool param1 param2 paramN
   echo "String to print"
   wait

If invoked without any action name, "dispatch" opens a standard input for streaming actions
in the same format as accepted by "replay" tool, for example:

   echo "[echo]|Streaming actions" | dispatch stream-job
   echo "[execute]|/bin/ls|-la" | dispatch stream-job
   dispatch stream-job wait

With a couple of notes:
 - you cannot pass "start" and "wait" options that way - these are instructions for
   "dispatch" tool, not real actions to forward to "replay".
 - each line is sent to "replay" server separately so it is not as performant as streaming
   actions directly to "replay" in regular, non-server mode.
 - "replay" stdout cannot be piped when executed this way but "replay" can be started
   with --stdout /path/to/log.out and --stderr /path/to/log.err options keep the logs.
 - a reminder that streaming actions as text requires parameters to be separated by some
   non-interfering field separator (vertical bar in the above example).


Options:

  -V, --version      Display version.
  -h, --help         Display this help

See also:

  replay --help

The content of fingerprint --help:


Usage: fingerprint [-g, --glob=PATTERN]... [OPTIONS]... [PATH]...
Calculate a combined hash, aka fingerprint, of all files in specified path(s) matching the GLOB pattern(s)
OPTIONS:
  -g, --glob=PATTERN  Glob patterns (repeatable, unexpanded) to match files under directories
  -H, --hash=ALGO     File content hash algorithm: crc32c (default) or blake3
  -F, --fingerprint-mode=MODE  Options to include paths in final fingerprint:
        default  : only file content hashes (rename-insensitive) - default if not specified
        absolute : include full absolute paths (detects moves/renames)
        relative : use relative paths when under searched dirs (recommended)
  -X, --xattr=MODE    Control extended attribute (xattr) hash caching:
        on      : use cache if valid, update if changed - default
        off     : disable xattr caching
        refresh : force recompute and update xattrs
        clear   : disable caching and delete existing xattrs
  -I, --inputs=FILE   Read input paths from FILE (one path per line, repeatable)
                      Supports Xcode .xcfilelist with ${VAR}/$(VAR) and plain lists.
  -l, --list          List matched files with their hashes
  -h, --help          Print this help message
  -V, --version       Display version.
  -v, --verbose       Print all status information

PATH arguments (positional) can be:
  - Directories for recursive traversal
  - Individual files to fingerprint directly
  - Symlinks (entire symlink chains are followed and fingerprinted)
  - Non-existent paths (treated as files with sentinel hash value)

Paths can be absolute or relative. Relative paths are resolved against the current directory.
Glob patterns apply only to files discovered during directory traversal, not to directly specified files.
When no glob pattern is specified, all files under provided directories are fingerprinted.

With --xattr=ON the tool caches computed file hashes and saves FileInfo in "public.fingerprint.crc32c"
or "public.fingerprint.blake3" xattr for files, depending on hash choice and then reads it back on next
fingerprinting if file inode, size and modification dates are unchanged.
FileInfo is a 32 byte structure:
	"inode" : 8 bytes,
	"size" : 8 bytes,
	"mtime_ns" : 8 bytes,
	{ crc32c : 4 bytes, reserved: 4 bytes } or blake3 : 8 bytes
xattr caching option significantly speeds up subsequent fingerprinting after initial hash calculation.
Turning it off makes the tool always perform file hashing, which might be justified in a zero trust
hostile environment at the file I/O and CPU expense. In a trusted or non-critical environment without malicious suspects,
the combination of lightweight crc32c and xattr caching provides excellent performance and very low chances of collisions.

About

macOS tool to execute a declarative script of actions, aka a playlist

Resources

License

Stars

Watchers

Forks

Packages

No packages published

Contributors 2

  •  
  •