Alto CLI
Anatomy of the Alto CLI
This is a work in progress. More commands and more documentation to come.
alto init
This command will generate a new Alto project in the current directory. It will prompt you for a project name and a configuration format. The project name determines where your cached PEX files, catalogs, state, and data reservoir are materialized. The configuration format determines the format of the alto.{toml,yaml,json} configuration file.
alto list
This command will list all of the top-level tasks in your project. These tasks are derived from the alto.{toml,yaml,json} configuration file. Each of these tasks has subtasks that are derived from the taps and targets keys in the configuration file.
alto list --all
This command will list all of the top-level tasks in your project, as well as all of the subtasks for each top-level task. This is useful for understanding the full breadth of tasks that are available to you.
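The colon in commands such as catalog:{tap} and {tap}:{target} suggests a top-level:subtask naming scheme. The Python sketch below illustrates that reading; it is not Alto's task generator. It builds task names from the command patterns on this page for one hypothetical tap (tap-foo) and one hypothetical target (target-bar), then treats the text before the first colon as the top-level task. The exact tasks and grouping that Alto prints may differ.

```python
from collections import defaultdict
from itertools import product


def task_names(taps: list, targets: list) -> list:
    """List task names following the command patterns documented on this page."""
    plugins = taps + targets
    names = [f"build:{p}" for p in plugins]        # alto build:{plugin}
    names += [f"catalog:{t}" for t in taps]        # alto catalog:{tap}
    names += [f"state:{t}" for t in taps]          # alto state:{tap}
    names += [f"config:{p}" for p in plugins]      # alto config:{tap or target}
    for tap, target in product(taps, targets):
        names.append(f"{tap}:{target}")            # alto {tap}:{target}
        names.append(f"reservoir:{tap}-{target}")  # alto reservoir:{tap}-{target}
    names += [f"{t}:reservoir" for t in taps]      # alto {tap}:reservoir
    return names


def group_by_top_level(names: list) -> dict:
    """Assume the text before the first ':' names the top-level task."""
    groups = defaultdict(list)
    for name in names:
        top, _, _ = name.partition(":")
        groups[top].append(name)
    return dict(groups)


groups = group_by_top_level(task_names(["tap-foo"], ["target-bar"]))
print(sorted(groups))     # top-level tasks, roughly what `alto list` shows
print(groups["tap-foo"])  # subtasks of one top-level task, as `alto list --all` adds
```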
alto {tap}:{target}
This command will run a data pipeline from the specified tap to the specified target. This is the most common command that you will run. It is similar to running tap | target but with all of the benefits of a declarative configuration file, automatic plugin caching, state management, and so on.
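For comparison, here is a minimal Python sketch of the bare tap | target pipe using the standard subprocess module: the tap's stdout is connected directly to the target's stdin. The two commands are stand-ins (a fake tap that emits Singer-style RECORD lines and a fake target that counts them); a real run would also pass config, catalog, and state files, which this sketch leaves out and which Alto manages for you.

```python
import subprocess
import sys


def run_pipe(tap_cmd: list, target_cmd: list) -> str:
    """Connect the tap's stdout to the target's stdin, like `tap | target`."""
    tap = subprocess.Popen(tap_cmd, stdout=subprocess.PIPE)
    target = subprocess.Popen(
        target_cmd, stdin=tap.stdout, stdout=subprocess.PIPE, text=True
    )
    # Close our copy of the pipe so the tap gets SIGPIPE if the target exits early.
    tap.stdout.close()
    out, _ = target.communicate()
    tap.wait()
    return out


# Stand-in "tap": prints two Singer-style RECORD messages, one JSON object per line.
fake_tap = [sys.executable, "-c",
            "import json\n"
            "for i in range(2):\n"
            "    print(json.dumps({'type': 'RECORD', 'stream': 'users', 'record': {'id': i}}))"]
# Stand-in "target": counts the lines it receives on stdin.
fake_target = [sys.executable, "-c",
               "import sys; print(len(sys.stdin.readlines()))"]
print(run_pipe(fake_tap, fake_target).strip())
```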
alto {tap}:reservoir
This command will run a data pipeline from the specified tap to the reservoir. The reservoir is a special target that is used to store the data that is produced by the tap. This is useful for replaying the data in a different pipeline and for preserving historical extracts.
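A minimal sketch of the idea behind a reservoir, assuming a hypothetical layout of one JSONL file per stream per extract (reservoir/<stream>/<extract_id>.jsonl). Alto's actual on-disk format is not described on this page, so the layout, the write_to_reservoir name, and the choice to keep only SCHEMA and RECORD messages are all assumptions made for illustration.

```python
import json
import tempfile
from pathlib import Path


def write_to_reservoir(lines, reservoir: Path, extract_id: str) -> list:
    """Append SCHEMA and RECORD messages to one JSONL file per stream.

    Hypothetical layout: reservoir/<stream>/<extract_id>.jsonl
    """
    written = set()
    for line in lines:
        msg = json.loads(line)
        if msg.get("type") not in ("SCHEMA", "RECORD"):
            continue  # STATE and other message types are not stored in this sketch
        path = reservoir / msg["stream"] / f"{extract_id}.jsonl"
        path.parent.mkdir(parents=True, exist_ok=True)
        with path.open("a", encoding="utf-8") as fh:
            fh.write(json.dumps(msg) + "\n")
        written.add(path)
    return sorted(written)


tap_output = [
    '{"type": "SCHEMA", "stream": "users", "schema": {"type": "object"}, "key_properties": ["id"]}',
    '{"type": "RECORD", "stream": "users", "record": {"id": 1}}',
    '{"type": "STATE", "value": {"bookmarks": {}}}',
]
with tempfile.TemporaryDirectory() as tmp:
    paths = write_to_reservoir(tap_output, Path(tmp), "extract-001")
    stored = [p.relative_to(tmp).as_posix() for p in paths]
    stored_lines = paths[0].read_text(encoding="utf-8").splitlines()
print(stored)
```

Keeping the SCHEMA message next to the records means a later replay can hand a target everything it needs to interpret those records.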
alto reservoir:{tap}-{target}
This command will run a data pipeline from the reservoir to the specified target. You can think of this as a replay of the data that was previously extracted by the tap. The reservoir:{tap} portion of the command denotes the source of the data, and the {target} portion denotes the destination of the data.
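Replaying can be sketched as reading stored messages back in order and feeding each line to a target's stdin. This sketch assumes a hypothetical layout of reservoir/<stream>/<extract_id>.jsonl files whose extract ids sort chronologically; the replay_from_reservoir name and the layout are illustrative, not Alto's implementation.

```python
import tempfile
from pathlib import Path
from typing import Iterator


def replay_from_reservoir(reservoir: Path) -> Iterator[str]:
    """Yield stored Singer messages stream by stream, oldest extract first.

    Assumes a hypothetical reservoir/<stream>/<extract_id>.jsonl layout in
    which extract ids sort chronologically.
    """
    for stream_dir in sorted(p for p in reservoir.iterdir() if p.is_dir()):
        for extract in sorted(stream_dir.glob("*.jsonl")):
            with extract.open(encoding="utf-8") as fh:
                for line in fh:
                    if line.strip():
                        yield line.rstrip("\n")


with tempfile.TemporaryDirectory() as tmp:
    users = Path(tmp) / "users"
    users.mkdir()
    # Written out of order on purpose; replay sorts files by extract id.
    (users / "extract-002.jsonl").write_text(
        '{"type": "RECORD", "stream": "users", "record": {"id": 2}}\n', encoding="utf-8")
    (users / "extract-001.jsonl").write_text(
        '{"type": "RECORD", "stream": "users", "record": {"id": 1}}\n', encoding="utf-8")
    # Each replayed line is what would be written to the target's stdin.
    replayed = list(replay_from_reservoir(Path(tmp)))
print(replayed)
```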
alto catalog:{tap}
This command will run discovery on the specified tap. This is useful for generating a base catalog for the tap. The base catalog is used to generate a runtime catalog that is used to run the tap. The runtime catalog is generated by applying the select and metadata keys from the configuration file to the base catalog. You will rarely need to run this command directly, as it is automatically run when you run a pipeline.
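The step from base catalog to runtime catalog can be sketched against the standard Singer catalog format, where each stream carries a metadata list keyed by breadcrumb. The pattern syntax here is an assumption, not Alto's documented syntax: select entries are treated as "<stream>.<property>" globs, and metadata is treated as a mapping from a stream glob to key/value pairs merged into that stream's top-level entry. The build_runtime_catalog name is hypothetical.

```python
import copy
from fnmatch import fnmatch


def build_runtime_catalog(base: dict, select: list, metadata: dict) -> dict:
    """Apply select patterns and extra metadata to a Singer base catalog.

    Assumed conventions: select entries are "<stream>.<property>" globs;
    metadata maps a stream glob to key/value pairs for the stream-level entry.
    """
    catalog = copy.deepcopy(base)  # leave the discovered base catalog untouched
    for stream in catalog["streams"]:
        name = stream["tap_stream_id"]
        entries = stream.get("metadata", [])
        any_selected = False
        # Property-level entries have breadcrumbs like ["properties", "id"].
        for entry in entries:
            crumb = entry["breadcrumb"]
            if len(crumb) == 2 and crumb[0] == "properties":
                chosen = any(fnmatch(f"{name}.{crumb[1]}", pat) for pat in select)
                entry["metadata"]["selected"] = chosen
                any_selected = any_selected or chosen
        # The stream-level entry has an empty breadcrumb.
        for entry in entries:
            if entry["breadcrumb"] == []:
                entry["metadata"]["selected"] = any_selected
                for pattern, extra in metadata.items():
                    if fnmatch(name, pattern):
                        entry["metadata"].update(extra)
    return catalog


base = {"streams": [{
    "tap_stream_id": "users",
    "schema": {"type": "object", "properties": {"id": {}, "email": {}}},
    "metadata": [
        {"breadcrumb": [], "metadata": {}},
        {"breadcrumb": ["properties", "id"], "metadata": {}},
        {"breadcrumb": ["properties", "email"], "metadata": {}},
    ],
}]}
runtime = build_runtime_catalog(
    base, select=["users.id"], metadata={"users": {"replication-method": "INCREMENTAL"}}
)
```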
alto state:{tap}
This command fetches the state for the specified tap. This is useful for debugging state issues. You will rarely need to run this command directly, as it is automatically run when you run a pipeline.
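When debugging state, it helps to know what a Singer state value looks like. Under the Singer convention, a tap reports progress through STATE messages, and the value of the most recent one holds the bookmarks a later run would start from. The sketch below pulls that value out of a list of Singer output lines; it shows the general convention only, and how Alto stores and fetches state is not modeled. The latest_state name and the sample bookmarks are hypothetical.

```python
import json
from typing import Iterable, Optional


def latest_state(lines: Iterable[str]) -> Optional[dict]:
    """Return the value of the last Singer STATE message, or None if there is none."""
    state = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        msg = json.loads(line)
        if msg.get("type") == "STATE":
            state = msg["value"]  # later STATE messages supersede earlier ones
    return state


output = [
    '{"type": "STATE", "value": {"bookmarks": {"users": {"updated_at": "2024-01-01"}}}}',
    '{"type": "RECORD", "stream": "users", "record": {"id": 1}}',
    '{"type": "STATE", "value": {"bookmarks": {"users": {"updated_at": "2024-01-02"}}}}',
]
print(json.dumps(latest_state(output), indent=2))
```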
alto config:{tap or target}
This command will generate a configuration template for the specified tap or target. This configuration is based completely on the config key of the tap or target. You will rarely need to run this command directly, as it is automatically run when you run a pipeline.
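A minimal sketch of rendering a plugin's config key into the JSON file a Singer tap or target reads through its --config argument. The project dictionary shape (taps and targets mapping plugin names to settings that contain a config key) mirrors the keys named on this page, but it is an assumed shape, and the render_config name and sample settings are hypothetical.

```python
import json


def render_config(project: dict, plugin: str) -> str:
    """Render the `config` key of a tap or target as pretty-printed JSON."""
    for section in ("taps", "targets"):
        if plugin in project.get(section, {}):
            settings = project[section][plugin].get("config", {})
            return json.dumps(settings, indent=2, sort_keys=True)
    raise KeyError(f"{plugin!r} is not declared under taps or targets")


project = {
    "taps": {"tap-foo": {"config": {"start_date": "2024-01-01"}}},
    "targets": {"target-bar": {"config": {"destination_path": "output"}}},
}
print(render_config(project, "target-bar"))
```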
alto build:{plugin}
This command will build a PEX file for the specified plugin.
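A PEX file is a self-contained, executable Python environment produced by the pex tool. The sketch below only assembles a pex command line that bundles a requirement and runs the named console script as its entry point, using pex's -c (console script) and -o (output file) options. The flags Alto actually passes are not documented here, so the pex_build_command name, the assumption that the console script shares the plugin's name, and the output location are illustrative.

```python
import shlex


def pex_build_command(plugin: str, requirement: str, output_dir: str = ".") -> list:
    """Assemble a `pex` invocation for one plugin (hypothetical helper).

    Assumes the plugin's console script has the same name as the plugin.
    """
    return [
        "pex",
        *shlex.split(requirement),           # requirement(s) to install into the PEX
        "-c", plugin,                        # console script used as the entry point
        "-o", f"{output_dir}/{plugin}.pex",  # where the built PEX file is written
    ]


print(shlex.join(pex_build_command("tap-foo", "tap-foo==1.0.0", "cache")))
```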
alto dump
This command will dump the complete rendered configuration of the Alto project. This is useful for debugging configuration issues. It is also a convenient way to ship your project to other machines. You can think of the output as a "compiled" version of your project. Run this command on your local machine and copy the output. If you write that output to an alto.json file anywhere, your project should be able to run on any machine that has Alto installed, barring any file paths that are hard-coded in the configuration.
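Because hard-coded file paths are the main obstacle to running a dumped project elsewhere, a quick check of the rendered configuration can help before shipping it. This sketch parses a JSON string standing in for the dumped output (the sample content is hypothetical) and reports every string value that is an absolute path on the current platform, together with its location in the tree. The find_hardcoded_paths name is hypothetical, and relative paths are deliberately not flagged.

```python
import json
import os


def find_hardcoded_paths(node, where="$"):
    """Yield (location, value) for each absolute file path string in a config tree."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from find_hardcoded_paths(value, f"{where}.{key}")
    elif isinstance(node, list):
        for i, value in enumerate(node):
            yield from find_hardcoded_paths(value, f"{where}[{i}]")
    elif isinstance(node, str) and os.path.isabs(node):
        yield where, node


# Stand-in for the dumped configuration that would be written to alto.json.
dumped = '{"taps": {"tap-foo": {"config": {"start_date": "2024-01-01", "key_file": "/home/me/key.json"}}}}'
problems = list(find_hardcoded_paths(json.loads(dumped)))
print(problems)
```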