Alto Next Steps

The quickstart guide is a great way to get to the a-ha moment of Alto quickly. However, it is helpful to move beyond the quickstart and see a more contrived example of what a real project might look like. This will expose you to more of the possibilities of Alto. While the rest of the learning content will have TOML, YAML, and JSON examples, we will use TOML in this brief exposure for a more refined and terse config file. Exposure is the key here, so don't worry about understanding everything right away. We will cover the details in the next section.

Configuration

What you will notice right away is that the configuration can be very concise. These are the same three plugins that alto init generates, but with more liberties taken to show off the flexibility of the configuration, and with most comments removed.

alto.toml
[default]
project_name = "{project}"
load_path = "raw"
extensions = ["evidence"]
environment.STARTER_PROJECT = 1
# https://github.com/dlt-hub/dlt
utilities.dlt.pip_url = "python-dlt[duckdb]>=0.2.0a25"
utilities.dlt.environment.PEX_INHERIT_PATH = "fallback"

[default.taps]
# https://gitlab.com/meltano/tap-carbon-intensity
carbon-data.pip_url = "git+https://gitlab.com/meltano/tap-carbon-intensity.git#egg=tap_carbon_intensity"
carbon-data.executable = "tap-carbon-intensity"
carbon-data.load_path = "carbon_intensity"
carbon-data.capabilities = ["state", "catalog"]
carbon-data.select = ["*.*", "~*.dnoregion"]

# https://hub.meltano.com/extractors/tap-bls
labor-data.pip_url = "git+https://github.com/frasermarlow/tap-bls#egg=tap_bls"
labor-data.executable = "tap-bls"
labor-data.capabilities = ["state", "catalog"]
labor-data.load_path = "bls"
labor-data.select = ["JTU000000000000000JOR", "JTU000000000000000JOL"]
labor-data.config.startyear = "2019"
labor-data.config.endyear = "2020"
labor-data.config.calculations = "true"
labor-data.config.annualaverage = "false"
labor-data.config.aspects = "false"
labor-data.config.disable_collection = "true"
labor-data.config.update_state = "false"
labor-data.config.series_list_file_location = "./series.json"

[default.targets]
# https://hub.meltano.com/loaders/target-jsonl
jsonl.pip_url = "target-jsonl==0.1.4"
jsonl.executable = "target-jsonl"
jsonl.config.destination_path = "@format output/{this.load_path}"

[github_actions]
load_path = "cicd"
targets.jsonl.config.destination_path = "@format /github/workspace/output/{this.load_path}"
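
Note the [github_actions] table: it is an environment overlay, so any key set there overrides the corresponding [default] value when that environment is active. As a sketch of how you might select it (assuming, as the "Environment" line in the listing below suggests, the active environment is chosen via an ALTO_ENV environment variable — check your version's docs for the exact name):

ALTO_ENV=github_actions alto carbon-data:jsonl  # destination_path now renders under /github/workspace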

Listing the available commands

You will notice that, because alto is entirely driven by configuration, the listed commands take advantage of the names we have given to the plugins. This is an extremely easy way of adding context to your CLI experience.

(env) vscode ➜ /workspaces/alto/example_1 (main) $ alto list --all
📦 Alto version: 0.2.10
🏗 Working directory: .
🌎 Environment: DEVELOPMENT

👷 build [core] Generate pex plugin based on the alto config.
👷 build:carbon-data Build the carbon-data plugin
👷 build:labor-data Build the labor-data plugin
👷 build:jsonl Build the jsonl plugin
👷 build:dlt Build the dlt plugin
🛠 config [core] Generate configuration files on disk.
🛠 config:carbon-data Render configuration for the carbon-data plugin
🛠 config:labor-data Render configuration for the labor-data plugin
🛠 config:jsonl Render configuration for the jsonl plugin
🛠 config:jsonl--carbon-data Render configuration for the jsonl plugin with carbon-data as source
🛠 config:jsonl--labor-data Render configuration for the jsonl plugin with labor-data as source
📖 catalog [singer] Generate base catalog file for a Singer tap.
📖 catalog:carbon-data Generate base catalog for carbon-data
📖 catalog:labor-data Generate base catalog for labor-data
📦 apply [singer] Apply user config to base catalog file.
📦 apply:carbon-data Render runtime catalog for carbon-data
📦 apply:labor-data Render runtime catalog for labor-data
🔌 carbon-data [singer] Execute a data pipeline.
🔌 carbon-data:jsonl Run the carbon-data to jsonl data pipeline
🔌 labor-data [singer] Execute a data pipeline.
🔌 labor-data:jsonl Run the labor-data to jsonl data pipeline
💧 reservoir [singer] Execute a data pipeline.
💧 reservoir:carbon-data-jsonl Run the carbon-data to jsonl data pipeline from the reservoir to the target
💧 reservoir:labor-data-jsonl Run the labor-data to jsonl data pipeline from the reservoir to the target
🔌 carbon-data:reservoir Run the carbon-data to reservoir data pipeline to the reservoir from the tap
🔌 labor-data:reservoir Run the labor-data to reservoir data pipeline to the reservoir from the tap
🧪 test [singer] Run tests for taps.
🧪 test:carbon-data Test the carbon-data plugin
🧪 test:labor-data Test the labor-data plugin
💁 about [singer] Run the about command for a Singer tap.
🚀 evidence [extension] Evidence.dev extension.
🚀 evidence:initialize Generate Evidence project from template.
🚀 evidence:build Build the Evidence dev reports.
🚀 evidence:dev Run the Evidence dev server.

Running the pipelines

Tap to Target

Given the above configuration, you can run the following commands to extract data from the BLS (Bureau of Labor Statistics) and Carbon Intensity APIs and load it into JSONL files.

alto carbon-data:jsonl
alto labor-data:jsonl
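
Because destination_path is declared with the @format token, {this.load_path} is interpolated per tap, so each pipeline writes to its own directory. You can verify where the files landed (these paths follow from the load_path values in the config above):

ls output/carbon_intensity  # files written by carbon-data:jsonl
ls output/bls               # files written by labor-data:jsonl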

Tap to Reservoir

Alternatively, you can run the following commands to extract data from the BLS (Bureau of Labor Statistics) and Carbon Intensity APIs and load it into the project reservoir.

alto carbon-data:reservoir
alto labor-data:reservoir
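
A common pattern is to land data in the reservoir first and replay it into targets later, without hitting the source APIs again; both commands below appear in the listing above:

alto carbon-data:reservoir        # tap -> reservoir
alto reservoir:carbon-data-jsonl  # reservoir -> target, no re-extraction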

Reservoir to Target

From the reservoir, you can replay the extracted data (since it is still in Singer format and partitioned for parallelization) into any number of targets, any number of times.

alto reservoir:carbon-data-jsonl
alto reservoir:carbon-data-snowflake # here as example, not in config
alto reservoir:carbon-data-parquet # here as example, not in config
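
The snowflake and parquet replays above would only exist if matching targets were declared in alto.toml. A minimal sketch of what adding one might look like, modeled on the jsonl target (the pip package name and config keys here are assumptions to verify against the loader's documentation):

[default.targets]
# https://hub.meltano.com/loaders/target-parquet
parquet.pip_url = "target-parquet"  # assumed package name
parquet.executable = "target-parquet"
parquet.config.destination_path = "@format output/{this.load_path}"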

Invoking utilities

Lastly, you can invoke the dlt utility defined above (in fact, you can invoke any plugin this way).

alto invoke dlt --help  # invoke the executable directly

alto invoke python dlt # drop into a python shell with dlt installed
alto invoke python dlt ./path/to/pipeline.py # run a python script
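
To give a sense of what such a script might contain, here is a minimal sketch of a dlt pipeline that loads the JSONL output from above into DuckDB. The file name, input path, and table/dataset names are illustrative, and the dlt calls assume its standard pipeline API:

# pipeline.py -- illustrative sketch; run with: alto invoke python dlt ./pipeline.py
import json
from pathlib import Path

import dlt

def carbon_records():
    # Read the JSONL files written by the jsonl target (path assumed from the config above)
    for path in Path("output/carbon_intensity").glob("*.jsonl"):
        with path.open() as fh:
            for line in fh:
                yield json.loads(line)

pipeline = dlt.pipeline(pipeline_name="carbon", destination="duckdb", dataset_name="carbon")
print(pipeline.run(carbon_records(), table_name="intensity"))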