Orchestrators

zenml.orchestrators special

Initialization for ZenML orchestrators.

An orchestrator is a special kind of backend that manages the running of each step of the pipeline. Orchestrators administer the actual pipeline runs. You can think of it as the 'root' of any pipeline job that you run during your experimentation.

ZenML supports a local orchestrator out of the box which allows you to run your pipelines in a local environment. We also support using Apache Airflow as the orchestrator to handle the steps of your pipeline.
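
As a quick orientation, the sketch below shows how a pipeline reaches an orchestrator from the user's side: the pipeline instance's run() call hands the pipeline to the orchestrator of the active stack (for example the local orchestrator of the default stack). This is a minimal, hypothetical example that assumes the decorator-based @step/@pipeline API of this ZenML release; the step names and logic are purely illustrative.

from zenml.pipelines import pipeline
from zenml.steps import step


@step
def importer() -> int:
    """Hypothetical step producing a value."""
    return 42


@step
def doubler(value: int) -> int:
    """Hypothetical step consuming that value."""
    return value * 2


@pipeline
def my_pipeline(importer, doubler):
    # Connect the steps; the orchestrator later decides how they are executed.
    doubler(importer())


if __name__ == "__main__":
    # run() is the entrypoint into BaseOrchestrator.run() via the active
    # stack, e.g. the local orchestrator which executes steps sequentially.
    my_pipeline(importer=importer(), doubler=doubler()).run()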

base_orchestrator

Base orchestrator class.

BaseOrchestrator (StackComponent, ABC) pydantic-model

Base class for all orchestrators.

In order to implement an orchestrator you will need to subclass from this class.

How it works:

The run() method is the entrypoint that is executed when the pipeline's run method is called within the user code (pipeline_instance.run()).

This method will take the ZenML Pipeline instance and prepare it for eventual execution. To do this the following steps are taken:

  • The underlying protobuf pipeline is created.

  • Within the _configure_node_context() method, the pipeline requirements, stack and runtime configuration are added to the step context.

  • The _get_sorted_steps() method then generates a sorted list of steps which will later be used to directly execute these steps in order, or to easily build a DAG.

  • After these initial steps comes the most crucial one. Within the prepare_or_run_pipeline() method each orchestrator will have its own implementation that dictates the pipeline orchestration. In the simplest case this method will iterate through all steps and execute them one by one. In other cases this method will build and deploy an intermediate representation of the pipeline (e.g. an Airflow DAG or a Kubeflow Pipelines YAML) to be executed within the orchestrator's environment.

Building your own:

To build your own orchestrator, all you need to do is subclass this class and implement your own prepare_or_run_pipeline() method. Overriding other methods is possible but NOT recommended. See the docstring of the prepare_or_run_pipeline() method for details of what needs to be implemented within it. A minimal sketch follows.
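
The following is a minimal sketch of such a subclass for the simple case: it iterates over the already sorted steps and executes each one in place via run_step(). The class name, flavor string and import paths are assumptions made for illustration, not part of the API documented on this page.

from typing import TYPE_CHECKING, Any, ClassVar, List

from tfx.proto.orchestration.pipeline_pb2 import Pipeline as Pb2Pipeline

from zenml.orchestrators.base_orchestrator import BaseOrchestrator
from zenml.steps import BaseStep

if TYPE_CHECKING:
    from zenml.pipelines import BasePipeline
    from zenml.runtime_configuration import RuntimeConfiguration
    from zenml.stack import Stack


class SequentialOrchestrator(BaseOrchestrator):
    """Hypothetical orchestrator that runs all steps in-process, in order."""

    FLAVOR: ClassVar[str] = "sequential"  # illustrative flavor name

    def prepare_or_run_pipeline(
        self,
        sorted_steps: List[BaseStep],
        pipeline: "BasePipeline",
        pb2_pipeline: Pb2Pipeline,
        stack: "Stack",
        runtime_configuration: "RuntimeConfiguration",
    ) -> Any:
        # Simple case: no intermediate representation is built, each step is
        # triggered directly in the current environment via `run_step()`.
        assert runtime_configuration.run_name, "Run name must be set"
        for step in sorted_steps:
            self.run_step(
                step=step,
                run_name=runtime_configuration.run_name,
                pb2_pipeline=pb2_pipeline,
            )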

Source code in zenml/orchestrators/base_orchestrator.py
class BaseOrchestrator(StackComponent, ABC):
    """Base class for all orchestrators.

    In order to implement an orchestrator you will need to subclass from this
    class.

    How it works:
    -------------
    The `run()` method is the entrypoint that is executed when the
    pipeline's run method is called within the user code
    (`pipeline_instance.run()`).

    This method will take the ZenML Pipeline instance and prepare it for
    eventual execution. To do this the following steps are taken:

    * The underlying protobuf pipeline is created.

    * Within the `_configure_node_context()` method, the pipeline
    requirements, stack and runtime configuration are added to the step
    context.

    * The `_get_sorted_steps()` method then generates a sorted list of
    steps which will later be used to directly execute these steps in order,
    or to easily build a dag

    * After these initial steps comes the most crucial one. Within the
    `prepare_or_run_pipeline()` method each orchestrator will have its own
    implementation that dictates the pipeline orchestration. In the simplest
    case this method will iterate through all steps and execute them one by
    one. In other cases this method will build and deploy an intermediate
    representation of the pipeline (e.g. an airflow dag or a kubeflow
    pipelines yaml) to be executed within the orchestrator's environment.

    Building your own:
    ------------------
    In order to build your own orchestrator, all you need to do is subclass
    from this class and implement your own `prepare_or_run_pipeline()`
    method. Overwriting other methods is NOT recommended but possible.
    See the docstring of the `prepare_or_run_pipeline()` method to find out
    details of what needs to be implemented within it.
    """

    # Class Configuration
    TYPE: ClassVar[StackComponentType] = StackComponentType.ORCHESTRATOR

    @abstractmethod
    def prepare_or_run_pipeline(
        self,
        sorted_steps: List[BaseStep],
        pipeline: "BasePipeline",
        pb2_pipeline: Pb2Pipeline,
        stack: "Stack",
        runtime_configuration: "RuntimeConfiguration",
    ) -> Any:
        """This method needs to be implemented by the respective orchestrator.

        Depending on the type of orchestrator you'll have to perform slightly
        different operations.

        Simple Case:
        ------------
        The Steps are run directly from within the same environment in which
        the orchestrator code is executed. In this case you will need to
        deal with implementation-specific runtime configurations (like the
        schedule) and then iterate through each step and finally call
        `self.run_step()` to execute each step.

        Advanced Case:
        --------------
        Most orchestrators will not run the steps directly. Instead, they
        build some intermediate representation of the pipeline that is then
        used to create and run the pipeline and its steps on the target
        environment. For such orchestrators this method will have to build
        this representation and either deploy it directly or return it.

        Regardless of the implementation details, the orchestrator will need
        a way to trigger each step in the target environment. For this
        the `run_step()` method should be used.

        In case the orchestrator is using docker containers for orchestration
        of each step, the `zenml.entrypoints.step_entrypoint` module can be
        used as a generalized entrypoint that sets up all the necessary
        prerequisites, parses input parameters and finally executes the step
        using the `run_step()` method.

        If the orchestrator needs to know the upstream steps for a specific
        step to build a DAG, it can use the `get_upstream_step_names()` method
        to get them.

        Args:
            sorted_steps: List of sorted steps.
            pipeline: Zenml Pipeline instance.
            pb2_pipeline: Protobuf Pipeline instance.
            stack: The stack the pipeline was run on.
            runtime_configuration: The Runtime configuration of the current run.

        Returns:
            The optional return value from this method will be returned by the
            `pipeline_instance.run()` call when someone is running a pipeline.
        """

    def run(
        self,
        pipeline: "BasePipeline",
        stack: "Stack",
        runtime_configuration: "RuntimeConfiguration",
    ) -> Any:
        """Runs a pipeline.

        To do this, a protobuf pipeline is created, the context of the
        individual steps is expanded to include relevant data, the steps are
        sorted into execution order and the implementation specific
        `prepare_or_run_pipeline()` method is called.

        Args:
            pipeline: The pipeline to run.
            stack: The stack on which the pipeline is run.
            runtime_configuration: Runtime configuration of the pipeline run.

        Returns:
            The result of the call to `prepare_or_run_pipeline()`.
        """
        # Create the protobuf pipeline which will be needed for various reasons
        # in the following steps
        pb2_pipeline: Pb2Pipeline = Compiler().compile(
            create_tfx_pipeline(pipeline, stack=stack)
        )

        self._configure_node_context(
            pipeline=pipeline,
            pb2_pipeline=pb2_pipeline,
            stack=stack,
            runtime_configuration=runtime_configuration,
        )

        sorted_steps = self._get_sorted_steps(
            pipeline=pipeline, pb2_pipeline=pb2_pipeline
        )

        result = self.prepare_or_run_pipeline(
            sorted_steps=sorted_steps,
            pipeline=pipeline,
            pb2_pipeline=pb2_pipeline,
            stack=stack,
            runtime_configuration=runtime_configuration,
        )

        return result

    @staticmethod
    def _get_sorted_steps(
        pipeline: "BasePipeline", pb2_pipeline: Pb2Pipeline
    ) -> List["BaseStep"]:
        """Get steps sorted in the execution order.

        This simplifies the building of a DAG at a later stage as it can be
        built with one iteration over this sorted list of steps.

        Args:
            pipeline: The pipeline
            pb2_pipeline: The protobuf pipeline representation

        Returns:
            List of steps in execution order
        """
        # Create a list of sorted steps
        sorted_steps = []
        for node in pb2_pipeline.nodes:
            pipeline_node: PipelineNode = node.pipeline_node
            sorted_steps.append(
                get_step_for_node(
                    pipeline_node, steps=list(pipeline.steps.values())
                )
            )
        return sorted_steps

    def run_step(
        self,
        step: "BaseStep",
        run_name: str,
        pb2_pipeline: Pb2Pipeline,
    ) -> Optional[data_types.ExecutionInfo]:
        """This sets up a component launcher and executes the given step.

        Args:
            step: The step to be executed
            run_name: The unique run name
            pb2_pipeline: Protobuf Pipeline instance

        Returns:
            The execution info of the step.
        """
        # Substitute the runtime parameter to be a concrete run_id, it is
        # important for this to be unique for each run.
        runtime_parameter_utils.substitute_runtime_parameter(
            pb2_pipeline,
            {PIPELINE_RUN_ID_PARAMETER_NAME: run_name},
        )

        # Extract the deployment_configs and use it to access the executor and
        # custom driver spec
        deployment_config = runner_utils.extract_local_deployment_config(
            pb2_pipeline
        )
        executor_spec = runner_utils.extract_executor_spec(
            deployment_config, step.name
        )
        custom_driver_spec = runner_utils.extract_custom_driver_spec(
            deployment_config, step.name
        )

        # At this point the active metadata store is queried for the
        # metadata_connection
        repo = Repository()
        metadata_store = repo.active_stack.metadata_store
        metadata_connection = metadata.Metadata(
            metadata_store.get_tfx_metadata_config()
        )
        custom_executor_operators = {
            executable_spec_pb2.PythonClassExecutableSpec: step.executor_operator
        }

        # The protobuf node for the current step is loaded here.
        pipeline_node = self._get_node_with_step_name(
            step_name=step.name, pb2_pipeline=pb2_pipeline
        )

        # Create the tfx launcher responsible for executing the step.
        component_launcher = launcher.Launcher(
            pipeline_node=pipeline_node,
            mlmd_connection=metadata_connection,
            pipeline_info=pb2_pipeline.pipeline_info,
            pipeline_runtime_spec=pb2_pipeline.runtime_spec,
            executor_spec=executor_spec,
            custom_driver_spec=custom_driver_spec,
            custom_executor_operators=custom_executor_operators,
        )

        # In some stack configurations, some stack components (like experiment
        # trackers) will run some code before and after the actual step run.
        # This is where the step actually gets executed using the
        # component_launcher
        repo.active_stack.prepare_step_run()
        execution_info = self._execute_step(component_launcher)
        repo.active_stack.cleanup_step_run()

        return execution_info

    @staticmethod
    def _execute_step(
        tfx_launcher: launcher.Launcher,
    ) -> Optional[data_types.ExecutionInfo]:
        """Executes a tfx component.

        Args:
            tfx_launcher: A tfx launcher to execute the component.

        Returns:
            Optional execution info returned by the launcher.

        Raises:
            DuplicateRunNameError: If the run name is already in use.
        """
        pipeline_step_name = tfx_launcher._pipeline_node.node_info.id
        start_time = time.time()
        logger.info(f"Step `{pipeline_step_name}` has started.")
        try:
            execution_info = tfx_launcher.launch()
            if execution_info and get_cache_status(execution_info):
                logger.info(f"Using cached version of `{pipeline_step_name}`.")
        except RuntimeError as e:
            if "execution has already succeeded" in str(e):
                # Hacky workaround to catch the error that a pipeline run with
                # this name already exists. Raise an error with a more
                # descriptive message instead.
                raise DuplicateRunNameError()
            else:
                raise

        run_duration = time.time() - start_time
        logger.info(
            f"Step `{pipeline_step_name}` has finished in "
            f"{string_utils.get_human_readable_time(run_duration)}."
        )
        return execution_info

    def get_upstream_step_names(
        self, step: "BaseStep", pb2_pipeline: Pb2Pipeline
    ) -> List[str]:
        """Given a step, use the associated pb2 node to find the names of all upstream nodes.

        Args:
            step: Instance of a Pipeline Step
            pb2_pipeline: Protobuf Pipeline instance

        Returns:
            List of step names from direct upstream steps
        """
        node = self._get_node_with_step_name(step.name, pb2_pipeline)

        upstream_steps = []
        for upstream_node in node.upstream_nodes:
            upstream_steps.append(upstream_node)

        return upstream_steps

    @staticmethod
    def requires_resources_in_orchestration_environment(
        step: "BaseStep",
    ) -> bool:
        """Checks if the orchestrator should run this step on special resources.

        Args:
            step: The step that will be checked.

        Returns:
            True if the step requires special resources in the orchestration
            environment, False otherwise.
        """
        # If the step requires custom resources and doesn't run with a step
        # operator, it would need these requirements in the orchestrator
        # environment
        return not (
            step.custom_step_operator or step.resource_configuration.empty
        )

    @staticmethod
    def _get_node_with_step_name(
        step_name: str, pb2_pipeline: Pb2Pipeline
    ) -> PipelineNode:
        """Given the name of a step, return the node with that name from the pb2_pipeline.

        Args:
            step_name: Name of the step
            pb2_pipeline: pb2 pipeline containing nodes

        Returns:
            PipelineNode instance

        Raises:
            KeyError: If the step name is not found in the pipeline.
        """
        for node in pb2_pipeline.nodes:
            if (
                node.WhichOneof("node") == "pipeline_node"
                and node.pipeline_node.node_info.id == step_name
            ):
                return node.pipeline_node

        raise KeyError(
            f"Step {step_name} not found in Pipeline "
            f"{pb2_pipeline.pipeline_info.id}"
        )

    @staticmethod
    def _configure_node_context(
        pipeline: "BasePipeline",
        pb2_pipeline: Pb2Pipeline,
        stack: "Stack",
        runtime_configuration: "RuntimeConfiguration",
    ) -> None:
        """Adds context to each pipeline node of a pb2_pipeline.

        This attaches important contexts to the nodes; namely
        pipeline.docker_configuration, stack information and the runtime
        configuration.

        Args:
            pipeline: Zenml Pipeline instance
            pb2_pipeline: Protobuf Pipeline instance
            stack: The stack the pipeline was run on
            runtime_configuration: The Runtime configuration of the current run
        """
        stack_json = json.dumps(stack.dict(), sort_keys=True)

        # Copy and remove the run name so an otherwise identical run reuses
        # our MLMD context
        runtime_config_copy = runtime_configuration.copy()
        runtime_config_copy.pop("run_name")
        runtime_config_json = json.dumps(
            runtime_config_copy, sort_keys=True, default=pydantic_encoder
        )

        docker_config_json = pipeline.docker_configuration.json(sort_keys=True)

        context_properties = {
            MLMD_CONTEXT_STACK_PROPERTY_NAME: stack_json,
            MLMD_CONTEXT_RUNTIME_CONFIG_PROPERTY_NAME: runtime_config_json,
            MLMD_CONTEXT_DOCKER_CONFIGURATION_PROPERTY_NAME: docker_config_json,
        }

        for node in pb2_pipeline.nodes:
            pipeline_node: PipelineNode = node.pipeline_node

            step = get_step_for_node(
                pipeline_node, steps=list(pipeline.steps.values())
            )
            step_context_properties = context_properties.copy()
            step_context_properties[
                MLMD_CONTEXT_STEP_RESOURCES_PROPERTY_NAME
            ] = step.resource_configuration.json(sort_keys=True)

            # We add the resolved materializer sources here so step operators
            # can fetch it in the entrypoint. This is needed to support
            # custom materializers which would otherwise be ignored.
            materializer_sources = {
                output_name: source_utils.resolve_class(materializer_class)
                for output_name, materializer_class in step.get_materializers(
                    ensure_complete=True
                ).items()
            }
            step_context_properties[
                MLMD_CONTEXT_MATERIALIZER_SOURCES_PROPERTY_NAME
            ] = json.dumps(materializer_sources, sort_keys=True)

            properties_json = json.dumps(
                step_context_properties, sort_keys=True
            )
            context_name = hashlib.md5(properties_json.encode()).hexdigest()

            add_context_to_node(
                pipeline_node,
                type_=ZENML_MLMD_CONTEXT_TYPE,
                name=context_name,
                properties=step_context_properties,
            )
get_upstream_step_names(self, step, pb2_pipeline)

Given a step, use the associated pb2 node to find the names of all upstream nodes.

Parameters:

  • step (BaseStep): Instance of a Pipeline Step. Required.
  • pb2_pipeline (Pipeline): Protobuf Pipeline instance. Required.

Returns:

  • List[str]: List of step names from direct upstream steps.

Source code in zenml/orchestrators/base_orchestrator.py
def get_upstream_step_names(
    self, step: "BaseStep", pb2_pipeline: Pb2Pipeline
) -> List[str]:
    """Given a step, use the associated pb2 node to find the names of all upstream nodes.

    Args:
        step: Instance of a Pipeline Step
        pb2_pipeline: Protobuf Pipeline instance

    Returns:
        List of step names from direct upstream steps
    """
    node = self._get_node_with_step_name(step.name, pb2_pipeline)

    upstream_steps = []
    for upstream_node in node.upstream_nodes:
        upstream_steps.append(upstream_node)

    return upstream_steps
prepare_or_run_pipeline(self, sorted_steps, pipeline, pb2_pipeline, stack, runtime_configuration)

This method needs to be implemented by the respective orchestrator.

Depending on the type of orchestrator you'll have to perform slightly different operations.

Simple Case:

The steps are run directly from within the same environment in which the orchestrator code is executed. In this case you will need to deal with implementation-specific runtime configurations (like the schedule) and then iterate through each step and finally call self.run_step() to execute each step.

Advanced Case:

Most orchestrators will not run the steps directly. Instead, they build some intermediate representation of the pipeline that is then used to create and run the pipeline and its steps on the target environment. For such orchestrators this method will have to build this representation and either deploy it directly or return it.

Regardless of the implementation details, the orchestrator will need a way to trigger each step in the target environment. For this, the run_step() method should be used.

In case the orchestrator is using Docker containers for the orchestration of each step, the zenml.entrypoints.step_entrypoint module can be used as a generalized entrypoint that sets up all the necessary prerequisites, parses input parameters and finally executes the step using the run_step() method.

If the orchestrator needs to know the upstream steps for a specific step to build a DAG, it can use the get_upstream_step_names() method to get them.
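
For the advanced case, the hypothetical helper below illustrates how an orchestrator could use get_upstream_step_names() from within prepare_or_run_pipeline() to collect the dependency edges needed to build a DAG for the target environment (e.g. Airflow). The helper name and import paths are assumptions made for this sketch.

from typing import Dict, List

from tfx.proto.orchestration.pipeline_pb2 import Pipeline as Pb2Pipeline

from zenml.orchestrators.base_orchestrator import BaseOrchestrator
from zenml.steps import BaseStep


def build_dag_edges(
    orchestrator: BaseOrchestrator,
    sorted_steps: List[BaseStep],
    pb2_pipeline: Pb2Pipeline,
) -> Dict[str, List[str]]:
    """Map each step name to the names of its direct upstream steps."""
    edges: Dict[str, List[str]] = {}
    for step in sorted_steps:
        # Resolve the direct upstream dependencies from the protobuf node
        # that belongs to this step.
        edges[step.name] = orchestrator.get_upstream_step_names(
            step=step, pb2_pipeline=pb2_pipeline
        )
    return edges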

Parameters:

  • sorted_steps (List[zenml.steps.base_step.BaseStep]): List of sorted steps. Required.
  • pipeline (BasePipeline): Zenml Pipeline instance. Required.
  • pb2_pipeline (Pipeline): Protobuf Pipeline instance. Required.
  • stack (Stack): The stack the pipeline was run on. Required.
  • runtime_configuration (RuntimeConfiguration): The Runtime configuration of the current run. Required.

Returns:

  • Any: The optional return value from this method will be returned by the pipeline_instance.run() call when someone is running a pipeline.

Source code in zenml/orchestrators/base_orchestrator.py
@abstractmethod
def prepare_or_run_pipeline(
    self,
    sorted_steps: List[BaseStep],
    pipeline: "BasePipeline",
    pb2_pipeline: Pb2Pipeline,
    stack: "Stack",
    runtime_configuration: "RuntimeConfiguration",
) -> Any:
    """This method needs to be implemented by the respective orchestrator.

    Depending on the type of orchestrator you'll have to perform slightly
    different operations.

    Simple Case:
    ------------
    The Steps are run directly from within the same environment in which
    the orchestrator code is executed. In this case you will need to
    deal with implementation-specific runtime configurations (like the
    schedule) and then iterate through each step and finally call
    `self.run_step()` to execute each step.

    Advanced Case:
    --------------
    Most orchestrators will not run the steps directly. Instead, they
    build some intermediate representation of the pipeline that is then
    used to create and run the pipeline and its steps on the target
    environment. For such orchestrators this method will have to build
    this representation and either deploy it directly or return it.

    Regardless of the implementation details, the orchestrator will need
    a way to trigger each step in the target environment. For this
    the `run_step()` method should be used.

    In case the orchestrator is using docker containers for orchestration
    of each step, the `zenml.entrypoints.step_entrypoint` module can be
    used as a generalized entrypoint that sets up all the necessary
    prerequisites, parses input parameters and finally executes the step
    using the `run_step()` method.

    If the orchestrator needs to know the upstream steps for a specific
    step to build a DAG, it can use the `get_upstream_step_names()` method
    to get them.

    Args:
        sorted_steps: List of sorted steps.
        pipeline: Zenml Pipeline instance.
        pb2_pipeline: Protobuf Pipeline instance.
        stack: The stack the pipeline was run on.
        runtime_configuration: The Runtime configuration of the current run.

    Returns:
        The optional return value from this method will be returned by the
        `pipeline_instance.run()` call when someone is running a pipeline.
    """
requires_resources_in_orchestration_environment(step) staticmethod

Checks if the orchestrator should run this step on special resources.

Parameters:

  • step (BaseStep): The step that will be checked. Required.

Returns:

  • bool: True if the step requires special resources in the orchestration environment, False otherwise.

Source code in zenml/orchestrators/base_orchestrator.py
@staticmethod
def requires_resources_in_orchestration_environment(
    step: "BaseStep",
) -> bool:
    """Checks if the orchestrator should run this step on special resources.

    Args:
        step: The step that will be checked.

    Returns:
        True if the step requires special resources in the orchestration
        environment, False otherwise.
    """
    # If the step requires custom resources and doesn't run with a step
    # operator, it would need these requirements in the orchestrator
    # environment
    return not (
        step.custom_step_operator or step.resource_configuration.empty
    )
run(self, pipeline, stack, runtime_configuration)

Runs a pipeline.

To do this, a protobuf pipeline is created, the context of the individual steps is expanded to include relevant data, the steps are sorted into execution order and the implementation specific prepare_or_run_pipeline() method is called.

Parameters:

  • pipeline (BasePipeline): The pipeline to run. Required.
  • stack (Stack): The stack on which the pipeline is run. Required.
  • runtime_configuration (RuntimeConfiguration): Runtime configuration of the pipeline run. Required.

Returns:

  • Any: The result of the call to prepare_or_run_pipeline().

Source code in zenml/orchestrators/base_orchestrator.py
def run(
    self,
    pipeline: "BasePipeline",
    stack: "Stack",
    runtime_configuration: "RuntimeConfiguration",
) -> Any:
    """Runs a pipeline.

    To do this, a protobuf pipeline is created, the context of the
    individual steps is expanded to include relevant data, the steps are
    sorted into execution order and the implementation specific
    `prepare_or_run_pipeline()` method is called.

    Args:
        pipeline: The pipeline to run.
        stack: The stack on which the pipeline is run.
        runtime_configuration: Runtime configuration of the pipeline run.

    Returns:
        The result of the call to `prepare_or_run_pipeline()`.
    """
    # Create the protobuf pipeline which will be needed for various reasons
    # in the following steps
    pb2_pipeline: Pb2Pipeline = Compiler().compile(
        create_tfx_pipeline(pipeline, stack=stack)
    )

    self._configure_node_context(
        pipeline=pipeline,
        pb2_pipeline=pb2_pipeline,
        stack=stack,
        runtime_configuration=runtime_configuration,
    )

    sorted_steps = self._get_sorted_steps(
        pipeline=pipeline, pb2_pipeline=pb2_pipeline
    )

    result = self.prepare_or_run_pipeline(
        sorted_steps=sorted_steps,
        pipeline=pipeline,
        pb2_pipeline=pb2_pipeline,
        stack=stack,
        runtime_configuration=runtime_configuration,
    )

    return result
run_step(self, step, run_name, pb2_pipeline)

This sets up a component launcher and executes the given step.

Parameters:

  • step (BaseStep): The step to be executed. Required.
  • run_name (str): The unique run name. Required.
  • pb2_pipeline (Pipeline): Protobuf Pipeline instance. Required.

Returns:

  • Optional[tfx.orchestration.portable.data_types.ExecutionInfo]: The execution info of the step.

Source code in zenml/orchestrators/base_orchestrator.py
def run_step(
    self,
    step: "BaseStep",
    run_name: str,
    pb2_pipeline: Pb2Pipeline,
) -> Optional[data_types.ExecutionInfo]:
    """This sets up a component launcher and executes the given step.

    Args:
        step: The step to be executed
        run_name: The unique run name
        pb2_pipeline: Protobuf Pipeline instance

    Returns:
        The execution info of the step.
    """
    # Substitute the runtime parameter to be a concrete run_id, it is
    # important for this to be unique for each run.
    runtime_parameter_utils.substitute_runtime_parameter(
        pb2_pipeline,
        {PIPELINE_RUN_ID_PARAMETER_NAME: run_name},
    )

    # Extract the deployment_configs and use it to access the executor and
    # custom driver spec
    deployment_config = runner_utils.extract_local_deployment_config(
        pb2_pipeline
    )
    executor_spec = runner_utils.extract_executor_spec(
        deployment_config, step.name
    )
    custom_driver_spec = runner_utils.extract_custom_driver_spec(
        deployment_config, step.name
    )

    # At this point the active metadata store is queried for the
    # metadata_connection
    repo = Repository()
    metadata_store = repo.active_stack.metadata_store
    metadata_connection = metadata.Metadata(
        metadata_store.get_tfx_metadata_config()
    )
    custom_executor_operators = {
        executable_spec_pb2.PythonClassExecutableSpec: step.executor_operator
    }

    # The protobuf node for the current step is loaded here.
    pipeline_node = self._get_node_with_step_name(
        step_name=step.name, pb2_pipeline=pb2_pipeline
    )

    # Create the tfx launcher responsible for executing the step.
    component_launcher = launcher.Launcher(
        pipeline_node=pipeline_node,
        mlmd_connection=metadata_connection,
        pipeline_info=pb2_pipeline.pipeline_info,
        pipeline_runtime_spec=pb2_pipeline.runtime_spec,
        executor_spec=executor_spec,
        custom_driver_spec=custom_driver_spec,
        custom_executor_operators=custom_executor_operators,
    )

    # In some stack configurations, some stack components (like experiment
    # trackers) will run some code before and after the actual step run.
    # This is where the step actually gets executed using the
    # component_launcher
    repo.active_stack.prepare_step_run()
    execution_info = self._execute_step(component_launcher)
    repo.active_stack.cleanup_step_run()

    return execution_info

local special

Initialization for the local orchestrator.

local_orchestrator

Implementation of the ZenML local orchestrator.

LocalOrchestrator (BaseOrchestrator) pydantic-model

Orchestrator responsible for running pipelines locally.

This orchestrator does not allow for concurrent execution of steps and also does not support running on a schedule.

Source code in zenml/orchestrators/local/local_orchestrator.py
class LocalOrchestrator(BaseOrchestrator):
    """Orchestrator responsible for running pipelines locally.

    This orchestrator does not allow for concurrent execution of steps and also
    does not support running on a schedule.
    """

    FLAVOR: ClassVar[str] = "local"

    def prepare_or_run_pipeline(
        self,
        sorted_steps: List[BaseStep],
        pipeline: "BasePipeline",
        pb2_pipeline: Pb2Pipeline,
        stack: "Stack",
        runtime_configuration: "RuntimeConfiguration",
    ) -> Any:
        """This method iterates through all steps and executes them sequentially.

        Args:
            sorted_steps: A list of steps in the pipeline.
            pipeline: The pipeline object.
            pb2_pipeline: The pipeline object in protobuf format.
            stack: The stack object.
            runtime_configuration: The runtime configuration object.
        """
        if runtime_configuration.schedule:
            logger.warning(
                "Local Orchestrator currently does not support the"
                "use of schedules. The `schedule` will be ignored "
                "and the pipeline will be run immediately."
            )
        assert runtime_configuration.run_name, "Run name must be set"

        # Run each step
        for step in sorted_steps:
            if self.requires_resources_in_orchestration_environment(step):
                logger.warning(
                    "Specifying step resources is not supported for the local "
                    "orchestrator, ignoring resource configuration for "
                    "step %s.",
                    step.name,
                )

            self.run_step(
                step=step,
                run_name=runtime_configuration.run_name,
                pb2_pipeline=pb2_pipeline,
            )
prepare_or_run_pipeline(self, sorted_steps, pipeline, pb2_pipeline, stack, runtime_configuration)

This method iterates through all steps and executes them sequentially.

Parameters:

  • sorted_steps (List[zenml.steps.base_step.BaseStep]): A list of steps in the pipeline. Required.
  • pipeline (BasePipeline): The pipeline object. Required.
  • pb2_pipeline (Pipeline): The pipeline object in protobuf format. Required.
  • stack (Stack): The stack object. Required.
  • runtime_configuration (RuntimeConfiguration): The runtime configuration object. Required.

Source code in zenml/orchestrators/local/local_orchestrator.py
def prepare_or_run_pipeline(
    self,
    sorted_steps: List[BaseStep],
    pipeline: "BasePipeline",
    pb2_pipeline: Pb2Pipeline,
    stack: "Stack",
    runtime_configuration: "RuntimeConfiguration",
) -> Any:
    """This method iterates through all steps and executes them sequentially.

    Args:
        sorted_steps: A list of steps in the pipeline.
        pipeline: The pipeline object.
        pb2_pipeline: The pipeline object in protobuf format.
        stack: The stack object.
        runtime_configuration: The runtime configuration object.
    """
    if runtime_configuration.schedule:
        logger.warning(
            "Local Orchestrator currently does not support the"
            "use of schedules. The `schedule` will be ignored "
            "and the pipeline will be run immediately."
        )
    assert runtime_configuration.run_name, "Run name must be set"

    # Run each step
    for step in sorted_steps:
        if self.requires_resources_in_orchestration_environment(step):
            logger.warning(
                "Specifying step resources is not supported for the local "
                "orchestrator, ignoring resource configuration for "
                "step %s.",
                step.name,
            )

        self.run_step(
            step=step,
            run_name=runtime_configuration.run_name,
            pb2_pipeline=pb2_pipeline,
        )

utils

Utility functions for the orchestrator.

add_context_to_node(pipeline_node, type_, name, properties)

Adds a new context to a TFX protobuf pipeline node.

Parameters:

  • pipeline_node (PipelineNode): A tfx protobuf pipeline node. Required.
  • type_ (str): The type name for the context to be added. Required.
  • name (str): Unique key for the context. Required.
  • properties (Dict[str, str]): Dictionary of strings as properties of the context. Required.

Source code in zenml/orchestrators/utils.py
def add_context_to_node(
    pipeline_node: PipelineNode,
    type_: str,
    name: str,
    properties: Dict[str, str],
) -> None:
    """Adds a new context to a TFX protobuf pipeline node.

    Args:
        pipeline_node: A tfx protobuf pipeline node
        type_: The type name for the context to be added
        name: Unique key for the context
        properties: dictionary of strings as properties of the context
    """
    # Add a new context to the pipeline
    context: ContextSpec = pipeline_node.contexts.contexts.add()
    # Adding the type of context
    context.type.name = type_
    # Setting the name of the context
    context.name.field_value.string_value = name
    # Setting the properties of the context depending on attribute type
    for key, value in properties.items():
        c_property = context.properties[key]
        c_property.field_value.string_value = value

create_tfx_pipeline(zenml_pipeline, stack)

Creates a tfx pipeline from a ZenML pipeline.

Parameters:

  • zenml_pipeline (BasePipeline): The ZenML pipeline. Required.
  • stack (Stack): The stack. Required.

Returns:

  • Pipeline: The tfx pipeline.

Exceptions:

  • KeyError: If a step contains an upstream step which is not part of the pipeline.

Source code in zenml/orchestrators/utils.py
def create_tfx_pipeline(
    zenml_pipeline: "BasePipeline", stack: "Stack"
) -> tfx_pipeline.Pipeline:
    """Creates a tfx pipeline from a ZenML pipeline.

    Args:
        zenml_pipeline: The ZenML pipeline.
        stack: The stack.

    Returns:
        The tfx pipeline.

    Raises:
        KeyError: If a step contains an upstream step which is not part of
            the pipeline.
    """
    # Connect the inputs/outputs of all steps in the pipeline
    zenml_pipeline.connect(**zenml_pipeline.steps)

    tfx_components = {
        step.name: step.component for step in zenml_pipeline.steps.values()
    }

    # Add potential task dependencies that users specified
    for step in zenml_pipeline.steps.values():
        for upstream_step in step.upstream_steps:
            try:
                upstream_node = tfx_components[upstream_step]
            except KeyError:
                raise KeyError(
                    f"Unable to find upstream step `{upstream_step}` for step "
                    f"`{step.name}`. Available steps: {set(tfx_components)}."
                )

            step.component.add_upstream_node(upstream_node)

    artifact_store = stack.artifact_store

    # We do not pass the metadata connection config here as it might not be
    # accessible. Instead it is queried from the active stack right before a
    # step is executed (see `BaseOrchestrator.run_step(...)`)
    return tfx_pipeline.Pipeline(
        pipeline_name=zenml_pipeline.name,
        components=list(tfx_components.values()),
        pipeline_root=artifact_store.path,
        enable_cache=zenml_pipeline.enable_cache,
    )

get_cache_status(execution_info)

Returns whether a cached execution was used or not.

Parameters:

  • execution_info (Optional[tfx.orchestration.portable.data_types.ExecutionInfo]): The execution info. Required.

Returns:

  • bool: True if the execution was cached, False otherwise.

Source code in zenml/orchestrators/utils.py
def get_cache_status(
    execution_info: Optional[data_types.ExecutionInfo],
) -> bool:
    """Returns whether a cached execution was used or not.

    Args:
        execution_info: The execution info.

    Returns:
        `True` if the execution was cached, `False` otherwise.
    """
    # An execution output URI is only provided if the step needs to be
    # executed (= is not cached)
    if execution_info and execution_info.execution_output_uri is None:
        return True
    else:
        return False

get_step_for_node(node, steps)

Finds the matching step for a tfx pipeline node.

Parameters:

  • node (PipelineNode): The tfx pipeline node. Required.
  • steps (List[zenml.steps.base_step.BaseStep]): The list of steps. Required.

Returns:

  • BaseStep: The matching step.

Exceptions:

  • RuntimeError: If no matching step is found.

Source code in zenml/orchestrators/utils.py
def get_step_for_node(node: PipelineNode, steps: List[BaseStep]) -> BaseStep:
    """Finds the matching step for a tfx pipeline node.

    Args:
        node: The tfx pipeline node.
        steps: The list of steps.

    Returns:
        The matching step.

    Raises:
        RuntimeError: If no matching step is found.
    """
    step_name = node.node_info.id
    try:
        return next(step for step in steps if step.name == step_name)
    except StopIteration:
        raise RuntimeError(f"Unable to find step with name '{step_name}'.")