Orchestrators

zenml.orchestrators special

Initialization for ZenML orchestrators.

An orchestrator is a special kind of backend that manages the execution of each step of a pipeline and administers the actual pipeline runs. You can think of it as the 'root' of any pipeline job that you run during your experimentation.

ZenML supports a local orchestrator out of the box, which allows you to run your pipelines in a local environment. We also support using Apache Airflow as the orchestrator to handle the steps of your pipeline.

base_orchestrator

Base orchestrator class.

BaseOrchestrator (StackComponent, ABC)

Base class for all orchestrators.

To implement an orchestrator, subclass this class.

How it works:

The run(...) method is the entrypoint that is executed when the pipeline's run method is called within the user code (pipeline_instance.run(...)).

This method will do some internal preparation and then call the prepare_or_run_pipeline(...) method. BaseOrchestrator subclasses must implement this method and either run the pipeline steps directly or deploy the pipeline to some remote infrastructure.
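
The control flow described above can be sketched without ZenML installed. This is a minimal illustration, not the library code: `SketchOrchestrator` and `EchoOrchestrator` are hypothetical stand-ins, and the deployment is reduced to a plain string.

```python
from abc import ABC, abstractmethod
from typing import Any, Optional


class SketchOrchestrator(ABC):
    """Hypothetical stand-in for BaseOrchestrator; not the ZenML class."""

    def __init__(self) -> None:
        self._active_deployment: Optional[str] = None

    @abstractmethod
    def prepare_or_run_pipeline(self, deployment: str) -> Any:
        """Subclasses run the steps or deploy the pipeline here."""

    def run(self, deployment: str) -> Any:
        # Internal preparation: remember which deployment is active.
        self._active_deployment = deployment
        try:
            return self.prepare_or_run_pipeline(deployment)
        finally:
            # Always reset the state, even if the subclass raised.
            self._active_deployment = None


class EchoOrchestrator(SketchOrchestrator):
    """Hypothetical subclass that only reports what it was asked to run."""

    def prepare_or_run_pipeline(self, deployment: str) -> Any:
        return f"ran {deployment} (active: {self._active_deployment})"


orchestrator = EchoOrchestrator()
result = orchestrator.run("demo-deployment")
```

The `try`/`finally` mirrors the cleanup in the source listing: the active deployment is cleared whether or not the subclass method succeeds.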

Source code in zenml/orchestrators/base_orchestrator.py
class BaseOrchestrator(StackComponent, ABC):
    """Base class for all orchestrators.

    To implement an orchestrator, subclass this class.

    How it works:
    -------------
    The `run(...)` method is the entrypoint that is executed when the
    pipeline's run method is called within the user code
    (`pipeline_instance.run(...)`).

    This method will do some internal preparation and then call the
    `prepare_or_run_pipeline(...)` method. BaseOrchestrator subclasses must
    implement this method and either run the pipeline steps directly or deploy
    the pipeline to some remote infrastructure.
    """

    _active_deployment: Optional["PipelineDeploymentResponseModel"] = None

    @property
    def config(self) -> BaseOrchestratorConfig:
        """Returns the `BaseOrchestratorConfig` config.

        Returns:
            The configuration.
        """
        return cast(BaseOrchestratorConfig, self._config)

    @abstractmethod
    def get_orchestrator_run_id(self) -> str:
        """Returns the run id of the active orchestrator run.

        Important: This needs to be a unique ID and return the same value for
        all steps of a pipeline run.

        Returns:
            The orchestrator run id.
        """

    @abstractmethod
    def prepare_or_run_pipeline(
        self,
        deployment: "PipelineDeploymentResponseModel",
        stack: "Stack",
    ) -> Any:
        """The method needs to be implemented by the respective orchestrator.

        Depending on the type of orchestrator you'll have to perform slightly
        different operations.

        Simple Case:
        ------------
        The steps are run directly from within the same environment in which
        the orchestrator code is executed. In this case you will need to
        deal with implementation-specific runtime configurations (like the
        schedule) and then iterate through the steps and finally call
        `self.run_step(...)` to execute each step.

        Advanced Case:
        --------------
        Most orchestrators will not run the steps directly. Instead, they
        build some intermediate representation of the pipeline that is then
        used to create and run the pipeline and its steps on the target
        environment. For such orchestrators this method will have to build
        this representation and deploy it.

        Regardless of the implementation details, the orchestrator will need
        to run each step in the target environment. For this the
        `self.run_step(...)` method should be used.

        The easiest way to make this work is by using an entrypoint
        configuration to run single steps (`zenml.entrypoints.step_entrypoint_configuration.StepEntrypointConfiguration`)
        or entire pipelines (`zenml.entrypoints.pipeline_entrypoint_configuration.PipelineEntrypointConfiguration`).

        Args:
            deployment: The pipeline deployment to prepare or run.
            stack: The stack the pipeline will run on.

        Returns:
            The optional return value from this method will be returned by the
            `pipeline_instance.run()` call when someone is running a pipeline.
        """

    def run(
        self,
        deployment: "PipelineDeploymentResponseModel",
        stack: "Stack",
    ) -> Any:
        """Runs a pipeline on a stack.

        Args:
            deployment: The pipeline deployment.
            stack: The stack on which to run the pipeline.

        Returns:
            Orchestrator-specific return value.
        """
        self._prepare_run(deployment=deployment)
        try:
            result = self.prepare_or_run_pipeline(
                deployment=deployment, stack=stack
            )
        finally:
            self._cleanup_run()

        return result

    def run_step(self, step: "Step") -> None:
        """Runs the given step.

        Args:
            step: The step to run.
        """
        assert self._active_deployment
        launcher = StepLauncher(
            deployment=self._active_deployment,
            step=step,
            orchestrator_run_id=self.get_orchestrator_run_id(),
        )
        launcher.launch()

    @staticmethod
    def requires_resources_in_orchestration_environment(
        step: "Step",
    ) -> bool:
        """Checks if the orchestrator should run this step on special resources.

        Args:
            step: The step that will be checked.

        Returns:
            True if the step requires special resources in the orchestration
            environment, False otherwise.
        """
        # If the step requires custom resources and doesn't run with a step
        # operator, it would need these requirements in the orchestrator
        # environment
        if step.config.step_operator:
            return False

        return not step.config.resource_settings.empty

    def _prepare_run(
        self, deployment: "PipelineDeploymentResponseModel"
    ) -> None:
        """Prepares a run.

        Args:
            deployment: The deployment to prepare.
        """
        self._active_deployment = deployment

    def _cleanup_run(self) -> None:
        """Cleans up the active run."""
        self._active_deployment = None
config: BaseOrchestratorConfig property readonly

Returns the BaseOrchestratorConfig config.

Returns:

Type Description
BaseOrchestratorConfig

The configuration.

get_orchestrator_run_id(self)

Returns the run id of the active orchestrator run.

Important: This needs to be a unique ID and return the same value for all steps of a pipeline run.

Returns:

Type Description
str

The orchestrator run id.
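
One way to satisfy the "unique, but identical for all steps" requirement is to generate the ID once per run and return the stored value afterwards. The sketch below assumes all steps run in a single process; `RunIdSketch` and `start_run` are hypothetical names. An orchestrator whose steps run in separate processes might instead read the ID from a value its backend provides.

```python
import uuid
from typing import Optional


class RunIdSketch:
    """Hypothetical helper that keeps one ID per pipeline run."""

    def __init__(self) -> None:
        self._run_id: Optional[str] = None

    def start_run(self) -> None:
        # Generate the ID once, when the pipeline run begins.
        self._run_id = str(uuid.uuid4())

    def get_orchestrator_run_id(self) -> str:
        if self._run_id is None:
            raise RuntimeError("No active run.")
        # Every step of the same run sees the same stored value.
        return self._run_id


sketch = RunIdSketch()
sketch.start_run()
first_run_ids = {sketch.get_orchestrator_run_id() for _ in range(3)}
sketch.start_run()
second_run_id = sketch.get_orchestrator_run_id()
```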

Source code in zenml/orchestrators/base_orchestrator.py
@abstractmethod
def get_orchestrator_run_id(self) -> str:
    """Returns the run id of the active orchestrator run.

    Important: This needs to be a unique ID and return the same value for
    all steps of a pipeline run.

    Returns:
        The orchestrator run id.
    """
prepare_or_run_pipeline(self, deployment, stack)

The method needs to be implemented by the respective orchestrator.

Depending on the type of orchestrator you'll have to perform slightly different operations.

Simple Case:

The steps are run directly from within the same environment in which the orchestrator code is executed. In this case you will need to deal with implementation-specific runtime configurations (like the schedule) and then iterate through the steps and finally call self.run_step(...) to execute each step.

Advanced Case:

Most orchestrators will not run the steps directly. Instead, they build some intermediate representation of the pipeline that is then used to create and run the pipeline and its steps on the target environment. For such orchestrators this method will have to build this representation and deploy it.

Regardless of the implementation details, the orchestrator will need to run each step in the target environment. For this the self.run_step(...) method should be used.

The easiest way to make this work is by using an entrypoint configuration to run single steps (zenml.entrypoints.step_entrypoint_configuration.StepEntrypointConfiguration) or entire pipelines (zenml.entrypoints.pipeline_entrypoint_configuration.PipelineEntrypointConfiguration).

Parameters:

Name Type Description Default
deployment PipelineDeploymentResponseModel

The pipeline deployment to prepare or run.

required
stack Stack

The stack the pipeline will run on.

required

Returns:

Type Description
Any

The optional return value from this method will be returned by the pipeline_instance.run() call when someone is running a pipeline.
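
For the advanced case, the "intermediate representation" can be as simple as one job description per step. The sketch below is only an illustration under stated assumptions: the dictionary layout, the `build_job_specs` helper and the `run-step` command are all made up, and a real orchestrator could derive the command from an entrypoint configuration instead.

```python
from typing import Dict, List


def build_job_specs(step_names: List[str], image: str) -> List[Dict[str, object]]:
    """Builds one job description per step (hypothetical format)."""
    specs: List[Dict[str, object]] = []
    for name in step_names:
        specs.append(
            {
                "name": name,
                "image": image,
                # Made-up command used purely for illustration.
                "command": ["run-step", "--step-name", name],
            }
        )
    return specs


specs = build_job_specs(["load", "train"], image="example/pipeline:latest")
```

The resulting list would then be handed to the target system, which runs each job and thereby each step.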

Source code in zenml/orchestrators/base_orchestrator.py
@abstractmethod
def prepare_or_run_pipeline(
    self,
    deployment: "PipelineDeploymentResponseModel",
    stack: "Stack",
) -> Any:
    """The method needs to be implemented by the respective orchestrator.

    Depending on the type of orchestrator you'll have to perform slightly
    different operations.

    Simple Case:
    ------------
    The steps are run directly from within the same environment in which
    the orchestrator code is executed. In this case you will need to
    deal with implementation-specific runtime configurations (like the
    schedule) and then iterate through the steps and finally call
    `self.run_step(...)` to execute each step.

    Advanced Case:
    --------------
    Most orchestrators will not run the steps directly. Instead, they
    build some intermediate representation of the pipeline that is then
    used to create and run the pipeline and its steps on the target
    environment. For such orchestrators this method will have to build
    this representation and deploy it.

    Regardless of the implementation details, the orchestrator will need
    to run each step in the target environment. For this the
    `self.run_step(...)` method should be used.

    The easiest way to make this work is by using an entrypoint
    configuration to run single steps (`zenml.entrypoints.step_entrypoint_configuration.StepEntrypointConfiguration`)
    or entire pipelines (`zenml.entrypoints.pipeline_entrypoint_configuration.PipelineEntrypointConfiguration`).

    Args:
        deployment: The pipeline deployment to prepare or run.
        stack: The stack the pipeline will run on.

    Returns:
        The optional return value from this method will be returned by the
        `pipeline_instance.run()` call when someone is running a pipeline.
    """
requires_resources_in_orchestration_environment(step) staticmethod

Checks if the orchestrator should run this step on special resources.

Parameters:

Name Type Description Default
step Step

The step that will be checked.

required

Returns:

Type Description
bool

True if the step requires special resources in the orchestration environment, False otherwise.
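
The decision rule can be reproduced with plain dataclasses. In this sketch `StepConfigSketch` is a hypothetical stand-in for the step configuration, and a plain dictionary replaces the resource settings object, so "empty" simply means an empty dictionary.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class StepConfigSketch:
    """Hypothetical stand-in for a step's configuration."""

    step_operator: Optional[str] = None
    resource_settings: Dict[str, int] = field(default_factory=dict)


def requires_resources(config: StepConfigSketch) -> bool:
    # A step operator runs the step elsewhere, so the orchestration
    # environment does not need the resources itself.
    if config.step_operator:
        return False
    # Otherwise any non-empty resource settings must be honored here.
    return bool(config.resource_settings)


plain = requires_resources(StepConfigSketch())
gpu = requires_resources(StepConfigSketch(resource_settings={"gpu_count": 1}))
delegated = requires_resources(
    StepConfigSketch(step_operator="my_operator", resource_settings={"gpu_count": 1})
)
```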

Source code in zenml/orchestrators/base_orchestrator.py
@staticmethod
def requires_resources_in_orchestration_environment(
    step: "Step",
) -> bool:
    """Checks if the orchestrator should run this step on special resources.

    Args:
        step: The step that will be checked.

    Returns:
        True if the step requires special resources in the orchestration
        environment, False otherwise.
    """
    # If the step requires custom resources and doesn't run with a step
    # operator, it would need these requirements in the orchestrator
    # environment
    if step.config.step_operator:
        return False

    return not step.config.resource_settings.empty
run(self, deployment, stack)

Runs a pipeline on a stack.

Parameters:

Name Type Description Default
deployment PipelineDeploymentResponseModel

The pipeline deployment.

required
stack Stack

The stack on which to run the pipeline.

required

Returns:

Type Description
Any

Orchestrator-specific return value.

Source code in zenml/orchestrators/base_orchestrator.py
def run(
    self,
    deployment: "PipelineDeploymentResponseModel",
    stack: "Stack",
) -> Any:
    """Runs a pipeline on a stack.

    Args:
        deployment: The pipeline deployment.
        stack: The stack on which to run the pipeline.

    Returns:
        Orchestrator-specific return value.
    """
    self._prepare_run(deployment=deployment)
    try:
        result = self.prepare_or_run_pipeline(
            deployment=deployment, stack=stack
        )
    finally:
        self._cleanup_run()

    return result
run_step(self, step)

Runs the given step.

Parameters:

Name Type Description Default
step Step

The step to run.

required
Source code in zenml/orchestrators/base_orchestrator.py
def run_step(self, step: "Step") -> None:
    """Runs the given step.

    Args:
        step: The step to run.
    """
    assert self._active_deployment
    launcher = StepLauncher(
        deployment=self._active_deployment,
        step=step,
        orchestrator_run_id=self.get_orchestrator_run_id(),
    )
    launcher.launch()

BaseOrchestratorConfig (StackComponentConfig) pydantic-model

Base orchestrator config.

Source code in zenml/orchestrators/base_orchestrator.py
class BaseOrchestratorConfig(StackComponentConfig):
    """Base orchestrator config."""

    @root_validator(pre=True)
    def _deprecations(cls, values: Dict[str, Any]) -> Dict[str, Any]:
        """Validate and/or remove deprecated fields.

        Args:
            values: The values to validate.

        Returns:
            The validated values.
        """
        if "custom_docker_base_image_name" in values:
            image_name = values.pop("custom_docker_base_image_name", None)
            if image_name:
                logger.warning(
                    "The 'custom_docker_base_image_name' field has been "
                    "deprecated. To use a custom base container image with your "
                    "orchestrators, please use the DockerSettings in your "
                    "pipeline (see https://docs.zenml.io/advanced-guide/pipelines/containerization)."
                )

        return values

BaseOrchestratorFlavor (Flavor)

Base orchestrator flavor class.

Source code in zenml/orchestrators/base_orchestrator.py
class BaseOrchestratorFlavor(Flavor):
    """Base orchestrator flavor class."""

    @property
    def type(self) -> StackComponentType:
        """Returns the flavor type.

        Returns:
            The flavor type.
        """
        return StackComponentType.ORCHESTRATOR

    @property
    def config_class(self) -> Type[BaseOrchestratorConfig]:
        """Config class for the base orchestrator flavor.

        Returns:
            The config class.
        """
        return BaseOrchestratorConfig

    @property
    @abstractmethod
    def implementation_class(self) -> Type["BaseOrchestrator"]:
        """Implementation class for this flavor.

        Returns:
            The implementation class.
        """
config_class: Type[zenml.orchestrators.base_orchestrator.BaseOrchestratorConfig] property readonly

Config class for the base orchestrator flavor.

Returns:

Type Description
Type[zenml.orchestrators.base_orchestrator.BaseOrchestratorConfig]

The config class.

implementation_class: Type[BaseOrchestrator] property readonly

Implementation class for this flavor.

Returns:

Type Description
Type[BaseOrchestrator]

The implementation class.

type: StackComponentType property readonly

Returns the flavor type.

Returns:

Type Description
StackComponentType

The flavor type.

cache_utils

Utilities for caching.

generate_cache_key(step, input_artifact_ids, artifact_store, workspace_id)

Generates a cache key for a step run.

If the cache key is the same for two step runs, we conclude that the step runs are identical and can be cached.

The cache key is an MD5 hash of:

- the workspace ID,
- the artifact store ID and path,
- the source code that defines the step,
- the parameters of the step,
- the names and IDs of the input artifacts of the step,
- the names and source codes of the output artifacts of the step,
- the source codes of the output materializers of the step,
- additional custom caching parameters of the step.

Parameters:

Name Type Description Default
step Step

The step to generate the cache key for.

required
input_artifact_ids Dict[str, UUID]

The input artifact IDs for the step.

required
artifact_store BaseArtifactStore

The artifact store of the active stack.

required
workspace_id UUID

The ID of the active workspace.

required

Returns:

Type Description
str

A cache key.
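
The hashing scheme can be tried out on a reduced set of inputs. The sketch below hashes only the step source and its parameters (the real key covers more fields, as listed above); `sketch_cache_key` is a hypothetical helper. It shows why the parameters are sorted before hashing: the key then does not depend on dictionary insertion order.

```python
import hashlib
from typing import Any, Dict


def sketch_cache_key(step_source: str, parameters: Dict[str, Any]) -> str:
    """Hashes a reduced set of inputs; the real key covers more fields."""
    hash_ = hashlib.md5()
    hash_.update(step_source.encode())
    # Sorting makes the key independent of dictionary insertion order.
    for key, value in sorted(parameters.items()):
        hash_.update(key.encode())
        hash_.update(str(value).encode())
    return hash_.hexdigest()


key_a = sketch_cache_key("def trainer(): ...", {"lr": 0.1, "epochs": 5})
key_b = sketch_cache_key("def trainer(): ...", {"epochs": 5, "lr": 0.1})
key_c = sketch_cache_key("def trainer(): ...", {"lr": 0.2, "epochs": 5})
```

Here `key_a` and `key_b` are equal, while changing a parameter value (`key_c`) produces a different key.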

Source code in zenml/orchestrators/cache_utils.py
def generate_cache_key(
    step: "Step",
    input_artifact_ids: Dict[str, "UUID"],
    artifact_store: "BaseArtifactStore",
    workspace_id: "UUID",
) -> str:
    """Generates a cache key for a step run.

    If the cache key is the same for two step runs, we conclude that the step
    runs are identical and can be cached.

    The cache key is an MD5 hash of:
    - the workspace ID,
    - the artifact store ID and path,
    - the source code that defines the step,
    - the parameters of the step,
    - the names and IDs of the input artifacts of the step,
    - the names and source codes of the output artifacts of the step,
    - the source codes of the output materializers of the step,
    - additional custom caching parameters of the step.

    Args:
        step: The step to generate the cache key for.
        input_artifact_ids: The input artifact IDs for the step.
        artifact_store: The artifact store of the active stack.
        workspace_id: The ID of the active workspace.

    Returns:
        A cache key.
    """
    hash_ = hashlib.md5()

    # Workspace ID
    hash_.update(workspace_id.bytes)

    # Artifact store ID and path
    hash_.update(artifact_store.id.bytes)
    hash_.update(artifact_store.path.encode())

    # Step source code
    hash_.update(step.spec.source.encode())

    # Step parameters
    for key, value in sorted(step.config.parameters.items()):
        hash_.update(key.encode())
        hash_.update(str(value).encode())

    # Input artifacts
    for name, artifact_id in input_artifact_ids.items():
        hash_.update(name.encode())
        hash_.update(artifact_id.bytes)

    # Output artifacts and materializers
    for name, output in step.config.outputs.items():
        hash_.update(name.encode())
        hash_.update(output.materializer_source.encode())

    # Custom caching parameters
    for key, value in sorted(step.config.caching_parameters.items()):
        hash_.update(key.encode())
        hash_.update(str(value).encode())

    return hash_.hexdigest()

get_cached_step_run(cache_key)

If a given step can be cached, get the corresponding existing step run.

A step run can be cached if there is an existing step run in the same workspace which has the same cache key and was successfully executed.

Parameters:

Name Type Description Default
cache_key str

The cache key of the step.

required

Returns:

Type Description
Optional[StepRunResponseModel]

The existing step run if the step can be cached, otherwise None.
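
The lookup amounts to "newest completed run with a matching key". The sketch below replays that selection on an in-memory list instead of querying the ZenML client; `StepRunSketch`, its fields and the `"completed"` status string are hypothetical stand-ins.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class StepRunSketch:
    """Hypothetical stand-in for a stored step run."""

    cache_key: str
    status: str
    created: int  # e.g. a Unix timestamp


def find_cached_run(
    runs: List[StepRunSketch], cache_key: str
) -> Optional[StepRunSketch]:
    candidates = [
        run
        for run in runs
        if run.cache_key == cache_key and run.status == "completed"
    ]
    # Prefer the most recently created match, or None if there is none.
    return max(candidates, key=lambda run: run.created, default=None)


runs = [
    StepRunSketch("abc", "completed", 1),
    StepRunSketch("abc", "failed", 3),
    StepRunSketch("abc", "completed", 2),
    StepRunSketch("xyz", "completed", 4),
]
hit = find_cached_run(runs, "abc")
miss = find_cached_run(runs, "unknown")
```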

Source code in zenml/orchestrators/cache_utils.py
def get_cached_step_run(cache_key: str) -> Optional["StepRunResponseModel"]:
    """If a given step can be cached, get the corresponding existing step run.

    A step run can be cached if there is an existing step run in the same
    workspace which has the same cache key and was successfully executed.

    Args:
        cache_key: The cache key of the step.

    Returns:
        The existing step run if the step can be cached, otherwise None.
    """
    client = Client()

    cache_candidates = client.list_run_steps(
        workspace_id=client.active_workspace.id,
        cache_key=cache_key,
        status=ExecutionStatus.COMPLETED,
        sort_by=f"{SorterOps.DESCENDING}:created",
        size=1,
    ).items

    if cache_candidates:
        return cache_candidates[0]
    return None

containerized_orchestrator

Containerized orchestrator class.

ContainerizedOrchestrator (BaseOrchestrator, ABC)

Base class for containerized orchestrators.

Source code in zenml/orchestrators/containerized_orchestrator.py
class ContainerizedOrchestrator(BaseOrchestrator, ABC):
    """Base class for containerized orchestrators."""

    @staticmethod
    def get_image(
        deployment: "PipelineDeploymentResponseModel",
        step_name: Optional[str] = None,
    ) -> str:
        """Gets the Docker image for the pipeline/a step.

        Args:
            deployment: The deployment from which to get the image.
            step_name: Pipeline step name for which to get the image. If not
                given the generic pipeline image will be returned.

        Raises:
            RuntimeError: If the deployment does not have an associated build.

        Returns:
            The image name or digest.
        """
        if not deployment.build:
            raise RuntimeError(
                f"Missing build for deployment {deployment.id}. This is "
                "probably because the build was manually deleted."
            )

        return deployment.build.get_image(
            component_key=ORCHESTRATOR_DOCKER_IMAGE_KEY, step=step_name
        )

    def get_docker_builds(
        self, deployment: "PipelineDeploymentBaseModel"
    ) -> List["BuildConfiguration"]:
        """Gets the Docker builds required for the component.

        Args:
            deployment: The pipeline deployment for which to get the builds.

        Returns:
            The required Docker builds.
        """
        pipeline_settings = deployment.pipeline_configuration.docker_settings

        included_pipeline_build = False
        builds = []

        for name, step in deployment.step_configurations.items():
            step_settings = step.config.docker_settings

            if step_settings != pipeline_settings:
                build = BuildConfiguration(
                    key=ORCHESTRATOR_DOCKER_IMAGE_KEY,
                    settings=step_settings,
                    step_name=name,
                )
                builds.append(build)
            elif not included_pipeline_build:
                pipeline_build = BuildConfiguration(
                    key=ORCHESTRATOR_DOCKER_IMAGE_KEY,
                    settings=pipeline_settings,
                )
                builds.append(pipeline_build)
                included_pipeline_build = True

        return builds
get_docker_builds(self, deployment)

Gets the Docker builds required for the component.

Parameters:

Name Type Description Default
deployment PipelineDeploymentBaseModel

The pipeline deployment for which to get the builds.

required

Returns:

Type Description
List[BuildConfiguration]

The required Docker builds.
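
The build selection boils down to one build per step whose settings differ from the pipeline's, plus at most one shared pipeline build. The sketch below replays that logic with strings standing in for the Docker settings objects; `plan_builds` and the tuple layout are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def plan_builds(
    pipeline_settings: str, step_settings: Dict[str, str]
) -> List[Tuple[str, Optional[str]]]:
    """Returns (settings, step_name) pairs; step_name None marks the shared build."""
    builds: List[Tuple[str, Optional[str]]] = []
    included_pipeline_build = False
    for name, settings in step_settings.items():
        if settings != pipeline_settings:
            # Step-specific settings need their own build.
            builds.append((settings, name))
        elif not included_pipeline_build:
            # The shared pipeline build is added only once.
            builds.append((pipeline_settings, None))
            included_pipeline_build = True
    return builds


builds = plan_builds("base", {"load": "base", "train": "gpu", "evaluate": "base"})
```

With these inputs, `load` and `evaluate` share the single pipeline build, and only `train` gets a build of its own.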

Source code in zenml/orchestrators/containerized_orchestrator.py
def get_docker_builds(
    self, deployment: "PipelineDeploymentBaseModel"
) -> List["BuildConfiguration"]:
    """Gets the Docker builds required for the component.

    Args:
        deployment: The pipeline deployment for which to get the builds.

    Returns:
        The required Docker builds.
    """
    pipeline_settings = deployment.pipeline_configuration.docker_settings

    included_pipeline_build = False
    builds = []

    for name, step in deployment.step_configurations.items():
        step_settings = step.config.docker_settings

        if step_settings != pipeline_settings:
            build = BuildConfiguration(
                key=ORCHESTRATOR_DOCKER_IMAGE_KEY,
                settings=step_settings,
                step_name=name,
            )
            builds.append(build)
        elif not included_pipeline_build:
            pipeline_build = BuildConfiguration(
                key=ORCHESTRATOR_DOCKER_IMAGE_KEY,
                settings=pipeline_settings,
            )
            builds.append(pipeline_build)
            included_pipeline_build = True

    return builds
get_image(deployment, step_name=None) staticmethod

Gets the Docker image for the pipeline/a step.

Parameters:

Name Type Description Default
deployment PipelineDeploymentResponseModel

The deployment from which to get the image.

required
step_name Optional[str]

Pipeline step name for which to get the image. If not given the generic pipeline image will be returned.

None

Exceptions:

Type Description
RuntimeError

If the deployment does not have an associated build.

Returns:

Type Description
str

The image name or digest.

Source code in zenml/orchestrators/containerized_orchestrator.py
@staticmethod
def get_image(
    deployment: "PipelineDeploymentResponseModel",
    step_name: Optional[str] = None,
) -> str:
    """Gets the Docker image for the pipeline/a step.

    Args:
        deployment: The deployment from which to get the image.
        step_name: Pipeline step name for which to get the image. If not
            given the generic pipeline image will be returned.

    Raises:
        RuntimeError: If the deployment does not have an associated build.

    Returns:
        The image name or digest.
    """
    if not deployment.build:
        raise RuntimeError(
            f"Missing build for deployment {deployment.id}. This is "
            "probably because the build was manually deleted."
        )

    return deployment.build.get_image(
        component_key=ORCHESTRATOR_DOCKER_IMAGE_KEY, step=step_name
    )

dag_runner

DAG (Directed Acyclic Graph) Runners.

NodeStatus (Enum)

Status of the execution of a node.

Source code in zenml/orchestrators/dag_runner.py
class NodeStatus(Enum):
    """Status of the execution of a node."""

    WAITING = "Waiting"
    RUNNING = "Running"
    COMPLETED = "Completed"

ThreadedDagRunner

Multi-threaded DAG Runner.

This class expects a DAG of strings in adjacency list representation, as well as a custom run_fn as input, then calls run_fn(node) for each string node in the DAG.

Steps that can be executed in parallel will be started in separate threads.

Source code in zenml/orchestrators/dag_runner.py
class ThreadedDagRunner:
    """Multi-threaded DAG Runner.

    This class expects a DAG of strings in adjacency list representation, as
    well as a custom `run_fn` as input, then calls `run_fn(node)` for each
    string node in the DAG.

    Steps that can be executed in parallel will be started in separate threads.
    """

    def __init__(
        self, dag: Dict[str, List[str]], run_fn: Callable[[str], Any]
    ) -> None:
        """Define attributes and initialize all nodes in waiting state.

        Args:
            dag: Adjacency list representation of a DAG.
                E.g.: [(1->2), (1->3), (2->4), (3->4)] should be represented as
                `dag={2: [1], 3: [1], 4: [2, 3]}`
            run_fn: A function `run_fn(node)` that runs a single node
        """
        self.dag = dag
        self.reversed_dag = reverse_dag(dag)
        self.run_fn = run_fn
        self.nodes = dag.keys()
        self.node_states = {node: NodeStatus.WAITING for node in self.nodes}
        self._lock = threading.Lock()

    def _can_run(self, node: str) -> bool:
        """Determine whether a node is ready to be run.

        This is the case if the node has not run yet and all of its upstream
        nodes have already completed.

        Args:
            node: The node.

        Returns:
            True if the node can run else False.
        """
        # Check that node has not run yet.
        if not self.node_states[node] == NodeStatus.WAITING:
            return False

        # Check that all upstream nodes of this node have already completed.
        for upstream_node in self.dag[node]:
            if not self.node_states[upstream_node] == NodeStatus.COMPLETED:
                return False

        return True

    def _run_node(self, node: str) -> None:
        """Run a single node.

        Calls the user-defined run_fn, then calls `self._finish_node`.

        Args:
            node: The node.
        """
        self.run_fn(node)
        self._finish_node(node)

    def _run_node_in_thread(self, node: str) -> threading.Thread:
        """Run a single node in a separate thread.

        First updates the node status to running.
        Then calls self._run_node() in a new thread and returns the thread.

        Args:
            node: The node.

        Returns:
            The thread in which the node was run.
        """
        # Update node status to running.
        assert self.node_states[node] == NodeStatus.WAITING
        with self._lock:
            self.node_states[node] = NodeStatus.RUNNING

        # Run node in new thread.
        thread = threading.Thread(target=self._run_node, args=(node,))
        thread.start()
        return thread

    def _finish_node(self, node: str) -> None:
        """Finish a node run.

        First updates the node status to completed.
        Then starts all other nodes that can now be run and waits for them.

        Args:
            node: The node.
        """
        # Update node status to completed.
        assert self.node_states[node] == NodeStatus.RUNNING
        with self._lock:
            self.node_states[node] = NodeStatus.COMPLETED

        # Run downstream nodes.
        threads = []
        for downstream_node in self.reversed_dag[node]:
            if self._can_run(downstream_node):
                thread = self._run_node_in_thread(downstream_node)
                threads.append(thread)

        # Wait for all downstream nodes to complete.
        for thread in threads:
            thread.join()

    def run(self) -> None:
        """Call `self.run_fn` on all nodes in `self.dag`.

        The order of execution is determined using topological sort.
        Each node is run in a separate thread to enable parallelism.
        """
        # Run all nodes that can be started immediately.
        # These will, in turn, start other nodes once all of their respective
        # upstream nodes have completed.
        threads = []
        for node in self.nodes:
            if self._can_run(node):
                thread = self._run_node_in_thread(node)
                threads.append(thread)

        # Wait till all nodes have completed.
        for thread in threads:
            thread.join()

        # Make sure all nodes were run, otherwise print a warning.
        for node in self.nodes:
            if self.node_states[node] == NodeStatus.WAITING:
                upstream_nodes = self.dag[node]
                logger.warning(
                    f"Node `{node}` was never run, because it was still"
                    f" waiting for the following nodes: `{upstream_nodes}`."
                )
__init__(self, dag, run_fn) special

Define attributes and initialize all nodes in waiting state.

Parameters:

Name Type Description Default
dag Dict[str, List[str]]

Adjacency list representation of a DAG. E.g.: [(1->2), (1->3), (2->4), (3->4)] should be represented as dag={2: [1], 3: [1], 4: [2, 3]}

required
run_fn Callable[[str], Any]

A function run_fn(node) that runs a single node

required
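
An edge list can be converted into this upstream adjacency representation with a few lines of code. Because the constructor takes its node set from `dag.keys()`, this sketch assumes that nodes without upstream dependencies should also appear as keys, mapped to an empty list. `edges_to_dag` is a hypothetical helper, and the nodes are written as strings to match the type annotation.

```python
from typing import Dict, Iterable, List, Tuple


def edges_to_dag(edges: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Maps each node to the list of its upstream nodes."""
    dag: Dict[str, List[str]] = {}
    for upstream, downstream in edges:
        # Register both endpoints so that root nodes get an empty list.
        dag.setdefault(upstream, [])
        dag.setdefault(downstream, []).append(upstream)
    return dag


dag = edges_to_dag([("1", "2"), ("1", "3"), ("2", "4"), ("3", "4")])
# dag == {"1": [], "2": ["1"], "3": ["1"], "4": ["2", "3"]}
```
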
Source code in zenml/orchestrators/dag_runner.py
def __init__(
    self, dag: Dict[str, List[str]], run_fn: Callable[[str], Any]
) -> None:
    """Define attributes and initialize all nodes in waiting state.

    Args:
        dag: Adjacency list representation of a DAG.
            E.g.: [(1->2), (1->3), (2->4), (3->4)] should be represented as
            `dag={2: [1], 3: [1], 4: [2, 3]}`
        run_fn: A function `run_fn(node)` that runs a single node
    """
    self.dag = dag
    self.reversed_dag = reverse_dag(dag)
    self.run_fn = run_fn
    self.nodes = dag.keys()
    self.node_states = {node: NodeStatus.WAITING for node in self.nodes}
    self._lock = threading.Lock()
run(self)

Call self.run_fn on all nodes in self.dag.

The order of execution is determined using topological sort. Each node is run in a separate thread to enable parallelism.

Source code in zenml/orchestrators/dag_runner.py
def run(self) -> None:
    """Call `self.run_fn` on all nodes in `self.dag`.

    The order of execution is determined using topological sort.
    Each node is run in a separate thread to enable parallelism.
    """
    # Run all nodes that can be started immediately.
    # These will, in turn, start other nodes once all of their respective
    # upstream nodes have completed.
    threads = []
    for node in self.nodes:
        if self._can_run(node):
            thread = self._run_node_in_thread(node)
            threads.append(thread)

    # Wait till all nodes have completed.
    for thread in threads:
        thread.join()

    # Make sure all nodes were run, otherwise print a warning.
    for node in self.nodes:
        if self.node_states[node] == NodeStatus.WAITING:
            upstream_nodes = self.dag[node]
            logger.warning(
                f"Node `{node}` was never run, because it was still"
                f" waiting for the following nodes: `{upstream_nodes}`."
            )
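
The helper methods that `run` relies on (`_can_run`, `_run_node_in_thread`) are not shown in this part of the page. The sketch below illustrates the same scheduling idea in a self-contained form; it is not the ZenML implementation. The names `run_dag`, `claim_ready`, `start` and `run_node` are hypothetical, and the sketch assumes that every node, including root nodes, appears as a key of `dag`.

```python
import threading
from typing import Callable, Dict, List


def run_dag(dag: Dict[str, List[str]], run_fn: Callable[[str], None]) -> List[str]:
    """Runs each node once all of its upstream nodes have completed.

    Assumes every node, including root nodes, is a key of `dag`.
    Returns the order in which the nodes completed.
    """
    # Downstream lookup: the reverse of the upstream adjacency list.
    downstream: Dict[str, List[str]] = {node: [] for node in dag}
    for node, upstream_nodes in dag.items():
        for upstream_node in upstream_nodes:
            downstream[upstream_node].append(node)

    state = {node: "waiting" for node in dag}
    completed: List[str] = []
    lock = threading.Lock()

    def claim_ready(candidates: List[str]) -> List[str]:
        # Check and mark under one lock so that no node is started twice.
        with lock:
            ready = [
                node
                for node in candidates
                if state[node] == "waiting"
                and all(state[up] == "completed" for up in dag[node])
            ]
            for node in ready:
                state[node] = "running"
        return ready

    def start(candidates: List[str]) -> None:
        # Start one thread per runnable node and wait for all of them.
        threads = [
            threading.Thread(target=run_node, args=(node,))
            for node in claim_ready(candidates)
        ]
        for thread in threads:
            thread.start()
        for thread in threads:
            thread.join()

    def run_node(node: str) -> None:
        run_fn(node)
        with lock:
            state[node] = "completed"
            completed.append(node)
        # A finished node tries to start its downstream nodes.
        start(downstream[node])

    start(list(dag))
    return completed


dag = {"1": [], "2": ["1"], "3": ["1"], "4": ["2", "3"]}
order = run_dag(dag, run_fn=lambda node: None)
```

Here `order` always starts with `"1"` and ends with `"4"`, while `"2"` and `"3"` may complete in either order because they run in parallel threads.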

reverse_dag(dag)

Reverse a DAG.

Parameters:

Name Type Description Default
dag Dict[str, List[str]]

Adjacency list representation of a DAG.

required

Returns:

Type Description
Dict[str, List[str]]

Adjacency list representation of the reversed DAG.

Source code in zenml/orchestrators/dag_runner.py
def reverse_dag(dag: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Reverse a DAG.

    Args:
        dag: Adjacency list representation of a DAG.

    Returns:
        Adjacency list representation of the reversed DAG.
    """
    reversed_dag = defaultdict(list)

    # Reverse all edges in the graph.
    for node, upstream_nodes in dag.items():
        for upstream_node in upstream_nodes:
            reversed_dag[upstream_node].append(node)

    # Add nodes without incoming edges back in.
    for node in dag:
        if node not in reversed_dag:
            reversed_dag[node] = []

    return reversed_dag
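
As an illustration, the following sketch applies the same logic to a small diamond-shaped DAG. The function body is copied from the listing above so that the example runs without ZenML installed; the string node names and the explicit root entry `"1": []` are assumptions made for this example.

```python
from collections import defaultdict
from typing import Dict, List


def reverse_dag(dag: Dict[str, List[str]]) -> Dict[str, List[str]]:
    # Same logic as the function documented above.
    reversed_dag: Dict[str, List[str]] = defaultdict(list)

    # Reverse all edges in the graph.
    for node, upstream_nodes in dag.items():
        for upstream_node in upstream_nodes:
            reversed_dag[upstream_node].append(node)

    # Add nodes without incoming edges back in.
    for node in dag:
        if node not in reversed_dag:
            reversed_dag[node] = []

    return reversed_dag


# Each key lists its upstream nodes: 1 -> 2, 1 -> 3, 2 -> 4, 3 -> 4.
upstream = {"1": [], "2": ["1"], "3": ["1"], "4": ["2", "3"]}
downstream = dict(reverse_dag(upstream))
print(downstream)
# {'1': ['2', '3'], '2': ['4'], '3': ['4'], '4': []}
```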

input_utils

Utilities for inputs.

resolve_step_inputs(step, run_id)

Resolves inputs for the current step.

Parameters:

Name Type Description Default
step Step

The step for which to resolve the inputs.

required
run_id UUID

The ID of the current pipeline run.

required

Exceptions:

Type Description
InputResolutionError

If input resolving failed due to a missing step or output.

Returns:

Type Description
Tuple[Dict[str, ArtifactResponseModel], List[uuid.UUID]]

The input artifacts, keyed by input name, and the IDs of the parent steps of the current step.

Source code in zenml/orchestrators/input_utils.py
def resolve_step_inputs(
    step: "Step", run_id: UUID
) -> Tuple[Dict[str, "ArtifactResponseModel"], List[UUID]]:
    """Resolves inputs for the current step.

    Args:
        step: The step for which to resolve the inputs.
        run_id: The ID of the current pipeline run.

    Raises:
        InputResolutionError: If input resolving failed due to a missing
            step or output.

    Returns:
        The input artifacts, keyed by input name, and the IDs of the parent
        steps of the current step.
    """
    current_run_steps = {
        run_step.step.config.name: run_step
        for run_step in Client()
        .zen_store.list_run_steps(StepRunFilterModel(pipeline_run_id=run_id))
        .items
    }

    input_artifacts: Dict[str, "ArtifactResponseModel"] = {}
    for name, input_ in step.spec.inputs.items():
        try:
            step_run = current_run_steps[input_.step_name]
        except KeyError:
            raise InputResolutionError(
                f"No step `{input_.step_name}` found in current run."
            )

        try:
            artifact = step_run.output_artifacts[input_.output_name]
        except KeyError:
            raise InputResolutionError(
                f"No output `{input_.output_name}` found for step "
                f"`{input_.step_name}`."
            )

        input_artifacts[name] = artifact

    parent_step_ids = [
        current_run_steps[upstream_step].id
        for upstream_step in step.spec.upstream_steps
    ]

    return input_artifacts, parent_step_ids
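
The lookup logic can be illustrated without a ZenML store. The sketch below is a simplified model under stated assumptions: `FakeStepRun`, `resolve_inputs` and the tuple-based input spec are hypothetical stand-ins for the ZenML models, and artifacts are represented by plain strings.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


class InputResolutionError(Exception):
    """Stand-in for the ZenML exception of the same name."""


@dataclass
class FakeStepRun:
    # Hypothetical stand-in for a step run returned by the store.
    id: str
    output_artifacts: Dict[str, str] = field(default_factory=dict)


def resolve_inputs(
    inputs: Dict[str, Tuple[str, str]],
    upstream_steps: List[str],
    current_run_steps: Dict[str, FakeStepRun],
) -> Tuple[Dict[str, str], List[str]]:
    """Maps each input name to the artifact produced by an upstream step."""
    input_artifacts: Dict[str, str] = {}
    for name, (step_name, output_name) in inputs.items():
        if step_name not in current_run_steps:
            raise InputResolutionError(
                f"No step `{step_name}` found in current run."
            )
        outputs = current_run_steps[step_name].output_artifacts
        if output_name not in outputs:
            raise InputResolutionError(
                f"No output `{output_name}` found for step `{step_name}`."
            )
        input_artifacts[name] = outputs[output_name]

    parent_step_ids = [current_run_steps[name].id for name in upstream_steps]
    return input_artifacts, parent_step_ids


run_steps = {
    "loader": FakeStepRun(id="step-1", output_artifacts={"data": "artifact-a"})
}
artifacts, parents = resolve_inputs(
    inputs={"dataset": ("loader", "data")},
    upstream_steps=["loader"],
    current_run_steps=run_steps,
)
```

In this example `artifacts` maps the input name `dataset` to the artifact that `loader` published as `data`, and `parents` contains the ID of the `loader` step run.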

local special

Initialization for the local orchestrator.

local_orchestrator

Implementation of the ZenML local orchestrator.

LocalOrchestrator (BaseOrchestrator)

Orchestrator responsible for running pipelines locally.

This orchestrator does not allow for concurrent execution of steps and also does not support running on a schedule.

Source code in zenml/orchestrators/local/local_orchestrator.py
class LocalOrchestrator(BaseOrchestrator):
    """Orchestrator responsible for running pipelines locally.

    This orchestrator does not allow for concurrent execution of steps and also
    does not support running on a schedule.
    """

    _orchestrator_run_id: Optional[str] = None

    def prepare_or_run_pipeline(
        self,
        deployment: "PipelineDeploymentResponseModel",
        stack: "Stack",
    ) -> Any:
        """Iterates through all steps and executes them sequentially.

        Args:
            deployment: The pipeline deployment to prepare or run.
            stack: The stack on which the pipeline is deployed.
        """
        if deployment.schedule:
            logger.warning(
                "Local Orchestrator currently does not support the "
                "use of schedules. The `schedule` will be ignored "
                "and the pipeline will be run immediately."
            )

        self._orchestrator_run_id = str(uuid4())
        start_time = time.time()

        # Run each step
        for step in deployment.step_configurations.values():
            if self.requires_resources_in_orchestration_environment(step):
                logger.warning(
                    "Specifying step resources is not supported for the local "
                    "orchestrator, ignoring resource configuration for "
                    "step %s.",
                    step.config.name,
                )

            self.run_step(
                step=step,
            )

        run_duration = time.time() - start_time
        run_id = orchestrator_utils.get_run_id_for_orchestrator_run_id(
            orchestrator=self, orchestrator_run_id=self._orchestrator_run_id
        )
        run_model = Client().zen_store.get_run(run_id)
        logger.info(
            "Pipeline run `%s` has finished in %s.",
            run_model.name,
            string_utils.get_human_readable_time(run_duration),
        )
        self._orchestrator_run_id = None

    def get_orchestrator_run_id(self) -> str:
        """Returns the active orchestrator run id.

        Raises:
            RuntimeError: If no run id exists. This happens when this method
                gets called while the orchestrator is not running a pipeline.

        Returns:
            The orchestrator run id.
        """
        if not self._orchestrator_run_id:
            raise RuntimeError("No run id set.")

        return self._orchestrator_run_id
get_orchestrator_run_id(self)

Returns the active orchestrator run id.

Exceptions:

Type Description
RuntimeError

If no run id exists. This happens when this method gets called while the orchestrator is not running a pipeline.

Returns:

Type Description
str

The orchestrator run id.

Source code in zenml/orchestrators/local/local_orchestrator.py
def get_orchestrator_run_id(self) -> str:
    """Returns the active orchestrator run id.

    Raises:
        RuntimeError: If no run id exists. This happens when this method
            gets called while the orchestrator is not running a pipeline.

    Returns:
        The orchestrator run id.
    """
    if not self._orchestrator_run_id:
        raise RuntimeError("No run id set.")

    return self._orchestrator_run_id
prepare_or_run_pipeline(self, deployment, stack)

Iterates through all steps and executes them sequentially.

Parameters:

Name Type Description Default
deployment PipelineDeploymentResponseModel

The pipeline deployment to prepare or run.

required
stack Stack

The stack on which the pipeline is deployed.

required
Source code in zenml/orchestrators/local/local_orchestrator.py
def prepare_or_run_pipeline(
    self,
    deployment: "PipelineDeploymentResponseModel",
    stack: "Stack",
) -> Any:
    """Iterates through all steps and executes them sequentially.

    Args:
        deployment: The pipeline deployment to prepare or run.
        stack: The stack on which the pipeline is deployed.
    """
    if deployment.schedule:
        logger.warning(
            "Local Orchestrator currently does not support the "
            "use of schedules. The `schedule` will be ignored "
            "and the pipeline will be run immediately."
        )

    self._orchestrator_run_id = str(uuid4())
    start_time = time.time()

    # Run each step
    for step in deployment.step_configurations.values():
        if self.requires_resources_in_orchestration_environment(step):
            logger.warning(
                "Specifying step resources is not supported for the local "
                "orchestrator, ignoring resource configuration for "
                "step %s.",
                step.config.name,
            )

        self.run_step(
            step=step,
        )

    run_duration = time.time() - start_time
    run_id = orchestrator_utils.get_run_id_for_orchestrator_run_id(
        orchestrator=self, orchestrator_run_id=self._orchestrator_run_id
    )
    run_model = Client().zen_store.get_run(run_id)
    logger.info(
        "Pipeline run `%s` has finished in %s.",
        run_model.name,
        string_utils.get_human_readable_time(run_duration),
    )
    self._orchestrator_run_id = None
LocalOrchestratorConfig (BaseOrchestratorConfig) pydantic-model

Local orchestrator config.

Source code in zenml/orchestrators/local/local_orchestrator.py
class LocalOrchestratorConfig(BaseOrchestratorConfig):
    """Local orchestrator config."""

    @property
    def is_local(self) -> bool:
        """Checks if this stack component is running locally.

        This designation is used to determine if the stack component can be
        shared with other users or if it is only usable on the local host.

        Returns:
            True if this config is for a local component, False otherwise.
        """
        return True
is_local: bool property readonly

Checks if this stack component is running locally.

This designation is used to determine if the stack component can be shared with other users or if it is only usable on the local host.

Returns:

Type Description
bool

True if this config is for a local component, False otherwise.

LocalOrchestratorFlavor (BaseOrchestratorFlavor)

Flavor for the local orchestrator.

Source code in zenml/orchestrators/local/local_orchestrator.py
class LocalOrchestratorFlavor(BaseOrchestratorFlavor):
    """Flavor for the local orchestrator."""

    @property
    def name(self) -> str:
        """The flavor name.

        Returns:
            The flavor name.
        """
        return "local"

    @property
    def docs_url(self) -> Optional[str]:
        """A url to point at docs explaining this flavor.

        Returns:
            A flavor docs url.
        """
        return self.generate_default_docs_url()

    @property
    def sdk_docs_url(self) -> Optional[str]:
        """A url to point at SDK docs explaining this flavor.

        Returns:
            A flavor SDK docs url.
        """
        return self.generate_default_sdk_docs_url()

    @property
    def logo_url(self) -> str:
        """A url to represent the flavor in the dashboard.

        Returns:
            The flavor logo.
        """
        return "https://public-flavor-logos.s3.eu-central-1.amazonaws.com/orchestrator/local.png"

    @property
    def config_class(self) -> Type[BaseOrchestratorConfig]:
        """Config class for the local orchestrator flavor.

        Returns:
            The config class.
        """
        return LocalOrchestratorConfig

    @property
    def implementation_class(self) -> Type[LocalOrchestrator]:
        """Implementation class for this flavor.

        Returns:
            The implementation class for this flavor.
        """
        return LocalOrchestrator
config_class: Type[zenml.orchestrators.base_orchestrator.BaseOrchestratorConfig] property readonly

Config class for the local orchestrator flavor.

Returns:

Type Description
Type[zenml.orchestrators.base_orchestrator.BaseOrchestratorConfig]

The config class.

docs_url: Optional[str] property readonly

A url to point at docs explaining this flavor.

Returns:

Type Description
Optional[str]

A flavor docs url.

implementation_class: Type[zenml.orchestrators.local.local_orchestrator.LocalOrchestrator] property readonly

Implementation class for this flavor.

Returns:

Type Description
Type[zenml.orchestrators.local.local_orchestrator.LocalOrchestrator]

The implementation class for this flavor.

logo_url: str property readonly

A url to represent the flavor in the dashboard.

Returns:

Type Description
str

The flavor logo.

name: str property readonly

The flavor name.

Returns:

Type Description
str

The flavor name.

sdk_docs_url: Optional[str] property readonly

A url to point at SDK docs explaining this flavor.

Returns:

Type Description
Optional[str]

A flavor SDK docs url.

local_docker special

Initialization for the local Docker orchestrator.

local_docker_orchestrator

Implementation of the ZenML local Docker orchestrator.

LocalDockerOrchestrator (ContainerizedOrchestrator)

Orchestrator responsible for running pipelines locally using Docker.

This orchestrator does not allow for concurrent execution of steps and also does not support running on a schedule.

Source code in zenml/orchestrators/local_docker/local_docker_orchestrator.py
class LocalDockerOrchestrator(ContainerizedOrchestrator):
    """Orchestrator responsible for running pipelines locally using Docker.

    This orchestrator does not allow for concurrent execution of steps and also
    does not support running on a schedule.
    """

    @property
    def settings_class(self) -> Optional[Type["BaseSettings"]]:
        """Settings class for the Local Docker orchestrator.

        Returns:
            The settings class.
        """
        return LocalDockerOrchestratorSettings

    @property
    def validator(self) -> Optional[StackValidator]:
        """Ensures there is an image builder in the stack.

        Returns:
            A `StackValidator` instance.
        """
        return StackValidator(
            required_components={StackComponentType.IMAGE_BUILDER}
        )

    def get_orchestrator_run_id(self) -> str:
        """Returns the active orchestrator run id.

        Raises:
            RuntimeError: If the environment variable specifying the run id
                is not set.

        Returns:
            The orchestrator run id.
        """
        try:
            return os.environ[ENV_ZENML_DOCKER_ORCHESTRATOR_RUN_ID]
        except KeyError:
            raise RuntimeError(
                "Unable to read run id from environment variable "
                f"{ENV_ZENML_DOCKER_ORCHESTRATOR_RUN_ID}."
            )

    def prepare_or_run_pipeline(
        self,
        deployment: "PipelineDeploymentResponseModel",
        stack: "Stack",
    ) -> Any:
        """Sequentially runs all pipeline steps in local Docker containers.

        Args:
            deployment: The pipeline deployment to prepare or run.
            stack: The stack the pipeline will run on.
        """
        if deployment.schedule:
            logger.warning(
                "Local Docker Orchestrator currently does not support the "
                "use of schedules. The `schedule` will be ignored "
                "and the pipeline will be run immediately."
            )

        from docker.client import DockerClient

        docker_client = DockerClient.from_env()
        entrypoint = StepEntrypointConfiguration.get_entrypoint_command()

        # Add the local stores path as a volume mount
        stack.check_local_paths()
        local_stores_path = GlobalConfiguration().local_stores_path
        volumes = {
            local_stores_path: {
                "bind": local_stores_path,
                "mode": "rw",
            }
        }
        orchestrator_run_id = str(uuid4())
        environment = {
            ENV_ZENML_DOCKER_ORCHESTRATOR_RUN_ID: orchestrator_run_id,
            ENV_ZENML_LOCAL_STORES_PATH: local_stores_path,
        }
        start_time = time.time()

        # Run each step
        for step_name, step in deployment.step_configurations.items():
            if self.requires_resources_in_orchestration_environment(step):
                logger.warning(
                    "Specifying step resources is not supported for the local "
                    "Docker orchestrator, ignoring resource configuration for "
                    "step %s.",
                    step.config.name,
                )

            arguments = StepEntrypointConfiguration.get_entrypoint_arguments(
                step_name=step_name, deployment_id=deployment.id
            )

            settings = cast(
                LocalDockerOrchestratorSettings,
                self.get_settings(step),
            )
            image = self.get_image(deployment=deployment, step_name=step_name)

            user = None
            if sys.platform != "win32":
                user = os.getuid()
            logger.info("Running step `%s` in Docker:", step_name)
            logs = docker_client.containers.run(
                image=image,
                entrypoint=entrypoint,
                command=arguments,
                user=user,
                volumes=volumes,
                environment=environment,
                stream=True,
                extra_hosts={"host.docker.internal": "host-gateway"},
                **settings.run_args,
            )

            for line in logs:
                logger.info(line.strip().decode())

        run_duration = time.time() - start_time
        run_id = orchestrator_utils.get_run_id_for_orchestrator_run_id(
            orchestrator=self, orchestrator_run_id=orchestrator_run_id
        )
        run_model = Client().zen_store.get_run(run_id)
        logger.info(
            "Pipeline run `%s` has finished in %s.",
            run_model.name,
            string_utils.get_human_readable_time(run_duration),
        )
settings_class: Optional[Type[BaseSettings]] property readonly

Settings class for the Local Docker orchestrator.

Returns:

Type Description
Optional[Type[BaseSettings]]

The settings class.

validator: Optional[zenml.stack.stack_validator.StackValidator] property readonly

Ensures there is an image builder in the stack.

Returns:

Type Description
Optional[zenml.stack.stack_validator.StackValidator]

A StackValidator instance.

get_orchestrator_run_id(self)

Returns the active orchestrator run id.

Exceptions:

Type Description
RuntimeError

If the environment variable specifying the run id is not set.

Returns:

Type Description
str

The orchestrator run id.

Source code in zenml/orchestrators/local_docker/local_docker_orchestrator.py
def get_orchestrator_run_id(self) -> str:
    """Returns the active orchestrator run id.

    Raises:
        RuntimeError: If the environment variable specifying the run id
            is not set.

    Returns:
        The orchestrator run id.
    """
    try:
        return os.environ[ENV_ZENML_DOCKER_ORCHESTRATOR_RUN_ID]
    except KeyError:
        raise RuntimeError(
            "Unable to read run id from environment variable "
            f"{ENV_ZENML_DOCKER_ORCHESTRATOR_RUN_ID}."
        )
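
The pattern of passing the run ID into the container through an environment variable and reading it back can be sketched as follows. The variable name `EXAMPLE_ORCHESTRATOR_RUN_ID` is a hypothetical placeholder; the actual value of `ENV_ZENML_DOCKER_ORCHESTRATOR_RUN_ID` is not shown on this page.

```python
import os

# Hypothetical variable name, used only for this illustration.
RUN_ID_ENV_VAR = "EXAMPLE_ORCHESTRATOR_RUN_ID"


def get_run_id() -> str:
    """Reads the run ID that the orchestrator placed in the environment."""
    try:
        return os.environ[RUN_ID_ENV_VAR]
    except KeyError as e:
        raise RuntimeError(
            f"Unable to read run id from environment variable {RUN_ID_ENV_VAR}."
        ) from e


# Inside a step container the variable is set by the orchestrator.
os.environ[RUN_ID_ENV_VAR] = "1234"
inside_container = get_run_id()

# Outside of a run the variable is missing and the lookup fails.
del os.environ[RUN_ID_ENV_VAR]
```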
prepare_or_run_pipeline(self, deployment, stack)

Sequentially runs all pipeline steps in local Docker containers.

Parameters:

Name Type Description Default
deployment PipelineDeploymentResponseModel

The pipeline deployment to prepare or run.

required
stack Stack

The stack the pipeline will run on.

required
Source code in zenml/orchestrators/local_docker/local_docker_orchestrator.py
def prepare_or_run_pipeline(
    self,
    deployment: "PipelineDeploymentResponseModel",
    stack: "Stack",
) -> Any:
    """Sequentially runs all pipeline steps in local Docker containers.

    Args:
        deployment: The pipeline deployment to prepare or run.
        stack: The stack the pipeline will run on.
    """
    if deployment.schedule:
        logger.warning(
            "Local Docker Orchestrator currently does not support the "
            "use of schedules. The `schedule` will be ignored "
            "and the pipeline will be run immediately."
        )

    from docker.client import DockerClient

    docker_client = DockerClient.from_env()
    entrypoint = StepEntrypointConfiguration.get_entrypoint_command()

    # Add the local stores path as a volume mount
    stack.check_local_paths()
    local_stores_path = GlobalConfiguration().local_stores_path
    volumes = {
        local_stores_path: {
            "bind": local_stores_path,
            "mode": "rw",
        }
    }
    orchestrator_run_id = str(uuid4())
    environment = {
        ENV_ZENML_DOCKER_ORCHESTRATOR_RUN_ID: orchestrator_run_id,
        ENV_ZENML_LOCAL_STORES_PATH: local_stores_path,
    }
    start_time = time.time()

    # Run each step
    for step_name, step in deployment.step_configurations.items():
        if self.requires_resources_in_orchestration_environment(step):
            logger.warning(
                "Specifying step resources is not supported for the local "
                "Docker orchestrator, ignoring resource configuration for "
                "step %s.",
                step.config.name,
            )

        arguments = StepEntrypointConfiguration.get_entrypoint_arguments(
            step_name=step_name, deployment_id=deployment.id
        )

        settings = cast(
            LocalDockerOrchestratorSettings,
            self.get_settings(step),
        )
        image = self.get_image(deployment=deployment, step_name=step_name)

        user = None
        if sys.platform != "win32":
            user = os.getuid()
        logger.info("Running step `%s` in Docker:", step_name)
        logs = docker_client.containers.run(
            image=image,
            entrypoint=entrypoint,
            command=arguments,
            user=user,
            volumes=volumes,
            environment=environment,
            stream=True,
            extra_hosts={"host.docker.internal": "host-gateway"},
            **settings.run_args,
        )

        for line in logs:
            logger.info(line.strip().decode())

    run_duration = time.time() - start_time
    run_id = orchestrator_utils.get_run_id_for_orchestrator_run_id(
        orchestrator=self, orchestrator_run_id=orchestrator_run_id
    )
    run_model = Client().zen_store.get_run(run_id)
    logger.info(
        "Pipeline run `%s` has finished in %s.",
        run_model.name,
        string_utils.get_human_readable_time(run_duration),
    )
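
To make the per-step container configuration easier to follow, the sketch below assembles the same keyword arguments in a plain function without contacting a Docker daemon. The function name `build_container_kwargs` and both environment variable names are hypothetical; the keyword names themselves (`image`, `entrypoint`, `command`, `user`, `volumes`, `environment`, `stream`, `extra_hosts`) are the ones used in the listing above.

```python
import os
import sys
from typing import Any, Dict, List, Optional

# Hypothetical names; the real constants are defined in ZenML.
RUN_ID_ENV_VAR = "EXAMPLE_ORCHESTRATOR_RUN_ID"
LOCAL_STORES_ENV_VAR = "EXAMPLE_LOCAL_STORES_PATH"


def build_container_kwargs(
    image: str,
    entrypoint: List[str],
    arguments: List[str],
    local_stores_path: str,
    orchestrator_run_id: str,
    run_args: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Builds the keyword arguments for one step container."""
    # Run as the current user so that files written to the mounted
    # directory stay accessible on the host (not available on Windows).
    user: Optional[int] = None
    if sys.platform != "win32":
        user = os.getuid()

    kwargs: Dict[str, Any] = {
        "image": image,
        "entrypoint": entrypoint,
        "command": arguments,
        "user": user,
        # Mount the local stores at the same path inside the container.
        "volumes": {local_stores_path: {"bind": local_stores_path, "mode": "rw"}},
        "environment": {
            RUN_ID_ENV_VAR: orchestrator_run_id,
            LOCAL_STORES_ENV_VAR: local_stores_path,
        },
        "stream": True,
        "extra_hosts": {"host.docker.internal": "host-gateway"},
    }

    # `f(image=..., **run_args)` raises a TypeError when a key is repeated,
    # so the sketch rejects duplicates in the same way.
    duplicates = set(kwargs) & set(run_args or {})
    if duplicates:
        raise TypeError(f"run_args repeats explicit arguments: {sorted(duplicates)}")
    kwargs.update(run_args or {})
    return kwargs


kwargs = build_container_kwargs(
    image="example-image:latest",
    entrypoint=["python", "-m", "example.entrypoint"],
    arguments=["--step_name", "trainer"],
    local_stores_path="/tmp/local_stores",
    orchestrator_run_id="run-1",
    run_args={"mem_limit": "2g"},
)
# With a Docker daemon available, these could be passed on as
# `docker_client.containers.run(**kwargs)`.
```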
LocalDockerOrchestratorConfig (BaseOrchestratorConfig, LocalDockerOrchestratorSettings) pydantic-model

Local Docker orchestrator config.

Source code in zenml/orchestrators/local_docker/local_docker_orchestrator.py
class LocalDockerOrchestratorConfig(  # type: ignore[misc] # https://github.com/pydantic/pydantic/issues/4173
    BaseOrchestratorConfig, LocalDockerOrchestratorSettings
):
    """Local Docker orchestrator config."""

    @property
    def is_local(self) -> bool:
        """Checks if this stack component is running locally.

        This designation is used to determine if the stack component can be
        shared with other users or if it is only usable on the local host.

        Returns:
            True if this config is for a local component, False otherwise.
        """
        return True
is_local: bool property readonly

Checks if this stack component is running locally.

This designation is used to determine if the stack component can be shared with other users or if it is only usable on the local host.

Returns:

Type Description
bool

True if this config is for a local component, False otherwise.

LocalDockerOrchestratorFlavor (BaseOrchestratorFlavor)

Flavor for the local Docker orchestrator.

Source code in zenml/orchestrators/local_docker/local_docker_orchestrator.py
class LocalDockerOrchestratorFlavor(BaseOrchestratorFlavor):
    """Flavor for the local Docker orchestrator."""

    @property
    def name(self) -> str:
        """Name of the orchestrator flavor.

        Returns:
            Name of the orchestrator flavor.
        """
        return "local_docker"

    @property
    def docs_url(self) -> Optional[str]:
        """A url to point at docs explaining this flavor.

        Returns:
            A flavor docs url.
        """
        return self.generate_default_docs_url()

    @property
    def sdk_docs_url(self) -> Optional[str]:
        """A url to point at SDK docs explaining this flavor.

        Returns:
            A flavor SDK docs url.
        """
        return self.generate_default_sdk_docs_url()

    @property
    def logo_url(self) -> str:
        """A url to represent the flavor in the dashboard.

        Returns:
            The flavor logo.
        """
        return "https://public-flavor-logos.s3.eu-central-1.amazonaws.com/orchestrator/docker.png"

    @property
    def config_class(self) -> Type[BaseOrchestratorConfig]:
        """Config class for the local Docker orchestrator flavor.

        Returns:
            The config class.
        """
        return LocalDockerOrchestratorConfig

    @property
    def implementation_class(self) -> Type["LocalDockerOrchestrator"]:
        """Implementation class for this flavor.

        Returns:
            Implementation class for this flavor.
        """
        return LocalDockerOrchestrator
config_class: Type[zenml.orchestrators.base_orchestrator.BaseOrchestratorConfig] property readonly

Config class for the local Docker orchestrator flavor.

Returns:

Type Description
Type[zenml.orchestrators.base_orchestrator.BaseOrchestratorConfig]

The config class.

docs_url: Optional[str] property readonly

A url to point at docs explaining this flavor.

Returns:

Type Description
Optional[str]

A flavor docs url.

implementation_class: Type[LocalDockerOrchestrator] property readonly

Implementation class for this flavor.

Returns:

Type Description
Type[LocalDockerOrchestrator]

Implementation class for this flavor.

logo_url: str property readonly

A url to represent the flavor in the dashboard.

Returns:

Type Description
str

The flavor logo.

name: str property readonly

Name of the orchestrator flavor.

Returns:

Type Description
str

Name of the orchestrator flavor.

sdk_docs_url: Optional[str] property readonly

A url to point at SDK docs explaining this flavor.

Returns:

Type Description
Optional[str]

A flavor SDK docs url.

LocalDockerOrchestratorSettings (BaseSettings) pydantic-model

Local Docker orchestrator settings.

Attributes:

Name Type Description
run_args Dict[str, Any]

Arguments to pass to the docker run call.

Source code in zenml/orchestrators/local_docker/local_docker_orchestrator.py
class LocalDockerOrchestratorSettings(BaseSettings):
    """Local Docker orchestrator settings.

    Attributes:
        run_args: Arguments to pass to the `docker run` call.
    """

    run_args: Dict[str, Any] = {}

    @validator("run_args", pre=True)
    def _convert_json_string(
        cls, value: Union[None, str, Dict[str, Any]]
    ) -> Optional[Dict[str, Any]]:
        """Converts potential JSON strings passed via the CLI to dictionaries.

        Args:
            value: The value to convert.

        Returns:
            The converted value.

        Raises:
            TypeError: If the value is not a `str`, `Dict` or `None`.
            ValueError: If the value is an invalid json string or a json string
                that does not decode into a dictionary.
        """
        if isinstance(value, str):
            try:
                dict_ = json.loads(value)
            except json.JSONDecodeError as e:
                raise ValueError(f"Invalid json string '{value}'") from e

            if not isinstance(dict_, Dict):
                raise ValueError(
                    f"Json string '{value}' did not decode into a dictionary."
                )

            return dict_
        elif isinstance(value, Dict) or value is None:
            return value
        else:
            raise TypeError(f"{value} is not a json string or a dictionary.")
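
The conversion performed by the validator can be tried out in isolation. The sketch below restates the same branches as a standalone function; `convert_run_args` is a hypothetical name and the function is not part of ZenML.

```python
import json
from typing import Any, Dict, Optional, Union


def convert_run_args(
    value: Union[None, str, Dict[str, Any]]
) -> Optional[Dict[str, Any]]:
    """Accepts a dictionary, None, or a JSON string that decodes to a dict."""
    if isinstance(value, str):
        try:
            decoded = json.loads(value)
        except json.JSONDecodeError as e:
            raise ValueError(f"Invalid json string '{value}'") from e

        if not isinstance(decoded, dict):
            raise ValueError(
                f"Json string '{value}' did not decode into a dictionary."
            )
        return decoded

    if isinstance(value, dict) or value is None:
        return value

    raise TypeError(f"{value} is not a json string or a dictionary.")


# A JSON string, as it would arrive from the CLI, and the equivalent dict.
from_cli = convert_run_args('{"mem_limit": "2g"}')
from_code = convert_run_args({"mem_limit": "2g"})
```

Both calls produce the same dictionary, while a JSON string such as `'[1, 2]'` is rejected with a `ValueError` because it does not decode into a dictionary.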

output_utils

Utilities for outputs.

generate_artifact_uri(artifact_store, step_run, output_name)

Generates a URI for an output artifact.

Parameters:

Name Type Description Default
artifact_store BaseArtifactStore

The artifact store on which the artifact will be stored.

required
step_run StepRunResponseModel

The step run that created the artifact.

required
output_name str

The name of the output in the step run for this artifact.

required

Returns:

Type Description
str

The URI of the output artifact.

Source code in zenml/orchestrators/output_utils.py
def generate_artifact_uri(
    artifact_store: "BaseArtifactStore",
    step_run: "StepRunResponseModel",
    output_name: str,
) -> str:
    """Generates a URI for an output artifact.

    Args:
        artifact_store: The artifact store on which the artifact will be stored.
        step_run: The step run that created the artifact.
        output_name: The name of the output in the step run for this artifact.

    Returns:
        The URI of the output artifact.
    """
    return os.path.join(
        artifact_store.path,
        step_run.step.config.name,
        output_name,
        str(step_run.id),
    )
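
The resulting layout is `<artifact store path>/<step name>/<output name>/<step run ID>`. The sketch below reproduces it with plain values; `example_artifact_uri` is a hypothetical helper, the store path is made up, and `posixpath.join` is used instead of `os.path.join` so that the printed result is the same on every platform.

```python
import posixpath
from uuid import UUID


def example_artifact_uri(
    store_path: str, step_name: str, output_name: str, step_run_id: UUID
) -> str:
    # Same component order as in `generate_artifact_uri`.
    return posixpath.join(store_path, step_name, output_name, str(step_run_id))


uri = example_artifact_uri(
    store_path="s3://bucket/artifacts",
    step_name="trainer",
    output_name="model",
    step_run_id=UUID("00000000-0000-0000-0000-000000000001"),
)
print(uri)
# s3://bucket/artifacts/trainer/model/00000000-0000-0000-0000-000000000001
```

Because the step run ID is the last path component, two runs of the same step write to different directories.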

prepare_output_artifact_uris(step_run, stack, step)

Prepares the output artifact URIs to run the current step.

Parameters:

Name Type Description Default
step_run StepRunResponseModel

The step run for which to prepare the artifact URIs.

required
stack Stack

The stack on which the pipeline is running.

required
step Step

The step configuration.

required

Exceptions:

Type Description
RuntimeError

If an artifact URI already exists.

Returns:

Type Description
Dict[str, str]

A dictionary mapping output names to artifact URIs.

Source code in zenml/orchestrators/output_utils.py
def prepare_output_artifact_uris(
    step_run: "StepRunResponseModel", stack: "Stack", step: "Step"
) -> Dict[str, str]:
    """Prepares the output artifact URIs to run the current step.

    Args:
        step_run: The step run for which to prepare the artifact URIs.
        stack: The stack on which the pipeline is running.
        step: The step configuration.

    Raises:
        RuntimeError: If an artifact URI already exists.

    Returns:
        A dictionary mapping output names to artifact URIs.
    """
    output_artifact_uris: Dict[str, str] = {}
    for output_name in step.config.outputs.keys():
        artifact_uri = generate_artifact_uri(
            artifact_store=stack.artifact_store,
            step_run=step_run,
            output_name=output_name,
        )
        if fileio.exists(artifact_uri):
            raise RuntimeError("Artifact already exists")
        fileio.makedirs(artifact_uri)
        output_artifact_uris[output_name] = artifact_uri
    return output_artifact_uris
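
The create-or-fail behaviour can be reproduced on the local filesystem. The following sketch uses `os` and `tempfile` in place of `fileio` and a temporary directory in place of an artifact store; `prepare_uris` and its parameters are hypothetical.

```python
import os
import tempfile
from typing import Dict, List


def prepare_uris(
    store_path: str, step_name: str, step_run_id: str, output_names: List[str]
) -> Dict[str, str]:
    """Creates one empty directory per output and returns their paths."""
    uris: Dict[str, str] = {}
    for output_name in output_names:
        uri = os.path.join(store_path, step_name, output_name, step_run_id)
        # An existing directory means another run already wrote here.
        if os.path.exists(uri):
            raise RuntimeError("Artifact already exists")
        os.makedirs(uri)
        uris[output_name] = uri
    return uris


with tempfile.TemporaryDirectory() as store:
    uris = prepare_uris(store, "trainer", "run-1", ["model", "metrics"])
    created = all(os.path.isdir(uri) for uri in uris.values())

    # Preparing the same output for the same step run a second time fails.
    try:
        prepare_uris(store, "trainer", "run-1", ["model"])
        second_call_failed = False
    except RuntimeError:
        second_call_failed = True
```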

remove_artifact_dirs(artifact_uris)

Removes the artifact directories.

Parameters:

Name Type Description Default
artifact_uris Sequence[str]

URIs of the artifacts to remove the directories for.

required
Source code in zenml/orchestrators/output_utils.py
def remove_artifact_dirs(artifact_uris: Sequence[str]) -> None:
    """Removes the artifact directories.

    Args:
        artifact_uris: URIs of the artifacts to remove the directories for.
    """
    for artifact_uri in artifact_uris:
        if fileio.isdir(artifact_uri):
            fileio.rmtree(artifact_uri)
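
Because the function checks `isdir` before deleting, it can be called with URIs whose directories were never created. A local-filesystem sketch of that behavior, using `os` and `shutil` in place of `fileio` (the helper name `remove_dirs` is hypothetical):

```python
import os
import shutil
import tempfile
from typing import Sequence


def remove_dirs(paths: Sequence[str]) -> None:
    """Delete each path that is an existing directory; ignore the rest."""
    for path in paths:
        if os.path.isdir(path):
            shutil.rmtree(path)


root = tempfile.mkdtemp()
existing = os.path.join(root, "model")
os.makedirs(existing)
missing = os.path.join(root, "never_created")

# Safe to call with a mix of existing and missing paths.
remove_dirs([existing, missing])
```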

publish_utils

Utilities to publish pipeline and step runs.

get_pipeline_run_status(step_statuses, num_steps)

Gets the pipeline run status for the given step statuses.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| step_statuses | List[zenml.enums.ExecutionStatus] | The status of steps in this run. | required |
| num_steps | int | The total number of steps in this run. | required |

Returns:

| Type | Description |
| --- | --- |
| ExecutionStatus | The run status. |

Source code in zenml/orchestrators/publish_utils.py
def get_pipeline_run_status(
    step_statuses: List[ExecutionStatus], num_steps: int
) -> ExecutionStatus:
    """Gets the pipeline run status for the given step statuses.

    Args:
        step_statuses: The status of steps in this run.
        num_steps: The total number of steps in this run.

    Returns:
        The run status.
    """
    if ExecutionStatus.FAILED in step_statuses:
        return ExecutionStatus.FAILED
    if (
        ExecutionStatus.RUNNING in step_statuses
        or len(step_statuses) < num_steps
    ):
        return ExecutionStatus.RUNNING

    return ExecutionStatus.COMPLETED
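
To make the precedence of the checks concrete, here is a self-contained sketch of the same logic. `Status` is a stand-in enum for `zenml.enums.ExecutionStatus` (only the members used here, with assumed string values), and `run_status` is a hypothetical name.

```python
from enum import Enum
from typing import List


class Status(str, Enum):
    """Stand-in for `zenml.enums.ExecutionStatus`; values are assumptions."""

    FAILED = "failed"
    RUNNING = "running"
    COMPLETED = "completed"
    CACHED = "cached"


def run_status(step_statuses: List[Status], num_steps: int) -> Status:
    # A single failed step fails the whole run.
    if Status.FAILED in step_statuses:
        return Status.FAILED
    # Steps that have not reported yet keep the run in the running state.
    if Status.RUNNING in step_statuses or len(step_statuses) < num_steps:
        return Status.RUNNING
    return Status.COMPLETED


one_failed = run_status([Status.COMPLETED, Status.FAILED], num_steps=3)
still_pending = run_status([Status.COMPLETED, Status.CACHED], num_steps=3)
all_done = run_status([Status.COMPLETED, Status.CACHED], num_steps=2)
```

In this sketch `one_failed` is `FAILED` even though a step is still missing, `still_pending` is `RUNNING` because only two of three steps have reported, and `all_done` is `COMPLETED` because cached steps count as finished.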

publish_failed_pipeline_run(pipeline_run_id)

Publishes a failed pipeline run.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| pipeline_run_id | UUID | The ID of the pipeline run to update. | required |

Returns:

| Type | Description |
| --- | --- |
| PipelineRunResponseModel | The updated pipeline run. |

Source code in zenml/orchestrators/publish_utils.py
def publish_failed_pipeline_run(
    pipeline_run_id: "UUID",
) -> "PipelineRunResponseModel":
    """Publishes a failed pipeline run.

    Args:
        pipeline_run_id: The ID of the pipeline run to update.

    Returns:
        The updated pipeline run.
    """
    return Client().zen_store.update_run(
        run_id=pipeline_run_id,
        run_update=PipelineRunUpdateModel(
            status=ExecutionStatus.FAILED,
            end_time=datetime.utcnow(),
        ),
    )

publish_failed_step_run(step_run_id)

Publishes a failed step run.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| step_run_id | UUID | The ID of the step run to update. | required |

Returns:

| Type | Description |
| --- | --- |
| StepRunResponseModel | The updated step run. |

Source code in zenml/orchestrators/publish_utils.py
def publish_failed_step_run(step_run_id: "UUID") -> "StepRunResponseModel":
    """Publishes a failed step run.

    Args:
        step_run_id: The ID of the step run to update.

    Returns:
        The updated step run.
    """
    return Client().zen_store.update_run_step(
        step_run_id=step_run_id,
        step_run_update=StepRunUpdateModel(
            status=ExecutionStatus.FAILED,
            end_time=datetime.utcnow(),
        ),
    )

publish_output_artifact_metadata(output_artifact_ids, output_artifact_metadata)

Publishes the given output artifact metadata.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| output_artifact_ids | Dict[str, UUID] | The IDs of the output artifacts. | required |
| output_artifact_metadata | Dict[str, Dict[str, MetadataType]] | A mapping from output names to metadata. | required |

Source code in zenml/orchestrators/publish_utils.py
def publish_output_artifact_metadata(
    output_artifact_ids: Dict[str, "UUID"],
    output_artifact_metadata: Dict[str, Dict[str, "MetadataType"]],
) -> None:
    """Publishes the given output artifact metadata.

    Args:
        output_artifact_ids: The IDs of the output artifacts.
        output_artifact_metadata: A mapping from output names to metadata.
    """
    client = Client()
    for output_name, artifact_metadata in output_artifact_metadata.items():
        artifact_id = output_artifact_ids[output_name]
        client.create_run_metadata(
            metadata=artifact_metadata, artifact_id=artifact_id
        )

publish_output_artifacts(output_artifacts)

Publishes the given output artifacts.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| output_artifacts | Dict[str, ArtifactRequestModel] | The output artifacts to register. | required |

Returns:

| Type | Description |
| --- | --- |
| Dict[str, UUID] | The IDs of the registered output artifacts. |

Source code in zenml/orchestrators/publish_utils.py
def publish_output_artifacts(
    output_artifacts: Dict[str, "ArtifactRequestModel"]
) -> Dict[str, "UUID"]:
    """Publishes the given output artifacts.

    Args:
        output_artifacts: The output artifacts to register.

    Returns:
        The IDs of the registered output artifacts.
    """
    output_artifact_ids = {}
    client = Client()
    for name, artifact_model in output_artifacts.items():
        artifact_response = client.zen_store.create_artifact(artifact_model)
        output_artifact_ids[name] = artifact_response.id
    return output_artifact_ids

publish_pipeline_run_metadata(pipeline_run_id, pipeline_run_metadata)

Publishes the given pipeline run metadata.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| pipeline_run_id | UUID | The ID of the pipeline run. | required |
| pipeline_run_metadata | Dict[UUID, Dict[str, MetadataType]] | A dictionary mapping stack component IDs to the metadata they created. | required |

Source code in zenml/orchestrators/publish_utils.py
def publish_pipeline_run_metadata(
    pipeline_run_id: "UUID",
    pipeline_run_metadata: Dict["UUID", Dict[str, "MetadataType"]],
) -> None:
    """Publishes the given pipeline run metadata.

    Args:
        pipeline_run_id: The ID of the pipeline run.
        pipeline_run_metadata: A dictionary mapping stack component IDs to the
            metadata they created.
    """
    client = Client()
    for stack_component_id, metadata in pipeline_run_metadata.items():
        client.create_run_metadata(
            metadata=metadata,
            pipeline_run_id=pipeline_run_id,
            stack_component_id=stack_component_id,
        )

publish_step_run_metadata(step_run_id, step_run_metadata)

Publishes the given step run metadata.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| step_run_id | UUID | The ID of the step run. | required |
| step_run_metadata | Dict[UUID, Dict[str, MetadataType]] | A dictionary mapping stack component IDs to the metadata they created. | required |

Source code in zenml/orchestrators/publish_utils.py
def publish_step_run_metadata(
    step_run_id: "UUID",
    step_run_metadata: Dict["UUID", Dict[str, "MetadataType"]],
) -> None:
    """Publishes the given step run metadata.

    Args:
        step_run_id: The ID of the step run.
        step_run_metadata: A dictionary mapping stack component IDs to the
            metadata they created.
    """
    client = Client()
    for stack_component_id, metadata in step_run_metadata.items():
        client.create_run_metadata(
            metadata=metadata,
            step_run_id=step_run_id,
            stack_component_id=stack_component_id,
        )
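
The nested mapping expected by `publish_step_run_metadata` and `publish_pipeline_run_metadata` is keyed by stack component ID, and one metadata call is made per component. The sketch below illustrates only that shape: the component IDs, metadata keys and values are invented, and `create_run_metadata_stub` is a hypothetical recorder used instead of a real client.

```python
from typing import Any, Dict, List, Tuple
from uuid import UUID, uuid4

# Hypothetical component IDs and metadata values, for illustration only.
orchestrator_id, tracker_id = uuid4(), uuid4()
step_run_metadata: Dict[UUID, Dict[str, Any]] = {
    orchestrator_id: {"pod_name": "step-pod-0"},
    tracker_id: {"experiment": "baseline", "tracking_run": 7},
}

calls: List[Tuple[UUID, Dict[str, Any]]] = []


def create_run_metadata_stub(
    metadata: Dict[str, Any], stack_component_id: UUID
) -> None:
    """Stand-in for the client call; records arguments instead of sending them."""
    calls.append((stack_component_id, metadata))


# One call per stack component, each carrying that component's metadata.
for stack_component_id, metadata in step_run_metadata.items():
    create_run_metadata_stub(
        metadata=metadata, stack_component_id=stack_component_id
    )
```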

publish_successful_step_run(step_run_id, output_artifact_ids)

Publishes a successful step run.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| step_run_id | UUID | The ID of the step run to update. | required |
| output_artifact_ids | Dict[str, UUID] | The output artifact IDs for the step run. | required |

Returns:

| Type | Description |
| --- | --- |
| StepRunResponseModel | The updated step run. |

Source code in zenml/orchestrators/publish_utils.py
def publish_successful_step_run(
    step_run_id: "UUID", output_artifact_ids: Dict[str, "UUID"]
) -> "StepRunResponseModel":
    """Publishes a successful step run.

    Args:
        step_run_id: The ID of the step run to update.
        output_artifact_ids: The output artifact IDs for the step run.

    Returns:
        The updated step run.
    """
    return Client().zen_store.update_run_step(
        step_run_id=step_run_id,
        step_run_update=StepRunUpdateModel(
            status=ExecutionStatus.COMPLETED,
            end_time=datetime.utcnow(),
            output_artifacts=output_artifact_ids,
        ),
    )

update_pipeline_run_status(pipeline_run)

Updates the status of the current pipeline run.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| pipeline_run | PipelineRunResponseModel | The model of the current pipeline run. | required |

Source code in zenml/orchestrators/publish_utils.py
def update_pipeline_run_status(pipeline_run: PipelineRunResponseModel) -> None:
    """Updates the status of the current pipeline run.

    Args:
        pipeline_run: The model of the current pipeline run.
    """
    assert pipeline_run.num_steps is not None
    client = Client()
    steps_in_current_run = depaginate(
        partial(client.list_run_steps, pipeline_run_id=pipeline_run.id)
    )

    new_status = get_pipeline_run_status(
        step_statuses=[step_run.status for step_run in steps_in_current_run],
        num_steps=pipeline_run.num_steps,
    )

    if new_status != pipeline_run.status:
        run_update = PipelineRunUpdateModel(status=new_status)
        if new_status in {ExecutionStatus.COMPLETED, ExecutionStatus.FAILED}:
            run_update.end_time = datetime.utcnow()

        Client().zen_store.update_run(
            run_id=pipeline_run.id, run_update=run_update
        )

step_launcher

Class to launch (run directly or using a step operator) steps.

StepLauncher

A class responsible for launching a step of a ZenML pipeline.

This class follows these steps to launch and publish a ZenML step:

1. Publish or reuse a PipelineRun
2. Resolve the input artifacts of the step
3. Generate a cache key for the step
4. Check if the step can be cached or not
5. Publish a new StepRun
6. If the step can't be cached, the step will be executed in one of these two ways depending on its configuration:
    - Calling a step operator to run the step in a different environment
    - Calling a step runner to run the step in the current environment
7. Update the status of the previously published StepRun
8. Update the status of the PipelineRun

Source code in zenml/orchestrators/step_launcher.py
class StepLauncher:
    """A class responsible for launching a step of a ZenML pipeline.

    This class follows these steps to launch and publish a ZenML step:
    1. Publish or reuse a `PipelineRun`
    2. Resolve the input artifacts of the step
    3. Generate a cache key for the step
    4. Check if the step can be cached or not
    5. Publish a new `StepRun`
    6. If the step can't be cached, the step will be executed in one of these
    two ways depending on its configuration:
        - Calling a `step operator` to run the step in a different environment
        - Calling a `step runner` to run the step in the current environment
    7. Update the status of the previously published `StepRun`
    8. Update the status of the `PipelineRun`
    """

    def __init__(
        self,
        deployment: "PipelineDeploymentResponseModel",
        step: Step,
        orchestrator_run_id: str,
    ):
        """Initializes the launcher.

        Args:
            deployment: The pipeline deployment.
            step: The step to launch.
            orchestrator_run_id: The orchestrator pipeline run id.

        Raises:
            RuntimeError: If the deployment has no associated stack.
        """
        self._deployment = deployment
        self._step = step
        self._orchestrator_run_id = orchestrator_run_id

        if not deployment.stack:
            raise RuntimeError(
                f"Missing stack for deployment {deployment.id}. This is "
                "probably because the stack was manually deleted."
            )

        self._stack = Stack.from_model(deployment.stack)
        self._step_name = _get_step_name_in_pipeline(
            step=step, deployment=deployment
        )

    def launch(self) -> None:
        """Launches the step."""
        logger.info(f"Step `{self._step_name}` has started.")

        pipeline_run, run_was_created = self._create_or_reuse_run()
        try:
            if run_was_created:
                pipeline_run_metadata = self._stack.get_pipeline_run_metadata(
                    run_id=pipeline_run.id
                )
                publish_utils.publish_pipeline_run_metadata(
                    pipeline_run_id=pipeline_run.id,
                    pipeline_run_metadata=pipeline_run_metadata,
                )
            client = Client()
            docstring, source_code = self._get_step_docstring_and_source_code()
            step_run = StepRunRequestModel(
                name=self._step_name,
                pipeline_run_id=pipeline_run.id,
                step=self._step,
                status=ExecutionStatus.RUNNING,
                docstring=docstring,
                source_code=source_code,
                start_time=datetime.utcnow(),
                user=client.active_user.id,
                workspace=client.active_workspace.id,
            )
            try:
                execution_needed, step_run_response = self._prepare(
                    step_run=step_run
                )
            except:  # noqa: E722
                logger.error(
                    f"Failed during preparation to run step `{self._step_name}`."
                )
                step_run.status = ExecutionStatus.FAILED
                step_run.end_time = datetime.utcnow()
                Client().zen_store.create_run_step(step_run)
                raise

            if execution_needed:
                try:
                    self._run_step(
                        pipeline_run=pipeline_run,
                        step_run=step_run_response,
                    )
                except:  # noqa: E722
                    logger.error(f"Failed to run step `{self._step_name}`.")
                    publish_utils.publish_failed_step_run(step_run_response.id)
                    raise

            publish_utils.update_pipeline_run_status(pipeline_run=pipeline_run)
        except:  # noqa: E722
            logger.error(f"Pipeline run `{pipeline_run.name}` failed.")
            publish_utils.publish_failed_pipeline_run(pipeline_run.id)
            raise

    def _get_step_docstring_and_source_code(self) -> Tuple[Optional[str], str]:
        """Gets the docstring and source code of the step.

        If either is longer than 1000 characters, it is truncated to its first
        1000 characters followed by `...`.

        Returns:
            The docstring and source code of the step.
        """
        from zenml.steps.base_step import BaseStep

        step_instance = BaseStep.load_from_source(self._step.spec.source)

        docstring = step_instance.docstring
        if docstring and len(docstring) > 1000:
            docstring = docstring[:1000] + "..."

        source_code = step_instance.source_code
        if source_code and len(source_code) > 1000:
            source_code = source_code[:1000] + "..."

        return docstring, source_code

    def _create_or_reuse_run(self) -> Tuple[PipelineRunResponseModel, bool]:
        """Creates a pipeline run or reuses an existing one.

        Returns:
            The created or existing pipeline run,
            and a boolean indicating whether the run was created or reused.
        """
        run_id = orchestrator_utils.get_run_id_for_orchestrator_run_id(
            orchestrator=self._stack.orchestrator,
            orchestrator_run_id=self._orchestrator_run_id,
        )

        date = datetime.utcnow().strftime("%Y_%m_%d")
        time = datetime.utcnow().strftime("%H_%M_%S_%f")
        run_name = self._deployment.run_name_template.format(
            date=date, time=time
        )

        logger.debug(
            "Creating pipeline run with ID: %s, name: %s", run_id, run_name
        )

        client = Client()
        pipeline_run = PipelineRunRequestModel(
            id=run_id,
            name=run_name,
            orchestrator_run_id=self._orchestrator_run_id,
            user=client.active_user.id,
            workspace=client.active_workspace.id,
            stack=self._deployment.stack.id
            if self._deployment.stack
            else None,
            pipeline=self._deployment.pipeline.id
            if self._deployment.pipeline
            else None,
            build=self._deployment.build.id
            if self._deployment.build
            else None,
            deployment=self._deployment.id,
            schedule_id=self._deployment.schedule.id
            if self._deployment.schedule
            else None,
            enable_cache=self._deployment.pipeline_configuration.enable_cache,
            enable_artifact_metadata=self._deployment.pipeline_configuration.enable_artifact_metadata,
            status=ExecutionStatus.RUNNING,
            pipeline_configuration=self._deployment.pipeline_configuration.dict(),
            num_steps=len(self._deployment.step_configurations),
            client_environment=self._deployment.client_environment,
            orchestrator_environment=get_run_environment_dict(),
            server_version=client.zen_store.get_store_info().version,
            start_time=datetime.utcnow(),
        )
        return client.zen_store.get_or_create_run(pipeline_run)

    def _prepare(
        self, step_run: StepRunRequestModel
    ) -> Tuple[bool, StepRunResponseModel]:
        """Prepares running the step.

        Args:
            step_run: The step to run.

        Returns:
            Tuple that specifies whether the step needs to be executed as
            well as the response model of the registered step run.
        """
        input_artifacts, parent_step_ids = input_utils.resolve_step_inputs(
            step=self._step, run_id=step_run.pipeline_run_id
        )
        input_artifact_ids = {
            input_name: artifact.id
            for input_name, artifact in input_artifacts.items()
        }

        cache_key = cache_utils.generate_cache_key(
            step=self._step,
            input_artifact_ids=input_artifact_ids,
            artifact_store=self._stack.artifact_store,
            workspace_id=Client().active_workspace.id,
        )

        step_run.input_artifacts = input_artifact_ids
        step_run.parent_step_ids = parent_step_ids
        step_run.cache_key = cache_key

        cache_enabled = is_setting_enabled(
            is_enabled_on_step=self._step.config.enable_cache,
            is_enabled_on_pipeline=self._deployment.pipeline_configuration.enable_cache,
        )

        execution_needed = True
        if cache_enabled:
            cached_step_run = cache_utils.get_cached_step_run(
                cache_key=cache_key
            )
            if cached_step_run:
                logger.info(f"Using cached version of `{self._step_name}`.")
                execution_needed = False
                cached_outputs = cached_step_run.output_artifacts
                step_run.original_step_run_id = cached_step_run.id
                step_run.output_artifacts = {
                    output_name: artifact.id
                    for output_name, artifact in cached_outputs.items()
                }
                step_run.status = ExecutionStatus.CACHED
                step_run.end_time = step_run.start_time

        step_run_response = Client().zen_store.create_run_step(step_run)

        return execution_needed, step_run_response

    def _run_step(
        self,
        pipeline_run: PipelineRunResponseModel,
        step_run: StepRunResponseModel,
    ) -> None:
        """Runs the current step.

        Args:
            pipeline_run: The model of the current pipeline run.
            step_run: The model of the current step run.
        """
        # Prepare step run information.
        step_run_info = StepRunInfo(
            config=self._step.config,
            pipeline=self._deployment.pipeline_configuration,
            run_name=pipeline_run.name,
            pipeline_step_name=self._step_name,
            run_id=pipeline_run.id,
            step_run_id=step_run.id,
        )

        output_artifact_uris = output_utils.prepare_output_artifact_uris(
            step_run=step_run, stack=self._stack, step=self._step
        )

        # Run the step.
        start_time = time.time()
        try:
            if self._step.config.step_operator:
                self._run_step_with_step_operator(
                    step_operator_name=self._step.config.step_operator,
                    step_run_info=step_run_info,
                )
            else:
                self._run_step_without_step_operator(
                    step_run_info=step_run_info,
                    input_artifacts=step_run.input_artifacts,
                    output_artifact_uris=output_artifact_uris,
                )
        except:  # noqa: E722
            output_utils.remove_artifact_dirs(
                artifact_uris=list(output_artifact_uris.values())
            )
            raise

        duration = time.time() - start_time
        logger.info(
            f"Step `{self._step_name}` has finished in "
            f"{string_utils.get_human_readable_time(duration)}."
        )

    def _run_step_with_step_operator(
        self,
        step_operator_name: str,
        step_run_info: StepRunInfo,
    ) -> None:
        """Runs the current step with a step operator.

        Args:
            step_operator_name: The name of the step operator to use.
            step_run_info: Additional information needed to run the step.
        """
        from zenml.step_operators.step_operator_entrypoint_configuration import (
            StepOperatorEntrypointConfiguration,
        )

        step_operator = _get_step_operator(
            stack=self._stack,
            step_operator_name=step_operator_name,
        )
        entrypoint_command = (
            StepOperatorEntrypointConfiguration.get_entrypoint_command()
            + StepOperatorEntrypointConfiguration.get_entrypoint_arguments(
                step_name=self._step_name,
                deployment_id=self._deployment.id,
                step_run_id=str(step_run_info.step_run_id),
            )
        )
        logger.info(
            "Using step operator `%s` to run step `%s`.",
            step_operator.name,
            self._step_name,
        )
        step_operator.launch(
            info=step_run_info,
            entrypoint_command=entrypoint_command,
        )

    def _run_step_without_step_operator(
        self,
        step_run_info: StepRunInfo,
        input_artifacts: Dict[str, "ArtifactResponseModel"],
        output_artifact_uris: Dict[str, str],
    ) -> None:
        """Runs the current step without a step operator.

        Args:
            step_run_info: Additional information needed to run the step.
            input_artifacts: The input artifacts of the current step.
            output_artifact_uris: The output artifact URIs of the current step.
        """
        runner = StepRunner(step=self._step, stack=self._stack)
        runner.run(
            input_artifacts=input_artifacts,
            output_artifact_uris=output_artifact_uris,
            step_run_info=step_run_info,
        )
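
The caching decision made in `_prepare` (generate a cache key, look up a previous step run, skip execution on a hit) can be sketched without a ZenML server. Everything here is a simplified assumption: the in-memory dictionary replaces the store lookup, and `make_cache_key` hashes only a step source string and input artifact IDs, which is not the recipe used by `cache_utils.generate_cache_key`.

```python
import hashlib
from typing import Dict, Optional, Tuple

# In-memory stand-in for the store lookup: cache key -> output artifact IDs.
_cached_outputs: Dict[str, Dict[str, str]] = {}


def make_cache_key(step_source: str, input_artifact_ids: Dict[str, str]) -> str:
    """Hash the step identity and its inputs (simplified, hypothetical recipe)."""
    digest = hashlib.sha256()
    digest.update(step_source.encode())
    for name in sorted(input_artifact_ids):
        digest.update(f"{name}={input_artifact_ids[name]}".encode())
    return digest.hexdigest()


def prepare(
    step_source: str, input_artifact_ids: Dict[str, str], cache_enabled: bool
) -> Tuple[bool, str, Optional[Dict[str, str]]]:
    """Return (execution_needed, cache_key, reused_outputs)."""
    key = make_cache_key(step_source, input_artifact_ids)
    if cache_enabled and key in _cached_outputs:
        # Cache hit: reuse the recorded outputs and skip execution.
        return False, key, _cached_outputs[key]
    return True, key, None


inputs = {"dataset": "artifact-1"}
first = prepare("steps.trainer", inputs, cache_enabled=True)
_cached_outputs[first[1]] = {"model": "artifact-2"}  # pretend the step ran
second = prepare("steps.trainer", inputs, cache_enabled=True)
forced = prepare("steps.trainer", inputs, cache_enabled=False)
```

Here `first` requires execution, `second` reuses the recorded outputs, and `forced` executes again because caching is disabled, even though the key matches.
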
__init__(self, deployment, step, orchestrator_run_id) special

Initializes the launcher.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| deployment | PipelineDeploymentResponseModel | The pipeline deployment. | required |
| step | Step | The step to launch. | required |
| orchestrator_run_id | str | The orchestrator pipeline run id. | required |

Exceptions:

| Type | Description |
| --- | --- |
| RuntimeError | If the deployment has no associated stack. |

Source code in zenml/orchestrators/step_launcher.py
def __init__(
    self,
    deployment: "PipelineDeploymentResponseModel",
    step: Step,
    orchestrator_run_id: str,
):
    """Initializes the launcher.

    Args:
        deployment: The pipeline deployment.
        step: The step to launch.
        orchestrator_run_id: The orchestrator pipeline run id.

    Raises:
        RuntimeError: If the deployment has no associated stack.
    """
    self._deployment = deployment
    self._step = step
    self._orchestrator_run_id = orchestrator_run_id

    if not deployment.stack:
        raise RuntimeError(
            f"Missing stack for deployment {deployment.id}. This is "
            "probably because the stack was manually deleted."
        )

    self._stack = Stack.from_model(deployment.stack)
    self._step_name = _get_step_name_in_pipeline(
        step=step, deployment=deployment
    )
launch(self)

Launches the step.

Source code in zenml/orchestrators/step_launcher.py
def launch(self) -> None:
    """Launches the step."""
    logger.info(f"Step `{self._step_name}` has started.")

    pipeline_run, run_was_created = self._create_or_reuse_run()
    try:
        if run_was_created:
            pipeline_run_metadata = self._stack.get_pipeline_run_metadata(
                run_id=pipeline_run.id
            )
            publish_utils.publish_pipeline_run_metadata(
                pipeline_run_id=pipeline_run.id,
                pipeline_run_metadata=pipeline_run_metadata,
            )
        client = Client()
        docstring, source_code = self._get_step_docstring_and_source_code()
        step_run = StepRunRequestModel(
            name=self._step_name,
            pipeline_run_id=pipeline_run.id,
            step=self._step,
            status=ExecutionStatus.RUNNING,
            docstring=docstring,
            source_code=source_code,
            start_time=datetime.utcnow(),
            user=client.active_user.id,
            workspace=client.active_workspace.id,
        )
        try:
            execution_needed, step_run_response = self._prepare(
                step_run=step_run
            )
        except:  # noqa: E722
            logger.error(
                f"Failed during preparation to run step `{self._step_name}`."
            )
            step_run.status = ExecutionStatus.FAILED
            step_run.end_time = datetime.utcnow()
            Client().zen_store.create_run_step(step_run)
            raise

        if execution_needed:
            try:
                self._run_step(
                    pipeline_run=pipeline_run,
                    step_run=step_run_response,
                )
            except:  # noqa: E722
                logger.error(f"Failed to run step `{self._step_name}`.")
                publish_utils.publish_failed_step_run(step_run_response.id)
                raise

        publish_utils.update_pipeline_run_status(pipeline_run=pipeline_run)
    except:  # noqa: E722
        logger.error(f"Pipeline run `{pipeline_run.name}` failed.")
        publish_utils.publish_failed_pipeline_run(pipeline_run.id)
        raise
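
The nested `try`/`except` blocks in `launch` mean that a failing step is first recorded as a failed step run, then as a failed pipeline run, and the original exception still reaches the caller. A stripped-down sketch of that propagation, where an `events` list stands in for the `publish_utils` calls (the function and event names are illustrative only):

```python
from typing import Callable, List

events: List[str] = []


def launch(run_step: Callable[[], None]) -> None:
    """Mirror the nested error handling of `launch` with recorded events."""
    try:
        try:
            run_step()
        except BaseException:  # same reach as the bare `except:` in the listing
            events.append("step run marked FAILED")
            raise
        events.append("pipeline run status updated")
    except BaseException:
        events.append("pipeline run marked FAILED")
        raise


def failing_step() -> None:
    raise ValueError("boom")


try:
    launch(failing_step)
except ValueError as error:
    events.append(f"caller saw: {error}")
```

After this runs, `events` holds the step failure, the pipeline failure and the caller's view of the error, in that order; the "status updated" event is never recorded because the inner handler re-raises.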

step_runner

Class to run steps.

StepRunner

Class to run steps.

Source code in zenml/orchestrators/step_runner.py
class StepRunner:
    """Class to run steps."""

    def __init__(self, step: "Step", stack: "Stack"):
        """Initializes the step runner.

        Args:
            step: The step to run.
            stack: The stack on which the step should run.
        """
        self._step = step
        self._stack = stack

    @property
    def configuration(self) -> StepConfiguration:
        """Configuration of the step to run.

        Returns:
            The step configuration.
        """
        return self._step.config

    def run(
        self,
        input_artifacts: Dict[str, "ArtifactResponseModel"],
        output_artifact_uris: Dict[str, str],
        step_run_info: StepRunInfo,
    ) -> None:
        """Runs the step.

        Args:
            input_artifacts: The input artifacts of the step.
            output_artifact_uris: The URIs of the output artifacts of the step.
            step_run_info: The step run info.
        """
        step_entrypoint = self._load_step_entrypoint()
        output_materializers = self._load_output_materializers()
        spec = inspect.getfullargspec(inspect.unwrap(step_entrypoint))

        # Parse the inputs for the entrypoint function.
        function_params = self._parse_inputs(
            args=spec.args,
            annotations=spec.annotations,
            input_artifacts=input_artifacts,
            output_artifact_uris=output_artifact_uris,
            output_materializers=output_materializers,
        )

        # Wrap the execution of the step function in a step environment
        # that the step function code can access to retrieve information about
        # the pipeline runtime, such as the current step name and the current
        # pipeline run ID
        cache_enabled = is_setting_enabled(
            is_enabled_on_step=step_run_info.config.enable_cache,
            is_enabled_on_pipeline=step_run_info.pipeline.enable_cache,
        )
        with StepEnvironment(
            step_run_info=step_run_info,
            cache_enabled=cache_enabled,
        ):
            self._stack.prepare_step_run(info=step_run_info)
            step_failed = False
            try:
                return_values = step_entrypoint(**function_params)
            except:  # noqa: E722
                step_failed = True
                raise
            finally:
                step_run_metadata = self._stack.get_step_run_metadata(
                    info=step_run_info,
                )
                publish_step_run_metadata(
                    step_run_id=step_run_info.step_run_id,
                    step_run_metadata=step_run_metadata,
                )
                self._stack.cleanup_step_run(
                    info=step_run_info, step_failed=step_failed
                )

        # Store and publish the output artifacts of the step function.
        output_annotations = parse_return_type_annotations(spec.annotations)
        output_data = self._validate_outputs(return_values, output_annotations)
        artifact_metadata_enabled = is_setting_enabled(
            is_enabled_on_step=step_run_info.config.enable_artifact_metadata,
            is_enabled_on_pipeline=step_run_info.pipeline.enable_artifact_metadata,
        )
        output_artifacts, artifact_metadata = self._store_output_artifacts(
            output_data=output_data,
            output_artifact_uris=output_artifact_uris,
            output_materializers=output_materializers,
            artifact_metadata_enabled=artifact_metadata_enabled,
        )
        output_artifact_ids = publish_output_artifacts(
            output_artifacts=output_artifacts,
        )
        publish_output_artifact_metadata(
            output_artifact_ids=output_artifact_ids,
            output_artifact_metadata=artifact_metadata,
        )

        # Update the status and output artifacts of the step run.
        publish_successful_step_run(
            step_run_id=step_run_info.step_run_id,
            output_artifact_ids=output_artifact_ids,
        )

    def _load_step_entrypoint(self) -> Callable[..., Any]:
        """Load the step entrypoint function.

        Returns:
            The step entrypoint function.
        """
        from zenml.steps import BaseStep

        step_instance = BaseStep.load_from_source(self._step.spec.source)
        step_instance._configuration = self._step.config
        return step_instance.entrypoint

    def _load_output_materializers(self) -> Dict[str, Type[BaseMaterializer]]:
        """Loads the output materializers for the step.

        Returns:
            The step output materializers.
        """
        materializers = {}
        for name, output in self.configuration.outputs.items():
            materializer_class: Type[
                BaseMaterializer
            ] = source_utils.load_and_validate_class(
                output.materializer_source, expected_class=BaseMaterializer
            )
            materializers[name] = materializer_class
        return materializers

    def _parse_inputs(
        self,
        args: List[str],
        annotations: Dict[str, Any],
        input_artifacts: Dict[str, "ArtifactResponseModel"],
        output_artifact_uris: Dict[str, str],
        output_materializers: Dict[str, Type[BaseMaterializer]],
    ) -> Dict[str, Any]:
        """Parses the inputs for a step entrypoint function.

        Args:
            args: The arguments of the step entrypoint function.
            annotations: The annotations of the step entrypoint function.
            input_artifacts: The input artifacts of the step.
            output_artifact_uris: The URIs of the output artifacts of the step.
            output_materializers: The output materializers of the step.

        Returns:
            The parsed inputs for the step entrypoint function.
        """
        from zenml.steps import BaseParameters

        function_params: Dict[str, Any] = {}

        if args and args[0] == "self":
            args.pop(0)

        for arg in args:
            arg_type = annotations.get(arg, None)
            arg_type = resolve_type_annotation(arg_type)

            # Parse the parameters
            if issubclass(arg_type, BaseParameters):
                step_params = arg_type.parse_obj(self.configuration.parameters)
                function_params[arg] = step_params

            # Parse the step context
            elif issubclass(arg_type, StepContext):
                step_name = self.configuration.name
                context = arg_type(
                    step_name=step_name,
                    output_materializers=output_materializers,
                    output_artifact_uris=output_artifact_uris,
                )
                function_params[arg] = context

            # Parse the input artifacts
            else:
                # At this point, the argument has to be an input artifact, so
                # we load its value.
                function_params[arg] = self._load_input_artifact(
                    input_artifacts[arg], arg_type
                )

        return function_params

    def _load_input_artifact(
        self, artifact: "ArtifactResponseModel", data_type: Type[Any]
    ) -> Any:
        """Loads an input artifact.

        Args:
            artifact: The artifact to load.
            data_type: The data type of the artifact value.

        Returns:
            The artifact value.
        """
        # Skip materialization for `UnmaterializedArtifact`.
        if data_type == UnmaterializedArtifact:
            return UnmaterializedArtifact.parse_obj(artifact)

        # Skip materialization for `BaseArtifact` and its subtypes.
        if issubclass(data_type, BaseArtifact):
            logger.warning(
                "Skipping materialization by specifying a subclass of "
                "`zenml.artifacts.BaseArtifact` as input data type is "
                "deprecated and will be removed in a future release. Please "
                "type your input as "
                "`zenml.materializers.UnmaterializedArtifact` instead."
            )
            return artifact

        materializer_class: Type[
            BaseMaterializer
        ] = source_utils.load_and_validate_class(
            artifact.materializer, expected_class=BaseMaterializer
        )
        materializer = materializer_class(artifact.uri)
        return materializer.load(data_type=data_type)

    def _validate_outputs(
        self,
        return_values: Any,
        output_annotations: Dict[str, Any],
    ) -> Dict[str, Any]:
        """Validates the step function outputs.

        Args:
            return_values: The return values of the step function.
            output_annotations: The output annotations of the step function.

        Returns:
            The validated output, mapping output names to return values.

        Raises:
            StepInterfaceError: If the step function return values do not
                match the output annotations.
        """
        step_name = self.configuration.name

        # if there are no outputs, the return value must be `None`.
        if len(output_annotations) == 0:
            if return_values is not None:
                raise StepInterfaceError(
                    f"Wrong step function output type for step '{step_name}': "
                    f"Expected no outputs but the function returned something: "
                    f"{return_values}."
                )
            return {}

        # if there is only one output annotation (either directly specified
        # or contained in an `Output` tuple) we treat the step function
        # return value as the return for that output.
        if len(output_annotations) == 1:
            return_values = [return_values]

        # if the user defined multiple outputs, the return value must be a list
        # or tuple.
        if not isinstance(return_values, (list, tuple)):
            raise StepInterfaceError(
                f"Wrong step function output type for step '{step_name}': "
                f"Expected multiple outputs ({output_annotations}) but "
                f"the function did not return a list or tuple "
                f"(actual return value: {return_values})."
            )

        # The number of actual outputs must be the same as the number of
        # expected outputs.
        if len(output_annotations) != len(return_values):
            raise StepInterfaceError(
                f"Wrong number of step function outputs for step "
                f"'{step_name}': Expected {len(output_annotations)} outputs "
                f"but the function returned {len(return_values)} outputs "
                f"(return values: {return_values})."
            )

        # Validate the output types.
        validated_outputs: Dict[str, Any] = {}
        for return_value, (output_name, output_type) in zip(
            return_values, output_annotations.items()
        ):
            # The actual output type must be the same as the expected output
            # type.
            if not isinstance(return_value, output_type):
                raise StepInterfaceError(
                    f"Wrong type for output '{output_name}' of step "
                    f"'{step_name}' (expected type: {output_type}, "
                    f"actual type: {type(return_value)})."
                )
            validated_outputs[output_name] = return_value
        return validated_outputs

    def _store_output_artifacts(
        self,
        output_data: Dict[str, Any],
        output_materializers: Dict[str, Type[BaseMaterializer]],
        output_artifact_uris: Dict[str, str],
        artifact_metadata_enabled: bool,
    ) -> Tuple[
        Dict[str, ArtifactRequestModel], Dict[str, Dict[str, "MetadataType"]]
    ]:
        """Stores the output artifacts of the step.

        Args:
            output_data: The output data of the step function, mapping output
                names to return values.
            output_materializers: The output materializers of the step.
            output_artifact_uris: The output artifact URIs of the step.
            artifact_metadata_enabled: Whether artifact metadata collection is
                enabled.

        Returns:
            An `ArtifactRequestModel` for each output artifact that was saved,
            and the metadata of each output artifact.
        """
        client = Client()
        active_user_id = client.active_user.id
        active_workspace_id = client.active_workspace.id
        artifact_stores = client.active_stack_model.components.get(
            StackComponentType.ARTIFACT_STORE
        )
        assert artifact_stores  # Every stack has an artifact store.
        artifact_store_id = artifact_stores[0].id
        output_artifacts: Dict[str, ArtifactRequestModel] = {}
        output_artifact_metadata: Dict[str, Dict[str, "MetadataType"]] = {}
        for output_name, return_value in output_data.items():
            materializer_class = output_materializers[output_name]
            materializer_source = self.configuration.outputs[
                output_name
            ].materializer_source
            uri = output_artifact_uris[output_name]
            materializer = materializer_class(uri)
            materializer.save(return_value)
            if artifact_metadata_enabled:
                try:
                    artifact_metadata = materializer.extract_metadata(
                        return_value
                    )
                    output_artifact_metadata[output_name] = artifact_metadata
                except Exception as e:
                    logger.warning(
                        f"Failed to extract metadata for output artifact "
                        f"'{output_name}' of step '{self.configuration.name}': "
                        f"{e}"
                    )
            output_artifact = ArtifactRequestModel(
                name=output_name,
                type=materializer_class.ASSOCIATED_ARTIFACT_TYPE,
                uri=uri,
                materializer=materializer_source,
                data_type=source_utils.resolve_class(type(return_value)),
                user=active_user_id,
                workspace=active_workspace_id,
                artifact_store_id=artifact_store_id,
            )
            output_artifacts[output_name] = output_artifact
        return output_artifacts, output_artifact_metadata
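
The output validation rules in `_validate_outputs` can be illustrated in isolation. The following is a minimal sketch, not the ZenML implementation: `validate_outputs` is a hypothetical standalone function, and it raises the built-in `TypeError` in place of `StepInterfaceError` so that it runs without ZenML installed.

```python
from typing import Any, Dict


def validate_outputs(
    return_values: Any, output_annotations: Dict[str, type]
) -> Dict[str, Any]:
    """Simplified stand-in for the output validation rules (hypothetical)."""
    # No declared outputs: the function must return None.
    if not output_annotations:
        if return_values is not None:
            raise TypeError("Expected no outputs.")
        return {}

    # A single declared output: wrap the value so it can be zipped below.
    if len(output_annotations) == 1:
        return_values = [return_values]

    # Multiple declared outputs: the value must be a list or tuple whose
    # length matches the number of annotations.
    if not isinstance(return_values, (list, tuple)):
        raise TypeError("Expected a list or tuple of outputs.")
    if len(return_values) != len(output_annotations):
        raise TypeError("Wrong number of outputs.")

    validated: Dict[str, Any] = {}
    for value, (name, expected_type) in zip(
        return_values, output_annotations.items()
    ):
        if not isinstance(value, expected_type):
            raise TypeError(f"Wrong type for output '{name}'.")
        validated[name] = value
    return validated


single = validate_outputs(3, {"count": int})
pair = validate_outputs((3, "ok"), {"count": int, "status": str})
```

Because the values are matched to annotations with `zip`, the order of the returned tuple has to follow the order in which the outputs were declared.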
configuration: StepConfiguration property readonly

Configuration of the step to run.

Returns:

| Type | Description |
| --- | --- |
| StepConfiguration | The step configuration. |

__init__(self, step, stack) special

Initializes the step runner.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| step | Step | The step to run. | required |
| stack | Stack | The stack on which the step should run. | required |

Source code in zenml/orchestrators/step_runner.py
def __init__(self, step: "Step", stack: "Stack"):
    """Initializes the step runner.

    Args:
        step: The step to run.
        stack: The stack on which the step should run.
    """
    self._step = step
    self._stack = stack
run(self, input_artifacts, output_artifact_uris, step_run_info)

Runs the step.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| input_artifacts | Dict[str, ArtifactResponseModel] | The input artifacts of the step. | required |
| output_artifact_uris | Dict[str, str] | The URIs of the output artifacts of the step. | required |
| step_run_info | StepRunInfo | The step run info. | required |

Source code in zenml/orchestrators/step_runner.py
def run(
    self,
    input_artifacts: Dict[str, "ArtifactResponseModel"],
    output_artifact_uris: Dict[str, str],
    step_run_info: StepRunInfo,
) -> None:
    """Runs the step.

    Args:
        input_artifacts: The input artifacts of the step.
        output_artifact_uris: The URIs of the output artifacts of the step.
        step_run_info: The step run info.
    """
    step_entrypoint = self._load_step_entrypoint()
    output_materializers = self._load_output_materializers()
    spec = inspect.getfullargspec(inspect.unwrap(step_entrypoint))

    # Parse the inputs for the entrypoint function.
    function_params = self._parse_inputs(
        args=spec.args,
        annotations=spec.annotations,
        input_artifacts=input_artifacts,
        output_artifact_uris=output_artifact_uris,
        output_materializers=output_materializers,
    )

    # Wrap the execution of the step function in a step environment
    # that the step function code can access to retrieve information about
    # the pipeline runtime, such as the current step name and the current
    # pipeline run ID.
    cache_enabled = is_setting_enabled(
        is_enabled_on_step=step_run_info.config.enable_cache,
        is_enabled_on_pipeline=step_run_info.pipeline.enable_cache,
    )
    with StepEnvironment(
        step_run_info=step_run_info,
        cache_enabled=cache_enabled,
    ):
        self._stack.prepare_step_run(info=step_run_info)
        step_failed = False
        try:
            return_values = step_entrypoint(**function_params)
        except:  # noqa: E722
            step_failed = True
            raise
        finally:
            step_run_metadata = self._stack.get_step_run_metadata(
                info=step_run_info,
            )
            publish_step_run_metadata(
                step_run_id=step_run_info.step_run_id,
                step_run_metadata=step_run_metadata,
            )
            self._stack.cleanup_step_run(
                info=step_run_info, step_failed=step_failed
            )

    # Store and publish the output artifacts of the step function.
    output_annotations = parse_return_type_annotations(spec.annotations)
    output_data = self._validate_outputs(return_values, output_annotations)
    artifact_metadata_enabled = is_setting_enabled(
        is_enabled_on_step=step_run_info.config.enable_artifact_metadata,
        is_enabled_on_pipeline=step_run_info.pipeline.enable_artifact_metadata,
    )
    output_artifacts, artifact_metadata = self._store_output_artifacts(
        output_data=output_data,
        output_artifact_uris=output_artifact_uris,
        output_materializers=output_materializers,
        artifact_metadata_enabled=artifact_metadata_enabled,
    )
    output_artifact_ids = publish_output_artifacts(
        output_artifacts=output_artifacts,
    )
    publish_output_artifact_metadata(
        output_artifact_ids=output_artifact_ids,
        output_artifact_metadata=artifact_metadata,
    )

    # Update the status and output artifacts of the step run.
    publish_successful_step_run(
        step_run_id=step_run_info.step_run_id,
        output_artifact_ids=output_artifact_ids,
    )
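
The `try`/`except`/`finally` block in `run` guarantees that the stack's cleanup hook is called whether or not the step function raises, and that the hook is told which of the two happened. A minimal sketch of that pattern follows; `RecordingStack` and `run_with_cleanup` are hypothetical names used only for illustration, and `except BaseException` is used here as the explicit equivalent of the bare `except` in the listing above.

```python
from typing import Any, Callable, List


class RecordingStack:
    """Hypothetical stand-in for a stack; it only records cleanup calls."""

    def __init__(self) -> None:
        self.cleanups: List[bool] = []

    def cleanup_step_run(self, step_failed: bool) -> None:
        self.cleanups.append(step_failed)


def run_with_cleanup(
    stack: RecordingStack, entrypoint: Callable[[], Any]
) -> Any:
    step_failed = False
    try:
        return_values = entrypoint()
    except BaseException:
        # Remember the failure, then re-raise so the caller still sees it.
        step_failed = True
        raise
    finally:
        # Runs on both the success path and the failure path.
        stack.cleanup_step_run(step_failed=step_failed)
    return return_values


def failing_step() -> None:
    raise ValueError("boom")


stack = RecordingStack()
result = run_with_cleanup(stack, lambda: 42)
try:
    run_with_cleanup(stack, failing_step)
except ValueError:
    pass
```

After both calls, `stack.cleanups` holds one entry per run: `False` for the successful step and `True` for the failing one.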

topsort

Utilities for topological sort.

Implementation heavily inspired by TFX: https://github.com/tensorflow/tfx/blob/master/tfx/utils/topsort.py

topsorted_layers(nodes, get_node_id_fn, get_parent_nodes, get_child_nodes)

Sorts the DAG of nodes in topological order.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| nodes | Sequence[~NodeT] | A sequence of nodes. | required |
| get_node_id_fn | Callable[[~NodeT], str] | Callable that returns a unique text identifier for a node. | required |
| get_parent_nodes | Callable[[~NodeT], List[~NodeT]] | Callable that returns a list of parent nodes for a node. If a parent node's id is not found in the list of node ids, that parent node will be omitted. | required |
| get_child_nodes | Callable[[~NodeT], List[~NodeT]] | Callable that returns a list of child nodes for a node. If a child node's id is not found in the list of node ids, that child node will be omitted. | required |

Returns:

| Type | Description |
| --- | --- |
| List[List[~NodeT]] | A list of topologically ordered node layers. Each layer of nodes is sorted by its node id given by get_node_id_fn. |

Exceptions:

| Type | Description |
| --- | --- |
| RuntimeError | If the input nodes don't form a DAG. |
| ValueError | If the nodes are not unique. |

Source code in zenml/orchestrators/topsort.py
def topsorted_layers(
    nodes: Sequence[NodeT],
    get_node_id_fn: Callable[[NodeT], str],
    get_parent_nodes: Callable[[NodeT], List[NodeT]],
    get_child_nodes: Callable[[NodeT], List[NodeT]],
) -> List[List[NodeT]]:
    """Sorts the DAG of nodes in topological order.

    Args:
        nodes: A sequence of nodes.
        get_node_id_fn: Callable that returns a unique text identifier for a node.
        get_parent_nodes: Callable that returns a list of parent nodes for a node.
            If a parent node's id is not found in the list of node ids, that parent
            node will be omitted.
        get_child_nodes: Callable that returns a list of child nodes for a node.
            If a child node's id is not found in the list of node ids, that child
            node will be omitted.

    Returns:
        A list of topologically ordered node layers. Each layer of nodes is sorted
        by its node id given by `get_node_id_fn`.

    Raises:
        RuntimeError: If the input nodes don't form a DAG.
        ValueError: If the nodes are not unique.
    """
    # Make sure the nodes are unique.
    node_ids = set(get_node_id_fn(n) for n in nodes)
    if len(node_ids) != len(nodes):
        raise ValueError("Nodes must have unique ids.")

    # The outputs of get_(parent|child)_nodes should always be deduplicated,
    # and references to unknown nodes should be removed.
    def _apply_and_clean(
        func: Callable[[NodeT], List[NodeT]], func_name: str, node: NodeT
    ) -> List[NodeT]:
        seen_inner_node_ids = set()
        result = []
        for inner_node in func(node):
            inner_node_id = get_node_id_fn(inner_node)
            if inner_node_id in seen_inner_node_ids:
                logger.warning(
                    "Duplicate node_id %s found when calling %s on node %s. "
                    "This entry will be ignored.",
                    inner_node_id,
                    func_name,
                    node,
                )
            elif inner_node_id not in node_ids:
                logger.warning(
                    "node_id %s found when calling %s on node %s, but this node_id is "
                    "not found in the set of input nodes %s. This entry will be "
                    "ignored.",
                    inner_node_id,
                    func_name,
                    node,
                    node_ids,
                )
            else:
                seen_inner_node_ids.add(inner_node_id)
                result.append(inner_node)

        return result

    def get_clean_parent_nodes(node: NodeT) -> List[NodeT]:
        return _apply_and_clean(get_parent_nodes, "get_parent_nodes", node)

    def get_clean_child_nodes(node: NodeT) -> List[NodeT]:
        return _apply_and_clean(get_child_nodes, "get_child_nodes", node)

    # The first layer contains nodes with no incoming edges.
    layer = [node for node in nodes if not get_clean_parent_nodes(node)]

    visited_node_ids = set()
    layers = []
    while layer:
        layer = sorted(layer, key=get_node_id_fn)
        layers.append(layer)

        next_layer = []
        for node in layer:
            visited_node_ids.add(get_node_id_fn(node))
            for child_node in get_clean_child_nodes(node):
                # Include the child node if all its parents are visited. If the child
                # node is part of a cycle, it will never be included since it will have
                # at least one unvisited parent node which is also part of the cycle.
                parent_node_ids = set(
                    get_node_id_fn(p)
                    for p in get_clean_parent_nodes(child_node)
                )
                if parent_node_ids.issubset(visited_node_ids):
                    next_layer.append(child_node)
        layer = next_layer

    num_output_nodes = sum(len(layer) for layer in layers)
    # Nodes in cycles are not included in layers; raise an error if this happens.
    if num_output_nodes < len(nodes):
        raise RuntimeError("Cannot sort graph because it contains a cycle.")
    # This should never happen; raise an error if this occurs.
    if num_output_nodes > len(nodes):
        raise RuntimeError("Unknown error occurred while sorting DAG.")

    return layers
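
To make the notion of "layers" concrete, here is a simplified, self-contained sketch of layered topological sorting. It is not the `topsorted_layers` implementation: `layered_topsort` is a hypothetical function that takes a plain dictionary mapping each node id to its parent ids, and it assumes every parent id also appears as a key (it does not drop unknown or duplicate references the way the function above does).

```python
from typing import Dict, List, Set


def layered_topsort(parents: Dict[str, List[str]]) -> List[List[str]]:
    """Groups node ids into layers; `parents` maps each id to its parent ids."""
    visited: Set[str] = set()
    remaining: Set[str] = set(parents)
    layers: List[List[str]] = []
    while remaining:
        # A node is ready once all of its parents are in earlier layers.
        layer = sorted(
            node for node in remaining if set(parents[node]) <= visited
        )
        if not layer:
            # Nothing is ready but nodes remain, so they depend on each other.
            raise RuntimeError("Cannot sort graph because it contains a cycle.")
        layers.append(layer)
        visited.update(layer)
        remaining.difference_update(layer)
    return layers


diamond = {
    "load": [],
    "clean": ["load"],
    "featurize": ["load"],
    "train": ["clean", "featurize"],
}
layers = layered_topsort(diamond)
# layers == [["load"], ["clean", "featurize"], ["train"]]
```

Nodes within one layer have no dependencies on each other, which is why an orchestrator could in principle schedule them in parallel.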

utils

Utility functions for the orchestrator.

get_orchestrator_run_name(pipeline_name)

Gets an orchestrator run name.

This run name is not the same as the ZenML run name; it is meant for display in the orchestrator UI.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| pipeline_name | str | Name of the pipeline that will run. | required |

Returns:

| Type | Description |
| --- | --- |
| str | The orchestrator run name. |

Source code in zenml/orchestrators/utils.py
def get_orchestrator_run_name(pipeline_name: str) -> str:
    """Gets an orchestrator run name.

    This run name is not the same as the ZenML run name; it is meant for
    display in the orchestrator UI.

    Args:
        pipeline_name: Name of the pipeline that will run.

    Returns:
        The orchestrator run name.
    """
    user_name = Client().active_user.name
    return f"{pipeline_name}_{user_name}_{random.Random().getrandbits(32):08x}"
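
The shape of the generated name can be seen without a ZenML client. In this sketch, `make_run_name` is a hypothetical variant that takes the user name as an argument instead of reading it from `Client().active_user`; the random suffix is produced the same way as above.

```python
import random
import re


def make_run_name(pipeline_name: str, user_name: str) -> str:
    # 32 random bits rendered as exactly eight lowercase hex digits
    # (zero-padded on the left by the `08x` format spec).
    suffix = f"{random.Random().getrandbits(32):08x}"
    return f"{pipeline_name}_{user_name}_{suffix}"


name = make_run_name("training_pipeline", "alice")
has_expected_shape = (
    re.fullmatch(r"training_pipeline_alice_[0-9a-f]{8}", name) is not None
)
```

The suffix is random, so two calls with the same arguments will usually, though not necessarily, yield different names.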

get_run_id_for_orchestrator_run_id(orchestrator, orchestrator_run_id)

Generates a run ID from an orchestrator run id.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| orchestrator | BaseOrchestrator | The orchestrator of the run. | required |
| orchestrator_run_id | str | The orchestrator run id. | required |

Returns:

| Type | Description |
| --- | --- |
| UUID | The run id generated from the orchestrator run id. |

Source code in zenml/orchestrators/utils.py
def get_run_id_for_orchestrator_run_id(
    orchestrator: "BaseOrchestrator", orchestrator_run_id: str
) -> UUID:
    """Generates a run ID from an orchestrator run id.

    Args:
        orchestrator: The orchestrator of the run.
        orchestrator_run_id: The orchestrator run id.

    Returns:
        The run id generated from the orchestrator run id.
    """
    run_id_seed = f"{orchestrator.id}-{orchestrator_run_id}"
    return uuid_utils.generate_uuid_from_string(run_id_seed)
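
The point of seeding the run ID with both the orchestrator ID and the orchestrator run ID is determinism: the same pair always maps to the same UUID, while a different orchestrator yields a different one. The sketch below illustrates that property only. It uses the standard library's `uuid.uuid5` as a stand-in, which is an assumption; how `uuid_utils.generate_uuid_from_string` hashes the seed is not shown here, so the values will not match those ZenML produces. `run_id_from_seed` is a hypothetical name.

```python
import uuid


def run_id_from_seed(
    orchestrator_id: str, orchestrator_run_id: str
) -> uuid.UUID:
    # Assumption: uuid5 stands in for the real hashing helper, so the
    # resulting values will differ from the ones ZenML generates.
    seed = f"{orchestrator_id}-{orchestrator_run_id}"
    return uuid.uuid5(uuid.NAMESPACE_OID, seed)


first = run_id_from_seed("orchestrator-a", "run-1")
again = run_id_from_seed("orchestrator-a", "run-1")
other = run_id_from_seed("orchestrator-b", "run-1")
```

Here `first` and `again` are equal, and `other` differs because the orchestrator part of the seed changed.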

is_setting_enabled(is_enabled_on_step, is_enabled_on_pipeline)

Checks if a certain setting is enabled within a step run.

This is the case if:

- the setting is explicitly enabled for the step, or
- the setting is neither explicitly disabled for the step nor the pipeline.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| is_enabled_on_step | Optional[bool] | The setting of the step. | required |
| is_enabled_on_pipeline | Optional[bool] | The setting of the pipeline. | required |

Returns:

| Type | Description |
| --- | --- |
| bool | True if the setting is enabled within the step run, False otherwise. |

Source code in zenml/orchestrators/utils.py
def is_setting_enabled(
    is_enabled_on_step: Optional[bool],
    is_enabled_on_pipeline: Optional[bool],
) -> bool:
    """Checks if a certain setting is enabled within a step run.

    This is the case if:
    - the setting is explicitly enabled for the step, or
    - the setting is neither explicitly disabled for the step nor the pipeline.

    Args:
        is_enabled_on_step: The setting of the step.
        is_enabled_on_pipeline: The setting of the pipeline.

    Returns:
        True if the setting is enabled within the step run, False otherwise.
    """
    if is_enabled_on_step is not None:
        return is_enabled_on_step
    if is_enabled_on_pipeline is not None:
        return is_enabled_on_pipeline
    return True
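
The precedence (step setting first, then pipeline setting, then a default of `True`) is easiest to check by enumerating all nine combinations. The block below repeats the function body so that it runs on its own; `table` and `disabled` are illustrative names.

```python
from itertools import product
from typing import Optional


def is_setting_enabled(
    is_enabled_on_step: Optional[bool],
    is_enabled_on_pipeline: Optional[bool],
) -> bool:
    # Same precedence as the listing above: step, then pipeline, then True.
    if is_enabled_on_step is not None:
        return is_enabled_on_step
    if is_enabled_on_pipeline is not None:
        return is_enabled_on_pipeline
    return True


# Map every (step, pipeline) combination to the resolved setting.
table = {
    (step, pipeline): is_setting_enabled(step, pipeline)
    for step, pipeline in product((None, True, False), repeat=2)
}
disabled = [combo for combo, enabled in table.items() if not enabled]
```

Only four of the nine combinations resolve to disabled: the three where the step explicitly sets `False`, and the one where the step leaves the setting unset and the pipeline sets `False`.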