Durable Python: Reliable, Long-Running Workflows with Just a Few Lines of Code

In modern distributed systems, developers often need to orchestrate workflows that connect APIs from diverse systems or services. Common automation use cases may include DevOps, AI pipelines, business operations workflows, and more.

Generally speaking, coding the business logic for such a workflow can be relatively straightforward. What’s more complicated for some developers is dealing with the technical and infrastructure aspects required to ensure a successful execution. One of the biggest challenges in this context is building reliable, long-running workflows. 

This post focuses on a new approach – called Durable Python – that is designed to streamline and simplify the development of reliable workflows for Python developers.

Understanding the challenge of long-running workflows

Take, for example, a workflow that copies a database or large chunk of data from one place to another.  Such a workflow might have the following steps: 

  1. Copy the data
  2. Validate that the operation completed successfully
  3. Delete the old data
  4. Send a notification to the manager

The code might look something like this:
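As a minimal sketch, the four steps could be written as below. The function names (copy_data, validate_copy, delete_old_data, notify_manager) and the in-memory events list are hypothetical stand-ins chosen for illustration; real implementations would call a database or storage API.

```python
# In-memory record of what each step did; a stand-in for real side effects.
events: list[str] = []


def copy_data(source: str, destination: str) -> None:
    events.append(f"copy {source} -> {destination}")


def validate_copy(source: str, destination: str) -> None:
    # A real check would compare row counts or checksums and raise on mismatch.
    events.append(f"validate {destination}")


def delete_old_data(source: str) -> None:
    events.append(f"delete {source}")


def notify_manager(message: str) -> None:
    events.append(f"notify {message}")


def migrate(source: str, destination: str) -> None:
    # The whole workflow: four synchronous calls, one per step.
    copy_data(source, destination)
    validate_copy(source, destination)
    delete_old_data(source)
    notify_manager(f"Moved {source} to {destination}")
```

Nothing here survives a crash: if the process dies during delete_old_data, there is no record of which steps already finished.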

This is clearly very basic code, but it can run well as long as all functions succeed, the code runs from the beginning to the end, and the functions are synchronous. But what happens if the functions are long-running? For example, what happens if functions take minutes or hours to execute, and the execution of this code stops in the middle because the server running it failed? 

There are several ways to deal with such a failure. The simplest one is to rerun the code from the start. This could work, but it could also be expensive and time-consuming. Another way would be to save the state of the workflow and recover from it. To do that, you would need to manage the state of the workflow, monitor incomplete workflows, and resume them from the point of failure. The copy procedure would then look something like:
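One possible shape for a state-saving version is sketched below. It assumes the state is kept in a local JSON file and that each step is passed in as a callable; the names STEPS, load_state, save_state, and run_workflow are hypothetical and chosen only for this illustration.

```python
import json
from pathlib import Path
from typing import Callable

# Step names in execution order (illustrative).
STEPS = ["copy", "validate", "delete", "notify"]


def load_state(path: Path) -> dict:
    # Return the saved state, or a fresh one if the workflow never ran.
    if path.exists():
        return json.loads(path.read_text())
    return {"completed": []}


def save_state(path: Path, state: dict) -> None:
    # Write to a temporary file and rename it, so a crash in the middle
    # of the write cannot leave a half-written state file behind.
    tmp = path.with_suffix(".tmp")
    tmp.write_text(json.dumps(state))
    tmp.replace(path)


def run_workflow(path: Path, actions: dict[str, Callable[[], None]]) -> None:
    state = load_state(path)
    for step in STEPS:
        if step in state["completed"]:
            continue  # finished in an earlier run, skip it
        actions[step]()
        state["completed"].append(step)
        save_state(path, state)  # checkpoint after every step
```

If the process dies after a step finishes but before its checkpoint is written, that step runs again on recovery, so each step would need to be safe to repeat.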

The monitoring procedure, which should continuously run on a server, might look like this:
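A rough sketch of such a monitor follows. It assumes one JSON state file per workflow in a shared directory, each holding a "completed" list of step names; find_incomplete, monitor, and the resume callback are hypothetical names used for illustration only.

```python
import json
import time
from pathlib import Path
from typing import Callable, Optional

# A workflow is done once all of these steps are recorded (illustrative).
ALL_STEPS = {"copy", "validate", "delete", "notify"}


def find_incomplete(state_dir: Path) -> list[Path]:
    # Return the state files of workflows that have not finished every step.
    incomplete = []
    for state_file in sorted(state_dir.glob("*.json")):
        state = json.loads(state_file.read_text())
        if set(state.get("completed", [])) != ALL_STEPS:
            incomplete.append(state_file)
    return incomplete


def monitor(
    state_dir: Path,
    resume: Callable[[Path], None],
    poll_seconds: float = 60.0,
    max_cycles: Optional[int] = None,  # None means run forever
) -> None:
    cycles = 0
    while max_cycles is None or cycles < max_cycles:
        for state_file in find_incomplete(state_dir):
            try:
                resume(state_file)
            except Exception as exc:
                # One broken workflow must not stop the monitor itself.
                print(f"resume failed for {state_file.name}: {exc}")
        cycles += 1
        if max_cycles is None or cycles < max_cycles:
            time.sleep(poll_seconds)
```

This sketch cannot tell a crashed workflow from one that is still running; a real monitor would also need something like a heartbeat or lease to avoid resuming a live workflow.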

It can get even more complicated if the functions are asynchronous, but let’s focus on the simpler synchronous case. As shown above, a simple 4-line workflow turns into a multi-faceted system with dozens of lines of code, which makes it error-prone and requires significant testing. 

The workflow gets even more complicated when one or more functions are implemented on a different server. It’s possible that the remote server might not be available temporarily due to network or server failures. To address such errors, you would need to implement retry policies and workflow management to handle fatal errors. 
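A retry policy of this kind is often written as exponential backoff with a little random jitter. The sketch below is one minimal way to do it; TransientError and call_with_retry are hypothetical names, and the default limits are arbitrary values picked for illustration.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Raised by a step when the failure is expected to be temporary."""


def call_with_retry(
    func: Callable[[], T],
    *,
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except TransientError:
            if attempt == max_attempts:
                raise  # out of attempts: surface it as a fatal error
            # Double the wait on each attempt, capped at max_delay.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Add up to 10% jitter so many clients do not retry in lockstep.
            sleep(delay + random.uniform(0, delay / 10))
    raise ValueError("max_attempts must be at least 1")
```

Only errors the step marks as transient are retried; anything else propagates immediately, which is where the workflow-level handling of fatal errors would take over.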

From talking with developers, our impression is that in many use cases very little effort is invested in reliability. The reasons are clear: it is very time-consuming, and in many cases developers simply do not account for system errors because they are rare. But there is no free lunch: the bill arrives in production.

Building durable workflows the easy way

This simple example emphasizes the complexity of building long-running, reliable workflows.

But this is actually a very generic and common problem, especially in modern systems using microservices and APIs for external applications. 

There are various frameworks for building durable (long-running and reliable) workflows. Workflow automation platforms like Airflow and durable execution platforms such as Microsoft Durable Functions do this well. But they all require understanding the framework, writing your code in a specific way, and building an infrastructure.

But what if you want to build a durable workflow without investing too much time in learning new technologies and setting them up, and without major efforts writing reliable long-running code?

Durable Python for durable workflow execution

To address this need, we present a new approach called Durable Python. With this approach, regular Python code can be deployed on a dedicated platform that ensures any task is durable by nature. Under the hood, the platform’s execution engine guarantees durability without requiring additional code from the developer:

  • After a failure, resumes execution from the point where it stopped once the server restarts
  • Takes care of retries to overcome temporary network and remote server infrastructure failures
  • Provides execution monitoring tools to track the success or failure of executions
  • Provides logging utilities to debug and find error causes after execution

Introducing the AutoKitteh Platform for Durable Python

AutoKitteh is a platform that enables Durable Python. It is a server on which Python code (i.e., business logic) can be deployed and executed. In our example above, the original four lines of code are all you need with AutoKitteh to ensure process reliability. In the extreme case of failure, it provides full visibility into the failed workflows and a log trace showing the exact location of errors. It also provides options to stop workflows and easily deploy new code versions.

Under the hood, AutoKitteh runs the Python code as a Temporal workflow, in which calls to external, non-deterministic APIs are executed as Temporal activities.
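To give a feel for why separating workflow code from activities helps with recovery, here is a toy sketch of the general record-and-replay idea. It is not AutoKitteh's or Temporal's actual implementation or API; the Journal class, its JSON file format, and the workflow function are hypothetical and exist only for illustration.

```python
import json
from pathlib import Path
from typing import Any, Callable


class Journal:
    """Records each activity result so a restarted run can skip finished calls."""

    def __init__(self, path: Path) -> None:
        self.path = path
        # Results recorded by earlier runs, in call order.
        self.results: list[Any] = json.loads(path.read_text()) if path.exists() else []
        self.position = 0

    def activity(self, func: Callable[..., Any], *args: Any) -> Any:
        if self.position < len(self.results):
            # Replay: return the recorded result instead of calling func again.
            result = self.results[self.position]
        else:
            result = func(*args)  # must be JSON-serializable in this sketch
            self.results.append(result)
            self.path.write_text(json.dumps(self.results))
        self.position += 1
        return result


def workflow(journal: Journal, fetch: Callable, send: Callable) -> Any:
    # Plain sequential code; every external call goes through the journal.
    order = journal.activity(fetch, 42)
    return journal.activity(send, order["id"])
```

If send fails on the first run, rerunning workflow with a new Journal on the same file replays the recorded fetch result and continues from the send call. This only works when the workflow code itself is deterministic, which is why the non-deterministic calls are the ones isolated as activities.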

How developers benefit from Durable Python

Durable Python makes it easy for developers to build reliable, long-running workflows:

  • Less code, fewer bugs, less development time, minimal ramp-up time
  • Reliability based on state management and seamless recovery from server downtime
  • Workflow management and tracing
  • “Serverless” environment for easy deployment of any workflow 

Durable Python can be very helpful in diverse use cases, such as core business workflows, CI/CD and DevOps pipelines, machine learning training loops, office automation, on-call ticket handling, and more.

Bottom line: less code, less time, less pain

By leveraging Durable Python, you can create, deploy, and manage reliable workflow automation with just a few lines of code.

Durable execution and seamless recovery are managed under the hood without any need for developer involvement. Developers simply write the business logic and upload it to AutoKitteh, which then adds the necessary functionality (e.g., queues, state management) “behind the scenes” to ensure durable execution.

While there are several excellent dedicated tools for achieving durability (e.g., Temporal), Durable Python lets you do it in a fast, simple, and inexpensive manner.